Aggregation

GROUP BY

A GROUP BY operation requires either rows to be read in sorted order or a temporary table to buffer intermediate results for aggregation. That is to say that MySQL can use an index for GROUP BY in the following ways:

  1. Loose index scan. If the GROUP BY condition is indexed, MySQL may elect to scan the index from start to finish, and avoid materializing intermediate results. This operation is preferred if there are not very selective predicates and the temporary table to be created would be quite large.
_images/loose-index-scan-continent-index.png
  1. Filtering rows. An index can be used to identify rows which are then stored in a temporary table. The results are then aggregated in the temporary table, and by default sorted by the GROUP BY condition. [1]
_images/group-by-filtering-rows.png
  1. A combination of filtering and order. This optimization applies where the index for filtering rows already has the data in the correct order for the GROUP BY operation.
_images/group-by-filtering-and-sort.png

Example 24: A GROUP BY using a loose-index scan

EXPLAIN FORMAT=JSON
SELECT count(*) as c, continent FROM Country GROUP BY continent;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "56.80"
    },
    "grouping_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Country",
        "access_type": "index",
        "possible_keys": [
          "PRIMARY",
          "p",
          "c",
          "p_c",
          "c_p",
          "Name"
        ],
        "key": "c",
        "used_key_parts": [
          "Continent"
        ],
        "key_length": "1",
        "rows_examined_per_scan": 239,
        "rows_produced_per_join": 239,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "9.00",
          "eval_cost": "47.80",
          "prefix_cost": "56.80",
          "data_read_per_join": "61K"
        },
        "used_columns": [
          "Code",
          "Continent"
        ]
      }
    }
  }
}

Example 25: A GROUP BY using an index then sorting

EXPLAIN FORMAT=JSON
SELECT count(*) as c, continent FROM Country WHERE population > 500000000 GROUP BY continent;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3.81"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2.00"
      },
      "table": {
        "table_name": "Country",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY",
          "p",
          "c",
          "p_c",
          "c_p",
          "Name"
        ],
        "key": "p",
        "used_key_parts": [
          "Population"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 2,
        "rows_produced_per_join": 2,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1.41",
          "eval_cost": "0.40",
          "prefix_cost": "1.81",
          "data_read_per_join": "528"
        },
        "used_columns": [
          "Code",
          "Continent",
          "Population"
        ],
        "attached_condition": "(`world`.`Country`.`Population` > 500000000)"
      }
    }
  }
}

Example 26: A GROUP BY using an index to filter and sort

EXPLAIN FORMAT=JSON
SELECT count(*) as c, continent FROM Country WHERE continent='Asia' GROUP BY continent;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "11.23"
    },
    "grouping_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Country",
        "access_type": "ref",
        "possible_keys": [
          "PRIMARY",
          "p",
          "c",
          "p_c",
          "c_p",
          "Name"
        ],
        "key": "c",
        "used_key_parts": [
          "Continent"
        ],
        "key_length": "1",
        "ref": [
          "const"
        ],
        "rows_examined_per_scan": 51,
        "rows_produced_per_join": 51,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "1.03",
          "eval_cost": "10.20",
          "prefix_cost": "11.23",
          "data_read_per_join": "13K"
        },
        "used_columns": [
          "Continent"
        ]
      }
    }
  }
}

UNION

MySQL does not apply any special optimizations to UNION, which semantically combines the results of two queries together and removes duplicates. As seen in Example 27, the deduplication is performed in an intermediate temporary table. The temporary table is used in all UNION queries and thus, there is no cost assigned to it (or cost-based optimization possible).

Simple Union Example:

SELECT * FROM City WHERE CountryCode = 'CAN'
UNION
SELECT * FROM City WHERE CountryCode = 'USA'

Hypothetical Optimization:

SELECT * FROM City WHERE CountryCode IN ('CAN', 'USA')
Example 27

Unlike subqueries and views, where multiple accesses to the same table can be merged into a single access internally, MySQL will not perform a similar optimization on UNION queries. It will also not discover any cases where duplicates are not possible and the UNION could be rewritten to a UNION ALL. This leaves a number of cases where a skilled operator can apply manual modifications to queries (either in the application or via query rewrite) and improve performance.

Example 27: A union query requiring a temporary table

EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode = 'CAN'
UNION
SELECT * FROM City WHERE CountryCode = 'USA'
{
  "query_block": {
    "union_result": {
      "using_temporary_table": true,  # Temporary table required
      "table_name": "<union1,2>",     # Combines the results of two queries
      "access_type": "ALL",
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "58.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 49,
              "rows_produced_per_join": 49,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "49.00",
                "eval_cost": "9.80",
                "prefix_cost": "58.80",
                "data_read_per_join": "3K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "129.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 274,
              "rows_produced_per_join": 274,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "75.00",
                "eval_cost": "54.80",
                "prefix_cost": "129.80",
                "data_read_per_join": "19K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        }
      ]
    }
  }
}

UNION ALL

UNION ALL is semantically similar to UNION with one important difference: deduplication is not required. This means that in some cases MySQL is able to pipeline the results of a UNION ALL query without having to materialize and de-duplicate rows in an intermediate table.

Internally, a temporary table is always created for UNION ALL queries, but whether or not it needs to be used to materialize rows can be seen in EXPLAIN. Example 28 shows an example query using UNION ALL. After adding ORDER BY the query now requires the use of the intermediate temporary table.

Example 28

Example 28: UNION ALL without a temporary table

EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode = 'CAN'
UNION ALL
SELECT * FROM City WHERE CountryCode = 'USA';
{
  "query_block": {
    "union_result": {
      "using_temporary_table": false,   # Temporary table not required!
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "58.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 49,
              "rows_produced_per_join": 49,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "49.00",
                "eval_cost": "9.80",
                "prefix_cost": "58.80",
                "data_read_per_join": "3K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "129.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 274,
              "rows_produced_per_join": 274,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "75.00",
                "eval_cost": "54.80",
                "prefix_cost": "129.80",
                "data_read_per_join": "19K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        }
      ]
    }
  }
}

Example 29: UNION ALL requiring a temporary table because of ORDER BY

EXPLAIN FORMAT=JSON
SELECT * FROM City WHERE CountryCode = 'CAN'
UNION ALL
SELECT * FROM City WHERE CountryCode = 'USA' ORDER BY Name;
{
  "query_block": {
    "union_result": {
      "using_temporary_table": true,  # A UNION ALL requiring a temporary table
      "table_name": "<union1,2>",     # because of the ORDER BY
      "access_type": "ALL",
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "cost_info": {
              "query_cost": "58.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 49,
              "rows_produced_per_join": 49,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "49.00",
                "eval_cost": "9.80",
                "prefix_cost": "58.80",
                "data_read_per_join": "3K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": {
              "query_cost": "129.80"
            },
            "table": {
              "table_name": "City",
              "access_type": "ref",
              "possible_keys": [
                "CountryCode"
              ],
              "key": "CountryCode",
              "used_key_parts": [
                "CountryCode"
              ],
              "key_length": "3",
              "ref": [
                "const"
              ],
              "rows_examined_per_scan": 274,
              "rows_produced_per_join": 274,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "75.00",
                "eval_cost": "54.80",
                "prefix_cost": "129.80",
                "data_read_per_join": "19K"
              },
              "used_columns": [
                "ID",
                "Name",
                "CountryCode",
                "District",
                "Population"
              ]
            }
          }
        }
      ]
    }
  }
}
[1]This behavior is deprecated and may be removed in future. If ordering is not required, it is recommended to explicitly GROUP BY x ORDER BY NULL.