我有一張大約有100000個博客文章的表格,通過1:n關係鏈接到一個有50個Feed的表格。當我用一個select語句查詢這兩個表時,由發布表的日期時間字段排序,MySQL總是使用filesort,導致查詢時間非常慢(> 1秒)。以下是postings
表(簡化)的模式:
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| feed_id | int(11) | NO | MUL | NULL | |
| crawl_date | datetime | NO | | NULL | |
| is_active | tinyint(1) | NO | MUL | 0 | |
| link | varchar(255) | NO | MUL | NULL | |
| author | varchar(255) | NO | | NULL | |
| title | varchar(255) | NO | | NULL | |
| excerpt | text | NO | | NULL | |
| long_excerpt | text | NO | | NULL | |
| user_offtopic_count | int(11) | NO | MUL | 0 | |
+---------------------+--------------+------+-----+---------+----------------+
這是feed
表:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| type | int(11) | NO | MUL | 0 | |
| title | varchar(255) | NO | | NULL | |
| website | varchar(255) | NO | | NULL | |
| url | varchar(255) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
以下是執行時間大於1秒的查詢。請注意,post_date
字段有一個索引,但是MySQL不使用它來對發布表進行排序:
SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
(`postings`)
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` < 10 AND
`postings`.`is_active` = 1
ORDER BY
`postings`.`post_date` desc
LIMIT
15
這個查詢的explain extended
命令的結果顯示MySQL正在使用filesort:
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| 1 | SIMPLE | postings | ref | feed_id,is_active,user_offtopic_count | is_active | 1 | const | 30996 | Using where; Using filesort |
| 1 | SIMPLE | feeds | eq_ref | PRIMARY,type | PRIMARY | 4 | feedian.postings.feed_id | 1 | Using where |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
當我刪除order by
部分時,MySQL停止使用filesort。求幫忙,如果你有任何想法如何優化這個查詢讓MySQL排序和選擇使用索引的數據。我已經嘗試了一些東西,比如像在一些博客帖子中提到的那樣,在所有的字段/字段上創建一個組合索引,但是這也不起作用。
最佳解決思路
在postings (is_active, post_date)
上創建一個複合索引(按此順序)。
它將用於is_active
的過濾和post_date
的排序。
MySQL
應在EXPLAIN EXTENDED
中顯示REF
訪問方法。
請注意,通過user_offtopic_count
您有一個RANGE
過濾條件,這就是為什麽您不能使用該字段的索引在過濾和其他字段進行排序。
根據user_offtopic_count
的選擇性(即多少行滿足user_offtopic_count < 10
)的不同,在user_offtopic_count
上創建索引並對post_dates進行排序可能會更有用。
為此,請在postings (is_active, user_offtopic_count)
上創建一個複合索引,並確保使用了通過此索引的RANGE
訪問方法。
哪個索引將更快取決於您的數據分布。創建兩個索引,FORCE
他們,看看哪個更快:
CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);
SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
`postings` FORCE INDEX (ix_active_offtopic)
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` < 10 AND
`postings`.`is_active` = 1
ORDER BY
`postings`.`post_date` desc
LIMIT
15
/* This should show RANGE access with few rows and keep the FILESORT */
SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
`postings` FORCE INDEX (ix_active_date)
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` < 10 AND
`postings`.`is_active` = 1
ORDER BY
`postings`.`post_date` desc
LIMIT
15
/* This should show REF access with lots of rows and no FILESORT */