mirror of
https://github.com/mattermost/mattermost.git
synced 2025-02-25 18:55:24 -06:00
MM-57344: Improve ElasticSearch indexing query performance (#26517)
* Use a row constructor comparison to avoid a filter

The original query had a WHERE condition that looked like:

    (x > a) OR (x = a AND y > b)

This commit changes it to a compound comparison that, given the lexicographical order it uses, is semantically the same:

    (x, y) > (a, b)

This makes the plan in Postgres use an Index Cond instead of a Filter, which brings two main performance improvements:

1. The query's time is more or less constant, instead of increasing every time the query is executed. This is important because the query is executed repeatedly until all posts are indexed.
2. The query's time is much shorter: while the original query eventually takes 30s, hitting the timeout, the new one takes ~30ms.

Notice the difference in the plans below:

- the original query uses a `Filter` step, while the new one uses an `Index Cond` + `Filter` step.
- the original query has a shared hit count of 40557115 (~309GiB at the default 8KiB block size), while the new one has a shared hit count of 13951 (~109MiB). This does not show exactly how much data is read from disk, given it's a nested loop node, but it gives us an idea of the amount of data processed in the server.
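As a sanity check on the equivalence between the two WHERE forms used by the queries in this commit, here is a minimal Go sketch that compares them over a small grid of values. The `key` type and both function names are hypothetical and exist only for this illustration. It also assumes non-NULL columns and plain byte-wise string ordering, which may differ from the collation the database actually applies to `Id`.

```go
package main

import "fmt"

// key is a hypothetical (CreateAt, Id) pair used only for this illustration.
type key struct {
	createAt int64
	id       string
}

// expandedGreater mirrors the OR form: (x > a) OR (x = a AND y > b).
func expandedGreater(k, cursor key) bool {
	return k.createAt > cursor.createAt ||
		(k.createAt == cursor.createAt && k.id > cursor.id)
}

// rowGreater mirrors the row constructor form (x, y) > (a, b):
// the first column decides, and the second is consulted only on a tie.
func rowGreater(k, cursor key) bool {
	if k.createAt != cursor.createAt {
		return k.createAt > cursor.createAt
	}
	return k.id > cursor.id
}

func main() {
	times := []int64{1, 2, 3}
	ids := []string{"a", "b", "c"}

	// Compare both predicates for every (row, cursor) combination.
	mismatches := 0
	for _, t1 := range times {
		for _, id1 := range ids {
			for _, t2 := range times {
				for _, id2 := range ids {
					k, c := key{t1, id1}, key{t2, id2}
					if expandedGreater(k, c) != rowGreater(k, c) {
						mismatches++
					}
				}
			}
		}
	}
	fmt.Println("mismatches:", mismatches) // prints "mismatches: 0"
}
```

The two predicates agree on every pair, so the choice between them is purely about which form each planner can turn into an index condition.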
Original plan:

    agnivaltdb=> EXPLAIN (ANALYZE, BUFFERS) SELECT Posts.*, Channels.TeamId FROM Posts LEFT JOIN Channels ON Posts.ChannelId = Channels.Id WHERE Posts.CreateAt > '1687424888405' OR (Posts.CreateAt = '1687424888405' AND Posts.Id > 'tpomh9yu1tffmdp6dopobwuc9h') ORDER BY Posts.CreateAt ASC, Posts.Id ASC LIMIT 10000;
                                        QUERY PLAN
    --------------------------------------------------------------------------------
     Limit  (cost=1.22..2287.94 rows=10000 width=464) (actual time=23921.236..23954.229 rows=10000 loops=1)
       Buffers: shared hit=40557115
       ->  Incremental Sort  (cost=1.22..13998881.93 rows=61217938 width=464) (actual time=23921.235..23953.205 rows=10000 loops=1)
             Sort Key: posts.createat, posts.id
             Presorted Key: posts.createat
             Full-sort Groups: 311  Sort Method: quicksort  Average Memory: 45kB  Peak Memory: 45kB
             Buffers: shared hit=40557115
             ->  Nested Loop Left Join  (cost=1.00..11421751.31 rows=61217938 width=464) (actual time=23920.970..23947.067 rows=10001 loops=1)
                   Buffers: shared hit=40557115
                   ->  Index Scan using idx_posts_create_at on posts  (cost=0.57..9889434.54 rows=61217938 width=461) (actual time=23920.930..23931.063 rows=10001 loops=1)
                         Filter: ((createat > '1687424888405'::bigint) OR ((createat = '1687424888405'::bigint) AND ((id)::text > 'tpomh9yu1tffmdp6dopobwuc9h'::text)))
                         Rows Removed by Filter: 40920000
                         Buffers: shared hit=40553119
                   ->  Memoize  (cost=0.43..0.70 rows=1 width=30) (actual time=0.001..0.001 rows=1 loops=10001)
                         Cache Key: posts.channelid
                         Cache Mode: logical
                         Hits: 9002  Misses: 999  Evictions: 0  Overflows: 0  Memory Usage: 151kB
                         Buffers: shared hit=3996
                         ->  Index Scan using channels_pkey on channels  (cost=0.42..0.69 rows=1 width=30) (actual time=0.007..0.007 rows=1 loops=999)
                               Index Cond: ((id)::text = (posts.channelid)::text)
                               Buffers: shared hit=3996
     Planning:
       Buffers: shared hit=112
     Planning Time: 0.501 ms
     Execution Time: 23954.974 ms
    (25 rows)

New plan:

    agnivaltdb=> EXPLAIN (ANALYZE, BUFFERS) SELECT Posts.*, Channels.TeamId FROM Posts LEFT JOIN Channels ON Posts.ChannelId = Channels.Id WHERE (Posts.CreateAt, Posts.Id) > ('1687424888405', 'tpomh9yu1tffmdp6dopobwuc9h') ORDER BY Posts.CreateAt ASC, Posts.Id ASC LIMIT 10000;
                                        QUERY PLAN
    --------------------------------------------------------------------------------
     Limit  (cost=1.15..1666.83 rows=10000 width=464) (actual time=0.366..34.053 rows=10000 loops=1)
       Buffers: shared hit=13951
       ->  Incremental Sort  (cost=1.15..10196977.36 rows=61217938 width=464) (actual time=0.365..33.031 rows=10000 loops=1)
             Sort Key: posts.createat, posts.id
             Presorted Key: posts.createat
             Full-sort Groups: 311  Sort Method: quicksort  Average Memory: 45kB  Peak Memory: 45kB
             Buffers: shared hit=13951
             ->  Nested Loop Left Join  (cost=1.00..7619846.74 rows=61217938 width=464) (actual time=0.059..26.840 rows=10001 loops=1)
                   Buffers: shared hit=13951
                   ->  Index Scan using idx_posts_create_at on posts  (cost=0.57..6087529.97 rows=61217938 width=461) (actual time=0.040..10.548 rows=10001 loops=1)
                         Index Cond: (createat >= '1687424888405'::bigint)
                         Filter: (ROW(createat, (id)::text) > ROW('1687424888405'::bigint, 'tpomh9yu1tffmdp6dopobwuc9h'::text))
                         Rows Removed by Filter: 2
                         Buffers: shared hit=9955
                   ->  Memoize  (cost=0.43..0.70 rows=1 width=30) (actual time=0.001..0.001 rows=1 loops=10001)
                         Cache Key: posts.channelid
                         Cache Mode: logical
                         Hits: 9002  Misses: 999  Evictions: 0  Overflows: 0  Memory Usage: 151kB
                         Buffers: shared hit=3996
                         ->  Index Scan using channels_pkey on channels  (cost=0.42..0.69 rows=1 width=30) (actual time=0.007..0.007 rows=1 loops=999)
                               Index Cond: ((id)::text = (posts.channelid)::text)
                               Buffers: shared hit=3996
     Planning:
       Buffers: shared hit=112
     Planning Time: 0.471 ms
     Execution Time: 34.716 ms
    (26 rows)

* Go back to the old query for MySQL

As one could expect, the two databases have completely opposite behaviours, and for
MySQL, the old query is significantly faster than the new one, for exactly the same reason as in Postgres, but the other way around: the old query uses an Index range scan while the new one would use a Filter.

Old query (0.18s):

    mysql> EXPLAIN ANALYZE SELECT Posts.*, Channels.TeamId FROM Posts USE INDEX(idx_posts_create_at_id) LEFT JOIN Channels ON Posts.ChannelId = Channels.Id WHERE Posts.CreateAt > 1557752415221 OR (Posts.CreateAt = 1557752415221 AND Posts.Id > 'ad59ire57tfwmjr5r8xqxc75qw') ORDER BY Posts.CreateAt ASC, Posts.Id ASC LIMIT 10000;

    EXPLAIN
    -> Limit: 10000 row(s)  (cost=7472068.91 rows=10000) (actual time=0.063..164.174 rows=10000 loops=1)
        -> Nested loop left join  (cost=7472068.91 rows=5559093) (actual time=0.062..163.450 rows=10000 loops=1)
            -> Index range scan on Posts using idx_posts_create_at_id over (CreateAt = 1557752415221 AND 'ad59ire57tfwmjr5r8xqxc75qw' < Id) OR (1557752415221 < CreateAt), with index condition: ((Posts.CreateAt > 1557752415221) or ((Posts.CreateAt = 1557752415221) and (Posts.Id > 'ad59ire57tfwmjr5r8xqxc75qw')))  (cost=1357066.29 rows=5559093) (actual time=0.043..97.358 rows=10000 loops=1)
            -> Single-row index lookup on Channels using PRIMARY (Id=Posts.ChannelId)  (cost=1.00 rows=1) (actual time=0.006..0.006 rows=1 loops=10000)

    1 row in set (0.18 sec)

New query (5s):

    mysql> EXPLAIN ANALYZE SELECT Posts.*, Channels.TeamId FROM Posts USE INDEX(idx_posts_create_at_id) LEFT JOIN Channels ON Posts.ChannelId = Channels.Id WHERE (Posts.CreateAt, Posts.Id) > (1557752415221, 'ad59ire57tfwmjr5r8xqxc75qw') ORDER BY Posts.CreateAt ASC, Posts.Id ASC LIMIT 10000;

    EXPLAIN
    -> Limit: 10000 row(s)  (cost=11119405.48 rows=10000) (actual time=5295.106..5455.285 rows=10000 loops=1)
        -> Nested loop left join  (cost=11119405.48 rows=10000) (actual time=5295.105..5454.572 rows=10000 loops=1)
            -> Filter: ((Posts.CreateAt,Posts.Id) > (1557752415221,'ad59ire57tfwmjr5r8xqxc75qw'))  (cost=221.48 rows=10000) (actual time=5295.078..5388.668 rows=10000 loops=1)
                -> Index scan on Posts using idx_posts_create_at_id  (cost=221.48 rows=10000) (actual time=0.055..5314.753 rows=600000 loops=1)
            -> Single-row index lookup on Channels using PRIMARY (Id=Posts.ChannelId)  (cost=1.00 rows=1) (actual time=0.006..0.006 rows=1 loops=10000)

    1 row in set (5.46 sec)

* Add comment explaining the difference between DBs

---------

Co-authored-by: Mattermost Build <build@mattermost.com>
parent eb817966a4
commit 9a2d96073e
@@ -2460,28 +2460,51 @@ func (s *SqlPostStore) GetEditHistoryForPost(postId string) ([]*model.Post, erro
 
 func (s *SqlPostStore) GetPostsBatchForIndexing(startTime int64, startPostID string, limit int) ([]*model.PostForIndexing, error) {
 	posts := []*model.PostForIndexing{}
-	table := "Posts"
-	// We force this index to avoid any chances of index merge intersection.
-	if s.DriverName() == model.DatabaseDriverMysql {
-		table += " USE INDEX(idx_posts_create_at_id)"
-	}
-	query := `SELECT
-			Posts.*, Channels.TeamId
-		FROM ` + table + `
-		LEFT JOIN
-			Channels
-		ON
-			Posts.ChannelId = Channels.Id
-		WHERE
-			Posts.CreateAt > ?
-		OR
-			(Posts.CreateAt = ? AND Posts.Id > ?)
-		ORDER BY
-			Posts.CreateAt ASC, Posts.Id ASC
-		LIMIT
-			?`
-	err := s.GetSearchReplicaX().Select(&posts, query, startTime, startTime, startPostID, limit)
 
+	var err error
+	// In order to use an index scan for both MySQL and Postgres, we need to
+	// diverge the implementation of the query, specifically in the WHERE
+	// condition: for MySQL, we need to do
+	//   (CreateAt > ?) OR (CreateAt = ? AND Id > ?)
+	// while for Postgres we need
+	//   (CreateAt, Id) > (?, ?)
+	// The wrong choice for any of the two databases makes the query go from
+	// milliseconds to dozens of seconds.
+	// More information in: https://github.com/mattermost/mattermost/pull/26517
+	// and https://community.mattermost.com/core/pl/ui5dz96shinetb8nq83myggbma
+	if s.DriverName() == model.DatabaseDriverMysql {
+		query := `SELECT
+				Posts.*, Channels.TeamId
+			FROM Posts USE INDEX(idx_posts_create_at_id)
+			LEFT JOIN
+				Channels
+			ON
+				Posts.ChannelId = Channels.Id
+			WHERE
+				Posts.CreateAt > ?
+			OR
+				(Posts.CreateAt = ? AND Posts.Id > ?)
+			ORDER BY
+				Posts.CreateAt ASC, Posts.Id ASC
+			LIMIT
+				?`
+		err = s.GetSearchReplicaX().Select(&posts, query, startTime, startTime, startPostID, limit)
+	} else {
+		query := `SELECT
+				Posts.*, Channels.TeamId
+			FROM Posts
+			LEFT JOIN
+				Channels
+			ON
+				Posts.ChannelId = Channels.Id
+			WHERE
+				(Posts.CreateAt, Posts.Id) > (?, ?)
+			ORDER BY
+				Posts.CreateAt ASC, Posts.Id ASC
+			LIMIT
+				?`
+		err = s.GetSearchReplicaX().Select(&posts, query, startTime, startPostID, limit)
+	}
 	if err != nil {
 		return nil, errors.Wrap(err, "failed to find Posts")
 	}
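The commit message notes that this query is executed repeatedly until all posts are indexed. The caller is not part of this diff, so the following is only a sketch of how such a loop could advance a (CreateAt, Id) cursor between batches. It uses an in-memory sorted slice in place of the database. The `post` type, `fetchBatch`, and `indexAll` are hypothetical names introduced for illustration, not Mattermost code.

```go
package main

import (
	"fmt"
	"sort"
)

// post is a hypothetical stand-in for model.PostForIndexing.
type post struct {
	CreateAt int64
	ID       string
}

// fetchBatch imitates the query: rows strictly after the
// (startTime, startPostID) cursor, in (CreateAt, ID) order, capped at limit.
// posts must already be sorted by (CreateAt, ID).
func fetchBatch(posts []post, startTime int64, startPostID string, limit int) []post {
	// Find the first row whose key is strictly greater than the cursor.
	i := sort.Search(len(posts), func(i int) bool {
		p := posts[i]
		return p.CreateAt > startTime ||
			(p.CreateAt == startTime && p.ID > startPostID)
	})
	end := i + limit
	if end > len(posts) {
		end = len(posts)
	}
	return posts[i:end]
}

// indexAll walks all rows batch by batch, moving the cursor to the last
// row of each batch, and reports how many rows and batches it saw.
func indexAll(posts []post, limit int) (indexed int, batches int) {
	var startTime int64
	startPostID := ""
	for {
		batch := fetchBatch(posts, startTime, startPostID, limit)
		if len(batch) == 0 {
			return indexed, batches
		}
		indexed += len(batch)
		batches++
		last := batch[len(batch)-1]
		startTime, startPostID = last.CreateAt, last.ID
	}
}

func main() {
	// Three posts share CreateAt=1, so a cursor on CreateAt alone
	// could not resume correctly in the middle of that group.
	posts := []post{{1, "a"}, {1, "b"}, {1, "c"}, {2, "a"}, {3, "a"}}
	indexed, batches := indexAll(posts, 2)
	fmt.Println(indexed, batches) // prints "5 3"
}
```

Because the cursor is the pair rather than the timestamp alone, a batch boundary that falls inside a group of posts with the same `CreateAt` neither skips nor repeats rows, which is why the WHERE condition compares both columns.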