From 93c33e02f0c3e38ead7097781b26675dc2e7cd11 Mon Sep 17 00:00:00 2001
From: David Taylor
Date: Wed, 12 Apr 2023 18:39:10 +0100
Subject: [PATCH] PERF: Avoid full `posts` table scans during anonymisation (#21081)

2e78045a fixed the anonymization job so that it correctly updated
self-mentions, which are not logged in the post_actions table. The
solution was to scan the entire `posts` table with a `raw ILIKE` query.
On sites with many posts, this can take a very long time.

This commit updates the job to take a two-pass approach:

First, we update posts based on the post_actions table. This is much
more efficient than a full table scan, and takes care of all 'non-self'
mentions.

Then, we make a second pass using the `raw ILIKE` approach. Since we
already took care of most posts, we can scope this down to self-mentions
only. Filtering the query to a specific `posts.user_id` makes it
significantly more performant than a full table scan.
---
 app/jobs/regular/update_username.rb | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/app/jobs/regular/update_username.rb b/app/jobs/regular/update_username.rb
index 26c3f64099d..c56ac2ecf20 100644
--- a/app/jobs/regular/update_username.rb
+++ b/app/jobs/regular/update_username.rb
@@ -46,9 +46,21 @@ module Jobs
     def update_posts
       updated_post_ids = Set.new
 
+      # Other people mentioning this user
+      Post
+        .with_deleted
+        .joins(mentioned("posts.id"))
+        .where("a.user_id = :user_id", user_id: @user_id)
+        .find_each do |post|
+          update_post(post)
+          updated_post_ids << post.id
+        end
+
+      # User mentioning self (not included in post_actions table)
       Post
         .with_deleted
         .where("raw ILIKE ?", "%@#{@old_username}%")
+        .where("posts.user_id = :user_id", user_id: @user_id)
         .find_each do |post|
           update_post(post)
           updated_post_ids << post.id
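The two-pass strategy described in the commit message can be sketched in plain Ruby, outside Rails and without a database. This is a minimal, hypothetical model: `PostRow`, `MentionRow`, and `posts_to_update` are illustrative names, not Discourse code; the `mentions` array stands in for the mention log the message calls post_actions; and a case-insensitive substring check approximates `raw ILIKE '%@old_username%'`. A `group_by` hash plays the role that an index on `posts.user_id` would play in the database, so the second pass only touches the user's own posts.

```ruby
require "set"

# Hypothetical in-memory stand-ins for rows in `posts` and the mention log.
PostRow = Struct.new(:id, :user_id, :raw, keyword_init: true)
MentionRow = Struct.new(:post_id, :mentioned_user_id, keyword_init: true)

# Returns the Set of post ids that would need their mention rewritten.
def posts_to_update(posts, mentions, user_id:, old_username:)
  updated_post_ids = Set.new
  posts_by_id = posts.to_h { |post| [post.id, post] }
  # Assumption: a lookup keyed by author stands in for an index on posts.user_id.
  posts_by_user = posts.group_by(&:user_id)

  # Pass 1: other people mentioning this user, found via the mention log
  # instead of searching post text.
  mentions.each do |mention|
    next unless mention.mentioned_user_id == user_id
    post = posts_by_id[mention.post_id]
    updated_post_ids << post.id if post
  end

  # Pass 2: self-mentions, which the log does not record. Only the user's
  # own posts are searched (approximating ILIKE with a downcased substring check).
  needle = "@#{old_username}".downcase
  posts_by_user.fetch(user_id, []).each do |post|
    updated_post_ids << post.id if post.raw.downcase.include?(needle)
  end

  updated_post_ids
end

# Usage: user 1 is "alice".
posts = [
  PostRow.new(id: 1, user_id: 2, raw: "Thanks @Alice!"),      # logged mention
  PostRow.new(id: 2, user_id: 1, raw: "Reminder for @alice"), # self-mention
  PostRow.new(id: 3, user_id: 1, raw: "Hello world"),         # no mention
  PostRow.new(id: 4, user_id: 3, raw: "Hi @bob")              # mentions someone else
]
mentions = [
  MentionRow.new(post_id: 1, mentioned_user_id: 1),
  MentionRow.new(post_id: 4, mentioned_user_id: 9)
]

p posts_to_update(posts, mentions, user_id: 1, old_username: "alice").to_a.sort # => [1, 2]
```

In this sketch, the `Set` also deduplicates ids if both passes ever match the same post, and a post by another user that mentions the old username without a log entry is deliberately not found: that is the trade-off of relying on the log for non-self mentions.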