Mirror of https://github.com/discourse/discourse.git (synced 2024-11-25 18:30:26 -06:00)
PERF: Avoid full posts table scans during anonymisation (#21081)
2e78045a fixed the anonymization job so that it correctly updated self-mentions, which are not logged in the post_actions table. The solution was to scan the entire `posts` table with a `raw ILIKE` query. On sites with many posts, this can take a very long time.
This commit updates the job to take a two-pass approach:
First, we update posts based on the post_actions table. This is much more efficient than a full table scan, and takes care of all 'non-self' mentions.
Then, we make a second pass using the `raw ILIKE` approach. Since the first pass already took care of most posts, we can scope this one down to self-mentions only. Filtering the query to a specific `posts.user_id` makes it significantly faster than a full table scan.
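The two passes described above can be sketched with plain Ruby collections. This is only an illustration of the idea, not the job's actual code: `PostRow`, `MENTION_LOG`, `POSTS`, `post_ids_to_update`, and the hash keys are hypothetical stand-ins for the real tables and queries, and `downcase` + `include?` only approximates a case-insensitive `ILIKE '%...%'` match.

```ruby
require "set"

# Hypothetical in-memory stand-in for a row of the posts table.
PostRow = Struct.new(:id, :user_id, :raw)

# Hypothetical data: user 20 ("alice") is the account being anonymised.
POSTS = [
  PostRow.new(1, 10, "Hi @alice, welcome"),              # mention by another user
  PostRow.new(2, 20, "Note to self: @Alice check this"), # self-mention, not logged
  PostRow.new(3, 20, "Nothing relevant here"),
  PostRow.new(4, 30, "Unrelated post"),
].freeze

# Hypothetical mention log: only mentions made by *other* users are recorded.
MENTION_LOG = [{ post_id: 1, mentioned_user_id: 20 }].freeze

def post_ids_to_update(posts, mention_log, user_id:, old_username:)
  ids = Set.new

  # Pass 1: use the mention log; no post text is inspected.
  mention_log.each do |entry|
    ids << entry[:post_id] if entry[:mentioned_user_id] == user_id
  end

  # Pass 2: text scan, restricted to the user's own posts (self-mentions).
  needle = "@#{old_username}".downcase
  posts.each do |post|
    next unless post.user_id == user_id
    ids << post.id if post.raw.downcase.include?(needle)
  end

  ids
end

p post_ids_to_update(POSTS, MENTION_LOG, user_id: 20, old_username: "alice").to_a
# => [1, 2]
```

In this sketch the text scan only ever looks at posts 2 and 3 (the user's own), which mirrors why the second query is cheap when it is filtered by `posts.user_id`.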
parent fa5a423681
commit 93c33e02f0
@@ -46,9 +46,21 @@ module Jobs
    def update_posts
      updated_post_ids = Set.new

      # Other people mentioning this user
      Post
        .with_deleted
        .joins(mentioned("posts.id"))
        .where("a.user_id = :user_id", user_id: @user_id)
        .find_each do |post|
          update_post(post)
          updated_post_ids << post.id
        end

      # User mentioning self (not included in post_actions table)
      Post
        .with_deleted
        .where("raw ILIKE ?", "%@#{@old_username}%")
        .where("posts.user_id = :user_id", user_id: @user_id)
        .find_each do |post|
          update_post(post)
          updated_post_ids << post.id