mirror of
https://github.com/mattermost/mattermost.git
synced 2025-02-25 18:55:24 -06:00
* MM-19548: Add a deadlock retry function for SaveChannel
A deadlock has been seen to occur in the upsertPublicChannelT method
during bulk import.
Here is a brief excerpt:
*** (1) TRANSACTION:
TRANSACTION 3141, ACTIVE 1 sec inserting
INSERT INTO
PublicChannels(Id, DeleteAt, TeamId, DisplayName, Name, Header, Purpose)
VALUES
(?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
DeleteAt = ?,
TeamId = ?,
DisplayName = ?,
Name = ?,
Header = ?,
Purpose = ?
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3141 lock_mode X locks gap before rec insert intention waiting
** (2) TRANSACTION:
TRANSACTION 3140, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 2
MySQL thread id 50, OS thread handle 140641523848960, query id 3226 172.17.0.1 mmuser update
INSERT INTO
PublicChannels(Id, DeleteAt, TeamId, DisplayName, Name, Header, Purpose)
VALUES
(?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
DeleteAt = ?,
TeamId = ?,
DisplayName = ?,
Name = ?,
Header = ?,
Purpose = ?
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3140 lock_mode X locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3140 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)
Following is my analysis:
From the deadlock output, it can be seen that it's due to a gap lock.
And that's clear because the index is Name which is a multi-column index using Name and TeamId.
But interestingly, both transactions seem to be inserting the same data, which is what is puzzling me.
The multi-column index on Name and TeamId will guarantee that they are always unique. And from looking at the code,
it does not seem possible to me that it will try to insert the same data from 2 different transactions.
But even if they do, why does tx 2 try to acquire the same lock again when it already has that ?
Here is what I think the order of events happening
Tx 2 gets a gap lock.
Tx 1 tries to get the same gap lock.
Tx 2 tries to again get the same gap lock ?
The last step is what is puzzling me. Why does an UPSERT statement acquire 2 gap locks ? From my reading of https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html:
> INSERT ... ON DUPLICATE KEY UPDATE differs from a simple INSERT in that an exclusive lock rather than a shared lock is placed on the row to be updated when a duplicate-key error occurs. An exclusive index-record lock is taken for a duplicate primary key value. An exclusive next-key lock is taken for a duplicate unique key value.
From what I understand, the expectation is that there will be one X lock and one gap lock is taken.
But that's not what the deadlock output seems to say.
The general advice on the internet seems to be that deadlocks will happen and not all of them can be understood.
For now, we add a generic deadlock retry function at the store package which can be reused by other queries too.
P.S.: This is a verbatim copy of my investigation posted at https://dba.stackexchange.com/questions/268652/mysql-deadlock-upsert-query-acquiring-gap-lock-twice
Testing:
This is ofcourse hard to test because it is impossible to reproduce this. I have tested this by manually returning an error
and confirming that it indeed retries.
WARN[2020-06-19T11:18:24.9585676+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.959158+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.9595072+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.9595451+05:30] Deadlock happened 3 times. Giving up caller="sqlstore/channel_store.go:579"
ERRO[2020-06-19T11:18:24.9596426+05:30] Unable to save channel. caller="mlog/log.go:175" err_details="Error 1213: mydeadlock" err_where=CreateChannel http_code=500 ip_addr="::1" method=POST path=/api/v4/channels request_id=745bsj13b7f6mnmsbn3t97grbw user_id=xcof1ipipbrfxpfjf6x4p6kx9e
* Fix tests
* MM-25890: Fix deadlock on deleting emoji reactions
A deadlock happens because `UPDATE_POST_HAS_REACTIONS_ON_DELETE_QUERY` is being called from 2 separate places.
1. From `DeleteAllWithEmojiName` where it's called as an independent query.
2. From `deleteReactionAndUpdatePost` where it's called as part of a transaction along with another DELETE query.
The deadlock occurs in such a scenario:
- tx #2 acquires an X lock from the DELETE query.
- tx #1 tries to acquire a S lock with the select query, but it's waiting for the X lock to be released from tx #2
- tx #2 now tries to acquire an S lock, but it can't because it is locked on tx #1.
Deadlock.
I have tested this and it does indeed deadlock. The root of the problem is that the Primary key is a multi-column index,
which means that a next-key lock has to be acquired to get a lock on the gap before the index.
Both the queries try to delete some reactions, and then select the new number of reactions. But they select different rows
due to which this happens.
This might just be an unavoidable deadlock due to the way the indexes are setup and next-key locks.
Unless we change the primary key to be a single-column index, it will be very hard to avoid this.
Therefore we just go with a simple retry.
* fix i18n
* address review comments
* address code review
* Fix scopelint
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>