* api4/user: add verify user by id method
* Update api4/user.go
Co-Authored-By: Miguel de la Cruz <miguel@mcrx.me>
* Update model/client4.go
Co-Authored-By: Miguel de la Cruz <miguel@mcrx.me>
* api4/user: reflect review comments
* Update api4/user_test.go
Co-authored-by: Miguel de la Cruz <miguel@mcrx.me>
Co-authored-by: Miguel de la Cruz <miguel@mcrx.me>
Co-authored-by: mattermod <mattermod@users.noreply.github.com>
* Two missing methods to add in the channel layer
* Added delete user/channel posts methods
- Created in both search engines but only implemented in ES
- Add those methods in the search layer
- Included the PermanentDeleteByUser/Channel methods
* Two new delete documents are included in the bleve code with this
change:
- DeleteChannelPosts
- DeleteUserPosts
These two new functions delete post documents from the index-based
in the filed value provided
While resetting permissions, we were not removing old migration state residue
for the EMOJIS_PERMISSIONS_MIGRATION_KEY and GUEST_ROLES_CREATION_MIGRATION_KEY.
Therefore, during their migration, they got skipped because the code checks
if a migration key is already present or not.
To fix this, we remove the migration key just as we do for ADVANCED_PERMISSIONS_MIGRATION_KEY.
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-25404: Add GetHealthScore metric for the cluster
This PR adds the necessary function to the cluster interface
so that they can be called from App code.
* Add mocks
* Remove fakeapp
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-19548: Add a deadlock retry function for SaveChannel
A deadlock has been seen to occur in the upsertPublicChannelT method
during bulk import.
Here is a brief excerpt:
*** (1) TRANSACTION:
TRANSACTION 3141, ACTIVE 1 sec inserting
INSERT INTO
PublicChannels(Id, DeleteAt, TeamId, DisplayName, Name, Header, Purpose)
VALUES
(?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
DeleteAt = ?,
TeamId = ?,
DisplayName = ?,
Name = ?,
Header = ?,
Purpose = ?
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3141 lock_mode X locks gap before rec insert intention waiting
** (2) TRANSACTION:
TRANSACTION 3140, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 2
MySQL thread id 50, OS thread handle 140641523848960, query id 3226 172.17.0.1 mmuser update
INSERT INTO
PublicChannels(Id, DeleteAt, TeamId, DisplayName, Name, Header, Purpose)
VALUES
(?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
DeleteAt = ?,
TeamId = ?,
DisplayName = ?,
Name = ?,
Header = ?,
Purpose = ?
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3140 lock_mode X locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3140 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)
Following is my analysis:
From the deadlock output, it can be seen that it's due to a gap lock.
And that's clear because the index is Name which is a multi-column index using Name and TeamId.
But interestingly, both transactions seem to be inserting the same data, which is what is puzzling me.
The multi-column index on Name and TeamId will guarantee that they are always unique. And from looking at the code,
it does not seem possible to me that it will try to insert the same data from 2 different transactions.
But even if they do, why does tx 2 try to acquire the same lock again when it already has that ?
Here is what I think the order of events happening
Tx 2 gets a gap lock.
Tx 1 tries to get the same gap lock.
Tx 2 tries to again get the same gap lock ?
The last step is what is puzzling me. Why does an UPSERT statement acquire 2 gap locks ? From my reading of https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html:
> INSERT ... ON DUPLICATE KEY UPDATE differs from a simple INSERT in that an exclusive lock rather than a shared lock is placed on the row to be updated when a duplicate-key error occurs. An exclusive index-record lock is taken for a duplicate primary key value. An exclusive next-key lock is taken for a duplicate unique key value.
From what I understand, the expectation is that there will be one X lock and one gap lock is taken.
But that's not what the deadlock output seems to say.
The general advice on the internet seems to be that deadlocks will happen and not all of them can be understood.
For now, we add a generic deadlock retry function at the store package which can be reused by other queries too.
P.S.: This is a verbatim copy of my investigation posted at https://dba.stackexchange.com/questions/268652/mysql-deadlock-upsert-query-acquiring-gap-lock-twice
Testing:
This is ofcourse hard to test because it is impossible to reproduce this. I have tested this by manually returning an error
and confirming that it indeed retries.
WARN[2020-06-19T11:18:24.9585676+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.959158+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.9595072+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.9595451+05:30] Deadlock happened 3 times. Giving up caller="sqlstore/channel_store.go:579"
ERRO[2020-06-19T11:18:24.9596426+05:30] Unable to save channel. caller="mlog/log.go:175" err_details="Error 1213: mydeadlock" err_where=CreateChannel http_code=500 ip_addr="::1" method=POST path=/api/v4/channels request_id=745bsj13b7f6mnmsbn3t97grbw user_id=xcof1ipipbrfxpfjf6x4p6kx9e
* Fix tests
* MM-25890: Fix deadlock on deleting emoji reactions
A deadlock happens because `UPDATE_POST_HAS_REACTIONS_ON_DELETE_QUERY` is being called from 2 separate places.
1. From `DeleteAllWithEmojiName` where it's called as an independent query.
2. From `deleteReactionAndUpdatePost` where it's called as part of a transaction along with another DELETE query.
The deadlock occurs in such a scenario:
- tx #2 acquires an X lock from the DELETE query.
- tx #1 tries to acquire a S lock with the select query, but it's waiting for the X lock to be released from tx #2
- tx #2 now tries to acquire an S lock, but it can't because it is locked on tx #1.
Deadlock.
I have tested this and it does indeed deadlock. The root of the problem is that the Primary key is a multi-column index,
which means that a next-key lock has to be acquired to get a lock on the gap before the index.
Both the queries try to delete some reactions, and then select the new number of reactions. But they select different rows
due to which this happens.
This might just be an unavoidable deadlock due to the way the indexes are setup and next-key locks.
Unless we change the primary key to be a single-column index, it will be very hard to avoid this.
Therefore we just go with a simple retry.
* fix i18n
* address review comments
* address code review
* Fix scopelint
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-19548: Add a deadlock retry function for SaveChannel
A deadlock has been seen to occur in the upsertPublicChannelT method
during bulk import.
Here is a brief excerpt:
*** (1) TRANSACTION:
TRANSACTION 3141, ACTIVE 1 sec inserting
INSERT INTO
PublicChannels(Id, DeleteAt, TeamId, DisplayName, Name, Header, Purpose)
VALUES
(?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
DeleteAt = ?,
TeamId = ?,
DisplayName = ?,
Name = ?,
Header = ?,
Purpose = ?
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3141 lock_mode X locks gap before rec insert intention waiting
** (2) TRANSACTION:
TRANSACTION 3140, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 2
MySQL thread id 50, OS thread handle 140641523848960, query id 3226 172.17.0.1 mmuser update
INSERT INTO
PublicChannels(Id, DeleteAt, TeamId, DisplayName, Name, Header, Purpose)
VALUES
(?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
DeleteAt = ?,
TeamId = ?,
DisplayName = ?,
Name = ?,
Header = ?,
Purpose = ?
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3140 lock_mode X locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 76 page no 4 n bits 104 index Name of table `mydb`.`PublicChannels` trx id 3140 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)
Following is my analysis:
From the deadlock output, it can be seen that it's due to a gap lock.
And that's clear because the index is Name which is a multi-column index using Name and TeamId.
But interestingly, both transactions seem to be inserting the same data, which is what is puzzling me.
The multi-column index on Name and TeamId will guarantee that they are always unique. And from looking at the code,
it does not seem possible to me that it will try to insert the same data from 2 different transactions.
But even if they do, why does tx 2 try to acquire the same lock again when it already has that ?
Here is what I think the order of events happening
Tx 2 gets a gap lock.
Tx 1 tries to get the same gap lock.
Tx 2 tries to again get the same gap lock ?
The last step is what is puzzling me. Why does an UPSERT statement acquire 2 gap locks ? From my reading of https://dev.mysql.com/doc/refman/8.0/en/innodb-locks-set.html:
> INSERT ... ON DUPLICATE KEY UPDATE differs from a simple INSERT in that an exclusive lock rather than a shared lock is placed on the row to be updated when a duplicate-key error occurs. An exclusive index-record lock is taken for a duplicate primary key value. An exclusive next-key lock is taken for a duplicate unique key value.
From what I understand, the expectation is that there will be one X lock and one gap lock is taken.
But that's not what the deadlock output seems to say.
The general advice on the internet seems to be that deadlocks will happen and not all of them can be understood.
For now, we add a generic deadlock retry function at the store package which can be reused by other queries too.
P.S.: This is a verbatim copy of my investigation posted at https://dba.stackexchange.com/questions/268652/mysql-deadlock-upsert-query-acquiring-gap-lock-twice
Testing:
This is ofcourse hard to test because it is impossible to reproduce this. I have tested this by manually returning an error
and confirming that it indeed retries.
WARN[2020-06-19T11:18:24.9585676+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.959158+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.9595072+05:30] A deadlock happened. Retrying. caller="sqlstore/channel_store.go:568" error="Error 1213: mydeadlock"
WARN[2020-06-19T11:18:24.9595451+05:30] Deadlock happened 3 times. Giving up caller="sqlstore/channel_store.go:579"
ERRO[2020-06-19T11:18:24.9596426+05:30] Unable to save channel. caller="mlog/log.go:175" err_details="Error 1213: mydeadlock" err_where=CreateChannel http_code=500 ip_addr="::1" method=POST path=/api/v4/channels request_id=745bsj13b7f6mnmsbn3t97grbw user_id=xcof1ipipbrfxpfjf6x4p6kx9e
* Fix tests
* Address review comments
* Address review comments
* Add forgotten test
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* Ensure time is different when second update operation occurs
* Change all sleeps to one ms
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* add ServiceProviderIdentifier to config
* Update config, add unit test
* fix unit test, update i18n
* add english translation for error
Co-authored-by: mattermod <mattermod@users.noreply.github.com>
* MM-25261: Add 'include_deleted' parameter to get all channels API request.
* MM-25261: Adds test for deleted channels.
* MM-25261: Switches to single liner.
* MM-25261: Adhere to beta config setting for viewing archived channels.
* MM-25261: Test fix.
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-26203: Fix TestChannelStore//SearchAllChannels flake
Using random strings for channel names creates chances of collisions
when searching for specific strings.
We increase the search space increasing the size of the search string
to be searched. This reduces collisions and makes the tests more reliable.
For the "off-" family of tests, this probably needs more length. But
it would also need the tests to be changed slightly as the "off-" part
acts as a prefix. I accidentally stumbled upon it while running the test
50 times. Adding a "-" is probably good enough for now. We can revisit
if it fails again in CI.
* improved team_store tests too
* MM-24674 Update channel members by group to look at distinct autoTimezone and manualTimezone.
Instead of just doing a blanket distinct on the column since that returns more entries than expected
* Use JSON extract instead of parsing text
* Use single quotes in mysql query too
* Dont need to prepend users on unambiguous column
* Use json extract instead of shorthand
* Dont count timezone if timezone default length
* CI
Co-authored-by: mattermod <mattermod@users.noreply.github.com>
* MM-25710: Use an efficient cache serialization algorithm
We investigate 3 packages for selecting a suitable replacement
for gob encoding. The algorithm chosen was msgpack which gives
a decent boost over the standard gob encoding.
Any external schema dependent algorithms like protobuf, flatbuffers, avro,
capn'proto were not considered as that would entail converting the model structs
into separate schema objects and then code generating the Go structs.
It could be done theoretically at a later stage specifically for structs
which are in the hot path. This is a general solution for now.
The packages considered were:
- github.com/tinylib/msgp
- github.com/ugorji/go/codec
- github.com/vmihailenco/msgpack/v5
msgp uses code generation to generate encoding/decoding code without the reflection overhead.
Theoretically, therefore this is supposed to give the fastest performance. However, a major
flaw in that package is that it works only at a file/directory level, not at a package level.
Therefore, for structs which are spread across multiple files, it becomes near to impossible
to chase down all transitive dependencies to generate the code. Even if that's done, it fails
on some complex type like xml.Name and time.Time. (See: https://github.com/tinylib/msgp/issues/274#issuecomment-643654611)
Therefore, we are left with 2 choices. Both of them use the same underlying algorithm.
But msgpack/v5 wraps the encoders/decoders in a sync.Pool. To make a perfect apples-apples
comparison, I wrote a sync.Pool for ugorji/go/codec too and compared performance.
msgpack/v5 came out to be the fastest by a small margin.
benchstat master.txt ugorji.txt
name old time/op new time/op delta
LRU/simple=new-8 5.62µs ± 3% 3.68µs ± 2% -34.64% (p=0.000 n=10+10)
LRU/complex=new-8 38.4µs ± 2% 9.1µs ± 2% -76.38% (p=0.000 n=9+9)
LRU/User=new-8 75.8µs ± 2% 23.5µs ± 2% -69.01% (p=0.000 n=10+10)
LRU/Post=new-8 125µs ± 2% 21µs ± 3% -82.92% (p=0.000 n=9+10)
LRU/Status=new-8 27.6µs ± 1% 5.4µs ± 4% -80.34% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
LRU/simple=new-8 3.20kB ± 0% 1.60kB ± 0% -49.97% (p=0.000 n=10+10)
LRU/complex=new-8 15.7kB ± 0% 4.4kB ± 0% -71.89% (p=0.000 n=9+10)
LRU/User=new-8 33.5kB ± 0% 9.2kB ± 0% -72.48% (p=0.000 n=10+8)
LRU/Post=new-8 38.7kB ± 0% 4.8kB ± 0% -87.48% (p=0.000 n=10+10)
LRU/Status=new-8 10.6kB ± 0% 1.7kB ± 0% -83.50% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
LRU/simple=new-8 46.0 ± 0% 20.0 ± 0% -56.52% (p=0.000 n=10+10)
LRU/complex=new-8 324 ± 0% 48 ± 0% -85.19% (p=0.000 n=10+10)
LRU/User=new-8 622 ± 0% 108 ± 0% -82.64% (p=0.000 n=10+10)
LRU/Post=new-8 902 ± 0% 74 ± 0% -91.80% (p=0.000 n=10+10)
LRU/Status=new-8 242 ± 0% 22 ± 0% -90.91% (p=0.000 n=10+10)
11:31:48-~/mattermost/mattermost-server/services/cache2$benchstat master.txt vmi.txt
name old time/op new time/op delta
LRU/simple=new-8 5.62µs ± 3% 3.68µs ± 3% -34.59% (p=0.000 n=10+10)
LRU/complex=new-8 38.4µs ± 2% 8.7µs ± 3% -77.45% (p=0.000 n=9+10)
LRU/User=new-8 75.8µs ± 2% 20.9µs ± 1% -72.45% (p=0.000 n=10+10)
LRU/Post=new-8 125µs ± 2% 21µs ± 2% -83.08% (p=0.000 n=9+10)
LRU/Status=new-8 27.6µs ± 1% 5.1µs ± 3% -81.66% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
LRU/simple=new-8 3.20kB ± 0% 1.60kB ± 0% -49.89% (p=0.000 n=10+10)
LRU/complex=new-8 15.7kB ± 0% 4.6kB ± 0% -70.87% (p=0.000 n=9+8)
LRU/User=new-8 33.5kB ± 0% 10.3kB ± 0% -69.40% (p=0.000 n=10+9)
LRU/Post=new-8 38.7kB ± 0% 6.0kB ± 0% -84.62% (p=0.000 n=10+10)
LRU/Status=new-8 10.6kB ± 0% 1.9kB ± 0% -82.41% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
LRU/simple=new-8 46.0 ± 0% 20.0 ± 0% -56.52% (p=0.000 n=10+10)
LRU/complex=new-8 324 ± 0% 46 ± 0% -85.80% (p=0.000 n=10+10)
LRU/User=new-8 622 ± 0% 106 ± 0% -82.96% (p=0.000 n=10+10)
LRU/Post=new-8 902 ± 0% 89 ± 0% -90.13% (p=0.000 n=10+10)
LRU/Status=new-8 242 ± 0% 23 ± 0% -90.50% (p=0.000 n=10+10)
In general, we can see that the time to marshal/unmarshal pays off as the size of the struct
increases.
We can see that msgpack/v5 is faster for CPU but very slightly heavier on memory.
Since we are interested in fastest speed, we choose msgpack/v5.
As a future optimization, we can use a mix of msgpack and msgp for hot structs.
To do that, we would need to shuffle around some code so that for the hot struct,
all its dependencies are in the same file.
Let's use this in production for some time, watch grafana graphs for the hottest caches
and come back to optimizing this more once we have more data.
Side note: we have to do with micro-benchmarks for the time being, because all the caches
aren't migrated to cache2 interface yet. Once that's in, we can actually run some load tests
and do comparisons.
* Bring back missing import
* Fix tests
* new job type created that checks for expired mobile sessions and pushes notifications.
* only send session expired notifications if ExtendSessionLengthWithActivity is enabled.
* includes schema change: field added to Sessions table
* Improve CreatePostCheckOnlineStatus
We improve the flakyness of the test by making the websocket
status checks synchronous after making each request.
This makes the status responses more reliable because in very
busy CI environments the goroutine scheduler can indeed send
the response which is made by a later HTTP request before the first one.
* Update api4/post_test.go
Co-authored-by: Ibrahim Serdar Acikgoz <serdaracikgoz86@gmail.com>
Co-authored-by: Ibrahim Serdar Acikgoz <serdaracikgoz86@gmail.com>