* MM-27916: Improve logging when session is not found
The error handling in this code is pretty bad and the same error message happens
for multiple conditions, making it difficult to diagnose the real issue.
Most of the times, we get a log like:
```
"Invalid session {"error": "GetSession: Invalid session token=jodb6sau47rnugaqj1fy7khmpr, err=<no value>, "}"
```
And it could have happened from multiple places. So this log turns out to be not that useful.
We improve this by populating the Error field to fix the "<no value>" issue and also add a separate detailed error
field for each log line to uniquely identify each error.
* fix
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-27525: Obscures the Global Relay SMTP password.
* MM-27525: Desanitize global relay's SMTP password.
* MM-27525: Does not set the fake value if the field is blank.
When we are reading from or putting to the cache, there is no need to deep copy
objects now because the cache is already serialized.
We also delete some lines from tests because the mock store directly returns the pointers
whereas actually they would be returned from the database where a serialization occurs again.
So we would be testing for the wrong thing by unnecessarily keeping the deep copies for DB reads.
https://mattermost.atlassian.net/browse/MM-27883
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* Add a config for MM User Limit
* Adding graceful errors for if an administrator invites people passed their user limit
* Including changed vendor files
* Adding unit test
* Fix a bug
* Push up working tests (Thanks Joram)
* Add more cases, clean up logs in code
* One more case
* Refactoring based on PR comments
* Updating i18n
* Some changes based on PR review
* Remove a comment
* Bring back some translations that were somehow removed
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-27648: Fix a hub deadlock while revoking session
This is a bug which has always been there in the codebase.
And it can only occur in the extreme of edge-cases.
Following is the call trace due to which this happens:
```
0 0x0000000001dfea68 in github.com/mattermost/mattermost-server/v5/app.(*Hub).InvalidateUser // deadlock
at ./app/web_hub.go:369
1 0x0000000001dfc0bd in github.com/mattermost/mattermost-server/v5/app.(*App).InvalidateWebConnSessionCacheForUser
at ./app/web_hub.go:109
2 0x0000000001db1be5 in github.com/mattermost/mattermost-server/v5/app.(*App).ClearSessionCacheForUserSkipClusterSend
at ./app/session.go:209
3 0x0000000001db1763 in github.com/mattermost/mattermost-server/v5/app.(*App).ClearSessionCacheForUser
at ./app/session.go:170
4 0x0000000001db2d2f in github.com/mattermost/mattermost-server/v5/app.(*App).RevokeSession
at ./app/session.go:275
5 0x0000000001db2c09 in github.com/mattermost/mattermost-server/v5/app.(*App).RevokeSessionById
at ./app/session.go:260
6 0x0000000001daf442 in github.com/mattermost/mattermost-server/v5/app.(*App).GetSession
at ./app/session.go:93
7 0x0000000001df93f4 in github.com/mattermost/mattermost-server/v5/app.(*WebConn).IsAuthenticated
at ./app/web_conn.go:271
8 0x0000000001dfa29b in github.com/mattermost/mattermost-server/v5/app.(*WebConn).shouldSendEvent
at ./app/web_conn.go:323
9 0x0000000001e2667e in github.com/mattermost/mattermost-server/v5/app.(*Hub).Start.func1.3 // starting from hub
at ./app/web_hub.go:491
10 0x0000000001e27c01 in github.com/mattermost/mattermost-server/v5/app.(*Hub).Start.func1
at ./app/web_hub.go:504
11 0x0000000001e27ee2 in github.com/mattermost/mattermost-server/v5/app.(*Hub).Start.func2
at ./app/web_hub.go:528
12 0x0000000000473811 in runtime.goexit
at /usr/local/go/src/runtime/asm_amd64.s:1373
```
The stars have to align in such a way that the session idle timeout _has_ to happen
_exactly_ when a broadcast is happening for that user. Only then, this code path gets
triggered.
Since this is an extreme rabbit hole of calls, I have not attempted any big
refactors and went with the most sensible approach which is to make the RevokeSessionById
call asynchronous.
There are 2 main reasons:
- It was already treated as an asynchronous call because it happened during an error condition
and we were not checking for the return value anyways.
- Session idle timeout is a relatively infrequent event, so creating unbounded goroutines is not a concern.
As a bonus, we also get to check the error return and log it.
https://mattermost.atlassian.net/browse/MM-27648
* Add a test case
* Fix an incorrect comment
* To allow for ease of testing telemetry changes, we should make it so that the rudder key and dataplane URL can be customized through the config or environment.
* Instead of using a real config element that would be exposed to end users, we'll just use 'secret' environment variables to inject Rudder config data.
Co-authored-by: Alex Dovenmuehle <alex.dovenmuehle@mattermost.com>
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-27512: Use an authenticated user to bump up request rate limit
An unauthenticated user can only make 60 requests per hour which means 1
request every minute. This can lead to frequent rate limit errors while
getting the latest release.
We change that to use an authenticated user which is already available
in the CI. This moves us to make 5000 requests per hour.
We also add additional logging in the Makefile targets in case
the command fails again so that it's clear what has happened, and not return
cryptic 404 errors again.
Ideally, we should be able to inspect the output of the curl command, but since
the output value of the entire bash script is fed into the variable, it is a bit
difficult to print debug output.
If this still gives error, then we need to either use a cached artifact somehow
or add additional logging and add a retry logic on top of it.
* fix mistake
* add warning as success result
* revert config changes
* add unit test, update unit test
* add a couple more tests
* update store layers
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
Server panics when user joins team and one of the default channels has been archived.
This can be triggered by archiving "Off-Topic" or any of the channels listed in `TeamSettings.ExperimentalDefaultChannels`.
* storetest/channel_store: fix flaky test for search too short term
* storetest/channel_store: use another stop word instead
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-27507: Propagate rate limit errors to client
We return an error from SendInviteEmails instead of just logging it
to let the client know that a rate limit error has happened.
The status code is chosen as 413 (entity too large) instead of 429 (too many requests)
because it's not the request which is rate limited, but the payload inside it which is.
Ideally, the email sending should have been implemented by a queue which would just return
an error to the client when full. That is also why we are not returning an X-Retry-After
and X-Reset-After in the headers because that would mix with the actual rate limiting.
A separate header X-Email-Invite-Reset-After might do the job, but it comes at an extra cost
of additional API surface and a clunky API. Instead, that information is contained in the error
response. The web client needs to just surface the error. An API client will have to do
a bit more work to parse the error if it needs to automatically know when to retry. Given that
an email sending client is not a very common use case, we decide to keep the API clean.
This decision can be revisited if it becomes problematic in the future.
https://mattermost.atlassian.net/browse/MM-27507
* Fixing translations
* Added retry_after and reset_after in API response.
Summary
- Attempt at fixing the AutocompleteInTeam/empty_string test which is flaky on MySQL-5.6. This is likely an index_merge_intersection optimization issue. See https://community.mattermost.com/core/pl/uzgygwa9e3d4pns14ayfs7mobw
for more discussion. The fix here simply modifies the where clause to change the query plan so index_merge_intersection is not used, while also not introducing a temporary table or a filesort.
Ticket Link
- https://mattermost.atlassian.net/browse/MM-26837
Our codebase uses main_test.go files to control tests at a package level.
However, the problem with that is when we use ./... to build the codebase,
those binaries also get built.
This is unnecessary and creates further delays in the pipeline. All we need to build
are only mattermost and platform binaries in CI. The rest are dependant on the developer.
In my dev environment, this reduces the build times from 13 seconds to 4 seconds.
https://mattermost.atlassian.net/browse/MM-27735
* MM-26575: Move app initialization inside Websocket router out of handler
The websocket router struct is shared amongst multiple handlers. Therefore,
there is a race condition while setting the app field and reading from it.
We move the initialization of the app field when the router struct is initialized.
And we also remove the initializations of some app fields which caused race
conditions by being written from multiple goroutines. They are already being
written once inside the AppInitializedOnce.Do method and it's redundant
to set them again to the same value.
* Add missing fields in server connector
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-27040: Fix flaky tests due to same UpdateAt timestamp
Update queries are not guaranteed to always update the row.
Since we are only bumping the UpdateAt timestamps, if 2 update
requests hit concurrently within the same millisecond, then one
is bound to fail.
We fix all cases in the codebase where we were updating the UpdateAt
field to not check for count != 1, but rather count > 1.
A similar fix was already done in 3f46cf6f60.
* Remove incorrect test