Fixes a bug and adds a feature for shared channels:
- The Bug: when creating new shared channels, users that had already been sync'd via another channel were not added to the new channel's member list, since the users were not sync'd again. This PR sync's users per channel.
- The Feature: support custom statuses
Support for handling username collisions between remote clusters. Users belonging to remote clusters have their username changed to include the remote name e.g. wiggin becomes wiggin:mattermost.
@mentions are also modified so the munged username is replaced with the original username when the post is sync'd with the remote the user belongs to.
When adding remote users:
- append the remote name to the username with colon separator
- append the remote name to the email address with colon separator
- store the original username and email address in user props
- when resolving @mentions replace with the stored original username
* Add extract documents content command
* Adding the extraction command and making the pure go pdf library as secondary option
* Improving the memory usage and docextractor interface
* Enable content extraction by default in all the instances
* Tiny improvement on archive indexing
* Adding App interface generation and the opentracing layer
* Fixing linter errors
* Addressing PR review comments
* Addressing PR review comments
Remote Cluster Service
- provides ability for multiple Mattermost cluster instances to create a trusted connection with each other and exchange messages
- trusted connections are managed via slash commands (for now)
- facilitates features requiring inter-cluster communication, such as Shared Channels
Shared Channels Service
- provides ability to shared channels between one or more Mattermost cluster instances (using trusted connection)
- sharing/unsharing of channels is managed via slash commands (for now)
* MM-33818: Add replica lag metric
We add two new metrics for monitoring replica lag:
- Monitor absolute lag based on binlog distance/transaction queue length.
- Monitor time taken for the replica to catch up.
To achieve this, we add a config setting to run a user defined SQL query
on the database.
We need to specify a separate datasource field as part of the config because
in some databases, querying the replica lag value requires elevated credentials
which are not needed for usual running of the application, and can even be a security risk.
Arguably, a peculiar part of the design is the requirement of the query output to be in a (node, value)
format. But since from the application, the SQL query is a black box and the user can set any query
they want, we cannot, in any way templatize this.
And as an extra note, freno also does it in a similar way.
The last bit is because we need to have a separate datasources, now we consume one extra connection
rather than sharing it with the pool. This is an unfortunate result of the design, and while one extra
connection doesn't make much of a difference in a single-tenant scenario. It does make so, in a multi-tenant scenario.
But in a multi-tenant scenario, the expectation would already be to use a connection pool. So this
is not a big concern.
https://mattermost.atlassian.net/browse/MM-33818
```release-note
Two new gauge metrics were added:
mattermost_db_replica_lag_abs and mattermost_db_replica_lag_time, both
containing a label of "node", signifying which db host is the metric from.
These metrics signify the replica lag in absolute terms and in the time dimension
capturing the whole picture of replica lag.
To use these metrics, a separate config section ReplicaLagSettings was added
under SqlSettings. This is an array of maps which contain three keys: DataSource,
QueryAbsoluteLag, and QueryTimeLag. Each map entry is for a single replica instance.
DataSource contains the DB credentials to connect to the replica instance.
QueryAbsoluteLag is a plain SQL query that must return a single row of which the first column
must be the node value of the Prometheus metric, and the second column must be the value of the lag.
QueryTimeLag is the same as above, but used to measure the time lag.
As an example, for AWS Aurora instances, the QueryAbsoluteLag can be:
select server_id, highest_lsn_rcvd-durable_lsn as bindiff from aurora_global_db_instance_status() where server_id=<>
and QueryTimeLag can be:
select server_id, visibility_lag_in_msec from aurora_global_db_instance_status() where server_id=<>
For MySQL Group Replication, the absolute lag can be measured from the number of pending transactions
in the applier queue:
select member_id, count_transaction_remote_in_applier_queue FROM performance_schema.replication_group_member_stats where member_id=<>
Overall, what query to choose is left to the administrator, and depending on the database and need, an appropriate
query can be chosen.
```
* Trigger CI
* Fix tests
* address review comments
* Remove t.Parallel
It was spawning too many connections,
and overloading the docker container.
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* MM-34080: Removing sqlite entirely
The initial commit missed removing this blank import.
So the library still remained in our vendor directory.
Removing it for good now.
Bye bye sqlite.
https://mattermost.atlassian.net/browse/MM-34080
* fix go.mod
This is a deadlock due to reversed locking order of the SidebarChannels
and SidebarCategories table.
I could not find the exact culprit query from the deadlock output
because it only shows the last query a transaction is running. And from looking
at the code, the only query that runs "UPDATE SidebarCategories SET DisplayName = ?, Sorting = ? WHERE Id = ?"
is UpdateSidebarCategories. But for the deadlock to happen, it has to lock
SidebarChannels _first_, and then _then_ lock SidebarCategories.
Looking a bit more throughly, I found that DeleteSidebarCategory does indeed
lock the tables in an inverse way and if DeleteSidebarCategory runs concurrently with
UpdateSidebarCategories, they will deadlock.
Here's how it will happen.
```
tx1
DELETE FROM SidebarChannels WHERE CategoryId = 'xx';
tx2
UPDATE SidebarCategories SET DisplayName='dn' WHERE Id='xx';
tx2
DELETE FROM SidebarChannels WHERE (ChannelId IN ('yy') AND CategoryId = 'xx');
tx1
DELETE FROM SidebarCategories WHERE Id = 'xx';
```
And then we see:
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
To fix this, we simply reorder the Delete query to lock the SidebarCategories first,
and then SidebarChannels.
In fact, any transaction updating/deleting rows from those two tables should always operate on that order
if possible.
https://mattermost.atlassian.net/browse/MM-31396
```release-note
Fixed a database deadlock that can happen if a sidebar category is updated and deleted at the same time.
```
* MM-33789: Revert fallback to master for GetAllProfilesInChannel
This fixes a regression introduced in https://github.com/mattermost/mattermost-server/pull/16911.
It was causing problems with too many invalidations and overloading the writer instance for big installations.
Reverting this does not affect correctness at all because it was done out of abundance of caution and the
idea at that point was it was to be done for all caches.
https://mattermost.atlassian.net/browse/MM-33789
```release-note
NONE
```
* fix gofmt issues
* MM-31094: Adds tooling to develop and test using a MySQL instance with replication lag. Adds some lazy lookups to fallback to master if results are not found.
* MM-31094: Removes mysql-read-replica from default docker services.
* MM-31094: Switches (store..SessionStore).Get and (store.TeamStore).GetMember to using context.Context.
* MM-31094: Updates (store.UsersStore).Get to use context.
* MM-31094: Updates (store.PostStore).Get to use context.
* MM-31094: Removes feature flag and config setting.
* MM-31094: Rolls back some master reads.
* MM-31094: Rolls a non-cache read.
* MM-31094: Removes feature flag from the store.
* MM-31094: Removes unused constant and struct field.
* MM-31094: Removes some old feature flag references.
* MM-31094: Fixes some tests.
* MM-31094: App layers fix.
* MM-31094: Fixes mocks.
* MM-31094: Don't reparse flag.
* MM-31094: No reparse.
* MM-31094: Removed unused FeatureFlags field.
* MM-31094: Removes unnecessary feature flags variable declarations.
* MM-31094: Fixes copy-paste error.
* MM-31094: Fixes logical error.
* MM-30194: Removes test method from store.
* Revert "MM-30194: Removes test method from store."
This reverts commit d5a6e8529b.
* MM-31094: Conforming to make's strange syntax.
* MM-31094: Configures helper for read replica with option.
* MM-31094: Adds some missing ctx's.
* MM-31094: WIP
* MM-31094: Updates test names.
* MM-31094: WIP
* MM-31094: Removes unnecessary master reads.
* MM-31094: ID case changes out of scope.
* MM-31094: Removes unused context.
* MM-31094: Switches to a helper. Removes some var naming changes. Fixes a merge error.
* MM-31094: Removes SQLITE db driver ref.
* MM-31094: Layer generate fix.
* MM-31094: Removes unnecessary changes.
* MM-31094: Moves test method.
* MM-31094: Re-add previous fix.
* MM-31094: Removes make command for dev.
* MM-31094: Fix for login.
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>
* Create basic make commands for configuring golang-migrate
* Showcase full flow with new migrations
* Apply PR suggestions
* Migrate over team members
* Update mocks
* Fix specs
* Move columns that added after table creation onto separate stmts
* Put back gorp table definitions
* Fix issues with golang-migrate that not tracks underlying db driver
* Help prompt after new migration and consistent checksum for bindata
* Put gorp mapping back
* Apply PR suggestiong
* Close migrations after they run
* Add migration file to bindata check
* Updates needed
* Reset store_test
* Add copyright
* Apply PR suggestions
* Fix new circleci check
* Put back upgrade step for backwards comp
* Add store test to test migration directions
* Apply PR suggestions
* Add go-bindata to tools
* Apply PR suggestios
Co-authored-by: Mattermod <mattermod@users.noreply.github.com>