This bug is actually a Drupal issue where some edited posts have their `created` and `changed` timestamps set to the same value. But even when that happens in Drupal it still maintains the correct post order in an affected thread. This PR makes the Discourse importer also maintain the original Drupal comment order by sorting comments in the source DB by their `cid`, which is sequential and never changes. More details from this post onward:
https://meta.discourse.org/t/large-drupal-forum-migration-importer-errors-and-limitations/246939/24?u=rahim123
We'd like to lean on the DNS caching service for more than the standard
DB and Redis hosts, but without having to add additional code each time.
Define a new environment variable
DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES (admittedly a mouthful)
which is a list of service names to be added to the static list at
process execution time.
For example, plugin foo may reference two services that you want to
cache the address of. By specifying the following two variables in the
process environment, cache_critical_dns will perform the lookup
alongside the DB and Redis host variables.
```
DISCOURSE_DNS_CACHE_ADDITIONAL_SERVICE_NAMES='FOO_SERVICE1,FOO_SERVICE2'
FOO_SERVICE1='foo.service1.example.com'
FOO_SERVICE1_SRV='foo._tcp.example.com'
FOO_SERVICE2='foo.service2.example.com'
```
The behaviour when it comes to SRV record lookup is the same as
previously implemented for the `DISCOURSE_DB_..` and
`DISCOURSE_REDIS_..` variables.
For the purposes of the health checks, services defined in the list _are
always considered healthy_. This is a compromise for conveniences sake.
Defining a dynamic method for health checks at runtime is not practical.
See t/88457/32.
1. Fix bug where we were not waiting for all unicorn workers to start up
before running benchmarks.
2. Fix a bug where headers were not used when benchmarking. Admin
benchmarks were basically running as anon user.
3. Disable rate limits when in profile env. We're pretty much going to
hit the rate limit every time as a normal user.
4. Benchmark against topic with a fixed posts count of 100. Previously profiling script was just randomly creating posts
and we would benchmark against a topic with a fixed posts count of 30.
Sometimes, the script fails because no topics with a posts count of 30
exists.
5. Benchmarks are not run against a normal user on top of anon and
admin.
6. Add script option to select tests that should be run.
This commit promotes all post_deploy migrations which existed in
Discourse v2.8.0 (timestamp <= 20220107014925).
This commit includes a fix to the promote_migrations script to promote
all migrations of the first version of the previous stable version. For
example, if the current stable version is v2.8.13, the version used as
a cutoff for promoting migrations is v2.8.0.
While load testing our user creation code path in production, we
identified that executing the DB statement to update the `Group#user_count` column within a
transaction is creating a bottleneck for us. This is because the
creation of a user and addition of the user to the relevant groups are
done in a transaction. When we execute the DB statement to update
`Group#user_count` for the relevant group, a row level lock is held
until the transaction completes. This row level lock acts like a global
lock when the server is creating users that will be added to the same
group in quick succession.
Instead of updating the counter cache within a transaction which the
default ActiveRecord `counter_cache` option does, we simply update the
counter cache outside of the committing transaction.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
* Allow taking table prefix from env var
* FIX: remove unused column references
The columns `filedata` and `extension` are not present in a v4.2.4
database, and they aren't used in the method anyways.
* FIX: report progress for tables without imported_id
* FIX: effectively check for AR validation errors
NOTE: other migration scripts also have this problem; see /t/58202
* FIX: properly count Posts when importing attachments
* FIX: improve logging
* Remove leftover comment
* FIX: show progress when exporting Permalink file
* PERF: stream Permalink file
The current way results in tons of memory usage; write once per line instead
* Document fixes needed
* WIP - deduplicate category names
* Ignore non alphanumeric chars for grouping
* FIX: properly deduplicate user emails by merging accounts
* FIX: don't merge empty UserEmails
* Improve logging
* Merge users AFTER fixing primary key sequences
* Parallelize user merging
* Save duplicated users structure for debugging purposes
* Add progress logging for the (multiple hour) user merging step
We now use Ember CLI (core/plugins) and DiscourseJSProcessor (themes) for all Ember and template compilation. This commit removes the remnants of the legacy Sprockets-based Ember compilation system.
Sprockets, and its DiscourseJSProcess-based Babel transformations, is still in use for a few assets. Ideally that will be removed/replaced in the near future.
We are already caching any DB_HOST and REDIS_HOST (and their
accompanying replicas), we should also cache the resolved addresses for
the MessageBus specific Redis. This is a noop if no MB redis is defined
in config. A side effect is that the MB will also support SRV lookup and
priorities, following the same convention as the other cached services.
The port argument was added to redis_healthcheck so that the script
supports a setup where Redis is running on a non-default port.
Did some minor refactoring to improve readability when filtering out the
CRITICAL_HOST_ENV_VARS. The `select` block was a bit confusing, so the
sequence was made easier to follow.
We were coercing an environment variable to an int in a few places, so
the `env_as_int` method was introduced to do that coercion in one place and
for convenience purposes default to a value if provided.
See /t/68301/30.
* handle polls with duplicate items
* handle polls with incorrect poll_option_total values
* handle group IDs in personal messages
* support for version 3.3
There are situations where a container running Discourse may want to
cache the critical DNS services without running the cache_critical_dns
service, for example running migrations prior to running a full bore
application container.
Add a `--once` argument for the cache_critical_dns script that will
only execute the main loop once, and return the status code for the
script to use when exiting. 0 indicates no errors occured during SRV
resolution, and 1 indicates a failure during the SRV lookup.
Nothing is reported to prometheus in run_once mode. Generally this
mode of operation would be a part of a unix pipeline, in which the exit
status is a more meaningful and immediate signal than a prometheus metric.
The reporting has been moved into it's own method that can be called
only when the script is running as a service.
See /t/69597.
Describes the behaviour and configuration of the cache_critical_dns
script, mainly cribbed from commit messages. Tries to make this program
a bit less of an enigma.
The `PG::Connection#ping` method is only reliable for checking if the
given host is accepting connections, and not if the authentication
details are valid.
This extends the healthcheck to confirm that the auth details are
able to both create a connection and execute queries against the
database.
We expect the empty query to return an empty result set, so we can
assert on that. If a failure occurs for any reason, the healthcheck will
return false.
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.
No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
An SRV RR contains a priority value for each of the SRV targets that
are present, ranging from 0 - 65535. When caching SRV records we may want to
filter out any targets above or below a particular threshold.
This change adds support for specifying a lower and/or upper bound on
target priorities for any SRV RRs. Any targets returned when resolving
the SRV RR whose priority does not fall between the lower and upper
thresholds are ignored.
For example: Let's say we are running two Redis servers, a primary and
cold server as a backup (but not a replica). Both servers would pass health
checks, but clearly the primary should be preferred over the backup
server. In this case, we could configure our SRV RR with the primary
target as priority 1 and backup target as priority 10. The
`DISCOURSE_REDIS_HOST_SRV_LE` could then be set to 1 and the target with
priority 10 would be ignored.
See /t/66045.
`ActiveRecord::Base.connection_config` has been deprecated since Rails
6.1 and was completely removed from Rails 7.
Instead we need to use
`ActiveRecord::Base.connection_db_config.configuration_hash`.
Import scripts were forgotten when we did the Rails 7 upgrade, this
patch fixes them.
A bit of a mixed bag, this addresses several edge areas of bookmarks and makes them compatible with polymorphic bookmarks (hidden behind the `use_polymorphic_bookmarks` site setting). The main ones are:
* ExportUserArchive compatibility
* SyncTopicUserBookmarked job compatibility
* Sending different notifications for the bookmark reminders based on the bookmarkable type
* Import scripts compatibility
* BookmarkReminderNotificationHandler compatibility
This PR also refactors the `register_bookmarkable` API so it accepts a class descended from a `BaseBookmarkable` class instead. This was done because we kept having to add more and more lambdas/properties inline and it was very messy, so a factory pattern is cleaner. The classes can be tested independently as well.
Some later PRs will address some other areas like the discourse narrative bot, advanced search, reports, and the .ics endpoint for bookmarks.
This removes the option to override the sleep time between caching of
DNS records. The override was invalid because `''.to_i` is 0 in Ruby,
causing a tight loop calling the `run` method.
For Redis connections that operate over TLS, we need to ensure that we
are setting the correct arguments for the Redis client. We can utilise
the existing environment variable `DISCOURSE_REDIS_USE_SSL` to toggle
this behaviour.
No SSL verification is performed for two reasons:
- the Discourse application will perform a verification against any FQDN
as specified for the Redis host
- the healthcheck is run against the _resolved_ IP address for the Redis
hostname, and any SSL verification will always fail against a direct
IP address
If no SSL arguments are provided, the IP address is never cached against
the hostname as no healthy address is ever found in the HealthyCache.
Modify the cache_critical_dns script for SRV RR awareness. The new
behaviour is only enabled when one or more of the following environment
variables are present (and only for a host where the `DISCOURSE_*_HOST_SRV`
variable is present):
- `DISCOURSE_DB_HOST_SRV`
- `DISCOURSE_DB_REPLICA_HOST_SRV`
- `DISCOURSE_REDIS_HOST_SRV`
- `DISCOURSE_REDIS_REPLICA_HOST_SRV`
Some minor changes in refactor to original script behaviour:
- add Name and SRVName classes for storing resolved addresses for a hostname
- pass DNS client into main run loop instead of creating inside the loop
- ensure all times are UTC
- add environment override for system hosts file path and time between DNS
checks mainly for testing purposes
The environment variable for `BUNDLE_GEMFILE` is set to enables Ruby to
load gems that are installed and vendored via the project's Gemfile.
This script is usually not run from the project directory as it is
configured as a system service (see
71ba9fb7b5/templates/cache-dns.template.yml (L19))
and therefore cannot load gems like `pg` or `redis` from the default
load paths. Setting this environment variable configures bundler to look
in the correct project directory during it's setup phase.
When a `DISCOURSE_*_HOST_SRV` environment variable is present, the
decision for which target to cache is as follows:
- resolve the SRV targets for the provided hostname
- lookup the addresses for all of the resolved SRV targets via the
A and AAAA RRs for the target's hostname
- perform a protocol-aware healthcheck (PostgreSQL or Redis pings)
- pick the newest target that passes the healthcheck
From there, the resolved address for the SRV target is cached against
the hostname as specified by the original form of the environment
variable.
For example: The hostname specified by the `DISCOURSE_DB_HOST` record
is `database.example.com`, and the `DISCOURSE_DB_HOST_SRV` record is
`database._postgresql._tcp.sd.example.com`. An SRV RR lookup will return
zero or more targets. Each of the targets will be queried for A and AAAA
RRs. For each of the addresses returned, the newest address that passes
a protocol-aware healthcheck will be cached. This address is cached so
that if any newer address for the SRV target appears we can perform a
health check and prefer the newer address if the check passes.
All resolved SRV targets are cached for a minimum of 30 minutes in memory
so that we can prefer newer hosts over older hosts when more than one target
is returned. Any host in the cache that hasn't been seen for more than 30
minutes is purged.
See /t/61485.
If someone types `yes` rather than `YES`, continue anyway.
The chance of typing `yes`, when you actually want to stop, is non-existent. The chance of typing `yes` when you meant `YES` is high, and it's very frustrating when the script quite because you got the case wrong!
This commit promotes all post_deploy migrations which existed in Discourse v2.7.13 (timestamp <= 20210328233843)
This reduces the likelihood of issues relating to migration run order
Also fixes a couple of typos in `script/promote_migrations`
This allows text editors to use correct syntax coloring for the heredoc sections.
Heredoc tag names we use:
languages: SQL, JS, RUBY, LUA, HTML, CSS, SCSS, SH, HBS, XML, YAML/YML, MF, ICS
other: MD, TEXT/TXT, RAW, EMAIL
We validate the *format* of email addresses in many places with a match against
a regex, often with very slightly different syntax.
Adding a separate EmailAddressValidator simplifies the code in a few spots and
feels cleaner.
Deprecated the old location in case someone is using it in a plugin.
No functionality change is in this commit.
Note: the regex used at the moment does not support using address literals, e.g.:
* localpart@[192.168.0.1]
* localpart@[2001:db8::1]
* Optional import of custom user fields from phpBB 3.1+
* Optional import of likes from phpBB3
Requires the phpBB "Thanks for posts" extension
* Fix import of bookmarks from phpBB3
* Update `created_at` of existing user
* Support mapping of phpBB forums to existing Discourse categories
This is in addition to the ability of merging phpBB forums and importing into newly created Discourse categories.
1. bbcode hashes don't always have exactly 8 characters.
2. colors aren't always hex values, it can be a color string ("red", "blue", etc).
3. The closing tag of smileys doesn't always include a `:` character (the start of the regex was already right for this particular issue)
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
The discourse base image already contains a postgres installation, so pulling a separate postgres image is a little wasteful. Using the copy of Postgres in the discourse image saves about 20 seconds on every GitHub actions run.
This commit sets up Postgres with a few performance-improving flags, which we were already using for the `rake docker:test` task (used on our internal CI system).
Some tables in the database have constraints on columns with dates. Because of them, the script for moving timestamps can fail from time to time. This PR makes the script work with such tables.
In general, in PostgreSQL it is not always possible to defer constraint checks to the transaction commit (Primary Keys and Unique Constraints can be deferred, but them should be declared as DEFERRABLE to make it possible. Indices created with CREATE UNIQUE INDEX can't be deferred at all).
Since we can't defer constraint checks, I've made it work using a little hack. For example, if we need to move all timestamps by one day, the script will move timestamps by 1000 years and one day, and then return timestamps back by 1000 years. The script use this hack only for columns that have unique constraints.
This will fix the try-reset build that failed today. Probably this going to happen again with other tables that have constraints on date columns. I'm going to modify the script to make it work without ignoring such tables. After that, the only table we're going to need to ignore will be the 2FA table.
Before I fixed that, don't hesitate to tag me if the try-reset build fail again.
Without checking if t.table_schema = '#{@schema}' the SELECT with JOIN in the script were returning every column twice in case there is a 'backup' scheme with exactly the same tables as in the 'public scheme'
We're going to use this script for updating timestamps on Try, but it can be used with a local database during development as well.
Usage:
Commands:
ruby db_timestamp_updater.rb yesterday <date> move all timestamps by x days so that <date> will be moved to yesterday
ruby db_timestamp_updater.rb 100 move all timestamps forward by 100 days
ruby db_timestamp_updater.rb -100 move all timestamps backward by 100 days
The script moves all timestamps in the database by the same amount of days forward or backward. No need to change the script if we add a new column in the future.
The more simple solution would be just to move timestamps in several tables (topics, posts, and so on). I didn't want to go that way because it could generate additional work in the future. For example, if we add a new column with a timestamp and users can see that timestamp we'd need to add that column to the script. Or, for example, if we move a post's timestamp to the future but forget to move a timestamp of topic timer or user action it can cause weird bugs.
Post-deploy migrations exist to allow for seamless Discourse upgrades. By design, they cause migrations to run out of numerical order. This has the potential to cause some unexpected edge cases. To reduce the likelihood of these edge cases, we will promote historical post_deploy migrations to regular migrations after a full Discourse stable release cycle.
This script is intended to be run at least during every Discourse release cycle.
This means that truly seamless upgrades will not be possible between non-consecutive Discourse versions. (Upgrades will still work, but may cause some server errors for users during the upgrade)
Setting a random value in the interval 1 week ago ... now works better
because this spreads digest scheduling over a week because digests are
sent one week from the date of the last digest.
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
This is a pretty straightforward bulk importer, just tailored to the vBulletin 5 database structure.
Also made a few minor improvements to the base importer -- should be self explanatory in the code.
Wrote up a new script to import from Higher Logic. Nothing too crazy going on here. Two major things about this script:
It requires you to convert a Microsoft SQL file to a format MySQL can read.
Higher Logic stores posts (at least in the case of the import I ran) with the email thread shown in the post body. The script does its best to truncate this out, but the logic may need to be improved on future imports. For the import I ran, it worked just fine as is. 🤷♂️
Made some improvements to the Vanilla MySQL script -- mainly because not all SQL imports require use of the VanillaBodyParser. Still left it as an option to turn on and use if so desired. Also added subcategory support, importing of likes, and solve status.
Includes:
* DEV: Remove external plugin linting (that's covered by CI in their repositories)
* DEV: Move lint stages to a separate workflow (partial de-`if`-ication of workflows)
* DEV: Run CI on `main` branch too
* DEV: Update postgres to 13
* DEV: Update redis to 6.x
Other changes:
* DEV: Remove matrix.os
* DEV: Remove env.BUILD_TYPE
* DEV: Remove env.TARGET
* DEV: Rename `build_types` config option to `build_type`
* DEV: Lowercase `target` and `build_type` names
* DEV: Rename `ci` to `tests`
* DEV: Rename `lint` to `linting`
* DEV: Lower the wizard qunit timeout (30 min -> 10)
* DEV: Ruby version is no longer configurable
* DEV: Run plugin tests only in the `plugins` target
* DEV: Use binstubs where applicable
* DEV: We don't open PRs to `tests-passed`
This is an importer I wrote to restore some users that were
accidentally deleted for being purged as old staged users or old
unactivated users.
It reads from CSV files exported from a discourse sql backup.
When running an import script there are many site settings that are
changed but we reset them back to where they were originally before the
import. However, there are two settings that we don't roll back:
```
purge_unactivated_users_grace_period_days
purge_deleted_uploads_grace_period_days
```
which could have some unintended consequences.
My first question is do we *really* have to change these settings? I'm
not a huge fan of changing someones settings without them really knowing
they were changed.
If we really do have to change these settings here is my proposed PR
where we don't alter the `purge_unactivated_users_grace_period_days` if
it has been disabled.
As I'm writing this another change we could make is that we don't change
either of these site settings if we detect that they aren't set to the
default values.
The drive behind this PR is that there is a discourse instance which
relies on staged users as part of their workflow and this setting was
changed by accident via the import script causing users to be deleted
that shouldn't have been.
You can use `discourse restore --location=local FILENAME` if you want to restore a backup that is stored locally even though the `backup_location` has the value `s3`.
The 'Discourse SSO' protocol is being rebranded to DiscourseConnect. This should help to reduce confusion when 'SSO' is used in the generic sense.
This commit aims to:
- Rename `sso_` site settings. DiscourseConnect specific ones are prefixed `discourse_connect_`. Generic settings are prefixed `auth_`
- Add (server-side-only) backwards compatibility for the old setting names, with deprecation notices
- Copy `site_settings` database records to the new names
- Rename relevant translation keys
- Update relevant translations
This commit does **not** aim to:
- Rename any Ruby classes or methods. This might be done in a future commit
- Change any URLs. This would break existing integrations
- Make any changes to the protocol. This would break existing integrations
- Change any functionality. Further normalization across DiscourseConnect and other auth methods will be done separately
The risks are:
- There is no backwards compatibility for site settings on the client-side. Accessing auth-related site settings in Javascript is fairly rare, and an error on the client side would not be security-critical.
- If a plugin is monkey-patching parts of the auth process, changes to locale keys could cause broken error messages. This should also be unlikely. The old site setting names remain functional, so security-related overrides will remain working.
A follow-up commit will be made with a post-deploy migration to delete the old `site_settings` rows.
* FEATURE: Import attachments
* FEATURE: Add support for importing multiple forums in one
* FEATURE: Add support for category and tag mapping
* FEATURE: Import groups
* FIX: Add spaces around images
* FEATURE: Custom mapping of user rank to trust levels
* FIX: Do not fail import if it cannot import polls
* FIX: Optimize existing records lookup
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
After running the Discourse merge script, it was pretty evident it held up well after all these years ;)
Made a few fixes:
Included an environment variable for DB_PASS as likely the password will need to be changed if running the import in an official Docker container (recommended)
Set a hard order for imported categories, otherwise sometimes they'd be imported in a weird order making things unpredictable for parent/child category imports
Fixed a couple of instances where we added unique indexes (such as on category slugs)
Set up upload regex to handle AWS URLs better
Fixed the script to work with frozen string literals
This commit adds an additional find_user_by_email hook to ManagedAuthenticator so that GitHub login can continue to support secondary email addresses
The github_user_infos table will be dropped in a follow-up commit.
This is the last core authenticator to be migrated to ManagedAuthenticator 🎉