Previously we used custom fields to denote a user was anonymous, this was
risky in that custom fields are prone to race conditions and are not
properly dedicated, missing constraints and so on.
The new table `anonymous_users` is properly protected. There is only one
possible shadow account per user, which is enforced using a constraint.
Every anonymous user will have a unique row in the new table.
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
`Upload#url` is more likely and can change from time to time. When it
does changes, we don't want to have to look through multiple tables to
ensure that the URLs are all up to date. Instead, we simply associate
uploads properly to `UserProfile` so that it does not have to replicate
the URLs in the table.
"Rejecting" a user in the queue is equivalent to deleting them, which
would then making it impossible to review rejected users. Now we store
information about the user in the payload so if they are deleted things
still display in the Rejected view.
Secondly, if a user is destroyed outside of the review queue, it will
now automatically "Reject" that queue item.
This commit fixes the follow quality issue with `PostSearchData#raw_data`:
1. URLs are being tokenized and links with similar href and characters
are being duplicated in the raw data.
`Post#cooked`:
```
<p><a href=\"https://meta.discourse.org/some.png\" class=\"onebox\" target=\"_blank\" rel=\"nofollow noopener\">https://meta.discourse.org/some.png</a></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png discourse org/some png https://meta.discourse.org/some.png discourse org/some png
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png meta discourse org
```
2. Ligthbox being included in search pollutes the
`PostSearchData#raw_data` unncessarily.
From 28 March 2018 to 28 March 2019, searches for the term `image` on
`meta.discourse.org` had a click through rate of 2.1%. Non-lightboxed images are not included in indexing for search yet we were indexing content within a lightbox. Also, search for terms like `image` was affected we were using `Pasted image` as the filename for
uploads that were pasted.
`Post#cooked`
```
<p>Let me see how I can fix this image<br>\n<div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://meta.discourse.org/some.png\" title=\"some.png\" rel=\"nofollow noopener\"><img src=\"https://meta.discourse.org/some.png\" width=\"275\" height=\"299\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#far-image\"></use></svg><span class=\"filename\">some.png</span><span class=\"informations\">1750×2000</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#discourse-expand\"></use></svg>\n</div></a></div></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image some.png png https://meta.discourse.org/some.png discourse org/some png some.png png 1750×2000
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image
```
In terms of indexing performance, we now have to parse the given HTML
through nokogiri twice. However performance is not a huge worry here since a string length of 194170 takes only 30ms
to scrub plus the indexing takes place in a background job.
Includes support for flags, reviewable users and queued posts, with REST API
backwards compatibility.
Co-Authored-By: romanrizzi <romanalejandro@gmail.com>
Co-Authored-By: jjaffeux <j.jaffeux@gmail.com>
* This is causing certain posts to appear in searches incorrectly as `PostSearchData#raw_data` contains the outdated title, category name and tag names.
Migrates email user options to a new data structure, where `email_always`, `email_direct` and `email_private_messages` are replace by
* `email_messages_level`, with options: `always`, `only_when_away` and `never` (defaults to `always`)
* `email_level`, with options: `always`, `only_when_away` and `never` (defaults to `only_when_away`)
* FEATURE: Account for `ignored_users` when merging two users
## Why?
This is part of the [Ability to ignore a user feature](https://meta.discourse.org/t/ability-to-ignore-a-user/110254/8).
When we merge two users, we need to account for merging their list of `ignored_users` too.
The webpush gem by default sets the expiration date of the JWT token to exactly 24 hours in the future. That's not really needed because the token isn't reused. And it might cause UnauthorizedRegistration if the server's clock isn't 100% correct, because the maximum allowed value is 24 hours.