discourse/app/views
Osama Sayegh 7bd3986b21
FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131)
We have a couple of site settings, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to allow site owners to signal to specific crawlers that they're crawling the site too aggressively and that they should slow down.

When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt`, which leaves site owners with no options if a crawler is crawling the site too aggressively.

This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a non-logged-in user, Discourse will check the User Agent string, and if it contains one of the values of the `slow_down_crawler_user_agents` list, Discourse will only allow 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting). Any further requests made within the same interval will get a 429 response.
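
The behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this PR: the `CrawlerRateLimiter` class, its method names, the in-memory dictionary, and the case-insensitive matching are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Optional


class CrawlerRateLimiter:
    """Hypothetical in-memory limiter: 1 request per N seconds per listed agent."""

    def __init__(
        self,
        slow_down_agents: List[str],
        rate_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Assumption: matching is case-insensitive.
        self.agents = [agent.lower() for agent in slow_down_agents]
        self.rate_seconds = rate_seconds
        self.clock = clock
        self._last_allowed: Dict[str, float] = {}

    def matching_agent(self, user_agent: str) -> Optional[str]:
        """Return the first listed value contained in the User Agent string."""
        lowered = user_agent.lower()
        for agent in self.agents:
            if agent in lowered:
                return agent
        return None

    def status_for(self, user_agent: str, logged_in: bool = False) -> int:
        """Return 200 if the request may proceed, 429 if it is rate limited."""
        if logged_in:
            return 200  # only anonymous requests are checked
        agent = self.matching_agent(user_agent)
        if agent is None:
            return 200  # User Agent is not on the slow-down list
        now = self.clock()
        last = self._last_allowed.get(agent)
        if last is not None and now - last < self.rate_seconds:
            return 429  # a request was already allowed in this interval
        self._last_allowed[agent] = now
        return 200
```

With `rate_seconds` set to 60 and `examplebot` (a made-up crawler name) on the list, the first anonymous request from that crawler passes, and further requests from it are rejected until 60 seconds have elapsed.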

The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR, since it could rate limit much, if not all, of the anonymous traffic if the setting is not used appropriately. To protect against this scenario, we've added a couple of new validations to the setting when it's changed:

1) each value added to the setting must be 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent strings. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
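
As a rough illustration of the two validations listed above, here is a Python sketch. It is not the validator added by this PR: the `validate_agents` function, the error strings, and the lowercasing step are assumptions, and only the token list is taken from the text.

```python
from typing import Dict, List

# Prohibited tokens as listed in the commit message.
PROHIBITED_TOKENS = [
    "apple", "windows", "linux", "ubuntu", "gecko", "firefox", "chrome",
    "safari", "applewebkit", "webkit", "mozilla", "macintosh", "khtml",
    "intel", "osx", "os x", "iphone", "ipad", "mac",
]
MIN_LENGTH = 3


def validate_agents(values: List[str]) -> Dict[str, str]:
    """Return a mapping of rejected value -> reason (empty if all are valid)."""
    errors: Dict[str, str] = {}
    for value in values:
        candidate = value.strip().lower()  # assumption: case-insensitive check
        if len(candidate) < MIN_LENGTH:
            errors[value] = "too short"
        elif any(candidate in token for token in PROHIBITED_TOKENS):
            # The value is a substring of a token found in popular browsers.
            errors[value] = "matches popular browser token"
    return errors
```

For instance, `web` would be rejected because it is a substring of `webkit`, while a made-up crawler name such as `examplebot` passes both checks.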
2021-11-30 12:55:25 +03:00
..
about FIX: Support Ruby 3 keyword arguments 2021-10-05 11:25:00 -04:00
admin/backups FEATURE: further restrict downloading of backups 2017-03-01 08:28:34 -07:00
application FIX: Offer site_logo_dark_url as an option for dark mode themes (#14361) 2021-09-16 17:47:51 -04:00
badges FIX: in case of orphan user records skip badge 2019-08-30 17:21:34 +10:00
categories FIX: Resolve Schema.org validation issues 2020-05-05 16:57:16 +03:00
common Code review comments. 2021-06-21 11:06:58 +08:00
default FIX: Add a title to the groups pages 2016-07-25 14:24:43 -04:00
email FIX: Show Uncategorized when unsubscribing (#13832) 2021-07-26 12:19:30 +10:00
embed FIX: Ensure embedded replies/reply-to links open in _blank (#14597) 2021-10-13 21:34:30 +01:00
exceptions FIX: Hide empty popular/recent sections in 404 page (#10811) 2020-10-02 15:11:15 -04:00
finish_installation Upgrade to FontAwesome 5 (take two) (#6673) 2018-11-26 16:49:57 -05:00
groups FEATURE: add title tag for group detail page (#13702) 2021-07-12 20:05:57 +05:30
invites FIX: better handling of invite links after they are redeemed 2018-05-08 20:17:57 +05:30
layouts PERF: Move preload hints to the <head> (#15008) 2021-11-18 18:02:16 +00:00
list FIX: do not show spoiler content in RSS (#14277) 2021-09-08 20:19:43 +05:30
metadata DEV: Add support for Rails 6 2019-05-02 16:23:25 +10:00
offline UX: Remove Helvetica from our font stack (#11876) 2021-02-05 17:01:21 -05:00
posts FIX: do not show spoiler content in RSS (#14277) 2021-09-08 20:19:43 +05:30
published_pages FIX: use normal logo in published pages if small not available. 2020-09-21 09:20:39 +05:30
qunit DEV: Don't try to load admin locales in tests (#14917) 2021-11-13 15:31:55 +01:00
robots_txt FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131) 2021-11-30 12:55:25 +03:00
safe_mode Upgrade to FontAwesome 5 (take two) (#6673) 2018-11-26 16:49:57 -05:00
search UX: better title on search page 2017-10-27 09:13:04 +05:30
session FEATURE: Rename 'Discourse SSO' to DiscourseConnect (#11978) 2021-02-08 10:04:33 +00:00
static DEV: Remove xlink hrefs (#15059) 2021-11-25 15:22:43 +11:00
tags FIX: Use new tag routes (#8683) 2020-01-21 19:23:08 +02:00
topics FIX: rename action_code_href to action_code_path (#14834) 2021-11-08 14:32:17 +11:00
user_api_keys FEATURE: Delegated authentication via user api keys (#7272) 2019-04-01 13:18:53 -04:00
user_notifications Add min-width rule to fix header display issues on the Android Gmail app (#13827) 2021-07-23 14:21:03 -07:00
users DEV: Ignore bookmarks.topic_id column and remove references to it in code (#14289) 2021-09-15 10:16:54 +10:00
users_email DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
wizard Code review comments. 2021-06-21 11:06:58 +08:00