discourse/spec/lib
Osama Sayegh 7bd3986b21
FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131)
We have two site settings, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to let site owners signal to specific crawlers that they're crawling the site too aggressively and should slow down.

When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt`, which leaves site owners with no options if a crawler is crawling the site too aggressively.

This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a user who is not logged in, Discourse will check the User Agent string. If it contains one of the values in the `slow_down_crawler_user_agents` list, Discourse will allow only 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting), and any other requests made within the same interval will get a 429 response.
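
The check described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `CrawlerThrottle` class, its in-memory hash, the case-insensitive matching, and keying the limit on the matched setting value are all assumptions made for the example.

```ruby
# Hypothetical sketch of "1 request every N seconds per slowed-down crawler".
# Not Discourse's actual implementation; state is kept in a plain Hash here.
class CrawlerThrottle
  def initialize(slow_agents:, interval_seconds:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @slow_agents = slow_agents.map(&:downcase) # stands in for slow_down_crawler_user_agents
    @interval = interval_seconds               # stands in for slow_down_crawler_rate (N)
    @clock = clock                             # injectable so the sketch can be tested
    @last_allowed = {}                         # matched agent value => time of last allowed request
  end

  # Returns the HTTP status the request would receive: 200 or 429.
  def status_for(user_agent, logged_in: false)
    return 200 if logged_in # only anonymous requests are checked

    ua = user_agent.to_s.downcase
    match = @slow_agents.find { |agent| ua.include?(agent) }
    return 200 if match.nil? # User Agent is not on the list

    now = @clock.call
    last = @last_allowed[match]
    if last && now - last < @interval
      429 # another request inside the same N-second interval
    else
      @last_allowed[match] = now
      200
    end
  end
end
```

With `slow_agents: ["badbot"]` and `interval_seconds: 60`, the first request from a User Agent containing `BadBot` would be allowed and further requests within the next 60 seconds would be answered with 429.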

The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR, since it could rate limit much, if not all, of a site's anonymous traffic if the setting is not used appropriately. To protect against this scenario, we've added a couple of new validations that run when the setting is changed:

1) each value added to the setting must be 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent strings. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
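
As a rough illustration of the two validations, here is a minimal sketch. The method name `crawler_agent_errors`, the constants, the error messages, and the case-insensitive comparison are assumptions made for the example; the actual validator added by this PR may be structured differently.

```ruby
# Hypothetical sketch of the two validations; not the validator from this PR.
# Token list copied from the commit message above.
PROHIBITED_TOKENS = [
  "apple", "windows", "linux", "ubuntu", "gecko", "firefox", "chrome",
  "safari", "applewebkit", "webkit", "mozilla", "macintosh", "khtml",
  "intel", "osx", "os x", "iphone", "ipad", "mac"
].freeze

MIN_LENGTH = 3

# Returns an array of error messages; an empty array means every value is acceptable.
def crawler_agent_errors(values)
  values.each_with_object([]) do |raw, errors|
    value = raw.to_s.strip.downcase # assumption: comparison ignores case
    if value.length < MIN_LENGTH
      errors << "#{raw.inspect} is shorter than #{MIN_LENGTH} characters"
    elsif (token = PROHIBITED_TOKENS.find { |t| t.include?(value) })
      # The value equals, or is a substring of, a common browser token.
      errors << "#{raw.inspect} matches the browser token #{token.inspect}"
    end
  end
end
```

Under this reading, `crawler_agent_errors(["bingbot"])` returns no errors, while `"ab"` fails the length check and `"web"` is rejected because it is a substring of a browser token.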
2021-11-30 12:55:25 +03:00
..
backup_restore DEV: Improve multisite testing (#14884) 2021-11-11 16:44:58 +00:00
compression DEV: Split max decompressed setting for themes and backups (#8179) 2019-10-11 14:38:10 -03:00
content_security_policy FIX: Allow CSP to work correctly for non-default hostnames/schemes (#9180) 2020-03-19 19:54:42 +00:00
i18n FEATURE: Add English (UK) as locale (#11768) 2021-01-20 21:32:22 +01:00
imap/providers FEATURE: Improve group email settings UI (#13083) 2021-05-28 09:28:18 +10:00
onebox FIX: Display Instagram Oneboxes in an iframe (#14789) 2021-11-02 14:34:51 -04:00
seed_data DEV: Add rubocop-rspec (#9288) 2020-03-27 17:35:40 +01:00
site_settings FEATURE: Replace Crawl-delay directive with proper rate limiting (#15131) 2021-11-30 12:55:25 +03:00
topic_query FIX: Exclude PMs that user sent to themselves. (#14496) 2021-10-04 11:55:35 +08:00
validators FEATURE: Add timezone to core user_options (#8380) 2019-11-25 10:49:27 +10:00
webauthn DEV: Correct typos and spelling mistakes (#12812) 2021-05-21 11:43:47 +10:00
bookmark_manager_spec.rb FEATURE: Topic-level bookmarks (#14353) 2021-09-21 08:45:47 +10:00
bookmark_query_spec.rb FEATURE: Go to last unread for topic-level bookmark links (#14396) 2021-09-21 13:49:56 +10:00
bookmark_reminder_notification_handler_spec.rb DEV: Ignore bookmarks.topic_id column and remove references to it in code (#14289) 2021-09-15 10:16:54 +10:00
browser_detection_spec.rb FIX: Detect DiscourseHub user agent. 2019-08-09 11:58:15 +03:00
content_security_policy_spec.rb DEV: prevents flakky spec when deleting plugin (#14701) 2021-10-25 10:24:21 +02:00
db_helper_spec.rb FEATURE: Include optimized thumbnails for topics (#9215) 2020-05-05 09:07:50 +01:00
discourse_js_processor_spec.rb Support for transpiling .js files (#9160) 2020-03-11 09:43:55 -04:00
encodings_spec.rb DEV: use #frozen_string_literal: true on all spec 2019-04-30 10:27:42 +10:00
introduction_updater_spec.rb FIX: replace default welcome topic post with new value from wizard 2020-04-01 15:42:45 -04:00
mini_sql_multisite_connection_spec.rb DEV: upgrade mini_sql (#12465) 2021-03-24 08:48:04 +11:00
onebox_spec.rb DEV: Absorb onebox gem into core (#12979) 2021-05-26 15:11:35 +05:30
post_jobs_enqueuer_spec.rb FIX: Do not send emails to mailing_list_mode subscribers for PMs (#14159) 2021-08-26 15:16:35 +10:00
s3_cors_rulesets_spec.rb DEV: Improve s3:ensure_cors_rules logging (#14832) 2021-11-08 11:44:12 +10:00
search_spec.rb FEATURE: Hide suspended users from site-wide search to regular users (#14245) 2021-09-06 09:59:35 -04:00
shrink_uploaded_image_spec.rb DEV: Improve script/downsize_uploads.rb (#13508) 2021-06-24 00:09:40 +02:00
theme_flag_modifier_spec.rb PERF: Eager load Theme associations in Stylesheet Manager. 2021-06-21 11:06:58 +08:00
theme_javascript_compiler_spec.rb FEATURE: Introduce theme/component QUnit tests (take 2) (#12661) 2021-04-12 15:02:58 +03:00
topic_upload_security_manager_spec.rb DEV: Clean up S3 specs, stubs, and helpers 2020-09-28 12:02:25 +01:00
upload_creator_spec.rb DEV: Remove xlink hrefs (#15059) 2021-11-25 15:22:43 +11:00
upload_recovery_spec.rb DEV: Recover missing files of existing uploads (#10757) 2020-10-01 14:54:45 +02:00
upload_security_spec.rb FIX: manually adds frowning_face_with_open_mouth for apple (#13528) 2021-07-21 23:27:20 +02:00