FEATURE: explicitly ban outlier traffic sources in robots.txt (#11553)
Googlebot handles noindex headers very elegantly. It advises leaving as many routes as possible open and uses headers for high-fidelity rules about what gets indexed. Discourse adds special `X-Robots-Tag` noindex headers to the users, badges, groups, search, and tag routes.
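A minimal sketch of how such a header can be attached in a Rails controller; the controller and the `add_noindex_header` filter name here are hypothetical illustrations, not Discourse's actual implementation:

```ruby
class BadgesController < ApplicationController
  # Hypothetical sketch: leave the route crawlable but tell compliant
  # crawlers (notably Googlebot) not to index the rendered pages.
  before_action :add_noindex_header

  private

  def add_noindex_header
    response.headers["X-Robots-Tag"] = "noindex"
  end
end
```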
Following up on b52143feff, Googlebot now gets special handling. The rest of the crawlers get a far more aggressive disallow list to protect against excessive crawling.
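Under these rules, the generated robots.txt would look roughly like the sketch below. The specific paths are illustrative placeholders rather than the exact Discourse disallow list; the point is that the wildcard group is restrictive while the Googlebot group stays mostly open and defers to the noindex headers:

```
# Everyone else: aggressive disallow list to curb excessive crawling.
# (Paths are illustrative, not the real Discourse list.)
User-agent: *
Disallow: /auth/
Disallow: /u/
Disallow: /badges/
Disallow: /search
Disallow: /tag/

# Googlebot: minimal disallow list; indexing is controlled per-route
# via the X-Robots-Tag noindex headers instead.
User-agent: Googlebot
Disallow: /auth/
```

Because crawlers apply the most specific matching `User-agent` group, Googlebot follows only its own minimal rules even though the wildcard group is aggressive.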
@@ -91,6 +91,8 @@ RSpec.describe RobotsTxtController do
     i = response.body.index('User-agent: *')
     expect(i).to be_present
     expect(response.body[i..-1]).to include("Disallow: /auth/")
+    # we have to insert Googlebot for special handling
+    expect(response.body[i..-1]).to include("User-agent: Googlebot")
   end

   it "can allowlist user agents" do