discourse/app/views/robots_txt/index.erb
Sam f40f10240c FEATURE: remove topic rss from robots
Crawlers love hitting the RSS feeds (confirmed that both Google and Bing do).

Experimenting with the impact of blocking these feeds and forcing crawlers to hit
the content directly. It is better if they hit the actual page to start with, as opposed to:

1. Hit RSS feed
2. Find new content
3. Hit post link
4. Get canonical
5. Hit canonical

Lots of pointless work.

We do not know for sure what impact this will have on newsreader apps;
we will listen for feedback.
2018-04-11 11:57:52 +10:00


# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
<% @allowed_user_agents.each do |user_agent| %>
User-agent: <%= user_agent %>
<% end %>
Disallow: /auth/cas
Disallow: /auth/facebook/callback
Disallow: /auth/twitter/callback
Disallow: /auth/google/callback
Disallow: /auth/yahoo/callback
Disallow: /auth/github/callback
Disallow: /auth/cas/callback
Disallow: /assets/browser-update*.js
Disallow: /users/
Disallow: /u/
Disallow: /badges/
Disallow: /search
Disallow: /search/
Disallow: /tags
Disallow: /tags/
Disallow: /email/
Disallow: /session
Disallow: /session/
Disallow: /admin
Disallow: /admin/
Disallow: /user-api-key
Disallow: /user-api-key/
Disallow: /*?api_key*
Disallow: /*?*api_key*
Disallow: /groups
Disallow: /groups/
Disallow: /t/*/*.rss
<% if @disallowed_user_agents %>
<% @disallowed_user_agents.each do |user_agent| %>
User-agent: <%= user_agent %>
Disallow: /
<% end %>
<% end %>
<%= server_plugin_outlet "robots_txt_index" %>
<% @crawler_delayed_agents.each do |agent, delay| %>
User-agent: <%= agent %>
Crawl-delay: <%= delay %>
<% end %>
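
To make the effect of the `Disallow: /t/*/*.rss` rule concrete, here is a minimal Ruby sketch of how a wildcard-aware crawler might evaluate it. The helpers `robots_pattern_to_regexp` and `blocked?` are hypothetical and not part of Discourse, and the sample paths are illustrative only. The sketch assumes the common convention (followed by Google and Bing) that `*` matches any run of characters, a trailing `$` anchors the end of the path, and rules otherwise match as prefixes.

```ruby
# Hypothetical helper (not Discourse code): translate a robots.txt path
# pattern into a Regexp. `*` becomes "any run of characters"; a trailing `$`
# anchors the end of the path; otherwise the rule is a prefix match.
def robots_pattern_to_regexp(pattern)
  anchored = pattern.end_with?("$")
  body = anchored ? pattern[0...-1] : pattern
  # Escape the literal pieces and join them with `.*` where `*` appeared.
  source = body.split("*", -1).map { |part| Regexp.escape(part) }.join(".*")
  Regexp.new("\\A" + source + (anchored ? "\\z" : ""))
end

# Hypothetical helper: true when `path` falls under the Disallow `pattern`.
def blocked?(pattern, path)
  robots_pattern_to_regexp(pattern).match?(path)
end

RSS_RULE = "/t/*/*.rss"

# Illustrative paths (assumed, not taken from the template above).
puts blocked?(RSS_RULE, "/t/some-topic/123.rss") # per-topic feed
puts blocked?(RSS_RULE, "/t/some-topic/123")     # topic page itself
puts blocked?(RSS_RULE, "/latest.rss")           # feed outside /t/
```

Under these assumptions only the per-topic feed is disallowed: the topic page stays crawlable, and feeds outside `/t/` are untouched by this rule.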