Robots TXT Generator

Disallow Directories: each path must contain "/".

Sitemap: usually an XML file.

Search engines (default for all robots): Allow or Refused

Each search engine below can be set to Default, Allow, or Refused.

Search engines:
  • Google
  • Bing
  • Yahoo
  • Ask/Teoma
  • Alexa/Wayback
  • Cuil
  • MSN Search
  • Scrub The Web
  • DMOZ
  • GigaBlast

Special search engines (robots):
  • Google Image
  • Google Mobile
  • Yahoo MM
  • Yahoo Blogs
  • MSN PicSearch

Please copy and save the results below as robots.txt

About Robots TXT Generator

Robots.txt is a plain text file stored in the root of the site. Although it is simple to set up, it is effective: it can tell search engine spiders to crawl only specified content, or it can prevent them from crawling some or all of the content of the website.

The robots.txt file must be placed in the root of the website and be accessible from the Internet. For example, if your website address is https://www.yourdomain.com/, the file must open and show its content at https://www.yourdomain.com/robots.txt.
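As a small illustration of the location rule above, the robots.txt address can be derived from any page URL on the same site. This is only a sketch using Python's standard urllib.parse module; the helper name robots_url and the sample page path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL on the example domain used above.
print(robots_url("https://www.yourdomain.com/help/index.html"))
```

Opening the printed address in a browser is a quick way to confirm that the file is reachable.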

User-agent:

Names the search engine spider that a record applies to. If the "Robots.txt" file contains multiple User-agent records, multiple search engine spiders are subject to the protocol. The file must contain at least one User-agent record. If the value of this item is set to *, the protocol applies to any search engine spider. The "Robots.txt" file can contain only one "User-agent:*" record.


Disallow:

Describes a URL that you do not want to be accessed. The value can be a complete path or the beginning of one. Any URL whose path starts with the Disallow value will not be accessed by the robot.


Example:

Example 1: "Disallow:/help" means that search engine spiders may not crawl either /help.html or /help/index.html.

Example 2: "Disallow:/help/" means that search engine spiders may fetch /help.html but not /help/index.html.

Example 3: An empty Disallow record means that all pages of the website may be crawled by the search engine. The "/robots.txt" file needs at least one Disallow record. If "/robots.txt" is an empty file, the site is open to all search engine spiders.
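The three cases above can be checked with Python's standard urllib.robotparser module, which applies the same prefix rule. This is a minimal sketch: the helper name allowed and the agent name "AnyBot" are hypothetical, and real crawlers may add matching features that this parser does not implement.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, path, agent="AnyBot"):
    """Parse robots.txt lines and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


# Example 1: "/help" is a prefix of both paths, so both are blocked.
ex1 = ["User-agent: *", "Disallow: /help"]
# Example 2: "/help/" only matches paths inside the directory.
ex2 = ["User-agent: *", "Disallow: /help/"]
# Example 3: an empty Disallow value blocks nothing.
ex3 = ["User-agent: *", "Disallow:"]

for name, rules in [("ex1", ex1), ("ex2", ex2), ("ex3", ex3)]:
    print(name, allowed(rules, "/help.html"), allowed(rules, "/help/index.html"))
```

Each printed row shows whether /help.html and /help/index.html may be fetched under that rule set.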

#: The comment character in the robots.txt protocol.


Comprehensive example:

Example 1: Use "/robots.txt" to prevent all search engine spiders from crawling the "/bin/cgi/" directory, as well as the "/tmp/" directory and the /foo.html file. The settings are as follows:

User-agent: *
Disallow: /bin/cgi/
Disallow: /tmp/
Disallow: /foo.html

Example 2: Use "/robots.txt" to give one search engine full access while restricting all the others. For example, the search engine spider named "slurp" may crawl everything, and every other search engine spider is refused access to the contents of the "/cgi/" directory. The settings are as follows:

User-agent: *
Disallow: /cgi/

User-agent: slurp
Disallow:
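
To see how a spider picks its own record in the settings above, the rules can be fed to Python's standard urllib.robotparser. This is a sketch under stated assumptions: "OtherBot" and the paths "/cgi/search" and "/index.html" are hypothetical, and an actual search engine may resolve records differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The Example 2 settings, one directive per list item.
EXAMPLE_2 = [
    "User-agent: *",
    "Disallow: /cgi/",
    "User-agent: slurp",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(EXAMPLE_2)

# "slurp" has its own record with an empty Disallow, so nothing is blocked.
slurp_ok = parser.can_fetch("slurp", "/cgi/search")
# Any other spider falls back to the "*" record and is kept out of /cgi/.
other_cgi = parser.can_fetch("OtherBot", "/cgi/search")
other_home = parser.can_fetch("OtherBot", "/index.html")

print(slurp_ok, other_cgi, other_home)
```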

Example 3: Prohibit every search engine from crawling the website. The settings are as follows:

User-agent: *
Disallow: /

Example 4: Prohibit a single search engine from crawling the website. For example, only the search engine spider named "slurp" is prohibited from crawling. The settings are as follows:

User-agent: slurp
Disallow: /
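
Settings like the four examples above can also be produced programmatically. The following is a rough sketch of what a generator might do, not the code behind this tool: the function name build_robots_txt, its parameters, and the way the optional sitemap URL is appended as a Sitemap line are all assumptions made for illustration.

```python
def build_robots_txt(groups, sitemap=None):
    """Render robots.txt text.

    groups: list of (user_agent, [disallowed paths]) pairs. An empty list
    of paths yields an empty Disallow record, which permits everything.
    sitemap: optional sitemap URL, appended as a Sitemap line.
    """
    blocks = []
    for agent, paths in groups:
        lines = [f"User-agent: {agent}"]
        for path in paths or [""]:
            # Mirror the form hint: every non-empty path must begin with "/".
            if path and not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path!r}")
            lines.append(f"Disallow: {path}".rstrip())
        blocks.append("\n".join(lines))
    # Separate records with a blank line.
    text = "\n\n".join(blocks) + "\n"
    if sitemap:
        text += f"\nSitemap: {sitemap}\n"
    return text


# Example 1: block two directories and one file for every spider.
print(build_robots_txt([("*", ["/bin/cgi/", "/tmp/", "/foo.html"])]))
# Example 4: block only the spider named "slurp".
print(build_robots_txt([("slurp", ["/"])]))
```

The returned text can be saved as robots.txt in the root of the website.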


Copyright © 2024 CoolGenerator.com All rights reserved.
