WeChall Training: WWW-Robots @ Shukularuni

Training: WWW-Robots

Name: Training: WWW-Robots
Tags: HTTP, Training
Score: 1
Description: In this little training challenge, you are going to learn about the Robots_exclusion_standard.
The robots.txt file is used by web crawlers to check if they are allowed to crawl and index your website or only parts of it.
Sometimes these files reveal the directory structure instead of protecting the content from being crawled.

Enjoy!

Theory

To get the solution, as the description says, we have to look at the robots.txt file. I was going to explain it, but the description already did that for me. The quick version: to control bot traffic to your website, you create a robots.txt file that tells each crawler which paths it may and may not visit. Google's search crawler, for example, reads it to know which pages it is allowed to fetch for the title, description, and other metadata it shows in search results. Keep in mind the file is only a polite request to crawlers, not access control, and anyone can read it. By convention robots.txt sits at the root of the website, so let's go over there.
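To make the idea concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt rules before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `example.com` URLs, the `ExampleBot` user agent, and the `is_allowed` helper are all made up for illustration; they are not taken from WeChall.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not the real WeChall file).
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules_text: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/page.html"):
        print(url, is_allowed(EXAMPLE_RULES, "ExampleBot", url))
```

Nothing stops a human (or a rude bot) from ignoring these rules, which is exactly what this challenge relies on.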

Solution

Let's go to the robots.txt file in the WeChall domain:

https://wechall.net/robots.txt

User-agent: *
Disallow: /challenge/training/www/robots/T0PS3CR3T
Disallow: /index.php?
Disallow: /users/with/
Disallow: /stats/
Disallow: /graph/
Disallow: /img/

Oh, and look at the entry that says "top secret" (T0PS3CR3T). I'm sure that's the one, and just visiting it completes the level. So copy that path, append it to the WeChall domain, and you should be good to go:

https://wechall.net/challenge/training/www/robots/T0PS3CR3T

WeChall: Your answer is correct!

There we go! That's the solution.
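For the curious, the manual steps above could be sketched in Python roughly like this. It assumes the robots.txt content is the copy shown earlier in this writeup (the live file may differ), and `disallowed_urls` is a hypothetical helper name of my own, not part of any library.

```python
from urllib.parse import urljoin

# Copy of the robots.txt listing from this writeup; the live file may have changed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /challenge/training/www/robots/T0PS3CR3T
Disallow: /index.php?
Disallow: /users/with/
Disallow: /stats/
Disallow: /graph/
Disallow: /img/
"""


def disallowed_urls(base_url: str, robots_text: str) -> list[str]:
    """Turn every non-empty Disallow path into a full URL on base_url."""
    urls = []
    for line in robots_text.splitlines():
        # Split "Field: value" at the first colon only.
        field, _, value = line.partition(":")
        path = value.strip()
        if field.strip().lower() == "disallow" and path:
            urls.append(urljoin(base_url, path))
    return urls


if __name__ == "__main__":
    # Print each "hidden" location so it can be opened in a browser.
    for url in disallowed_urls("https://wechall.net/", ROBOTS_TXT):
        print(url)
```

The first URL it prints is the T0PS3CR3T one from the solution. This is a deliberately naive line-by-line reader that ignores User-agent groups, which is enough here because the file has a single group.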

I voted this level as:

Diff: 02
Ed  : 10
Fun : 08

https://wechall.net/en/challenge/training/www/robots/index.php
