Hosting Discussion



  Post #1 (permalink)   08-15-2008, 08:50 PM
HD Guru
Join Date: Jan 2008
Posts: 536

Status: AbbieRose is offline
Someone told me that it is possible to stop spiders from crawling certain pages that you don't want them to reach. I can't remember the name of the file to look up how to create it. Can someone assist?

  Post #2 (permalink)   08-15-2008, 10:02 PM
HD Guru
Join Date: Dec 2003
Posts: 570

Status: Lesli is offline
You may be talking about the .htaccess file. I just did a quick Google search on the string

block crawlers htaccess

and the first listed result was titled "How to block user agents"
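For reference, a rule of that sort in .htaccess typically uses mod_rewrite to match on the User-Agent header. A minimal sketch (the bot names here are placeholders, not a real blocklist) would be:

```apache
# Return 403 Forbidden to requests whose User-Agent matches
# either example pattern, case-insensitively
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
RewriteRule .* - [F,L]
```

This requires mod_rewrite to be enabled and .htaccess overrides allowed on the host.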
Lesli Schauf, TLM Network
Linux and Windows Shared Hosting since 2002: Scribehost

  Post #3 (permalink)   08-15-2008, 10:38 PM
HD Addict
purple's Avatar
Join Date: Jan 2008
Location: Maine, USA
Posts: 226

Status: purple is offline
I know that to block Google from crawling a page you use a robots.txt file in your site's root directory, plus a meta tag to prevent the page from being indexed through indirect links. Google's webmaster help pages have more information.

  Post #4 (permalink)   08-16-2008, 12:30 AM
HD Amateur
diligent's Avatar
Join Date: Feb 2008
Posts: 52

Status: diligent is offline
Most search engines should follow the robots.txt rules as well.
__________________, Canadian hosting based on high-availability w/ 24/7 support
cPanel/WHM 11 + Fantastico, PHP4&5, Private nameservers, Unlimited domains, Self Serve DNS...Shared / Reseller / SHOUTcast / Other / 30-day money back | - Come join our new webmaster forum!

  Post #5 (permalink)   08-16-2008, 10:04 AM
HD Newbie
Join Date: Mar 2008
Location: UK
Posts: 13

Status: blueroomhosting is offline
You can use robots.txt to tell spiders what they should and shouldn't crawl. However, it doesn't actually forbid access. Search engine spiders like Googlebot will obey those rules, but less scrupulous crawlers like spambots may not bother. If you want extra protection, you can add .htaccess rules that block those user agents at the web server level.
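As an illustration, a robots.txt file placed in the site root along these lines (the directory name is just an example) asks all well-behaved crawlers to skip a directory:

```text
# robots.txt - advisory only; honored by compliant crawlers
User-agent: *
Disallow: /drafts/
```

Note this is purely advisory: it tells crawlers where not to go, but it doesn't prevent anyone from requesting those URLs directly.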
Blue Room Hosting - HA clustered UK VPS
Linux KVM Plans - Multiple OS support. Virtual console and CD drive.

  Post #6 (permalink)   08-16-2008, 03:49 PM
HD Guru
Join Date: Jan 2008
Posts: 536

Status: AbbieRose is offline
Thank you. My main concern is simply that I have duplicate pages: work-in-progress copies that I leave on the server but don't link to anything. I don't want them to be found, since the search engines might discount the real pages if they find the unfinished duplicates.

I'll look into both methods, thanks.

  Post #7 (permalink)   08-16-2008, 08:41 PM
HD Newbie
Join Date: Jul 2008
Location: Australia
Posts: 6

Status: Yawp is offline
Hi AbbieRose,
You can simply use meta tags to tell robots not to index a page's contents or follow its links.
Add the following meta tag in the <head> section of your page:

<meta name="robots" content="noindex, nofollow">

This won't stop malware robots or email address harvesters, though.

All the best.
Yawp domains USA - UK - AUST
|| 24/7 Support - Unmatched Reliability - Redundancy
|| Web/Email Hosting - Managed DNS - Domain Names
|| Business Solutions - Web Design Services - 24/7 Monitoring
