Hosting Discussion
 






  Post #1 (permalink)   08-15-2008, 08:50 PM
HD Guru
 
Join Date: Jan 2008
Posts: 536

Status: AbbieRose is offline
Someone told me that it is possible to stop spiders from crawling certain pages that you don't want them to see. I can't remember the name of the file to look up how to create it; can someone assist?
 
 
 


  Post #2 (permalink)   08-15-2008, 10:02 PM
HD Guru
 
Join Date: Dec 2003
Posts: 570

Status: Lesli is offline
You may be talking about the .htaccess file. I just did a quick Google search on the string

block crawlers htaccess

and the first listed result was titled "How to block user agents".
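For anyone who doesn't want to dig through the search results, here is a sketch of the kind of .htaccess rules those guides describe, using Apache's SetEnvIf module. The bot names are examples only, not a vetted blocklist:

```apache
# Flag requests whose User-Agent matches a known bad-bot string
# (the names "BadBot" and "EmailSiphon" are illustrative examples)
SetEnvIfNoCase User-Agent "BadBot" bad_bot
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot

# Deny flagged requests, allow everything else (Apache 2.2 syntax)
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

This blocks by User-Agent string, so a crawler that lies about its identity will still get through; it's a filter for the lazy bots, not a security boundary.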
__________________
Lesli Schauf, TLM Network
Linux and Windows Shared Hosting since 2002: Scribehost
 
 
 


  Post #3 (permalink)   08-15-2008, 10:38 PM
HD Addict
 
 
Join Date: Jan 2008
Location: Maine, USA
Posts: 226

Status: purple is offline
I know that to block Google from crawling your pages you put a robots.txt file in your site root, and use a meta tag to prevent a page from being indexed through indirect links. You can go here for more information from Google's webmaster help:
http://www.google.com/support/webmas...y?answer=35302
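A minimal robots.txt along those lines might look like this; the /drafts/ directory is a hypothetical example, not something from Google's page:

```text
# robots.txt, served from the site root (e.g. http://example.com/robots.txt)
# Ask all compliant crawlers to stay out of a /drafts/ directory
User-agent: *
Disallow: /drafts/
```

Note the file has to sit at the top level of the domain; crawlers only request /robots.txt, not copies in subdirectories.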
 
 
 


  Post #4 (permalink)   08-16-2008, 12:30 AM
HD Amateur
 
 
Join Date: Feb 2008
Posts: 52

Status: diligent is offline
Most other search engines should follow the robots.txt rules as well.
__________________
DILIGENThost.com, Canadian hosting based on high-availability w/ 24/7 support
cPanel/WHM 11 + Fantastico, PHP4&5, Private nameservers, Unlimited domains, Self Serve DNS...Shared / Reseller / SHOUTcast / Other / 30-day money back | Webmasters-Inside.com - Come join our new webmaster forum!
 
 
 


  Post #5 (permalink)   08-16-2008, 10:04 AM
HD Newbie
 
Join Date: Mar 2008
Location: UK
Posts: 13

Status: blueroomhosting is offline
You can use robots.txt to tell spiders what they should and shouldn't crawl. However, it doesn't actually forbid access. Search engine spiders like Googlebot will obey those rules, but less scrupulous crawlers like spambots may not bother. If you want extra protection, you can block those user agents at the webserver level with .htaccess rules.
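If your host has mod_rewrite enabled, the webserver-level block can also be written as a rewrite rule in .htaccess; again, the user-agent names here are placeholders, not a recommended list:

```apache
# Return 403 Forbidden to requests whose User-Agent matches
# either example pattern, case-insensitively
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (EmailCollector|WebCopier) [NC]
RewriteRule .* - [F,L]
```

Unlike robots.txt, this is enforced for every request, whether or not the client chooses to be polite.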
__________________
Blue Room Hosting - HA clustered UK VPS
Linux KVM Plans - Multiple OS support. Virtual console and CD drive.
 
 
 


  Post #6 (permalink)   08-16-2008, 03:49 PM
HD Guru
 
Join Date: Jan 2008
Posts: 536

Status: AbbieRose is offline
Thank you. My main concern is simply duplicate pages: work-in-progress copies that I leave on the server but don't link from anywhere. I don't want them being found, since the search engines might discount the real pages if they index the unfinished duplicates.

I'll look into both methods, thanks.
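For work-in-progress copies that nobody else needs to reach, there's a third option beyond the two above: deny all web access to the drafts. A sketch, assuming the drafts live in a hypothetical /wip/ directory, would be a .htaccess file placed inside that directory:

```apache
# .htaccess inside the work-in-progress directory:
# refuse every request, so neither search engines nor
# ill-behaved bots can fetch the unfinished pages
# (Apache 2.2 syntax)
Order Deny,Allow
Deny from all
```

You'd still be able to edit the files over FTP/SSH; they just wouldn't be served over HTTP, which also means nothing to index.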
 
 
 


  Post #7 (permalink)   08-16-2008, 08:41 PM
HD Newbie
 
Join Date: Jul 2008
Location: Australia
Posts: 6

Status: Yawp is offline
Hi AbbieRose,
You can simply use a meta tag to tell robots not to index the contents of a page or follow its links.
Add the following meta tag in the <head> section of your page.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

This won't stop malware robots or email-address harvesters, though.

All the best.
__________________
Yawp domains USA - UK - AUST
|| 24/7 Support - Unmatched Reliability - Redundancy
|| Web/Email Hosting - Managed DNS - Domain Names
|| Business Solutions - Web Design Services - 24/7 Monitoring
 
 
 