As far as I know, to block Google from crawling a page you put a robots.txt file in your site root, and you add a meta tag to prevent the page from being indexed via indirect links. You can go here for more information from Google's webmaster help: http://www.google.com/support/webmas...y?answer=35302
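For example, a minimal robots.txt sketch might look like the following. The /drafts/ folder is just an illustration of where unfinished pages might live, not something from the original question:

User-agent: *
Disallow: /drafts/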
You can use robots.txt to tell spiders what they should and shouldn't crawl. However, it doesn't actually forbid access: search engine spiders like Googlebot will obey those rules, but less scrupulous crawlers like spambots may not bother. If you want extra protection, you can add .htaccess rules that block those user agents at the web server level.
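As a rough sketch of the sort of .htaccess rules meant here, using Apache's mod_rewrite (the bot names below are placeholders, not real crawler user agents):

# Deny requests whose User-Agent matches any of the listed patterns
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BadBot|SpamHarvester) [NC]
RewriteRule .* - [F,L]

The [F] flag returns a 403 Forbidden response and [L] stops further rule processing for that request.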
Thank you. My main concern is simply that I have duplicate pages: work-in-progress versions that I leave on the server but don't link from anything. I don't want them to be found, and I don't want the search engines discounting the real pages because they've also found these unfinished, unlinked duplicates.
You can simply use a meta tag to tell robots not to index a page's contents or follow its links.
Add the following meta tag in the <head> section of your page.
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This won't stop malware robots or email address harvesters, though.
All the best.