Hosting Discussion



PeterShene 09-24-2017 06:48 AM

Blocking back link website crawlers / Methods
 
Hi guys, I would like to thank you for your extensive discussion of Cloudflare and other CDNs. However, I have another issue I would like some advice on, and quite desperately. My bandwidth is getting sapped by spiders, mostly backlink profilers I presume, as I recently entered the top three for a few of my queries. This is new ground for me.

I have done a lot of research on it and seen people discuss the pros and cons of actually trying to implement something like this.

I know you can't stop them all, but I would rather spend 30 minutes throwing a few rocks in the road than let this guy have a clean run at what took me months to accomplish.

So I've done a few things. As a first step I used the robots.txt file to disallow bots; this did not work.
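For reference, a robots.txt aimed at backlink crawlers might look like the one embedded in the sketch below. The crawler names (AhrefsBot, SemrushBot) are assumed examples and are not taken from this thread's logs; Python's standard `urllib.robotparser` is used only to show how a rule-following crawler would read the file. A crawler that ignores robots.txt is not affected by any of this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the crawler names are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a rule-following crawler with this name may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("AhrefsBot", "SemrushBot", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/page.html"))
```

The check runs on the crawler's side, which is why the file alone cannot stop a bot that chooses not to perform it.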

I accessed the .htaccess file of the Apache server in question and tried to block the bots via rewrite conditions on the user-agent string, to try to redirect them.

Obviously they don't identify themselves correctly, because they still get through.

I tried IP ranges, but I don't want to do this because I don't know what else I'm blocking out.
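One way to reduce that uncertainty is to check, before blocking a range, how large it is and which addresses already seen in the logs fall inside it. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `addresses_inside` and the sample addresses (reserved documentation ranges) are assumptions for illustration.

```python
import ipaddress


def addresses_inside(cidr: str, candidates: list[str]) -> list[str]:
    """Return the candidate IPs that a block on `cidr` would also catch."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [ip for ip in candidates if ipaddress.ip_address(ip) in network]


if __name__ == "__main__":
    # Documentation-only addresses standing in for real log entries.
    seen_in_logs = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
    cidr = "192.0.2.0/25"
    # How many addresses the range covers in total.
    print(ipaddress.ip_network(cidr).num_addresses)
    # Which known visitors would be blocked along with the bot.
    print(addresses_inside(cidr, seen_in_logs))
```

This only tells you about visitors you have already seen; it cannot say who might arrive from the same range later.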

I have heard there is a PHP alternative for doing this but found nothing online.
One of my competitors, though, has managed to do this: I cannot access any of his backlinks. I tried, just to see if he managed to block them, and he did.

Am I missing something here? Is there another way?

Evolution Host 12-11-2017 01:49 PM

Hi Peter,

If you could post some examples of HTTP requests from the bots from your Apache access logs this would allow us to see the problem in more detail.

If you could also let us know what sort of environment the Apache server is hosted in (shared hosting, VPS, dedicated server etc) and what level of access you have to the machine (SSH, remote desktop) that would also help us to suggest possible solutions.
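To pull that kind of information out of the logs, something like the following sketch could help. It assumes the Apache "combined" log format (the server's actual LogFormat may differ), and the function name `bytes_by_agent` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" log format is assumed; adjust if LogFormat differs.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Sum response bytes per user-agent string."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        size = match.group("size")
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


if __name__ == "__main__":
    # Made-up sample lines for illustration only, not real traffic.
    sample = [
        '192.0.2.10 - - [24/Sep/2017:06:48:00 -0600] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
        '192.0.2.10 - - [24/Sep/2017:06:48:01 -0600] "GET /about HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
        '198.51.100.7 - - [24/Sep/2017:06:48:02 -0600] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
    ]
    for agent, total in bytes_by_agent(sample).most_common(10):
        print(total, agent)
```

On a real server the same function can be fed an open log file instead of the sample list; the user agents at the top of the output are the ones worth posting here.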

easyhostmedia 12-14-2017 04:34 AM

This gives a good way to block these bots with .htaccess:

https://stackoverflow.com/questions/...-with-htaccess

24x7server 12-17-2017 01:54 AM

You can block bots from accessing the site simply by configuring the robots.txt file.

However, you can also stop spam bots using the .htaccess file; check this link.

PeterShene 01-02-2018 10:02 AM

Ahh, I found a massive loophole: robots.txt only requests that a bot not access a certain page. Most good bots follow the rules set in robots.txt.

What is written in robots.txt, however, is not mandatory for a bot, so most bad bots will ignore the request. This is where you catch them with PHP, using something called a honeypot.

This is what all those WordPress plugins are based on: as soon as a bot requests a page or acts in a way that breaks the rules set out for your website, it is identified as a bad bot, its data is recorded, and any further request from it is denied.
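The honeypot idea can be sketched roughly as follows. This is written in Python rather than PHP to keep it self-contained, and it is not how any particular plugin works: the class name `HoneypotFilter`, the trap path `/trap/`, and keying clients by IP address are all assumptions made for illustration.

```python
class HoneypotFilter:
    """Flag clients that request a trap path disallowed in robots.txt."""

    def __init__(self, trap_path: str = "/trap/"):
        # Hypothetical URL that robots.txt forbids and no visible link uses,
        # so only clients ignoring robots.txt should ever request it.
        self.trap_path = trap_path
        self.flagged = {}  # client IP -> details recorded at the trap

    def robots_txt(self) -> str:
        """Rules a well-behaved bot would read and obey."""
        return f"User-agent: *\nDisallow: {self.trap_path}\n"

    def handle(self, client_ip: str, path: str, user_agent: str = "") -> int:
        """Return the HTTP status a request would get: 200 or 403."""
        if client_ip in self.flagged:
            return 403  # already caught earlier: deny everything
        if path.startswith(self.trap_path):
            # The client ignored robots.txt: record its data and deny it.
            self.flagged[client_ip] = {"path": path, "user_agent": user_agent}
            return 403
        return 200


if __name__ == "__main__":
    site = HoneypotFilter()
    print(site.handle("192.0.2.10", "/index.html"))
    print(site.handle("192.0.2.10", "/trap/hidden.html", "BadBot/1.0"))
    print(site.handle("192.0.2.10", "/index.html"))
    print(site.flagged)
```

Identifying a client by IP alone is a simplification; a bot that rotates addresses would need to be caught again each time.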

24x7server, your link contains a MASSIVE list, thanks so much :)

I do know quite a bit about this already but still learned a few things here just by skimming. I've bookmarked it; it seems like a very good article. Nice find, and thank you once again.

24x7server 01-02-2018 09:34 PM

No problems at all Peter. Happy to help and have a great year ahead. :)


All times are GMT -6.