Blocking back link website crawlers / Methods
Post #1   09-24-2017, 06:48 AM
PeterShene
HD Newbie
Join Date: Aug 2017
Posts: 30

Hi guys, I would like to thank you for your massive discussion about Cloudflare and other CDNs. However, I have another issue that I would quite desperately like some advice on. My bandwidth is getting sapped by spiders, mostly backlink profilers I presume, as I recently entered the top three for a few of my queries. This is new ground for me.

I have done a lot of research on it and seen people discuss the pros and cons of actually trying to implement something like this.

I know you can't stop them all, but I would at least rather spend 30 minutes throwing a few rocks in the road than let this guy have a clean run at what took me months to accomplish.

So I've done a few things. As a first step I used the robots.txt file to disallow the bots; this did not work.

I then opened the .htaccess file on the Apache server in question and tried to block the bots with rewrite conditions on the User-Agent string, redirecting them.

Obviously they don't identify themselves correctly, because they still get through.

I tried IP ranges, but I don't want to do this because I don't know what else I'm blocking out.

I have heard there is a PHP alternative for this but found nothing online.
One of my competitors has managed to do this, though: I cannot access any of his backlinks. I tried, just to see if he had managed to block them, and he had.

Am I missing something here? Is there another way?
__________________
Web design east london
 
 


Post #2   12-11-2017, 01:49 PM
Evolution Host
HD Newbie
Join Date: Dec 2017
Posts: 12

Hi Peter,

If you could post some examples of the bots' HTTP requests from your Apache access logs, this would allow us to see the problem in more detail.

If you could also let us know what sort of environment the Apache server is hosted in (shared hosting, VPS, dedicated server, etc.) and what level of access you have to the machine (SSH, remote desktop), that would also help us to suggest possible solutions.
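
For anyone gathering those log excerpts, here is a minimal Python sketch of how an access log could be summarised per User-Agent first. It is not from the thread: it assumes the log uses Apache's "combined" format, and the function name `bandwidth_by_agent`, the sample lines, and the bot name `ExampleLinkBot` are all made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bandwidth_by_agent(lines):
    """Count requests and response bytes per User-Agent string."""
    requests, traffic = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the assumed format; skip rather than guess
        agent = match.group("agent")
        size = match.group("size")
        requests[agent] += 1
        traffic[agent] += 0 if size == "-" else int(size)  # "-" means no body was sent
    return requests, traffic


# Made-up sample lines (documentation IP ranges, hypothetical bot name).
SAMPLE = [
    '203.0.113.7 - - [24/Sep/2017:06:48:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "ExampleLinkBot/1.0"',
    '203.0.113.7 - - [24/Sep/2017:06:48:01 +0000] "GET /about HTTP/1.1" 200 2048 "-" "ExampleLinkBot/1.0"',
    '198.51.100.9 - - [24/Sep/2017:06:49:10 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

requests, traffic = bandwidth_by_agent(SAMPLE)
for agent, total in traffic.most_common():
    print(f"{agent}: {requests[agent]} requests, {total} bytes")
```

On a real server the lines would come from the access log file itself, whose location depends on the hosting setup. The regular expression does not handle escaped quotes inside a User-Agent string, so treat the totals as a rough guide.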
__________________
Evolution Host - Hosting for KVM VPS, IRCds, mIRC Bots and Game Servers.
Premium hosting at affordable prices.
 
 
 


Post #3   12-14-2017, 04:34 AM
easyhostmedia
HD Wizard
Join Date: Mar 2011
Location: Northumberland, UK
Posts: 5,276

This Stack Overflow question gives a good way to block these bots:

https://stackoverflow.com/questions/...-with-htaccess
__________________
Terry Robertson - CEO The Easyhost Media Group
Niceday Hosting - Affordable Hosting
PowerSSL - - We Secure your World
The Scamlist Forum - Fighting against scammers
 
 
 


Post #4   12-17-2017, 01:54 AM
24x7server
HD Master
Join Date: Sep 2014
Location: India
Posts: 343

You can keep well-behaved bots out simply by configuring the robots.txt file.

You can also stop spam bots using the .htaccess file; check this link.
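
As a rough illustration of the robots.txt side, the Python sketch below uses the standard library's `urllib.robotparser` to show what a crawler that honours robots.txt would conclude from a given file. The robots.txt content, the bot name `ExampleLinkBot`, and the helper `allowed_by_robots` are assumptions made up for the example, not something from the linked article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleLinkBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed_by_robots(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is asked to stay away; other clients are not restricted.
print(allowed_by_robots("ExampleLinkBot/1.0", "https://example.com/page.html"))
print(allowed_by_robots("Mozilla/5.0", "https://example.com/page.html"))
```

This check only happens inside crawlers that choose to run it. Nothing on the server enforces the file, which is why the .htaccess route exists as a second line of defence.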
__________________
www.24x7servermanagement.com
Server Management, Server Security, Server Monitoring.
Network Monitoring Team !! Skype: techs24x7
 
 


Post #5   01-02-2018, 10:02 AM
PeterShene
HD Newbie
Join Date: Aug 2017
Posts: 30

Ahh, I found a massive loophole: robots.txt only requests that a bot not access a certain page. Most good bots follow the rules set in robots.txt.

What is written in robots.txt is not mandatory for a bot, however, so most bad bots will ignore the request. This is where you catch them with PHP, using something called a honeypot.

This is what all those WordPress plugins are based on: as soon as a bot requests something or acts in a way that breaks the rules of your website, it is identified as a bad bot, its data is recorded, and it is denied any further requests.
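
The honeypot idea described here can be sketched in a few lines. The post talks about PHP; this is a Python sketch of the same logic, not the code of any particular plugin. The trap path `/bot-trap/`, the class name `HoneypotFilter`, and the in-memory dictionary are all assumptions for illustration. The trap path would be listed under `Disallow` in robots.txt, so only clients that ignore the file ever request it.

```python
TRAP_PATH = "/bot-trap/"  # hypothetical URL; list it under Disallow in robots.txt


class HoneypotFilter:
    """Flag clients that request a path robots.txt told them to avoid."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.flagged = {}  # client IP -> User-Agent recorded when the trap was hit

    def status_for(self, ip, path, user_agent=""):
        """Return the HTTP status code this request should get."""
        if ip in self.flagged:
            return 403  # already identified as a bad bot
        if path.startswith(self.trap_path):
            self.flagged[ip] = user_agent  # record who ignored robots.txt
            return 403
        return 200


honeypot = HoneypotFilter()
print(honeypot.status_for("203.0.113.7", "/index.html", "ExampleLinkBot/1.0"))    # 200
print(honeypot.status_for("203.0.113.7", "/bot-trap/page", "ExampleLinkBot/1.0"))  # 403
print(honeypot.status_for("203.0.113.7", "/index.html", "ExampleLinkBot/1.0"))    # 403
```

A real setup would need to persist the flagged list between requests and would probably expire old entries, since IP addresses get reassigned.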

24x7server, your link contains a MASSIVE list, thanks so much.

I do know quite a bit about this already but still learned a few things here just by skimming. I've bookmarked it; it seems like a very good article. Nice find, and thank you once again.

Last edited by PeterShene : 01-02-2018 at 10:05 AM.
 
 


Post #6   01-02-2018, 09:34 PM
24x7server
HD Master
Join Date: Sep 2014
Location: India
Posts: 343

No problems at all Peter. Happy to help and have a great year ahead.
 
 
 


Post #7   02-01-2018, 12:06 PM
webconfigure
HD Amateur
Join Date: Jan 2018
Location: India
Posts: 96

You can define disallow rules in the robots.txt file, which will help to stop the spam backlink bots. You can define the ruleset in your .htaccess file too.

robots.txt is widely used.
 
 
 