Insane Visions
Bot Detection with PHP at Jun 07, 08 - 7:46 pm

Views: 9,770
Type: PHP
Experience Level: Medium

In the ongoing work on my PHP CMS, AdaptCMS, stats are one thing I have worked on in basically every version. Every time someone loads a page, info is stored (date, IP, browser, OS, username, referrer, etc.). With each version I either find a new way to make the stats more accessible or add a new bit of important info. Recently I noticed that some websites I was visiting displayed the bots currently on the site ("Googlebot", "MSN Bot", etc.), and that gave me something new and important to keep track of.

Bots, or crawlers, are basically programs that search engines send crawling around the internet to read pages. This is how your pages get onto search engines (well, a major way). Bot detection isn't super vital, but if your CMS or website already has practically everything else, then bot detection is worth adding.

The Function

<?php
// Names (or name fragments) that known bots/crawlers put in their User-Agent header.
$bot_list = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi",
"looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory",
"Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot",
"crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp",
"msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz",
"Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot",
"Mediapartners-Google", "Sogou web spider", "WebAlta Crawler");

// Returns the name of the matching bot, or nothing (null) if no bot matched.
function detect_bot() {
    global $bot_list;

    $thebot = null;

    foreach ($bot_list as $bot) {
        // Note: ereg() was deprecated in PHP 5.3 and removed in PHP 7.
        if (ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
            $thebot = $bot;
        }
    }

    if ($thebot) {
        return $thebot;
    }
}
?>

Really, bot detection isn't as difficult as you might guess. The variable $_SERVER['HTTP_USER_AGENT'] holds the User-Agent header sent by the visitor, which has a lot of cool info in it, such as the browser they are using and, if it's a bot/crawler, the name of the bot.

First, let me say that this is a portion of the function that will be in AdaptCMS, and if you want to put the bot list inside the function, feel free. Second, let me explain what's going on in the function.
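To make that concrete, here is a minimal sketch of what the header can look like. The Googlebot string is the one quoted by a reader in the comments on this post; the browser string and the helper name agent_mentions() are only illustrative assumptions. strpos() is used here purely to show that the bot name appears as a substring of a longer value.

```php
<?php
// Sample User-Agent values. The first is the Googlebot string quoted in the
// comments on this post; the second is a made-up desktop browser string.
$samples = array(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
);

// Hypothetical helper: true when $needle occurs anywhere in $agent.
function agent_mentions($agent, $needle) {
    // strpos() returns 0 for a match at the very start, so compare against false.
    return strpos($agent, $needle) !== false;
}

$results = array();
foreach ($samples as $agent) {
    $results[] = agent_mentions($agent, "Googlebot");
}
// $results is now array(true, false)
```

The header is a longer string that merely contains the bot's name, which is why the detection has to search inside it rather than compare it to the name directly.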

Breaking Down the Code

foreach($bot_list as $bot) {

I don't think I need to explain a basic array or function declaration, so let's begin with the foreach and ereg. The foreach simply takes the $bot_list array (which contains a list of bot/crawler names) and goes through its entries one by one, assigning each entry in turn to $bot, which you will find used in the next bit of code.

if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {

The ereg PHP function checks whether a string matches a regular expression; given a plain word as the pattern, it is effectively a case-sensitive substring search. In this case we want to find out whether $_SERVER['HTTP_USER_AGENT'] mentions any bot. The foreach goes through $bot_list one entry at a time, and ereg checks each entry against the user agent. If there is a match, we set $thebot to the name of the bot, which we return at the end. Note that ereg was deprecated in PHP 5.3 and removed in PHP 7. Here is an example of how to use it on your website:

<?php
include("bot_detection.php");

// detect_bot() returns the bot's name, or nothing if no bot matched.
$bot = detect_bot();
if ($bot) {
    echo "Hey, you're a bot! What's up " . $bot . "?";
}
?>
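Because ereg() was removed in PHP 7, the function as written in this tutorial will not run on a current PHP install. A minimal sketch of the same idea for newer versions follows. It swaps in stripos() for a case-insensitive substring check and returns on the first match, as readers suggest in the comments. The name find_bot_name(), the shortened list, and passing the user agent in as a parameter are assumptions made for this sketch, not part of AdaptCMS.

```php
<?php
// Shortened list for illustration; the full $bot_list from the tutorial works the same way.
$known_bots = array("Googlebot", "msnbot", "Slurp", "Baiduspider");

// Hypothetical replacement for detect_bot(): returns the matching bot name,
// or null when no entry in $bots occurs in $agent.
function find_bot_name($agent, array $bots) {
    foreach ($bots as $bot) {
        // stripos() is a case-insensitive substring search; it returns false on no match.
        if (stripos($agent, $bot) !== false) {
            return $bot; // stop at the first match
        }
    }
    return null;
}

// The header may be absent (for example on the command line), so default to "".
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : "";
$bot = find_bot_name($agent, $known_bots);
if ($bot !== null) {
    echo "Hey, you're a bot! What's up " . $bot . "?";
}
```

Taking the user agent as a parameter instead of reading $_SERVER inside the function makes the check easy to try against sample strings.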

Conclusion

Most of the tutorials I've written so far have been on broad subjects, so it's nice to talk about something out of the ordinary, and especially something that should be standard in CMSes. You can use this little function on your website or CMS to keep track of the bots/crawlers visiting it. Happy coding!
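As a rough sketch of the tracking idea, the snippet below tallies bot visits from a list of logged user agent strings. The log entries, the function name count_bot_hits(), and the in-memory array standing in for a stats table are all assumptions for illustration; how AdaptCMS actually stores its stats is not shown in this tutorial.

```php
<?php
// Hypothetical helper: counts how many logged user agents mention each bot.
function count_bot_hits(array $agents, array $bots) {
    $hits = array();
    foreach ($agents as $agent) {
        foreach ($bots as $bot) {
            if (stripos($agent, $bot) !== false) {
                // Count the first matching bot only, then move on to the next log entry.
                $hits[$bot] = isset($hits[$bot]) ? $hits[$bot] + 1 : 1;
                break;
            }
        }
    }
    return $hits;
}

// Illustrative log entries standing in for rows from a stats table.
$logged_agents = array(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot/1.1 (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0) ExampleBrowser/1.0",
);

$hits = count_bot_hits($logged_agents, array("Googlebot", "msnbot"));
// $hits is array("Googlebot" => 2, "msnbot" => 1)
```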


Guest, Jun 11, 08 - 4:48 pm
Thanks for the code sample and list of bots. I may use it in one of my applications.



Anyway, there's a slight code optimization. Rather than:

if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
    $thebot = $bot;
}
...
if ($bot) {
    return $thebot;
}

Just do:

if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
    return $bot;
}

This way it will stop looping as soon as it determines that the client is a bot, rather than looping through the whole list regardless.


admin, Jun 11, 08 - 6:22 pm
Ah, thanks for that! I wasn't thinking straight, I'll edit that in.


pligg.com, Jun 12, 08 - 10:11 am

Bot Detection with PHP - Trackback

"In this tutorial I will be showing step-by-step how to detect bots/crawlers with PHP. Bots or Crawlers, are basically search engines crawling around the internet. This is how you get your pages on search engines (well, a major way). Bot Detection isn't something super vital, but if your CMS or website already has practically everything, then you need Bot Detection."



Guest, Jul 24, 08 - 2:54 am
Is it really necessary to use the ereg() function? Wouldn't it be enough to compare both values like that:

if($bot == $_SERVER['HTTP_USER_AGENT'])



In my opinion there is also a shorter solution that should do the same and is only one line of code:

return in_array($_SERVER['HTTP_USER_AGENT'], $botlist);


admin, Jul 24, 08 - 1:49 pm
Yeah, as I read your comment I did see that using ereg() was kind of pointless. Interesting about in_array(), I think that would be a better replacement. Thanks!


Guest, Aug 13, 08 - 5:48 am
I don't think a simple in_array() will do, since it compares string to string. $_SERVER['HTTP_USER_AGENT'] does NOT return the exact string "Googlebot"; it returns something like "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", which will not match "Googlebot". Therefore an ereg function is definitely mandatory in order to return correct results!


Guest, Sep 18, 08 - 2:53 am
IMO stristr is better

if( stristr($_SERVER['HTTP_USER_AGENT'],$bot) ) {
return true;
}
return false;


Guest, Nov 07, 08 - 1:07 pm
Be extremely careful if you use this to grant search engines access to restricted parts of your website. Lots of sites allow bots to crawl members-only areas of their sites using similar code. It's really easy to change the user-agent in a web browser, so you should never use it as any type of authentication.








