Bot Detection with PHP
Insane Visions, Jun 07, 08 - 7:46 pm

Views: 4,815
Type: PHP
Experience Level: Medium

In the ongoing work on my PHP CMS, AdaptCMS, one thing I have worked on in basically every version is stats. Every time someone loads a page, info is stored (date, IP, browser, OS, username, referral, etc.). With each version I either find a new way to make the stats more accessible or a new bit of important info to record. Recently I noticed that some websites I was visiting were displaying the bots currently on the site ("Googlebot", "MSN Bot", etc.), and that gave me something new and important to keep track of.

Bots, or crawlers, are basically programs that search engines send crawling around the internet to read pages. This is how you get your pages on search engines (well, a major way). Bot detection isn't something super vital, but if your CMS or website already has practically everything else, then bot detection is a good next thing to add.

The Function

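<?php
// Names that identify known bots/crawlers inside the User-Agent header.
$bot_list = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi",
"looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory",
"Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot",
"crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp",
"msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz",
"Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot",
"Mediapartners-Google", "Sogou web spider", "WebAlta Crawler");

// Returns the name of the matching bot, or false if no bot matches.
function detect_bot() {
    global $bot_list;

    // The header can be missing, so fall back to an empty string.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $thebot = false;

    foreach ($bot_list as $bot) {
        // Note: ereg() was deprecated in PHP 5.3 and removed in PHP 7.0.
        if (ereg($bot, $agent)) {
            $thebot = $bot;
        }
    }

    return $thebot;
}
?>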

Really, bot detection isn't as difficult as you might guess. The SERVER variable $_SERVER['HTTP_USER_AGENT'] holds the User-Agent string the visitor's client sent, which includes useful info such as the browser they are using and, if it's a bot/crawler, the name of the bot.
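To make that concrete, here is a small sketch of what a crawler's User-Agent string tends to look like, and why the bot name has to be searched for inside it rather than compared directly. The sample string follows the format Googlebot sends; $sample_agent and agent_mentions() are names made up for this illustration, and stripos() is used here only as a simple stand-in for the matching step.

```php
<?php
// Hypothetical sample: a User-Agent string in the format Googlebot sends.
$sample_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

// Illustrative helper (not part of AdaptCMS): true when $name appears
// anywhere in $agent, ignoring case.
function agent_mentions($agent, $name) {
    return stripos($agent, $name) !== false;
}

// An exact comparison fails because the header holds much more than the name.
$exact_match = ($sample_agent == "Googlebot");

// A substring search finds the name inside the longer string.
$substring_match = agent_mentions($sample_agent, "Googlebot");

var_dump($exact_match);     // bool(false)
var_dump($substring_match); // bool(true)
```

So the check always has to be "does the user agent contain this name", never "is the user agent equal to this name".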

First, let me say that this is a portion of the function that will be in AdaptCMS; if you want to put the bot list inside the function, feel free. Second, let me explain what's going on in the function.

Breaking Down the Code

foreach ($bot_list as $bot) {

I don't think I need to explain a basic array or function declaration, so let's begin with the foreach and ereg. The foreach simply takes the $bot_list array (which contains a list of bots/crawlers) and goes through its entries one by one, assigning each entry to $bot, which you will find in the next bit of code.

if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {

The ereg PHP function checks whether a string matches a regular expression pattern; with a plain name as the pattern, it simply checks whether that name appears anywhere in the string. In this case we want to find out whether $_SERVER['HTTP_USER_AGENT'] contains any mention of a bot. The foreach goes through $bot_list one by one, and on each pass ereg looks for that bot's name in the user agent. If there is a match, we set the variable $thebot to the name of the bot, which we return at the end. Here is an example of how to use it on your website:

<?php
include ("bot_detection.php");

// Call detect_bot() once and reuse the result.
$bot = detect_bot();

if ($bot) {
    echo "Hey, you're a bot! What's up " . $bot . "?";
}
?>
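On current PHP versions the function above needs a small change, because ereg() was removed in PHP 7.0. The following is a minimal sketch of one way to adapt it, not the version that ships in AdaptCMS: it swaps ereg() for stripos(), returns as soon as a bot matches, and takes the user agent as an argument so it can be tried without a real request. The name find_bot(), its parameters, and the shortened list are assumptions made for this example.

```php
<?php
// Shortened list for illustration; the full list from the tutorial works too.
$short_bot_list = array("Googlebot", "Slurp", "msnbot", "Baiduspider");

// Hypothetical variant of detect_bot(): returns the first bot name found in
// $agent, or false when nothing in $bots matches.
function find_bot($agent, array $bots) {
    foreach ($bots as $bot) {
        // stripos() is a plain, case-insensitive substring search.
        if (stripos($agent, $bot) !== false) {
            return $bot; // stop at the first match
        }
    }
    return false;
}

// In a real page the agent comes from the request; the header may be absent.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$found = find_bot($agent, $short_bot_list);

if ($found !== false) {
    echo "Hey, you're a bot! What's up " . $found . "?";
}
```

One difference worth knowing: ereg() treats each name as a regular expression, while stripos() treats it as literal text and ignores case, so an entry like "www.galaxy.com" only matches those exact characters.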

Conclusion

Most of the tutorials I've written so far have been on broad subjects, so it's nice to talk about something out of the ordinary, and especially something that should be standard in CMSs. You can use this little function on your website or CMS to keep track of the bots/crawlers visiting it. Happy coding!


Guest, Jun 11, 08 - 4:48 pm
Thanks for the code sample and list of bots. I may use it in one of my applications.

Anyway, there's a slight code optimization:

rather than:
if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
$thebot = $bot;
}
...
if ($bot) {
return $thebot;
}

Just do:
if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
return $bot;
}


This way it will stop looping as soon as it determines that the client is a bot, rather than looping through the whole list regardless.


admin, Jun 11, 08 - 6:22 pm
Ah, thanks for that! I wasn't thinking straight, I'll edit that in.


pligg.com, Jun 12, 08 - 10:11 am

Bot Detection with PHP - Trackback

"In this tutorial I will be showing step-by-step how to detect bots/crawlers with PHP. Bots or Crawlers, are basically search engines crawling around the internet. This is how you get your pages on search engines (well, a major way). Bot Detection isn't something super vital, but if your CMS or website already has practically everything, then you need Bot Detection."



Guest, Jul 24, 08 - 2:54 am
Is it really necessary to use the ereg() function? Wouldn't it be enough to compare both values like that:
if($bot == $_SERVER['HTTP_USER_AGENT'])

In my opinion there is also a shorter solution that should do the same and is only one line of code:
return in_array($_SERVER['HTTP_USER_AGENT'], $botlist);


admin, Jul 24, 08 - 1:49 pm
Yeah as I read your comment I did see that using ereg() was kind of pointless.

Interesting about in_array(), I think that would be a better replacement.

Thanks!


Guest, Aug 13, 08 - 5:48 am
I don't think a simple in_array() will do since it compares string to string. $_SERVER['HTTP_USER_AGENT'] does NOT return the exact string "Googlebot"; rather, it returns something like "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", which will not match "Googlebot". Therefore an ereg function is definitely mandatory in order to return correct results!

