Insane Visions - Bot Detection with PHP Tutorial :: Premium PHP Scripts - AdaptCMS, AdaptBB, OneCMS
Bot Detection with PHP at Jun 07, 08 - 8:46 pm

Views: 16,337 Type: PHP Experience Level: Medium
In the ongoing work on my PHP CMS, AdaptCMS, one thing I have worked on
in basically every version is stats. Every time someone loads a page,
info is stored (date, IP, browser, OS, username, referrer, etc.). With
each version I either find a new way to make the stats more accessible
or a new bit of important info to record. Recently I noticed some
websites I was visiting were displaying the bots currently at the site
("Googlebot", "MSN Bot", etc.), and I realized this was something new
and important to keep track of.
Bots, or crawlers, are basically programs that search engines send out
to crawl the internet. This is how your pages get onto search engines
(well, a major way). Bot Detection isn't super vital, but if your CMS
or website already has practically everything else, then you need Bot
Detection.
The Function
<?php
// Substrings that identify known bots/crawlers in the user agent string.
$bot_list = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi", "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory", "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot", "crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp", "msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz", "Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot", "Mediapartners-Google", "Sogou web spider", "WebAlta Crawler");

// Returns the name of the matching bot, or false if the visitor is not a known bot.
// Note: ereg() was removed in PHP 7.0, so this runs only on older PHP versions.
function detect_bot() {
    global $bot_list;
    $thebot = false;
    foreach($bot_list as $bot) {
        if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
            $thebot = $bot;
        }
    }
    return $thebot;
}
?>
Really, Bot Detection isn't as difficult as you might guess. The SERVER variable $_SERVER['HTTP_USER_AGENT'] holds a lot of useful info on the visitor, such as the browser they are using and, if the visitor is a bot/crawler, the name of the bot.
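To make that concrete, here is a small sketch of what a crawler's user agent string can look like, and why the bot name has to be searched for inside the string rather than compared against it. The two sample strings and the helper name ua_mentions() are made up for illustration and are not part of AdaptCMS; on a live page you would read $_SERVER['HTTP_USER_AGENT'] instead.

```php
<?php
// Hypothetical helper for illustration only: returns true when $needle
// appears anywhere in the user agent string, ignoring case.
function ua_mentions($user_agent, $needle) {
    return stripos($user_agent, $needle) !== false;
}

// Sample user agent strings (illustrative values, not live request data).
$sample  = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
$browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0";

// An exact comparison fails because the bot name is only part of the string.
var_dump($sample == "Googlebot");             // bool(false)

// A substring search finds the bot name.
var_dump(ua_mentions($sample, "Googlebot"));  // bool(true)

// An ordinary browser does not mention it.
var_dump(ua_mentions($browser, "Googlebot")); // bool(false)
```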
First let me say that this is a portion of the function that will be
in AdaptCMS and if you want to put the bot list inside the function,
feel free. Secondly, let me explain what's going on in the function.
Breaking Down the Code
foreach($bot_list as $bot) {
I don't think I need to explain a basic array or function declaration,
so let's begin with the foreach and ereg. The foreach simply walks
through the $bot_list array (which contains a list of bots/crawlers)
one entry at a time, and on each pass it assigns the current entry to
$bot, which is the name you will find in the next bit of code.
if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
The ereg PHP function searches a string for a match to a regular
expression. In this case we want to find out whether the SERVER variable
$_SERVER['HTTP_USER_AGENT'] contains any mention of a bot. The foreach
goes through the $bot_list array one entry at a time, and ereg checks
whether the current bot name appears in the user agent. If it does, we
set the variable $thebot to the name of the bot, which we return at the
end. Here is an example of how to use it on your website:
<?php
include("bot_detection.php");
// detect_bot() gives back the bot's name when the visitor matches a known bot.
$bot = detect_bot();
if ($bot) {
    echo "Hey, you're a bot! What's up ".$bot."?";
}
?>
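One caveat for newer PHP versions: ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so a function built on it will not run there. Below is a minimal sketch of the same idea using stripos(), which does a case-insensitive substring search. The name detect_bot_modern(), its parameters, and the shortened bot list are assumptions made for this example, not the AdaptCMS code. It also returns as soon as it finds a match instead of walking the whole list.

```php
<?php
// Sketch only: a stripos()-based variant of the bot check.
// The function name, parameters, and shortened list are illustrative.
function detect_bot_modern($user_agent, array $bot_list) {
    foreach ($bot_list as $bot) {
        // stripos() returns the match position (possibly 0) or false,
        // so compare strictly against false.
        if (stripos($user_agent, $bot) !== false) {
            return $bot; // stop at the first match
        }
    }
    return false; // no known bot name found
}

$bots = array("Googlebot", "msnbot", "Slurp", "Baiduspider");

// The header may be missing, so fall back to an empty string.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$bot = detect_bot_modern($ua, $bots);
if ($bot !== false) {
    echo "Hey, you're a bot! What's up " . htmlspecialchars($bot) . "?";
}
```

Passing the user agent and the list in as arguments, rather than reading a global, keeps the function easy to try out with sample strings.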
Conclusion
Most of the tutorials I've written so far have been on broad subjects,
so it's nice to talk about something out of the ordinary, and especially
something that should be standard in CMSs. You can use this little
function on your website or CMS to keep track of the bots/crawlers
visiting it. Happy Coding!
| Guest, Jun 11, 08 - 5:48 pm | | Thanks for the code sample and list of bots. I may use it in one of my applications.
Anyway, there's a slight code optimization:
rather than:
if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
$thebot = $bot;
}
...
if ($bot) {
return $thebot;
}
Just do:
if(ereg($bot, $_SERVER['HTTP_USER_AGENT'])) {
return $bot;
}
This way it will stop looping as soon as it determines that the client is a bot, rather than looping through the whole list regardless. |
| admin, Jun 11, 08 - 7:22 pm | | Ah, thanks for that! I wasn't thinking straight, I'll edit that in. |
| Guest, Jul 24, 08 - 3:54 am | | Is it really necessary to use the ereg() function? Wouldn't it be enough to compare both values like that:
if($bot == $_SERVER['HTTP_USER_AGENT'])
In my opinion there is also a shorter solution that should do the same and is only one line of code:
return in_array($_SERVER['HTTP_USER_AGENT'], $botlist); |
| admin, Jul 24, 08 - 2:49 pm | | Yeah as I read your comment I did see that using ereg() was kind of pointless.
Interesting about in_array(), I think that would be a better replacement.
Thanks! |
| Guest, Aug 13, 08 - 6:48 am | | I don't think a simple in_array() will do, since it compares string to string. $_SERVER['HTTP_USER_AGENT'] does NOT return the exact string "Googlebot"; it returns something like "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", which will not match "Googlebot". Therefore an ereg function is definitely mandatory in order to return correct results! |
| Guest, Sep 18, 08 - 3:53 am | | IMO stristr is better
if( stristr($_SERVER['HTTP_USER_AGENT'],$bot) ) { return true; } return false; |
| Guest, Nov 07, 08 - 2:07 pm | | Be extremely careful if you use this to grant search engines access to restricted parts of your website. Lots of sites allow bots to crawl members-only areas of their sites using similar code. It's really easy to change the user agent in a web browser, so you should never use it as any type of authentication. |
| Guest, Dec 30, 08 - 11:00 pm | | Error in foreach loop. The underscore is missing in the variable name. Other than that, great function. It works well! You did all of the busy work of compiling the array of bot names which is a HUGE help!
foreach($botlist as $bot) {
should be
foreach($bot_list as $bot) {
|