Last active
August 23, 2024 12:39
Revisions
hsiboy revised this gist
Aug 31, 2017 . 1 changed file with 6 additions and 5 deletions.

@@ -1,8 +1,8 @@
# Bot-Buster™

Tracks nefarious activity on website, and manages accordingly.

## It's probably a bot.

If the requesting entity:

* declares its user-agent as being wget, curl, webcopier etc - it's probably a bot.
@@ -26,7 +26,7 @@ One more environment to consider: the corporate network.
likely to find many dozens or hundreds of users with the exact same OS, browser, plugins, fonts etc.
IP addresses are likely to be the same if the users are behind a corporate firewall.

## JavaScript Detection:
```
window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
@@ -37,7 +37,8 @@ window.webdriver //selenium
window.domAutomation (or window.domAutomationController) //chromium based automation driver
if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
```

## Create fingerprint, and store forever:
* https://github.com/samyk/evercookie
* https://github.com/Valve/fingerprintjs
@@ -52,7 +53,7 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
|8 |0000000000001000|Unlikely Human Traffic Source (AWS, Azure, etc)|
|16|0000000000010000|Known "Evasively Tricky" Source Country|

## future bitmap:

| X-Bot | X-BotBitMap | Threat |
|:-----:|:----------------:|:--------------------------------------------------------------:|
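The JavaScript checks listed in this revision could be gathered into a single function, roughly as sketched below. The function name `detectAutomation` and the hint labels are illustrative and not part of the gist. The extra `navigator.webdriver` check is an assumption added alongside the `window.webdriver` property the gist names.

```javascript
// Hypothetical helper: collects the automation hints from the gist into one array.
// `win` is any window-like object, so the function can also be exercised outside a browser.
function detectAutomation(win) {
  const hints = [];
  if (win._phantom || win.callPhantom) hints.push("phantomjs");
  if (win.__phantomas) hints.push("phantomas");
  // The gist lists window.webdriver; navigator.webdriver is checked too (assumption).
  if (win.webdriver || (win.navigator && win.navigator.webdriver)) hints.push("selenium");
  if (win.domAutomation || win.domAutomationController) hints.push("chromium-automation");
  // Zero outer dimensions are treated as a headless-browser hint, as in the gist.
  if (win.outerWidth === 0 && win.outerHeight === 0) hints.push("headless");
  return hints;
}
```

In a page this would be called as `detectAutomation(window)`; an empty array means none of the listed hints fired, which is not proof of a human visitor.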
hsiboy revised this gist
Dec 9, 2014 . 1 changed file with 6 additions and 1 deletion.

@@ -2,7 +2,9 @@
Tracks nefarious activity on website, and manages accordingly.

##It's probably a bot.

If the requesting entity:

* declares its user-agent as being wget, curl, webcopier etc - it's probably a bot.
* requests details -> details -> details -> details ad nauseum - it's probably a bot.
* requests the html, but not .css, .js or site furniture - it's probably a bot.
@@ -11,6 +13,9 @@ The requesting entity:
* no user-agent (or matching a pattern of known bad ones) - it's probably a bot.
* no cookie, and wont honor a set cookie - it's probably a bot.
* no referrer, ever - it's probably a bot.
* sessions with a lot of hits. it's probably a bot.
* requests with a missing referer. it's probably a bot.
* requests with a missing sessionID. it's probably a bot.

Probable bots will be presented with a captcha type page. Humans can confirm their cognisance, bots will be trapped.
hsiboy revised this gist
Dec 5, 2014 . 1 changed file with 16 additions and 6 deletions.

@@ -161,7 +161,7 @@ if ($ua~"MSIE") {
}

sub isBadUserAgent($ua) {

$BadUserAgents = [
    "8484 Boston Project",
@@ -188,6 +188,7 @@ if ($ua~"MSIE") {
    "GMI sentiment crawler/Nutch-1.0 (GMI sentiment crawler; http://GMI.googlepages.com ; MyEmail)",
    "Gecko/25",
    "GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)",
    "Google-HTTP-Java-Client/1.17.0-rc (gzip)",
    "Halebot (Mozilla/5.0 compatible; Halebot/2.1; http://www.tacitknowledge.com/halebot/)",
    "HttpProxy",
    "ISC Systems iRc",
@@ -202,8 +203,8 @@ if ($ua~"MSIE") {
    "LWP",
    "MJ12bot/v1.0.8",
    "MSIE",
    "Microsoft URL Control - 6.00.8862",
    "Microsoft URL",
    "Missigua",
    "Movable Type",
    "Mozilla/2",
@@ -299,9 +300,6 @@ if ($ua~"MSIE") {
    "Yahoo:LinkExpander:Slingstone",
    "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)",
    "Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; http://zscho.de/ )",
    "a href=",
    "adidxbot/2.0 (+http://search.msn.com/msnbot.htm)",
    "adwords",
@@ -317,6 +315,7 @@ if ($ua~"MSIE") {
    "hanzoweb",
    "larbin@unspecified",
    "libwww-perl",
    "libwww-perl/5.805",
    "msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)",
    "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)",
    "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
@@ -342,11 +341,22 @@ sub isUsefulUserAgent($ua) {
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "AdsBot-Google-Mobile (+http://www.google.com/mobile/adsbot.html) Mozilla (iPhone; U; CPU iPhone OS 3 0 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile Safari",
    "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Feedfetcher-Google;+(+http://www.google.com/feedfetcher.html;",
    "GoogleProducer;+(+http://goo.gl/7y4SX)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mobile for smartphones user-agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)",
]

foreach ($UserAgent in $UsefulUserAgents)
hsiboy revised this gist
Dec 5, 2014 . 1 changed file with 1 addition and 1 deletion.

@@ -165,7 +165,7 @@ if ($ua~"MSIE") {

$BadUserAgents = [
    "8484 Boston Project",
    "; Widows",
    "AddThis.com robot tech.support@clearspring.com",
    "BOT/0.1 (BOT for JCE)",
    "Bichoo Spider",
hsiboy revised this gist
Dec 5, 2014 . 1 changed file with 17 additions and 0 deletions.

@@ -337,6 +337,23 @@ if ($ua~"MSIE") {
    return 0;
}

sub isUsefulUserAgent($ua) {

$UsefulUserAgents = [
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "AdsBot-Google-Mobile (+http://www.google.com/mobile/adsbot.html) Mozilla (iPhone; U; CPU iPhone OS 3 0 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile Safari",
    "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mobile for smartphones user-agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
]

foreach ($UserAgent in $UsefulUserAgents)
    if (string($ua, $UserAgent))
        return 1;

    return 0;
}
```
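The pseudocode in this revision loops over an allow list and returns on the first match. A minimal runnable sketch of the same idea follows, assuming that `string($ua, $UserAgent)` means "the list entry occurs somewhere in the user-agent string". The two list entries are copied from the diff; the default-parameter design is an illustrative choice, not the gist's.

```javascript
// Assumption: `string($ua, $UserAgent)` in the pseudocode is a substring test.
const usefulUserAgents = [
  "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
  "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
];

// Returns true when the request's user agent contains any allow-listed entry.
function isUsefulUserAgent(ua, list = usefulUserAgents) {
  if (typeof ua !== "string" || ua.length === 0) return false;
  return list.some((entry) => ua.includes(entry));
}
```

A user-agent string is trivially spoofed, so a match here would normally be paired with the IP range checks that appear elsewhere in the gist.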
hsiboy revised this gist
Dec 5, 2014 . 1 changed file with 177 additions and 82 deletions.

@@ -99,7 +99,7 @@ See it in action
* “Proxy-Connection” does not exist and should never be seen in the wild
* Referrer, if it exists, it must not be blank, and it must contain the absolute URL.

```perl
#!/pseudo/code

$ua = $headers['User-Agent'];
@@ -161,87 +161,182 @@ if ($ua~"MSIE") {
}

sub isBadUserAgent($ua) {

$BadUserAgents = [
    "8484 Boston Project",
    ; Widows
    "AddThis.com robot tech.support@clearspring.com",
    "BOT/0.1 (BOT for JCE)",
    "Bichoo Spider",
    "BotBuster Bad Behavior Test",
    "COMODOspider/Nutch-1.0",
    "CherryPicker",
    "ClickTale bot",
    "ContextAd Bot 1.0",
    "DTS Agent",
    "Diamond",
    "Digger",
    "Domnutch-Bot/Nutch-1.0 (Domnutch; http://www.Nutch.de/) Nutch-1.0",
    "Email Extractor",
    "Email Siphon",
    "EmailCollector",
    "EmailSiphon",
    "Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)",
    "FreeNutch/Nutch-1.2 Nutch-1.2",
    "Fve Nutch Spider/Nutch-1.7",
    "GMI sentiment crawler/Nutch-1.0 (GMI sentiment crawler; http://GMI.googlepages.com ; MyEmail)",
    "Gecko/25",
    "GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)",
    "Halebot (Mozilla/5.0 compatible; Halebot/2.1; http://www.tacitknowledge.com/halebot/)",
    "HttpProxy",
    "ISC Systems iRc",
    "Indy Library",
    "Infoaxe./Nutch-0.9",
    "Infoaxe./Nutch-1.0",
    "Internet Explorer",
    "Jakarta Commons",
    "Java 1.",
    "Java/1.",
    "KSCrawler/Nutch-1.0 (http://www.kindsight.net/en/kscrawler; crawler@kindsight.net)",
    "LWP",
    "MJ12bot/v1.0.8",
    "MSIE",
    "Microsoft URL",
    "Microsoft URL Control - 6.00.8862",
    "Missigua",
    "Movable Type",
    "Mozilla/2",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)/Nutch-1.0",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://www.changedetection.com/bot.html )",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0 ; Claritybot)",
    "Mozilla/4.0(",
    "Mozilla/4.0+(compatible;+",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; ) Firefox/1.5.0.11; 360Spider",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 << seen from this ip 162.242.135.149",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36",
    "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
    "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); +http://www.exabot.com/go/robot)",
    "Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)",
    "Mozilla/5.0 (compatible; Ezooms/1.0; help@moz.com)",
    "Mozilla/5.0 (compatible; Genieo/1.0 http://www.genieo.com/webfilter.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://import.io)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)",
    "Mozilla/5.0 (compatible; LinkChecker/8.3; +http://wummel.github.com/linkchecker/)",
    "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)",
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.4; http://www.majestic12.co.uk/bot.php?+)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Selenium Bot)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Selenium Bot)",
    "Mozilla/5.0 (compatible; MojeekBot/0.6; http://www.mojeek.com/bot.html)",
    "Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html)",
    "Mozilla/5.0 (compatible; SemrushBot/0.97; +http://www.semrush.com/bot.html)",
    "Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)",
    "Mozilla/5.0 (compatible; URLAppendBot/1.0; +http://www.profound.net/urlappendbot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; YoudaoBot/1.0; http://www.youdao.com/help/webmaster/spider/; )",
    "Mozilla/5.0 (compatible; aiHitBot/2.8; +http://endb-consolidated.aihit.com/)",
    "Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)",
    "Mozilla/5.0 (compatible; linkCheck)",
    "Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/about/bots/)",
    "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)",
    "Mozilla/5.0 (compatible; special_archiver/3.1.1 +http://www.archive.org/details/archive.org_bot)",
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7/Nutch-1.0",
    "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)",
    "Mozilla/5.0+(compatible;+PiplBot;++http://www.pipl.com/bot/)",
    "Murzillo compatible",
    "NIS Nutch Spider/Nutch-1.7 Spider/Nutch-1.7",
    "Nutch Experimental Crawler/Nutch-1.4 Experimental",
    "Nutch12/Nutch-1.2 Nutch-1.2",
    "NutchCVS",
    "Nutscrape/",
    "OmniExplorer",
    "POE-Component-Client",
    "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)",
    "PussyCat",
    "PycURL",
    "QuerySeekerSpider ( http://queryseeker.com/bot.html )",
    "SMNutchSpider/Nutch-1.7",
    "SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; http://boston.lti.cs.cmu.edu/crawler/; mhoy@cs.cmu.edu) http://boston.lti.cs.cmu.edu/crawler/",
    "Shockwave Flash",
    "ShowyouBot (http://showyou.com/crawler)",
    "Slurp/Nutch-1.0-dev (Slurp Search Engineer; http://www.google.com/bot.html; nutch-agent@lucene.apache.org)",
    "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
    "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
    "Super Happy Fun",
    "Test crawler Nutch/Nutch-1.0-dev (Nutch Test Project; changkuk@cmu.edu) Nutch-1.0-dev",
    "TrackBack/",
    "Turing Machine",
    "Twitterbot/1.0",
    "User Agent:",
    "User-Agent: Some-Agent/1.0",
    "User-agent:",
    "WIRE/0.22 (Linux; x86_64; Bot,Robot,Spider,Crawler)",
    "WISEbot",
    "WISEnutbot",
    "WeSEE:Search/0.1 (Alpha, http://www.wesee.com/bot/)",
    "WeSEE:Search/0.1 (Alpha, http://www.wesee.com/en/support/bot/)",
    "WebSite-X Suite",
    "WebaltBot",
    "Windows NT 4.0;)",
    "Windows NT 5.0;)",
    "Windows NT 5.1;)",
    "Windows XP 5",
    "Winnie Poh",
    "WordPress/4.0.1;",
    "WordPress/4.01",
    "Wordpress",
    "Yahoo:LinkExpander:Slingstone",
    "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)",
    "Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; http://zscho.de/ )",
    "\"",
    "\\\\)",
    "\r",
    "a href=",
    "adidxbot/2.0 (+http://search.msn.com/msnbot.htm)",
    "adwords",
    "autoemailspider",
    "bitlybot",
    "blogsearchbot-martin",
    "compatible ; MSIE",
    "compatible-",
    "core-project/",
    "ecollector",
    "grub crawler",
    "grub-client",
    "hanzoweb",
    "larbin@unspecified",
    "libwww-perl",
    "msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)",
    "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)",
    "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
    "nutch-1.3/Nutch-1.3 Nutch-1.3",
    "nutch-1.4/Nutch-1.4 Nutch-1.4",
    "psbot-image (+http://www.picsearch.com/bot.html)",
    "psbot/0.1 (+http://www.picsearch.com/bot.html)",
    "psycheclone",
    "research-scan-bot/Nutch-1.0",
    "rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+shiny@moz.com)",
    "spider",
    "user",
    "www.integromedb.org/Crawler",
    ""
]

foreach ($UserAgent in $BadUserAgents)
    if (string($ua, $UserAgent))
        return 1;

    return 0;
}
```
hsiboy revised this gist
Dec 4, 2014 . 1 changed file with 26 additions and 17 deletions.

@@ -37,22 +37,31 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
* https://github.com/samyk/evercookie
* https://github.com/Valve/fingerprintjs

## Current bitmap

| X-Bot | X-BotBitMap | Threat |
|:-----:|:----------------:|:----------------------:|
|1 |0000000000000001|No Cookie|
|2 |0000000000000010|No Referer|
|4 |0000000000000100|Bad User Agent|
|8 |0000000000001000|Unlikely Human Traffic Source (AWS, Azure, etc)|
|16|0000000000010000|Known "Evasively Tricky" Source Country|

##future bitmap:

| X-Bot | X-BotBitMap | Threat |
|:-----:|:----------------:|:--------------------------------------------------------------:|
|1 | 0000000000000001 | No Cookie|
|2 | 0000000000000010 | No Referrer|
|4 | 0000000000000100 | User Agent Spoof (Headers dont match User-Agent String)|
|8 | 0000000000001000 | Unlikely Human Traffic Source (AWS, Azure, etc) |
|16 | 0000000000010000 | Known "Evasively Tricky" Source Country|
|32 | 0000000000100000 | Unlikely Human Behaviour|
|64 | 0000000001000000 | Browser Integrity (Not requesting furniture)|
|128| 0000000010000000 | Session Length Exceeded|
|256| 0000000100000000 | Pages Per Session Exceeded|
|512| 0000001000000000 | User Agent Spoof (Headers dont match User-Agent String)|
|1024| 0000010000000000 | Browser Integrity (Not requesting furniture)|
|2048 | 0000100000000000 | Generates lots of errors (404s)|
|4096 | 0001000000000000 | No JavaScript|
|8192 | 0010000000000000 | JavaScript validation Failed|
@@ -74,13 +83,13 @@ See it in action
  * SELECT
  * SLEEP
  * -- (that’s two dashes)
  * @@VERSION
  * VARCHAR
  * CHAR
  * EXEC
  * EXECUTE
  * DECLARE
  * CAST
* Range: field exists and begins with 0, real user-agents do not start ranges at 0
* Content-Range is a response header, not a request header
* Via pinappleproxy || Via PCNETSERVER || Via Invisiware
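The bitmap tables in this revision assign one bit per threat, so several findings can be OR-ed into a single number. The sketch below shows one way that could work, using the five values from the "Current bitmap" table. The constant names, the function names, and the idea of emitting both columns as response headers are assumptions for illustration only.

```javascript
// Flag values copied from the "Current bitmap" table; the constant names are illustrative.
const BotFlag = Object.freeze({
  NO_COOKIE: 1,
  NO_REFERER: 2,
  BAD_USER_AGENT: 4,
  UNLIKELY_HUMAN_SOURCE: 8,
  TRICKY_SOURCE_COUNTRY: 16,
});

// Combine individual flags into the X-Bot value and its 16-digit X-BotBitMap form.
function buildBotHeaders(flags) {
  const value = flags.reduce((acc, flag) => acc | flag, 0);
  return {
    "X-Bot": String(value),
    "X-BotBitMap": value.toString(2).padStart(16, "0"),
  };
}

// Test whether a combined value has a particular flag set.
function hasFlag(value, flag) {
  return (value & flag) !== 0;
}
```

For example, a request with no cookie and a bad user agent combines 1 and 4 into 5, which a downstream component can unpack again with `hasFlag`.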
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 1 addition and 0 deletions.

@@ -110,6 +110,7 @@ CheckIp($headers['ip'], array["207.46.0.0/16", "65.52.0.0/14", "207.68.128.0/18"
// Analyze user agents claiming to be google
if ($ua="Googlebot") || ($ua="Mediapartners-Google") || ($ua="Google Web Preview"){
    CheckIp($headers['ip'], array["66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19", "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17"])
    if ($headers['from']=="googlebot(at)googlebot.com" // google bot sends this
}
// Analyze user agents claiming to be Yahoo
if ($ua="Yahoo! Slurp") || ($ua="Yahoo! SearchMonkey") {
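The pseudocode calls `CheckIp` with a client address and a list of CIDR blocks but never defines it. One possible IPv4-only reading is sketched below. `checkIp` and `ipv4ToInt` are hypothetical names, and returning `false` for malformed input is an assumption rather than the gist's stated behaviour.

```javascript
// Convert dotted-quad IPv4 text to an unsigned 32-bit integer; returns null if malformed.
function ipv4ToInt(ip) {
  const parts = String(ip).split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Hypothetical stand-in for the pseudocode's CheckIp(): true if `ip` lies inside any CIDR block.
function checkIp(ip, cidrs) {
  const addr = ipv4ToInt(ip);
  if (addr === null) return false;
  return cidrs.some((cidr) => {
    const [base, bitsText] = cidr.split("/");
    const baseInt = ipv4ToInt(base);
    const bits = Number(bitsText);
    if (baseInt === null || !Number.isInteger(bits) || bits < 0 || bits > 32) return false;
    const size = 2 ** (32 - bits); // number of addresses in the block
    return Math.floor(addr / size) === Math.floor(baseInt / size);
  });
}
```

Division is used instead of bit shifts so that values above 2^31 do not wrap into negative 32-bit integers.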
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 1 addition and 1 deletion.

@@ -68,7 +68,7 @@ See it in action
* Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
  * no-cache
  * Cache-Control
* Enforce RFC 2965 sec 3.3.5 (Cookie2) and 9 (HISTORICAL)
* SQL injection
  * ;DECLARE%20@
  * SELECT
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 8 additions and 1 deletion.

@@ -73,7 +73,14 @@ See it in action
  * ;DECLARE%20@
  * SELECT
  * SLEEP
  * -- (that’s two dashes)
  * @@version
  * varchar
  * char
  * exec
  * execute
  * declare
  * cast
* Range: field exists and begins with 0, real user-agents do not start ranges at 0
* Content-Range is a response header, not a request header
* Via pinappleproxy || Via PCNETSERVER || Via Invisiware
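The SQL injection entries in this list read as tokens to look for in the request. A deliberately naive sketch of such a scan is shown below. The function name `findSqlTokens` is hypothetical, and case-insensitive substring matching is an assumption, since the gist does not say how the tokens are matched.

```javascript
// Tokens copied from the list in this revision, lower-cased for comparison.
const sqlTokens = [
  ";declare%20@", "select", "sleep", "--", "@@version",
  "varchar", "char", "exec", "execute", "declare", "cast",
];

// Returns the tokens found in a raw (still URL-encoded) query string.
// Assumption: matching is a case-insensitive substring test.
function findSqlTokens(rawQuery) {
  const haystack = String(rawQuery).toLowerCase();
  return sqlTokens.filter((token) => haystack.includes(token));
}
```

Plain substring matching flags innocent values such as `charset` or `broadcast`, so a result here is better treated as one signal among several than as grounds for blocking on its own.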
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 7 additions and 7 deletions.

@@ -64,16 +64,16 @@ See it in action

* Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
  * 100-continue
* Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
  * no-cache
  * Cache-Control
* Enforce RFC 2965 sec 3.3.5 and 9.1
* SQL injection
  * ;DECLARE%20@
  * SELECT
  * SLEEP
  * cheep
* Range: field exists and begins with 0, real user-agents do not start ranges at 0
* Content-Range is a response header, not a request header
* Via pinappleproxy || Via PCNETSERVER || Via Invisiware
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 7 additions and 7 deletions.

@@ -64,16 +64,16 @@ See it in action

* Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
--* 100-continue
* Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
--* no-cache
--* Cache-Control
* Enforce RFC 2965 sec 3.3.5 and 9.1
* SQL injection
--* ;DECLARE%20@
--* SELECT
--* SLEEP
--* cheep
* Range: field exists and begins with 0, real user-agents do not start ranges at 0
* Content-Range is a response header, not a request header
* Via pinappleproxy || Via PCNETSERVER || Via Invisiware
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 7 additions and 6 deletions.

@@ -64,15 +64,16 @@ See it in action

* Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
⋅⋅* 100-continue
* Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
⋅⋅* no-cache
⋅⋅* Cache-Control
* Enforce RFC 2965 sec 3.3.5 and 9.1
* SQL injection
⋅⋅* ;DECLARE%20@
⋅⋅* SELECT
⋅⋅* SLEEP
..* cheep
* Range: field exists and begins with 0, real user-agents do not start ranges at 0
* Content-Range is a response header, not a request header
* Via pinappleproxy || Via PCNETSERVER || Via Invisiware
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 163 additions and 0 deletions.

@@ -63,6 +63,169 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }

See it in action

* Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
** 100-continue
* Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
** no-cache
** Cache-Control
* Enforce RFC 2965 sec 3.3.5 and 9.1
* SQL injection
** ;DECLARE%20@
** SELECT
** SLEEP
* Range: field exists and begins with 0, real user-agents do not start ranges at 0
* Content-Range is a response header, not a request header
* Via pinappleproxy || Via PCNETSERVER || Via Invisiware
* keep-alive and close are mutually exclusive
* Close shouldn't appear twice
* Keey-Alive shouldn't appear twice either
* “Proxy-Connection” does not exist and should never be seen in the wild
* Referrer, if it exists, it must not be blank, and it must contain the absolute URL.

```php
#!/pseudo/code

$ua = $headers['User-Agent'];

//Referrer, if it exists, must contain a :
//While a relative URL is technically valid in Referrer, all known legit user-agents send an absolute URL
if (strpos($headers['Referer'], ":") === FALSE) {
    return 400, "An invalid request was received from your browser. This may be caused by a malfunctioning proxy server or browser privacy software.";
}

// Analyze user agents claiming to be msnbot
if ($ua="bingbot") || ($ua="msnbot") || ($ua="MS Search") {
    CheckIp($headers['ip'], array["207.46.0.0/16", "65.52.0.0/14", "207.68.128.0/18", "207.68.192.0/20", "64.4.0.0/18", "157.54.0.0/15", "157.60.0.0/16", "157.56.0.0/14"]);
}
// Analyze user agents claiming to be google
if ($ua="Googlebot") || ($ua="Mediapartners-Google") || ($ua="Google Web Preview"){
    CheckIp($headers['ip'], array["66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19", "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17"])
}
// Analyze user agents claiming to be Yahoo
if ($ua="Yahoo! Slurp") || ($ua="Yahoo! SearchMonkey") {
    CheckIp($headers['ip'], array["202.160.176.0/20", "67.195.0.0/16", "203.209.252.0/24", "72.30.0.0/16", "98.136.0.0/14", "74.6.0.0/16"])
}

if ($ua~"MSIE") {
    if ($ua~"Opera") {
        // test Opera sent a "Accept" header.
        if ($headers['Accept']) {
            // looks like opera
            return "human"
        }
    }
    else {
        // MSIE does NOT send "Windows ME" or "Windows XP" in the user agent
        if ($headers['User-Agent']="Windows ME") || ($headers['User-Agent']="Windows XP") || ($headers['User-Agent'] ="Windows 2000") || ($headers['User-Agent']="Win32") {
            //this MSIE is a bot
            return "bot"
        }
    }
elseif ($ua~"Konqueror") !== FALSE) {
    // CafeKelsa appears to be a dev project at Yahoo which indexes job listings for
    // Yahoo! HotJobs. It announces itself as Konqueror, so we skip these checks.
    if (($headers['User-Agent']~"YahooSeeker/CafeKelsa") === FALSE || CheckIp($headers['ip'], "209.73.160.0/19") === FALSE) {
        // if its a real browser it will send an Accept header
        if ($headers['Accept']) {
            return "human"
        }}
}
elseif ($ua~"Opera") !== FALSE) {
    // if its a real browser it will send an Accept header
    if ($headers['Accept']) {
        return "human"
    }
}
elseif ($ua~"Safari") !== FALSE) {
    // if its a real browser it will send an Accept header
    if ($headers['Accept']) {
        return "human"
    }
}
elseif ($ua~"Lynx") !== FALSE) {
    // if its a real browser it will send an Accept header
    if ($headers['Accept']) {
        return "human"
    }
}
elseif ($ua~"Mozilla") !== FALSE && (strpos($ua, "Mozilla") == 0) {
    if ($ua~"Google Desktop") === FALSE && ($ua~"PLAYSTATION 3") === FALSE) {
        // if its a real browser it will send an Accept header
        if ($headers['Accept']) {
            return "human"
        }
    }
}

// These user agent strings occur at the beginning of the line. ^
$bots = array(
    "<sc", // XSS exploit attempts
    "8484 Boston Project", // video poker/porn spam
    "adwords", // referrer spam
    "autoemailspider", // spam harvester
    "blogsearchbot-martin", // from honeypot
    "CherryPicker", // spam harvester
    "core-project/", // FrontPage extension exploits
    "Diamond", // delivers spyware/adware
    "Digger", // spam harvester
    "ecollector", // spam harvester
    "EmailCollector", // spam harvester
    "Email Siphon", // spam harvester
    "EmailSiphon", // spam harvester
    "grub crawler", // misc comment/email spam
    "HttpProxy", // misc comment/email spam
    "Internet Explorer", // XMLRPC exploits seen
    "ISC Systems iRc", // spam harvester
    "Jakarta Commons", // custommised bot
    "Java 1.", // custommised bot
    "Java/1.", // custommised bot
    "libwww-perl", // custommised bot
    "LWP", // custommised bot
    "Microsoft URL", // spam harvester
    "Missigua", // spam harvester
    "MJ12bot/v1.0.8", // malicious botnet
    "Movable Type", // customised spambots
    //"Mozilla ", // malicious software
    "Mozilla/2", // malicious software
    "Mozilla/4.0(", // from honeypot
    "Mozilla/4.0+(compatible;+", // suspicious harvester
    "MSIE", // malicious software
    "NutchCVS", // unidentified robots
    "Nutscrape/", // misc comment spam
    "OmniExplorer", // spam harvester
    "psycheclone", // spam harvester
    "PussyCat ", // misc comment spam
    "PycURL", // misc comment spam
    "Shockwave Flash", // spam harvester
    "Super Happy Fun ", // spam harvester
    "TrackBack/", // trackback spam
    "user", // suspicious harvester
    "User Agent: ", // spam harvester
    "User-Agent: ", // spam harvester
    "WebSite-X Suite", // misc comment spam
    "Winnie Poh", // Automated Coppermine hacks
    "Wordpress", // malicious software
    "\"", // malicious software
);

// These user agent strings occur anywhere within the line.
$bots = array(
    "\r", // A really dumb bot
    "; Widows ", // misc comment/email spam
    "a href=", // referrer spam
    "Bad Behavior Test", // Add this to your user-agent to test BB
    "compatible ; MSIE", // misc comment/email spam
    "compatible-", // misc comment/email spam
    "DTS Agent", // misc comment/email spam
    "Email Extractor", // spam harvester
    "Gecko/25", // revisit this in 500 years
    "grub-client", // search engine ignores robots.txt
    "hanzoweb", // very badly behaved crawler
    "Indy Library", // misc comment/email spam
    "larbin@unspecified", // stealth harvesters
    "Murzillo compatible", // comment spam bot
    ".NET CLR 1)", // free poker, etc.
    "POE-Component-Client", // free poker, etc.
    "Turing Machine", // www.anonymizer.com abuse
    "User-agent: ", // spam harvester/splogger
    "WebaltBot", // spam harvester
    "WISEbot", // spam harvester
    "WISEnutbot", // spam harvester
    "Windows NT 4.0;)", // wikispam bot
    "Windows NT 5.0;)", // wikispam bot
    "Windows NT 5.1;)", // wikispam bot
    "Windows XP 5", // spam harvester
    "WordPress/4.01", // pingback spam
    "\\\\)", // spam harvester
);
```
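Several of the header rules introduced in this revision (mutually exclusive `keep-alive` and `close`, no repeated tokens, no `Proxy-Connection`, no `Content-Range` on a request, an absolute `Referer`) can be checked mechanically. The sketch below is one possible reading. The function name `headerProblems`, the problem labels, and the assumption that `headers` is a plain object with lower-cased names are all illustrative.

```javascript
// `headers` is assumed to be a plain object keyed by lower-cased header names.
function headerProblems(headers) {
  const problems = [];
  const tokens = (headers["connection"] || "")
    .split(",")
    .map((t) => t.trim().toLowerCase())
    .filter(Boolean);
  const count = (name) => tokens.filter((t) => t === name).length;

  if (count("keep-alive") > 0 && count("close") > 0) problems.push("keep-alive and close together");
  if (count("close") > 1) problems.push("close repeated");
  if (count("keep-alive") > 1) problems.push("keep-alive repeated");
  if ("proxy-connection" in headers) problems.push("proxy-connection present");
  if ("content-range" in headers) problems.push("content-range in request");
  // Mirrors the pseudocode: a Referer, when sent, must contain ":" (an absolute URL).
  if ("referer" in headers && !String(headers["referer"]).includes(":")) {
    problems.push("referer not absolute");
  }
  return problems;
}
```

An empty result only means these particular rules passed; the gist combines them with user-agent and IP checks before deciding how to treat a request.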
hsiboy revised this gist
Dec 3, 2014 . 1 changed file with 5 additions and 0 deletions.

    @@ -16,6 +16,11 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    One more environment to consider: the corporate network.
    likely to find many dozens or hundreds of users with the exact same OS, browser, plugins, fonts etc.
    IP addresses are likely to be the same if the users are behind a corporate firewall.

    ##JavaScript Detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
hsiboy revised this gist
Dec 2, 2014 . 1 changed file with 3 additions and 1 deletion.

    @@ -16,7 +16,7 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    ##JavaScript Detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
    window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
    @@ -55,6 +55,8 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    |32768| 1000000000000000 | Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)|

    See it in action
hsiboy revised this gist
Dec 2, 2014 . 1 changed file with 1 addition and 1 deletion.

    @@ -37,7 +37,7 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }

    | X-Bot | X-BotBitMap | Threat |
    |:-----:|:----------------:|:--------------------------------------------------------------:|
    | 1 | 0000000000000001 | Unlikely Human Traffic Source (AWS, Azure, etc) |
    | 2 | 0000000000000010 | Known Evasively Tricky Source Country|
    |4 | 0000000000000100 | Browser Integrity (Not requesting furniture)|
    |8 | 0000000000001000 | User Agent Spoof (Headers dont match User-Agent String)|
hsiboy revised this gist
Dec 2, 2014 . 1 changed file with 20 additions and 19 deletions.

    @@ -34,25 +34,26 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }

    ##Set bitmap:

    | X-Bot | X-BotBitMap | Threat |
    |:-----:|:----------------:|:--------------------------------------------------------------:|
    | 1 | 0000000000000001 | Unlikely Human Traffic Source (AWS, Azure, Google Compute etc) |
    | 2 | 0000000000000010 | Known Evasively Tricky Source Country|
    |4 | 0000000000000100 | Browser Integrity (Not requesting furniture)|
    |8 | 0000000000001000 | User Agent Spoof (Headers dont match User-Agent String)|
    |16 | 0000000000010000 | Unlikely Human Behaviour|
    |32 | 0000000000100000 | Honeytrap Access|
    |64 | 0000000001000000 | No Referrer|
    |128| 0000000010000000 | Session Length Exceeded|
    |256| 0000000100000000 | Pages Per Session Exceeded|
    |512| 0000001000000000 | Bad User Agent|
    |1024| 0000010000000000 | No Cookie|
    |2048 | 0000100000000000 | Generates lots of errors (404s)|
    |4096 | 0001000000000000 | No JavaScript|
    |8192 | 0010000000000000 | JavaScript validation Failed|
    |16384| 0100000000000000 | Fingerprint Validation Error|
    |32768| 1000000000000000 | Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)|
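
The X-Bot values in this revision's table are powers of two, so several threats can be carried in a single number and rendered as the 16-character X-BotBitMap string. The following is only a minimal sketch of that idea: the flag values are copied from the table in this revision, while the constant names and the helper functions `combineFlags`, `toBitMap` and `decodeFlags` are hypothetical and do not appear in the gist.

```javascript
// Flag values copied from the X-Bot column of this revision's table.
// The key names are hypothetical labels chosen for this sketch.
const BOT_FLAGS = Object.freeze({
  UNLIKELY_HUMAN_SOURCE: 1,
  TRICKY_SOURCE_COUNTRY: 2,
  BROWSER_INTEGRITY: 4,
  USER_AGENT_SPOOF: 8,
  UNLIKELY_HUMAN_BEHAVIOUR: 16,
  HONEYTRAP_ACCESS: 32,
  NO_REFERRER: 64,
  SESSION_LENGTH_EXCEEDED: 128,
  PAGES_PER_SESSION_EXCEEDED: 256,
  BAD_USER_AGENT: 512,
  NO_COOKIE: 1024,
  MANY_ERRORS: 2048,
  NO_JAVASCRIPT: 4096,
  JS_VALIDATION_FAILED: 8192,
  FINGERPRINT_VALIDATION_ERROR: 16384,
  KNOWN_AUTOMATION: 32768,
});

// OR the named flags together into one integer (the X-Bot value).
function combineFlags(names) {
  return names.reduce((mask, name) => {
    if (!(name in BOT_FLAGS)) {
      throw new Error(`unknown flag: ${name}`);
    }
    return mask | BOT_FLAGS[name];
  }, 0);
}

// Render the integer as a 16-character binary string (the X-BotBitMap value).
function toBitMap(mask) {
  return mask.toString(2).padStart(16, "0");
}

// List the names of every flag set in the integer.
function decodeFlags(mask) {
  return Object.keys(BOT_FLAGS).filter((name) => (mask & BOT_FLAGS[name]) !== 0);
}
```

For example, `combineFlags(["NO_REFERRER", "NO_COOKIE"])` gives 1088, and `toBitMap(1088)` gives `0000010001000000`. Because each threat owns exactly one bit, a downstream component can test for a single threat with a bitwise AND without parsing anything.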
hsiboy revised this gist
Dec 2, 2014 . 1 changed file with 6 additions and 7 deletions.

    @@ -16,7 +16,7 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    ##JavaSctipt detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
    window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
    @@ -27,16 +27,15 @@ window.webdriver //selenium
    window.domAutomation (or window.domAutomationController) //chromium based automation driver
    if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    ```

    ##Create fingerprint, and store forever:

    * https://github.com/samyk/evercookie
    * https://github.com/Valve/fingerprintjs

    ##Set bitmap:
    ```
    X-Bot X-BotBitMap Threat
    1 0000000000000001 Unlikely Human Traffic Source (AWS, Azure, Google Compute etc)
    2 0000000000000010 Known Evasively Tricky Source Country
    4 0000000000000100 Browser Integrity (Not requesting furniture)
hsiboy revised this gist
Dec 2, 2014 . 1 changed file with 17 additions and 17 deletions.

    @@ -30,29 +30,29 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }

    create fingerprint, and store for ever:

    https://github.com/samyk/evercookie
    https://github.com/Valve/fingerprintjs

    Set bitmap:
    ```
    X-Bot X-BotBitMap Threat Type
    1 0000000000000001 Unlikely Human Traffic Source (AWS, Azure, Google Compute etc)
    2 0000000000000010 Known Evasively Tricky Source Country
    4 0000000000000100 Browser Integrity (Not requesting furniture)
    8 0000000000001000 User Agent Spoof (Headers dont match User-Agent String)
    16 0000000000010000 Unlikely Human Behaviour
    32 0000000000100000 Honeytrap Access
    64 0000000001000000 No Referrer
    128 0000000010000000 Session Length Exceeded
    256 0000000100000000 Pages Per Session Exceeded
    512 0000001000000000 Bad User Agent
    1024 0000010000000000 No Cookie
    2048 0000100000000000 Generates lots of errors (404s)
    4096 0001000000000000 No JavaScript
    8192 0010000000000000 JavaScript validation Failed
    16384 0100000000000000 Fingerprint Validation Error
    32768 1000000000000000 Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)
    ```
hsiboy revised this gist
Dec 2, 2014 . 1 changed file with 42 additions and 0 deletions.

    @@ -16,5 +16,47 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    javasctipt detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
    window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
    window.Buffer //nodejs
    window.emit //couchjs
    window.spawn //rhino
    window.webdriver //selenium
    window.domAutomation (or window.domAutomationController) //chromium based automation driver
    if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    ```

    create fingerprint, and store for ever:

    https://github.com/samyk/evercookie
    https://github.com/Valve/fingerprintjs

    Set bitmap:
    ```
    X-Bot X-BotBitMap Threat Type
    1 00000000000000000000000000000001 Unlikely Human Traffic Source
    2 00000000000000000000000000000010 Known Evasively Tricky Source Country
    4 00000000000000000000000000000100 Browser Integrity
    8 00000000000000000000000000001000 User Agnet Spoof
    16 00000000000000000000000000010000 Rate Limit
    32 00000000000000000000000000100000 Honeytrap Access
    64 00000000000000000000000001000000 No Referrer
    128 00000000000000000000000010000000 Session Length Exceeded
    256 00000000000000000000000100000000 Pages Per Session Exceeded
    512 00000000000000000000001000000000 Bad User Agent
    1024 00000000000000000000010000000000 No Cookie
    2048 00000000000000000000100000000000 Filtered IP
    4096 00000000000000000001000000000000 No JavaScript
    8192 00000000000000000010000000000000 JavaScript validation Failed
    16384 00000000000000000100000000000000 Fingerprint Validation Error
    32768 00000000000000001000000000000000 Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)
    65536 00000000000000010000000000000000 repeated Form Submission
    ```
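
The JavaScript detection notes in this revision are shorthand rather than runnable code. One way they might be turned into a single check is sketched below. The function name `detectAutomation` and the returned labels are hypothetical, and the `navigator.onLine=false && navigator.plugins=""` note is interpreted here as "offline with an empty plugin list", which is an assumption about the author's intent.

```javascript
// Sketch only: collects the automation hints listed in the gist.
// `win` is a window-like object; in a browser, pass `window`.
function detectAutomation(win) {
  const nav = win.navigator || {};
  const signals = [];

  // Assumed reading of the gist's onLine/plugins note: offline and no plugins.
  const offlineWithNoPlugins =
    nav.onLine === false && (!nav.plugins || nav.plugins.length === 0);

  if (win._phantom || win.callPhantom || offlineWithNoPlugins) {
    signals.push("phantomjs");
  }
  if (win.__phantomas) signals.push("phantomas");
  if (win.Buffer) signals.push("nodejs");
  if (win.emit) signals.push("couchjs");
  if (win.spawn) signals.push("rhino");
  if (win.webdriver) signals.push("selenium");
  if (win.domAutomation || win.domAutomationController) {
    signals.push("chromium-automation");
  }
  // A zero-sized outer window is treated as a headless browser hint.
  if (win.outerWidth === 0 && win.outerHeight === 0) {
    signals.push("headless");
  }
  return signals;
}
```

Taking the window as a parameter keeps the function testable outside a browser. An empty result means none of the listed hints were seen; it does not prove the client is human, since these properties are easy for an automation tool to hide.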
hsiboy created this gist
May 12, 2014 .

    @@ -0,0 +1,20 @@
    #Bot-Buster™

    Tracks nefarious activity on website, and manages accordingly.

    The requesting entity:

    * declares its user-agent as being wget, curl, webcopier etc - it's probably a bot.
    * requests details -> details -> details -> details ad nauseam - it's probably a bot.
    * requests the html, but not .css, .js or site furniture - it's probably a bot.
    * generates a large number of HTTP error codes > 400 (i.e. 401, 403, 404 & 500) - it's probably a bot.
    * originates from an unlikely human traffic source (i.e. Amazon AWS) - it's probably a bot.
    * no user-agent (or matching a pattern of known bad ones) - it's probably a bot.
    * no cookie, and won't honor a set cookie - it's probably a bot.
    * no referrer, ever - it's probably a bot.

    Probable bots will be presented with a captcha type page. Humans can confirm their cognisance, bots will be trapped.

    This will work at the top of the stack using the ZTM to "manage" the offender.
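
Several of the heuristics in this first revision could be evaluated from per-client request counters. The sketch below is one possible reading, not the gist's implementation: the `session` field names, both thresholds and the function name `botReasons` are assumptions made for illustration, and the "details -> details" and traffic-source rules are left out because they need data the gist does not describe.

```javascript
// Hypothetical thresholds; the gist gives no numbers.
const ERROR_THRESHOLD = 10;      // "large number" of HTTP errors
const MIN_PAGES_FOR_ABSENCE = 5; // pages seen before "never sent X" counts

// Sketch only. `session` is an assumed shape:
// { userAgent, pageRequests, assetRequests, errorResponses, sentCookie, sentReferrer }
function botReasons(session) {
  const reasons = [];
  const ua = (session.userAgent || "").toLowerCase();

  if (ua === "") {
    reasons.push("no user-agent");
  } else if (["wget", "curl", "webcopier"].some((tool) => ua.includes(tool))) {
    reasons.push("declared automation user-agent");
  }

  // HTML requested, but never any .css, .js or site furniture.
  if (session.pageRequests > 0 && session.assetRequests === 0) {
    reasons.push("no css/js/furniture requested");
  }

  if (session.errorResponses >= ERROR_THRESHOLD) {
    reasons.push("many HTTP errors");
  }

  // A first request legitimately lacks a cookie and referrer,
  // so only count their absence after several pages.
  if (session.pageRequests >= MIN_PAGES_FOR_ABSENCE) {
    if (!session.sentCookie) reasons.push("no cookie");
    if (!session.sentReferrer) reasons.push("no referrer");
  }

  return reasons;
}
```

Returning the list of reasons rather than a yes/no verdict leaves the decision (for instance, when to show the captcha page) to the caller, which fits the "probably a bot" wording of the list above.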