@hsiboy
Last active August 23, 2024 12:39

Revisions

  1. hsiboy revised this gist Aug 31, 2017. 1 changed file with 6 additions and 5 deletions.
    11 changes: 6 additions & 5 deletions BotBuster.md
    Original file line number Diff line number Diff line change
    @@ -1,8 +1,8 @@
    #Bot-Buster™
    # Bot-Buster™

    Tracks nefarious activity on a website and manages it accordingly.

    ##It's probably a bot.
    ## It's probably a bot.

    If the requesting entity:
    * declares its user-agent as being wget, curl, webcopier etc - it's probably a bot.
    @@ -26,7 +26,7 @@ One more environment to consider: the corporate network.
    likely to find many dozens or hundreds of users with the exact same OS, browser, plugins, fonts etc.
    IP addresses are likely to be the same if the users are behind a corporate firewall.

    ##JavaScript Detection:
    ## JavaScript Detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
    window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
    @@ -37,7 +37,8 @@ window.webdriver //selenium
    window.domAutomation (or window.domAutomationController) //chromium based automation driver
    if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    ```
    ##Create fingerprint, and store forever:

    ## Create fingerprint, and store forever:

    * https://github.com/samyk/evercookie
    * https://github.com/Valve/fingerprintjs
    @@ -52,7 +53,7 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    |8 |0000000000001000|Unlikely Human Traffic Source (AWS, Azure, etc)|
    |16|0000000000010000|Known "Evasively Tricky" Source Country|

    ##future bitmap:
    ## future bitmap:

    | X-Bot | X-BotBitMap | Threat |
    |:-----:|:----------------:|:--------------------------------------------------------------:|
  2. hsiboy revised this gist Dec 9, 2014. 1 changed file with 6 additions and 1 deletion.
    7 changes: 6 additions & 1 deletion BotBuster.md
    @@ -2,7 +2,9 @@

    Tracks nefarious activity on a website and manages it accordingly.

    The requesting entity:
    ##It's probably a bot.

    If the requesting entity:
    * declares its user-agent as being wget, curl, webcopier etc - it's probably a bot.
    * requests details -> details -> details -> details ad nauseam - it's probably a bot.
    * requests the html, but not .css, .js or site furniture - it's probably a bot.
    @@ -11,6 +13,9 @@ The requesting entity:
    * no user-agent (or matching a pattern of known bad ones) - it's probably a bot.
    * no cookie, and won't honor a set cookie - it's probably a bot.
    * no referrer, ever - it's probably a bot.
    * sessions with a lot of hits - it's probably a bot.
    * requests with a missing referer - it's probably a bot.
    * requests with a missing sessionID - it's probably a bot.

    Probable bots will be presented with a captcha type page. Humans can confirm their cognisance, bots will be trapped.
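
The checklist above can be sketched as a function that collects the reasons a session looks automated. This is only an illustration: the `SessionSummary` fields, the `TOOL_MARKERS` tuple and the `max_hits` threshold of 200 are assumptions made for the example, not values taken from BotBuster.md.

```python
from dataclasses import dataclass

# Tool names mentioned in the checklist ("wget, curl, webcopier etc").
TOOL_MARKERS = ("wget", "curl", "webcopier")


@dataclass
class SessionSummary:
    """Hypothetical per-session summary; the field names are illustrative."""
    user_agent: str = ""
    has_cookie: bool = False
    has_referer: bool = False
    has_session_id: bool = False
    hits: int = 0
    requested_assets: bool = False  # fetched .css, .js or site furniture


def probable_bot_reasons(s: SessionSummary, max_hits: int = 200) -> list[str]:
    """Return every checklist item the session trips (empty list: looks human)."""
    reasons = []
    ua = s.user_agent.lower()
    if not ua:
        reasons.append("no user-agent")
    elif any(marker in ua for marker in TOOL_MARKERS):
        reasons.append("declares a tool user-agent")
    if not s.requested_assets:
        reasons.append("requests html but no assets")
    if not s.has_cookie:
        reasons.append("no cookie")
    if not s.has_referer:
        reasons.append("no referer")
    if not s.has_session_id:
        reasons.append("no session id")
    if s.hits > max_hits:  # max_hits is an assumed threshold
        reasons.append("session with a lot of hits")
    return reasons
```

A session with any reasons attached would then be a candidate for the captcha page described above.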

  3. hsiboy revised this gist Dec 5, 2014. 1 changed file with 16 additions and 6 deletions.
    22 changes: 16 additions & 6 deletions BotBuster.md
    @@ -161,7 +161,7 @@ if ($ua~"MSIE") {
    }


    sub isBadUserAgent($ua) {
    sub isBadUserAgent($ua) {
    $BadUserAgents = [

    "8484 Boston Project",
    @@ -188,6 +188,7 @@ if ($ua~"MSIE") {
    "GMI sentiment crawler/Nutch-1.0 (GMI sentiment crawler; http://GMI.googlepages.com ; MyEmail)",
    "Gecko/25",
    "GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)",
    "Google-HTTP-Java-Client/1.17.0-rc (gzip)",
    "Halebot (Mozilla/5.0 compatible; Halebot/2.1; http://www.tacitknowledge.com/halebot/)",
    "HttpProxy",
    "ISC Systems iRc",
    @@ -202,8 +203,8 @@ if ($ua~"MSIE") {
    "LWP",
    "MJ12bot/v1.0.8",
    "MSIE",
    "Microsoft URL",
    "Microsoft URL Control - 6.00.8862",
    "Microsoft URL",
    "Missigua",
    "Movable Type",
    "Mozilla/2",
    @@ -299,9 +300,6 @@ if ($ua~"MSIE") {
    "Yahoo:LinkExpander:Slingstone",
    "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)",
    "Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; http://zscho.de/ )",
    "\"",
    "\\\\)",
    "\r",
    "a href=",
    "adidxbot/2.0 (+http://search.msn.com/msnbot.htm)",
    "adwords",
    @@ -317,6 +315,7 @@ if ($ua~"MSIE") {
    "hanzoweb",
    "larbin@unspecified",
    "libwww-perl",
    "libwww-perl/5.805",
    "msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)",
    "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)",
    "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
    @@ -342,11 +341,22 @@ sub isUsefulUserAgent($ua) {
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "AdsBot-Google-Mobile (+http://www.google.com/mobile/adsbot.html) Mozilla (iPhone; U; CPU iPhone OS 3 0 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile Safari",
    "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Feedfetcher-Google;+(+http://www.google.com/feedfetcher.html;",
    "GoogleProducer;+(+http://goo.gl/7y4SX)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mobile for smartphones user-agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
    "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
    "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)",


    ]
    foreach ($UserAgent in $UsefulUserAgents)
  4. hsiboy revised this gist Dec 5, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion BotBuster.md
    @@ -165,7 +165,7 @@ if ($ua~"MSIE") {
    $BadUserAgents = [

    "8484 Boston Project",
    ; Widows
    "; Widows",
    "AddThis.com robot tech.support@clearspring.com",
    "BOT/0.1 (BOT for JCE)",
    "Bichoo Spider",
  5. hsiboy revised this gist Dec 5, 2014. 1 changed file with 17 additions and 0 deletions.
    17 changes: 17 additions & 0 deletions BotBuster.md
    @@ -337,6 +337,23 @@ if ($ua~"MSIE") {
    return 0;
    }

    sub isUsefulUserAgent($ua) {
    $UsefulUserAgents = [
    "AdsBot-Google (+http://www.google.com/adsbot.html)",
    "AdsBot-Google-Mobile (+http://www.google.com/mobile/adsbot.html) Mozilla (iPhone; U; CPU iPhone OS 3 0 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile Safari",
    "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mobile for smartphones user-agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
    "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

    ]
    foreach ($UserAgent in $UsefulUserAgents)
    if (string($ua, $UserAgent)) return 1;
    return 0;
    }

    ```
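
A user-agent string that matches the useful list is only a claim. The pseudo-code elsewhere in BotBuster.md passes claimed crawlers to `CheckIp` with a list of CIDR ranges; a minimal Python sketch of that idea follows, using the standard `ipaddress` module. The function names `check_ip` and `is_genuine_googlebot` are illustrative, and the ranges are the ones listed in the gist for Google in 2014, so they may well be out of date.

```python
import ipaddress

# Ranges copied from the gist's Googlebot check; treat them as historical.
GOOGLE_RANGES = [
    "66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19",
    "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17",
]


def check_ip(ip: str, cidrs: list[str]) -> bool:
    """Return True if the address falls inside any of the CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


def is_genuine_googlebot(user_agent: str, ip: str) -> bool:
    """A Googlebot claim only counts when the source IP is in a listed range."""
    return "Googlebot" in user_agent and check_ip(ip, GOOGLE_RANGES)
```

The same `check_ip` helper could be reused with the Bing and Yahoo ranges from the pseudo-code.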


  6. hsiboy revised this gist Dec 5, 2014. 1 changed file with 177 additions and 82 deletions.
    259 changes: 177 additions & 82 deletions BotBuster.md
    @@ -99,7 +99,7 @@ See it in action
    * “Proxy-Connection” does not exist and should never be seen in the wild
    * Referrer, if it exists, it must not be blank, and it must contain the absolute URL.

    ```php
    ```perl
    #!/pseudo/code

    $ua = $headers['User-Agent'];
    @@ -161,87 +161,182 @@ if ($ua~"MSIE") {
    }


    // These user agent strings occur at the beginning of the line. ^
    $bots = array(
    "<sc", // XSS exploit attempts
    "8484 Boston Project", // video poker/porn spam
    "adwords", // referrer spam
    "autoemailspider", // spam harvester
    "blogsearchbot-martin", // from honeypot
    "CherryPicker", // spam harvester
    "core-project/", // FrontPage extension exploits
    "Diamond", // delivers spyware/adware
    "Digger", // spam harvester
    "ecollector", // spam harvester
    "EmailCollector", // spam harvester
    "Email Siphon", // spam harvester
    "EmailSiphon", // spam harvester
    "grub crawler", // misc comment/email spam
    "HttpProxy", // misc comment/email spam
    "Internet Explorer", // XMLRPC exploits seen
    "ISC Systems iRc", // spam harvester
    "Jakarta Commons", // custommised bot
    "Java 1.", // custommised bot
    "Java/1.", // custommised bot
    "libwww-perl", // custommised bot
    "LWP", // custommised bot
    "Microsoft URL", // spam harvester
    "Missigua", // spam harvester
    "MJ12bot/v1.0.8", // malicious botnet
    "Movable Type", // customised spambots
    //"Mozilla ", // malicious software
    "Mozilla/2", // malicious software
    "Mozilla/4.0(", // from honeypot
    "Mozilla/4.0+(compatible;+", // suspicious harvester
    "MSIE", // malicious software
    "NutchCVS", // unidentified robots
    "Nutscrape/", // misc comment spam
    "OmniExplorer", // spam harvester
    "psycheclone", // spam harvester
    "PussyCat ", // misc comment spam
    "PycURL", // misc comment spam
    "Shockwave Flash", // spam harvester
    "Super Happy Fun ", // spam harvester
    "TrackBack/", // trackback spam
    "user", // suspicious harvester
    "User Agent: ", // spam harvester
    "User-Agent: ", // spam harvester
    "WebSite-X Suite", // misc comment spam
    "Winnie Poh", // Automated Coppermine hacks
    "Wordpress", // malicious software
    "\"", // malicious software
    );

    // These user agent strings occur anywhere within the line.
    $bots = array(
    "\r", // A really dumb bot
    "; Widows ", // misc comment/email spam
    "a href=", // referrer spam
    "Bad Behavior Test", // Add this to your user-agent to test BB
    "compatible ; MSIE", // misc comment/email spam
    "compatible-", // misc comment/email spam
    "DTS Agent", // misc comment/email spam
    "Email Extractor", // spam harvester
    "Gecko/25", // revisit this in 500 years
    "grub-client", // search engine ignores robots.txt
    "hanzoweb", // very badly behaved crawler
    "Indy Library", // misc comment/email spam
    "larbin@unspecified", // stealth harvesters
    "Murzillo compatible", // comment spam bot
    ".NET CLR 1)", // free poker, etc.
    "POE-Component-Client", // free poker, etc.
    "Turing Machine", // www.anonymizer.com abuse
    "User-agent: ", // spam harvester/splogger
    "WebaltBot", // spam harvester
    "WISEbot", // spam harvester
    "WISEnutbot", // spam harvester
    "Windows NT 4.0;)", // wikispam bot
    "Windows NT 5.0;)", // wikispam bot
    "Windows NT 5.1;)", // wikispam bot
    "Windows XP 5", // spam harvester
    "WordPress/4.01", // pingback spam
    "\\\\)", // spam harvester
    );
    sub isBadUserAgent($ua) {
    $BadUserAgents = [

    "8484 Boston Project",
    ; Widows
    "AddThis.com robot tech.support@clearspring.com",
    "BOT/0.1 (BOT for JCE)",
    "Bichoo Spider",
    "BotBuster Bad Behavior Test",
    "COMODOspider/Nutch-1.0",
    "CherryPicker",
    "ClickTale bot",
    "ContextAd Bot 1.0",
    "DTS Agent",
    "Diamond",
    "Digger",
    "Domnutch-Bot/Nutch-1.0 (Domnutch; http://www.Nutch.de/) Nutch-1.0",
    "Email Extractor",
    "Email Siphon",
    "EmailCollector",
    "EmailSiphon",
    "Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)",
    "FreeNutch/Nutch-1.2 Nutch-1.2",
    "Fve Nutch Spider/Nutch-1.7",
    "GMI sentiment crawler/Nutch-1.0 (GMI sentiment crawler; http://GMI.googlepages.com ; MyEmail)",
    "Gecko/25",
    "GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)",
    "Halebot (Mozilla/5.0 compatible; Halebot/2.1; http://www.tacitknowledge.com/halebot/)",
    "HttpProxy",
    "ISC Systems iRc",
    "Indy Library",
    "Infoaxe./Nutch-0.9",
    "Infoaxe./Nutch-1.0",
    "Internet Explorer",
    "Jakarta Commons",
    "Java 1.",
    "Java/1.",
    "KSCrawler/Nutch-1.0 (http://www.kindsight.net/en/kscrawler; crawler@kindsight.net)",
    "LWP",
    "MJ12bot/v1.0.8",
    "MSIE",
    "Microsoft URL",
    "Microsoft URL Control - 6.00.8862",
    "Missigua",
    "Movable Type",
    "Mozilla/2",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)/Nutch-1.0",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://www.changedetection.com/bot.html )",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0 ; Claritybot)",
    "Mozilla/4.0(",
    "Mozilla/4.0+(compatible;+",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; ) Firefox/1.5.0.11; 360Spider",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 << seen from this ip 162.242.135.149",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36",
    "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24 Mozilla/3.5 (Google-HotelAdsVerifier)",
    "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Exabot/3.0 (BiggerBetter); +http://www.exabot.com/go/robot)",
    "Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)",
    "Mozilla/5.0 (compatible; Ezooms/1.0; help@moz.com)",
    "Mozilla/5.0 (compatible; Genieo/1.0 http://www.genieo.com/webfilter.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://import.io)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)",
    "Mozilla/5.0 (compatible; LinkChecker/8.3; +http://wummel.github.com/linkchecker/)",
    "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)",
    "Mozilla/5.0 (compatible; MJ12bot/v1.4.4; http://www.majestic12.co.uk/bot.php?+)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Selenium Bot)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Selenium Bot)",
    "Mozilla/5.0 (compatible; MojeekBot/0.6; http://www.mojeek.com/bot.html)",
    "Mozilla/5.0 (compatible; SEOkicks-Robot; +http://www.seokicks.de/robot.html)",
    "Mozilla/5.0 (compatible; SemrushBot/0.97; +http://www.semrush.com/bot.html)",
    "Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)",
    "Mozilla/5.0 (compatible; URLAppendBot/1.0; +http://www.profound.net/urlappendbot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (compatible; YoudaoBot/1.0; http://www.youdao.com/help/webmaster/spider/; )",
    "Mozilla/5.0 (compatible; aiHitBot/2.8; +http://endb-consolidated.aihit.com/)",
    "Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)",
    "Mozilla/5.0 (compatible; linkCheck)",
    "Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/about/bots/)",
    "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)",
    "Mozilla/5.0 (compatible; special_archiver/3.1.1 +http://www.archive.org/details/archive.org_bot)",
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7/Nutch-1.0",
    "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)",
    "Mozilla/5.0+(compatible;+PiplBot;++http://www.pipl.com/bot/)",
    "Murzillo compatible",
    "NIS Nutch Spider/Nutch-1.7 Spider/Nutch-1.7",
    "Nutch Experimental Crawler/Nutch-1.4 Experimental",
    "Nutch12/Nutch-1.2 Nutch-1.2",
    "NutchCVS",
    "Nutscrape/",
    "OmniExplorer",
    "POE-Component-Client",
    "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)",
    "PussyCat",
    "PycURL",
    "QuerySeekerSpider ( http://queryseeker.com/bot.html )",
    "SMNutchSpider/Nutch-1.7",
    "SapphireWebCrawler/Nutch-1.0-dev (Sapphire Web Crawler using Nutch; http://boston.lti.cs.cmu.edu/crawler/; mhoy@cs.cmu.edu) http://boston.lti.cs.cmu.edu/crawler/",
    "Shockwave Flash",
    "ShowyouBot (http://showyou.com/crawler)",
    "Slurp/Nutch-1.0-dev (Slurp Search Engineer; http://www.google.com/bot.html; nutch-agent@lucene.apache.org)",
    "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
    "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
    "Super Happy Fun",
    "Test crawler Nutch/Nutch-1.0-dev (Nutch Test Project; changkuk@cmu.edu) Nutch-1.0-dev",
    "TrackBack/",
    "Turing Machine",
    "Twitterbot/1.0",
    "User Agent:",
    "User-Agent: Some-Agent/1.0",
    "User-agent:",
    "WIRE/0.22 (Linux; x86_64; Bot,Robot,Spider,Crawler)",
    "WISEbot",
    "WISEnutbot",
    "WeSEE:Search/0.1 (Alpha, http://www.wesee.com/bot/)",
    "WeSEE:Search/0.1 (Alpha, http://www.wesee.com/en/support/bot/)",
    "WebSite-X Suite",
    "WebaltBot",
    "Windows NT 4.0;)",
    "Windows NT 5.0;)",
    "Windows NT 5.1;)",
    "Windows XP 5",
    "Winnie Poh",
    "WordPress/4.0.1;",
    "WordPress/4.01",
    "Wordpress",
    "Yahoo:LinkExpander:Slingstone",
    "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)",
    "Zscho.de Crawler/Nutch-1.0-Zscho.de-semantic_patch (Zscho.de Crawler, collecting for machine learning; http://zscho.de/ )",
    "\"",
    "\\\\)",
    "\r",
    "a href=",
    "adidxbot/2.0 (+http://search.msn.com/msnbot.htm)",
    "adwords",
    "autoemailspider",
    "bitlybot",
    "blogsearchbot-martin",
    "compatible ; MSIE",
    "compatible-",
    "core-project/",
    "ecollector",
    "grub crawler",
    "grub-client",
    "hanzoweb",
    "larbin@unspecified",
    "libwww-perl",
    "msnbot-UDiscovery/2.0b (+http://search.msn.com/msnbot.htm)",
    "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)",
    "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
    "nutch-1.3/Nutch-1.3 Nutch-1.3",
    "nutch-1.4/Nutch-1.4 Nutch-1.4",
    "psbot-image (+http://www.picsearch.com/bot.html)",
    "psbot/0.1 (+http://www.picsearch.com/bot.html)",
    "psycheclone",
    "research-scan-bot/Nutch-1.0",
    "rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+shiny@moz.com)",
    "spider",
    "user",
    "www.integromedb.org/Crawler",
    ""
    ]
    foreach ($UserAgent in $BadUserAgents)
    if (string($ua, $UserAgent)) return 1;
    return 0;
    }

    ```
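
The list lookup in `isBadUserAgent` can be sketched in Python as below. It assumes that the pseudo-code's `string($ua, $UserAgent)` means "pattern occurs somewhere in the user agent", which the gist does not state explicitly. Only a handful of entries are copied here. The sketch skips empty patterns, because the list ends with `""` and an empty string is a substring of every user agent.

```python
# A small sample copied from the BadUserAgents list, including its trailing "" entry.
BAD_USER_AGENTS = ["CherryPicker", "EmailSiphon", "libwww-perl", "PycURL", ""]


def is_bad_user_agent(ua: str, patterns: list[str] = BAD_USER_AGENTS) -> bool:
    """Return True if any non-empty pattern occurs anywhere in the user agent."""
    return any(bool(pattern) and pattern in ua for pattern in patterns)
```

Short patterns from the full list, such as `"user"` or `"spider"`, match very broadly under substring semantics, so the order in which the bad and useful lists are consulted matters.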


  7. hsiboy revised this gist Dec 4, 2014. 1 changed file with 26 additions and 17 deletions.
    43 changes: 26 additions & 17 deletions BotBuster.md
    @@ -37,22 +37,31 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    * https://github.com/samyk/evercookie
    * https://github.com/Valve/fingerprintjs

    ##Set bitmap:
    ## Current bitmap

    | X-Bot | X-BotBitMap | Threat |
    |:-----:|:----------------:|:----------------------:|
    |1 |0000000000000001|No Cookie|
    |2 |0000000000000010|No Referer|
    |4 |0000000000000100|Bad User Agent|
    |8 |0000000000001000|Unlikely Human Traffic Source (AWS, Azure, etc)|
    |16|0000000000010000|Known "Evasively Tricky" Source Country|
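
The current bitmap can be sketched with Python's `enum.IntFlag`. The assumption here, which the table suggests but does not spell out, is that several threats are combined with bitwise OR, that `X-Bot` carries the decimal value and that `X-BotBitMap` carries the same value as a 16-digit binary string. The member names and the `bot_headers` helper are made up for the example.

```python
from enum import IntFlag


class Threat(IntFlag):
    """Bit values taken from the current bitmap table."""
    NO_COOKIE = 1
    NO_REFERER = 2
    BAD_USER_AGENT = 4
    UNLIKELY_HUMAN_SOURCE = 8   # AWS, Azure, etc
    TRICKY_SOURCE_COUNTRY = 16


def bot_headers(threats: Threat) -> dict[str, str]:
    """Render the combined flags as the two header values (assumed encoding)."""
    value = int(threats)
    return {"X-Bot": str(value), "X-BotBitMap": format(value, "016b")}


flags = Threat.NO_COOKIE | Threat.BAD_USER_AGENT
print(bot_headers(flags))
```

Individual threats can be tested later with `Threat.BAD_USER_AGENT in flags`.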

    ##future bitmap:

    | X-Bot | X-BotBitMap | Threat |
    |:-----:|:----------------:|:--------------------------------------------------------------:|
    | 1 | 0000000000000001 | Unlikely Human Traffic Source (AWS, Azure, etc) |
    | 2 | 0000000000000010 | Known Evasively Tricky Source Country|
    |4 | 0000000000000100 | Browser Integrity (Not requesting furniture)|
    |8 | 0000000000001000 | User Agent Spoof (Headers don't match User-Agent String)|
    |16 | 0000000000010000 | Unlikely Human Behaviour|
    |32 | 0000000000100000 | Honeytrap Access|
    |64 | 0000000001000000 | No Referrer|
    |1 | 0000000000000001 | No Cookie|
    |2 | 0000000000000010 | No Referrer|
    |4 | 0000000000000100 | User Agent Spoof (Headers don't match User-Agent String)|
    |8 | 0000000000001000 | Unlikely Human Traffic Source (AWS, Azure, etc) |
    |16 | 0000000000010000 | Known "Evasively Tricky" Source Country|
    |32 | 0000000000100000 | Unlikely Human Behaviour|
    |64 | 0000000001000000 | Browser Integrity (Not requesting furniture)|
    |128| 0000000010000000 | Session Length Exceeded|
    |256| 0000000100000000 | Pages Per Session Exceeded|
    |512| 0000001000000000 | Bad User Agent|
    |1024| 0000010000000000 | No Cookie|
    |512| 0000001000000000 | User Agent Spoof (Headers don't match User-Agent String)|
    |1024| 0000010000000000 | Browser Integrity (Not requesting furniture)|
    |2048 | 0000100000000000 | Generates lots of errors (404s)|
    |4096 | 0001000000000000 | No JavaScript|
    |8192 | 0010000000000000 | JavaScript validation Failed|
    @@ -74,13 +83,13 @@ See it in action
    * SELECT
    * SLEEP
    * -- (that’s two dashes)
    * @@version
    * varchar
    * char
    * exec
    * execute
    * declare
    * cast
    * @@VERSION
    * VARCHAR
    * CHAR
    * EXEC
    * EXECUTE
    * DECLARE
    * CAST
    * Range: field exists and begins with 0, real user-agents do not start ranges at 0
    * Content-Range is a response header, not a request header
    * Via pinappleproxy || Via PCNETSERVER || Via Invisiware
  8. hsiboy revised this gist Dec 3, 2014. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions BotBuster.md
    @@ -110,6 +110,7 @@ CheckIp($headers['ip'], array["207.46.0.0/16", "65.52.0.0/14", "207.68.128.0/18"
    // Analyze user agents claiming to be google
    if ($ua="Googlebot") || ($ua="Mediapartners-Google") || ($ua="Google Web Preview"){
    CheckIp($headers['ip'], array["66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19", "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17"])
    if ($headers['from']=="googlebot(at)googlebot.com" // google bot sends this
    }
    // Analyze user agents claiming to be Yahoo
    if ($ua="Yahoo! Slurp") || ($ua="Yahoo! SearchMonkey") {
  9. hsiboy revised this gist Dec 3, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion BotBuster.md
    @@ -68,7 +68,7 @@ See it in action
    * Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
    * no-cache
    * Cache-Control
    * Enforce RFC 2965 sec 3.3.5 and 9.1
    * Enforce RFC 2965 sec 3.3.5 (Cookie2) and 9 (HISTORICAL)
    * SQL injection
    * ;DECLARE%20@
    * SELECT
  10. hsiboy revised this gist Dec 3, 2014. 1 changed file with 8 additions and 1 deletion.
    9 changes: 8 additions & 1 deletion BotBuster.md
    @@ -73,7 +73,14 @@ See it in action
    * ;DECLARE%20@
    * SELECT
    * SLEEP
    * cheep
    * -- (that’s two dashes)
    * @@version
    * varchar
    * char
    * exec
    * execute
    * declare
    * cast
    * Range: field exists and begins with 0, real user-agents do not start ranges at 0
    * Content-Range is a response header, not a request header
    * Via pinappleproxy || Via PCNETSERVER || Via Invisiware
  11. hsiboy revised this gist Dec 3, 2014. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions BotBuster.md
    @@ -64,16 +64,16 @@ See it in action
    ![alt text](https://raw.githubusercontent.com/hsiboy/resources/master/busted.JPG "Busted Bots!")

    * Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
    --* 100-continue
    * 100-continue
    * Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
    --* no-cache
    --* Cache-Control
    * no-cache
    * Cache-Control
    * Enforce RFC 2965 sec 3.3.5 and 9.1
    * SQL injection
    --* ;DECLARE%20@
    --* SELECT
    --* SLEEP
    --* cheep
    * ;DECLARE%20@
    * SELECT
    * SLEEP
    * cheep
    * Range: field exists and begins with 0, real user-agents do not start ranges at 0
    * Content-Range is a response header, not a request header
    * Via pinappleproxy || Via PCNETSERVER || Via Invisiware
  12. hsiboy revised this gist Dec 3, 2014. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions BotBuster.md
    @@ -64,16 +64,16 @@ See it in action
    ![alt text](https://raw.githubusercontent.com/hsiboy/resources/master/busted.JPG "Busted Bots!")

    * Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
    ⋅⋅* 100-continue
    --* 100-continue
    * Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
    ⋅⋅* no-cache
    ⋅⋅* Cache-Control
    --* no-cache
    --* Cache-Control
    * Enforce RFC 2965 sec 3.3.5 and 9.1
    * SQL injection
    ⋅⋅* ;DECLARE%20@
    ⋅⋅* SELECT
    ⋅⋅* SLEEP
    ..* cheep
    --* ;DECLARE%20@
    --* SELECT
    --* SLEEP
    --* cheep
    * Range: field exists and begins with 0, real user-agents do not start ranges at 0
    * Content-Range is a response header, not a request header
    * Via pinappleproxy || Via PCNETSERVER || Via Invisiware
  13. hsiboy revised this gist Dec 3, 2014. 1 changed file with 7 additions and 6 deletions.
    13 changes: 7 additions & 6 deletions BotBuster.md
    @@ -64,15 +64,16 @@ See it in action
    ![alt text](https://raw.githubusercontent.com/hsiboy/resources/master/busted.JPG "Busted Bots!")

    * Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
    ** 100-continue
    ⋅⋅* 100-continue
    * Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
    ** no-cache
    ** Cache-Control
    ⋅⋅* no-cache
    ⋅⋅* Cache-Control
    * Enforce RFC 2965 sec 3.3.5 and 9.1
    * SQL injection
    ** ;DECLARE%20@
    ** SELECT
    ** SLEEP
    ⋅⋅* ;DECLARE%20@
    ⋅⋅* SELECT
    ⋅⋅* SLEEP
    ..* cheep
    * Range: field exists and begins with 0, real user-agents do not start ranges at 0
    * Content-Range is a response header, not a request header
    * Via pinappleproxy || Via PCNETSERVER || Via Invisiware
  14. hsiboy revised this gist Dec 3, 2014. 1 changed file with 163 additions and 0 deletions.
    163 changes: 163 additions & 0 deletions BotBuster.md
    @@ -63,6 +63,169 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    See it in action
    ![alt text](https://raw.githubusercontent.com/hsiboy/resources/master/busted.JPG "Busted Bots!")

    * Is it claiming to be HTTP/1.0? Then it shouldn't do HTTP/1.1 things
    ** 100-continue
    * Is it claiming to be HTTP/1.1? Then it shouldn't do HTTP/1.0 things
    ** no-cache
    ** Cache-Control
    * Enforce RFC 2965 sec 3.3.5 and 9.1
    * SQL injection
    ** ;DECLARE%20@
    ** SELECT
    ** SLEEP
    * Range: field exists and begins with 0, real user-agents do not start ranges at 0
    * Content-Range is a response header, not a request header
    * Via pinappleproxy || Via PCNETSERVER || Via Invisiware
    * keep-alive and close are mutually exclusive
    * Close shouldn't appear twice
    * Keep-Alive shouldn't appear twice either
    * “Proxy-Connection” does not exist and should never be seen in the wild
    * Referer, if it exists, must not be blank, and it must contain an absolute URL.
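
Several of the header rules in this list can be sketched as one Python function that returns the rules a request breaks. The function name and the message strings are illustrative, and reading "Range begins with 0" as "the first range spec starts at byte 0" is an interpretation of the bullet, not something the gist defines.

```python
def header_violations(headers: dict[str, str]) -> list[str]:
    """Return the header rules a request breaks (empty list: nothing suspicious)."""
    h = {name.lower(): value for name, value in headers.items()}
    problems = []

    # Connection tokens are comma-separated and case-insensitive.
    tokens = [t.strip().lower() for t in h.get("connection", "").split(",") if t.strip()]
    if "keep-alive" in tokens and "close" in tokens:
        problems.append("keep-alive and close are mutually exclusive")
    if tokens.count("close") > 1:
        problems.append("close appears twice")
    if tokens.count("keep-alive") > 1:
        problems.append("keep-alive appears twice")

    if "proxy-connection" in h:
        problems.append("Proxy-Connection header present")
    if "content-range" in h:
        problems.append("Content-Range is a response header")

    # Assumed reading of "Range begins with 0": e.g. "bytes=0-1023".
    if h.get("range", "").partition("=")[2].strip().startswith("0-"):
        problems.append("Range starts at 0")

    # A blank or relative Referer contains no ":".
    if "referer" in h and ":" not in h["referer"]:
        problems.append("Referer is blank or not an absolute URL")
    return problems
```

Returning a list rather than a single verdict keeps each rule separately reportable, which fits the bitmap approach used elsewhere in the gist.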

    ```php
    #!/pseudo/code

    $ua = $headers['User-Agent'];

    // Referer, if it exists, must contain a ":".
    // While a relative URL is technically valid in Referer, all known legitimate user agents send an absolute URL.

    if (isset($headers['Referer']) && strpos($headers['Referer'], ":") === FALSE) {
        return array(400, "An invalid request was received from your browser. This may be caused by a malfunctioning proxy server or browser privacy software.");
    }

    // Analyze user agents claiming to be msnbot
    if (strpos($ua, "bingbot") !== FALSE || strpos($ua, "msnbot") !== FALSE || strpos($ua, "MS Search") !== FALSE) {
        CheckIp($headers['ip'], array("207.46.0.0/16", "65.52.0.0/14", "207.68.128.0/18", "207.68.192.0/20", "64.4.0.0/18", "157.54.0.0/15", "157.60.0.0/16", "157.56.0.0/14"));
    }

    // Analyze user agents claiming to be google
    if (strpos($ua, "Googlebot") !== FALSE || strpos($ua, "Mediapartners-Google") !== FALSE || strpos($ua, "Google Web Preview") !== FALSE) {
        CheckIp($headers['ip'], array("66.249.64.0/19", "64.233.160.0/19", "72.14.192.0/18", "203.208.32.0/19", "74.125.0.0/16", "216.239.32.0/19", "209.85.128.0/17"));
    }
    // Analyze user agents claiming to be Yahoo
    if (strpos($ua, "Yahoo! Slurp") !== FALSE || strpos($ua, "Yahoo! SearchMonkey") !== FALSE) {
        CheckIp($headers['ip'], array("202.160.176.0/20", "67.195.0.0/16", "203.209.252.0/24", "72.30.0.0/16", "98.136.0.0/14", "74.6.0.0/16"));
    }

    if (strpos($ua, "MSIE") !== FALSE) {
        if (strpos($ua, "Opera") !== FALSE) {
            // check that Opera sent an "Accept" header
            if (isset($headers['Accept'])) { // looks like Opera
                return "human";
            }
        } else {
            // MSIE does NOT send "Windows ME", "Windows XP", "Windows 2000" or "Win32" in the user agent
            if (strpos($ua, "Windows ME") !== FALSE || strpos($ua, "Windows XP") !== FALSE || strpos($ua, "Windows 2000") !== FALSE || strpos($ua, "Win32") !== FALSE) {
                // this MSIE is a bot
                return "bot";
            }
        }
    } elseif (strpos($ua, "Konqueror") !== FALSE) {
        // CafeKelsa appears to be a dev project at Yahoo which indexes job listings for
        // Yahoo! HotJobs. It announces itself as Konqueror, so we skip these checks.
        if (strpos($ua, "YahooSeeker/CafeKelsa") === FALSE || CheckIp($headers['ip'], array("209.73.160.0/19")) === FALSE) {
            // if it's a real browser it will send an Accept header
            if (isset($headers['Accept'])) { return "human"; }
        }
    } elseif (strpos($ua, "Opera") !== FALSE) {
        // if it's a real browser it will send an Accept header
        if (isset($headers['Accept'])) { return "human"; }
    } elseif (strpos($ua, "Safari") !== FALSE) {
        // if it's a real browser it will send an Accept header
        if (isset($headers['Accept'])) { return "human"; }
    } elseif (strpos($ua, "Lynx") !== FALSE) {
        // if it's a real browser it will send an Accept header
        if (isset($headers['Accept'])) { return "human"; }
    } elseif (strpos($ua, "Mozilla") === 0) {
        // the user agent starts with "Mozilla"
        if (strpos($ua, "Google Desktop") === FALSE && strpos($ua, "PLAYSTATION 3") === FALSE) {
            // if it's a real browser it will send an Accept header
            if (isset($headers['Accept'])) { return "human"; }
        }
    }


    // These user agent strings are matched only at the beginning of the User-Agent header (regex anchor ^).
    $bots_start = array(
    "<sc", // XSS exploit attempts
    "8484 Boston Project", // video poker/porn spam
    "adwords", // referrer spam
    "autoemailspider", // spam harvester
    "blogsearchbot-martin", // from honeypot
    "CherryPicker", // spam harvester
    "core-project/", // FrontPage extension exploits
    "Diamond", // delivers spyware/adware
    "Digger", // spam harvester
    "ecollector", // spam harvester
    "EmailCollector", // spam harvester
    "Email Siphon", // spam harvester
    "EmailSiphon", // spam harvester
    "grub crawler", // misc comment/email spam
    "HttpProxy", // misc comment/email spam
    "Internet Explorer", // XMLRPC exploits seen
    "ISC Systems iRc", // spam harvester
    "Jakarta Commons", // customised bot
    "Java 1.", // customised bot
    "Java/1.", // customised bot
    "libwww-perl", // customised bot
    "LWP", // customised bot
    "Microsoft URL", // spam harvester
    "Missigua", // spam harvester
    "MJ12bot/v1.0.8", // malicious botnet
    "Movable Type", // customised spambots
    //"Mozilla ", // malicious software
    "Mozilla/2", // malicious software
    "Mozilla/4.0(", // from honeypot
    "Mozilla/4.0+(compatible;+", // suspicious harvester
    "MSIE", // malicious software
    "NutchCVS", // unidentified robots
    "Nutscrape/", // misc comment spam
    "OmniExplorer", // spam harvester
    "psycheclone", // spam harvester
    "PussyCat ", // misc comment spam
    "PycURL", // misc comment spam
    "Shockwave Flash", // spam harvester
    "Super Happy Fun ", // spam harvester
    "TrackBack/", // trackback spam
    "user", // suspicious harvester
    "User Agent: ", // spam harvester
    "User-Agent: ", // spam harvester
    "WebSite-X Suite", // misc comment spam
    "Winnie Poh", // Automated Coppermine hacks
    "Wordpress", // malicious software
    "\"", // malicious software
    );

    // These user agent strings may occur anywhere within the User-Agent header.
    $bots_anywhere = array(
    "\r", // A really dumb bot
    "; Widows ", // misc comment/email spam
    "a href=", // referrer spam
    "Bad Behavior Test", // Add this to your user-agent to test BB
    "compatible ; MSIE", // misc comment/email spam
    "compatible-", // misc comment/email spam
    "DTS Agent", // misc comment/email spam
    "Email Extractor", // spam harvester
    "Gecko/25", // revisit this in 500 years
    "grub-client", // search engine ignores robots.txt
    "hanzoweb", // very badly behaved crawler
    "Indy Library", // misc comment/email spam
    "larbin@unspecified", // stealth harvesters
    "Murzillo compatible", // comment spam bot
    ".NET CLR 1)", // free poker, etc.
    "POE-Component-Client", // free poker, etc.
    "Turing Machine", // www.anonymizer.com abuse
    "User-agent: ", // spam harvester/splogger
    "WebaltBot", // spam harvester
    "WISEbot", // spam harvester
    "WISEnutbot", // spam harvester
    "Windows NT 4.0;)", // wikispam bot
    "Windows NT 5.0;)", // wikispam bot
    "Windows NT 5.1;)", // wikispam bot
    "Windows XP 5", // spam harvester
    "WordPress/4.01", // pingback spam
    "\\\\)", // spam harvester
    );
    ```
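
The checks above lean on two operations: testing whether an address falls inside a CIDR range (the `CheckIp` call) and matching a user agent against the two lists, one anchored at the start of the string and one matched anywhere. A minimal JavaScript sketch of both follows. The function names are hypothetical, only IPv4 is handled, the lists are short excerpts of the ones above, and matching is case-sensitive for simplicity.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned integer, or null if malformed.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    value = value * 256 + n;
  }
  return value;
}

// True if `ip` lies inside the CIDR range, e.g. "66.249.64.0/19".
function ipInCidr(ip, cidr) {
  const [network, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  const addr = ipv4ToInt(ip);
  const net = ipv4ToInt(network);
  if (addr === null || net === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  // Compare the leading `bits` bits with plain arithmetic,
  // which avoids JavaScript's signed 32-bit bitwise operators.
  const blockSize = 2 ** (32 - bits);
  return Math.floor(addr / blockSize) === Math.floor(net / blockSize);
}

function ipInAnyCidr(ip, cidrs) {
  return cidrs.some((cidr) => ipInCidr(ip, cidr));
}

// Excerpts from the two lists above.
const BAD_PREFIXES = ["libwww-perl", "Java/1.", "PycURL"];
const BAD_SUBSTRINGS = ["Email Extractor", "Indy Library", "grub-client"];

function isBadUserAgent(ua) {
  return (
    BAD_PREFIXES.some((prefix) => ua.startsWith(prefix)) ||
    BAD_SUBSTRINGS.some((fragment) => ua.includes(fragment))
  );
}

console.log(ipInAnyCidr("66.249.66.1", ["66.249.64.0/19", "74.125.0.0/16"]));
console.log(isBadUserAgent("libwww-perl/6.05"));
```

Keeping the prefix list and the substring list separate preserves the distinction the two arrays make: a short token such as `user` or `MSIE` is only suspicious when the user agent starts with it.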




  15. hsiboy revised this gist Dec 3, 2014. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions BotBuster.md
    @@ -16,6 +16,11 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    One more environment to consider: the corporate network.

    likely to find many dozens or hundreds of users with the exact same OS, browser, plugins, fonts etc.
    IP addresses are likely to be the same if the users are behind a corporate firewall.

    ##JavaScript Detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
  16. hsiboy revised this gist Dec 2, 2014. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion BotBuster.md
    @@ -16,7 +16,7 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    ##JavaSctipt detection:
    ##JavaScript Detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
    window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
    @@ -55,6 +55,8 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    |32768| 1000000000000000 | Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)|


    See it in action
    ![alt text](https://raw.githubusercontent.com/hsiboy/resources/master/busted.JPG "Busted Bots!")



  17. hsiboy revised this gist Dec 2, 2014. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion BotBuster.md
    @@ -37,7 +37,7 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }

    | X-Bot | X-BotBitMap | Threat |
    |:-----:|:----------------:|:--------------------------------------------------------------:|
    | 1 | 0000000000000001 | Unlikely Human Traffic Source (AWS, Azure, Google Compute etc) |
    | 1 | 0000000000000001 | Unlikely Human Traffic Source (AWS, Azure, etc) |
    | 2 | 0000000000000010 | Known Evasively Tricky Source Country|
    |4 | 0000000000000100 | Browser Integrity (Not requesting furniture)|
    |8 | 0000000000001000 | User Agent Spoof (Headers dont match User-Agent String)|
  18. hsiboy revised this gist Dec 2, 2014. 1 changed file with 20 additions and 19 deletions.
    39 changes: 20 additions & 19 deletions BotBuster.md
    @@ -34,25 +34,26 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }

    ##Set bitmap:

    ```
    X-Bot X-BotBitMap Threat
    1 0000000000000001 Unlikely Human Traffic Source (AWS, Azure, Google Compute etc)
    2 0000000000000010 Known Evasively Tricky Source Country
    4 0000000000000100 Browser Integrity (Not requesting furniture)
    8 0000000000001000 User Agent Spoof (Headers dont match User-Agent String)
    16 0000000000010000 Unlikely Human Behaviour
    32 0000000000100000 Honeytrap Access
    64 0000000001000000 No Referrer
    128 0000000010000000 Session Length Exceeded
    256 0000000100000000 Pages Per Session Exceeded
    512 0000001000000000 Bad User Agent
    1024 0000010000000000 No Cookie
    2048 0000100000000000 Generates lots of errors (404s)
    4096 0001000000000000 No JavaScript
    8192 0010000000000000 JavaScript validation Failed
    16384 0100000000000000 Fingerprint Validation Error
    32768 1000000000000000 Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)
    ```

    | X-Bot | X-BotBitMap | Threat |
    |:-----:|:----------------:|:--------------------------------------------------------------:|
    | 1 | 0000000000000001 | Unlikely Human Traffic Source (AWS, Azure, Google Compute etc) |
    | 2 | 0000000000000010 | Known Evasively Tricky Source Country|
    |4 | 0000000000000100 | Browser Integrity (Not requesting furniture)|
    |8 | 0000000000001000 | User Agent Spoof (Headers dont match User-Agent String)|
    |16 | 0000000000010000 | Unlikely Human Behaviour|
    |32 | 0000000000100000 | Honeytrap Access|
    |64 | 0000000001000000 | No Referrer|
    |128| 0000000010000000 | Session Length Exceeded|
    |256| 0000000100000000 | Pages Per Session Exceeded|
    |512| 0000001000000000 | Bad User Agent|
    |1024| 0000010000000000 | No Cookie|
    |2048 | 0000100000000000 | Generates lots of errors (404s)|
    |4096 | 0001000000000000 | No JavaScript|
    |8192 | 0010000000000000 | JavaScript validation Failed|
    |16384| 0100000000000000 | Fingerprint Validation Error|
    |32768| 1000000000000000 | Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)|
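
Because every row of the table is a distinct power of two, several findings can be combined into one number with a bitwise OR and tested individually later. The sketch below assumes, as the column names suggest, that `X-Bot` carries the decimal value and `X-BotBitMap` its 16-bit binary form. The constant and function names are hypothetical, and only a few of the flags are listed.

```javascript
// A few flag values copied from the table above.
const FLAGS = {
  UNLIKELY_SOURCE: 1,
  HONEYTRAP_ACCESS: 32,
  NO_REFERRER: 64,
  BAD_USER_AGENT: 512,
  NO_COOKIE: 1024,
  KNOWN_AUTOMATION: 32768,
};

// Combine any number of flags into a single X-Bot value.
function combineFlags(...flags) {
  return flags.reduce((acc, flag) => acc | flag, 0);
}

// Render a value as a 16-character bit string for X-BotBitMap.
function toBitmap(value) {
  return value.toString(2).padStart(16, "0");
}

// True if `flag` is set in `value`.
function hasFlag(value, flag) {
  return (value & flag) !== 0;
}

const score = combineFlags(FLAGS.NO_REFERRER, FLAGS.NO_COOKIE);
console.log(score, toBitmap(score)); // 1088 "0000010001000000"
console.log(hasFlag(score, FLAGS.NO_COOKIE)); // true
```

A single integer keeps the header small, and a downstream component can act on one specific bit without parsing a list of reasons.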




  19. hsiboy revised this gist Dec 2, 2014. 1 changed file with 6 additions and 7 deletions.
    13 changes: 6 additions & 7 deletions BotBuster.md
    @@ -16,7 +16,7 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    javasctipt detection:
    ##JavaSctipt detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
    window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
    @@ -27,16 +27,15 @@ window.webdriver //selenium
    window.domAutomation (or window.domAutomationController) //chromium based automation driver
    if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    ```
    create fingerprint, and store for ever:
    ##Create fingerprint, and store forever:

    https://github.com/samyk/evercookie
    * https://github.com/samyk/evercookie
    * https://github.com/Valve/fingerprintjs

    https://github.com/Valve/fingerprintjs

    Set bitmap:
    ##Set bitmap:

    ```
    X-Bot X-BotBitMap Threat Type
    X-Bot X-BotBitMap Threat
    1 0000000000000001 Unlikely Human Traffic Source (AWS, Azure, Google Compute etc)
    2 0000000000000010 Known Evasively Tricky Source Country
    4 0000000000000100 Browser Integrity (Not requesting furniture)
  20. hsiboy revised this gist Dec 2, 2014. 1 changed file with 17 additions and 17 deletions.
    34 changes: 17 additions & 17 deletions BotBuster.md
    @@ -30,29 +30,29 @@ if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    create fingerprint, and store for ever:

    https://github.com/samyk/evercookie

    https://github.com/Valve/fingerprintjs

    Set bitmap:

    ```
    X-Bot X-BotBitMap Threat Type
    1 00000000000000000000000000000001 Unlikely Human Traffic Source
    2 00000000000000000000000000000010 Known Evasively Tricky Source Country
    4 00000000000000000000000000000100 Browser Integrity
    8 00000000000000000000000000001000 User Agnet Spoof
    16 00000000000000000000000000010000 Rate Limit
    32 00000000000000000000000000100000 Honeytrap Access
    64 00000000000000000000000001000000 No Referrer
    128 00000000000000000000000010000000 Session Length Exceeded
    256 00000000000000000000000100000000 Pages Per Session Exceeded
    512 00000000000000000000001000000000 Bad User Agent
    1024 00000000000000000000010000000000 No Cookie
    2048 00000000000000000000100000000000 Filtered IP
    4096 00000000000000000001000000000000 No JavaScript
    8192 00000000000000000010000000000000 JavaScript validation Failed
    16384 00000000000000000100000000000000 Fingerprint Validation Error
    32768 00000000000000001000000000000000 Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)
    65536 00000000000000010000000000000000 repeated Form Submission
    1 0000000000000001 Unlikely Human Traffic Source (AWS, Azure, Google Compute etc)
    2 0000000000000010 Known Evasively Tricky Source Country
    4 0000000000000100 Browser Integrity (Not requesting furniture)
    8 0000000000001000 User Agent Spoof (Headers dont match User-Agent String)
    16 0000000000010000 Unlikely Human Behaviour
    32 0000000000100000 Honeytrap Access
    64 0000000001000000 No Referrer
    128 0000000010000000 Session Length Exceeded
    256 0000000100000000 Pages Per Session Exceeded
    512 0000001000000000 Bad User Agent
    1024 0000010000000000 No Cookie
    2048 0000100000000000 Generates lots of errors (404s)
    4096 0001000000000000 No JavaScript
    8192 0010000000000000 JavaScript validation Failed
    16384 0100000000000000 Fingerprint Validation Error
    32768 1000000000000000 Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)
    ```


  21. hsiboy revised this gist Dec 2, 2014. 1 changed file with 42 additions and 0 deletions.
    42 changes: 42 additions & 0 deletions BotBuster.md
    @@ -16,5 +16,47 @@ Probable bots will be presented with a captcha type page. Humans can confirm the

    This will work at the top of the stack using the ZTM to "manage" the offender.

    javasctipt detection:
    ```
    window._phantom (or window.callPhantom or navigator.onLine=false && navigator.plugins="") //phantomjs
    window.__phantomas //PhantomJS-based web perf metrics + monitoring tool
    window.Buffer //nodejs
    window.emit //couchjs
    window.spawn //rhino
    window.webdriver //selenium
    window.domAutomation (or window.domAutomationController) //chromium based automation driver
    if (window.outerWidth === 0 && window.outerHeight === 0){ //headless browser }
    ```
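
The property list above is shorthand rather than runnable code. One way to turn it into a function is sketched here, assuming the properties and tool names given in the list. `detectAutomation` is a hypothetical name, and it takes the window object as a parameter so it can also be exercised with a plain object outside a browser.

```javascript
// Properties from the list above, mapped to the tool they hint at.
const AUTOMATION_MARKERS = {
  _phantom: "PhantomJS",
  callPhantom: "PhantomJS",
  __phantomas: "phantomas",
  Buffer: "Node.js",
  emit: "CouchJS",
  spawn: "Rhino",
  webdriver: "Selenium",
  domAutomation: "Chromium automation driver",
  domAutomationController: "Chromium automation driver",
};

// Returns a list of automation hints found on a window-like object.
function detectAutomation(win) {
  const hints = [];
  for (const [prop, tool] of Object.entries(AUTOMATION_MARKERS)) {
    if (typeof win[prop] !== "undefined") {
      hints.push(`${tool} (window.${prop})`);
    }
  }
  // PhantomJS hint from the list: reports offline and has no plugins.
  const nav = win.navigator;
  if (nav && nav.onLine === false && nav.plugins && nav.plugins.length === 0) {
    hints.push("PhantomJS (offline, no plugins)");
  }
  // A zero-sized outer window suggests a headless browser.
  if (win.outerWidth === 0 && win.outerHeight === 0) {
    hints.push("headless browser (zero outer size)");
  }
  return hints;
}

// In a browser this would be called as detectAutomation(window).
console.log(detectAutomation({ _phantom: {}, outerWidth: 0, outerHeight: 0 }));
```

These are hints, not proof: any of them can be hidden or faked by the client, so the result is better used as one input to a score than as a verdict.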
    create fingerprint, and store for ever:

    https://github.com/samyk/evercookie
    https://github.com/Valve/fingerprintjs

    Set bitmap:

    ```
    X-Bot X-BotBitMap Threat Type
    1 00000000000000000000000000000001 Unlikely Human Traffic Source
    2 00000000000000000000000000000010 Known Evasively Tricky Source Country
    4 00000000000000000000000000000100 Browser Integrity
    8 00000000000000000000000000001000 User Agnet Spoof
    16 00000000000000000000000000010000 Rate Limit
    32 00000000000000000000000000100000 Honeytrap Access
    64 00000000000000000000000001000000 No Referrer
    128 00000000000000000000000010000000 Session Length Exceeded
    256 00000000000000000000000100000000 Pages Per Session Exceeded
    512 00000000000000000000001000000000 Bad User Agent
    1024 00000000000000000000010000000000 No Cookie
    2048 00000000000000000000100000000000 Filtered IP
    4096 00000000000000000001000000000000 No JavaScript
    8192 00000000000000000010000000000000 JavaScript validation Failed
    16384 00000000000000000100000000000000 Fingerprint Validation Error
    32768 00000000000000001000000000000000 Known Automation (curl, wget, Selenium/Webdriver, Phantomjs)
    65536 00000000000000010000000000000000 repeated Form Submission
    ```






  22. hsiboy created this gist May 12, 2014.
    20 changes: 20 additions & 0 deletions BotBuster.md
    @@ -0,0 +1,20 @@
    #Bot-Buster™

    Tracks nefarious activity on website, and manages accordingly.

    The requesting entity:
    * declares its user-agent as being wget, curl, webcopier etc - it's probably a bot.
    * requests details -> details -> details -> details ad nauseam - it's probably a bot.
    * requests the html, but not .css, .js or site furniture - it's probably a bot.
    * generates a large number of HTTP error codes >= 400 (e.g. 401, 403, 404 & 500) - it's probably a bot.
    * originates from an unlikely human traffic source (e.g. Amazon AWS) - it's probably a bot.
    * no user-agent (or matching a pattern of known bad ones) - it's probably a bot.
    * no cookie, and won't honor a set cookie - it's probably a bot.
    * no referrer, ever - it's probably a bot.

    Probable bots will be presented with a captcha type page. Humans can confirm their cognisance, bots will be trapped.

    This will work at the top of the stack using the ZTM to "manage" the offender.