@felipecaon
Last active November 23, 2023 16:53

Revisions

  1. felipecaon revised this gist Nov 23, 2023. 2 changed files with 37 additions and 22 deletions.
    37 changes: 37 additions & 0 deletions alienvaultscraper.py
    @@ -0,0 +1,37 @@
    import argparse
    import requests

    # Define the API endpoint and parameters
    base_url = "https://otx.alienvault.com/api/v1/indicators/domain/{domain}/url_list?limit=500&page={page}"


    def make_request(domain, page):
        while True:
            formatted_url = base_url.format(domain=domain, page=page)
            response = requests.get(formatted_url, timeout=30)
            response.raise_for_status()  # stop on HTTP errors (e.g. rate limiting) instead of parsing an error body
            data = response.json()

            for url_info in data.get("url_list", []):
                print(url_info["url"])

            if not data.get("has_next"):
                break  # Exit the loop when has_next is False

            page = page + 1


    if __name__ == "__main__":
        # Initialize argument parser
        parser = argparse.ArgumentParser(description="Fetch URLs associated with a domain from AlienVault OTX")

        # Add domain argument
        parser.add_argument("domain", type=str, help="The domain for which to fetch URLs (e.g., qoo10.jp)")

        # Parse command-line arguments
        args = parser.parse_args()

        make_request(domain=args.domain, page=1)
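The paging logic in `make_request` can be pulled out of the HTTP call so it can be exercised without hitting OTX. A minimal sketch, assuming every page keeps the `url_list`/`has_next` shape the script already reads; `iter_otx_urls` and `fetch_page` are hypothetical names, not part of the gist:

```python
from typing import Callable, Iterator


def iter_otx_urls(fetch_page: Callable[[int], dict], start_page: int = 1) -> Iterator[str]:
    """Yield every "url" from each page's url_list until has_next is falsy.

    fetch_page (hypothetical) takes a page number and returns the decoded JSON
    body for that page, e.g. a wrapper around requests.get(...).json().
    """
    page = start_page
    while True:
        data = fetch_page(page)
        for url_info in data.get("url_list", []):
            yield url_info["url"]
        if not data.get("has_next"):
            return  # last page reached, stop without requesting another one
        page += 1


if __name__ == "__main__":
    # Two fake pages standing in for the OTX API, so no network is needed.
    fake_pages = {
        1: {"url_list": [{"url": "https://example.com/a"}], "has_next": True},
        2: {"url_list": [{"url": "https://example.com/b"}], "has_next": False},
    }
    for url in iter_otx_urls(fake_pages.__getitem__):
        print(url)  # prints https://example.com/a, then https://example.com/b
```

In the script, `fetch_page` could be something like `lambda p: requests.get(base_url.format(domain=domain, page=p), timeout=30).json()`. Returning a generator instead of printing also lets the caller dedupe the URLs or write them to a file.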
    22 changes: 0 additions & 22 deletions gistfile1.txt
    @@ -1,22 +0,0 @@
    # get program zips, skipping the boring ones
    curl https://chaos-data.projectdiscovery.io/index.json | jq '.[] | select((.platform=="hackerone") or (.platform=="bugcrowd") or (.platform=="intigriti")).URL' | sed 's/"//g' | grep -v "telenet\|zendesk\|tumblr\|shopify\|alibaba\|cisco\|wp_engine\|vk\.com\|hubspot\|mail\.ru" >> chaos

    # download zips in parallel
    cat chaos | parallel -j 5 wget

    # unzip in parallel
    parallel unzip ::: *zip

    # remove zips and concatenate txt
    rm *zip ; cat *.txt >> subdomains

    # remove txt files
    rm *.txt

    # download resolvers
    wget https://raw.githubusercontent.com/felipecaon/resolvers/main/resolvers.txt -O resolvers

    # active validation
    puredns resolve subdomains -r resolvers --write resolved_dns_domain
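`cat chaos | parallel -j 5 wget` keeps at most five downloads running at a time. The same bounded-concurrency idea in Python is a thread pool with `max_workers=5`. A minimal sketch; `download_all` and `save_to_cwd` are hypothetical helpers, and the downloader only approximates wget's default naming (last path segment, no `.1` suffixes on clashes):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List
from urllib.request import urlretrieve


def download_all(urls: Iterable[str], download: Callable[[str], str], jobs: int = 5) -> List[str]:
    """Call download(url) for every URL, with at most `jobs` calls in flight.

    Results come back in input order; an exception raised by a download is
    re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(download, urls))


def save_to_cwd(url: str) -> str:
    """Save url in the current directory under its last path segment."""
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    urlretrieve(url, filename)
    return filename
```

Usage would be along the lines of `download_all(Path("chaos").read_text().split(), save_to_cwd)`. Threads are enough here because the work is network-bound, not CPU-bound.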


  2. felipecaon revised this gist Feb 10, 2022. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions gistfile1.txt
    @@ -13,5 +13,10 @@ rm *zip ; cat *.txt >> subdomains
    # remove txt files
    rm *.txt

    # download resolvers
    wget https://raw.githubusercontent.com/felipecaon/resolvers/main/resolvers.txt -O resolvers

    # active validation
    puredns resolve subdomains -r resolvers --write resolved_dns_domain
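`puredns resolve` checks every name against the downloaded resolvers (it drives massdns) and also filters out wildcard DNS answers. The sketch below only illustrates the basic "keep what resolves" idea and swaps in the operating system's resolver via `socket.getaddrinfo`: it ignores the `resolvers` list, does no wildcard filtering, and is much slower than puredns. `keep_resolving` and `system_resolves` are hypothetical helpers:

```python
import socket
from typing import Callable, Iterable, List


def system_resolves(name: str) -> bool:
    """True if the OS resolver returns at least one address for name (not puredns's method)."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except (OSError, ValueError):
        # lookup failure (gaierror is an OSError), or a name that cannot be encoded, e.g. an empty label
        return False


def keep_resolving(names: Iterable[str], resolves: Callable[[str], bool] = system_resolves) -> List[str]:
    """Lowercase and dedupe names (first occurrence wins), then keep those that resolve.

    Lookups run one at a time; this only illustrates the filtering step.
    """
    seen = set()
    kept = []
    for raw in names:
        name = raw.strip().lower()
        if not name or name in seen:
            continue
        seen.add(name)
        if resolves(name):
            kept.append(name)
    return kept
```

Passing the resolver in as a callable keeps the filter testable offline; for a real run it would be `keep_resolving(Path("subdomains").read_text().splitlines())`.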


  3. felipecaon revised this gist Feb 10, 2022. 1 changed file with 16 additions and 0 deletions.
    16 changes: 16 additions & 0 deletions gistfile1.txt
    @@ -1 +1,17 @@
    # get program zips, skipping the boring ones
    curl https://chaos-data.projectdiscovery.io/index.json | jq '.[] | select((.platform=="hackerone") or (.platform=="bugcrowd") or (.platform=="intigriti")).URL' | sed 's/"//g' | grep -v "telenet\|zendesk\|tumblr\|shopify\|alibaba\|cisco\|wp_engine\|vk\.com\|hubspot\|mail\.ru" >> chaos

    # download zips in parallel
    cat chaos | parallel -j 5 wget

    # unzip in parallel
    parallel unzip ::: *zip

    # remove zips and concatenate txt
    rm *zip ; cat *.txt >> subdomains

    # remove txt files
    rm *.txt
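The unzip, concatenate and `rm` steps can be collapsed by reading the `.txt` members straight out of each archive with the standard `zipfile` module, so no intermediate files are left behind. A minimal sketch; `merge_zipped_lists` is a hypothetical helper, it assumes the lists inside the zips are plain `.txt` files, and unlike `cat` it drops blank lines:

```python
import zipfile
from pathlib import Path
from typing import Iterable


def merge_zipped_lists(zip_paths: Iterable[Path], out_path: Path) -> int:
    """Append every non-blank line of every .txt member of each zip to out_path.

    Opens out_path in append mode, like `cat *.txt >> subdomains`.
    Returns the number of lines written.
    """
    written = 0
    with out_path.open("a", encoding="utf-8") as out:
        for zip_path in zip_paths:
            with zipfile.ZipFile(zip_path) as zf:
                for member in zf.namelist():
                    if not member.endswith(".txt"):
                        continue  # ignore anything that is not a subdomain list
                    text = zf.read(member).decode("utf-8", errors="replace")
                    for line in text.splitlines():
                        line = line.strip()
                        if line:
                            out.write(line + "\n")
                            written += 1
    return written
```

For the gist's layout this would be called as `merge_zipped_lists(sorted(Path(".").glob("*.zip")), Path("subdomains"))`, after which only the zips need removing.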



  4. felipecaon revised this gist Feb 10, 2022. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion gistfile1.txt
    @@ -1 +1 @@
    -curl https://chaos-data.projectdiscovery.io/index.json | jq '.[] | select((.platform=="hackerone") or (.platform=="bugcrowd") or (.platform=="intigriti")).URL' | sed 's/"//g'
    +curl https://chaos-data.projectdiscovery.io/index.json | jq '.[] | select((.platform=="hackerone") or (.platform=="bugcrowd") or (.platform=="intigriti")).URL' | sed 's/"//g' | grep -v "telenet\|zendesk\|tumblr\|shopify\|alibaba\|cisco\|wp_engine\|vk\.com\|hubspot\|mail\.ru" >> chaos
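This revision adds a `grep -v` exclusion list. The same filter in Python, as a minimal sketch: `EXCLUDED` copies the names from the grep pattern, `re.escape` keeps the dots in `vk.com` and `mail.ru` literal (matching the `\.` escapes in the original), and matching is case-sensitive, as grep is without `-i`. `drop_excluded` is a hypothetical helper:

```python
import re
from typing import Iterable, List

# Names from the gist's grep -v pattern.
EXCLUDED = ["telenet", "zendesk", "tumblr", "shopify", "alibaba",
            "cisco", "wp_engine", "vk.com", "hubspot", "mail.ru"]

# One alternation regex; re.escape makes "." match only a literal dot.
EXCLUDE_RE = re.compile("|".join(re.escape(name) for name in EXCLUDED))


def drop_excluded(urls: Iterable[str]) -> List[str]:
    """Keep only URLs that contain none of the excluded names (like grep -v)."""
    return [url for url in urls if not EXCLUDE_RE.search(url)]
```

Keeping the names in a list rather than one long escaped pattern makes it easier to add or drop programs later.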
  5. felipecaon revised this gist Feb 10, 2022. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion gistfile1.txt
    @@ -1 +1 @@
    curl https://chaos-data.projectdiscovery.io/index.json | jq '.[] | select((.platform=="hackerone") or (.platform=="bugcrowd") or (.platform=="intigriti")).URL' | sed 's/"//g'
    curl https://chaos-data.projectdiscovery.io/index.json | jq '.[] | select((.platform=="hackerone") or (.platform=="bugcrowd") or (.platform=="intigriti")).URL' | sed 's/"//g'
  6. felipecaon created this gist Feb 10, 2022.
    1 change: 1 addition & 0 deletions gistfile1.txt
    @@ -0,0 +1 @@
    curl https://chaos-data.projectdiscovery.io/index.json | jq '.[] | select((.platform=="hackerone") or (.platform=="bugcrowd") or (.platform=="intigriti")).URL' | sed 's/"//g'
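The original one-liner keeps the `index.json` entries whose `platform` is hackerone, bugcrowd or intigriti and prints their `URL` field; `sed 's/"//g'` strips the JSON quotes (`jq -r` would print raw strings directly). A minimal Python sketch of the same filter, assuming the index is a JSON array of objects with `platform` and `URL` keys, as the jq expression implies; `zip_urls` is a hypothetical helper:

```python
import json
from typing import List

# Platforms kept by the jq select(...) in the gist.
PLATFORMS = {"hackerone", "bugcrowd", "intigriti"}


def zip_urls(index_json: str) -> List[str]:
    """Return the URL of every index entry whose platform is in PLATFORMS.

    json.loads yields plain strings, so no quote stripping (the sed step) is needed.
    Entries without a URL are skipped instead of producing "null" as jq would.
    """
    entries = json.loads(index_json)
    return [
        entry["URL"]
        for entry in entries
        if entry.get("platform") in PLATFORMS and entry.get("URL")
    ]
```

The JSON text could come from `urllib.request.urlopen("https://chaos-data.projectdiscovery.io/index.json").read().decode()`.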