@macloo
Created February 22, 2024 18:54

Revisions

  1. macloo created this gist Feb 22, 2024.
    13 changes: 13 additions & 0 deletions scraper_boilerplate.py
    @@ -0,0 +1,13 @@
    from bs4 import BeautifulSoup
    import requests
    hdr = {'User-Agent': 'your user-agent info here'}
    # find YOUR user-agent HERE: https://www.whatismybrowser.com/detect/what-is-my-user-agent/

    url = 'https://www.some_domain.com/some_dir'
    page = requests.get(url, headers=hdr)
    soup = BeautifulSoup(page.text, 'html.parser')

    '''
    To scrape a list of URLs, loop over the list and create a new
    page and a new soup on each pass through the loop.
    '''
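
The closing note in the boilerplate describes looping over a list of URLs and rebuilding page and soup each time. A minimal sketch of that loop follows. It is not part of the original gist: the function name `scrape_titles`, the `session` and `delay` parameters, the 10-second timeout, and the choice to collect each page's title are all assumptions made for illustration.

```python
import time

import requests
from bs4 import BeautifulSoup

# Same placeholder header as the boilerplate; substitute your own user-agent.
hdr = {'User-Agent': 'your user-agent info here'}


def scrape_titles(urls, session=None, delay=1.0):
    """Fetch each URL in turn and return a dict mapping URL to page title.

    Hypothetical helper: `session` and `delay` are illustrative parameters,
    not something the gist defines.
    """
    if session is None:
        session = requests.Session()

    titles = {}
    for i, url in enumerate(urls):
        # Pause between requests (not before the first) to be polite.
        if i > 0 and delay:
            time.sleep(delay)

        # Make a new page and a new soup on every pass through the loop.
        page = session.get(url, headers=hdr, timeout=10)
        page.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(page.text, 'html.parser')

        # Pages without a <title> element are recorded as None.
        titles[url] = soup.title.get_text(strip=True) if soup.title else None
    return titles
```

Calling `scrape_titles(['https://www.some_domain.com/some_dir'])` with real URLs would return one entry per page. Replace the title lookup with whatever extraction your own scraper needs; the part that matters is that `page` and `soup` are rebuilt inside the loop.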