Revisions
AO8 revised this gist
Jun 21, 2018. 1 changed file with 1 addition and 2 deletions.
@@ -1,5 +1,4 @@
# Adapted from example in "Web Scraping with Python, 2nd Edition" by Ryan Mitchell.
import csv
from urllib.request import urlopen
AO8 created this gist
Jun 19, 2018.
@@ -0,0 +1,20 @@
# Extra indent on page 88 leads to incorrect printout of rows, fixed below
# Use of try / finally on page 88 not as Pythonic as a "with" context manager, suggested below
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://en.wikipedia.org/wiki/"
               "Comparison_of_text_editors")
soup = BeautifulSoup(html, "html.parser")
table = soup.findAll("table", {"class":"wikitable"})[0]
rows = table.findAll("tr")

with open("editors.csv", "wt+", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        csv_row = []
        for cell in row.findAll(["td", "th"]):
            csv_row.append(cell.get_text())
        writer.writerow(csv_row)
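The two comments at the top of the gist can be illustrated without touching the network. The sketch below is not the book's code: it assumes the "extra indent" put writer.writerow inside the inner cell loop, and it uses a hypothetical SAMPLE_ROWS list and an in-memory io.StringIO buffer in place of the Wikipedia table and editors.csv. All function names are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for the scraped table: each inner list is one row's cell text.
SAMPLE_ROWS = [["Name", "License"], ["Vim", "Vim License"]]


def write_rows_fixed(rows, f):
    """Write each row once, after all of its cells have been collected."""
    writer = csv.writer(f)
    for row in rows:
        csv_row = []
        for cell in row:
            csv_row.append(cell)
        writer.writerow(csv_row)


def write_rows_extra_indent(rows, f):
    """Assumed form of the bug: writerow is indented into the inner loop."""
    writer = csv.writer(f)
    for row in rows:
        csv_row = []
        for cell in row:
            csv_row.append(cell)
            writer.writerow(csv_row)  # runs once per cell, not once per row


def to_csv_text_with(write_fn, rows):
    """Context-manager style: the buffer is closed when the block exits."""
    with io.StringIO() as buf:
        write_fn(rows, buf)
        return buf.getvalue()


def to_csv_text_try_finally(write_fn, rows):
    """try / finally style: same behavior, with the close spelled out by hand."""
    buf = io.StringIO()
    try:
        write_fn(rows, buf)
        return buf.getvalue()
    finally:
        buf.close()


fixed_lines = to_csv_text_with(write_rows_fixed, SAMPLE_ROWS).splitlines()
buggy_lines = to_csv_text_with(write_rows_extra_indent, SAMPLE_ROWS).splitlines()

print(fixed_lines)
print(buggy_lines)
```

Under that assumption, the mis-indented version emits a growing partial row after every cell, while the fixed version emits one line per table row. The two helper styles return identical text; the "with" form simply removes the need to remember the explicit close.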