Last active
August 3, 2018 20:51
-
-
Save dannguyen/d83f27a93b2e6f80edda22cfa0f0a1d6 to your computer and use it in GitHub Desktop.
example of using Python's xlrd library to do a batch process of opening a Excel workbook, extract the content from the first sheet as txt, and then save as CSV
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| """ | |
| Same as the other script, but attempts to preserve the same folder | |
| structure in `csv` as it exists in `xlsx` | |
| """ | |
| from csv import writer | |
| from glob import glob | |
| from os.path import basename, dirname, join | |
| from os import makedirs | |
| from xlrd import open_workbook | |
| import re | |
| CSV_DIRNAME = 'csv' | |
| XLS_DIRNAME = 'xlsx' | |
| for xfname in glob(join(XLS_DIRNAME, '*', '*.xlsx')): | |
| cdir = re.sub(r'^' + XLS_DIRNAME, CSV_DIRNAME, dirname(xfname)) | |
| makedirs(cdir, exist_ok=True) | |
| cname = join(cdir, basename(xfname) + '.csv') | |
| print("writing to", cname) | |
| cf = open(cname, 'w') | |
| cv = writer(cf) | |
| book = open_workbook(xfname) | |
| sheet = book.sheets()[0] | |
| for n in range(sheet.nrows): | |
| rowvals = sheet.row_values(n) | |
| cv.writerow(rowvals) | |
| cf.close() |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| """ | |
| This code processes each xlsx file in the xlsx/* path, | |
| then writes the data (a list of values) into a single CSV folder (i.e. flat structure) | |
| """ | |
| from csv import writer | |
| from glob import glob | |
| from os.path import basename, join | |
| from os import makedirs | |
| from xlrd import open_workbook | |
| import re | |
| CSV_DIRNAME = 'csv' | |
| makedirs(CSV_DIRNAME, exist_ok=True) | |
| for xfname in glob(join('xlsx', '*', '*.xlsx')): | |
| cname = join(CSV_DIRNAME, basename(xfname) + '.csv') | |
| print("writing to", cname) | |
| cf = open(cname, 'w') | |
| cv = writer(cf) | |
| book = open_workbook(xfname) | |
| sheet = book.sheets()[0] | |
| for n in range(sheet.nrows): | |
| rowvals = sheet.row_values(n) | |
| cv.writerow(rowvals) | |
| cf.close() |
Sorry, only the CSV folder was created? Were all the files created too? Actually I had forgotten that I designed this script to read all the XLSX files in a given tree of directories, but save them to a single CSV folder. I can change that.
Thank you very much. I think I've forgotten a line before. Everything works well!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi, I recently tested it with these Gov open data files: https://open.nrw/de/dataset/bundesverkehrswegezahlung-ac
I created the folder xlsx and copied the xlsx into this subfolder.
The libraries are imported.
But only the csv folder was created.
Have you any idea which error I have made?
Best,
Daniela