Skip to content

Instantly share code, notes, and snippets.

@dannguyen
Last active August 3, 2018 20:51
Show Gist options
  • Save dannguyen/d83f27a93b2e6f80edda22cfa0f0a1d6 to your computer and use it in GitHub Desktop.
Save dannguyen/d83f27a93b2e6f80edda22cfa0f0a1d6 to your computer and use it in GitHub Desktop.
example of using Python's xlrd library to do a batch process of opening a Excel workbook, extract the content from the first sheet as txt, and then save as CSV
"""
Same as the other script, but attempts to preserve the same folder
structure in `csv` as it exists in `xlsx`
"""
from csv import writer
from glob import glob
from os.path import basename, dirname, join
from os import makedirs
from xlrd import open_workbook
import re
CSV_DIRNAME = 'csv'
XLS_DIRNAME = 'xlsx'
for xfname in glob(join(XLS_DIRNAME, '*', '*.xlsx')):
cdir = re.sub(r'^' + XLS_DIRNAME, CSV_DIRNAME, dirname(xfname))
makedirs(cdir, exist_ok=True)
cname = join(cdir, basename(xfname) + '.csv')
print("writing to", cname)
cf = open(cname, 'w')
cv = writer(cf)
book = open_workbook(xfname)
sheet = book.sheets()[0]
for n in range(sheet.nrows):
rowvals = sheet.row_values(n)
cv.writerow(rowvals)
cf.close()
"""
This code processes each xlsx file in the xlsx/* path,
then writes the data (a list of values) into a single CSV folder (i.e. flat structure)
"""
from csv import writer
from glob import glob
from os.path import basename, join
from os import makedirs
from xlrd import open_workbook
import re
CSV_DIRNAME = 'csv'
makedirs(CSV_DIRNAME, exist_ok=True)
for xfname in glob(join('xlsx', '*', '*.xlsx')):
cname = join(CSV_DIRNAME, basename(xfname) + '.csv')
print("writing to", cname)
cf = open(cname, 'w')
cv = writer(cf)
book = open_workbook(xfname)
sheet = book.sheets()[0]
for n in range(sheet.nrows):
rowvals = sheet.row_values(n)
cv.writerow(rowvals)
cf.close()
@elaOnMars
Copy link

Hi, I recently tested it with these Gov open data files: https://open.nrw/de/dataset/bundesverkehrswegezahlung-ac

I created the folder xlsx and copied the xlsx into this subfolder.
The libraries are imported.

But only the csv folder was created.
Have you any idea which error I have made?

Best,
Daniela

@dannguyen
Copy link
Author

Sorry, only the CSV folder was created? Were all the files created too? Actually I had forgotten that I designed this script to read all the XLSX files in a given tree of directories, but save them to a single CSV folder. I can change that.

@dannguyen
Copy link
Author

@elaOnMars
Copy link

Thank you very much. I think I've forgotten a line before. Everything works well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment