Skip to content

Instantly share code, notes, and snippets.

@lshapz
Last active March 12, 2018 17:29
Show Gist options
  • Save lshapz/1ab5d5a954de7246fae61477bb0aef43 to your computer and use it in GitHub Desktop.
Save lshapz/1ab5d5a954de7246fae61477bb0aef43 to your computer and use it in GitHub Desktop.
workflow for revising the header row of an enormous CSV file in preparation for JSON conversion
Problem:
  • 100000 line CSV file.
  • header row includes unnecessary type definitions in it (i.e. "header1:String,header2:Decimal,header3:INT32")
  • CSV is too large to open in Numbers/Google Sheets for simple find/replace
Solution:
  • get only the header row
head -n 1 orig.csv > headers.csv
  • open headers.csv, use find and replace to delte all the ":String" cruft

  • copy the revised row into a text editor, put quotation marks around all the strings, copy again

  • get everything but the header row

tail -n +2 orig.csv > headless.csv
  • insert the new header row into the csv file
( echo "headers","as","comma","delimited","strings" ; cat headless.csv ) > final.csv && rm headless.csv
  • convert csv file into a json file
csvtojson final.csv > converted.json
Tools:
  • bash (MacOS command line/terminal)

  • a spreadsheet app (MacOS Numbers.app)

  • csvtojson npm package

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment