import os


def split(filehandler, delimiter=',', row_limit=10000,
          output_name_template='output_%s.csv', output_path='.', keep_headers=True):
    """
    Splits a CSV file into multiple pieces.

    A quick bastardization of the Python CSV library.

    Arguments:
        `row_limit`: The number of rows you want in each output file. 10,000 by default.
        `output_name_template`: A %s-style template for the numbered output files.
        `output_path`: Where to stick the output files.
        `keep_headers`: Whether or not to print the headers in each output file.

    Example usage:
        >> from toolbox import csv_splitter;
        >> csv_splitter.split(open('/home/ben/input.csv', 'r'));
    """
    import csv
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    current_out_writer = csv.writer(open(current_out_path, 'w'), delimiter=delimiter)
    current_limit = row_limit
    if keep_headers:
        headers = reader.next()
        current_out_writer.writerow(headers)
    for i, row in enumerate(reader):
        if i + 1 > current_limit:
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = os.path.join(
                output_path,
                output_name_template % current_piece
            )
            current_out_writer = csv.writer(open(current_out_path, 'w'), delimiter=delimiter)
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)

Adding a working solution for Python 3, with an explicit encoding.
`import os


def split(filehandler, delimiter=',', row_limit=8500,
          output_name_template='output_%s.csv', output_path='.', keep_headers=True):
    """
    Splits a CSV file into multiple pieces.

    A quick bastardization of the Python CSV library.

    Arguments:
        `row_limit`: The number of rows you want in each output file. 8,500 by default.
        `output_name_template`: A %s-style template for the numbered output files.
        `output_path`: Where to stick the output files.
        `keep_headers`: Whether or not to print the headers in each output file.

    Example usage:
        >> from toolbox import csv_splitter;
        >> csv_splitter.split(open('/home/ben/input.csv', 'r'));
    """
    import csv
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    current_out_writer = csv.writer(open(current_out_path, 'w', encoding='utf-8'), delimiter=delimiter)
    current_limit = row_limit
    if keep_headers:
        headers = next(reader)
        current_out_writer.writerow(headers)
    for i, row in enumerate(reader):
        if i + 1 > current_limit:
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = os.path.join(
                output_path,
                output_name_template % current_piece
            )
            current_out_writer = csv.writer(open(current_out_path, 'w', encoding='utf-8'), delimiter=delimiter)
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)


split(open('test.csv', 'r', encoding='utf-8'))`
Thanks for the starting point! After looking at your code, I was able to write a single-file program that splits a CSV by column: https://github.com/APAHRoot/HelpfulHopeful
Hopefully it works for other people too!
This is amazing and is exactly what I was looking for. Thank you so much.
@alternateaccounts Can you test this tool with your 4.5 GB file?
https://github.com/BurntSushi/xsv
Thank you for the mention in your project.
Sorry to ask, I am new to the field: where should I specify the file I want to split into smaller files? In which part of the code?
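In the Python 3 version above, the file to split is the path passed to `open()` on the last line, `split(open('test.csv', ...))`. Here is a minimal, self-contained sketch of the same idea that takes the path directly. Note that `split_csv`, its parameters, and the sample file names are placeholders made up for this example, not part of the gist, and the function is a simplified variant rather than the gist's own code.

```python
import csv
import os
import tempfile


def split_csv(input_path, output_dir, row_limit=10000,
              output_name_template='output_%s.csv', keep_headers=True):
    """Split input_path into numbered pieces of at most row_limit data rows.

    Hypothetical helper for illustration; returns the list of paths written.
    """
    written = []
    out_file = None
    writer = None
    with open(input_path, 'r', encoding='utf-8', newline='') as source:
        reader = csv.reader(source)
        # Read the header row once so it can be repeated in every piece.
        headers = next(reader, None) if keep_headers else None
        for i, row in enumerate(reader):
            if i % row_limit == 0:
                # Start a new piece: close the previous file, open the next one.
                if out_file is not None:
                    out_file.close()
                name = output_name_template % (len(written) + 1)
                path = os.path.join(output_dir, name)
                out_file = open(path, 'w', encoding='utf-8', newline='')
                writer = csv.writer(out_file)
                written.append(path)
                if headers is not None:
                    writer.writerow(headers)
            writer.writerow(row)
    if out_file is not None:
        out_file.close()
    return written


# Build a small sample file so the example runs anywhere.
work_dir = tempfile.mkdtemp()
input_path = os.path.join(work_dir, 'input.csv')
with open(input_path, 'w', encoding='utf-8', newline='') as f:
    rows = [['id', 'name']] + [[n, 'row%d' % n] for n in range(5)]
    csv.writer(f).writerows(rows)

# This is the line to adapt: put your own file path and row limit here.
pieces = split_csv(input_path, work_dir, row_limit=2)
print(pieces)
```

To use it on your own data, replace `input_path` and `work_dir` in the last call with the path of your CSV and the folder where the pieces should go.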
Add `newline=''` to the `open()` calls to avoid blank rows in the output.
@hoai97nam Add `encoding='utf-8'` as a parameter to `open()`, for example:
open(current_out_path, 'w', encoding='utf-8')
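The two suggestions above can be combined in a single `open()` call. A small sketch, assuming Python 3; the file name `piece.csv` and the sample rows are made up for illustration. `csv.writer` ends each row with `\r\n` by default, and `newline=''` keeps the text layer from translating that a second time, which is what produces the blank rows on Windows.

```python
import csv
import os
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), 'piece.csv')

# newline='' leaves the '\r\n' written by csv.writer untouched;
# encoding='utf-8' makes the output independent of the platform default.
with open(out_path, 'w', encoding='utf-8', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(['id', 'name'])
    writer.writerow([1, 'café'])

# Read the raw bytes back to see exactly what ended up on disk.
with open(out_path, 'rb') as raw:
    data = raw.read()
print(data)
```

Using a `with` block also closes each output file promptly, which the gist's bare `open()` calls leave to the interpreter.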