@naqushab
Forked from jlln/separator.py
Created May 6, 2020 11:37

Revisions

  1. @jlln revised this gist Dec 7, 2015. No changes.
  2. @jlln revised this gist May 23, 2015. 1 changed file with 18 additions and 16 deletions.
    separator.py: 34 changes (18 additions, 16 deletions)
    @@ -1,16 +1,18 @@
    -def separator(df,col,sep):
    -    untouched_columns = [c for c in df.columns if c is not col]
    -    divided_column = map(lambda x: x.split(sep),df[col].values)
    -    divided_column = [item for sublist in divided_column for item in sublist]
    -    repeats = [len(x.split(",")) for x in df[col]]
    -    repeated_rows=[divided_column]
    -    for c in untouched_columns:
    -        working = zip(df[c].values,repeats)
    -        working_accum=[]
    -        for v,r in working:
    -            working_accum.append([v]*r)
    -        working_accum = [item for sublist in working_accum for item in sublist]
    -        repeated_rows.append(working_accum)
    -    new_names = [col]+untouched_columns
    -    series = [pandas.Series(data=d,name=n) for d,n in zip(repeated_rows,new_names)]
    -    return pandas.DataFrame(series).transpose()
    +def splitDataFrameList(df,target_column,separator):
    +    ''' df = dataframe to split,
    +    target_column = the column containing the values to split
    +    separator = the symbol used to perform the split
    +    returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
    +    The values in the other columns are duplicated across the newly divided rows.
    +    '''
    +    def splitListToRows(row,row_accumulator,target_column,separator):
    +        split_row = row[target_column].split(separator)
    +        for s in split_row:
    +            new_row = row.to_dict()
    +            new_row[target_column] = s
    +            row_accumulator.append(new_row)
    +    new_rows = []
    +    df.apply(splitListToRows,axis=1,args = (new_rows,target_column,separator))
    +    new_df = pandas.DataFrame(new_rows)
    +    return new_df
  3. @jlln created this gist Apr 18, 2015.
    separator.py: 16 changes (16 additions, 0 deletions)
    @@ -0,0 +1,16 @@
    +def separator(df,col,sep):
    +    untouched_columns = [c for c in df.columns if c is not col]
    +    divided_column = map(lambda x: x.split(sep),df[col].values)
    +    divided_column = [item for sublist in divided_column for item in sublist]
    +    repeats = [len(x.split(",")) for x in df[col]]
    +    repeated_rows=[divided_column]
    +    for c in untouched_columns:
    +        working = zip(df[c].values,repeats)
    +        working_accum=[]
    +        for v,r in working:
    +            working_accum.append([v]*r)
    +        working_accum = [item for sublist in working_accum for item in sublist]
    +        repeated_rows.append(working_accum)
    +    new_names = [col]+untouched_columns
    +    series = [pandas.Series(data=d,name=n) for d,n in zip(repeated_rows,new_names)]
    +    return pandas.DataFrame(series).transpose()
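The original column-wise `separator` shown in the revisions above has a few bugs. It never imports `pandas`. It filters columns with `c is not col`, an identity test where equality is meant. It also counts repeats with `x.split(",")` whatever `sep` is, so any other separator produces misaligned rows. Below is a minimal corrected sketch of that column-wise approach, not the gist author's code. It assumes every value in the target column is a string. It also builds the result from a dict instead of transposing a frame of Series; in the original, that transpose leaves every column with `object` dtype.

```python
import pandas as pd


def separator(df: pd.DataFrame, col: str, sep: str) -> pd.DataFrame:
    """Split df[col] on sep and repeat the other columns to match.

    Corrected sketch of the gist's column-wise approach: columns are
    compared with != instead of `is not`, and the repeat counts are
    derived from the same split (using sep) as the divided column.
    Assumes every value in df[col] is a string.
    """
    untouched_columns = [c for c in df.columns if c != col]

    # Split each cell once; reuse the pieces for both data and counts.
    pieces = [x.split(sep) for x in df[col]]
    repeats = [len(p) for p in pieces]
    divided_column = [item for sublist in pieces for item in sublist]

    data = {col: divided_column}
    for c in untouched_columns:
        # Repeat each value of column c once per piece from its row.
        data[c] = [v for v, r in zip(df[c], repeats) for _ in range(r)]

    # Building from a dict keeps per-column dtypes (no transpose needed).
    return pd.DataFrame(data, columns=[col] + untouched_columns)
```

For example, `separator(pd.DataFrame({"tags": ["a,b", "c"], "n": [1, 2]}), "tags", ",")` yields three rows, with `n` repeated as 1, 1, 2.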
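The May 2015 revision replaces that function with a row-wise `splitDataFrameList`. It still uses `pandas` without importing it, and it gathers the new rows as a side effect of `DataFrame.apply`. The same row-wise idea can be sketched with an explicit loop over `DataFrame.iterrows`. The name `split_dataframe_list` is hypothetical, and the sketch assumes the target column holds strings.

```python
import pandas as pd


def split_dataframe_list(df: pd.DataFrame, target_column: str,
                         separator: str) -> pd.DataFrame:
    """Row-wise split in the style of the gist's splitDataFrameList.

    Each value in target_column is split on separator; every piece gets
    its own row, and the other columns are copied onto each new row.
    Hypothetical helper name; assumes target_column holds strings.
    """
    new_rows = []
    for _, row in df.iterrows():
        for piece in row[target_column].split(separator):
            new_row = row.to_dict()          # copy the other columns
            new_row[target_column] = piece   # replace with a single piece
            new_rows.append(new_row)
    # columns=df.columns keeps the original order, even for an empty df.
    return pd.DataFrame(new_rows, columns=df.columns)
```

Passing `columns=df.columns` keeps the original column order and still returns the expected columns when `df` has no rows.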
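On pandas 0.25 or later, the reshaping performed by both functions in this gist can also be done with the built-in `Series.str.split` and `DataFrame.explode`. This is a swapped-in technique, not what the gist uses, and the helper name `split_with_explode` is hypothetical. The sketch assumes a single-character `sep`. Depending on the pandas version, `str.split` may treat longer patterns as regular expressions.

```python
import pandas as pd


def split_with_explode(df: pd.DataFrame, col: str, sep: str) -> pd.DataFrame:
    """One row per piece of df[col] split on sep, via DataFrame.explode.

    Hypothetical helper; requires pandas >= 0.25 (DataFrame.explode).
    Assumes a single-character sep so str.split treats it literally.
    """
    return (
        df.assign(**{col: df[col].str.split(sep)})  # cells become lists
        .explode(col)                                # one row per list item
        .reset_index(drop=True)                      # fresh 0..n-1 index
    )
```

`reset_index(drop=True)` gives the result a fresh index, because `explode` repeats the original index label for every piece.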