@tomron
Last active August 15, 2021 06:57

Revisions

  1. tomron revised this gist Aug 15, 2021. 1 changed file with 1 addition and 4 deletions.
    5 changes: 1 addition & 4 deletions missing_values_read_csv.py
    @@ -2,10 +2,7 @@
     import numpy as np
     import time
     
    -url = """
    -http://archive.ics.uci.edu/ml/machine-learning-databases/
    -mammographic-masses/mammographic_masses.data
    -"""
    +url = "http://archive.ics.uci.edu/ml/machine-learning-databases/mammographic-masses/mammographic_masses.data"
     names = ['BI-RADS', 'Age', 'Shape', 'Margin', 'Density', 'Severity']


  2. tomron created this gist Aug 15, 2021.
    42 changes: 42 additions & 0 deletions missing_values_read_csv.py
    @@ -0,0 +1,42 @@
    +import pandas as pd
    +import numpy as np
    +import time
    +
    +url = """
    +http://archive.ics.uci.edu/ml/machine-learning-databases/
    +mammographic-masses/mammographic_masses.data
    +"""
    +names = ['BI-RADS', 'Age', 'Shape', 'Margin', 'Density', 'Severity']
    +
    +
    +def manual_convert():
    +    df = pd.read_csv(url, names=names)
    +    df = df.replace('?', np.NAN)
    +    df.loc[:, names[:-1]] = df.loc[:, names[:-1]].apply(pd.to_numeric)
    +
    +
    +def use_na_values():
    +    df = pd.read_csv(url, names=names, na_values=["?"])
    +
    +
    +def use_converters():
    +    df = pd.read_csv(
    +        url,
    +        names=names,
    +        converters={"BI-RADS": lambda x: x if x != "?" else np.NAN}
    +    )
    +
    +
    +def repeat(func, n=10):
    +    times = []
    +    for _ in range(n):
    +        start = time.time()
    +        func()
    +        end = time.time()
    +        times.append(end-start)
    +    return sum(times)/len(times)
    +
    +n = 100
    +print("manual_convert", repeat(manual_convert, n))
    +print("use_na_values", repeat(use_na_values, n))
    +print("use_converters", repeat(use_converters, n))
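
The script above downloads the file on every call, so its timings include network latency as well as parsing. Below is a minimal sketch of an offline variant, not the gist's own code: it parses an in-memory string so that only `read_csv` and the conversion step are timed. `SAMPLE` is a handful of made-up rows in the same six-column layout, and `NAMES`, `average_seconds`, and the `text` parameter are illustrative names. The sketch departs from the script in four places, each flagged in a comment: it uses `np.nan` because the `np.NAN` alias was removed in NumPy 2.0, it assigns the converted columns with `df[cols] = ...` instead of `df.loc[...]`, it applies the converter to every feature column so that the three functions return comparable frames, and it times with `time.perf_counter`.

```python
import io
import time

import numpy as np
import pandas as pd

NAMES = ['BI-RADS', 'Age', 'Shape', 'Margin', 'Density', 'Severity']

# Hypothetical rows in the dataset's layout; "?" marks a missing value.
SAMPLE = "\n".join([
    "5,67,3,5,3,1",
    "4,43,1,1,?,1",
    "?,58,4,5,3,1",
    "4,?,1,1,3,0",
    "5,74,?,?,?,1",
]) + "\n"


def manual_convert(text):
    """Read everything, then replace "?" and convert the feature columns."""
    df = pd.read_csv(io.StringIO(text), names=NAMES)
    df = df.replace('?', np.nan)  # np.nan: the np.NAN alias is gone in NumPy 2.0
    cols = NAMES[:-1]
    # Assumption: plain column assignment, so the new numeric dtype is kept.
    df[cols] = df[cols].apply(pd.to_numeric)
    return df


def use_na_values(text):
    """Let read_csv treat "?" as missing while parsing."""
    return pd.read_csv(io.StringIO(text), names=NAMES, na_values=["?"])


def to_float_or_nan(value):
    """Converter: turn one raw cell string into a float, or NaN for "?"."""
    return float(value) if value != "?" else np.nan


def use_converters(text):
    """Apply a converter to every feature column (the gist covers BI-RADS only)."""
    converters = {name: to_float_or_nan for name in NAMES[:-1]}
    return pd.read_csv(io.StringIO(text), names=NAMES, converters=converters)


def average_seconds(func, text, n=10):
    """Average wall-clock seconds per call over n calls."""
    start = time.perf_counter()  # monotonic clock intended for short timings
    for _ in range(n):
        func(text)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    text = SAMPLE * 200  # repeat the rows so parsing takes measurable time
    for func in (manual_convert, use_na_values, use_converters):
        print(func.__name__, average_seconds(func, text, n=20))
```

To time the real dataset the same way, one option is to download it once, keep the response text in a variable, and pass that text to the three functions in place of `SAMPLE`.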