Skip to content

Instantly share code, notes, and snippets.

@mapcloud
Forked from aculich/remove-empty-columns.csv
Created July 28, 2017 13:30
Show Gist options
  • Select an option

  • Save mapcloud/c8aeaba8f6733491d46cfc2c9e263a40 to your computer and use it in GitHub Desktop.

Select an option

Save mapcloud/c8aeaba8f6733491d46cfc2c9e263a40 to your computer and use it in GitHub Desktop.

Revisions

  1. Aaron Culich revised this gist May 8, 2014. 2 changed files with 2 additions and 2 deletions.
    2 changes: 1 addition & 1 deletion remove-empty-columns.ipynb
    Original file line number Diff line number Diff line change
    @@ -115,7 +115,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "Using the **`dropna()`** function with the **columns** axis we can drop any column where **all** the entries are **NaN** (missing values)."
    "Using the [pandas.DataFrame.dropna()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html) function with the **columns** axis we can drop any column where **all** the entries are **NaN** (missing values)."
    ]
    },
    {
    2 changes: 1 addition & 1 deletion remove-empty-columns.py
    Original file line number Diff line number Diff line change
    @@ -24,7 +24,7 @@

    # <markdowncell>

    # Using the **`dropna()`** function with the **columns** axis we can drop any column where **all** the entries are **NaN** (missing values).
    # Using the [pandas.DataFrame.dropna()](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html) function with the **columns** axis we can drop any column where **all** the entries are **NaN** (missing values).

    # <codecell>

  2. Aaron Culich revised this gist May 8, 2014. 3 changed files with 233 additions and 1 deletion.
    6 changes: 6 additions & 0 deletions remove-empty-columns.csv
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,6 @@
    foo,bar,baz
    a,,1
    b,,2
    c,,
    ,,4
    e,,5
    194 changes: 194 additions & 0 deletions remove-empty-columns.ipynb
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,194 @@
    {
    "metadata": {
    "name": ""
    },
    "nbformat": 3,
    "nbformat_minor": 0,
    "worksheets": [
    {
    "cells": [
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "To drop all empty columns (but still keeping the headers) using the Python [Pandas library](http://pandas.pydata.org/) we can use the following 4-line script to read in the csv file, drop the columns where **all** the elements are missing, and save the data to a new csv file."
    ]
    },
    {
    "cell_type": "code",
    "collapsed": false,
    "input": [
    "from pandas.io.parsers import read_csv\n",
    "data = read_csv('remove-empty-columns.csv')\n",
    "filtered_data = data.dropna(axis='columns', how='all')\n",
    "filtered_data.to_csv('empty-columns-removed.csv')"
    ],
    "language": "python",
    "metadata": {},
    "outputs": [],
    "prompt_number": 1
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "As shown below, the sample data included in the csv file has 3 columns which contain missing values.\n",
    "\n",
    "The second column, labeled **bar**, is completely empty except the header; columns like this should be dropped. The other columns contain data, but should not be dropped even though they contain some missing values."
    ]
    },
    {
    "cell_type": "code",
    "collapsed": false,
    "input": [
    "data"
    ],
    "language": "python",
    "metadata": {},
    "outputs": [
    {
    "html": [
    "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
    "<table border=\"1\" class=\"dataframe\">\n",
    " <thead>\n",
    " <tr style=\"text-align: right;\">\n",
    " <th></th>\n",
    " <th>foo</th>\n",
    " <th>bar</th>\n",
    " <th>baz</th>\n",
    " </tr>\n",
    " </thead>\n",
    " <tbody>\n",
    " <tr>\n",
    " <th>0</th>\n",
    " <td> a</td>\n",
    " <td>NaN</td>\n",
    " <td> 1</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>1</th>\n",
    " <td> b</td>\n",
    " <td>NaN</td>\n",
    " <td> 2</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>2</th>\n",
    " <td> c</td>\n",
    " <td>NaN</td>\n",
    " <td>NaN</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>3</th>\n",
    " <td> NaN</td>\n",
    " <td>NaN</td>\n",
    " <td> 4</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>4</th>\n",
    " <td> e</td>\n",
    " <td>NaN</td>\n",
    " <td> 5</td>\n",
    " </tr>\n",
    " </tbody>\n",
    "</table>\n",
    "<p>5 rows \u00d7 3 columns</p>\n",
    "</div>"
    ],
    "metadata": {},
    "output_type": "pyout",
    "prompt_number": 2,
    "text": [
    " foo bar baz\n",
    "0 a NaN 1\n",
    "1 b NaN 2\n",
    "2 c NaN NaN\n",
    "3 NaN NaN 4\n",
    "4 e NaN 5\n",
    "\n",
    "[5 rows x 3 columns]"
    ]
    }
    ],
    "prompt_number": 2
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "Using the **`dropna()`** function with the **columns** axis we can drop any column where **all** the entries are **NaN** (missing values)."
    ]
    },
    {
    "cell_type": "code",
    "collapsed": false,
    "input": [
    "filtered_data = data.dropna(axis='columns', how='all')\n",
    "filtered_data"
    ],
    "language": "python",
    "metadata": {},
    "outputs": [
    {
    "html": [
    "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
    "<table border=\"1\" class=\"dataframe\">\n",
    " <thead>\n",
    " <tr style=\"text-align: right;\">\n",
    " <th></th>\n",
    " <th>foo</th>\n",
    " <th>baz</th>\n",
    " </tr>\n",
    " </thead>\n",
    " <tbody>\n",
    " <tr>\n",
    " <th>0</th>\n",
    " <td> a</td>\n",
    " <td> 1</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>1</th>\n",
    " <td> b</td>\n",
    " <td> 2</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>2</th>\n",
    " <td> c</td>\n",
    " <td>NaN</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>3</th>\n",
    " <td> NaN</td>\n",
    " <td> 4</td>\n",
    " </tr>\n",
    " <tr>\n",
    " <th>4</th>\n",
    " <td> e</td>\n",
    " <td> 5</td>\n",
    " </tr>\n",
    " </tbody>\n",
    "</table>\n",
    "<p>5 rows \u00d7 2 columns</p>\n",
    "</div>"
    ],
    "metadata": {},
    "output_type": "pyout",
    "prompt_number": 3,
    "text": [
    " foo baz\n",
    "0 a 1\n",
    "1 b 2\n",
    "2 c NaN\n",
    "3 NaN 4\n",
    "4 e 5\n",
    "\n",
    "[5 rows x 2 columns]"
    ]
    }
    ],
    "prompt_number": 3
    }
    ],
    "metadata": {}
    }
    ]
    }
    34 changes: 33 additions & 1 deletion remove-empty-columns.py
    Original file line number Diff line number Diff line change
    @@ -1 +1,33 @@
    #
    # -*- coding: utf-8 -*-
    # <nbformat>3.0</nbformat>

    # <markdowncell>

    # To drop all empty columns (but still keeping the headers) using the Python [Pandas library](http://pandas.pydata.org/) we can use the following 4-line script to read in the csv file, drop the columns where **all** the elements are missing, and save the data to a new csv file.

    # <codecell>

    from pandas.io.parsers import read_csv
    data = read_csv('remove-empty-columns.csv')
    filtered_data = data.dropna(axis='columns', how='all')
    filtered_data.to_csv('empty-columns-removed.csv')

    # <markdowncell>

    # As shown below, the sample data included in the csv file has 3 columns which contain missing values.
    #
    # The second column, labeled **bar**, is completely empty except the header; columns like this should be dropped. The other columns contain data, but should not be dropped even though they contain some missing values.

    # <codecell>

    data

    # <markdowncell>

    # Using the **`dropna()`** function with the **columns** axis we can drop any column where **all** the entries are **NaN** (missing values).

    # <codecell>

    filtered_data = data.dropna(axis='columns', how='all')
    filtered_data

  3. @aculich aculich created this gist May 8, 2014.
    1 change: 1 addition & 0 deletions remove-empty-columns.py
    Original file line number Diff line number Diff line change
    @@ -0,0 +1 @@
    #