Created
May 5, 2016 12:22
-
-
Save TomAugspurger/2457b5debf072ac0318d3d14e63145d5 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "My intented reader is someone who has experience with pandas. This is not an introduction to pandas. There are many great resources for that. In particular, I recommend [Greg Reda](https://twitter.com/gjreda)'s [3-part introduction](http://gregreda.com/2013/10/26/intro-to-pandas-data-structures/), especially if you're familiar with SQL. Of course, there's the pandas [documentation](http://pandas.pydata.org/) itself. I gave [a talk](https://www.youtube.com/watch?v=otCriSKVV_8) at PyData Seattle targeted as an introduction. Wes McKinney's [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) is still the goto book. Jake VanderPlas's [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do), in early release, is great too.\n", | |
| "\n", | |
| "With all those resources (and many more that I've slighted by omitting), why write another? Surely the law of dimenishing returns is kicking in by now.\n", | |
| "Still, I thought there was room for a guide that is update to date (as of March 2016) and emphasizes idiomatic pandas code (code that is *pandorable*).\n", | |
| "\n", | |
| "We'll be working with [flight delay data](http://www.transtats.bts.gov/databases.asp?Mode_ID=1&Mode_Desc=Aviation&Subject_ID2=0) from the BTS (R users can install Hadley's [NYCFlights](https://github.com/hadley/nycflights13) dataset for similar data).\n", | |
| "\n", | |
| "Protip: In the chrome dev-tools go to the Network tab, right-click the request and\n", | |
| "\"Copy as cURL\". Paste that into http://curl.trillworks.com/ to get the request code." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import zipfile\n", | |
| "\n", | |
| "import requests\n", | |
| "import numpy as np\n", | |
| "import pandas as pd\n", | |
| "import seaborn as sns\n", | |
| "import matplotlib.pyplot as plt" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "headers = {\n", | |
| " 'Pragma': 'no-cache',\n", | |
| " 'Origin': 'http://www.transtats.bts.gov',\n", | |
| " 'Accept-Encoding': 'gzip, deflate',\n", | |
| " 'Accept-Language': 'en-US,en;q=0.8',\n", | |
| " 'Upgrade-Insecure-Requests': '1',\n", | |
| " 'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) '\n", | |
| " 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'),\n", | |
| " 'Content-Type': 'application/x-www-form-urlencoded',\n", | |
| " 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n", | |
| " 'Cache-Control': 'no-cache',\n", | |
| " 'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time',\n", | |
| " 'Connection': 'keep-alive',\n", | |
| " 'DNT': '1',\n", | |
| "}\n", | |
| "# Sorry about the long url\n", | |
| "data = 'UserTableName=On_Time_Performance&DBShortName=On_Time&RawDataTable=T_ONTIME&sqlstr=+SELECT+FL_DATE%2CUNIQUE_CARRIER%2CAIRLINE_ID%2CTAIL_NUM%2CFL_NUM%2CORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CORIGIN%2CORIGIN_CITY_NAME%2CORIGIN_STATE_NM%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID%2CDEST%2CDEST_CITY_NAME%2CDEST_STATE_NM%2CCRS_DEP_TIME%2CDEP_TIME%2CDEP_DELAY%2CTAXI_OUT%2CWHEELS_OFF%2CWHEELS_ON%2CTAXI_IN%2CCRS_ARR_TIME%2CARR_TIME%2CARR_DELAY%2CCANCELLED%2CCANCELLATION_CODE%2CDIVERTED%2CDISTANCE%2CCARRIER_DELAY%2CWEATHER_DELAY%2CNAS_DELAY%2CSECURITY_DELAY%2CLATE_AIRCRAFT_DELAY+FROM++T_ONTIME+WHERE+Month+%3D1+AND+YEAR%3D2014&varlist=FL_DATE%2CUNIQUE_CARRIER%2CAIRLINE_ID%2CTAIL_NUM%2CFL_NUM%2CORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CORIGIN%2CORIGIN_CITY_NAME%2CORIGIN_STATE_NM%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID%2CDEST%2CDEST_CITY_NAME%2CDEST_STATE_NM%2CCRS_DEP_TIME%2CDEP_TIME%2CDEP_DELAY%2CTAXI_OUT%2CWHEELS_OFF%2CWHEELS_ON%2CTAXI_IN%2CCRS_ARR_TIME%2CARR_TIME%2CARR_DELAY%2CCANCELLED%2CCANCELLATION_CODE%2CDIVERTED%2CDISTANCE%2CCARRIER_DELAY%2CWEATHER_DELAY%2CNAS_DELAY%2CSECURITY_DELAY%2CLATE_AIRCRAFT_DELAY&grouplist=&suml=&sumRegion=&filter1=title%3D&filter2=title%3D&geo=All%A0&time=January&timename=Month&GEOGRAPHY=All&XYEAR=2014&FREQUENCY=1&VarDesc=Year&VarType=Num&VarDesc=Quarter&VarType=Num&VarDesc=Month&VarType=Num&VarDesc=DayofMonth&VarType=Num&VarDesc=DayOfWeek&VarType=Num&VarName=FL_DATE&VarDesc=FlightDate&VarType=Char&VarName=UNIQUE_CARRIER&VarDesc=UniqueCarrier&VarType=Char&VarName=AIRLINE_ID&VarDesc=AirlineID&VarType=Num&VarDesc=Carrier&VarType=Char&VarName=TAIL_NUM&VarDesc=TailNum&VarType=Char&VarName=FL_NUM&VarDesc=FlightNum&VarType=Char&VarName=ORIGIN_AIRPORT_ID&VarDesc=OriginAirportID&VarType=Num&VarName=ORIGIN_AIRPORT_SEQ_ID&VarDesc=OriginAirportSeqID&VarType=Num&VarName=ORIGIN_CITY_MARKET_ID&VarDesc=OriginCityMarketID&VarType=Num&VarName=ORIGIN&VarDesc=Origin&VarType=Char&VarName=ORIGIN_CITY_NAME&VarDesc=OriginCityName&VarType=Char&VarDesc=OriginState&VarType=Char&VarDesc=OriginStateFips&VarType=Char&VarName=ORIGIN_STATE_NM&VarDesc=OriginStateName&VarType=Char&VarDesc=OriginWac&VarType=Num&VarName=DEST_AIRPORT_ID&VarDesc=DestAirportID&VarType=Num&VarName=DEST_AIRPORT_SEQ_ID&VarDesc=DestAirportSeqID&VarType=Num&VarName=DEST_CITY_MARKET_ID&VarDesc=DestCityMarketID&VarType=Num&VarName=DEST&VarDesc=Dest&VarType=Char&VarName=DEST_CITY_NAME&VarDesc=DestCityName&VarType=Char&VarDesc=DestState&VarType=Char&VarDesc=DestStateFips&VarType=Char&VarName=DEST_STATE_NM&VarDesc=DestStateName&VarType=Char&VarDesc=DestWac&VarType=Num&VarName=CRS_DEP_TIME&VarDesc=CRSDepTime&VarType=Char&VarName=DEP_TIME&VarDesc=DepTime&VarType=Char&VarName=DEP_DELAY&VarDesc=DepDelay&VarType=Num&VarDesc=DepDelayMinutes&VarType=Num&VarDesc=DepDel15&VarType=Num&VarDesc=DepartureDelayGroups&VarType=Num&VarDesc=DepTimeBlk&VarType=Char&VarName=TAXI_OUT&VarDesc=TaxiOut&VarType=Num&VarName=WHEELS_OFF&VarDesc=WheelsOff&VarType=Char&VarName=WHEELS_ON&VarDesc=WheelsOn&VarType=Char&VarName=TAXI_IN&VarDesc=TaxiIn&VarType=Num&VarName=CRS_ARR_TIME&VarDesc=CRSArrTime&VarType=Char&VarName=ARR_TIME&VarDesc=ArrTime&VarType=Char&VarName=ARR_DELAY&VarDesc=ArrDelay&VarType=Num&VarDesc=ArrDelayMinutes&VarType=Num&VarDesc=ArrDel15&VarType=Num&VarDesc=ArrivalDelayGroups&VarType=Num&VarDesc=ArrTimeBlk&VarType=Char&VarName=CANCELLED&VarDesc=Cancelled&VarType=Num&VarName=CANCELLATION_CODE&VarDesc=CancellationCode&VarType=Char&VarName=DIVERTED&VarDesc=Diverted&VarType=Num&VarDesc=CRSElapsedTime&VarType=Num&VarDesc=ActualElapsedTime&VarType=Num&VarDesc=AirTime&VarType=Num&VarDesc=Flights&VarType=Num&VarName=DISTANCE&VarDesc=Distance&VarType=Num&VarDesc=DistanceGroup&VarType=Num&VarName=CARRIER_DELAY&VarDesc=CarrierDelay&VarType=Num&VarName=WEATHER_DELAY&VarDesc=WeatherDelay&VarType=Num&VarName=NAS_DELAY&VarDesc=NASDelay&VarType=Num&VarName=SECURITY_DELAY&VarDesc=SecurityDelay&VarType=Num&VarName=LATE_AIRCRAFT_DELAY&VarDesc=LateAircraftDelay&VarType=Num&VarDesc=FirstDepTime&VarType=Char&VarDesc=TotalAddGTime&VarType=Num&VarDesc=LongestAddGTime&VarType=Num&VarDesc=DivAirportLandings&VarType=Num&VarDesc=DivReachedDest&VarType=Num&VarDesc=DivActualElapsedTime&VarType=Num&VarDesc=DivArrDelay&VarType=Num&VarDesc=DivDistance&VarType=Num&VarDesc=Div1Airport&VarType=Char&VarDesc=Div1AirportID&VarType=Num&VarDesc=Div1AirportSeqID&VarType=Num&VarDesc=Div1WheelsOn&VarType=Char&VarDesc=Div1TotalGTime&VarType=Num&VarDesc=Div1LongestGTime&VarType=Num&VarDesc=Div1WheelsOff&VarType=Char&VarDesc=Div1TailNum&VarType=Char&VarDesc=Div2Airport&VarType=Char&VarDesc=Div2AirportID&VarType=Num&VarDesc=Div2AirportSeqID&VarType=Num&VarDesc=Div2WheelsOn&VarType=Char&VarDesc=Div2TotalGTime&VarType=Num&VarDesc=Div2LongestGTime&VarType=Num&VarDesc=Div2WheelsOff&VarType=Char&VarDesc=Div2TailNum&VarType=Char&VarDesc=Div3Airport&VarType=Char&VarDesc=Div3AirportID&VarType=Num&VarDesc=Div3AirportSeqID&VarType=Num&VarDesc=Div3WheelsOn&VarType=Char&VarDesc=Div3TotalGTime&VarType=Num&VarDesc=Div3LongestGTime&VarType=Num&VarDesc=Div3WheelsOff&VarType=Char&VarDesc=Div3TailNum&VarType=Char&VarDesc=Div4Airport&VarType=Char&VarDesc=Div4AirportID&VarType=Num&VarDesc=Div4AirportSeqID&VarType=Num&VarDesc=Div4WheelsOn&VarType=Char&VarDesc=Div4TotalGTime&VarType=Num&VarDesc=Div4LongestGTime&VarType=Num&VarDesc=Div4WheelsOff&VarType=Char&VarDesc=Div4TailNum&VarType=Char&VarDesc=Div5Airport&VarType=Char&VarDesc=Div5AirportID&VarType=Num&VarDesc=Div5AirportSeqID&VarType=Num&VarDesc=Div5WheelsOn&VarType=Char&VarDesc=Div5TotalGTime&VarType=Num&VarDesc=Div5LongestGTime&VarType=Num&VarDesc=Div5WheelsOff&VarType=Char&VarDesc=Div5TailNum&VarType=Char'\n", | |
| "\n", | |
| "r = requests.post('http://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=236&Has_Group=3&Is_Zipped=0',\n", | |
| " headers=headers, data=data, stream=True)\n", | |
| "\n", | |
| "with open(\"flights.csv\", 'wb') as f:\n", | |
| " for chunk in r.iter_content(chunk_size=1024): \n", | |
| " if chunk:\n", | |
| " f.write(chunk)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "'/Users/tom.augspurger/sandbox/tomaugspurger.github.io/notebooks/878167309_T_ONTIME.csv'" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "fp" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "zf = zipfile.ZipFile(\"flights.csv.zip\")\n", | |
| "filename = zf.filelist[0].filename\n", | |
| "fp = zf.extract(filename)\n", | |
| "df = pd.read_csv(fp, parse_dates=\"FL_DATE\").rename(columns=str.lower)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>unique_carrier</th>\n", | |
| " <th>airline_id</th>\n", | |
| " <th>tail_num</th>\n", | |
| " <th>fl_num</th>\n", | |
| " <th>origin_airport_id</th>\n", | |
| " <th>origin_airport_seq_id</th>\n", | |
| " <th>origin_city_market_id</th>\n", | |
| " <th>origin</th>\n", | |
| " <th>origin_city_name</th>\n", | |
| " <th>...</th>\n", | |
| " <th>cancelled</th>\n", | |
| " <th>cancellation_code</th>\n", | |
| " <th>diverted</th>\n", | |
| " <th>distance</th>\n", | |
| " <th>carrier_delay</th>\n", | |
| " <th>weather_delay</th>\n", | |
| " <th>nas_delay</th>\n", | |
| " <th>security_delay</th>\n", | |
| " <th>late_aircraft_delay</th>\n", | |
| " <th>unnamed: 36</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N338AA</td>\n", | |
| " <td>1</td>\n", | |
| " <td>12478</td>\n", | |
| " <td>1247802</td>\n", | |
| " <td>31703</td>\n", | |
| " <td>JFK</td>\n", | |
| " <td>New York, NY</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>2475.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N339AA</td>\n", | |
| " <td>2</td>\n", | |
| " <td>12892</td>\n", | |
| " <td>1289203</td>\n", | |
| " <td>32575</td>\n", | |
| " <td>LAX</td>\n", | |
| " <td>Los Angeles, CA</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>2475.0</td>\n", | |
| " <td>111.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N335AA</td>\n", | |
| " <td>3</td>\n", | |
| " <td>12478</td>\n", | |
| " <td>1247802</td>\n", | |
| " <td>31703</td>\n", | |
| " <td>JFK</td>\n", | |
| " <td>New York, NY</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>2475.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N367AA</td>\n", | |
| " <td>5</td>\n", | |
| " <td>11298</td>\n", | |
| " <td>1129803</td>\n", | |
| " <td>30194</td>\n", | |
| " <td>DFW</td>\n", | |
| " <td>Dallas/Fort Worth, TX</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>3784.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N364AA</td>\n", | |
| " <td>6</td>\n", | |
| " <td>13830</td>\n", | |
| " <td>1383002</td>\n", | |
| " <td>33830</td>\n", | |
| " <td>OGG</td>\n", | |
| " <td>Kahului, HI</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>3711.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>5 rows × 37 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " fl_date unique_carrier airline_id tail_num fl_num origin_airport_id \\\n", | |
| "0 2014-01-01 AA 19805 N338AA 1 12478 \n", | |
| "1 2014-01-01 AA 19805 N339AA 2 12892 \n", | |
| "2 2014-01-01 AA 19805 N335AA 3 12478 \n", | |
| "3 2014-01-01 AA 19805 N367AA 5 11298 \n", | |
| "4 2014-01-01 AA 19805 N364AA 6 13830 \n", | |
| "\n", | |
| " origin_airport_seq_id origin_city_market_id origin origin_city_name \\\n", | |
| "0 1247802 31703 JFK New York, NY \n", | |
| "1 1289203 32575 LAX Los Angeles, CA \n", | |
| "2 1247802 31703 JFK New York, NY \n", | |
| "3 1129803 30194 DFW Dallas/Fort Worth, TX \n", | |
| "4 1383002 33830 OGG Kahului, HI \n", | |
| "\n", | |
| " ... cancelled cancellation_code diverted distance carrier_delay \\\n", | |
| "0 ... 0.0 NaN 0.0 2475.0 NaN \n", | |
| "1 ... 0.0 NaN 0.0 2475.0 111.0 \n", | |
| "2 ... 0.0 NaN 0.0 2475.0 NaN \n", | |
| "3 ... 0.0 NaN 0.0 3784.0 NaN \n", | |
| "4 ... 0.0 NaN 0.0 3711.0 NaN \n", | |
| "\n", | |
| " weather_delay nas_delay security_delay late_aircraft_delay unnamed: 36 \n", | |
| "0 NaN NaN NaN NaN NaN \n", | |
| "1 0.0 0.0 0.0 0.0 NaN \n", | |
| "2 NaN NaN NaN NaN NaN \n", | |
| "3 NaN NaN NaN NaN NaN \n", | |
| "4 NaN NaN NaN NaN NaN \n", | |
| "\n", | |
| "[5 rows x 37 columns]" | |
| ] | |
| }, | |
| "execution_count": 3, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.head()\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Slicing\n", | |
| "\n", | |
| "Or, *explicit is better than implicit*.\n", | |
| "\n", | |
| "By my count, 7 of the top-15 voted pandas questions on [Stackoverflow](http://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15) are about slicing. This seems as good a place as any to start.\n", | |
| "\n", | |
| "Brief history digression: For years the preferred method for row and/or column selection was `.ix`." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>tail_num</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>10</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N3LGAA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>11</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N368AA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>12</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N3DDAA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>13</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N332AA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>14</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N327AA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>15</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N3LBAA</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " fl_date tail_num\n", | |
| "10 2014-01-01 N3LGAA\n", | |
| "11 2014-01-01 N368AA\n", | |
| "12 2014-01-01 N3DDAA\n", | |
| "13 2014-01-01 N332AA\n", | |
| "14 2014-01-01 N327AA\n", | |
| "15 2014-01-01 N3LBAA" | |
| ] | |
| }, | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.ix[10:15, ['fl_date', 'tail_num']]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "However this simple little operation hides some complexity. What if, rather than our default `range(n)` index, we had an Integer index like" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>unique_carrier</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>airline_id</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>19393</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>WN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>19690</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>HA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>19790</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>DL</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>19805</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>19930</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AS</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " fl_date unique_carrier\n", | |
| "airline_id \n", | |
| "19393 2014-01-01 WN\n", | |
| "19690 2014-01-01 HA\n", | |
| "19790 2014-01-01 DL\n", | |
| "19805 2014-01-01 AA\n", | |
| "19930 2014-01-01 AS" | |
| ] | |
| }, | |
| "execution_count": 7, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "first = df.groupby('airline_id')[['fl_date', 'unique_carrier']].first()\n", | |
| "first.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Can you predict ahead of time what our slice from above will give when passed to `.ix`?" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>tail_num</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>airline_id</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| "Empty DataFrame\n", | |
| "Columns: [fl_date, tail_num]\n", | |
| "Index: []" | |
| ] | |
| }, | |
| "execution_count": 8, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "first.ix[10:15, ['fl_date', 'tail_num']]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "We had an integer index, so the call to `.ix` used it's label-based mode. It was looking for integer *labels* between 10:15 (inclusive). It didn't find any. Since we sliced a range it returned an empty DataFrame, rather than raising a KeyError.\n", | |
| "\n", | |
| "By way of contrast, suppose we had a string index, rather than integers." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 19, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>tail_num</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>unique_carrier</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>UA</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N14214</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>US</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N650AW</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>VX</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N637VA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>WN</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N412WN</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " fl_date tail_num\n", | |
| "unique_carrier \n", | |
| "UA 2014-01-01 N14214\n", | |
| "US 2014-01-01 N650AW\n", | |
| "VX 2014-01-01 N637VA\n", | |
| "WN 2014-01-01 N412WN" | |
| ] | |
| }, | |
| "execution_count": 19, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "first = df.groupby('unique_carrier').first()\n", | |
| "first.ix[10:15, ['fl_date', 'tail_num']]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "And it works again! Now that we had a string index, `.ix` used it's positional-mode. It looked for *rows* 10-15 (exclusive on the right).\n", | |
| "\n", | |
| "But you can't reliably predict what the outcome of the slice will be ahead of time. It's on the *reader* of the code (probably your future self) to know the dtypes so you can reckon whether `.ix` will use positional indexing or label indexing (like the last example).\n", | |
| "\n", | |
| "Since pandas 0.12, these tasks have been cleanly separated into two methods:\n", | |
| "\n", | |
| "1. `.loc` for label-based indexing\n", | |
| "2. `.iloc` for positional indexing" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>tail_num</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>unique_carrier</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>AA</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N338AA</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AS</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N524AS</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>DL</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>N911DL</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " fl_date tail_num\n", | |
| "unique_carrier \n", | |
| "AA 2014-01-01 N338AA\n", | |
| "AS 2014-01-01 N524AS\n", | |
| "DL 2014-01-01 N911DL" | |
| ] | |
| }, | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "first.loc[['AA', 'AS', 'DL'], ['fl_date', 'tail_num']]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 20, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>airline_id</th>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>unique_carrier</th>\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>AA</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>19805</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>AS</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>19930</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>DL</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>19790</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " fl_date airline_id\n", | |
| "unique_carrier \n", | |
| "AA 2014-01-01 19805\n", | |
| "AS 2014-01-01 19930\n", | |
| "DL 2014-01-01 19790" | |
| ] | |
| }, | |
| "execution_count": 20, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "first.iloc[[0, 1, 3], [0, 1]]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "`.ix` is still around, and isn't being deprecated any time soon. Occasionally it's useful. But if you've been using `.ix` out of habit, or if you didn't know any better, maybe give `.loc` and `.iloc` a shot. For the intrepid reader, Joris Van den Bossche (a core pandas dev) [compiled a great overview](https://github.com/pydata/pandas/issues/9595) of the pandas `__getitem__` API. To quote him:\n", | |
| "\n", | |
| "> Conclusion: it is mess :-)\n", | |
| "\n", | |
| "There are unfortunately edge cases I omitted here (many related to `.ix`, which you're done using, right?). This is a surprising difficult topic to get right, and it's so fundamental that breaking API isn't really practical. But if you dilligently use `.loc` and `.iloc`, you'll avoid most of the edge cases." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## SettingWithCopy\n", | |
| "\n", | |
| "*The warning we all love to hate.*\n", | |
| "\n", | |
| "Pandas used to get *a lot* of questions about assignments seemingly not working. We'll take [this StackOverflow](http://stackoverflow.com/q/16553298/1889400) question as a representative question." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 21, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>a</th>\n", | |
| " <th>b</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>1</td>\n", | |
| " <td>10</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>2</td>\n", | |
| " <td>20</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>3</td>\n", | |
| " <td>30</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>4</td>\n", | |
| " <td>40</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>5</td>\n", | |
| " <td>50</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " a b\n", | |
| "0 1 10\n", | |
| "1 2 20\n", | |
| "2 3 30\n", | |
| "3 4 40\n", | |
| "4 5 50" | |
| ] | |
| }, | |
| "execution_count": 21, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "f = pd.DataFrame({'a':[1,2,3,4,5], 'b':[10,20,30,40,50]})\n", | |
| "f" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The user wanted to take the rows of `b` where `a` was 3 or less, and set them equal to `b / 10`" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 22, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>a</th>\n", | |
| " <th>b</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>1</td>\n", | |
| " <td>10</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>2</td>\n", | |
| " <td>20</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>3</td>\n", | |
| " <td>30</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>4</td>\n", | |
| " <td>40</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>5</td>\n", | |
| " <td>50</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " a b\n", | |
| "0 1 10\n", | |
| "1 2 20\n", | |
| "2 3 30\n", | |
| "3 4 40\n", | |
| "4 5 50" | |
| ] | |
| }, | |
| "execution_count": 22, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "# ignore the context manager for now\n", | |
| "with pd.option_context('mode.chained_assignment', None):\n", | |
| " f[f['a'] <= 3]['b'] = f[f['a'] <= 3 ]['b'] / 10\n", | |
| "f" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "And nothing happend. Well, something did happen, but nobody witnessed it. If an object without any references is modified, does it make a sound?\n", | |
| "\n", | |
| "The warning I silenced above links to [an explanation](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) that's quite helpful. I'll summarize the high points here.\n", | |
| "\n", | |
| "This comes down to what's called *chained indexing*. Above we made two calls on the left-hand side, one `__getitem__` and one `__setitem__`.\n", | |
| "\n", | |
| "1. `f[f['a'] <= 3]`\n", | |
| "2. `_['b']` # using `_` to represent the result of 1.\n", | |
| "\n", | |
| "In general, pandas can't guarantee whether that first `__getitem__` returns a view or a copy of the underlying data. And so we can't be sure that we actually have a reference to the object having `__setitem__` called on it.\n", | |
| "\n", | |
| "Done properly, you would write" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 52, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>a</th>\n", | |
| " <th>b</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>1</td>\n", | |
| " <td>1.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>2</td>\n", | |
| " <td>2.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>3</td>\n", | |
| " <td>3.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>4</td>\n", | |
| " <td>40.0</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>5</td>\n", | |
| " <td>50.0</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " a b\n", | |
| "0 1 1.0\n", | |
| "1 2 2.0\n", | |
| "2 3 3.0\n", | |
| "3 4 40.0\n", | |
| "4 5 50.0" | |
| ] | |
| }, | |
| "execution_count": 52, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "f.loc[f['a'] <= 3, 'b'] = f.loc[f['a'] <= 3, 'b'] / 10\n", | |
| "f" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now this is all in a single call to `__setitem__` and pandas can ensure that the assignment happens properly.\n", | |
| "\n", | |
| "The rough rule is any time you see back-to-back square brackets, `][`, you're in asking for trouble. Replace that with a `.loc[..., ...]` and you'll be set.\n", | |
| "\n", | |
| "The other bit of advice is that a SettingWithCopy warning is raised when the *assignment* is made. The potential copy could be made earlier in your code." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# Vectorization\n", | |
| "\n", | |
| "DataFrame appends are relatively expensive. Keep in mind that indexes are immutable, so each time you append pandas has to create an entirely new index.\n", | |
| "\n", | |
| "In the next section, we'll download a bunch of weather files, one per state, writing each to a separate CSV. One could imagine coming back later to read them in, using the following code." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The idiomatic python way\n", | |
| "\n", | |
| "```python\n", | |
| "files = glob.glob('weather/*.csv')\n", | |
| "columns = ['station', 'date', 'tmpf', 'relh', 'sped', 'mslp',\n", | |
| " 'p01i', 'vsby', 'gust_mph', 'skyc1', 'skyc2', 'skyc3']\n", | |
| "\n", | |
| "# init empty DataFrame, like you might for a list\n", | |
| "weather = pd.DataFrame(columns=columns)\n", | |
| "\n", | |
| "for fp in files:\n", | |
| " city = pd.read_csv(fp, columns=columns)\n", | |
| " weather.append(df)\n", | |
| "```\n", | |
| "\n", | |
| "The only nitpick is that you'd probably use a list-comprehension if you were just making a list. But pandas doesn't have access to that syntax for DataFrame-comprehension (if only), so you'd fall back to the \"intitilize empty container, append to said container\" pattern.\n", | |
| "\n", | |
| "But, there's a better, pandorable, way\n", | |
| "\n", | |
| "```python\n", | |
| "files = glob.glob('weather/*.csv')\n", | |
| "weather_dfs = [pd.read_csv(fp, names=columns) for fp in files]\n", | |
| "weather = pd.concat(weather_dfs)\n", | |
| "```\n", | |
| "\n", | |
| "Subjectively this is cleaner and more beautiful.\n", | |
| "There's fewer lines of code, and the initializtion of an empty data structure says more about *how* to achieve the real task of building a data structure.\n", | |
| "Objectively the pandorable way is faster, as we'll test next.\n", | |
| "We'll define two functions for building an identical DataFrame. The first `append_df`, creates an empty dataframe and appends to it. The second, `concat_df`, creates many DataFrames, and concatenates them at the end. We also write a short decorator that runs the functions a handful of times and records the results." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 53, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import time\n", | |
| "\n", | |
| "size_per = 1000\n", | |
| "N = 100\n", | |
| "cols = list('abcd')\n", | |
| "\n", | |
| "def timed(n=30):\n", | |
| " '''\n", | |
| " Running a microbenchmark. Not especially important.\n", | |
| " '''\n", | |
| " def deco(func):\n", | |
| " def wrapper(*args, **kwargs):\n", | |
| " timings = []\n", | |
| " for i in range(n):\n", | |
| " t0 = time.time()\n", | |
| " func(*args, **kwargs)\n", | |
| " t1 = time.time()\n", | |
| " timings.append(t1 - t0)\n", | |
| " return timings\n", | |
| " return wrapper\n", | |
| " return deco\n", | |
| " \n", | |
| "@timed(30)\n", | |
| "def append_df():\n", | |
| " '''\n", | |
| " The pythonic (bad) way\n", | |
| " '''\n", | |
| " df = pd.DataFrame(columns=cols)\n", | |
| " for _ in range(N):\n", | |
| " df.append(pd.DataFrame(np.random.randn(size_per, 4), columns=cols))\n", | |
| " return df\n", | |
| "\n", | |
| "@timed(30)\n", | |
| "def concat_df():\n", | |
| " '''\n", | |
| " The pandorabe (good) way\n", | |
| " '''\n", | |
| " dfs = [pd.DataFrame(np.random.randn(size_per, 4), columns=cols)\n", | |
| " for _ in range(N)]\n", | |
| " return pd.concat(dfs, ignore_index=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 54, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "t_append = append_df()\n", | |
| "t_concat = concat_df()\n", | |
| "\n", | |
| "timings = pd.DataFrame({\"append (🐍)\": t_append, \"concat (🐼)\": t_concat})" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 55, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "%matplotlib inline\n", | |
| "\n", | |
| "sns.set_style('white')\n", | |
| "sns.set_context('talk')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": null, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 59, | |
| "metadata": { | |
| "collapsed": false, | |
| "scrolled": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "image/png": "iVBORw0KGgoAAAANSUhEUgAAAuEAAAH7CAYAAABi5UHyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xlc1VX+x/H35cJlEQRBxAVFlknLFdww0TRqnMo0UrMx\nmyWbin41NY0tTjVSTovNlNMytjjTlA3+ytHyl5ZWOqVZ4ppL4g6UGC4IisiFy3J/f5jUHRYvCOci\nvp5/yed77vl+bo/AN8dzz9fidDqdAgAAAGCMl6cbAAAAAC40hHAAAADAMEI4AAAAYBghHAAAADCM\nEA4AAAAYRggHAAAADDMewjMzMzVx4kTFx8crJSVFW7durXXc888/r+HDh2vAgAH65S9/qX379jV4\nDgAAAKAlMhrCHQ6HUlNTNWHCBG3cuFFTpkxRamqq7Ha7y7h///vf+uSTT/Tuu+9q06ZNGjBggB54\n4IEGzQEAAAC0VEZDeEZGhqxWqyZNmiSr1arx48crLCxMq1atchk3ceJELVy4UOHh4SouLlZRUZFC\nQ0MlSWvXrnVrDgAAAKCl8jZ5s6ysLMXGxrrUoqOjlZWVVWOsn5+f3nvvPf3hD39QUFCQXn/9dUlS\ndna223MAAAAALZHRlXC73S5/f3+Xmr+/v0pLS2sdP2bMGG3fvl133HGHpk6dqqKiogbPAQAAALQ0\nRkN4bWHZbrcrICCg1vE+Pj7y9vbWLbfcojZt2mj9+vUNnuO/VVRUKDc3VxUVFY17EwAAAMA5MhrC\nY2JilJ2d7VLLzs5WXFycS+3FF1/U7Nmzq792Op0qLy9XUFCQYmJiamw9qW2Ouhw6dEjJyck6dOhQ\nI98FAAAAcG6MhvDExEQ5HA6lp6eroqJCCxcuVEFBgZKSklzG9evXT2+//bb27Nmj8vJyvfTSSwoK\nClJ8fLwSExNVXl5+1jkAAACAlspoCLfZbJo7d66WLFmiIUOGaP78+Xr55Zfl5+enGTNmKC0tTZI0\nYsQI/f73v9edd96ppKQk7dixQ3//+99ls9nqnQMAAAA4H1icTqfT002YlJubq+TkZK1cuVKRkZGe\nbgcAAAAXIB5bDwAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBgh\nHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwA\nAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgGCEcAAAA\nMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCM\nEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwoAWxl1UoJ69IxfZyT7cCAACakben\nGwBwWvryXfq/1ftkL6uUzdtLo4d219Rre8lq5XdlAABaG/52B1qAD77I1tuf7Ja9rFKS5Kio0pLP\ns/S/n+z2cGcAAKA5EMKBFuCDL7JrrS/7MkdOp9NwNwAAoLkRwoEWoKCotNZ60SmHKioJ4QAAtDaE\ncKAFuCQ6tNZ6XNcQ+XjzbQoAQGvD3+5ACzB5dE/52awuNW+rRb+46mIPdQQAAJoTp6MALUBcZIie\nu/cyvffZPuXkFalLeKDGjYhVXNcQT7cGAACaASEcaCG6RgTpt5PiPd0GAAAwgO0oAAAAgGGEcAAA\nAMAwQjgAAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADA\nMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBC\nOAAAAGCY8RCemZmpiRMnKj4+XikpKdq6dWut4xYsWKDRo0dr4MCBmjhxojZu3Fh97fXXX1fv3r2V\nkJCg+Ph4JSQkaNOmTabeAgAAAHBOjIZwh8Oh1NRUTZgwQRs3btSUKVOUmpoqu93uMm7dunWaPXu2\nXnjhBW3cuFE33XSTUlNTdeLECUmng/y0adO0efNmffXVV9q8ebMGDBhg8q0AAAAAjWY0hGdkZMhq\ntWrSpEmyWq0aP368wsLCtGrVKpdxhw4d0q233qoePXpIkq677jp5eXlp7969kqSdO3dWXwMAAADO\nN94mb5aVlaXY2FiXWnR0tLKyslxq48aNc/l606ZNKikp0U9+8hOVlpYqOztb8+bN0/3336/g4GDd\ncsstGj9+fLP3DwAAADQFoyHcbrfL39/fpebv76/S0tI6X7Nv3z7dc889uueeexQcHKzc3FwNGDBA\nkydP1tChQ7VlyxalpqaqQ4cOGj58eHO/BQAAAOCcGd2OUlvgttvtCggIqHX8mjVrNHnyZN188826\n9dZbJUmRkZF66623NHz4cHl7e2vgwIEaN26cVqxY0ez9AwAAAE3BaAiPiYlRdna2Sy07O1txcXE1\nxi5atEj33nuv0tLSdPvtt1fXd+zYoddee81lbFlZmXx9fZunaVxwKquc+np/vr7afUSO8kpPtwMA\nAFoho9tREhMT5XA4lJ6erkmTJmnx4sUqKChQUlKSy7i1a9fq8ccf1+uvv17j1JPAwEDNmTNH3bt3\n15VXXqmMjAx9+OGHSk9PN/lW0Ert+bZQs+Zt0JHC0yf2BAX46H8m9tewvp093BkAAGhNLE6n02ny\nhnv27NEf//hH7d27V1FRUUpLS1Pfvn01Y8YMWSwWpaWlaerUqcrIyKhe3XY6nbJYLHrhhReUlJSk\nVatW6dlnn9WBAwfUqVMn3Xfffbriiivcun9ubq6Sk5O1cuVKRUZGNudbxXnGUV6pqX/6RMeLy1zq\n3laLXn4wWR3D2nioMwAA0NoYD+GeRghHXb7Y+p2enreh1muTR/fUz3/aMo/FLDxZqrc+3Km12/Nk\ntVo0vH8XTfnZxWrj7+Pp1gAAQB2MbkcBWrJie3md107Vc82TyisqNf1vX+jg0eLq2tI12dqfe0Kz\n7kqSxWLxYHcAAKAuxh9bD7RU/S8Kl1cdmTW+R7jZZtz0xdbvXAL4GTtzCrRtX74HOgIAAO4ghAPf\niwgN0ITki2rUh/bppIQeHTzQ0dl9c+hk3dfyigx2AgAAGoLtKMCP3HzVxeoVE6ZVm3NVUVGlIb07\nali/Li12W0eX8MA6r0V2CDLYCQAAaAhCOPBfEnp0aLEr3/9teHwX/e8nu3WkoMSlHhsZrP4Xtcwt\nNAAAgO0owHnN18eqp+4cpmH9OsvbapHNx6rkQV312G+GyquuDe4AAMDjWAkHznMd2gXooV8M0pnT\nRlvq1hkAAPADQjjQShC+AQA4f7AdBQAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEA\nAACAYYRwAAAAwDDOCQfOoryiUl9uy9OBIycVFdFWiX06yceb318BAEDjEcKBehw7YdfDL3+hg0dP\nVde6RgTqidRhahfk58HOAADA+YzlPKAebyzNdAngknTgcLHe+nCnhzoCAACtASEcqMeX2/Nqr2/7\nznAnAACgNSGEA/XwstRet1jquAAAAOAGQjhQj2H9OtdaH96/i+FOAABAa0IIB+rx6zG9FN25rUst\nLjJYN199sYc6AgAArQGnowD1CA701ezfjdSmXYeVe/ikunVsq4QeHeRV1z4VAAAANxDCgbOwelk0\n+JKOGnxJR0+3AgAAWgm2owAAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwPpgJnCe+yy/Wjv3HFBzk\nqwE9Oshq5XdoAADOV4Rw4Dzw6nvb9MEX2XI6T3/dITRAabcmqmtEkGcbAwAAjcJSGtDCrf4qV0vX\n/BDAJelIQYmenb/Jc00BAIBzQggHWrjPNufWWt+fe0IHDp803A0AAGgKhHCghSsvr6rzmqO80mAn\nAACgqRDCgRZucK/an9TZITRA0Z2DDXcDAACaAiEcaOFGJ0apT2x7l5rNx6r/mdBPXl4WD3UFAADO\nBaejAC2czceqmXdcqnVf52n7/nyFBPkqeWA3tQ/x93RrAACgkQjhwHnA6mXRpX0769K+nT3dCgAA\naAJsRwEAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAY\nRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYI\nBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgmLenGwBai2/yirRg5R7t/fa4\n2of469rhMRrap5On2wIAAC0QIRxoAt8cKtL9L66WvaxSkpR37JS278/XXRP7aXRid882BwAAWhy2\nowBNYOF/9lYH8B+b/9FuVVZWeaAjAADQkhHCgSaw99vjtdYLikp17ESp4W4AAEBLx3YUoAlEhAbo\n4NHiGnU/m1VtA23nPP/RQrsWr96n3TmFCg3205ikaPWNCz/neQEAgGcYXwnPzMzUxIkTFR8fr5SU\nFG3durXWcQsWLNDo0aM1cOBATZw4URs3bmzwHIAp1w6PqbU+OrG7/Gzn9rvukYIS3ffXVXp/dZZ2\nf1uotdvz9MgrX+rTTQfOaV4AAOA5RkO4w+FQamqqJkyYoI0bN2rKlClKTU2V3W53Gbdu3TrNnj1b\nL7zwgjZu3KibbrpJqampOnHihNtzACYNvDhCv/t5gjq085ck+ft667rLYvWrMZec89yLPt2r48Vl\nLjWnU5r3QaYqq5znPD8AADDPaAjPyMiQ1WrVpEmTZLVaNX78eIWFhWnVqlUu4w4dOqRbb71VPXr0\nkCRdd9118vLy0t69e92eAzDt8oFdNfcPV+rNGaP1r8d+pqlje8vbeu7fYpnZBbXW80+U6nDBqbO+\n/mSJQ59vOah1X+epvKLmh0cBAIB5RveEZ2VlKTY21qUWHR2trKwsl9q4ceNcvt60aZNKSkoUFxen\nxYsXuzUH4AleXhaFtvVr0jlDg/2Uk1dUo+5ttahtQP37zZevzdHcxdvlqDh9QktwoE3TfzlYvWLC\nmrRHAADQMEZXwu12u/z9/V1q/v7+Ki2t+/SIffv26Z577tE999yjkJCQRs0BnA+OnyzTv1fu0fNv\nf6X/W71fxfZySdLVQ7vXOn5EfKQC6wnh3xwq0pxFW6sDuCSdKHboyTfWy1HOijgAAJ5kdCW8trBs\nt9sVEBBQ6/g1a9bovvvu09SpU3Xrrbc2ag7gfJCTV6Q/zPlCJ0sc1bV3Ptmtv9wzQkN6d9IdKX2U\n/tFunSxxyOpl0fD+XZR6fd965/xsU66ctWwZLzrl0KZdhzW0T+emfhsAAMBNRkN4TEyM0tPTXWrZ\n2dkaO3ZsjbGLFi3SU089pccff1xXX311o+YAzhf/eP9rlwAuSSdLyvXQS2v0j0d+qmuSYnTlkCgd\nPFqsdkF+CgnyPeucpY6KOq/V9mAhAABgjtHtKImJiXI4HEpPT1dFRYUWLlyogoICJSUluYxbu3at\nHn/8cb366qsuAbwhcwDni/KKKm3Zc7TWa4Uny7Rq8+mjCG0+VkV3DnYrgEvSoIs71lr3tnopvgdn\njAMA4ElGQ7jNZtPcuXO1ZMkSDRkyRPPnz9fLL78sPz8/zZgxQ2lpaZKkv//976qoqNBvfvMbJSQk\nKD4+XgkJCVqzZk29cwDnIy8vS72nqGzcdaRR88b3CNfIhMga9V+NuUTtgvh+AQDAkyxOZ227Rluv\n3NxcJScna+XKlYqMrBlQAE94/O8Z2rDzcK3XrhzcTb+dFN+oeZ1OpzbtOqJ1Ow7J18eqkQmRiusa\nci6tAgCAJsBj64EW4L7JCfrV4x+rrJZTS5IHdWv0vBaLRQMvjtDAiyPOpT0AANDEjD+2HkBNgQE2\nPXfvCIWH/HD8ps3Hqqlje3GmNwAArRAr4UADlZZVyNdmlcViadJ5u3Vsq78/fKV2ZB9TcUm5eseG\nKegsD+MBAADnJ0I44KbVX+Vq/ke7dPDoKbUL8tW4EbG6flRck4ZxLy+L+sS2b7L5AABAy0QIB9yw\nPvOQ/vyvTdVfF54s0xsfZKrK6dTE5Is82BkAADgfsScccMO7n+6rtf7+6ixVVl1QBwwBAIAmwEo4\n4Ia8/FO11o8Xl8leWq5AD+3drqysUsaOQ8o6eEKdwgKU1L+L/Gx8WwMA0NLxtzXghujObVVQVFqj\n3qGdvwL8fDzQkVRc4tDDr3yprIMnqmvpy3fpiTuHqXP7QI/0BAAA3MN2FMANN1xxkWp7qOWkK3vI\ny6tpT0lx19uf7HEJ4JKUf6JUcxd/7ZF+AACA+wjhwFlUVTn11e4jklzD9pVDuumnQ6I805SkL7d/\nV2t9867DKnVUGO4GAAA0BCEcqMeRwhKlPvMfvf3JnhofwFy5/lsdPFrsoc4kr7qORrRYmvwMcwAA\n0LQI4UA9XlywRd/VEbSrnNLnWw4a7ugHw/t3qbU+pFdH+fpYDXcDAAAagg9mAnUoLCrVlj1H6x1T\nXlHVrD2s+zpPy9bm6HhxmXrFhOn6kXEKCz79aPuUy2L1wRdZspdVurymZ/d2zdoTAAA4d4RwoA4O\nNwJ2Yu+OjZ7/2Am73v1sn77ef0whgb762dAoDe3Tufr6+6v3a+7//fAhy/25J/TF1u/03L2XKbSt\nn+Yt31kjgEvSgk/26Oqh0fLz5dsbAICWir+lge9VVjm1MfOQduYUKDTYTyMTuqp7p7bKySuqdXzK\nyDj9pGvDV50rKqt0+NgpPfLKl8o/8cOxh5t3H9Et1/ZSysg4lToqNP/j3TVee+xEqZZ8nqUxSdH6\nOOObWuc/VVqhXd8UqP9FHRrcGwAAMIMQDkgqK69U2ty1+nr/sera/OW79KsxvfT6kh2yl/1w2khI\nkK+m3TRA/X4S3qB7lFdU6o0PMvXJum9d5vuxdz7ZrauGdteBIyd1yl5e65idOQUKDrTJWc+DOtv4\n++hEcZlKSisU3s5fe789LotFuqhbO48dqQgAAH5ACAckfbAmyyWAS6dXlD/4IluvPJSslRu+1dHj\ndvWMCtXw/p3l493wDz6+vGibPln/bb1jTpVW6NvDJxUW7Ccvy+kPf/63kECb7KV1H0EYEmjTghV7\ntH7HIVU5T5+iUvV9Ym8X5KsHfzFIvWLCGtw/AABoOpyOAkhauz2v1npOXpH+9Po6hQX7adDFEdqX\ne1wLVuzVd/kNO5rwRHGZPt104KzjLBapXZCfwoL9ldinU61jNuw8LKu17tXskCBfZXx9qDrAV/1o\nybzwZJkeffVLlZTWvsoOAADMYCUckGSt7XGY39t74Lhm/+9XLrV/r9yj+yYnaER8pFvzHy20q6Ky\nnv0j30vs3Unh7U6ffnLPpHg5nTV/QXCUV+l/P96tUQMi9emmXJdrVwzqqhUb6g/75RVVWvTpXt18\n1SVu9Q4AAJoeIRyQNCK+i3ZkHTv7wO9VVjk1Z9E2De7VUX62s38bdQ5vIz+bVaWOmqeZSJLVy6Jh\nfTvrfyb2q64F+PkooUeHWlfpKyqdiu4crJEDuurLbd/Jy2LR8P5d5KioPGsIl07vKwcAAJ5DCAck\njU7srvdX79fBo6fcfs0pe7m27cvXKXu5NmQelq+PVZcP7Ko+ce1rjA3w89F1l8Xp7U9cTzwJ9PfR\nQ78cpOjOwWrbxlbjdWXltYd2SSp1VCqhRwcl9PjhFJTjJ8vkbfVSRWX9xyt2bh94trcHAACaESEc\n0OmV6DFJMXr1ve0Net2CFXu0+5vC6q9XbPhWU67qqUlX9JAk7cop0ObdR9TG30dXD+uu9iF++vDL\nHB0/WareMe114097qGtEUJ3zD7w4Qv94/+taT0IZfElEjVpIkK9SRsbq3yv31tv3uBGxbr5DAADQ\nHAjhwPdGJkTqtfe26+w7t09r28bmEsDPePvj3bpyUDe9tWyXVmz44TSUeR9kavqvBuv5+0a63VOX\n8EDdkHyR3lmxx6U+dniMYiNDan3NL66+RF0jgvTJum91uOCUjh63V4d4i0X6zbje9QZ/AADQ/CxO\nZ32nDbc+ubm5Sk5O1sqVKxUZ6d6H6nDheG7+phofdpSkoAAfFdvLq8NsuyBf9YwK1dqvaz9VZdyI\nGP3f6qwa9eBAm/756E8bfMRhZvYxrdn6naqqnLq0byf1jXP/jPKTJQ6t+zpP5ZVODb4kovqx9wAA\nwHNYCQd+5M7x/eQor9KX27+T0yl5Wy26cnA3pY7vp2MnSrVlzxG18bdp4MUdlL58V53zZB08UWv9\nRLFDX+8/pvgeDXua5SXRYbokunFnewcF2HTF4KhGvRYAADQPQjjwI36+3nrol4N0pKBERwpL1K1j\n2+oPTLYP8XcJs8mDuum9VftV9V9P1Gkf7KewkLpXm3liJQAA4GE9QC06hAaod2z7Wk8sOaNrRJB+\nd2O82vj7VNc6hgXo0amJGtG/S62vCW3ry9MqAQAAK+HAuRg5oKsS+3RSZlaBfG1WXdw9VF5eFsV0\nCda1w2O05PMf9oUH+Hnr9zcNkHc9DwYCAAAXBkI4cI78bN5K6Flzj/dt1/XR6MQofbX7iNr4+ejS\nvp1dVs0BAMCFixAONKOojm0V1bGtp9sAAAAtDP8uDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI\n4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEA\nAACAYYRwAAAAwDBCOAAAAGAYIRwAAAAwzPtsA+x2uxYvXqzVq1dr+/btOn78uCwWi8LCwtS7d29d\ndtlluuaaaxQQEGCiXwAAAOC8V2cIr6io0Kuvvqo333xT3bp1U1JSkq655hqFhISosrJShYWF2r17\ntxYsWKC//OUvuvnmm3XbbbfJZrOZ7B8AAAA479QZwm+44QYlJSVpyZIlioiIqHeSnJwcvf3227rh\nhhu0ePHiJm8SAAAAaE0sTqfTWduFo0ePKjw8vEGTHTlyRB06dGiSxppLbm6ukpOTtXLlSkVGRnq6\nHQAAAFyA6vxg5tkCeF5eniorK11qLT2AAwAAAC2BW6ejHD58WHfffbd27NihsrIyTZ48WaNGjdKo\nUaO0a9eu5u4RAAAAaFXcCuGPPfaYjh8/rnbt2um9997T3r179c477+iKK67QE0880dw9AgAAAK3K\nWY8olKSMjAwtXLhQnTt31ooVKzRq1Cj169dPoaGhuvbaa5u7RwAAAKBVcWsl3MfHR5WVlTp16pTW\nr1+vyy67TNLpD2JyPjgAAADQMG6thF966aX6wx/+IH9/f9lsNo0cOVKrV6/WE088UR3IAQAAALjH\nrZXwmTNnqn///goMDNScOXPUpk0bZWZmavDgwXrkkUeau0cAAACgVanznHCn0ymLxdKgyRrzGtM4\nJxwAAACeVudK+IQJE7RixQq3JqmqqtKHH36o8ePHN1ljAAAAQGtV557wv/71r5o5c6aeeOIJJScn\na9iwYYqLi1O7du3kdDpVWFioXbt2acOGDVq2bJl69uypv/71ryZ7BwAAAM5LdW5HOWPbtm1KT0/X\n6tWrVVhY6LLdJCwsTCNGjNCNN96ovn37NnuzTYHtKAAAAPC0s56O0rdv3+qAffDgQR07dkxeXl4K\nDw9XREREszcIAAAAtDZuHVF4RpcuXdSlS5fm6gUAAAC4ILh1RCEAAACApmM8hGdmZmrixImKj49X\nSkqKtm7dWu/4N954Q7/97W9daq+//rp69+6thIQExcfHKyEhQZs2bWrOtgEAAIAmYzSEOxwOpaam\nasKECdq4caOmTJmi1NRU2e32GmPtdrueeeYZzZo1q8bZ45mZmZo2bZo2b96sr776Sps3b9aAAQNM\nvQ0AAADgnDQohB8+fFgZGRkqLS1Vfn5+g2+WkZEhq9WqSZMmyWq1avz48QoLC9OqVatqjL3rrrt0\n4MAB3XjjjTWu7dy5Uz169Gjw/QEAAICWwK0QXlJSot/97ne67LLLdMstt+jo0aP64x//qMmTJ6ug\noMDtm2VlZSk2NtalFh0draysrBpjn376ab344osKCwtzqZeWlio7O1vz5s1TUlKSrrnmGi1atMjt\nHgAAAABPcyuE//nPf9ahQ4e0bNky+fr6SpKmTZum8vJyPfnkk27fzG63y9/f36Xm7++v0tLSGmPD\nw8NrnSM/P18DBgzQ5MmT9dlnn+mxxx7T008/rc8//9ztPgAAAABPciuEr1y5UtOnT1d0dHR1LSYm\nRmlpaQ0Kv7UFbrvdroCAALfniIyM1FtvvaXhw4fL29tbAwcO1Lhx47RixQq35wAAAAA8ya0QXlxc\nrMDAwBp1i8WiiooKt28WExOj7Oxsl1p2drbi4uLcnmPHjh167bXXXGplZWXVK/QAAABAS+dWCE9K\nStIrr7yiysrK6lphYaH+/Oc/a9iwYW7fLDExUQ6HQ+np6aqoqNDChQtVUFCgpKQkt+cIDAzUnDlz\n9PHHH8vpdGrt2rX68MMPdf3117s9BwAAAOBJboXwRx55RDk5ORo6dKhKS0t16623atSoUTpx4oQe\nfvhht29ms9k0d+5cLVmyREOGDNH8+fP18ssvy8/PTzNmzFBaWtpZ54iKitLzzz+vl156SQkJCZo5\nc6ZmzZqlnj17ut0HAAAA4EkWp9PpdHfw2rVrlZWVpYqKCsXGxmrYsGE1zvBu6XJzc5WcnKyVK1cq\nMjLS0+0AAADgAuTdkMGJiYkuD8UpLy+XdHqFGwAAAIB73Arh69ev12OPPaacnBxVVVVV151OpywW\ni3bu3NlsDQIAAACtjVsh/NFHH1VcXJwefPBB+fn5NXdPAAAAQKvmVgg/cuSIXnnlFZdzwgEAAAA0\njluno1x55ZVatWpVc/cCAAAAXBDcWgm/7777NHbsWC1dulRdu3aVl5drdn/22WebpTkAAACgNXIr\nhD/88MOyWCyKjIxkTzgAAABwjtwK4Rs3btS//vUv9enTp7n7AQAAAFo9t/aER0VFyeFwNHcvAAAA\nwAXBrZVTrsPIAAAgAElEQVTw1NRUPfTQQ7r55pvVrVs3eXu7viwpKalZmgMAAABaI7ceW9+zZ8+6\nJzjPHtbDY+sBAADgaW6thO/atau5+wAAAAAuGHWGcIfDIZvNVv3n+pwZBwAAAODs6gzh/fr105o1\naxQWFqa+ffvKYrHUGON0Os+77SgAAACAp9UZwt98800FBwdLkubNm2esIQAAAKC1qzOEr1+/Xn36\n9JG3t7cGDx5ssicAAACgVavznPC//e1vKikpMdkLAAAAcEGoM4S7cXIhAAAAgEao94jC8vJyt56U\nyekoAAAAgPvqDeGjRo1yaxJORwEAAADcV28If+GFF6pPSAEAAADQNOoM4RaLRQkJCQoLCzPZDwAA\nANDq8cFMAAAAwLA6Q3hKSop8fX1N9gIAAABcEOrcjvLUU0+Z7AMAAAC4YNS5Eg4AAACgeRDCAQAA\nAMMI4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADD\nCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjh\nAAAAgGGEcAAAAMAwQjgAAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAA\nAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACA\nYYRwAAAAwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwzHsIzMzM1ceJExcfHKyUlRVu3bq13\n/BtvvKHf/va35zQHAAAA0JIYDeEOh0OpqamaMGGCNm7cqClTpig1NVV2u73GWLvdrmeeeUazZs2S\nxWJp1BwAAABAS2Q0hGdkZMhqtWrSpEmyWq0aP368wsLCtGrVqhpj77rrLh04cEA33nhjo+cAAAAA\nWiKjITwrK0uxsbEutejoaGVlZdUY+/TTT+vFF19UWFhYo+cAAAAAWiKjIdxut8vf39+l5u/vr9LS\n0hpjw8PDz3kOAAAAoCUyGsJrC8t2u10BAQFG5wAAAAA8yWgIj4mJUXZ2tkstOztbcXFxRucAAAAA\nPMloCE9MTJTD4VB6eroqKiq0cOFCFRQUKCkpyegcAAAAgCcZDeE2m01z587VkiVLNGTIEM2fP18v\nv/yy/Pz8NGPGDKWlpZ3THAAAAMD5wOJ0Op2ebsKk3NxcJScna+XKlYqMjPR0OwAAALgA8dh6AAAA\nwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAw\nQjgAAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4\nAAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAA\nAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABg\nGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBgh\nHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwA\nAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgGCEcAAAA\nMIwQDgAAABhGCAcAAAAM8/Z0AwAAAM2poOS45m9brPUHt8jqZdWwbgM1uc91CrD5e7o1XMAI4QAA\noNVyVDiU9ulzOlR8tLr28b7VyinM1czkabJYLB7sDhcytqMAAIBW68sDm1wC+Bl7jmVpx5HdHugI\nOM14CM/MzNTEiRMVHx+vlJQUbd26tdZxS5cu1RVXXKH4+HjdcccdOnbsWPW1119/Xb1791ZCQoLi\n4+OVkJCgTZs2mXoLAADgPJFblFfntQMn6r4GNDejIdzhcCg1NVUTJkzQxo0bNWXKFKWmpsput7uM\n27Vrl9LS0jR79mytW7dO7du31/Tp06uvZ2Zmatq0adq8ebO++uorbd68WQMGDDD5VgAAwHkgsm2n\nOq91advRYCeAK6MhPCMjQ1arVZMmTZLVatX48eMVFhamVatWuYw7swrep08f2Ww2TZs2TZ9//rkK\nCgokSTt37lSPHj1Mtg4AAM5Dl3YdoA5twmrU40K7q09ETw90BJxmNIRnZWUpNjbWpRYdHa2srKx6\nx4WEhCg4OFhZWVkqLS1Vdna25s2bp6SkJF1zzTVatGiRkf4BAMD5xeZtU9rl92lYt4HysfrIz9tX\nyTFJ+sNld8kppxbv/Eh3LnlYk/99t2Z+9lftPZbt6ZZxgTB6Oordbpe/v+txQP7+/iotLXV7XH5+\nvgYMGKDJkydr6NCh2rJli1JTU9WhQwcNHz682d8DAAA4v7QPCNU9Q6fWqL/51UJ9sGdl9dfbD+/W\nnvy/6qkrH1JkcN3bWICmYHQlvK7AHRAQ4FLz8/Orc1xkZKTeeustDR8+XN7e3ho4cKDGjRunFStW\nNHv/AACgdSh2nNLH+1fXqJdVOvThnv94oCNcaIyG8JiYGGVnu/4zT3Z2tuLi4lxqsbGxLuMKCgpU\nVFSk2NhY7dixQ6+99prL+LKyMvn6+jZf4wAAoF4ny4pVWl569oEtxJHifJVXltd67UA9J6oATcVo\nCE9MTJTD4VB6eroqKiq0cOFCFRQUKCkpyWXcmDFj9PHHH2vz5s0qKyvTc889pxEjRig4OFiBgYGa\nM2eOPv74YzmdTq1du1Yffvihrr/+epNvBQAASNp3LEd/+GSWpi6+X79+7/d67su5Kior9nRbZ9Uh\nsL18rD61XqvvRBWgqRgN4TabTXPnztWSJUs0ZMgQzZ8/Xy+//LL8/Pw0Y8YMpaWlSZJ69uypmTNn\navr06Ro2bJjy8/P15JNPSpKioqL0/PPP66WXXlJCQoJmzpypWbNmqWdPPuEMAIBJBfbjmrnqee0r\nyJEkVTqrlHFgs575/GXPNuaGQFsbXRmTVKNus/romosu90BHuNBYnE6n09NNmJSbm6vk5GStXLlS\nkZGRnm4HAIDz1qIdH+qdr5fUeu1PyffrovYxhjtqmKqqKi3e9ZE+2fe5jpee0MXhP9GNfca2+L7R\nOhg9HQUAALQeR04dq/daSw+zXl5euv6Sq3T9JVd5uhVcgIw/th4AALQOsaHdaq1bZFFMHdcAnEYI\nBwAAjTIiaog6BXWoUR/efbA6B0V4oCPg/MF2FAAA0Ch+Pn567PLf673M5dqc97X8rDaN6J7IBxsB\nNxDCAQBAo4X4tdWvE27Qr3WDp1sBzitsRwEAAC1e7ok8bTu0U8VlpzzdCtAkWAkHAAAtVlHpSc1e\n+3ftOLJHkuRj9VHKxaM1odc1Hu4MODeEcAAA0GLN2fBWdQCXpPLKci34eqki23ZSYteEZr+/o8Kh\nYkeJQvzbysviJafTqa2HdmrTd9tks/poeNRgdW/Xtdn7QOtDCAcAAOesrMKh7MJvFeQbqC5tOzbJ\nnIX2E/rqu69rvfafrC+aNIRnHtmj9Qe3ytvLqmHdBinIN1ALti/R2gObVFbpUFhAO93Qa4x2HN2j\n1Tnrql+3dPdK/TJ+gq7mw6hoIEI4AAA4J5/s+1zp295TSbldktSzfazuvfRWhfqH6HDxUe0v+Eah\n/u3UMzy2QfNmHt0rp2p/sPe+ghx9uOc/GhV9qfx9/Nya7+ipY/rfbf+nTd9tr17FvqHPtZq/dbGW\n7/usetz7uz6p8dpjJYV6ecNbNepOOfWvre9pWLeBCvZr694bA8Rj6z3dDgAA57XMI3uV9ulzNeo9\n28cqMrizVu5fUx2kY9p104PD71Q7/2CXsVXOKm3Jy9S+ghy1DwjV0Mh4/fOrf+uznLVnvX+Xth01\n8/JpCvRtU++4knK7pi3/k/JLClzqse2itL/wm7Pe52zuGvIrjeg+5JznwYWDlXAAANBo/8n6otb6\nrvz92pW/36WWVfitfr/scV3UPlbJsZdqUJf+Kqtw6MnVL2nn0b3V4+ZtWVi9qn42B4sOaemelbqx\nz9h6x63OWVcjgEtqkgAuSb7etiaZBxcOQjgAAGi0orKTDRpfXF6izXnbtTlvu9r7t1N5VYVO/Ncc\n7gbwM7bmZVaH8NU567Rs76c6VlKoi8JiNL7X1eoUGK5lez5r0JwNEWRro/iOvZptfrROhHAAANBo\nvTr00JZDmY16bb69sEl6CLD5Szr9Icl5WxZW19cf3KKth3cqOqSr8ooPN8m9/luQb6B+f+lvZGMl\nHA3Ew3oAAECjdQwMN3YvH6/a1w5HRV+q8spyvbdzeY1rZRVl2pW/r9l6un3gTbqkw0XNNj9aL0I4\nAABotL0F2XVe87I0Xczw9rLqt4m36Cdh0dU1q5dV43r+VElRg3TMflwny4qb7H7uOnrqmPF7onVg\nOwoAAGi0tr6BdV7rG3Gxthza0ST3uf6SqzSka7yGdI1XVsG3KrAfV1xolEK+P2klxK+tfL19VVZR\n1uh7BNkCddLRsCAfFcJJa2gcVsIBAECjDY8aIpvVp0Y9xK+t7hh0kyLbdnJ7Lm8vb/lZfWvUbVYf\nJcckVX8dE9pNA7v0rQ7gkuTn7asrY4fXeK3FYlGv8LNvF7k4PE6/6D++zuvWWlb1Lw7/iXpH9Djr\n3EBtrGlpaWmebsKkoqIizZs3T7/85S/Vti2H6gMAcC78ffwUGxqlzKN7ZS8vlSR1CeqoaUm3q3Pb\nCI2MHqrwNqHyslhVUVmuskqHvCxe6tCmvTq0CdfxsiJJUmxolH5/6W0aGTNUO47sVrGjRJIU6h+i\n3yZOVbQbj4bv3aGHqpxVOlD0ncory9WlbUfdNvAmTep9rSwWi/JOHlFlVaWCfYPk5+2n0ooy+Xh5\na1T0UP3PkF8pLqy7yqsqtL/gG1U5qyRJF4XF6M7BN+vnfcbJUVmuY/ZCtfEJ0BWxSbp94E3ytrKp\nAI3Dw3oAAMA5q6qq0v7Cb+Tt5e1WYD7jlKNE5VUVCvnR0yadTqf2F3yjSmel4kK7y+plbVAvlVWV\nKqtwVJ+aUt84L4uXLBaLS72o9KRyjucqLKCdurTt2KB7A+7i1zcAAHDOvLy8XD406a42toAaNYvF\noriw7o3uxeplPWsAPzOuNm39gtS348WNvj/gDvaEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBC\nOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgA\nAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAA\nYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAY\nIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgGCEc\nAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMOMhPDMzUxMnTlR8\nfLxSUlK0devWWsctXbpUV1xxheLj43XHHXfo2LFjDZ4DAAAAaImMhnCHw6HU1FRNmDBBGzdu1JQp\nU5Samiq73e4ybteuXUpLS9Ps2bO1bt06tW/fXtOnT2/QHAAAAEBLZTSEZ2RkyGq1atKkSbJarRo/\nfrzCwsK0atUql3FnVsH79Okjm82madOm6fPPP1dBQYHWrl3r1hwAAABAS2U0hGdlZSk2NtalFh0d\nraysrHrHhYSEKCQkRFlZWcrOznZrDgAAAKCl8jZ5M7vdLn9/f5eav7+/SktLzzrOz89PpaWlbs9R\nl8rKSknSoUOHGto+AAAA0CAdO3aUt3fNyG00hNcVuAMCAlxqZwJ3bePcnaMuR48elSTddNNNDW0f\nAAAAaJCVK1cqMjKyRt1oCI+JiVF6erpLLTs7W2PHjnWpxcbGKjs7u/rrgoICFRUVKTY2VsXFxW7N\nUZfevXsrPT1d4eHhslqtjXwnAAAAwNl17Nix1rrREJ6YmCiHw6H09HRNmjRJixcvVkFBgZKSklzG\njRkzRjfffLPGjx+vXr166bnnntOIESMUHBzs9hx18fPz08CBA5vj7QEAAABusTidTqfJG+7Zs0d/\n/OMftXfvXkVFRSktLU19+/bVjBkzZLFYlJaWJklavny5Zs+erWPHjmngwIF68sknFRoaWu8cAAAA\nwPnAeAgHAAAALnQ8th4AAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAECrlJub6+kWgDoZPScc\nwLl57bXXlJ2draeeeqrW67m5uXrkkUf0xhtvSJL++c9/qqioqPq60+mUxWJRr169dMUVV0iS7rjj\nDj344IOKjo5u9v4BwJSdO3fqN7/5jdasWVPnmOXLl+vrr7/WtGnTdPLkSf3jH/+QxWKpvn7mZ+bo\n0aPVs2dPFRcX6/bbb9c///lP2Ww2E28DrRghHGhFZsyYobvvvrv668GDB6tXr141xu3YsaP6z/fe\ne68eeeSRGk+iBYDzWVFRkSorK+u8XlxcrOeff14LFy6sHv+rX/1KISEhNcae+ZkZGBion/3sZ5oz\nZ47uvffe5mkcFwy2owA/8sEHH+j666/XkCFDNGTIEM2YMaP62uWXX665c+dqxIgRGjx4sGbMmKHy\n8nJJ0vTp0zVr1iyNHTtWCQkJmjp1qvLy8qpf+/HHH+vaa6/V4MGD9etf/1o5OTmSpIMHD2rQoEGa\nO3eukpKSNGzYMJdV7ry8PN1yyy1KSEjQ9ddfr+zs7Dp737Jliw4fPqwBAwZU1+p6DMCP6z179lRl\nZaUyMjIa9h8LAGqxfv16TZgwQfHx8br22mv1xRdfSJJKSkr02GOPKSkpSUlJSXrkkUdUXFwsSXrp\npZd0//3364477lB8fLzGjBlT/TpJ+uijjzRmzBglJCTohhtuqA7FBw4cUGpqqkaOHKn+/fvr5z//\nubKzs1VQUKDbbrtNhYWFSkhI0IkTJ2r0OX/+fA0dOlRt2rRp0PtLSUnRO++8U9070FiEcOB7Bw8e\n1KOPPqrHH39c69at0/z587V06VKXcLp8+XK9++67WrZsmbZv364XX3yx+tq7776rP/3pT8rIyFBE\nRET1Ksm2bdv08MMPa+bMmVq7dq1GjRql22+/vXqF5uTJkzp48KA+/fRTzZkzR/Pnz9fWrVslSffc\nc4+ioqK0bt06zZw5U59++mmd/S9atEijR492qf34n1Xrq//0pz+tXg0CgMYqKChQamqqpkyZos2b\nN+u+++7T3XffreLiYj366KPKycnR0qVLtWzZMuXn57ssdCxfvly//vWvtWHDBg0fPlx/+tOfJJ1+\nSvYDDzyg6dOna/PmzRo3bpzuvvtuOZ1OPfroo4qLi9Onn36qjIwMtWvXTq+88opCQ0M1d+5ctWvX\nTps3b1ZwcHCNXmv7memOwMBA9evXT8uWLWv8fyhAhHCgWkREhJYuXarevXvr+PHjKiwsVHBwsA4f\nPlw95p577lH79u0VFhamO++8U0uXLq2+NmHCBPXt21c2m00PPfSQtm3bpoMHD2rRokVKSUlR//79\nZbVa9Ytf/EIVFRVat25d9Wtvu+02+fj4qF+/foqJiVFOTo5yc3O1fft2TZs2TT4+PurVq5cmTpxY\nZ/8bNmxQ3759G/Xee/furQ0bNjTqtQBwxmeffaaoqChdd911slgsGjVqlN588035+Pjoo48+0v33\n36+QkBAFBQXpwQcf1LJly+RwOCRJ/fv315AhQ+Tt7a2xY8fqm2++kXR6FXzEiBEaNmyYJOmmm27S\n7Nmz5XQ6NWvWLN11110qLy9Xbm6uQkJCXH5m1+Xo0aP69ttv1adPn0a9z969e2v9+vWNei1wBnvC\nge9ZrVa98847WrRokdq0aaNLLrlEFRUVLls3unXrVv3njh07Kj8/v/rrqKio6j+3bdtWfn5+ys/P\nV15entavX6/FixdLOr0VpKKiQt99952ioqJksVjUrl276td6e3vL6XTq6NGjCggIcPmn0sjISJd7\n/tjhw4cVHh7uUnNnO4okhYeHKz8/v/pDSADQGPn5+YqIiHCp9enTR0ePHlVlZaU6d+5cXe/SpYuc\nTmd1aA4NDa2+dubnYF1z9uvXT5K0b98+/eUvf9GRI0cUFxcni8Wiqqqqs/Z56NAhBQQEKCAgoFHv\nMzw83GUhBWgMVsKB733wwQdavny53n//fX300UeaPXt2jU+//3iF5eDBg+rUqVOt1woLC1VaWqqO\nHTsqPDxcU6dO1fr167V+/Xpt2LBB77//vsaOHVtvPxERESopKXE53aS+FR6LxVLjQ0jubkepqqqS\nxWIhgAM4JxERETV+Tr322ms6deqUbDabvvvuu+r6gQMH5OXl5bIIUdecR44ccak9++yzOnbsmO6+\n+27deeed+uKLL/Tmm29q0KBBbvXp5eXlVlivS2Vlpby8iFA4N/wfBHyvuLhY3t7e8vb2lsPh0Ny5\nc3Xw4MHqD19K0t/+9jcdP35chw8f1quvvqqUlJTqawsWLNC+fftUWlqqZ555RkOHDlVERISuu+46\nLViwQJmZmZKkFStW6Oqrr67+y6iu1erOnTtr8ODBevrpp1VWVqY9e/ZowYIFdfbfqVMnHT161KXm\n7kr4kSNH1KFDh3r+6wDA2V122WU6ePCglixZoqqqKv3nP//RP/7xD7Vr107XXnutnn32WRUWFurE\niRP685//rJEjRyowMLDWuc78nLrqqqu0Zs0aZWRkyOl0Kj09XR9++KF8fX3lcDjk5+cn6fSH0995\n5x1VVFRIkmw2m8rKylx+hp/RqVMnlZaW1vhwZUN+Zv54EQZoDLajAN9LSUnR2rVrdfnllyssLExX\nXXWVJk6cqKysrOoxcXFxuv7662W323XDDTfotttuq742cOBAPfDAAzpw4IAuvfRSPfvss5KkQYMG\nafr06XrggQeUl5enzp07/3979xcS1bbAcfzbMQ1TUoRAS1Or09gfZMZAMYXAcYoM+4MMQVRQBvkQ\nRS/aqElJmQ+SCSYx5YuBWIqlIKWlEkSC1cMYiIRpghpGgVCGmM7ch2jumWvdo/d6ppx+H5iHvdZe\na6+9H2Z+LtesoaKigpiYGEZGRmbNPv/1+MqVKxQWFrJt2zbCw8OxWCxMTk5+d/zJyck4HA7S0tLc\nZS0tLbS3t886d+XKlWzZssV97HA4SE5OnucTExHxFBoait1up6SkhOLiYiIjI6mqqiIkJASbzUZZ\nWRmZmZl8+fIFs9lMfn7+D/v69l4YGxtLeXk5JSUljI6OYjAYsNvtBAcHc/78eQoKCpiZmSE+Pp68\nvDxKSkpwOp0YDAbWr19PUlISTU1NREVFufsOCwvjzz//xOFwuNeaw9dZ+2+hHv69T7jJZPIYW09P\nD3v37l2oxya/qSWuH/3ZJyIe0tLSKCsrIyEhYVadzWYjOjqanJycnzCyr3p6erDZbLS0tMy7bVZW\nFrm5uSQlJf0DIxMR+fXcuHGD4eFhLly4MK924+Pj7N69m9bW1h/O4ovMhZajiPiI+Ph4Vq1aRVdX\n17za9fT0EBgYqAAuIr+VgwcP0tXVNe/9vhsaGjhw4IACuPzfFMJF5mgxfGmxqKiI69evz6vNtWvX\n5j0TJCKy2AUFBXHmzBmqqqrm3Objx4+0t7f/1P96iu/QchQRERERES/TTLiIiIiIiJcphIuIiIiI\neJlCuIiIiIiIlymEi4iIiIh4mUK4iIgPiYuLIy4ujv7+/ll1L1++JC4ujiNHjsy5v7a2NvdPht+9\ne5fU1NQFGyvA0aNHqaysXNA+RUQWA4VwEREf4+/vz6NHj2aVt7W18ccfc3/bHx0d5dSpU0xMTCzk\n8EREBIVwERGfk5iY+MMQbjQa59yP0+lcFPvji4gsRgrhIiI+xmKx0Nvby9jYmLvs1atXTExMkJCQ\n4HHuwMAA2dnZGI1G0tPTqaioYGZmBoD09HQAMjIyuHfvnruN3W4nNTUVk8mEzWZjamrKXffkyROs\nVitGoxGLxUJdXZ3H9RobG0lPT8dkMlFcXOy+lojI70YhXETEx0RGRmIwGDxmw9va2rBYLB4z21NT\nUxw/fhyDwUBzczMlJSW0trZSXl4OQH19PS6Xi9raWjIyMgB4//49PT091NTUUFlZyf3796mvrwfg\n2bNn5OTksGvXLpqamsjJyaG0tJQHDx4A8PTpU4qKijhx4gSNjY04nU66u7u99VhERH4pCuEiIj7I\nbDbT3t7uPn748CE7duzwOKe5uZmgoCByc3NZs2YNiYmJFBQUUFNTg8vlIiwsDIDQ0FACAgIA8PPz\no7S0lLVr15KSkkJKSgq9vb0A3Lp1i7S0NI4dO0Z0dDRZWVkcOnQIu90OwO3bt9m5cydWq5XY2FjO\nnTtHRESENx6HiMgvRyFcRMQHWSwWuru7+fTpE0NDQ4yNjZGUlORxzuvXrxkYGMBkMrlfJ0+eZHp6\nmuHh4e/2GxISQnBwsPt4xYoVTE5OuvuLj4/3OD8hIYHBwUF3/ebNm911fn5+bNy4cUHuV0RksVn6\nswcgIiILz2AwsHr1ajo7O3n37h1ms3nWzigzMzNs3bqVS5cuzWofERHhsab8m/+2u8qyZctmlTmd\nTqanpwFYsmQJLpfLo37pUn0MicjvSTPhIiI+ymw209HR8d2lKADr1q1jaGiI8PBwoqKiiIqK4u3b\nt5SVlf1PO6PExsbicDg8yl68eEFMTAwAGzZs8Kh3uVz09fXN/8ZERHyAQriIiI+yWCw8fvyYwcFB\nUhs7kTIAAAEpSURBVFJSZtXv2bMHgLNnz9Lf38/z588pLCzE39+fgIAAli9fDkBfXx+fP3/+2+tl\nZ2fT2dnJzZs3GRoaoqGhgbq6OvePAx0+fJiOjg5qamp48+YNly9fZmRkZAHvWERk8VAIFxHxIX+d\nvTYajQQFBbF9+/bvLvsIDAykurqa8fFxrFYrp0+fJiUlhYsXLwJfv5C5f/9+8vLyuHPnzt9ee9Om\nTVy9epWmpiYyMzOprq4mPz8fq9XqHk95eTm1tbXs27ePDx8+uLdBFBH53Sxx/ecCPRERERER+Udp\nJlxERERExMsUwkVEREREvEwhXERERETEyxTCRURERES8TCFcRERERMTLFMJFRERERLxMIVxERERE\nxMsUwkVEREREvEwhXERERETEy/4F4gCxbpu4ClYAAAAASUVORK5CYII=\n", | |
| "text/plain": [ | |
| "<matplotlib.figure.Figure at 0x10940a9b0>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| } | |
| ], | |
| "source": [ | |
| "plt.figure(figsize=(12, 8))\n", | |
| "sns.stripplot(x='Method', y='Time (s)',\n", | |
| " data=timings.stack().reset_index().rename(columns={0: 'Time (s)',\n", | |
| " 'level_1': 'Method'}),\n", | |
| " jitter=True, size=7)\n", | |
| "sns.despine()\n", | |
| "plt.savefig('concat-append.svg', transparent=True)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 30, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "CPU times: user 6.9 s, sys: 80.7 ms, total: 6.98 s\n", | |
| "Wall time: 7.71 s\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0 2014-01-01\n", | |
| "1 2014-01-01\n", | |
| "2 2014-01-01\n", | |
| "3 2014-01-01\n", | |
| "4 2014-01-01\n", | |
| " ... \n", | |
| "471944 2014-01-31\n", | |
| "471945 2014-01-31\n", | |
| "471946 2014-01-31\n", | |
| "471947 2014-01-31\n", | |
| "471948 2014-01-31\n", | |
| "Name: fl_date, dtype: datetime64[ns]" | |
| ] | |
| }, | |
| "execution_count": 30, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "import datetime\n", | |
| "\n", | |
| "%time df.fl_date.apply(lambda x: datetime.datetime.strptime(x, \"%Y-%m-%d\"))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 29, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "CPU times: user 128 ms, sys: 1.41 ms, total: 130 ms\n", | |
| "Wall time: 130 ms\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "0 2014-01-01\n", | |
| "1 2014-01-01\n", | |
| "2 2014-01-01\n", | |
| "3 2014-01-01\n", | |
| "4 2014-01-01\n", | |
| " ... \n", | |
| "471944 2014-01-31\n", | |
| "471945 2014-01-31\n", | |
| "471946 2014-01-31\n", | |
| "471947 2014-01-31\n", | |
| "471948 2014-01-31\n", | |
| "Name: fl_date, dtype: datetime64[ns]" | |
| ] | |
| }, | |
| "execution_count": 29, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "%time pd.to_datetime(df.fl_date)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 31, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th>fl_date</th>\n", | |
| " <th>unique_carrier</th>\n", | |
| " <th>airline_id</th>\n", | |
| " <th>tail_num</th>\n", | |
| " <th>fl_num</th>\n", | |
| " <th>origin_airport_id</th>\n", | |
| " <th>origin_airport_seq_id</th>\n", | |
| " <th>origin_city_market_id</th>\n", | |
| " <th>origin</th>\n", | |
| " <th>origin_city_name</th>\n", | |
| " <th>...</th>\n", | |
| " <th>cancelled</th>\n", | |
| " <th>cancellation_code</th>\n", | |
| " <th>diverted</th>\n", | |
| " <th>distance</th>\n", | |
| " <th>carrier_delay</th>\n", | |
| " <th>weather_delay</th>\n", | |
| " <th>nas_delay</th>\n", | |
| " <th>security_delay</th>\n", | |
| " <th>late_aircraft_delay</th>\n", | |
| " <th>unnamed: 36</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th>0</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N338AA</td>\n", | |
| " <td>1</td>\n", | |
| " <td>12478</td>\n", | |
| " <td>1247802</td>\n", | |
| " <td>31703</td>\n", | |
| " <td>JFK</td>\n", | |
| " <td>New York, NY</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>2475.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>1</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N339AA</td>\n", | |
| " <td>2</td>\n", | |
| " <td>12892</td>\n", | |
| " <td>1289203</td>\n", | |
| " <td>32575</td>\n", | |
| " <td>LAX</td>\n", | |
| " <td>Los Angeles, CA</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>2475.0</td>\n", | |
| " <td>111.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>2</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N335AA</td>\n", | |
| " <td>3</td>\n", | |
| " <td>12478</td>\n", | |
| " <td>1247802</td>\n", | |
| " <td>31703</td>\n", | |
| " <td>JFK</td>\n", | |
| " <td>New York, NY</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>2475.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>3</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N367AA</td>\n", | |
| " <td>5</td>\n", | |
| " <td>11298</td>\n", | |
| " <td>1129803</td>\n", | |
| " <td>30194</td>\n", | |
| " <td>DFW</td>\n", | |
| " <td>Dallas/Fort Worth, TX</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>3784.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>4</th>\n", | |
| " <td>2014-01-01</td>\n", | |
| " <td>AA</td>\n", | |
| " <td>19805</td>\n", | |
| " <td>N364AA</td>\n", | |
| " <td>6</td>\n", | |
| " <td>13830</td>\n", | |
| " <td>1383002</td>\n", | |
| " <td>33830</td>\n", | |
| " <td>OGG</td>\n", | |
| " <td>Kahului, HI</td>\n", | |
| " <td>...</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>0.0</td>\n", | |
| " <td>3711.0</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " <td>NaN</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "<p>5 rows × 37 columns</p>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " fl_date unique_carrier airline_id tail_num fl_num origin_airport_id \\\n", | |
| "0 2014-01-01 AA 19805 N338AA 1 12478 \n", | |
| "1 2014-01-01 AA 19805 N339AA 2 12892 \n", | |
| "2 2014-01-01 AA 19805 N335AA 3 12478 \n", | |
| "3 2014-01-01 AA 19805 N367AA 5 11298 \n", | |
| "4 2014-01-01 AA 19805 N364AA 6 13830 \n", | |
| "\n", | |
| " origin_airport_seq_id origin_city_market_id origin origin_city_name \\\n", | |
| "0 1247802 31703 JFK New York, NY \n", | |
| "1 1289203 32575 LAX Los Angeles, CA \n", | |
| "2 1247802 31703 JFK New York, NY \n", | |
| "3 1129803 30194 DFW Dallas/Fort Worth, TX \n", | |
| "4 1383002 33830 OGG Kahului, HI \n", | |
| "\n", | |
| " ... cancelled cancellation_code diverted distance carrier_delay \\\n", | |
| "0 ... 0.0 NaN 0.0 2475.0 NaN \n", | |
| "1 ... 0.0 NaN 0.0 2475.0 111.0 \n", | |
| "2 ... 0.0 NaN 0.0 2475.0 NaN \n", | |
| "3 ... 0.0 NaN 0.0 3784.0 NaN \n", | |
| "4 ... 0.0 NaN 0.0 3711.0 NaN \n", | |
| "\n", | |
| " weather_delay nas_delay security_delay late_aircraft_delay unnamed: 36 \n", | |
| "0 NaN NaN NaN NaN NaN \n", | |
| "1 0.0 0.0 0.0 0.0 NaN \n", | |
| "2 NaN NaN NaN NaN NaN \n", | |
| "3 NaN NaN NaN NaN NaN \n", | |
| "4 NaN NaN NaN NaN NaN \n", | |
| "\n", | |
| "[5 rows x 37 columns]" | |
| ] | |
| }, | |
| "execution_count": 31, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "'2014-01-01'" | |
| ] | |
| }, | |
| "execution_count": 11, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "strdates[0]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 21, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "'2014-01-01'" | |
| ] | |
| }, | |
| "execution_count": 21, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "datetime.datetime(2014, 1, 1).strftime(\"%Y-%m-%d\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "'2014-01-01'" | |
| ] | |
| }, | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "df.fl_date[0]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 24, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "datetime.datetime(2014, 1, 1, 0, 0)" | |
| ] | |
| }, | |
| "execution_count": 24, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "datetime.datetime.strptime(df.fl_date[0], \"%Y-%m-%d\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "That closes out part one. We've seen that..." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Vectorization" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=288&DB_Short_Name=Aviation%20Support%20Tables" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 34, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "!mv /Users/tom.augspurger/Desktop/227597776_T_MASTER_CORD.zip ." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 35, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Archive: 227597776_T_MASTER_CORD.zip\r\n", | |
| " inflating: 227597776_T_MASTER_CORD.csv \r\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!unzip 227597776_T_MASTER_CORD.zip\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 112, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "coord = (pd.read_csv(\"227597776_T_MASTER_CORD.csv\", index_col=['AIRPORT'])\n", | |
| " .query(\"AIRPORT_IS_LATEST == 1\")[['LATITUDE', 'LONGITUDE']]\n", | |
| " .dropna()\n", | |
| " .sample(500))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 113, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import math\n", | |
| "\n", | |
| "def gcd1(lat1, lng1, lat2, lng2):\n", | |
| " '''\n", | |
| " http://www.johndcook.com/blog/python_longitude_latitude/\n", | |
| " '''\n", | |
| " degrees_to_radians = math.pi / 180.0\n", | |
| " ϕ1 = (90 - lat1) * degrees_to_radians\n", | |
| " ϕ2 = (90 - lat2) * degrees_to_radians\n", | |
| " \n", | |
| " θ1 = lng1 * degrees_to_radians\n", | |
| " θ2 = lng2 * degrees_to_radians\n", | |
| " \n", | |
| " cos = (math.sin(ϕ1) * math.sin(ϕ2) * math.cos(θ1 - θ2) +\n", | |
| " math.cos(ϕ1) * math.cos(ϕ2))\n", | |
| " try:\n", | |
| " arc = math.acos(cos)\n", | |
| " except ValueError:\n", | |
| " return float('nan')\n", | |
| " return arc * 6373 # radius of earth, in kilometers" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 114, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "data": { | |
| "text/html": [ | |
| "<div>\n", | |
| "<table border=\"0\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: right;\">\n", | |
| " <th></th>\n", | |
| " <th></th>\n", | |
| " <th>LATITUDE_1</th>\n", | |
| " <th>LONGITUDE_1</th>\n", | |
| " <th>LATITUDE_2</th>\n", | |
| " <th>LONGITUDE_2</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <th rowspan=\"5\" valign=\"top\">LTM</th>\n", | |
| " <th>LTM</th>\n", | |
| " <td>3.372778</td>\n", | |
| " <td>-59.789444</td>\n", | |
| " <td>3.372778</td>\n", | |
| " <td>-59.789444</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>ASJ</th>\n", | |
| " <td>3.372778</td>\n", | |
| " <td>-59.789444</td>\n", | |
| " <td>28.430556</td>\n", | |
| " <td>129.712500</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>MIJ</th>\n", | |
| " <td>3.372778</td>\n", | |
| " <td>-59.789444</td>\n", | |
| " <td>6.083333</td>\n", | |
| " <td>171.733333</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>OLV</th>\n", | |
| " <td>3.372778</td>\n", | |
| " <td>-59.789444</td>\n", | |
| " <td>34.978611</td>\n", | |
| " <td>-89.786944</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <th>CCP</th>\n", | |
| " <td>3.372778</td>\n", | |
| " <td>-59.789444</td>\n", | |
| " <td>-36.771389</td>\n", | |
| " <td>-73.062222</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table>\n", | |
| "</div>" | |
| ], | |
| "text/plain": [ | |
| " LATITUDE_1 LONGITUDE_1 LATITUDE_2 LONGITUDE_2\n", | |
| "LTM LTM 3.372778 -59.789444 3.372778 -59.789444\n", | |
| " ASJ 3.372778 -59.789444 28.430556 129.712500\n", | |
| " MIJ 3.372778 -59.789444 6.083333 171.733333\n", | |
| " OLV 3.372778 -59.789444 34.978611 -89.786944\n", | |
| " CCP 3.372778 -59.789444 -36.771389 -73.062222" | |
| ] | |
| }, | |
| "execution_count": 114, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "idx = pd.MultiIndex.from_product([coord.index, coord.index])\n", | |
| "\n", | |
| "pairs = pd.concat([\n", | |
| " coord.reindex(idx.get_level_values(0)).add_suffix('_1').reset_index(drop=True),\n", | |
| " coord.reindex(idx.get_level_values(1)).add_suffix('_2').reset_index(drop=True)\n", | |
| "], axis=1)\n", | |
| "pairs.index = idx\n", | |
| "pairs.head()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 115, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "CPU times: user 16.5 s, sys: 185 ms, total: 16.7 s\n", | |
| "Wall time: 16.9 s\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "LTM LTM 0.000000\n", | |
| " ASJ 16340.817049\n", | |
| " MIJ 14202.972915\n", | |
| " OLV 4681.525982\n", | |
| " CCP 4672.416679\n", | |
| " ... \n", | |
| "NOP JSE 9704.619652\n", | |
| " A78 8622.929560\n", | |
| " KPN 8022.288320\n", | |
| " CNE 12591.973820\n", | |
| " NOP NaN\n", | |
| "dtype: float64" | |
| ] | |
| }, | |
| "execution_count": 115, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "%%time\n", | |
| "pairs.apply(lambda x: gcd1(x['LATITUDE_1'], x['LONGITUDE_1'],\n", | |
| " x['LATITUDE_2'], x['LONGITUDE_2']), axis=1)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 116, | |
| "metadata": { | |
| "collapsed": true | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "def gcd2(lat1, lng1, lat2, lng2):\n", | |
| " '''\n", | |
| " Vectorized versoin of `gcd1` using numpy\n", | |
| " http://www.johndcook.com/blog/python_longitude_latitude/\n", | |
| " '''\n", | |
| " ϕ1 = np.deg2rad(90 - lat1)\n", | |
| " ϕ2 = np.deg2rad(90 - lat2)\n", | |
| " \n", | |
| " θ1 = np.deg2rad(lng1)\n", | |
| " θ2 = np.deg2rad(lng2)\n", | |
| " \n", | |
| " cos = (np.sin(ϕ1) * np.sin(ϕ2) * np.cos(θ1 - θ2) +\n", | |
| " np.cos(ϕ1) * np.cos(ϕ2))\n", | |
| " arc = np.arccos(cos)\n", | |
| " return arc * 6373" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 117, | |
| "metadata": { | |
| "collapsed": false | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "CPU times: user 45.8 ms, sys: 13 ms, total: 58.8 ms\n", | |
| "Wall time: 48.7 ms\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/plain": [ | |
| "LTM LTM 0.000000\n", | |
| " ASJ 16340.817049\n", | |
| " MIJ 14202.972915\n", | |
| " OLV 4681.525982\n", | |
| " CCP 4672.416679\n", | |
| " ... \n", | |
| "NOP JSE 9704.619652\n", | |
| " A78 8622.929560\n", | |
| " KPN 8022.288320\n", | |
| " CNE 12591.973820\n", | |
| " NOP NaN\n", | |
| "dtype: float64" | |
| ] | |
| }, | |
| "execution_count": 117, | |
| "metadata": {}, | |
| "output_type": "execute_result" | |
| } | |
| ], | |
| "source": [ | |
| "%%time\n", | |
| "gcd(pairs.LATITUDE_1, pairs.LONGITUDE_1,\n", | |
| " pairs.LATITUDE_2, pairs.LONGITUDE_2)" | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.5.1" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment