Last active
August 29, 2015 14:17
-
-
Save dannguyen/0d19d59d6e80cd069c5e to your computer and use it in GitHub Desktop.
Revisions
-
dannguyen revised this gist
Mar 13, 2015 . 1 changed file with 1 addition and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -27,7 +27,7 @@ unzip -o "*.zip" wc -l *.csv ``` 2014 comes in at 45,787 reports, which is nearly 5 times fewer than in 2013 (and more than 10 times fewer than 2012): ``` 160852 2003.csv -
dannguyen revised this gist
Mar 13, 2015 . 1 changed file with 171 additions and 0 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,3 +1,5 @@ The NYPD just released their [2014 Stop and Frisk data, and in CSV format too](http://www.nyc.gov/html/nypd/html/analysis_and_planning/stop_question_and_frisk_report.shtml)! Here's some Bash commands to get the data and scope it out (see the previous guide [here on how to graph it with R](https://gist.github.com/dannguyen/67ece10c6132282b1da2)) ## Gettin the data Using bash: @@ -19,3 +21,172 @@ done unzip -o "*.zip" ``` #### Do a quick count by year ```sh wc -l *.csv ``` 2014 comes in at 45,788 reports, which is nearly 5 times fewer than in 2013 (and more than 10 times fewer than 2012): ``` 160852 2003.csv 313524 2004.csv 398192 2005.csv 506492 2006.csv 472097 2007.csv 540303 2008.csv 581169 2009.csv 601286 2010.csv 685725 2011.csv 532912 2012.csv 191852 2013.csv 45788 2014.csv 5030192 total ``` #### Inspect the headers Let's see if they made the headers all uniform, which would make this an easy import into a database: ```sh for year in $(seq 2003 2014); do fname="$year.csv" head -n 1 $fname | tr ',' $'\n' | tr '[:upper:]' '[:lower:]' done | sort | uniq -c | sort -rn ``` **BLAAARGH!!!!** ``` 12 zip 12 year 12 ycoord 12 xcoord 12 weight 12 typeofid 12 trhsloc 12 timestop 12 sumoffen 12 sumissue 12 state 12 sex 12 ser_num 12 sector 12 searched 12 sb_outln 12 sb_other 12 sb_hdobj 12 sb_admis 12 riflshot 12 rf_verbl 12 rf_vcrim 12 rf_vcact 12 rf_rfcmp 12 rf_othsw 12 rf_knowl 12 rf_furt 12 rf_bulg 12 rf_attir 12 revcmd 12 repcmd 12 recstat 12 radio 12 race 12 post 12 pistol 12 pf_wall 12 pf_ptwep 12 pf_pepsp 12 pf_other 12 pf_hcuff 12 pf_hands 12 pf_grnd 12 pf_drwep 12 pf_baton 12 perstop 12 perobs 12 pct 12 othrweap 12 othpers 12 othfeatr 12 offverb 12 offunif 12 offshld 12 officrid 12 machgun 12 knifcuti 12 inout 12 ht_inch 12 ht_feet 12 haircolr 12 frisked 12 eyecolor 12 explnstp 12 dob 12 datestop 12 cs_vcrim 12 cs_other 12 cs_objcs 12 cs_lkout 12 cs_furtv 12 cs_drgtr 12 cs_descr 12 cs_cloth 12 cs_casng 12 cs_bulge 12 crimsusp 12 contrabn 12 compyear 12 comppct 12 city 12 build 12 beat 12 asltweap 12 arstoffn 12 arstmade 12 aptnum 12 age 12 adtlrept 12 addrtyp 12 ac_time 12 ac_stsnd 12 ac_rept 12 ac_proxm 12 ac_other 12 ac_inves 12 ac_incid 12 ac_evasv 12 ac_cgdir 12 ac_assoc 11 stname 11 stinter 11 rescode 11 premtype 11 premname 11 linecm 11 dettypcm 11 detailcm 11 crossst 11 addrpct 11 addrnum 4 forceuse 1 wepfound 1 strname 1 strintr 1 rescod 1 prenam 1 premtyp 1 dettyp_c 1 details_ 1 detail1_ 1 crossst 1 adrpct 1 adrnum ``` I'll deal with this tomorrow. It'll probably use the same transformations as the [ones I did on the previous files](https://gist.github.com/dannguyen/67ece10c6132282b1da2) -
dannguyen revised this gist
Mar 13, 2015 . 1 changed file with 8 additions and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -11,4 +11,11 @@ for year in $(seq 2003 2014); do echo $url curl -sO $url done ``` #### Unzip the files ```sh unzip -o "*.zip" ``` -
dannguyen created this gist
Mar 13, 2015 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,14 @@ ## Gettin the data Using bash: #### Download the files ```sh baseurl="http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/YYYY_sqf_csv.zip" for year in $(seq 2003 2014); do url=$(echo $baseurl | sed "s/YYYY/$year/") echo $url curl -sO $url done ```