@dannguyen
Last active August 29, 2015 14:17
Revisions

  1. dannguyen revised this gist Mar 13, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion nypd-stop-and-frisk-2014.md
    Original file line number Diff line number Diff line change
    @@ -27,7 +27,7 @@ unzip -o "*.zip"
    wc -l *.csv
    ```

    2014 comes in at 45,788 reports, which is nearly 5 times fewer than in 2013 (and more than 10 times fewer than 2012):
    2014 comes in at 45,787 reports, which is nearly 5 times fewer than in 2013 (and more than 10 times fewer than 2012):

    ```
    160852 2003.csv
  2. dannguyen revised this gist Mar 13, 2015. 1 changed file with 171 additions and 0 deletions.
    171 changes: 171 additions & 0 deletions nypd-stop-and-frisk-2014.md
    Original file line number Diff line number Diff line change
    @@ -1,3 +1,5 @@
    The NYPD just released their [2014 Stop and Frisk data, and in CSV format too](http://www.nyc.gov/html/nypd/html/analysis_and_planning/stop_question_and_frisk_report.shtml)! Here are some Bash commands to get the data and scope it out (see the previous guide [here on how to graph it with R](https://gist.github.com/dannguyen/67ece10c6132282b1da2))

    ## Gettin the data

    Using bash:
    @@ -19,3 +21,172 @@ done
    unzip -o "*.zip"
    ```

    #### Do a quick count by year

    ```sh
    wc -l *.csv
    ```

    2014 comes in at 45,788 reports, roughly a quarter of 2013's count (and less than a tenth of 2012's):

    ```
    160852 2003.csv
    313524 2004.csv
    398192 2005.csv
    506492 2006.csv
    472097 2007.csv
    540303 2008.csv
    581169 2009.csv
    601286 2010.csv
    685725 2011.csv
    532912 2012.csv
    191852 2013.csv
    45788 2014.csv
    5030192 total
    ```
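    One nit: `wc -l` counts each file's header row, so the true number of reports per year is one lower than the figures above. A quick sketch to subtract it:

    ```sh
    # wc -l includes the one-line CSV header; knock it off per file
    for f in *.csv; do
      echo "$f: $(( $(wc -l < "$f") - 1 )) reports"
    done
    ```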
    #### Inspect the headers

    Let's see if they made the headers all uniform, which would make this an easy import into a database:

    ```sh
    for year in $(seq 2003 2014); do
      fname="$year.csv"
      head -n 1 "$fname" | tr ',' $'\n' | tr '[:upper:]' '[:lower:]'
    done | sort | uniq -c | sort -rn
    ```

    **BLAAARGH!!!!**

    ```
    12 zip
    12 year
    12 ycoord
    12 xcoord
    12 weight
    12 typeofid
    12 trhsloc
    12 timestop
    12 sumoffen
    12 sumissue
    12 state
    12 sex
    12 ser_num
    12 sector
    12 searched
    12 sb_outln
    12 sb_other
    12 sb_hdobj
    12 sb_admis
    12 riflshot
    12 rf_verbl
    12 rf_vcrim
    12 rf_vcact
    12 rf_rfcmp
    12 rf_othsw
    12 rf_knowl
    12 rf_furt
    12 rf_bulg
    12 rf_attir
    12 revcmd
    12 repcmd
    12 recstat
    12 radio
    12 race
    12 post
    12 pistol
    12 pf_wall
    12 pf_ptwep
    12 pf_pepsp
    12 pf_other
    12 pf_hcuff
    12 pf_hands
    12 pf_grnd
    12 pf_drwep
    12 pf_baton
    12 perstop
    12 perobs
    12 pct
    12 othrweap
    12 othpers
    12 othfeatr
    12 offverb
    12 offunif
    12 offshld
    12 officrid
    12 machgun
    12 knifcuti
    12 inout
    12 ht_inch
    12 ht_feet
    12 haircolr
    12 frisked
    12 eyecolor
    12 explnstp
    12 dob
    12 datestop
    12 cs_vcrim
    12 cs_other
    12 cs_objcs
    12 cs_lkout
    12 cs_furtv
    12 cs_drgtr
    12 cs_descr
    12 cs_cloth
    12 cs_casng
    12 cs_bulge
    12 crimsusp
    12 contrabn
    12 compyear
    12 comppct
    12 city
    12 build
    12 beat
    12 asltweap
    12 arstoffn
    12 arstmade
    12 aptnum
    12 age
    12 adtlrept
    12 addrtyp
    12 ac_time
    12 ac_stsnd
    12 ac_rept
    12 ac_proxm
    12 ac_other
    12 ac_inves
    12 ac_incid
    12 ac_evasv
    12 ac_cgdir
    12 ac_assoc
    11 stname
    11 stinter
    11 rescode
    11 premtype
    11 premname
    11 linecm
    11 dettypcm
    11 detailcm
    11 crossst
    11 addrpct
    11 addrnum
    4 forceuse
    1 wepfound
    1 strname
    1 strintr
    1 rescod
    1 prenam
    1 premtyp
    1 dettyp_c
    1 details_
    1 detail1_
    1 crossst
    1 adrpct
    1 adrnum
    ```
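    To pin down which file is using the one-off names, it's enough to check just the header rows (`strname` is one of the singletons in the tally above):

    ```sh
    # print each file whose header row contains the variant name 'strname'
    for f in *.csv; do
      if head -n 1 "$f" | grep -qi 'strname'; then
        echo "$f"
      fi
    done
    ```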


    I'll deal with this tomorrow. It'll probably use the same transformations as the [ones I did on the previous files](https://gist.github.com/dannguyen/67ece10c6132282b1da2).
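    In the meantime, here's a sketch of the kind of rename that's coming: rewrite the variant header names so they match the other years. The old-to-new pairs below are my guesses from the tally above (e.g. `strname` looks like `stname`), not mappings verified against any codebook:

    ```sh
    # Hypothetical renames for the 2014 header row only (line 1).
    # Every mapping here is a guess from the header tally -- verify first.
    fix_header() {
      sed '1s/strname/stname/; 1s/strintr/stinter/; 1s/adrpct/addrpct/; 1s/adrnum/addrnum/'
    }
    # usage: fix_header < 2014.csv > 2014-fixed.csv
    ```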
  3. dannguyen revised this gist Mar 13, 2015. 1 changed file with 8 additions and 1 deletion.
    9 changes: 8 additions & 1 deletion nypd-stop-and-frisk-2014.md
    Original file line number Diff line number Diff line change
    @@ -11,4 +11,11 @@ for year in $(seq 2003 2014); do
    echo $url
    curl -sO $url
    done
    ```
    ```

    #### Unzip the files

    ```sh
    unzip -o "*.zip"
    ```

  4. dannguyen created this gist Mar 13, 2015.
    14 changes: 14 additions & 0 deletions nypd-stop-and-frisk-2014.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,14 @@
    ## Gettin the data

    Using bash:

    #### Download the files

    ```sh
    baseurl="http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/YYYY_sqf_csv.zip"
    for year in $(seq 2003 2014); do
      url=$(echo "$baseurl" | sed "s/YYYY/$year/")
      echo "$url"
      curl -sO "$url"
    done
    ```
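    An aside on the URL templating: the `echo | sed` round-trip can also be done with bash's built-in pattern substitution. A sketch using the same `baseurl` (the `curl` call is commented out here so the snippet stays inert):

    ```sh
    baseurl="http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/YYYY_sqf_csv.zip"
    for year in $(seq 2003 2014); do
      url="${baseurl/YYYY/$year}"   # parameter expansion, no sed subshell
      echo "$url"
      # curl -sO "$url"
    done
    ```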