Last active
August 29, 2015 14:09
Script to clean data obtained from https://www.kaggle.com/c/the-seeclickfix-311-challenge/download/unencrypted_data.zip
DELETE FROM seeclickfix WHERE source LIKE 'remote_api_created';
UPDATE seeclickfix SET source = 'city_initiated' WHERE LOWER(description) LIKE '%citizen%'; -- almost always city_initiated
UPDATE seeclickfix SET source = 'city_initiated' WHERE LOWER(description) LIKE '%called%'; -- almost always city_initiated
DELETE FROM seeclickfix WHERE source LIKE 'city_initiated' OR source LIKE 'NA';
DELETE FROM seeclickfix WHERE LOWER(description) LIKE 'please pick up%'; -- duplicate most likely also api created
DELETE FROM seeclickfix WHERE LOWER(description) LIKE 'brush in front'; -- duplicate most likely also api created
DELETE FROM seeclickfix WHERE LOWER(description) LIKE 'nreported from my mobile devicenhttp://m.seeclickfix.com';
DELETE FROM seeclickfix WHERE description LIKE '%If the hydrant is not cleared we encourage you to clear%'; -- duplicate (error?)
DELETE FROM seeclickfix WHERE description IS NULL;
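Because the statements above delete rows permanently, it may help to count how many rows each predicate matches before running them. The following is a minimal sketch of such a dry run, not part of the original script. It assumes SQLite via Python's `sqlite3` module (the script does not name a database engine), a hypothetical two-column table, and made-up sample rows. Only the table name, the `source` and `description` columns, and the predicates are taken from the script.

```python
import sqlite3

# Hypothetical sample rows (source, description); the real table would be
# loaded from the Kaggle download and has more columns.
ROWS = [
    ("remote_api_created", "Pothole on Main St"),
    ("NA", "A citizen called about a streetlight"),
    ("android", "Please pick up the couch"),
    ("iphone", "Graffiti on the wall"),
    ("web", None),
]

# A subset of the WHERE clauses used by the cleanup script.
RULES = [
    "source LIKE 'remote_api_created'",
    "LOWER(description) LIKE '%citizen%'",
    "LOWER(description) LIKE 'please pick up%'",
    "description IS NULL",
]


def preview(conn):
    """Return {predicate: matching row count} without modifying the table."""
    counts = {}
    for rule in RULES:
        query = f"SELECT COUNT(*) FROM seeclickfix WHERE {rule}"
        counts[rule] = conn.execute(query).fetchone()[0]
    return counts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seeclickfix (source TEXT, description TEXT)")
conn.executemany("INSERT INTO seeclickfix VALUES (?, ?)", ROWS)

counts = preview(conn)
for rule, n in counts.items():
    print(n, rule)
```

The counts come from read-only `SELECT COUNT(*)` queries, so the table is left unchanged. A row can match more than one predicate, so the per-rule counts do not necessarily sum to the number of rows the full script would remove.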