# Data collection and statistics using Python and R

[![Project Status: Concept – Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept.](https://www.repostatus.org/badges/latest/concept.svg)](https://www.repostatus.org/#concept)

## Scripting in Python and R

This gist focuses on Data Collection, one of the stages\* of the Data Science methodology. We also perform basic math operations on a single dataframe to compare how they render in Python and in R.

# Versioning

I used no versioning system for this gist. My repository's status is flagged as active because it has reached a stable, usable state. The original [gist](https://gist.github.com/aiPhD/15873ff613af833f9693e1a595bdfcc6) related to this repository is pending as concept.

## Author

* **Isaac Arnault** - Suggesting two implementations in `Python` and `R`, from *Initial work* [Cognitive Class Lab - Module 2](https://cognitiveclass.ai/courses/data-science-methodology-2/) and one exercise.

## Licence

All public gists https://gist.github.com/aiPhD
Copyright 2018, Isaac Arnault
MIT License, http://www.opensource.org/licenses/mit-license.php

## Sources

* Figure appended in architecture.md, inspired by [Cognitiveclass.ai](https://cognitiveclass.ai/).
* Sample dataframe taken from [Spatialkey.com](https://support.spatialkey.com/spatialkey-sample-csv-data/).

## Exercise

* Perform a data collection in `Python` and `R` using `Jupyter`.
⇢ Use the following dataframe from [Spatialkey.com](http://samplecsvs.s3.amazonaws.com/TechCrunchcontinentalUSA.csv).
* How many observations and variables does the dataframe contain? Base your assessment on your scripting outputs.
* Calculate Sum, Min, Max and Mean of variable "raisedAmt" using Python (and Pandas) and using R.
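
The Python half of the exercise could be approached along the lines of the sketch below. It is a minimal illustration, not the author's solution. The helper name `describe_column` and the three-row `SAMPLE_CSV` stand-in (including its `company` column and every value in it) are assumptions made so the snippet runs offline. Only the URL and the column name `raisedAmt` come from the exercise itself.

```python
import io

import pandas as pd

# URL given in the exercise; reading it requires network access.
CSV_URL = "http://samplecsvs.s3.amazonaws.com/TechCrunchcontinentalUSA.csv"

# Hypothetical three-row stand-in so the sketch runs offline.
# Only the column name "raisedAmt" comes from the exercise; the other
# column and all values are made up for illustration.
SAMPLE_CSV = """company,raisedAmt
Alpha,1000000
Beta,250000
Gamma,4750000
"""


def describe_column(df, column):
    """Return the shape of df and the sum, min, max and mean of one column."""
    n_obs, n_vars = df.shape  # rows = observations, columns = variables
    values = df[column]
    return {
        "observations": n_obs,
        "variables": n_vars,
        "sum": values.sum(),
        "min": values.min(),
        "max": values.max(),
        "mean": values.mean(),
    }


# Swap the next line for pd.read_csv(CSV_URL) to work on the real file.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = describe_column(sample_df, "raisedAmt")
print(summary)
```

In a Jupyter cell, `df.shape` answers the observations and variables question directly, while `df["raisedAmt"].describe()` offers a quick cross-check on the min, max and mean.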

(\*) The Data Science methodology has ten crucial stages, one of which is Data collection. See architecture.md.