
@prokopyev
prokopyev / eff_fuzzy_match.R
Created May 21, 2018 18:59 — forked from gdmcdonald/eff_fuzzy_match.R
Efficient fuzzy match of two data frames by one common string column in R, outputting a list of the matching and non-matching rows
# Efficient fuzzy match of two data frames by one common column
library(dplyr)
library(fuzzyjoin)
library(stringdist)
eff_fuzzy_match <- function(data_frame_A,
                            data_frame_B,
                            by_what,
                            choose_p = 0.1,
                            choose_max_dist = 0.4,
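The preview of `eff_fuzzy_match` above is truncated after its arguments, so its body is not shown here. As a rough, self-contained illustration of the same idea (fuzzy-matching two data frames on one shared string column and returning the matched and non-matching rows), the sketch below swaps in base R's `adist()` (Levenshtein edit distance, normalized by the longer string's length) instead of `fuzzyjoin`/`stringdist`, so it needs no packages. The name `fuzzy_match_sketch`, the normalization, and the "best match per row of A" rule are assumptions for illustration, and its `max_dist` is not necessarily on the same scale as the original `choose_max_dist`.

```r
# Hypothetical sketch, not the original eff_fuzzy_match body.
# Fuzzy-match df_a to df_b on one string column using base R adist()
# (Levenshtein distance) normalized by the longer string of each pair.
fuzzy_match_sketch <- function(df_a, df_b, by, max_dist = 0.4) {
  a <- tolower(as.character(df_a[[by]]))
  b <- tolower(as.character(df_b[[by]]))
  # Pairwise edit distances: rows index df_a, columns index df_b
  d <- adist(a, b)
  # Normalize each distance to [0, 1] by the longer string length
  len <- outer(nchar(a), nchar(b), pmax)
  d_norm <- d / pmax(len, 1)
  # Closest candidate in df_b for every row of df_a
  best <- apply(d_norm, 1, which.min)
  best_dist <- d_norm[cbind(seq_along(a), best)]
  ok <- best_dist <= max_dist
  matched <- data.frame(row_a = which(ok), row_b = best[ok],
                        dist = best_dist[ok])
  list(
    matched     = matched,
    unmatched_A = df_a[!ok, , drop = FALSE],
    # Rows of df_b that were never chosen as a match
    unmatched_B = df_b[setdiff(seq_len(nrow(df_b)), best[ok]), , drop = FALSE]
  )
}

df_a <- data.frame(name = c("Jon Smith", "Maria Garcia", "Zed"), id_a = 1:3)
df_b <- data.frame(name = c("John Smith", "Maria Garcia", "Alice"), id_b = 1:3)
res <- fuzzy_match_sketch(df_a, df_b, "name")
print(res$matched)
```

Picking only the single closest row of B for each row of A keeps the sketch short; a full join such as `fuzzyjoin`'s would instead keep every pair under the threshold.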
prokopyev / index.md
Created May 2, 2018 02:15 — forked from brunosan/index.md
This is a list inspired by some of our current or potential lines of work at the World Bank Innovation Labs. The “Innovations in Big Data Analytics” program helps strengthen the World Bank's capabilities to use big data effectively in its operational and strategic work.

We are always looking for great data scientists. If you can solve any of these [using open software], you'll be heads down helping us from day one. Email us at [email protected]

(This list is updated frequently).

1. Nightlights from Satellite

We are building an open stack to process nightly satellite data and query the light output of every known village. We are currently processing 20 years of nightly data for 600,000 villages in India.
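The processing stack itself is not shown here. To make "query the light output of every known village" concrete, here is a minimal base R sketch under loudly hypothetical assumptions: a long table with one row per village per night (`village`, `year`, `radiance`), with the radiance already extracted from the satellite pixel covering each village, summarized to a mean per village and year with `aggregate()`. The column names, the toy values, and the choice of the mean are illustrative only.

```r
# Hypothetical layout: one row per village per night, with the radiance of
# the satellite pixel covering that village already extracted.
lights <- data.frame(
  village  = c("A", "A", "A", "B", "B", "B"),
  year     = c(2000, 2000, 2001, 2000, 2001, 2001),
  radiance = c(1.0, 3.0, 4.0, 0.5, 0.5, 1.5)
)

# Mean nightly light output per village and year
village_year_mean <- function(df) {
  aggregate(radiance ~ village + year, data = df, FUN = mean)
}

annual <- village_year_mean(lights)
print(annual)
```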

prokopyev / lmOut
Created March 18, 2017 21:42 — forked from EconometricsBySimulation/lmOut
A simple function to grab coefficients, t-statistics, p-values, F-statistics, etc. from a regression and export them as an easy-to-use spreadsheet.
lmOut <- function(res, file = "test.csv", ndigit = 3, writecsv = TRUE) {
  # If summary() has not been run on the model, run it now
  if (length(grep("summary", class(res))) == 0) res <- summary(res)
  co <- res$coefficients
  nvar <- nrow(co)
  ncol <- ncol(co)
  f <- res$fstatistic
  formatter <- function(x) format(round(x, ndigit), nsmall = ndigit)
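The `lmOut` preview above stops after defining its rounding helper. As a minimal sketch of the same idea using only base R, the hypothetical helper `lm_table_sketch` (not the original function, and not its output layout) collects the coefficient table of a fitted `lm` (estimate, standard error, t value, p-value) into a data frame, computes the overall F-statistic and its p-value with `pf()`, and optionally writes the coefficient table to CSV.

```r
# Hypothetical helper, not the original lmOut: tabulate an lm fit's
# coefficients and overall F-test, optionally writing the table to CSV.
lm_table_sketch <- function(fit, file = NULL, ndigit = 3) {
  # Accept either a fitted lm or its summary
  s <- if (inherits(fit, "summary.lm")) fit else summary(fit)
  co <- s$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  coef_tab <- data.frame(term = rownames(co), round(co, ndigit),
                         check.names = FALSE)
  rownames(coef_tab) <- NULL
  # Overall F-test; fstatistic is NULL for an intercept-only model
  f <- s$fstatistic
  fstat <- NULL
  if (!is.null(f)) {
    fstat <- c(
      F   = unname(f["value"]),
      df1 = unname(f["numdf"]),
      df2 = unname(f["dendf"]),
      p   = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
    )
  }
  if (!is.null(file)) write.csv(coef_tab, file, row.names = FALSE)
  list(coefficients = coef_tab, fstat = fstat)
}

fit <- lm(mpg ~ wt, data = mtcars)
res <- lm_table_sketch(fit)
print(res$coefficients)
print(res$fstat)
```

Returning the F-test separately keeps the CSV a plain rectangular table, which is easier to open as a spreadsheet than one mixing coefficient rows with model-level statistics.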