@cattleguard
Created May 26, 2016 05:31
Using the poweRlaw package to test a power-law fit against HHS Hacking/IT Incidents data (Individuals Affected)
library(poweRlaw)
# The CSV is a dump from the breach website, filtered to Hacking/IT Incidents and the date range.
hhs.data <- read.csv("~/Downloads/hhs_hacking_01012010thru12312015.csv", header = TRUE, stringsAsFactors = FALSE)
# Drop rows with no count of affected individuals.
hhs.data <- hhs.data[!is.na(hhs.data$Individuals.Affected), ]
# Parse the submission date as a Date (a POSIXlt from strptime is awkward as a data frame column).
hhs.data$Date.Submitted <- as.Date(hhs.data$Breach.Submission.Date, format = "%m/%d/%Y")
# Keep submissions strictly between Jan 1, 2010 and Jan 1, 2015.
hhs.data <- subset(hhs.data, Date.Submitted > "2010-01-01" & Date.Submitted < "2015-01-01")
# Fit a discrete power law: estimate the lower cutoff xmin, then the scaling parameter.
m <- displ$new(hhs.data$Individuals.Affected)
m$setXmin(estimate_xmin(m))
m$setPars(estimate_pars(m))
plot(m, main = "Power Law v. Log-Normal Fit to HHS Breach Data\nIndividuals Affected", sub = "Jan 1, 2010 until Jan 1, 2015")
lines(m)
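# Fit a discrete log-normal to the same data.
n <- dislnorm$new(hhs.data$Individuals.Affected)
# compare_distributions() requires both models to share the same xmin, so reuse the power-law cutoff.
n$setXmin(m$getXmin())
n$setPars(estimate_pars(n))
lines(n)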
# Vuong's test statistic; a negative value favors the second model (n, the log-normal),
# which is the better fit here.
comp <- compare_distributions(m, n)
print(comp$test_statistic)
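# A minimal sketch, not part of the original analysis: the test statistic alone does not
# say whether the difference between the two fits is significant, so this repeats the
# comparison on synthetic data and also reads the p-values that compare_distributions()
# returns. The sim.* names, the seed, and the rpldis() parameters are assumptions chosen
# for illustration only; no HHS data is used here.
library(poweRlaw)
set.seed(1)
# Synthetic discrete power-law sample (hypothetical parameters).
sim.x <- rpldis(1000, xmin = 5, alpha = 2.5)
sim.pl <- displ$new(sim.x)
sim.pl$setXmin(estimate_xmin(sim.pl))
sim.ln <- dislnorm$new(sim.x)
# Both models must share the same xmin before they can be compared.
sim.ln$setXmin(sim.pl$getXmin())
sim.ln$setPars(estimate_pars(sim.ln))
sim.comp <- compare_distributions(sim.pl, sim.ln)
print(sim.comp$test_statistic)
# Two-sided p-value: a small value suggests one model fits distinguishably better.
print(sim.comp$p_two_sided)
# One-sided p-value: relates to whether the first model (the power law) is the better one.
print(sim.comp$p_one_sided)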
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment