@ddofer
Created August 9, 2017 12:33
Revisions

  1. ddofer created this gist Aug 9, 2017.
    24 changes: 24 additions & 0 deletions freqLabs.R
    @@ -0,0 +1,24 @@
    # LABS
    # Find the lab tests taken by the most distinct patients.

    #Filter for distinct by user:

    # data.labs = as.data.frame(data.labs)
    # data.labs.userDistinct = subset(as.data.table(data.labs),select=c("guid_tz","kod_bdika")) #ORIG
    # data.labs.userDistinct= unique(data.labs.userDistinct) #ORIG

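    # One row per (patient, test) pair; by="guid_tz" alone would keep only one row per patient.
    # Assumes the data.table package is loaded.
    data.labs.userDistinct = unique(as.data.table(data.labs), by=c("guid_tz","kod_bdika")) #changed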

    #Filter all Labs data!
    # Lab tests that occurred for more than K = 25 unique users:
    labCounts = table(data.labs.userDistinct$kod_bdika) # distinct patients per test
    commonlabNames = names(sort(labCounts[labCounts>25],decreasing=TRUE)) # test codes, not counts. Keeps supermajority of labs
    data.labs = data.labs[data.labs$kod_bdika %in% commonlabNames, ] # Keep only rows for lab tests seen in more than K unique patients


    ##
    # Labs which appear at least K times:
    # sort(table(data.labs$kod_bdika)[table(data.labs$kod_bdika)>250],decreasing=T) ## 336 (note that we're not normalizing by occurrences per test vs per user)
    # FreqlabNames = sort(table(data.labs$kod_bdika)[table(data.labs$kod_bdika)>350],decreasing=T)
    FreqlabNames = names(head(sort(table(data.labs.userDistinct$kod_bdika),decreasing=TRUE), 250)) # up to 250 most frequent tests by distinct patients; head() avoids NA names if fewer exist. Note long tail

    data.labs.freq = data.labs[data.labs$kod_bdika %in% FreqlabNames, ] # Get data of labs with only the most frequent labtests
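
A minimal base-R sketch of the same idea on toy data, to show why counting distinct (patient, test) pairs differs from counting raw rows. The values, the `toy.*` names, and the cutoff `K` are hypothetical and not taken from the gist; only the column names `guid_tz` and `kod_bdika` come from the script above.

```r
# Toy data (hypothetical values): one row per lab result.
toy.labs <- data.frame(
  guid_tz   = c("p1", "p1", "p1", "p2", "p2", "p3"),
  kod_bdika = c("A",  "A",  "B",  "A",  "C",  "A"),
  stringsAsFactors = FALSE
)

# One row per (patient, test) pair, so repeat tests for a patient count once.
toy.distinct <- unique(toy.labs[, c("guid_tz", "kod_bdika")])

# Number of distinct patients per test, most common first.
patientsPerTest <- sort(table(toy.distinct$kod_bdika), decreasing = TRUE)

# Tests seen in more than K distinct patients (K is a hypothetical cutoff).
K <- 1
commonTests <- names(patientsPerTest[patientsPerTest > K])

# Keep every result row (including repeats) for the common tests only.
toy.filtered <- toy.labs[toy.labs$kod_bdika %in% commonTests, ]
```

Test "A" has four raw rows but three distinct patients, which is the count the threshold is applied to; the final filter then goes back to the full table so repeat results are retained.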