@tomhopper
Forked from jhofman/dplyr_filter_ungroup.R
Created January 29, 2016 20:31

Revisions

  1. @jhofman jhofman created this gist Jan 20, 2016.
    19 changes: 19 additions & 0 deletions dplyr_filter_ungroup.R
    @@ -0,0 +1,19 @@
    library(dplyr)

    # create a dummy data frame with 1,000,000 rows spread over up to
    # 100,000 groups, then group it by group_id
    df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                     val = sample(1:100, 1e6, replace = TRUE)) %>%
      group_by(group_id)

    # naive approach: keep rows where val == 1 while the data is still grouped
    system.time(df %>% filter(val == 1))

    #    user  system elapsed
    #   1.447   0.017   1.476

    # ungroup before filtering: roughly 150x faster by elapsed time in this run
    system.time(df %>% ungroup() %>% filter(val == 1))

    #    user  system elapsed
    #   0.007   0.003   0.010
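
The timings above only matter if both routes return the same rows. The following is a minimal sketch of that check, not part of the original gist: it uses a smaller, seeded data frame (the sizes and the names `slow`, `fast`, `same_rows`, and `fast_grouped` are assumptions made for illustration) and compares the two results after dropping the grouping.

```r
library(dplyr)

# Smaller, seeded stand-in for the data frame in the gist
# (sizes are an assumption chosen so the check runs quickly).
set.seed(1)
df <- data.frame(group_id = sample(1:1e3, 1e4, replace = TRUE),
                 val = sample(1:100, 1e4, replace = TRUE)) %>%
  group_by(group_id)

# Route 1: filter while grouped, then drop the grouping for comparison.
slow <- df %>% filter(val == 1) %>% ungroup()

# Route 2: ungroup first, then filter.
fast <- df %>% ungroup() %>% filter(val == 1)

# Both routes should keep the same rows in the same order.
same_rows <- isTRUE(all.equal(as.data.frame(slow), as.data.frame(fast)))

# Re-apply the grouping if later steps in the pipeline need it.
fast_grouped <- fast %>% group_by(group_id)
```

The equivalence holds here because `val == 1` is evaluated row by row. A predicate that depends on the group, such as `val == max(val)`, would give different results once the data is ungrouped, so the shortcut should only be used for filters that do not reference group-level summaries.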