@eqbalz
Last active February 8, 2017 19:03

Revisions

  1. eqbalz revised this gist Feb 8, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion hive_to_df.R
    Original file line number Diff line number Diff line change
    @@ -1,5 +1,5 @@
    # RxHiveData only works with a Spark compute context
    -computeContext <- RxSpark(consoleOutput = TRUE, persistentRun = TRUE, autoCleanup = FALSE)
    +computeContext <- RxSpark(consoleOutput = TRUE, persistentRun = TRUE)
    rxSetComputeContext(computeContext)

    airColInfo <- list(
  2. eqbalz created this gist Feb 6, 2017.
    28 changes: 28 additions & 0 deletions hive_to_df.R
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,28 @@
    # RxHiveData only works with a Spark compute context
    computeContext <- RxSpark(consoleOutput = TRUE, persistentRun = TRUE, autoCleanup = FALSE)
    rxSetComputeContext(computeContext)

    airColInfo <- list(
      arrdelay = list(type = "integer"),
      #crsdeptime = list(type = "numeric"),
      dayofweek = list(
        type = "factor",
        levels = c(
          "Monday",
          "Tuesday",
          "Wednesday",
          "Thursday",
          "Friday",
          "Saturday",
          "Sunday"
        )
      )
    )

    hive_data <- RxHiveData(
      query = "select * from AirlineDemoSmallHive",
      colInfo = airColInfo
    )

    myData <- rxDataStep(inData = hive_data, rowSelection = arrdelay > 240 & arrdelay <= 300, varsToKeep = c("arrdelay", "dayofweek"))
    head(myData)