@vishalmehta1991
Last active September 21, 2021 21:30
Revisions

  1. vishalmehta1991 renamed this gist Oct 31, 2019. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  2. vishalmehta1991 renamed this gist Oct 31, 2019. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  3. vishalmehta1991 created this gist Oct 31, 2019.
    26 changes: 26 additions & 0 deletions mg_rf_dask.py
    @@ -0,0 +1,26 @@
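    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    from sklearn.metrics import accuracy_score  # assumed source; the original did not show this import
    from cuml.dask.common import utils as dask_utils
    from cuml.dask.ensemble import RandomForestClassifier as cuRFC_mg
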
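    # cuML Random Forest parameters
    cu_rf_params = {
        'n_estimators': 25,  # number of trees in the forest
        'max_depth': 13,     # maximum depth of each tree
        'n_bins': 15,        # number of bins used when searching for splits
        'n_streams': 8,      # number of CUDA streams used while building the forest
    }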

    # Start by setting up a CUDA cluster on the local host.
    # n_workers must be defined beforehand (the number of GPU workers to start).
    cluster = LocalCUDACluster(threads_per_worker=1, n_workers=n_workers)
    c = Client(cluster)
    workers = c.has_what().keys()

    # Shard the data across all workers
    X_train_df, y_train_df = dask_utils.persist_across_workers(
        c, [X_train_df, y_train_df], workers=workers
    )

    # Build and train the model
    cu_rf_mg = cuRFC_mg(**cu_rf_params)
    cu_rf_mg.fit(X_train_df, y_train_df)

    # Check the accuracy on a held-out test set (arguments are y_true, then y_pred)
    cu_rf_mg_predict = cu_rf_mg.predict(X_test)
    acc_score = accuracy_score(y_test, cu_rf_mg_predict, normalize=True)
    c.close()
    cluster.close()
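
The last step of the script scores the predictions with `accuracy_score(..., normalize=True)`. As a rough illustration of what that number means, here is a minimal pure-Python sketch that returns the fraction of predictions matching the true labels, or the raw count when `normalize=False`. The function name `accuracy` and the example label lists are hypothetical and not part of the gist; this is not the library implementation.

```python
def accuracy(y_true, y_pred, normalize=True):
    """Return the fraction (or count) of predictions equal to the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of labels")
    # Count positions where the predicted label equals the true label
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if normalize else correct


# Hypothetical labels and predictions, for illustration only
y_test_example = [0, 1, 1, 0, 1]
y_pred_example = [0, 1, 0, 0, 1]

acc = accuracy(y_test_example, y_pred_example)
print(acc)  # 0.8 (4 of 5 predictions match)
```

Because the comparison is symmetric, swapping the two arguments does not change the accuracy, but passing the true labels first keeps the call consistent with other metrics where the order matters.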