# earthaccess authentication on a distributed Dask Cluster

In [1]:
import earthaccess
import xarray as xr
import os

earthaccess.login()

earthaccess.__version__

Enter your Earthdata Login username:  earthaccess
Enter your Earthdata password:  ········


'0.9.0'

In [10]:
import dask
import dask.array as da
from dask import delayed
from dask.distributed import Client

# Start a local cluster only if this session does not already have a client.
if 'client' not in locals():
    dask.config.set(scheduler='processes')
    client = Client(n_workers=2, threads_per_worker=1)

## Auth on a distributed cluster

Processes and distributed workers do not share local variables with the notebook, so we need a way to pass them the credentials explicitly. Each worker's local instance of earthaccess can then authenticate and open our granules.

This is not optimal. I anticipate that we will start embedding the token in the search results themselves, so that earthaccess can pick it up from there without us having to forward the credentials to the workers manually.
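The need for explicit forwarding can be illustrated without Dask at all. The sketch below is only a stand-in: it uses a plain `subprocess` child in place of a worker, and the names `read_in_child`, `DEMO_PASSWORD` and `secret` are made up for this illustration. Dask workers differ in that they are already running when we log in, which is why the notebook uses `client.run` rather than a launch-time environment.

```python
import os
import subprocess
import sys


def read_in_child(var_name, env=None):
    """Start a fresh Python process and report what it sees for var_name."""
    code = f"import os; print(os.environ.get({var_name!r}, 'missing'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical value: it exists only as a Python variable in this process.
secret = "not-a-real-password"

# A copy of the current environment without the demo variable.
clean_env = {k: v for k, v in os.environ.items() if k != "DEMO_PASSWORD"}

# The child cannot see the parent's Python variables.
print(read_in_child("DEMO_PASSWORD", env=clean_env))  # prints: missing

# Once the value is forwarded through the environment, the child can read it.
forwarded = {**clean_env, "DEMO_PASSWORD": secret}
print(read_in_child("DEMO_PASSWORD", env=forwarded))  # prints: not-a-real-password
```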

In [11]:
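# This gets executed on each worker: it copies the credentials into the
# worker's environment, where earthaccess.login() can find them.
def auth_env(auth):
    os.environ["EARTHDATA_USERNAME"] = auth["EARTHDATA_USERNAME"]
    os.environ["EARTHDATA_PASSWORD"] = auth["EARTHDATA_PASSWORD"]

# Run auth_env once on every worker currently connected to the client.
client.run(auth_env, auth=earthaccess.auth_environ())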

{'tcp://127.0.0.1:33247': None, 'tcp://127.0.0.1:34619': None}
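The `None` values above are simply what `auth_env` returns on each worker, so they do not by themselves confirm that the variables were set. One possible check, sketched here with the hypothetical helpers `credentials_present` and `workers_missing_credentials` (neither is part of earthaccess or Dask), reports only whether the variables exist and never sends the secrets back to the notebook.

```python
import os

REQUIRED_VARS = ("EARTHDATA_USERNAME", "EARTHDATA_PASSWORD")


def credentials_present():
    """Return True if both variables are set and non-empty in this process."""
    return all(os.environ.get(name) for name in REQUIRED_VARS)


def workers_missing_credentials(status_by_worker):
    """List the worker addresses whose check returned a falsy value."""
    return sorted(addr for addr, ok in status_by_worker.items() if not ok)


# On the cluster this would be: status = client.run(credentials_present)
# The dict below is a made-up stand-in so the snippet runs without a cluster.
status = {"tcp://worker-a:1": True, "tcp://worker-b:2": False}
print(workers_missing_credentials(status))  # ['tcp://worker-b:2']
```

An empty list means every worker reported both variables as set.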

In [12]:
granule_info = earthaccess.search_data(short_name="MUR25-JPL-L4-GLOB-v04.2", count=10)

Granules found: 7871


In [14]:
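def sstmean_1file(gran_info_single):
    # Runs on a worker: log in again there, using the forwarded credentials.
    earthaccess.login()
    fileobj = earthaccess.open([gran_info_single])[0]
    data = xr.open_dataset(fileobj)
    # Mean over the whole analysed_sst field, returned as a Python float.
    return data['analysed_sst'].mean().item()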

In [15]:
# Process several granules in parallel using Dask:
sstmean_1file_parallel = delayed(sstmean_1file)
tasks = [sstmean_1file_parallel(gi) for gi in granule_info[:2]]

In [16]:
results = da.compute(*tasks)
print(results)

Opening 1 granules, approx size: 0.0 GB
using endpoint: https://archive.podaac.earthdata.nasa.gov/s3credentials
Opening 1 granules, approx size: 0.0 GB
using endpoint: https://archive.podaac.earthdata.nasa.gov/s3credentials


QUEUEING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 2291.97it/s]
QUEUEING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 2340.57it/s]
PROCESSING TASKS | :   0%|          | 0/1 [00:00<?, ?it/s]
PROCESSING TASKS | : 100%|██████████| 1/1 [00:00<00:00,  4.81it/s]
COLLECTING RESULTS | : 100%|██████████| 1/1 [00:00<00:00, 23563.51it/s]
PROCESSING TASKS | : 100%|██████████| 1/1 [00:00<00:00,  4.66it/s]
COLLECTING RESULTS | : 100%|██████████| 1/1 [00:00<00:00, 24528.09it/s]

(287.01715087890625, 287.0110778808594)
