{ "cells": [ { "cell_type": "markdown", "id": "8ad557e4-5fbf-41d3-8a10-98da53634057", "metadata": {}, "source": [ "## Creating a STAC Collection for a Virtual Icechunk Store\n", "\n", "A virtual icechunk store is publicly available at `s3://nasa-waterinsight/virtual-zarr-store/NLDAS-3-icechunk/`.\n", "\n", "This notebook walks through the current thinking on how to set up a STAC collection that points to that virtual icechunk store and provides all the information a user needs to interact with the virtual zarr store programmatically or via a web UI." ] }, { "cell_type": "code", "execution_count": 1, "id": "6eb787d3-2e2c-4f72-9aed-12022fa6f5c9", "metadata": {}, "outputs": [], "source": [ "import json\n", "import datetime\n", "\n", "import icechunk\n", "import pystac\n", "import xstac\n", "import zarr\n", "\n", "import xarray as xr" ] }, { "cell_type": "markdown", "id": "c68fe3f7", "metadata": {}, "source": [ "Zarr can emit a lot of warnings about Numcodecs codecs not being included in the Zarr version 3 specification yet, so let's suppress those."
] }, { "cell_type": "code", "execution_count": 2, "id": "ac29867a-c95e-4956-b370-dceb0dd1bd94", "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "# `message` is a regular expression matched against the start of the warning text.\n", "warnings.filterwarnings(\n", "    \"ignore\",\n", "    message=\"Numcodecs codecs are not in the Zarr version 3 specification.*\",\n", "    category=UserWarning,\n", ")" ] }, { "cell_type": "markdown", "id": "647e3459-3e2f-4749-bc06-82f6e7fc1581", "metadata": {}, "source": [ "These are the PRs that need to land before you can open the virtual icechunk store with zarr directly:\n", "\n", "- https://github.com/zarr-developers/zarr-python/pull/3369\n", "- https://github.com/earth-mover/icechunk/pull/1161\n", "\n", "Until then, the store has to be opened through icechunk:" ] }, { "cell_type": "code", "execution_count": 3, "id": "da9704f0-40ac-4b85-9706-6cc60ebffcb5", "metadata": {}, "outputs": [], "source": [ "storage = icechunk.s3_storage(\n", "    bucket=\"nasa-waterinsight\",\n", "    prefix=\"virtual-zarr-store/NLDAS-3-icechunk/\",\n", "    region=\"us-west-2\",\n", "    anonymous=True,\n", ")" ] }, { "cell_type": "markdown", "id": "798eb3c1-8a4e-444f-92c5-a4e0e1257f24", "metadata": {}, "source": [ "The `bucket` and `prefix` come from the icechunk href. The `anonymous=True` flag is not part of the href, so it needs to come from somewhere else."
] }, { "cell_type": "code", "execution_count": 4, "id": "bba627a9-a2be-4cfd-94b7-02848c15841f", "metadata": {}, "outputs": [], "source": [ "config = icechunk.RepositoryConfig.default()\n", "config.set_virtual_chunk_container(\n", "    icechunk.VirtualChunkContainer(\n", "        \"s3://nasa-waterinsight/NLDAS3/forcing/daily/\",\n", "        icechunk.s3_store(region=\"us-west-2\")\n", "    )\n", ")\n", "virtual_credentials = icechunk.containers_credentials(\n", "    {\n", "        \"s3://nasa-waterinsight/NLDAS3/forcing/daily/\": icechunk.s3_anonymous_credentials()\n", "    }\n", ")" ] }, { "cell_type": "markdown", "id": "eb811ca5-a483-495e-9dc8-1c5b67fadc7f", "metadata": {}, "source": [ "Here we need the `href` of each bucket that holds the underlying virtual chunks (composed of `bucket` and `prefix`) and the `region` of that bucket. We also need some way of providing credentials for it." ] }, { "cell_type": "code", "execution_count": 5, "id": "f9f8b4b0-e6f5-4da0-a7f6-d08676cf2401", "metadata": {}, "outputs": [], "source": [ "repo = icechunk.Repository.open(\n", "    storage=storage,\n", "    config=config,\n", "    authorize_virtual_chunk_access=virtual_credentials,\n", ")\n", "\n", "session = repo.readonly_session(snapshot_id='YTNGFY4WY9189GEH1FNG')" ] }, { "cell_type": "markdown", "id": "efd51ed6-0017-4819-9460-2c2012d49b8e", "metadata": {}, "source": [ "Since icechunk manages versions (like git), we need some way of knowing which `branch`, `tag`, or `snapshot_id` (similar to a `commit`) to use."
] }, { "cell_type": "code", "execution_count": 6, "id": "cda06847-3d50-4acc-b19a-e10b5c60851a", "metadata": {}, "outputs": [], "source": [ "ds = xr.open_zarr(session.store, consolidated=False, zarr_format=3)" ] }, { "cell_type": "markdown", "id": "8d7279e4-d0dc-4943-a1c0-e0e1b051517a", "metadata": {}, "source": [ "Last of all, we need a way of specifying that we are looking at an icechunk store, along with the standard fields `consolidated` and `zarr_format` that are already included in the [STAC Zarr extension](https://github.com/stac-extensions/zarr).\n", "\n", "Note that these last two may not actually be required: xarray should know that `consolidated` is always false for icechunk stores, and it should similarly be able to infer the zarr format from the store itself." ] }, { "cell_type": "code", "execution_count": 7, "id": "ecdeb03b-9f11-4288-ba89-6a6f746d8ba8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 51TB\n",
"Dimensions: (time: 8399, lat: 6500, lon: 11700)\n",
"Coordinates:\n",
" * time (time) datetime64[ns] 67kB 2001-01-02 2001-01-03 ... 2024-01-01\n",
" * lat (lat) float64 52kB 7.005 7.015 7.025 7.035 ... 71.97 71.98 71.99\n",
" * lon (lon) float64 94kB -169.0 -169.0 -169.0 ... -52.03 -52.01 -52.0\n",
"Data variables:\n",
" Tair_max (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" Wind_E (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" PSurf (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" SWdown (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" Qair (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" Tair_min (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" Rainf (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" Wind_N (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" LWdown (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
" Tair (time, lat, lon) float64 5TB dask.array<chunksize=(1, 500, 900), meta=np.ndarray>\n",
"Attributes: (12/17)\n",
" missing_value: -9999.0\n",
" time_definition: daily\n",
" shortname: NLDAS_FOR0010_D_3.0\n",
" title: NLDAS Forcing Data L4 Daily 0.01 x 0.01 degree V3...\n",
" version: 3.0 beta\n",
" institution: NASA GSFC\n",
" ... ...\n",
" websites: https://ldas.gsfc.nasa.gov/nldas/v3/ ; https://li...\n",
" MAP_PROJECTION: EQUIDISTANT CYLINDRICAL\n",
" SOUTH_WEST_CORNER_LAT: 7.005000114440918\n",
" SOUTH_WEST_CORNER_LON: -168.9949951171875\n",
" DX: 0.009999999776482582\n",
" DY: 0.009999999776482582