@sgillies
Last active October 19, 2024 17:20

Revisions

  1. sgillies revised this gist Dec 17, 2017. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion advanced_rasterio_features.ipynb
    Original file line number Diff line number Diff line change
    @@ -287,7 +287,7 @@
    "\n",
    "At this point you could get the `profile` attribute of the dataset object we've named `src` and GDAL wouldn't need to download any more bytes to provide the dataset metadata. Thanks to the TIFF format's consolidation of metadata in the head of the file and [HTTP range requests](https://tools.ietf.org/html/rfc7233), we only need to read 0.03% of the file to know its dimensions, data type, spatial extent, and coordinate reference system.\n",
    "\n",
    "Calling `src.read()` triggers 3 more HTTP requests by GDAL. The third is for the first 16384 (2^14) bytes of the 8 MB .ovr file that GDAL discovered when it fetched the directory listing. In our case, the array returned by ``src.read()`` will come entirely from the .ovr file. (TODO: link to a discussion of GDAL overviews or explain).\n",
    "Calling `src.read()` triggers 3 more HTTP requests by GDAL. The third is for the first 16384 (2^14) bytes of the 8 MB .ovr file that GDAL discovered when it fetched the directory listing. In our case, the array returned by ``src.read()`` will come entirely from the .ovr file. For a reference on GeoTIFF overviews, see http://www.gdal.org/frmt_gtiff.html.\n",
    "\n",
    "```\n",
    "* Couldn't find host landsat-pds.s3.amazonaws.com in the .netrc file; using defaults\n",
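The range request described in this hunk can be sketched with nothing but string formatting. This is a minimal illustration, not GDAL's own code: `range_header` is a hypothetical helper, and treating the "8 MB" .ovr file as exactly 8 × 2^20 bytes is an assumption made only to show the arithmetic.

```python
def range_header(start, length):
    """Build the value of an HTTP Range header (RFC 7233) covering
    `length` bytes beginning at byte offset `start`.

    Hypothetical helper for illustration; GDAL builds its own requests.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the "- 1".
    return f"bytes={start}-{start + length - 1}"


# The first chunk of the .ovr file mentioned in the notebook text.
first_chunk = 2 ** 14  # 16384 bytes
header = range_header(0, first_chunk)

# Assumption: "8 MB" taken as exactly 8 * 2**20 bytes.
ovr_size = 8 * 2 ** 20
fraction = first_chunk / ovr_size

print(header)
print(f"{fraction:.4%} of the assumed .ovr size")
```

Under that size assumption the first request covers 1/512 of the overview file, which is the kind of saving the notebook attributes to range requests.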
  2. sgillies revised this gist Dec 15, 2017. 1 changed file with 50 additions and 19 deletions.
    69 changes: 50 additions & 19 deletions advanced_rasterio_features.ipynb
    Original file line number Diff line number Diff line change
    @@ -8,21 +8,23 @@
    "\n",
    "[Rasterio](https://mapbox.github.io/rasterio/) is an open source Python package that wraps [GDAL](http://www.gdal.org/) in idiomatic Python functions and classes.\n",
    "\n",
    "Five advanced features of Rasterio that are useful in developing cloud applications will be demonstrated in this notebook.\n",
    "The last pre-release of Rasterio has five advanced features that are useful for developing cloud-native applications.\n",
    "\n",
    "1. Quick overviews of GeoTIFFs in the cloud\n",
    "2. Quick subsets of GeoTIFFs in the cloud\n",
    "3. Lazy warping of GeoTIFFs in the cloud\n",
    "4. Formatted files in RAM\n",
    "5. Datasets in zipped streams\n",
    "\n",
    "These features already exist in the latest version of the GDAL library. What Rasterio does, for the first time, is make them into solid Python patterns.\n",
    "Please note these features already exist in the latest version of the GDAL library. What Rasterio does, for the first time, is make them into solid Python patterns.\n",
    "\n",
    "This notebook is a demonstration of these patterns.\n",
    "\n",
    "## Notebook requirements\n",
    "\n",
    "This notebook uses f-strings and requires Python 3.6. It will probably work with other Python 3 versions if the f-strings are replaced by `str.format()` calls. My team has switched to Python 3.6 this year and we're glad we did.\n",
    "\n",
    "I recommend that you run this notebook in an isolated Python environment. My preference is for one created with venv. Install the latest pre-release of rasterio with its S3-related extras.\n",
    "I recommend that you run this notebook in an isolated Python environment. My preference is for one created with venv. Install the latest pre-release of Rasterio with its S3-related extras.\n",
    "\n",
    "```\n",
    "python3.6 -m venv rasterio-advanced-features\n",
    @@ -39,11 +41,11 @@
    "conda install -c conda-forge/label/dev rasterio\n",
    "```\n",
    "\n",
    "You will need an AWS account and credentials to run the scripts in this notebook.\n",
    "You will need an AWS account and credentials to run the scripts in this notebook. An AWS account is free and doesn't take long to set up: https://aws.amazon.com/account/.\n",
    "\n",
    "## Rasterio documentation\n",
    "\n",
    "This notebook glosses over basic usage of rasterio and discusses several advanced usage patterns. Please consult the documentation of the rasterio package for help with basic usage: https://mapbox.github.io/rasterio/.\n",
    "This notebook glosses over basic usage of Rasterio and discusses several advanced usage patterns. Please consult the documentation of the Rasterio package for help with basic usage: https://mapbox.github.io/rasterio/.\n",
    "\n",
    "## Quick tour of the AWS Landsat PDS\n",
    "\n",
    @@ -54,9 +56,38 @@
    "> The Landsat program is a joint effort of the U.S. Geological Survey and NASA. First launched in 1972, the Landsat series of satellites has produced the longest, continuous record of Earth’s land surface as seen from space. NASA is in charge of developing remote-sensing instruments and spacecraft, launching the satellites, and validating their performance. USGS develops the associated ground systems, then takes ownership and operates the satellites, as well as managing data reception, archiving, and distribution. Since late 2008, Landsat data have been made available to all users free of charge. Carefully calibrated Landsat imagery provides the U.S. and the world with a long-term, consistent inventory of vitally important global resources.\n",
    "AWS has made Landsat 8 data freely available on Amazon S3 so that anyone can use our on-demand computing resources to perform analysis and create new products without needing to worry about the cost of storing Landsat data or the time required to download it.\n",
    "\n",
    "> Landsat data is in AWS S3 bucket named *landsat-pds* and is organized by *path*, *row*, and *scene*. The scene id also has the path and row encoded in it. If you know the scene you're interested in – by searching the USGS Earth Explorer site, or via James Bridle's [Landsat Tumblr](http://laaaaaaandsat.tumblr.com/post/168223275313/path-199-row-101-2017-12-05-at-111037-gmt) – you can extract the path and row and construct an AWS S3 prefix that lets you find all the objects associated with that scene using boto3.\n",
    "\n",
    "Please note that you'll need an AWS account and credentials to execute the script below, in which we use boto3 to print information about objects corresponding to a single Landsat scene."
    "> Landsat data is in AWS S3 bucket named *landsat-pds* and is organized by *path*, *row*, and *scene*. The scene id also has the path and row encoded in it. If you know the scene you're interested in – by searching the USGS Earth Explorer site, or via James Bridle's [Landsat Tumblr](http://laaaaaaandsat.tumblr.com/) – you can extract the path and row and construct an AWS S3 prefix that lets you find all the objects associated with that scene using boto3."
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "If you don't have AWS credentials set in your environment, you can set them in the block below. If you do, delete the block. Be careful to remove your credentials from the notebook before sharing it with anyone else."
    ]
    },
    {
    "cell_type": "code",
    "execution_count": 2,
    "metadata": {},
    "outputs": [
    {
    "name": "stdout",
    "output_type": "stream",
    "text": [
    "env: AWS_ACCESS_KEY_ID=AWS_SECRET_ACCESS_KEY=\n"
    ]
    }
    ],
    "source": [
    "%env AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY="
    ]
    },
    {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "In the script below we will use the AWS boto3 module to examine the structure of the Landsat Public Dataset. `LC08_L1TP_139045_20170304_20170316_01_T1` is a Landsat scene ID with a standard pattern."
    ]
    },
    {
    @@ -132,11 +163,11 @@
    "source": [
    "## There's a web browser in GDAL\n",
    "\n",
    "Each of the .TIF files in the landsat-pds bucket is a georeferenced raster dataset formatted as a [cloud optimized GeoTIFFs](https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF). A GeoTIFF is a TIFF with extra tags specifying spatial reference systems and coordinates and can be accompanied by reduced-resolution .ovr files as well as other auxiliary files.\n",
    "Each of the .TIF files in the landsat-pds bucket is a georeferenced raster dataset formatted as a [cloud optimized GeoTIFF](https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF). A GeoTIFF is a TIFF with extra tags specifying spatial reference systems and coordinates and can be accompanied by reduced-resolution .ovr files as well as other auxiliary files.\n",
    "\n",
    "There is a browser in the latest version of GDAL that can navigate these TIFFs and auxiliary files like a web browser navigates linked HTML documents. HTTP and the GeoTIFF format replace specialized geospatial raster services like [WCS](http://www.opengeospatial.org/standards/wcs) in the workflow presented in this notebook.\n",
    "\n",
    "By using GDAL's browser, rasterio can open and query cloud-optimized GeoTIFFs *without* prior download.\n",
    "By using GDAL's browser, Rasterio can open and query cloud-optimized GeoTIFFs *without* prior download.\n",
    "\n",
    "## Quick overviews of GeoTIFFs\n",
    "\n",
    @@ -181,7 +212,7 @@
    "source": [
    "## A look backstage\n",
    "\n",
    "Not only is there a web browser in a rasterio dataset object, it's a sophisticated web browser that uses HTTP range requests to download the least number of bytes required to execute `src.read()` with the given parameters. With a little extra configuration we can see exactly how few bytes.\n",
    "Not only is there a web browser in a Rasterio dataset object, it's a sophisticated web browser that uses HTTP range requests to download the least number of bytes required to execute `src.read()` with the given parameters. With a little extra configuration we can see exactly how few bytes.\n",
    "\n",
    "We will read and display a 10:1 overview as we did for band 4. The S3 object in the listing above with the name ending in `B5.TIF` corresponds to the near-infrared (NIR) band of the Landsat imager.\n",
    "\n",
    @@ -526,9 +557,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "Let’s say you want to write a program like this that will run on a computer with a very limited filesystem or no filesystem at all. Python has an in-memory binary file-like class, `io.BytesIO`, but, unlike `NamedTemporaryFile`, instances of `BytesIO` lack the name GDAL needs to access data. To solve this problem, rasterio provides a new class, `io.MemoryFile`, a drop-in replacement for Python’s `NamedTemporaryFile` which keeps its bytes in a virtual file in an in-memory filesystem that GDAL can access, not on disk.\n",
    "Let’s say you want to write a program like this that will run on a computer with a very limited filesystem or no filesystem at all. Python has an in-memory binary file-like class, `io.BytesIO`, but, unlike `NamedTemporaryFile`, instances of `BytesIO` lack the name GDAL needs to access data. To solve this problem, Rasterio provides a new class, `io.MemoryFile`, a drop-in replacement for Python’s `NamedTemporaryFile` which keeps its bytes in a virtual file in an in-memory filesystem that GDAL can access, not on disk.\n",
    "\n",
    "The usage of rasterio’s `MemoryFile` is modeled after Python’s `zipfile.ZipFile` class. `MemoryFile.open` returns a rasterio dataset object, just as `rasterio.open` does."
    "The usage of `MemoryFile` is modeled after Python’s `zipfile.ZipFile` class. `MemoryFile.open` returns a dataset object, just as `rasterio.open` does."
    ]
    },
    {
    @@ -588,7 +619,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
    "Below is an example of downloading an entire Landsat PDS GeoTIFF to a stream of bytes and then opening the stream of bytes with rasterio."
    "Below is an example of downloading an entire Landsat PDS GeoTIFF to a stream of bytes and then opening the stream of bytes."
    ]
    },
    {
    @@ -631,7 +662,7 @@
    "\n",
    "Rasterio can read datasets within zipped streams of bytes. Zipfiles are commonly used in the GIS domain to package legacy multi-file formats like shapefiles (a shapefile is actually an ensemble of .shp, .dbf, .shx, .prj, and other files) or virtual raster files (VRT) and the rasters they reference.\n",
    "\n",
    "Below we'll fetch a VRT and JPEG pair from the Rasterio GitHub repo and package them in an in-memory zip file to simulate a zip file such as one you might accept in an upload to a server."
    "Below we'll fetch a VRT and JPEG pair from the Rasterio GitHub repo and package them in an in-memory zip file to simulate a zip file such as one that a server might accept as an upload."
    ]
    },
    {
    @@ -727,13 +758,13 @@
    "source": [
    "## Acknowledgements\n",
    "\n",
    "GDAL's [virtual file systems](http://www.gdal.org/gdal_virtual_file_systems.html), upon which rasterio's `MemoryFile` and `ZipMemoryFile` are based, and virtually warped dataset feature, used by `WarpedVRT`, were written by Frank Warmerdam.\n",
    "GDAL's [virtual file systems](http://www.gdal.org/gdal_virtual_file_systems.html), upon which Rasterio's `MemoryFile` and `ZipMemoryFile` are based, and virtually warped dataset feature, used by `WarpedVRT`, were written by Frank Warmerdam.\n",
    "\n",
    "Even Rouault is the author of GDAL's curl-based HTTP virtual filesystem and the GeoTIFF \"browser\" that powers cloud-optimized GeoTIFFs in rasterio. Mapbox is proud to be a sponsor of early work on GDAL's browser.\n",
    "Even Rouault is the author of GDAL's curl-based HTTP virtual filesystem and the GeoTIFF \"browser\" that powers cloud-optimized GeoTIFFs in Rasterio. Mapbox is proud to be a sponsor of early work on GDAL's browser.\n",
    "\n",
    "Rasterio's binary wheels are made possible by the tools and wisdom of the Python wheel-builders community: https://mail.python.org/mailman/listinfo/wheel-builders. Matthew Brett's [delocate](https://github.com/matthew-brett/delocate) has been particularly useful.\n",
    "\n",
    "Rasterio is written by programmers at companies and organizations such as Mapbox, the Conservation Biology Institute, Planet Labs, the U.S. Geological Survey, Continuum Analytics, and Digital Globe. See https://github.com/mapbox/rasterio/graphs/contributors for the complete list of contributors."
    "Rasterio is written by programmers at companies and organizations such as Mapbox, the Conservation Biology Institute, Planet, the U.S. Geological Survey, Continuum Analytics, and DigitalGlobe. See https://github.com/mapbox/rasterio/graphs/contributors for the complete list of contributors."
    ]
    },
    {
    @@ -765,4 +796,4 @@
    },
    "nbformat": 4,
    "nbformat_minor": 2
    }
    }
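The "in-memory zip file" step that this revision's text describes can be sketched with the standard library alone. The member names `example.vrt` and `example.jpg` and their byte contents are placeholders, not the files the notebook fetches, and `zip_in_memory` is a hypothetical helper.

```python
import io
import zipfile


def zip_in_memory(files):
    """Package a mapping of {member name: bytes} into a zip archive
    that lives entirely in RAM, and return the archive's bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    # The archive is complete once the ZipFile context has closed.
    return buffer.getvalue()


# Placeholder members standing in for a VRT and the JPEG it references.
payload = zip_in_memory({
    "example.vrt": b"<VRTDataset/>",
    "example.jpg": b"\xff\xd8\xff",
})

# Reading the bytes back shows that no file on disk was involved.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    members = archive.namelist()

print(members)
```

Bytes produced this way are the kind of zipped stream that the notebook hands to Rasterio's `ZipMemoryFile`; consult the Rasterio documentation for that class's exact usage.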
  3. sgillies revised this gist Dec 14, 2017. 1 changed file with 189 additions and 76 deletions.
    265 changes: 189 additions & 76 deletions advanced_rasterio_features.ipynb
    189 additions, 76 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
  4. sgillies created this gist Dec 12, 2017.
    655 changes: 655 additions & 0 deletions advanced_rasterio_features.ipynb
    655 additions, 0 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
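As the notebook text in the revisions above notes, a Landsat scene ID encodes its path and row, so an S3 prefix can be derived from the ID alone. The sketch below is one way that might look. The `c1/L8/` leading segments are an assumption about the bucket layout that the diff does not show, and `scene_prefix` is a hypothetical helper; check the result against a real bucket listing before relying on it.

```python
def scene_prefix(scene_id, root="c1/L8"):
    """Derive an S3 key prefix from a Landsat scene ID such as
    LC08_L1TP_139045_20170304_20170316_01_T1.

    Assumption: objects are laid out as <root>/<path>/<row>/<scene_id>/,
    with `root` defaulting to "c1/L8" (not confirmed by the source).
    """
    parts = scene_id.split("_")
    if len(parts) != 7:
        raise ValueError(f"unexpected scene ID format: {scene_id!r}")
    path_row = parts[2]  # e.g. "139045": 3-digit path, then 3-digit row
    if len(path_row) != 6 or not path_row.isdigit():
        raise ValueError(f"unexpected path/row field: {path_row!r}")
    path, row = path_row[:3], path_row[3:]
    return f"{root}/{path}/{row}/{scene_id}/"


prefix = scene_prefix("LC08_L1TP_139045_20170304_20170316_01_T1")
print(prefix)
```

A prefix like this is what would be passed to an S3 listing call (for example via boto3, as the notebook does) to enumerate the objects belonging to one scene.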