Getting Started with GEDI L2B Data in Python

This tutorial demonstrates how to work with the Canopy Cover and Vertical Profile Metrics (GEDI02_B.001) data product.

The Global Ecosystem Dynamics Investigation (GEDI) mission aims to characterize ecosystem structure and dynamics to enable radically improved quantification and understanding of the Earth's carbon cycle and biodiversity. The GEDI instrument produces high resolution laser ranging observations of the 3-dimensional structure of the Earth. GEDI is attached to the International Space Station and collects data globally between 51.6° N and 51.6° S latitudes at the highest resolution and densest sampling of any light detection and ranging (lidar) instrument in orbit to date. The Land Processes Distributed Active Archive Center (LP DAAC) distributes the GEDI Level 1 and Level 2 products. The L1B and L2 GEDI products are archived and distributed in the HDF-EOS5 file format.


Use Case Example:

This tutorial was developed using an example use case for a project being completed by the National Park Service. The goal of the project is to use GEDI L2B data to observe tree canopy height, cover, and profile over Redwood National Park in northern California.

This tutorial will show how to use Python to open GEDI L2B files, visualize the full orbit of GEDI points (shots), subset to a region of interest, visualize GEDI canopy height and vertical profile metrics, and export subsets of GEDI science dataset (SDS) layers as GeoJSON files that can be loaded into GIS and/or Remote Sensing software programs.


Data Used in the Example:


Topics Covered:

  1. Get Started
    1.1 Import Packages
    1.2 Set Up the Working Environment and Retrieve Files
  2. Import and Interpret Data
    2.1 Open a GEDI HDF5 File and Read File Metadata
    2.2 Read SDS Metadata and Subset by Beam
  3. Visualize a GEDI Orbit
    3.1 Subset by Layer and Create a GeoDataFrame
    3.2 Visualize a GeoDataFrame
  4. Work with GEDI L2B Data
    4.1 Import and Extract Specific Shots
    4.2 Visualize PAVD
  5. Work with GEDI L2B Beam Transects
    5.1 Quality Filtering
    5.2 Plot Beam Transects
    5.3 Subset Beam Transects
  6. Plot Profile Transects
    6.1 Plot PAVD Transects
  7. Spatial Visualization
    7.1 Import, Subset, and Quality Filter all Beams
    7.2 Spatial Subsetting
    7.3 Visualize All Beams: Canopy Height, Elevation, and PAI
  8. Export Subsets as GeoJSON Files

Before Starting this Tutorial:

Setup and Dependencies

It is recommended to use Conda, an environment manager, to set up a compatible Python environment. Download Conda for your OS here: https://www.anaconda.com/download/. Once you have Conda installed, follow the instructions below to set up a Python environment on Linux, macOS, or Windows.

This Python Jupyter Notebook tutorial has been tested using Python version 3.7. Conda was used to create the Python environment.

If you do not have Jupyter Notebook installed, you may need to run:

conda install jupyter notebook

Having trouble getting a compatible Python environment set up? Contact LP DAAC User Services at: https://lpdaac.usgs.gov/lpdaac-contact-us/

If you prefer to not install Conda, the same setup and dependencies can be achieved by using another package manager such as pip.


Example Data:

This tutorial uses the GEDI L2B observation from June 19, 2019 (orbit 02932). Use the links below to download the files directly from the LP DAAC Data Pool:

You will need to have the file above downloaded into the same directory as this Jupyter Notebook in order to successfully run the code below.

Source Code used to Generate this Tutorial:

The repository containing all of the required files is located at: https://git.earthdata.nasa.gov/projects/LPDUR/repos/gedi-tutorials/browse

NOTE: This tutorial was developed for GEDI L2B HDF-EOS5 files and should only be used for that product.


1. Get Started

1.1 Import Packages

Import the required packages and set the input/working directory to run this Jupyter Notebook locally.

1.2 Set Up the Working Environment and Retrieve Files

The input directory is defined as the current working directory. Note that you will need to have the Jupyter Notebook and the example data (.h5 and .geojson) stored in this directory in order to execute the tutorial successfully.

NOTE: If you have downloaded the tutorial materials to a different directory than the Jupyter Notebook, `inDir` above needs to be changed. You will also need to add a line: `os.chdir(inDir)` and execute it below.

In this section, a GEDI .h5 file has been downloaded to the inDir defined above. You will need to download the file directly from the LP DAAC Data Pool in order to execute this tutorial.
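The setup described above can be sketched as follows. This is a minimal sketch, assuming the granule has already been downloaded into the working directory; `inDir` follows the name used in the note above, while the helper `list_gedi_files` and the variable `gediFiles` are hypothetical names introduced for illustration.

```python
import os


def list_gedi_files(directory):
    """Return the sorted GEDI L2B HDF5 filenames found in `directory`."""
    return sorted(
        f for f in os.listdir(directory)
        if f.startswith("GEDI02_B") and f.endswith(".h5")
    )


# Use the current working directory as the input directory.
inDir = os.getcwd() + os.sep

# If the data live elsewhere, point inDir at that folder and then
# call os.chdir(inDir) before listing the files.
gediFiles = list_gedi_files(inDir)
print(f"Found {len(gediFiles)} GEDI L2B file(s) in {inDir}")
```

If the list comes back empty, the .h5 file is most likely stored in a different directory than the notebook.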


2. Import and Interpret Data

2.1 Open a GEDI HDF5 File and Read File Metadata

Read the file using h5py.

The standard format for GEDI filenames is as follows:

GEDI02_B: Product Short Name
2019170155833: Julian Date and Time of Acquisition (YYYYDDDHHMMSS)
O02932: Orbit Number
T02267: Track Number
02: Positioning and Pointing Determination System (PPDS) type (00 is predict, 01 rapid, 02 and higher is final)
001: GOC SDS (software) release number
01: Granule Production Version
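The components listed above can be pulled out of a filename programmatically. The sketch below assumes the components are joined by underscores in the order listed and that the file ends in `.h5`; the function name `parse_gedi_filename` and the dictionary keys are hypothetical.

```python
from datetime import datetime


def parse_gedi_filename(filename):
    """Split a GEDI granule filename into its labeled components."""
    stem = filename.rsplit(".", 1)[0]  # drop the .h5 extension
    parts = stem.split("_")
    # The product short name contains an underscore (GEDI02_B),
    # so the first two pieces are rejoined.
    product = "_".join(parts[:2])
    acquired, orbit, track, ppds, release, version = parts[2:8]
    return {
        "product": product,
        # %j parses the day of year (DDD) in YYYYDDDHHMMSS.
        "acquired": datetime.strptime(acquired, "%Y%j%H%M%S"),
        "orbit": orbit,
        "track": track,
        "ppds": ppds,
        "release": release,
        "version": version,
    }


info = parse_gedi_filename("GEDI02_B_2019170155833_O02932_T02267_02_001_01.h5")
print(info["acquired"])  # 2019-06-19 15:58:33
```

Day 170 of 2019 is June 19, which matches the acquisition date of the example granule.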

Read in a GEDI HDF5 file using the h5py package.

The GEDI HDF5 file contains groups in which data and metadata are stored.

First, the METADATA group contains the file-level metadata.

This contains useful information such as the creation date, PGEVersion, and VersionID. Below, print the file-level metadata attributes.

2.2 Read SDS Metadata and Subset by Beam

The GEDI instrument consists of 3 lasers producing a total of 8 beam ground transects. The eight remaining groups contain data for each of the eight GEDI beam transects. For additional information, be sure to check out: https://gedi.umd.edu/instrument/specifications/.

One useful piece of metadata to retrieve from each beam transect is whether it is a full power beam or a coverage beam.
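One way to retrieve this is sketched below. It assumes each beam group exposes a `description` attribute that contains the text "Full power beam" or "Coverage beam"; that attribute name and both helper names are assumptions made for illustration. The functions accept an open `h5py.File` or any object with the same mapping interface.

```python
def beam_descriptions(gedi_file):
    """Map each BEAMXXXX group name to its 'description' attribute."""
    return {
        name: gedi_file[name].attrs["description"]
        for name in gedi_file.keys()
        if name.startswith("BEAM")
    }


def full_power_beams(descriptions):
    """Return the beam names whose description mentions full power."""
    return sorted(
        beam for beam, text in descriptions.items()
        if "full power" in str(text).lower()
    )


# Example usage with h5py (requires a downloaded granule):
# import h5py
# with h5py.File(path_to_granule, "r") as gedi:
#     print(full_power_beams(beam_descriptions(gedi)))
```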

Below, pick one of the full power beams that will be used to retrieve GEDI L2B shots in Section 3.

Identify all the objects in the GEDI HDF5 file below.

Note: This step may take a while to complete.


3. Visualize a GEDI Orbit

In the section below, import GEDI L2B SDS layers into a GeoPandas GeoDataFrame for the beam specified above.

Use the lat_lowestmode and lon_lowestmode to create a shapely point for each GEDI shot location.

3.1 Subset by Layer and Create a GeoDataFrame

Read in the SDS and take a representative sample (every 100th shot) and append to lists, then use the lists to generate a pandas dataframe.
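The sampling step can be sketched with slicing. The example below uses synthetic stand-in values rather than real GEDI arrays; the column names, the helper `sample_every_nth`, and the variable `latslonsDF` are illustrative assumptions.

```python
import pandas as pd


def sample_every_nth(columns, step=100):
    """Keep every `step`-th shot from equally long column sequences."""
    return pd.DataFrame(
        {name: list(values[::step]) for name, values in columns.items()}
    )


# Synthetic stand-in values; real ones would be read from the beam's datasets.
n_shots = 1000
columns = {
    "Shot Number": list(range(n_shots)),
    "Latitude": [40.0 + 0.001 * i for i in range(n_shots)],
    "Longitude": [-124.0 + 0.001 * i for i in range(n_shots)],
    "Quality Flag": [i % 2 for i in range(n_shots)],
}

latslonsDF = sample_every_nth(columns)
print(latslonsDF.shape)  # (10, 4)
```

Slicing with `[::100]` keeps the first shot and every 100th shot after it, which is enough to trace the orbit without plotting millions of points.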

Above is a dataframe containing columns describing the beam, shot number, lat/lon location, and quality information about each shot.

Below, create an additional column called 'geometry' that contains a shapely point generated from each lat/lon location from the shot.

Next, convert to a Geopandas GeoDataFrame.

Pull out and plot an example shapely point below.

3.2 Visualize a GeoDataFrame

In this section, use the GeoDataFrame and the geoviews python package to spatially visualize the location of the GEDI shots on a basemap and import a geojson file of the spatial region of interest for the use case example: Redwood National Park.

Import a GeoJSON file of Redwood National Park as an additional GeoDataFrame. Note that you will need to have downloaded the GeoJSON file from the Bitbucket repository containing this tutorial and have it saved in the same directory as this Jupyter Notebook.

Defining the vdims below will allow you to hover over specific shots and view information about them.

Below, combine a plot of the Redwood National Park Boundary (combine two geoviews plots using *) with the point visual mapping function defined above in order to plot (1) the representative GEDI shots, (2) the region of interest, and (3) a basemap layer.

Above is a good illustration of the full GEDI orbit (GEDI files are stored as one ISS orbit). One of the benefits of using geoviews is the interactive nature of the output plots. Use the tools to the right of the map above to zoom in and find the shots intersecting Redwood National Park.

(HINT: find where the orbit intersects the west coast of the United States)

Below is a screenshot of the region of interest:

[Screenshot: GEDI shots over the region of interest]

Side Note: Wondering what the 0's and 1's for l2b_quality_flag mean?

A value of 0 indicates poor quality, and a value of 1 indicates that the laser shot meets criteria based on energy, sensitivity, amplitude, and real-time surface tracking quality. We will show an example of how to quality filter GEDI data in section 5.1.

After finding one of the shots within Redwood NP, find the index for that shot number so that we can find the correct shot to visualize in Section 4.

Shot: 29320619900465601

2932: Orbit Number
06: Beam Number
199: Minor frame number (0-241)
00465601: Shot number within orbit
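The breakdown above can be reproduced by slicing the digits from the right. This is a sketch under the layout shown above (8-digit shot index, 3-digit minor frame, 2-digit beam, remaining digits for the orbit). The mapping from beam number to a `BEAMXXXX` group name via 4-bit binary is an assumption, although it agrees with the BEAM0110 group used later in this tutorial; `parse_shot_number` is a hypothetical helper.

```python
def parse_shot_number(shot):
    """Split a GEDI shot number into orbit, beam, minor frame, and index."""
    digits = str(shot)
    beam = int(digits[-13:-11])
    return {
        "orbit": int(digits[:-13]),
        "beam": beam,
        # Assumption: the group name is the beam number in 4-bit binary.
        "beam_group": f"BEAM{beam:04b}",
        "minor_frame": int(digits[-11:-8]),
        "shot_index": int(digits[-8:]),
    }


parts = parse_shot_number(29320619900465601)
print(parts["orbit"], parts["beam_group"], parts["minor_frame"], parts["shot_index"])
# 2932 BEAM0110 199 465601
```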


4. Work with GEDI L2B Data

The L2B product contains biophysical information derived from the geolocated GEDI return waveforms including total and vertical profiles of canopy cover and Plant Area Index (PAI), the vertical Plant Area Volume Density (PAVD) profile, and Foliage Height Diversity (FHD).

Detailed product information can be found on the GEDI L2B Product Page.

4.1 Import and Extract Specific Shots

Notice that there are over a thousand datasets available in the GEDI L2B product. In the code blocks below, you will subset to just a few of the datasets available.

In this section, learn how to extract and subset specific shots and plot Plant Area Volume Density (PAVD) using holoviews.

We will set the shot index used as an example from the GEDI L1B Tutorial and GEDI L2A Tutorial to show how to subset a single shot of GEDI L2B data.

4.2 Visualize PAVD

In section 4.2, import the PAVD metrics (pavd_z) and begin exploring how to plot them.

Below, open the dz layer in order to define the correct vertical step size.

So the vertical step size is 5.0 meters.

And it looks like PAVD includes 30 "steps" in each shot, describing the PAVD at height = step # * dz.

Now, bring in other useful L2B datasets such as elev_lowestmode, lat_lowestmode and lon_lowestmode.

Grab the location, elevation, and PAVD metrics for the shot defined above:

Put everything together to identify the shot that we want to extract:

Next, reformat PAVD into a list of tuples containing each PAVD value and height.
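The reformatting step can be sketched as below, using the relationship stated above (height = step number * dz) with dz = 5.0 m and 30 steps. The PAVD values are synthetic, counting steps from 0 is an assumption, and `pavd_profile` is a hypothetical helper name.

```python
def pavd_profile(pavd_values, dz):
    """Pair each PAVD value with its height above ground (step index * dz)."""
    return [(value, step * dz) for step, value in enumerate(pavd_values)]


dz = 5.0  # vertical step size in meters, as read from the dz layer
shot_pavd = [0.02 * (30 - i) for i in range(30)]  # synthetic PAVD values

profile = pavd_profile(shot_pavd, dz)
print(len(profile), profile[-1][1])  # 30 145.0
```

Each tuple holds (PAVD, height), so a plotting function can draw height on one axis and color each segment by its PAVD value.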

Below, plot each shot by using holoviews Path() function, with the PAVD plotted in the third dimension in shades of green.

Congratulations! You have plotted your first PAVD profile.


5. Work with GEDI L2B Beam Transects

Next, import a number of desired SDS layers for BEAM0110 (for the entire orbit) and create a pandas Dataframe to store the arrays.

In the GEDI L2B product, Canopy Height is stored in centimeters (cm), so below convert the values to meters by dividing by 100.

As mentioned in the sections above, Plant Area Volume Density (pavd) is defined as the Vertical Plant Area Volume Density profile with a vertical step size of dZ. Below, reformat the shape of the PAVD layer in order to add it to the dataframe below.

Above, notice that unlike a SDS layer like Canopy Height, which has a single value for each shot, PAVD has 30 values (representing different vertical heights) for each shot.

Below, reformat the data into a list of values for each shot.

Note: The cell above may take up to a minute to process.

Below, notice the reformatted PAVD layer, which should now fit into the dataframe created below.

Notice the unusual values listed above--those shots are flagged as poor quality and will be removed in Section 5.1.

Now that you have the desired SDS into a pandas dataframe, begin plotting the entire beam transect:

Congratulations! You have plotted your first GEDI full orbit beam transect. Notice above that things look a little messy--before we dive deeper into plotting full transects, let's quality filter the shots in the section below.

5.1 Quality Filtering

Now that you have the desired layers imported as a dataframe for the entire beam transect, let's perform quality filtering.

Below, remove any shots where the l2b_quality_flag is set to 0 by defining those shots as nan.

The syntax of the line below can be read as: in the dataframe, find the rows "where" the quality flag is not equal (ne) to 0. If a row (shot) does not meet the condition, set all values equal to nan for that row.

Below, quality filter even further by using the degrade_flag (Greater than zero if the shot occurs during a degrade period, zero otherwise) and the Sensitivity layer, using a threshold of 0.95.

Below, drop all of the shots that did not pass the quality filtering standards outlined above from the transectDF.
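The three filters and the final drop can be sketched on a small synthetic dataframe. The column names and values are illustrative stand-ins rather than real GEDI data; the 0.95 sensitivity threshold is the one given above.

```python
import pandas as pd

# Synthetic transect: five shots with mixed quality.
transectDF = pd.DataFrame({
    "Canopy Height (m)": [31.2, 0.0, 45.8, 12.4, 50.1],
    "Quality Flag": [1, 0, 1, 1, 1],
    "Degrade Flag": [0, 0, 1, 0, 0],
    "Sensitivity": [0.98, 0.50, 0.97, 0.90, 0.96],
})

# Rows that fail a condition have every value set to NaN.
transectDF = transectDF.where(transectDF["Quality Flag"].ne(0))
transectDF = transectDF.where(transectDF["Degrade Flag"] < 1)
transectDF = transectDF.where(transectDF["Sensitivity"] > 0.95)

# Drop the rows that were blanked out by any of the filters.
transectDF = transectDF.dropna()
print(transectDF.index.tolist())  # [0, 4]
```

Because `where` keeps the original index, the surviving rows still carry their position in the full transect, which is useful when subsetting by index later.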

5.2 Plot Beam Transects

Next, plot the full remaining transect of high quality values using holoviews Scatter(). Combine the TanDEM-X derived elevation, the GEDI-derived elevation, and the Canopy Top Elevation in a combined holoviews plot.

The plot still looks a bit messy this far zoomed out--feel free to pan, zoom, and explore different areas of the plot. The waveforms plotted in section 4 were 46597-46600. If you zoom into the high-quality shots between 4.000e+5 and 5.000e+5, you will find the portion of the transect intersecting Redwood National Park, seen below:

[Screenshot: portion of the beam transect intersecting Redwood National Park]

5.3 Subset Beam Transects

Now, subset down to a smaller transect centered on the shot analyzed in the sections above.

Below, subset the transect using .loc.
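A sketch of label-based subsetting with `.loc` follows. The dataframe is synthetic, the window of 50 shots on either side is an arbitrary choice, and `subset_around` is a hypothetical helper. Note that `.loc` slices by index label and includes both endpoints.

```python
import pandas as pd


def subset_around(df, center, half_width):
    """Return rows whose index label lies within `half_width` of `center`."""
    return df.loc[center - half_width: center + half_width]


# Synthetic transect with two shots already removed by quality filtering.
transect = pd.DataFrame({"Elevation (m)": range(1000)}).drop(index=[460, 540])

subset = subset_around(transect, center=500, half_width=50)
print(len(subset), subset.index.min(), subset.index.max())  # 99 450 550
```

Labels 450 through 550 span 101 shots, and the two that were dropped earlier are simply absent from the result.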


6. Plot Profile Transects

In this section, plot the transect subset using elevation, canopy height, and plant area volume density (PAVD) metrics.

In order to get an idea of the length of the beam transect that you are plotting, you can plot the x-axis as distance, which is calculated below.
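One simple way to approximate distance is to multiply the position of each shot in the transect by the along-track spacing between GEDI footprints, which is roughly 60 m. The sketch below assumes evenly spaced shots with no gaps, so it is only an approximation once poor quality shots have been removed; `along_track_distance` is a hypothetical helper.

```python
FOOTPRINT_SPACING_M = 60  # approximate along-track spacing between shots


def along_track_distance(n_shots, spacing_m=FOOTPRINT_SPACING_M):
    """Approximate distance (m) of each shot from the start of the transect."""
    return [i * spacing_m for i in range(n_shots)]


distance = along_track_distance(5)
print(distance)  # [0, 60, 120, 180, 240]
```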

6.1 Plot PAVD Transects

Similar to what was done with PAVD in the sections above, reformat PAVD into a list of tuples containing each PAVD value and height by shot.

Below, plot each shot by using holoviews Path() function, with the PAVD plotted in the third dimension in shades of green.

Add in the ground elevation and canopy top elevation for better context as to where in the canopy the highest PAVD exists.

Above, you can get an idea about the terrain over the region of interest, particularly the classic "V" representing the river valley that is bisected by the transect. In terms of vegetation structure, this plot does a good job of showing not only which portions of the canopy are taller, but also where they are denser (darker shades of green).

At this point you have visualized the elevation, canopy, and vertical structure of specific footprints over Redwood National Park, and for a transect cutting through the national park. In section 7 you will map all of the high-quality shots from all eight GEDI beams for a given region of interest in order to understand the spatial distribution and characteristics of the canopy over Redwood National Park.


7. Spatial Visualization

Section 7 combines many of the techniques learned above including how to import GEDI datasets, perform quality filtering, spatial subsetting, and visualization.

7.1 Import, Subset, and Quality Filter All Beams

Below, re-open the GEDI L2B observation--but this time, loop through and import data for all 8 of the GEDI beams.

Loop through each of the desired datasets (SDS) for each beam, append to lists, and transform into a pandas DataFrame.

7.2 Spatial Subsetting

Below, subset the pandas dataframe using a simple bounding box region of interest. If you are interested in spatially clipping GEDI shots to a geojson region of interest, be sure to check out the GEDI-Subsetter python script available at: https://git.earthdata.nasa.gov/projects/LPDUR/repos/gedi-subsetter/browse.

Over 3.5 million shots are contained in this single GEDI orbit! Below, subset down to only the shots falling within a small bounding box encompassing Redwood National Park. The `envelope` of the RedwoodNP GeoDataFrame is the smallest bounding box encompassing the entire region of interest. Here, use that as the bounding box for subsetting the GEDI shots.

Filter by the bounding box, which is done similarly to filtering by quality in section 5.1 above.
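A bounding box filter can be sketched as below. The bounds follow the (min longitude, min latitude, max longitude, max latitude) order that a shapely geometry's `bounds` returns. The coordinates are made-up illustrative values, not the park's actual envelope, and `clip_to_bbox` and the column names are hypothetical.

```python
import pandas as pd


def clip_to_bbox(df, bounds, lon_col="Longitude", lat_col="Latitude"):
    """Keep only the rows whose coordinates fall inside `bounds`."""
    min_lon, min_lat, max_lon, max_lat = bounds
    inside = (
        df[lon_col].between(min_lon, max_lon)
        & df[lat_col].between(min_lat, max_lat)
    )
    return df[inside]


# Synthetic shots; only the first and last fall inside the box.
shots = pd.DataFrame({
    "Longitude": [-124.0, -120.0, -124.0, -123.9],
    "Latitude": [41.5, 41.5, 35.0, 41.1],
})
bbox = (-124.2, 41.0, -123.8, 41.8)  # illustrative values only

clipped = clip_to_bbox(shots, bbox)
print(clipped.index.tolist())  # [0, 3]
```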

Notice you have drastically reduced the number of shots you are working with (which will greatly enhance your experience in plotting them below). But first, remove any poor quality shots that exist within the ROI.

That leaves roughly 2000 shots. Next, create a Shapely Point out of each shot and insert it as the geometry column in the [soon to be geo]dataframe.

7.3 Visualize All Beams: Canopy Height, Elevation, and PAI

Now, using the pointVisual function defined in section 3.2, plot the geopandas GeoDataFrame using geoviews.

Feel free to pan and zoom in to the GEDI shots in yellow.

Now let's not only plot the points in the geodataframe but also add a colormap for Canopy Height (m), Elevation (m), and Plant Area Index (PAI).

Above and in the screenshot below, notice the higher canopy heights (shades of yellow) over the Redwood stands of the national park vs. other types of forests (pink-blue) vs. the low-lying (and consequently flat) profiles over lakes and rivers (purple).

[Screenshot: canopy height of GEDI shots over Redwood National Park]

Next, take a look at the GEDI-derived elevation over the shots. Notice below that the colormap is changed to 'terrain'.

Last but certainly not least, Plant Area Index:

Success! You have now learned how to start working with GEDI L2B files in Python as well as some interesting strategies for visualizing those data in order to better understand your specific region of interest. Using this Jupyter Notebook as a workflow, you should now be able to switch to GEDI files over your specific region of interest and re-run the notebook. Good luck!


8. Export Subsets as GeoJSON Files

In this section, export the GeoDataFrame as a .geojson file that can be easily opened in your favorite remote sensing and/or GIS software and will include an attribute table with all of the shots/values for each of the SDS layers in the dataframe.
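With geopandas, a GeoDataFrame can be written directly with `to_file(..., driver="GeoJSON")`. As a dependency-free illustration of what ends up in such a file, the sketch below builds a GeoJSON FeatureCollection using only the standard library. This is a swapped-in technique rather than the export described above, and the helper name, keys, and values are all hypothetical.

```python
import json


def shots_to_geojson(records, lon_key="Longitude", lat_key="Latitude"):
    """Build a GeoJSON FeatureCollection of points from shot records."""
    features = []
    for rec in records:
        properties = {
            k: v for k, v in rec.items() if k not in (lon_key, lat_key)
        }
        features.append({
            "type": "Feature",
            # GeoJSON coordinates are ordered [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [rec[lon_key], rec[lat_key]],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Synthetic record. The shot number is stored as a string because some
# software reads JSON numbers as doubles and would lose its last digits.
records = [{
    "Shot Number": "29320619900465601",
    "Longitude": -124.0,
    "Latitude": 41.3,
    "Canopy Height (m)": 45.8,
}]

collection = shots_to_geojson(records)
geojson_text = json.dumps(collection, indent=2)
print(geojson_text[:40])
```

Writing `geojson_text` to a file with a `.geojson` extension gives a layer whose attribute table contains one row per shot.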

Contact Information

Material written by Cole Krehbiel$^{1}$

    Contact: LPDAAC@usgs.gov
    Voice: +1-605-594-6116
    Organization: Land Processes Distributed Active Archive Center (LP DAAC)
    Website: https://lpdaac.usgs.gov/
    Date last modified: 05-11-2021
$^{1}$KBR Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, 57198-001, USA. Work performed under USGS contract G15PD00467 for LP DAAC$^{2}$. $^{2}$LP DAAC Work performed under NASA contract NNG14HH33I.