Getting Started with GEDI L1B Data in Python

This tutorial demonstrates how to work with the Geolocated Waveform (GEDI01_B.001) data product.

The Global Ecosystem Dynamics Investigation (GEDI) mission aims to characterize ecosystem structure and dynamics to enable radically improved quantification and understanding of the Earth's carbon cycle and biodiversity. The GEDI instrument produces high resolution laser ranging observations of the 3-dimensional structure of the Earth. GEDI is attached to the International Space Station and collects data globally between 51.6$^{o}$ N and 51.6$^{o}$ S latitudes at the highest resolution and densest sampling of any light detection and ranging (lidar) instrument in orbit to date. The Land Processes Distributed Active Archive Center (LP DAAC) distributes the GEDI Level 1 and Level 2 products. The L1B and L2 GEDI products are archived and distributed in the HDF-EOS5 file format.


Use Case Example:

This tutorial was developed using an example use case for a project being completed by the National Park Service. The goal of the project is to use GEDI L1B data to observe GEDI waveforms over Redwood National Park in northern California.

This tutorial will show how to use Python to open GEDI L1B files, visualize the full orbit of GEDI points (shots), subset to a region of interest, visualize GEDI full waveforms, and export subsets of GEDI science dataset (SDS) layers as GeoJSON files that can be loaded into GIS and/or Remote Sensing software programs.


Data Used in the Example:


Topics Covered:

  1. Get Started
    1.1 Import Packages
    1.2 Set Up the Working Environment and Retrieve Files
  2. Import and Interpret Data
    2.1 Open a GEDI HDF5 File and Read File Metadata
    2.2 Read SDS Metadata and Subset by Beam
  3. Visualize a GEDI Orbit
    3.1 Subset by Layer and Create a Geodataframe
    3.2 Visualize a Geodataframe
  4. Subset and Visualize Waveforms
    4.1 Import and Extract Waveforms
    4.2 Visualize Waveforms
  5. Quality Filtering
  6. Plot Profile Transects
    6.1 Subset Beam Transects
    6.2 Plot Waveform Transects
  7. Spatial Visualization
    7.1 Import, Subset, and Quality Filter All Beams
    7.2 Spatial Subsetting
    7.3 Visualize All Beams: Elevation
  8. Export Subsets as GeoJSON Files

Before Starting this Tutorial:

Setup and Dependencies

It is recommended to use Conda, an environment manager to set up a compatible Python environment. Download Conda for your OS here: https://www.anaconda.com/download/. Once you have Conda installed, Follow the instructions below to successfully setup a Python environment on Linux, MacOS, or Windows.

This Python Jupyter Notebook tutorial has been tested using Python version 3.7. Conda was used to create the python environment.

If you do not have jupyter notebook installed, you may need to run:

conda install jupyter notebook

Having trouble getting a compatible Python environment set up? Contact LP DAAC User Services at: https://lpdaac.usgs.gov/lpdaac-contact-us/

If you prefer to not install Conda, the same setup and dependencies can be achieved by using another package manager such as pip.


Example Data:

This tutorial uses the GEDI L1B observation from June 19, 2019 (orbit 02932). Use the link below to download the file directly from the LP DAAC Data Pool:

You will need to have the file above downloaded into the same directory as this Jupyter Notebook in order to successfully run the code below.

Source Code used to Generate this Tutorial:

The repository containing all of the required files is located at: https://git.earthdata.nasa.gov/projects/LPDUR/repos/gedi-tutorials/browse

NOTE: This tutorial was developed for GEDI L1B HDF-EOS5 files and should only be used for that product.


1. Get Started

1.1 Import Packages

Import the required packages and set the input/working directory to run this Jupyter Notebook locally.

1.2 Set Up the Working Environment and Retrieve Files

The input directory is defined as the current working directory. Note that you will need to have the jupyter notebook and example data (.h5 and .geojson) stored in this directory in order to execute the tutorial successfully.

NOTE: If you have downloaded the tutorial materials to a different directory than the Jupyter Notebook, `inDir` above needs to be changed. You will also need to add a line: `os.chdir(inDir)` and execute it below.

In this section, the GEDI .h5 file has been downloaded to the inDir defined above. You will need to download the file directly from the LP DAAC Data Pool in order to execute this tutorial.


2. Import and Interpret Data

2.1 Open a GEDI HDF5 File and Read File Metadata

Read the file using h5py.

The standard format for GEDI filenames is as follows:

GEDI01_B: Product Short Name
2019170155833: Julian Date and Time of Acquisition (YYYYDDDHHMMSS)
O02932: Orbit Number
T02267: Track Number
02: Positioning and Pointing Determination System (PPDS) type (00 is predict, 01 rapid, 02 and higher is final)
003: GOC SDS (software) release number
01: Granule Production Version

Read in a GEDI HDF5 file using the h5py package.

The GEDI HDF5 file contains groups in which data and metadata are stored.

First, the METADATA group contains the file-level metadata.

This contains useful information such as the creation date, PGEVersion, and VersionID. Below, print the file-level metadata attributes.

2.2 Read SDS Metadata and Subset by Beam

The GEDI instrument consists of 3 lasers producing a total of 8 beam ground transects. The eight remaining groups contain data for each of the eight GEDI beam transects. For additional information, be sure to check out: https://gedi.umd.edu/instrument/specifications/.

One useful piece of metadata to retrieve from each beam transect is whether it is a full power beam or a coverage beam.

Below, pick one of the full power beams that will be used to retrieve GEDI L1B waveforms in Section 3.

Identify all the objects in the GEDI HDF5 file below.

Note: This step may take a while to complete.


3. Visualize a GEDI Orbit

In the section below, import GEDI L1B SDS layers into a GeoPandas GeoDataFrame for the beam specified above.

Use the latitude_bin0 and longitude_bin0 to create a shapely point for each GEDI shot location.

3.1 Subset by Layer and Create a Geodataframe

Read in the SDS and take a representative sample (every 100th shot) and append to lists, then use the lists to generate a pandas dataframe.

Above is a dataframe containing columns describing the beam, shot number, lat/lon location, and quality information about each shot.

Below, create an additional column called 'geometry' that contains a shapely point generated from each lat/lon location from the shot.

Next, convert to a Geopandas GeoDataFrame.

Pull out and plot an example shapely point below.

3.2 Visualize a GeoDataFrame

In this section, use the GeoDataFrame and the geoviews python package to spatially visualize the location of the GEDI shots on a basemap and import a geojson file of the spatial region of interest for the use case example: Redwood National Park.

Import a geojson of Redwood National Park as an additional GeoDataFrame. Note that you will need to have downloaded the geojson from the bitbucket repo containing this tutorial and have it saved in the same directory as this Jupyter Notebook.

Defining the vdims below will allow you to hover over specific shots and view information about them.

Below, combine a plot of the Redwood National Park Boundary (combine two geoviews plots using *) with the point visual mapping function defined above in order to plot (1) the representative GEDI shots, (2) the region of interest, and (3) a basemap layer.

Above is a good illustration of the full GEDI orbit (GEDI files are stored as one ISS orbit). One of the benefits of using geoviews is the interactive nature of the output plots. Use the tools to the right of the map above to zoom in and find the shots intersecting Redwood National Park.

(HINT: find where the orbit intersects the west coast of the United States)

Below is a screenshot of the region of interest:

alt text

Side Note: Wondering what the 0's and 1's for stale_return_flag and degrade mean?

We will show an example of how to quality filter GEDI data in section 5.

After finding one of the shots within Redwood NP, find the index for that shot number so that we can find the correct waveform to visualize in Section 4.

Shot: 29320619900465601

2932: Orbit Number
06: Beam Number
199: Minor frame number (0-241)
00465601: Shot number within orbit


4. Subset and Visualize Waveforms

In this section, learn how to extract and subset specific waveforms and plot them using holoviews.

4.1 Import and Extract Waveforms

In order to find and extract the full waveform for the exact index we are interested in, instead of importing the entire waveform dataset (over 1 billion values!), we will use the rx_sample_count and rx_sample_start_index to identify the location of the waveform that we are interested in visualizing and just extract those values.

Next, read how to use the rx_sample_count and rx_sample_start_index layers:

Use rx_sample_count and rx_sample_start_index to identify the location of each waveform in rxwaveform.

Next grab additional information about the shot, including the unique shot_number, and lat/lon location.

Put everything together to identify the waveform we want to extract:

In order to plot a full waveform, you also need to import the elevation recorded at bin0 (the first elevation recorded) and lastbin or the last elevation recorded for that waveform.

Extract the full waveform using the index start and count:

Below you can see why it is important to extract the specific waveform that you are interested in: almost 1.4 billion values are stored in the rxwaveform dataset!

4.2 Visualize Waveforms

Below, plot the extracted waveform using the elevation difference from bin0 to last_bin on the y axis, and the waveform energy returned or amplitude (digital number = dn) on the x axis.

Next, use the Holoviews python package (hv) to plot the waveform.

Congratulations! You have plotted your first waveform.

Above is a basic line plot showing amplitude (DN) as a function of elevation from the rxwaveform for the specific shot selected. Next, add additional chart elements and make the graph interactive.

As you can see, the selected shot does not look so interesting--this waveform shape looks more characteristic of some type of low canopy/bare ground than a tree canopy. If you zoom in to the GEDI shot it appears to be over the river itself or possibly a sand bar:

alt text

Next, plot a couple more waveforms and see if you can capture a shot over the forest.

Now that is starting to look more like a very tall, multi-layered tree canopy! Continue with the next shot:

Lastly, plot the transect of four consecutive waveforms:

Notice above moving west to east (left to right) along the transect you can see the elevation lowering and the tree canopy decreasing as it encounters the sandbar/river.

Below, use geoviews to plot the location of the four points to verify what is seen above.

Above, zoom in on the yellow dots until you can get a clearer view of all four shots.

In the screenshots below, we can match the location of each point with the associated waveform. It appears to confirm the hypothesis that the elevation is decreasing as the shots get closer to the river, and the forest canopy decreases until it is detecting the sand bar/river by the final waveform in the sequence.

alt text alt text


5. Quality Filtering

Now that you have the desired layers imported as a dataframe for the selected shots, let's perform quality filtering.

Below, remove any shots where the stale_return_flag is set to 1 (indicates that a "stale" cue point from the coarse search algorithm is being used) by defining those shots as nan.

The syntax of the line below can be read as: in the dataframe, find the rows "where" the stale return flag is not equal (ne) to 0. If a row (shot) does not meet the condition, set all values equal to nan for that row.

Below, quality filter even further by using the degrade flag (Greater than zero if the shot occurs during a degrade period, zero otherwise).

Below, drop all of the shots that did not pass the quality filtering standards outlined above from the transectDF.

Good news! It looks like all four of the example waveforms passed the initial quality filtering tests. For additional information on quality filtering GEDI data, be sure to check out: https://lpdaac.usgs.gov/resources/faqs/#how-should-i-quality-filter-gedi-l1b-l2b-data.


6. Plot Profile Transects

In this section, plot a transect subset using waveforms.

6.1 Subset Beam Transects

Subset down to a smaller transect centered on the waveforms analyzed in the sections above.

6.2 Plot Waveform Transects

In order to get an idea of the length of the beam transect that you are plotting, you can plot the x-axis as distance, which is calculated below.

In order to plot each vertical value for each waveform, you will need to reformat the data structure to match what is needed by holoviews Path() capabilities.

Good news again, it looks like all of the waveforms in our transect passed the quality filtering test.

Below, plot each waveform by using holoviews Path() function. This will plot each individual waveform value by distance, with the amplitude plotted in the third dimension in shades of green.

If you zoom in to a subset of the transect, you can get even better detail showing the structure of the canopy from the rxwaveform dataset, where denser portions of the canopy are seen in darker shades of green.

alt text


7. Spatial Visualization

Section 7 combines many of the techniques learned above including how to import GEDI datasets, perform quality filtering, spatial subsetting, and visualization.

7.1 Import, Subset, and Quality Filter All Beams

Below, re-open the GEDI L1B observation--but this time, loop through and import data for all 8 of the GEDI beams.

Loop through each of the desired datasets (SDS) for each beam, append to lists, and transform into a pandas DataFrame.

Notice that we did not import the rxwaveform dataset--due to the large size of that dataset, we will wait until after spatial subsetting to import the waveforms.

7.2 Spatial Subsetting

Below, subset the pandas dataframe using a simple bounding box region of interest. If you are interested in spatially clipping GEDI shots to a geojson region of interest, be sure to check out the GEDI-Subsetter python script available at: https://git.earthdata.nasa.gov/projects/LPDUR/repos/gedi-subsetter/browse.

Over 3.5 million shots are contained in this single GEDI orbit! Below subset down to only the shots falling within this small bounding box encompassing Redwood National Park. RedwoodNP our geopandas geodataframe can be called for the "envelope" or smallest bounding box encompassing the entire region of interest. Here, use that as the bounding box for subsetting the GEDI shots.

Filter by the bounding box, which is done similarly to filtering by quality in section 5 above.

NOTE: It may take up to a few minutes to run the cell above.

Notice you have drastically reduced the number of shots you are working with (which will greatly enhance your experience in plotting them below). But first, remove any poor quality shots that exist within the ROI.

Next create a Shapely Point out of each shot and insert it as the geometry column in the [soon to be geo]dataframe.

7.3 Visualize All Beams: Elevation

Now, using the pointVisual function defined in section 3.2, plot the geopandas GeoDataFrame using geoviews.

Feel free to pan and zoom in to the GEDI shots in yellow.

Now let's not only plot the points in the geodataframe but also add a colormap for Elevation bin0 (m).


8. Export Subsets as GeoJSON Files

In this section, export the GeoDataFrame as a .geojson file that can be easily opened in your favorite remote sensing and/or GIS software and will include an attribute table with all of the shots/values for each of the SDS layers in the dataframe.

First, import the waveforms for the subset of shots to be exported.

Success! You have now learned how to start working with GEDI L1B files in Python as well as some interesting strategies for visualizing those data in order to better understand your specific region of interest. Using this jupyter notebook as a workflow, you should now be able to switch to GEDI files over your specific region of interest and re-run the notebook. Good Luck!

Contact Information

Material written by Cole Krehbiel1

    Contact: LPDAAC@usgs.gov
    Voice: +1-605-594-6116
    Organization: Land Processes Distributed Active Archive Center (LP DAAC)
    Website: https://lpdaac.usgs.gov/
    Date last modified: 05-11-2021
1KBR Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, 57198-001, USA. Work performed under USGS contract G15PD00467 for LP DAAC2. 2LP DAAC Work performed under NASA contract NNG14HH33I.