import os
import h5py
import numpy as np
import pandas as pd
import geopandas as gp
from shapely.geometry import Point
import geoviews as gv
from geoviews import opts, tile_sources as gvts
import holoviews as hv
gv.extension('bokeh', 'matplotlib')


inDir = os.getcwd() + os.sep  # Set input directory to the current working directory


gediFiles = [g for g in os.listdir(inDir) if g.startswith('GEDI02_B') and g.endswith('.h5')]  # List all GEDI L2B .h5 files in inDir
gediFiles

['GEDI02_B_2019170155833_O02932_T02267_02_001_01.h5']


L2B = 'GEDI02_B_2019170155833_O02932_T02267_02_001_01.h5'
L2B

'GEDI02_B_2019170155833_O02932_T02267_02_001_01.h5'


gediL2B = h5py.File(L2B, 'r')  # Read file using h5py


list(gediL2B.keys())

['BEAM0000',
 'BEAM0001',
 'BEAM0010',
 'BEAM0011',
 'BEAM0101',
 'BEAM0110',
 'BEAM1000',
 'BEAM1011',
 'METADATA']


list(gediL2B['METADATA'])

['DatasetIdentification']


for g in gediL2B['METADATA']['DatasetIdentification'].attrs: print(g)

PGEVersion
VersionID
abstract
characterSet
creationDate
credit
fileName
language
originatorOrganizationName
purpose
shortName
spatialRepresentationType
status
topicCategory
uuid


print(gediL2B['METADATA']['DatasetIdentification'].attrs['purpose'])

The purpose of the L2B dataset is to extract biophysical metrics from each GEDI waveform. These metrics are based on the directional gap probability profile derived from the L1B waveform and include canopy cover, Plant Area Index (PAI), Plant Area Volume Density (PAVD) and Foliage Height Diversity (FHD).


beamNames = [g for g in gediL2B.keys() if g.startswith('BEAM')]
beamNames

['BEAM0000',
 'BEAM0001',
 'BEAM0010',
 'BEAM0011',
 'BEAM0101',
 'BEAM0110',
 'BEAM1000',
 'BEAM1011']


for g in gediL2B['BEAM0000'].attrs: print(g)

description
wp-l2-l2b_githash
wp-l2-l2b_version


for b in beamNames: 
    print(f"{b} is a {gediL2B[b].attrs['description']}")

BEAM0000 is a Coverage beam
BEAM0001 is a Coverage beam
BEAM0010 is a Coverage beam
BEAM0011 is a Coverage beam
BEAM0101 is a Full power beam
BEAM0110 is a Full power beam
BEAM1000 is a Full power beam
BEAM1011 is a Full power beam


beamNames = ['BEAM0110']


gediL2B_objs = []
gediL2B.visit(gediL2B_objs.append)                                           # Retrieve list of datasets
gediSDS = [o for o in gediL2B_objs if isinstance(gediL2B[o], h5py.Dataset)]  # Search for relevant SDS inside data file
[i for i in gediSDS if beamNames[0] in i][0:10]                              # Print the first 10 datasets for selected beam

['BEAM0110/algorithmrun_flag',
 'BEAM0110/ancillary/dz',
 'BEAM0110/ancillary/l2a_alg_count',
 'BEAM0110/ancillary/maxheight_cuttoff',
 'BEAM0110/ancillary/rg_eg_constraint_center_buffer',
 'BEAM0110/ancillary/rg_eg_mpfit_max_func_evals',
 'BEAM0110/ancillary/rg_eg_mpfit_maxiters',
 'BEAM0110/ancillary/rg_eg_mpfit_tolerance',
 'BEAM0110/ancillary/signal_search_buff',
 'BEAM0110/ancillary/tx_noise_stddev_multiplier']


lonSample, latSample, shotSample, qualitySample, beamSample = [], [], [], [], []  # Set up lists to store data

# Open the SDS
lats = gediL2B[f'{beamNames[0]}/geolocation/lat_lowestmode'][()]
lons = gediL2B[f'{beamNames[0]}/geolocation/lon_lowestmode'][()]
shots = gediL2B[f'{beamNames[0]}/geolocation/shot_number'][()]
quality = gediL2B[f'{beamNames[0]}/l2b_quality_flag'][()]

# Take every 100th shot and append to list
for i in range(len(shots)):
    if i % 100 == 0:
        shotSample.append(str(shots[i]))
        lonSample.append(lons[i])
        latSample.append(lats[i])
        qualitySample.append(quality[i])
        beamSample.append(beamNames[0])
            
# Write all of the sample shots to a dataframe
latslons = pd.DataFrame({'Beam': beamSample, 'Shot Number': shotSample, 'Longitude': lonSample, 'Latitude': latSample,
                         'Quality Flag': qualitySample})
latslons


# Clean up variables that will no longer be needed
del beamSample, quality, qualitySample, gediL2B_objs, latSample, lats, lonSample, lons, shotSample, shots


# Take the lat/lon dataframe and convert each lat/lon to a shapely point
latslons['geometry'] = latslons.apply(lambda row: Point(row.Longitude, row.Latitude), axis=1)


# Convert to a Geodataframe
latslons = gp.GeoDataFrame(latslons)
latslons = latslons.drop(columns=['Latitude','Longitude'])
latslons['geometry']

0       POINT (111.99630 -51.80387)
1       POINT (112.03913 -51.80391)
2       POINT (112.08027 -51.80384)
3       POINT (112.12145 -51.80374)
4       POINT (112.16262 -51.80362)
                   ...             
9792     POINT (88.20845 -51.80358)
9793     POINT (88.24961 -51.80361)
9794     POINT (88.29075 -51.80358)
9795     POINT (88.33191 -51.80355)
9796     POINT (88.37309 -51.80351)
Name: geometry, Length: 9797, dtype: geometry


latslons['geometry'][0]


# Define a function for visualizing GEDI points
def pointVisual(features, vdims):
    return (gvts.EsriImagery * gv.Points(features, vdims=vdims).options(tools=['hover'], height=500, width=900, size=5, 
                                                                        color='yellow', fontsize={'xticks': 10, 'yticks': 10, 
                                                                                                  'xlabel':16, 'ylabel': 16}))


redwoodNP = gp.GeoDataFrame.from_file('RedwoodNP.geojson')  # Import geojson as GeoDataFrame


redwoodNP


redwoodNP['geometry'][0]  # Plot GeoDataFrame


# Create a list of geodataframe columns to be included as attributes in the output map
vdims = []
for f in latslons:
    if f not in ['geometry']:
        vdims.append(f)
vdims

['Beam', 'Shot Number', 'Quality Flag']


# Call the function for plotting the GEDI points
gv.Polygons(redwoodNP['geometry']).opts(line_color='red', color=None) * pointVisual(latslons, vdims = vdims)


print(f"Quality Flag: {gediL2B[beamNames[0]]['l2b_quality_flag'].attrs['description']}")  # Use the selected beam rather than the leftover loop variable

Quality Flag: Flag simpilfying selection of most useful data for Level 2B


del latslons  # No longer need the geodataframe used to visualize the full GEDI orbit


len(gediSDS)

1488


beamNames

['BEAM0110']


beamSDS = [g for g in gediSDS if beamNames[0] in g]  # Subset to a single beam
len(beamSDS)

186


shot = 29320619500465599


index = np.where(gediL2B[f'{beamNames[0]}/shot_number'][()]==shot)[0][0]  # Set the index for the shot identified above
index

465598


pavd = gediL2B[[g for g in beamSDS if g.endswith('/pavd_z')][0]]  # PAVD


print(f"Plant Area Volume Density is {pavd.attrs['description']}")

Plant Area Volume Density is Vertical Plant Area Volume Density profile with a vertical step size of dZ


# Grab vertical step size 
dz = gediL2B[f'{beamNames[0]}/ancillary/dz'][0]
dz

5.0


print(f"The shape of PAVD is {pavd.shape}.")

The shape of PAVD is (979699, 30).


# Bring in the desired SDS
elev = gediL2B[f'{beamNames[0]}/geolocation/elev_lowestmode'][()]  # Elevation
lats = gediL2B[f'{beamNames[0]}/geolocation/lat_lowestmode'][()]  # Latitude
lons = gediL2B[f'{beamNames[0]}/geolocation/lon_lowestmode'][()]  # Longitude


shotElev = elev[index]
shotLat = lats[index]
shotLon = lons[index]
shotPAVD = pavd[index]


print(f"The shot is located at: {str(shotLat)}, {str(shotLon)} (shot ID: {shot}, index {index}) and is from {beamNames[0]}.")

The shot is located at: 41.28472739326018, -124.03109998658007 (shot ID: 29320619500465599, index 465598) and is from BEAM0110.


pavdAll = []
pavdElev = []

for i in range(len(shotPAVD)):
    if shotPAVD[i] > 0:
        pavdElev.append((shot, shotElev + dz * i, shotPAVD[i]))  # Append tuple of shot number, elevation, and PAVD
pavdAll.append(pavdElev)                                         # Append to final list


path1 = hv.Path(pavdAll, vdims='PAVD').options(color='PAVD', clim=(0,0.13), cmap='Greens', line_width=20, colorbar=True, 
                                               width=700, height=550, clabel='PAVD', xlabel='Shot Number',
                                               ylabel='Elevation (m)', fontsize={'title':16, 'xlabel':16, 'ylabel': 16,
                                                                                 'xticks':12, 'yticks':12, 
                                                                                 'clabel':12, 'cticks':10})
path1


# Open all of the desired SDS
dem = gediL2B[[g for g in beamSDS if g.endswith('/digital_elevation_model')][0]][()]
zElevation = gediL2B[[g for g in beamSDS if g.endswith('/elev_lowestmode')][0]][()]
zHigh = gediL2B[[g for g in beamSDS if g.endswith('/elev_highestreturn')][0]][()]
zLat = gediL2B[[g for g in beamSDS if g.endswith('/lat_lowestmode')][0]][()]
zLon = gediL2B[[g for g in beamSDS if g.endswith('/lon_lowestmode')][0]][()]
canopyHeight = gediL2B[[g for g in beamSDS if g.endswith('/rh100')][0]][()]
quality = gediL2B[[g for g in beamSDS if g.endswith('/l2b_quality_flag')][0]][()]
degrade = gediL2B[[g for g in beamSDS if g.endswith('/degrade_flag')][0]][()]
sensitivity = gediL2B[[g for g in beamSDS if g.endswith('/sensitivity')][0]][()]
pavd = gediL2B[f'{beamNames[0]}/pavd_z'][()]
shotNums = gediL2B[f'{beamNames[0]}/shot_number'][()]

# Create a shot index
shotIndex = np.arange(shotNums.size)


canopyHeight = canopyHeight / 100  # Convert RH100 from cm to m


print(f"The shape of Canopy Height is {canopyHeight.shape} vs. the shape of PAVD, which is {pavd.shape}.")

The shape of Canopy Height is (979699,) vs. the shape of PAVD, which is (979699, 30).


# Set up an empty list to append to 
pavdA = []
for i in range(len(pavd)):
    
    # If any of the values are fill value, set to nan
    pavdF = [np.nan]
    for p in range(len(pavd[i])):
        if pavd[i][p]!= -9999:
            pavdF.append(pavd[i][p])  # If the value is not fill value, append to list
    pavdA.append(pavdF)               # Append back to master list


len(pavdA)

979699


# Take the DEM, GEDI-produced Elevation, and Canopy height and add to a Pandas dataframe
transectDF = pd.DataFrame({'Shot Index': shotIndex, 'Shot Number': shotNums, 'Latitude': zLat, 'Longitude': zLon, 
                           'Tandem-X DEM': dem, 'Elevation (m)': zElevation, 'Canopy Elevation (m)': zHigh, 
                           'Canopy Height (rh100)': canopyHeight, 'Quality Flag': quality, 'Degrade Flag': degrade, 
                           'Plant Area Volume Density': pavdA, 'Sensitivity': sensitivity})


transectDF


# Plot Canopy Height
canopyVis = hv.Scatter((transectDF['Shot Index'], transectDF['Canopy Height (rh100)']))
canopyVis.opts(color='darkgreen', height=500, width=900, title=f'GEDI L2B Full Transect {beamNames[0]}',
               fontsize={'title':16, 'xlabel':16, 'ylabel': 16}, size=0.1, xlabel='Shot Index', ylabel='Canopy Height (m)')


del canopyVis, canopyHeight, degrade, dem, pavd, pavdA, quality, sensitivity, shotIndex, shotNums, zElevation, zHigh, zLat, zLon


transectDF = transectDF.where(transectDF['Quality Flag'].ne(0))  # Set any poor quality returns to NaN


transectDF


transectDF = transectDF.where(transectDF['Degrade Flag'].ne(1))
transectDF = transectDF.where(transectDF['Sensitivity'] > 0.95)


transectDF = transectDF.dropna()  # Drop all of the rows (shots) that did not pass the quality filtering above


print(f"Quality filtering complete, {len(transectDF)} high quality shots remaining.")

Quality filtering complete, 66317 high quality shots remaining.


# Plot Digital Elevation Model
demVis = hv.Scatter((transectDF['Shot Index'], transectDF['Tandem-X DEM']), label='Tandem-X DEM')
demVis = demVis.opts(color='black', height=500, width=900, fontsize={'xlabel':16, 'ylabel': 16}, size=1.5)


# Plot GEDI-Retrieved Elevation
zVis = hv.Scatter((transectDF['Shot Index'], transectDF['Elevation (m)']), label='GEDI-derived Elevation')
zVis = zVis.opts(color='saddlebrown', height=500, width=900, fontsize={'xlabel':16, 'ylabel': 16}, size=1.5)


# Plot Canopy Top Elevation
rhVis = hv.Scatter((transectDF['Shot Index'], transectDF['Canopy Elevation (m)']), label='Canopy Top Elevation')
rhVis = rhVis.opts(color='darkgreen', height=500, width=900, fontsize={'xlabel':16, 'ylabel': 16}, size=1.5, 
                   tools=['hover'], xlabel='Shot Index', ylabel='Elevation (m)')


# Combine all three scatterplots
(demVis * zVis * rhVis).opts(show_legend=True, legend_position='top_left',fontsize={'title':15, 'xlabel':16, 'ylabel': 16}, 
                             title=f'{beamNames[0]} Full Transect: {L2B.split(".")[0]}')


print(index)

465598


# Grab 50 points before and after the shot visualized above
start = index - 50
end = index + 50


print(f"The transect begins at ({transectDF['Latitude'][start]}, {transectDF['Longitude'][start]}) and ends at ({transectDF['Latitude'][end]}, {transectDF['Longitude'][end]}).")

The transect begins at (41.26951477815523, -124.05868759659765) and ends at (41.299873598763384, -124.00358737366548).


transectDF = transectDF.loc[start:end]  # Subset the Dataframe to only the selected region of interest over Redwood NP


# Calculate along-track distance
distance = np.arange(0.0, len(transectDF.index) * 60, 60)  # GEDI Shots are spaced 60 m apart
transectDF['Distance'] = distance                          # Add Distance as a new column in the dataframe


pavdAll = []
for j, s in enumerate(transectDF.index):
    pavdShot = transectDF['Plant Area Volume Density'][s]
    elevShot = transectDF['Elevation (m)'][s]
    pavdElev = []
    
    # Remove fill values
    if np.isnan(pavdShot).all():
        continue
    else:
        del pavdShot[0]
    for i in range(len(pavdShot)):
        if pavdShot[i] > 0:
            pavdElev.append((distance[j], elevShot + dz * i, pavdShot[i]))  # Append tuple of distance, elevation, and PAVD
    pavdAll.append(pavdElev)                                                # Append to final list


canopyElevation = [p[-1][1] for p in pavdAll]  # Grab the canopy elevation by selecting the last value in each PAVD


import warnings
warnings.filterwarnings('ignore')
path1 = hv.Path(pavdAll, vdims='PAVD').options(color='PAVD', clim=(0,0.3), cmap='Greens', line_width=8, colorbar=True, 
                                               width=950, height=500, clabel='PAVD', xlabel='Distance Along Transect (m)',
                                               ylabel='Elevation (m)', fontsize={'title':16, 'xlabel':16, 'ylabel': 16,
                                                                                 'xticks':12, 'yticks':12, 
                                                                                 'clabel':12, 'cticks':10})
path1


path2 = hv.Curve((distance, transectDF['Elevation (m)']), label='Ground Elevation').options(color='black', line_width=2)
path3 = hv.Curve((distance, canopyElevation), label='Canopy Top Elevation').options(color='grey', line_width=1.5)


# Plot all three together
path = path1 * path2 * path3
path.opts(height=500,width=980, ylim=(min(transectDF['Elevation (m)']) - 5, max(canopyElevation) + 5),
          xlabel='Distance Along Transect (m)', ylabel='Elevation (m)', legend_position='bottom_right',
          fontsize={'title':15, 'xlabel':15, 'ylabel': 15, 'xticks': 14, 'yticks': 14, 'legend': 14}, 
          title=f'GEDI L2B {beamNames[0]} PAVD over Redwood National Park on June 19, 2019')


del distance, canopyElevation, pavdAll, pavdElev, pavdShot, transectDF


beamNames = [g for g in gediL2B.keys() if g.startswith('BEAM')]


beamNames

['BEAM0000',
 'BEAM0001',
 'BEAM0010',
 'BEAM0011',
 'BEAM0101',
 'BEAM0110',
 'BEAM1000',
 'BEAM1011']


# Set up lists to store data
shotNum, dem, zElevation, zHigh, zLat, zLon, canopyHeight, quality, degrade, sensitivity, pai, beamI = ([] for i in range(12))


# Loop through each beam and open the SDS needed
for b in beamNames:
    shotNum.extend(gediL2B[[g for g in gediSDS if g.endswith('/shot_number') and b in g][0]][()])
    dem.extend(gediL2B[[g for g in gediSDS if g.endswith('/digital_elevation_model') and b in g][0]][()])
    zElevation.extend(gediL2B[[g for g in gediSDS if g.endswith('/elev_lowestmode') and b in g][0]][()])
    zHigh.extend(gediL2B[[g for g in gediSDS if g.endswith('/elev_highestreturn') and b in g][0]][()])
    zLat.extend(gediL2B[[g for g in gediSDS if g.endswith('/lat_lowestmode') and b in g][0]][()])
    zLon.extend(gediL2B[[g for g in gediSDS if g.endswith('/lon_lowestmode') and b in g][0]][()])
    canopyHeight.extend(gediL2B[[g for g in gediSDS if g.endswith('/rh100') and b in g][0]][()])
    quality.extend(gediL2B[[g for g in gediSDS if g.endswith('/l2b_quality_flag') and b in g][0]][()])
    degrade.extend(gediL2B[[g for g in gediSDS if g.endswith('/degrade_flag') and b in g][0]][()])
    sensitivity.extend(gediL2B[[g for g in gediSDS if g.endswith('/sensitivity') and b in g][0]][()])
    beamI.extend([b] * len(gediL2B[[g for g in gediSDS if g.endswith('/shot_number') and b in g][0]][()]))  # Repeat the beam name once per shot
    pai.extend(gediL2B[f'{b}/pai'][()])


# Convert lists to Pandas dataframe
allDF = pd.DataFrame({'Shot Number': shotNum, 'Beam': beamI, 'Latitude': zLat, 'Longitude': zLon, 'Tandem-X DEM': dem,
                      'Elevation (m)': zElevation, 'Canopy Elevation (m)': zHigh, 'Canopy Height (rh100)': canopyHeight,
                      'Quality Flag': quality, 'Plant Area Index': pai,'Degrade Flag': degrade, 'Sensitivity': sensitivity})


del beamI, canopyHeight, degrade, dem, gediSDS, pai, quality, sensitivity, zElevation, zHigh, zLat, zLon, shotNum


len(allDF)

3547051


redwoodNP.envelope[0].bounds

(-124.16015705494489,
 41.080601363502545,
 -123.84950230520286,
 41.83981133687605)


minLon, minLat, maxLon, maxLat = redwoodNP.envelope[0].bounds  # Define the min/max lat/lon from the bounds of Redwood NP


allDF = allDF.where(allDF['Latitude'] > minLat)
allDF = allDF.where(allDF['Latitude'] < maxLat)
allDF = allDF.where(allDF['Longitude'] > minLon)
allDF = allDF.where(allDF['Longitude'] < maxLon)


allDF = allDF.dropna()  # Drop shots outside of the ROI


len(allDF)

4477


# Set any poor quality returns to NaN
allDF = allDF.where(allDF['Quality Flag'].ne(0))
allDF = allDF.where(allDF['Degrade Flag'].ne(1))
allDF = allDF.where(allDF['Sensitivity'] > 0.95)
allDF = allDF.dropna()
len(allDF)

2077


# Take the lat/lon dataframe and convert each lat/lon to a shapely point
allDF['geometry'] = allDF.apply(lambda row: Point(row.Longitude, row.Latitude), axis=1)


# Convert to geodataframe
allDF = gp.GeoDataFrame(allDF)
allDF = allDF.drop(columns=['Latitude','Longitude'])


allDF['Shot Number'] = allDF['Shot Number'].astype(str)  # Convert shot number to string

vdims = []
for f in allDF:
    if f not in ['geometry']:
        vdims.append(f)

visual = pointVisual(allDF, vdims = vdims)
visual * gv.Polygons(redwoodNP['geometry']).opts(line_color='red', color=None)


allDF['Canopy Height (rh100)'] = allDF['Canopy Height (rh100)'] / 100  # Convert canopy height from cm to m


# Plot the basemap and geoviews Points, defining the color as the Canopy Height for each shot
(gvts.EsriImagery * gv.Points(allDF, vdims=vdims).options(color='Canopy Height (rh100)',cmap='plasma', size=3, tools=['hover'],
                                                          clim=(0,102), colorbar=True, clabel='Meters',
                                                          title='GEDI Canopy Height over Redwood National Park: June 19, 2019',
                                                          fontsize={'xticks': 10, 'yticks': 10, 'xlabel':16, 'clabel':12,
                                                                    'cticks':10,'title':16,'ylabel':16})).options(height=500,
                                                                                                                  width=900)


(gvts.EsriImagery * gv.Points(allDF, vdims=vdims).options(color='Elevation (m)',cmap='terrain', size=3, tools=['hover'],
                                                          clim=(min(allDF['Elevation (m)']), max(allDF['Elevation (m)'])),
                                                          colorbar=True, clabel='Meters',
                                                          title='GEDI Elevation over Redwood National Park: June 19, 2019',
                                                          fontsize={'xticks': 10, 'yticks': 10, 'xlabel':16, 'clabel':12,
                                                                    'cticks':10,'title':16,'ylabel':16})).options(height=500,
                                                                                                                  width=900)


(gvts.EsriImagery * gv.Points(allDF, vdims=vdims).options(color='Plant Area Index',cmap='Greens', size=3, tools=['hover'],
                                                          clim=(0,1), colorbar=True, clabel='m2/m2',
                                                          title='GEDI PAI over Redwood National Park: June 19, 2019',
                                                          fontsize={'xticks': 10, 'yticks': 10, 'xlabel':16, 'clabel':12,
                                                                    'cticks':10,'title':16,'ylabel':16})).options(height=500,
                                                                                                                  width=900)


gediL2B.filename  # L2B Filename

'GEDI02_B_2019170155833_O02932_T02267_02_001_01.h5'


outName = gediL2B.filename.replace('.h5', '.json')  # Create an output file name using the input file name
outName

'GEDI02_B_2019170155833_O02932_T02267_02_001_01.json'


allDF.to_file(outName, driver='GeoJSON')  # Export to GeoJSON


del allDF

	Beam	Shot Number	Longitude	Latitude	Quality Flag
0	BEAM0110	29320618800000001	111.996300	-51.803868	0
1	BEAM0110	29320604600000101	112.039132	-51.803905	0
2	BEAM0110	29320614600000201	112.080271	-51.803836	0
3	BEAM0110	29320600400000301	112.121445	-51.803737	0
4	BEAM0110	29320610400000401	112.162622	-51.803621	0
...	...	...	...	...	...
9792	BEAM0110	29320617400979201	88.208452	-51.803578	0
9793	BEAM0110	29320603200979301	88.249610	-51.803614	0
9794	BEAM0110	29320613200979401	88.290753	-51.803581	0
9795	BEAM0110	29320623200979501	88.331913	-51.803548	0
9796	BEAM0110	29320609000979601	88.373089	-51.803506	0

	Shot Index	Shot Number	Latitude	Longitude	Tandem-X DEM	Elevation (m)	Canopy Elevation (m)	Canopy Height (rh100)	Quality Flag	Degrade Flag	Plant Area Volume Density	Sensitivity
0	0	29320618800000001	-51.803868	111.996300	-999999.0	21242.515625	21242.515625	0.0	0	0	[nan]	-3.436965
1	1	29320618900000002	-51.803867	111.996712	-999999.0	21242.505859	21242.505859	0.0	0	0	[nan]	30.496670
2	2	29320619000000003	-51.803867	111.997123	-999999.0	21242.496094	21242.496094	0.0	0	0	[nan]	8.071431
3	3	29320619100000004	-51.803867	111.997535	-999999.0	21242.484375	21242.484375	0.0	0	0	[nan]	-212.896439
4	4	29320619200000005	-51.803866	111.997946	-999999.0	21242.474609	21242.474609	0.0	0	0	[nan]	-6.853874
...	...	...	...	...	...	...	...	...	...	...	...	...
979694	979694	29320618400979695	-51.803445	88.411747	-999999.0	18017.906250	18017.906250	0.0	0	0	[nan]	22.138037
979695	979695	29320618500979696	-51.803445	88.412159	-999999.0	18017.296875	18017.296875	0.0	0	0	[nan]	4.475757
979696	979696	29320618600979697	-51.803444	88.412570	-999999.0	18017.884766	18017.884766	0.0	0	0	[nan]	10.112548
979697	979697	29320618700979698	-51.803444	88.412981	-999999.0	18017.275391	18017.275391	0.0	0	0	[nan]	424.691803
979698	979698	29320618800979699	-51.803443	88.413393	-999999.0	18017.263672	18017.263672	0.0	0	0	[nan]	15.887813

Getting Started with GEDI L2B Data in Python

This tutorial demonstrates how to work with the Canopy Cover and Vertical Profile Metrics (GEDI02_B.001) data product.

Use Case Example:

Data Used in the Example:

Topics Covered:

Before Starting this Tutorial:

Setup and Dependencies

Having trouble getting a compatible Python environment set up? Contact LP DAAC User Services at: https://lpdaac.usgs.gov/lpdaac-contact-us/

Example Data:

A NASA Earthdata Login account is required to download the data used in this tutorial. You can create an account at the link provided.

You will need to have the file above downloaded into the same directory as this Jupyter Notebook in order to successfully run the code below.

Source Code used to Generate this Tutorial:

1. Get Started

1.1 Import Packages

Import the required packages and set the input/working directory to run this Jupyter Notebook locally.

1.2 Set Up the Working Environment and Retrieve Files

The input directory is defined as the current working directory. Note that you will need to have the Jupyter Notebook and example data (.h5 and .geojson) stored in this directory in order to execute the tutorial successfully.

In this section, a GEDI .h5 file has been downloaded to the `inDir` defined above. You will need to download the file directly from the LP DAAC Data Pool in order to execute this tutorial.

Direct Link to file:

2. Import and Interpret Data

2.1 Open a GEDI HDF5 File and Read File Metadata

Read the file using `h5py`.

The standard format for GEDI filenames is as follows:
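
As a rough illustration, the sketch below splits the example filename on underscores and decodes the acquisition timestamp. The field labels are assumptions (product short name, acquisition date/time as YYYYDDDHHMMSS, orbit number, track number, then three trailing production fields), and `parse_gedi_filename` is a hypothetical helper, not part of any GEDI library.

```python
from datetime import datetime

def parse_gedi_filename(name):
    """Split a GEDI granule filename into labeled fields (labels are assumed)."""
    stem = name.rsplit('.', 1)[0]          # drop the .h5 extension
    parts = stem.split('_')
    return {
        'product': f'{parts[0]}_{parts[1]}',                      # e.g. GEDI02_B
        'acquired': datetime.strptime(parts[2], '%Y%j%H%M%S'),    # year + day of year + time
        'orbit': parts[3].lstrip('O'),
        'track': parts[4].lstrip('T'),
        'trailing_fields': parts[5:],                             # left uninterpreted
    }

info = parse_gedi_filename('GEDI02_B_2019170155833_O02932_T02267_02_001_01.h5')
print(info['product'], info['acquired'].isoformat(), info['orbit'], info['track'])
```

Day 170 of 2019 decodes to June 19, 2019, which matches the date used in the plot titles of this tutorial.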

Read in a GEDI HDF5 file using the `h5py` package.

Navigate the HDF5 file below.

The GEDI HDF5 file contains groups in which data and metadata are stored.

First, the `METADATA` group contains the file-level metadata.

2.2 Read SDS Metadata and Subset by Beam

The GEDI instrument consists of 3 lasers producing a total of 8 beam ground transects. The eight remaining groups contain data for each of the eight GEDI beam transects. For additional information, be sure to check out: https://gedi.umd.edu/instrument/specifications/.

One useful piece of metadata to retrieve from each beam transect is whether it is a full power beam or a coverage beam.

Below, pick one of the full power beams that will be used to retrieve GEDI L2B shots in Section 3.

Identify all the objects in the GEDI HDF5 file below.

3. Visualize a GEDI Orbit

In the section below, import GEDI L2B SDS layers into a `GeoPandas` GeoDataFrame for the beam specified above.

Use the `lat_lowestmode` and `lon_lowestmode` to create a `shapely` point for each GEDI shot location.

3.1 Subset by Layer and Create a Geodataframe

Read in the SDS and take a representative sample (every 100th shot) and append to lists, then use the lists to generate a `pandas` dataframe.
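
The every-100th-shot sampling can also be written with slice notation. Here is a minimal sketch on a made-up list of shot numbers (`shots` is synthetic, not read from the file):

```python
# Synthetic stand-in for the shot_number array (values are made up)
shots = list(range(1000, 1950))

# Loop form: keep a shot when its position is a multiple of 100
loop_sample = [shots[i] for i in range(len(shots)) if i % 100 == 0]

# Slice form: start at position 0 and step by 100
slice_sample = shots[::100]

print(slice_sample == loop_sample, len(slice_sample))
```

The same `[::100]` step also works on NumPy arrays, so it should apply to the arrays read from the file as well.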

Above is a dataframe containing columns describing the beam, shot number, lat/lon location, and quality information about each shot.

Below, create an additional column called 'geometry' that contains a `shapely` point generated from each lat/lon location from the shot.

Next, convert to a `GeoPandas` GeoDataFrame.

Pull out and plot an example `shapely` point below.

3.2 Visualize a GeoDataFrame

In this section, use the GeoDataFrame and the `geoviews` Python package to spatially visualize the location of the GEDI shots on a basemap and import a geojson file of the spatial region of interest for the use case example: Redwood National Park.

Import a geojson of Redwood National Park as an additional GeoDataFrame. Note that you will need to have downloaded the geojson from the bitbucket repo containing this tutorial and have it saved in the same directory as this Jupyter Notebook.

Defining the vdims below will allow you to hover over specific shots and view information about them.

Below, combine a plot of the Redwood National Park Boundary (combine two `geoviews` plots using `*`) with the point visual mapping function defined above in order to plot (1) the representative GEDI shots, (2) the region of interest, and (3) a basemap layer.

Above is a good illustration of the full GEDI orbit (each GEDI file stores one ISS orbit). One of the benefits of using `geoviews` is the interactive nature of the output plots. Use the tools to the right of the map above to zoom in and find the shots intersecting Redwood National Park.

Below is a screenshot of the region of interest:

Side Note: Wondering what the 0's and 1's for `l2b_quality_flag` mean?

Above, a value of 0 is poor quality and a value of 1 indicates the laser shot meets criteria based on energy, sensitivity, amplitude, and real-time surface tracking quality. We will show an example of how to quality filter GEDI data in section 5.1.

After finding one of the shots within Redwood NP, find the index for that shot number so that we can find the correct shot to visualize in Section 4.

Shot: 29320619500465599
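
The index lookup amounts to a boolean comparison followed by `np.where`. Here is a small sketch with made-up shot numbers. The explicit "not found" check is an addition for illustration, not part of the tutorial's code:

```python
import numpy as np

# Made-up shot numbers standing in for the beam's shot_number dataset
shot_numbers = np.array([1001, 1002, 1003, 1004, 1005], dtype=np.uint64)
target = 1004

matches = np.where(shot_numbers == target)[0]  # positions where the comparison is True
if matches.size == 0:
    raise ValueError(f'shot {target} not found')
index = int(matches[0])
print(index)
```

Indexing the result directly with `[0][0]` raises an `IndexError` when the shot number is absent, so a check like this one gives a clearer message.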

4. Work with GEDI L2B Data

The L2B product contains biophysical information derived from the geolocated GEDI return waveforms including total and vertical profiles of canopy cover and Plant Area Index (PAI), the vertical Plant Area Volume Density (PAVD) profile, and Foliage Height Diversity (FHD).

4.1 Import and Extract Specific Shots

Notice that there are over a thousand datasets available in the GEDI L2B product. In the code blocks below, you will subset to just a few of the datasets available.

In this section, learn how to extract and subset specific shots and plot Plant Area Volume Density (PAVD) using `holoviews`.

Use the same example shot as in the GEDI L1B Tutorial and GEDI L2A Tutorial to show how to subset a single shot of GEDI L2B data.

4.2 Visualize PAVD

In section 4.2, import the PAVD metrics (`pavd_z`) and begin exploring how to plot them.

Print the description for the PAVD dataset.

Below, open the `dz` layer in order to define the correct vertical step size.

So the vertical step size is 5.0 meters.

And it looks like PAVD includes 30 "steps" in each shot, describing the PAVD at height = step # * `dz`.

Now, bring in other useful L2B datasets such as `elev_lowestmode`, `lat_lowestmode` and `lon_lowestmode`.

Grab the location, elevation, and PAVD metrics for the shot defined above:

Put everything together to identify the shot that we want to extract:

Next, reformat PAVD into a list of tuples containing the shot number, elevation, and PAVD value for each step.
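
The reformatting step can be sketched as a small function. `pavd_profile` and its inputs are hypothetical and the values are made up. The elevation of each step is taken as ground elevation + `dz` * step, following the height relation described above:

```python
def pavd_profile(shot_id, ground_elev, pavd_values, dz):
    """Return (shot_id, elevation, pavd) tuples for steps with PAVD > 0."""
    return [(shot_id, ground_elev + dz * step, value)
            for step, value in enumerate(pavd_values)
            if value > 0]

# Made-up profile: four 5 m steps above a ground elevation of 100 m
profile = pavd_profile(shot_id=1, ground_elev=100.0,
                       pavd_values=[0.02, 0.05, 0.0, 0.01], dz=5.0)
print(profile)
```

Because the test is `value > 0`, negative fill values are skipped along with empty steps.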

Below, plot each shot by using the `holoviews` Path() function, with the PAVD plotted in the third dimension in shades of green.

Congratulations! You have plotted your first PAVD profile.

5. Work with GEDI L2B Beam Transects

Next, import a number of desired SDS layers for BEAM0110 (for the entire orbit) and create a `pandas` Dataframe to store the arrays.

In the GEDI L2B product, Canopy Height is stored in centimeters (cm), so below convert to meters.

As mentioned in the sections above, Plant Area Volume Density (pavd) is defined as the Vertical Plant Area Volume Density profile with a vertical step size of dZ. Below, reformat the shape of the PAVD layer in order to add it to the dataframe below.

Above, notice that unlike an SDS layer like Canopy Height, which has a single value for each shot, PAVD has 30 values (representing different vertical heights) for each shot.

Below, reformat the data into a list of values for each shot.

Note: The cell above may take up to a minute to process.
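
If the per-shot loop is too slow, one possible alternative is a vectorized mask. This is a sketch and not the tutorial's method: it keeps the 2-D array shape and replaces fill values with `nan` instead of dropping them, so later code that expects variable-length lists would need adjusting. The fill value of -9999 is the one tested in the loop, and the array here is synthetic.

```python
import numpy as np

FILL = -9999  # fill value tested in the per-shot loop

# Synthetic (shots, steps) array standing in for pavd_z
pavd = np.array([[0.01, 0.03, FILL],
                 [FILL, FILL, FILL],
                 [0.02, 0.00, 0.04]], dtype=np.float32)

pavd_masked = np.where(pavd == FILL, np.nan, pavd)  # fill values become nan
all_fill = np.isnan(pavd_masked).all(axis=1)        # True for shots with no valid steps
print(all_fill.tolist())
```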

Below, notice the reformatted PAVD layer, which should now fit into the dataframe created below.

Notice the unusual values listed above: those shots are flagged as poor quality and will be removed in Section 5.1.

Now that you have the desired SDS in a `pandas` dataframe, begin plotting the entire beam transect:

Below, remove any shots where the `l2b_quality_flag` is set to 0 by defining those shots as `nan`.

The syntax of the line below can be read as: in the dataframe, find the rows "where" the quality flag is not equal (ne) to 0. If a row (shot) does not meet the condition, set all values equal to `nan` for that row.

Below, quality filter even further by using the `degrade_flag` (greater than zero if the shot occurs during a degrade period, zero otherwise) and the `Sensitivity` layer, using a threshold of 0.95.

Below, drop all of the shots that did not pass the quality filtering standards outlined above from the `transectDF`.
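
The `where` and `dropna` pattern can be tried on a small made-up dataframe. The column names mirror the tutorial's dataframe, but the values are invented:

```python
import pandas as pd

# Made-up shots; only row 0 passes all three tests
df = pd.DataFrame({'Quality Flag': [1, 0, 1, 1],
                   'Degrade Flag': [0, 0, 1, 0],
                   'Sensitivity': [0.97, 0.99, 0.98, 0.90]})

df = df.where(df['Quality Flag'].ne(0))   # rows failing the test become all-NaN
df = df.where(df['Degrade Flag'].ne(1))
df = df.where(df['Sensitivity'] > 0.95)
kept = df.dropna()                        # drop the all-NaN rows

print(kept.index.tolist())
```

Note that `dropna` keeps the original index labels of the surviving rows, so a shot can still be looked up by the index it had before filtering.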

Next, plot the full remaining transect of high quality values using `holoviews` Scatter(). Combine the Tandem-X derived elevation, the GEDI-derived elevation, and the Canopy Top Elevation in a combined `holoviews` plot.

Below, subset the transect using `.loc`.
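
Keep in mind that `.loc` slices by index label, not by position, and includes both endpoints. Here is a small sketch with a made-up dataframe whose sorted index has gaps, as a filtered transect would:

```python
import pandas as pd

# Sorted index labels with gaps, as left behind by dropna() (values are made up)
df = pd.DataFrame({'Elevation (m)': [10.0, 12.0, 15.0, 11.0]}, index=[3, 7, 8, 12])

subset = df.loc[5:8]  # labels 5 through 8 inclusive; label 5 itself need not exist
print(subset.index.tolist())
```

So a slice from `index - 50` to `index + 50` returns only the shots in that label range that survived filtering, which may be fewer than 101 rows.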

Below, plot each shot by using the `holoviews` Path() function, with the PAVD plotted in the third dimension in shades of green.

7. Spatial Visualization

Loop through each of the desired datasets (SDS) for each beam, append to lists, and transform into a `pandas` DataFrame.

With the data down to roughly 2,000 shots, next create a `shapely` Point out of each shot and insert it as the geometry column in the [soon to be geo]dataframe.

Now, using the `pointVisual` function defined in section 3.2, plot the `geopandas` GeoDataFrame using `geoviews`.

Last but certainly not least, `Plant Area Index`:

8. Export Subsets as GeoJSON Files

In this section, export the GeoDataFrame as a `.geojson` file that can be easily opened in your favorite remote sensing and/or GIS software and will include an attribute table with all of the shots/values for each of the SDS layers in the dataframe.
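
For orientation, the sketch below builds a GeoJSON FeatureCollection by hand with the standard library, to show roughly what such an export contains. The shot values are made up, and a file written by `to_file` may include additional members that are omitted here.

```python
import json

# Made-up shots: (shot number, longitude, latitude, canopy height in m)
shots = [('1001', -124.03, 41.28, 55.2),
         ('1002', -124.02, 41.29, 61.7)]

features = [{'type': 'Feature',
             'properties': {'Shot Number': num, 'Canopy Height (rh100)': height},
             # GeoJSON positions are ordered [longitude, latitude]
             'geometry': {'type': 'Point', 'coordinates': [lon, lat]}}
            for num, lon, lat, height in shots]

collection = {'type': 'FeatureCollection', 'features': features}
text = json.dumps(collection)
print(len(features), 'features serialized')
```

Each dataframe row becomes one feature, with the non-geometry columns stored under `properties`. This is what GIS software shows as the attribute table.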