GDAL

Geospatial Data Abstraction Library (GDAL) is a translator library for raster geospatial data formats. As the name implies, GDAL has an abstraction layer that hides format-specific details, which means there is only one GDAL API regardless of file format. For example, one of the following - an HDF4external SDS/Vdata/Vgroup or an HDF5external dataset/group - is mapped to a GDALDataset object.

It also comes with a variety of useful command line utilities for data translation and processing. We'll explain how to convert NASA HDF4/5 products into GeoTIFF/KML.

Download

The latest GDAL is version 3.1.0.

GDAL doesn't include HDF4, HDF5, and KML support by default. If your system doesn't have HDF4, HDF5, and KML library, please download and install them first.

Installation

The following build instruction assumes that all downloaded packages are unpacked with tar zxvf package.tar.gz at the same directory.

We will use /tmp directory as a prefix.
  1. Build HDF4 library with cd hdf-4.2.15 && ./configure --prefix=/tmp --disable-netcdf && make && make install
    • If you use clang 12 on Mac OS X Catalina or Big Sur, please use git clone https://github.com/hdfeos/hdf4 and use cd hdf4 && ./configure --prefix=/tmp --disable-netcdf --disable-fortran --enable-hdf4-xdr && make && make install.
  2. Build HDF5 library with cd ../hdf5-1.12.0 && ./configure --prefix=/tmp && make && make install
  3. Build libkml library with cd ../libkml-1.3.0 && ./configure --prefix=/tmp && make && make install.
    • Make sure that the latest PROJ library is installed. On Linux, you can use yum install proj-devel.
    • If compiler throws an error due to long long type in Boost library during libkml compilation, use ./configure --prefix=/tmp CFLAGS='-Wno-long-long'
  4. Build GDAL with cd ../gdal-3.1.0 && ./configure --prefix=/tmp --with-hdf4=/tmp --with-hdf5=/tmp --with-libkml=/tmp && make && make install.
If installation is successful, you will see a library under /tmp/lib and tools under /tmp/bin. GDAL website also provides some tips on building with HDF.

Usage

Library

A GDALDataset object represents one HDF4 or HDF5 object. Although the term may cause confusion for HDF4 and HDF5 users, it represents HDF4 Vgroups and HDF5 groups as well as HDF4 Vdata, HDF4 SDS, and HDF5 datasets.

Since GDAL is written in C++, information provided here will be explained in C++. This section explains how to call some important functions. In the example, we will use an HDF5 file converted from an HDF-EOS2 file from AE_RnGdexternal data product of NSIDC. You can download the HDF5 file from here.

Although examples below show how to use GDAL to handle an HDF5 file, handling HDF4 files is very similar due to GDAL's abstraction layer.

Open a File or an Object

The GDALOpen() function opens a file as the following example shows.

Figure 1. Code opening a file
GDALDataset *ds = (GDALDataset *)GDALOpen("AMSR_E_L3_RainGrid_B05_200707.h5", GA_ReadOnly);

The first argument specifies an HDF4 or HDF5 file, but it can also represent a specific dataset. For instance, if the following string is passed as the first argument, GDAL opens the HDF5 dataset RrLandRain in the HDF5 group MonthlyRainTotal_GeoGrid/Data_Fields in the HDF5 file AMSR_E_L3_RainGrid_B05_200707.h5. HDF5:\ specifies that this is an HDF5 file.

Figure 2. A string specifying an object
"HDF5:\"AMSR_E_L3_RainGrid_B05_200707.h5\"://MonthlyRainTotal_GeoGrid/Data_Fields/RrLandRain"
The above string can be fetched by calling the GetMetadata() method as Section 3.3 explains.

Retrieve Attributes

GDALDataset can have metadata as an HDF5 object can have attributes. The GetMetadata() method returns a list of metadata. Unlike HDF4 and HDF5, GDAL categorizes metadata, and GetMetadata() takes one argument that specifies the domain. Since all attributes in the file are stored in the default domain, an empty string can be passed as the argument.

Figure 3. Code retrieving attributes
char **metadata = poDataset->GetMetadata("");
Each string is formatted as Name=Value and null-terminated. If an attribute is not of string type, values are converted into a string.

Open Child Objects

In GDAL, the list of child objects is merely one category of metadata. GetMetadata(), introduced in Section 3.2, can be used to retrieve the list of child objects. The only difference is that the first argument should be SUBDATASETS. This is a predefined domain for child objects.

Figure 4. Code retrieving child objects
char **metadata = poDataset->GetMetadata("SUBDATASETS");
The HDF-EOS2 file we used has two datasets in the MonthlyRainTotal_GeoGrid/Data_Fields group. For this file, the above code returned the following.

Figure 5. SUBDATASETS
SUBDATASET_1_NAME=HDF5:"AMSR_E_L3_RainGrid_B05_200707.nc"://MonthlyRainTotal_GeoGrid/Data_Fields/RrLandRain
SUBDATASET_1_DESC=[28x72] //MonthlyRainTotal_GeoGrid/Data_Fields/RrLandRain (32-bit floating-point)
SUBDATASET_2_NAME=HDF5:"AMSR_E_L3_RainGrid_B05_200707.nc"://MonthlyRainTotal_GeoGrid/Data_Fields/TbOceanRain
SUBDATASET_2_DESC=[28x72] //MonthlyRainTotal_GeoGrid/Data_Fields/TbOceanRain (32-bit floating-point)

As explained in the previous section, GDALOpen() can open a specific object if the first argument indicates one dataset. A value whose name is SUBDATASET_*_NAME can be the first argument.

Read Data

A GDALDataset object corresponding to an HDF4 SDS or an HDF5 dataset contains one GDALRasterBand object. The GetRasterBand() method returns one GDALRasterBand object.

After getting a GDALRasterBand object, one can read or write values by using the RasterIO() method. The following is a routine that reads all values in the dataset given by the poDataset variable. After executing the following code, the buffer will obtain the data stored in the HDF file.

Figure 6. Code reading data
int xsize = poDataset->GetRasterXSize();
int ysize = poDataset->GetRasterYSize();
GDALRasterBand *rb = poDataset->GetRasterBand(1);
float *buffer = new float[xsize * ysize];
rb->RasterIO(GF_Read, 0, 0, xsize, ysize, buffer, xsize, ysize, GDT_Float32, 0, 0);

You can download a C++ program that reads the AE_RnGD HDF-EOS2 file from here. This program is composed of the above fragments.

Additionally, we wrote a dumper program. This program recursively dumps all datasets in a file. Although this program is not very practical, this sample shows how users can call GDAL APIs. You can download this file from here.

Tool

We will focus on two tools - gdalinfo and gdal_translate. Please go to GDAL utilities web page for the complete list of tools and their usages.

gdalinfo

The gdalinfo program lists various information about HDF4 and HDF5 files. For example, run the following command on the sample MOD17A2.A2007113.h11v09.005.2007136163924.hdf file. Please note that $ sign is a shell prompt and /tmp is the GDAL installation prefix that we used in the installation section.

Figure 7. gdalinfo usage
$/tmp/bin/gdalinfo MOD17A2.A2007113.h11v09.005.2007136163924.hdf

You will see metadata information about the file and datasets available inside the file.

Figure 8. gdalinfo output of MOD17A2 product
Driver: HDF4/Hierarchical Data Format Release 4
Files: MOD17A2.A2007113.h11v09.005.2007136163924.hdf
Size is 512, 512
Coordinate System is `'
Metadata:
ALGORITHMPACKAGEACCEPTANCEDATE=2005-02-11
ALGORITHMPACKAGEMATURITYCODE=Normal

...

Subdatasets:
SUBDATASET_1_NAME=HDF4_EOS:EOS_GRID:"MOD17A2.A2007113.h11v09.005.2007136163924.hdf":MOD_Grid_MOD17A2:Gpp_1km
SUBDATASET_1_DESC=[1200x1200] Gpp_1km MOD_Grid_MOD17A2 (16-bit integer)
SUBDATASET_2_NAME=HDF4_EOS:EOS_GRID:"MOD17A2.A2007113.h11v09.005.2007136163924.hdf":MOD_Grid_MOD17A2:PsnNet_1km
SUBDATASET_2_DESC=[1200x1200] PsnNet_1km MOD_Grid_MOD17A2 (16-bit integer)
SUBDATASET_3_NAME=HDF4_EOS:EOS_GRID:"MOD17A2.A2007113.h11v09.005.2007136163924.hdf":MOD_Grid_MOD17A2:Psn_QC_1km
SUBDATASET_3_DESC=[1200x1200] Psn_QC_1km MOD_Grid_MOD17A2 (8-bit unsigned integer)
Corner Coordinates:
Upper Left ( 0.0, 0.0)
Lower Left ( 0.0, 512.0)
Upper Right ( 512.0, 0.0)
Lower Right ( 512.0, 512.0)
Center ( 256.0, 256.0)

You can check the details about a specific dataset by copying and pasting the name of dataset in the Subdatasets: metadata information section from the previous command output. For example, let's pick the second dataset as shown in Figure 9. We omit the UNIX shell prompt and path here.

Figure 9. gdalinfo usage for an HDF dataset in MOD17A2 product
gdalinfo HDF4_EOS:EOS_GRID:"MOD17A2.A2007113.h11v09.005.2007136163924.hdf":MOD_Grid_MOD17A2:PsnNet_1km

The above command generates the following output.

Figure 10. gdalinfo output for an HDF dataset in MOD17A2 product
Driver: HDF4Image/HDF4 Dataset
Files: MOD17A2.A2007113.h11v09.005.2007136163924.hdf
Size is 1200, 1200
Coordinate System is:

...

PROJECTION["Sinusoidal"],

...

Metadata:
_FillValue=32767
add_offset=0

...

long_name=MODIS/Terra Net Photosynthesis (GPP - maint resp) 8-Day L4 Global 1km SIN Grid

...

scale_factor=0.0001

...

units=kg_C_/m^2
valid_range=-30000, 30000

...

Corner Coordinates:
Upper Left (-7783653.638, 0.000) ( 70d 0' 0.00"W, 0d 0' 0.01"N)
Lower Left (-7783653.638,-1111950.520) ( 71d 4'47.51"W, 10d 0' 0.00"S)
Upper Right (-6671703.118, 0.000) ( 60d 0' 0.00"W, 0d 0' 0.01"N)
Lower Right (-6671703.118,-1111950.520) ( 60d55'32.15"W, 10d 0' 0.00"S)
Center (-7227678.378, -555975.260) ( 65d14'53.84"W, 5d 0' 0.00"S)
Band 1 Block=1200x400 Type=Int16, ColorInterp=Gray
Description = MODIS/Terra Net Photosynthesis (GPP - maint resp) 8-Day L4 Global 1km SIN Grid
NoData Value=32767
Unit Type: kg_C_/m^2
Offset: 0, Scale:0.0001

Please note that the value for Driver is different from Figure 8. Please also note that some key HDF attributes are processed as GDAL's internal representation:

However, valid_range is not processed by GDAL. You can check the range of values in the dataset using the -stats option.

Figure 11. gdalinfo output with -stats option for an HDF dataset in MOD17A2 product
gdalinfo -stats HDF4_EOS:EOS_GRID:"MOD17A2.A2007113.h11v09.005.2007136163924.hdf":MOD_Grid_MOD17A2:PsnNet_1km
Driver: HDF4Image/HDF4 Dataset
...
Metadata:
STATISTICS_MAXIMUM=32766
STATISTICS_MEAN=589.65777083333
STATISTICS_MINIMUM=-298
STATISTICS_STDDEV=3207.487944228

This will affect the GeoTIFF conversion which we will cover in the next section.

gdal_translate

gdal_translate can convert HDF into GeoTIFF (default) and KMZ. GeoTIFF is 2D raster so 3D/4D datasets will be split into multiple bands. If an HDF file has multiple objects, each object will become a separate GeoTIFF file with -sds option. For example, the sample MOD17A2 file has 3 datasets and the following command will create 3 different GeoTIFF files.

Figure 12. gdal_translate example for HDF to GeoTIFF conversion
gdal_translate -sds MOD17A2.A2007113.h11v09.005.2007136163924.hdf MOD17A2.tif
Input file size is 1200, 1200
0...10...20...30...40...50...60...70...80...90...100 - done.
Input file size is 1200, 1200
0...10...20...30...40...50...60...70...80...90...100 - done.
Input file size is 1200, 1200
0...10...20...30...40...50...60...70...80...90...100 - done.

If you examine the output directory of the above command using ls -l MOD17A2*, you will see 3 new GeoTIFF files: MOD17A2_1.tif, MOD17A2_2.tif, and MOD17A2_3.tif. Please note that the file size of MOD17A2_1.tif and MOD17A2_2.tif is twice bigger than the file size of MOD17A2_3.tif because the data type of MOD17A2_3.tif is Byte while the data type of MOD17A2_1.tif and MOD17A_2.tif is Int16. You can check the data type of each GeoTIFF file using gdalinfo.

The converted GeoTIFF files can be viewed easily with standard TIFF image viewers that come with operating systems such as Photo Viewer on Windows and Preview on Mac. For example, you can visualize MOD17A2_3.tif file as shown in Figure 13.

However, the generic viewer such as Photo Viewer works well only with Byte type GeoTIFF file. If you open a Int16 type GeoTIFF file like MOD17A2_2.tif, you'll see a mostly black image as shown in Figure 14.

If you compare Figure 14. with the correct visualization image from our comprehensive example page or QGIS screenshot in Figure 15., you will realize that generic TIFF viewer can handle only Byte type GeoTIFF file.

gdal_translate allows you to change the type of GeoTIFF file and scale down the values so that they can fit into a new type using -ot and -scale option. For example, we can change Int16 to Byte as follows:

Figure 16. gdal_translate example for changing type
# Select the PsnNet_1km grid and convert it into GeoTIFF with Byte type.
gdal_translate -ot Byte -scale -298 441 0 255 HDF4_EOS:EOS_GRID:"MOD17A2.A2007113.h11v09.005.2007136163924.hdf":MOD_Grid_MOD17A2:PsnNet_1km MOD17A2_2.byte.tif

# Alternatively, you can change the type of GeoTIFF file that is already converted from Figure 12.
gdal_translate -ot Byte -scale -298 441 0 255 MDO17A2_2.tif MOD17A2_2.byte.tif

Now the Windows Photo Viewer can easily visualize the scaled down MOD17A2_2.byte.tif file as shown in Figure 17.

In the above example (Figure 16.), we set [-298, 441] as source min/max values of the PsnNet_1km dataset for -scale option. According to Figure 11., the minimum value is -298 but the maximum value is 32766 which is outside of the valid_range attribute values [-30000, 30000] as you can see from Figure 10. We obtained the value 441 by applying valid_range attribute information using the GDAL's Virtual Raster Format (VRT).

GDAL's VRT file is an XML representation of GDAL Data Model. For example, you can obtain VRT file of PsnNet_1km dataset using the following gdal_transform command in Figure 18.

Figure 18. gdal_translate example for generating VRT
gdal_translate -of vrt HDF4_EOS:EOS_GRID:"MOD17A2.A2007113.h11v09.005.2007136163924.hdf":MOD_Grid_MOD17A2:PsnNet_1km MOD17A2_2.vrt

The output VRT file, MOD17A2_2.vrt, can be edited with any text or XML editor. Please modify the <SimpleSource> XML tag to <ComplexSource> and add a new <LUT> tag at the end of VRT file as shown in Figure 19.

Figure 19. Edited VRT codes
<ComplexSource>
...
<LUT>-32768:32767,-30001:32767,-30000:-30000,30000:30000,30001:32767</LUT>
</ComplexSource>

In the above example, the LUT tag's value maps the values between -32768 and -30001 to 32767 which is a fill value[1]. It also maps values between -30000 and 30000 to the same value as the original value. Finally, it maps value above 30001 to fill value (32767) again. Please save the edited VRT file as MOD17A2_2.lut.vrt. You can generate a new GeoTIFF file, MOD17A2_2.lut.tif, from the edited VRT file using gdal_translate and compute stats again using gdalinfo as shown in Figure 20.

Figure 20. Creating a new GeoTIFF from VRT
gdal_translate MOD17A2_2.lut.vrt MOD17A2_2.lut.tif
Input file size is 1200, 1200
0...10...20...30...40...50...60...70...80...90...100 - done.

gdalinfo -stats MOD17A2_2.lut.tif
Driver: GTiff/GeoTIFF
Files: MOD17A2_2.lut.tif
Size is 1200, 1200
Coordinate System is:
...
Offset: 0, Scale:0.0001
Metadata:
STATISTICS_MAXIMUM=441
STATISTICS_MEAN=270.11561044718
STATISTICS_MINIMUM=-298
STATISTICS_STDDEV=79.789989148942

The stats option tells you that the maximum value of the converted GeoTIFF file is 441. You can also convert the corrected GeoTIFF file into KMZ (KML) using the following command in Figure 21.

Figure 21. Creating a KMZ from a GeoTIFF
gdal_translate -of KMLSUPEROVERLAY MOD17A2_2.lut.tif MOD17A2_2.lut.kmz
Input file size is 1200, 1200
0...10...20...30...40...50...60...70...80...90...100 - done.

You can easily visualize the converted KMZ file, MOD17A2_2.lut.kmz, with Google Earth as shown in Figure 22.

Limitation

You cannot georeference all NASA products with GDAL. Please check the supported product list for HDF4 and HDF5. The below is the supported product list obtained from the GDAL website.

HDF4
HDF5

Although QGIS can visualize Int16 type GeoTIFF, it may not apply scale and offset correctly[2].

Multi-dimensional Dataset Support in GDAL 3.1.0

GDAL 3.1.0 improved the multi-dimensional (i.e., 3 or more dimensions) dataset support in the library. Therefore, you can try the new APIs and tools for NASA products that are not officially supported.

For Python, gdal-python can handle multi-dimensional dataset properly[3,4].

There are 2 GDAL command line tools that use the new APIs - gdalmdiminfo[5] and gdalmdimtranslate[6].

gdalmdiminfo can print all group information for both HDF4 and HDF5. The below is a sample output from MOP03TM-201802-L3V95.2.1.he5.

Figure 23. gdalmdiminfo output of MOP03TM product
{
"type": "group",
"driver": "HDF5",
"name": "/",
"groups": {
"HDFEOS": {
"groups": {
"ADDITIONAL": {
"groups": {
"FILE_ATTRIBUTES": {
"attributes": {
"title": "MOPITT Level 3 Monthly File",
"institution": "MOPITT at ACOM of NCAR",
"StartTime": 791596823.325000048,
"StopTime": 794016008.597999454,
"FillValue": -9999
}
}
}
},
"GRIDS": {
"groups": {
"MOP03": {
...
"RetrievedSurfaceTemperatureDay": {
"datatype": "Float32",
"dimensions": [
"/HDFEOS/GRIDS/MOP03/XDim",
"/HDFEOS/GRIDS/MOP03/YDim"
],
"attributes": {
"long_name": "Retrieved Surface Temperature Day"
},
"unit": "K",
"nodata_value": -9999
},
...

If you use the regular gdal_translate, the above /HDFEOS/GRIDS/MOP03/Data Fields/RetrievedSurfaceTemperatureDay dataset can't be translated into GeoTIFF properly. The output image will be rotated by 90 degrees. You can fix the problem using the new tool gdalmdimtranslate as the following code illustrates:

Figure 24. Creating a GeoTIFF from MOP03TM HDF-EOS5
gdalmdimtranslate -array "name=/HDFEOS/GRIDS/MOP03/Data Fields/RetrievedSurfaceTemperatureDay,transpose=[1,0]" MOP03TM-201802-L3V95.2.1.he5 test.tif
0...10...20...30...40...50...60...70...80...90...100 - done.
ArcGIS Pro can import the output GeoTIFF file and visualize it correctly.
You can compare the above screenshot with our Comprehensive Examples python output.

See Also

References

  1. GIS StackExchange. http://gis.stackexchange.com/questions/89366/set-a-range-of-values-as-nodata-in-a-vrt
  2. QGIS GitHub. https://github.com/qgis/QGIS/pull/1252
  3. hdf4multidim.py
  4. hdf5multidim.py
  5. gdalmdiminfo
  6. gdalmdimtranslate

Last modified: 12/30/2020
About Us | Contact Info | Archive Info | Disclaimer
Sponsored by Subcontract number 4400528183 under Raytheon Contract number NNG15HZ39C, funded by NASA / Maintained by The HDF Group