Geospatial Data Abstraction Library (GDAL) is a translator library for
raster geospatial data formats. As the name implies, GDAL has an
abstraction layer that hides format-specific details, which means
there is only one GDAL API regardless of file format. For example,
one of the following - an
HDF4
SDS/Vdata/Vgroup or an
HDF5
dataset/group - is mapped to a
GDALDataset
object.
It also comes with a variety of useful command line utilities for data translation and processing. We'll explain how to convert NASA HDF4/5 products into GeoTIFF/KML.
The latest GDAL is version 3.1.0.
GDAL doesn't include HDF4, HDF5, and KML support by default. If your system doesn't have HDF4, HDF5, and KML library, please download and install them first.
The following build instruction assumes that all downloaded packages are unpacked with tar zxvf package.tar.gz
at the same directory.
/tmp
directory as a prefix.
cd hdf-4.2.15 && ./configure --prefix=/tmp --disable-netcdf && make && make install
git clone https://github.com/hdfeos/hdf4
and use cd hdf4 && ./configure --prefix=/tmp --disable-netcdf --disable-fortran --enable-hdf4-xdr && make && make install
.
cd ../hdf5-1.12.0 && ./configure --prefix=/tmp && make && make install
cd ../libkml-1.3.0 && ./configure --prefix=/tmp && make && make install
.
yum install proj-devel
../configure --prefix=/tmp CFLAGS='-Wno-long-long'
cd ../gdal-3.1.0 && ./configure --prefix=/tmp --with-hdf4=/tmp --with-hdf5=/tmp --with-libkml=/tmp && make && make install
.
/tmp/lib
and tools under /tmp/bin
.
GDAL website also provides some tips on building with HDF.
A GDALDataset
object represents one HDF4 or HDF5 object.
Although the term may cause confusion for HDF4 and HDF5 users,
it represents HDF4 Vgroups and HDF5 groups as well as HDF4 Vdata, HDF4 SDS, and HDF5 datasets.
Since GDAL is written in C++, information provided here will be explained in C++. This section explains how to call some important functions. In the example, we will use an HDF5 file converted from an HDF-EOS2 file from AE_RnGd data product of NSIDC. You can download the HDF5 file from here.
Although examples below show how to use GDAL to handle an HDF5 file, handling HDF4 files is very similar due to GDAL's abstraction layer.
The GDALOpen()
function opens a file as the following example shows.
The first argument specifies an HDF4 or HDF5 file, but it can
also represent a specific dataset. For instance, if the following
string is passed as the first argument, GDAL opens the HDF5
dataset RrLandRain
in the HDF5 group MonthlyRainTotal_GeoGrid/Data_Fields
in the HDF5 file
AMSR_E_L3_RainGrid_B05_200707.h5
. HDF5:\
specifies that this is an HDF5 file.
GetMetadata()
method as Section 3.3 explains.
GDALDataset
can have metadata as an HDF5 object can have
attributes. The GetMetadata()
method returns a list of metadata.
Unlike HDF4 and HDF5, GDAL categorizes metadata, and
GetMetadata()
takes one argument that specifies the domain. Since
all attributes in the file are stored in the default domain, an
empty string can be passed as the argument.
In GDAL, the list of child objects is merely one category of
metadata. GetMetadata()
, introduced in
Section 3.2, can be used
to retrieve the list of child objects. The only difference is
that the first argument should be SUBDATASETS
. This is a
predefined domain for child objects.
MonthlyRainTotal_GeoGrid/Data_Fields
group.
For this file, the above code returned the following.
As explained in the previous section,
GDALOpen()
can open a specific
object if the first argument indicates one dataset. A value whose
name is SUBDATASET_*_NAME
can be the first argument.
A GDALDataset
object corresponding to an HDF4 SDS or an HDF5
dataset contains one GDALRasterBand
object. The GetRasterBand()
method returns one GDALRasterBand
object.
After getting a GDALRasterBand
object, one can read or write
values by using the RasterIO()
method. The following is a routine
that reads all values in the dataset given by the poDataset
variable. After executing the following code, the buffer will
obtain the data stored in the HDF file.
You can download a C++ program that reads the AE_RnGD HDF-EOS2 file from here. This program is composed of the above fragments.
Additionally, we wrote a dumper program. This program recursively dumps all datasets in a file. Although this program is not very practical, this sample shows how users can call GDAL APIs. You can download this file from here.
We will focus on two tools - gdalinfo
and gdal_translate
.
Please go to GDAL utilities web page for the complete list of tools and their usages.
The gdalinfo
program lists various information about HDF4 and HDF5 files.
For example, run the following command on the sample
MOD17A2.A2007113.h11v09.005.2007136163924.hdf
file.
Please note that $
sign is a shell prompt and /tmp
is the GDAL installation prefix that we used in the
installation section.
You will see metadata information about the file and datasets available inside the file.
You can check the details about a specific dataset by copying and pasting the name of dataset
in the Subdatasets:
metadata information section from the previous command output.
For example, let's pick the second dataset as shown in Figure 9. We omit the UNIX shell prompt and path here.
The above command generates the following output.
Please note that the value for Driver
is different from Figure 8.
Please also note that some key HDF attributes are processed as GDAL's internal representation:
_FillValue
to NoData
add_offset
to Offset
scale_factor
to Scale
units
to Unit
long_name
to Description
However, valid_range
is not processed by GDAL.
You can check the range of values in the dataset using the -stats
option.
This will affect the GeoTIFF conversion which we will cover in the next section.
gdal_translate
can convert HDF into GeoTIFF (default) and KMZ.
GeoTIFF is 2D raster so 3D/4D datasets will be split into multiple bands.
If an HDF file has multiple objects, each object will become a separate GeoTIFF file with -sds
option.
For example, the sample MOD17A2 file has 3 datasets and the following command will create 3 different GeoTIFF files.
If you examine the output directory of the above command using ls -l MOD17A2*
,
you will see 3 new GeoTIFF files: MOD17A2_1.tif, MOD17A2_2.tif, and MOD17A2_3.tif.
Please note that the file size of MOD17A2_1.tif and MOD17A2_2.tif is twice bigger than the file size of MOD17A2_3.tif
because the data type of MOD17A2_3.tif is Byte while the data type of MOD17A2_1.tif and MOD17A_2.tif is Int16.
You can check the data type of each GeoTIFF file using gdalinfo
.
The converted GeoTIFF files can be viewed easily with standard TIFF image viewers that come with operating systems such as Photo Viewer on Windows and Preview on Mac. For example, you can visualize MOD17A2_3.tif file as shown in Figure 13.
However, the generic viewer such as Photo Viewer works well only with Byte type GeoTIFF file. If you open a Int16 type GeoTIFF file like MOD17A2_2.tif, you'll see a mostly black image as shown in Figure 14.
If you compare Figure 14. with the correct visualization image from our comprehensive example page or QGIS screenshot in Figure 15., you will realize that generic TIFF viewer can handle only Byte type GeoTIFF file.
gdal_translate
allows you to change the type of GeoTIFF file and scale down the values so that they can fit into a new type using -ot
and -scale
option. For example, we can change Int16 to Byte as follows:
Now the Windows Photo Viewer can easily visualize the scaled down MOD17A2_2.byte.tif file as shown in Figure 17.
In the above example (Figure 16.), we set [-298, 441] as source min/max values of the PsnNet_1km dataset for -scale
option. According to Figure 11., the minimum value is -298 but the maximum value is 32766 which is outside of the valid_range attribute values [-30000, 30000] as you can see from Figure 10. We obtained the value 441 by applying valid_range attribute information using the GDAL's Virtual Raster Format (VRT).
GDAL's VRT file is an XML representation of GDAL Data Model. For example, you can obtain VRT file of PsnNet_1km dataset using the following gdal_transform
command in Figure 18.
The output VRT file, MOD17A2_2.vrt, can be edited with any text or XML editor. Please modify the <SimpleSource>
XML tag to <ComplexSource>
and add a new <LUT>
tag at the end of VRT file as shown in Figure 19.
In the above example, the LUT tag's value maps the values between -32768 and -30001 to 32767 which is a fill value[1]. It also maps values between -30000 and 30000 to the same value as the original value. Finally, it maps value above 30001 to fill value (32767) again. Please save the edited VRT file as MOD17A2_2.lut.vrt. You can generate a new GeoTIFF file, MOD17A2_2.lut.tif, from the edited VRT file using gdal_translate
and compute stats again using gdalinfo
as shown in Figure 20.
The stats
option tells you that the maximum value of the converted GeoTIFF file is 441. You can also convert the corrected GeoTIFF file into KMZ (KML) using the following command in Figure 21.
You can easily visualize the converted KMZ file, MOD17A2_2.lut.kmz, with Google Earth as shown in Figure 22.
You cannot georeference all NASA products with GDAL. Please check the supported product list for HDF4 and HDF5. The below is the supported product list obtained from the GDAL website.
Although QGIS can visualize Int16 type GeoTIFF, it may not apply scale and offset correctly[2].
GDAL 3.1.0 improved the multi-dimensional (i.e., 3 or more dimensions) dataset support in the library. Therefore, you can try the new APIs and tools for NASA products that are not officially supported.
For Python, gdal-python
can handle multi-dimensional dataset properly[3,4].
There are 2 GDAL command line tools that use the new APIs - gdalmdiminfo
[5] and gdalmdimtranslate
[6].
gdalmdiminfo
can print all group information for both HDF4 and HDF5. The below is a sample output from MOP03TM-201802-L3V95.2.1.he5.
If you use the regular gdal_translate
, the above /HDFEOS/GRIDS/MOP03/Data Fields/RetrievedSurfaceTemperatureDay
dataset can't be translated into GeoTIFF properly. The output image will be rotated by 90 degrees. You can fix the problem using the new tool gdalmdimtranslate
as the following code illustrates: