R

R is a free software environment for statistical computing and graphics. It has a rich set of packages that can access NASA HDF products. This page covers four of them: RNetCDF, ncdf4, rgdal, and hdf5r.

Download

You can download the latest R from here. You will need the ncdf4 R source package so that you can build it against custom libraries that improve access to NASA HDF products.

The rgdal and hdf5r packages work without source customization, so you don't have to download their R source packages.

Installation

Package names are case-sensitive, so type them exactly as shown when you install packages (for example, RNetCDF, not rnetcdf).
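
As a quick check before you start, the following base R sketch reports which of the packages discussed on this page are already installed. It only calls requireNamespace(), so nothing is downloaded. The variable names pkgs and installed are illustrative and do not come from the original examples.

```r
# Package names must match exactly, including upper and lower case.
pkgs <- c("RNetCDF", "ncdf4", "rgdal", "hdf5r")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

print(installed)
```

Packages reported as FALSE still need to be installed as described in the sections below.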

RNetCDF

The default installation of RNetCDF doesn't support reading HDF4. For example, if you install R using apt-get install r-base, Ubuntu installs the libnetcdf19 package. You must remove the libnetcdf19 package first. Installing a custom netCDF library with HDF4 support is not simple, so we provide a complete GitHub Actions workflow file. Please follow the steps in the workflow.

ncdf4

The default installation of ncdf4 doesn't support reading HDF4 because the netCDF library is not built with HDF4 support. For example, if you open an HDF4 file, you will get the following error message.

Error in R_nc4_open: NetCDF: Attempt to use feature that was not turned on when netCDF was built.
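
One way to check in advance whether a netCDF-C installation was built with HDF4 support is to ask the nc-config utility that ships with netCDF-C. The sketch below assumes that nc-config is on your PATH and that it belongs to the same netCDF library that ncdf4 links against, which may not be true on every system. The variable names are illustrative.

```r
# Locate nc-config; Sys.which() returns "" when it is not on the PATH.
nc_config <- Sys.which("nc-config")

if (nzchar(nc_config)) {
  # nc-config answers each --has-* query with "yes" or "no".
  has_hdf4 <- system2(nc_config, "--has-hdf4", stdout = TRUE)
  message("netCDF built with HDF4 support: ", has_hdf4)
} else {
  # Without nc-config we cannot tell; record that explicitly.
  has_hdf4 <- NA_character_
  message("nc-config not found; cannot check HDF4 support this way.")
}
```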

If you see this error, build the netCDF library from source with the --enable-hdf4 configure option, and then build the ncdf4 R package from source. If configuring the netCDF-C library with --enable-hdf4 fails, try setting the following environment variables, assuming that both the HDF4 and JPEG libraries are installed under /usr/local.

$ export CFLAGS='-I/usr/local/include'
$ export LDFLAGS='-L/usr/local/lib -ljpeg'

Once a netCDF library that supports HDF4 is installed, you can build the ncdf4 R package from source.

>install.packages("ncdf4_1.20.tar.gz", repos = NULL, type="source")

The above command may fail on Mac OS X with the following error message.

** testing if installed package can be loaded
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/ncdf4/libs/ncdf4.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/ncdf4/libs/ncdf4.so, 6): Symbol not found: _nc_close
  Referenced from: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/ncdf4/libs/ncdf4.so
  Expected in: flat namespace
 in /Library/Frameworks/R.framework/Versions/3.3/Resources/library/ncdf4/libs/ncdf4.so
Error: loading failed

In that case, decompress the ncdf4 R package source and edit the PKG_LIBS line in ncdf4/src/Makevars.in as follows.

## PKG_LIBS=@NETCDF_LDFLAGS@
PKG_LIBS=-L/usr/local/lib -lnetcdf -ljpeg -lmfhdf -ldf -lhdf5_hl -lhdf5 -ldl -lm -lz -lcurl

Then, build the package again and install it.

$ R CMD build ncdf4
$ R CMD INSTALL ncdf4_1.20.tar.gz

rgdal

To maximize the usability of NASA HDF products, we recommend using a special GDAL library from NASA called GEE. Please read carefully how to build GDAL with HDF4 support. Once you have installed GEE, build and install rgdal from the source package by issuing the following command at the R prompt.

>install.packages('rgdal', type='source')

hdf5r

At the R prompt, run the following command to install the binary package.

>install.packages('hdf5r', repos = "http://cran.us.r-project.org")

On Unix systems, the hdf5r package may be installed from source, which involves compilation. The hdf5r package is based on the HDF5 C++ bindings, so make sure that your HDF5 library is built with the --enable-cxx configure option.

Usage

RNetCDF

We provide a complete sample R code example and plot on GitHub.

ncdf4

To use ncdf4, first load the package.

>library(ncdf4)

To open a NASA HDF file, call nc_open() with the path to the file.

>nc <- nc_open('AIRS.2003.02.05.L3.RetStd_H001.v6.0.12.0.G14112124328.hdf')

To list the datasets available in the HDF file, type the name of the assigned variable.

>nc
File AIRS.2003.02.05.L3.RetStd_H001.v6.0.12.0.G14112124328.hdf (NC_FORMAT_NETCDF4):
     770 variables (excluding dimension variables):
        short TotalCounts_A[XDim:ascending,YDim:ascending] (Contiguous storage)
            _FillValue: 0
        float SurfPres_Forecast_A[XDim:ascending,YDim:ascending] (Contiguous storage)
        ...

To retrieve data from a dataset, use ncvar_get().

>v1 <- nc$var[['Temperature_MW_A']]
>z_all <- ncvar_get(nc, v1)

If you access an HDF4 dataset, you need to change the endianness using readBin() because ncdf4 doesn't do it for you. You don't need the following conversion for an HDF5 dataset.

>zv <- as.vector(as.single(z_all))
>zz <- file("tmpbin", "wb")
>writeBin(zv, zz)
>close(zz)
>zz2 <- file("tmpbin", "rb")
>zs <- readBin(zz2, numeric(), size=4, length(zv), endian="little")
>close(zz2)
>dim(zs) <- dim(z_all)
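
To see what the endian argument of writeBin() and readBin() does, here is a small base R sketch that is independent of ncdf4. It uses made-up values and an in-memory raw vector instead of the tmpbin file, so it illustrates byte order in general and is not a replacement for the workaround above.

```r
# Four sample values that are exactly representable as 32-bit floats.
x <- c(1.5, -2.25, 1024, 0)

# Serialize as 4-byte floats in big-endian byte order (in memory, no temp file).
bytes_big <- writeBin(x, raw(), size = 4, endian = "big")

# Reading with the matching byte order recovers the original values.
ok <- readBin(bytes_big, what = "numeric", n = length(x), size = 4, endian = "big")

# Reading with the wrong byte order silently yields different numbers.
wrong <- readBin(bytes_big, what = "numeric", n = length(x), size = 4, endian = "little")

print(ok)
print(wrong)
```

No error or warning is raised in the second case, which is why a byte-order mismatch is easy to overlook.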

To retrieve the value of an attribute of a dataset, use ncatt_get().

>fillvalue <- ncatt_get(nc, v1, "_FillValue")

rgdal

To use rgdal, first load the package along with other geospatial helper packages.

>library(rgdal)
>library(gdalUtils)
>library(raster)

To open a NASA HDF file, call get_subdatasets() with the path to the file.

>sds <- get_subdatasets('NASAHDF/FLASH_TISA_Terra+Aqua_Version3A_113011.20140513.hdf')

To list the datasets available in the HDF file, type the name of the assigned variable.

>sds
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.2.0dev, released 2016/99/99
Path to GDAL shared files: /opt/hdfeos/share/gdal
Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
Path to PROJ.4 shared files: (autodetected)
Linking to sp version: 1.2-3
[1] "HDF4_SDS:UNKNOWN:/home/hyoklee/FLASH_TISA_Terra+Aqua_Version3A_113011.20140513.hdf:0"
[2] "HDF4_SDS:UNKNOWN:/home/hyoklee/FLASH_TISA_Terra+Aqua_Version3A_113011.20140513.hdf:1"
...

In the above output, the version number and release date of the GDAL runtime look strange because we use GEE from NASA.

To retrieve data from a dataset, use readGDAL().

>d5 <- readGDAL(sds[6], options=c("RASTERXDIM=4","RASTERYDIM=3","RASTERBDIM=2","RASTER4DIM=1","RASTER5DIM=0"))

The RASTERXDIM, ..., RASTER5DIM options allow you to access a 5-dimensional dataset, and they are available only in GEE. If you use regular GDAL, you cannot access the dataset correctly.

Once the data is retrieved correctly using GEE, you can convert it to a raster object and manipulate it for visualization and analysis.

>r <- raster(d5)

hdf5r

To use the hdf5r package, first load the package.

>library(hdf5r)

To open a NASA HDF file, call h5file() with the path to the file.

>file <- h5file('NASAHDF/SMAP_L3_SM_P_20151012_R11920_001.h5', 'r')

To list the datasets available in the HDF5 file, use list.datasets().

>list.datasets(file, recursive=TRUE)

To retrieve data from a dataset in a group, use readDataSet().

>dset <- file[['/Soil_Moisture_Retrieval_Data/soil_moisture']]
>vals <- readDataSet(dset)

To retrieve the value of an attribute of a dataset, use h5attr().

>fv <- h5attr(dset, "_FillValue")
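
A common next step is to replace fill values with NA before computing statistics or plotting. The following base R sketch uses made-up stand-ins for vals and fv so that it runs without hdf5r or a data file. The value -9999 and the name vals_masked are assumptions for illustration, not values read from the SMAP file.

```r
# Stand-in for the matrix returned by readDataSet(dset); the values are made up.
vals <- matrix(c(0.12, -9999, 0.31, 0.27, -9999, 0.18), nrow = 2)

# Stand-in for h5attr(dset, "_FillValue"); -9999 is an assumed fill value.
fv <- -9999

# Replace fill values with NA so that statistics and plots ignore them.
vals_masked <- vals
vals_masked[vals_masked == fv] <- NA

# Mean of the valid cells only.
valid_mean <- mean(vals_masked, na.rm = TRUE)
print(valid_mean)
```

Working on a copy keeps the original array available in case you need the raw values later.
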
Visualization

R provides rich graphics packages for data visualization. There are three common feature types in HDF-EOS: grid, swath, and point. We provide a few examples of visualizing them.

To plot grid data like the figure below, you can follow the AIRS L3 comprehensive example.

To plot swath data like the figure below, you can follow the AIRS L2 comprehensive example.

To plot point data using symbols like the figure below, you can follow the OCO2 comprehensive example.
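
For grid data specifically, the basic idea can be sketched with base R graphics alone. The example below is not taken from the comprehensive examples: it draws a synthetic field on an assumed 1-degree global grid with image() and writes the result to a PDF file in the temporary directory. All names and values are illustrative.

```r
# Synthetic 1-degree global grid; these values are made up for illustration.
lon <- seq(-179.5, 179.5, by = 1)
lat <- seq(-89.5, 89.5, by = 1)
z <- outer(lon, lat, function(x, y) cos(y * pi / 180) * sin(x * pi / 180))

# image() expects z to have one row per x value and one column per y value.
out_file <- file.path(tempdir(), "grid_sketch.pdf")
pdf(out_file)
image(lon, lat, z, xlab = "Longitude", ylab = "Latitude",
      main = "Synthetic grid field")
dev.off()
```

With real data, the array read from the file would take the place of z, and its dimensions may need to be transposed to match the longitude and latitude vectors.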

Conversion

To export the contents of a data frame or matrix (e.g., vals in the hdf5r example) to a CSV file, you can use the following code.

>filename <- "temp.csv"
>write.table(vals, file = filename, sep = ",", row.names = FALSE)
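
To confirm that an export round-trips, you can read the file back and compare it with the original. The sketch below uses a small made-up matrix in place of vals and a temporary file name. Unlike the command above, it also passes col.names = FALSE so that the file contains only the data values; that is a choice made for this illustration.

```r
# Stand-in for vals; the values are made up and include one missing cell.
vals <- matrix(c(0.12, NA, 0.31, 0.27), nrow = 2)

# Write the matrix as comma-separated values without row or column names.
filename <- tempfile(fileext = ".csv")
write.table(vals, file = filename, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Read the file back; missing cells are written and read as NA.
back <- as.matrix(read.csv(filename, header = FALSE))

# Compare the values, ignoring the column names added by read.csv().
print(all.equal(unname(back), vals))
```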

See Also

  1. Comprehensive Examples
  2. GDAL
  3. Apache Spark


Last modified: 01/06/2023
Sponsored by Subcontract number 4400528183 under Raytheon Contract number NNG15HZ39C, funded by NASA / Maintained by The HDF Group