PyHDF

PyHDF is a Python interface to the HDF4 library. It covers most of the HDF4 APIs for Scientific Data Sets (SD), Vdata (VS), and Vgroups (V).

Download

The latest version is pyhdf-0.10.5, last updated on May 8, 2022. The following 0.9.0 packages are provided for archival purposes only.

Platform   Package                        Checksum
Windows    pyhdf-0.9.0.win32-py2.7.msi    SHA256
Unix       pyhdf-0.9.0.tar.gz             SHA256

Installation

The easiest way to install pyhdf is with 32-bit Miniconda. Try conda install pyhdf first. Alternatively, try pip install pyhdf for version 0.10.x or pip install hdf4-python for version 0.9.x. If neither conda nor pip works, you can build pyhdf from source by following this guide. The guide targets Python 2.7 and assumes that you are already familiar with Python installation and with package managers such as easy_install, pip, and conda.
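
After running one of the install commands, a quick check like the following can confirm that Python is able to locate the package. This is a minimal sketch that assumes Python 3 (importlib.util is not available in this form on Python 2.7). The helper name pyhdf_is_installed is ours and is not part of pyhdf.

```python
import importlib.util


def pyhdf_is_installed():
    """Return True if the pyhdf package can be located on the import path."""
    # find_spec only looks the package up; it does not import it.
    return importlib.util.find_spec("pyhdf") is not None


if pyhdf_is_installed():
    print("pyhdf is installed")
else:
    print("pyhdf not found; try 'conda install pyhdf' or 'pip install pyhdf'")
```

Note that this only confirms the package is present. It does not verify that the underlying HDF4 shared libraries load correctly, which only an actual import would show.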

If you're a Docker user, please try our Docker images of Anaconda on Docker Hub, which include all the Python modules (e.g., basemap) required to visualize HDF-EOS data.

Windows
The following installation guide works on Windows platform only.
  1. Download Anaconda 32-bit Python and install it at the default location C:\Anaconda.
  2. Click Start > Anaconda Command Prompt.
  3. Type conda install numpy at the C:\Anaconda> prompt. Type y to accept any updates. Type exit to close the Anaconda command shell.
  4. Run pyhdf-0.9.0.win32-py2.7.msi.
  5. If you have installed several Pythons already, please make sure that the package is installed under Anaconda as shown in Figure 1.
Unix
The following installation guide works on Mac and Linux platforms only. We assume that your UNIX shell is bash.
  1. Install the latest HDF4 library.
  2. Install the numpy library. For example, run $ easy_install numpy.
  3. Unpack the downloaded source and change to the unpacked source directory (e.g., $ tar zxvf pyhdf-0.9.0.tar.gz && cd pyhdf-0.9.0).
  4. Set the INCLUDE_DIRS environment variable. For example, if the HDF4 library is installed under /usr/local, run $ export INCLUDE_DIRS=/usr/local/include.
  5. Set the LIBRARY_DIRS environment variable. For example, if the HDF4 library is installed under /usr/local, run $ export LIBRARY_DIRS=/usr/local/lib.
  6. Run $ python setup.py install.
  7. If the last step fails under virtualenv, run $ python setup.py bdist_egg and then $ easy_install dist/pyhdf-0.9.0-py2.7-linux-x86_64.egg.
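
If the build in step 6 cannot find the HDF4 headers, it can help to confirm that the directory given in step 4 really contains them before running setup.py again. The sketch below looks for the HDF4 header hdf.h in each directory listed in INCLUDE_DIRS. The helper name find_header is hypothetical, and splitting the variable on the platform path separator is an assumption about how several directories would be listed.

```python
import os


def find_header(include_dirs, header="hdf.h"):
    """Return the first path to `header` found in `include_dirs`, else None.

    `include_dirs` is the raw value of the INCLUDE_DIRS variable. Splitting
    on os.pathsep is an assumption for the multi-directory case.
    """
    for directory in include_dirs.split(os.pathsep):
        candidate = os.path.join(directory, header)
        if directory and os.path.isfile(candidate):
            return candidate
    return None


found = find_header(os.environ.get("INCLUDE_DIRS", ""))
print(found or "hdf.h not found; check INCLUDE_DIRS")
```

The same idea could be applied to LIBRARY_DIRS by checking for the HDF4 library files instead of a header.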

Build on Windows

Building from source is tricky on Windows, so we explain it here.

  1. Download both the shared and the static HDF4 binary distributions. Unzip them and install them using the installers that have the .exe extension.
  2. Click Start > Anaconda Command Prompt
  3. Type C:\Anaconda>conda install numpy.
  4. Type C:\Anaconda>conda install setuptools.
  5. Download libmsvcr90d.a and put it under C:\Anaconda\libs.
  6. Unpack the pyhdf-0.9.0.zip source (checksum: SHA256) and change to the pyhdf-0.9.0 directory.
  7. Download the run.bat script and type C:\pyhdf-0.9.0>run.
  8. The script builds the installer and then runs it.

Usage

PyHDF closely follows the HDF4 C API: most functions have the same or similar names and functionality as their C counterparts. The difference is that they are grouped into a few classes. For example, the SD API is divided into five Python classes, including SD, SDS, SDim, and SDAttr.

How to read and visualize

If you have installed PyHDF successfully, you can read and visualize NASA HDF4 products. First, make sure that the basemap, matplotlib, and numpy modules are installed, and import them before pyhdf as shown in Figure 2. If Python fails to load a module such as basemap, you can install it with conda install basemap.

Figure 2. Python code for importing PyHDF interface and other packages for visualization
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
from pyhdf.SD import SD, SDC

Next, open the sample NASA AIRS HDF-EOS2 file, AIRS.2002.08.01.L3.RetStd_H031.v4.0.21.0.G06104133732.hdf, and read datasets as shown in Figure 3. PyHDF supports the HDF4 Vgroup, SDS, and Vdata interfaces, but we focus on SDS because many NASA datasets are stored as SDS.

Figure 3. Python code for opening file and reading datasets
# Open file.
FILE_NAME = 'AIRS.2002.08.01.L3.RetStd_H031.v4.0.21.0.G06104133732.hdf'
hdf = SD(FILE_NAME, SDC.READ)

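# List available SDS datasets.
# The parenthesized form of print works on both Python 2.7 and Python 3.
print(hdf.datasets())

# Read dataset.
DATAFIELD_NAME = 'RelHumid_A'
data3D = hdf.select(DATAFIELD_NAME)
# Take the 2-D slice at index 11 of the first dimension.
data = data3D[11,:,:]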

# Read geolocation dataset.
lat = hdf.select('Latitude')
latitude = lat[:,:]
lon = hdf.select('Longitude')
longitude = lon[:,:]
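
The dictionary printed by hdf.datasets() can be long for products with many fields. The sketch below formats such a dictionary with one dataset per line. It assumes that each value is a 4-element tuple of (dimension names, shape, type code, index), which is our reading of the pyhdf documentation. The helper name describe_datasets is hypothetical, and the sample dictionary is made up for illustration; it is not taken from the AIRS file.

```python
def describe_datasets(datasets):
    """Return one summary line per dataset, sorted by dataset index."""
    lines = []
    # Sort by the index stored as the last element of each value tuple.
    for name, info in sorted(datasets.items(), key=lambda item: item[1][3]):
        dim_names, shape, type_code, index = info
        dims = ", ".join("%s=%d" % pair for pair in zip(dim_names, shape))
        lines.append("%d: %s (%s), type code %d" % (index, name, dims, type_code))
    return lines


# Made-up stand-in for the result of hdf.datasets().
sample = {
    "Longitude": (("YDim", "XDim"), (180, 360), 5, 1),
    "Latitude": (("YDim", "XDim"), (180, 360), 5, 0),
}
for line in describe_datasets(sample):
    print(line)
```

With the file opened as in Figure 3, the call would be describe_datasets(hdf.datasets()).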

Finally, plot the data on a map using the functions in the basemap and matplotlib packages, as shown in Figure 4.

Figure 4. Python code for visualizing data on a map
m = Basemap(projection='cyl', resolution='l',
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180)
m.drawcoastlines(linewidth=0.5)
m.drawparallels(np.arange(-90., 120., 30.), labels=[1, 0, 0, 0])
m.drawmeridians(np.arange(-180., 181., 45.), labels=[0, 0, 0, 1])
x, y = m(longitude, latitude)
m.pcolormesh(x, y, data)

The complete code is here. Right-click the link and select Save Link As to download it. If you run the code (e.g., python AIRS.py) in the directory that contains the sample file, you will get the image shown in Figure 5.

How to write

You can also write an HDF4 file. Figure 6 shows part of a program that creates an HDF4 file with an SDS dataset.

Figure 6. Python code creating an HDF4 SDS with the PyHDF interface
from pyhdf.SD import SD, SDC
# Import the NumPy package
import numpy as np

data = np.array(((1, 2, 3),
                 (4, 5, 6)), np.int16)

# Create an HDF file
sd = SD("hello.hdf", SDC.WRITE | SDC.CREATE)

# Create a dataset
sds = sd.create("sds1", SDC.INT16, (2, 3))

# Set the fill value used for elements that are never written
sds.setfillvalue(0)

# Set dimension names
dim1 = sds.dim(0)
dim1.setname("row")
dim2 = sds.dim(1)
dim2.setname("col")

# Assign an attribute to the dataset
sds.units = "miles"

# Write data
sds[:] = data

# Close the dataset
sds.endaccess()

# Flush and close the HDF file
sd.end()

The code in Figure 6 creates an HDF4 file and an SDS object in it. The code is straightforward for those who are familiar with HDF4. As Table 1 shows, many PyHDF interfaces are equivalent to HDF4 C interfaces.

PyHDF API           Equivalent HDF4 C API
SD (constructor)    SDstart
SD.create           SDcreate
SDS.setfillvalue    SDsetfillvalue
SDS.dim             SDgetdimid
SDim.setname        SDsetdimname
SDS.endaccess       SDendaccess
SD.end              SDend
Table 1. PyHDF API and equivalent HDF4 C API

The statement starting with sd = SD() creates an SD instance and is equivalent to the SDstart() function. The SD class implements operations that apply to a file, such as creating the file or a global attribute. PyHDF has no counterpart of the SD interface identifier that SDstart() returns, because the SD class encapsulates both the data and the operations on it.

The statement starting with sds.units sets an attribute on the SDS object. This is equivalent to the SDsetattr() C function. The next statement, sds[:] = data, writes the actual values to the file as SDwritedata() does.
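
To check what was written, the file can be reopened in read mode. The sketch below reads one dataset and its attributes back. The helper name read_back is hypothetical, and the import is guarded so that the snippet simply returns None when pyhdf is not installed. Apart from SDS.attributes(), which returns the attributes as a dictionary, it uses only calls that already appear on this page.

```python
import os

try:
    from pyhdf.SD import SD, SDC  # needs pyhdf and the HDF4 library
except ImportError:  # pyhdf is unavailable; read_back then returns None
    SD = SDC = None


def read_back(path, name="sds1"):
    """Return (values, attributes) for one SDS, or None without pyhdf."""
    if SD is None:
        return None
    sd = SD(path, SDC.READ)
    try:
        sds = sd.select(name)
        values = sds[:].tolist()   # whole dataset as nested Python lists
        attrs = sds.attributes()   # attribute names mapped to values
        sds.endaccess()
    finally:
        sd.end()                   # close the file even if reading fails
    return values, attrs


# Only attempt the read if the file from Figure 6 is present.
if os.path.exists("hello.hdf"):
    print(read_back("hello.hdf"))
```

For the file created in Figure 6, the returned attributes would be expected to include the units attribute alongside the fill value.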

Both the V API and the VS API are likewise divided into a few classes and encapsulated in the same way as the SD API. This eliminates the need for identifiers and may improve readability.
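
Because each object owns its underlying resources, the explicit endaccess() and end() calls from Figure 6 should ideally still run when an error occurs midway. One possible way to arrange that is a small context manager. This is a sketch using only the standard library: the ending helper is hypothetical and not part of PyHDF, and the FakeSD class is only a stand-in so that the snippet runs without an HDF4 file.

```python
from contextlib import contextmanager


@contextmanager
def ending(obj, method="end"):
    """Yield obj, then call obj.<method>() even if the body raises."""
    try:
        yield obj
    finally:
        getattr(obj, method)()


class FakeSD(object):
    """Stand-in for pyhdf.SD.SD that only records whether end() ran."""

    def __init__(self):
        self.ended = False

    def end(self):
        self.ended = True


fake = FakeSD()
with ending(fake) as sd:
    pass  # real code would call sd.select(...) or sd.create(...) here
print(fake.ended)
```

With PyHDF, the same pattern would wrap SD("hello.hdf", SDC.READ) with the default method name, and a selected dataset with ending(sd.select("sds1"), "endaccess").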

Last modified: 09/09/2022
Sponsored by Subcontract number 4400528183 under Raytheon Contract number NNG15HZ39C, funded by NASA / Maintained by The HDF Group