(Source: Ullman, R.E. (July, 2001), Status and Plans for HDF-EOS, NASA's format for EOS Standard Products. Retrieved May 2, 2003 from the HDF-EOS Tools and Information Web Site: http://hdfeos.gsfc.nasa.gov/hdfeos/HDFEOS_status/HDFEOSStatus.htm)
Why HDF5?
As science computing systems evolved, it became clear to NCSA's HDF group that HDF4 would have difficulty evolving to meet the demands of these systems. The future of Earth Observing systems is likely to include parallel processing environments, very large data sets, data spanning multiple computing environments, new data models, and complex data analysis and visualization capabilities requiring industry-standard interfaces. However, HDF4 supports only datasets smaller than 2 gigabytes, with fewer than 20,000 datasets in any one file, and is not capable of efficiently performing I/O in parallel computing environments. Size and complexity are also an issue. The HDF4 library consists of over 300,000 lines of mature, heritage code that represents a variety of disparate scientific data models. The lack of underlying commonality in the implementation of these models contributes to the complexity of the code. This conceptual complexity in turn makes it difficult to adapt the library to modern high-performance computing architectures.
NCSA spent three years looking for ways to extend and adapt HDF4 to meet these challenges, but in the end it was clear that such an adaptation would only result in an extremely complex format and I/O library, which would not only be difficult to maintain, but would not meet these new requirements nearly as well as a completely new design would. Indeed, it was felt that, if the HDF libraries were not completely overhauled, the data format and software would gradually become unable to support the modern computing needs of scientists. These pressures, together with the lessons NCSA learned in developing and supporting HDF4 over many years, led to the development of HDF5, a new data paradigm built on a solid foundation of computing science data principles.
The good news is that HDF5 is clearly superior to HDF4. The underlying concepts are more robust, and the workmanship is cleaner, more compact, more direct, and simply more maintainable. HDF5 will be a powerful, flexible and pragmatic data format for many years, perhaps decades. There are no plans for a future transition to another, different "HDF6". We believe that HDF5 "got it right": the capabilities built into HDF5 will directly benefit the Earth Science Community because they map directly onto our needs in the near and distant future.
Just as NASA defined certain aggregates of HDF4 structures to represent HDF-EOS point, swath and grid in HE4, so HE5 is a standard usage of HDF5 to implement these same structures. In October, 2000, the EOS Aura Data Systems Working Group adopted HE5 as the Aura platform standard. Aura is the first EOS mission to use HDF5 and HE5 for all standard products. The Aura instrument science teams are working together to further standardize their products, defining standard file metadata and other conventions to assure compatibility among the teams.
Continued Support for HDF4
HDF4 heritage and the transition from HDF4 to HDF5 are important considerations. Many years of effort have gone into developing high-quality data production software based on the HDF4 and HE4 standards, and any change to a new standard is a legitimate concern. NASA and NCSA understand this, and are striving to ensure that the transition is no more burdensome than necessary for science data producers and, especially, for science data end users. We will do that by developing compatibility and transition tools, by working closely with teams that make the transition, and by continuing to maintain the HDF4 code as long as required.