HDF-EOS WORKSHOP
"Ask the Experts" HDF, HDF-EOS Q&A

Candace Carlisle (CC): OK, let's have people from our answer panel start migrating down here, please. Just in case you've forgotten who you are: from Hughes, I'd like Larry Klein, Karl Cox, and Raj, who's going to have to pronounce his own last name for me [Gejjagaraguppe]; from NCSA, Mike Folk; from Hughes STX, Doug Ilg, Suresh, and Ray Milburn. And then there are a couple of questions where I have, I hope, a couple of plants in the audience to help me out. Okay, what we're going to do is start with the questions that we got in writing, in no particular order, and then once we've handled those we will take questions from the audience.

— Question 1 —

CC: Okay, the first question, in no particular order, was: Relation to geospatial information infrastructure, i.e., NIMA involvement and SDTS? And I do have Dan Marinelli, who promised to help me with the SDTS answer.

Dan Marinelli (DM): Okay, the short version, I think—and maybe Suresh could help me out here—is that when we chose HDF to standardize on in the V0 timeframe, SDTS was still coming into being, and also the SDTS standard doesn't cover all of the requirements that we have.

Ramachandran Suresh (RS): Basically, the data model used by SDTS is different from the ECS data model.

CC: In terms of the national spatial data infrastructure, we are participating with the Federal Geographic Data Committee, we are committed to their metadata standard, and we also export our metadata to the Global Change Master Directory, which is an FGDC node. In terms of global participation, Suresh has been participating for the ESDIS project in CEOS, the international Committee on Earth Observation Satellites.

— Question 2 —

CC: Okay, let's go on to the next question: Does HDF-EOS support any other data types, such as regions, curves, histograms, and 3-D models? And this is for Doug.

Doug Ilg (DI): Okay, currently, HDF-EOS doesn't support any of those objects. It supports objects that were identified during a survey we took—it's been about 3 or 4 years now—of the data types used by Earth scientists. Those were not specifically identified as being widely used. That doesn't mean that there aren't people out there using them, and if there is a compelling reason to add those things, we can certainly work on that. For some of the ones listed in this question—histograms and 3-D models, for instance—although we don't handle things with those particular names, the structures that we do handle can certainly contain those objects: a histogram is a 1- or 2-dimensional object, an array, and a 3-D model is an array of 3 or more dimensions. So we do handle some of those objects, although not necessarily under those names. Regions and curves, we didn't identify a requirement for.
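
[To make Doug's point concrete: because a histogram is just a 1-D array, it can be stored as an ordinary scientific data set through the HDF4 SD interface. A minimal sketch; the file name, dataset name, and bin count are invented for illustration.]

    #include "mfhdf.h"   /* HDF4 SD (scientific data set) interface */

    int main(void)
    {
        int32 counts[256] = {0};            /* the histogram bins   */
        int32 dims[1]  = {256};             /* rank 1: a 1-D array  */
        int32 start[1] = {0}, edges[1] = {256};

        /* ... accumulate counts[] from the data of interest ... */

        int32 sd_id  = SDstart("histogram.hdf", DFACC_CREATE);
        int32 sds_id = SDcreate(sd_id, "Histogram", DFNT_INT32, 1, dims);
        SDwritedata(sds_id, start, NULL, edges, (VOIDP)counts);
        SDendaccess(sds_id);
        SDend(sd_id);
        return 0;
    }

[A 3-D model would be stored the same way, with rank 3 or higher in the SDcreate call.]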

— Question 3 —

CC: Okay, the next question was a followup to this: Are these concepts useful to Earth systems science? And I think Doug left the window open that this is an audience participation question. Does anyone in our scientific community feel a need for regions, curves, histograms, or 3-D models?

Audience member: Nathan Sovik from SEDAC, the socioeconomic DAAC. We don't have a bird in the sky, but we try to serve as NASA's connection to economists and other social scientists in the community, and we obviously deal mostly with textual data as well as vector data—we have a lot of GIS coverages—and we do see a need for integrating remote sensing imagery with different types of products. Now, whether that happens in HDF-EOS or not is, I guess, an open question—whether it's necessary. But I'd like to voice, for the socioeconomic group anyway, that we want to see remote sensing imagery made usable by a larger community than is currently being considered.

CC: Okay, let's write that as a note.

— Question 4 —

CC: All right. Who defines standard types or profiles that augment HDF-EOS—for example, political boundaries, extracted features? And that's another question for Doug.

DI: Okay, as I said, the original set of structures that we implemented was identified in the survey. So in the spirit of that survey, I guess, the way new structures would be identified would be through the ESDIS or EOSDIS project, through NASA, with lots of input from the user community—users like CIESIN, for instance, and their large user community for socioeconomic data.

— Question 5 —

CC: Next question is: Which are the most important architectures to support? This came from a vendor, by the way. Languages? Does anyone use C++? And this question Suresh was going to answer.

RS: Okay, in terms of programming languages, I think currently C and Fortran are the ones the ECS project is using, with C++ in different parts of the system, but the bulk of the software, as far as I know, is currently in C and Fortran. Any other comments?

Larry Klein (LK): Yeah, our data server is implemented in C++, and we put applications on top of HDF-EOS, like subsetting and subsampling, so internally we're using C++, although our data providers are using C and Fortran.

CC: Is there any audience participation on this one? The question was what are the most important architectures and languages to support?

Audience member: Cheryl Craig from MOPITT. I just want to say that Fortran means Fortran 90, also, not just Fortran 77.

RS: Yeah, that's right. We support both Fortran 77 and Fortran 90.

— Question 6 —

CC: Okay, the next question: How easy has it been to add HDF support to existing products? And Suresh was going to take the first stab at this.

RS: What was the question?

CC: How easy has it been to add HDF support to existing products?

RS: Very easy. But probably I'm not the right person to answer the question, since there are so many people here who have developed tools—particularly from the Fortner Company. Ted Meyer or somebody can say something.

Ted Meyer: The question is how easy is it to add HDF-EOS support?

CC: The question was written "HDF support to existing products."

RS: HDF support.

CC: It came from a vendor. So I assume they're looking for input from some vendors.

Ted: Assuming your tool is designed to support the kind of structures that HDF supports, it's fairly straightforward. The API is fairly easy to work with, and I guess that's how I would answer that.

RS: Yeah. Well, we have done some studies also; we did a pilot data migration project about a year and a half ago where we took some data sets in native format and migrated them into HDF. Those were large-volume remote sensing data from satellites. And based on my experience, it took about 6 months to translate five or six data sets into HDF.

Audience member: Yeah, I have something to say about that, because that's my job. I'm Bill Weibel, from JPL, LinkWinds. It's at least a two-pronged problem whenever you're dealing with something like this. The first step, which is just to open up an HDF file and read a data element out of it, is very simple—almost trivial. You do it in a very small number of lines, like six lines of code. And most of the data is scientific data sets, SDSs; even in the case of HDF-EOS, it's SDSs, and you can find those. The second step in the fork is much more complicated: it becomes a matter of identifying what's in there. The simple part of that is finding descriptive strings for the data sets, if they exist—and they don't always exist. If they don't, you end up with meaningless names, like dataset1, dataset2, etc. The next step, which gets more dicey, is having software interpret the metadata that's coming out of there. For instance, it's trivial to make your software list the latitude array, the longitude array, your temperature array, and so forth. But now you want your software to save the user the step of saying, well, this temperature array should be plotted in 2 dimensions against the longitude and latitude arrays—longitude along axis 1, and latitude along axis 2. If you know ahead of time what convention the data follows, that's an easy step to write. If you don't, you have to start coming up with various rules to handle various cases, so that you save the user the step of trying to match the world coordinate with a dimension in the array. So that's my take on the whole issue: technically simple at one level, but tough to make easy for the user. It can be difficult unless people start agreeing on how to interpret the data.
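
[The "six lines of code" Bill describes for his first step look roughly like this with the HDF4 SD interface; a sketch, with the file name assumed and error checking omitted.]

    #include "mfhdf.h"

    int32 sd_id  = SDstart("example.hdf", DFACC_READ);   /* open the file      */
    int32 sds_id = SDselect(sd_id, 0);                   /* first SDS, by index */
    char  name[256];
    int32 rank, dims[MAX_VAR_DIMS], ntype, nattrs;
    SDgetinfo(sds_id, name, &rank, dims, &ntype, &nattrs);
    /* name may be all you get, possibly a meaningless "dataset1";
       SDreaddata(sds_id, start, NULL, edges, buffer) pulls the values. */
    SDendaccess(sds_id);
    SDend(sd_id);

[The second step Bill describes, deciding that "Temperature" should be plotted against "Longitude" and "Latitude", is exactly the part that no API call solves without an agreed convention.]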

Audience member: Isn't that second step what HDF-EOS--

Bill: No, it's not. HDF-EOS has provided a mechanism for storing a higher level form of the data, and there have been agreements that there should be latitudes and longitudes. So I'm sorry—I'm wrong on some of that; it does do that. But when I went searching the literature, for instance, the documents that say these are the HDF-EOS conventions—which is different from the HDF-EOS API—were not directly linked from those Web pages. And that's the kind of thing that needs to be done in order for us to say, okay, you won't need to do any of this for HDF data. The danger is also that data suppliers can create valid HDF-EOS data sets that only make sense if they're using the conventions. And that makes it harder for scientists to get a user tool to read an HDF-EOS data set and immediately get a picture of the data. A raster image of the data can always come up, but it may be 720 by 360 pixels, which I had in one case, and when I tried to overlay a map on top of it, what I saw looked wrong until I pulled out the latitudes and longitudes.

CC: More comment from the panel on that question?

LK: The latest user's guides do a pretty good mapping of the conventions to the implementations, though; I'm not sure which version you're using.

Bill: No, I have the Product Guide for HDF-EOS 2, Version 1, but I went to the Web documents for the workshop, which had a lot of information, but things like the conventions—I saw them referenced, but they weren't linked. And so I thought, well, you know. So it means there's a delay in finding out what I wanted to know.

Mike Folk (MF): One of the original intentions of HDF-EOS was to focus the conventions on a few well-defined data types, so we could have common EOS structural metadata inside. So if you have some specific issues, we'd like to hear them, so maybe we can clean up the documentation.

Bill: Well, my impression from Doug is that this has been worked out. But I don't think it's been made obvious. My impression from the workshop, also, is that it wasn't obvious to the entire community that those things had been set up. But, I mean, is that an incorrect impression?

DI: I think most of our problem is a problem of getting the information out all in one place so that you can find it where you need it. I think it's all there, somewhere. We've just got to collect it all in one place for you.

CC: And that is part of what we're trying to do with our Web site is to try to organize some of the information better. We may not be there yet, but we're going to continue to work on it. I do have a big pile of questions here, so let's move forward.

— Question 7 —

CC: This next question is for Mike Folk: Since HDF 5 will define the new comprehensive data model, will the current elemental HDF data objects still exist? And how will this affect applications which are or will be developed using HDF 4? For example, will the higher level HDF-EOS data objects be changed?

MF: Okay, I have a very long answer to that, which I'm sure Candace doesn't want me to provide, so I'll give the short answer. First of all, I want to point out that HDF 5 is a research project, particularly as far as the EOS project is concerned. There is no EOS commitment to using HDF 5. Another thing I want to point out is that what we currently have is a prototype, so the things I'll say about it are subject to change depending on what we learn and what kind of feedback we get.

Okay, having said that: the first question was, since it will define a new comprehensive data model, will the current elemental HDF objects still exist? They will not exist in the file format the way they currently exist—they will not exist explicitly in the file format. There will not be a raster image thing, or a Vdata thing, in an HDF file. However, we've been pretty careful in designing the single data model to make sure that it's possible to implement any of the other seven objects in terms of the model that we have. The model consists of a multi-dimensional array of records, basically—that's the basic data element—and then there's a basic organizational element that's like a grouping structure. So, for example, if you're talking about a Vdata, then you're talking about a 1-dimensional array of records. So there will be that derivation.

But one of the implications of this, which we didn't realize until we got more deeply into the problem, and that I think a lot of people are assuming or asking about, is that it will be very difficult—I won't say you can't do this, but it would be very difficult—to take the current HDF library, the current HDF APIs, and put them on top of the new HDF 5 library and expect them to make sense, because there will be ambiguities. Things will be stored in the HDF 5 format that won't make any sense to the current version, for example. So we've moved away from that, and our plan is simply to say that you've got to choose one or the other format, and then to work really hard at what one of the earlier questions raised, which is translating between one and the other.

Now, the last question: will the higher level HDF-EOS data objects be changed? If the HDF-EOS library is implemented using the new HDF 5 library, then the format of those objects will change. However, I'm very confident that the objects themselves—the contents of them—will not change, so you'll be able to use exactly the same API that you're now using. That's yet another argument, I think, for using the HDF-EOS API as soon as you can, rather than a straight HDF 4 or earlier HDF API.
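
[Mike's "1-dimensional array of records" derivation of a Vdata can be sketched. This is illustrative only: HDF 5 was a prototype at the time of this discussion, and the function names below are assumptions based on later HDF5 releases, not the prototype interface.]

    #include "hdf5.h"

    /* A Vdata-like table expressed in the single HDF5 model:
       a 1-D dataspace whose element type is a compound record. */
    typedef struct { float lat, lon; int qc; } Record;

    hid_t rec = H5Tcreate(H5T_COMPOUND, sizeof(Record));
    H5Tinsert(rec, "lat", HOFFSET(Record, lat), H5T_NATIVE_FLOAT);
    H5Tinsert(rec, "lon", HOFFSET(Record, lon), H5T_NATIVE_FLOAT);
    H5Tinsert(rec, "qc",  HOFFSET(Record, qc),  H5T_NATIVE_INT);

    hsize_t n = 1000;                 /* one dimension: the Vdata analogue */
    hid_t file  = H5Fcreate("table.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, &n, NULL);
    hid_t dset  = H5Dcreate(file, "records", rec, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

[A raster image falls out the same way: a 2-D dataspace of single-byte records.]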

—Question 8—

CC: Okay, the next question: What is the interest of the HDF/HDF-EOS community in the development of pure Java interfaces to the HDF file format, as well as the higher level HDF-EOS API? I would be very interested in any contacts who may be working on this. So first, I'm going to open it up to people at the table. I think just about everybody here is doing something in Java.

MF: I'll start by saying that if by a Java interface we mean a pure Java interface—in other words, no C code down below that's being called by Java—I'll just warn you that we're talking about between 200,000 and 300,000 lines of code. We're talking about the JPEG library; I don't know if the ESL Pegasus people are going to do the JPEG library in Java and provide it for us—that would be great. Gnu Zip, netCDF—we're talking about a whole lot of other pieces of software. So if we're talking about a pure Java interface to HDF, it's a really big job. One of the things about HDF 5 is that it would still be a really big job, but it would be much less of a job, partly because we would know we were doing it from the very beginning. Okay, the real question was, is there community interest? And just in the last month, we've really begun to get a lot of interest in that. Not that the number of inquiries has been that great—it's probably been 10 or 12—but the inquiries have been from people who say, we want to participate in this and we want to help with it. So that's my perspective.

CC: Hughes, do you have anything to add? Hughes has nothing to add. Anything from the audience? Lee Elson.

Lee Elson: Yeah, I think it's going to happen that people will do pure Java implementations of parts of HDF. In fact, we're planning to do that. We've already begun to look at it, and one of the problems that we have right off the bat is that there is not a lot of documentation on the really low-level aspects of HDF. For example, if one wanted to go and find out exactly how a raster-8 object is stored, I don't think that information is very easily available. So making that information available will certainly help developers do some of this implementation.

MF: You know, that's a very good point. In fact, I was talking to Ben Kobler about this yesterday, with respect to HDF 5—and with respect to HDF 4. There is a specification, but there's a lot of missing stuff in it, and if you've gotten hold of the specification, that's probably what you're speaking to. If you haven't, there's quite a bit of good stuff in there—I can say that; Doug did the last version of it. But that, I think, is a need that we've identified; with Ben we came to the conclusion that that has to be a project that we do in the fairly near future.

— Question 9 —

CC: Okay, let's move forward to the next question: Will you be extending the current HDF-EOS library to include API functions to access the granule-level metadata? For example, the MODIS HDF-EOS data structure places calibration scales and offsets in data fields within the file. These fields are essential if you intend to convert the swath data into meaningful geophysical units. Doug was going to take the first stab at this.

DI: The short answer is yes, we do plan on moving it. That functionality for reading the metadata currently exists in a part of EOS, or ECS, called the SDPS toolkit—is that what it's currently named? SDP toolkit, sorry; it changes names quite often. So right now that functionality exists, but it's a little bit more difficult to get to, and that's quite a large package that a lot of people probably aren't going to want to use. What we're thinking about doing is taking those particular parts of the SDP toolkit that handle metadata and moving those over into HDF-EOS, where we probably should have put them in the first place—I guess we weren't thinking far enough ahead. And I'm going to hand it over to Larry to tell you when that might be happening.

LK: [beginning of tape; words missing] --by EOSDIS, so we have it as a high priority to change our metadata access tools to be independent of the SDP toolkit and package them with HDF-EOS. So it's a high priority—something I think we should be able to accomplish in the next couple of months.
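
[Until those tools are repackaged, the granule-level metadata can be read with plain HDF calls, since it travels as ODL text in global file attributes. A minimal sketch; the attribute name is an assumption (products use names along the lines of "coremetadata.0"), and error checking is omitted.]

    #include <stdlib.h>
    #include "mfhdf.h"

    int32 sd_id = SDstart("granule.hdf", DFACC_READ);
    int32 idx   = SDfindattr(sd_id, "coremetadata.0");  /* locate the attribute */
    char  name[256];
    int32 ntype, count;
    SDattrinfo(sd_id, idx, name, &ntype, &count);
    char *odl = (char *)malloc(count + 1);
    SDreadattr(sd_id, idx, odl);     /* ODL text; parse it for the
                                        calibration scales and offsets */
    odl[count] = '\0';
    SDend(sd_id);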

— Question 10 —

CC: Okay, the next question: Would it be possible to extend the geolocation data to include spacecraft geometry along with the latitude and longitude fields? Currently, the latitude/longitude fields represent a considerable volume of data which must be accessed each time the swath data fields are Earth-located. Orbital models exist which can perform the image-element-to-Earth-location transform with access to these orbital elements. When dealing with server/client relationships, especially in long-haul situations, the advantage of serving a 20-word set of satellite geometry values, as opposed to a point-for-point lat/lon set, is considerable. Karl Cox?

Karl Cox (KC): This has been discussed off and on for several years now. Basically, the idea of just sending out orbital ephemeris data and attitude data is so that other programs can essentially calculate the geolocation of each pixel or observation, as opposed to putting the geolocation information into the data file. That's not a decision that we, as such, have made. The data model, when it was put together, includes a pointer to the attitude and ephemeris data, but it would be the providers themselves that have to actually build this stuff into it in order to make use of that. The latitudes and longitudes that are in the data granule are not used by the ECS system to actually search for and locate the granules; it's the information describing the bounding region, through a bounding rectangle or g-polygon, that is used in the search process. So that certainly is a viable means, but at present this information is not being provided by any data provider. But like I say, it is available to be ordered along with the data, if that's what you want to do. I might comment that for some applications, the ephemeris information that's coming down from the satellite is not accurate enough to geolocate some products. So what would be available through that route, as opposed to what comes through the Flight Dynamics Facility, would be a choice that someone would have to make—presumably the data provider. That's all I can comment about.

RS: Well, I'd like to add something to what Karl has just mentioned. I have seen a demonstration of a software package which does a very similar thing—taking orbital ephemeris data and doing data subsetting and that kind of stuff. The person who developed this prototype software is Mike Botts. He has given demos in the past, so if you are interested in contacting him, you can send me an e-mail and I will send you his address, because I don't remember it off the top of my head.

DI: What I'd like to add to that is that I think one of the problems with doing that sort of a thing—handing out just ephemeris—is that we've got a wide range of user types out there. Most of these people aren't going to want to navigate their own data; they want navigated data. A lot of them don't even really care, to that sort of accuracy, where the data is—they want to see pretty pictures, when you're talking about K-12 educators and probably even up through undergrad. And there aren't that many tools out there, if any, really, that can apply that sort of information or really make use of it. So when those sorts of tools exist, I think that's going to be a much more viable method of navigating the data.

— Question 11 —

CC: Okay, moving on to the next question: Within the swath HDF-EOS profile, is it possible for a granule to span files? For example, is it possible for a swath start to exist in one HDF-EOS file and the end of the same swath granule to exist in another HDF-EOS file? Doug.

DI: Okay, the simple answer is not exactly. I don't know how simple that is. But what I would do in that situation is really call it two different swaths, and start it and stop it in one file, and then start it again in the next. Because a swath by its nature is continuous anyway; it goes from the beginning of launch until the satellite dies, and every swath that you see has been cut out of the real swath. So it's just a matter of cutting it up into bite-sized pieces. If you need to preserve the length of the swath for some reason—say you absolutely must have 1 full day's worth of data, 14 orbits for instance, in the case of a lot of the EOS instruments—you can slice the swath up by parameter: with a 10-band instrument, you put only 5 bands in 1 swath and 5 in another swath that would then sort of be collocated. Or the other option, and I wouldn't really recommend this one, is that HDF and HDF-EOS have the capability to do external elements, where you can actually use one file to point off to other binary files where the data is found. So you could get past the 2-gigabyte limit by doing that sort of thing, although it's not real easy to do. Larry?

LK: Unless I'm wrong, that feature is not supported, at least inside EOSDIS.

KC: You are correct. HDF-EOS supports that, but the EOSDIS system does not. I might add that we've talked about data granules; you should be aware that a data granule can consist of more than one file. And so in the spirit of the question, yes, that can be done—it's not necessarily the most advisable thing for a data provider to do, but you can have several files that form a granule, where the swath begins in the first file and ends in the last file. It depends greatly on how the data provider sets up the structure of their files.
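
[The external-element mechanism Doug mentions works like this: a data set's storage is redirected to a separate flat binary file that the HDF file points to. A minimal sketch using the HDF4 SD call for it; the file names and sizes are invented, and, as Larry and Karl note, EOSDIS itself does not support this.]

    #include "mfhdf.h"

    int32 dims[2] = {2030, 1354};
    int32 sd_id  = SDstart("pointer.hdf", DFACC_CREATE);
    int32 sds_id = SDcreate(sd_id, "Radiance", DFNT_FLOAT32, 2, dims);

    /* Redirect this SDS's data to an external binary file, starting
       at byte offset 0; later SDwritedata calls land in that file.  */
    SDsetexternalfile(sds_id, "radiance_data.bin", 0);

    SDendaccess(sds_id);
    SDend(sd_id);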

CC: Okay, that last question was from Larry Fishtahler from MODIS.

— Question 12 —

CC: Okay, to move on to the next question: To accomplish its role of producing HDF-EOS specific applications, industry needs explicit rights to use and distribute the current and future versions of the HDF-EOS libraries developed by NASA for EOS. Additionally, the rights for the metadata tools associated with the SDP toolkit are essential for industry to support HDF-EOS metadata. What is NASA's proposed approach to making such rights available? Okay, I have a statement here that was put together by our ECS COTR and our deputy project manager:

"The ESDIS project will ensure that HDF-EOS libraries and tools developed and maintained under the ECS contract will be made available as public domain software. Any HDF-EOS software and maintenance updates generated in support of the MTPE program will also be made publicly available. These updates may address software enhancements or fixes identified by other users of HDF-EOS if they also meet MTPE program needs. However, NASA is not committing to developing any and all software enhancements and fixes requested by external users."

So what we are trying to do is identify those parts of the ECS software that vendors do need to develop their products. And so if anybody has any input to us on what those parts are, if they happen to know, you can certainly send us an e-mail on that.

— Question 13 —

CC: Which brings us to the next question, also from a vendor: The HDF-EOS libraries are accompanied by a file called cfront.h. An information file in the distribution states that any software built using cfront.h must include a rather long copyright/permission statement. It seems to state further that software built using cfront.h may not be distributed for a fee. Can you please clarify the usage restrictions and licensing info regarding the use of HDF-EOS libraries to build commercial products?

Okay, first I'm going to refer you to my previous statement; then I'm going to say that the people that are in charge of this are lawyers. And they can't possibly distribute anything without a very long permission statement that tells you how, no matter what happens to you, it's not NASA's fault. But my understanding is that there are certain tools that are being built by the ECS contractor that are needed to build commercial products and that we are committed to providing those tools.

— Question 14 —

CC: This is my next question; it's going to be answered by Dan Marinelli: As currently distributed, HDF-EOS libraries are built only for a subset of the platforms supported by NCSA. These are the officially supported HDF-EOS platforms, according to the April/May '97 HDF-EOS documentation. The list of officially supported platforms does not include PC or Mac. What are you planning to do to address the need of commercial software vendors to support the PC and Mac platforms?

DM: Basically, we at the ESDIS project have to weigh the demand for this capability along with the other requirements that we need to support, and determine where we're going to allocate our funds. Up till now those funds have been largely concerned with the producer community, but we have recognized the user community's need for PC and Mac platform support.

Audience member: Can I just say something in addition to that—or something to consider, anyway? My name is Scott Quier, with SAGE III. With the bang-for-the-buck balance swinging in the direction of the PC platform, I suspect, especially with SAGE III data to go by, that researchers are going to be moving from the heretofore traditional Unix platforms for research purposes to PC platforms. Within our particular area, we've had almost all of our research section move off of Unix platforms of one color or another to Windows NT boxes.

Audience member: Steve Eddins, with The Mathworks. I wrote that question. I'd like to echo what Dave Uhlir from RSI said earlier. The Mathworks finds our workstation business to be relatively constant and our PC/Mac business to be in a supernova phase. So I'm anxious to see that. I'm willing to try to do some builds of my own, from the original source, but with the sparseness of testing and evaluation data sets for me to use to see that my port is running correctly, I hesitate to put it in now based on that.

CC: Okay, let's take that as an action for the project to further work this issue.

— Question 15 —

CC: Okay, let's go on to the next question: It seems HDF-EOS and utilities are at a primitive stage now; do you know when the metadata will be documented and when more HDF-EOS data files will be available? And Karl Cox was going to take the first stab at this.

KC: Addressing the metadata: in the talk that I gave, in the copies and the handout, there was a reference page. The B.0 implementation of the science data model is the documentation of the metadata attributes; the metadata themselves are the values that go with those attributes, and that document also lists the valids that each of those attributes can have. The first paper listed on that reference sheet includes references to other papers which provide further documentation about the science data model. You can get more than you really want to know about it, and some of it probably is not the clearest in the world to someone who's not intimately familiar with the data engineering within ECS. I don't know how to address that. We've tried several approaches to make it simpler to understand; it took me 6 months before I was familiar enough with it to feel comfortable talking about it. But I don't want to put you off on that—we have made great strides in trying to point out which attributes are important and to clarify how they're to be used. There was a metadata workshop in early April, and a lot of material was generated from that. If you have additional questions, feel free to send me an e-mail and I'll try to get back to you with the URL of the site—I don't have it right here with me—where you can pick up all the information from that workshop.

Audience member: Ed Lerner of Fortner Software. Would it be possible for ECS to make available representative HDF-EOS granules that you must be using or planning to use for internal V&V? And I propose that if this is doable that we get samples that represent multiple instruments.

LK: There are several dozen test data sets and heritage data sets that have been migrated or created in HDF-EOS; some of them reside at the Goddard DAAC and EDC, and we have them locally. As far as how we can make them available, I think that has to be, from our point of view, on a case-by-case basis. We need permission from the DAACs and the instrument teams that distribute them, but we would certainly entertain a request and see if we can get them to you.

CC: We'll take that as another action for the project to work, because I know there is a need for that. Larry [Fishtahler]?

Larry Fishtahler: Yeah, I would just like to say that MODIS has produced a number of these types of data sets; we're interested in working with people who are interested in getting them.

CC: Okay, MODIS would like to share their data sets.

— Question 16 —

CC: Okay, the next question is for Ken McDonald. Doug, do you see him? Can you take a mike up to him? Okay, somewhere meet in the middle, Ken and the mike. Since it was said there was no limitation on international distribution of data, is there any plan to join with any others internationally to increase data available?

Ken McDonald (KM): Okay, there are a number of activities that the ESDIS project is involved in in the international community. One activity that's rather recent is sponsored by the Committee on Earth Observation Satellites, and in this program there's a rather new experiment called the Data Interoperability Experiment. One of the basic technologies for that is the Data Information and Access Link (DIAL) program that I think Suresh mentioned in his talk. This is a rather new program, but there are a number of sites in the international Earth science data community that are using this and bringing it up. There's another activity that's sponsored by the International Geosphere/Biosphere Program, IGBP. Again, DIAL is a candidate technology, and there are a number of centers in Southeast Asia and temperate East Asia that are potentially bringing these capabilities up. Now, in both of these cases, these are rather new initiatives, so the immediate accessibility of the data has not come to pass. However, through these types of activities, ESDIS is trying to encourage the accessibility of the data by providing the underlying tools, and once they are populated at these centers, the data is available, or will be available, over the Internet. So those are a couple of examples. It's not as directly related to the HDF discussions today, but over the past 4 years or so, again through CEOS, there is a Catalog Interoperability Experiment that is much more mature, and as far as search and order access for data sets through the Version 0 IMS, there are probably on the order of 8 or 10 international sites that provide this access. And again, that's search and order, but it does, I think, show the model that we're using in trying to make some of the international data sets more accessible to the community.

CC: Okay, thank you. Okay, Suresh would like to add something, so please pass him a mike.

RS: I'd like to add two things. One, HDF is also being used by the Japanese space agency, NASDA, and they have a lot of data in HDF; I think it was used for ADEOS/OCTS satellite data. The second thing I would like to mention is that the CEOS group itself is very much aware of HDF and HDF-EOS activities. The CEOS data subgroup's Format Guidelines task team handles such things, and they approved HDF as an international data distribution standard about 3 months ago. There is an official document put out by CEOS, but it's still in draft; it's probably going to be available to the public very soon.

— Question 17 —

CC: Okay, the next question: It seems the focus is on utilities to read and write files, etc. Is there a need for full-fledged applications, and what do you envision these being? In what timeframe would they be required and useful? The DAACs talked about this a little bit today, and I think the science panel is going to be talking about it more tomorrow, but is there anybody who would like to comment on this now?

— Question 18 —

CC: Okay, hearing no volunteers, let's go forward to the next question: What about using HDF-EOS as an archive format? And that's for Karl Cox, or another of his ECS friends.

KC: I think I drew this one by default. As an archive format, depending upon the data, HDF-EOS is not optimal. But with regard to ECS, there was a decision that had to be made—do we generate HDF-EOS on distribution, or do we store it that way, as a matter of resources? The decision was made to store standard products as HDF-EOS, so that on distribution we don't have to deal with the conversion from whatever binary format the instrument team might give us into HDF-EOS. If you look at the products, initially there will be mostly HDF and binary formats from heritage data in the ECS system. But with TRMM—excuse me, with the launch of AM—the number of HDF-EOS granules starts growing very quickly, and by the time of the PM launch there will be over 40 million data granules in the ECS archive, and nearly 90 percent of those will be in HDF-EOS. The bulk of that will be level-1 and level-2 products, and a much smaller portion, probably on the order of a quarter, will be level-3 products and above. That's just rough statistics.

LK: On a very closely related topic, there may be some misconceptions about HDF-EOS. HDF-EOS is HDF, with ECS or EOS metadata attached. An HDF-EOS granule could be a Vdata or an SDS with our mandatory metadata attached. There's a particular combination of HDF objects—Vdatas and SDSs—which makes up the grid, point, and swath structures, but as far as an archive granule is concerned, there's not much distinction between HDF and HDF-EOS. It's when the services are applied later that there are distinctions.
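
[Larry's point in miniature: the swath structure is built out of ordinary HDF objects by the HDF-EOS library, which records how they fit together in structural metadata. A sketch of the defining calls from the HDF-EOS swath interface; the file, swath, dimension, and field names are invented.]

    #include "HdfEosDef.h"   /* HDF-EOS swath (SW) interface */

    int32 fid  = SWopen("granule.hdf", DFACC_CREATE);
    int32 swid = SWcreate(fid, "ExampleSwath");

    SWdefdim(swid, "Track",  2030);      /* along-track dimension */
    SWdefdim(swid, "XTrack", 1354);      /* cross-track dimension */

    /* Geolocation and data fields become SDSs under the covers. */
    SWdefgeofield(swid, "Latitude",  "Track,XTrack", DFNT_FLOAT32, HDFE_NOMERGE);
    SWdefgeofield(swid, "Longitude", "Track,XTrack", DFNT_FLOAT32, HDFE_NOMERGE);
    SWdefdatafield(swid, "Temperature", "Track,XTrack", DFNT_FLOAT32, HDFE_NOMERGE);

    SWdetach(swid);
    SWclose(fid);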

MF: Okay, for a number of years I went around declaring that HDF was definitely not an archive format. I think I'm going to have to eat my words on that, because it is de facto becoming one. When I heard that this question was going to be raised, I tried to think a little bit about what might help clarify things for people, and I wrote down what it means to be an archive format—or rather, not what it means, but some things it means to me—and I really would like to hear more about that, because given that it is going to be used for archiving, we need to know what needs to be done to make sure that it's not too troublesome.

One thing I came up with is that typically our "archive formats" are sequential in nature. In other words, you open up the beginning of the file, and that will tell you a lot of the information you need to know about whether or not you really want to get that file. And the reason for that is that typically it's stored on a tape. HDF is not a sequential format—that's the way it is. I will say that HDF 5—I'm not trying to plug HDF 5, but one of the things we did, and actually this came from a suggestion 6 or 7 years ago from Al Fleig, down on the MODIS team, I guess—is he still-- HDF 5 will allow a user or application to store a large record at the beginning of the file, before there's even a magic number. So if you did want to use it as an archive format, and you wanted to put some of that kind of essential information at the beginning, in your own format, ASCII or whatever, that will be supported.

Secondly, an archive format to me should be simple and unambiguously defined—maybe those are two things. Simple, HDF is not, and I don't think anybody will argue with me on that point. It is unambiguously defined, but—my next point—it needs to be well documented, and we've already visited that point. The fact that it's not possible to do a Java reader for a particular kind of HDF object without somebody coming and talking to us and working with us speaks to the fact that the documentation, or the specification, needs work.

So, first of all, I'm getting to accept the fact that it's going to be an archive format whether we want it to or not, and what we now have to do is say, okay, what do we need to do to make sure that it is usable in that way. I welcome any kind of comments or feedback on that.

— Question 19 —

CC: Okay, keep the mike, because the next question is also for you: When will HDF support shared libraries?

MF: Well, I knew that question was coming, and I'm not the right person to ask, so I called one of our developers last night—he was still at work late last night. We've actually been discussing that question in a fair amount of detail over the last few weeks, and it became very clear to me when I looked around the room and saw the various vendors that this is going to be a rather important thing to support. The first problem that we had was in the source code of HDF: there were some things done that—you know, it wasn't designed to eventually be used in that way, and there were some problems. Over the last couple of years, we've cleaned up most of the problems in the source code, and now the problem is one of how the library is actually built. It consists of a whole lot of libraries, as I've mentioned—there's JPEG, there's GZIP, there are a lot of different modules in the building process—and people have spoken to that by saying that half the people who get HDF hate it because it's so hard to install; it's very complex. There's a lot of autoconfiguration stuff that goes on in there and so forth. That was all put together before this desire for a shared library came along. So what we've determined is that we need to simply go back—well, not simply, but we need to go back and look at that. When you're supporting 20-some different platforms, different operating systems, different compilers, that's not a trivial thing to do. We think it could be done for all of the systems that we support in between 2 and 4 months. So it's a matter of now deciding whether this is a priority thing for us to work on and then assigning somebody to do it. So if the question was when could it be done: if we started today, it could probably be done by the end of the year. But--

CC: You have other things you're doing.

MF: Yeah, we have other things we're doing.

CC: So we would need to hear, I guess, input from the community about how important that is in relationship to other features.

MF: Oh, one thing I should add. I mentioned that we and Fortner Software have been working on a way for them to help us support the Mac and PC platforms. They already have a DLL-based implementation of HDF, and with this new agreement that's going to become part of our distribution as well, so if you're using those platforms, you should be in good shape. I don't know when that will happen. We can ask them—well, I don't think we'd better ask them that question, because we haven't really addressed when it could happen.

CC: Okay, the next two questions are--

Audience member: Excuse me, could I ask a question? I'm Charles Falkenberg from the University of Maryland. I asked Mike about the shared library—I've had a need for that in the last couple of years just with HDF—but I'm also interested in knowing if HDF-EOS is being designed so it, too, can be compiled as a shared library by itself.

CC: ECS?

LK: No, we're just a layer on top of HDF, and we're not adding anything that HDF doesn't already do, so I don't see any problem there.

— Question 20 —

CC: Okay, the next two questions are each addressed to HDF and then to HDF-EOS, so I'll ask them to Mike first and then to Hughes or Hughes STX. The first one: Will HDF be made threadsafe?

MF: Another tough question. I assume we're talking about HDF 4, the current version of HDF. That's really a tough question to answer, and I think whoever is asking that—and I think a number of people are—we need to get together and decide what it is that we mean. A problem that makes this a particularly difficult question is that HDF does I/O, and when you have multiple threads trying to do I/O, you have these different little processes potentially stepping on one another. To say that HDF is threadsafe—if that means having the HDF library make sure that these threads, which could be coming from any direction, are not both trying to read from or write to the same place in the same file at the same time—that's really a difficult task for the HDF library. We've spent a fair amount of time discussing this issue among ourselves, and once again, I can say what I said before: we did not design HDF with threads in mind at all. As we've begun to look at the issue, what we're gravitating towards is that if we're going to do it at all—and we don't know how hard it will be, so I have to be honest about that—it will have to be in a context where there is a conversation between HDF and the application. The application has to be able to manage, or coordinate, to some extent the reading and writing that's going on and the locking and unlocking of threads, or of accesses to data. So I guess my answer is that it's a real complicated question. We've heard a lot about it, and we recognize it's important. We think it's a real tough problem, and at this point we really want to find out specifically what people are trying to do where they need it to be threadsafe. So let us hear from you on that.

CC: And ECS, would you like to add anything, as far as HDF-EOS goes?

LK: No, again, HDF-EOS is calling the HDF libraries, so they're doing the I/O. It is a hard problem, though, because one of the hard problems with building ECS is that it's layered COTS, and you have to figure out how to layer a lot of COTS on top of a library, or a piece of COTS, that's not threadsafe—so we've had to find various workarounds. It's a very difficult problem. Sometimes it's easier for somebody building an application to design it knowing that they've got a non-threadsafe library under them.

— Question 21 —

CC: Will there be an HDF standard; for example, ANSI or ISO? Mike.

MF: We don't have any plans to push HDF as an ISO standard. That's a big job. If somebody would like to partner with us in doing that, you know, we'd be glad to work with you. But we don't have any plans to try to do that.

— Question 22 —

CC: The next question is one which I'll answer, and anybody who wants to can elaborate on it. We have done a very preliminary thing with the Federal Geographic Data Committee in terms of writing a proposal for a swath to be an FGDC standard, and we're at the proposal stage right now. We have written that as a content standard, so it covers the sorts of things that you need to talk about when you're talking about a swath; it's at that level. Did you want to elaborate on that, Suresh?

RS: For those of you who know about FGDC standards: they have a mechanism where anybody can propose a standard, independent of software. So what has been done is, we have not proposed an HDF implementation, or HDF itself, as a standard to the FGDC. Instead, what is being proposed is the set of conventions required to define a swath, so that it can be implemented using any library other than HDF as well. And the proposal is already approved, as Candace mentioned; we are currently working on a draft for submission. It will be published in the Federal Register, so it gives all of you an opportunity to comment on the document, if you do understand it.

CC: Well, we did the proposal, and the proposal got preliminary approval, but then we did get some comments from the public review, so we need to address those comments in the proposal. Now, FGDC does have a plan to try to migrate their standards towards ANSI or ISO, but that's going to be a very long, drawn-out process.

— Question 23 —

CC: This upcoming question is the last of the written questions: Do you think that an adequate set of analytical and display tools will be available in the timeframe necessary to meet users' demands? And Karl Cox was going to take the first stab at this.

KC: I'm tempted to say, looking at these eager people here, that the answer is obviously yes, but I obviously am not in a position to guarantee that any particular vendor will step forward with a tool that will answer any particular investigator's research needs. From the ECS standpoint, the EOSView that's being provided deliberately does not have much of that capability. We're not really in the business of providing that type of tool, to analyze or visualize data with all kinds of complex and neat capabilities. That's best left, in my opinion, to people who are in the business of doing that, and I see many of those faces right here. The investigators will obviously need to develop tools of their own if there are no commercial or free tools available to them; there will probably be a mixture of things. So I guess what I'm saying is that the answer to that question is really up to the people that are here.

— Audience Question 1 —

CC: Okay, that's the end of the written questions. I promised to get you out of here by 5:00, but we do have time for a couple of questions or comments from the audience. Okay, the man in the pink shirt, you had the fastest hand.

Audience member: Yeah, John Kerrick, CSC. You've talked about HDF 5 being totally different from the current HDF 4. Does that mean there's not going to be any more bug fixing on HDF 4, since you seem to want to throw everything out and go with this 5 change? And also, how's this going to impact all the DAACs that are building all their products with these Vgroups and scientific data sets and Vdatas, which you seem to be throwing out and which won't exist anymore—will this mean we'll be using unsupported software at that point?

MF: Yeah, I'm glad you asked that question. I should have elaborated a little more when I said that HDF 5, as far as NASA's concerned, is a research project. One implication of that is that we fully plan to fully support HDF 4, including bug fixes—full maintenance. In other words, it's really no different from the way it is now, which may not be satisfactory, but it continues indefinitely. There are a couple of possible directions the decision could go. One could be that in a few years there is a decision that HDF 5 is really, truly, compellingly different enough and better enough that we switch over to HDF 5, and then we have to figure out how that transition will occur. Another would be that HDF 5 never gets used in any way as a part of this project, and the other would be that both of them continue in parallel indefinitely. But as long as we get the kind of support that we've gotten from the ESDIS project, we will continue—I guess I can't say forever, because a lot of things happen, but it's fully our intention to stick with the current version of HDF. This sort of speaks also to Mr. Uhlir from IDL's comment about how often we keep changing, trying to keep up with Netscape. I thought that was a really interesting comment, because with Mosaic we lost out, and so it might seem it's only by changing HDF every few months that we can keep up. But we really feel—and this last year has strengthened our feeling about it—that HDF has about as many features as it's going to need. I'm sure there are a few more that will come up, but the reason we've had to make the changes of the last few years is almost exclusively requirements coming from EOSDIS—things that were needed. So we don't see ourselves adding many new features. But your question is a good one, and the answer is in the affirmative: yeah.

John: I want to follow up on something. When is your next bug fix, and will it have the Vdelete finally working again?

MF: The Vdelete. That has a very high priority on our list, and I'm pretty sure that's one we will fix, because we've been hearing from a lot of people on it. We had originally scheduled it for November, but we've been slowed down because we've given better documentation a higher priority and want to do another couple months' work on that. So probably early next year—January or February—would be the next release. But we plan for it to be just release 2, rather than a new version; no new features.

— Audience Question 2 —

CC: One more question from the audience.

Audience member: I just have a comment, really, and I'd be interested in hearing responses from the panel. I'm Liam Gumley from the University of Wisconsin. The EOS mission, as I recall, is supposed to be a mission with a 15-year lifetime, is that correct? Okay. There's hundreds of millions of dollars being spent on the hardware that goes into space, to make sure that it's actually going to function for a long time period. And for science, that's absolutely what we want—that's the whole idea behind the mission. I don't necessarily have a comfortable feeling yet that the same can be said once the data's on the ground—that it's going to have a 15-year lifetime. Just as a comment—I don't expect anyone to respond—but, heaven forbid, what if in 5 years NASA says to NCSA: thanks guys, you did a great job, pats on the back, but we don't need you anymore? Is there some driving philosophy within ECS to make sure that in 15 years I can go back and have my colleague at Flashback Imaging make me that movie of 15 years of MODIS cloud cover? Because that's what the project is all about.

CC: This is Gail McConaughy from the ESDIS project; she's our system architect, the one in charge of making sure that our system evolves and is usable into the future.

Gail: If it reassures you at all, we're in the process of doing an exercise of restructuring our budget so that it goes out to 2010. I'm doing an estimation of our evolutionary budget, and I laid in a budget line item to make sure we can support the evolution of this data format. So the only thing that limits us, from an EOSDIS perspective, is the vision of the scientific community. If that vision were to change in the future—if they would, say, not endorse a standard or not understand the need for a standard—the scientific community could tell us to make changes. But obviously, with a 15-year data set, we are going to be archiving that data. And the research is global change research; you can use our data for lots of different things, but clearly one of the things we're looking at is interactions of the climate over a long timeframe. So the concept today is to make sure that the system runs that through. And even if, for instance, there's a difference in the vision of EOSDIS in the long range, we would hope to hold to the vision that the most important thing is to exchange data as easily as possible and to make data access as easy as possible for the end users. So that's what I can tell you. The bottom line is that we've run our budget exercise, and I've got a budget in there through 2010. Let's hope I did it right; let's hope Congress lets us keep it. That's kind of where we are.

CC: Okay, I want to wrap up the formal question and answer session so I can keep my promise. However, I think people will be here for a few more minutes if you wanted to chat with anyone. Please come back tomorrow. Tomorrow I promise to get you out of here by noon. But what we are going to have is we are going to have a science panel, and I think it'll be very interesting to interact with them, and we're also going to have two more presentations, one from Simpson Weather Associates, one from IBM, and then we're going to have a wrap-up session where we talk about what the action items are, what we need to continue to do with the Web site, and what we need to continue to do with our interaction among developers, among vendors, among the project, among the community, to make sure that all the good interaction we've started with this workshop continues forward. So I hope to see you all here tomorrow morning at 8:30.
