Apache Sedona is a cluster computing framework for processing large-scale spatial data. This example shows how to use Apache Sedona with NASA HDF-EOS data products.
Apache Sedona uses netcdf-java to handle NASA HDF-EOS data. Therefore, it can read HDF-EOS2/HDF4 and HDF-EOS5/HDF5 files, as well as data served through OPeNDAP.
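Before handing a sample file to Sedona, it can help to confirm which container format it actually uses. The following is a minimal standalone Java sketch under stated assumptions: it does not use netcdf-java or Sedona, and the class name `HdfFormatSniffer` is hypothetical. It relies only on the published file signatures. HDF4 files begin with the bytes `0x0e 0x03 0x13 0x01`. An HDF5 superblock signature appears at offset 0, or at 512, 1024, 2048, ... bytes when a user block precedes it. OPeNDAP is not covered because it is accessed through a URL rather than a local file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Guesses whether a local file is HDF4 or HDF5 from its signature bytes (illustrative sketch). */
final class HdfFormatSniffer {

    enum Format { HDF4, HDF5, UNKNOWN }

    // HDF4 files begin with the 4-byte magic number 0x0e 0x03 0x13 0x01.
    private static final byte[] HDF4_MAGIC = {0x0e, 0x03, 0x13, 0x01};

    // HDF5 superblock signature: 0x89 'H' 'D' 'F' '\r' '\n' 0x1a '\n'.
    private static final byte[] HDF5_MAGIC =
        {(byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    static Format detect(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            if (matchesAt(raf, size, 0, HDF4_MAGIC)) {
                return Format.HDF4;
            }
            // An HDF5 superblock may follow a user block, so the signature can
            // appear at offset 0, 512, 1024, 2048, ... (doubling each time).
            for (long offset = 0; offset + HDF5_MAGIC.length <= size;
                 offset = (offset == 0) ? 512 : offset * 2) {
                if (matchesAt(raf, size, offset, HDF5_MAGIC)) {
                    return Format.HDF5;
                }
            }
            return Format.UNKNOWN;
        }
    }

    /** Returns true if {@code magic} occurs at {@code offset} in the file. */
    private static boolean matchesAt(RandomAccessFile raf, long size, long offset, byte[] magic)
            throws IOException {
        if (offset + magic.length > size) {
            return false;
        }
        byte[] buf = new byte[magic.length];
        raf.seek(offset);
        raf.readFully(buf);
        return Arrays.equals(buf, magic);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HdfFormatSniffer <file> [<file> ...]
        for (String arg : args) {
            System.out.println(arg + ": " + detect(Paths.get(arg)));
        }
    }
}
```

An HDF-EOS2 file should report `HDF4`, and an HDF-EOS5 file should report `HDF5`. `UNKNOWN` usually means the file is not HDF at all, for example a truncated download.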
This example assumes that you have installed the following software.
This example uses Apache Spark 3.3.2 and Apache Sedona 1.3.1-incubating, the latest releases at the time of writing.
Although Apache Sedona supports Apache Flink, you cannot use Apache Flink with HDF data, so you do not need to download Apache Flink.
Apache Sedona has disabled HDF support by default since version 1.0.0. Sample MODIS data are available in the 1.0.0 source archive.
Unpack both the Spark binary archive and the Sedona source archive.
Use the patched SerNetCDFUtils.java code; the getDataAsym() function needs a change.
The patch modifies sedona-sedona-1.3.1-incubating/core/src/main/java/org/apache/sedona/core/formatMapper/netcdfParser/SerNetCDFUtils.java as follows:
Build Apache Sedona with tests skipped, because some tests may fail. Use mvn install -DskipTests to build the jars:
Copy all Sedona jar files to spark-3.3.2-bin-hadoop3/jars/. If you modified only the source code under core, you need to copy only the updated jar file from core.
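Because Maven writes each module's jar to that module's own `target/` directory, collecting them by hand is easy to get wrong. Below is a minimal Java sketch of the copy step. It assumes the standard Maven layout, in which `mvn install` leaves each module's jar directly in `<module>/target/`. The class name `JarCopier` and the choice to skip `-sources.jar` and `-javadoc.jar` files are assumptions for illustration, not part of the Sedona build.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Copies Maven-built jars from a source tree into a Spark jars/ directory (illustrative sketch). */
final class JarCopier {

    /** True for a binary jar placed directly in a directory named "target". */
    static boolean isBuiltJar(Path p) {
        String name = p.getFileName().toString();
        Path parent = p.getParent();
        return name.endsWith(".jar")
            && !name.endsWith("-sources.jar")   // assumption: source jars are not needed at runtime
            && !name.endsWith("-javadoc.jar")   // assumption: javadoc jars are not needed at runtime
            && parent != null
            && parent.getFileName() != null
            && parent.getFileName().toString().equals("target");
    }

    /** Copies every built jar under {@code root} into {@code sparkJars}, overwriting older copies. */
    static List<Path> copyJars(Path root, Path sparkJars) throws IOException {
        Files.createDirectories(sparkJars);
        List<Path> jars;
        try (Stream<Path> walk = Files.walk(root)) {
            jars = walk.filter(Files::isRegularFile)
                       .filter(JarCopier::isBuiltJar)
                       .collect(Collectors.toList());
        }
        List<Path> copied = new ArrayList<>();
        for (Path jar : jars) {
            Path dest = sparkJars.resolve(jar.getFileName());
            Files.copy(jar, dest, StandardCopyOption.REPLACE_EXISTING);
            copied.add(dest);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Example: java JarCopier sedona-sedona-1.3.1-incubating spark-3.3.2-bin-hadoop3/jars
        for (Path p : copyJars(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("copied " + p);
        }
    }
}
```

Passing sedona-sedona-1.3.1-incubating/core as the first argument restricts the copy to the core module, which corresponds to the shortcut for changes made only under core.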
Add the following lines to spark-3.3.2-bin-hadoop3/conf/spark-defaults.conf:
Unpack the apache-sedona-1.0.0-incubating-src.tar.gz source archive and copy the visualization example test resources to the 1.3.1 source tree:
Unpack the wb_coastlines_10m.zip map shapefile and copy it to the same test resource directory:
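A shapefile is a set of sibling files (.shp, .shx, .dbf, and usually .prj) that must stay together in one directory, so the whole archive should be extracted into the test resource directory. If you prefer to script the extraction, here is a minimal Java sketch that uses only java.util.zip. The class name `ShapefileUnzipper` is hypothetical, and the destination is whatever test resource directory you used in the previous step. The sketch also rejects archive entries that would escape the destination directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Extracts every entry of a zip archive into a directory (illustrative sketch). */
final class ShapefileUnzipper {

    static List<Path> unzip(Path zipFile, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        List<Path> written = new ArrayList<>();
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject names such as "../x" that would land outside destDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry: the stream reports end-of-data at the entry boundary.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                    written.add(out);
                }
                zin.closeEntry();
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Example: java ShapefileUnzipper wb_coastlines_10m.zip <test-resource-dir>
        for (Path p : unzip(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("extracted " + p);
        }
    }
}
```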
Use the patched ScalaExample.scala code to update sedona-sedona-1.3.1-incubating/examples/viz/src/main/scala/ScalaExample.scala.
Use the patched build.sbt script to update the build.sbt file.
Change to the sedona-sedona-1.3.1-incubating/examples/viz directory.
Build the example using sbt assembly.
Copy target/scala-2.12/SedonaVizTemplate-assembly-0.1.0.jar to the Spark jars directory, spark-3.3.2-bin-hadoop3/jars/.
Finally, run the example Spark job using the following command:
The job creates a PNG image file, sedona-sedona-1.3.1-incubating/examples/viz/target/demo/earthdatascatterplot.png, from the sample MODIS files.
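To check that the job produced a usable image, you can verify the output file. The sketch below is a minimal illustration: the class name `PngCheck` is hypothetical. It checks the fixed 8-byte PNG signature and then decodes the file with `javax.imageio.ImageIO` to report its dimensions. It confirms only that the file is a valid PNG, not that the plotted MODIS points are correct.

```java
import java.awt.image.BufferedImage;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import javax.imageio.ImageIO;

/** Verifies that a file is a decodable PNG and reports its size (illustrative sketch). */
final class PngCheck {

    // Every PNG file starts with this fixed 8-byte signature.
    private static final byte[] PNG_SIGNATURE =
        {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};

    /** Returns "WIDTHxHEIGHT" for a valid PNG; throws IOException otherwise. */
    static String describe(Path png) throws IOException {
        byte[] head = new byte[PNG_SIGNATURE.length];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(png))) {
            in.readFully(head); // EOFException (an IOException) if the file is too short
        }
        if (!Arrays.equals(head, PNG_SIGNATURE)) {
            throw new IOException(png + " does not start with the PNG signature");
        }
        BufferedImage img = ImageIO.read(png.toFile());
        if (img == null) {
            throw new IOException(png + " could not be decoded as an image");
        }
        return img.getWidth() + "x" + img.getHeight();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the output path named in this example; pass another path to override it.
        Path png = Paths.get(args.length > 0 ? args[0]
            : "sedona-sedona-1.3.1-incubating/examples/viz/target/demo/earthdatascatterplot.png");
        System.out.println(png + ": " + describe(png));
    }
}
```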
For HDF, only the Spark RDD API in Java or Scala works [1]. None of the Python/Jupyter Sedona examples work with HDF yet.