Apache Sedona

Apache Sedona is a cluster computing framework for geospatial data. This example shows how to use Apache Sedona with NASA HDF-EOS data products.

Apache Sedona uses netcdf-java to handle NASA HDF-EOS data. Therefore, it can read HDF-EOS2/HDF4, HDF-EOS5/HDF5, and OPeNDAP.

Prerequisite

This example assumes that you have installed the following software.

jdk 1.8.0
sbt
mvn
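
Before going further, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch and not part of the original example: check_tools is a hypothetical helper name, and the sketch assumes the JDK is invoked as java.

```shell
# check_tools is a hypothetical helper, not part of Spark or Sedona.
# It prints the name of every command that cannot be found on PATH
# and returns 0 only when all of them are present.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For this example, run check_tools java sbt mvn. Empty output means all three commands were found. The sketch does not check the JDK version.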

Download

This example uses the latest releases of Apache Spark and Apache Sedona.

Although Apache Sedona supports Apache Flink, you can't use Apache Flink for HDF. Thus, downloading Apache Flink is not necessary.

Apache Sedona has disabled HDF support by default since version 1.0.0. Sample MODIS data are available from the 1.0.0 source archive.

spark-3.3.2-bin-hadoop3.tgz
apache-sedona-1.3.1-incubating-src.tar.gz
apache-sedona-1.0.0-incubating-src.tar.gz: This has the sample MODIS data.
wb_coastlines_10m.zip: This map shape file is used to draw the world map.

Installation

Unpack both the Spark binary archive and the Sedona source archive.

Use the patched SerNetCDFUtils.java code. The getDataAsym() function needs a change. The patch modifies sedona-sedona-1.3.1-incubating/core/src/main/java/org/apache/sedona/core/formatMapper/netcdfParser/SerNetCDFUtils.java as follows:

default: return new Double((Integer) array.getObject(dataIndex));

Build Apache Sedona with tests skipped, because some tests may fail. Use mvn install -DskipTests to build the jars:

$ cd ~/sedona-sedona-1.3.1-incubating
$ mvn install -DskipTests

Copy all Sedona jar files to spark-3.3.2-bin-hadoop3/jars/. If you modified only the source code under core, you can copy just the updated jar file:

$ cp ./core/target/sedona-core-3.0_2.12-1.3.1-incubating.jar ~/spark-3.3.2-bin-hadoop3/jars/
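
The command above copies only the core jar. To copy every built Sedona jar in one pass, something like the following sketch could be used. It assumes that each module writes its jar to <module>/target/ the way core does, which is an assumption for modules other than core. copy_sedona_jars is a hypothetical helper name.

```shell
# copy_sedona_jars is a hypothetical helper, not part of Sedona.
# $1: root of the Sedona source tree, $2: destination jars directory.
# Assumes each module writes its jar to <module>/target/, as core does.
copy_sedona_jars() {
    src_root=$1
    dest=$2
    for jar in "$src_root"/*/target/sedona-*.jar; do
        [ -f "$jar" ] || continue   # the glob matched nothing
        case $jar in
            # skip auxiliary jars that Spark does not need
            *-sources.jar|*-javadoc.jar|*-tests.jar) continue ;;
        esac
        cp "$jar" "$dest"/ && echo "copied ${jar##*/}"
    done
}
```

For the layout used in this example, the call would be copy_sedona_jars ~/sedona-sedona-1.3.1-incubating ~/spark-3.3.2-bin-hadoop3/jars.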

Add the following lines to spark-3.3.2-bin-hadoop3/conf/spark-defaults.conf:

spark.driver.extraClassPath /home/hdfeos/spark-3.3.2-bin-hadoop3/jars/sedona-sql-3.0_2.12-1.3.1-incubating.jar:/home/hdfeos/spark-3.3.2-bin-hadoop3/jars/sedona-viz-3.0_2.12-1.3.1-incubating.jar:/home/hdfeos/spark-3.3.2-bin-hadoop3/jars/sedona-core-3.0_2.12-1.3.1-incubating.jar
spark.executor.extraClassPath /home/hdfeos/spark-3.3.2-bin-hadoop3/jars/sedona-sql-3.0_2.12-1.3.1-incubating.jar:/home/hdfeos/spark-3.3.2-bin-hadoop3/jars/sedona-viz-3.0_2.12-1.3.1-incubating.jar:/home/hdfeos/spark-3.3.2-bin-hadoop3/jars/sedona-core-3.0_2.12-1.3.1-incubating.jar
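
Both properties take the same colon-separated list of jar paths, so the long value is easy to mistype when the install location differs from /home/hdfeos. As a rough sketch, the two lines could be generated instead of typed by hand. emit_classpath_conf is a hypothetical helper name, not a Spark tool.

```shell
# emit_classpath_conf is a hypothetical helper, not part of Spark.
# $1: Spark jars directory; remaining arguments: jar file names.
# Prints the driver and executor class path lines for spark-defaults.conf.
emit_classpath_conf() {
    jars_dir=$1
    shift
    cp_value=""
    for jar in "$@"; do
        # append with a ":" separator, except before the first entry
        cp_value="${cp_value:+$cp_value:}$jars_dir/$jar"
    done
    printf 'spark.driver.extraClassPath %s\n' "$cp_value"
    printf 'spark.executor.extraClassPath %s\n' "$cp_value"
}
```

Called with the jars directory followed by the sql, viz, and core jar names, and with its output appended to conf/spark-defaults.conf, this produces the same two properties as above.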

Usage

Unpack the apache-sedona-1.0.0-incubating-src.tar.gz source archive and copy the MODIS test resources of the visualization example to the 1.3.1 source tree:

$ cp -r apache-sedona-1.0.0-incubating-src/examples/viz/src/test/resources/modis sedona-sedona-1.3.1-incubating/examples/viz/src/test/resources/

Unpack the wb_coastlines_10m.zip map shape file and copy it to the same test resource directory:

$ cp -r WB_Coastlines_10m/ sedona-sedona-1.3.1-incubating/examples/viz/src/test/resources/

Use the patched ScalaExample.scala code to update sedona-sedona-1.3.1-incubating/examples/viz/src/main/scala/ScalaExample.scala.

Use the patched build.sbt script to update the build.sbt file.

Change to the sedona-sedona-1.3.1-incubating/examples/viz directory.

Build the example using sbt assembly.

Copy target/scala-2.12/SedonaVizTemplate-assembly-0.1.0.jar to the Spark jars directory, spark-3.3.2-bin-hadoop3/jars/.

Finally, run the example Spark job using the following command:

$ export SPARK_HOME=/home/hdfeos/spark-3.3.2-bin-hadoop3
$ $SPARK_HOME/bin/spark-submit target/scala-2.12/SedonaVizTemplate-assembly-0.1.0.jar

The job will create a PNG output image file called sedona-sedona-1.3.1-incubating/examples/viz/target/demo/earthdatascatterplot.png from the sample MODIS files.
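
To confirm that the job really produced an image, one option is to check the file for the 8-byte PNG signature. This is a minimal sketch, not part of the original example. is_png is a hypothetical helper name, and the check looks only at the file header, not at the image content.

```shell
# is_png is a hypothetical helper, not part of Spark or Sedona.
# Succeeds when the first 8 bytes of file $1 are the PNG signature
# (89 50 4e 47 0d 0a 1a 0a); fails for other or missing files.
is_png() {
    sig=$(od -An -tx1 -N8 "$1" 2>/dev/null | tr -d ' \n')
    [ "$sig" = "89504e470d0a1a0a" ]
}
```

From the examples/viz directory, is_png target/demo/earthdatascatterplot.png && echo "image written" reports success only when the output file exists and starts with that signature.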

Limitation

Only the Spark RDD API in Java/Scala works for HDF [1]. None of the Python/Jupyter Sedona examples work for HDF yet.

Reference

[1] https://sedona.apache.org/latest-snapshot/setup/maven-coordinates/#netcdf-java-542_1

Last modified: 03/09/2023