How to Generate Executable Jar for Spark using Maven in Eclipse

This article, “How to Generate Executable Jar for Spark using Maven in Eclipse”, walks through the steps to generate an executable jar by creating a Maven project in Eclipse, without installing Spark on the local machine. Eclipse ships with a built-in Maven plugin, which is used to compile the project and create a jar that can be executed in a Spark environment.


Create Maven Project in Eclipse Oxygen

It’s recommended to use the Eclipse Oxygen IDE because the Maven plugin is already bundled with this version of Eclipse.

You can download Eclipse Oxygen from https://www.eclipse.org/oxygen/

Steps to create a Maven project:

  • Open Eclipse Oxygen IDE and go to File → New → Other…; type “Maven” in the filter box, expand the “Maven” group that appears, select “Maven Project”, and click Next.
  • Select the default workspace location and click Next.
  • Select the default archetype and click Next.
  • Enter the group id “spark” and the artifact id “testproject”. You can choose any logical names; together they give the package spark.testproject.
  • Click the Finish button.
  • The Maven project is created with the standard Maven file structure.
  • Write your code in App.java, which is generated under src/main/java in the package <group id>.<artifact id>. Use this class for creating Spark jobs, or delete it and create your own class file; a minimal example is sketched below.
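For reference, here is a minimal sketch of what App.java could look like, assuming the spark-core 2.2.0 dependency configured later in this article. The application name and sample data are placeholders; the master URL is supplied by spark-submit at run time.

package spark.testproject;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) {
        // Build a Spark context; the master is provided by spark-submit at run time
        SparkConf conf = new SparkConf().setAppName("testproject");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Tiny RDD example: split sample lines into words and count them
        JavaRDD<String> lines = sc.parallelize(Arrays.asList("hello spark", "hello maven"));
        long wordCount = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator()).count();
        System.out.println("Total words: " + wordCount);

        sc.stop();
    }
}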


Edit the “pom.xml” file to compile and create the executable Jar

In pom.xml, add the following entries:

  • To compile the project using Java 1.8, add the entries below to pom.xml, replacing whatever currently appears between the properties tags.

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>


  • To create an executable jar with all of its dependencies bundled inside it, add the following entries to pom.xml, immediately after the closing </dependencies> tag.

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>spark.testproject.App</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>

Put the fully qualified name of your project’s main class, i.e. the class containing the main method, between the <mainClass></mainClass> tags. For the default class generated above, that is spark.testproject.App.


For Spark RDD support, add the following dependency.

Note: Replace the existing entries inside the <dependencies> tag with this one.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
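If the target cluster already provides Spark at run time (common on managed clusters), you can optionally mark the dependency as provided so that Spark’s own classes are not bundled into the fat jar. This is a common variation, not a step required by this article:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
</dependency>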


Compiling and creating executable jar

Right-click the project “testproject” → Run As → Maven build… → type “clean package assembly:single” in the Goals field → Run. This cleans all previously generated class and jar files and compiles the project again.

Maven will then start compiling the code and download all necessary dependencies, so you must be connected to the internet during this step.
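If you prefer a terminal over the Eclipse Run As dialog, the same build can be run from the project’s root directory, assuming Maven is also installed locally (Eclipse’s embedded Maven does not provide an mvn command by itself):

mvn clean package assembly:single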

Once the build finishes, Maven prints a BUILD SUCCESS message in the console.

Right-click the project and click Refresh so the new files appear.

In the project “testproject”, open the folder named “target”, where the generated jars appear. In this case, the build creates testproject-0.0.1-SNAPSHOT-jar-with-dependencies.jar and testproject-0.0.1-SNAPSHOT.jar.

You can use the jar-with-dependencies jar to execute your Spark job on Cloudera, EC2, or other environments.
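As an illustration, assuming a cluster where spark-submit is on the path and YARN is the resource manager (both assumptions; adjust the master and class name for your setup), the fat jar could be submitted like this:

spark-submit --class spark.testproject.App --master yarn target/testproject-0.0.1-SNAPSHOT-jar-with-dependencies.jar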
