This article explains step by step how to install Apache Spark on Windows. Let’s learn more!
What Is Apache Spark?
Apache Spark is a data processing framework that can quickly run operations on very large data sets and distribute that processing across multiple computers, either on its own or together with other distributed computing tools.
These two characteristics are essential to big data and machine learning, which need enormous computing power to process very large data stores.
Apache Spark Installation on Windows
Step 1: Install Java 8 or a newer version
Java 8 or a more recent version is required to install Apache Spark on Windows, so make sure you have obtained and installed one.
You may download OpenJDK for this purpose.
When the download has finished, double-click the downloaded .exe file (for example, jdk-8u201-windows-x64.exe) to install it on your Windows machine.
During installation, you can choose a custom directory or stick with the default one.
Just a heads-up: The techniques in this article for installing Apache Spark on Java 8 also apply to Java 11 and 13.
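To confirm which Java version is actually installed, you can read the banner printed by java -version. The sketch below is one way to do that from Python; the function names parse_java_major and installed_java_major are made up for this illustration, and it assumes the java command is already reachable from your command prompt.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Return the major Java version found in `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in the output")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_201).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> int:
    """Run `java -version` and return the major version it reports."""
    # The version banner is normally written to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    try:
        major = installed_java_major()
    except (FileNotFoundError, ValueError) as exc:
        print(f"Could not determine the Java version: {exc}")
    else:
        print(f"Detected Java {major}")
        if major < 8:
            print("Spark needs Java 8 or newer.")
```

If the script reports that it could not determine the version, Java is most likely not installed or not on your Path yet.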
Step 2: Install Apache Spark on Windows
- Step 2.1: Visit the official Apache Spark download page and select the most recent version. Choose “Pre-built for Apache Hadoop” as the package type.
- Step 2.2: When the download is finished, use WinZip, WinRAR, or 7-Zip to extract the file.
- Step 2.3: In your user directory, create a folder named Spark and copy and paste the contents of the unzipped file into it: C:\Users\<USER>\Spark
- Step 2.4: Go to the conf folder and open the log4j.properties.template file. Change INFO to WARN (or to ERROR to reduce the log output even further). This step and the next one are optional. Remember to remove the .template extension from the file name so that Spark can read the file.
- Step 2.5: We must now configure the path. Open the Control Panel and go to System and Security, System, Advanced system settings, and then Environment Variables.
Add a new user variable (or system variable) named SPARK_HOME and set its value to the Spark folder from Step 2.3. (Click the New button under “User variables for <USER>” to add a new user variable.)
Fill in the window that pops up and click OK. Then add %SPARK_HOME%\bin to the Path variable and click OK once again.
- Step 2.6: Spark needs a piece of Hadoop in order to run. For Hadoop 2.7, that means you need to download winutils.exe.
- Step 2.7: Create a folder named winutils on the C drive and a folder named bin inside it. After that, move the winutils.exe file you downloaded into that bin folder: C:\winutils\bin
Don’t forget to add the user (or system) variable HADOOP_HOME in the same way as SPARK_HOME, with the value C:\winutils. After that, click OK.
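Once the variables are set, open a new command prompt so that it picks up the changes. A quick way to double-check Steps 2.5 to 2.7 is a small script like the following sketch. The function name check_spark_env and the wording of its messages are made up for this illustration; it only assumes the variable names SPARK_HOME and HADOOP_HOME and the bin folder layout described above.

```python
import os
from pathlib import PureWindowsPath


def check_spark_env(env, file_exists=os.path.isfile):
    """Return a list of problems found in the given environment mapping."""
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    else:
        # Step 2.5: %SPARK_HOME%\bin should be one of the Path entries.
        spark_bin = PureWindowsPath(spark_home) / "bin"
        raw_path = env.get("Path") or env.get("PATH") or ""
        # PureWindowsPath compares case-insensitively, as Windows does.
        entries = [PureWindowsPath(p) for p in raw_path.split(";") if p.strip()]
        if spark_bin not in entries:
            problems.append(f"{spark_bin} is not on the Path")

    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    else:
        # Step 2.7: winutils.exe is expected inside %HADOOP_HOME%\bin.
        winutils = PureWindowsPath(hadoop_home) / "bin" / "winutils.exe"
        if not file_exists(str(winutils)):
            problems.append(f"{winutils} was not found")

    return problems


if __name__ == "__main__":
    found = check_spark_env(os.environ)
    if found:
        for problem in found:
            print("Problem:", problem)
    else:
        print("SPARK_HOME, HADOOP_HOME and Path look fine.")
```

The file_exists parameter is only there so the check can be tried out without touching the disk; on your own machine you can leave it at its default.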
That’s how you install Apache Spark on Windows without running into problems along the way. Hopefully, this article will be a great help to you. See you next time!