Monday, March 13, 2017

Spark Installation on Windows

  • Choose a Spark pre-built package for Hadoop, e.g. "Pre-built for Hadoop 2.6 or later". Download and extract it to any drive, e.g. D:\spark-2.1.0-bin-hadoop2.6.
  • Set SPARK_HOME to the extracted directory and add %SPARK_HOME%\bin to PATH in the environment variables (see the commands below).
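For the current Command Prompt session only, and assuming the extraction path above, this amounts to the following (use the Environment Variables dialog to make it permanent):
rem Applies to this Command Prompt session only
set SPARK_HOME=D:\spark-2.1.0-bin-hadoop2.6
set PATH=%PATH%;%SPARK_HOME%\bin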
  • Run the following command on the command line:
spark-shell
  • You’ll get an error for winutils.exe (the message complains that it could not locate winutils.exe in the Hadoop binaries).
      Though we aren’t using Hadoop with Spark, it still checks for the HADOOP_HOME variable in its configuration. To overcome this error, download winutils.exe and place it in a bin folder at any location (e.g. D:\winutils\bin\winutils.exe).
P.S. The right winutils.exe varies with the operating system version, so if the one you download doesn't work on your OS, find another one that does. You can refer to the Problems running Hadoop on Windows link for winutils.exe.
  • Set HADOOP_HOME = D:\winutils in the environment variables, as shown below.
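Again a session-only sketch, assuming the D:\winutils\bin\winutils.exe layout above:
rem HADOOP_HOME points to the parent of the bin folder, not to bin itself
set HADOOP_HOME=D:\winutils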
  • Now, re-run the command "spark-shell" and you’ll see the Scala shell. For recent Spark releases, if you get the permission error for the /tmp/hive directory given below:
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
you need to run the following command:
D:\spark>D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
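To confirm the change took effect, a quick check might look like this (winutils also provides an ls subcommand; the paths assume the locations used above):
rem Inspect the scratch directory's permissions, then relaunch the shell
D:\winutils\bin\winutils.exe ls D:\tmp\hive
spark-shell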
