0 votes
1 view
in Big Data Hadoop & Spark by (11.5k points)

I'm not able to run a simple spark job in Scala IDE (Maven spark project) installed on Windows 7

Spark core dependency has been added.

val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()


Error:

 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
    at <br>org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at <br>org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at <br>org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)<br>
    at scala.Option.map(Option.scala:145)<br>
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)<br>
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)<br>
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)<br>
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)<br>
    at scala.Option.getOrElse(Option.scala:120)<br>
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)<br>
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)<br>
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)<br>
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)<br>
    at scala.Option.getOrElse(Option.scala:120)<br>
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)<br>
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)<br>
    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)<br>
    at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)<br>
    at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)<br>

1 Answer

0 votes
by (31.4k points)
edited by

The following error is due to missing winutils binary in the classpath while running Spark application. Winutils is a part of Hadoop ecosystem and is not included in Spark. The actual functionality of your  application may run correctly even after an exception is thrown. But it is better to have it in place to avoid unnecessary problems. In order to avoid error, download winutils.exe binary and add the same to the classpath.

Copy the downloaded file to ANY_DIRECTORY/bin/winutils.exe

SetUp your HADOOP_HOME environment variable on the OS level or programmatically:

System.setProperty("hadoop.home.dir", "Full\folder\path:\\winutil\\");

If you want to know more about Scala, then do check out this awesome video tutorial:

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...