
in Big Data Hadoop & Spark by (11.9k points)

I'm not able to run a simple Spark job from the Scala IDE (Maven Spark project) on Windows 7.

Spark core dependency has been added.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("DemoDF").setMaster("local")

val sc = new SparkContext(conf)

val logData = sc.textFile("File.txt")

logData.count()

Error:

16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13

16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)

    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)

    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)

    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)

    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)

    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)

    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)

    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)

    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)

    at scala.Option.map(Option.scala:145)

    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)

    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)

    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)

    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)

    at scala.Option.getOrElse(Option.scala:120)

    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)

    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)

    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)

    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)

    at scala.Option.getOrElse(Option.scala:120)

    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)

    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)

    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)

    at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)

    at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)

1 Answer

by (32.1k points)

You can solve this by using the following:

  1. Download winutils.exe from http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe and place it in a bin subfolder of some directory, e.g. C:\hadoop\bin\winutils.exe.
  2. Set up the HADOOP_HOME environment variable at the OS level, or set it programmatically before creating the SparkContext (the path must point to the folder that contains the bin directory, not to bin itself):

    System.setProperty("hadoop.home.dir", "full path to the folder with winutils")

  3. Re-run the job.
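Putting the steps above together, the driver can be sketched roughly as follows. This is a minimal sketch: C:\hadoop is an assumed install location (use wherever you placed winutils.exe), and the object/file names mirror the FrameDemo class from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FrameDemo {
  def main(args: Array[String]): Unit = {
    // Must run BEFORE the SparkContext is created, because Hadoop's
    // Shell class resolves %HADOOP_HOME%\bin\winutils.exe at class load.
    // "C:\\hadoop" is an example path: it is the folder that CONTAINS
    // bin\winutils.exe, not the bin folder itself.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
    val sc = new SparkContext(conf)

    val logData = sc.textFile("File.txt")
    println(logData.count())

    sc.stop()
  }
}
```

Setting the OS-level HADOOP_HOME environment variable instead avoids hard-coding a machine-specific path in the source, which is preferable once the job leaves your workstation.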
