Question posted on Intellipaat in Big Data Hadoop & Spark

I am trying to reproduce an Amazon EMR cluster on my local machine. For that purpose, I have installed the latest stable version of Hadoop at the time of writing, 2.6.0. Now I would like to access an S3 bucket, as I do inside the EMR cluster.

I have added the AWS credentials to core-site.xml:

  <value>some id</value>

  <value>some id</value>

  <value>some key</value>

  <value>some key</value>
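Each `<value>` normally sits inside a `<property>` element next to a `<name>`. Assuming the four values are the access key ID and the secret key for the `s3` and `s3n` schemes, the Hadoop 2.x property names would be `fs.s3.awsAccessKeyId`, `fs.s3n.awsAccessKeyId`, `fs.s3.awsSecretAccessKey` and `fs.s3n.awsSecretAccessKey`. A minimal Python sketch that emits those entries with the standard library's ElementTree (the helper `credential_properties` is hypothetical):

```python
import xml.etree.ElementTree as ET

# Assumed: Hadoop 2.x reads separate credential properties for the s3 and s3n schemes.
SCHEMES = ("s3", "s3n")


def credential_properties(access_key_id: str, secret_key: str) -> ET.Element:
    """Build a <configuration> element with one <property> per scheme and credential.

    The secret is written as-is, without percent-encoding.
    """
    conf = ET.Element("configuration")
    for suffix, value in (("awsAccessKeyId", access_key_id),
                          ("awsSecretAccessKey", secret_key)):
        for scheme in SCHEMES:
            prop = ET.SubElement(conf, "property")
            ET.SubElement(prop, "name").text = f"fs.{scheme}.{suffix}"
            ET.SubElement(prop, "value").text = value
    return conf


if __name__ == "__main__":
    # Placeholder values, as in the question.
    root = credential_properties("some id", "some key")
    print(ET.tostring(root, encoding="unicode"))
```

The loop order (IDs first, then keys) mirrors the order of the values listed above.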

Note: since the secret key contains some slashes, I have escaped them as %2F.
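Escaping `/` as `%2F` is ordinary percent-encoding. The snippet below shows it with Python's `urllib.parse.quote`; `escape_secret` is a hypothetical helper and the key is a made-up placeholder. As far as I know, this encoding only matters when the secret is embedded in a URI such as `s3n://ID:SECRET@bucket/`; Hadoop reads values from core-site.xml literally, so a `%2F` written there would be passed on as-is rather than decoded.

```python
from urllib.parse import quote, unquote


def escape_secret(secret: str) -> str:
    """Percent-encode every reserved character of the secret, including '/'."""
    # safe="" exempts no characters, so "/" becomes "%2F" and "+" becomes "%2B".
    return quote(secret, safe="")


if __name__ == "__main__":
    secret = "abc/def+ghi"  # made-up example value, not a real key
    encoded = escape_secret(secret)
    print(encoded)  # abc%2Fdef%2Bghi
    # Decoding restores the original secret.
    assert unquote(encoded) == secret
```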

If I try to list the contents of the bucket:

hadoop fs -ls s3://some-url/bucket/

I get this error:

ls: No FileSystem for scheme: s3

I edited core-site.xml again and added the filesystem-related properties:



This time I get a different error:

-ls: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3.S3FileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
        at org.apache.hadoop.fs.FileSystem.createFileSystem(
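Both errors come out of the same lookup. In Hadoop 2.x, `FileSystem.getFileSystemClass` roughly reads the class name from the `fs.<scheme>.impl` property via `Configuration.getClass`, falls back to filesystems registered through Java's `ServiceLoader` if the property is unset, and throws "No FileSystem for scheme" when neither yields a class. If the property names a class that cannot be loaded, `Configuration.getClass` wraps the `ClassNotFoundException` in a `RuntimeException`, which matches the trace above. The Python model below is a simplified sketch of that order, not Hadoop's code; the dictionaries and the set standing in for the configuration, the service registry and the classpath are assumptions for illustration.

```python
from urllib.parse import urlparse


class FileSystemLookupError(Exception):
    """Stands in for the IOException / RuntimeException Hadoop would throw."""


def get_filesystem_class(uri, conf, service_filesystems, classpath):
    """Simplified model of resolving a URI scheme to a FileSystem class name.

    conf: dict of configuration properties (as from core-site.xml)
    service_filesystems: dict of scheme -> class name registered via ServiceLoader
    classpath: set of class names that can actually be loaded
    """
    scheme = urlparse(uri).scheme
    class_name = conf.get(f"fs.{scheme}.impl")
    if class_name is not None and class_name not in classpath:
        # The property is set, but its class is not loadable (the ClassNotFoundException above).
        raise FileSystemLookupError(
            f"java.lang.ClassNotFoundException: Class {class_name} not found")
    if class_name is None:
        class_name = service_filesystems.get(scheme)
    if class_name is None:
        # Nothing configured or registered for this scheme (the first error above).
        raise FileSystemLookupError(f"No FileSystem for scheme: {scheme}")
    return class_name


if __name__ == "__main__":
    uri = "s3://some-url/bucket/"
    for conf in ({}, {"fs.s3.impl": "org.apache.hadoop.fs.s3.S3FileSystem"}):
        try:
            get_filesystem_class(uri, conf, service_filesystems={}, classpath=set())
        except FileSystemLookupError as err:
            print(err)
```

Running it reproduces the two messages in order: the first with no `fs.s3.impl` set, the second with the property set but the class absent from the modeled classpath.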

I suspect that the YARN distribution does not include the jars needed to read from S3, but I have no idea where to get them.
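A jar is a zip archive, so one way to check which jar under a Hadoop installation (if any) provides the missing class is to scan the archives for the matching `.class` entry. A small Python sketch, assuming the jars sit below `$HADOOP_HOME/share/hadoop` as in the Apache binary tarball; `find_jars_with_class` is a hypothetical helper:

```python
import os
import zipfile
from pathlib import Path
from typing import List


def class_entry(class_name: str) -> str:
    """Map a Java class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def find_jars_with_class(root: Path, class_name: str) -> List[Path]:
    """Return every *.jar below root that contains class_name, in sorted order."""
    if not root.is_dir():
        return []
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(root.rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            continue  # skip files that end in .jar but are not valid zip archives
    return hits


if __name__ == "__main__":
    hadoop_home = Path(os.environ.get("HADOOP_HOME", "."))
    for jar in find_jars_with_class(hadoop_home / "share" / "hadoop",
                                    "org.apache.hadoop.fs.s3.S3FileSystem"):
        print(jar)
```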

Can someone help me out with this?

1 Answer


For Hadoop 2.6 and 2.7, the jar hadoop-aws-[version].jar, which contains the S3 filesystem implementations such as NativeS3FileSystem, is not on Hadoop's classpath by default. So, try adding it to the classpath by adding the following line to the file located in $HADOOP_HOME/etc/hadoop/:
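As a rough sketch of what that classpath change involves, assuming the Hadoop 2.x tarball layout where `hadoop-aws-<version>.jar` ships in `$HADOOP_HOME/share/hadoop/tools/lib`, the Python helper below (hypothetical, not part of Hadoop) looks for that jar and builds a `HADOOP_CLASSPATH` entry for the whole directory. The `/*` wildcard is used rather than the single jar because other jars in that directory may also be needed by the S3 connectors.

```python
import os
from pathlib import Path
from typing import Optional


def tools_lib_dir(hadoop_home: Path) -> Path:
    """Assumed location of the optional tool jars in a Hadoop 2.x install."""
    return hadoop_home / "share" / "hadoop" / "tools" / "lib"


def find_hadoop_aws_jar(hadoop_home: Path) -> Optional[Path]:
    """Return the first hadoop-aws-*.jar in tools/lib, or None if there is none."""
    lib = tools_lib_dir(hadoop_home)
    if not lib.is_dir():
        return None
    jars = sorted(lib.glob("hadoop-aws-*.jar"))
    return jars[0] if jars else None


def classpath_export(hadoop_home: Path) -> str:
    """Build a shell line that appends tools/lib/* to HADOOP_CLASSPATH."""
    if find_hadoop_aws_jar(hadoop_home) is None:
        raise FileNotFoundError(f"no hadoop-aws jar under {tools_lib_dir(hadoop_home)}")
    # The trailing /* is a Java classpath wildcard matching every jar in the directory.
    return f"export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:{tools_lib_dir(hadoop_home)}/*"


if __name__ == "__main__":
    home = Path(os.environ.get("HADOOP_HOME", "."))
    try:
        print(classpath_export(home))
    except FileNotFoundError as err:
        print(err)
```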


