in Big Data Hadoop & Spark by (11.4k points)

I have the following scenario:

Pig version used: 0.7.0

Sample HDFS directory structure:

/user/training/test/20100810/<data files>

/user/training/test/20100811/<data files>

/user/training/test/20100812/<data files>

/user/training/test/20100813/<data files>

/user/training/test/20100814/<data files>

As you can see in the paths listed above, one of the directory names is a date stamp.

Problem: I want to load files from a date range say from 20100810 to 20100813.

I can pass the 'from' and 'to' dates of the range as parameters to the Pig script, but how do I make use of these parameters in the LOAD statement? I am able to do the following:

temp = LOAD '/user/training/test/{20100810,20100811,20100812}' USING SomeLoader() AS (...);

The following works from the shell with hadoop:

hadoop fs -ls /user/training/test/{20100810..20100813}

But it fails when I try the same with LOAD inside the pig script. How do I make use of the parameters passed to the Pig script to load data from a date range?

Error log follows:

Backend error message during job submission:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://<ServerName>.com/user/training/test/{20100810..20100813}
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(...)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(...)
        at org.apache.hadoop.mapred.JobClient.writeSplits(...)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(...)
        at org.apache.hadoop.mapred.JobClient.submitJob(...)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(...)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(...)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://<ServerName>.com/user/training/test/{20100810..20100813} matches 0 files
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(...)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(...)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(...)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(...)
        ... 14 more

Pig Stack Trace:

ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hdfs://<ServerName>.com/user/training/test/{20100810..20100813}

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias test
        at org.apache.pig.PigServer.openIterator(...)

1 Answer

by (32.3k points)

One common way to solve your problem is to use Pig parameters (which is a good way to make your script more reusable anyway).

There are several mechanisms for defining parameters that can be referenced in a Pig Latin script:

  • As command-line arguments: each parameter is passed to Pig as a separate argument using the -param switch at script execution time

  • In a parameter file that is passed to Pig with the -param_file command-line argument when the script is executed

  • Inside the Pig Latin script itself, using the %declare and %default preprocessor statements
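As a minimal sketch of the third mechanism: the script supplies its own fallback value with %default, and a -param on the command line overrides it. (SomeLoader and the empty schema are placeholders carried over from the question.)

```pig
-- Fallback used when no -param input=... is supplied on the command line
%default input '/user/training/test/{20100810,20100811,20100812}'

temp = LOAD '$input' USING SomeLoader() AS (...);
```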


Note that Hadoop's path globbing accepts the comma-separated form {20100810,20100811,20100812} but not the Bash range form {20100810..20100812}. The hadoop fs -ls command above only worked because Bash expanded the range before Hadoop ever saw the path; inside a Pig script there is no shell, so the range reaches Hadoop verbatim and matches 0 files, which is exactly what your error log shows. So quote the parameter with straight single quotes (to keep the shell from expanding or splitting it) and use the comma form:

pig -f script.pig -param input='/user/training/test/{20100810,20100811,20100812}'

Then reference the parameter in the LOAD statement:

temp = LOAD '$input' USING SomeLoader() AS (...);
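For longer ranges, typing out every date by hand gets tedious. A small wrapper script can expand a from/to pair into the comma-separated glob before invoking Pig. A sketch, assuming GNU date, the YYYYMMDD directory names from the question, and a hypothetical script name script.pig:

```shell
#!/bin/bash
# Expand a from/to date range into the comma-separated {a,b,c} glob
# that Hadoop's path globbing understands.
FROM=20100810
TO=20100813

d="$FROM"
dirs="$d"
while [ "$d" != "$TO" ]; do
  d=$(date -d "$d + 1 day" +%Y%m%d)   # GNU date: advance one day
  dirs="$dirs,$d"
done

echo "$dirs"   # 20100810,20100811,20100812,20100813
# pig -f script.pig -param input="/user/training/test/{$dirs}"
```

Because the comma list is built in the shell and passed through a quoted -param, the LOAD '$input' statement in the script needs no changes as the range grows or shrinks.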

