in Big Data Hadoop & Spark by (11.5k points)
To use MapReduce jobs in Hadoop effectively, I need my data to be stored in Hadoop's SequenceFile format. However, the data is currently only in flat .txt files. Can anyone suggest a way to convert a .txt file to a sequence file?

1 Answer


To get a sequence file out of a .txt file, you just need to run an "identity" job, that is, a job whose mapper passes every record through unchanged, with SequenceFile as its output format.

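Below is the Java code:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    // The class name is only a placeholder; use whatever fits your project.
    public class ConvertText {

        public static void main(String[] args)
                throws IOException, InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
            // Job.getInstance replaces the deprecated "new Job(conf)" constructor
            Job job = Job.getInstance(conf, "Convert Text");
            job.setJarByClass(ConvertText.class);

            // the base Mapper and Reducer classes are identities: records pass through unchanged
            job.setMapperClass(Mapper.class);
            job.setReducerClass(Reducer.class);
            // 0 makes this a map-only job; increase if you need sorting or a special number of files
            job.setNumReduceTasks(0);

            // TextInputFormat emits (byte offset, line) pairs, hence these key/value types
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            job.setInputFormatClass(TextInputFormat.class);

            TextInputFormat.addInputPath(job, new Path("/your/path1"));
            SequenceFileOutputFormat.setOutputPath(job, new Path("/your/path2"));

            // submit and wait for completion; the exit code reflects success or failure
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }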
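To see why the job above declares LongWritable keys and Text values, it may help to look at the records TextInputFormat hands to the identity mapper: the key is the byte offset at which a line starts and the value is the line itself. The following is a rough, Hadoop-free sketch of that pairing in plain Java. The class and method names (OffsetKeyDemo, toRecords) are made up for illustration, and it assumes UTF-8 text with "\n" line endings; Hadoop's real line reader also handles "\r" and "\r\n".

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: mimics the (offset, line) records
// that a text input produces, assuming UTF-8 and "\n" line terminators.
class OffsetKeyDemo {

    // Returns the records in file order: key = byte offset of the line start, value = line text.
    static Map<Long, String> toRecords(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        Map<Long, String> records = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i <= bytes.length; i++) {
            boolean atEnd = (i == bytes.length);
            if (atEnd || bytes[i] == '\n') {
                // Skip the empty remainder after a final newline; keep empty lines in the middle.
                if (!(atEnd && start == i)) {
                    String line = new String(bytes, start, i - start, StandardCharsets.UTF_8);
                    records.put((long) start, line);
                }
                start = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Each printed pair corresponds to one key/value record in the converted file.
        for (Map.Entry<Long, String> e : toRecords("alpha\nbeta\ngamma\n").entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the sample input the keys come out as 0, 6 and 11. These offsets are what end up as the keys of the sequence file, so if you need different keys (for example a field taken from each line), you would replace the identity mapper with your own mapper and adjust the output key class to match.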
