To keep everything on the grid, use Hadoop streaming with a single reducer and cat as both the mapper and the reducer:
hadoop jar /usr/hdp/22.214.171.124-37/hadoop-mapreduce/hadoop-streaming-126.96.36.199.5.3.0-37.jar \
-Dmapred.reduce.tasks=1 \
-input "<input-path-directory>" \
-output "<output-path-directory>" \
-mapper cat \
-reducer cat
Make sure you use the Hadoop streaming jar that matches the version installed on your system.
Then supply the input path, and make sure the output directory does not already exist: the job merges the files and creates the output directory for you.
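The two checks above (a matching jar, and an output directory that does not exist yet) can be wrapped in a small shell function. This is only a sketch: the function name merge_hdfs_dir and the default search root /usr/hdp are assumptions for illustration, so adjust them to your installation. It relies on hdfs dfs -test -e to detect an existing output path and on find to locate a streaming jar.

```shell
#!/usr/bin/env bash
# Sketch: merge all files under an HDFS directory into one part file.
# merge_hdfs_dir and the default jar search root are illustrative assumptions.

merge_hdfs_dir() {
    local input_dir="$1" output_dir="$2" jar_root="${3:-/usr/hdp}"
    local jar

    # Pick the first streaming jar found under the search root.
    jar=$(find "$jar_root" -name 'hadoop-streaming*.jar' 2>/dev/null | head -n 1)
    if [ -z "$jar" ]; then
        echo "no hadoop-streaming jar found under $jar_root" >&2
        return 1
    fi

    # The job refuses to write into an existing output directory, so check first.
    if hdfs dfs -test -e "$output_dir"; then
        echo "output path $output_dir already exists" >&2
        return 1
    fi

    # Single reducer, cat as mapper and reducer, as in the command above.
    hadoop jar "$jar" \
        -Dmapred.reduce.tasks=1 \
        -input "$input_dir" \
        -output "$output_dir" \
        -mapper cat \
        -reducer cat
}

# Usage: merge_hdfs_dir /user/amit/fold2/ /user/amit/fold1/
```

If several streaming jars are installed, the first match may not be the right one, so passing an explicit search root as the third argument is the safer choice.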
Here is what I tried:
#hdfs dfs -ls /user/amit/fold2/
Found 2 items
-rw-r--r-- 3 hdfs hdfs 150 2017-09-26 17:55 /user/amit/fold2/part1.txt
-rw-r--r-- 3 hdfs hdfs 20 2017-09-27 09:07 /user/amit/fold2/part1_sed.txt
#hadoop jar /usr/hdp/188.8.131.52-37/hadoop-mapreduce/hadoop-streaming-184.108.40.206.5.3.0-37.jar \
> -Dmapred.reduce.tasks=1 \
> -input "/user/amit/fold2/" \
> -output "/user/amit/fold1/" \
> -mapper cat \
> -reducer cat
The fold2 directory contained 2 files. Running the command above wrote the merged output to the fold1 directory, where the 2 files are now combined into a single file, as shown below:
#hdfs dfs -ls /user/amit/fold1/
Found 2 items
-rw-r--r-- 3 hdfs hdfs 0 2017-10-09 16:00 /user/amit/fold1/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 174 2017-10-09 16:00 /user/amit/fold1/part-00000
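One way to sanity-check the result is to compare line counts rather than byte sizes. In the listing above, 150 + 20 = 170 bytes went in and 174 bytes came out, which would be consistent with streaming writing a separator byte after each line, though that is a guess and was not checked here. The sketch below assumes a flat input directory; same_line_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare line counts between an HDFS input directory and the merged file.
# same_line_count is a hypothetical helper name, not part of Hadoop.

same_line_count() {
    local input_dir="$1" merged_file="$2"
    local in_lines out_lines

    # Quote the glob so HDFS expands it rather than the local shell.
    in_lines=$(hdfs dfs -cat "${input_dir%/}/*" | wc -l | tr -d ' ')
    out_lines=$(hdfs dfs -cat "$merged_file" | wc -l | tr -d ' ')

    echo "input lines: $in_lines, merged lines: $out_lines"
    [ "$in_lines" -eq "$out_lines" ]
}

# Usage: same_line_count /user/amit/fold2 /user/amit/fold1/part-00000
```

Because the lines pass through a reducer, they arrive sorted by key, so the merged file may not keep the original line order even when the counts match.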