in Big Data Hadoop & Spark by (160 points)

I have to write a CSV file in which several columns share the same (empty) header name.

Header : , , , xxx, , , YYY, , ,

dataSet.coalesce(1).write().mode("ignore").format("csv").option("header", "true").option("\u0000", "false").save(targetPath);

While writing to the CSV file, I get the following error:

Caused by: org.apache.spark.sql.AnalysisException: Found duplicate column(s) when inserting into file:/C:/Users/xxx/summaryReport/20190901: 

at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:85)

1 Answer

by (33.2k points)

Hi Hussain

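Spark refuses to write a dataset whose schema contains duplicate column names, which is what the AnalysisException is reporting. You should assign a new, unique name to each duplicate column. Unique names also make it easier to preprocess a specific column later.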
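As a rough illustration of the renaming step, here is a minimal plain-Java sketch that makes a list of column names unique by appending a numeric suffix to repeats. The class name `UniqueColumnNames`, the method `dedupe`, and the `_1`, `_2` suffix scheme are hypothetical choices for this example, not anything Spark provides. Names are compared in lower case on the assumption that Spark is running with its default case-insensitive column resolution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Spark.
final class UniqueColumnNames {

    // Returns the names in their original order. The first occurrence of a
    // name is kept as-is; later repeats get a suffix (_1, _2, ...) and the
    // suffix is bumped until the result no longer clashes with any name
    // already emitted.
    static List<String> dedupe(List<String> names) {
        Set<String> used = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            int suffix = 1;
            // Assumption: compare case-insensitively, matching Spark's default.
            while (!used.add(candidate.toLowerCase(Locale.ROOT))) {
                candidate = name + "_" + suffix++;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("col", "xxx", "col", "YYY", "col");
        System.out.println(dedupe(names)); // prints [col, xxx, col_1, YYY, col_2]
    }
}
```

The resulting list could then be applied to the dataset before writing, for example with `dataSet.toDF(unique.toArray(new String[0]))`, where `unique` is the list returned by `dedupe`.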

I hope this answer helps!

by (160 points)
Hi Anurag,
Thanks for the suggestion!

The actual requirement is:

Columns | A | B | C | D | E | F | G | H |
Columns C and G should have the headers XXX and YYY.
Columns A, B, D, E, F, and H should be empty.

I have fixed this as follows:

I created a dataset with the above structure.
While writing, I set the header option to false: ds.write().option("header", "false"). This resolves the issue.
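
One way to read this workaround is that the header labels travel as an ordinary first row of data, so Spark never sees them as column names. The plain-Java sketch below only builds that first row for the A to H layout above. `HeaderRow` and `build` are hypothetical names for this example, and the sketch assumes the row is then placed ahead of the real data before writing with the header option off (how the row ends up first in the Spark output is not shown here and would need checking).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark.
final class HeaderRow {

    // Builds a row of `width` cells that is blank everywhere except at the
    // given zero-based positions, which carry the header labels.
    static String[] build(int width, Map<Integer, String> labels) {
        String[] row = new String[width];
        Arrays.fill(row, "");
        for (Map.Entry<Integer, String> e : labels.entrySet()) {
            row[e.getKey()] = e.getValue();
        }
        return row;
    }

    public static void main(String[] args) {
        Map<Integer, String> labels = new TreeMap<>();
        labels.put(2, "XXX"); // column C
        labels.put(6, "YYY"); // column G
        String[] row = build(8, labels); // columns A..H
        System.out.println(String.join(",", row)); // prints ,,XXX,,,,YYY,
    }
}
```

Because the labels are data rather than column names, the dataset's real column names can stay unique, which avoids the duplicate-column check entirely.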
