
My scenario is as follows.

I have two users: user1, which has access to the Hive table customer, and user2, which has access to an HDFS directory but not to the Hive table. I submit my Spark SQL job as user1 so that it can read the data from the Hive table into a DataFrame. When the job then writes that DataFrame to HDFS, it writes as the same user, user1, and this fails because only user2 has permission on the HDFS directory. Is there any way to read data as one user and write it as another?
Here is the sample code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// ---------- Using spark session ---------- //

SparkSession spark = SparkSession.builder()
        .appName("Spark Sql example")
        .enableHiveSupport() // required to query Hive tables
        .getOrCreate();

// read from hive table
Dataset<Row> dataSet = spark.sql("select * from USER_DB.customer").cache(); // user1 has access to USER_DB.customer

// write into CSV
dataSet.write()
        .format("csv")
        .option("sep", ",")
        .option("quote", "\"")
        .option("charToEscapeQuoteEscaping", ",")
        .option("inferSchema", "true")
        .option("header", "true")
        .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS")
        .save("/app/logs/output"); // user2 has access to /app/logs/output HDFS path

// Stopping spark session
if (spark != null) {
    spark.stop();
}

I run the above Spark job with the following command:
spark-submit --class ClassName --deploy-mode client --driver-memory 2g --executor-memory 4g --num-executors 3 --executor-cores 2 --principal "[email protected]" --keytab "user1.keytab" ClassName.jar

I need help with this.
