Here is my scenario:
I have two users: user1, which has access to the Hive table customer, and user2, which has access to an HDFS directory but not to the Hive table. My Spark SQL program is submitted as user1 so that it can read the Hive table into a DataFrame. When it then writes that DataFrame to HDFS, the write also runs as user1, and this does not work because only user2 has permission on the HDFS directory. Is there a way to read the data as one user and write it as another?
Here is the sample code:
------------------------------------------------------
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// ---------- Using spark session ---------- //
SparkSession spark = SparkSession.builder()
        .appName("Spark Sql example")
        .master("yarn")
        .enableHiveSupport()
        .getOrCreate();

// Read from the Hive table (user1 has access to USER_DB.customer)
Dataset<Row> dataSet = spark.sql("select * from USER_DB.customer").cache();
dataSet.schema();

// Write as CSV (user2 has access to the /app/logs/output HDFS path)
dataSet.write()
        .format("csv")
        .option("sep", ",")
        .option("quote", "\"")
        .option("charToEscapeQuoteEscaping", ",")
        .option("inferSchema", "true")
        .option("header", "true")
        .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS")
        .save("/app/logs/output");

// Stop the Spark session
if (spark != null) {
    spark.stop();
}
-----------------------------------------------------
I run the Spark job above with the following command:
spark-submit --class ClassName --deploy-mode client --driver-memory 2g --executor-memory 4g --num-executors 3 --executor-cores 2 --principal "user1@DOMAIN" --keytab "user1.keytab" ClassName.jar
Any help with this would be appreciated.
Thanks