Spark SQL - How can I read a Hive table as one user and write a DataFrame to HDFS as another user in a single Spark SQL program?

Question

asked Jan 6, 2021 in Big Data Hadoop & Spark by knikhil (120 points)

Here is my scenario:

I have two different users: user1, which has access to the Hive table customer, and user2, which has access to an HDFS directory but not to the Hive table. My Spark SQL program reads the table and writes it out as CSV. I submit the Spark job as user1 so that I can read the data from the Hive table into a DataFrame, but when I write that DataFrame to HDFS, the write also runs as user1. This fails because only user2 has permission on the HDFS directory. Is there any way to read data as one user and write it as another within the same program?
Here is the sample code:
------------------------------------------------------

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ClassName {
    public static void main(String[] args) {
        // ---------- Using spark session ---------- //
        SparkSession spark = SparkSession.builder()
                .appName("Spark Sql example")
                .master("yarn")
                .enableHiveSupport()
                .getOrCreate();

        // read from hive table (user1 has access to USER_DB.customer)
        Dataset<Row> dataSet = spark.sql("select * from USER_DB.customer").cache();
        dataSet.printSchema();

        // write into CSV (user2 has access to the /app/logs/output HDFS path)
        dataSet.write()
                .format("csv")
                .option("sep", ",")
                .option("quote", "\"")
                .option("charToEscapeQuoteEscaping", ",")
                .option("inferSchema", "true") // read-side option; ignored when writing
                .option("header", "true")
                .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS")
                .save("/app/logs/output");

        // Stopping spark session
        spark.stop();
    }
}

-----------------------------------------------------

Command used to run the Spark job above:
spark-submit --class ClassName --deploy-mode client --driver-memory 2g --executor-memory 4g --num-executors 3 --executor-cores 2 --principal "user1@DOMAIN" --keytab "user1.keytab" ClassName.jar
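Because the job is submitted with --principal "user1@DOMAIN" and --keytab "user1.keytab", the driver should authenticate to Hadoop as user1, and the job's HDFS accesses (including the CSV write) then run with user1's credentials. A minimal diagnostic sketch to confirm which identity the driver is acting as is to ask Hadoop's UserGroupInformation; the class name EffectiveUser is hypothetical, and this only shows the problem, it does not fix it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.hadoop.security.UserGroupInformation;

public class EffectiveUser {
    /** Returns the short name of the Hadoop user this JVM is currently acting as. */
    public static String effectiveUser() {
        try {
            return UserGroupInformation.getCurrentUser().getShortUserName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Under the spark-submit command above this is expected to print "user1".
        // Without Kerberos configured it falls back to the local OS user name.
        System.out.println("Acting as: " + effectiveUser());
    }
}
```

Calling effectiveUser() from the driver of the program above, right before the write, should report user1.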

Need help on this.
Thanks
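One possible workaround, sketched under several assumptions: let the Spark job write the CSV as user1 to a staging directory that user1 can write and user2 can read, then, in the driver, log in as user2 from a second keytab and copy the files into /app/logs/output. Hadoop's UserGroupInformation.loginUserFromKeytabAndReturnUGI returns a separate user2 identity without replacing the job's login user, and doAs runs the HDFS calls under that identity. The class name CopyAsUser2, the user2 principal, the keytab path, and any staging path are hypothetical placeholders.

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class CopyAsUser2 {
    // Hypothetical values: replace with user2's real principal and a keytab file
    // readable by the driver process (a local path, since the job uses --deploy-mode client).
    static final String USER2_PRINCIPAL = "user2@DOMAIN";
    static final String USER2_KEYTAB = "user2.keytab";

    /**
     * Copies src to dst on HDFS while authenticated as user2.
     * Runs entirely in the driver JVM, so only the driver needs user2's keytab.
     */
    public static boolean copyAsUser2(Configuration conf, String src, String dst) throws Exception {
        // Log in as user2 without replacing the process-wide login user (user1).
        UserGroupInformation user2 =
                UserGroupInformation.loginUserFromKeytabAndReturnUGI(USER2_PRINCIPAL, USER2_KEYTAB);

        return user2.doAs((PrivilegedExceptionAction<Boolean>) () -> {
            // Created inside doAs, so this FileSystem client uses user2's credentials.
            try (FileSystem fs = FileSystem.newInstance(conf)) {
                // deleteSource = false: user2 may not be allowed to delete user1's staging files.
                return FileUtil.copy(fs, new Path(src), fs, new Path(dst), false, conf);
            }
        });
    }
}
```

In the program above, the save(...) call would then target a staging path such as /tmp/user1_staging/customer_csv (hypothetical), followed by CopyAsUser2.copyAsUser2(spark.sparkContext().hadoopConfiguration(), "/tmp/user1_staging/customer_csv", "/app/logs/output"). If /app/logs/output already exists as a directory, FileUtil.copy places the source inside it, giving /app/logs/output/customer_csv. The copy streams every byte through the driver, so for large outputs a distributed copy (for example DistCp run as user2) is a better fit. Granting user1 write access to the target path (for example through an HDFS ACL) avoids the second identity altogether.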
