How to load table from SQLLite db file from PySpark?

Question

asked Jul 13, 2019 in Data Science by sourav (17.6k points)

I am trying to load table from a SQLLite .db file stored at local disk. Is there any clean way to do this in PySpark?

Currently, I am using a solution that works but not as elegant. First I read the table using pandas though sqlite3. One concern is that during the process schema information is not passed (may or may not be a problem). I am wondering whether there is a direct way to load the table without using Pandas.

import sqlite3
import pandas as pd
db_path = 'alocalfile.db'
query = 'SELECT * from ATableToLoad'
conn = sqlite3.connect(db_path)
a_pandas_df = pd.read_sql_query(query, conn)
a_spark_df = SQLContext.createDataFrame(a_pandas_df)
There seems a way using jdbc to do this, but I have not figure out how to use it in PySpark.

1 Answer

Shlok Pandey · Answer 1 · 2019-07-20T10:05:07+0000

1. Download the sqllite jdbc driver and provide the jar path below . https://bitbucket.org/xerial/sqlite-jdbc/downloads/sqlite-jdbc-3.8.6.jar

pyspark --conf spark.executor.extraClassPath=<jdbc.jar> --driver-class-path <jdbc.jar> --jars <jdbc.jar> --master <master-URL>

2.Then read the sqlite database file into spark dataframe.

df = sqlContext.read.format('jdbc').\
options(url='jdbc:sqlite:Chinook_Sqlite.sqlite',\
dbtable='employee',driver='org.sqlite.JDBC').load()
df.printSchema() to see your schema.

If you wanted to know What is PySpark then visit this PySpark Tutorial.

How to load table from SQLLite db file from PySpark?

1 Answer

Related questions

Browse Categories