Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Python by (19.9k points)

I want to check what values from a pandas dataframe are not in a SQL database. So basically a left join(left being pandas df) where the right(SQL DB) key is NULL.

The DB is quite big, about 5 million entries, but i'm only interested in the primary key.

Pandas dataframe(50k rows) is much smaller than the SQL DB(5M), so i'd rather move data to the batabase that bring all of it

I've thought about creating a temporary table in SQL and making a LEFT JOIN, but it might be possible to do it with just a query

pandas dataframe:

    index   

    0        

    1  

    2  

    3  

    4

sql database:

    index(primary key)     

    1        

    2        

    3  

result: 0, 4

1 Answer

0 votes
by (25.1k points)

Just create a temporary table out of the dataframe and then find it using the following SQL query:

SELECT P.Index 

FROM PandasTable as P 

WHERE P.Index NOT IN 

    (SELECT B.Index FROM DatabaseTable)

Browse Categories

...