0 votes
1 view
in SQL by (22.4k points)

I've got two postgresql tables:

table name     column names

-----------    ------------------------

login_log      ip | etc.

ip_location    ip | location | hostname | etc.

I want to get every IP address from login_log which doesn't have a row in ip_location.

I tried this query but it throws a syntax error.

SELECT login_log.ip 

FROM login_log 

WHERE NOT EXIST (SELECT ip_location.ip

                 FROM ip_location

                 WHERE login_log.ip = ip_location.ip)

ERROR: syntax error at or near "SELECT"

LINE 3: WHERE NOT EXIST (SELECT ip_location.ip`

I'm also wondering if this query (with adjustments to make it work) is the best performing query for this purpose.

1 Answer

0 votes
by (40.3k points)

Following 4 methods to select rows that are not present in other tables, all of them are standard SQL.

1. NOT EXISTS

For more information refer to this link:

https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-EXISTS

This is the fastest in Postgres.

SELECT ip 

FROM   login_log l 

WHERE  NOT EXISTS (

   SELECT  -- SELECT list mostly irrelevant; can just be empty in Postgres

   FROM   ip_location

   WHERE  ip = l.ip

   );

In EXISTS subqueries, what is easier to read?

2. Use LEFT JOIN / IS NULL

For more information refer to this link:

https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-FROM

Sometimes this is the fastest. But, often the shortest. Most of the time results in the same query plan as NOT EXISTS.

SELECT l.ip 

FROM   login_log l 

LEFT   JOIN ip_location i USING (ip)  -- short for: ON i.ip = l.ip

WHERE  i.ip IS NULL;

Use EXCEPT

3. EXCEPT

For more information refer to this link:

https://www.postgresql.org/docs/current/queries-union.html

The below code will not be easily integrated into more complex queries.

SELECT ip 

FROM   login_log

EXCEPT ALL  -- Here, "ALL" keeps duplicates and makes it faster

SELECT ip

FROM   ip_location;

Note: Duplicates will be eliminated unless EXCEPT ALL is used.

Typically, you need the ALL keyword. Even if you don't care, use it because it makes the query faster.

4.  Use NOT IN 

For more information refer to this link:

https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-NOTIN 

It's only good without NULL values or if you know to handle NULL properly. 

SELECT ip 

FROM   login_log

WHERE  ip NOT IN (

   SELECT DISTINCT ip  -- DISTINCT is optional

   FROM   ip_location

   );

NOT IN: It carries a "trap" for NULL values on either side:

Find records where join doesn't exist. Select rows where the value of the second column is not present in the first column.

Related questions

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...