# Introduction
INNER JOIN and LEFT JOIN handle most SQL queries. A smaller class of problems needs different join types: counting set-returning function results row by row, filtering rows by existence in another table, and returning rows that have no match in another table.
Three less-common joins handle these cleanly. LATERAL joins let a subquery in the FROM clause reference columns from earlier in the same FROM clause. Semi joins return rows where a match exists in another table, without duplicating those rows. Anti joins return rows where no match exists.
Let's explore how to apply these patterns in practice.

# LATERAL Joins
A LATERAL subquery in the FROM clause can reference columns from preceding tables in the same FROM clause. Without LATERAL, a subquery in FROM is evaluated independently and cannot see those columns.
This matters most when calling a set-returning function (one that returns multiple rows per input). Set-returning functions can be called in the SELECT list, but placing one in the FROM clause and applying it row by row to a column from an earlier table is what LATERAL is for. In PostgreSQL, a function call in FROM can already refer to earlier FROM items, so the LATERAL keyword is optional there and mainly makes the intent explicit; for a subquery in FROM it is required.
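Conceptually, a LATERAL join behaves like a nested loop in which the inner expression is re-run once per outer row, with access to that row's columns. The Python sketch below models `FROM orders, LATERAL unnest(orders.tags)` over a hypothetical `orders` table with a hypothetical `tags` array column; it illustrates the semantics only, not how PostgreSQL actually executes the join.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical table: each row has an id and an array-valued column.
orders = [
    {"order_id": 1, "tags": ["gift", "rush"]},
    {"order_id": 2, "tags": []},
    {"order_id": 3, "tags": ["rush"]},
]

def unnest(array: list) -> Iterator:
    """Set-returning function: yields one output row per array element."""
    yield from array

def lateral_join(outer_rows: Iterable[dict],
                 srf: Callable[[dict], Iterable]) -> list:
    """Model of `FROM outer, LATERAL srf(outer.col)`.

    The set-returning function is re-evaluated for every outer row, using that
    row's values, and each output row is paired with the outer row's id.
    An outer row whose call yields zero rows contributes nothing, as in the
    comma / CROSS JOIN LATERAL form.
    """
    joined = []
    for row in outer_rows:
        for value in srf(row):
            joined.append((row["order_id"], value))
    return joined

pairs = lateral_join(orders, lambda row: unnest(row["tags"]))
print(pairs)  # [(1, 'gift'), (1, 'rush'), (3, 'rush')]
```

Order 2 has an empty array, so it produces no output rows. In PostgreSQL, writing `LEFT JOIN LATERAL ... ON true` instead would keep such rows, with NULL in the function's output column.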
Common cases:
- Calling `unnest()` on an array column to get one row per array element
- Calling `regexp_matches()` with the `'g'` flag to extract every match per row
- Computing a top-N-per-group result with a correlated subquery in FROM
- Splitting JSON arrays per row
// Example: Counting Word Occurrences
This Google question asks us to count how many times the words “bull” and “bear” appear in a contents column. Matches must be case-insensitive, and substrings like bullish or bearing should be excluded.
Data: the google_file_store table is:

| filename | contents |
|---|---|
| draft1.txt | The stock exchange predicts a bull market which might make many investors happy. |
| draft2.txt | The stock exchange predicts a bull market… but analysts warn… we’re awaiting a bear market. |
| final.txt | The stock exchange predicts a bull market… a bear market. As always predicting the future market is uncertain… |

Code: regexp_matches() returns one row per match. To run it once per row of google_file_store and count all matches across the table, we put it in the FROM clause with LATERAL. The `\m` and `\M` anchors are PostgreSQL's start-of-word and end-of-word constraints, which is what excludes “bullish” and “bearing”.

```sql
SELECT 'bull' AS word,
       COUNT(*) AS nentry
FROM google_file_store,
     LATERAL regexp_matches(LOWER(contents), '\m(bull)\M', 'g')
UNION ALL
SELECT 'bear' AS word,
       COUNT(*) AS nentry
FROM google_file_store,
     LATERAL regexp_matches(LOWER(contents), '\m(bear)\M', 'g');
```

// Output

| word | nentry |
|---|---|
| bull | 3 |
| bear | 2 |
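
The row-by-row expansion and the whole-word filtering can be approximated outside the database. The sketch below is a rough Python stand-in, assuming hypothetical sample rows (not the google_file_store data) and using Python's `\b` word boundary in place of PostgreSQL's `\m` and `\M`, which behaves the same here because both words begin and end with letters.

```python
import re

# Hypothetical rows standing in for google_file_store(filename, contents).
rows = [
    ("a.txt", "A Bull market, not a bullish one. Bull!"),
    ("b.txt", "Bear with me: bear market, bull run, bearing down."),
    ("c.txt", "No animals here."),
]

def regexp_matches_g(text, word):
    r"""Rough stand-in for regexp_matches(text, '\m(word)\M', 'g').

    Yields one captured group per whole-word match. Python's \b is used in
    place of PostgreSQL's \m (start of word) and \M (end of word).
    """
    for m in re.finditer(rf"\b({re.escape(word)})\b", text):
        yield m.group(1)

def count_word(rows, word):
    # FROM rows, LATERAL regexp_matches(LOWER(contents), ...): one joined row
    # per match; a row with no matches contributes zero joined rows.
    joined = [
        (filename, match)
        for filename, contents in rows
        for match in regexp_matches_g(contents.lower(), word)
    ]
    return len(joined)  # COUNT(*)

for word in ("bull", "bear"):
    print(word, count_word(rows, word))  # bull 3, then bear 2
```

Lowercasing before matching plays the role of LOWER(contents), and “bullish” and “bearing” are skipped because no word boundary follows “bull” or “bear” inside them.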
# Semi Joins
A semi join returns rows from the left table where at least one match exists in the right table, with each left-table row appearing at most once. INNER JOIN duplicates left-table rows when the right side has multiple matches. Semi joins don't.
Two SQL implementations:
- `WHERE EXISTS (SELECT 1 FROM ...)`
- `WHERE col IN (SELECT col FROM ...)`
EXISTS is the more general form because it handles multi-column join conditions and correlated subqueries without rewriting the query.
// Example: Finding High-Value Customers
This question asks us to find customers who have placed at least one order over $100 and return their customer ID and name.
Data: Previews of online_store_customers and online_store_orders:

| customer_id | customer_name |
|---|---|
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carol Williams |
| … | … |
| 10 | Jack Anderson |

| order_id | customer_id | amount | status |
|---|---|---|---|
| 101 | 1 | 150 | paid |
| 102 | 1 | 200 | paid |
| 103 | 1 | 75 | paid |
| … | … | … | … |
| 115 | 9 | 450 | paid |

Code: The EXISTS subquery checks, per customer, whether any order over $100 exists. SELECT 1 is the convention because EXISTS only cares whether any row comes back, not what is in it.

```sql
SELECT
    c.customer_id,
    c.customer_name
FROM online_store_customers c
WHERE EXISTS (
    SELECT 1
    FROM online_store_orders o
    WHERE o.customer_id = c.customer_id
      AND o.amount > 100
);
```

If we used INNER JOIN instead, customer 1 would appear twice in the result because two of their orders match. EXISTS returns customer 1 once.
// Output
| customer_id | customer_name |
|---|---|
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carol Williams |
| … | … |
| 9 | Ivy Taylor |
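
To make the duplication point concrete, the sketch below loads only the preview rows shown above (customers 1 to 3 and orders 101 to 103) into an in-memory SQLite database through Python's standard sqlite3 module, then runs the INNER JOIN, EXISTS, and IN versions side by side. It assumes the order-value column is named `amount`; because only preview rows are loaded, the result is smaller than the full output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE online_store_customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE online_store_orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount NUMERIC, status TEXT);
-- Preview rows only; the full tables contain more data.
INSERT INTO online_store_customers VALUES
    (1, 'Alice Johnson'), (2, 'Bob Smith'), (3, 'Carol Williams');
INSERT INTO online_store_orders VALUES
    (101, 1, 150, 'paid'), (102, 1, 200, 'paid'), (103, 1, 75, 'paid');
""")

# INNER JOIN: one output row per matching order, so customer 1 appears twice.
inner_join = conn.execute("""
    SELECT c.customer_id, c.customer_name
    FROM online_store_customers c
    JOIN online_store_orders o ON o.customer_id = c.customer_id
    WHERE o.amount > 100
    ORDER BY c.customer_id
""").fetchall()

# Semi join via EXISTS: each customer appears at most once.
semi_join = conn.execute("""
    SELECT c.customer_id, c.customer_name
    FROM online_store_customers c
    WHERE EXISTS (
        SELECT 1 FROM online_store_orders o
        WHERE o.customer_id = c.customer_id AND o.amount > 100
    )
    ORDER BY c.customer_id
""").fetchall()

# Semi join via IN: same rows as EXISTS for this single-column condition.
in_form = conn.execute("""
    SELECT customer_id, customer_name
    FROM online_store_customers
    WHERE customer_id IN (
        SELECT customer_id FROM online_store_orders WHERE amount > 100
    )
    ORDER BY customer_id
""").fetchall()

print(inner_join)  # [(1, 'Alice Johnson'), (1, 'Alice Johnson')]
print(semi_join)   # [(1, 'Alice Johnson')]
print(in_form)     # [(1, 'Alice Johnson')]
```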
# Anti Joins
An anti join returns rows from the left table where no match exists in the right table. It is the inverse of a semi join.
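Semi and anti joins split the left table into two disjoint parts: rows with at least one match and rows with none. The plain-Python sketch below makes that concrete with hypothetical `customers` and `orders` lists. It follows NOT EXISTS semantics, where a NULL (here `None`) key never equals anything and therefore lands on the anti-join side.

```python
def _right_keys(right, key):
    # SQL equality never matches NULL, so NULL keys on the right are dropped.
    return {key(r) for r in right if key(r) is not None}

def semi_join(left, right, left_key, right_key):
    """Rows of `left` with at least one match in `right`, each at most once."""
    keys = _right_keys(right, right_key)
    return [row for row in left if left_key(row) in keys]

def anti_join(left, right, left_key, right_key):
    """Rows of `left` with no match in `right` (NOT EXISTS semantics)."""
    keys = _right_keys(right, right_key)
    return [row for row in left if left_key(row) not in keys]

# Hypothetical data: (customer_id, name) and (order_id, customer_id).
customers = [(1, "Alice"), (2, "Bob"), (3, "Carol"), (None, "Unknown")]
orders = [(101, 1), (102, 1), (103, 3), (104, None)]

matched = semi_join(customers, orders, lambda c: c[0], lambda o: o[1])
unmatched = anti_join(customers, orders, lambda c: c[0], lambda o: o[1])
print(matched)    # [(1, 'Alice'), (3, 'Carol')]
print(unmatched)  # [(2, 'Bob'), (None, 'Unknown')]
```

Alice has two orders but appears once in `matched`, and every customer lands in exactly one of the two lists.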
Two SQL implementations:
- `LEFT JOIN ... WHERE right_table.col IS NULL`
- `WHERE NOT EXISTS (SELECT 1 FROM ...)`
Both produce the same result. NOT EXISTS states the intent directly, and modern PostgreSQL typically plans both forms as the same anti join, so the choice is largely about readability. The LEFT JOIN + IS NULL pattern is older and still common, especially when a query is already built around a LEFT JOIN.
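A quick way to check that the two forms agree is to run both on the same data. This sketch uses Python's built-in sqlite3 module (so SQLite rather than PostgreSQL) with hypothetical `users` and `calls` tables, including a NULL key on each side to show that both forms treat it the same way. The IS NULL test is applied to the join column, which is never NULL in a matched row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER);
CREATE TABLE calls (call_id INTEGER, user_id INTEGER);
INSERT INTO users VALUES (1), (2), (3), (NULL);
INSERT INTO calls VALUES (10, 1), (11, 1), (12, 3), (13, NULL);
""")

# Anti join written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT u.user_id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM calls c WHERE c.user_id = u.user_id)
    ORDER BY u.user_id
""").fetchall()

# Anti join written as LEFT JOIN + IS NULL on the join column.
left_join_null = conn.execute("""
    SELECT u.user_id FROM users u
    LEFT JOIN calls c ON c.user_id = u.user_id
    WHERE c.user_id IS NULL
    ORDER BY u.user_id
""").fetchall()

print(not_exists)      # [(None,), (2,)]
print(left_join_null)  # [(None,), (2,)]
```

User 1 has two calls, yet the LEFT JOIN form does not duplicate anything here: matched rows are exactly the ones IS NULL filters out.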
// Example: Free Users With No April Calls
This question asks us to return free users who did not make any calls in April 2020.
Data: Previews of rc_calls and rc_users:

| user_id | call_id | call_date |
|---|---|---|
| 1218 | 0 | 2020-04-19 01:06:00 |
| 1554 | 1 | 2020-03-01 16:51:00 |
| 1857 | 2 | 2020-03-29 07:06:00 |
| 1525 | 3 | 2020-03-07 02:01:00 |
| … | … | … |
| 1910 | 39 | 2020-03-11 08:33:00 |

| user_id | status | company_id |
|---|---|---|
| 1218 | free | 1 |
| 1554 | inactive | 1 |
| 1857 | free | 2 |
| … | … | … |
| 1884 | free | 1 |

Code: The date filter sits in the ON clause, not WHERE. That placement is what makes this an anti join. Putting the date filter in WHERE would drop the rows where the LEFT JOIN produced NULLs, collapsing it back into an INNER JOIN, and combined with the IS NULL check it would return no rows at all. With the filter in ON, free users with no qualifying April call still produce a row, with NULLs on the right side, and the IS NULL check keeps only those rows. The range is written as half-open (on or after April 1, before May 1) because call_date includes a time of day: BETWEEN '2020-04-01' AND '2020-04-30' would stop at midnight on April 30 and miss calls made later that day.

```sql
SELECT DISTINCT u.user_id
FROM rc_users u
LEFT JOIN rc_calls c
  ON u.user_id = c.user_id
 AND c.call_date >= '2020-04-01'
 AND c.call_date <  '2020-05-01'  -- half-open range keeps calls made later on April 30
WHERE u.status = 'free'
  AND c.user_id IS NULL;
```
// Output
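As a rough check of the query and of the ON-versus-WHERE point, the sketch below loads only the preview rows of rc_users and rc_calls shown above into an in-memory SQLite database (via Python's sqlite3 module, so SQLite rather than PostgreSQL semantics) and runs the query twice: once with the April filter in ON and once in WHERE. It assumes the status column is named `status` and stores call_date as ISO-8601 text, so the half-open range compares correctly as strings. On the full tables the result will contain more users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rc_users (user_id INTEGER, status TEXT, company_id INTEGER);
CREATE TABLE rc_calls (user_id INTEGER, call_id INTEGER, call_date TEXT);
-- Preview rows only; the real tables contain more data.
INSERT INTO rc_users VALUES
    (1218, 'free', 1), (1554, 'inactive', 1), (1857, 'free', 2), (1884, 'free', 1);
INSERT INTO rc_calls VALUES
    (1218, 0, '2020-04-19 01:06:00'), (1554, 1, '2020-03-01 16:51:00'),
    (1857, 2, '2020-03-29 07:06:00'), (1525, 3, '2020-03-07 02:01:00'),
    (1910, 39, '2020-03-11 08:33:00');
""")

# Date filter in ON: unmatched free users survive with NULLs on the right.
filter_in_on = [r[0] for r in conn.execute("""
    SELECT DISTINCT u.user_id
    FROM rc_users u
    LEFT JOIN rc_calls c
      ON u.user_id = c.user_id
     AND c.call_date >= '2020-04-01' AND c.call_date < '2020-05-01'
    WHERE u.status = 'free' AND c.user_id IS NULL
    ORDER BY u.user_id
""")]

# Date filter in WHERE: NULL-extended rows fail the date test, so nothing is left.
filter_in_where = [r[0] for r in conn.execute("""
    SELECT DISTINCT u.user_id
    FROM rc_users u
    LEFT JOIN rc_calls c ON u.user_id = c.user_id
    WHERE u.status = 'free'
      AND c.call_date >= '2020-04-01' AND c.call_date < '2020-05-01'
      AND c.user_id IS NULL
    ORDER BY u.user_id
""")]

print(filter_in_on)     # [1857, 1884]
print(filter_in_where)  # []
```

User 1218 is excluded by the ON version because of its April 19 call, while 1857 (March call only) and 1884 (no calls in the preview) remain.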
# Conclusion

These three joins solve cases where INNER JOIN and LEFT JOIN are awkward or wrong:
- LATERAL is the way to call set-returning functions row by row inside FROM.
- EXISTS gives you “rows with a match” without the duplication that INNER JOIN causes.
- NOT EXISTS or LEFT JOIN + IS NULL gives you “rows with no match” cleanly.
The pattern to remember is short. When INNER JOIN duplicates rows you don't want, use EXISTS. When you need rows that have no match, use NOT EXISTS or LEFT JOIN + IS NULL. When a subquery in FROM needs to reference columns from an outer table, add LATERAL.
Practice these on real SQL interview questions, and the syntax becomes automatic.
Nate Rosidi is a data scientist and in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
