Think About SQL MERGE in Terms of a RIGHT JOIN

A SQL MERGE statement performs actions based on a RIGHT JOIN
RIGHT JOIN
is an esoteric feature in the SQL language, and hardly ever seen in the real world, because almost every RIGHT JOIN
can just be expressed as an equivalent LEFT JOIN
. The following two statements are equivalent:
-- Popular SELECT c.first_name, c.last_name, p.amount FROM customer AS c LEFT JOIN payment AS p ON c.customer_id = p.customer_id -- Esoteric SELECT c.first_name, c.last_name, p.amount FROM payment AS p RIGHT JOIN customer AS c ON c.customer_id = p.customer_id
It’s not unreasonable to expect these two statements to produce the same execution plan on most RDBMS, given that they’re logically equivalent. Since we’ve grown used to reading things from left to right and top to bottom, I don’t think RIGHT JOIN
will become more popular any time soon.
There is, however, one place in the SQL language where RIGHT JOIN
is surprisingly ubiquitous!
The MERGE statement
Why is it a surprise? Because that place doesn’t use the same syntax for joining two tables. But that’s exactly what happens in the MERGE
statement. Let’s look at the following MERGE
statement that takes:
- A staging table (
SOURCE
table) to load data from - An ordinary table (
TARGET
table) to store data into
With a schema like this:
CREATE TABLE book_to_book_store ( book_id BIGINT NOT NULL REFERENCES book, name TEXT NOT NULL REFERENCES book_store, stock INT NOT NULL, PRIMARY KEY (book_id, name) ); CREATE TABLE book_to_book_store_staging AS SELECT * FROM book_to_book_store WITH NO DATA;
A query that could be typical of an ETL job:
-- The target table MERGE INTO book_to_book_store AS t -- The source table USING book_to_book_store_staging AS s -- The RIGHT JOIN predicate ON t.book_id = s.book_id AND t.name = s.name -- The actions for each row, based on RIGHT JOIN matching WHEN MATCHED THEN UPDATE SET stock = s.stock WHEN NOT MATCHED THEN INSERT (book_id, name, stock) VALUES (s.book_id, s.name, s.stock);
This is simply taking all the rows from the BOOK_TO_BOOK_STORE_STAGING
table, and merges them into BOOK_TO_BOOK_STORE
:
- If the row already exists (there’s a
MATCH
), then theSTOCK
is updated - If the row doesn’t already exist (there’s no
MATCH
), then the row is inserted
But we don’t use this source -> target syntactic order, we first specify the target table BOOK_TO_BOOK_STORE
, and then we RIGHT JOIN
the BOOK_TO_BOOK_STORE_STAGING
table to it. Think about it this way:
SELECT * FROM book_to_book_store AS t RIGHT JOIN book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name
And, if we think of a RIGHT JOIN
not as a Venn diagram, but as a cartesian product as follows, then it can be seen easily what is done per MATCH
or non-MATCH
:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Trail |1 |1 | <-- NOT MATCHED
|Paper Trail |3 |2 |Paper Trail |3 |0 | <-- MATCHED
As always with a RIGHT JOIN
, every row from right side of the join is matched with a matching row from the left side of the join, or an empty row of NULL
values, if there’s no such match. After this MERGE
, we want the resulting data to be updated as follows:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED
|Faraway Land|1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED
|Paper Trail |3 |0 |Paper Trail |3 |0 | <-- MATCHED
This is how the MERGE
statement works.
Note, I said before that the JOIN
is producing a cartesian product. Unlike with SELECT
statements, however, there’s a limitation to MERGE
where the cartesian product must not produce any duplicate matches per TARGET
row, as the order of actions wouldn’t be defined if there were multiple SOURCE
rows per TARGET
row.
Deleting rows
MERGE
is more powerful than just performing INSERT
and UPDATE
. It can also DELETE
rows. Let’s assume that we want a staging table’s STOCK = 0
to mean that the row should be deleted, instead of the STOCK
being set to 0
. Then we can write:
MERGE INTO book_to_book_store AS t USING book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name WHEN MATCHED AND s.stock = 0 THEN DELETE WHEN MATCHED THEN UPDATE SET stock = s.stock WHEN NOT MATCHED THEN INSERT (book_id, name, stock) VALUES (s.book_id, s.name, s.stock);
Now, with the above staging data, we’ll remove the last row instead of updating it:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED : INSERT
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Paper Trail |1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED : INSERT
| | | |Paper Trail |3 |0 | <-- MATCHED : DELETE
The RIGHT JOIN
semantics is still the same, just the action is different now, depending on the additional AND
clause of the WHEN MATCHED
clause.
Matching by source
Some RDBMS support an even more powerful vendor specific variant of MERGE
, which should be added to the IEC/ISO 9075 standard, in my opinion. The BY TARGET
/ BY SOURCE
clause. Let’s have a look at the following statement:
MERGE INTO book_to_book_store AS t USING book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name WHEN MATCHED THEN UPDATE SET stock = s.stock WHEN NOT MATCHED BY TARGET THEN INSERT (book_id, name, stock) VALUES (s.book_id, s.name, s.stock) WHEN NOT MATCHED BY SOURCE THEN DELETE;
Adding a WHEN NOT MATCHED BY SOURCE
clause has the simple effect of turning the RIGHT JOIN
operation into a FULL JOIN
operation. Think of it this way:
SELECT * FROM book_to_book_store AS t FULL JOIN book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name
Now, the result might look something like this:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Trail |1 |1 | <-- NOT MATCHED BY TARGET
|Paper Trail |3 |2 | | | | <-- NOT MATCHED BY SOURCE
The terms NOT MATCHED BY TARGET
and NOT MATCHED BY SOURCE
are quite self-explanatory when visualised as above, and probably less confusing to beginners than LEFT JOIN
and RIGHT JOIN
. I wouldn’t mind SQL syntax to be enhanced in a way that it would be possible to identify whether a NULL
value originating from an OUTER JOIN
is due to:
- The source data containing the
NULL
value - The row being
NOT MATCHED
by the “other side” of theOUTER JOIN
Imagine a hypothetical syntax like this:
SELECT c.first_name, c.last_name, p.amount FROM customer AS c LEFT JOIN payment AS p ON c.customer_id = p.customer_id WHERE p IS NOT MATCHED BY JOIN -- Effectively an ANTI JOIN
Anyway…
When deleting rows, this approach is much more convenient than having to rely on an interpretation of the semantics of data, such as STOCK = 0
meaning a deletion. We now have absent rows in the SOURCE
table (the staging) table, which simply mean the row must be deleted, if that’s how we want to model things. So, after running the above MERGE
statement, we’ll get this outcome again:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET : INSERT
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Faraway Land|1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED BY TARGET : INSERT
| | | | | | | <-- NOT MATCHED BY SOURCE : DELETE
At least the following RDBMS support the BY SOURCE
and BY TARGET
clauses:
- Databricks
- Firebird 5
- PostgreSQL 17
- SQL Server
Given how useful this is, I’ll expect more RDBMS to adopt this T-SQL syntax, soon. jOOQ 3.20 has added support for it, and a future version of jOOQ may emulate it by moving the FULL JOIN
into the USING
clause.
- Cекретный ключ Live
- Causes
- Crafts
- Dance
- Drinks
- Film
- Fitness
- Food
- Игры
- Gardening
- Health
- Главная
- Literature
- Music
- Networking
- Другое
- Party
- Religion
- Shopping
- Sports
- Theater
- Wellness
- Art
- Life
- Coding