Think About SQL MERGE in Terms of a RIGHT JOIN
A SQL MERGE statement performs actions based on a RIGHT JOIN
RIGHT JOIN is an esoteric feature in the SQL language, and hardly ever seen in the real world, because almost every RIGHT JOIN can just be expressed as an equivalent LEFT JOIN. The following two statements are equivalent:
-- Popular SELECT c.first_name, c.last_name, p.amount FROM customer AS c LEFT JOIN payment AS p ON c.customer_id = p.customer_id -- Esoteric SELECT c.first_name, c.last_name, p.amount FROM payment AS p RIGHT JOIN customer AS c ON c.customer_id = p.customer_id
It’s not unreasonable to expect these two statements to produce the same execution plan on most RDBMS, given that they’re logically equivalent. Since we’ve grown used to reading things from left to right and top to bottom, I don’t think RIGHT JOIN will become more popular any time soon.
There is, however, one place in the SQL language where RIGHT JOIN is surprisingly ubiquitous!
The MERGE statement
Why is it a surprise? Because that place doesn’t use the same syntax for joining two tables. But that’s exactly what happens in the MERGE statement. Let’s look at the following MERGE statement that takes:
- A staging table (
SOURCEtable) to load data from - An ordinary table (
TARGETtable) to store data into
With a schema like this:
CREATE TABLE book_to_book_store ( book_id BIGINT NOT NULL REFERENCES book, name TEXT NOT NULL REFERENCES book_store, stock INT NOT NULL, PRIMARY KEY (book_id, name) ); CREATE TABLE book_to_book_store_staging AS SELECT * FROM book_to_book_store WITH NO DATA;
A query that could be typical of an ETL job:
-- The target table MERGE INTO book_to_book_store AS t -- The source table USING book_to_book_store_staging AS s -- The RIGHT JOIN predicate ON t.book_id = s.book_id AND t.name = s.name -- The actions for each row, based on RIGHT JOIN matching WHEN MATCHED THEN UPDATE SET stock = s.stock WHEN NOT MATCHED THEN INSERT (book_id, name, stock) VALUES (s.book_id, s.name, s.stock);
This is simply taking all the rows from the BOOK_TO_BOOK_STORE_STAGING table, and merges them into BOOK_TO_BOOK_STORE:
- If the row already exists (there’s a
MATCH), then theSTOCKis updated - If the row doesn’t already exist (there’s no
MATCH), then the row is inserted
But we don’t use this source -> target syntactic order, we first specify the target table BOOK_TO_BOOK_STORE, and then we RIGHT JOIN the BOOK_TO_BOOK_STORE_STAGING table to it. Think about it this way:
SELECT * FROM book_to_book_store AS t RIGHT JOIN book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name
And, if we think of a RIGHT JOIN not as a Venn diagram, but as a cartesian product as follows, then it can be seen easily what is done per MATCH or non-MATCH:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Trail |1 |1 | <-- NOT MATCHED
|Paper Trail |3 |2 |Paper Trail |3 |0 | <-- MATCHED
As always with a RIGHT JOIN, every row from right side of the join is matched with a matching row from the left side of the join, or an empty row of NULL values, if there’s no such match. After this MERGE, we want the resulting data to be updated as follows:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED
|Faraway Land|1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED
|Paper Trail |3 |0 |Paper Trail |3 |0 | <-- MATCHED
This is how the MERGE statement works.
Note, I said before that the JOIN is producing a cartesian product. Unlike with SELECT statements, however, there’s a limitation to MERGE where the cartesian product must not produce any duplicate matches per TARGET row, as the order of actions wouldn’t be defined if there were multiple SOURCE rows per TARGET row.
Deleting rows
MERGE is more powerful than just performing INSERT and UPDATE. It can also DELETE rows. Let’s assume that we want a staging table’s STOCK = 0 to mean that the row should be deleted, instead of the STOCK being set to 0. Then we can write:
MERGE INTO book_to_book_store AS t USING book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name WHEN MATCHED AND s.stock = 0 THEN DELETE WHEN MATCHED THEN UPDATE SET stock = s.stock WHEN NOT MATCHED THEN INSERT (book_id, name, stock) VALUES (s.book_id, s.name, s.stock);
Now, with the above staging data, we’ll remove the last row instead of updating it:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED : INSERT
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Paper Trail |1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED : INSERT
| | | |Paper Trail |3 |0 | <-- MATCHED : DELETE
The RIGHT JOIN semantics is still the same, just the action is different now, depending on the additional AND clause of the WHEN MATCHED clause.
Matching by source
Some RDBMS support an even more powerful vendor specific variant of MERGE, which should be added to the IEC/ISO 9075 standard, in my opinion. The BY TARGET / BY SOURCE clause. Let’s have a look at the following statement:
MERGE INTO book_to_book_store AS t USING book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name WHEN MATCHED THEN UPDATE SET stock = s.stock WHEN NOT MATCHED BY TARGET THEN INSERT (book_id, name, stock) VALUES (s.book_id, s.name, s.stock) WHEN NOT MATCHED BY SOURCE THEN DELETE;
Adding a WHEN NOT MATCHED BY SOURCE clause has the simple effect of turning the RIGHT JOIN operation into a FULL JOIN operation. Think of it this way:
SELECT * FROM book_to_book_store AS t FULL JOIN book_to_book_store_staging AS s ON t.book_id = s.book_id AND t.name = s.name
Now, the result might look something like this:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Trail |1 |1 | <-- NOT MATCHED BY TARGET
|Paper Trail |3 |2 | | | | <-- NOT MATCHED BY SOURCE
The terms NOT MATCHED BY TARGET and NOT MATCHED BY SOURCE are quite self-explanatory when visualised as above, and probably less confusing to beginners than LEFT JOIN and RIGHT JOIN. I wouldn’t mind SQL syntax to be enhanced in a way that it would be possible to identify whether a NULL value originating from an OUTER JOIN is due to:
- The source data containing the
NULLvalue - The row being
NOT MATCHEDby the “other side” of theOUTER JOIN
Imagine a hypothetical syntax like this:
SELECT c.first_name, c.last_name, p.amount FROM customer AS c LEFT JOIN payment AS p ON c.customer_id = p.customer_id WHERE p IS NOT MATCHED BY JOIN -- Effectively an ANTI JOIN
Anyway…
When deleting rows, this approach is much more convenient than having to rely on an interpretation of the semantics of data, such as STOCK = 0 meaning a deletion. We now have absent rows in the SOURCE table (the staging) table, which simply mean the row must be deleted, if that’s how we want to model things. So, after running the above MERGE statement, we’ll get this outcome again:
|t.name |t.book_id|t.stock|s.name |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET : INSERT
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Faraway Land|1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED BY TARGET : INSERT
| | | | | | | <-- NOT MATCHED BY SOURCE : DELETE
At least the following RDBMS support the BY SOURCE and BY TARGET clauses:
- Databricks
- Firebird 5
- PostgreSQL 17
- SQL Server
Given how useful this is, I’ll expect more RDBMS to adopt this T-SQL syntax, soon. jOOQ 3.20 has added support for it, and a future version of jOOQ may emulate it by moving the FULL JOIN into the USING clause.
- Live Stream
- Causes
- Crafts
- Dance
- Drinks
- Film
- Fitness
- Food
- Games
- Gardening
- Health
- Home
- Literature
- Music
- Networking
- Other
- Party
- Religion
- Shopping
- Sports
- Theater
- Wellness
- Art
- Life
- Coding