Think About SQL MERGE in Terms of a RIGHT JOIN

0
354

A SQL MERGE statement performs actions based on a RIGHT JOIN

RIGHT JOIN is an esoteric feature in the SQL language, and hardly ever seen in the real world, because almost every RIGHT JOIN can just be expressed as an equivalent LEFT JOIN. The following two statements are equivalent:

-- Popular
SELECT c.first_name, c.last_name, p.amount
FROM customer AS c
LEFT JOIN payment AS p ON c.customer_id = p.customer_id

-- Esoteric
SELECT c.first_name, c.last_name, p.amount
FROM payment AS p
RIGHT JOIN customer AS c ON c.customer_id = p.customer_id

It’s not unreasonable to expect these two statements to produce the same execution plan on most RDBMS, given that they’re logically equivalent. Since we’ve grown used to reading things from left to right and top to bottom, I don’t think RIGHT JOIN will become more popular any time soon.

There is, however, one place in the SQL language where RIGHT JOIN is surprisingly ubiquitous!

The MERGE statement

Why is it a surprise? Because that place doesn’t use the same syntax for joining two tables. But that’s exactly what happens in the MERGE statement. Let’s look at the following MERGE statement that takes:

  • A staging table (SOURCE table) to load data from
  • An ordinary table (TARGET table) to store data into

With a schema like this:

CREATE TABLE book_to_book_store (
  book_id BIGINT NOT NULL REFERENCES book,
  name TEXT NOT NULL REFERENCES book_store,
  stock INT NOT NULL,

  PRIMARY KEY (book_id, name)
);

CREATE TABLE book_to_book_store_staging AS 
SELECT * FROM book_to_book_store
WITH NO DATA;

A query that could be typical of an ETL job:

-- The target table
MERGE INTO book_to_book_store AS t

-- The source table
USING book_to_book_store_staging AS s

-- The RIGHT JOIN predicate
ON t.book_id = s.book_id AND t.name = s.name

-- The actions for each row, based on RIGHT JOIN matching
WHEN MATCHED THEN UPDATE SET stock = s.stock
WHEN NOT MATCHED THEN INSERT (book_id, name, stock) 
VALUES (s.book_id, s.name, s.stock);

This is simply taking all the rows from the BOOK_TO_BOOK_STORE_STAGING table, and merges them into BOOK_TO_BOOK_STORE:

  • If the row already exists (there’s a MATCH), then the STOCK is updated
  • If the row doesn’t already exist (there’s no MATCH), then the row is inserted

But we don’t use this source -> target syntactic order, we first specify the target table BOOK_TO_BOOK_STORE, and then we RIGHT JOIN the BOOK_TO_BOOK_STORE_STAGING table to it. Think about it this way:

SELECT *
FROM book_to_book_store AS t
RIGHT JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.name = s.name

And, if we think of a RIGHT JOIN not as a Venn diagram, but as a cartesian product as follows, then it can be seen easily what is done per MATCH or non-MATCH:

|t.name      |t.book_id|t.stock|s.name      |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Trail |1 |1 | <-- NOT MATCHED
|Paper Trail |3 |2 |Paper Trail |3 |0 | <-- MATCHED

As always with a RIGHT JOIN, every row from right side of the join is matched with a matching row from the left side of the join, or an empty row of NULL values, if there’s no such match. After this MERGE, we want the resulting data to be updated as follows:

|t.name      |t.book_id|t.stock|s.name      |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED
|Faraway Land|1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED
|Paper Trail |3 |0 |Paper Trail |3 |0 | <-- MATCHED

This is how the MERGE statement works.

Note, I said before that the JOIN is producing a cartesian product. Unlike with SELECT statements, however, there’s a limitation to MERGE where the cartesian product must not produce any duplicate matches per TARGET row, as the order of actions wouldn’t be defined if there were multiple SOURCE rows per TARGET row.

Deleting rows

MERGE is more powerful than just performing INSERT and UPDATE. It can also DELETE rows. Let’s assume that we want a staging table’s STOCK = 0 to mean that the row should be deleted, instead of the STOCK being set to 0. Then we can write:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.name = s.name
WHEN MATCHED AND s.stock = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET stock = s.stock
WHEN NOT MATCHED THEN INSERT (book_id, name, stock) 
VALUES (s.book_id, s.name, s.stock);

Now, with the above staging data, we’ll remove the last row instead of updating it:

|t.name      |t.book_id|t.stock|s.name      |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED : INSERT
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Paper Trail |1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED : INSERT
| | | |Paper Trail |3 |0 | <-- MATCHED : DELETE

The RIGHT JOIN semantics is still the same, just the action is different now, depending on the additional AND clause of the WHEN MATCHED clause.

Matching by source

Some RDBMS support an even more powerful vendor specific variant of MERGE, which should be added to the IEC/ISO 9075 standard, in my opinion. The BY TARGET / BY SOURCE clause. Let’s have a look at the following statement:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.name = s.name
WHEN MATCHED THEN UPDATE SET stock = s.stock
WHEN NOT MATCHED BY TARGET THEN INSERT (book_id, name, stock) 
VALUES (s.book_id, s.name, s.stock)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Adding a WHEN NOT MATCHED BY SOURCE clause has the simple effect of turning the RIGHT JOIN operation into a FULL JOIN operation. Think of it this way:

SELECT *
FROM book_to_book_store AS t
FULL JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.name = s.name

Now, the result might look something like this:

|t.name      |t.book_id|t.stock|s.name      |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Trail |1 |1 | <-- NOT MATCHED BY TARGET
|Paper Trail |3 |2 | | | | <-- NOT MATCHED BY SOURCE

The terms NOT MATCHED BY TARGET and NOT MATCHED BY SOURCE are quite self-explanatory when visualised as above, and probably less confusing to beginners than LEFT JOIN and RIGHT JOIN. I wouldn’t mind SQL syntax to be enhanced in a way that it would be possible to identify whether a NULL value originating from an OUTER JOIN is due to:

  • The source data containing the NULL value
  • The row being NOT MATCHED by the “other side” of the OUTER JOIN

Imagine a hypothetical syntax like this:

SELECT c.first_name, c.last_name, p.amount
FROM customer AS c
LEFT JOIN payment AS p ON c.customer_id = p.customer_id
WHERE p IS NOT MATCHED BY JOIN -- Effectively an ANTI JOIN

Anyway…

When deleting rows, this approach is much more convenient than having to rely on an interpretation of the semantics of data, such as STOCK = 0 meaning a deletion. We now have absent rows in the SOURCE table (the staging) table, which simply mean the row must be deleted, if that’s how we want to model things. So, after running the above MERGE statement, we’ll get this outcome again:

|t.name      |t.book_id|t.stock|s.name      |s.book_id|s.stock|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET : INSERT
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Faraway Land|1 |1 |Paper Trail |1 |1 | <-- NOT MATCHED BY TARGET : INSERT
| | | | | | | <-- NOT MATCHED BY SOURCE : DELETE

At least the following RDBMS support the BY SOURCE and BY TARGET clauses:

  • Databricks
  • Firebird 5
  • PostgreSQL 17
  • SQL Server

Given how useful this is, I’ll expect more RDBMS to adopt this T-SQL syntax, soon. jOOQ 3.20 has added support for it, and a future version of jOOQ may emulate it by moving the FULL JOIN into the USING clause.

Sponzorováno
Sponzorováno
Sponzorováno
Sponzorováno
Sponzorováno
Hledat
Sponzorováno
Virtuala FansOnly
CDN FREE
Cloud Convert
Kategorie
Číst více
Art
Taylor Swifts neue Website: Ein Design, das nicht überzeugt
Taylor Swift, eine Ikone der Musikindustrie, hat kürzlich ihre neue Website gelauncht. Während...
Od Frida Lina 2025-08-23 10:05:17 1 316
Ostatní
Rising counterfeit activities accelerating demand for comprehensive brand protection solutions worldwide
Polaris Market Research has introduced the latest market research report titled Authentication...
Od MAYUR YADAV 2025-11-25 09:57:24 0 18
Art
La Legion Go 2 de Lenovo终于在IFA上首次亮相,真是让人感到无比失望。经过一年的等待,我们期待的到底是什么?难道是更强大的性能?更炫酷的设计?可事实却是,这款产品的发布似乎没有带来任何令人兴奋的创新。
## 在IFA上的首次亮相:期待与失望 Lenovo的Legion Go 2在IFA上的首次亮相,原本是万众瞩目的时刻。作为一款备受期待的游戏设备,Legion Go...
Od Peng Wan 2025-09-06 22:05:21 1 340
Dance
BSN Case Study Writing: Reliable and Accurate Help
Bachelor of Science in Nursing (BSN) programs are designed to prepare students for the dynamic...
Od Votoci4769 Votoci4769 2025-01-21 07:17:56 0 583
Ostatní
„TETRA“  Sustavi  
“Tetra” sustav  ovisi o rasprostarnjesti baznih stanica što je dobra...
Od Drago Merkaš 2025-01-07 19:26:24 0 658
Sponzorováno
Virtuala FansOnly https://virtuala.site