PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

Programing

PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

crosscheck 2020. 7. 10. 07:59

PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

PostgreSQL 9.5에는 다음과 같은 UPSERT가 있습니다.

INSERT INTO chats ("user", "contact", "name") 
           VALUES ($1, $2, $3), 
                  ($2, $1, NULL) 
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;

충돌이 없으면 다음과 같이 반환됩니다.

----------
    | id |
----------
  1 | 50 |
----------
  2 | 51 |
----------

그러나 충돌이 있으면 행을 반환하지 않습니다.

----------
    | id |
----------

id충돌이 없으면 새 열 을 반환 하거나 충돌하는 id열의 기존 열 을 반환하고 싶습니다 .
이것을 할 수 있습니까? 그렇다면 어떻게?

나는 정확히 같은 문제가 있었고, 업데이트 할 것이 없어도 '아무것도하지 않음'대신 '업데이트 수행'을 사용하여 해결했습니다. 귀하의 경우 다음과 같습니다.

INSERT INTO chats ("user", "contact", "name") 
       VALUES ($1, $2, $3), 
              ($2, $1, NULL) 
ON CONFLICT("user", "contact") DO UPDATE SET name=EXCLUDED.name RETURNING id;

이 쿼리는 방금 삽입했거나 이전에 존재했던 모든 행을 반환합니다.

현재 허용 대답은 A에 대한 확인을 보인다 단일 충돌 대상, 몇 충돌, 작은 튜플없이 트리거. 그리고 무차별 강제로 동시성 문제 1 (아래 참조)을 피 합니다 . 간단한 해결책은 매력이 있으며 부작용은 덜 중요 할 수 있습니다.

그러나 다른 모든 경우에는 필요없이 동일한 행을 업데이트 하지 마십시오 . 표면에 차이가 보이지 않더라도 다양한 부작용이 있습니다 .

발사해서는 안되는 방아쇠를 발사 할 수 있습니다.
"무고한"행을 쓰기 잠금하여 동시 트랜잭션에 비용이 발생할 수 있습니다.
오래되었지만 행이 새로운 것처럼 보일 수 있습니다 (트랜잭션 타임 스탬프).
가장 중요한 것은 , 함께 PostgreSQL을의 MVCC 모델 에 새로운 행 버전은 어느 쪽이든, 행 데이터가 동일 여부에 상관없이 기록됩니다. 이로 인해 UPSERT 자체에 대한 성능 저하, 테이블 팽창, 인덱스 팽창, 테이블의 모든 후속 작업에 대한 성능 저하, VACUUM비용이 발생합니다. 몇 번의 복제에는 미미한 효과가 있지만 대부분의 듀피 에는 거대 합니다.

또한 때로는 실용적이지 않거나 사용하기가 불가능합니다 ON CONFLICT DO UPDATE. 매뉴얼 :

를 들어 ON CONFLICT DO UPDATE, A를 conflict_target제공해야합니다.

빈 업데이트 나 부작용없이 거의 동일한 결과를 얻을 수 있습니다. 그리고 다음 해결책 중 일부는 발생할 수있는 모든 충돌 ON CONFLICT DO NOTHING을 포착하기 위해 "충돌 대상" 과 함께 작동 합니다. (바람직하지 않을 수도 있습니다.)

동시 쓰기로드없이

WITH input_rows(usr, contact, name) AS (
   VALUES
      (text 'foo1', text 'bar1', text 'bob1')  -- type casts in first row
    , ('foo2', 'bar2', 'bob2')
    -- more?
   )
, ins AS (
   INSERT INTO chats (usr, contact, name) 
   SELECT * FROM input_rows
   ON CONFLICT (usr, contact) DO NOTHING
   RETURNING id  --, usr, contact              -- return more columns?
   )
SELECT 'i' AS source                           -- 'i' for 'inserted'
     , id  --, usr, contact                    -- return more columns?
FROM   ins
UNION  ALL
SELECT 's' AS source                           -- 's' for 'selected'
     , c.id  --, usr, contact                  -- return more columns?
FROM   input_rows
JOIN   chats c USING (usr, contact);           -- columns of unique index

이 source열은 작동 방식을 보여주기 위해 선택 사항으로 추가되었습니다. 실제로 두 경우의 차이를 알려주기 위해 필요할 수 있습니다 (빈 쓰기보다 다른 이점).

JOIN chats연결된 데이터 수정 CTE 에서 새로 삽입 된 행 이 아직 기본 테이블에 표시되지 않기 때문에 최종 작업 이 이루어집니다. 동일한 SQL 문의 모든 부분은 동일한 기본 테이블의 스냅 샷을 참조합니다.

Since the VALUES expression is free-standing (not directly attached to an INSERT) Postgres cannot derive data types from the target columns and you may have to add explicit type casts. The manual:

When VALUES is used in INSERT, the values are all automatically coerced to the data type of the corresponding destination column. When it's used in other contexts, it might be necessary to specify the correct data type. If the entries are all quoted literal constants, coercing the first is sufficient to determine the assumed type for all.

The query itself may be a bit more expensive for few dupes, due to the overhead of the CTE and the additional SELECT (which should be cheap since the perfect index is there by definition - a unique constraint is implemented with an index).

May be (much) faster for many duplicates. The effective cost of additional writes depends on many factors.

But there are fewer side effects and hidden costs in any case. It's most probably cheaper overall.

(Attached sequences are still advanced, since default values are filled in before testing for conflicts.)

About CTEs:

With concurrent write load

Assuming default READ COMMITTED transaction isolation.

Related answer on dba.SE with detailed explanation:

Concurrent transactions result in race condition with unique constraint on insert

The best strategy to defend against race conditions depends on exact requirements, the number and size of rows in the table and in the UPSERTs, the number of concurrent transactions, the likelihood of conflicts, available resources and other factors ...

Concurrency issue 1

If a concurrent transaction has written to a row which your transaction now tries to UPSERT, your transaction has to wait for the other one to finish.

If the other transaction ends with ROLLBACK (or any error, i.e. automatic ROLLBACK), your transaction can proceed normally. Minor side effect: gaps in the sequential numbers. But no missing rows.

If the other transaction ends normally (implicit or explicit COMMIT), your INSERT will detect a conflict (the UNIQUE index / constraint is absolute) and DO NOTHING, hence also not return the row. (Also cannot lock the row as demonstrated in concurrency issue 2 below, since it's not visible.) The SELECT sees the same snapshot from the start of the query and also cannot return the yet invisible row.

Any such rows are missing from the result set (even though they exist in the underlying table)!

This may be ok as is. Especially if you are not returning rows like in the example and are satisfied knowing the row is there. If that's not good enough, there are various ways around it.

You could check the row count of the output and repeat the statement if it does not match the row count of the input. May be good enough for the rare case. The point is to start a new query (can be in the same transaction), which will then see the newly committed rows.

Or check for missing result rows within the same query and overwrite those with the brute force trick demonstrated in Alextoni's answer.

WITH input_rows(usr, contact, name) AS ( ... )  -- see above
, ins AS (
   INSERT INTO chats AS c (usr, contact, name) 
   SELECT * FROM input_rows
   ON     CONFLICT (usr, contact) DO NOTHING
   RETURNING id, usr, contact                   -- we need unique columns for later join
   )
, sel AS (
   SELECT 'i'::"char" AS source                 -- 'i' for 'inserted'
        , id, usr, contact
   FROM   ins
   UNION  ALL
   SELECT 's'::"char" AS source                 -- 's' for 'selected'
        , c.id, usr, contact
   FROM   input_rows
   JOIN   chats c USING (usr, contact)
   )
, ups AS (                                      -- RARE corner case
   INSERT INTO chats AS c (usr, contact, name)  -- another UPSERT, not just UPDATE
   SELECT i.*
   FROM   input_rows i
   LEFT   JOIN sel   s USING (usr, contact)     -- columns of unique index
   WHERE  s.usr IS NULL                         -- missing!
   ON     CONFLICT (usr, contact) DO UPDATE     -- we've asked nicely the 1st time ...
   SET    name = c.name                         -- ... this time we overwrite with old value
   -- SET name = EXCLUDED.name                  -- alternatively overwrite with *new* value
   RETURNING 'u'::"char" AS source              -- 'u' for updated
           , id  --, usr, contact               -- return more columns?
   )
SELECT source, id FROM sel
UNION  ALL
TABLE  ups;

It's like the query above, but we add one more step with the CTE ups, before we return the complete result set. That last CTE will do nothing most of the time. Only if rows go missing from the returned result, we use brute force.

More overhead, yet. The more conflicts with pre-existing rows, the more likely this will outperform the simple approach.

One side effect: the 2nd UPSERT writes rows out of order, so it re-introduces the possibility of deadlocks (see below) if three or more transactions writing to the same rows overlap. If that's a problem, you need a different solution.

Concurrency issue 2

If concurrent transactions can write to involved columns of affected rows, and you have to make sure the rows you found are still there at a later stage in the same transaction, you can lock rows cheaply with:

...
ON CONFLICT (usr, contact) DO UPDATE
SET name = name WHERE FALSE  -- never executed, but still locks the row
...

And add a locking clause to the SELECT as well, like FOR UPDATE.

This makes competing write operations wait till the end of the transaction, when all locks are released. So be brief.

More details and explanation:

Deadlocks?

Defend against deadlocks by inserting rows in consistent order. See:

Deadlock with multi-row INSERTs despite ON CONFLICT DO NOTHING

Data types and casts

Existing table as template for data types ...

Explicit type casts for the first row of data in the free-standing VALUES expression may be inconvenient. There are ways around it. You can use any existing relation (table, view, ...) as row template. The target table is the obvious choice for the use case. Input data is coerced to appropriate types automatically, like in a VALUES clause of an INSERT:

WITH input_rows AS (
  (SELECT usr, contact, name FROM chats LIMIT 0)  -- only copies column names and types
   UNION ALL
   VALUES
      ('foo1', 'bar1', 'bob1')  -- no type casts needed
    , ('foo2', 'bar2', 'bob2')
   )
   ...

This does not work for some data types (explanation in the linked answer at the bottom). The next trick works for all data types:

... and names

If you insert whole rows (all columns of the table - or at least a set of leading columns), you can omit column names, too. Assuming table chats in the example only has the 3 columns used:

WITH input_rows AS (
   SELECT * FROM (
      VALUES
      ((NULL::chats).*)         -- copies whole row definition
      ('foo1', 'bar1', 'bob1')  -- no type casts needed
    , ('foo2', 'bar2', 'bob2')
      ) sub
   OFFSET 1
   )
   ...

Detailed explanation and more alternatives:

Casting NULL type when updating multiple rows

Aside: don't use reserved words like "user" as identifier. That's a loaded footgun. Use legal, lower-case, unquoted identifiers. I replaced it with usr.

Upsert, being an extension of the INSERT query can be defined with two different behaviors in case of a constraint conflict: DO NOTHING or DO UPDATE.

INSERT INTO upsert_table VALUES (2, 6, 'upserted')
   ON CONFLICT DO NOTHING RETURNING *;

 id | sub_id | status
----+--------+--------
 (0 rows)

Note as well that RETURNING returns nothing, because no tuples have been inserted. Now with DO UPDATE, it is possible to perform operations on the tuple there is a conflict with. First note that it is important to define a constraint which will be used to define that there is a conflict.

INSERT INTO upsert_table VALUES (2, 2, 'inserted')
   ON CONFLICT ON CONSTRAINT upsert_table_sub_id_key
   DO UPDATE SET status = 'upserted' RETURNING *;

 id | sub_id |  status
----+--------+----------
  2 |      2 | upserted
(1 row)

참고URL : https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql

'Programing' 카테고리의 다른 글

Numpy array, 여러 조건을 만족하는 인덱스를 선택하는 방법은 무엇입니까? (0)	2020.07.10
customerrors =“On”인 경우 Application_Error가 발생하지 않습니다 (0)	2020.07.10
파이썬 욕심없는 정규식 (0)	2020.07.10
SQL 데이터베이스 디자인 초보자 가이드 (0)	2020.07.09
IntelliJ Idea의 프로젝트 트리에서 컴파일 오류를 즉시 보는 방법은 무엇입니까? (0)	2020.07.09

현재글PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

crosscheck

PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

동시 쓰기로드없이

With concurrent write load

Concurrency issue 1

Concurrency issue 2

Deadlocks?

Data types and casts

Existing table as template for data types ...

... and names

'Programing' 카테고리의 다른 글

'Programing'의 다른글

티스토리툴바

PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

PostgreSQL에서 ON CONFLICT와 함께 RETURNING을 사용하는 방법은 무엇입니까?

동시 쓰기로드없이

With concurrent write load

Concurrency issue 1

Concurrency issue 2

Deadlocks?

Data types and casts

Existing table as template for data types ...

... and names

'Programing' 카테고리의 다른 글

'Programing'의 다른글

관련글

티스토리툴바