
How do I find duplicates across multiple columns?

crosscheck 2020. 9. 14. 07:57

So I want to do something like the SQL code below:

select s.id, s.name,s.city 
from stuff s
group by s.name having count(where city and name are identical) > 1

to produce the following (but ignore rows where only the name or only the city matches; both columns must match):

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris

Duplicated id for pairs of name and city:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
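To try the derived-table join without a database server, here is a minimal sketch that runs essentially the same query against an in-memory SQLite database from Python. The two extra rows with ids 1 and 2 are hypothetical additions, not part of the question's data; they exist only to show that a name-only or city-only match is ignored. The SQL Server style `[stuff]` brackets are dropped and an ORDER BY is added so the output is deterministic.

```python
import sqlite3

# In-memory SQLite database; the table mirrors the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO stuff (id, name, city) VALUES (?, ?, ?)",
    [
        (904834, "jim", "London"),
        (904835, "jim", "London"),
        (90145, "Fred", "Paris"),
        (90132, "Fred", "Paris"),
        (90133, "Fred", "Paris"),
        (1, "jim", "Paris"),      # hypothetical row: name matches, city does not
        (2, "Barney", "London"),  # hypothetical row: city matches, name does not
    ],
)

# The subquery keeps only (name, city) pairs that occur more than once;
# joining back to the table recovers the id of every row in such a pair.
duplicate_rows = conn.execute(
    """
    SELECT s.id, t.name, t.city, t.qty
    FROM stuff s
    JOIN (
        SELECT name, city, COUNT(*) AS qty
        FROM stuff
        GROUP BY name, city
        HAVING COUNT(*) > 1
    ) t ON s.name = t.name AND s.city = t.city
    ORDER BY t.name, t.city, s.id
    """
).fetchall()

for row in duplicate_rows:
    print(row)
```

Only the five jim/London and Fred/Paris rows come back, each with the size of its duplicate group in the last column.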

 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1

Something like this will do the trick. I don't know how it performs, so be sure to run some tests:

select
  id, name, city
from
  [stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)

Using count(*) over(partition by...) provides a simple and efficient way to locate unwanted repetition while also listing every affected row and all the columns you want:

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

Most recent RDBMS versions support count(*) over(partition by...); MySQL only introduced "window functions" in version 8.0, as shown below (in MySQL 8.0):

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);
INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;
SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2


Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.


You have to self-join stuff, matching on both name and city, then group and keep the rows that have more than one match.

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.name AND s.city = p.city
)
group by s.id, s.name, s.city having count(*) > 1

A little late to the game on this post, but I found this way to be pretty flexible / efficient

select 
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city
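One thing to watch with a self-join of this shape: each row is emitted once per matching partner, so a group of three duplicates yields every row twice. The sketch below, again using an in-memory SQLite database from Python as an assumed test bed, compares the plain self-join with a SELECT DISTINCT variant. The explicit JOIN ... ON syntax and the ORDER BY are my substitutions for readability and stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO stuff (id, name, city) VALUES (?, ?, ?)",
    [
        (904834, "jim", "London"),
        (904835, "jim", "London"),
        (90145, "Fred", "Paris"),
        (90132, "Fred", "Paris"),
        (90133, "Fred", "Paris"),
        (923457, "Barney", "New York"),  # unique pair, should never appear
    ],
)

# Shared self-join: pair every row with each *other* row that has
# the same name and the same city.
join_clause = """
    FROM stuff s1
    JOIN stuff s2
      ON s1.id <> s2.id
     AND s1.name = s2.name
     AND s1.city = s2.city
    ORDER BY s1.id
"""

plain_rows = conn.execute(
    "SELECT s1.id, s1.name, s1.city" + join_clause
).fetchall()
distinct_rows = conn.execute(
    "SELECT DISTINCT s1.id, s1.name, s1.city" + join_clause
).fetchall()

print(len(plain_rows))     # 2 jim rows x 1 partner + 3 Fred rows x 2 partners
print(len(distinct_rows))  # one row per duplicated id
```

With this sample data the plain join returns 8 rows and the DISTINCT variant returns 5, so DISTINCT (or a GROUP BY on the selected columns) is worth adding when each duplicated row should be listed once.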

Given a staging table with 70 columns, only 4 of which define a duplicate, this code returns each duplicated combination of those 4 values along with its row count:

SELECT 
    COUNT(*) AS qty
    ,LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(S.GrossCost)) 
FROM Staging.dbo.Stage S
GROUP BY 
    LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(S.GrossCost)) 
HAVING COUNT(*) > 1
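The LTRIM(RTRIM(...)) wrappers matter when staged values carry stray leading or trailing spaces, because the untrimmed strings would otherwise land in different groups. A small sketch of that effect, assuming an in-memory SQLite table named Stage with text columns and made-up sample rows (the data, the column types, and the tx_date style aliases are all hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Stage ("
    "TransactionDate TEXT, TransactionTime TEXT, "
    "TransactionTicketNumber TEXT, GrossCost TEXT)"
)
conn.executemany(
    "INSERT INTO Stage VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01", "10:00", "T1", "5.00"),
        (" 2020-01-01", "10:00 ", "T1", "5.00"),  # same values plus stray spaces
        ("2020-01-01", "10:00", "T2", "5.00"),    # different ticket number
    ],
)

# Grouping on the raw strings: the padded row forms its own group.
raw_duplicates = conn.execute(
    """
    SELECT COUNT(*) AS qty, TransactionDate, TransactionTime,
           TransactionTicketNumber, GrossCost
    FROM Stage
    GROUP BY TransactionDate, TransactionTime,
             TransactionTicketNumber, GrossCost
    HAVING COUNT(*) > 1
    """
).fetchall()

# Grouping on trimmed strings: the padded row joins the clean one.
trimmed_duplicates = conn.execute(
    """
    SELECT COUNT(*) AS qty,
           LTRIM(RTRIM(TransactionDate)) AS tx_date,
           LTRIM(RTRIM(TransactionTime)) AS tx_time,
           LTRIM(RTRIM(TransactionTicketNumber)) AS ticket,
           LTRIM(RTRIM(GrossCost)) AS cost
    FROM Stage
    GROUP BY LTRIM(RTRIM(TransactionDate)),
             LTRIM(RTRIM(TransactionTime)),
             LTRIM(RTRIM(TransactionTicketNumber)),
             LTRIM(RTRIM(GrossCost))
    HAVING COUNT(*) > 1
    """
).fetchall()

print(raw_duplicates)
print(trimmed_duplicates)
```

On this sample the raw grouping finds no duplicates, while the trimmed grouping reports one group of two rows.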


Reference URL: https://stackoverflow.com/questions/8149210/how-do-i-find-duplicates-across-multiple-columns
