Programing

mysql에서 순차 번호 매기기의 차이를 찾는 방법은 무엇입니까?

crosscheck 2020. 8. 9. 09:47
반응형

mysql에서 순차 번호 매기기의 차이를 찾는 방법은 무엇입니까?


다른 시스템에서 값을 가져온 테이블이있는 데이터베이스가 있습니다. 자동 증가 열이 있고 중복 값이 ​​없지만 누락 된 값이 있습니다. 예를 들어 다음 쿼리를 실행합니다.

select count(id) from arrc_vouchers where id between 1 and 100

100을 반환해야하지만 대신 87을 반환합니다. 누락 된 숫자의 값을 반환하는 쿼리를 실행할 수 있습니까? 예를 들어, ID 1-70 및 83-100에 대한 레코드가있을 수 있지만 ID가 71-82 인 레코드가 없습니다. 71, 72, 73 등을 반환하고 싶습니다.

이게 가능해?


최신 정보

ConfexianMJS 는 성능 측면에서 훨씬 더 나은 답변제공했습니다 .

(가능한 한 빠르지 않은) 대답

다음은 100 행뿐 아니라 모든 크기의 테이블에서 작동하는 버전입니다.

SELECT (t1.id + 1) as gap_starts_at, 
       (SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
  • gap_starts_at -현재 간격의 첫 번째 ID
  • gap_ends_at -현재 간격의 마지막 ID

이것은 80k 행이 넘는 테이블에서 간격을 찾는 데 도움이되었습니다.

SELECT
 CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
 SELECT
  @rownum:=@rownum+1 AS expected,
  IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
 FROM
  (SELECT @rownum:=0) AS a
  JOIN YourTable
  ORDER BY YourCol
 ) AS z
WHERE z.got!=0;

결과:

+------------------+
| missing          |
+------------------+
| 1 thru 99        |
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)

열의 순서 유의 expected하고 got중요합니다.

이것이 YourCol1에서 시작하지 않고 중요하지 않다는 것을 알고 있다면

(SELECT @rownum:=0) AS a

(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a

새로운 결과 :

+------------------+
| missing          |
+------------------+
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)

누락 된 ID에 대해 일종의 셸 스크립트 작업을 수행해야하는 경우 bash에서 반복 할 수있는 표현식을 직접 생성하기 위해이 변형을 사용할 수도 있습니다.

SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM (  SELECT   @rownum:=@rownum+1 AS expected,   IF(@rownum=height, 0, @rownum:=height) AS got  FROM   (SELECT @rownum:=0) AS a   JOIN block   ORDER BY height  ) AS z WHERE z.got!=0;

이것은 다음과 같은 출력을 생성합니다.

$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)

그런 다음 bash 터미널의 for 루프에 복사하여 붙여 넣어 모든 ID에 대해 명령을 실행할 수 있습니다.

for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
  echo $ID
  # fill the gaps
done

위와 동일하지만 읽기와 실행이 모두 가능합니다. 위의 "CONCAT"명령을 변경하면 다른 프로그래밍 언어에 대한 구문을 생성 할 수 있습니다. 아니면 SQL 일 수도 있습니다.


트릭을 수행해야하는 빠르고 더러운 쿼리 :

SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM 
 (
SELECT a1.id AS a , MIN(a2.id) AS b 
FROM arrc_vouchers  AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab

WHERE 
b > a + 1

그러면 위에 누락 된 ID가있는 ID와 존재하는 next_id, 그리고 그 사이에 누락 된 항목 수가 표시되는 테이블이 표시됩니다.

 
id next_id missing_inbetween
 1 4 2
68 70 1
75 87 11

를 사용하는 MariaDB경우 시퀀스 저장소 엔진을 사용하는 더 빠른 (800 %) 옵션이 있습니다 .

SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);

100 개의 행과 1-100 값을 포함하는 단일 열이있는 임시 테이블을 만듭니다.

외부이 테이블을 arrc_vouchers 테이블에 조인하고 arrc_vouchers ID가 널인 단일 열 값을 선택하십시오.

이 블라인드를 코딩하지만 작동합니다.

select tempid from temptable 
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id 
where arrc_vouchers.id is null

쿼리 + 일부 처리를 수행하는 코드가 필요한 대체 솔루션은 다음과 같습니다.

select l.id lValue, c.id cValue, r.id rValue 
  from 
  arrc_vouchers l 
  right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
  left  join arrc_vouchers r on r.id=c.id+1
where 1=1
  and c.id > 0 
  and (l.id is null or r.id is null)
order by c.id asc;

쿼리에는 MySQL의 플래너가 성능을 발휘하지 못하는 것으로 알고있는 하위 선택이 포함되어 있지 않습니다.

That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), ie:

lValue |cValue|rValue
-------+------+-------
{null} | 2    | 3      
8      | 9    | {null} 
{null} | 22   | 23     
23     | 24   | {null} 
{null} | 29   | {null} 
{null} | 33   | {null} 


Without going into further details (we'll see them in next paragraphs) this output means that:

  • No values between 0 and 2
  • No values between 9 and 22
  • No values between 24 and 29
  • No values between 29 and 33
  • No values between 33 and MAX VALUE

So the basic idea is to do a RIGHT and LEFT joins with the same table seeing if we have adjacents values per value (ie: if central value is '3' then we check for 3-1=2 at left and 3+1 at right), and when a ROW has a NULL value at RIGHT or LEFT then we know there is no adjacent value.

The complete raw output of my table is:

select * from arrc_vouchers order by id asc;

0  
2  
3  
4  
5  
6  
7  
8  
9  
22 
23 
24 
29 
33 

Some notes:

  1. The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, therefore it will not allow you to decrease it under zero. This is not strictly necessary if you keep the c.value > 0 as it's stated in the next note, but I'm including it just as doc.
  2. I'm filtering the zero central value as we are not interested in any previous value and we can derive the post value from the next row.

based on the answer given above by Lucek this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables &/or columns in a stored procedure.

create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);

set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);

set @strsql=concat("select 
    ( t1.",@col," + 1 ) as starts_at, 
  ( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
    from ",@tbl," t1
        where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
        having ends_at is not null");

prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end

If there is a sequence having gap of maximum one between two numbers (like 1,3,5,6) then the query that can be used is:

select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
  • table_name - source1
  • column_name - id

Although these all seem to work, the result set returns in a very lengthy time when there are 50,000 records.

I used this, and it find the gap or the next available (last used + 1) with a much faster return from the query.

SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;

I tried it in different manners and the best performance that I found was this simple query:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;

... one left join to check if the next id exists, only if next if is not found, then the subquery find the next id that exists to find the end of gap. I did it because query with equal (=) is better performance than greater than (>) operator.

Using the sqlfiddle it not show so different performance of others query but in a real database this query above result 3 times more fast than others.

The schema:

CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;

Follow bellow all query that I made to compare the performance:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;
select *, (gapEnd-gapIni) qt
    from (
        select id+1 gapIni
        ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
        from arrc_vouchers a
        order by id
    ) a where gapEnd <> gapIni
;
select id+1 gapIni
    ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
    #,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
    from arrc_vouchers a
    order by id
;

Maybe it helps someone and useful.

You can see and test my query using this sqlfiddle:

http://sqlfiddle.com/#!9/6bdca7/1


Probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post, that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e. next available number), and this seems to work fine.

SELECT MIN(l.number_sequence + 1) as nextavabile from patients as l LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence WHERE r.number_sequence is NULL. Several other scenarios and solutions discussed there, from 2005 !

How to Find Missing Values in a Sequence With SQL

참고URL : https://stackoverflow.com/questions/4340793/how-to-find-gaps-in-sequential-numbering-in-mysql

반응형