Programing

GCC는 루프 내에서 증가하는 사용되지 않는 변수를 어떻게 최적화합니까?

crosscheck 2020. 11. 21. 15:39
반응형

GCC는 루프 내에서 증가하는 사용되지 않는 변수를 어떻게 최적화합니까?


이 간단한 C 프로그램을 작성했습니다.

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

gcc 컴파일러가이 루프를 최적화하는 방법을보고 싶었습니다 (분명히 1을 2000000000 번 추가하면 " 한 번 2000000000을 추가"해야합니다 ). 그래서:

gcc test.c 다음 timeon a.out다음 제공합니다.

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s  

$ gcc -O2 test.c 다음 time ona.out`은 다음을 제공합니다.

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s  

그런 다음 gcc -S. 첫 번째는 매우 명확 해 보입니다.

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

L3은 추가하고, L2는 L3 비교 -4(%rbp)하고 .1999999999i < 2000000000

이제 최적화 된 것 :

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

나는 거기에서 무슨 일이 일어나고 있는지 전혀 이해할 수 없습니다! 나는 조립에 대한 지식이 거의 없지만 다음과 같은 것을 기대했습니다.

addl $2000000000, -8(%rbp)

나는 gcc -c -g -Wa, -a, -ad -O2 test.c 로 변환 된 어셈블리와 함께 C 코드를 보려고 시도했지만 결과는 이전 코드보다 더 명확하지 않았습니다.

누군가 간단히 설명 할 수 있습니까?

  1. GCC는 -O2 -S 출력.
  2. 루프가 예상대로 최적화 된 경우 (여러 합계 대신 하나의 합계)?

컴파일러는 그것보다 훨씬 더 똑똑합니다. :)

사실, 그것은 당신이 루프의 결과를 사용하고 있지 않다는 것을 깨닫습니다. 그래서 전체 루프를 완전히 제거했습니다!

이를 데드 코드 제거 라고 합니다.

더 나은 테스트는 결과를 인쇄하는 것입니다.

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

편집 : 나는 필수를 추가했습니다 #include <stdio.h>; MSVC 어셈블리 목록은이없는 버전에 해당 #include하지만 동일해야합니다.


I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:

EDIT : I had the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

So yes, Visual Studio does this optimization. I'd assume GCC probably does too.

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits

Compilers have a few tools at their disposal to make code more efficient or more "efficient":

  1. If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acted upon volatile values, those values must still be read but the results of the read may be ignored). If the results of the computations that fed it weren't used, the code that performs those can be omitted as well. If such omission makes the code for both paths on a conditional branch identical, the condition may be regarded as unused and omitted. This will have no effect on the behaviors (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L would call "Critical Undefined Behaviors".

  2. If the compiler determines that the machine code that computes a value can only produce results in a certain range, it may omit any conditional tests whose outcome could be predicted on that basis. As above, this will not affect behaviors other than execution time unless code invokes "Critical Undefined Behaviors".

  3. If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard would allow the compiler to omit any code which would only be relevant when such inputs are received, even if the natural behavior of the execution platform given such inputs would have been benign and the compiler's rewrite would make it dangerous.

Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.

참고URL : https://stackoverflow.com/questions/8841865/how-does-gcc-optimize-out-an-unused-variable-incremented-inside-a-loop

반응형