Programing

컴파일러는 어떻게 스스로 컴파일 할 수 있습니까?

crosscheck 2020. 5. 27. 21:41
반응형

컴파일러는 어떻게 스스로 컴파일 할 수 있습니까?


웹 사이트 http://coffeescript.org/ 에서 CoffeeScript를 연구 중이며 텍스트가 있습니다.

CoffeeScript 컴파일러 자체는 CoffeeScript로 작성되었습니다

컴파일러는 어떻게 스스로 컴파일 할 수 있습니까? 아니면이 문장은 무엇을 의미합니까?


컴파일러의 첫 번째 버전은 특정 프로그래밍 언어에서 기계로 생성 할 수 없습니다. 당신의 혼란은 이해할 수 있습니다. 더 많은 언어 기능을 가진 최신 버전의 컴파일러 (새 언어의 첫 번째 버전에서 소스가 다시 작성된)는 첫 번째 컴파일러에 의해 빌드 될 수 있습니다. 그런 다음 해당 버전은 다음 컴파일러 등을 컴파일 할 수 있습니다. 예를 들면 다음과 같습니다.

  1. 첫 번째 CoffeeScript 컴파일러는 Ruby로 작성되어 CoffeeScript 버전 1을 생성합니다.
  2. CS 컴파일러의 소스 코드는 CoffeeScript 1로 다시 작성되었습니다.
  3. 원래 CS 컴파일러는 CS 1로 작성된 새 코드를 컴파일러의 버전 2로 컴파일합니다.
  4. 새로운 언어 기능을 추가하기 위해 컴파일러 소스 코드가 변경되었습니다.
  5. 두 번째 CS 컴파일러 (CS로 작성된 첫 번째)는 수정 된 새 소스 코드를 컴파일러의 버전 3으로 컴파일합니다.
  6. 각 반복에 대해 4 단계와 5 단계를 반복하십시오.

참고 : CoffeeScript 버전의 번호가 정확히 어떻게 사용되는지 잘 모르겠습니다. 이는 단지 예일뿐입니다.

이 과정을 일반적으로 부트 스트랩 이라고 합니다. 부트 스트랩 컴파일러의 또 다른 예 rustcRust 언어 의 컴파일러입니다 .


신뢰 신뢰에 관한 논문 에서 유닉스의 창시자 중 하나 인 켄 톰슨 (Ken Thompson)은 C 컴파일러가 어떻게 컴파일되는지에 대한 흥미롭고 읽기 쉬운 개요를 작성합니다. CoffeeScript 또는 다른 언어에도 비슷한 개념을 적용 할 수 있습니다.

자체 코드를 컴파일하는 컴파일러의 아이디어는 실행시 원래 소스 코드를 출력으로 생성 하는 quine : 소스 코드 와 모호 합니다. 다음은 CoffeeScript quine의 입니다. 톰슨은 다음과 같은 C 퀴인 예제를 제공했습니다.

char s[] = {
    '\t',
    '0',
    '\n',
    '}',
    ';',
    '\n',
    '\n',
    '/',
    '*',
    '\n',
    … 213 lines omitted …
    0
};

/*
 * The string s is a representation of the body
 * of this program from '0'
 * to the end.
 */

main()
{
    int i;

    printf("char\ts[] = {\n");
    for(i = 0; s[i]; i++)
        printf("\t%d,\n", s[i]);
    printf("%s", s);
}

다음으로, 이스케이프 시퀀스와 같은 이스케이프 시퀀스가 '\n'ASCII 코드 10을 나타내는 것으로 컴파일러에게 어떻게 배울 지 궁금 할 것입니다 . 답은 C 컴파일러 어딘가에 문자 리터럴을 해석하는 루틴이 있으며, 이와 같은 조건을 포함하여 백 슬래시 시퀀스를 인식합니다.


c = next();
if (c != '\\') return c;        /* A normal character */
c = next();
if (c == '\\') return '\\';     /* Two backslashes in the code means one backslash */
if (c == 'r')  return '\r';     /* '\r' is a carriage return */

위 코드에 하나의 조건을 추가 할 수 있습니다.

if (c == 'n')  return 10;       /* '\n' is a newline */

'\n'ASCII 10 나타내는 것을 알고있는 컴파일러를 만드는 것 . 흥미롭게도, 그 컴파일러 와 그에 의해 컴파일 된 모든 후속 컴파일러는 그 매핑을 "알고"있으므로 다음 세대의 소스 코드에서 마지막 행을 다음과 같이 변경할 수 있습니다.

if (c == 'n')  return '\n';

… 그리고 옳은 일을 할 것입니다! 10컴파일러에서 온다, 더 이상 명시 적으로 컴파일러의 소스 코드에 정의 될 필요가있다. 1

That is one example of a C language feature that was implemented in C code. Now, repeat that process for every single language feature, and you have a "self-hosting" compiler: a C compiler that is written in C.


1 The plot twist described in the paper is that since the compiler can be "taught" facts like this, it can also be mis-taught to generate trojaned executables in a way that is difficult to detect, and such an act of sabotage can persist in all compilers produced by the tainted compiler.


You have already gotten a very good answer, however I want to offer you a different perspective, that will hopefully be enlightening to you. Let's first establish two facts that we can both agree on:

  1. The CoffeeScript compiler is a program which can compile programs written in CoffeeScript.
  2. The CoffeeScript compiler is a program written in CoffeeScript.

I'm sure you can agree that both #1 and #2 are true. Now, look at the two statements. Do you see now that it is completely normal for the CoffeeScript compiler to be able to compile the CoffeeScript compiler?

The compiler doesn't care what it compiles. As long as it's a program written in CoffeeScript, it can compile it. And the CoffeeScript compiler itself just happens to be such a program. The CoffeeScript compiler doesn't care that it's the CoffeeScript compiler itself it is compiling. All it sees is some CoffeeScript code. Period.

How can a compiler compile itself, or what does this statement mean?

Yes, that's exactly what that statement means, and I hope you can see now how that statement is true.


How can a compiler compile itself, or what does this statement mean?

It means exactly that. First of all, some things to consider. There are four objects we need to look at:

  • The source code of any arbitrary CoffeScript program
  • The (generated) assembly of any arbitrary CoffeScript program
  • The source code of the CoffeScript compiler
  • The (generated) assembly of the CoffeScript compiler

Now, it should be obvious that you can use the generated assembly - the executable - of the CoffeScript compiler to compile any arbitrary CoffeScript program, and generate the assembly for that program.

Now, the CoffeScript compiler itself is just an arbitrary CoffeScript program, and thus, it can be compiled by the CoffeScript compiler.

It seems that your confusion stems from the fact that when you create your own new language, you don't have a compiler yet you can use to compile your compiler. This surely looks like an chicken-egg problem, right?

Introduce the process called bootstrapping.

  1. You write a compiler in an already existing language (in case of CoffeScript, the original compiler was written in Ruby) that can compile a subset of the new language
  2. You write a compiler that can compile a subset of the new language in the new language itself. You can only use language features the compiler from the step above can compile.
  3. You use the compiler from step 1 to compile the compiler from step 2. This leaves you with an assembly that was originally written in a subset of the new language, and that is able to compile a subset of the new language.

Now you need to add new features. Say you have only implemented while-loops, but also want for-loops. This isn't a problem, since you can rewrite any for-loop in such a way that it is a while-loop. This means you can only use while-loops in the source code of your compiler, since the assembly you have at hand can only compile those. But you can create functions inside your compiler that can pase and compile for-loops with it. Then you use the assembly you already have, and compile the new compiler version. And now you have an assembly of an compiler that can also parse and compile for-loops! You can now go back to the source file of your compiler, and rewrite any while-loops you don't want into for-loops.

Rinse and repeat until all language features that are desired can be compiled with the compiler.

while and for obviously were only examples, but this works for any new language feature you want. And then you are in the situation CoffeScript is in now: The compiler compiles itself.

There is much literature out there. Reflections on Trusting Trust is a classic everyone interested in that topic should read at least once.


A small but important clarification

Here the term compiler glosses over the fact that there are two files involved. One is an executable which takes as input files written in CoffeScript and produces as its output file another executable, a linkable object file, or a shared library. The other is a CoffeeScript source file which just happens to describe the procedure for compiling CoffeeScript.

You apply the first file to the second, producing a third which is capable of performing the same act of compilation as the first (possibly more, if the second file defines features not implemented by the first), and so may replace the first if you so desire.


  1. The CoffeeScript compiler was first written in Ruby.
  2. The CoffeeScript compiler was then re-written in CoffeeScript.

Since the Ruby version of the CoffeeScript compiler already existed, it was used to create the CoffeeScript version of the CoffeeScript compiler.

enter image description here This is known as a self-hosting compiler.

It's extremely common, and usually results from an author's desire to use their own language to maintain that language's growth.


It's not a matter of compilers here, but a matter of expressiveness of the language, since a compiler is just a program written in some language.

When we say that "a language is written/implemented" we actually mean that a compiler or interpreter for that language is implemented. There are programming languages in which you can write programs that implement the language (are compilers/interpreters for the same language). These languages are called universal languages.

In order to be able to understand this, think about a metal lathe. It is a tool used to shape metal. It is possible, using just that tool, to create another, identical tool, by creating its parts. Thus, that tool is a universal machine. Of course, the first one was created using other means (other tools), and was probably of lower quality. But the first one was used to build new ones with higher precision.

A 3D printer is almost a universal machine. You can print the whole 3D printer using a 3D printer (you can't build the tip that melts the plastic).


Proof by induction

Inductive step

The n+1th version of the compiler is written in X.

Thus it can be compiled by the nth version of the compiler (also written in X).

Base case

But the first version of the compiler written in X must be compiled by a compiler for X that is written in a language other than X. This step is called bootstrapping the compiler.


Compilers take a high-level specification and turn it into a low-level implementation, such as can be executed on hardware. Therefore there is no relationship between the format of the specification and the actual execution besides the semantics of the language being targeted.

Cross-compilers move from one system to another system, cross-language compilers compile one language specification into another language specification.

Basically compiling is a just translation, and the level is usually higher-level of language to lower-level of language, but there are many variants.

Bootstrapping compilers are the most confusing, of course, because they compile the language they are written in. Don't forget the initial step in bootstrapping which requires at least a minimal existing version that is executable. Many bootstrapped compilers work on the minimal features of a programming language first and add additional complex language features going forward as long as the new feature can be expressed using the previous features. If that were not the case it would require to have that part of the "compiler" be developed in another language beforehand.

참고URL : https://stackoverflow.com/questions/38005656/how-can-a-compiler-compile-itself

반응형