V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?

Programing

V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?

crosscheck 2020. 5. 30. 09:22

V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?

V8을 사용하여 Javascript 속도 제한을 깨는 슬라이드를 읽고 있으며 아래 코드와 같은 예가 있습니다. 왜이 경우 <=보다 느린 지 알 수 없습니다. <아무도 설명 할 수 있습니까? 모든 의견을 부탁드립니다.

느린:

this.isPrimeDivisible = function(candidate) {
    for (var i = 1; i <= this.prime_count; ++i) {
        if (candidate % this.primes[i] == 0) return true;
    }
    return false;
}

(힌트 : 소수는 길이 prime_count의 배열입니다)

더 빠름 :

this.isPrimeDivisible = function(candidate) {
    for (var i = 1; i < this.prime_count; ++i) {
        if (candidate % this.primes[i] == 0) return true;
    }
    return false;
}

[자세한 정보] 로컬 환경 테스트에서 속도 향상이 크게 향상되었습니다. 결과는 다음과 같습니다.

V8 version 7.3.0 (candidate)

느린:

 time d8 prime.js
 287107
 12.71 user 
 0.05 system 
 0:12.84 elapsed

더 빠름 :

time d8 prime.js
287107
1.82 user 
0.01 system 
0:01.84 elapsed

Google V8에서 일하고 있으며 기존 답변과 의견에 대한 추가 정보를 제공하고자했습니다.

참고로 슬라이드 의 전체 코드 예제는 다음과 같습니다.

var iterations = 25000;

function Primes() {
  this.prime_count = 0;
  this.primes = new Array(iterations);
  this.getPrimeCount = function() { return this.prime_count; }
  this.getPrime = function(i) { return this.primes[i]; }
  this.addPrime = function(i) {
    this.primes[this.prime_count++] = i;
  }
  this.isPrimeDivisible = function(candidate) {
    for (var i = 1; i <= this.prime_count; ++i) {
      if ((candidate % this.primes[i]) == 0) return true;
    }
    return false;
  }
};

function main() {
  var p = new Primes();
  var c = 1;
  while (p.getPrimeCount() < iterations) {
    if (!p.isPrimeDivisible(c)) {
      p.addPrime(c);
    }
    c++;
  }
  console.log(p.getPrime(p.getPrimeCount() - 1));
}

main();

무엇보다도 성능 차이는 <and <=연산자와 직접 관련 이 없습니다 . 따라서 <=스택 오버플로에서 느리다는 것을 읽었으므로 코드에서 피하기 위해 농구 대를 뛰어 넘지 마십시오.

둘째, 사람들은 배열이 "홀리"라고 지적했다. 이것은 OP 게시물의 코드 스 니펫에서 명확하지 않지만 초기화하는 코드를 볼 때 분명합니다 this.primes.

this.primes = new Array(iterations);

와 배열이 결과 종류 소자 어레이가 끝나더라도는 V8에 완전히 연속 / 충전 / 충전. 일반적으로, 구멍 투성이의 배열에 대한 작업은 포장 배열에 대한 작업을보다 느린하지만,이 경우의 차이는 무시할 수있다 : 그것은 1 추가들은 SMI (금액 작은 정수 (구멍을 방지하려면)) 확인란 우리가 공격 할 때마다 내에서 루프 . 별거 아니야!HOLEYthis.primes[i]isPrimeDivisible

TL; DR 여기서 배열 HOLEY은 문제가되지 않습니다.

다른 사람들은 코드가 범위를 벗어났다고 지적했습니다. 일반적으로 배열의 길이를 넘어서 읽지 않는 것이 좋습니다 .이 경우 실제로 성능이 크게 떨어지는 것을 피했을 것입니다. 그런데 왜? V8은 성능에 약간의 영향 만 미치면서 이러한 아웃 바운드 시나리오 중 일부를 처리 할 수 있습니다. 그렇다면이 특별한 경우에 특별한 점은 무엇입니까?

아웃 오브 경계에서 결과를 읽을 수 this.primes[i]있는 undefined이 줄에 :

if ((candidate % this.primes[i]) == 0) return true;

그리고 그것은 우리를 진짜 문제로 만듭니다 : %연산자는 이제 정수가 아닌 피연산자와 함께 사용되고 있습니다!

integer % someOtherInteger매우 효율적으로 계산 될 수 있습니다. JavaScript 엔진은이 경우에 최적화 된 머신 코드를 생성 할 수 있습니다.
integer % undefined반면 에 이중으로 표시 Float64Mod되기 때문에 효율성이 떨어집니다 undefined.

코드 스 니펫은 실제로 다음 줄 에서 <=를 변경하여 향상시킬 수 있습니다 <.

for (var i = 1; i <= this.prime_count; ++i) {

... <=어떻게 든보다 우수한 연산자 <이기 때문에이 특별한 경우에 범위를 벗어난 읽기를 피하기 때문입니다.

다른 답변과 의견에 따르면 두 루프의 차이점은 첫 번째 루프가 두 번째 루프보다 하나 더 많은 반복을 실행한다는 것입니다. 이것은 사실이지만 25,000 개의 요소로 커지는 배열에서는 한 번의 반복이 약간의 차이를 만들뿐입니다. 야구장 추측으로, 우리가 자라면서 평균 길이가 12,500이라고 가정하면, 우리가 기대할 수있는 차이는 1 / 12,500 정도 또는 0.008 %에 불과합니다.

여기에서 성능 차이는 한 번의 추가 반복으로 설명되는 것보다 훨씬 크며 문제는 프레젠테이션 끝 부분에서 설명합니다.

this.primes 연속 배열이며 (모든 요소에 값이 있음) 요소는 모두 숫자입니다.

JavaScript 엔진은 이러한 배열을 숫자를 포함하지만 다른 값을 포함하거나 포함하지 않을 수있는 객체의 배열 대신 실제 숫자의 간단한 배열로 최적화 할 수 있습니다. 첫 번째 형식은 액세스 속도가 훨씬 빠릅니다. 코드가 적고 배열이 훨씬 작아 캐시에 더 적합합니다. 그러나이 최적화 된 형식을 사용하지 못하게하는 몇 가지 조건이 있습니다.

배열 요소 중 일부가 누락 된 경우가 있습니다. 예를 들면 다음과 같습니다.

let array = [];
a[0] = 10;
a[2] = 20;

이제의 가치는 a[1]무엇입니까? 그것은 값이 없습니다 . (값이 있다고 말하는 것은 옳지 않습니다 undefined- 값을 포함 undefined하는 배열 요소가 완전히 누락 된 배열 요소와 다릅니다.)

숫자로만 표현할 수있는 방법이 없으므로 JavaScript 엔진은 덜 최적화 된 형식을 사용해야합니다. a[1]다른 두 요소와 같은 숫자 값 이 포함 된 경우 배열은 숫자 배열로만 최적화 될 수 있습니다.

배열을 최적화되지 않은 형식으로 강제 설정해야하는 또 다른 이유는 프레젠테이션에서 설명한대로 배열 경계 외부의 요소에 액세스하려는 경우 일 수 있습니다.

첫 번째 루프 <=는 배열의 끝을지나 요소를 읽으려고 시도합니다. 마지막 추가 반복에서 알고리즘이 여전히 올바르게 작동합니다.

this.primes[i]배열 끝을 지났기 undefined때문에 i로 평가됩니다 .
candidate % undefined(의 값에 해당 candidate)은로 평가됩니다 NaN.
NaN == 0로 평가됩니다 false.
따라서이 return true실행되지 않습니다.

따라서 추가 반복이 발생하지 않은 것처럼 나머지 논리에는 영향을 미치지 않습니다. 코드는 추가 반복이없는 것과 동일한 결과를 생성합니다.

그러나 거기에 도달하기 위해 배열의 끝을지나 존재하지 않는 요소를 읽으려고했습니다. 이로 인해 어레이가 최적화되지 않거나 적어도이 대화 시점에 수행되었습니다.

와 두 번째 루프 <는 배열 내에 존재하는 요소 만 읽으므로 최적화 된 배열과 코드를 허용합니다.

문제는 연설의 90-91 페이지에 설명되어 있으며 그 전후 페이지에서 관련 토론이 있습니다.

나는이 Google I / O 프리젠 테이션에 참석 한 후 스피커 (V8 저자 중 한 명)와 이야기를 나 ed습니다. 나는 자신의 코드에서 배열의 끝을 지나서 특정 상황을 최적화하기위한 오도 된 (후시) 시도로 읽는 기술을 사용하고있었습니다. 그는 배열의 끝을 지나서 읽으 려고 시도 하면 간단한 최적화 형식을 사용하지 못하게 될 것이라고 확인했습니다.

V8 저자가 말한 내용이 여전히 사실이라면, 배열의 끝을 지나서 읽으면 배열이 최적화되지 않고 느린 형식으로 돌아 가야합니다.

이제 V8이이 경우를 효율적으로 처리하기 위해 개선되었거나 다른 JavaScript 엔진이 다르게 처리 할 수 있습니다. 나는 그것에 대해 어떤 식 으로든 다른 방법을 모르지만,이 최적화 해제는 프레젠테이션이 말한 것입니다.

TL; DR 느린 루프는 Array 'out-of-bounds'에 액세스하기 때문에 엔진이 최적화를하지 않거나 전혀 최적화하지 않고 함수를 다시 컴파일하거나 이러한 최적화로 함수를 컴파일하지 않아야합니다. (JIT-) 컴파일러가 첫 번째 컴파일 'version'전에이 조건을 감지 / 의심 한 경우)를 아래에서 읽으십시오.

누군가 단지 가 이 (완전히 깜짝 놀라게 아무도 이미 않았다) 말 :
영업의 조각은 프로그래밍 책 개요 의도 초보자의 사실상의 예가 될 것이다 때 시간이있을 사용 / 자바 스크립트 '배열'인덱스 시작임을 강조 1이 아닌 0에서 시작하여 일반적인 '초보자 실수'의 예로 사용됩니다 ( '프로그래밍 오류'구절을 피하는 방법을 좋아하지 않습니다 ;)) : 범위를 벗어난 배열 액세스 .

예 1 : 0 기반 인덱싱 (항상 ES262에서는)을 사용하는 5 개 요소
의 Dense Array(인접한 (인덱스 사이에 간격이 없음을 의미하며 실제로 각 인덱스의 요소))

var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
//  indexes are:    0 ,  1 ,  2 ,  3 ,  4    // there is NO index number 5

Thus we are not really talking about performance difference between < vs <= (or 'one extra iteration'), but we are talking:
'why does the correct snippet (b) run faster than erroneous snippet (a)'?

The answer is 2-fold (although from a ES262 language implementer's perspective both are forms of optimization):

Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'

Item 1 is sufficiently (and correctly IMHO) explained by the accepted answer, but that only spends 2 words ('the code') on Item 2: compilation.

More precisely: JIT-Compilation and even more importantly JIT-RE-Compilation !

The language specification is basically just a description of a set of algorithms ('steps to perform to achieve defined end-result'). Which, as it turns out is a very beautiful way to describe a language. And it leaves the actual method that an engine uses to achieve specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce defined results. A spec conforming engine should give spec conforming results for any defined input.

Now, with javascript code/libraries/usage increasing, and remembering how much resources (time/memory/etc) a 'real' compiler uses, it's clear we can't make users visiting a web-page wait that long (and require them to have that many resources available).

Imagine the following simple function:

function sum(arr){
  var r=0, i=0;
  for(;i<arr.length;) r+=arr[i++];
  return r;
}

Perfectly clear, right? Doesn't require ANY extra clarification, Right? The return-type is Number, right?
Well.. no, no & no... It depends on what argument you pass to named function parameter arr...

sum('abcde');   // String('0abcde')
sum([1,2,3]);   // Number(6)
sum([1,,3]);    // Number(NaN)
sum(['1',,3]);  // String('01undefined3')
sum([1,,'3']);  // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]);  // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]);   // Number(8)

See the problem ? Then consider this is just barely scraping the massive possible permutations... We don't even know what kind of TYPE the function RETURN until we are done...

Now imagine this same function-code actually being used on different types or even variations of input, both completely literally (in source code) described and dynamically in-program generated 'arrays'..

Thus, if you were to compile function sum JUST ONCE, then the only way that always returns the spec-defined result for any and all types of input then, obviously, only by performing ALL spec-prescribed main AND sub steps can guarantee spec conforming results (like an unnamed pre-y2k browser). No optimizations (because no assumptions) and dead slow interpreted scripting language remains.

JIT-Compilation (JIT as in Just In Time) is the current popular solution.

So, you start to compile the function using assumptions regarding what it does, returns and accepts.
you come up with checks as simple as possible to detect if the function might start returning non-spec conformant results (like because it receives unexpected input). Then, toss away the previous compiled result and recompile to something more elaborate, decide what to do with the partial result you already have (is it valid to be trusted or compute again to be sure), tie in the function back into the program and try again. Ultimately falling back to stepwise script-interpretation as in spec.

All of this takes time!

All browsers work on their engines, for each and every sub-version you will see things improve and regress. Strings were at some point in history really immutable strings (hence array.join was faster than string concatenation), now we use ropes (or similar) which alleviate the problem. Both return spec-conforming results and that is what matters!

Long story short: just because javascript's language's semantics often got our back (like with this silent bug in the OP's example) does not mean that 'stupid' mistakes increases our chances of the compiler spitting out fast machine-code. It assumes we wrote the 'usually' correct instructions: the current mantra we 'users' (of the programming language) must have is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for basic understanding what browsers can try to optimize and why).

Because of this, talking about performance is both important BUT ALSO a mine-field (and because of said mine-field I really want to end with pointing to (and quoting) some relevant material:

Access to nonexistent object properties and out of bounds array elements returns the undefined value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.

...

An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.

Source:
"JITProf: Pinpointing JIT-unfriendly JavaScript Code"
Berkeley publication,2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf

ASM.JS (also doesn't like out off bound array access):

Ahead-Of-Time Compilation

Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:

unboxed representations of integers and floating-point numbers;

absence of runtime type checks;

absence of garbage collection; and

efficient heap loads and stores (with implementation strategies varying by platform).

Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.

http://asmjs.org/spec/latest/

and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/
were there is a small subsection about the engine's internal performance improvements when removing bounds-check (whilst just lifting the bounds-check outside the loop already had an improvement of 40%).

EDIT:
note that multiple sources talk about different levels of JIT-Recompilation down to interpretation.

Theoretical example based on above information, regarding the OP's snippet:

Call to isPrimeDivisible
Compile isPrimeDivisible using general assumptions (like no out of bounds access)
Do work
BAM, suddenly array accesses out of bounds (right at the end).
Crap, says engine, let's recompile that isPrimeDivisible using different (less) assumptions, and this example engine doesn't try to figure out if it can reuse current partial result, so
Recompute all work using slower function (hopefully it finishes, otherwise repeat and this time just interpret the code).
Return result

Hence time then was:
First run (failed at end) + doing all work all over again using slower machine-code for each iteration + the recompilation etc.. clearly takes >2 times longer in this theoretical example!

EDIT 2: (disclaimer: conjecture based in facts below)
The more I think of it, the more I think that this answer might actually explain the more dominant reason for this 'penalty' on erroneous snippet a (or performance-bonus on snippet b, depending on how you think of it), precisely why I'm adament in calling it (snippet a) a programming error:

It's pretty tempting to assume that this.primes is a 'dense array' pure numerical which was either

Hard-coded literal in source-code (known excelent candidate to become a 'real' array as everything is already known to the compiler before compile-time) OR
most likely generated using a numerical function filling a pre-sized (new Array(/*size value*/)) in ascending sequential order (another long-time known candidate to become a 'real' array).

We also know that the primes array's length is cached as prime_count ! (indicating it's intent and fixed size).

We also know that most engines initially pass Arrays as copy-on-modify (when needed) which makes handeling them much more fast (if you don't change them).

It is therefore reasonable to assume that Array primes is most likely already an optimized array internally which doesn't get changed after creation (simple to know for the compiler if there is no code modifiying the array after creation) and therefore is already (if applicable to the engine) stored in an optimized way, pretty much as if it was a Typed Array.

As I have tried to make clear with my sum function example, the argument(s) that get passed higly influence what actually needs to happen and as such how that particular code is being compiled to machine-code. Passing a String to the sum function shouldn't change the string but change how the function is JIT-Compiled! Passing an Array to sum should compile a different (perhaps even additional for this type, or 'shape' as they call it, of object that got passed) version of machine-code.

As it seems slightly bonkus to convert the Typed_Array-like primes Array on-the-fly to something_else while the compiler knows this function is not even going to modify it!

Under these assumptions that leaves 2 options:

Compile as number-cruncher assuming no out-of-bounds, run into out-of-bounds problem at the end, recompile and redo work (as outlined in theoretical example in edit 1 above)
Compiler has already detected (or suspected?) out of bound acces up-front and the function was JIT-Compiled as if the argument passed was a sparse object resulting in slower functional machine-code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligable for certain optimisations, it was compiled as if it received a 'sparse array'(-like) argument.

I now really wonder which of these 2 it is!

To add some scientificness to it, here's a jsperf

https://jsperf.com/ints-values-in-out-of-array-bounds

It tests the control case of an array filled with ints and looping doing modular arithmetic while staying within bounds. It has 5 test cases:

1. Looping out of bounds
2. Holey arrays
3. Modular arithmetic against NaNs
4. Completely undefined values
5. Using a new Array()

It shows that the first 4 cases are really bad for performance. Looping out of bounds is a bit better than the other 3, but all 4 are roughly 98% slower than the best case.
The new Array() case is almost as good as the raw array, just a few percent slower.

참고URL : https://stackoverflow.com/questions/53643962/why-is-slower-than-using-this-code-snippet-in-v8

'Programing' 카테고리의 다른 글

Express에서 멋진 형식의 HTML을 출력하려면 어떻게해야합니까? (0)	2020.05.30
문자열을 n 문자 세그먼트로 나누려면 어떻게해야합니까? (0)	2020.05.30
httpd : ServerName에 127.0.0.1을 사용하여 서버의 정규화 된 도메인 이름을 안정적으로 확인할 수 없습니다. (0)	2020.05.30
float 값이 정수인지 확인하는 방법 (0)	2020.05.30
Java에서 세트를 목록으로 정렬하려면 어떻게합니까? (0)	2020.05.30

현재글V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?

crosscheck

V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?

V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?

'Programing' 카테고리의 다른 글

'Programing'의 다른글

티스토리툴바

V8에서이 코드 스 니펫을 사용하여 &lt;=가 &lt;보다 느린 이유는 무엇입니까?

V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?

'Programing' 카테고리의 다른 글

'Programing'의 다른글

관련글

티스토리툴바

V8에서이 코드 스 니펫을 사용하여 <=가 <보다 느린 이유는 무엇입니까?