Description
- Based on the white paper How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures
Download
Results
- Each loop size run 1000 times, with most features that affect the measurement disabled
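As a rough illustration of the measurement idea, here is a minimal user-space sketch: serialize with CPUID, read the counter with RDTSC, run the code under test, then read the counter with RDTSCP and serialize again. It is not the kernel module used for the results in this post. The helper names `tsc_begin`, `tsc_end`, and `time_loop` are made up for this example, the C body of `measured_loop` is an assumption, and the sketch assumes an x86-64 CPU with RDTSCP and a GCC or Clang toolchain. In user space, interrupts and preemption stay enabled, so the numbers will be noisier than in a kernel module.

```c
#include <assert.h>
#include <cpuid.h>      /* __cpuid macro (GCC/Clang) */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp */

/* Hypothetical helper: serialize with CPUID, then read the TSC. */
static inline uint64_t tsc_begin(void)
{
    unsigned int a, b, c, d;
    __cpuid(0, a, b, c, d);   /* earlier instructions must finish first */
    return __rdtsc();
}

/* Hypothetical helper: RDTSCP waits for earlier instructions to finish,
 * and the trailing CPUID keeps later ones from starting before the read. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux, a, b, c, d;
    uint64_t t = __rdtscp(&aux);
    __cpuid(0, a, b, c, d);
    return t;
}

/* Assumed shape of the code under test: n stores of 1 through a pointer. */
void measured_loop(unsigned int n, volatile unsigned int *var)
{
    for (unsigned int k = 0; k < n; k++)
        *var = 1;
}

/* Returns the elapsed TSC ticks for one run of measured_loop(n). */
uint64_t time_loop(unsigned int n)
{
    volatile unsigned int var = 0;
    uint64_t start = tsc_begin();
    measured_loop(n, &var);
    uint64_t end = tsc_end();
    assert(var == (n > 0 ? 1u : 0u));  /* the loop really ran */
    return end - start;
}
```

Calling `time_loop(n)` many times for each `n` and keeping the smallest value gives a user-space counterpart of the min time column.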
Loading hello module...
loop_size:0 >>>> variance(cycles): 3; max_deviation: 8 ;min time: 44
loop_size:1 >>>> variance(cycles): 3; max_deviation: 28 ;min time: 44
loop_size:2 >>>> variance(cycles): 3; max_deviation: 12 ;min time: 44
loop_size:3 >>>> variance(cycles): 5; max_deviation: 40 ;min time: 44
loop_size:4 >>>> variance(cycles): 4; max_deviation: 32 ;min time: 44
loop_size:5 >>>> variance(cycles): 5; max_deviation: 32 ;min time: 44
loop_size:6 >>>> variance(cycles): 6; max_deviation: 48 ;min time: 44
loop_size:7 >>>> variance(cycles): 1; max_deviation: 32 ;min time: 48
loop_size:8 >>>> variance(cycles): 4; max_deviation: 20 ;min time: 48
loop_size:9 >>>> variance(cycles): 7; max_deviation: 48 ;min time: 48
loop_size:10 >>>> variance(cycles): 5; max_deviation: 32 ;min time: 48
loop_size:11 >>>> variance(cycles): 10; max_deviation: 84 ;min time: 48
.........
.........
loop_size:994 >>>> variance(cycles): 1922; max_deviation: 1388 ;min time: 2028
loop_size:995 >>>> variance(cycles): 0; max_deviation: 0 ;min time: 2032
loop_size:996 >>>> variance(cycles): 1923; max_deviation: 1388 ;min time: 2032
loop_size:997 >>>> variance(cycles): 0; max_deviation: 0 ;min time: 2036
loop_size:998 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 2036
loop_size:999 >>>> variance(cycles): 1815; max_deviation: 1348 ;min time: 2040
total number of spurious min values = 0
total variance = 2520492
absolute max deviation = 1144364
variance of variances = 17554753199565
variance of minimum values = 335594
- Each loop size run 1000000 times, with most features that affect the measurement disabled
Loading hello module...
loop_size:0 >>>> variance(cycles): 809; max_deviation: 23816 ;min time: 44
loop_size:1 >>>> variance(cycles): 405; max_deviation: 19300 ;min time: 44
loop_size:2 >>>> variance(cycles): 41; max_deviation: 4992 ;min time: 44
loop_size:3 >>>> variance(cycles): 13; max_deviation: 1920 ;min time: 44
loop_size:4 >>>> variance(cycles): 6300; max_deviation: 65320 ;min time: 44
loop_size:5 >>>> variance(cycles): 378; max_deviation: 19012 ;min time: 44
loop_size:6 >>>> variance(cycles): 2512; max_deviation: 46956 ;min time: 44
loop_size:7 >>>> variance(cycles): 14308; max_deviation: 109424 ;min time: 48
loop_size:8 >>>> variance(cycles): 128449; max_deviation: 357728 ;min time: 48
loop_size:9 >>>> variance(cycles): 1696; max_deviation: 40980 ;min time: 48
loop_size:10 >>>> variance(cycles): 834; max_deviation: 22336 ;min time: 48
loop_size:11 >>>> variance(cycles): 4143; max_deviation: 63780 ;min time: 48
.........
.........
loop_size:994 >>>> variance(cycles): 914214; max_deviation: 668016 ;min time: 2028
loop_size:995 >>>> variance(cycles): 1596810; max_deviation: 728892 ;min time: 2032
loop_size:996 >>>> variance(cycles): 1775690; max_deviation: 866988 ;min time: 2032
loop_size:997 >>>> variance(cycles): 2589904; max_deviation: 984516 ;min time: 2036
loop_size:998 >>>> variance(cycles): 957907; max_deviation: 677884 ;min time: 2036
loop_size:999 >>>> variance(cycles): 1254143; max_deviation: 748936 ;min time: 2040
total number of spurious min values = 4
total variance = 2631291
absolute max deviation = 246593400
variance of variances = 17487031211352
variance of minimum values = 335929
- Each loop size run 1000000 times, with most features that affect the measurement enabled
Loading hello module...
loop_size:0 >>>> variance(cycles): 2425; max_deviation: 49056 ;min time: 42
loop_size:1 >>>> variance(cycles): 20; max_deviation: 3444 ;min time: 42
loop_size:2 >>>> variance(cycles): 26; max_deviation: 2697 ;min time: 42
loop_size:3 >>>> variance(cycles): 97; max_deviation: 4395 ;min time: 42
loop_size:4 >>>> variance(cycles): 40; max_deviation: 2826 ;min time: 42
loop_size:5 >>>> variance(cycles): 1437; max_deviation: 27309 ;min time: 42
loop_size:6 >>>> variance(cycles): 30; max_deviation: 2802 ;min time: 42
loop_size:7 >>>> variance(cycles): 6; max_deviation: 2541 ;min time: 42
loop_size:8 >>>> variance(cycles): 13; max_deviation: 2433 ;min time: 45
loop_size:9 >>>> variance(cycles): 60; max_deviation: 3594 ;min time: 42
loop_size:10 >>>> variance(cycles): 35; max_deviation: 2661 ;min time: 45
loop_size:11 >>>> variance(cycles): 31; max_deviation: 3534 ;min time: 45
.........
.........
loop_size:994 >>>> variance(cycles): 32588; max_deviation: 46620 ;min time: 1935
loop_size:995 >>>> variance(cycles): 11208; max_deviation: 22932 ;min time: 1935
loop_size:996 >>>> variance(cycles): 9178; max_deviation: 15753 ;min time: 1938
loop_size:997 >>>> variance(cycles): 11525; max_deviation: 55938 ;min time: 1938
loop_size:998 >>>> variance(cycles): 62386; max_deviation: 229224 ;min time: 1941
loop_size:999 >>>> variance(cycles): 7847; max_deviation: 6255 ;min time: 1944
total number of spurious min values = 4
total variance = 103398
absolute max deviation = 1191852
variance of variances = 132114077117
variance of minimum values = 306145
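To make the columns above easier to read, here is a minimal sketch of how such per-row statistics could be computed from raw cycle samples. It assumes the fields mean what their names suggest: min time is the smallest sample, max_deviation is the largest sample minus the smallest, variance is the population variance, and a spurious min value is a row whose min time is lower than that of the previous, shorter loop. The names `sample_stats`, `summarize`, and `count_spurious_min` are made up for this example and are not taken from the module's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of the samples collected for one loop_size. */
struct sample_stats {
    uint64_t min_time;       /* smallest sample, in cycles */
    uint64_t max_deviation;  /* largest sample minus smallest sample */
    uint64_t variance;       /* population variance, truncated to an integer */
};

struct sample_stats summarize(const uint64_t *cycles, size_t n)
{
    assert(cycles != NULL && n > 0);

    /* First pass: find the extremes. */
    uint64_t min = cycles[0], max = cycles[0];
    for (size_t i = 1; i < n; i++) {
        if (cycles[i] < min) min = cycles[i];
        if (cycles[i] > max) max = cycles[i];
    }

    /* Second pass: accumulate offsets from the minimum. Variance does not
     * change when a constant is subtracted, and small offsets make integer
     * overflow less likely than summing the raw values. */
    uint64_t sum = 0, sum_sq = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t d = cycles[i] - min;
        sum += d;
        sum_sq += d * d;
    }

    struct sample_stats s;
    s.min_time = min;
    s.max_deviation = max - min;
    /* Var = (n * sum(d^2) - (sum d)^2) / n^2 */
    s.variance = ((uint64_t)n * sum_sq - sum * sum) / ((uint64_t)n * n);
    return s;
}

/* Counts rows whose min time is lower than the previous row's min time. */
size_t count_spurious_min(const uint64_t *min_times, size_t n)
{
    size_t spurious = 0;
    for (size_t i = 1; i < n; i++)
        if (min_times[i] < min_times[i - 1])
            spurious++;
    return spurious;
}
```

Under this definition, the step from loop_size 8 (min time 45) to loop_size 9 (min time 42) in the last run would count as one spurious min value.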
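Thoughts
- The results for 1000 and 1000000 runs per loop are roughly the same, with only small differences in a few places, which suggests that an acceptable result does not need many measurements. The run with every feature enabled and the machine in use came out slightly faster than the run with everything disabled, possibly because more cores were available or because turbo mode was on. I am not sure whether RDTSC gives odd results on a multi-core CPU, but the data at least looks normal: the more instructions run, the longer it takes. These measurements were taken after a reboot, so the numbers look cleaner. As I recall, before the reboot the total number of spurious min values was around 100, though the errors were all under 20 cycles. Since I only need relative rather than absolute speed, this should not matter.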
-
  0000000000000000 <measured_loop>:
     0: 31 c0                   xor    eax,eax
     2: 85 c9                   test   ecx,ecx
     4: 74 17                   je     1d <measured_loop+0x1d>
     6: 66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
     d: 00 00 00
    10: 83 c0 01                add    eax,0x1
    13: c7 02 01 00 00 00       mov    DWORD PTR [rdx],0x1
    19: 39 c8                   cmp    eax,ecx
    1b: 75 f3                   jne    10 <measured_loop+0x10>
    1d: f3 c3                   repz ret
    1f: 90                      nop
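- According to the assembly code, each loop iteration executes four instructions: add, mov, cmp, and jne. With every feature disabled, the earlier results show that from loop size 68 onward, every 2 additional iterations cost 4 cycles (2 cycles per iteration). With every feature enabled, from loop size 112 onward, every 65 additional iterations cost 114 cycles (about 1.75 cycles per iteration), and the pattern is very regular.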
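The per-iteration cost quoted above is the slope between two rows of the output: the difference in min time divided by the difference in loop size. A minimal sketch of that arithmetic follows; the function name `cycles_per_iteration` is made up for this example.

```c
#include <assert.h>

/* Slope between two (loop_size, min_time) rows, in cycles per iteration. */
double cycles_per_iteration(unsigned int size_a, unsigned long cycles_a,
                            unsigned int size_b, unsigned long cycles_b)
{
    assert(size_b > size_a && cycles_b >= cycles_a);
    return (double)(cycles_b - cycles_a) / (double)(size_b - size_a);
}
```

For the run with features disabled, the rows loop_size 995 (min time 2032) and loop_size 999 (min time 2040) give 8 / 4 = 2 cycles per iteration. For the run with features enabled, 114 cycles over 65 iterations is about 1.75 cycles per iteration.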