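Measuring the resolution of performance timing with __rdtsc, __cpuid and __rdtscp in Python
Description
- Based on the article How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures
Download
Results
- Output from timing an empty loop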
Loading hello module...
loop_size: 0 >>>> variance(cycles): 9875; max_deviation: 4248 ;min time: 1149
loop_size: 1 >>>> variance(cycles): 15458; max_deviation: 4950 ;min time: 1566
loop_size: 2 >>>> variance(cycles): 20495; max_deviation: 5643 ;min time: 1734
loop_size: 3 >>>> variance(cycles): 15559; max_deviation: 4608 ;min time: 1788
loop_size: 4 >>>> variance(cycles): 19601; max_deviation: 5550 ;min time: 1830
loop_size: 5 >>>> variance(cycles): 19101; max_deviation: 4878 ;min time: 1887
loop_size: 6 >>>> variance(cycles): 53959; max_deviation: 18093 ;min time: 1899
loop_size: 7 >>>> variance(cycles): 19669; max_deviation: 5022 ;min time: 1956
loop_size: 8 >>>> variance(cycles): 12512; max_deviation: 4458 ;min time: 1989
loop_size: 9 >>>> variance(cycles): 20429; max_deviation: 4977 ;min time: 2070
loop_size: 10 >>>> variance(cycles): 16398; max_deviation: 5097 ;min time: 2082
loop_size: 11 >>>> variance(cycles): 23348; max_deviation: 5352 ;min time: 2142
.........
.........
loop_size: 994 >>>> variance(cycles): 61068431; max_deviation: 527208 ;min time: 73176
loop_size: 995 >>>> variance(cycles): 55047806; max_deviation: 541896 ;min time: 73107
loop_size: 996 >>>> variance(cycles): 51354276; max_deviation: 527874 ;min time: 73275
loop_size: 997 >>>> variance(cycles): 39929555; max_deviation: 532137 ;min time: 73317
loop_size: 998 >>>> variance(cycles): 107756539; max_deviation: 718764 ;min time: 73431
loop_size: 999 >>>> variance(cycles): 104747306; max_deviation: 720498 ;min time: 73419
total number of spurious min values = 167
total variance = 12669681
absolute max deviation = 720498
variance of variances = 487873495499190
variance of minimum values = 473095955
minimum value = 1149
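The summary lines at the end of this output can be recomputed from the raw samples. Below is a minimal Python sketch. It assumes the statistics follow the definitions in the Intel article: population variance per loop size, max deviation as max minus min, and a "spurious" minimum counted whenever a loop size yields a smaller minimum than the previous loop size. The function name `summarize` and the `times[j]` layout are hypothetical and do not come from the original code.

```python
from statistics import pvariance


def summarize(times):
    """Summarize timing samples; times[j] holds the cycle counts for loop_size == j."""
    variances, min_values = [], []
    spurious = 0
    abs_max_dev = 0
    prev_min = None
    for samples in times:
        lo, hi = min(samples), max(samples)
        # "Spurious" minimum: a longer loop finished faster than the previous, shorter one.
        if prev_min is not None and prev_min > lo:
            spurious += 1
        abs_max_dev = max(abs_max_dev, hi - lo)
        variances.append(pvariance(samples))
        min_values.append(lo)
        prev_min = lo
    return {
        "spurious_min_values": spurious,
        # Assumption: "total variance" is the mean of the per-loop-size variances.
        "total_variance": sum(variances) / len(variances),
        "absolute_max_deviation": abs_max_dev,
        "variance_of_variances": pvariance(variances),
        "variance_of_minimum_values": pvariance(min_values),
        "minimum_value": min(min_values),
    }
```

With samples collected as one list per loop size, `summarize(times)["spurious_min_values"]` counts how many times a larger loop size produced a smaller minimum than the one before it.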
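Takeaways
- The numbers are not stable: a run with more loop iterations often comes out faster than one with fewer. Overall, though, the hundreds digit and above are still meaningful. On my machine there are about 3484046235 cycles per second, while QueryPerformanceCounter counts 3404332 ticks per second. Once the low-value noisy digits are discarded, the two have about the same precision, and because every QueryPerformanceCounter tick has a fixed duration, it is actually the more convenient of the two.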
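The comparison between the two counters comes down to simple arithmetic on the two frequencies quoted here. The sketch below works it out. The constant and function names are hypothetical, and the frequencies are the ones measured on the author's machine, so they will differ elsewhere.

```python
TSC_HZ = 3484046235  # cycles per second, as reported in this post
QPC_HZ = 3404332     # QueryPerformanceCounter ticks per second, as reported

# One QueryPerformanceCounter tick spans roughly 1023 CPU cycles.
cycles_per_tick = TSC_HZ / QPC_HZ

# Duration of one tick and of one cycle, in nanoseconds.
tick_ns = 1e9 / QPC_HZ
cycle_ns = 1e9 / TSC_HZ


def cycles_to_ticks(cycles):
    """Convert a cycle count to the equivalent number of QPC ticks."""
    return cycles / cycles_per_tick


if __name__ == "__main__":
    print(f"cycles per QPC tick: {cycles_per_tick:.1f}")
    print(f"QPC tick: {tick_ns:.1f} ns, cycle: {cycle_ns:.3f} ns")
    print(f"1149 cycles is about {cycles_to_ticks(1149):.2f} ticks")
```

Since one tick covers about a thousand cycles, dropping roughly the three low-order digits of a cycle count leaves a granularity close to that of QueryPerformanceCounter.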
Resources
Looking back
- SDL actually has built-in functions that correspond to QueryPerformanceCounter, so they can be used directly.
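In SDL the functions in question are `SDL_GetPerformanceCounter` and `SDL_GetPerformanceFrequency`. On the Python side, the standard library offers a comparable fixed-tick counter, `time.perf_counter_ns`, which CPython reports as backed by QueryPerformanceCounter on Windows. The following is only a sketch of how it might be used for the same kind of measurement. The helper `measure_ns` is hypothetical and is not part of the original code.

```python
import time


def measure_ns(func, repeats=1000):
    """Return the minimum elapsed time of func() in nanoseconds over several runs."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        elapsed = time.perf_counter_ns() - start
        # Keep the minimum, which is the sample least affected by interference.
        if best is None or elapsed < best:
            best = elapsed
    return best


# Which OS facility backs the counter, and its advertised resolution in seconds.
info = time.get_clock_info("perf_counter")

if __name__ == "__main__":
    print(info.implementation, info.resolution)
    print("empty call:", measure_ns(lambda: None), "ns")
```

Taking the minimum over many runs mirrors the "min time" column in the results, since the smallest sample is the one least disturbed by interrupts and scheduling.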