Anyone concerned about Amlogic cheating wrt CPU clockspeeds? Based on tests, when more than 3 CPU cores are busy the maximum real clockspeed is limited to 1.2 GHz while the cpufreq reported by the kernel is still 1.5 GHz.
IMHO. I saw your comparison chart. Consider: the S912 has two clusters. The first (4 cores) has a limit of 1000, the second (4 cores) of 1500. The first cluster cannot switch to a frequency higher than 1000 (it cannot switch to 1200). When using 8 cores, the test calculates the "average": (1000 + 1500) / 2 ≈ 1200 … I have doubts that the test program is able to calculate frequencies for the different clusters separately. As far as I know, the 1000 cluster works first and then (if necessary) the 1500 one joins in.
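A minimal check of what the kernel itself advertises for both clusters (a sketch assuming the usual cpu0-3/cpu4-7 policy split; of course this only shows what the kernel claims, which is exactly what is in question here):
for cpu in 0 4 ; do
echo "cpu${cpu}: advertised max $(cat /sys/devices/system/cpu/cpu${cpu}/cpufreq/cpuinfo_max_freq) kHz, available: $(cat /sys/devices/system/cpu/cpu${cpu}/cpufreq/scaling_available_frequencies)"
done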
Oh! I was not aware of the little cluster being that limited. Should be easy to check using taskset:
taskset -c 0-3 sysbench --test=cpu run --num-threads=4 --cpu-max-prime=20000
taskset -c 4-7 sysbench --test=cpu run --num-threads=4 --cpu-max-prime=20000
taskset -c 3 sysbench --test=cpu run --num-threads=1 --cpu-max-prime=20000
taskset -c 7 sysbench --test=cpu run --num-threads=1 --cpu-max-prime=20000
The execution times reported by the last two runs should be exactly four times those of the first two runs (one core doing the work of four at the same clockspeed).
Back then when you tested, results with just 4 threads already showed a drop in performance. Curious…
I would believe it's a bit different. The S912 is a TV box SoC where big.LITTLE makes no sense at all. We already know that DVFS/cpufreq scaling is controlled by a BLOB and the values reported to and by the kernel are all bogus.
I would assume (and the tests you did almost half a year ago confirmed that) that the DVFS code simply clocks all CPU cores at 1200 MHz when multithreaded loads run on more than 4 cores, while reporting bogus cpufreq values (both /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq and /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq cheating on us).
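A minimal way to watch what both clusters claim to be running at (values in kHz) while a benchmark is active in another shell; if the assumption above is correct, these readings are just as bogus as everything else:
while true ; do
echo "cpu0: $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)   cpu4: $(cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq)"
sleep 1
done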
Sysbench can also be used to identify this since it provides min, max and average values. If the little cluster were really running at just 1.0 GHz while the big one runs at 1.5 GHz, those 4 sysbench output lines would reveal it:
per-request statistics:
min: 2.91ms
avg: 3.08ms
max: 5.51ms
approx. 95 percentile: 3.25ms
But it's important to switch to the performance cpufreq governor prior to executing the tests:
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
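Or, without hardcoding which core belongs to which cluster, the same can be done for every cpufreq policy (a sketch relying on the policy* sysfs nodes recent kernels expose):
for g in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor ; do
echo performance >$g
done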
Another very simple test: openssl speed -elapsed -evp aes-128-cbc
These are results from other Cortex-A53 SoCs with ARMv8 Crypto Extensions, single-threaded operation:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
H6 / 1.8 GHz 226657.97k 606014.83k 1013054.98k 1259576.66k 1355773.27k
S5P6818/1.6 GHz 200591.68k 538595.61k 900359.25k 1115728.97k 1204936.70k
RK3328/1.3 GHz 163161.40k 436259.80k 729289.90k 906723.33k 975929.34k
A64 / 1152 MHz 144995.37k 387488.51k 648090.20k 805775.36k 867464.53k
Given the S912 clocks the little cores at 1.0 GHz and the big ones at 1.5 GHz, the two following lines should show results below A64 (little) and slightly below S5P6818 (big):
taskset -c 3 openssl speed -elapsed -evp aes-128-cbc
taskset -c 7 openssl speed -elapsed -evp aes-128-cbc
And to see what happens when all 4 big cores are in use, it's just this as a script:
#!/bin/bash
while true; do
for i in 0 1 2 3 ; do
taskset -c $i openssl speed -elapsed -evp aes-128-cbc 2>/dev/null &
done
wait
done
And for all 8 cores it's simply a matter of replacing the for line with for i in 0 1 2 3 4 5 6 7 ; do
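For reference, the full 8-core variant then reads:
#!/bin/bash
# same endless loop as above, just spawning one pinned openssl instance per core
while true; do
for i in 0 1 2 3 4 5 6 7 ; do
taskset -c $i openssl speed -elapsed -evp aes-128-cbc 2>/dev/null &
done
wait
done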
So it's pretty straightforward to check for these issues, but of course one needs the board (I don't own a VIM2 so I can't test myself).
Observed: 100% CPU load on a single (allegedly 1.5 GHz) core. I have a VIM2 Max and always run the performance governor.
openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 33910308 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 22548575 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 9222714 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2887950 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 389737 aes-128-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-Bwh9JU/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     180854.98k   481036.27k   787004.93k   985753.60k  1064241.83k
on a little core:
taskset -c 7 openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 23935181 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 15916089 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 6510493 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2038914 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 275104 aes-128-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-Bwh9JU/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     127654.30k   339543.23k   555562.07k   695949.31k   751217.32k
Running the script the output is not pretty, but here are 4 consecutive lines (running on cores 0, 1, 2 & 3):
aes-128-cbc     180892.77k   481128.58k   787118.34k   986037.93k  1064476.67k
aes-128-cbc     180906.70k   481173.93k   787130.54k   986139.31k  1064334.68k
aes-128-cbc     180406.20k   480401.66k   786049.79k   984578.39k  1061393.75k
aes-128-cbc     180655.19k   480587.65k   786338.30k   984979.11k  1063135.91k
Which to me (but I am no expert) looks the same as the single-threaded version. And those seem to be pretty much between the S5P6818/1.6 GHz and RK3328/1.3 GHz results you quote. But I also don't think it supports your initial feeling.
For completeness, results with the script loading all 8 cores. I have deleted intermediate lines to leave just 8 consecutive results:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     180546.78k   480275.52k   785989.46k   984144.21k  1062712.66k
aes-128-cbc     180689.26k   480634.15k   786231.21k   984887.98k  1063277.91k
aes-128-cbc     174722.02k   466864.55k   764708.61k   956771.33k  1031809.71k
aes-128-cbc     178963.86k   481044.50k   787039.57k   985796.61k  1064260.95k
aes-128-cbc     127293.45k   338584.41k   554141.87k   694058.67k   749382.31k
aes-128-cbc     127733.72k   339738.88k   555838.55k   696330.58k   751643.31k
aes-128-cbc     127748.04k   339756.48k   555897.17k   696391.68k   751504.04k
aes-128-cbc     127731.74k   339766.42k   555901.27k   696310.10k   751583.23k
Feeling? These were measurements done by @balbes150 last year: https://forum.armbian.com/topic/2138-armbian-for-amlogic-s912/?do=findComment&comment=43338
I only interpreted the numbers. Sysbench provides execution time and standard deviation so it's pretty capable of reporting what's happening.
For whatever reason no one has tested again with sysbench so far, but at least it's obvious that the cpufreq values that can be set and retrieved via sysfs to access 'clockspeeds' are still bogus, at least for the big cluster.
Based on your single-threaded tests it looks like this with openssl:
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
H6 / 1.8 GHz 226657.97k 606014.83k 1013054.98k 1259576.66k 1355773.27k
S5P6818/1.6 GHz 200591.68k 538595.61k 900359.25k 1115728.97k 1204936.70k
S912 / 1416 MHz 180854.98k 481036.27k 787004.93k 985753.60k 1064241.83k
RK3328/1.3 GHz 163161.40k 436259.80k 729289.90k 906723.33k 975929.34k
A64 / 1152 MHz 144995.37k 387488.51k 648090.20k 805775.36k 867464.53k
S912 / 1000 MHz 127654.30k 339543.23k 555562.07k 695949.31k 751217.32k
AES encryption is something special though, since it is handled by the dedicated ARMv8 Crypto Extensions when they are available, as on the S912. So I'm still curious what sysbench results look like as an example of full load running directly on the CPU cores.
It's pretty simple to let this small script run and report the results:
#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo -e "$o cores, $(( $i / 1000)) MHz: \c"
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
done
done
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | egrep "percentile|min:|max:|avg:"
Or using 7-zip's benchmark mode (with 7-zip, memory performance also plays an important role, so it's not an ideal tool to draw conclusions wrt the count of CPU cores and actual clockspeeds. But if 7-zip performance on the big cluster is below RPi 3 numbers then there's something seriously wrong):
sudo apt install p7zip
taskset -c 0-3 7zr b -mmt1
taskset -c 0-3 7zr b -mmt4
taskset -c 4-7 7zr b -mmt4
7zr b
Hi tkaiser,
After executing the script, I got this:
1 cores, 100 MHz: execution time (avg/stddev): 58.1148/0.00
1 cores, 250 MHz: execution time (avg/stddev): 47.8097/0.00
1 cores, 500 MHz: execution time (avg/stddev): 63.7481/0.00
1 cores, 667 MHz: execution time (avg/stddev): 53.2392/0.00
1 cores, 1000 MHz: execution time (avg/stddev): 36.7519/0.00
1 cores, 1200 MHz: execution time (avg/stddev): 30.6434/0.00
1 cores, 1512 MHz: execution time (avg/stddev): 25.8836/0.00
4 cores, 100 MHz: execution time (avg/stddev): 12.0569/0.02
4 cores, 250 MHz: execution time (avg/stddev): 14.3230/0.00
4 cores, 500 MHz: execution time (avg/stddev): 12.1902/0.00
4 cores, 667 MHz: execution time (avg/stddev): 11.0352/0.00
4 cores, 1000 MHz: execution time (avg/stddev): 9.1944/0.00
4 cores, 1200 MHz: execution time (avg/stddev): 8.0781/0.00
4 cores, 1512 MHz: execution time (avg/stddev): 6.9720/0.00
8 cores, 100 MHz: execution time (avg/stddev): 11.7022/0.02
8 cores, 250 MHz: execution time (avg/stddev): 9.7152/0.01
8 cores, 500 MHz: execution time (avg/stddev): 7.3731/0.01
8 cores, 667 MHz: execution time (avg/stddev): 6.5240/0.01
8 cores, 1000 MHz: execution time (avg/stddev): 5.3011/0.01
8 cores, 1200 MHz: execution time (avg/stddev): 4.8013/0.02
8 cores, 1512 MHz: execution time (avg/stddev): 4.3739/0.02
min: 2.58ms
avg: 3.39ms
max: 30.63ms
approx. 95 percentile: 3.68ms
Thank you. This was on an Ubuntu Xenial aarch64 OS image?
Based on the numbers, the bl30.bin BLOB you're using seems to do cpufreq scaling somewhat differently compared to @balbes150's test last year, and I fear you were also running into throttling, reaching or exceeding 80°C at the end of the benchmark?
Anyway: the numbers are still totally bogus. One core at 100 MHz needing 58 seconds is impossible when running at 1000 MHz only takes 36.75 seconds; at one tenth of the clock a cache-resident task should need roughly ten times as long, not just 1.6 times as long.
Yes, it is. But a different image.
Whoa - the script has a defect that I can see on gkrellm: it is setting scaling_max_freq on the big cores but then sysbench is running on the little cores. In fact when it got to 4 copies I could see 3 running on little cores and 1 on a big core!
I am no good at scripting but will try a modified version in the next few minutes.
Some preliminary results.
Some things have changed since last year when @balbes150 tested, but some have not. Pretty obvious: the cpufreq scaling code running in Linux has only limited influence on what happens in reality. Same with set and reported clockspeeds.
A task that runs completely inside the CPU cache has to run 10 times slower at 100 MHz than at 1000 MHz. This is not the case here; we now even see completely weird relationships between the cpufreq set in Linux and the real clockspeed, see e.g. those single-threaded results where 500 MHz performs worse than 250 MHz:
100: execution time (avg/stddev): 58.1148/0.00
250: execution time (avg/stddev): 47.8097/0.00
500: execution time (avg/stddev): 63.7481/0.00
Obviously what's happening below 1000 MHz is totally weird, since translating the cpufreq set in sysfs into the real clockspeeds implied by the benchmark results gives this table:
fake MHz   real (1 thread)   real (4 threads)   real (8 threads)
 100             632               763                392
 250             769               642                473
 500             576               754                623
1000            1000              1000                867
1512            1420              1319               1051
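For transparency, this is how such a 'real clockspeed' estimate is derived from the execution times: the 1000 MHz run is taken as reference and everything else is scaled linearly. A small sketch using the single-threaded values from above (the multi-threaded columns follow the same linear-scaling idea):
ref=36.7519   # single thread at an alleged 1000 MHz, taken as reference
for entry in 100:58.1148 250:47.8097 500:63.7481 1512:25.8836 ; do
freq=${entry%:*} ; time=${entry#*:}
awk -v r=$ref -v t=$time -v f=$freq 'BEGIN{printf "%4d MHz set -> ~%.0f MHz real\n", f, r / t * 1000}'
done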
When @balbes150 did the tests last year it looked like this:
fake MHz   real (1 thread)   real (4 threads)   real (8 threads)
 100             868               744                512
 250             955               812                574
 500            1000               866                686
1000            1000               992                964
1512            1448              1200               1200
Still weird behaviour below 1000 MHz but at least somewhat predictable. Also interesting/important: back then he clearly showed that sysbench running on the 4 big cores was twice as slow as when running on all 8 cores:
4 cores, 1000 MHz: execution time (avg/stddev): 9.1695/0.00
4 cores, 1512 MHz: execution time (avg/stddev): 7.5821/0.00
8 cores, 1000 MHz: execution time (avg/stddev): 4.7245/0.01
8 cores, 1512 MHz: execution time (avg/stddev): 3.7980/0.01
Which is a clear indication that back then there was no big.LITTLE behaviour implemented under load and all CPU cores were running at 1200 MHz when performing intensive tasks. This has changed now and we see different behaviour, but I fear throttling is also involved since the 8-thread results are quite a bit worse than before.
You could give this a try:
#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo $i >/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 2>/dev/null
case $o in
1)
TasksetParm="-c 0"
;;
4)
TasksetParm="-c 0-3"
;;
*)
TasksetParm="-c 0-7"
;;
esac
echo -e "$o cores, $(( $i / 1000)) MHz: \c"
taskset ${TasksetParm} sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
cat /sys/devices/virtual/thermal/thermal_zone0/temp
done
done
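Since the script writes to sysfs it has to run as root; capturing the output for posting is then as simple as something like this (the filename is just an example):
sudo bash ./s912-check.sh | tee sysbench-results.txt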
Just tried (my own hack!)
#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 ; do
for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo -e "$o cores, $(( $i / 1000)) MHz: \c"
taskset -c 0-3 sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
done
done
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | egrep "percentile|min:|max:|avg:"
I limited it to 4 cores but pinned everything to the big cores, apart from the last run which ran on all 8. The behaviour observed on gkrellm (core frequency and which core was in use) was what I expected. But the results are (I think?) bizarre!
# ./s912sysbench.sh
1 cores, 100 MHz: execution time (avg/stddev): 10.2136/0.00
1 cores, 250 MHz: execution time (avg/stddev): 10.2282/0.00
1 cores, 500 MHz: execution time (avg/stddev): 10.1051/0.00
1 cores, 667 MHz: execution time (avg/stddev): 10.0186/0.00
1 cores, 1000 MHz: execution time (avg/stddev): 10.0253/0.00
1 cores, 1200 MHz: execution time (avg/stddev): 10.0355/0.00
1 cores, 1512 MHz: execution time (avg/stddev): 10.0126/0.00
4 cores, 100 MHz: execution time (avg/stddev): 10.0963/0.07
4 cores, 250 MHz: execution time (avg/stddev): 10.1236/0.02
4 cores, 500 MHz: execution time (avg/stddev): 10.0290/0.03
4 cores, 667 MHz: execution time (avg/stddev): 10.0589/0.02
4 cores, 1000 MHz: execution time (avg/stddev): 10.0330/0.02
4 cores, 1200 MHz: execution time (avg/stddev): 10.0162/0.01
4 cores, 1512 MHz: execution time (avg/stddev): 10.0291/0.01
min: 42.40
avg: 51.52
max: 90.89
95th percentile: 66.84
Will try your script now
# ./s912b.sh
1 cores, 100 MHz: execution time (avg/stddev): 10.1409/0.00
44000
1 cores, 250 MHz: execution time (avg/stddev): 10.1826/0.00
43000
1 cores, 500 MHz: execution time (avg/stddev): 10.0523/0.00
43000
1 cores, 667 MHz: execution time (avg/stddev): 10.0149/0.00
44000
1 cores, 1000 MHz: execution time (avg/stddev): 10.0222/0.00
44000
1 cores, 1200 MHz: execution time (avg/stddev): 10.0302/0.00
45000
1 cores, 1512 MHz: execution time (avg/stddev): 10.0332/0.00
45000
4 cores, 100 MHz: execution time (avg/stddev): 10.3109/0.23
44000
4 cores, 250 MHz: execution time (avg/stddev): 10.1222/0.03
43000
4 cores, 500 MHz: execution time (avg/stddev): 10.0484/0.04
44000
4 cores, 667 MHz: execution time (avg/stddev): 10.0538/0.03
45000
4 cores, 1000 MHz: execution time (avg/stddev): 10.0345/0.02
46000
4 cores, 1200 MHz: execution time (avg/stddev): 10.0294/0.01
47000
4 cores, 1512 MHz: execution time (avg/stddev): 10.0166/0.01
49000
8 cores, 100 MHz: execution time (avg/stddev): 10.3265/0.15
45000
8 cores, 250 MHz: execution time (avg/stddev): 10.1160/0.08
46000
8 cores, 500 MHz: execution time (avg/stddev): 10.0486/0.03
46000
8 cores, 667 MHz: execution time (avg/stddev): 10.0346/0.03
47000
8 cores, 1000 MHz: execution time (avg/stddev): 10.0288/0.01
49000
8 cores, 1200 MHz: execution time (avg/stddev): 10.0155/0.02
51000
8 cores, 1512 MHz: execution time (avg/stddev): 10.0166/0.01
53000
Observed results on gkrellm as expected: note the little cores' max_freq is 1000, so at the end of the run (8 cores), when the script increases the frequency to 1000/1200/1512, gkrellm reports the little cores constant at 1000.
Indeed. 10.1 seconds with an Ubuntu 16.04 aarch64 sysbench binary are achieved with 4 Cortex-A53 running at ~900 MHz (or 2 running at ~1800 MHz, or 8 running at ~450 MHz).
Well, this is expected since here DT and Linux cpufreq code are the limit. But the execution times are simply weird. Care to reboot the board once?
uname -a
Linux VIM2.dukla.net 4.9.40 #2 SMP PREEMPT Wed Sep 20 10:03:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
root@VIM2:/home/chris/bin# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 17.10
Release:        17.10
Codename:       artful
The environment is far from pristine: last rebooted 6 days ago and running Firefox and a couple of other desktop applications at the same time (as well as a desktop!). But the 10-second constant seems odd to me!
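One thing that might be worth ruling out first: if the sysbench in use is a 1.0.x build, its cpu test no longer runs a fixed amount of work but runs for a fixed 10 seconds by default, which alone would explain a constant ~10 s 'execution time' regardless of clockspeed. A quick check, plus an invocation that should restore the old fixed-work behaviour on a 1.0.x build (just an assumption on my side, not verified here):
sysbench --version
# on a 1.0.x build: disable the 10 s time limit and run a fixed 10000 events instead
sysbench cpu --cpu-max-prime=20000 --threads=4 --time=0 --events=10000 run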
Using this script I got a different result:
1 cores, 100 MHz: execution time (avg/stddev): 382.9829/0.00
43000
1 cores, 250 MHz: execution time (avg/stddev): 148.9977/0.00
43000
1 cores, 500 MHz: execution time (avg/stddev): 73.8164/0.00
43000
1 cores, 667 MHz: execution time (avg/stddev): 55.2353/0.00
43000
1 cores, 1000 MHz: execution time (avg/stddev): 36.7397/0.00
44000
1 cores, 1200 MHz: execution time (avg/stddev): 30.5951/0.00
44000
1 cores, 1512 MHz: execution time (avg/stddev): 25.9128/0.00
45000
4 cores, 100 MHz: execution time (avg/stddev): 94.4586/0.01
43000
4 cores, 250 MHz: execution time (avg/stddev): 37.1176/0.01
44000
4 cores, 500 MHz: execution time (avg/stddev): 18.4188/0.00
45000
4 cores, 667 MHz: execution time (avg/stddev): 13.7993/0.00
45000
4 cores, 1000 MHz: execution time (avg/stddev): 9.1685/0.00
46000
4 cores, 1200 MHz: execution time (avg/stddev): 7.6367/0.00
46000
4 cores, 1512 MHz: execution time (avg/stddev): 6.4686/0.00
47000
8 cores, 100 MHz: execution time (avg/stddev): 47.7804/0.01
44000
8 cores, 250 MHz: execution time (avg/stddev): 18.7053/0.01
45000
8 cores, 500 MHz: execution time (avg/stddev): 9.2905/0.00
45000
8 cores, 667 MHz: execution time (avg/stddev): 6.9671/0.00
46000
8 cores, 1000 MHz: execution time (avg/stddev): 4.6269/0.00
48000
8 cores, 1200 MHz: execution time (avg/stddev): 4.1788/0.01
49000
8 cores, 1512 MHz: execution time (avg/stddev): 3.8022/0.00
50000
Here's my result on a VIM2 Pro, kernel 4.9.76, Debian buster rootfs:
1 cores, 100 MHz: execution time (avg/stddev): 10.0188/0.00
36000
1 cores, 250 MHz: execution time (avg/stddev): 9.9992/0.00
35000
1 cores, 500 MHz: execution time (avg/stddev): 9.9968/0.00
35000
1 cores, 667 MHz: execution time (avg/stddev): 10.0012/0.00
35000
1 cores, 1000 MHz: execution time (avg/stddev): 9.9998/0.00
35000
1 cores, 1200 MHz: execution time (avg/stddev): 10.0001/0.00
35000
1 cores, 1512 MHz: execution time (avg/stddev): 9.9976/0.00
35000
4 cores, 100 MHz: execution time (avg/stddev): 10.0085/0.01
34000
4 cores, 250 MHz: execution time (avg/stddev): 10.0052/0.01
33000
4 cores, 500 MHz: execution time (avg/stddev): 10.0042/0.00
34000
4 cores, 667 MHz: execution time (avg/stddev): 10.0005/0.00
34000
4 cores, 1000 MHz: execution time (avg/stddev): 9.9996/0.00
35000
4 cores, 1200 MHz: execution time (avg/stddev): 9.9992/0.00
36000
4 cores, 1512 MHz: execution time (avg/stddev): 9.9985/0.00
37000
8 cores, 100 MHz: execution time (avg/stddev): 10.0198/0.01
33000
8 cores, 250 MHz: execution time (avg/stddev): 10.0049/0.01
33000
8 cores, 500 MHz: execution time (avg/stddev): 10.0023/0.00
34000
8 cores, 667 MHz: execution time (avg/stddev): 10.0010/0.00
35000
8 cores, 1000 MHz: execution time (avg/stddev): 10.0004/0.00
37000
8 cores, 1200 MHz: execution time (avg/stddev): 10.0000/0.00
38000
8 cores, 1512 MHz: execution time (avg/stddev): 9.9976/0.00
39000