S912 limited to 1200 MHz with multithreaded loads


#1

Anyone concerned about Amlogic cheating wrt CPU clockspeeds? Based on tests, when more than 3 CPU cores are busy the maximum real clockspeed is limited to 1.2 GHz while the cpufreq reported by the kernel still shows 1.5 GHz.


#2

IMHO. I saw your comparison chart. Consider: the S912 has two clusters. The first (4 cores) has a limit of 1000 MHz, the second (4 cores) a limit of 1500 MHz. The first cluster cannot switch to a frequency higher than 1000 (it cannot switch to 1200). When using 8 cores, the test calculates the “average”: (1000 + 1500) / 2 = 1250, which is close to 1200 … I doubt that the test program is able to calculate frequencies for the different clusters separately. As far as I know, the 1000 MHz cluster works first and then (if necessary) the 1500 MHz cluster joins it.


#3

Oh! I was not aware of the little cluster being that limited. Should be easy to check using taskset:

taskset -c 0-3 sysbench --test=cpu run --num-threads=4 --cpu-max-prime=20000
taskset -c 4-7 sysbench --test=cpu run --num-threads=4 --cpu-max-prime=20000
taskset -c 3 sysbench --test=cpu run --num-threads=1 --cpu-max-prime=20000
taskset -c 7 sysbench --test=cpu run --num-threads=1 --cpu-max-prime=20000

The execution times reported by the last two runs should be almost exactly four times those of the first two runs.

Back then, when you tested, the results with just 4 threads already showed a drop in performance. Curious…


#4

I would believe it’s a bit different. The S912 is a TV box SoC where big.LITTLE makes no sense at all. We already know that DVFS/cpufreq scaling is controlled by a BLOB and the values reported to and by the kernel are all bogus.

I would assume (and the tests you did almost half a year ago confirmed that) that the DVFS code simply clocks all CPU cores at 1200 MHz when multithreaded loads on more than 4 cores are running (while reporting bogus cpufreq and both /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq and /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq cheating on us).

Sysbench can also be used to identify this since it provides min, max and average values. If the little cluster were really running at just 1.0 GHz while the big one runs at 1.5 GHz, these 4 sysbench output lines would reveal it:

per-request statistics:
     min:                                  2.91ms
     avg:                                  3.08ms
     max:                                  5.51ms
     approx.  95 percentile:               3.25ms
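
A quick way to read those four lines is to look at how far avg and max sit above min. The helper below is only a sketch: `request_spread` is a made-up name, and it assumes the per-request lines look like the ones above. If a good share of the requests really ran on cores that are 1.5 times slower, avg/min should end up clearly above 1.0.

```shell
#!/bin/bash
# Hypothetical helper, not part of sysbench: read sysbench output on stdin
# and print how far avg and max of the per-request times sit above min.
request_spread() {
	awk '
		# Lines look like "     min:      2.91ms"; adding 0 drops the unit.
		$1 == "min:" { min = $2 + 0 }
		$1 == "avg:" { avg = $2 + 0 }
		$1 == "max:" { max = $2 + 0 }
		END {
			if (min > 0)
				printf "avg/min=%.2f max/min=%.2f\n", avg / min, max / min
		}
	'
}
```

Usage would be `sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | request_spread`. For the sample values above it prints `avg/min=1.06 max/min=1.89`.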

But it’s important to switch to performance cpufreq governor prior to executing the tests:

echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
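
The same thing as a loop over every core that exposes a cpufreq directory, so no cluster gets forgotten. This is just a sketch: `set_governor` is a made-up name, and the second argument exists only so the function can be tried against a copy of the sysfs tree instead of the real one (which needs root).

```shell
#!/bin/bash
# Hypothetical helper: write a governor to every core below a sysfs CPU dir.
# $1: governor name, $2: CPU directory (defaults to the real sysfs path)
set_governor() {
	local governor=$1 cpu_dir=${2:-/sys/devices/system/cpu} f
	# cpu[0-9]* matches cpu0..cpu7 but not the cpufreq or cpuidle directories
	for f in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_governor; do
		[ -e "$f" ] || continue
		echo "$governor" >"$f"
	done
}
```

On the board this would be called as `set_governor performance` before starting any benchmark.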

#5

Another very simple test: openssl speed -elapsed -evp aes-128-cbc

These are the results of other Cortex-A53 with ARMv8 Crypto Extensions. Single threaded operation:

type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
H6 / 1.8 GHz     226657.97k   606014.83k  1013054.98k  1259576.66k  1355773.27k
S5P6818/1.6 GHz  200591.68k   538595.61k   900359.25k  1115728.97k  1204936.70k
RK3328/1.3 GHz   163161.40k   436259.80k   729289.90k   906723.33k   975929.34k
A64 / 1152 MHz   144995.37k   387488.51k   648090.20k   805775.36k   867464.53k

Assuming the S912 clocks the little cores at 1.0 GHz and the big ones at 1.5 GHz, one of the two following lines should show results below A64 (little core) and the other results between RK3328 and S5P6818 (big core):

taskset -c 3 openssl speed -elapsed -evp aes-128-cbc
taskset -c 7 openssl speed -elapsed -evp aes-128-cbc

And to see what happens when all 4 big cores are in use, run this as a script:

#!/bin/bash
while true; do
	for i in 0 1 2 3 ; do 
		taskset -c $i openssl speed -elapsed -evp aes-128-cbc 2>/dev/null &
	done
	wait
done

And for all 8 cores simply replace the for line with for i in 0 1 2 3 4 5 6 7 ; do

So it’s pretty straightforward to check for these issues, but of course one needs the board (I don’t own a Vim2 so I can’t test myself).


#6

Observed: 100% CPU load on a single (allegedly 1.5 GHz) core. I have a VIM2 Max and always run the performance governor.

openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 33910308 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 22548575 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 9222714 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2887950 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 389737 aes-128-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-Bwh9JU/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     180854.98k   481036.27k   787004.93k   985753.60k  1064241.83k

on a little core:

taskset -c 7 openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 23935181 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 15916089 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 6510493 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2038914 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 275104 aes-128-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fdebug-prefix-map=/build/openssl-Bwh9JU/openssl-1.0.2g=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     127654.30k   339543.23k   555562.07k   695949.31k   751217.32k

Running the script, the output is not pretty, but here are 4 consecutive result lines (running on cores 0, 1, 2 & 3):

aes-128-cbc     180892.77k   481128.58k   787118.34k   986037.93k  1064476.67k
aes-128-cbc     180906.70k   481173.93k   787130.54k   986139.31k  1064334.68k
aes-128-cbc     180406.20k   480401.66k   786049.79k   984578.39k  1061393.75k
aes-128-cbc     180655.19k   480587.65k   786338.30k   984979.11k  1063135.91k

Which to me (but I am no expert) looks the same as the single-threaded version. And those seem to sit pretty much between the S5P6818/1.6 GHz & RK3328/1.3 GHz results you quote. But I also don’t think this supports your initial feeling.

For completeness, here are the results with the script loading all 8 cores. I have deleted intermediate lines to leave just 8 consecutive results:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     180546.78k   480275.52k   785989.46k   984144.21k  1062712.66k
aes-128-cbc     180689.26k   480634.15k   786231.21k   984887.98k  1063277.91k
aes-128-cbc     174722.02k   466864.55k   764708.61k   956771.33k  1031809.71k
aes-128-cbc     178963.86k   481044.50k   787039.57k   985796.61k  1064260.95k
aes-128-cbc     127293.45k   338584.41k   554141.87k   694058.67k   749382.31k
aes-128-cbc     127733.72k   339738.88k   555838.55k   696330.58k   751643.31k
aes-128-cbc     127748.04k   339756.48k   555897.17k   696391.68k   751504.04k
aes-128-cbc     127731.74k   339766.42k   555901.27k   696310.10k   751583.23k

#7

Feeling? These were measurements done by @balbes150 last year: https://forum.armbian.com/topic/2138-armbian-for-amlogic-s912/?do=findComment&comment=43338

I only interpreted the numbers. Sysbench provides execution time and standard deviation so it’s pretty capable of reporting what’s happening.

For whatever reason no one has tested again with sysbench so far, but at least it’s obvious that the cpufreq values that can be set and read back as ‘clockspeeds’ via sysfs are still bogus, at least for the big cluster.

Based on your single threaded tests it looks like this with openssl:

type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
H6 / 1.8 GHz     226657.97k   606014.83k  1013054.98k  1259576.66k  1355773.27k
S5P6818/1.6 GHz  200591.68k   538595.61k   900359.25k  1115728.97k  1204936.70k
S912 / 1416 MHz  180854.98k   481036.27k   787004.93k   985753.60k  1064241.83k
RK3328/1.3 GHz   163161.40k   436259.80k   729289.90k   906723.33k   975929.34k
A64 / 1152 MHz   144995.37k   387488.51k   648090.20k   805775.36k   867464.53k
S912 / 1000 MHz  127654.30k   339543.23k   555562.07k   695949.31k   751217.32k
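
Clock labels like the two S912 ones can be estimated by scaling against one of the other Cortex-A53 rows, assuming AES throughput grows linearly with clockspeed on the same core type. A minimal sketch (the function name `estimate_mhz` is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: estimate the clock of a Cortex-A53 from its openssl
# throughput, assuming throughput scales linearly with clockspeed.
# $1: reference clock in MHz
# $2: reference throughput (same block size as $3, without the trailing k)
# $3: measured throughput on the core in question
estimate_mhz() {
	awk -v ref_mhz="$1" -v ref="$2" -v measured="$3" \
		'BEGIN { printf "%.0f\n", ref_mhz * measured / ref }'
}
```

Feeding it the A64 row and the two S912 results for 8192 bytes, `estimate_mhz 1152 867464.53 1064241.83` and `estimate_mhz 1152 867464.53 751217.32`, gives roughly 1413 and 998 MHz. Other reference rows shift the estimate by a few MHz.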

AES encryption is something special though, since it is handled by dedicated instructions when ARMv8 Crypto Extensions are available, as on the S912. So I’m still curious what sysbench results look like as an example of full load directly on the CPU cores.

It’s pretty simple to let this small script run and report results:

#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
	for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
		echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		echo -e "$o cores, $(( $i / 1000)) MHz: \c"
		sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
	done
done
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | egrep "percentile|min:|max:|avg:"

Or use 7-zip’s benchmark mode. With 7-zip memory performance also plays an important role, so it’s not an ideal tool to draw conclusions wrt count of CPU cores and actual clockspeeds. But if 7-zip performance on the big cluster is below RPi 3 numbers then there’s something seriously wrong:

sudo apt install p7zip
taskset -c 0-3 7zr b -mmt1
taskset -c 0-3 7zr b -mmt4
taskset -c 4-7 7zr b -mmt4
7zr b

#8

Hi tkaiser,

After executing the script, I got this:

1 cores, 100 MHz:     execution time (avg/stddev):   58.1148/0.00
1 cores, 250 MHz:     execution time (avg/stddev):   47.8097/0.00
1 cores, 500 MHz:     execution time (avg/stddev):   63.7481/0.00
1 cores, 667 MHz:     execution time (avg/stddev):   53.2392/0.00
1 cores, 1000 MHz:     execution time (avg/stddev):   36.7519/0.00
1 cores, 1200 MHz:     execution time (avg/stddev):   30.6434/0.00
1 cores, 1512 MHz:     execution time (avg/stddev):   25.8836/0.00
4 cores, 100 MHz:     execution time (avg/stddev):   12.0569/0.02
4 cores, 250 MHz:     execution time (avg/stddev):   14.3230/0.00
4 cores, 500 MHz:     execution time (avg/stddev):   12.1902/0.00
4 cores, 667 MHz:     execution time (avg/stddev):   11.0352/0.00
4 cores, 1000 MHz:     execution time (avg/stddev):   9.1944/0.00
4 cores, 1200 MHz:     execution time (avg/stddev):   8.0781/0.00
4 cores, 1512 MHz:     execution time (avg/stddev):   6.9720/0.00
8 cores, 100 MHz:     execution time (avg/stddev):   11.7022/0.02
8 cores, 250 MHz:     execution time (avg/stddev):   9.7152/0.01
8 cores, 500 MHz:     execution time (avg/stddev):   7.3731/0.01
8 cores, 667 MHz:     execution time (avg/stddev):   6.5240/0.01
8 cores, 1000 MHz:     execution time (avg/stddev):   5.3011/0.01
8 cores, 1200 MHz:     execution time (avg/stddev):   4.8013/0.02
8 cores, 1512 MHz:     execution time (avg/stddev):   4.3739/0.02
         min:                                  2.58ms
         avg:                                  3.39ms
         max:                                 30.63ms
         approx.  95 percentile:               3.68ms


#9

Thank you. This was on an Ubuntu Xenial aarch64 OS image?

Based on the numbers, the bl30.bin BLOB you’re using seems to do cpufreq scaling somewhat differently compared to @balbes150’s test last year, and I fear you were running into throttling (reaching or exceeding 80°C) at the end of the benchmark?

Anyway: the numbers are still totally bogus. 1 core at 100 MHz needing 58 seconds is impossible when running at 1000 MHz takes only 36.75 seconds. A real 100 MHz would need roughly ten times as long, about 367 seconds.


#10

Yes, it is. But a different image.


#11

Whoa - the script has a defect that I can see on gkrellm: it is setting scaling_max_freq on the big cores but then sysbench is running on the little cores. In fact when it got to 4 copies I could see 3 running on little cores and 1 on a big core!

I am no good at scripting but will try a mod in the next minutes.
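
To see which core numbers belong to which cluster before pinning anything with taskset, the per-core maximum frequency from sysfs can be listed. This is only a sketch: `list_core_max_freq` is a made-up name, the directory argument exists just for trying it on a copy of the tree, and the values are what the kernel believes, not necessarily the real clockspeeds.

```shell
#!/bin/bash
# Hypothetical helper: print "cpuN max_kHz" for every core so the two
# clusters can be told apart by their cpufreq limit.
# $1: CPU directory (defaults to the real sysfs path)
list_core_max_freq() {
	local cpu_dir=${1:-/sys/devices/system/cpu} d
	for d in "$cpu_dir"/cpu[0-9]*; do
		[ -r "$d/cpufreq/cpuinfo_max_freq" ] || continue
		echo "${d##*/} $(cat "$d/cpufreq/cpuinfo_max_freq")"
	done
}
```

Cores that report the lower limit are the ones to avoid (or to select) with `taskset -c`.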


#12

Some preliminary results.

Some things have changed since last year when @balbes150 tested but some not. Pretty obvious: the cpufreq scaling code running in Linux has only a limited influence on what’s happening in reality. Same with set and reported clockspeeds.

A task that runs completely inside the CPU cache has to run 10 times slower when running at 100 MHz compared to running at 1000 MHz. This is not the case here; we now even see completely weird relationships between cpufreq in Linux and real clockspeed, see e.g. those single-threaded results where 500 MHz is slower than 250 MHz:

 100: execution time (avg/stddev):   58.1148/0.00
 250: execution time (avg/stddev):   47.8097/0.00
 500: execution time (avg/stddev):   63.7481/0.00

Obviously what’s happening below 1000 MHz is totally weird. Translating the cpufreq values set via sysfs (first column, in MHz) into the real clockspeeds implied by the benchmark (remaining columns, in MHz, for 1, 4 and 8 threads) gives this table:

fake      1       4       8
 100     632     763     392
 250     769     642     473
 500     576     754     623
1000    1000    1000     867
1512    1420    1319    1051
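
The single-thread column can be reproduced from the raw results, assuming the 1000 MHz run (36.7519 s) really ran at 1000 MHz and that execution time is inversely proportional to clockspeed. A rough sketch (`effective_clocks` is a made-up name; it expects the result lines in the format printed by the script):

```shell
#!/bin/bash
# Hypothetical helper: turn sysbench result lines into effective clockspeeds.
# stdin: lines like
#   "1 cores, 100 MHz:     execution time (avg/stddev):   58.1148/0.00"
# $1: execution time in seconds of the run assumed to be a true 1000 MHz
effective_clocks() {
	awk -v ref="$1" '
		/execution time/ {
			split($NF, t, "/")   # last field is "avg/stddev"
			printf "%s threads, set %d MHz: ~%.0f MHz effective\n", \
				$1, $3, 1000 * ref / t[1]
		}
	'
}
```

For the multithreaded columns the reference time has to be the one for the same amount of parallel work, e.g. the 4 thread run at 1000 MHz.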

When @balbes150 did the tests last year it looked like this:

fake      1       4       8
 100     868     744     512
 250     955     812     574
 500    1000     866     686
1000    1000     992     964
1512    1448    1200    1200

Still weird behaviour below 1000 MHz but at least somewhat predictable. Also interesting/important: Back then he clearly showed that sysbench running on the 4 big cores was twice as slow as when running on all 8 cores:

4 cores, 1000 MHz: execution time (avg/stddev):   9.1695/0.00
4 cores, 1512 MHz: execution time (avg/stddev):   7.5821/0.00
8 cores, 1000 MHz: execution time (avg/stddev):   4.7245/0.01
8 cores, 1512 MHz: execution time (avg/stddev):   3.7980/0.01

Which is a clear indication that back then there was no big.LITTLE behaviour implemented under load and all CPU cores were running at 1200 MHz when performing intensive tasks. This has changed now and we see different behaviour, but I fear throttling is also involved since the 8-thread results are considerably worse than before.


#13

You could give this a try:

#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
	for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
		echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		echo $i >/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 2>/dev/null
		case $o in
			1)
				TasksetParm="-c 0"
				;;
			4)
				TasksetParm="-c 0-3"
				;;
			*)
				TasksetParm="-c 0-7"
				;;
		esac
		echo -e "$o cores, $(( $i / 1000)) MHz: \c"
		taskset ${TasksetParm} sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
		cat /sys/devices/virtual/thermal/thermal_zone0/temp
	done
done
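
The script prints the raw thermal zone value after each result; the Linux thermal sysfs interface reports it in millidegrees Celsius. A small sketch to put each result and its temperature on one line and flag anything at or above the 80°C mentioned earlier in the thread (`pair_results` is a made-up name, and the threshold is only that assumption):

```shell
#!/bin/bash
# Hypothetical helper: join each sysbench result line with the temperature
# line that follows it. stdin is the output of the script above.
pair_results() {
	awk '
		/execution time/ {
			split($NF, t, "/")   # last field is "avg/stddev"
			label = $1 " threads @ " $3 " MHz: " t[1] " s"
			next
		}
		# A line consisting only of digits is the thermal zone reading.
		/^[0-9]+$/ && label != "" {
			printf "%s, %.1f C%s\n", label, $1 / 1000, \
				($1 >= 80000 ? " (possible throttling)" : "")
			label = ""
		}
	'
}
```

It would be used as `./script.sh | pair_results`, with `script.sh` standing in for whatever the script above was saved as.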

#14

Just tried (my own hack!)

#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 ; do
	for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
		echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		echo -e "$o cores, $(( $i / 1000)) MHz: \c"
		taskset -c 0-3 sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
	done
done
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | egrep "percentile|min:|max:|avg:"

This limits the loop to at most 4 threads, always pinned to the big cores; only the last run uses all 8 cores. What I observed in gkrellm (core frequency and which cores were in use) was what I expected. But the results are (I think?) bizarre!

# ./s912sysbench.sh 
1 cores, 100 MHz:     execution time (avg/stddev):   10.2136/0.00
1 cores, 250 MHz:     execution time (avg/stddev):   10.2282/0.00
1 cores, 500 MHz:     execution time (avg/stddev):   10.1051/0.00
1 cores, 667 MHz:     execution time (avg/stddev):   10.0186/0.00
1 cores, 1000 MHz:     execution time (avg/stddev):   10.0253/0.00
1 cores, 1200 MHz:     execution time (avg/stddev):   10.0355/0.00
1 cores, 1512 MHz:     execution time (avg/stddev):   10.0126/0.00
4 cores, 100 MHz:     execution time (avg/stddev):   10.0963/0.07
4 cores, 250 MHz:     execution time (avg/stddev):   10.1236/0.02
4 cores, 500 MHz:     execution time (avg/stddev):   10.0290/0.03
4 cores, 667 MHz:     execution time (avg/stddev):   10.0589/0.02
4 cores, 1000 MHz:     execution time (avg/stddev):   10.0330/0.02
4 cores, 1200 MHz:     execution time (avg/stddev):   10.0162/0.01
4 cores, 1512 MHz:     execution time (avg/stddev):   10.0291/0.01
         min:                                 42.40
         avg:                                 51.52
         max:                                 90.89
         95th percentile:                     66.84

Will try your script now


#15
# ./s912b.sh 
1 cores, 100 MHz:     execution time (avg/stddev):   10.1409/0.00
44000
1 cores, 250 MHz:     execution time (avg/stddev):   10.1826/0.00
43000
1 cores, 500 MHz:     execution time (avg/stddev):   10.0523/0.00
43000
1 cores, 667 MHz:     execution time (avg/stddev):   10.0149/0.00
44000
1 cores, 1000 MHz:     execution time (avg/stddev):   10.0222/0.00
44000
1 cores, 1200 MHz:     execution time (avg/stddev):   10.0302/0.00
45000
1 cores, 1512 MHz:     execution time (avg/stddev):   10.0332/0.00
45000
4 cores, 100 MHz:     execution time (avg/stddev):   10.3109/0.23
44000
4 cores, 250 MHz:     execution time (avg/stddev):   10.1222/0.03
43000
4 cores, 500 MHz:     execution time (avg/stddev):   10.0484/0.04
44000
4 cores, 667 MHz:     execution time (avg/stddev):   10.0538/0.03
45000
4 cores, 1000 MHz:     execution time (avg/stddev):   10.0345/0.02
46000
4 cores, 1200 MHz:     execution time (avg/stddev):   10.0294/0.01
47000
4 cores, 1512 MHz:     execution time (avg/stddev):   10.0166/0.01
49000
8 cores, 100 MHz:     execution time (avg/stddev):   10.3265/0.15
45000
8 cores, 250 MHz:     execution time (avg/stddev):   10.1160/0.08
46000
8 cores, 500 MHz:     execution time (avg/stddev):   10.0486/0.03
46000
8 cores, 667 MHz:     execution time (avg/stddev):   10.0346/0.03
47000
8 cores, 1000 MHz:     execution time (avg/stddev):   10.0288/0.01
49000
8 cores, 1200 MHz:     execution time (avg/stddev):   10.0155/0.02
51000
8 cores, 1512 MHz:     execution time (avg/stddev):   10.0166/0.01
53000

Observed results on gkrellm as expected. Note that the little cores have a max_freq of 1000, so at the end of the run (8 cores), when the script steps the frequency through 1000/1200/1512, gkrellm reports the little cores constant at 1000.


#16

Indeed. 10.1 seconds with an Ubuntu 16.04 aarch64 sysbench binary are achieved with 4 Cortex-A53 running at ~900 MHz (or 2 running at ~1800 MHz or 8 running at ~450 MHz)


#17

Well, this is expected since here DT and Linux cpufreq code are the limit. But the execution times are simply weird. Care to reboot the board once?


#18
uname -a
Linux VIM2.dukla.net 4.9.40 #2 SMP PREEMPT Wed Sep 20 10:03:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
root@VIM2:/home/chris/bin# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 17.10
Release:	17.10
Codename:	artful

The environment is far from pristine: last rebooted 6 days ago and running Firefox and a couple of other desktop applications at the same time (as well as a desktop!). But the constant 10 seconds seems odd to me!
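
One possible explanation for the constant 10 seconds, stated here as an assumption that nobody in the thread has confirmed: sysbench 1.0 and later stops after a fixed 10 seconds by default instead of after a fixed number of events, so the execution time no longer says anything about speed. The plain `95th percentile:` line in the output above (instead of `approx.  95 percentile:`) at least suggests a newer sysbench than the 0.4.x one. The sketch below (`fixed_work_cmd` is a made-up name) only prints a command line that does a fixed amount of work for either generation, based on what `sysbench --version` reports:

```shell
#!/bin/bash
# Hypothetical helper: print a sysbench CPU command that does a fixed
# amount of work, so that execution time can be compared between runs.
# $1: version string as printed by "sysbench --version", e.g. "sysbench 1.0.11"
# $2: number of threads
fixed_work_cmd() {
	local version=${1##* } threads=$2
	case $version in
		0.*)
			# Old option names; these versions stop after 10000 events by default.
			echo "sysbench --test=cpu --cpu-max-prime=20000 --num-threads=$threads run"
			;;
		*)
			# 1.0+ option names: --events fixes the work, --time=0 removes the time limit.
			echo "sysbench cpu --cpu-max-prime=20000 --threads=$threads --events=10000 --time=0 run"
			;;
	esac
}
```

On the board this could be run as `$(fixed_work_cmd "$(sysbench --version)" 4)`. With a fixed-time run, the number to compare would be the total number of events rather than the execution time.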


#19

Using this script I got a different result:

1 cores, 100 MHz:     execution time (avg/stddev):   382.9829/0.00
43000
1 cores, 250 MHz:     execution time (avg/stddev):   148.9977/0.00
43000
1 cores, 500 MHz:     execution time (avg/stddev):   73.8164/0.00
43000
1 cores, 667 MHz:     execution time (avg/stddev):   55.2353/0.00
43000
1 cores, 1000 MHz:     execution time (avg/stddev):   36.7397/0.00
44000
1 cores, 1200 MHz:     execution time (avg/stddev):   30.5951/0.00
44000
1 cores, 1512 MHz:     execution time (avg/stddev):   25.9128/0.00
45000
4 cores, 100 MHz:     execution time (avg/stddev):   94.4586/0.01
43000
4 cores, 250 MHz:     execution time (avg/stddev):   37.1176/0.01
44000
4 cores, 500 MHz:     execution time (avg/stddev):   18.4188/0.00
45000
4 cores, 667 MHz:     execution time (avg/stddev):   13.7993/0.00
45000
4 cores, 1000 MHz:     execution time (avg/stddev):   9.1685/0.00
46000
4 cores, 1200 MHz:     execution time (avg/stddev):   7.6367/0.00
46000
4 cores, 1512 MHz:     execution time (avg/stddev):   6.4686/0.00
47000
8 cores, 100 MHz:     execution time (avg/stddev):   47.7804/0.01
44000
8 cores, 250 MHz:     execution time (avg/stddev):   18.7053/0.01
45000
8 cores, 500 MHz:     execution time (avg/stddev):   9.2905/0.00
45000
8 cores, 667 MHz:     execution time (avg/stddev):   6.9671/0.00
46000
8 cores, 1000 MHz:     execution time (avg/stddev):   4.6269/0.00
48000
8 cores, 1200 MHz:     execution time (avg/stddev):   4.1788/0.01
49000
8 cores, 1512 MHz:     execution time (avg/stddev):   3.8022/0.00
50000

#20

Here’s my result on a vim2pro, kernel 4.9.76, Debian buster rootfs:

1 cores, 100 MHz:     execution time (avg/stddev):   10.0188/0.00
36000
1 cores, 250 MHz:     execution time (avg/stddev):   9.9992/0.00
35000
1 cores, 500 MHz:     execution time (avg/stddev):   9.9968/0.00
35000
1 cores, 667 MHz:     execution time (avg/stddev):   10.0012/0.00
35000
1 cores, 1000 MHz:     execution time (avg/stddev):   9.9998/0.00
35000
1 cores, 1200 MHz:     execution time (avg/stddev):   10.0001/0.00
35000
1 cores, 1512 MHz:     execution time (avg/stddev):   9.9976/0.00
35000
4 cores, 100 MHz:     execution time (avg/stddev):   10.0085/0.01
34000
4 cores, 250 MHz:     execution time (avg/stddev):   10.0052/0.01
33000
4 cores, 500 MHz:     execution time (avg/stddev):   10.0042/0.00
34000
4 cores, 667 MHz:     execution time (avg/stddev):   10.0005/0.00
34000
4 cores, 1000 MHz:     execution time (avg/stddev):   9.9996/0.00
35000
4 cores, 1200 MHz:     execution time (avg/stddev):   9.9992/0.00
36000
4 cores, 1512 MHz:     execution time (avg/stddev):   9.9985/0.00
37000
8 cores, 100 MHz:     execution time (avg/stddev):   10.0198/0.01
33000
8 cores, 250 MHz:     execution time (avg/stddev):   10.0049/0.01
33000
8 cores, 500 MHz:     execution time (avg/stddev):   10.0023/0.00
34000
8 cores, 667 MHz:     execution time (avg/stddev):   10.0010/0.00
35000
8 cores, 1000 MHz:     execution time (avg/stddev):   10.0004/0.00
37000
8 cores, 1200 MHz:     execution time (avg/stddev):   10.0000/0.00
38000
8 cores, 1512 MHz:     execution time (avg/stddev):   9.9976/0.00
39000