S912 limited to 1200 MHz with multithreaded loads

Just tried (my own hack!)

#!/bin/bash
# Pin both clusters to the performance governor
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
# 1 and 4 threads, restricted to cpu0-3, at every available frequency
for o in 1 4 ; do
	for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
		echo "$i" >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		echo -e "$o cores, $(( i / 1000 )) MHz: \c"
		taskset -c 0-3 sysbench --test=cpu --cpu-max-prime=20000 run --num-threads="$o" 2>&1 | grep 'execution time'
	done
done
# Final run with 8 threads and no CPU affinity
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | grep -E "percentile|min:|max:|avg:"

This limits sysbench to 4 cores, pinned so that it always uses the big cores; only the last run uses all 8. Watching gkrellm, the core frequencies and the cores in use were what I expected. But the results are (I think?) bizarre!

# ./s912sysbench.sh 
1 cores, 100 MHz:     execution time (avg/stddev):   10.2136/0.00
1 cores, 250 MHz:     execution time (avg/stddev):   10.2282/0.00
1 cores, 500 MHz:     execution time (avg/stddev):   10.1051/0.00
1 cores, 667 MHz:     execution time (avg/stddev):   10.0186/0.00
1 cores, 1000 MHz:     execution time (avg/stddev):   10.0253/0.00
1 cores, 1200 MHz:     execution time (avg/stddev):   10.0355/0.00
1 cores, 1512 MHz:     execution time (avg/stddev):   10.0126/0.00
4 cores, 100 MHz:     execution time (avg/stddev):   10.0963/0.07
4 cores, 250 MHz:     execution time (avg/stddev):   10.1236/0.02
4 cores, 500 MHz:     execution time (avg/stddev):   10.0290/0.03
4 cores, 667 MHz:     execution time (avg/stddev):   10.0589/0.02
4 cores, 1000 MHz:     execution time (avg/stddev):   10.0330/0.02
4 cores, 1200 MHz:     execution time (avg/stddev):   10.0162/0.01
4 cores, 1512 MHz:     execution time (avg/stddev):   10.0291/0.01
         min:                                 42.40
         avg:                                 51.52
         max:                                 90.89
         95th percentile:                     66.84

Will try your script now

# ./s912b.sh 
1 cores, 100 MHz:     execution time (avg/stddev):   10.1409/0.00
44000
1 cores, 250 MHz:     execution time (avg/stddev):   10.1826/0.00
43000
1 cores, 500 MHz:     execution time (avg/stddev):   10.0523/0.00
43000
1 cores, 667 MHz:     execution time (avg/stddev):   10.0149/0.00
44000
1 cores, 1000 MHz:     execution time (avg/stddev):   10.0222/0.00
44000
1 cores, 1200 MHz:     execution time (avg/stddev):   10.0302/0.00
45000
1 cores, 1512 MHz:     execution time (avg/stddev):   10.0332/0.00
45000
4 cores, 100 MHz:     execution time (avg/stddev):   10.3109/0.23
44000
4 cores, 250 MHz:     execution time (avg/stddev):   10.1222/0.03
43000
4 cores, 500 MHz:     execution time (avg/stddev):   10.0484/0.04
44000
4 cores, 667 MHz:     execution time (avg/stddev):   10.0538/0.03
45000
4 cores, 1000 MHz:     execution time (avg/stddev):   10.0345/0.02
46000
4 cores, 1200 MHz:     execution time (avg/stddev):   10.0294/0.01
47000
4 cores, 1512 MHz:     execution time (avg/stddev):   10.0166/0.01
49000
8 cores, 100 MHz:     execution time (avg/stddev):   10.3265/0.15
45000
8 cores, 250 MHz:     execution time (avg/stddev):   10.1160/0.08
46000
8 cores, 500 MHz:     execution time (avg/stddev):   10.0486/0.03
46000
8 cores, 667 MHz:     execution time (avg/stddev):   10.0346/0.03
47000
8 cores, 1000 MHz:     execution time (avg/stddev):   10.0288/0.01
49000
8 cores, 1200 MHz:     execution time (avg/stddev):   10.0155/0.02
51000
8 cores, 1512 MHz:     execution time (avg/stddev):   10.0166/0.01
53000

gkrellm showed what I expected. Note that the little cores have a max_freq of 1000, so at the end of the run (8 cores), when the script steps the frequency through 1000/1200/1512, gkrellm reports the little cores constant at 1000.
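The behaviour described here amounts to each cluster running at the lower of the requested frequency and its own maximum. A minimal sketch of that rule, assuming 1512 MHz and 1000 MHz as the cluster maxima (taken from the outputs in this thread); the function name is made up for illustration:

```shell
#!/bin/bash
# Effective per-cluster limit: the requested value, capped at what the
# cluster offers. Cluster maxima are assumptions based on this thread.
effective_limit_khz() {
	# $1 = requested kHz, $2 = highest kHz the cluster offers
	if [ "$1" -gt "$2" ] ; then
		echo "$2"
	else
		echo "$1"
	fi
}

for req in 1000000 1200000 1512000 ; do
	big=$(effective_limit_khz "$req" 1512000)
	little=$(effective_limit_khz "$req" 1000000)
	echo "requested $(( req / 1000 )) MHz: big $(( big / 1000 )) MHz, little $(( little / 1000 )) MHz"
done
```

This only models what cpufreq reports; whether the hardware really runs at those clocks is a separate question.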

Indeed. 10.1 seconds with an Ubuntu 16.04 aarch64 sysbench binary are achieved with 4 Cortex-A53 running at ~900 MHz (or 2 running at ~1800 MHz or 8 running at ~450 MHz)
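The three configurations named above all have the same cores × clock product. A quick sketch of that arithmetic, assuming the benchmark parallelises perfectly so that runtime depends only on that product (the function name is hypothetical):

```shell
#!/bin/bash
# Aggregate "core-MHz" available to a perfectly parallel benchmark.
# Assumption: runtime scales inversely with cores x clock, ignoring
# scheduler overhead and any difference between clusters.
aggregate_mhz() {
	# $1 = number of cores, $2 = clock in MHz
	echo $(( $1 * $2 ))
}

while read -r cores mhz ; do
	echo "$cores cores at $mhz MHz: $(aggregate_mhz "$cores" "$mhz") core-MHz"
done <<EOF
4 900
2 1800
8 450
EOF
```

All three lines report 3600 core-MHz, which is why they would be indistinguishable by execution time alone.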

Well, this is expected since here DT and Linux cpufreq code are the limit. But the execution times are simply weird. Care to reboot the board once?

uname -a
Linux VIM2.dukla.net 4.9.40 #2 SMP PREEMPT Wed Sep 20 10:03:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
root@VIM2:/home/chris/bin# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 17.10
Release:	17.10
Codename:	artful

The environment is far from pristine: it was last rebooted 6 days ago and is running Firefox and a couple of other desktop applications at the same time (as well as a desktop!). But the constant 10 seconds seems odd to me!

Using this script I got a different result:

1 cores, 100 MHz:     execution time (avg/stddev):   382.9829/0.00
43000
1 cores, 250 MHz:     execution time (avg/stddev):   148.9977/0.00
43000
1 cores, 500 MHz:     execution time (avg/stddev):   73.8164/0.00
43000
1 cores, 667 MHz:     execution time (avg/stddev):   55.2353/0.00
43000
1 cores, 1000 MHz:     execution time (avg/stddev):   36.7397/0.00
44000
1 cores, 1200 MHz:     execution time (avg/stddev):   30.5951/0.00
44000
1 cores, 1512 MHz:     execution time (avg/stddev):   25.9128/0.00
45000
4 cores, 100 MHz:     execution time (avg/stddev):   94.4586/0.01
43000
4 cores, 250 MHz:     execution time (avg/stddev):   37.1176/0.01
44000
4 cores, 500 MHz:     execution time (avg/stddev):   18.4188/0.00
45000
4 cores, 667 MHz:     execution time (avg/stddev):   13.7993/0.00
45000
4 cores, 1000 MHz:     execution time (avg/stddev):   9.1685/0.00
46000
4 cores, 1200 MHz:     execution time (avg/stddev):   7.6367/0.00
46000
4 cores, 1512 MHz:     execution time (avg/stddev):   6.4686/0.00
47000
8 cores, 100 MHz:     execution time (avg/stddev):   47.7804/0.01
44000
8 cores, 250 MHz:     execution time (avg/stddev):   18.7053/0.01
45000
8 cores, 500 MHz:     execution time (avg/stddev):   9.2905/0.00
45000
8 cores, 667 MHz:     execution time (avg/stddev):   6.9671/0.00
46000
8 cores, 1000 MHz:     execution time (avg/stddev):   4.6269/0.00
48000
8 cores, 1200 MHz:     execution time (avg/stddev):   4.1788/0.01
49000
8 cores, 1512 MHz:     execution time (avg/stddev):   3.8022/0.00
50000

Here’s my result on a VIM2 Pro, kernel 4.9.76, Debian buster rootfs:

1 cores, 100 MHz:     execution time (avg/stddev):   10.0188/0.00
36000
1 cores, 250 MHz:     execution time (avg/stddev):   9.9992/0.00
35000
1 cores, 500 MHz:     execution time (avg/stddev):   9.9968/0.00
35000
1 cores, 667 MHz:     execution time (avg/stddev):   10.0012/0.00
35000
1 cores, 1000 MHz:     execution time (avg/stddev):   9.9998/0.00
35000
1 cores, 1200 MHz:     execution time (avg/stddev):   10.0001/0.00
35000
1 cores, 1512 MHz:     execution time (avg/stddev):   9.9976/0.00
35000
4 cores, 100 MHz:     execution time (avg/stddev):   10.0085/0.01
34000
4 cores, 250 MHz:     execution time (avg/stddev):   10.0052/0.01
33000
4 cores, 500 MHz:     execution time (avg/stddev):   10.0042/0.00
34000
4 cores, 667 MHz:     execution time (avg/stddev):   10.0005/0.00
34000
4 cores, 1000 MHz:     execution time (avg/stddev):   9.9996/0.00
35000
4 cores, 1200 MHz:     execution time (avg/stddev):   9.9992/0.00
36000
4 cores, 1512 MHz:     execution time (avg/stddev):   9.9985/0.00
37000
8 cores, 100 MHz:     execution time (avg/stddev):   10.0198/0.01
33000
8 cores, 250 MHz:     execution time (avg/stddev):   10.0049/0.01
33000
8 cores, 500 MHz:     execution time (avg/stddev):   10.0023/0.00
34000
8 cores, 667 MHz:     execution time (avg/stddev):   10.0010/0.00
35000
8 cores, 1000 MHz:     execution time (avg/stddev):   10.0004/0.00
37000
8 cores, 1200 MHz:     execution time (avg/stddev):   10.0000/0.00
38000
8 cores, 1512 MHz:     execution time (avg/stddev):   9.9976/0.00
39000

Did a reboot, no change in results (10s “constant”). I did observe that the temperature barely moves, FWIW. Shut down the desktop, same result (although I couldn’t watch gkrellm). I did put a stopwatch on part of the test: the results lines really do appear every 10 seconds!

Gee numbqq, how come you can get the right answer and I can’t!! Fortunately g4b42 has joined me in the idiots’ corner so I don’t feel so stupid!! :grinning:

Hi dukla2000,

Have you tried this script provided by @tkaiser?

#!/bin/bash
# Pin both clusters to the performance governor
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
	for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
		echo "$i" >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		# Errors are suppressed for the cpu4 cluster
		echo "$i" >/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 2>/dev/null
		# Fix the CPU affinity to match the thread count
		case $o in
			1)
				TasksetParm="-c 0"
				;;
			4)
				TasksetParm="-c 0-3"
				;;
			*)
				TasksetParm="-c 0-7"
				;;
		esac
		echo -e "$o cores, $(( i / 1000 )) MHz: \c"
		# TasksetParm is left unquoted on purpose so it splits into two arguments
		taskset ${TasksetParm} sysbench --test=cpu --cpu-max-prime=20000 run --num-threads="$o" 2>&1 | grep 'execution time'
		# SoC temperature after each run
		cat /sys/devices/virtual/thermal/thermal_zone0/temp
	done
done

Wow, that makes a real difference since now cpufreq settings and reality match very closely:

fake      1        4       8
 100      96      97      97
 250     247     247     247
 500     498     498     498
1000    1000    1000    1000
1512    1417    1417    1216

So the only mismatch is between 1512 MHz and the ~1420 MHz in reality, and when running on all 8 cores the results pretty much describe a system running 4 cores at 1.0 GHz and 4 at 1.4 GHz.
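Numbers like those in the table can be approximated from the execution times posted earlier. A minimal sketch, assuming runtime is inversely proportional to clock speed and that the 1000 MHz row really ran at 1000 MHz; the function name is made up, and rounding may differ from the table by 1 MHz:

```shell
#!/bin/bash
# Estimate the real clock from sysbench execution times.
# Assumption: time is inversely proportional to clock speed, and the
# reference measurement ran at its nominal frequency.
effective_mhz() {
	# $1 = reference MHz, $2 = reference time (s), $3 = measured time (s)
	awk -v ref_mhz="$1" -v ref_t="$2" -v t="$3" \
		'BEGIN { printf "%.0f\n", ref_mhz * ref_t / t }'
}

# Single-threaded times from the results posted above,
# with the 1000 MHz run (36.7397 s) as the reference
effective_mhz 1000 36.7397 382.9829   # nominal 100 MHz
effective_mhz 1000 36.7397 148.9977   # nominal 250 MHz
effective_mhz 1000 36.7397 25.9128    # nominal 1512 MHz
```

The same function works for the 4- and 8-thread rows as long as the reference time comes from the same thread count.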

What I totally fail to understand is why others are reporting totally weird results. Kernel, ATF and bl30.bin versions could matter, and also the HMP settings in the kernel config, since with fixed CPU affinity at least your board starts to behave ‘sane’…

Sysbench version? I have

$ sysbench -i
sysbench 1.0.8 (using system LuaJIT 2.0.4)

Oh yes, the sysbench version is the most important factor, and also the compiler version the binary has been built with (related to distro). All my testing so far happened with sysbench 0.4.12, which behaves somewhat consistently wrt thread count and so on (sysbench 0.5, which I compiled myself last year, showed lower numbers, and sysbench 1.0.8 is obviously broken).

sysbench version 1.0.14 here

Can one of you guys please provide the full output? Obviously the measurement mode has changed and tests now seem to be limited to 10 seconds of execution, then report a performance indicator or something like that (while with earlier versions only the execution time was interesting).
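The constant 10 seconds would fit sysbench 1.0.x defaulting to a 10-second time limit with no event limit, whereas 0.4.x stops after a fixed number of requests (10000 by default, as far as I know). A hedged sketch that builds a fixed-work command line for either version; the function name is made up, and the 1.0.x options (`--threads`, `--time`, `--events`) should be checked against `sysbench --help` on your build:

```shell
#!/bin/bash
# Print a sysbench CPU command line that does a fixed amount of work.
# Assumption: 0.x stops after 10000 requests by default, while 1.x
# needs --time=0 --events=10000 to behave the same way.
sysbench_cpu_cmd() {
	# $1 = sysbench version string, $2 = thread count
	case $1 in
		0.*)
			echo "sysbench --test=cpu --cpu-max-prime=20000 --num-threads=$2 run"
			;;
		*)
			echo "sysbench cpu --cpu-max-prime=20000 --threads=$2 --time=0 --events=10000 run"
			;;
	esac
}

sysbench_cpu_cmd 0.4.12 8
sysbench_cpu_cmd 1.0.14 8
```

To run it for real, the version could be taken from `sysbench --version | awk '{print $2}'` and the printed line executed under `taskset` as in the scripts above.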

root@amlogic:~# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
root@amlogic:~# 
root@amlogic:~# sysbench --version
sysbench 0.4.12
root@amlogic:~# 
# sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
Unrecognized command line argument: 2

have to leave, back later

You forget that a lot of time has passed since then. The system has changed. The kernel configuration has changed, and the options responsible for correct big.LITTLE operation, or for the number of cores used, may not have been enabled previously. The kernel sources have changed. The compiler used for the build has changed. Many utilities and other software, and their settings, have changed (different sources, patches, builds, etc.). The dtb files have changed.

If you want to get the right results, you should use a single image (system) with all settings.

Sorry, but this is about the simple question of whether the cpufreq code living in the kernel controls CPU clockspeeds or something else does. With Raspberry Pi and Amlogic it’s something else, while on most other SoCs it’s the kernel controlling clockspeeds.

If I set /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq to 1512 MHz with the performance cpufreq governor, then this should happen, especially if the kernel reports having done it. There are some situations where the kernel might disagree (throttling), but then querying /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq should return the real clockspeed and not just some bogus number, as is always the case with Amlogic SoCs (S905 on ODROID-C2 being the only exception, since Hardkernel got a special BLOB from Amlogic allowing real clockspeed control).
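For reference, a small sketch that prints the configured limit next to what the kernel reports for one cpufreq directory. The function name is made up; it only shows the kernel's view, so on a platform that reports bogus values it still has to be cross-checked with a benchmark:

```shell
#!/bin/bash
# Show the configured limit and the reported clock for one cluster.
# Both sysfs files hold values in kHz.
cpufreq_report() {
	# $1 = cpufreq directory, e.g. /sys/devices/system/cpu/cpu0/cpufreq
	local dir=$1 max cur
	max=$(cat "$dir/scaling_max_freq") || return 1
	cur=$(cat "$dir/scaling_cur_freq") || return 1
	echo "max $(( max / 1000 )) MHz, reported $(( cur / 1000 )) MHz"
}

# Usage on the board (not run here):
#   cpufreq_report /sys/devices/system/cpu/cpu0/cpufreq
#   cpufreq_report /sys/devices/system/cpu/cpu4/cpufreq
```

If the two numbers agree while benchmark times say otherwise, the reported value is not the real clockspeed.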

We’re dealing here with a platform implementing bogus cpufreq adjustment. Isn’t there a Cortex-M core inside the SoC dealing with this stuff?

At least on the Raspberries the real cpufreq scaling happens solely on the VideoCore, implemented in the proprietary ThreadX RTOS, and just like Amlogic they chose to return bogus values to the kernel. Their excuse can be read here: Under-voltage detected! (0x00050005) spams dmesg on new kernel 4.14.30-v7+ · Issue #2512 · raspberrypi/linux · GitHub

btw, I’ve been wondering if there’s any way to control CPU DVFS on amlogic SoCs without passing through the SCPI firmware… (maybe not since there is an internal PMIC)