S912 limited to 1200 MHz with multithreaded loads

btw, I’ve been wondering if there’s any way to control CPU DVFS on amlogic SoCs without passing through the SCPI firmware… (maybe not since there is an internal PMIC)

According to http://events17.linuxfoundation.org/sites/events/files/slides/elcna-2017-amlogic.pdf there’s an SCM firmware running on an embedded Cortex-M3, and all communication goes through a mailbox interface. So I would assume it’s not only about ATF but also about the ‘firmware’ loaded on the M3 core…

Maybe @narmstrong knows a bit more?

It’s not clear to me where this M3 firmware comes from: whether it’s loaded by BL2/ATF or present in the mask ROM. Feedback from @narmstrong would indeed be great.

According to the Libre Computer guys it’s part of a BLOB; see the 2nd post in “Amlogic still cheating with clockspeeds” on the Armbian Community Forums. (S905X is also affected, and based on similar tests a few months ago the 1512 MHz reported there is also lower in reality.)

The first 4 cores are limited to 1512 MHz, and the last 4 cores are limited to 1 GHz.

And yes, you can only control DVFS using SCPI since it’s in control of the M3 co-processor.

The logic is in the M3 firmware, while the DVFS tables are built with U-Boot and loaded by the ATF firmware. Either way, you won’t be able to go beyond these frequencies.

If you run code across all 8 cores, you won’t get maximum performance, since 4 of them are limited to 1 GHz.

We do not even reach the frequencies defined by the DVFS table. That’s my only problem. With current kernel/ATF combination the ‘big’ cores max out at 1416 MHz. And I wonder which component is responsible for this.

Is that firmware signed with an Amlogic private key? (I.e., to what extent could it be modified, even if only by poking around through reverse engineering?)

BTW: with S905X and the ‘default’ bl30.bin BLOB this SoC is limited to 1200 MHz (while reporting 1512 MHz via /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq), and only with a modified BLOB is it able to reach higher clockspeeds (~1470 MHz while still reporting 1512 MHz via sysfs); see “Some basic benchmarks for Le Potato?” on the Armbian Community Forums.

IMO a pretty annoying situation wrt Amlogic SoCs when we can neither trust in nor set clockspeeds like we want.
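
To see what cpufreq claims at a given moment, something like the following can be used. It is only a sketch: `report_freqs` is a made-up helper name, and it reads the same scaling_cur_freq sysfs files mentioned above. As noted, on these SoCs the reported value is not necessarily the clock the cores really run at, so this only shows the reported side.

```shell
#!/bin/bash
# report_freqs [SYSFS_CPU_DIR]
# Print the frequency cpufreq reports for every core that exposes scaling_cur_freq.
# The optional argument (hypothetical, for testing) overrides the sysfs base directory.
report_freqs() {
	base="${1:-/sys/devices/system/cpu}"
	for f in "$base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
		# Skip cores without a readable cpufreq entry (or an unmatched glob)
		[ -r "$f" ] || continue
		cpu="${f#"$base"/}"   # strip the base directory
		cpu="${cpu%%/*}"      # keep only the "cpuN" component
		khz=$(cat "$f")       # sysfs reports kHz
		echo "$cpu: $(( khz / 1000 )) MHz (reported)"
	done
}

report_freqs
```

Comparing this output with a benchmark run at the same setting is what exposes the gap between reported and real clockspeeds.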

Assuming the gxm (S912) bl30 is based on the same code as gxl (S905), this commit could explain the 1.4 / 1.5 GHz discrepancy:

Appreciate this is now academic, but I worked out how to run a single version and capture the results. Oh, and 2 threads happened to run on big cores and 2 on little.

$ sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4 2>&1
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
sysbench 1.0.8 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:    77.99

General statistics:
    total time:                          10.0589s
    total number of events:              785

Latency (ms):
         min:                                 42.93
         avg:                                 51.11
         max:                                 62.86
         95th percentile:                     62.19
         sum:                              40125.05

Threads fairness:
    events (avg/stddev):           196.2500/33.25
    execution time (avg/stddev):   10.0313/0.02

Indeed, but thank you anyway. So sysbench’s mode of operation has changed: by default it now runs for only 10 seconds instead of running until all prime numbers are calculated. At least the standard deviation shows that some threads were running on the slow little cores and some on the faster little cores.

This is a clear sign of scheduler madness (the faster cores should get the jobs, of course). On top of that, @numbqq’s numbers with and without fixed CPU affinity are just strange, and the whole S912 design is strange (a stupid little.LITTLE design with one of the clusters limited to lower clockspeeds for no reason at all). The next annoyance is the ‘firmware’ or mailbox interface cheating on us, and having to rely on proprietary crap like bl30.bin BLOBs that control the CPU cores instead of the kernel.

If Amlogic really capped the real clockspeeds at 1416 MHz two years ago, at a time when they advertised their SoCs as capable of running at 2.0 GHz, this is just an insane joke.

Not interested in anything S912 or Amlogic related any more…

For the main application this chip is designed for, this is entirely academic, since it is totally capable of decoding all video types. It’s a shame that Amlogic has chosen to destroy its reputation in this way, but so what?
For me it’s the lack of product support from Amlogic which sucks.

Shoog

Just got my VIM2 and tried the script. I’m not sure about the real speed, but at least the proportion between clock speeds and number of threads is correct (i.e., runtime of 1 thread @ 100 MHz ~= 10 x 1 thread @ 1000 MHz; 1 thread @ 1512 MHz ~= 4 x 4 threads @ 1512 MHz, etc.)

1 cores, 100 MHz:     execution time (avg/stddev):   382.4676/0.00
Temp: 50000
1 cores, 250 MHz:     execution time (avg/stddev):   148.9245/0.00
Temp: 49000
1 cores, 500 MHz:     execution time (avg/stddev):   73.8128/0.00
Temp: 49000
1 cores, 667 MHz:     execution time (avg/stddev):   55.2327/0.00
Temp: 49000
1 cores, 1000 MHz:     execution time (avg/stddev):   36.7415/0.00
Temp: 50000
1 cores, 1200 MHz:     execution time (avg/stddev):   30.5977/0.00
Temp: 50000
1 cores, 1512 MHz:     execution time (avg/stddev):   25.9122/0.00
Temp: 51000
4 cores, 100 MHz:     execution time (avg/stddev):   95.1046/0.01
Temp: 48000
4 cores, 250 MHz:     execution time (avg/stddev):   37.1072/0.01
Temp: 48000
4 cores, 500 MHz:     execution time (avg/stddev):   18.4622/0.01
Temp: 49000
4 cores, 667 MHz:     execution time (avg/stddev):   13.7924/0.01
Temp: 49000
4 cores, 1000 MHz:     execution time (avg/stddev):   9.1759/0.00
Temp: 51000
4 cores, 1200 MHz:     execution time (avg/stddev):   7.6418/0.00
Temp: 52000
4 cores, 1512 MHz:     execution time (avg/stddev):   6.4735/0.00
Temp: 54000
8 cores, 100 MHz:     execution time (avg/stddev):   48.0306/0.01
Temp: 48000
8 cores, 250 MHz:     execution time (avg/stddev):   18.7397/0.01
Temp: 49000
8 cores, 500 MHz:     execution time (avg/stddev):   9.2862/0.00
Temp: 50000
8 cores, 667 MHz:     execution time (avg/stddev):   6.9622/0.00
Temp: 52000
8 cores, 1000 MHz:     execution time (avg/stddev):   4.6392/0.00
Temp: 54000
8 cores, 1200 MHz:     execution time (avg/stddev):   4.1788/0.01
Temp: 56000
8 cores, 1512 MHz:     execution time (avg/stddev):   3.8117/0.01
Temp: 58000

Using default Khadas dual boot img (VIM2_DualOS_Nougat_Ubuntu-16.04_V171028)
$ uname -a
Linux Khadas 4.9.40 #2 SMP PREEMPT Wed Sep 20 10:03:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
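
The 8-thread numbers also fit the cluster limits mentioned earlier in the thread. A rough sketch, assuming the first cluster really tops out around 1416 MHz (the figure quoted above), the second cluster at 1000 MHz, and that sysbench throughput scales linearly with clock speed. The function names `speedup` and `expected_speedup` are hypothetical; the timings are taken from the results just posted.

```shell
#!/bin/bash
# speedup T_SINGLE T_MULTI: measured speedup of a multi-thread run over the 1-thread run
speedup() {
	awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f\n", t1 / tn }'
}

# expected_speedup FAST_MHZ SLOW_MHZ: 4 fast + 4 slow cores, expressed in "fast core" units
expected_speedup() {
	awk -v fb="$1" -v fl="$2" 'BEGIN { printf "%.2f\n", (4 * fb + 4 * fl) / fb }'
}

speedup 25.9122 3.8117       # 1 vs 8 threads at the nominal 1512 MHz step -> 6.80
expected_speedup 1416 1000   # simple model of the two clusters            -> 6.82
speedup 36.7415 4.6392       # 1 vs 8 threads at 1000 MHz                  -> 7.92
```

At 1000 MHz both clusters run at the same speed and the speedup is close to 8; at the top step it drops to about 6.8, which is roughly what the simple model predicts when four of the cores stay at 1 GHz.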

Yep, same numbers as @numbqq generated above confirming that you’re running with 1416 MHz maximum.
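
For anyone wondering how such a figure can be derived from the timings: a rough sketch, assuming sysbench runtime scales inversely with CPU clock and that the 1000 MHz step is honest. The helper name `effective_mhz` is made up for illustration; the inputs are the single-thread results posted above.

```shell
#!/bin/bash
# effective_mhz REF_SECONDS REF_MHZ MEASURED_SECONDS
# Estimate the real clock of a run from a trusted reference run:
#   f_real ~= f_ref * t_ref / t_measured
effective_mhz() {
	awk -v ref_t="$1" -v ref_f="$2" -v t="$3" \
		'BEGIN { printf "%.0f\n", ref_t * ref_f / t }'
}

effective_mhz 36.7415 1000 30.5977   # nominal 1200 MHz step -> 1201
effective_mhz 36.7415 1000 25.9122   # nominal 1512 MHz step -> 1418
```

So the 1200 MHz step behaves as advertised, while the 1512 MHz step lands at roughly 1.4 GHz.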

But I haven’t the slightest idea why the other test @numbqq made before shows such weird results. It seems cpufreq behaviour on Amlogic platforms is not reproducible (see also “Amlogic still cheating with clockspeeds”, page 2, on the Armbian Community Forums).

@numbqq However, I think you guys should ask Amlogic for binaries with unlocked DVFS. You don’t care much about those things when you are just using the device as a TV Box, but if you want to get serious about stuff like DIY, server, cluster, etc., you need to be able to tune the real performance/consumption ratio.

I have read that other companies have gotten that from Amlogic. I don’t think they should treat you differently just because you are a young company. If you need user support, I’m sure there are many of us who would be willing to write mails to Amlogic complaining about their untrustworthy policy.

Only Hardkernel, for their ODROID-C2. Neither FriendlyELEC (NanoPi K2 with S905) nor Libre Computer (‘Le Potato’) has blobs they are allowed to share. Read what Da Xue (Libre Computer) writes in “Some basic benchmarks for Le Potato?” on the Armbian Community Forums.

Well, if they did it once, then they are obliged to do it with others. They have no right to make a distinction between “first-class” and “second class” companies.

I think so too. But Hardkernel’s issue was about a 500 MHz shortfall from the promised speed: 1.5 GHz instead of 2 GHz. I fear they don’t care about a 100 MHz difference.
I would love to see them care a lot more. SBC users depend a lot on what the SoC manufacturer releases for CPU control, hardware acceleration, … Too bad not enough people use SBCs. Otherwise they wouldn’t dare withhold important information.

Hi tkaiser,

I used the different scripts that you provided.

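#!/bin/bash
# Run sysbench with 1, 4 and 8 threads at every available cpufreq step.
# Needs root for the sysfs writes; no CPU affinity is set, so the scheduler picks the cores.
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
	for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
		# Only the cpu0 policy is capped here; cpu4's scaling_max_freq is left untouched
		echo "$i" >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		echo -e "$o cores, $(( i / 1000 )) MHz: \c"
		sysbench --test=cpu --cpu-max-prime=20000 run --num-threads="$o" 2>&1 | grep 'execution time'
	done
done
# Latency summary of one more 8-thread run at the last frequency set
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | grep -E "percentile|min:|max:|avg:"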

Result is:

1 cores, 100 MHz:     execution time (avg/stddev):   58.1148/0.00
1 cores, 250 MHz:     execution time (avg/stddev):   47.8097/0.00
1 cores, 500 MHz:     execution time (avg/stddev):   63.7481/0.00
1 cores, 667 MHz:     execution time (avg/stddev):   53.2392/0.00
1 cores, 1000 MHz:     execution time (avg/stddev):   36.7519/0.00
1 cores, 1200 MHz:     execution time (avg/stddev):   30.6434/0.00
1 cores, 1512 MHz:     execution time (avg/stddev):   25.8836/0.00
4 cores, 100 MHz:     execution time (avg/stddev):   12.0569/0.02
4 cores, 250 MHz:     execution time (avg/stddev):   14.3230/0.00
4 cores, 500 MHz:     execution time (avg/stddev):   12.1902/0.00
4 cores, 667 MHz:     execution time (avg/stddev):   11.0352/0.00
4 cores, 1000 MHz:     execution time (avg/stddev):   9.1944/0.00
4 cores, 1200 MHz:     execution time (avg/stddev):   8.0781/0.00
4 cores, 1512 MHz:     execution time (avg/stddev):   6.9720/0.00
8 cores, 100 MHz:     execution time (avg/stddev):   11.7022/0.02
8 cores, 250 MHz:     execution time (avg/stddev):   9.7152/0.01
8 cores, 500 MHz:     execution time (avg/stddev):   7.3731/0.01
8 cores, 667 MHz:     execution time (avg/stddev):   6.5240/0.01
8 cores, 1000 MHz:     execution time (avg/stddev):   5.3011/0.01
8 cores, 1200 MHz:     execution time (avg/stddev):   4.8013/0.02
8 cores, 1512 MHz:     execution time (avg/stddev):   4.3739/0.02
         min:                                  2.58ms
         avg:                                  3.39ms
         max:                                 30.63ms
         approx.  95 percentile:               3.68ms

And the other script:

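#!/bin/bash
# Run sysbench with 1, 4 and 8 threads at every available cpufreq step,
# pinning it to fixed cores with taskset. Needs root for the sysfs writes.
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
	for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
		echo "$i" >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
		# Cap the cpu4 policy as well; any write error is discarded
		echo "$i" >/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 2>/dev/null
		# Choose the CPU set sysbench may run on
		case $o in
			1)
				TasksetParm="-c 0"
				;;
			4)
				TasksetParm="-c 0-3"
				;;
			*)
				TasksetParm="-c 0-7"
				;;
		esac
		echo -e "$o cores, $(( i / 1000 )) MHz: \c"
		# TasksetParm is left unquoted on purpose so it splits into "-c" and the CPU list
		taskset ${TasksetParm} sysbench --test=cpu --cpu-max-prime=20000 run --num-threads="$o" 2>&1 | grep 'execution time'
		# SoC temperature in millidegrees Celsius
		cat /sys/devices/virtual/thermal/thermal_zone0/temp
	done
done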

Result is:

1 cores, 100 MHz:     execution time (avg/stddev):   382.9829/0.00
43000
1 cores, 250 MHz:     execution time (avg/stddev):   148.9977/0.00
43000
1 cores, 500 MHz:     execution time (avg/stddev):   73.8164/0.00
43000
1 cores, 667 MHz:     execution time (avg/stddev):   55.2353/0.00
43000
1 cores, 1000 MHz:     execution time (avg/stddev):   36.7397/0.00
44000
1 cores, 1200 MHz:     execution time (avg/stddev):   30.5951/0.00
44000
1 cores, 1512 MHz:     execution time (avg/stddev):   25.9128/0.00
45000
4 cores, 100 MHz:     execution time (avg/stddev):   94.4586/0.01
43000
4 cores, 250 MHz:     execution time (avg/stddev):   37.1176/0.01
44000
4 cores, 500 MHz:     execution time (avg/stddev):   18.4188/0.00
45000
4 cores, 667 MHz:     execution time (avg/stddev):   13.7993/0.00
45000
4 cores, 1000 MHz:     execution time (avg/stddev):   9.1685/0.00
46000
4 cores, 1200 MHz:     execution time (avg/stddev):   7.6367/0.00
46000
4 cores, 1512 MHz:     execution time (avg/stddev):   6.4686/0.00
47000
8 cores, 100 MHz:     execution time (avg/stddev):   47.7804/0.01
44000
8 cores, 250 MHz:     execution time (avg/stddev):   18.7053/0.01
45000
8 cores, 500 MHz:     execution time (avg/stddev):   9.2905/0.00
45000
8 cores, 667 MHz:     execution time (avg/stddev):   6.9671/0.00
46000
8 cores, 1000 MHz:     execution time (avg/stddev):   4.6269/0.00
48000
8 cores, 1200 MHz:     execution time (avg/stddev):   4.1788/0.01
49000
8 cores, 1512 MHz:     execution time (avg/stddev):   3.8022/0.00
50000

I don’t know how Hardkernel got their custom binary, but I think we can try asking Amlogic for one.
