BTW, I’ve been wondering whether there’s any way to control CPU DVFS on Amlogic SoCs without going through the SCPI firmware… (maybe not, since there is an internal PMIC)
According to http://events17.linuxfoundation.org/sites/events/files/slides/elcna-2017-amlogic.pdf there’s an SCM firmware running on an embedded Cortex-M3, and all communication goes through a mailbox interface. So I would assume it’s not only about ATF but also about the ‘firmware’ loaded on the M3 core…
Maybe @narmstrong knows a bit more?
It’s not clear to me where this M3’s firmware comes from, whether it’s loaded by BL2/ATF or present in the mask ROM. Feedback from @narmstrong would indeed be great.
According to the Libre Computer guys it’s part of a BLOB; see the 2nd post in Amlogic still cheating with clockspeeds - Amlogic meson - Armbian Community Forums (the S905X is also affected, and based on similar tests a few months ago the 1512 MHz there are also lower in reality).
The first 4 cores are limited to 1512 MHz, and the last 4 cores are limited to 1 GHz.
And yes, you can only control DVFS using SCPI since it’s in control of the M3 co-processor.
The logic is in the M3 firmware, but the DVFS tables are built by U-Boot and loaded by the ATF firmware; you won’t be able to go beyond these frequencies.
If you run code across all 8 cores, you won’t get maximum performance, since 4 of them are limited to 1 GHz.
We do not even reach the frequencies defined by the DVFS table; that’s my only problem. With the current kernel/ATF combination the ‘big’ cores max out at 1416 MHz, and I wonder which component is responsible for this.
Is that firmware signed with an Amlogic private key? (i.e., to what extent could it be modified, even if just by poking around through reverse engineering?)
BTW: With the S905X and the ‘default’ bl30.bin BLOB, this SoC is limited to 1200 MHz (while reporting 1512 MHz via /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq), and only with a modified BLOB is it able to reach higher clockspeeds (~1470 MHz while still reporting 1512 MHz via sysfs): Some basic benchmarks for Le Potato? - Le Potato - Armbian Community Forums
IMO a pretty annoying situation wrt Amlogic SoCs, when we can neither trust nor set clockspeeds the way we want.
Assuming the GXM (S912) bl30 is based on the same code as the GXL (S905) one, this commit could explain the 1.4/1.5 GHz discrepancy:
Appreciate this is now academic, but I worked out how to run a single version and capture the results. Oh, and 2 threads happened to run on big cores and 2 on little ones.
$ sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=4 2>&1
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --num-threads is deprecated, use --threads instead
sysbench 1.0.8 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time

Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second: 77.99

General statistics:
    total time: 10.0589s
    total number of events: 785

Latency (ms):
    min: 42.93
    avg: 51.11
    max: 62.86
    95th percentile: 62.19
    sum: 40125.05

Threads fairness:
    events (avg/stddev): 196.2500/33.25
    execution time (avg/stddev): 10.0313/0.02
Indeed, but thank you anyway. So sysbench’s mode of operation has changed: a standard run is now limited to 10 seconds by default instead of running until all prime numbers are calculated. At least the standard deviation shows that some threads were running on the slow little cores and some on the faster little cores.
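The ‘Threads fairness’ numbers quantify that split. As a quick sketch (`thread_spread` is just a made-up helper name), the relative spread of per-thread event counts from the run above (196.2500/33.25) comes out around 17%, consistent with threads landing on two differently clocked clusters:

```shell
# Relative spread (stddev/avg) of per-thread event counts, taken from
# the sysbench "Threads fairness" line: events (avg/stddev): 196.2500/33.25
thread_spread() {
  awk -v avg="$1" -v sd="$2" 'BEGIN { printf "%.1f%%\n", sd / avg * 100 }'
}

thread_spread 196.25 33.25   # prints 16.9%
```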
This is a clear sign of scheduler madness (the faster cores should get the jobs, of course). Then @numbqq’s numbers with and without fixed CPU affinity are just strange, the whole S912 design is strange (being a stupid little.LITTLE design with one of the clusters limited to lower clockspeeds for exactly no reason), and then the ‘firmware’ or mailbox interface cheating on us (having to rely on proprietary crap like bl30.bin BLOBs which control the CPU cores instead of the kernel) is the next annoyance.
If Amlogic really capped the real clockspeeds at 1416 MHz two years ago, at a time when they advertised these SoCs as capable of running at 2.0 GHz, this is just an insane joke.
Not interested in anything S912 or Amlogic related any more…
For the main application this chip is designed for, this is entirely academic, since it is totally capable of decoding all video types. It’s a shame that Amlogic has chosen to destroy its reputation in this way, but so what?
For me it’s the lack of product support from Amlogic which sucks.
Shoog
Just got my VIM2 and tried the script. I’m not sure about the real speed, but at least the proportions between clock speeds and thread counts are correct (i.e., the execution time for 1 thread @ 100 MHz ≈ 10 × that for 1 thread @ 1000 MHz; 1 thread @ 1512 MHz ≈ 4 × 4 threads @ 1512 MHz, etc.)
1 cores, 100 MHz: execution time (avg/stddev): 382.4676/0.00
Temp: 50000
1 cores, 250 MHz: execution time (avg/stddev): 148.9245/0.00
Temp: 49000
1 cores, 500 MHz: execution time (avg/stddev): 73.8128/0.00
Temp: 49000
1 cores, 667 MHz: execution time (avg/stddev): 55.2327/0.00
Temp: 49000
1 cores, 1000 MHz: execution time (avg/stddev): 36.7415/0.00
Temp: 50000
1 cores, 1200 MHz: execution time (avg/stddev): 30.5977/0.00
Temp: 50000
1 cores, 1512 MHz: execution time (avg/stddev): 25.9122/0.00
Temp: 51000
4 cores, 100 MHz: execution time (avg/stddev): 95.1046/0.01
Temp: 48000
4 cores, 250 MHz: execution time (avg/stddev): 37.1072/0.01
Temp: 48000
4 cores, 500 MHz: execution time (avg/stddev): 18.4622/0.01
Temp: 49000
4 cores, 667 MHz: execution time (avg/stddev): 13.7924/0.01
Temp: 49000
4 cores, 1000 MHz: execution time (avg/stddev): 9.1759/0.00
Temp: 51000
4 cores, 1200 MHz: execution time (avg/stddev): 7.6418/0.00
Temp: 52000
4 cores, 1512 MHz: execution time (avg/stddev): 6.4735/0.00
Temp: 54000
8 cores, 100 MHz: execution time (avg/stddev): 48.0306/0.01
Temp: 48000
8 cores, 250 MHz: execution time (avg/stddev): 18.7397/0.01
Temp: 49000
8 cores, 500 MHz: execution time (avg/stddev): 9.2862/0.00
Temp: 50000
8 cores, 667 MHz: execution time (avg/stddev): 6.9622/0.00
Temp: 52000
8 cores, 1000 MHz: execution time (avg/stddev): 4.6392/0.00
Temp: 54000
8 cores, 1200 MHz: execution time (avg/stddev): 4.1788/0.01
Temp: 56000
8 cores, 1512 MHz: execution time (avg/stddev): 3.8117/0.01
Temp: 58000
Using default Khadas dual boot img (VIM2_DualOS_Nougat_Ubuntu-16.04_V171028)
$ uname -a
Linux Khadas 4.9.40 #2 SMP PREEMPT Wed Sep 20 10:03:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
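For reference, the Temp: values interleaved above come from /sys/devices/virtual/thermal/thermal_zone0/temp, which on Linux reports millidegrees Celsius. A tiny helper (hypothetical name) makes them readable:

```shell
# Linux thermal sysfs reports temperature in millidegrees Celsius,
# so e.g. "Temp: 58000" above means 58.0 degrees C.
to_celsius() {
  awk -v t="$1" 'BEGIN { printf "%.1f\n", t / 1000 }'
}

to_celsius 58000   # prints 58.0
```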
Yep, same numbers as @numbqq generated above, confirming that you’re running at 1416 MHz maximum.
But I haven’t the slightest idea why the other test @numbqq made before shows such weird results. It seems cpufreq behaviour on Amlogic platforms is not reproducible (see also Amlogic still cheating with clockspeeds - Page 2 - Amlogic meson - Armbian Community Forums).
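The 1416 MHz figure can be cross-checked from the single-thread timings alone, assuming sysbench execution time scales inversely with the real clock (a reasonable assumption for this CPU-bound workload; `effective_mhz` is a made-up helper name):

```shell
# Infer the effective clock of a run from a trusted reference run:
#   effective_mhz = ref_mhz * ref_time / test_time
effective_mhz() {
  awk -v m="$1" -v r="$2" -v t="$3" 'BEGIN { printf "%.0f\n", m * r / t }'
}

# Single-core VIM2 numbers from above: 36.7415 s at 1000 MHz,
# 25.9122 s at a nominal 1512 MHz.
effective_mhz 1000 36.7415 25.9122   # prints 1418, not 1512
```

The result lands within a couple of MHz of the 1416 MHz cap claimed earlier in the thread.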
@numbqq However, I think you guys should ask Amlogic for binaries with unlocked DVFS. You don’t care much about those things when you are just using the device as a TV Box, but if you want to get serious about stuff like DIY, server, cluster, etc., you need to be able to tune the real performance/consumption ratio.
I have read that other companies have gotten that from Amlogic. I don’t think they should treat you differently just because you are a young company. If you need user support, I’m sure there are many of us who would be willing to write mails to Amlogic complaining about their untrustworthy policy.
Only Hardkernel for their ODROID-C2. Neither FriendlyELEC (NanoPi K2 with S905) nor Libre Computer (‘Le Potato’) have blobs they are allowed to share. Read here what Da Xue (Libre Computer) writes: Some basic benchmarks for Le Potato? - Le Potato - Armbian Community Forums
Well, if they did it once, then they are obliged to do it for others. They have no right to make a distinction between “first-class” and “second-class” companies.
I think so too. But Hardkernel’s issue was about a 500 MHz shortfall from the promised speed: 1.5 GHz instead of 2 GHz. I fear they don’t care about a 100 MHz difference.
I would love to see them care a lot more. SBC users depend a lot on what the SoC manufacturer releases for CPU control, hardware acceleration, … Too bad not enough people use SBCs; otherwise they wouldn’t dare to withhold important information.
Hi tkaiser,
I used the different scripts that you provided.
#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
    for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
        echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
        echo -e "$o cores, $(( $i / 1000)) MHz: \c"
        sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
    done
done
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 2>&1 | egrep "percentile|min:|max:|avg:"
Result is:
1 cores, 100 MHz: execution time (avg/stddev): 58.1148/0.00
1 cores, 250 MHz: execution time (avg/stddev): 47.8097/0.00
1 cores, 500 MHz: execution time (avg/stddev): 63.7481/0.00
1 cores, 667 MHz: execution time (avg/stddev): 53.2392/0.00
1 cores, 1000 MHz: execution time (avg/stddev): 36.7519/0.00
1 cores, 1200 MHz: execution time (avg/stddev): 30.6434/0.00
1 cores, 1512 MHz: execution time (avg/stddev): 25.8836/0.00
4 cores, 100 MHz: execution time (avg/stddev): 12.0569/0.02
4 cores, 250 MHz: execution time (avg/stddev): 14.3230/0.00
4 cores, 500 MHz: execution time (avg/stddev): 12.1902/0.00
4 cores, 667 MHz: execution time (avg/stddev): 11.0352/0.00
4 cores, 1000 MHz: execution time (avg/stddev): 9.1944/0.00
4 cores, 1200 MHz: execution time (avg/stddev): 8.0781/0.00
4 cores, 1512 MHz: execution time (avg/stddev): 6.9720/0.00
8 cores, 100 MHz: execution time (avg/stddev): 11.7022/0.02
8 cores, 250 MHz: execution time (avg/stddev): 9.7152/0.01
8 cores, 500 MHz: execution time (avg/stddev): 7.3731/0.01
8 cores, 667 MHz: execution time (avg/stddev): 6.5240/0.01
8 cores, 1000 MHz: execution time (avg/stddev): 5.3011/0.01
8 cores, 1200 MHz: execution time (avg/stddev): 4.8013/0.02
8 cores, 1512 MHz: execution time (avg/stddev): 4.3739/0.02
min: 2.58ms
avg: 3.39ms
max: 30.63ms
approx. 95 percentile: 3.68ms
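One notable difference between the two scripts: the first writes scaling_max_freq only for cpu0, while the second also caps cpu4, which would explain why the unpinned timings above are not even monotonic (1 core @ 500 MHz slower than @ 250 MHz). A throwaway sketch (hypothetical helper, matching the output format above) flags such rows:

```shell
# Within each core count, execution time should fall as frequency rises;
# print any row that is slower than the previous (lower) frequency step.
# Expects lines like: "1 cores, 500 MHz: execution time (avg/stddev): 63.7481/0.00"
check_monotonic() {
  awk '
    {
      split($NF, a, "/")          # "63.7481/0.00" -> avg, stddev
      cores = $1 + 0
      t = a[1] + 0
      if (cores == prev_cores && t > prev_t)
        print "non-monotonic: " $0
      prev_cores = cores; prev_t = t
    }'
}
```

Piping the first result list through it flags the 1-core 500 MHz and 4-core 250 MHz rows, while the taskset-pinned list passes cleanly.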
And the other script:
#!/bin/bash
echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
for o in 1 4 8 ; do
    for i in $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies) ; do
        echo $i >/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
        echo $i >/sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 2>/dev/null
        case $o in
            1)
                TasksetParm="-c 0"
                ;;
            4)
                TasksetParm="-c 0-3"
                ;;
            *)
                TasksetParm="-c 0-7"
                ;;
        esac
        echo -e "$o cores, $(( $i / 1000)) MHz: \c"
        taskset ${TasksetParm} sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=$o 2>&1 | grep 'execution time'
        cat /sys/devices/virtual/thermal/thermal_zone0/temp
    done
done
Result is:
1 cores, 100 MHz: execution time (avg/stddev): 382.9829/0.00
43000
1 cores, 250 MHz: execution time (avg/stddev): 148.9977/0.00
43000
1 cores, 500 MHz: execution time (avg/stddev): 73.8164/0.00
43000
1 cores, 667 MHz: execution time (avg/stddev): 55.2353/0.00
43000
1 cores, 1000 MHz: execution time (avg/stddev): 36.7397/0.00
44000
1 cores, 1200 MHz: execution time (avg/stddev): 30.5951/0.00
44000
1 cores, 1512 MHz: execution time (avg/stddev): 25.9128/0.00
45000
4 cores, 100 MHz: execution time (avg/stddev): 94.4586/0.01
43000
4 cores, 250 MHz: execution time (avg/stddev): 37.1176/0.01
44000
4 cores, 500 MHz: execution time (avg/stddev): 18.4188/0.00
45000
4 cores, 667 MHz: execution time (avg/stddev): 13.7993/0.00
45000
4 cores, 1000 MHz: execution time (avg/stddev): 9.1685/0.00
46000
4 cores, 1200 MHz: execution time (avg/stddev): 7.6367/0.00
46000
4 cores, 1512 MHz: execution time (avg/stddev): 6.4686/0.00
47000
8 cores, 100 MHz: execution time (avg/stddev): 47.7804/0.01
44000
8 cores, 250 MHz: execution time (avg/stddev): 18.7053/0.01
45000
8 cores, 500 MHz: execution time (avg/stddev): 9.2905/0.00
45000
8 cores, 667 MHz: execution time (avg/stddev): 6.9671/0.00
46000
8 cores, 1000 MHz: execution time (avg/stddev): 4.6269/0.00
48000
8 cores, 1200 MHz: execution time (avg/stddev): 4.1788/0.01
49000
8 cores, 1512 MHz: execution time (avg/stddev): 3.8022/0.00
50000
I don’t know how Hardkernel got their custom binary, but I think we can try to ask Amlogic for it.