Yes, it looks like marketing bla bla.
But HardKernel did push AMLogic to release an unrestricted firmware (bl30.bin), and they got a new one from AMLogic with all frequencies unlocked up to 2 GHz.
So maybe “bl30.bin” for VIM2 already has a complete table of frequencies, or Khadas staff can push AMLogic for a new “bl30.bin” like HardKernel did.
Nope, it’s 1448 MHz max and also as soon as all 4 big cores are busy at the same time this will be decreased to just 1200 MHz.
Pretty easy to test this out BTW with benchmarks like sysbench that scale linearly with the count of CPU cores. So simply try out these three tests and compare the results to see what the real clockspeeds behind the cpufreq readouts look like:
sysbench --test=cpu run --num-threads=1 --cpu-max-prime=20000
sysbench --test=cpu run --num-threads=2 --cpu-max-prime=20000
sysbench --test=cpu run --num-threads=4 --cpu-max-prime=20000
Tested the mhz tool on a vim2pro (with performance cpufreq governor, cpufreq-info reporting 1.51 GHz on cores 0-3, 1000 MHz on cores 4-7), here’s the result:
@Gouwa have you got any thoughts on this thread? Don’t suppose you have talked to Amlogic about it? Is it possible to get a different bit of code from them to adjust it? Thanks!
I’ve got another process eating up all other cores, I think that’d be equivalent to sysbench, but I can give sysbench a try.
I’ve seen the other thread as well. I’ve been observing various eyebrow-raising results when using various S912 boards, and this indeed does raise some questions…
Well, while sysbench is a pretty lousy tool for comparing hardware performance across different architectures, it’s really great for these sorts of tests: the whole job is done inside the CPU’s caches, so it’s not influenced by memory bandwidth/latency, and it scales linearly with the count of CPU cores (so you can compare --num-threads=1 with --num-threads=8, and if the latter execution time is not 8 times lower you know there’s something wrong).
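The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `scaling_report` and the timings in `example_times` are hypothetical and were not measured on any board, so plug in the total execution times from your own sysbench runs.

```python
def scaling_report(times_by_threads):
    """Compare measured run times against ideal linear scaling.

    times_by_threads maps thread count -> total execution time in seconds.
    Returns thread count -> (ideal_time, measured_time / ideal_time).
    A ratio clearly above 1.0 means the run was slower than linear
    scaling from the smallest thread count would predict.
    """
    base_threads = min(times_by_threads)
    base_time = times_by_threads[base_threads]
    report = {}
    for threads, measured in sorted(times_by_threads.items()):
        ideal = base_time * base_threads / threads
        report[threads] = (ideal, measured / ideal)
    return report


# Hypothetical timings, NOT real measurements; replace with your own.
example_times = {1: 40.0, 2: 20.0, 4: 10.0, 8: 5.9}

for threads, (ideal, ratio) in scaling_report(example_times).items():
    print(f"{threads} threads: ideal {ideal:.2f}s, measured/ideal = {ratio:.2f}")
```

If the ratio stays near 1.0 up to 4 threads and then rises at 8 threads, that would be consistent with the extra cores running at a lower clockspeed than the first ones.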
I posted over there a simple script able to repeat @balbes150’s tests from last year that clearly showed back then that with multithreaded CPU loads clockspeeds further decrease. Should be just a matter of minutes to repeat…
Back then, when @balbes150 thankfully tested for me, I was not aware that S912’s boot blob wants to play little.LITTLE. It’s an octa-core A53 design with two little clusters, so there’s no big.LITTLE here; Amlogic for whatever funny reasons ships this SoC with a firmware that artificially limits 4 CPU cores to 1.0 GHz and 4 CPU cores to 1.4 GHz while faking the clockspeed readouts of the faster cluster.
So when a load like sysbench is running, one that depends neither on external memory bandwidth nor on anything else happening outside the CPU cores, the result with an 8 thread load is an average 1.2 GHz clockspeed, and that was exactly what sysbench reported back then.
In reality the situation with S912 is much worse. With a full load on all 8 cores at least all 4 ‘fast’ cores at 1.4 GHz are utilized, but with normal workloads that are single-threaded it can easily happen that a demanding task ends up on one of those bottlenecked CPU cores and is then limited to 1000 MHz.
On average, single-threaded tasks are slower on Vim2 than on Vim: on the latter all 4 CPU cores are allowed to clock at up to 1.4 GHz, while on Vim2 for whatever funny reasons the scheduler keeps tasks on the artificially bottlenecked CPU cores, limiting single-threaded loads to 1 GHz.
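The 1.2 GHz average is just the mean over the two clusters. A minimal sketch of that arithmetic, using the rounded 1.4 GHz and 1.0 GHz figures from this thread (the variable names are made up for illustration):

```python
# Effective clockspeeds discussed in this thread (GHz): four cores in the
# faster cluster at about 1.4, four cores in the slower cluster at 1.0.
cluster_clocks_ghz = [1.4] * 4 + [1.0] * 4

# Mean clockspeed seen by a load that keeps all 8 cores busy.
average_ghz = sum(cluster_clocks_ghz) / len(cluster_clocks_ghz)
print(f"average over 8 cores: {average_ghz:.2f} GHz")

# Speedup over a single thread on a 1.4 GHz core, assuming the load scales
# perfectly with clockspeed and core count (an idealisation).
effective_speedup = sum(cluster_clocks_ghz) / 1.4
print(f"expected speedup with 8 threads: {effective_speedup:.2f}x")
```

Under these assumptions an 8 thread run should be roughly 6.9 times faster than a single thread pinned to a fast core, not 8 times.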
It seems to me that what probably happened is that AML set out with the intention of releasing a real big.LITTLE design, but under testing discovered that the SoC die was way too flaky at these speeds, with throttling and unacceptably high failure rates. It’s bad enough that their chip can reach 90°C as it is currently configured. Performance = heat, that’s basic physics, and the cost of providing an adequate cooling solution is simply beyond the margins of their target market. Instead of admitting that their design was flawed they simply lied and made their software lie.
However, none of this really matters to 99.9% of their user base since the chip is well capable of running Android as a media player at performance way beyond just about all of the similarly priced products. Also, in the target market of media players the performance of each CPU is totally unimportant - but the ability to do multiple tasks in the background competently and in parallel is critical - hence eight low spec CPUs make perfect sense and high performance high speed cores make none.
This just shows what an absolutely shitty company AMLogic is. They have shat on their reputation in the western market and it seems that they have abandoned most plans to stay within that market. Why? Because they have spotted an opportunity to sell tightly controlled TV boxes to the domestic Chinese market, with the software locked down and the hardware optimised for this sole purpose.
Hobby SoC users are about as important to AML as a gnat on the arse of an elephant.
If you want a decent product then move along and spend a bit more money on one of the Samsung based chips.
It’s all a bit disgusting really. It’s bad enough that I have sworn off ever buying an AML SoC product again and am fairly leery of touching any ARM based product that isn’t an Android phone (where they excel). Intel based SoC products are infinitely better performance and peripheral support wise, and are getting to be cost and energy competitive with the ARM based SoC chips, and that is where my money will go in the future.
Thanks a bunch. So even 7-zip’s benchmark mode is sufficient to confirm Amlogic cheating with clockspeeds.
On the other hand, for whatever reasons the ‘per core per GHz’ performance also differs a lot between the two clusters:
0-3: 3999 7-zip MIPS at 1415 MHz → 707 per core @ 1 GHz
4-7: 2270 7-zip MIPS at 1000 MHz → 568 per core @ 1 GHz
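For reference, the two normalised figures can be reproduced like this. It is a small sketch that assumes each cluster score was obtained with all 4 cores of that cluster busy; the helper name `mips_per_core_per_ghz` is made up for illustration.

```python
def mips_per_core_per_ghz(total_mips, cores, clock_mhz):
    """Normalise a 7-zip MIPS score to one core running at 1 GHz."""
    return total_mips / cores / (clock_mhz / 1000.0)


# Scores and clockspeeds as quoted above; 4 cores per cluster is assumed.
fast = mips_per_core_per_ghz(3999, 4, 1415)  # cores 0-3
slow = mips_per_core_per_ghz(2270, 4, 1000)  # cores 4-7

print(f"cores 0-3: {fast:.1f} MIPS per core per GHz")
print(f"cores 4-7: {slow:.1f} MIPS per core per GHz")
print(f"ratio between clusters: {fast / slow:.3f}")
```

Even after correcting for clockspeed the first cluster comes out roughly a quarter faster per core, which is the discrepancy in question here.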
I fail to interpret the numbers, especially since decompression speed is almost twice as fast on cores 0-3 (see comments at the bottom of https://www.7-cpu.com)
@g4b42 In case time permits are you able to run sbc-bench neon on your Vim2?
We started to assemble some benchmarks over there https://github.com/ThomasKaiser/sbc-bench and included a lot of monitoring to get a clue what’s going on on platforms that behave somewhat strange (RPi and Amlogic S905X/S912 as best examples).
khadas@Khadas:~$ uname -a
Linux Khadas 4.17.3 #1 SMP PREEMPT Mon Jul 30 03:06:44 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
Result:
khadas@Khadas:~$ sudo /bin/bash ./sbc-bench.sh neon
Average load is above 0.1. Way too much background activity.
System too busy for benchmarking: 06:14:56 up 1:55, 3 users, load average: 0.14, 0.19, 0.43
System too busy for benchmarking: 06:15:01 up 1:55, 3 users, load average: 0.12, 0.19, 0.42
System too busy for benchmarking: 06:15:06 up 1:55, 3 users, load average: 0.11, 0.19, 0.42
System too busy for benchmarking: 06:15:11 up 1:55, 3 users, load average: 0.10, 0.18, 0.42
System too busy for benchmarking: 06:15:16 up 1:55, 3 users, load average: 0.10, 0.18, 0.42
sbc-bench v0.4
Installing needed tools. This may take some time... Done.
Checking cpufreq OPP... Done.
Executing tinymembench. This will take a long time... Done.
Executing OpenSSL benchmark. This will take 3 minutes... Done.
Executing 7-zip benchmark. This will take a long time... Done.
Executing cpuminer. This will take 5 minutes... Done.
Checking cpufreq OPP... Done.
Memory performance (big.LITTLE cores measured individually):
memcpy: 1922.6 MB/s (1.0%)
memset: 5917.9 MB/s (0.4%)
memcpy: 1756.5 MB/s (0.9%)
memset: 5112.5 MB/s
Cpuminer total scores (5 minutes execution): 8.61,8.60,8.59,8.58,8.57 kH/s
7-zip total scores (3 consecutive runs): 5421,5483,5460
OpenSSL results (big.LITTLE cores measured individually):
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 126769.69k 374826.41k 716033.19k 955438.42k 1059151.87k 1064255.49k
aes-128-cbc 89687.37k 265021.03k 506280.02k 675587.75k 748544.00k 753303.55k
aes-192-cbc 120395.16k 331951.45k 583681.45k 736646.83k 797349.21k 800828.07k
aes-192-cbc 85160.92k 235401.39k 412691.03k 520572.25k 563516.76k 566127.27k
aes-256-cbc 92078.63k 260123.78k 468108.63k 601614.68k 655682.22k 659603.46k
aes-256-cbc 82717.74k 216091.39k 357307.82k 435378.86k 465046.19k 466780.16k
Full results uploaded to http://ix.io/1iJ7. Please check the log for anomalies (e.g. swapping
or throttling happenend) and otherwise share this URL.
With this kernel the ‘big’ cluster is cluster 0, unlike all real big.LITTLE implementations out there where the little cluster is 0. Therefore the big.LITTLE column in sbc-bench’s monitoring output shows the wrong frequencies.
Willy Tarreau’s tool again confirms that the 1512 MHz the kernel is talking about is just 1414 MHz in reality.
tinymembench numbers for both clusters are slightly different, but that’s most probably just related to the different clockspeeds the 2 A53 clusters are allowed to run at. I wonder what it would look like when doing echo 1000000 > /sys/devices/system/cpu/cpu${1}/cpufreq/scaling_max_freq ; taskset -c 0 /tmp//tinymembench/tinymembench (forcing the ‘big’ cluster to 1 GHz as well and then executing tinymembench there)
Little throttling happened when running the 7-zip benchmark single-threaded on each cluster. But since your image relies on zram for swap, this should not have affected the results.
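As a cross-check on the clockspeed question, the OpenSSL numbers above can be compared between the two clusters, since AES throughput on large blocks should track clockspeed fairly closely. The sketch below is a rough plausibility check, not part of sbc-bench. It assumes that for each cipher the first row belongs to cores 0-3 and the second to cores 4-7; the names `openssl_16k` and `ratios` are made up for illustration.

```python
# 16384-byte throughput (in 1000s of bytes per second) copied from the
# OpenSSL table above. Assumption: per cipher, the first row is the faster
# cluster (cores 0-3) and the second row the slower one (cores 4-7).
openssl_16k = {
    "aes-128-cbc": (1064255.49, 753303.55),
    "aes-192-cbc": (800828.07, 566127.27),
    "aes-256-cbc": (659603.46, 466780.16),
}

# Throughput ratio between the two clusters for each cipher.
ratios = {name: fast / slow for name, (fast, slow) in openssl_16k.items()}

for name, ratio in ratios.items():
    print(f"{name}: fast/slow = {ratio:.3f}")

# Clock ratios to compare against: reported 1512 MHz vs. measured 1414 MHz,
# each relative to the 1000 MHz cluster.
print(f"1512/1000 = {1512 / 1000:.3f}")
print(f"1414/1000 = {1414 / 1000:.3f}")
```

All three ratios land close to 1.41 rather than 1.51, which fits the 1414 MHz measurement better than the 1512 MHz the kernel reports.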
I added your numbers to the results. It would be very interesting to repeat the test with Debian Stretch and/or a 4.9 kernel (if that one is still in use – I have not kept track of the meson64 kernel situation)