I know, but running those two scripts and comparing the results clearly shows 3 different types of problems that need to be fixed:
- 4.9 kernel and scheduler: Demanding tasks do not end up on the faster cores (cpu 0-3) but for whatever reason are sent to the slower ones (cpu 4-7). This needs to be fixed in the kernel (maybe @narmstrong has an idea how?), since one result is that single-threaded real-world tasks that need performance end up limited to 1000 MHz. That is clearly not something you want on a device advertised as being capable of 1500 MHz, right?
- The kernel has no control over cpufreq clockspeeds. When we ask for 1512 MHz, all we get in reality is 1416 MHz. This does not affect performance that much since the difference is below 10%, but it’s still annoying to buy something advertised as 1.5 GHz capable and then get 1.4 GHz in reality, while the kernel and all the usual tools report bogus numbers (1512 when it’s really 1416).
- The real problem is that the bl30.bin thing seems to make some weird decisions depending on CPU affinity. Even when we tell the cpufreq driver to always use maximum clockspeeds (whether that is 1512 or 1416 on the faster cores is irrelevant here), this is not what happens without fixed CPU affinity. So when we’re not using taskset to pin tasks to specific CPU cores or clusters, the firmware on the M3 decides on its own to do fancy things with real clockspeeds instead of using those the cpufreq driver demands. No idea why it’s that way, but your results without taskset clearly show totally weird numbers both below 1000 MHz and above. The purpose of the cpufreq driver framework is to control this behaviour, not just to give hints that some proprietary firmware running somewhere else is free to ignore (totally trashing performance as a side effect).
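To put a number on the second point, here is a minimal Python sketch. The function names and the sysfs_root parameter are made up for illustration. The scaling_cur_freq path is the usual cpufreq sysfs location (values in kHz), but as described above the number it returns is only what the driver reports, so the measured clock has to come from a benchmark and not from sysfs.

```python
from pathlib import Path


def reported_mhz(cpu: int, sysfs_root: str = "/sys/devices/system/cpu") -> float:
    """Clock the cpufreq driver *reports* for one core, in MHz.

    sysfs stores the value in kHz. On the firmware discussed here this
    number may not match the real clock.
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq"
    return int(path.read_text().strip()) / 1000.0


def shortfall_percent(requested_mhz: float, measured_mhz: float) -> float:
    """How far the measured clock falls below the requested one, in percent."""
    if requested_mhz <= 0:
        raise ValueError("requested_mhz must be positive")
    return (requested_mhz - measured_mhz) / requested_mhz * 100.0
```

With the numbers from this thread, shortfall_percent(1512, 1416) comes out at roughly 6.3%, which is the "below 10%" difference mentioned above.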
IMO the only real fix would be a new firmware, comparable to the situation with Hardkernel and S905, that fixes the following issues:
- stop reporting bogus/faked values back to the cpufreq driver
- do what the cpufreq driver wants. If the driver demands 1512 MHz then set 1512 MHz, and if the driver demands 100 MHz then do that as well (the user might want to save energy for whatever reason, so allow it)
- stop the big.LITTLE emulation and treat all A53 cores equally. It makes no sense to artificially differentiate between ‘fast’ and ‘slow’ cores if they’re all the same
And not related to the blob situation: the SMP/HMP scheduling needs a fix in your kernel, since on S912 cpu 0-3 are always the cores where the work should end up first as long as the firmware plays big.LITTLE emulation.
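For comparing pinned and unpinned runs from inside a script instead of via taskset, something like the following Python sketch should work on Linux. time_pinned is a hypothetical helper name; os.sched_getaffinity and os.sched_setaffinity are the standard library calls that change the CPU mask of the calling process. This is only a way to reproduce the measurement, not a fix for the scheduler.

```python
import os
import time


def time_pinned(cpus, work):
    """Run work() with this process restricted to `cpus`; return elapsed seconds.

    `cpus` is a set of CPU numbers, e.g. {0, 1, 2, 3} for the faster
    cluster on S912. Linux only.
    """
    previous = os.sched_getaffinity(0)      # remember the current CPU mask
    os.sched_setaffinity(0, cpus)           # restrict this process to `cpus`
    try:
        start = time.perf_counter()
        work()
        return time.perf_counter() - start
    finally:
        os.sched_setaffinity(0, previous)   # restore the original mask
```

Calling it once with {0, 1, 2, 3} and once with {4, 5, 6, 7} on the same workload should show whether the two clusters really behave differently when the task is pinned.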