I was running one of the Khadas Ubuntu images when it suddenly froze. The heartbeat on the white LED stopped. After a short while it went into a boot loop, dumping a load of what looked like register details to the screen on each boot.
I tried plugging it into another machine to flash a new image to eMMC, but it doesn’t respond to the three quick presses of the function key. The blue LED lights ever so briefly and then it continues to attempt to boot normally, returning to the boot loop.
Trying Krescue (tested as working on another VIM3) results in one of the two attached images 90% of the time. The other 10% of the time it starts Krescue but locks up within 10 seconds.
On one of the few occasions Krescue started, I got it to start restoring an image before it became unresponsive. The progress bar didn’t move and the write speed was 0; however, the cumulative time kept counting and the estimated time to completion went up and up and up. It never wrote any data according to the UI.
It appears to have died. It’s less than a week old! Should I return it or is there hope?
(I’ve got images of two different failure modes but being a new user I’m only allowed to upload 1)
I’ve tried booting from the SD card without pressing the function button. I get exactly the same issues. Either the VIM3 crashes, producing one of the two images I’ve now uploaded, or it freezes within seconds of Krescue starting. It is impossible to run the eMMC test because it doesn’t stay functioning long enough to navigate to the test.
I’ve also tried a number of power supplies. Sadly this makes no difference either. I have four VIM3 units, and all except this one work 100% with every power supply tested. Only this unit crashes every time.
I’ve attached the image showing the other crash message on startup of the Krescue.
I will try later this evening with a clean build of the Krescue SD card, although the current card boots perfectly on my other VIM3 units.
I believe I’ve found the issue! It is thermal. When I came back to do more tests it was 100% fine … briefly!
I managed to get it to boot Krescue and erased the eMMC through the advanced menu. I was 10% of the way through loading a new image onto it when it froze once again. As usual, the flashing heartbeat LED had stopped.
I tried again with Krescue, but every time it failed or locked up on Krescue start. By now I was suspicious (I’ve a background in software and electronics). I left it for 10 minutes and sure enough all was OK for around 5 minutes. Then it crashed again.
Once warm (i.e. it has been powered in the last 5 minutes), it will start an eMMC full erase but freezes before it completes. If you boot immediately after that, it crashes on Krescue boot. If you leave it 5 minutes to cool off, you can try again, but it will freeze within 5 minutes of applying power.
This does not appear to be a software issue. There is no consistency in when it crashes or what you’re doing when it crashes. There is a far greater correlation with how long the device has been powered.
On one of numerous screenshots I see:
thermal thermal_zone0: binding zone soc_thermal with cdev thermal-cpufreq-0 failed:-22
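For anyone reading that line: the trailing -22 looks like a negated Linux errno value. Here is a minimal Python sketch to decode it. The helper name `decode_kernel_errno` and the regular expression are my own, and it assumes it is run on a Linux machine so the errno table matches the kernel's.

```python
import errno
import os
import re

# The log line as I read it off the screenshot.
LOG_LINE = ("thermal thermal_zone0: binding zone soc_thermal "
            "with cdev thermal-cpufreq-0 failed:-22")


def decode_kernel_errno(line):
    """Return (code, symbolic name, description) for a trailing 'failed:-N'.

    Returns None when the line does not end in that pattern.
    """
    match = re.search(r"failed:\s*-(\d+)\s*$", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to names such as 'EINVAL'.
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(decode_kernel_errno(LOG_LINE))
```

On Linux, 22 is EINVAL (invalid argument). If I'm reading it right, the cpufreq cooling device was not bound to the SoC thermal zone on that boot, but I can't tell from this one line whether that is a cause or just a side effect of the crash.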
I’m using the Khadas passive heatsink right now. When it initially failed I was using the Khadas heatsink with fan, and the fan worked fine. I also had the new expansion board and a Samsung EVO SSD that worked perfectly. I’ve removed both the expansion board and the SSD to minimize potentially faulty components. It didn’t make a difference.
Tomorrow I’ll try replacing the thermal pads, although even with no heatsink I would expect the SoC to throttle and stay within acceptable thermal tolerances. Is that not correct?
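To put numbers on the thermal theory, something like the following Python sketch could log the SoC temperature against uptime, provided the board stays up long enough to run it (for example from an SD card image). It relies on assumptions I have not verified on this board: that the SoC sensor is exposed as `/sys/class/thermal/thermal_zone0` and reports millidegrees Celsius, as the generic Linux thermal sysfs interface does. The function names and the `thermal_log.csv` file name are just placeholders of mine.

```python
import os
import time
from pathlib import Path

# Assumption: the SoC sensor is thermal_zone0 on this image.
ZONE = Path("/sys/class/thermal/thermal_zone0")
UPTIME = Path("/proc/uptime")


def read_temp_c(zone=ZONE):
    """Return the zone temperature in degrees C (sysfs value is in millidegrees)."""
    return int((zone / "temp").read_text().strip()) / 1000.0


def read_uptime_s(path=UPTIME):
    """Return seconds since boot, taken from the first field of /proc/uptime."""
    return float(path.read_text().split()[0])


def log_samples(count, interval_s=1.0, zone=ZONE, uptime=UPTIME,
                out_path=Path("thermal_log.csv")):
    """Append `count` lines of 'uptime_s,temp_c' to out_path.

    Each line is flushed and fsynced so the last reading before a
    freeze has a chance of surviving on disk.
    """
    with open(out_path, "a") as out:
        for _ in range(count):
            out.write(f"{read_uptime_s(uptime):.1f},{read_temp_c(zone):.1f}\n")
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval_s)
```

Calling `log_samples(600)` would record roughly ten minutes at one sample per second. If the final readings before each freeze cluster around the same temperature, that would support the thermal explanation. If they don't, I'd look at something else that warms up with time powered.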