VIM3 - spontaneous dead short

Hi. I got my VIM3 about a year ago and I’ve been using it as a small home media server, VPN server, IRC bouncer, MQTT server and stuff like that, nothing heavy. It was kept in the official plastic case, with the official heatsink+fan, powered by the official power supply and the cable that I bought directly from the Khadas AliExpress store. Just sitting on my desk, doing what it’s been doing the past year, with temperatures constantly <60°C, running Ubuntu Legacy.
Last night I noticed its services were off, and the power LED was off. The external HDD’s LED was also off. Unplugging and plugging the cable back in didn’t work, neither did different cables and power supplies. I tried a power supply with a USB power tester in series, and the moment I would plug the VIM3 in, the USB tester would go out. I took a multimeter, set it to ohms and found that it read a short (0.9 ohms exactly) between DCIN and GND on the VIN connector.
At this point, it seems like it’s fried somewhere before or at the U1 and U2 power ICs. I am upset as I was really relying on the VIM3 and did not expect it to fail spontaneously after not even a full year of use.
I am handy with electronics, I own an oscilloscope and can do some diagnostics myself. I am open to any and all advice and technical support in order to try to figure out what the problem is and fix the board.

@Totti Please help…


Which version of the VIM3 do you have? This is very important for me to analyze the problem.


Ah, it’s V12, I am sorry for not including this info in my original post. Thanks for the quick reply.

OK, understood.
Please remove C8/C9, then measure the resistance between DCIN and GND again, and tell me the result.


I removed the capacitors and the short was gone! I had a few spare ones of the right value so I soldered them on and the VIM3 is working normally. Thank you! As a hobbyist, and a maker who loves DIY, I must say I’ve never been let down by your quick service regarding any questions or requests I had in the past year.

Still, I often run experiments that last a week or a month and store sensor data to the Khadas. This could have easily happened while an experiment was running, and I would’ve lost data. The fact that this happened at all after not even a year is pretty bad and in the future, I likely won’t be considering Khadas for running anything serious.

Thank you for the help once again!


Thanks for the update, and just let us know if you have any further questions :slight_smile:

Good day!


I do not think it can be considered an indicator of future events. I also do not think this is typical; it is more of a fluke. Any device, from any manufacturer, can experience component failure at any time. Sometimes a weak component can get by the best QC (quality control) screening. A component can also be damaged by unseen line spikes, surges, sags or other forces.
Critical or essential systems will always have several failsafes built into their design; typical consumer or maker SBCs likely do not. However, even triple redundancy has been known to fail, so even failsafes are not failsafe 100% of the time.

Glad you got it fixed. Good work.


That is true, and from what I can see, the board is designed just fine and these caps are properly rated, so it comes down to the failure rate of this particular part, which depends on the capacitor manufacturer’s QC. When I experience a failure, I am inclined to think it’s not a one-in-a-million shot and that there’s something behind it, simply because an MLCC, if I’m not mistaken, is really not something you expect to literally short out in less than a year at half its rated voltage; if the failure really were that rare, I must have been extremely unlucky. It could be a bad batch, or a poor manufacturer with high failure rates (i.e. poor component sourcing on Khadas’ part), but it is also certainly possible that everything was fine and I was, indeed, just that unlucky, because you can never have a zero failure rate.

From a company’s point of view, you have the means to assess failure rates over large numbers of units and get meaningful statistics. My position as a consumer doesn’t give me many options to assess the build quality of boards I have bought or might buy in the future. Given my sample size of one Khadas board and 3-4 other boards over the past decade, I really cannot reject any possibility whatsoever, definitely including the possibility that this was just an unfortunate fluke. But, given the choice, and lacking ways to get more information, in the end I am still more likely to go with the other manufacturers’ boards that have been used and (unlike this VIM3) abused for years, one of them almost a decade now, and are still running great. So, while I generally do agree with you, this is the best choice I can make personally, given the circumstances. Thanks, and best regards :slight_smile:

I’ve got a bit of a bigger sample size: I’ve been running 90 VIM2s 24/7 at full load for over two years, and now have 35 VIM3s running as well, not for as long but at high load. I’ve not had a single board failure so far. The build quality of the VIM3 is better than quite a few of the other SBCs I’ve used over the years, and it’s the reason I’ve stuck with them even though they’re not the cheapest for my application.

The MTBF of any given component is taken over some form of distribution, and you might have just got unlucky with that capacitor. Someone, unfortunately, gets the early failures; otherwise, regardless of the distribution, the MTBF would increase (unless they all happened to die at exactly the same point…).
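The point about failure times being spread over a distribution can be illustrated with a quick simulation. This is a sketch under an assumed constant-hazard (exponential) model with a made-up MTBF; real capacitor reliability data may follow a different distribution, so the numbers are purely illustrative:

```python
import random

random.seed(42)  # deterministic run, for illustration only

MTBF_HOURS = 50_000  # hypothetical MTBF, not a real capacitor spec

# Draw 10,000 failure times from an exponential distribution
# whose mean equals the MTBF.
failures = [random.expovariate(1.0 / MTBF_HOURS) for _ in range(10_000)]

one_year = 365 * 24  # 8,760 hours of continuous operation
early = sum(1 for t in failures if t < one_year)

# With constant hazard, P(fail < t) = 1 - exp(-t / MTBF),
# which is roughly 16% at one year for this assumed MTBF.
print(f"failed within a year: {early / len(failures):.1%}")
```

Even with a mean life far beyond a year, a noticeable fraction of simulated parts fails early, which is the "someone gets the early failures" point above.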


Nice, that’s very valuable info for me. Thank you!


Some general advice against data loss on any data-gathering device, Khadas or not.

If you’re gathering data on a single device, that device becomes a SPOF (single point of failure). If your data has value, some measures should be taken to duplicate it. Regularly sending the data to a server is the usual way to handle the problem effectively. Of course, the server has to have regular backups too.

Btw, in most cases there is no need to send all of the raw readings to the server. It may be wiser to average the readings on the spot into time intervals synced to a real-time clock (one minute or 10 seconds, for example), to get one averaged reading per "absolute clock" interval.

Syncing the averages to the real-time clock can sometimes be skipped, but it’s strongly advisable if the data has any scientific value.
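The interval-averaging idea above can be sketched like this: floor each Unix timestamp to a multiple of the interval length, so every device with a synced clock produces averages on the same absolute-clock boundaries. The function name and the `(timestamp, value)` sample format are illustrative:

```python
from collections import defaultdict

def average_by_interval(readings, interval_s=60):
    """Average (unix_timestamp, value) samples into absolute-clock buckets.

    Flooring the timestamp to a multiple of `interval_s` means two devices
    with synced clocks report averages for the very same intervals, which
    keeps datasets comparable even when sampling is irregular.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        bucket_start = int(ts // interval_s) * interval_s
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}

# Samples at t=120s, 130s and 185s with a 60 s interval fall into the
# absolute-clock buckets starting at 120 and 180:
averages = average_by_interval([(120, 1.0), (130, 3.0), (185, 5.0)], 60)
print(averages)  # {120: 2.0, 180: 5.0}
```

Note that the bucket boundaries depend only on the clock, not on when the first sample arrived, which is what makes the intervals "absolute".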
