[Khadas WiP] VIM4 NVMe IO Errors

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu 22.04 Desktop

Which version of system do you use? Khadas official images, self built images, or others?

Official image from the OOWOW installer

Please describe your issue below:

When the CPU is under heavy load (compiling software), I see a lot of I/O errors on the NVMe drive. My OS is installed on the eMMC, but I use the NVMe drive for my data. The drive is a 2TB WD Green SN350.

Post a console log of your issue below:

[ 1019.135040] blk_update_request: I/O error, dev nvme0n1, sector 130301656 op 0x1:(WRITE) flags 0x0 phys_seg 29 prio class 0
[ 1019.135774] EXT4-fs warning (device dm-0): ext4_end_bio:309: I/O error 10 writing to inode 4064097 (offset 0 size 4096 starting block 16287195)
[ 1019.135782] buffer_io_error: 876 callbacks suppressed
[ 1019.135787] Buffer I/O error on device dm-0, logical block 16287195
[ 1019.136588] EXT4-fs warning (device dm-0): ext4_end_bio:309: I/O error 10 writing to inode 4064090 (offset 0 size 8192 starting block 16287196)
[ 1019.136594] Buffer I/O error on device dm-0, logical block 16287196
[ 1019.137385] Buffer I/O error on device dm-0, logical block 16287197
[ 1019.138208] EXT4-fs warning (device dm-0): ext4_end_bio:309: I/O error 10 writing to inode 4064091 (offset 0 size 8192 starting block 16287198)
[ 1019.138214] Buffer I/O error on device dm-0, logical block 16287198
[ 1019.139010] Buffer I/O error on device dm-0, logical block 16287199
[ 1019.139827] EXT4-fs warning (device dm-0): ext4_end_bio:309: I/O error 10 writing to inode 4064093 (offset 0 size 65536 starting block 16287200)
[ 1019.139832] Buffer I/O error on device dm-0, logical block 16287200
[ 1019.140635] Buffer I/O error on device dm-0, logical block 16287201
[ 1019.141446] Buffer I/O error on device dm-0, logical block 16287202
[ 1019.142258] Buffer I/O error on device dm-0, logical block 16287203
[ 1019.143070] Buffer I/O error on device dm-0, logical block 16287204
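
For anyone collecting data for this thread, a log like the one above can be summarized with a short script. The following is a minimal Python sketch, assuming the log has been saved as plain text (for example with `sudo dmesg > dmesg.txt`); the function name `summarize` and its regular expressions are hypothetical and only match the message formats shown above. The kernel rate-limits these messages (`buffer_io_error: 876 callbacks suppressed`), so the suppressed count is tallied separately and the per-line counts should be read as lower bounds.

```python
import re
from collections import Counter

# Hypothetical patterns written against the exact message formats in the log above.
BLK_RE = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+) op 0x[0-9a-f]+:\((?P<op>\w+)\)"
)
BUF_RE = re.compile(r"Buffer I/O error on device (?P<dev>\S+), logical block \d+")
SUPP_RE = re.compile(r"buffer_io_error: (?P<n>\d+) callbacks suppressed")


def summarize(log_text: str) -> dict:
    """Tally block-layer and buffer I/O errors found in a kernel log."""
    blk = Counter()      # (device, operation) -> number of blk_update_request errors
    buf = Counter()      # device -> number of printed "Buffer I/O error" lines
    first_sector = {}    # device -> first failing sector seen in the log
    suppressed = 0       # messages dropped by the kernel's rate limiter
    for line in log_text.splitlines():
        if m := BLK_RE.search(line):
            blk[(m["dev"], m["op"])] += 1
            first_sector.setdefault(m["dev"], int(m["sector"]))
        elif m := BUF_RE.search(line):
            buf[m["dev"]] += 1
        elif m := SUPP_RE.search(line):
            suppressed += int(m["n"])
    return {
        "blk": dict(blk),
        "buffer": dict(buf),
        "first_sector": first_sector,
        "suppressed": suppressed,
    }
```

For example, `print(summarize(open("dmesg.txt").read()))` gives per-device counts that are easier to compare between drives than raw log excerpts. The EXT4 warnings are deliberately ignored, since they repeat the same failures one layer up (on `dm-0`, the LVM volume).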

I have switched from a generic USB-C power supply to a regulated 12 VDC 15 A power supply, and the issue is still happening. The NVMe drive tests fine in a PC.

Hello @RIGeek

We will check this issue.

Thanks.

I’ve installed a heatsink on the NVMe drive and put a large 80mm fan on the VIM4 that blows across the top and bottom of the board. I seem to be getting fewer write errors now, but they still happen after the device has been under full load for a while.

Continuing on, I still feel this is heat-related, so here is what I’ve done. I removed the heatsink and applied thermal pads to the RAM. I removed the M2X board and placed thermal pads on the two ICs on the bottom of the VIM4. I also put thermal pads under my NVMe drive so that the M2X now acts as a heatsink. It’s much more stable now but still gets write errors after being under 100% load for a while.

Today I tried something different: I wanted to make an image of the NVMe drive. The system was under little load, but after about 30 minutes of constant reading from the NVMe, the I/O errors started. The NVMe (controller and memory chip) never went over 45 °C; I was monitoring it with an IR (FLIR) camera.

I will keep posting new info as I get it. I am planning to order a different NVMe drive to see whether the issue is specific to this model.

Don’t know if you saw this, but you are not alone:

https://forum.khadas.com/t/970evo-1tb-on-vim4-new-m2x/15510/65

Hello @RIGeek @technodevotee @JeremiahCornelius

We need to collect more information about which NVMe SSD models don’t work.

As far as I know, the following models have issues on your side:

  • Samsung 970 EVO 1TB
  • WD 2TB WD Green SN350
  • Netac N930E Plus
  • Sabrent Rocket Nano NVMe PCIe M.2 2242 SSD

Have you tested any other models that have issues?

Here are the test results from my side:

  • Kingston A2000 - Works
  • Samsung 980 250GB - Doesn’t work
  • WD 250GB WD Green SN550 - Doesn’t work

For clarification, this is the “Sabrent Rocket Nano NVMe PCIe M.2 2242 SSD”

Best,
— Jeremiah

I have searched around and don’t have any surplus NVMe drives, though I have lots of surplus SATA and SAS drives. I am still experimenting to find the cause. I’ve now ruled out heat and system load completely: the issue happens after extended high I/O to or from the NVMe. It seems the NVMe command queue might be getting exceeded, but I haven’t been able to prove this. Maybe I can put more time into it this weekend.
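
One way to start checking the queue idea is to record the drive’s block-layer queue settings alongside each failing run. Below is a minimal Python sketch, assuming a Linux kernel that exposes the standard `/sys/block/<dev>/queue/` attributes; `read_queue_settings` is a hypothetical helper. Keep in mind that `nr_requests` is the block layer’s request-queue depth, not the NVMe controller’s hardware submission-queue depth, so this only gives a baseline to compare against and does not by itself prove or rule out the hypothesis.

```python
from pathlib import Path

# Standard block-layer queue attributes on Linux; any that are absent are skipped.
QUEUE_ATTRS = ("nr_requests", "max_sectors_kb", "max_hw_sectors_kb", "scheduler")


def read_queue_settings(dev: str = "nvme0n1", sysfs_root: str = "/sys") -> dict:
    """Return the queue settings for block device `dev` as strings."""
    queue_dir = Path(sysfs_root) / "block" / dev / "queue"
    settings = {}
    for attr in QUEUE_ATTRS:
        path = queue_dir / attr
        if path.is_file():
            # sysfs values end with a newline; strip it for cleaner output.
            settings[attr] = path.read_text().strip()
    return settings
```

On the VIM4 this would be called as `read_queue_settings("nvme0n1")`. The `sysfs_root` parameter exists only so the function can be exercised against a fake directory tree instead of the real `/sys`.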

That sounds feasible to me given my experience.

When I tried writing an image to my Netac formatted as NTFS, it was pretty slow (~35 MB/sec) but managed about 12 GB. When I tried the same thing with it formatted as ext4, it was fast (~125 MB/sec) but failed after only a few GB.

So it seems the system handles NTFS quite slowly, and not just on NVMe: SD cards formatted as NTFS are also much slower than those formatted as ext4.

For whatever reason, it seems the slower NTFS path acts as a bottleneck that lets more data be written before the drive craps out entirely.

Not that what actually gets written is any good, as I discovered when all my dashcam videos got corrupted.

My NVMe is partitioned as LVM and formatted as ext4. I wanted ZFS but I would have had to rebuild the kernel.

Which NVMe drives work? @numbqq, only the Kingston A2000?

Hello @RIGeek @technodevotee @Jart25

We have reproduced this issue on our side and we are working on it now.

Whoowee! This is great.

Great news. Commenting just to follow the thread, as my 970 Pro has the same I/O buffer errors mentioned by others.

I mentioned above that I had added thermal pads to my NVMe drive. While this did not fix the issue, I still wanted to share it, as NVMe drives do produce a decent amount of heat under load. The small controller chip closest to the socket is typically where most of the heat is produced. The M2X board has a lot of copper in it, so it should work as a decent heat spreader.

I can corroborate this with a 1TB Sabrent Rocket M.2 NVMe. I have a thermal pad between the SSD and the M2X expansion board, and a sizable heatsink thermally adhered to the drive itself. The operating temperature averages 47 °C and never rises above 50 °C. The M2X becomes noticeably warm to the touch without ever becoming hot, which validates your observation that it is a good dissipator.

Regardless, the NVMe data corruption, read-only lockups, and occasional system freezes still occur even when external cooling drops the operating temperature to 40 °C.

— Jeremiah

I have a couple of aluminum heatsinks ready to adhere to the NVMe storage and controller chips, but since adding the thermal pads, mine has not gone over 50 °C. After the I/O errors are addressed, I might need to add more cooling.
