Quectel EM06-E issue (memory leak?) on VIM3

Which Khadas SBC do you use?

VIM3 Basic with M2X WWAN Extension

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Khadas official images, self built images, or others?

vim3-ubuntu-20.04-gnome-linux-4.9-fenix-1.1.1-220725-emmc.img, and “apt full-upgrade” last week

Please describe your issue below:

I set up the 4G module according to the Khadas documentation. The network works, but I found errors in the dmesg log, and after several hours the network disconnects and reconnects. The network speed also becomes slower. I tried one workaround, adding “vm.min_free_kbytes = 32768” to “/etc/sysctl.conf”; this delays the error but does not solve the problem permanently.
The error occurs immediately when the memory is fully occupied. It looks a bit like a memory leak on the network side.
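
For reference, a minimal sketch of how the setting mentioned above could be written to the config file without creating duplicate entries. `set_sysctl_conf` is a hypothetical helper name, and the optional file argument is an assumption added so it can be tried on a scratch file first; it relies on GNU sed's `-i` option.

```shell
# set_sysctl_conf KEY VALUE [FILE]
# Sets "KEY = VALUE" in FILE (default /etc/sysctl.conf).
# An existing entry for KEY is rewritten in place; otherwise a line is appended.
set_sysctl_conf() {
    file="${3:-/etc/sysctl.conf}"
    if grep -q "^[[:space:]]*$1[[:space:]]*=" "$file" 2>/dev/null; then
        # Replace the existing entry (GNU sed in-place edit)
        sed -i "s|^[[:space:]]*$1[[:space:]]*=.*|$1 = $2|" "$file"
    else
        echo "$1 = $2" >> "$file"
    fi
}
```

Run as root, e.g. `set_sysctl_conf vm.min_free_kbytes 32768`, then `sysctl -p` to load the file and `sysctl -n vm.min_free_kbytes` to confirm the active value.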

I found some discussion of a similar issue on the Raspberry Pi: "eth0: kevent 2 may have been dropped" is still here · Issue #309 · raspberrypi/linux · GitHub

Post a console log of your issue below:

Could you please help me to fix it? Thanks in advance.
@Frank @numbqq

Hello @JJ1997

@ivan.li will help you about this issue.

Thanks a lot! Looking forward to your reply @ivan.li

@JJ1997

  • I haven’t run into the problem you mentioned so far.
khadas@Khadas:~$ dmesg | grep cdc_mbim
[   12.586839] cdc_mbim 1-1.3:1.4: cdc-wdm0: USB WDM device
[   12.587189] cdc_mbim 1-1.3:1.4 wwan0: register 'cdc_mbim' at usb-xhci-hcd.0.auto-1.3, CDC MBIM, ea:5d:d0:4c:04:e7
[   12.587235] usbcore: registered new interface driver cdc_mbim

@ivan.li, I think this error only occurs when the RAM is fully occupied. When plenty of memory is free, the network module has a lot of memory available, so even if there is a “memory leak” we cannot see the error. Did you test with the memory occupied? What is your current kernel version? I will try to update to the same version. Thank you.

Hello @JJ1997

Please try to check with this version: https://dl.khadas.com/products/vim3/firmware/ubuntu/emmc/vim3-ubuntu-20.04-gnome-linux-4.9-fenix-1.4-221229-emmc.img.xz

Hello @numbqq and @ivan.li,

I tried the version you sent (kernel version 4.9.241 after updating), but the error still occurs when the RAM is occupied. You can use the following command in a tmux session to occupy the memory:

stress --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 * 0.99;}' < /proc/meminfo)k --vm-keep -m 1

Once the memory is freed, the error stops appearing, but 4G performance remains degraded.
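
The awk expression in the stress command reads MemAvailable (in kB) from /proc/meminfo and takes 99% of it. A minimal sketch of the same calculation as a reusable function, in case a different percentage is wanted for testing. `mem_target_kb` is a hypothetical name, and the optional file argument is an assumption added so the function can be checked against sample data.

```shell
# mem_target_kb PERCENT [FILE]
# Prints PERCENT of MemAvailable in kB, truncated to an integer.
# FILE defaults to /proc/meminfo.
mem_target_kb() {
    awk -v pct="$1" '/^MemAvailable:/ { printf "%d\n", $2 * pct / 100 }' \
        "${2:-/proc/meminfo}"
}
```

The stress test would then read `stress --vm-bytes "$(mem_target_kb 99)k" --vm-keep -m 1`, and a lower value such as `mem_target_kb 90` could be used to see at what memory pressure the error starts.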

Hello @ivan.li, did you reproduce the error? Please let me know if I need to do other tests. Thanks a lot.

@JJ1997

After I successfully connected, no error log appeared. But after the memory runs out, it appears.

I guess it was an app that caused the memory leak. Do you run any special apps?

Hello @ivan.li, I need to run an AI inference program which uses most of the memory. While the program is running, it has no effect on the use of other units or other network modules such as Wi-Fi, Ethernet or other 4G USB sticks.

I tried to set “vm.min_free_kbytes = 32768” in “/etc/sysctl.conf”, which forces the kernel’s memory manager to keep at least 32MB of free memory in order to ensure proper functioning of the system itself. The reserved memory will not be allocated to other programs.
In this case, the error with the Quectel 4G module did not appear immediately, but it did appear after about 2 days. That is to say, even if the memory does not run out, it seems to be slowly consumed by the 4G module over time.
As in the previous stress test, we only occupied the memory; there was no memory leak or other memory problem on our side, yet the 4G module reported an error. This is not normal.

I think it is the driver or the kernel that has a memory problem with the 4G module and gradually fills up the reserved memory. Currently I can only resolve this error temporarily by rebooting. I will do more tests to see where the memory leak is.
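
One way such a test could be run is to sample /proc/meminfo over a long period and compare the trend. The sketch below logs MemAvailable, Slab and SUnreclaim; the function names and the log format are hypothetical. A steadily growing SUnreclaim would point toward kernel-side allocations (drivers included) rather than a user-space program, but this is a general heuristic, not a confirmed diagnosis for this issue.

```shell
# meminfo_sample [FILE]
# Prints "MemAvailable Slab SUnreclaim" (all in kB) on one line.
# FILE defaults to /proc/meminfo.
meminfo_sample() {
    awk '/^MemAvailable:/ { a = $2 }
         /^Slab:/         { s = $2 }
         /^SUnreclaim:/   { u = $2 }
         END { print a, s, u }' "${1:-/proc/meminfo}"
}

# log_meminfo INTERVAL_SECONDS LOGFILE
# Appends one timestamped sample per interval until interrupted (Ctrl-C).
log_meminfo() {
    while :; do
        echo "$(date '+%F %T') $(meminfo_sample)" >> "$2"
        sleep "$1"
    done
}
```

For example, `log_meminfo 60 /tmp/meminfo.log` in a tmux session records one sample per minute, which can later be compared with the time at which the error shows up in dmesg.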

@JJ1997
We are trying to reproduce the problem and it may take some time.

Hello @ivan.li, thanks a lot. I will wait for the test results on your side.

Is it possible to reboot/reset the 4G module without rebooting the VIM3? For example, by sending AT commands or by restarting the driver or the network unit. At the moment we can only work around this problem by rebooting the VIM3 each day, which is inconvenient.
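
Two generic approaches of this kind, sketched here under assumptions and not confirmed to clear this particular error: re-binding the USB device through sysfs, and sending the standard AT+CFUN=1,1 command, which asks the modem to reset itself. The helper names are hypothetical, the USB ID 1-1.3 is taken from the cdc_mbim lines in the dmesg output earlier in this thread and may differ on another board, and /dev/ttyUSB2 is assumed to be the AT port.

```shell
# Hypothetical helpers, meant to be run as root.
# With DRY_RUN=1 they only print what they would do.

# usb_rebind USB_ID
# Detaches and re-attaches one USB device, e.g. "1-1.3".
usb_rebind() {
    for action in unbind bind; do
        target="/sys/bus/usb/drivers/usb/$action"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would write $1 to $target"
        else
            echo "$1" > "$target"
            sleep 2   # give the device time to disappear / re-enumerate
        fi
    done
}

# modem_reset_at [PORT]
# Sends AT+CFUN=1,1 (full functionality, with reset) to the AT port.
# ModemManager may need to be stopped first if it holds the port.
modem_reset_at() {
    port="${1:-/dev/ttyUSB2}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would send AT+CFUN=1,1 to $port"
    else
        printf 'AT+CFUN=1,1\r' > "$port"
    fi
}
```

Running `DRY_RUN=1 usb_rebind 1-1.3` first shows which sysfs files would be written. After either reset the modem re-enumerates, so the network connection has to be brought up again.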

Hi~ @JJ1997
Sorry for the late reply. This problem has been included in my work plan for this week. I have tested it a few times before; the most illustrative test ran for two days and two nights. According to the data, the 4G module does not seem to be the real culprit.
Can I offer you something that will allow me to actually recreate your usage scenario. Let me look for more clues.

Sorry again for the late reply!!!

Hi @ivan.li
Thanks for your efforts. You have reproduced the problem but did not find the cause, do I understand correctly?
I think you meant that we should offer you something, rather than you offering us something. If so: because our program involves our IP (code, algorithms, AI model, etc.), it is not easy for you to install and configure it yourselves, and it is difficult for us to provide it to you to recreate our usage scenario.
We will try to provide you with as much of the information you need as we can. Our program does inference on the NPU 24/7. The previous stress test, which exhausts the memory, can also reproduce the problem. In the idle case (as in your screenshot, I think), we have not observed this issue. Could you tell us more about your current progress and the information you need?
Thanks a lot!

We haven’t been able to reproduce this issue with only the 4G module running and no other applications.

So the issue occurs only when the memory is low, right? That is, it occurs while your AI inference program is using most of the memory. Can you try to optimize your program to use less memory? Have you checked your program’s memory usage?

If you suspect a memory leak in the 4G module driver, then we need to find another way to reproduce this issue.

@numbqq, we previously ran our program with 4G USB sticks, Wi-Fi or Ethernet and did not experience any issues like this. Now that we are trying to switch to the 4G M.2 module, we have this issue.

When the memory is occupied by our program, the program does not cause system freezes or crashes. As the 4G module is part of the basic operation of the system, I don’t think this behaviour is normal.

Even if we limit the memory used by our program, the error still appears, just not right away; after 24 hours, for example. Our program restarts each day, so if it were the cause, the error should either appear within the 24-hour window between restarts or not at all. Instead, the error is always there after the first 24 hours. The only solution we have found so far is to reboot Ubuntu each day.

So far we have seen the issue only when the memory is low. We haven’t checked our program’s memory usage, but an ordinary stress test can trigger the problem. Even with our program’s memory usage optimized, I’m afraid we will end up in the same situation as when limiting memory: the error does not appear right away, but it appears after a while.

I have no idea. There are some discussions on the Internet about similar issues: “[all variants] dmesg - smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped” and “[PATCH] usbnet: smsc95xx: add reset_resume function with reset operation - Joonyoung Shim”.
Do we have a similar configuration for Quectel?

Hello @numbqq, based on other discussions on the Internet and our previous experience running the program with other network modules, we suspect there is a problem with the driver or the kernel. Do you have any official examples for running AI inference on the VIM3, so that the issue can be reproduced without our program? As we don’t have much experience, memory analysis of our program would be difficult for us and would take longer. Thanks a lot.

Not sure, we will check with Quectel about this issue.

Hello @JJ1997

Can you try to get some information from /dev/ttyUSB2 via AT commands?

AT+QCFG="usbnet"
AT+QUSBCFG?
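
If no serial terminal program is at hand, the two queries above could be sent from the shell along these lines. This is a sketch under assumptions: `at_line` and `at_query` are hypothetical helper names, the 2-second read timeout is arbitrary, and ModemManager (if running) may need to be stopped first so that it does not consume the reply.

```shell
# at_line COMMAND
# Prints COMMAND terminated by a carriage return, as AT commands expect.
at_line() {
    printf '%s\r' "$1"
}

# at_query PORT COMMAND [TIMEOUT_SECONDS]
# Sends COMMAND to PORT and prints whatever the modem answers
# within the timeout (default 2 seconds).
at_query() {
    # Raw mode without local echo, so the reply is passed through unmodified
    stty -F "$1" raw -echo
    # Start reading in the background first, then send the command
    timeout "${3:-2}" cat "$1" &
    at_line "$2" > "$1"
    wait
}
```

For example: `at_query /dev/ttyUSB2 'AT+QCFG="usbnet"'` followed by `at_query /dev/ttyUSB2 'AT+QUSBCFG?'`, run as root or as a user with access to the serial port.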