Face_recognition npu example fails to run

@numbqq Now that we have a fix for the network crash issue, it would be really helpful to be able to build Debian images using Fenix.

Hello @davidharding

@Electr1 can you try to build the Debian images with the 5.15 kernel?

Hello @davidharding, I will provide you with Debian images shortly.

Hello @davidharding @RichardPar

Please try this Debian image and give us feedback. Note that it still has some known bugs; if you find anything unexpected, please let us know!

https://github.com/sravansenthiln1/images/releases/download/v0.4/vim4-debian-11-server-linux-5.15-fenix-1.5.2-231031.img.xz

Hi @Electr1 @numbqq
I have installed the Debian 5.15 server image, and on a clean install things appear to work fine.
I have seen no kernel panics, and the npu examples work as expected.

However, after installing various packages, I am again seeing a problem with the NPU.
I get the following error:

root@Khadas:/media/nvme/khadas/vim4_npu_applications-master/face_recognition/build# sudo ./face_recognition -M …/data/model/retinaface_int8.adla -m …/data/model/facenet_int8.adla -p 1
adla usr space 1.2.0.5
E NN_SDK:[aml_adla_create_network_common:357]Error: create network fail.
amlnn_init is fail

What additional logs can I collect to investigate this further?

To reproduce
(1) Flash the emmc with the image
(2) Install the face_recognition demo
(3) Attempt to run the demo, and it should work successfully
(4) apt install runc
(5) Attempt to run the demo, and it should work successfully
(6) apt install containerd
(7) Attempt to run the demo, and it should now fail with the mentioned error

@davidharding Debian images do not ship with the same packages as the Ubuntu image, so some NPU-related packages may be missing. This is expected.

We will check this and provide you with the necessary packages.

Hello @davidharding

Can you check the new image? It works on my side. You can use OOWOW to install vim4-debian-11-server-linux-5.15-fenix-1.5.2-231102-develop-test-only online.

khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ cat /etc/fenix-release 
# PLEASE DO NOT EDIT THIS FILE
BOARD=VIM4
VENDOR=Amlogic
VERSION=1.5.2
ARCH=arm64
INITRD_ARCH=arm64
IMAGE_VERSION=1.5.2-231102
################ GIT VERSION ################
UBOOT_GIT_VERSION=khadas-vims-u-boot-2019.01-v1.5.2-release-709-g1b24e6d
LINUX_GIT_VERSION=v5.15.78-6346-g33a25ce
FENIX_GIT_VERSION=v1.5.2-132-g203e2c6
#############################################
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ sudo ./face_recognition -M ../data/model/retinaface_int8.adla -m ../data/model/facenet_int8.adla -p 1
adla usr space 1.2.0.5
adla usr space 1.2.0.5
[ 1134.260636][1 T416   ..] adlak_core clk requirement of 800000000 Hz,and real val is 799999988 Hz.
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ sudo ./face_recognition -M ../data/model/retinaface_int8.adla -m ../data/model/facenet_int8.adla -p ../data/img/lin_2.jpg
adla usr space 1.2.0.5
adla usr space 1.2.0.5
lin_2.dat
1.000000
lin_1.dat
0.873803
lin_3.dat
0.795178
xu_1.dat
0.457403
xu_3.dat
0.377446
xu_2.dat
0.307754
class:face,label_num:0,prob:0.999055,left:30,top:55,right:128,bot:158
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 

I have tried with the latest image on OOWOW:

# PLEASE DO NOT EDIT THIS FILE
BOARD=VIM4
VENDOR=Amlogic
VERSION=1.5.2
ARCH=arm64
INITRD_ARCH=arm64
IMAGE_VERSION=1.5.2-231102
################ GIT VERSION ################
UBOOT_GIT_VERSION=khadas-vims-u-boot-2019.01-v1.5.2-release-709-g1b24e6d
LINUX_GIT_VERSION=v5.15.78-6346-g33a25ce
FENIX_GIT_VERSION=v1.5.2-132-g203e2c6
#############################################
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye

It is definitely the installation of “containerd” that causes the problem.
Before that, the example works fine.

I can’t track this fault down further myself, as the fault appears to originate from within libnnsdk.so.

This is 100% reproducible.
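
Since both sides are quoting `/etc/fenix-release` to confirm they run the same image, here is a minimal sketch (not from the thread) that parses the `KEY=VALUE` fields of such a file and lists the fields that differ between two boards. The names `parse_release` and `diff_releases` are hypothetical helpers, and the sample text is a shortened copy of the output quoted above.

```python
def parse_release(text):
    """Parse KEY=VALUE lines, skipping blank lines, '#' lines and lines without '='."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def diff_releases(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Shortened sample copied from the thread; on a real board, read /etc/fenix-release instead.
MINE = """\
# PLEASE DO NOT EDIT THIS FILE
BOARD=VIM4
IMAGE_VERSION=1.5.2-231102
LINUX_GIT_VERSION=v5.15.78-6346-g33a25ce
"""

# Comparing a file with itself gives an empty dict, i.e. no differing fields.
print(diff_releases(parse_release(MINE), parse_release(MINE)))
```

In this thread the two quoted files are identical, so a comparison like this would only confirm that the image version is not what differs between the two setups.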

What do you mean by this? Does just installing the containerd package break the NPU? Can you provide the steps to reproduce?

Here are the steps on my side; it works.

khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ sudo apt install containerd
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  runc
Suggested packages:
  containernetworking-plugins
Recommended packages:
  criu
The following NEW packages will be installed:
  containerd runc
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 16.8 MB of archives.
After this operation, 77.9 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://mirrors.tuna.tsinghua.edu.cn/debian bullseye/main arm64 runc arm64 1.0.0~rc93+ds1-5+deb11u2 [2,078 kB]
Get:2 http://mirrors.tuna.tsinghua.edu.cn/debian bullseye/main arm64 containerd arm64 1.4.13~ds1-1~deb11u4 [14.7 MB]
Fetched 16.8 MB in 1s (21.0 MB/s)    
Selecting previously unselected package runc.
(Reading database ... 170434 files and directories currently installed.)
Preparing to unpack .../runc_1.0.0~rc93+ds1-5+deb11u2_arm64.deb ...
Unpacking runc (1.0.0~rc93+ds1-5+deb11u2) ...
Selecting previously unselected package containerd.
Preparing to unpack .../containerd_1.4.13~ds1-1~deb11u4_arm64.deb ...
Unpacking containerd (1.4.13~ds1-1~deb11u4) ...
Setting up runc (1.0.0~rc93+ds1-5+deb11u2) ...
Setting up containerd (1.4.13~ds1-1~deb11u4) ...
Created symlink /etc/systemd/system/multi-user.target.wants/containerd.service → /lib/systemd/system/containerd.service.
Processing triggers for man-db (2.9.4-2) ...
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ 
khadas@Khadas:~/vim4_npu_applications/face_recognition/build$ sudo ./face_recognition -M ../data/model/retinaface_int8.adla -m ../data/model/facenet_int8.adla -p 1
adla usr space 1.2.0.5
adla usr space 1.2.0.5
[   59.351966][1 T422   ..] adlak_core clk requirement of 800000000 Hz,and real val is 799999988 Hz.

If I do the same thing as you, I get different results.
I have copied below the complete console output from the first boot of a fresh install of the Debian 5.15 server image via OOWOW.
This is still 100% reproducible for me.

I get the same error from the steps …

Additionally, it's VERY SLOW to load ‘mc’ (Midnight Commander).

Hello @RichardPar @davidharding

I will check again on my side.

Strange things happening…

I ran gdb with the face_detection and it started working; the failure went away! Something is wonky.

MC is still slow, though.

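Here is an strace comparison between a working and a non-working setup.
There appears to be a problem when creating the “adlau_thread_0” thread: clone() succeeds in both cases, but the following sched_setscheduler() call fails with EPERM in the non-working setup.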

Working
clone(child_stack=0x7fb3993c70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[4225], tls=0x7fb3994a70, child_tidptr=0x7fb3994440) = 4225
sched_setscheduler(4225, SCHED_FIFO, [99]) = 0

Not Working
clone(child_stack=0x7fac275c70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[4156], tls=0x7fac276a70, child_tidptr=0x7fac276440) = 4156
sched_setscheduler(4156, SCHED_FIFO, [99]) = -1 EPERM (Operation not permitted)
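
The only difference in the two traces above is the result of the `SCHED_FIFO` priority 99 request. A minimal Linux-only sketch, not taken from libnnsdk, that makes the same kind of request from Python and reports the outcome, so the failure can be checked without the NPU demo. `probe_fifo` is a hypothetical helper name, and unlike the trace it targets the calling thread rather than a freshly cloned one.

```python
import errno
import os


def probe_fifo(priority=99):
    """Try to switch the calling thread to SCHED_FIFO, then restore the old policy.

    Returns "ok" on success, or the errno name (for example "EPERM") on failure.
    """
    old_policy = os.sched_getscheduler(0)
    old_param = os.sched_getparam(0)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    # Restore the original policy so the probe has no lasting effect.
    os.sched_setscheduler(0, old_policy, old_param)
    return "ok"


if __name__ == "__main__":
    print("SCHED_FIFO priority 99:", probe_fifo(99))
```

Run it with `sudo`, as the demo was. If it reports `EPERM` only after containerd is installed, that would suggest the problem is in the system's real-time scheduling permissions rather than in the NPU library itself.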

I am running the LTP (Linux Test Project) test suite.

Most of the scheduler tests fail:

root@Khadas:/opt/ltp/results# cat LTP_RUN_ON-2023_11_09-22h_22m_48s.log |grep FAIL
bpf_prog06 FAIL 33
epoll_pwait03 FAIL 1
ioctl_loop01 FAIL 1
ioctl_loop02 FAIL 1
fanotify10 FAIL 36
openat04 FAIL 1
sched_rr_get_interval01 FAIL 1
sched_rr_get_interval02 FAIL 1
sched_rr_get_interval03 FAIL 1
sched_setparam02 FAIL 1
sched_setparam03 FAIL 2
sched_getscheduler01 FAIL 1
semctl09 FAIL 1
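
To pull just the scheduler-related failures out of a list like the one above, a small sketch along these lines could be used. It assumes each line has the shape `<name> FAIL <exit code>` as in the grep output; `failed_tests` is a hypothetical helper, and the sample is copied from the post.

```python
def failed_tests(log_text, prefix=""):
    """Return names of tests marked FAIL, optionally limited to a name prefix."""
    names = []
    for line in log_text.splitlines():
        parts = line.split()
        # Expected shape, as in the grep output: <name> FAIL <exit code>
        if len(parts) >= 2 and parts[1] == "FAIL" and parts[0].startswith(prefix):
            names.append(parts[0])
    return names


SAMPLE = """\
bpf_prog06 FAIL 33
epoll_pwait03 FAIL 1
ioctl_loop01 FAIL 1
ioctl_loop02 FAIL 1
fanotify10 FAIL 36
openat04 FAIL 1
sched_rr_get_interval01 FAIL 1
sched_rr_get_interval02 FAIL 1
sched_rr_get_interval03 FAIL 1
sched_setparam02 FAIL 1
sched_setparam03 FAIL 2
sched_getscheduler01 FAIL 1
semctl09 FAIL 1
"""

for name in failed_tests(SAMPLE, prefix="sched_"):
    print(name)
```

On the sample above this selects the six `sched_*` entries out of the thirteen failures listed.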

Hello @RichardPar @davidharding

Can you try to upgrade the kernel and check whether this issue still exists?

$ wget https://dl.khadas.com/.test/vim4/5.15/linux-dtb-amlogic-5.15_1.5.2_arm64.deb
$ wget https://dl.khadas.com/.test/vim4/5.15/linux-image-amlogic-5.15_1.5.2_arm64.deb
$ sudo dpkg -i linux-dtb-amlogic-5.15_1.5.2_arm64.deb linux-image-amlogic-5.15_1.5.2_arm64.deb
$ sync
$ sudo reboot

After reboot, please check again.

Hi @numbqq ,
I’m seeing mixed results, but things have definitely improved.
The npu examples can now run successfully, but on repeated attempts I am getting device resets.
Please see the logs below.

Thanks
Dave

Can you try this new kernel?

$ wget https://dl.khadas.com/.test/vim4/5.15/1/linux-dtb-amlogic-5.15_1.5.2_arm64.deb
$ wget https://dl.khadas.com/.test/vim4/5.15/1/linux-image-amlogic-5.15_1.5.2_arm64.deb
$ sudo dpkg -i linux-dtb-amlogic-5.15_1.5.2_arm64.deb linux-image-amlogic-5.15_1.5.2_arm64.deb
$ sync
$ sudo reboot

The second set of patches appears to be much more stable.
I have run several thousand iterations of my test script, and I haven’t seen a failure or a device reset.
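
The test script itself was not posted. For anyone who wants to repeat this kind of soak test, a minimal sketch might look like the following. `run_repeatedly` is a hypothetical helper, the failure markers are the two error strings from the original report, and the command is a stand-in so the sketch runs anywhere; on the board it would be replaced by the demo invocation.

```python
import subprocess
import sys


def run_repeatedly(cmd, iterations, fail_markers=()):
    """Run cmd `iterations` times and return the indices of the runs that failed.

    A run counts as failed if it exits non-zero or if its combined
    stdout/stderr contains any string in fail_markers.
    """
    failures = []
    for i in range(iterations):
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        if proc.returncode != 0 or any(m in proc.stdout for m in fail_markers):
            failures.append(i)
    return failures


if __name__ == "__main__":
    # Stand-in command; replace with the real demo command line on the board.
    demo = [sys.executable, "-c", "print('adla usr space 1.2.0.5')"]
    markers = ("create network fail", "amlnn_init is fail")
    print("failed iterations:", run_repeatedly(demo, 5, markers))
```

A device reset would kill the script along with everything else, so in practice the iteration count would also need to be logged somewhere that survives a reboot.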
