YoloV4 demo release

Frank · September 18, 2020, 10:22am

Release repository address
repository: https://gitlab.com/khadas/aml_npu_demo_binaries
How To Use

Follow the README to use it, and try it with yolov4:

 $ cd detect_demo
 $ sudo ./INSTALL
 $ ./detect_demo_x11 /dev/videoX 4
 $ ./detect_demo_mipi_fb /dev/videoX 4
 $ ./detect_demo_uvc_fb /dev/videoX 4

The source code
repository: https://gitlab.com/khadas/aml_npu_app
Path to yolov4: https://gitlab.com/khadas/aml_npu_app/-/tree/master/DDK_6.3.3.4/detect_library/model_code/detect_yolo_v4
How To Train
yolov4-leaky-416.cfg is the release version.
yolov4-custom-voc.cfg is used for VOC data.
yolov4-pacsp-s.cfg is a reduced model: the frame rate is higher, but the accuracy is lower.
Description

You need to use the latest release firmware, or upgrade your system with apt update and apt upgrade.
How to get a higher frame rate:
- Use a smaller detection frame, like 320x320.
- Reduce the number of training classes, e.g. train on VOC data.
  Both methods increase the frame rate at the cost of accuracy.
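As a rough illustration of why both options help, the sketch below computes the three YOLO output grids and channel counts for a given input size and number of classes. It assumes the standard YOLOv3/v4 head (strides 8, 16 and 32, three anchors per scale), which matches the 416x416 tensor shapes (52/26/13 x 255) printed in the log later in this thread: 255 = 3 x (80 + 5) for the 80 COCO classes, and VOC's 20 classes would give 75. The helper names are hypothetical, and the pixel ratio is only a proxy for compute, not a measured speedup.

```c
/* Assumption: standard YOLOv3/v4 head with three output scales
 * (strides 8, 16, 32) and three anchors per scale. */
#define YOLO_NUM_SCALES 3
#define YOLO_ANCHORS_PER_SCALE 3

static const int kYoloStrides[YOLO_NUM_SCALES] = {8, 16, 32};

/* Channels of each output tensor: every anchor predicts x, y, w, h,
 * objectness, plus one score per class. */
int yolo_channels(int num_classes) {
    return YOLO_ANCHORS_PER_SCALE * (num_classes + 5);
}

/* Grid side length of output scale `scale` (0 = finest grid).
 * The input size should be a multiple of 32. */
int yolo_grid(int input_size, int scale) {
    return input_size / kYoloStrides[scale];
}

/* Total number of candidate boxes the post-processing must decode. */
int yolo_num_candidates(int input_size) {
    int total = 0;
    for (int s = 0; s < YOLO_NUM_SCALES; ++s) {
        int g = yolo_grid(input_size, s);
        total += g * g * YOLO_ANCHORS_PER_SCALE;
    }
    return total;
}

/* Input pixels relative to a reference size: a rough proxy for how the
 * convolution work scales, not a measured speedup. */
double yolo_pixel_ratio(int input_size, int reference_size) {
    return (double)input_size * input_size /
           ((double)reference_size * reference_size);
}
```

For example, yolo_num_candidates(416) is 10647 boxes versus 6300 at 320x320, and yolo_pixel_ratio(320, 416) is about 0.59, so a 320x320 input has roughly 41% fewer pixels to process.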
We will evaluate the performance of yolov4-tiny and may release a tiny version in the future.
This is the initial version, and the parameter tuning and performance are not yet ideal. yolov4 does not convert well with the conversion tool, which results in a lower frame rate; we will look for ways to optimize it in the future.

Others
If you have run a yolov4 cfg file with better results on VIM3, you are welcome to share it here or send me a private message.

fguerzoni · September 20, 2020, 1:35am

@Frank
thank you for yolo-v4 support.

On a VIM3 image freshly built today with Fenix:
#VERSION: 0.9.4
#KHADAS_BOARD=VIM3
#VENDOR=Amlogic
#CHIP=A311D
#LINUX=4.9
#UBOOT=2015.01
#DISTRIBUTION=Ubuntu
#DISTRIB_RELEASE=focal
#DISTRIB_TYPE=server
#DISTRIB_ARCH=arm64
#INSTALL_TYPE=SD-USB

I’m having trouble running yolo-v4 detection. I don’t have a camera attached to the board, so to work on pictures I built aml_npu_app (https://gitlab.com/khadas/aml_npu_app) against aml_npu_sdk_6.4.0.10/linux_sdk/linux_sdk_6.4.0.10. I also ran the INSTALL script from your binaries to install the libraries.
What I get is:

./bin_r/detect_demo 4 1080p.bmp 
init_fb...
1920x1080, 32bpp
W Detect_api:[det_set_log_level:19]Set log level=1
W Detect_api:[det_set_log_level:21]output_format not support Imperfect, default to DET_LOG_TERMINAL
W Detect_api:[det_set_log_level:26]Not exist VSI_NN_LOG_LEVEL, Setenv set_vsi_log_error_level
det_set_log_config Debug
Error: yolo_v4.c: model_create at 31
E Detect_api:[det_set_model:225]Model_create fail, file_path=nn_data, dev_type=1
det_set_model fail. ret=-4

The same binary works fine with yolo-v3 (det = 2)

I also tried my test application, in which I merged the app and the libraries into a single binary for ease of use. I got the same error on model creation, with more debug details:

./cmake-build-debug-vim3/TestDetectDemoFull 1080p.bmp
Read Cpuinfo: processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 1
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 2
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd09
CPU revision : 2

processor : 3
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd09
CPU revision : 2

processor : 4
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd09
CPU revision : 2

processor : 5
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd09
CPU revision : 2

Serial : 290b1000010e0d00000d35374d4d4e50
Hardware : Khadas VIM3

290 index=1096
set_dev_type REVB and setenv 1
Start create Model, data_file_path=/home/khadas/workspace/TestDetectDemoFull/nn_data
D [setup_node:368]Setup node id[0] uid[0] op[NBG]
D [print_tensor:136]in(0) : id[ 0] vtl[0] const[0] shape[ 416, 416, 3, 1 ] fmt[i8 ] qnt[DFP fl= 7]
D [print_tensor:136]out(0): id[ 1] vtl[0] const[0] shape[ 52, 52, 255, 1 ] fmt[i8 ] qnt[DFP fl= 1]
D [print_tensor:136]out(1): id[ 2] vtl[0] const[0] shape[ 26, 26, 255, 1 ] fmt[i8 ] qnt[DFP fl= 2]
D [print_tensor:136]out(2): id[ 3] vtl[0] const[0] shape[ 13, 13, 255, 1 ] fmt[i8 ] qnt[DFP fl= 2]
D [optimize_node:312]Backward optimize neural network
D [optimize_node:319]Forward optimize neural network
I [compute_node:261]Create vx node
Error: /home/khadas/workspace/TestDetectDemoFull/src/yolo_v4.cpp: model_create at 31
*** stack smashing detected ***: terminated
Aborted
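The print_tensor lines above describe the model's tensors. Assuming the usual ovxlib conventions (shape listed fastest-varying dimension first, i.e. [width, height, channels, batch], and qnt[DFP fl=N] meaning int8 dynamic fixed point with N fractional bits), a raw output value q corresponds to q / 2^N. A minimal sketch of reading one dequantized value from an output buffer; the function names are hypothetical and not part of aml_npu_app:

```c
#include <stddef.h>
#include <stdint.h>

/* Dequantize an int8 dynamic-fixed-point value with `fl` fractional bits:
 * real = q * 2^(-fl). A negative fl scales the value up instead.
 * Assumes |fl| < 32. */
float dfp_to_float(int8_t q, int fl) {
    if (fl >= 0)
        return (float)q / (float)(1u << fl);
    return (float)q * (float)(1u << -fl);
}

/* Offset of element (x, y, c) in a tensor printed as [W, H, C, 1],
 * assuming W is the fastest-varying dimension (planar layout). */
size_t whc_offset(int x, int y, int c, int width, int height) {
    return ((size_t)c * height + y) * width + x;
}

/* Read one dequantized value from a raw int8 output buffer. */
float read_output(const int8_t *buf, int x, int y, int c,
                  int width, int height, int fl) {
    return dfp_to_float(buf[whc_offset(x, y, c, width, height)], fl);
}
```

With fl = 7, as on the input tensor, the int8 range maps to [-1, 0.9921875]; for out(0), channel c of cell (x, y) sits at whc_offset(x, y, c, 52, 52).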

I noticed there’s a difference in size between the libovxlib.so installed in /usr/lib and the one shipped with SDK 6.4.0.10.

What do you think?
Thanks and regards
F

Frank · September 21, 2020, 8:47am

@fguerzoni Maybe you can try detect_demo? I have tested it. detect_demo_pictures has some errors; I will fix them.

fguerzoni · September 21, 2020, 10:31am

@Frank

The issue was caused by the yolov4_88.nb file I found in the folder:
aml_npu_app/DDK_6.3.3.4/detect_library/model_code/detect_yolo_v4/nn_data
which is only 8.4 MB.
I noticed that it’s different from the one you provided with the detect_demo_pictures binaries, which is over 64 MB.
Replacing that file did the trick.
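Since the failure came from a mismatched .nb file rather than from the code, a cheap check before model creation can catch this kind of problem earlier. A hypothetical helper that only stats the file; the minimum size is up to the caller (the sizes reported here were 8.4 MB for the wrong file and over 64 MB for the working one), and the size alone does not prove the file matches the code:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 0 if `path` is a regular file of at least `min_bytes` bytes,
 * -1 otherwise (the reason is printed to stderr). */
int check_model_file(const char *path, long long min_bytes) {
    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {
        fprintf(stderr, "%s: not a regular file\n", path);
        return -1;
    }
    if ((long long)st.st_size < min_bytes) {
        fprintf(stderr, "%s: %lld bytes, expected at least %lld\n",
                path, (long long)st.st_size, min_bytes);
        return -1;
    }
    return 0;
}
```

For example, check_model_file("nn_data/yolov4_88.nb", 60LL * 1024 * 1024) would reject the 8.4 MB file and accept the one over 64 MB.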
My own test program still does not work, but I think I can fix it quickly.
Now I can start to evaluate the yolo v4 detection results quality.
Thank you very much for the support
Regards
F

Frank · September 21, 2020, 10:36am

@fguerzoni What errors does your own test program report? If you can post the log, I may be able to help you solve it. I suggest you base your changes on my demo, not on the yolov3 code: the data processing method has changed.
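One thing to check when adapting yolov3 post-processing is which anchors go with which output. In darknet's stock yolov4.cfg the first [yolo] layer is the fine stride-8 grid and uses mask = 0,1,2 (the smallest anchors), whereas yolov3.cfg starts with the coarse grid and mask = 6,7,8. A hypothetical lookup, assuming the outputs arrive finest grid first as in the out(0)/out(1)/out(2) lines of the log above and that your cfg uses the stock yolov4 anchors:

```c
/* Hypothetical helper, not part of aml_npu_app. Anchors are the (w, h)
 * pairs, in input pixels, from darknet's stock yolov4.cfg; check them
 * against the [yolo] sections of the cfg you actually trained with. */
static const float kYolov4Anchors[9][2] = {
    {12, 16},   {19, 36},   {40, 28},   /* mask = 0,1,2 */
    {36, 75},   {76, 55},   {72, 146},  /* mask = 3,4,5 */
    {142, 110}, {192, 243}, {459, 401}, /* mask = 6,7,8 */
};

/* `out` counts outputs finest grid first (out(0) = 52x52 at 416 input),
 * `slot` is the anchor slot 0..2 within that output. */
int yolov4_anchor_index(int out, int slot) {
    return out * 3 + slot;
}

/* Anchor size for output `out`, slot `slot`, in network-input pixels. */
void yolov4_anchor(int out, int slot, float *w, float *h) {
    int idx = yolov4_anchor_index(out, slot);
    *w = kYolov4Anchors[idx][0];
    *h = kYolov4Anchors[idx][1];
}
```

Pairing an output with another scale's anchors changes every box size on that grid, so this mapping is worth checking when box sizes look off.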

fguerzoni · September 21, 2020, 10:50am

@Frank

Thank you for the support,
I posted the log in my first message; it ends with:

D [optimize_node:312]Backward optimize neural network
D [optimize_node:319]Forward optimize neural network
I [compute_node:261]Create vx node
Error: /home/khadas/workspace/TestDetectDemoFull/src/yolo_v4.cpp: model_create at 31
*** stack smashing detected ***: terminated
Aborted

I basically merged the app, the detect library and the yolo library into one, just to avoid inconsistencies between the many versions I build and because I only need yolo detection.
I think the fix will be an easy task thanks to the detect_library sources you provided.
I’ll get back to you if I run into trouble.
Regards
F

Frank · September 21, 2020, 10:54am

@fguerzoni OK, I will push new code after fixing the errors in detect_demo_pictures.

fguerzoni · September 22, 2020, 9:16am

@Frank
After fixing a trivial issue with a char buffer that was too short to contain the full path of the model (yes, in development I prefer to use the full path to avoid any inconsistency), it works fine.
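The "stack smashing detected" abort in the earlier log is what glibc's stack protector reports when a write overruns a stack buffer, which is consistent with a path buffer that is too short. A minimal sketch of building the model path with snprintf and detecting truncation instead of overflowing; the function name is hypothetical:

```c
#include <stdio.h>

/* Write "<dir>/<file>" into out (capacity out_size bytes).
 * snprintf never writes more than out_size bytes (including the
 * terminating NUL) and returns the length it would have needed,
 * so a return value >= out_size means the path was truncated. */
int build_model_path(char *out, size_t out_size,
                     const char *dir, const char *file) {
    int needed = snprintf(out, out_size, "%s/%s", dir, file);
    if (needed < 0 || (size_t)needed >= out_size)
        return -1; /* encoding error or buffer too short */
    return 0;
}
```

Sizing the buffer with PATH_MAX from <limits.h> on Linux, and treating a -1 return as an error, avoids silently truncating or overflowing long development paths such as /home/khadas/workspace/TestDetectDemoFull/nn_data.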

First impressions:

- The model you provided runs at about 4 FPS (250 ms per frame) vs. 9 FPS for yolo-v3.
- Category detection is more robust than with yolo-v3.
- The bounding boxes are quite a bit larger than those returned by yolo-v3; I wonder why. In any case the targets are centered inside the bounding boxes, so it’s possible to crop them.
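For comparing the two post-processing paths, darknet decodes each prediction (t_x, t_y, t_w, t_h) at grid cell (c_x, c_y) roughly as below, where s is the scale_x_y value of the [yolo] layer (absent, i.e. 1, in yolov3 cfgs; 1.2, 1.1 and 1.05 in the stock yolov4.cfg), G_w x G_h is the grid size, (p_w, p_h) the anchor in input pixels and W x H the network input size. Whether this demo's code follows darknet exactly is an assumption.

```latex
\begin{aligned}
b_x &= \frac{c_x + s\,\sigma(t_x) - \tfrac{s-1}{2}}{G_w}, &
b_y &= \frac{c_y + s\,\sigma(t_y) - \tfrac{s-1}{2}}{G_h}, \\
b_w &= \frac{p_w\, e^{t_w}}{W}, &
b_h &= \frac{p_h\, e^{t_h}}{H}.
\end{aligned}
```

With s = 1 the center terms reduce to the yolov3 form; the width and height terms are identical in both versions, so box size depends only on t_w, t_h and on pairing each output with the correct anchors.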

The next step is to try converting the VOC model, as you suggested, to get faster results.
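To compare the VOC model against the 250 ms per frame measured above on equal terms, it helps to time many frames with a monotonic clock rather than a single run. A minimal sketch using POSIX clock_gettime; process_frame is a hypothetical callback standing in for your own pre-processing, inference and post-processing step:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC timestamps. */
double elapsed_ms(const struct timespec *start, const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
           (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Average frames per second for `frames` frames processed in `total_ms`. */
double fps_from_ms(double total_ms, int frames) {
    return total_ms > 0.0 ? frames * 1000.0 / total_ms : 0.0;
}

/* Time `frames` calls of the hypothetical `process_frame` callback
 * and return the mean frame rate. */
double measure_fps(void (*process_frame)(void *), void *ctx, int frames) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < frames; ++i)
        process_frame(ctx);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return fps_from_ms(elapsed_ms(&start, &end), frames);
}
```

For example, fps_from_ms(250.0, 1) returns 4.0, the figure reported above. Timing only the inference call versus the whole loop also shows how much of the frame time goes to post-processing.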

Frank · September 22, 2020, 9:21am

@fguerzoni Originally I expected the converted model to run at around 6 FPS, but in the end it was a bit lower than that. My initial guess is that the conversion tool does not support the new yolo layers very well; hopefully a new version of the tool will improve this. You can try other model configuration files.

https://github.com/yan-wyb/darknet/blob/master/yan/cfg/yolov4-pacsp-s.cfg

[net]
# Testing
batch=1
subdivisions=1
# Training
#batch=64
#subdivisions=8
width=320
height=320
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.00261
burn_in=1000
max_batches = 500500

(This file has been truncated.)

This simplified model can reach 10 FPS, but the results are not good.
Maybe the tiny model would be a better choice.

fguerzoni · September 22, 2020, 9:26am

@Frank
Thank you, I’ll try it.

I’ve always considered the tiny version neither robust nor reliable.
I’ll post my results when ready.
Regards

Frank · September 22, 2020, 9:45am

@fguerzoni I agree, which is why the latest release is the standard version.