Khadas VIM3 custom one-class YOLOv3 inference issue

Frank · October 27, 2021, 10:31am

@Akkisony I’m not sure why this problem occurs; I haven’t seen it before. What have you modified? Can you show me?

Akkisony · November 2, 2021, 12:21pm

@Frank Can you please let me know how I can measure the accuracy loss of the model after converting it from darknet to tengine format? There is always some loss in accuracy after quantization, and I need to know if there is any way to measure it.
@alcohol
Thanks
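
One generic way to put a number on the loss, independent of any Tengine tooling, is to run the same input through the float model and the quantized model and compare the raw output tensors. The sketch below assumes both outputs have already been copied into flat float vectors (dequantized in the quantized case); the function names are hypothetical and not part of the Tengine SDK. A cosine similarity close to 1 and a small maximum difference suggest the conversion preserved the outputs. Comparing mAP on a labelled validation set remains the more complete end-to-end check.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two flattened output tensors of equal length.
// Returns 0 when either tensor is all zeros.
double cosine_similarity(const std::vector<float>& ref,
                         const std::vector<float>& quant)
{
    assert(ref.size() == quant.size());
    double dot = 0.0, norm_ref = 0.0, norm_quant = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        dot        += static_cast<double>(ref[i]) * quant[i];
        norm_ref   += static_cast<double>(ref[i]) * ref[i];
        norm_quant += static_cast<double>(quant[i]) * quant[i];
    }
    if (norm_ref == 0.0 || norm_quant == 0.0) return 0.0;
    return dot / (std::sqrt(norm_ref) * std::sqrt(norm_quant));
}

// Largest element-wise absolute difference between the two tensors.
double max_abs_diff(const std::vector<float>& ref,
                    const std::vector<float>& quant)
{
    assert(ref.size() == quant.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        worst = std::max(worst,
                         std::fabs(static_cast<double>(ref[i]) - quant[i]));
    }
    return worst;
}
```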

Frank · November 8, 2021, 1:05am

@Akkisony Maybe you can find it in https://github.com/OAID/Tengine/tree/tengine-lite/doc

Akkisony · November 22, 2021, 9:25am

@Frank Hi, can you please explain briefly how you calculate the ‘top’ and ‘left’ co-ordinates of the bounding box?

if (cls >= 0)
{
    box b = dets[i].bbox;
    int left = (b.x - b.w / 2.) * frame.cols;
    int top = (b.y - b.h / 2.) * frame.rows;
    if (top < 30) {
        top = 30;
        left += 10;
    }

#if DEBUG_OPTION
    fprintf(stderr, "left = %d,top = %d\n", left, top);

Does ‘top’ mean that the bounding box starts ‘top’ pixels down from the top of the image?
Does ‘left’ mean that the bounding box starts ‘left’ pixels in from the left edge of the image?

I need this small clarification. Thanks in advance.

Frank · November 22, 2021, 9:40am

@Akkisony

1. You get a 1920x1080 picture from the camera.
2. It is resized to 416x416 for NPU inference.
3. Assuming an object is recognized, you obtain the center point, width, and height of the object area.
4. Next, on the 416x416 picture, the upper-left corner point and the w and h of the area are calculated from its center position, width, and height.
5. Finally, according to the initial resize ratio, the corresponding data on the original picture is calculated; that is the information drawn on the picture through OpenCV.

The code here is step 4
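
The conversion from a center-format box to an upper-left corner can be sketched as below. This is a minimal illustration and not the demo's actual code: it assumes the box fields are fractions of the image size (0 to 1), which is what multiplying by frame.cols and frame.rows in the quoted snippet implies, and the type and function names are hypothetical. With this convention, left is the x position of the box's left edge measured in pixels from the left border of the image, and top is the y position of its top edge measured in pixels down from the top border.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct NormBox   { float x, y, w, h; };            // center and size, as fractions of the image
struct PixelRect { int left, top, width, height; }; // upper-left corner and size, in pixels

// Convert a normalized center-format box to a pixel rectangle on an
// image of img_w x img_h pixels.
PixelRect to_pixels(const NormBox& b, int img_w, int img_h)
{
    assert(img_w > 0 && img_h > 0);
    PixelRect r;
    r.left   = static_cast<int>((b.x - b.w / 2.0f) * img_w); // center x minus half the width
    r.top    = static_cast<int>((b.y - b.h / 2.0f) * img_h); // center y minus half the height
    r.width  = static_cast<int>(b.w * img_w);
    r.height = static_cast<int>(b.h * img_h);
    return r;
}
```

For example, a box centered in a 1920x1080 frame with w = 0.25 and h = 0.5 gives left = 720 and top = 270. The extra `top < 30` adjustment in the quoted snippet is not part of this conversion; it presumably keeps the label text inside the frame, but that is an assumption.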

Akkisony · November 22, 2021, 11:02am

@Frank Thank you for the explanation with the diagram.

I have trained a model to detect a single class using yolov3. The detection seems fine on the CPU; however, on the NPU I have some issues with non-max suppression (I get 2601 detections on a single image). Can you share your experience so that it helps me solve this problem?

These are the parameters:
const int classes = 1;
const float thresh = 0.5;
const float hier_thresh = 0.5;
const float nms = 0.80;
const int numBBoxes = 5;
const int relative = 1;
const char *coco_names[1] = {"battery"};
float biases[18] = {10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326};

I even increased the nms value to 0.80, yet I have the same issue. Please share any input that can help me solve it. Thanks in advance.

Please find the sample output below.
Repeat 1 times, thread 1, avg time 85.24 ms, max_time 85.24 ms, min_time 85.24 ms

num_detections,2601
0: 100%
left = 245,top = 30
0: 100%
left = 253,top = 30
0: 100%
left = 269,top = 30
0: 100%
left = 385,top = 35
0: 100%
left = 102,top = 59
0: 100%
left = -23668,top = 51
0: 100%
left = 110,top = 51
0: 100%
left = -1333402262,top = 58
0: 100%
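
For reference, a greedy non-max suppression pass can be sketched as follows. This is a generic illustration with hypothetical names, not the demo's implementation. In this style of NMS the threshold is an IoU (overlap) limit: a box is dropped only if it overlaps an already kept box by more than the threshold, so raising the value to 0.80 lets more overlapping boxes survive, not fewer. Note also that coordinates such as -1333402262 combined with a 100% score on every box cannot be produced by NMS settings alone, which suggests the raw network outputs are already invalid before this step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical detection type: center-format box plus confidence score.
struct Det { float x, y, w, h, score; };

// Intersection over union of two center-format boxes.
float iou(const Det& a, const Det& b)
{
    float left   = std::max(a.x - a.w / 2, b.x - b.w / 2);
    float right  = std::min(a.x + a.w / 2, b.x + b.w / 2);
    float top    = std::max(a.y - a.h / 2, b.y - b.h / 2);
    float bottom = std::min(a.y + a.h / 2, b.y + b.h / 2);
    float inter  = std::max(0.0f, right - left) * std::max(0.0f, bottom - top);
    float uni    = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: drop boxes below score_thresh, then keep boxes in order of
// descending score unless they overlap a kept box by more than iou_thresh.
std::vector<Det> nms(std::vector<Det> dets, float score_thresh, float iou_thresh)
{
    assert(iou_thresh >= 0.0f && iou_thresh <= 1.0f);
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [score_thresh](const Det& d) { return d.score < score_thresh; }),
               dets.end());
    std::sort(dets.begin(), dets.end(),
              [](const Det& a, const Det& b) { return a.score > b.score; });

    std::vector<Det> kept;
    for (const Det& d : dets) {
        bool suppressed = false;
        for (const Det& k : kept) {
            if (iou(d, k) > iou_thresh) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

With two boxes that overlap by an IoU of about 0.9, a threshold of 0.45 keeps only the higher-scoring one, while a threshold of 0.95 keeps both.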

Frank · November 23, 2021, 1:13am

@Akkisony Many people have reported problems with single-category yolo models, so I will make a single-category yolo model this week to see where the problem is.

Akkisony · November 23, 2021, 9:32am

@Frank Thank you. Please update me if you find a solution.

Frank · November 23, 2021, 9:50am

@Akkisony I successfully converted a hand detection model. I will release it this week or next. You can follow the forum or docs at that time.

Akkisony · November 25, 2021, 11:31am

@Frank Looking forward to the release. I hope single-class detection now works with yolov3.
Thank you

Frank · November 26, 2021, 12:52am

@Akkisony Yes. I will release it today.

Frank · November 26, 2021, 6:39am

@Akkisony KSNN: yolov3 hand detection and face detection demo

Akkisony · November 26, 2021, 10:25am

@Frank I was working with a 1-class yolov3 tengine model.
However, I still have not got an answer about what changes I need to make to get my model running. I am facing the issue of multiple detections in a single frame (a problem with non-max suppression), even though my model performs better on the CPU.

Frank · November 26, 2021, 10:27am

@Akkisony I don’t understand your question very well

Akkisony · November 26, 2021, 10:35am

@Frank
I mean to say that I trained a model with a single class and ran inference on the CPU, where I was able to get the detections.
Later, I converted the model to an NPU-compatible format (tengine) using the tengine SDK, and now when I run inference I get the result described in my earlier post.

I get predictions of 2000+ bounding boxes in a single frame, which is wrong.

Frank · November 29, 2021, 12:56am

@Akkisony Can you show your cfg file?

Akkisony · November 30, 2021, 10:09am

@Frank
https://drive.google.com/drive/folders/14wDTO81KBYYfyz0Jj0aWsCXWFixbHpwS?usp=sharing

Please find the cfg file.

Frank · November 30, 2021, 10:15am

@Akkisony

#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16

You should use batch=1 and subdivisions=1.

Akkisony · November 30, 2021, 10:22am

@Frank Thanks. Maybe I forgot to change the parameters in the cfg file while converting with the tengine SDK. I will try and update you.
Thanks again!

Frank · November 30, 2021, 10:24am

@Akkisony OK, if it does not work, I will try your cfg and weights.
