Khadas VIM3 custom one-class YOLOv3 inference issue

@Akkisony I'm not sure why this problem occurs; I haven't seen it before. What have you modified? Can you share it so I can take a look?

@Frank Can you please let me know how I can measure the accuracy loss of the model after converting it from darknet to tengine format? There is always some loss in accuracy after quantization, so I need to know whether there is a way to measure it. @Frank
@alcohol
Thanks

@Akkisony Maybe you can find it in https://github.com/OAID/Tengine/tree/tengine-lite/doc

@Frank Hi, can you please briefly explain how you calculate the 'top' and 'left' coordinates of the bounding box?

if (cls >= 0)
{
    box b = dets[i].bbox;
    int left = (b.x - b.w / 2.) * frame.cols;
    int top = (b.y - b.h / 2.) * frame.rows;
    if (top < 30) {
        top = 30;
        left += 10;
    }

#if DEBUG_OPTION
    fprintf(stderr, "left = %d,top = %d\n", left, top);

Does 'top' mean that the top edge of the bounding box is 'top' pixels below the top of the image?
Does 'left' mean that the left edge of the bounding box is 'left' pixels away from the left side of the image?

I need this small clarification. Thanks in advance.

@Akkisony

  1. You get a 1920x1080 picture from the camera.
  2. Resize it to 416x416 for NPU inference.
  3. Assuming an object is recognized, you obtain the center point, width, and height of the object's area.
  4. Next, on the 416x416 picture, use the center point, width, and height of this area to calculate its upper-left corner point together with its w and h.
  5. Finally, using the initial resize ratio, calculate the corresponding values on the original picture. This is the information that is drawn on the picture through opencv.

The code here is step 4.
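
As a rough illustration of steps 4 and 5, the sketch below converts a center/size box into the pixel coordinates of its edges. It assumes the box values are normalized to [0, 1], which is what multiplying by frame.cols and frame.rows in the posted snippet suggests. Under that assumption, 'left' is the distance in pixels from the left edge of the image and 'top' is the distance in pixels from the top edge, since OpenCV places the origin at the top-left corner. The names pixel_rect, to_pixel, and box_to_pixels are hypothetical and not part of the demo code.

```c
/* Normalized box: center (x, y) and size (w, h), assumed to lie in [0, 1]. */
typedef struct { float x, y, w, h; } box;

/* Hypothetical result type: box edges in pixels of the original frame. */
typedef struct { int left, top, right, bottom; } pixel_rect;

/* Scale a normalized coordinate to pixels and clamp it to [0, size - 1].
 * Clamping is done on the float so that garbage values (huge, negative,
 * NaN) never reach the float-to-int conversion. */
static int to_pixel(float rel, int size)
{
    float p = rel * (float)size;
    if (!(p >= 0.f)) return 0;                  /* negative or NaN */
    if (p > (float)(size - 1)) return size - 1; /* past the last pixel */
    return (int)p;                              /* truncates, like the snippet */
}

/* Center/size box -> edges in pixels of an img_w x img_h frame. */
pixel_rect box_to_pixels(box b, int img_w, int img_h)
{
    pixel_rect r;
    r.left   = to_pixel(b.x - b.w / 2.f, img_w);
    r.top    = to_pixel(b.y - b.h / 2.f, img_h);
    r.right  = to_pixel(b.x + b.w / 2.f, img_w);
    r.bottom = to_pixel(b.y + b.h / 2.f, img_h);
    return r;
}
```

For example, a box centered in a 1920x1080 frame with w = 0.25 and h = 0.5 would give left = 720 and top = 270. Because the values are normalized, scaling by the original frame size performs the resize-ratio correction of step 5 in the same multiplication.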

@Frank Thank you for the explanation with the diagram.

I have trained a model to detect a single class using yolov3. The detection seems fine on the CPU. On the NPU, however, I have issues with non-maximum suppression: I get 2601 detections on a single image. Can you share your experience, so that it helps me solve this problem?

These are the parameters:
const int classes = 1;
const float thresh = 0.5;
const float hier_thresh = 0.5;
const float nms = 0.80;
const int numBBoxes = 5;
const int relative = 1;
const char *coco_names[1] = {"battery"};
float biases[18] = {10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326};

I even increased the nms value to 0.80, yet I still have the same issue. Any input that helps me solve this would be appreciated. Thanks in advance.

Please find the sample output.
Repeat 1 times, thread 1, avg time 85.24 ms, max_time 85.24 ms, min_time 85.24 ms

num_detections,2601
0: 100%
left = 245,top = 30
0: 100%
left = 253,top = 30
0: 100%
left = 269,top = 30
0: 100%
left = 385,top = 35
0: 100%
left = 102,top = 59
0: 100%
left = -23668,top = 51
0: 100%
left = 110,top = 51
0: 100%
left = -1333402262,top = 58
0: 100%
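
On the nms value mentioned above: in the usual darknet-style greedy NMS, a box is suppressed when its IoU with a higher-scoring box exceeds the threshold, so raising the threshold to 0.80 suppresses fewer boxes, not more. The following is a minimal sketch of that convention, assuming the demo follows it. The det struct and the names box_iou and nms_greedy are simplified, hypothetical stand-ins rather than the demo's actual code.

```c
#include <stdlib.h>

typedef struct { float x, y, w, h; } box;       /* center + size */
typedef struct { box bbox; float score; } det;  /* hypothetical, simplified */

/* Overlap length of two 1-D intervals given by center and width. */
static float overlap(float c1, float w1, float c2, float w2)
{
    float l1 = c1 - w1 / 2.f, l2 = c2 - w2 / 2.f;
    float r1 = c1 + w1 / 2.f, r2 = c2 + w2 / 2.f;
    float left = l1 > l2 ? l1 : l2;
    float right = r1 < r2 ? r1 : r2;
    return right - left;
}

/* Intersection over union of two boxes. */
float box_iou(box a, box b)
{
    float w = overlap(a.x, a.w, b.x, b.w);
    float h = overlap(a.y, a.h, b.y, b.h);
    if (w <= 0.f || h <= 0.f) return 0.f;
    float inter = w * h;
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

/* qsort comparator: highest score first. */
static int by_score_desc(const void *pa, const void *pb)
{
    float a = ((const det *)pa)->score, b = ((const det *)pb)->score;
    return (a < b) - (a > b);
}

/* Greedy NMS: sorts dets in place, zeroes the score of every rejected
 * box, and returns how many boxes survive. */
int nms_greedy(det *dets, int n, float score_thresh, float iou_thresh)
{
    int kept = 0;
    qsort(dets, (size_t)n, sizeof(det), by_score_desc);
    for (int i = 0; i < n; ++i) {
        if (dets[i].score <= 0.f || dets[i].score < score_thresh) {
            dets[i].score = 0.f;   /* below threshold or already suppressed */
            continue;
        }
        ++kept;
        for (int j = i + 1; j < n; ++j)
            if (dets[j].score > 0.f &&
                box_iou(dets[i].bbox, dets[j].bbox) > iou_thresh)
                dets[j].score = 0.f;   /* overlaps a better box too much */
    }
    return kept;
}
```

With two boxes whose IoU is about 0.78, a threshold of 0.45 keeps only the higher-scoring one, while a threshold of 0.80 keeps both. If the demo uses the same convention, lowering nms would be the direction to try, although that alone may not explain thousands of detections.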

@Akkisony Many people have reported problems with single-class yolo models, so I will build a single-class yolo model this week to see where the problem is.

@Frank Thank you. Please update me if you find a solution.

@Akkisony I successfully converted a hand detection model. I will release it this week or next. You can follow the forum or docs at that time.

@Frank Looking forward to the release. I hope single-class detection now works with yolov3.
Thank you

@Akkisony Yes. I will release it today.

@Akkisony KSNN: yolov3 hand detection and face detection demo

@Frank I was working with a one-class yolov3 tengine model.
However, I still have not found out what changes I need to make to get my model running. I am getting multiple detections in a single frame (a non-maximum suppression problem), although my model performs better on the CPU.

@Akkisony I don’t understand your question very well

@Frank
What I mean is: I trained a model with a single class and ran inference on the CPU, where I was able to get the detections.
Later, I converted the model to an NPU-compatible format (tengine) using the tengine SDK, and now when I run inference, I get the result as explained in my post below.

I get predictions of 2000+ bounding boxes in a single frame, which is wrong.

@Akkisony Can you show your cfg file?

@Frank
https://drive.google.com/drive/folders/14wDTO81KBYYfyz0Jj0aWsCXWFixbHpwS?usp=sharing

Please find the cfg file.

@Akkisony

#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16

You should use batch=1 and subdivisions=1.

@Frank Thanks. Maybe I forgot to change the parameters in the cfg file while converting with the tengine SDK. I will try it and update you.
Thanks again!

@Akkisony OK, if it does not work, I will try your cfg and weights.