Request for Full YOLOv8 Segmentation Inference Demo on Khadas VIM3 Using KSNN

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Please provide the version of the system here:

Ubuntu 20.04 (Khadas official VIM3 image)

Please describe your issue below:

I want to perform inference using a YOLOv8n segmentation model on the Khadas VIM3 with the KSNN Python API.

Please provide a full demo or reference that shows how to:

  • Prepare and convert a YOLOv8 segmentation model to .nb format.
  • Run inference on the VIM3 using KSNN.
  • Extract and visualize the segmentation masks (not just bounding boxes).

Hello @Yash_Bhagwat

@Louis-Cheng-Liu will help you to check this issue.

Thank you @numbqq @Louis-Cheng-Liu

Hello @Yash_Bhagwat ,

I am sorry, but we do not have a YOLOv8 segmentation demo at the moment. We only have a YOLOv8 detection demo.

The YOLOv8 seg model has three groups of outputs: first the box and confidence information, second the mask prototypes, and last the mask coefficients. The box and confidence part is the same as in YOLOv8 detect. You can refer to our YOLOv8 doc to modify the code.
YOLOv8n KSNN Demo - 2 [Khadas Docs]

This is my modified seg code in ultralytics==8.0.86.

    def forward(self, x):
        """Return model outputs and mask coefficients if training, otherwise return outputs and mask coefficients."""
        p = self.proto(x[0])  # mask protos
        bs = p.shape[0]  # batch size

        if torch.onnx.is_in_onnx_export():
            p = p.permute(0, 2, 3, 1).unsqueeze(1)
            mc = [self.cv4[i](x[i]).permute(0, 2, 3, 1).unsqueeze(1) for i in range(self.nl)]
            x = self.detect(self, x)
            return (x, tuple(mc), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
        
        mc = torch.cat([self.cv4[i](x[i]).view(bs, self.nm, -1) for i in range(self.nl)], 2)  # mask coefficients
        
        x = self.detect(self, x)
        if self.training:
            return x, mc, p
        return (torch.cat([x, mc], 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))

Use this code to get model outputs like this.

The model conversion command is the same as for YOLOv8 detect.

For inference on the VIM3, preprocessing is the same as for the detect model. For postprocessing, the box part is the same as detect. For the rest, you can refer to the official code and implement it yourself.

If you run into any problems, you can ask me for help.

Hi @Louis-Cheng-Liu

As suggested, I modified the head.py file in Ultralytics as shown in your example. I then exported the modified .pt model to ONNX format. However, the output I’m getting is still different from yours.

To help diagnose the issue, I’ve uploaded the following to Google Drive:

  • The modified .pt model
  • A screenshot of the model’s output
  • The modified head.py file
  • My installed Python libraries (requirements.txt)
  • The export.py script I used for ONNX conversion

You can access all of these files here:
Google Drive Link

Please let me know if there’s anything I’ve missed or if further adjustments are needed.

Thanks for your continued support.

Hello @Yash_Bhagwat ,

I do not have permission to access your download link.

Hi @Louis-Cheng-Liu, here is the link, please check it out.

Hello @Yash_Bhagwat ,

I converted your pt model and the conversion came out right. Did you install ultralytics with pip install? If so, check that the conversion script calls your modified code and not the code in the environment's library.

There was a small mistake in the structure of my code.
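
One way to check which copy of a package Python actually loads is to print the file path it resolves to. This is a minimal sketch using only the standard library; `module_origin` is a hypothetical helper name, not part of Ultralytics or KSNN.

```python
import importlib.util

def module_origin(name):
    """Return the file path Python would load for a top-level module, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# If this prints a path inside site-packages, edits made to a separate
# source checkout are not the code that the export script will run.
print(module_origin("ultralytics"))
```

Run it with the same Python interpreter and working directory as the export script, since both affect which copy is found first.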

    def forward(self, x):
        """Return model outputs and mask coefficients if training, otherwise return outputs and mask coefficients."""
        p = self.proto(x[0])  # mask protos
        bs = p.shape[0]  # batch size

        if torch.onnx.is_in_onnx_export():
-           p = p.permute(0, 2, 3, 1).unsqueeze(1)
-           mc = [self.cv4[i](x[i]).permute(0, 2, 3, 1).unsqueeze(1) for i in range(self.nl)]
+           p = p.permute(0, 2, 3, 1)
+           mc = [self.cv4[i](x[i]).permute(0, 2, 3, 1) for i in range(self.nl)]
            x = self.detect(self, x)
            return (x, tuple(mc), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
        
        mc = torch.cat([self.cv4[i](x[i]).view(bs, self.nm, -1) for i in range(self.nl)], 2)  # mask coefficients
        
        x = self.detect(self, x)
        if self.training:
            return x, mc, p
        return (torch.cat([x, mc], 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))

Hi @Louis-Cheng-Liu,

Yes, I have installed the Ultralytics package using pip install, and I’ve updated the head.py file as shown in the image above.

The main issue I’m facing now is with the yolo8n-picture.py script for running inference on the VIM3. Could you please provide an updated version of this script, or guide me on the specific changes needed to handle mask coef and mask protos properly?

I’ve attempted several modifications, but haven’t been able to get it working yet.

Thank you for your continued support.

Hello @Yash_Bhagwat

You can refer to the code in the official example.

ultralytics/examples/YOLOv8-Segmentation-ONNXRuntime-Python/main.py

The box part is the same as the detect model. I suggest ignoring the mask outputs at first and decoding only the boxes to make sure the model infers the right result. Then continue with decoding the masks.
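
As a rough illustration of the box-only step, the sketch below decodes the 64 box channels of one detection scale with NumPy. It assumes the standard YOLOv8 layout (4 sides × 16 distribution bins, in the order left, top, right, bottom) and that the rows are ordered row by row over the grid; the function name `decode_dfl_boxes` and the argument layout are illustrative and are not taken from the KSNN demo.

```python
import numpy as np

def decode_dfl_boxes(box_logits, grid, stride, reg_max=16):
    """Decode raw box outputs of one scale.

    box_logits: (grid*grid, 4*reg_max) array, rows ordered row-major over (y, x).
    Returns (grid*grid, 4) boxes as x1, y1, x2, y2 in input-image pixels.
    """
    logits = box_logits.reshape(-1, 4, reg_max)
    # Softmax over the bins of each side (shifted for numerical stability).
    logits = logits - logits.max(axis=-1, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)
    # Expected bin index = distance from the cell center, in grid units.
    dist = (prob * np.arange(reg_max)).sum(axis=-1)  # (N, 4): left, top, right, bottom

    # Cell centers in grid units.
    ys, xs = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    cx = xs.reshape(-1) + 0.5
    cy = ys.reshape(-1) + 0.5

    x1 = (cx - dist[:, 0]) * stride
    y1 = (cy - dist[:, 1]) * stride
    x2 = (cx + dist[:, 2]) * stride
    y2 = (cy + dist[:, 3]) * stride
    return np.stack([x1, y1, x2, y2], axis=1)
```

For a 640×640 input the three scales would be called with `(grid, stride)` of `(80, 8)`, `(40, 16)` and `(20, 32)`. Class scores, thresholding and NMS are left out here.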

Hi @Louis-Cheng-Liu,

I’ve tried decoding the mask coefficients and mask prototypes, but I still haven’t been able to get it working. There’s no documentation on how to add the masking step to the KSNN demo script for the VIM3, and most examples I’ve found online are for ONNX Runtime—not KSNN.

One major challenge is that the output of our YOLOv8 segmentation ONNX model (after modifying head.py) differs from what’s commonly shown online. I’m attaching the model outputs below for reference.

This is the output of the ONNX model after changing head.py as described.

This is the output of the ONNX model that I found in most examples online.

Here is the link: yolov8_segmentation_python/YOLOv8 segmentation using ONNX runtime.ipynb at main · AndreyGermanov/yolov8_segmentation_python · GitHub

The point where I’m stuck is determining the correct value for LISTSIZE.
Ideally, it should be 176, calculated as:

  • 80 → number of classes
  • 64 → bounding box distances (4 × 16)
  • 32 → mask coefficients

If we set LISTSIZE to 144, we only get bounding boxes because the mask coefficients (needed to generate masks) are missing.

In our case, the model produces 7 outputs, and the mask coefficients are already separate. However, most online examples show only 2 outputs:

out[0]: bounding boxes, class scores, and mask coefficients combined
out[1]: mask prototypes

For example:

  • output0: [1, 116, 8400] → Detection head
    • 116 = 4 (box xywh) + 80 (class scores) + 32 (mask coefficients)
    • 8400 = total anchor points from three detection scales (80×80 + 40×40 + 20×20 = 6400 + 1600 + 400)
  • output1: [1, 32, 160, 160] → Segmentation mask prototypes
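
To relate the seven-output export to the combined layout above, the three per-scale mask-coefficient outputs can be stacked into one (8400, 32) array. The sketch below assumes each output is a flat array in NHWC order (as the `permute(0, 2, 3, 1)` in the modified head suggests) and that they are passed in the order 80×80, 40×40, 20×20; the actual order of outputs returned by KSNN should be checked, for example by comparing array sizes. `flatten_mask_coefficients` is a hypothetical helper name.

```python
import numpy as np

GRIDS = (80, 40, 20)  # feature-map sizes for a 640x640 input
NUM_MASK = 32         # mask coefficients per anchor point

def flatten_mask_coefficients(mc_outputs):
    """Stack three per-scale coefficient outputs into one (8400, 32) array.

    mc_outputs: three arrays with g*g*32 elements each, for g in GRIDS,
    laid out as (g, g, 32). Rows of the result follow the order
    80x80, then 40x40, then 20x20, row-major within each scale.
    """
    parts = [np.asarray(out).reshape(g * g, NUM_MASK)
             for out, g in zip(mc_outputs, GRIDS)]
    return np.concatenate(parts, axis=0)
```

With this layout, row `i` of the result lines up with anchor point `i` of the decoded boxes, provided the boxes are concatenated in the same scale order.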

From what I understand based on the official KSNN object detection and pose demo scripts, the steps should be:

  1. Extract the model outputs.
  2. Normalize them.
  3. Use the mask coefficients and prototypes to generate final masks.

The question is — do we need to extract both mask coefficients and mask prototypes for mask generation, or just the mask coefficients?
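
For reference, in the standard YOLOv8 segmentation postprocess both are needed: each detection's 32 coefficients weight the 32 prototype maps, and the weighted sum is passed through a sigmoid. A minimal NumPy sketch of that step follows. It assumes the prototypes are in (160, 160, 32) NHWC layout, the coefficients belong to detections already kept after NMS, and the boxes are in 640×640 input pixels; `build_masks` is an illustrative name, not a KSNN function.

```python
import numpy as np

def build_masks(coeffs, protos, boxes, input_size=640, threshold=0.5):
    """Combine mask coefficients and prototypes into per-detection masks.

    coeffs: (N, 32) coefficients of the kept detections.
    protos: (H, W, 32) prototype maps (NHWC without the batch axis).
    boxes:  (N, 4) boxes as x1, y1, x2, y2 in input-image pixels.
    Returns (N, H, W) boolean masks at prototype resolution.
    """
    h, w, nm = protos.shape
    logits = protos.reshape(h * w, nm) @ coeffs.T      # (H*W, N)
    probs = 1.0 / (1.0 + np.exp(-logits))              # sigmoid
    masks = probs.T.reshape(len(coeffs), h, w) > threshold

    # Keep only the pixels inside each detection's box, scaled to mask size.
    b = np.asarray(boxes, dtype=np.float64) * (w / input_size)
    ys = np.arange(h)[None, :, None]
    xs = np.arange(w)[None, None, :]
    inside = ((xs >= b[:, 0, None, None]) & (xs < b[:, 2, None, None]) &
              (ys >= b[:, 1, None, None]) & (ys < b[:, 3, None, None]))
    return masks & inside
```

The resulting masks are at 160×160 resolution; resizing them to the original image size (and undoing any letterbox padding) is a separate step not shown here.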

Could you please share a working inference script for this? I’m running into multiple challenges trying to adapt the detection demo for segmentation.