Request for Full YOLOv8 Segmentation Inference Demo on Khadas VIM3 Using KSNN

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Please provide the version of the system here:

Ubuntu 20.04 (Khadas official VIM3 image)

Please describe your issue below:

I want to perform inference using a YOLOv8n segmentation model on the Khadas VIM3 with the KSNN Python API.

Please provide a full demo or reference that shows how to:

  • Prepare and convert a YOLOv8 segmentation model to .nb format.
  • Run inference on the VIM3 using KSNN.
  • Extract and visualize the segmentation masks (not just bounding boxes).

Hello @Yash_Bhagwat

@Louis-Cheng-Liu will help you to check this issue.

Thank you @numbqq @Louis-Cheng-Liu

Hello @Yash_Bhagwat ,

I am very sorry, but we do not have a YOLOv8 segmentation demo at the moment. We only have a YOLOv8 detect demo.

The YOLOv8 seg model has three output parts: the first is the box and confidence information, the second is the mask protos, and the last is the mask coefficients. The box and confidence part is the same as in YOLOv8 detect, so you can refer to our YOLOv8 doc to modify the code:
YOLOv8n KSNN Demo - 2 [Khadas Docs]

This is my modified seg code for ultralytics==8.0.86.

    def forward(self, x):
        """Return model outputs and mask coefficients if training, otherwise return outputs and mask coefficients."""
        p = self.proto(x[0])  # mask protos
        bs = p.shape[0]  # batch size

        if torch.onnx.is_in_onnx_export():
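            # Export path: permute the outputs to channel-last (NHWC) layout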
            p = p.permute(0, 2, 3, 1).unsqueeze(1)
            mc = [self.cv4[i](x[i]).permute(0, 2, 3, 1).unsqueeze(1) for i in range(self.nl)]
            x = self.detect(self, x)
            return (x, tuple(mc), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
        
        mc = torch.cat([self.cv4[i](x[i]).view(bs, self.nm, -1) for i in range(self.nl)], 2)  # mask coefficients
        
        x = self.detect(self, x)
        if self.training:
            return x, mc, p
        return (torch.cat([x, mc], 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))

Exporting with this code, you get the mask protos and mask coefficients as additional model outputs alongside the detect outputs.

The model conversion command is the same as for YOLOv8 detect.
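
For reference, a sketch of the convert invocation with the flags from the YOLOv8 detect doc (the model name, ONNX path, and dataset file below are placeholders for your own setup, and flags can vary between SDK versions):

    ./convert \
        --model-name yolov8n-seg \
        --platform onnx \
        --model yolov8n-seg.onnx \
        --mean-values '0 0 0 0.00392156' \
        --quantized-dtype asymmetric_affine \
        --source-files ./data/dataset/dataset0.txt \
        --kboard VIM3 \
        --print-level 0

The mean-values string applies zero mean and a 1/255 scale, matching the detect demo's preprocessing.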

For inference on the VIM3, the preprocessing is the same as for the detect model. For postprocessing, the box part is also the same as detect; for the mask part, you can refer to the official code and implement it yourself.
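
For the mask part, the core step is the same as in the official process_mask: a linear combination of the protos with each kept detection's coefficients. A minimal numpy sketch, assuming protos has already been transposed to (32, 160, 160), coeffs holds the (N, 32) coefficients of the N detections kept after NMS, and boxes are xyxy on the network input:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def decode_masks(coeffs, protos, boxes, img_size=640):
        # coeffs: (N, 32) mask coefficients of the kept detections
        # protos: (32, 160, 160) mask prototypes
        # boxes:  (N, 4) xyxy boxes on the img_size x img_size input
        c, mh, mw = protos.shape
        # Linear combination of the prototypes per detection, then sigmoid
        masks = sigmoid(coeffs @ protos.reshape(c, -1)).reshape(-1, mh, mw)
        # Crop each mask to its box, scaled down to proto resolution
        scale = mh / img_size
        for i, (x1, y1, x2, y2) in enumerate(boxes * scale):
            crop = np.zeros((mh, mw), dtype=np.float32)
            crop[int(y1):int(y2), int(x1):int(x2)] = 1.0
            masks[i] *= crop
        return masks

Upsampling each 160x160 mask back to the input size (e.g. with cv2.resize) and thresholding at 0.5 gives binary masks you can overlay on the image.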

If you meet any problems, feel free to ask me for help.

Hi @Louis-Cheng-Liu

As suggested, I modified the head.py file in Ultralytics as shown in your example. I then exported the modified .pt model to ONNX format. However, the output I’m getting is still different from yours.

To help diagnose the issue, I’ve uploaded the following to Google Drive:

  • The modified .pt model
  • A screenshot of the model’s output
  • The modified head.py file
  • My installed Python libraries (requirements.txt)
  • The export.py script I used for ONNX conversion

You can access all of these files here:
Google Drive Link

Please let me know if there’s anything I’ve missed or if further adjustments are needed.

Thanks for your continued support.

Hello @Yash_Bhagwat ,

I do not have permission to access your download link.

Hi @Louis-Cheng-Liu, here is the link, please check it out.

Hello @Yash_Bhagwat ,

I converted using your .pt model and got the correct result. Have you installed ultralytics via pip install? If yes, check that your export script calls your modified code rather than the copy installed in the environment's library.
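
A quick way to check which copy Python actually imports (plain Python, nothing KSNN-specific):

    import ultralytics
    # A path inside site-packages means the pip-installed copy is being used,
    # not your modified source tree
    print(ultralytics.__file__)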

There was a small structural mistake in my code:

    def forward(self, x):
        """Return model outputs and mask coefficients if training, otherwise return outputs and mask coefficients."""
        p = self.proto(x[0])  # mask protos
        bs = p.shape[0]  # batch size

        if torch.onnx.is_in_onnx_export():
-           p = p.permute(0, 2, 3, 1).unsqueeze(1)
-           mc = [self.cv4[i](x[i]).permute(0, 2, 3, 1).unsqueeze(1) for i in range(self.nl)]
+           p = p.permute(0, 2, 3, 1)
+           mc = [self.cv4[i](x[i]).permute(0, 2, 3, 1) for i in range(self.nl)]
            x = self.detect(self, x)
            return (x, tuple(mc), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
        
        mc = torch.cat([self.cv4[i](x[i]).view(bs, self.nm, -1) for i in range(self.nl)], 2)  # mask coefficients
        
        x = self.detect(self, x)
        if self.training:
            return x, mc, p
        return (torch.cat([x, mc], 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
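
With this fix, for a 640x640 input the seg-specific outputs should come out with these shapes (a sketch; verify against your own export, and note the box/conf outputs are unchanged from the YOLOv8 detect doc):

    # Seg-specific ONNX outputs for a 640x640 input (NHWC after the permutes):
    #   mask protos:       (1, 160, 160, 32)
    #   mask coefficients: (1, 80, 80, 32), (1, 40, 40, 32), (1, 20, 20, 32)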

Hi @Louis-Cheng-Liu,

Yes, I have installed the Ultralytics package using pip install, and I've updated the head.py file as shown above.

The main issue I'm facing now is with the yolo8n-picture.py script for running inference on the VIM3. Could you please provide an updated version of this script, or guide me on the specific changes needed to handle the mask coefficients and mask protos properly?

I’ve attempted several modifications, but haven’t been able to get it working yet.

Thank you for your continued support

Hello @Yash_Bhagwat

You can refer to the code in the official example:

ultralytics/examples/YOLOv8-Segmentation-ONNXRuntime-Python/main.py

The box part is the same as the detect model. I suggest ignoring the mask outputs first and decoding only the boxes, to make sure the model infers the right result; then continue on to decoding the masks.
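
If it helps, here is a minimal numpy sketch of the box decode for a single output scale, following the DFL scheme described in the YOLOv8 detect doc (it assumes the 64-channel DFL box tensor and the class-logit tensor of one scale in NHWC layout, as exported above; names are illustrative):

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def decode_scale(box, cls, stride, reg_max=16):
        # box: (1, H, W, 4*reg_max) DFL logits, cls: (1, H, W, nc) class logits
        h, w = box.shape[1:3]
        # DFL: expected value over reg_max bins for the four distances (l, t, r, b)
        dist = (softmax(box.reshape(h, w, 4, reg_max)) * np.arange(reg_max)).sum(-1)
        # Anchor centers in grid units
        cx, cy = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
        x1, y1 = (cx - dist[..., 0]) * stride, (cy - dist[..., 1]) * stride
        x2, y2 = (cx + dist[..., 2]) * stride, (cy + dist[..., 3]) * stride
        scores = 1.0 / (1.0 + np.exp(-cls[0]))  # sigmoid class scores, (H, W, nc)
        return np.stack([x1, y1, x2, y2], axis=-1), scores

After thresholding the scores and running NMS across the three scales, the kept indices also select the matching rows of the mask coefficients for the mask step.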