Python NPU: KSNN v1.0 Release(en)

Khadas Software Neural Network v1.0

Explanation

  1. The relevant code has been migrated from Gitlab to Github.
  2. Model conversion is done on the PC, and inference runs on the SBC.

KSNN Documentation

  1. https://docs.khadas.com/linux/vim3/KSNNUsage.html
  2. https://docs.khadas.com/linux/vim3/KSNNConvert.html
  3. https://docs.khadas.com/linux/vim3/KSNNAPI.html

Get conversion tools and code

  1. KSNN Package
$ git clone https://github.com/khadas/ksnn.git
  2. Model Conversion Tool
$ git clone --recursive https://github.com/khadas/aml_npu_sdk.git

For usage, please refer to the related documents and the README.

Release Notes

  1. Add a print-level setting to control how much information is printed.
  2. Add more examples
    $ ls ksnn/examples/
    caffe  darknet  keras  onnx  pytorch  tensorflow  tflite
    
  3. Optimize API
  4. Add multi-input support
  5. Fix the bug that caused PyTorch and ONNX conversion to fail
  6. Optimize the conversion tool parameters and use uniform parameter names

Demo Video

Future Work

  1. Hybrid quantization
  2. Add more examples
  3. Open source API

The new version is much better. I noticed something with the TensorFlow example.

Inference is 8x faster than Darknet. However, the post-processing after “nn_inference” in mobilenet-ssd takes forever, so it ends up much slower than Darknet overall.

Any thoughts?

It would be great to have something that runs at 60-70 FPS realtime.

@Vignesh_Raja The SSD post-processing uses a lot of for loops, and those are extremely slow in Python. You should use NumPy functions instead of for loops.
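To illustrate the point about loops, here is a minimal sketch using random stand-in data rather than the actual KSNN example code. The array name `scores` and its 1917 x 91 shape are assumptions taken from the numbers discussed in this thread. It finds the best class for every box, first with a Python double loop and then with two NumPy calls that give the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a class-score output: 1917 boxes x 91 classes (assumed shape).
scores = rng.standard_normal((1917, 91)).astype(np.float32)

def best_class_loop(scores):
    """Pure-Python double loop: best class index and score for every box."""
    best_idx, best_val = [], []
    for row in scores:
        idx, val = 0, row[0]
        for j in range(1, len(row)):
            if row[j] > val:
                idx, val = j, row[j]
        best_idx.append(idx)
        best_val.append(val)
    return np.array(best_idx), np.array(best_val)

def best_class_numpy(scores):
    """Same result, computed over the whole array at once."""
    return scores.argmax(axis=1), scores.max(axis=1)

loop_idx, loop_val = best_class_loop(scores)
np_idx, np_val = best_class_numpy(scores)
print(np.array_equal(loop_idx, np_idx))
```

The loop version visits 1917 x 91 elements one by one in the interpreter, while the NumPy version does the same work in compiled code.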


@Frank Yes, that is the problem. I tried replacing the loops with NumPy arrays, but I could not understand why “NUM RESULTS” is 1917, or the idea behind this code. Any help here would be appreciated!

image

@Vignesh_Raja This SSD model uses 6 convolutional layers for detection. Their shapes are:

1*19*19*512
1*10*10*1024
1*5*5*512
1*3*3*256
1*2*2*256
1*1*1*128

Each of these layers goes through two further convolutions, which produce the two kinds of output data.

Output layer 0 holds the box coordinate information. It is reshaped to 1*x*4.
Output layer 1 holds the category information. It is reshaped to 1*x*91 (91 is the number of classes).

So, for concat (the category data, 91 values per box):

1*19*19*273    -->  1*1083*91
1*10*10*546    -->  1*600*91
1*5*5*546      -->  1*150*91
1*3*3*546      -->  1*54*91
1*2*2*546      -->  1*24*91
1*1*1*546      -->  1*6*91

For concat_1 (the coordinate data, 4 values per box):

1*19*19*12    -->  1*1083*4
1*10*10*24    -->  1*600*4
1*5*5*24      -->  1*150*4
1*3*3*24      -->  1*54*4
1*2*2*24      -->  1*24*4
1*1*1*24      -->  1*6*4

1083 + 600 + 150 + 54 + 24 + 6 = 1917
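The arithmetic above can be checked with a short sketch. The boxes-per-cell values (3 for the first layer, 6 for the others) are not stated explicitly in this thread; they are derived here by dividing the channel counts in the tables by 91 or by 4.

```python
# Grid sizes of the six detection layers, from the shapes listed above.
grid_sizes = [19, 10, 5, 3, 2, 1]
# Boxes predicted per grid cell (derived: 273 / 91 = 3, 546 / 91 = 6).
boxes_per_cell = [3, 6, 6, 6, 6, 6]
NUM_CLASSES = 91

# Boxes contributed by each layer: height * width * boxes per cell.
per_layer = [g * g * b for g, b in zip(grid_sizes, boxes_per_cell)]
num_results = sum(per_layer)

# Channel counts before the reshape, matching the two tables above.
class_channels = [b * NUM_CLASSES for b in boxes_per_cell]
box_channels = [b * 4 for b in boxes_per_cell]

print(per_layer)    # [1083, 600, 150, 54, 24, 6]
print(num_results)  # 1917
```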


@Vignesh_Raja


Maybe this can help you


Thanks for the detailed information. That clears some things.

Looping through 1917*91 is gonna be very time consuming in realtime. Do you have any better suggestions for that post processing part?

@Vignesh_Raja

You can apply the sigmoid before the for loop. NumPy can perform the same operation on the entire array at once. Then I think the numpy.where() function will help you pick out the indices of the scores you need. Those are the only data you need to process further.

https://numpy.org/doc/stable/reference/generated/numpy.where.html
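A minimal sketch of that suggestion, assuming the raw class scores are available as a NumPy array of shape (1917, 91). The function name `filter_detections`, the 0.5 threshold, and the synthetic input are illustrative assumptions, not part of the KSNN example.

```python
import numpy as np

def sigmoid(x):
    """Element-wise sigmoid over the whole array."""
    return 1.0 / (1.0 + np.exp(-x))

def filter_detections(raw_scores, threshold=0.5):
    """Return (box index, class index, score) for every score above threshold.

    raw_scores: array of shape (num_boxes, num_classes) holding raw scores.
    """
    probs = sigmoid(raw_scores)                        # one call for all entries
    box_idx, class_idx = np.where(probs > threshold)   # indices of the survivors
    return box_idx, class_idx, probs[box_idx, class_idx]

# Synthetic input: every score is low except two hand-placed detections.
raw = np.full((1917, 91), -6.0, dtype=np.float32)
raw[5, 17] = 4.0
raw[1200, 3] = 2.5

box_idx, class_idx, score = filter_detections(raw)
print(box_idx, class_idx)
```

Only the boxes returned here need box decoding and non-maximum suppression afterwards, so any remaining Python loop runs over a handful of candidates instead of all 1917 x 91 entries.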


@Vignesh_Raja When I have time, I will optimize the post-processing of this model, but right now I have other work to do. I will add this to the plan.


How do I convert a model? I tried following this guide, but the convert file gives an "exec format error":
https://docs.khadas.com/products/sbc/vim3/npu/ksnn/ksnn-convert
I am using official ubuntu image on VIM3 PRO

Hello @Arjun_Gupta

You need to run the convert tool on an Ubuntu x86_64 Linux host.
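An "exec format error" usually means the binary was built for a different CPU architecture than the machine running it; the VIM3 is an ARM board, so it cannot execute an x86_64 tool. As a quick check before running the converter, something like the sketch below can be used. The helper name `can_run_convert_tool` is hypothetical and is not part of the SDK.

```python
import platform

def can_run_convert_tool(system=None, machine=None):
    """Return True if this looks like an x86_64 Linux host.

    Both arguments default to the values reported for the current machine.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Linux" and machine.lower() in ("x86_64", "amd64")

print(platform.system(), platform.machine(), can_run_convert_tool())
```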

Oh OK, that wasn’t mentioned. I don’t use Linux on a daily basis, so I don’t have a Linux PC. Will WSL work?