Python NPU: KSNN v1.0 Release(en)

Khadas Software Neural Network v1.0


  1. The relevant code has been migrated from GitLab to GitHub.
  2. Model conversion is done with the conversion tool on the PC, and inference runs on the SBC.

KSNN Documentation

  1. KSNN Usage | Khadas Documentation
  2. Instructions for KSNN conversion tool | Khadas Documentation
  3. KSNN API Documentation | Khadas Documentation

Get conversion tools and code

  1. KSNN Package
    $ git clone
  2. Model Conversion Tool
    $ git clone --recursive

For usage, please refer to the related documents and the README.

Release Notes

  1. Add a setting for the level of printed information.
  2. Add more examples
    $ ls ksnn/examples/
    caffe  darknet  keras  onnx  pytorch  tensorflow  tflite
  3. Optimize API
  4. Add multi-input support
  5. Fix the bug that caused PyTorch and ONNX conversion to fail
  6. Optimize the conversion tool parameters and use uniform parameter names

Demo Video

Future Work

  1. Hybrid quantization
  2. Add more examples
  3. Open source API

The new version is much better. I noticed something with TensorFlow, though.

The inference is 8x faster than darknet. However, the post-processing after “nn_inference” in mobilenet-ssd takes forever, so it ends up much slower than darknet overall.

Any thoughts?

It would be great to have something that runs at 60-70 FPS realtime.

@Vignesh_Raja For the SSD post-processing, I used a lot of for loops, which are extremely slow in Python. You should use NumPy functions instead of for loops.


@Frank Yes, that is the problem. I tried replacing the loops with NumPy arrays, but I could not understand why “NUM RESULTS” is 1917 or the idea behind this code. Any help here would be appreciated!


@Vignesh_Raja This SSD model uses 6 convolutional layers for detection. Their feature maps are 19*19, 10*10, 5*5, 3*3, 2*2 and 1*1.


Each of these layers then goes through two convolutions to produce two different kinds of data.

Output layer 0 holds the box coordinates. It is reshaped to 1*x*4.
Output layer 1 holds the category scores. It is reshaped to 1*x*91 (91 is the number of classes).

So, about concat (output layer 0):

1*19*19*12    -->  1*1083*4
1*10*10*24    -->  1*600*4
1*5*5*24      -->  1*150*4
1*3*3*24      -->  1*54*4
1*2*2*24      -->  1*24*4
1*1*1*24      -->  1*6*4

About concat_1 (output layer 1):

1*19*19*273    -->  1*1083*91
1*10*10*546    -->  1*600*91
1*5*5*546      -->  1*150*91
1*3*3*546      -->  1*54*91
1*2*2*546      -->  1*24*91
1*1*1*546      -->  1*6*91

1083 + 600 + 150 + 54 + 24 + 6 = 1917
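
The sum above can be checked with a short NumPy sketch. It assumes 3 anchors per cell on the 19*19 map and 6 per cell on the other maps, which is what the channel counts listed above imply (273 = 3*91, 546 = 6*91). The zero-filled arrays are only stand-ins for the real per-layer outputs, and the names are made up for this illustration.

```python
import numpy as np

NUM_CLASSES = 91
# (grid size, anchors per cell) for the six detection layers.
# Assumption: inferred from the channel counts in the shapes listed above.
LAYERS = [(19, 3), (10, 6), (5, 6), (3, 6), (2, 6), (1, 6)]

class_parts, box_parts = [], []
for grid, anchors in LAYERS:
    # Dummy per-layer tensors with the same shapes as in the tables
    cls = np.zeros((1, grid, grid, anchors * NUM_CLASSES), dtype=np.float32)
    box = np.zeros((1, grid, grid, anchors * 4), dtype=np.float32)
    # Flatten grid cells and anchors into one "candidate" axis
    class_parts.append(cls.reshape(1, -1, NUM_CLASSES))
    box_parts.append(box.reshape(1, -1, 4))

# Concatenate the six layers along the candidate axis
scores = np.concatenate(class_parts, axis=1)
boxes = np.concatenate(box_parts, axis=1)
num_results = scores.shape[1]
print(scores.shape, boxes.shape)
```

Every candidate box has 4 coordinates and 91 class scores, so both outputs share the same middle dimension.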



Maybe this can help you


Thanks for the detailed information. That clears some things.

Looping through 1917*91 values is going to be very time-consuming in real time. Do you have any better suggestions for that post-processing part?


You can apply sigmoid before the for loop, since NumPy can perform the same operation on the entire array at once. Then I think the numpy.where() function will help you pick out the indices of the scores you need. Those are the only data you need to process further.
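
A minimal sketch of that suggestion, not the code from the KSNN example: it assumes the class output holds raw logits of shape 1*1917*91, that class 0 is the background class, and that a threshold of 0.5 is acceptable. The function name `filter_candidates` and the test data are hypothetical.

```python
import numpy as np

NUM_RESULTS = 1917
NUM_CLASSES = 91

def sigmoid(x):
    # Applied to the whole array at once, no Python loop
    return 1.0 / (1.0 + np.exp(-x))

def filter_candidates(class_logits, score_threshold=0.5):
    """Return (box_index, class_id, score) arrays for scores above the threshold."""
    scores = sigmoid(class_logits.reshape(NUM_RESULTS, NUM_CLASSES))
    scores[:, 0] = 0.0  # assumption: class 0 is background, so ignore it
    # np.where on a 2-D boolean mask returns the row and column indices
    box_idx, class_id = np.where(scores > score_threshold)
    return box_idx, class_id, scores[box_idx, class_id]

# Hypothetical data: every logit is low except one candidate
logits = np.full((1, NUM_RESULTS, NUM_CLASSES), -6.0, dtype=np.float32)
logits[0, 42, 17] = 4.0

box_idx, class_id, score = filter_candidates(logits)
print(box_idx, class_id, score)
```

Only the few candidates that pass the threshold then need box decoding and NMS, so any remaining Python loop runs over a handful of items instead of 1917*91.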


@Vignesh_Raja When I have time, I will optimize the post-processing of this model, but right now I have other work to do. I will add this to the plan.
