Load my model on the NPU inside C++ code


I read about converting models from TensorFlow, Caffe, etc. to code and then compiling it to generate binary executables on the Khadas VIM3.

Now, my question is: is it possible to run my converted model inside C++ code? I want to have a C++ program and run the model from inside it.

My other question is: is it possible to load the model onto the NPU once and then run it for different inputs? Since I need it to run fast with low overhead, efficiency is important. My model is constant, so I would like to load the converted model (inside my C++ code) once and run it with different inputs at several time slots.




Our sample demo source code is compiled with C++; you can follow it to build your own C++ project.

If you need different inputs and outputs, you can extract the input and output layers of the neural network model and implement them yourself in C++ code.

Hi @Frank

Thanks for your reply.

My application runs the model approximately every 150 ms. My goal is to load the model onto the NPU once at startup and, on each call, just feed the input and run it, instead of loading the model every time. Is it possible to load the model once and on every run just feed the input to it? I have done it on another NPU with this method:


void my_run(input);   // feeds one input to the already-loaded model

int main(){
    load_model();          // one-time load
    every 150ms:
        …prepare my input;
        my_run(input);
    return 0;
}
In this method, the first run takes about 60 ms, and after that each run takes just 8 ms.

I am wondering whether this NPU can also load the model once and then just take a new input on each run.


@Ehsan If you don’t need to run another model, you can do it.