Load my model on the NPU from C++ code


I read about converting models from TensorFlow, Caffe, etc. into case code and then compiling it to generate binary executables on the Khadas VIM3.

My first question: is it possible to run my converted model from inside C++ code? I want to have a C++ program that runs the model itself.

My other question: is it possible to load the model onto the NPU once and run it repeatedly to get different outputs? I want it to run fast and with low overhead, so efficiency is important. Since my model is constant and only the inputs change, can I load my converted model (inside my C++ code) and run it with different inputs in several time slots?




Our sample demo source code is compiled as C++; you can follow it to build your own C++ project.

If you need different inputs and outputs, you can extract the input and output layers of the neural network model and implement that handling yourself in C++ code.

Hi @Frank

Thanks for your reply.

My application runs the model approximately every 150 ms. My goal is to load the model onto the NPU once at startup and then, on each call, just feed the input and run it, instead of loading the model every time. Is that possible? I have done it on another NPU with this method:


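void my_run(input) {
    // feed the input to the already loaded model and run it
}

int main() {
    // load the model onto the NPU once
    every 150ms:
        … prepare my input;
        my_run(input);
    return 0;
}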

With this method, the first run takes about 60 ms, and after that each run takes just 8 ms.

I am wondering whether this NPU also allows loading the model once and then just feeding it a new input each time.


@Ehsan If you don’t need to run another model, you can do it.
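The load-once, run-many pattern discussed above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: LoadedModel, run and run_all are hypothetical names, and a dot product stands in for real inference so that the sketch runs without any NPU SDK. In a real project the constructor and destructor would wrap whatever create and destroy calls the SDK provides.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for an NPU-backed model. The constructor simulates
// the expensive "load" step; run() simulates the cheap per-input step.
class LoadedModel {
public:
    explicit LoadedModel(std::vector<float> weights)
        : weights_(std::move(weights)) {
        ++load_count;  // the load happens exactly once per object
    }

    // Not copyable: one object owns one loaded model.
    LoadedModel(const LoadedModel&) = delete;
    LoadedModel& operator=(const LoadedModel&) = delete;

    // Feed a new input to the already loaded model.
    // A dot product stands in for real inference.
    float run(const std::vector<float>& input) const {
        const std::size_t n = std::min(input.size(), weights_.size());
        const auto last = input.begin() + static_cast<std::ptrdiff_t>(n);
        return std::inner_product(input.begin(), last, weights_.begin(), 0.0f);
    }

    static int load_count;  // counts how many times a model was loaded

private:
    std::vector<float> weights_;
};

int LoadedModel::load_count = 0;

// Runs the same loaded model on a sequence of inputs (one per time slot).
std::vector<float> run_all(const LoadedModel& model,
                           const std::vector<std::vector<float>>& inputs) {
    std::vector<float> outputs;
    outputs.reserve(inputs.size());
    for (const auto& in : inputs) {
        outputs.push_back(model.run(in));
    }
    return outputs;
}
```

In an application like the one described, the LoadedModel object would be created once before the periodic loop, and run() would be called every 150 ms with the freshly prepared input.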

Hi @Frank
I tried to compile the case code with clang++, but it fails. Compiling the case code with clang succeeds. Is there a way to generate C++-compatible case code, or is there a flag for clang++ that would let it compile?

I need to use the case code (as I understand it, the case code is what is required to load and run the model) inside C++ code, so compiling my whole project with clang is not possible. This is why I want to find a way to compile the case code with clang++.

Best regards,

@Ehsan My demos all use C++, and the case code can be used directly from C++.

Dear @Frank thanks for your reply.

Great! I would appreciate it if you could share the link to the NPU SDK that generates C++ case code.


@Ehsan https://github.com/khadas/aml_npu_app

Hi @Frank,

I have looked at the repository you mentioned. However, when I use the Android toolkit for the Khadas NPU, it first converts the desired neural network model and then generates its case code, which is in C (vnn_pre_process.c, vnn_post_process.c, vnn_inceptionv3.c, main.c). When I cross-compile this case code with clang it is OK, but when I compile it with clang++ it fails. Is it possible to handle our model with C++ instead of C, for example by generating C++ case code, or is there another way to use the NPU for a given neural network?


@Ehsan Our demo also uses C++, and it compiles without problems.

The tool generates C code, and its template cannot be modified.
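Since the generated case code is C, one general way to use C sources from a C++ project is to compile the .c files as C and give their declarations C linkage on the C++ side. The sketch below shows only that general technique; the thread does not show the actual clang++ error, so whether this resolves it is an assumption. The names case_code_run and run_from_cpp and the header name in the comment are hypothetical and not part of the Khadas toolkit.

```cpp
#include <vector>

// In a real project these declarations would come from the generated C
// header, wrapped like this:
//
//     extern "C" {
//     #include "vnn_model_name.h"   // hypothetical header name
//     }
//
// To keep the sketch self-contained, a stand-in function with C linkage is
// declared and defined in this file instead.
extern "C" {
    // Hypothetical stand-in for a function implemented in a .c file.
    int case_code_run(const unsigned char* input, int size);
}

// Normally this definition would live in a .c file compiled as C.
extern "C" int case_code_run(const unsigned char* input, int size) {
    int sum = 0;
    for (int i = 0; i < size; ++i) {
        sum += input[i];
    }
    return sum;
}

// Ordinary C++ code calling the C-linkage function.
int run_from_cpp(const std::vector<unsigned char>& input) {
    return case_code_run(input.data(), static_cast<int>(input.size()));
}
```

With this layout the C files can still be built with clang and the C++ files with clang++, and the resulting object files linked together with clang++.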

Dear @Frank,
Thanks for your reply.

So the solution is:

  1. Convert my desired neural network from TensorFlow (or another format) to a .nb file (after step 3):

  2. Use this .nb file in my C++ project, following the instructions in your C++ demos (similar to this):

int main(int argc, char **argv)
{
    void *context = NULL;
    const char *jpeg_path = NULL;
    unsigned char *rawdata = NULL;
    aml_config config;
    nn_input inData;
    nn_output *outdata = NULL;
    FILE *fp = NULL;
    int size = 0;
    int ret;
    int i;
    nn_query query;

    jpeg_path = (const char *)argv[2];
    config.modelType = TENSORFLOW;

    /********* load nbg from memory *********/
    fp = fopen(argv[1], "rb");
    fseek(fp, 0, SEEK_END);   /* go to the end so ftell() returns the file size */
    size = (int)ftell(fp);
    rewind(fp);
    config.pdata = (char *)calloc(1, size);
    fread((void *)config.pdata, 1, size, fp);
    fclose(fp);
    config.nbgType = NN_NBG_MEMORY;
    config.size = size;

    /********* load nbg from file (alternative to the block above) *********/
    config.path = (const char *)argv[1];
    config.nbgType = NN_NBG_FILE;

    context = aml_module_create(&config);

    rawdata = get_jpeg_rawData(jpeg_path, 224, 224);
    inData.input_index = 0;        /* index of the input, starting from 0 */
    inData.size = 224 * 224 * 3;   /* 224 x 224 image, 3 bytes per pixel */
    inData.input = rawdata;
    inData.input_type = BINARY_RAW_DATA;
    ret = aml_module_input_set(context, &inData);

    if (rawdata != NULL) {
        free(rawdata);
        rawdata = NULL;
    }

    /* outconfig: output configuration, set up as in the demo (not shown here) */
    outdata = (nn_output *)aml_module_output_get_simple(context, outconfig);
    if (outdata->out[0].param->data_format == NN_BUFFER_FORMAT_FP32) {
        /* post-process the FP32 output here */
    }

    ret = aml_module_destroy(context);
    return ret;
}

That is, I would ignore the case code (pre_process.c, post_process.c, model_name.c, vnn_model_name.c, and main.c).
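The "load nbg from memory" step in the snippet above amounts to reading the whole .nb file into a buffer. A minimal C++ sketch of just that step, using only the standard library, could look like the following. The helper name read_file_bytes is hypothetical and not part of the Khadas SDK.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Reads an entire binary file (for example a converted .nb model) into memory.
// read_file_bytes is a hypothetical helper name, not part of the Khadas SDK.
std::vector<char> read_file_bytes(const std::string& path) {
    // Open in binary mode, positioned at the end so tellg() gives the size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (size > 0 && !file.read(buffer.data(), size)) {
        throw std::runtime_error("cannot read " + path);
    }
    return buffer;
}
```

Going by the snippet in this thread, the buffer's data pointer and size would then presumably be assigned to config.pdata and config.size with config.nbgType set to NN_NBG_MEMORY; the buffer has to stay alive for as long as the SDK needs it.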

Is this the right way? If not, what do you suggest? If yes, is there a more straightforward way?

Thanks, I appreciate your help.

@Ehsan The case code just gives you a sample demo. I don't think it is suitable for your use.

You can refer to our demo; we only took the necessary parts from the case code.