NPU Demo and source code

@Frank thanks for the GPU clarification.

@Frank I ran the following commands without any errors, but when I run my model I don’t see the bandwidth logs described in the docs:

  1. rmmod galcore
  2. cd /lib/modules/4.9.241.kernel/drivers/amlogic/npu
  3. sudo insmod galcore.ko gpuProfiler=1 showArgs=1
  4. export VIV_VX_PROFILE=1
  5. export VIV_VX_DEBUG_LEVEL=1
  6. cd {workspace}/aml_npu_demo_binaries/detect_demo_picture/
  7. sudo ./INSTALL
  8. ./detect_demo 0 1080.bmp

But no bandwidth logs show up in the terminal.
Is there anything I am doing wrong?
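One thing worth ruling out is that the two `export` lines were run in a different shell from the one that launched the demo. Below is a minimal Python sketch that sets both variables for the child process and keeps only the counter lines. It assumes the profiler writes to stdout or stderr and that its counter lines contain the word BANDWIDTH (the TOTAL_READ_BANDWIDTH name quoted later in this thread suggests so). `run_with_profiling` is a hypothetical helper name, not part of the Khadas SDK.

```python
import os
import subprocess

# The two variables from steps 4 and 5 above.
PROFILE_ENV = {"VIV_VX_PROFILE": "1", "VIV_VX_DEBUG_LEVEL": "1"}


def run_with_profiling(cmd, cwd=None):
    """Run cmd with the profiling variables set and return its BANDWIDTH lines.

    Assumption: the counter lines contain the substring "BANDWIDTH".
    """
    env = dict(os.environ, **PROFILE_ENV)  # inherit the shell env, then override
    result = subprocess.run(
        cmd,
        cwd=cwd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no log line is missed
        text=True,
    )
    return [line for line in result.stdout.splitlines() if "BANDWIDTH" in line]
```

Calling `run_with_profiling(["./detect_demo", "0", "1080.bmp"], cwd=...)` from the demo directory returns the matching lines. An empty list would suggest the variables are not the problem and the profiler output is simply not being produced.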

Also, if I try to run rmmod galcore, I get the following:

@Dhruv_Gaba You need to check which background processes are using the NPU and terminate them.
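One way to find those processes is to scan `/proc/<pid>/fd` for open handles on the NPU device node. This is a minimal Python sketch for Linux. It assumes the driver exposes the device as `/dev/galcore`; that path is not stated in this thread, so adjust it if your system differs. `pids_holding` is a hypothetical helper name.

```python
import os


def pids_holding(device_path, proc_root="/proc"):
    """Return the sorted PIDs that currently have device_path open.

    Run as root to see other users' processes; unreadable entries are skipped.
    """
    target = os.path.realpath(device_path)
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs available
    holders = set()
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.add(int(entry))
                    break
            except OSError:
                continue  # descriptor closed while scanning
    return sorted(holders)
```

For example, `pids_holding("/dev/galcore")` lists the PIDs to stop before `rmmod galcore` can succeed.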

It looks like a display problem. If you are connecting remotely, you can add the -X option (ssh -X).

@Frank the command worked for me.

I can get the read and write bandwidth information

Can you please explain what the total, AXI and DDR bandwidths mean?
And how can I calculate the percentage utilization of the NPU from them?

@Frank please answer my question: how do I add external URLs to test with the NPU demo?

@RKroells I don’t know how to add external URLs. I am not familiar with web programming and have never used external URLs. The code is open source, so you can see all of its content.

So the VIM3 will only work with a USB camera?
This cannot be.

Please contact your team to provide this information.

This should have a common solution that has nothing to do with the NPU, so please google it yourself.

@RKroells The NPU demo just provides a template for you to use the NPU interface. It has nothing to do with how the application layer processes it. If you want to use a web camera, or other cameras, or other ways to transfer data, you can implement it yourself, which has nothing to do with the NPU.

So what is this forum for?

Where would I find the required information, as
Google is not helping me?

One of the key selling points of your VIM3 is the NPU, so adding an external camera should be covered in your documentation, as you can’t expect every user to buy a 100% compatible camera.

@Frank Any suggestions on understanding the NPU utilization? I can see TOTAL_READ_BANDWIDTH and TOTAL_WRITE_BANDWIDTH printed in the terminal logs for my model.
What is the maximum value of this bandwidth for the VIM3 board, so that I can calculate the percentage?

For example, for my model it is ~50 MByte (TOTAL_READ_BANDWIDTH + TOTAL_WRITE_BANDWIDTH).

The Khadas documentation also says to subtract OCR_BANDWIDTH, but I don’t see that being printed on my screen. Please help!
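The arithmetic being asked about can be sketched as follows, in Python. Two values are assumptions because this thread does not provide them: the peak bandwidth of the board (passed in as `peak_mbytes_per_s`, a placeholder you must supply) and the exact log format (the regex only assumes the counter name is followed somewhere on the line by a number). Since the counters are reported in MByte, the sketch divides by the inference time to get a rate. Note that the result is a memory-bandwidth utilization, which is not the same thing as compute utilization of the NPU.

```python
import re

# Assumed log shape: "<COUNTER_NAME> ... <number>", e.g.
# "TOTAL_READ_BANDWIDTH (MByte): 30.0". The real format may differ.
_COUNTER = re.compile(r"(TOTAL_(?:READ|WRITE)_BANDWIDTH)\D*?(\d+(?:\.\d+)?)")


def total_traffic_mbytes(log_text):
    """Sum every read and write counter found in the profiler output."""
    totals = {"TOTAL_READ_BANDWIDTH": 0.0, "TOTAL_WRITE_BANDWIDTH": 0.0}
    for name, value in _COUNTER.findall(log_text):
        totals[name] += float(value)
    return totals


def bandwidth_utilization(traffic_mbytes, inference_seconds, peak_mbytes_per_s):
    """Return achieved bandwidth as a percentage of a user-supplied peak."""
    if inference_seconds <= 0 or peak_mbytes_per_s <= 0:
        raise ValueError("inference_seconds and peak_mbytes_per_s must be positive")
    achieved = traffic_mbytes / inference_seconds  # MByte/s
    return 100.0 * achieved / peak_mbytes_per_s
```

With the ~50 MByte quoted above, a hypothetical 50 ms inference and a hypothetical 4000 MByte/s peak, `bandwidth_utilization(50.0, 0.05, 4000.0)` gives about 25%.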

@Dhruv_Gaba This part of the documentation has not been updated yet. I will consult the relevant team and then get back to you.

As I understand it, total is the total time, AXI is the time spent configuring the model layers, and DDR is the time spent feeding in data and fetching results. I’m not sure about this, so I will confirm the correct answer and get back to you.


@Frank I have some questions about the capabilities of the NPU. Everything in the NPU demo works well for me. I would like to know:

  1. Can the NPU handle other tasks, such as semantic segmentation, besides object classification and detection? I could not find anything on this.

  2. Is there any performance benchmarking across different neural networks/topologies? Since the NPU is rated at 5 TOPS, it would be interesting to know whether it performs better than or on par with other edge AI devices.

  3. How much CPU is consumed when running operations on the NPU?

Thanks in advance.

Sure, @Frank I am eagerly waiting for your response.

@Frank What are the **maximum AXI and DDR clock frequencies** (i.e. the maximum DDR and AXI data transfer frequencies)? I think that if we know the maximum, we can calculate the percentage utilization of the NPU.
Also, after reading up on AXI, I found that it stands for Advanced eXtensible Interface and is the bus that carries data between the internal blocks of the SoC, so it would be used while the model is processed on the NPU.
DDR, on the other hand, is the read and write data rate (e.g. MByte/s) to the board’s RAM, which would be used to move input and output data in and out of memory.
Thanks for the hint, will be waiting for your response.

@Dhruv_Gaba As soon as I have news, I will post it here.

@Vignesh_Raja At present, the focus is mainly on computer vision. After model conversion the input is image data, so in theory any model that takes an image as input should work. NLP models most likely will not work.

We have documentation for the supported layers, but it contains no performance figures. You can find it in the SDK.

This depends on your model. You can watch the process in the background and check its CPU usage while it runs.
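One way to put a number on that CPU usage is to launch the demo as a child process and compare its CPU time with the elapsed wall-clock time. This is a minimal Python sketch using only the standard library on Linux; `cpu_share` is a hypothetical helper name and the demo command line is whatever you normally run.

```python
import resource
import subprocess
import time


def cpu_share(cmd):
    """Run cmd and return (cpu_seconds, wall_seconds, cpu_percent).

    cpu_seconds is the user + system CPU time consumed by the child.
    A value above 100% means more than one core was busy on average.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # waits until the child has exited
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    percent = 100.0 * cpu / wall if wall > 0 else 0.0
    return cpu, wall, percent
```

For example, `cpu_share(["./detect_demo", "0", "1080.bmp"])` reports how much of the run was spent on the CPU (pre-processing, post-processing and driver overhead) rather than waiting on the NPU.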


Thanks for the clear explanation. It would be great to see more future developments in the NPU area.
