Has anyone experimented with running Ollama models on the Khadas boards? I tried just now using the VIM4, and it only runs in CPU-only mode.
Is it possible to improve the performance of models like llama3 by leveraging the NPU?
@_brym Ollama doesn’t have any support for the NPUs on any of the Khadas boards; it is primarily CUDA-focused.
You can convert the model to the NPU’s format, but there are currently some hiccups in running it:
The VIM3 and VIM4 NPUs are better suited to running convolutional neural networks for architectural reasons, and the compiler may also have trouble converting some of the operators that transformer models use; see the sketch below for a quick way to check.
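If you want to see which operators a conversion would have to handle, one approach is to export the model to ONNX and list the distinct operator types in the graph, then compare them against the NPU compiler’s supported-ops documentation. A minimal sketch; the `llama3.onnx` path is just a placeholder for whatever you export:

```python
import onnx

# Load an exported ONNX graph and collect the distinct operator types,
# so they can be checked against the NPU compiler's supported-ops list.
model = onnx.load("llama3.onnx")  # placeholder path for your exported model
op_types = sorted({node.op_type for node in model.graph.node})
print(op_types)
```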
The Edge2 can support LLMs through Rockchip’s RK-LLM interface.
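For reference, a conversion with Rockchip’s rkllm-toolkit looks roughly like the sketch below. This follows the example scripts in the rknn-llm repo, but the exact parameter names (`quantized_dtype`, `target_platform`, etc.) vary between toolkit versions, so treat it as an outline rather than a drop-in script:

```python
from rkllm.api import RKLLM  # from Rockchip's rkllm-toolkit

llm = RKLLM()

# Load a Hugging Face checkpoint (local path to the model directory)
ret = llm.load_huggingface(model="./Meta-Llama-3-8B-Instruct")
assert ret == 0, "model load failed"

# Quantize and compile the model for the RK3588 NPU
ret = llm.build(do_quantization=True, optimization_level=1,
                quantized_dtype="w8a8", target_platform="rk3588")
assert ret == 0, "build failed"

# Export the .rkllm artifact, which runs on-device via the RKLLM runtime
ret = llm.export_rkllm("./llama3.rkllm")
assert ret == 0, "export failed"
```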
The current issues with LLMs on these devices are memory bandwidth bottlenecks, and the fact that the NPUs are primarily designed to accelerate convolution operations.
They are much faster at their intended purpose of running machine-learning vision algorithms, but for LLMs the closest thing we have in consumer ARM NPU hardware is the RK3588; see the rough estimate below for why bandwidth is the limiting factor.
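To put a number on the bandwidth ceiling: during decoding, every generated token has to stream the full set of weights from DRAM, so a back-of-the-envelope upper bound is bandwidth divided by model size. Both figures below are rough assumptions (a ~4-bit quantized llama3-8B and typical RK3588 LPDDR4x bandwidth), not measurements:

```python
# Memory-bound decode estimate: tokens/s <= DRAM bandwidth / weight bytes,
# since each generated token streams all weights once.
model_size_gb = 4.7     # assumed: llama3-8B quantized to ~4 bits
bandwidth_gb_s = 17.0   # assumed: effective LPDDR4x bandwidth on RK3588
ceiling = bandwidth_gb_s / model_size_gb
print(f"~{ceiling:.1f} tokens/s upper bound")  # roughly 3-4 tokens/s
```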