Running InsightFace with NNAPI on Android

Hello Khadas Community,
I am currently trying to use NNAPI to run inference with InsightFace on a Khadas VIM3 under Android 9.
When I load the InsightFace model using the TFLite NNAPI delegate, the NPU runs I/VsiDevice: prepareModel and android.hardware.neuralnetworks@1.1-service-ovx-driver: initialize three times each.
And when I run a prediction, the NPU runs android.hardware.neuralnetworks@1.1-service-ovx-driver: executeBase three times.
The total time is around 750 ms, whereas running the same TFLite model on the CPU takes about 1000 ms. I expected the total time on the NPU to be around 250 ms.
How can I make the NPU run I/VsiDevice: prepareModel only once? I think this would make the NPU execute faster.
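For reference, this is roughly how I set things up. A minimal sketch of loading a TFLite model with the NNAPI delegate in Java; the model path, input shape (1x112x112x3), and embedding size (512) are assumptions for illustration, not confirmed values from my model:

```java
import java.io.File;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.nnapi.NnApiDelegate;

public class NnapiInferenceSketch {
    public static void main(String[] args) {
        // Hypothetical model path, for illustration only.
        File modelFile = new File("/data/local/tmp/insightface.tflite");

        // Create the NNAPI delegate once and attach it to the Interpreter
        // options, so the model is compiled for the NPU a single time and
        // the same Interpreter is reused for every prediction.
        NnApiDelegate nnApiDelegate = new NnApiDelegate();
        Interpreter.Options options = new Interpreter.Options()
                .addDelegate(nnApiDelegate);

        try (Interpreter interpreter = new Interpreter(modelFile, options)) {
            // Assumed input/output shapes; adjust to the actual model.
            float[][][][] input = new float[1][112][112][3];
            float[][] output = new float[1][512];
            interpreter.run(input, output);
        }
        nnApiDelegate.close();
    }
}
```

Even with the delegate created only once like this, I still see prepareModel three times, which is what I am trying to understand.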
For details, please refer to my log file at this link.
And here is my InsightFace TFLite model.
Thank you so much!
Bao

@phamhoangbao @jasonl Do you have any suggestions on this?