Convert MobileNetv1 SSD using aml_npu_sdk python version

Hi,
First of all, thank you for rolling out the Python version of the API. During conversion, I have to specify the output layer of the model. As I used the Object Detection API to train my model, I tried to visualize the model using netron.app.

But it was very difficult for me to find the output layer, unlike with other models. I have used netron.app for other models and could visualize them without trouble, whereas for this one I am unable to find the output layer.

Can you please have a look at the model and help me identify its output layer?

https://drive.google.com/drive/folders/1pq8DTzc_b4CWIf65Y26CIzxQxg_WoL2c?usp=sharing

Thanks in advance! :slight_smile:
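One way to narrow down the output layer without a visualizer is to list the nodes that no other node consumes. The sketch below only illustrates the idea on a plain dictionary standing in for the graph; the node names and the `find_output_candidates` helper are hypothetical, and for a real frozen graph the same mapping would first have to be built from the model's node list. For an Object Detection API export, the terminal nodes are normally the post-processing outputs, so the result is a starting point rather than the final answer (the conversion command later in this thread uses `concat` and `concat_1`, which sit before post-processing).

```python
def find_output_candidates(graph):
    """Return nodes that are never used as an input by another node.

    `graph` maps each node name to the list of its input node names.
    """
    consumed = set()
    for inputs in graph.values():
        for name in inputs:
            # TensorFlow-style references may carry a "^" control prefix
            # or a ":<index>" output suffix; strip both.
            consumed.add(name.lstrip("^").split(":")[0])
    return sorted(name for name in graph if name not in consumed)


# Hypothetical toy graph: one feature extractor feeding two heads.
toy_graph = {
    "image": [],
    "features": ["image"],
    "box_head": ["features"],
    "class_head": ["features:0"],
}
print(find_output_candidates(toy_graph))
```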

@Frank Separately from the previous problem, I am also getting an error when I run the demo.

@Akkisony Please use the latest release of the gnome firmware.

@Frank current gnome: https://dl.khadas.com/Firmware/VIM3/Ubuntu/EMMC/VIM3_Ubuntu-gnome-focal_Linux-4.9_arm64_EMMC_V1.0.7-210625.img.xz

I think this is the latest gnome available, if I am not wrong. If I already have the latest gnome, please let me know how I can solve the above error.
Thanks in advance! :slight_smile:

@Frank Can you please suggest any other object detection model that is easily compatible with the NPU, like Yolov3? I heard that for the SSD model we have to remove the input and output layers to make it compatible with the NPU.
Do you have any suggestions?
Thanks in advance.

@Akkisony I’m sorry about this, I have no other solution for the SSD model.

@Frank Is there any other object detection model that is easily compatible and can be converted with the NPU SDK? I think it is very important for end users to have some idea of which models can and cannot be converted easily using the NPU SDK.

@Frank Which model can be converted using the NPU SDK: Caffe SSD or MobileNet SSD?

@Frank Can you please tell me whether the MobileNet v1 classification model can be converted using the NPU SDK without complications, unlike the SSD model (where I have to remove the input and output layers manually)?

@Akkisony Maybe you can try this? https://dl.khadas.com/test/ssd_mobilenet_v1_coco_2017_11_17.pb

#!/bin/bash

NAME=SSD
ACUITY_PATH=../bin/

convert_caffe=${ACUITY_PATH}convertcaffe
convert_tf=${ACUITY_PATH}convertensorflow
convert_tflite=${ACUITY_PATH}convertflit
convert_darknet=${ACUITY_PATH}convertdarknet
convert_onnx=${ACUITY_PATH}convertonnx
convert_keras=${ACUITY_PATH}convertkeras
convert_pytorch=${ACUITY_PATH}convertpytorch

$convert_tf \
    --tf-pb ~/Downloads/ssd_mobilenet_v1_coco_2017_11_17.pb \
    --inputs FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1 \
    --input-size-list '300,300,3' \
    --outputs "concat concat_1" \
    --net-output ${NAME}.json \
    --data-output ${NAME}.data 
	
#$convert_caffe \
#    --caffe-model xx.prototxt   \
#	--caffe-blobs xx.caffemodel \
#    --net-output ${NAME}.json \
#    --data-output ${NAME}.data 
	
#$convert_tflite \
#    --tflite-mode  xxxx.tflite \
#    --net-output ${NAME}.json \
#    --data-output ${NAME}.data 

#$convert_darknet \
#    --net-input xxx.cfg \
#	--weight-input xxx.weights \
#    --net-output ${NAME}.json \
#    --data-output ${NAME}.data 
	
#$convert_onnx \
#    --onnx-model  xxx.onnx \
#    --net-output ${NAME}.json \
#    --data-output ${NAME}.data 


#$convert_keras \
#	--keras-model xxx.hdf5 \
#	--net-output ${NAME}.json --data-output ${NAME}.data


#$convert_pytorch --pytorch-model xxxx.pt \
#        --net-output ${NAME}.json \
#        --data-output ${NAME}.data \
#	--input-size-list '1,480,854'

#!/bin/bash

NAME=SSD
ACUITY_PATH=../bin/

tensorzone=${ACUITY_PATH}tensorzonex

# Options for --quantized-dtype: dynamic_fixed_point-i8, asymmetric_affine-u8, dynamic_fixed_point-i16 (the S905D3 does not support dynamic_fixed_point-i16)
$tensorzone \
    --action quantization \
    --dtype float32 \
    --source text \
    --source-file data/validation_tf.txt \
    --channel-mean-value '127.5 127.5 127.5 127.5' \
    --reorder-channel '0 1 2' \
    --model-input ${NAME}.json \
    --model-data ${NAME}.data \
    --model-quantize ${NAME}.quantize \
    --quantized-dtype asymmetric_affine-u8 \
    --quantized-rebuild \
#    --batch-size 2 \
#    --epochs 5

# Note: the defaults are batch-size 100 and epochs 1. If you set epochs > 1, the number of pictures in data/validation_tf.txt must equal batch-size * epochs.
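
The note above can be checked before launching quantization. This is a minimal sketch, assuming the list file holds one image entry per non-empty line; `check_calibration_count` is a hypothetical helper and not part of the SDK.

```python
def check_calibration_count(lines, batch_size=100, epochs=1):
    """Return True if the number of listed images equals batch_size * epochs."""
    images = [ln.strip() for ln in lines if ln.strip()]
    return len(images) == batch_size * epochs


# Example: 10 listed images match --batch-size 2 --epochs 5.
listing = ["img_%d.jpg\n" % i for i in range(10)]
print(check_calibration_count(listing, batch_size=2, epochs=5))
```

With a real file, the lines could be obtained with `open("data/validation_tf.txt").readlines()`.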

#!/bin/bash

NAME=SSD
ACUITY_PATH=../bin/

export_ovxlib=${ACUITY_PATH}ovxgenerator

$export_ovxlib \
    --model-input ${NAME}.json \
    --data-input ${NAME}.data \
    --reorder-channel '0 1 2' \
    --channel-mean-value '127.5 127.5 127.5 127.5' \
    --export-dtype quantized \
    --model-quantize ${NAME}.quantize \
    --optimize VIPNANOQI_PID0X88  \
    --viv-sdk ${ACUITY_PATH}vcmdtools \
    --pack-nbg-unify  \

rm  *.h *.c .project .cproject *.vcxproj *.lib BUILD *.linux *.data *.quantize *.json

mv ../*_nbg_unify nbg_unify_${NAME}

cd nbg_unify_${NAME}

mv network_binary.nb ${NAME}.nb
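
A note on `--channel-mean-value '127.5 127.5 127.5 127.5'`: assuming the convention that the first three numbers are per-channel means and the fourth is a scale, applied as (pixel - mean) / scale, this setting maps 8-bit pixel values from [0, 255] to [-1, 1]. That convention is an assumption here and is worth checking against the SDK documentation. The hypothetical helper below just spells out the arithmetic:

```python
def normalize_pixel(rgb, channel_mean_value="127.5 127.5 127.5 127.5"):
    """Apply (value - mean) / scale per channel (assumed convention)."""
    m0, m1, m2, scale = (float(v) for v in channel_mean_value.split())
    return tuple((c - m) / scale for c, m in zip(rgb, (m0, m1, m2)))


# Darkest, middle and brightest 8-bit values, one per channel.
print(normalize_pixel((0, 127.5, 255)))
```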

@Akkisony I finished it with the Python API. It works for me. Have you tested it?

@Frank You mean the SSD model?
I haven't tried it yet, as I was busy with other tasks. Can you share a link to your post- and pre-processing for the SSD model? That would help me a lot.
I think I will do it when I am done with my current tasks! :slight_smile:
Thanks @Frank :slight_smile:

@Akkisony

from ctypes import *
import numpy as np
import os      
import argparse
import sys
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv
import re
import math
import random
import time

INPUT_SIZE = 300

NUM_RESULTS = 1917
NUM_CLASSES = 91

Y_SCALE = 10.0
X_SCALE = 10.0
H_SCALE = 5.0
W_SCALE = 5.0

CLASSES = ("???","person", "bicycle", "car","motorbike","aeroplane","bus","train","truck","boat","traffic light",
           "fire hydrant","???","stop sign","parking meter","bench","bird","cat","dog","horse","sheep","cow","elephant",
           "bear","zebra","giraffe","???","backpack","umbrella","???","???","handbag","tie","suitcase","frisbee","skis","snowboard","sports ball","kite",
           "baseball bat","baseball glove","skateboard","surfboard","tennis racket","bottle","???","wine glass","cup","fork","knife",
           "spoon","bowl","banana","apple","sandwich","orange","broccoli","carrot","hot dog","pizza","donut","cake","chair","sofa",
           "pottedplant","bed","???","diningtable","???","???","toilet","???","tvmonitor","laptop","mouse","remote","keyboard","cell phone","microwave",
           "oven","toaster","sink","refrigerator","???","book","clock","vase","scissors","teddy bear","hair drier", "toothbrush")



def sigmoid(x):
	return 1. / (1. + np.exp(-x))

def CalculateOverlap(xmin0, ymin0, xmax0, ymax0, xmin1, ymin1, xmax1, ymax1):
	w = max(0.0, min(xmax0, xmax1) - max(xmin0, xmin1))
	h = max(0.0, min(ymax0, ymax1) - max(ymin0, ymin1))
	i = w * h
	u = (xmax0 - xmin0) * (ymax0 - ymin0) + (xmax1 - xmin1) * (ymax1 - ymin1) - i

	if u <= 0.0:
		return 0.0

	return i / u


def load_box_priors():
	box_priors_ = []
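	# Read every number in box_priors.txt into one flat list.
	with open('./box_priors.txt', 'r') as fp:
		for s in fp:
			# Raw string avoids invalid-escape warnings; ss[0] is the mantissa, ss[2] the exponent.
			aList = re.findall(r'([-+]?\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?', s)
			for ss in aList:
				aNum = float(ss[0] + ss[2])
				box_priors_.append(aNum)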

	box_priors = np.array(box_priors_)
	box_priors = box_priors.reshape(4, NUM_RESULTS)

	return box_priors

def calc_position(vaildCnt, candidateBox, predictions, box_priors):
	for i in range(0, vaildCnt):
		if candidateBox[0][i] == -1:
			continue

		n = candidateBox[0][i]
		ycenter = predictions[0][n][0] / Y_SCALE * box_priors[2][n] + box_priors[0][n]
		xcenter = predictions[0][n][1] / X_SCALE * box_priors[3][n] + box_priors[1][n]
		h = math.exp(predictions[0][n][2] / H_SCALE) * box_priors[2][n]
		w = math.exp(predictions[0][n][3] / W_SCALE) * box_priors[3][n]

		ymin = ycenter - h / 2.
		xmin = xcenter - w / 2.
		ymax = ycenter + h / 2.
		xmax = xcenter + w / 2.

		predictions[0][n][0] = ymin
		predictions[0][n][1] = xmin
		predictions[0][n][2] = ymax
		predictions[0][n][3] = xmax


def nms(vaildCnt, candidateBox, predictions):
	for i in range(0, vaildCnt):
		if candidateBox[0][i] == -1:
			continue

		n = candidateBox[0][i]
		xmin0 = predictions[0][n][1]
		ymin0 = predictions[0][n][0]
		xmax0 = predictions[0][n][3]
		ymax0 = predictions[0][n][2]

		for j in range(i+1, vaildCnt):
			m = candidateBox[0][j]

			if m == -1:
				continue

			xmin1 = predictions[0][m][1]
			ymin1 = predictions[0][m][0]
			xmax1 = predictions[0][m][3]
			ymax1 = predictions[0][m][2]

			iou = CalculateOverlap(xmin0, ymin0, xmax0, ymax0, xmin1, ymin1, xmax1, ymax1)

			if iou >= 0.45:
				candidateBox[0][j] = -1


def draw(img, vaildCnt, candidateBox, predictions, scoreBox):
	for i in range(0, vaildCnt):
		if candidateBox[0][i] == -1:
			continue

		n = candidateBox[0][i]

		xmin = int(max(0.0, min(1.0, predictions[0][n][1])) * img.shape[1])
		ymin = int(max(0.0, min(1.0, predictions[0][n][0])) * img.shape[0])
		xmax = int(max(0.0, min(1.0, predictions[0][n][3])) * img.shape[1])
		ymax = int(max(0.0, min(1.0, predictions[0][n][2])) * img.shape[0])

		print("%d @ (%d, %d) (%d, %d) score=%f" % (candidateBox[1][i], xmin, ymin, xmax, ymax, scoreBox[0][i]))
		cv.rectangle(img, (xmin, ymin), (xmax, ymax), (255, 0, 0), 2)
		cv.putText(img, '{0} {1:.2f}'.format(CLASSES[candidateBox[1][i]], scoreBox[0][i]),
					(xmin, ymin),
					cv.FONT_HERSHEY_SIMPLEX,
					0.6, (0, 0, 255), 2)

	cv.imwrite("out.jpg", img)


if __name__ == "__main__":
	parser = argparse.ArgumentParser()
	parser.add_argument("--nb-file", help="path to nb file")
	parser.add_argument("--so-lib", help="path to so lib")
	parser.add_argument("--input-picture", help="path to input picture")
	args = parser.parse_args()
	if args.nb_file :
		if os.path.exists(args.nb_file) == False:
			sys.exit('nb file \'' + args.nb_file + '\' not exist')
		nbfile = args.nb_file
	else :
		sys.exit("nb file not found !!! Please specify argument: --nb-file /path/to/nb-file")
	if args.input_picture :
		if os.path.exists(args.input_picture) == False:
			sys.exit('input picture \'' + args.input_picture + '\' not exist')
		inputpicturepath = bytes(args.input_picture,encoding='utf-8')
	else :
		sys.exit("input picture not found !!! Please specify argument: --input-picture /path/to/picture")
	if args.so_lib :
		if os.path.exists(args.so_lib) == False:
			sys.exit('so lib \'' + args.so_lib + '\' not exist')
		solib = args.so_lib
	else :
		sys.exit("so lib not found !!! Please specify argument: --so-lib /path/to/lib")

	ssd = KSNN('VIM3')
	print(' |---+ KSNN Version: {} +---| '.format(ssd.get_nn_version()))
	ssd.nn_init(c_lib_p = solib, nb_p = nbfile)
	img = cv.imread( args.input_picture, cv.IMREAD_COLOR )

	start = time.time()
	outputs = ssd.nn_inference(img,platform = 'TENSORFLOW', num=2, reorder='0 1 2', out_format = out_format.OUT_FORMAT_FLOAT32)
	end = time.time()
	print('inference : ', end - start)

	predictions = outputs[0].reshape((1, NUM_RESULTS, 4))
	outputClasses = outputs[1].reshape((1, NUM_RESULTS, NUM_CLASSES))
	candidateBox = np.zeros([2, NUM_RESULTS], dtype=int)
	scoreBox = np.zeros([1, NUM_RESULTS], dtype=float)
	vaildCnt = 0

	box_priors = load_box_priors()

	# Post Process
	# got valid candidate box
	for i in range(0, NUM_RESULTS):
		topClassScore = -1000
		topClassScoreIndex = -1

		# Skip the first catch-all class.
		for j in range(1, NUM_CLASSES):
			score = sigmoid(outputClasses[0][i][j])

			if score > topClassScore:
				topClassScoreIndex = j
				topClassScore = score

		if topClassScore > 0.4:
			candidateBox[0][vaildCnt] = i
			candidateBox[1][vaildCnt] = topClassScoreIndex
			scoreBox[0][vaildCnt] = topClassScore
			vaildCnt += 1

	# calc position
	calc_position(vaildCnt, candidateBox, predictions, box_priors)

	# NMS
	nms(vaildCnt, candidateBox, predictions)

	# Draw result
	draw(img, vaildCnt, candidateBox, predictions, scoreBox)

I will post it next week; you can also wait until then and test it directly.
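
To make the box decoding in `calc_position` easier to follow, here is the same arithmetic for a single anchor as a standalone sketch. `decode_box` is a hypothetical helper; the prior layout (ycenter, xcenter, height, width) and the scale constants are taken from the script above.

```python
import math

Y_SCALE, X_SCALE, H_SCALE, W_SCALE = 10.0, 10.0, 5.0, 5.0


def decode_box(raw, prior):
    """Turn one raw SSD regression (ty, tx, th, tw) into (ymin, xmin, ymax, xmax).

    `prior` is (ycenter, xcenter, height, width) in normalized coordinates.
    """
    ty, tx, th, tw = raw
    py, px, ph, pw = prior
    ycenter = ty / Y_SCALE * ph + py
    xcenter = tx / X_SCALE * pw + px
    h = math.exp(th / H_SCALE) * ph
    w = math.exp(tw / W_SCALE) * pw
    return (ycenter - h / 2, xcenter - w / 2, ycenter + h / 2, xcenter + w / 2)


# A zero regression gives back the prior box itself, converted to corners.
print(decode_box((0.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.2, 0.4)))
```

The offsets are scaled by the prior's size before being added to its center, and the height and width are predicted in log space, which is why `exp` appears.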

@Frank Thanks for making MobileNet SSD work with the Python API! :slight_smile: This is great news for all of us, as this is one of the main model architectures used on embedded edge devices! :smiley:
Thanks again :slight_smile:

@Akkisony But this is an initial version. I used a lot of for loops, which are very slow in Python, so I will optimize the code with numpy once I have thought it through. If you want to port it to C++, though, this does not affect you.

Counting inference alone, the frame rate can exceed 60 fps.

Of course, this figure is not accurate, because some operations are implemented in my code and are not included in it.
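
As a rough idea of what a numpy rewrite of the candidate-selection loop could look like, here is a sketch under the same conventions as the script above (class 0 is the background class, threshold 0.4). It is not the author's implementation, and `select_candidates` is a hypothetical name.

```python
import numpy as np


def select_candidates(class_logits, threshold=0.4):
    """Vectorized replacement for the per-anchor, per-class Python loops.

    class_logits: array of shape (num_anchors, num_classes) with raw scores.
    Returns (anchor_indices, class_ids, scores) for anchors above threshold.
    """
    scores = 1.0 / (1.0 + np.exp(-class_logits[:, 1:]))  # skip background class 0
    class_ids = scores.argmax(axis=1) + 1                 # +1 restores the class offset
    top_scores = scores.max(axis=1)
    keep = np.nonzero(top_scores > threshold)[0]
    return keep, class_ids[keep], top_scores[keep]


logits = np.array([[0.0, -3.0, 2.0],    # class 2 wins with a high score
                   [0.0, -3.0, -2.0]])  # no class reaches the threshold
idx, cls, sc = select_candidates(logits)
print(idx, cls, sc)
```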

@Frank For yolov3, I got 10fps using MIPI on NPU.
Is this normal or can it be increased?

@Akkisony This is normal. Using numpy and matplot can speed up the post-processing, so the Python side is slightly faster than the C++ side, because the code on the C++ side has not been optimized.

@Frank Oh. Now I understand why it was faster on the Python side compared to C++.
I haven't monitored fps on the Python side. I will do it this week! :slight_smile:
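
A simple way to monitor fps on the Python side is to time a batch of calls and divide. This sketch uses only the standard library; `measure_fps` and the dummy workload are hypothetical, and in practice the callable would wrap the inference call plus the post-processing for one frame.

```python
import time


def measure_fps(fn, iterations=50):
    """Call fn() repeatedly and return the average number of calls per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")


# Dummy workload standing in for one frame of inference + post-processing.
fps = measure_fps(lambda: sum(range(10000)))
print("%.1f fps" % fps)
```

Timing inference and post-processing separately in the same way shows which of the two limits the frame rate.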

@Akkisony If you are interested in the optimization of this part, look into how numpy is implemented, and you will understand why it can improve the running speed so much.
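
One concrete example of the difference: the inner loop of the NMS step compares one box against every remaining box. With numpy, that comparison can be written as whole-array arithmetic, which runs in compiled code instead of the Python interpreter. A minimal sketch, where `iou_one_to_many` is a hypothetical helper and boxes are (ymin, xmin, ymax, xmax) as in the script earlier in this thread:

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, without a Python loop."""
    ymin = np.maximum(box[0], boxes[:, 0])
    xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2])
    xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ymax - ymin, 0.0, None) * np.clip(xmax - xmin, 0.0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area + areas - inter
    # Guard against empty boxes, as the scalar version does.
    safe_union = np.where(union > 0.0, union, 1.0)
    return np.where(union > 0.0, inter / safe_union, 0.0)


ref = np.array([0.0, 0.0, 1.0, 1.0])
others = np.array([[0.0, 0.0, 1.0, 1.0],   # identical box
                   [0.0, 0.5, 1.0, 1.5],   # overlaps half of the reference box
                   [2.0, 2.0, 3.0, 3.0]])  # disjoint box
print(iou_one_to_many(ref, others))
```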

@Frank Thank you for the info. I will definitely look into this topic and try to learn more! :smiley:

@Frank @numbqq @alcohol Can you please let me know how I can measure the power consumption of the NPU? As NPUs are power-optimized, I would like to compare it with the Google Coral TPU.