Realtime Text Recognition with VIM4 and IMX415 MIPI Camera

Does that mean that when I train my model, I need to set the input shape to 960 wide and 544 high?

Hi Louis,

Would it be possible for me to send you our training set?

Hello @JietChoo ,

Regarding the recognition mistakes, the problem is that the model needs higher precision. The solution is to add the disable-per-channel parameter when converting the rec model.

--model-name ppocr_rec 
--model-type onnx 
--model ./ppocr_rec.onnx 
--inputs "x" 
--input-shapes  "3,48,320" 
--dtypes "float32" 
--quantize-dtype int16 
--outdir onnx_output 
--channel-mean-value "127.5,127.5,127.5,128" 
--source-file ocr_rec_dataset.txt 
--iterations 500 
--batch-size 1 
--kboard VIM4 
--inference-input-type "float32" 
--inference-output-type "float32" 
--disable-per-channel False
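
To illustrate why a wider quantization type can help with precision, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not the KSNN converter's actual algorithm; the function names and the sample values are made up for illustration.

```python
def quantize_roundtrip(values, bits):
    """Symmetric per-tensor quantization to signed `bits`-bit ints and back."""
    qmax = 2 ** (bits - 1) - 1          # 127 for int8, 32767 for int16
    scale = max(abs(v) for v in values) / qmax
    quantized = [round(v / scale) for v in values]
    return [q * scale for q in quantized]


def max_abs_error(values, bits):
    """Largest difference between the original and the restored values."""
    restored = quantize_roundtrip(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical activations: one large value forces a coarse int8 step size.
sample = [0.0013, -0.0402, 0.25, -1.7, 12.5]
err8 = max_abs_error(sample, 8)
err16 = max_abs_error(sample, 16)
print(f"int8 max error:  {err8:.6f}")
print(f"int16 max error: {err16:.6f}")
```

With the same value range, the int16 step is roughly 258 times finer than the int8 step, so small activations survive the round trip much better.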

You had better use training images of 960×544, or multiples thereof. The model can be more precise at this input size.
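
As a quick way to see how a given image size relates to the 960×544 detection input, here is a small sketch. `resize_factors` and `is_exact_multiple` are hypothetical helpers, not part of KSNN, and the 1920×1080 and 1920×1088 sizes are only examples.

```python
DET_W, DET_H = 960, 544  # detection model input (width, height) used in this thread


def resize_factors(src_w, src_h, dst_w=DET_W, dst_h=DET_H):
    """Return (x scale, y scale, aspect distortion) for a plain resize."""
    sx = dst_w / src_w
    sy = dst_h / src_h
    return sx, sy, sy / sx


def is_exact_multiple(src_w, src_h, dst_w=DET_W, dst_h=DET_H):
    """True when the source is the same integer multiple of the model input on both axes."""
    return (src_w % dst_w == 0 and src_h % dst_h == 0
            and src_w // dst_w == src_h // dst_h)


print(resize_factors(1920, 1080))
print(is_exact_multiple(1920, 1088))
print(is_exact_multiple(1920, 1080))
```

A 1920×1080 frame is not an exact multiple, so a plain resize to 960×544 stretches it vertically by about 0.7%; a 1920×1088 image scales down without any distortion.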

We have not trained the model. We use the official model directly, without training. We only have a set of images for quantization.

Hi Louis

I have tried this and got this error:

 |---+ KSNN Version: v1.4.1 +---| 
Start init neural network ...
adla usr space 1.4.0.2
adla usr space 1.4.0.2
Done.
[API:aml_v4l2src_connect:271]Enter, devname : /dev/media0
func_name: aml_src_get_cam_method
initialize func addr: 0x7f889f167c
finalize func addr: 0x7f889f1948
start func addr: 0x7f889f199c
stop func addr: 0x7f889f1a4c
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 1, type 0x20000, name isp-csiphy), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 4, type 0x20000, name isp-adapter), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 7, type 0x20000, name isp-test-pattern-gen), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 9, type 0x20000, name isp-core), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 20, type 0x20001, name imx415-0), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 24, type 0x10001, name isp-ddr-input), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 28, type 0x10001, name isp-param), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 32, type 0x10001, name isp-stats), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 36, type 0x10001, name isp-output0), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 40, type 0x10001, name isp-output1), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 44, type 0x10001, name isp-output2), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 48, type 0x10001, name isp-output3), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id 52, type 0x10001, name isp-raw), ret 0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:235:carm_src_is_usb]carm_src_is_usb:info(id -2147483596, type 0x0, name ), ret -1
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:239:carm_src_is_usb]carm_src_is_usb:error Invalid argument
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:79:cam_src_select_socket]select socket:/tmp/camctrl0.socket
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:103:cam_src_obtain_devname]fork ok, pid:7304
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:103:cam_src_obtain_devname]fork ok, pid:0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camsrc.c:107:cam_src_obtain_devname]execl /usr/bin/camctrl
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:925:main][camctrl.cc:main:925]

[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:889:parse_opt]media device name: /dev/media0
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:898:parse_opt]Server socket: /tmp/camctrl0.socket
Opening media device /dev/media0
Enumerating entities
Found 13 entities
Enumerating pads and links
mediaStreamInit[35]: mediaStreamInit ++. 

mediaStreamInit[39]: media devnode: /dev/media0
mediaStreamInit[56]: ent 0, name isp-csiphy 
mediaStreamInit[56]: ent 1, name isp-adapter 
mediaStreamInit[56]: ent 2, name isp-test-pattern-gen 
mediaStreamInit[56]: ent 3, name isp-core 
mediaStreamInit[56]: ent 4, name imx415-0 
mediaStreamInit[56]: ent 5, name isp-ddr-input 
mediaStreamInit[56]: ent 6, name isp-param 
mediaStreamInit[56]: ent 7, name isp-stats 
mediaStreamInit[56]: ent 8, name isp-output0 
mediaStreamInit[56]: ent 9, name isp-output1 
mediaStreamInit[56]: ent 10, name isp-output2 
mediaStreamInit[56]: ent 11, name isp-output3 
mediaStreamInit[56]: ent 12, name isp-raw 
mediaStreamInit[96]: get  lens_ent fail
mediaLog[30]: v4l2_video_open: open subdev device node /dev/video63 ok, fd 10 
 
mediaStreamInit[151]: mediaStreamInit open video0 fd 10 
mediaLog[30]: v4l2_video_open: open subdev device node /dev/video64 ok, fd 11 
 
mediaStreamInit[155]: mediaStreamInit open video1 fd 11 
mediaLog[30]: v4l2_video_open: open subdev device node /dev/video65 ok, fd 12 
 
mediaStreamInit[159]: mediaStreamInit open video2 fd 12 
mediaLog[30]: v4l2_video_open: open subdev device node /dev/video66 ok, fd 13 
 
mediaStreamInit[163]: mediaStreamInit open video3 fd 13 
mediaStreamInit[172]: media stream init success
fetchPipeMaxResolution[27]: find matched sensor configs 3840x2160
media_set_wdrMode[420]: media_set_wdrMode ++ wdr_mode : 0 

media_set_wdrMode[444]: media_set_wdrMode success --

media_set_wdrMode[420]: media_set_wdrMode ++ wdr_mode : 4 

media_set_wdrMode[444]: media_set_wdrMode success --

[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:374:link_and_activate_subdev]link and activate subdev successfully
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:407:media_stream_config]config media stream successfully
mediaLog[30]: v4l2_video_open: open subdev device node /dev/video62 ok, fd 18 
 
mediaLog[30]: VIDIOC_QUERYCAP: success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:172:check_capability]entity[isp-stats] -> video[/dev/video62], cap.driver:aml-camera, capabilities:0x85200001, device_caps:0x5200001
mediaLog[30]: v4l2_video_open: open subdev device node /dev/video61 ok, fd 19 
 
mediaLog[30]: VIDIOC_QUERYCAP: success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:172:check_capability]entity[isp-param] -> video[/dev/video61], cap.driver:aml-camera, capabilities:0x85200001, device_caps:0x5200001
mediaLog[30]: set format ok, ret 0.
 
mediaLog[30]: set format ok, ret 0.
 
mediaLog[30]:  request buf ok
 
mediaLog[30]:  request buf ok
 
mediaLog[30]: query buffer success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:546:isp_alg_param_init]isp stats query buffer, length: 262144, offset: 0
mediaLog[30]: query buffer success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:546:isp_alg_param_init]isp stats query buffer, length: 262144, offset: 262144
mediaLog[30]: query buffer success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:546:isp_alg_param_init]isp stats query buffer, length: 262144, offset: 524288
mediaLog[30]: query buffer success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:546:isp_alg_param_init]isp stats query buffer, length: 262144, offset: 786432
mediaLog[30]: query buffer success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:568:isp_alg_param_init]isp param query buffer, length: 262144, offset: 0
alg2User func addr: 0x7fb26c8ed8
alg2Kernel func addr: 0x7fb26c8f08
algEnable func addr: 0x7fb26c8d70
algDisable func addr: 0x7fb26c8e90
algFwInterface func addr: 0x7fb26c9008
matchLensConfig[43]: LKK: fail to match lensConfig

cmos_get_ae_default_imx415[65]: cmos_get_ae_default

cmos_get_ae_default_imx415[116]: cmos_get_ae_default++++++

cmos_get_ae_default_imx415[65]: cmos_get_ae_default

cmos_get_ae_default_imx415[116]: cmos_get_ae_default++++++

aisp_enable[984]: tuning device not exist!

aisp_enable[987]: 3a commit b56e430e80b995bb88cecff66a3a6fc17abda2c7 

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 16351232, 0, 0, 0

mediaLog[30]: streamon   success 
 
mediaLog[30]: streamon   success 
 
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:650:isp_alg_param_init]Finish initializing amlgorithm parameter ...
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:971:main]UNIX domain socket bound
[2025-02-14 07:04:11]  DEBUG  [amlv4l2src camctrl.cc:977:main]Accepting connections ...
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camsrc.c:122:cam_src_obtain_devname]udp_sock_create
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src common/common.c:70:udp_sock_create][99219362][/tmp/camctrl0.socket] start connect
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camsrc.c:124:cam_src_obtain_devname]udp_sock_recv
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camctrl.cc:985:main]connected_sockfd: 21
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camctrl.cc:989:main]video_dev_name: /dev/video63
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camsrc.c:282:cam_src_initialize]obtain devname: /dev/video63
devname : /dev/video63
driver : aml-camera
device : Amlogic Camera Card
bus_info : platform:aml-cam
version : 331657
error tvin-port use -1 
[API:aml_v4l2src_streamon:373]Enter
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camsrc.c:298:cam_src_start]start ...
[API:aml_v4l2src_streamon:376]Exit
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camctrl.cc:860:process_socket_thread]receive streamon notification
cmos_again_calc_table_imx415[125]: cmos_again_calc_table: 1836, 1836

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 11046912, 11046912, 11046912, 11046912

cmos_again_calc_table_imx415[125]: cmos_again_calc_table: 0, 0

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 14512128, 14512128, 14512128, 14512128

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 13778944, 13778944, 13778944, 13778944

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 13668352, 13668352, 13668352, 13668352

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 13524992, 13524992, 13524992, 13524992

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 13520896, 13520896, 13520896, 13520896

[ WARN:0@1.475] global ./modules/videoio/src/cap_gstreamer.cpp (1405) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 13516800, 13516800, 13516800, 13516800

cmos_inttime_calc_table_imx415[150]: cmos_inttime_calc_table: 13512704, 13512704, 13512704, 13512704

[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camctrl.cc:914:Signalhandler]enter camctrl Signalhandler: 15
[2025-02-14 07:04:12]  DEBUG  [amlv4l2src camctrl.cc:917:Signalhandler]exit camctrl Signalhandler: 15
Segmentation fault

My Code
ppocr-cap-960-544.py

import numpy as np
import os
import urllib.request
import argparse
import sys
import math
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv
import time
from postprocess import ocr_det_postprocess, ocr_rec_postprocess
from PIL import Image, ImageDraw, ImageFont

det_mean = [123.675, 116.28, 103.53]
det_var = [255 * 0.229, 255 * 0.224, 255 * 0.225]
rec_mean = 127.5
rec_var = 128

det_input_size = (544, 960) # (model height, model width)
rec_input_size = ( 48, 320) # (model height, model width)
# rec_output_size = (40, 6625)
rec_output_size = (40, 97)

font = ImageFont.truetype("./data/simfang.ttf", 20)

def draw(image, boxes):
    draw_img = Image.fromarray(image)
    draw = ImageDraw.Draw(draw_img)
    for box in boxes:
        x1, y1, x2, y2, score, text = box
        left = max(0, np.floor(x1 + 0.5).astype(int))
        top = max(0, np.floor(y1 + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x2 + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y2 + 0.5).astype(int))

        draw.rectangle((left, top, right, bottom), outline=(0, 255, 0), width=2)
        draw.text((left, top - 20), text, font=font, fill=(0, 255, 0))
    
    return draw_img, np.array(draw_img)


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument("--det_library", help="Path to C static library file for ppocr_det")
    parser.add_argument("--det_model", help="Path to nbg file for ppocr_det")
    parser.add_argument("--rec_library", help="Path to C static library file for ppocr_rec")
    parser.add_argument("--rec_model", help="Path to nbg file for ppocr_rec")
    parser.add_argument("--device", help="the number for video device")
    parser.add_argument("--level", help="Information printer level: 0/1/2")

    args = parser.parse_args()
    
    if args.det_model :
        if os.path.exists(args.det_model) == False:
            sys.exit('ppocr_det Model \'{}\' not exist'.format(args.det_model))
        det_model = args.det_model
    else :
        sys.exit("NBG file not found !!! Please use format: --det_model")
    if args.rec_model :
        if os.path.exists(args.rec_model) == False:
            sys.exit('ppocr_rec Model \'{}\' not exist'.format(args.rec_model))
        rec_model = args.rec_model
    else :
        sys.exit("NBG file not found !!! Please use format: --rec_model")
    
    if args.device :
        cap_num = args.device
    else :
        sys.exit("video device not found !!! Please use format :--device ")
    
    if args.det_library :
        if os.path.exists(args.det_library) == False:
            sys.exit('ppocr_det C static library \'{}\' not exist'.format(args.det_library))
        det_library = args.det_library
    else :
        sys.exit("ppocr_det C static library not found !!! Please use format: --det_library")
    if args.rec_library :
        if os.path.exists(args.rec_library) == False:
            sys.exit('ppocr_rec C static library \'{}\' not exist'.format(args.rec_library))
        rec_library = args.rec_library
    else :
        sys.exit("ppocr_rec C static library not found !!! Please use format: --rec_library")
    
    if args.level == '1' or args.level == '2' :
        level = int(args.level)
    else :
        level = 0

    ppocr_det = KSNN('VIM4')
    ppocr_rec = KSNN('VIM4')
    print(' |---+ KSNN Version: {} +---| '.format(ppocr_det.get_nn_version()))

    print('Start init neural network ...')
    ppocr_det.nn_init(library=det_library, model=det_model, level=level)
    ppocr_rec.nn_init(library=rec_library, model=rec_model, level=level)
    print('Done.')

    # usb camera
    # cap = cv.VideoCapture(int(cap_num))
    # mipi
    pipeline = "v4l2src device=/dev/media0 io-mode=dmabuf ! queue ! video/x-raw,format=YUY2,framerate=30/1 ! queue ! videoconvert ! appsink"
    cap = cv.VideoCapture(pipeline, cv.CAP_GSTREAMER)
    
    # cap.set(3,1920)
    # cap.set(4,1080)
    
    while(1):
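        ret, orig_img = cap.read()
        if not ret or orig_img is None:
            # Stop instead of passing an empty frame to cv.resize
            print('Failed to read a frame from the camera')
            break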
        
        start = time.time()
        det_img = cv.resize(orig_img, (det_input_size[1], det_input_size[0])).astype(np.float32)
        det_img[:, :, 0] = (det_img[:, :, 0] - det_mean[0]) / det_var[0]
        det_img[:, :, 1] = (det_img[:, :, 1] - det_mean[1]) / det_var[1]
        det_img[:, :, 2] = (det_img[:, :, 2] - det_mean[2]) / det_var[2]
        
        det_output = ppocr_det.nn_inference(det_img, input_shape=(det_input_size[0], det_input_size[1], 3), input_type="RAW", output_shape=[(det_input_size[0], det_input_size[1], 1)], output_type="FLOAT")
        
        det_results = ocr_det_postprocess(det_output[0], orig_img, det_input_size)
        
        final_results = []

        for i in range(len(det_results)):
            xmin, ymin, xmax, ymax, _, _ = det_results[i]
            rec_img = orig_img[ymin:ymax, xmin:xmax]
            
            new_height = rec_input_size[0]
            new_width = int(new_height / rec_img.shape[0] * rec_img.shape[1])
        
            if new_width > rec_input_size[1] * 1.2:
                # text too long. If you want to detect it, please convert rec model input longer.
                continue
            elif new_width < rec_input_size[1] * 1.2 and new_width > rec_input_size[1]:
                new_width = rec_input_size[1]        
            
            rec_img = cv.resize(rec_img, (new_width, new_height)).astype(np.float32)
            padding_img = np.zeros((rec_input_size[0], rec_input_size[1], 3)).astype(np.float32)
            padding_img[:, :new_width] = rec_img
        
            padding_img = (padding_img - rec_mean) / rec_var
        
            rec_output = ppocr_rec.nn_inference(padding_img, input_shape=(rec_input_size[0], rec_input_size[1], 3), input_type="RAW", output_shape=[(rec_output_size[0], rec_output_size[1])], output_type="FLOAT")
        
            det_results[i][5] = ocr_rec_postprocess(rec_output[0])
            final_results.append(det_results[i])

        if det_results is not None:
            pil_img, cv_img = draw(orig_img, final_results)
        
        cv_img = cv.resize(cv_img, (1280, 720))
        
        end = time.time()
        print('Done. inference time: ', end - start)

        cv.imshow("capture", cv_img)
        if cv.waitKey(1) & 0xFF == ord('q'):
           break
    
    ppocr_det.nn_destory_network()
    ppocr_rec.nn_destory_network()
    cap.release()
    cv.destroyAllWindows() 

Never mind, I’m able to run the code with the 960-544 script. However, I’m still detecting Chinese characters.

import numpy as np
import os
import urllib.request
import argparse
import sys
import math
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv
import time
from postprocess import ocr_det_postprocess, ocr_rec_postprocess
from PIL import Image, ImageDraw, ImageFont
from initialize_firebase_credentials import initializeFirebaseCredentials
from update_live_update_to_db import updateLiveUpdateToDb
from write_result_to_db import writeResultToDb
import color_detection

initializeFirebaseCredentials()

det_mean = [123.675, 116.28, 103.53]
det_var = [255 * 0.229, 255 * 0.224, 255 * 0.225]
rec_mean = 127.5
rec_var = 128

det_input_size = (544, 960) # (model height, model width)
rec_input_size = ( 48, 320) # (model height, model width)
# rec_output_size = (40, 6625)
rec_output_size = (40, 97)

font = ImageFont.truetype("./data/simfang.ttf", 20)
texts_data = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
	"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
	"1","2","3","4","5","6","7","8","9","0","+","-","×","÷"
]

def draw(image, boxes):
    draw_img = Image.fromarray(image)
    draw = ImageDraw.Draw(draw_img)
    for box in boxes:
        x1, y1, x2, y2, score, text = box
        left = max(0, np.floor(x1 + 0.5).astype(int))
        top = max(0, np.floor(y1 + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x2 + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y2 + 0.5).astype(int))
        
        color = (255,255,255)
        # NumPy images are indexed rows (y) first, then columns (x)
        alphabet_image = image[int(top):int(bottom), int(left):int(right)]
        color_result = "N/A"
        if np.sum(alphabet_image) != 0:
            result,result_mask,largest_pixel_count = color_detection.detect(alphabet_image)

            if not result_mask is None:
                color_result = result
                if result == "red":
                    color = (0,0,255)
                elif result == "yellow":
                    color = (0,255,255)
                elif result == "blue":
                    color = (255,0,0)
                elif result == "green":
                    color = (0,255,0)

        draw.rectangle((left, top, right, bottom), outline=color, width=2)
        draw.text((left, top - 20), f"{text}, {color_result}", font=font, fill=color)
    
    return draw_img, np.array(draw_img)


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument("--det_library", help="Path to C static library file for ppocr_det")
    parser.add_argument("--det_model", help="Path to nbg file for ppocr_det")
    parser.add_argument("--rec_library", help="Path to C static library file for ppocr_rec")
    parser.add_argument("--rec_model", help="Path to nbg file for ppocr_rec")
    parser.add_argument("--device", help="the number for video device")
    parser.add_argument("--level", help="Information printer level: 0/1/2")

    args = parser.parse_args()
    
    if args.det_model :
        if os.path.exists(args.det_model) == False:
            sys.exit('ppocr_det Model \'{}\' not exist'.format(args.det_model))
        det_model = args.det_model
    else :
        sys.exit("NBG file not found !!! Please use format: --det_model")
    if args.rec_model :
        if os.path.exists(args.rec_model) == False:
            sys.exit('ppocr_rec Model \'{}\' not exist'.format(args.rec_model))
        rec_model = args.rec_model
    else :
        sys.exit("NBG file not found !!! Please use format: --rec_model")
    
    if args.device :
        cap_num = args.device
    else :
        sys.exit("video device not found !!! Please use format :--device ")
    
    if args.det_library :
        if os.path.exists(args.det_library) == False:
            sys.exit('ppocr_det C static library \'{}\' not exist'.format(args.det_library))
        det_library = args.det_library
    else :
        sys.exit("ppocr_det C static library not found !!! Please use format: --det_library")
    if args.rec_library :
        if os.path.exists(args.rec_library) == False:
            sys.exit('ppocr_rec C static library \'{}\' not exist'.format(args.rec_library))
        rec_library = args.rec_library
    else :
        sys.exit("ppocr_rec C static library not found !!! Please use format: --rec_library")
    
    if args.level == '1' or args.level == '2' :
        level = int(args.level)
    else :
        level = 0

    ppocr_det = KSNN('VIM4')
    ppocr_rec = KSNN('VIM4')
    print(' |---+ KSNN Version: {} +---| '.format(ppocr_det.get_nn_version()))

    print('Start init neural network ...')
    ppocr_det.nn_init(library=det_library, model=det_model, level=level)
    ppocr_rec.nn_init(library=rec_library, model=rec_model, level=level)
    print('Done.')

    # usb camera
    # cap = cv.VideoCapture(int(cap_num))
    # mipi
    pipeline = "v4l2src device=/dev/media0 io-mode=dmabuf ! queue ! video/x-raw,format=YUY2,framerate=30/1 ! queue ! videoconvert ! appsink"
    cap = cv.VideoCapture(pipeline, cv.CAP_GSTREAMER)
    
    cap.set(3,1920)
    cap.set(4,1080)
    
    frame_counter = 0
    camera_id = "XzWz75mg6ZKB3S28QedR"
    
    while(1):
        if frame_counter % 100 == 0:
            updateLiveUpdateToDb(camera_id)
        frame_counter += 1
        
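        ret, orig_img = cap.read()
        if not ret or orig_img is None:
            # Stop instead of passing an empty frame to cv.resize
            print('Failed to read a frame from the camera')
            break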
        
        start = time.time()
        det_img = cv.resize(orig_img, (det_input_size[1], det_input_size[0])).astype(np.float32)
        det_img[:, :, 0] = (det_img[:, :, 0] - det_mean[0]) / det_var[0]
        det_img[:, :, 1] = (det_img[:, :, 1] - det_mean[1]) / det_var[1]
        det_img[:, :, 2] = (det_img[:, :, 2] - det_mean[2]) / det_var[2]
        
        det_output = ppocr_det.nn_inference(det_img, input_shape=(det_input_size[0], det_input_size[1], 3), input_type="RAW", output_shape=[(det_input_size[0], det_input_size[1], 1)], output_type="FLOAT")
        
        det_results = ocr_det_postprocess(det_output[0], orig_img, det_input_size)
        
        final_results = []

        for i in range(len(det_results)):
            xmin, ymin, xmax, ymax, _, _ = det_results[i]
            rec_img = orig_img[ymin:ymax, xmin:xmax]
            
            new_height = rec_input_size[0]
            new_width = int(new_height / rec_img.shape[0] * rec_img.shape[1])
        
            if new_width > rec_input_size[1] * 1.2:
                # text too long. If you want to detect it, please convert rec model input longer.
                continue
            elif new_width < rec_input_size[1] * 1.2 and new_width > rec_input_size[1]:
                new_width = rec_input_size[1]        
            
            rec_img = cv.resize(rec_img, (new_width, new_height)).astype(np.float32)
            padding_img = np.zeros((rec_input_size[0], rec_input_size[1], 3)).astype(np.float32)
            padding_img[:, :new_width] = rec_img
        
            padding_img = (padding_img - rec_mean) / rec_var
        
            rec_output = ppocr_rec.nn_inference(padding_img, input_shape=(rec_input_size[0], rec_input_size[1], 3), input_type="RAW", output_shape=[(rec_output_size[0], rec_output_size[1])], output_type="FLOAT")
        
            det_results[i][5] = ocr_rec_postprocess(rec_output[0])
            print('results')
            print(det_results[i])
            probability = det_results[i][4]
            text = det_results[i][5]
            print(f'probability: {probability}')
            print(f'text: {text}')
            print(f'text length: {len(text)}')
            final_results.append(det_results[i])
            # if len(text) == 1 and probability > 0.5 and text in texts_data:
            #     final_results.append(det_results[i])

        if det_results is not None:
            pil_img, cv_img = draw(orig_img, final_results)
        
        cv_img = cv.resize(cv_img, (1280, 720))
        
        end = time.time()
        print('Done. inference time: ', end - start)

        cv.imshow("capture", cv_img)
        if cv.waitKey(1) & 0xFF == ord('q'):
           break
    
    ppocr_det.nn_destory_network()
    ppocr_rec.nn_destory_network()
    cap.release()
    cv.destroyAllWindows() 

Hello @JietChoo ,

Have you modified the dictionary txt path in the postprocess file?

character_str = ["blank"]
-with open("./data/ppocr_keys_v1.txt", "rb") as fin:
+with open("./data/en_dict.txt", "rb") as fin:
    lines = fin.readlines()
    for line in lines:
        line = line.decode("utf-8").strip("\n").strip("\r\n")
        character_str.append(line)
character_str.append(" ")
ignored_token = [0]
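
The number of labels built this way has to match the class dimension of the recognition model's output, otherwise indices map to the wrong characters. Below is a minimal sketch of such a sanity check, assuming (as the snippet above suggests) that the list is `blank` + the dictionary lines + a trailing space. `load_charset` and `check_charset` are hypothetical helpers, not functions from the KSNN demo, and the 4-character dictionary is a throwaway example.

```python
import os
import tempfile


def load_charset(dict_path):
    """Build the label list like the snippet above: blank, dictionary lines, space."""
    charset = ["blank"]
    with open(dict_path, "rb") as fin:
        for raw in fin.read().splitlines():
            charset.append(raw.decode("utf-8"))
    charset.append(" ")
    return charset


def check_charset(charset, rec_output_size):
    """Raise if the number of labels differs from the model's class dimension."""
    _steps, num_classes = rec_output_size
    if len(charset) != num_classes:
        raise ValueError(
            f"dictionary gives {len(charset)} labels, "
            f"model outputs {num_classes} classes"
        )
    return True


# Demo with a throwaway 4-character dictionary (illustrative only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("+\n-\n×\n÷\n")
demo_charset = load_charset(tmp.name)
os.remove(tmp.name)
print(len(demo_charset))  # 4 dictionary lines + blank + space
```

Calling `check_charset(charset, rec_output_size)` once at startup would flag a dictionary that does not belong to the converted model before any frame is processed.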

Dear Louis,

We have trained a model from scratch that detects only the “+” symbol. When we run the det and rec models in PPOCR itself (on my Windows machine), it works fine. However, after I convert them to ONNX and then to KSNN, nothing is detected. Can you help me out?

I have sent you my det and rec PPOCR files.

Dear Louis

I really hope to hear from you soon, as this is super urgent on our side.

Dear Louis,

Any update on your side?

Hello @JietChoo ,

Sorry for the late reply. We converted your model and ran inference on VIM4, and everything works correctly.

Here are our parameters.

# det
--model-name ppocr_det 
--model-type onnx 
--model ./ppocr_det.onnx 
--inputs "x" 
--input-shapes  "3,544,960" 
--dtypes "float32" 
--quantize-dtype int8 
--outdir onnx_output 
--channel-mean-value "123.675,116.28,103.53,57.375" 
--source-file ocr_det_dataset.txt 
--iterations 500 
--batch-size 1 
--kboard VIM4 
--inference-input-type "float32" 
--inference-output-type "float32" 
# rec
--model-name ppocr_rec
--model-type onnx 
--model ./ppocr_rec.onnx 
--inputs "x" 
--input-shapes  "3,48,320" 
--dtypes "float32" 
--quantize-dtype int16 
--outdir onnx_output 
--channel-mean-value "127.5,127.5,127.5,128" 
--source-file ocr_rec_dataset.txt 
--iterations 500 
--batch-size 1 
--kboard VIM4 
--inference-input-type "float32" 
--inference-output-type "float32" 
--disable-per-channel False 
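
The `--channel-mean-value` strings appear to follow a `mean_1,mean_2,mean_3,scale` layout, i.e. `(pixel - mean) / scale`. That reading is an assumption, based on how the numbers line up with `det_mean`, `det_var`, `rec_mean` and `rec_var` in the scripts earlier in this thread. A small sketch comparing the two normalizations for one made-up pixel:

```python
def parse_channel_mean_value(text):
    """Split "m1,m2,m3,scale" into ([m1, m2, m3], scale). The layout is assumed."""
    *means, scale = [float(part) for part in text.split(",")]
    return means, scale


def normalize(pixel, means, scales):
    """Apply (value - mean) / scale per channel to one 3-channel pixel."""
    return [(v - m) / s for v, m, s in zip(pixel, means, scales)]


det_means, det_scale = parse_channel_mean_value("123.675,116.28,103.53,57.375")

# Per-channel divisors used by the demo script (det_var).
script_scales = [255 * 0.229, 255 * 0.224, 255 * 0.225]

pixel = [200.0, 120.0, 30.0]  # arbitrary example values
converter_style = normalize(pixel, det_means, [det_scale] * 3)
script_style = normalize(pixel, det_means, script_scales)
for a, b in zip(converter_style, script_style):
    print(f"{a:+.4f} vs {b:+.4f}")
```

The single scale 57.375 equals 255 × 0.225, so the third channel matches the script exactly, while the divisors for the other two channels differ by less than 2%.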

The number of classes in your rec model has changed. The model output has 6 classes, so remember to replace en_dict.txt with your own class txt and to set the output shape to rec_output_size = (40, 6).
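
For context on why the dictionary and the output shape have to agree: PaddleOCR recognition output is typically decoded CTC-style, taking the best class per time step, collapsing repeats and dropping the blank. The sketch below assumes that scheme and a made-up 6-entry label list. The real `ocr_rec_postprocess` is not shown in this thread and may differ.

```python
def ctc_greedy_decode(scores, labels, blank_index=0):
    """scores: one list of class scores per time step; labels: index -> character."""
    decoded = []
    previous = None
    for step in scores:
        best = max(range(len(step)), key=step.__getitem__)  # argmax for this step
        # Keep a character only if it is not blank and not a repeat of the last step.
        if best != blank_index and best != previous:
            decoded.append(labels[best])
        previous = best
    return "".join(decoded)


# Hypothetical 6-class label list: blank, four symbols, space.
labels = ["blank", "+", "-", "×", "÷", " "]


def one_hot(index, size=6):
    """Score vector with all weight on one class (stand-in for real model output)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# 40 time steps: mostly blank, with "+" held for two steps and "-" for one.
scores = [one_hot(0)] * 40
scores[10] = one_hot(1)
scores[11] = one_hot(1)
scores[20] = one_hot(2)
decoded_text = ctc_greedy_decode(scores, labels)
print(decoded_text)
```

If the label list has fewer or more entries than the model has classes, the argmax index either points at the wrong character or falls outside the list, which is one way wrong-language output can appear.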

Dear Louis,

Thank you for your reply. How do you determine which config to set, i.e. what to put in the ksnn_args? And for rec_output_size, where and how do you determine it?

This is because we are planning to train A-Z, a-z, 0-9, + - × ÷ ✓ X, so we are not sure whether the configs will change later on.

We are aware that 0 O o and X x × might look similar, so we will handle those on the application side.
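
One possible way to handle the look-alike characters on the application side is to fold each group into a single canonical symbol before comparing results. A minimal sketch follows; which characters to merge, and into what, is an assumption that would need adjusting for the real use case.

```python
# Each look-alike is folded to one canonical character (choices are illustrative).
CONFUSABLE_TO_CANONICAL = {
    "O": "0", "o": "0",   # letter O variants -> digit zero
    "x": "X", "×": "X",   # lowercase x and multiplication sign -> capital X
}


def canonicalize(text):
    """Replace every confusable character with its canonical form."""
    return "".join(CONFUSABLE_TO_CANONICAL.get(ch, ch) for ch in text)


def same_after_folding(a, b):
    """True when two recognition results differ only by confusable characters."""
    return canonicalize(a) == canonicalize(b)


print(canonicalize("o×O"))
print(same_after_folding("x", "×"))
```

Folding discards the distinction between the multiplication sign and the letter X, so if that difference matters, extra context (for example neighbouring characters) would be needed instead of a plain lookup table.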

Hello @JietChoo ,

Here is an easy method. Change the ONNX model to a fixed input size, then open the model in Netron to read the output size.

python3 -m paddle2onnx.optimize --input_model inference/det_onnx/model.onnx \
  --output_model inference/det_onnx/model.onnx \
  --input_shape_dict "{'x': [1,3,48,320]}"

Thank you, I will try it out.

By the way, rec_output_size = (40, 6) is set in the postprocess file, is it?

Hello @JietChoo ,

Here.

Hi Louis, we have tried converting with the ksnn_args you provided, but it still can’t detect anything. I’ll send the converted files over to you via email.

Hello @JietChoo ,

Could you provide your demo code? With your model, I get the correct inference result.

Hi,

Do you mean the demo for PaddleOCR that runs on my Windows machine, or the code for VIM4?

Below is main.py:

import cv2
from paddleocr import PaddleOCR,draw_ocr
import color_detection

# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
# You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`
# to switch the language model in order.
ocr = PaddleOCR(use_angle_cls=True,lang='en',rec_model_dir='./my_rec',det_model_dir='./my_det',
                rec_char_dict_path='./dict/en_dict_custom_symbol.txt'
                ) # need to run only once to download and load model into memory
cap = cv2.VideoCapture(0)
# cap.set(cv2.CAP_PROP_FRAME_WIDTH,2880)
# cap.set(cv2.CAP_PROP_FRAME_HEIGHT,1800)

if not cap.isOpened():
    exit()

frame_counter = 0

while True:
    ret,frame = cap.read()
    if ret:
        h, w, c = frame.shape  # NumPy image shape is (height, width, channels)
        if w > 0 and h > 0:
            result = ocr.ocr(frame)
            print(result)
            for bbox_text_prob_tuple_array in result:
                if bbox_text_prob_tuple_array is not None:
                    for bbox_text_prob_tuple in bbox_text_prob_tuple_array:
                        bbox = bbox_text_prob_tuple[0]
                        text = bbox_text_prob_tuple[1][0]
                        prob = bbox_text_prob_tuple[1][1]
                        print(f"Text: {text}")
                        print(f"Prob: {prob}")
                        if prob > 0.5:
                            top_left = bbox[0]
                            top_right = bbox[1]
                            bottom_right = bbox[2]
                            bottom_left = bbox[3]
                            if len(top_left) == 2 and len(top_right) == 2 and len(bottom_right) == 2 and len(bottom_left) == 2:
                                alphabet_image = frame[int(top_left[1]):int(bottom_left[1]),int(top_left[0]):int(top_right[0])]
                                if alphabet_image.shape[0] > 0 and alphabet_image.shape[1] > 0:
                                    result,result_mask,largest_pixel_count = color_detection.detect(alphabet_image)
                                    color = (255, 255, 255)
                                    if not result_mask is None:
                                        if result == "red":
                                            color = (0,0,255)
                                        elif result == "yellow":
                                            color = (0,255,255)
                                        elif result == "blue":
                                            color = (255,0,0)
                                        elif result == "green":
                                            color = (0,255,0)
                                        cv2.putText(frame,f"{text},{result}",(int(top_left[0]),int(top_left[1] - 10)),cv2.FONT_HERSHEY_SIMPLEX,1,color,2)
                                    else:
                                        color = (255, 255, 255)
                                        cv2.putText(frame,f"{text},None",(int(top_left[0]),int(top_left[1] - 10)),cv2.FONT_HERSHEY_SIMPLEX,1,color,2)
                                    cv2.rectangle(frame,tuple(map(int, top_left)), tuple(map(int, bottom_right)),color,2)
                                    # cv2.imshow("a",alphabet_image)
                                    # else:
                                    #     img = np.zeros((100, 100, 3), dtype=np.uint8)
                                    #     cv2.imshow("a",img)
                                    # print(f"Bbox: {bbox}")
                                    # print(f"Text: {text} Probability: {prob}")
                                    # if not result is None:
                                    #     writeResultToDb(camera_id,text,result)
                                        # f = open(f"{result}.txt", "w")
                                        # f.write(f"{text},{result}")
                                        # f.close()

            cv2.imshow("Cam",frame)

    if cv2.waitKey(1) == ord("q"):
        break

cv2.destroyAllWindows()

# draw result
# from PIL import Image
# result = result[0]
# image = Image.open(img_path).convert('RGB')
# boxes = [line[0] for line in result]
# txts = [line[1][0] for line in result]
# scores = [line[1][1] for line in result]
# im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
# im_show = Image.fromarray(im_show)
# im_show.save('result.jpg')

color_detection.py

# Python code for Multiple Color Detection


import numpy as np
import cv2
import color_utils

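# Capturing video through webcam
# webcam = cv2.VideoCapture(0)  # unused here; opening a camera at import time may hold a device the main script needs

# detect() classifies the dominant color of the frame it is given
def detect(image_frame):
    # Convert the frame from BGR (OpenCV's default channel order)
    # to the HSV (hue-saturation-value) color space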
    hsv_frame = cv2.cvtColor(image_frame, cv2.COLOR_BGR2HSV)

    # Set range for red color and
    # define mask
    red_lower = color_utils.lower_red
    red_upper = color_utils.upper_red
    red_mask = cv2.inRange(hsv_frame, red_lower, red_upper)

    # Set range for yellow color and
    # define mask
    yellow_lower = color_utils.lower_yellow
    yellow_upper = color_utils.upper_yellow
    yellow_mask = cv2.inRange(hsv_frame, yellow_lower, yellow_upper)

    # Set range for blue color and
    # define mask
    blue_lower = color_utils.lower_blue
    blue_upper = color_utils.upper_blue
    blue_mask = cv2.inRange(hsv_frame, blue_lower, blue_upper)

    # Set range for green color and
    # define mask
    green_lower = color_utils.lower_green
    green_upper = color_utils.upper_green
    green_mask = cv2.inRange(hsv_frame, green_lower, green_upper)

    # Count the pixels that fall inside each mask (inRange marks them as 255).
    # The green count must use green_mask itself, not green_mask indexed by red_mask.
    red_pixel_count = cv2.countNonZero(red_mask)
    yellow_pixel_count = cv2.countNonZero(yellow_mask)
    blue_pixel_count = cv2.countNonZero(blue_mask)
    green_pixel_count = cv2.countNonZero(green_mask)

    # shape is (rows, cols, channels), i.e. (height, width, channels)
    h, w, c = hsv_frame.shape
    total_pixel_count = w * h
    
    result = None
    result_mask = None
    largest_pixel_count = 0
    
    if red_pixel_count > largest_pixel_count:
        largest_pixel_count = red_pixel_count
        result_mask = red_mask
        result = "red"
    
    if yellow_pixel_count > largest_pixel_count:
        largest_pixel_count = yellow_pixel_count
        result_mask = yellow_mask
        result = "yellow"
        
    if blue_pixel_count > largest_pixel_count:
        largest_pixel_count = blue_pixel_count
        result_mask = blue_mask
        result = "blue"
        
    if green_pixel_count > largest_pixel_count:
        largest_pixel_count = green_pixel_count
        result_mask = green_mask
        result = "green"
        
    percentage_on_mask = largest_pixel_count / total_pixel_count
    print(f"Pre Result: {result}")
    print(f"Largest Pixel Count: {largest_pixel_count}")
    print(f"Total Pixel Count: {total_pixel_count}")
    print(f"Percentage on mask {percentage_on_mask}")
    if percentage_on_mask < 0.02:
        largest_pixel_count = 0
        result_mask = None
        result = None
    

    return [result,result_mask,largest_pixel_count]

color_utils.py

import numpy as np

lower_red = np.array([0,100,100], np.uint8)
upper_red = np.array([10,255,255], np.uint8)

lower_yellow = np.array([15, 0, 0], np.uint8)
upper_yellow = np.array([36, 255, 255], np.uint8)

lower_blue = np.array([100,150,0], np.uint8)
upper_blue = np.array([110,255,255], np.uint8)

lower_green = np.array([36,0,0], np.uint8)
upper_green = np.array([86,255,255], np.uint8)

The above is the code that I ran on Windows with an Nvidia GPU. I have sent the models to you via email.
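The color vote in color_detection.py can be checked off the board with a small NumPy-only sketch. It assumes an HSV patch in OpenCV's 8-bit convention and reuses the red and blue ranges from color_utils.py; `in_range` and `dominant_color` are hypothetical helper names for illustration, not part of the original code.

```python
import numpy as np

# Ranges copied from color_utils.py (OpenCV 8-bit HSV: H in 0..179)
RANGES = {
    "red": (np.array([0, 100, 100]), np.array([10, 255, 255])),
    "blue": (np.array([100, 150, 0]), np.array([110, 255, 255])),
}

def in_range(hsv, lower, upper):
    """NumPy stand-in for cv2.inRange: True where all three channels are within bounds."""
    return np.all((hsv >= lower) & (hsv <= upper), axis=-1)

def dominant_color(hsv, ranges=RANGES, min_fraction=0.02):
    """Return (color name or None, per-color pixel counts)."""
    total = hsv.shape[0] * hsv.shape[1]
    counts = {
        name: int(np.count_nonzero(in_range(hsv, lo, hi)))
        for name, (lo, hi) in ranges.items()
    }
    best = max(counts, key=counts.get)
    # Reject the winner if it covers less than min_fraction of the patch
    if total == 0 or counts[best] / total < min_fraction:
        return None, counts
    return best, counts

# 10x10 patch: 30 red pixels, 5 blue pixels, the rest outside both ranges
hsv = np.zeros((10, 10, 3), dtype=np.uint8)
hsv[:3, :] = (5, 200, 200)
hsv[3, :5] = (105, 200, 200)
result, counts = dominant_color(hsv)
print(result, counts)  # red {'red': 30, 'blue': 5}
```

Each color is counted from its own mask, so a patch with no red pixels can still be classified as green or blue.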

Hello @JietChoo ,

I mean the code you run on VIM4, because when I run inference with your nb model on VIM4 I get the correct result.

Alright! Below is my code.

ppocr-cap-960-544.py

import numpy as np
import os
import urllib.request
import argparse
import sys
import math
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv
import time
from postprocess import ocr_det_postprocess, ocr_rec_postprocess
from PIL import Image, ImageDraw, ImageFont
from write_result_to_db import writeResultToDb
import color_detection

det_mean = [123.675, 116.28, 103.53]
det_var = [255 * 0.229, 255 * 0.224, 255 * 0.225]
rec_mean = 127.5
rec_var = 128

det_input_size = (544, 960) # (model height, model width)
rec_input_size = ( 48, 320) # (model height, model width)
# rec_output_size = (40, 6625)
# rec_output_size = (40, 97)
rec_output_size = (40, 6)

font = ImageFont.truetype("./data/simfang.ttf", 20)
texts_data = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
	"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
	"1","2","3","4","5","6","7","8","9","0","+","-","×","÷"
]

def draw(image, boxes):
    draw_img = Image.fromarray(image)
    draw = ImageDraw.Draw(draw_img)
    for box in boxes:
        x1, y1, x2, y2, score, text = box
        left = max(0, np.floor(x1 + 0.5).astype(int))
        top = max(0, np.floor(y1 + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x2 + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y2 + 0.5).astype(int))
        
        color = (255,255,255)
        # NumPy images are indexed [row, col] = [y, x]
        alphabet_image = image[top:bottom, left:right]
        color_result = "N/A"
        if alphabet_image.size > 0:
            result,result_mask,largest_pixel_count = color_detection.detect(alphabet_image)

            if not result_mask is None:
                color_result = result
                if result == "red":
                    color = (0,0,255)
                elif result == "yellow":
                    color = (0,255,255)
                elif result == "blue":
                    color = (255,0,0)
                elif result == "green":
                    color = (0,255,0)

        draw.rectangle((left, top, right, bottom), outline=color, width=2)
        draw.text((left, top - 20), f"{text}, {color_result}", font=font, fill=color)
    
    return draw_img, np.array(draw_img)


if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument("--det_library", help="Path to C static library file for ppocr_det")
    parser.add_argument("--det_model", help="Path to nbg file for ppocr_det")
    parser.add_argument("--rec_library", help="Path to C static library file for ppocr_rec")
    parser.add_argument("--rec_model", help="Path to nbg file for ppocr_rec")
    parser.add_argument("--device", help="the number for video device")
    parser.add_argument("--level", help="Information printer level: 0/1/2")

    args = parser.parse_args()
    
    if args.det_model :
        if os.path.exists(args.det_model) == False:
            sys.exit('ppocr_det Model \'{}\' not exist'.format(args.det_model))
        det_model = args.det_model
    else :
        sys.exit("NBG file not found !!! Please use format: --det_model")
    if args.rec_model :
        if os.path.exists(args.rec_model) == False:
            sys.exit('ppocr_rec Model \'{}\' not exist'.format(args.rec_model))
        rec_model = args.rec_model
    else :
        sys.exit("NBG file not found !!! Please use format: --rec_model")
    
    if args.device :
        cap_num = args.device
    else :
        sys.exit("video device not found !!! Please use format :--device ")
    
    if args.det_library :
        if os.path.exists(args.det_library) == False:
            sys.exit('ppocr_det C static library \'{}\' not exist'.format(args.det_library))
        det_library = args.det_library
    else :
        sys.exit("ppocr_det C static library not found !!! Please use format: --det_library")
    if args.rec_library :
        if os.path.exists(args.rec_library) == False:
            sys.exit('ppocr_rec C static library \'{}\' not exist'.format(args.rec_library))
        rec_library = args.rec_library
    else :
        sys.exit("ppocr_rec C static library not found !!! Please use format: --rec_library")
    
    if args.level == '1' or args.level == '2' :
        level = int(args.level)
    else :
        level = 0

    ppocr_det = KSNN('VIM4')
    ppocr_rec = KSNN('VIM4')
    print(' |---+ KSNN Version: {} +---| '.format(ppocr_det.get_nn_version()))

    print('Start init neural network ...')
    ppocr_det.nn_init(library=det_library, model=det_model, level=level)
    ppocr_rec.nn_init(library=rec_library, model=rec_model, level=level)
    print('Done.')

    # usb camera
    # cap = cv.VideoCapture(int(cap_num))
    # mipi
    pipeline = "v4l2src device=/dev/media0 io-mode=dmabuf ! queue ! video/x-raw,format=YUY2,framerate=30/1 ! queue ! videoconvert ! appsink"
    cap = cv.VideoCapture(pipeline, cv.CAP_GSTREAMER)
    
    cap.set(3,1920)
    cap.set(4,1080)
    
    frame_counter = 0
    camera_id = "XzWz75mg6ZKB3S28QedR"
    
    while(1):
        frame_counter += 1
        
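        ret, orig_img = cap.read()
        if not ret or orig_img is None:
            # No frame from the camera; stop instead of crashing in cv.resize
            break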
        
        start = time.time()
        det_img = cv.resize(orig_img, (det_input_size[1], det_input_size[0])).astype(np.float32)
        det_img[:, :, 0] = (det_img[:, :, 0] - det_mean[0]) / det_var[0]
        det_img[:, :, 1] = (det_img[:, :, 1] - det_mean[1]) / det_var[1]
        det_img[:, :, 2] = (det_img[:, :, 2] - det_mean[2]) / det_var[2]
        
        det_output = ppocr_det.nn_inference(det_img, input_shape=(det_input_size[0], det_input_size[1], 3), input_type="RAW", output_shape=[(det_input_size[0], det_input_size[1], 1)], output_type="FLOAT")
        
        det_results = ocr_det_postprocess(det_output[0], orig_img, det_input_size)
        
        final_results = []

        for i in range(len(det_results)):
            xmin, ymin, xmax, ymax, _, _ = det_results[i]
            rec_img = orig_img[ymin:ymax, xmin:xmax]
            
            new_height = rec_input_size[0]
            new_width = int(new_height / rec_img.shape[0] * rec_img.shape[1])
        
            if new_width > rec_input_size[1] * 1.2:
                # Text too long. To recognize it, convert the rec model with a wider input.
                continue
            elif new_width > rec_input_size[1]:
                # Slightly too wide (up to 1.2x): squeeze to the model width
                new_width = rec_input_size[1]
            
            rec_img = cv.resize(rec_img, (new_width, new_height)).astype(np.float32)
            padding_img = np.zeros((rec_input_size[0], rec_input_size[1], 3)).astype(np.float32)
            padding_img[:, :new_width] = rec_img
        
            padding_img = (padding_img - rec_mean) / rec_var
        
            rec_output = ppocr_rec.nn_inference(padding_img, input_shape=(rec_input_size[0], rec_input_size[1], 3), input_type="RAW", output_shape=[(rec_output_size[0], rec_output_size[1])], output_type="FLOAT")
        
            det_results[i][5] = ocr_rec_postprocess(rec_output[0])
            print('results')
            print(det_results[i])
            probability = det_results[i][4]
            text = det_results[i][5]
            print(f'probability: {probability}')
            print(f'text: {text}')
            print(f'text length: {len(text)}')
            final_results.append(det_results[i])
            # if len(text) == 1 and probability > 0.5 and text in texts_data:
            #     final_results.append(det_results[i])

        if det_results is not None:
            pil_img, cv_img = draw(orig_img, final_results)
        
        cv_img = cv.resize(cv_img, (1280, 720))
        
        end = time.time()
        print('Done. inference time: ', end - start)

        cv.imshow("capture", cv_img)
        if cv.waitKey(1) & 0xFF == ord('q'):
           break
    
    ppocr_det.nn_destory_network()
    ppocr_rec.nn_destory_network()
    cap.release()
    cv.destroyAllWindows() 

color_detection.py and color_utils.py are the same as above.
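One point worth checking when cropping a detected box for color detection, as draw() does: NumPy images are indexed as [row, column], that is [y, x], so the vertical range comes first. A minimal sketch, where `crop_box` is a hypothetical helper and the frame size is only an example:

```python
import numpy as np

def crop_box(image, left, top, right, bottom):
    """Crop an axis-aligned box; rows are y, columns are x."""
    h, w = image.shape[:2]
    # Clamp the box to the image so the slice never runs out of bounds
    left, right = max(0, int(left)), min(w, int(right))
    top, bottom = max(0, int(top)), min(h, int(bottom))
    return image[top:bottom, left:right]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # (height, width, channels)
patch = crop_box(frame, left=100, top=50, right=400, bottom=120)
print(patch.shape)  # (70, 300, 3)
```

If the two index ranges are swapped, the crop comes from a different region of the frame (or is empty), and the color result no longer belongs to the recognized text.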

postprocess.py

import cv2
import numpy as np
from shapely.geometry import Polygon
import pyclipper

det_box_thresh = 0.2
min_size = 5
unclip_ratio = 1.5

character_str = ["blank"]
with open("./data/symbol.txt", "rb") as fin:
#with open("./data/en_dict.txt", "rb") as fin:
    lines = fin.readlines()
    for line in lines:
        line = line.decode("utf-8").strip("\n").strip("\r\n")
        character_str.append(line)
character_str.append(" ")
ignored_token = [0]


def ocr_det_postprocess(det_output, original_image, det_input_size):
	outs = cv2.findContours((det_output * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
	if len(outs) == 3:
		contours = outs[1]
	elif len(outs) == 2:
		contours = outs[0]
	
	det_results = []
	for i in range(len(contours)):
		bounding_box = cv2.boundingRect(contours[i])
		if bounding_box[2] < min_size or bounding_box[3] < min_size:
			continue
		
		mask = np.ones((bounding_box[3], bounding_box[2]), dtype=np.uint8)
		tmp_det_output = det_output.reshape(det_input_size[0], det_input_size[1])
		score = cv2.mean(tmp_det_output[bounding_box[1]:bounding_box[1] + bounding_box[3], bounding_box[0]:bounding_box[0] + bounding_box[2]], mask)[0]
		if score < det_box_thresh:
			continue
		
		box = np.array(((bounding_box[0], bounding_box[1]),
                        (bounding_box[0] + bounding_box[2], bounding_box[1]),
                        (bounding_box[0] + bounding_box[2], bounding_box[1] + bounding_box[3]),
                        (bounding_box[0], bounding_box[1] + bounding_box[3])))
        
		poly = Polygon(box)
		distance = poly.area * unclip_ratio / poly.length
		offset = pyclipper.PyclipperOffset()
		offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
		expanded = offset.Execute(distance)
		tmp_box = np.array(expanded)
        
		xmin = max(int(np.min(tmp_box[0, :, 0]) / det_input_size[1] * original_image.shape[1]), 0)
		ymin = max(int(np.min(tmp_box[0, :, 1]) / det_input_size[0] * original_image.shape[0]), 0)
		xmax = min(int(np.max(tmp_box[0, :, 0]) / det_input_size[1] * original_image.shape[1] + 1), original_image.shape[1])
		ymax = min(int(np.max(tmp_box[0, :, 1]) / det_input_size[0] * original_image.shape[0] + 1), original_image.shape[0])
        
		det_results.append([xmin, ymin, xmax, ymax, score, 0])
        
	return det_results

def ocr_rec_postprocess(rec_output):
    rec_idx = rec_output.argmax(axis=1)
    rec_prob = rec_output.max(axis=1)
    
    selection = np.ones(len(rec_idx), dtype=bool)
    selection[1:] = rec_idx[1:] != rec_idx[:-1]
    selection &= rec_idx != ignored_token
    #print(f'rec_idx: {rec_idx}')
    #print(f'selection: {selection}')
    #print(f'rec_output: {rec_output}')
    #print(f'character_str: {character_str}')
    
    
    char_list = [character_str[text_id] for text_id in rec_idx[selection]]
    character_result = "".join(char_list)
    
    return character_result


symbol.txt

+
-
×
÷
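
With this four-entry symbol.txt, postprocess.py builds character_str with 6 entries (blank, the four symbols, and a trailing space), which matches rec_output_size = (40, 6) in ppocr-cap-960-544.py. The decoding in ocr_rec_postprocess can be reproduced on a toy output as a sanity check. This is a sketch: `greedy_ctc_decode` is a hypothetical name, and the one-hot rec_output is made up for illustration.

```python
import numpy as np

# blank + the four symbols from symbol.txt + trailing space = 6 classes
character_str = ["blank", "+", "-", "×", "÷", " "]

def greedy_ctc_decode(rec_output, blank=0):
    """Pick the best class per time step, collapse repeats, drop blanks."""
    idx = rec_output.argmax(axis=1)
    keep = np.ones(len(idx), dtype=bool)
    keep[1:] = idx[1:] != idx[:-1]  # keep a step only if it differs from the previous one
    keep &= idx != blank            # remove the blank token
    return "".join(character_str[i] for i in idx[keep])

# Time steps: blank, +, +, blank, +, ÷, ÷
steps = [0, 1, 1, 0, 1, 4, 4]
rec_output = np.eye(len(character_str), dtype=np.float32)[steps]
print(greedy_ctc_decode(rec_output))  # ++÷
```

The blank between the second and third "+" is what lets two identical symbols in a row survive the repeat collapse.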