Realtime Text Recognition with VIM4 and IMX415 MIPI Camera

JietChoo · March 25, 2025, 7:03am

You mean test.py, right? I have already commented that line out:

    # usb camera
    # cap = cv.VideoCapture(int(0))
    # mipi
    pipeline = "v4l2src device=/dev/media0 io-mode=dmabuf ! queue ! video/x-raw,format=YUY2,framerate=30/1 ! queue ! videoconvert ! appsink"
    cap = cv.VideoCapture(pipeline, cv.CAP_GSTREAMER)
    print(cap.isOpened())

Louis-Cheng-Liu · March 25, 2025, 8:02am

Hello @JietChoo ,

No, I mean the line in color_detection.py, although I remember having commented that line out already.

Make sure the camera is available and the camera index is correct. Also check that the virtual environment has permission to access the camera.
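The availability and permission checks can be run from inside the virtual environment with a short script. This is only a minimal sketch: the helper name check_device is made up for illustration, and the device nodes /dev/media0 (the one used in the pipeline above) and /dev/video0 are assumptions that may differ on your board.

```python
import os


def check_device(path):
    """Report whether a device node exists and whether this process may open it."""
    return {
        "path": path,
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Candidate nodes are assumptions; adjust them to the nodes present on the board.
    for node in ("/dev/media0", "/dev/video0"):
        print(check_device(node))
```

If a node exists but is not readable or writable, the user running the script probably lacks permission on it.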

JietChoo · March 26, 2025, 5:48am

Hi Louis,

I have tried running it with

gst-launch-1.0 -v v4l2src device=/dev/media0 io-mode=mmap ! video/x-raw,format=NV12,width=3840,height=2160,framerate=30/1 ! fpsdisplaysink video-sink=waylandsink sync=false text-overlay=false

And it’s working fine
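Since the gst-launch-1.0 test succeeds with io-mode=mmap and NV12 caps, while the pipeline in test.py asks for io-mode=dmabuf and YUY2, it may be worth trying the working settings in the OpenCV pipeline string as well. The sketch below only assembles such a string; build_pipeline is a hypothetical helper, and whether the result opens on the board (it needs an OpenCV build with GStreamer support) is untested here.

```python
def build_pipeline(device="/dev/media0", io_mode="mmap", fmt="NV12",
                   width=None, height=None, fps=30):
    """Assemble a v4l2src -> videoconvert -> appsink pipeline string."""
    caps = ["video/x-raw", f"format={fmt}"]
    if width and height:
        caps += [f"width={width}", f"height={height}"]
    caps.append(f"framerate={fps}/1")
    return (f"v4l2src device={device} io-mode={io_mode} ! "
            f"{','.join(caps)} ! videoconvert ! appsink")


if __name__ == "__main__":
    # Same device, I/O mode, format and size as the gst-launch-1.0 command above.
    print(build_pipeline(width=3840, height=2160))
```

The returned string would be passed as cv.VideoCapture(pipeline, cv.CAP_GSTREAMER), exactly as in test.py.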

I have added a print statement in test.py

print(cv.getBuildInformation())

And got this result

General configuration for OpenCV 4.11.0 =====================================
  Version control:               4.11.0-dirty

  Platform:
    Timestamp:                   2025-01-16T09:56:27Z
    Host:                        Linux 6.8.0-51-generic aarch64
    CMake:                       3.31.1
    CMake generator:             Unix Makefiles
    CMake build tool:            /bin/gmake
    Configuration:               Release
    Algorithm Hint:              ALGO_HINT_ACCURATE

  CPU/HW features:
    Baseline:                    NEON FP16
      requested:                 DETECT
    Dispatched code generation:  NEON_DOTPROD NEON_FP16 NEON_BF16
      requested:                 NEON_FP16 NEON_BF16 NEON_DOTPROD
      NEON_DOTPROD (1 files):    + NEON_DOTPROD
      NEON_FP16 (2 files):       + NEON_FP16
      NEON_BF16 (0 files):       + NEON_BF16

  C/C++:
    Built as dynamic libs?:      NO
    C++ standard:                11
    C++ Compiler:                /opt/rh/devtoolset-10/root/usr/bin/c++  (ver 10.2.1)
    C++ flags (Release):         -Wl,-strip-all   -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -Wl,-strip-all   -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /opt/rh/devtoolset-10/root/usr/bin/cc
    C flags (Release):           -Wl,-strip-all   -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -Wl,-strip-all   -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -L/ffmpeg_build/lib  -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined  
    Linker flags (Debug):        -L/ffmpeg_build/lib  -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined  
    ccache:                      YES
    Precompiled headers:         NO
    Extra dependencies:          /lib64/libopenblas.so Qt5::Core Qt5::Gui Qt5::Widgets Qt5::Test Qt5::Concurrent /usr/local/lib/libpng.so /usr/lib64/libz.so dl m pthread rt
    3rdparty dependencies:       libprotobuf ade ittnotify libjpeg-turbo libwebp libtiff libopenjp2 IlmImf tegra_hal

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python3 stitching video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java python2 ts
    Applications:                -
    Documentation:               NO
    Non-free algorithms:         NO

  GUI:                           QT5
    QT:                          YES (ver 5.15.16 )
      QT OpenGL support:         NO
    GTK+:                        NO
    VTK support:                 NO

  Media I/O: 
    ZLib:                        /usr/lib64/libz.so (ver 1.2.7)
    JPEG:                        build-libjpeg-turbo (ver 3.0.3-70)
      SIMD Support Request:      YES
      SIMD Support:              YES
    WEBP:                        build (ver decoder: 0x0209, encoder: 0x020f, demux: 0x0107)
    AVIF:                        NO
    PNG:                         /usr/local/lib/libpng.so (ver 1.6.44)
    TIFF:                        build (ver 42 - 4.6.0)
    JPEG 2000:                   build (ver 2.5.0)
    OpenEXR:                     build (ver 2.3.0)
    GIF:                         NO
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES
      avcodec:                   YES (59.37.100)
      avformat:                  YES (59.27.100)
      avutil:                    YES (57.28.100)
      swscale:                   YES (6.7.100)
      avresample:                NO
    GStreamer:                   NO
    v4l/v4l2:                    YES (linux/videodev2.h)

  Parallel framework:            pthreads

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Lapack:                      YES (/lib64/libopenblas.so)
    Eigen:                       NO
    Custom HAL:                  YES (carotene (ver 0.0.1))
    Protobuf:                    build (3.19.1)
    Flatbuffers:                 builtin/3rdparty (23.5.9)

  OpenCL:                        YES (no extra features)
    Include path:                /io/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 /opt/python/cp39-cp39/bin/python3.9 (ver 3.9.20)
    Libraries:                   libpython3.9m.a (ver 3.9.20)
    Limited API:                 YES (ver 0x03060000)
    numpy:                       /home/ci/.local/lib/python3.9/site-packages/numpy/_core/include (ver 2.0.2)
    install path:                python/cv2/python-3

  Python (for build):            /opt/python/cp39-cp39/bin/python3.9

  Java:                          
    ant:                         NO
    Java:                        NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    /io/_skbuild/linux-aarch64-3.9/cmake-install
-----------------------------------------------------------------

I assume that, under Video I/O, GStreamer is off, which is why I cannot use

pipeline = "v4l2src device=/dev/media0 io-mode=dmabuf ! queue ! video/x-raw,format=YUY2,framerate=30/1 ! queue ! videoconvert ! appsink"
cap = cv.VideoCapture(pipeline, cv.CAP_GSTREAMER)

Is there any way I can enable it? Or should I just factory reset my VIM4?
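Rather than reading the whole build report by eye, the GStreamer line can be checked programmatically. This is a small sketch that parses the text returned by cv.getBuildInformation(); the function name gstreamer_enabled is made up for illustration, and the sample string is copied from the report above.

```python
import re


def gstreamer_enabled(build_info):
    """Return True if the 'GStreamer:' line of the build report starts with YES."""
    match = re.search(r"^\s*GStreamer:\s*(\S+)", build_info, flags=re.MULTILINE)
    return bool(match) and match.group(1).upper().startswith("YES")


# Excerpt of the report printed above; in a script, pass cv.getBuildInformation() instead.
sample = """
  Video I/O:
    FFMPEG:                      YES
    GStreamer:                   NO
    v4l/v4l2:                    YES (linux/videodev2.h)
"""

if __name__ == "__main__":
    print(gstreamer_enabled(sample))
```

When this returns False, cv.VideoCapture(pipeline, cv.CAP_GSTREAMER) cannot open the pipeline regardless of how the pipeline string is written.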

Louis-Cheng-Liu · March 26, 2025, 10:33am

Hello @JietChoo ,

I have reported the camera problem to our engineers.

Have you tried the picture demo? Does it still detect nothing?

Louis-Cheng-Liu · March 28, 2025, 3:27am

Hello @JietChoo ,

Your assumption is right. pip install installs a precompiled OpenCV package, and that package is built without GStreamer support. The OpenCV installed with sudo apt install is built with GStreamer support.

Here is the Python virtual environment package from my VIM4. In this virtual environment, OpenCV is built with GStreamer. Follow these commands and try again.

$ mkdir myenv
$ tar -xzf myenv.tar.gz -C myenv
$ source myenv/bin/activate
$ cd ppocr_test
$ python test.py

https://dl.khadas.com/.test/myenv.tar.gz

That is the result on another VIM4 that I tested.

JietChoo · April 21, 2025, 7:04am

Hi Louis, sorry for the late reply. The link https://dl.khadas.com/.test/myenv.tar.gz does not serve any tar.gz file; it returns Error 404.

Louis-Cheng-Liu · April 21, 2025, 9:03am

Hello @JietChoo ,

The link is available now: https://dl.khadas.com/.test/myenv.tar.gz

JietChoo · April 22, 2025, 10:30am

Now I am getting this error while running

python3 test.py

['/home/khadas/ppocr_test', '/home/khadas/myenv/lib/python3.12/site-packages/cv2/python-3.12', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/home/khadas/ksnn-vim4-mosen/examples/myenv/lib/python3.12/site-packages']
Traceback (most recent call last):
  File "/home/khadas/ppocr_test/test.py", line 7, in <module>
    from ksnn.api import KSNN
  File "/home/khadas/ksnn-vim4-mosen/examples/myenv/lib/python3.12/site-packages/ksnn/api.py", line 6, in <module>
    import cv2 as cv
  File "/home/khadas/ksnn-vim4-mosen/examples/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/khadas/ksnn-vim4-mosen/examples/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khadas/ksnn-vim4-mosen/examples/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/khadas/ksnn-vim4-mosen/examples/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 76, in bootstrap
    raise ImportError('ERROR: recursion is detected during loading of "cv2" binary extensions. Check OpenCV installation.')
ImportError: ERROR: recursion is detected during loading of "cv2" binary extensions. Check OpenCV installation.

I already performed

$ sudo apt update
$ sudo apt install python3-pip
$ pip3 install ksnn_vim4-1.4.1-py3-none-any.whl

inside the environment you sent me

Louis-Cheng-Liu · April 23, 2025, 1:59am

Hello @JietChoo ,

The virtual environment already has all of them installed. You do not need to install Python and ksnn again; doing so changes the OpenCV installation. Run test.py directly in the environment.

JietChoo · April 29, 2025, 12:17pm

Hi Louis,

The error is the same

['/home/khadas/ppocr_test', '/home/khadas/myenv/lib/python3.12/site-packages/cv2/python-3.12', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/home/khadas/ppocr_test/myenv/lib/python3.12/site-packages']
Traceback (most recent call last):
  File "/home/khadas/ppocr_test/test.py", line 7, in <module>
    from ksnn.api import KSNN
  File "/home/khadas/ppocr_test/myenv/lib/python3.12/site-packages/ksnn/api.py", line 6, in <module>
    import cv2 as cv
  File "/home/khadas/ppocr_test/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/khadas/ppocr_test/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khadas/ppocr_test/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/khadas/ppocr_test/myenv/lib/python3.12/site-packages/cv2/__init__.py", line 76, in bootstrap
    raise ImportError('ERROR: recursion is detected during loading of "cv2" binary extensions. Check OpenCV installation.')
ImportError: ERROR: recursion is detected during loading of "cv2" binary extensions. Check OpenCV installation.

It says something about the cv2 module not loading properly?

I have run my previous Python scripts, and they still run, but with the wrong detection.

However, when I use your environment and run test.py, I get the error above and no window is shown.

Louis-Cheng-Liu · April 30, 2025, 3:45am

Hello @JietChoo ,

Sorry, I reproduced the problem and found the cause: the Python virtual environment contains absolute paths, so it must be placed at the same path it was packaged from.

Delete every myenv directory on the board to avoid recursive calls, then extract myenv.tar.gz in /home/khadas, which is the path the virtual environment was in before packaging.
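The sys.path lists printed before each traceback already show the symptom: cv2 entries come from /home/khadas/myenv while ksnn and cv2/__init__.py are loaded from a second copy of myenv. A rough sketch for spotting this kind of mix is shown below; site_packages_roots is a hypothetical helper, and the sample list is copied from the first traceback.

```python
from pathlib import PurePosixPath


def site_packages_roots(paths):
    """Return the distinct site-packages directories that appear in a sys.path list."""
    roots = set()
    for entry in paths:
        parts = PurePosixPath(entry).parts
        if "site-packages" in parts:
            idx = parts.index("site-packages")
            roots.add(str(PurePosixPath(*parts[: idx + 1])))
    return sorted(roots)


# sys.path as printed in the first traceback above.
printed_sys_path = [
    "/home/khadas/ppocr_test",
    "/home/khadas/myenv/lib/python3.12/site-packages/cv2/python-3.12",
    "/usr/lib/python312.zip",
    "/usr/lib/python3.12",
    "/usr/lib/python3.12/lib-dynload",
    "/home/khadas/ksnn-vim4-mosen/examples/myenv/lib/python3.12/site-packages",
]

if __name__ == "__main__":
    for root in site_packages_roots(printed_sys_path):
        print(root)
```

More than one root in the output means packages are being resolved from two different environments; in a live session, pass sys.path instead of the sample list.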

JietChoo · May 7, 2025, 9:33am

Dear Louis, I am able to run the script in the env. However, I still cannot detect any +.

On your side, is the + sign detected easily?

Louis-Cheng-Liu · May 8, 2025, 2:23am

Hello @JietChoo ,

It is easily detected.

Video download link: https://dl.khadas.com/.test/output.mp4

JietChoo · May 9, 2025, 4:48am

Are you also using the eMMC Ubuntu 24.04 GNOME image with the Nov 2024 firmware?

If that is the case, we will factory reset our device, reinstall this firmware version, and run it again.

Louis-Cheng-Liu · May 9, 2025, 7:40am

Hello @JietChoo ,

I use this firmware.
https://dl.khadas.com/products/vim4/firmware/ubuntu/emmc/ubuntu-24.04/vim4-ubuntu-24.04-gnome-linux-5.15-fenix-1.7.4-250423-emmc.img.xz

You can try it.

JietChoo · May 22, 2025, 7:05am

Dear Louis,

Sorry for the late reply. We have factory reset our VIM4 device. However, it seems the firmware version https://dl.khadas.com/products/vim4/firmware/ubuntu/emmc/ubuntu-24.04/vim4-ubuntu-24.04-gnome-linux-5.15-fenix-1.7.4-250423-emmc.img.xz is not working for us. The whole device just blacks out.

So we went back to https://dl.khadas.com/products/vim4/firmware/ubuntu/emmc/ubuntu-24.04/vim4-ubuntu-24.04-gnome-linux-5.15-fenix-1.7.3-241129-emmc.img.xz

During this time, we went through the hassle of collecting our training dataset, cropping the images one by one for each of these characters:

"0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","+","-","÷","/"

We cropped 2000 images for each individual character, trained the PaddleOCR model on them, and produced our own inference models for both det and rec. We ran them on both Windows and Mac, and they work wonders! We were happy with the training results!

Now, the challenging part is converting both the det and rec models to ONNX, and then to adla.

For converting to ONNX, we used:

$ paddle2onnx --model_dir ./det_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ppocr_det.onnx
$ paddle2onnx --model_dir ./rec_infer --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file ppocr_rec.onnx

As for converting from ONNX to adla, the parameters are as follows:

det

--model-name ppocr_det 
--model-type onnx 
--model ./ppocr_det.onnx 
--inputs "x" 
--input-shapes  "3,544,960" 
--dtypes "float32" 
--quantize-dtype int8 
--outdir onnx_output 
--channel-mean-value "123.675,116.28,103.53,57.375" 
--source-file ocr_det_dataset.txt 
--iterations 500 
--batch-size 1 
--kboard VIM4 
--inference-input-type "float32" 
--inference-output-type "float32"

rec

--model-name ppocr_rec 
--model-type onnx 
--model ./ppocr_rec.onnx 
--inputs "x" 
--input-shapes  "3,48,320" 
--dtypes "float32" 
--quantize-dtype int16 
--outdir onnx_output 
--channel-mean-value "127.5,127.5,127.5,128" 
--source-file ocr_rec_dataset.txt 
--iterations 500 
--batch-size 1 
--kboard VIM4 
--inference-input-type "float32" 
--inference-output-type "float32" 
--disable-per-channel False

This produced both the adla and so files for the det and rec models, and I have moved them to the Khadas VIM4.
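As a sanity check on the det normalization, the four numbers in --channel-mean-value can be compared with det_mean and det_var from test.py. The sketch below assumes the fourth converter value is a single scale shared by all three channels; under that assumption it matches only the third channel's value (255 * 0.225), so the first two channels are scaled slightly differently. Whether that small difference affects quantization is not established here.

```python
import math

det_mean = [123.675, 116.28, 103.53]               # det_mean in test.py
det_std = [255 * 0.229, 255 * 0.224, 255 * 0.225]  # det_var in test.py
converter_values = [123.675, 116.28, 103.53, 57.375]  # --channel-mean-value for det


def normalize(pixel, mean, std):
    """Apply (value - mean) / std per channel."""
    return [(p - m) / s for p, m, s in zip(pixel, mean, std)]


pixel = [200.0, 150.0, 100.0]  # arbitrary example pixel, not taken from the thread
per_channel = normalize(pixel, det_mean, det_std)
single_scale = normalize(pixel, converter_values[:3], [converter_values[3]] * 3)

if __name__ == "__main__":
    for idx, (a, b) in enumerate(zip(per_channel, single_scale)):
        print(f"channel {idx}: per-channel std {a:.4f}, single scale {b:.4f}")
```

For rec the two agree by construction, since test.py uses (x - 127.5) / 128 and the converter is given "127.5,127.5,127.5,128".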

I have changed the code slightly in color_detection.py and color_utils.py to get better color recognition. For rec_output_size in test.py, I used (40, 69), following your earlier instructions: I optimized my rec ONNX model and specified the input_shape_dict

python3 -m paddle2onnx.optimize --input_model ppocr_rec.onnx \
  --output_model optimized_ppocr_rec.onnx \
  --input_shape_dict "{'x': [1,3,48,320]}"

Then I opened optimized_ppocr_rec.onnx in Netron and got this result
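The second dimension of rec_output_size should equal the number of entries in the character table that postprocess.py (listed further down) builds as ["blank"] + dictionary lines + [" "]. The sketch below counts this for the 66 characters listed earlier, which gives 68 rather than 69. The contents of ./data/en_dict_custom.txt are not shown in this thread, so this is a check worth running against the real file, not a confirmed mismatch; ctc_class_count is a hypothetical helper.

```python
import string


def ctc_class_count(dictionary, add_space=True):
    """Classes = one CTC blank + dictionary entries (+ one trailing space entry)."""
    return 1 + len(dictionary) + (1 if add_space else 0)


# The characters listed in the post: digits, lowercase, uppercase and four symbols.
charset = list(string.digits + string.ascii_lowercase + string.ascii_uppercase + "+-÷/")

if __name__ == "__main__":
    print(len(charset), ctc_class_count(charset))
```

To check the real dictionary, replace charset with the stripped lines read from the dictionary file and compare the result with the output width reported by Netron.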

Below is the updated Python code.

test.py

import numpy as np
import os
import urllib.request
import argparse
import sys
import math
from ksnn.api import KSNN
from ksnn.types import *
import cv2 as cv
import time
from postprocess import ocr_det_postprocess, ocr_rec_postprocess
from PIL import Image, ImageDraw, ImageFont
#from write_result_to_db import writeResultToDb
import color_detection

det_mean = [123.675, 116.28, 103.53]
det_var = [255 * 0.229, 255 * 0.224, 255 * 0.225]
rec_mean = 127.5
rec_var = 128

det_input_size = (544, 960) # (model height, model width)
rec_input_size = ( 48, 320) # (model height, model width)
# rec_output_size = (40, 6625)
# rec_output_size = (40, 97)
# rec_output_size = (40, 6)
rec_output_size = (40, 69)

font = ImageFont.truetype("./data/simfang.ttf", 100)
texts_data = ["0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
	"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","+","-","÷","/"
]

def draw(image, boxes):
    draw_img = Image.fromarray(image)
    draw = ImageDraw.Draw(draw_img)
    for box in boxes:
        x1, y1, x2, y2, score, text = box
        left = max(0, np.floor(x1 + 0.5).astype(int))
        top = max(0, np.floor(y1 + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x2 + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y2 + 0.5).astype(int))
        
        color = (0,0,0)
        # NumPy images are indexed [row (y), column (x)]: crop rows top:bottom and columns left:right
        alphabet_image = image[int(top):int(bottom), int(left):int(right)]
        color_result = "N/A"
        if np.sum(alphabet_image) != 0:
            result,result_mask,largest_pixel_count = color_detection.detect(alphabet_image)

            if result_mask is not None:
                color_result = result
                if result == "red":
                    color = (0,0,255)
                elif result == "yellow":
                    color = (0,255,255)
                elif result == "blue":
                    color = (255,0,0)
                elif result == "green":
                    color = (0,255,0)

        draw.rectangle((left, top, right, bottom), outline=color, width=10)
        draw.text((left, top - 20), f"{text}, {color_result}", font=font, fill=color)
    
    return draw_img, np.array(draw_img)


if __name__ == '__main__':
    
    ppocr_det = KSNN('VIM4')
    ppocr_rec = KSNN('VIM4')
    print(' |---+ KSNN Version: {} +---| '.format(ppocr_det.get_nn_version()))

    print('Start init neural network ...')
    ppocr_det.nn_init(library="./model/libnn_ppocr_det.so", model="./model/ppocr_det_int8.adla", level=0)
    ppocr_rec.nn_init(library="./model/libnn_ppocr_rec.so", model="./model/ppocr_rec_int16.adla", level=0)
    print('Done.')

    # usb camera
    # cap = cv.VideoCapture(int(0))
    # mipi
    pipeline = "v4l2src device=/dev/media0 io-mode=dmabuf ! queue ! video/x-raw,format=YUY2,framerate=30/1 ! queue ! videoconvert ! appsink"
    cap = cv.VideoCapture(pipeline, cv.CAP_GSTREAMER)
    print(cap.isOpened())
    
    cap.set(cv.CAP_PROP_FRAME_WIDTH, 1920)
    cap.set(cv.CAP_PROP_FRAME_HEIGHT, 1080)
    
    frame_counter = 0
    # camera_id = "XzWz75mg6ZKB3S28QedR"
    
    while(1):
        frame_counter += 1
        
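        ret,orig_img = cap.read()
        if not ret or orig_img is None:
            # No frame was delivered (capture not opened or stream ended); stop before cv.resize fails
            break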
        
        start = time.time()
        det_img = cv.resize(orig_img, (det_input_size[1], det_input_size[0])).astype(np.float32)
        det_img[:, :, 0] = (det_img[:, :, 0] - det_mean[0]) / det_var[0]
        det_img[:, :, 1] = (det_img[:, :, 1] - det_mean[1]) / det_var[1]
        det_img[:, :, 2] = (det_img[:, :, 2] - det_mean[2]) / det_var[2]
        
        det_output = ppocr_det.nn_inference(det_img, input_shape=(det_input_size[0], det_input_size[1], 3), input_type="RAW", output_shape=[(det_input_size[0], det_input_size[1], 1)], output_type="FLOAT")
        
        det_results = ocr_det_postprocess(det_output[0], orig_img, det_input_size)
        
        final_results = []

        for i in range(len(det_results)):
            xmin, ymin, xmax, ymax, _, _ = det_results[i]
            rec_img = orig_img[ymin:ymax, xmin:xmax]
            
            new_height = rec_input_size[0]
            new_width = int(new_height / rec_img.shape[0] * rec_img.shape[1])
        
            if new_width > rec_input_size[1] * 1.2:
                # text too long. If you want to detect it, please convert rec model input longer.
                continue
            elif new_width < rec_input_size[1] * 1.2 and new_width > rec_input_size[1]:
                new_width = rec_input_size[1]        
            
            rec_img = cv.resize(rec_img, (new_width, new_height)).astype(np.float32)
            padding_img = np.zeros((rec_input_size[0], rec_input_size[1], 3)).astype(np.float32)
            padding_img[:, :new_width] = rec_img
        
            padding_img = (padding_img - rec_mean) / rec_var
        
            rec_output = ppocr_rec.nn_inference(padding_img, input_shape=(rec_input_size[0], rec_input_size[1], 3), input_type="RAW", output_shape=[(rec_output_size[0], rec_output_size[1])], output_type="FLOAT")
        
            det_results[i][5] = ocr_rec_postprocess(rec_output[0])
            print('results')
            print(det_results[i])
            probability = det_results[i][4]
            print(f'det_results {det_results}')
            text = det_results[i][5]
            print(f'probability: {probability}')
            print(f'text: {text}')
            print(f'text length: {len(text)}')
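            # Keep only confident single-character results from the known character set
            # (an additional unconditional append would add each accepted box twice)
            if len(text) == 1 and probability > 0.5 and text in texts_data:
                final_results.append(det_results[i])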

        if det_results is not None:
            pil_img, cv_img = draw(orig_img, final_results)
        
        cv_img = cv.resize(cv_img, (1280, 720))
        
        end = time.time()
        print('Done. inference time: ', end - start)

        cv.imshow("capture", cv_img)
        if cv.waitKey(1) & 0xFF == ord('q'):
           break
    
    ppocr_det.nn_destory_network()
    ppocr_rec.nn_destory_network()
    cap.release()
    cv.destroyAllWindows()
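One detail in draw() that is easy to get wrong is the crop passed to color_detection.detect(). NumPy images are indexed [row, column], that is [y, x], so a box given as (left, top, right, bottom) is cropped as image[top:bottom, left:right]. A minimal sketch with a hypothetical helper crop_box and a blank frame of assumed size 1920x1080:

```python
import numpy as np


def crop_box(image, left, top, right, bottom):
    """Crop an (H, W, C) image to a box given in x/y pixel coordinates."""
    return image[int(top):int(bottom), int(left):int(right)]


frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # height 1080, width 1920
patch = crop_box(frame, left=100, top=50, right=400, bottom=250)
swapped = frame[100:400, 50:250]  # x and y ranges swapped by mistake

if __name__ == "__main__":
    print(patch.shape)    # (200, 300, 3): 200 rows high, 300 columns wide
    print(swapped.shape)  # (300, 200, 3): a different region of the frame
```

With the axes swapped, the color detector sees a region other than the detected character, which can produce wrong or missing color results.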

color_detection.py

# Python code for Multiple Color Detection


import numpy as np
import cv2
import color_utils

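# The camera is opened in test.py. Opening it here as well would grab a capture
# device every time this module is imported, so this line stays commented out.
# webcam = cv2.VideoCapture(0)


# Detect the dominant color (red, yellow, green or blue) in a BGR image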
def detect(image_frame):
    # Convert the image from BGR (OpenCV's default
    # channel order) to the
    # HSV (hue-saturation-value)
    # color space
    THRESHOLD = 0.1
    height, width, channel = image_frame.shape
    hsv_frame = cv2.cvtColor(image_frame, cv2.COLOR_BGR2HSV)
    b, g, r = cv2.split(image_frame)

    # Set range for red color and
    # define mask
    lower_red_low = color_utils.lower_red_low
    upper_red_low = color_utils.upper_red_low
    lower_red_high = color_utils.lower_red_high
    upper_red_high = color_utils.upper_red_high
    red_mask_low = cv2.inRange(hsv_frame, lower_red_low, upper_red_low)
    red_mask_high = cv2.inRange(hsv_frame, lower_red_high, upper_red_high)
    red_mask = cv2.bitwise_or(red_mask_low, red_mask_high)
    # red_mask = color_utils.is_red(r,g,b)

    # Set range for blue color and
    # define mask
    blue_lower = color_utils.lower_blue
    blue_upper = color_utils.upper_blue
    blue_mask = cv2.inRange(hsv_frame, blue_lower, blue_upper)
    # blue_mask = color_utils.is_blue(r,g,b)

    # Set range for green color and
    # define mask
    green_lower = color_utils.lower_green
    green_upper = color_utils.upper_green
    green_mask = cv2.inRange(hsv_frame, green_lower, green_upper)
    # green_mask = color_utils.is_green(r,g,b)

    # Set range for yellow color and
    # define mask
    yellow_lower = color_utils.lower_yellow
    yellow_upper = color_utils.upper_yellow
    yellow_mask = cv2.inRange(hsv_frame, yellow_lower, yellow_upper)
    # yellow_mask = color_utils.is_yellow(r,g,b)

    # red_pixels = np.where(red_mask[red_mask==255])
    # yellow_pixels = np.where(yellow_mask[yellow_mask == 255])
    # blue_pixels = np.where(blue_mask[blue_mask == 255])
    # green_pixels = np.where(green_mask[red_mask == 255])

    # w,h,c = hsv_frame.shape
    total_pixel_count = width * height
    # red_pixel_count = red_pixels[0].array().len()
    # yellow_pixel_count = yellow_pixels[0].array().len()
    # blue_pixel_count = blue_pixels[0].array().len()
    # green_pixel_count = green_pixels[0].array().len()
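    # cv2.inRange masks hold 0 or 255, so count the non-zero pixels;
    # np.sum would return 255 times the pixel count and inflate the percentages
    pixel_counts = {
        'red': cv2.countNonZero(red_mask),
        'yellow': cv2.countNonZero(yellow_mask),
        'green': cv2.countNonZero(green_mask),
        'blue': cv2.countNonZero(blue_mask)
    }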
    # red_pixel_count = np.sum(red_mask)
    # yellow_pixel_count = np.sum(yellow_mask)
    # green_pixel_count = np.sum(green_mask)
    # blue_pixel_count = np.sum(blue_mask)

    result = None
    result_mask = None
    largest_pixel_count = 0
    pixel_percentage = 0
    # red_percentage = pixel_counts['red'] / total_pixel_count
    # yellow_percentage = pixel_counts['yellow'] / total_pixel_count
    # green_percentage = pixel_counts['green'] / total_pixel_count
    # blue_percentage = pixel_counts['blue'] / total_pixel_count

    for color, count in pixel_counts.items():
        percentage = count / total_pixel_count if total_pixel_count > 0 else 0
        if percentage > THRESHOLD and count > largest_pixel_count:
            pixel_percentage = percentage
            largest_pixel_count = count
            result = color
            # Compare strings with ==; "is" tests object identity and is not reliable for string values
            if result == 'red':
                result_mask = red_mask
            elif result == 'blue':
                result_mask = blue_mask
            elif result == 'green':
                result_mask = green_mask
            elif result == 'yellow':
                result_mask = yellow_mask

    # if largest_pixel_count < pixel_counts['red'] :  # Adjust threshold as needed
    #     largest_pixel_count = pixel_counts['red']
    #     pixel_percentage = pixel_counts['red'] / total_pixel_count
    #     result_mask = red_mask
    #     result = "red"
    # if largest_pixel_count < pixel_counts['blue']:
    #     largest_pixel_count = pixel_counts['blue']
    #     pixel_percentage = pixel_counts['blue'] / total_pixel_count
    #     result_mask = blue_mask
    #     result = "blue"
    # if largest_pixel_count < pixel_counts['green']:
    #     largest_pixel_count = pixel_counts['green']
    #     pixel_percentage = pixel_counts['green'] / total_pixel_count
    #     result_mask = green_mask
    #     result = "green"
    # if largest_pixel_count < pixel_counts['yellow']:
    #     largest_pixel_count = pixel_counts['yellow']
    #     pixel_percentage = pixel_counts['yellow'] / total_pixel_count
    #     result_mask = yellow_mask
    #     result = "yellow"

    # if red_pixel_count > largest_pixel_count:
    #     largest_pixel_count = red_pixel_count
    #     result_mask = red_mask
    #     result = "red"
    #
    # if yellow_pixel_count > largest_pixel_count:
    #     largest_pixel_count = yellow_pixel_count
    #     result_mask = yellow_mask
    #     result = "yellow"
    #
    # if blue_pixel_count > largest_pixel_count:
    #     largest_pixel_count = blue_pixel_count
    #     result_mask = blue_mask
    #     result = "blue"
    #
    # if green_pixel_count > largest_pixel_count:
    #     largest_pixel_count = green_pixel_count
    #     result_mask = green_mask
    #     result = "green"
    # color_counts = {"red": 0, "yellow": 0, "green": 0, "blue": 0}
    # red_count = 0
    # yellow_count = 0
    # green_count = 0
    # blue_count = 0
    # percentage_on_mask = largest_pixel_count / total_pixel_count
    print(f"Pixel counts: {pixel_counts}")
    print(f"Pre Result: {result}")
    print(f"Pixel Percentage: {pixel_percentage}")
    print(f"Largest Pixel Count: {largest_pixel_count}")
    print(f"Total Pixel Count: {total_pixel_count}")
    # print(f"Percentage on mask {percentage_on_mask}")

    # for y in range(height):
    #     for x in range(width):
    #         # Get the RGB value of the current pixel
    #         b, g, r = image_frame[y, x]  # OpenCV reads as BGR by default
    #
    #         rgb_pixel = (r, g, b)
    #
    #         if color_utils.is_red(rgb_pixel):
    #             color_counts["red"] += 1
    #         elif color_utils.is_yellow(rgb_pixel):
    #             color_counts["yellow"] += 1
    #         elif color_utils.is_green(rgb_pixel):
    #             color_counts["green"] += 1
    #         elif color_utils.is_blue(rgb_pixel):
    #             color_counts["blue"] += 1

    # Determine the dominant color based on pixel counts
    # if total_pixel_count > 0:
    #     red_percentage = color_counts["red"] / total_pixel_count
    #     yellow_percentage = color_counts["yellow"] / total_pixel_count
    #     green_percentage = color_counts["green"] / total_pixel_count
    #     blue_percentage = color_counts["blue"] / total_pixel_count
    #
    #     if red_percentage > THRESHOLD:  # Adjust threshold as needed
    #         largest_pixel_count = color_counts["red"]
    #         result_mask = red_mask
    #         result = "red"
    #     elif yellow_percentage > THRESHOLD:
    #         largest_pixel_count = color_counts["yellow"]
    #         result_mask = yellow_mask
    #         result = "yellow"
    #     elif green_percentage > THRESHOLD:
    #         largest_pixel_count = color_counts["green"]
    #         result_mask = green_mask
    #         result = "green"
    #     elif blue_percentage > THRESHOLD:
    #         largest_pixel_count = color_counts["blue"]
    #         result_mask = blue_mask
    #         result = "blue"
    # print(f"Red Percentage: {red_percentage}")
    # print(f"Yellow Percentage: {yellow_percentage}")
    # print(f"Green Percentage {green_percentage}")
    # print(f"Blue Percentage {blue_percentage}")
    # print(color_counts)
    if pixel_percentage < THRESHOLD:
        largest_pixel_count = 0
        result_mask = None
        result = None

    return [result, result_mask, largest_pixel_count]
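A note on the percentage logic in detect(): cv2.inRange returns masks whose pixels are 0 or 255, so summing a mask gives 255 times the number of matching pixels, while counting the non-zero entries gives the pixel count that a fractional threshold such as THRESHOLD = 0.1 expects. The sketch below shows the difference on a tiny synthetic mask; mask_fraction is a hypothetical helper and uses NumPy only.

```python
import numpy as np


def mask_fraction(mask):
    """Fraction of pixels that are set in a 0/255 mask."""
    total = mask.size
    return np.count_nonzero(mask) / total if total else 0.0


mask = np.zeros((4, 4), dtype=np.uint8)
mask[0, :] = 255  # 4 of the 16 pixels match

if __name__ == "__main__":
    print(int(np.sum(mask)))       # 1020, i.e. 4 * 255
    print(np.count_nonzero(mask))  # 4
    print(mask_fraction(mask))     # 0.25
```

With the summed value, a threshold of 0.1 is already exceeded when roughly 0.04% of the pixels match, so nearly any crop would pass it.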

color_utils.py

import numpy as np

lower_red_low = np.array([0, 100, 100], np.uint8)
upper_red_low = np.array([5, 255, 255], np.uint8)
lower_red_high = np.array([160, 100, 100], np.uint8)
upper_red_high = np.array([179, 255, 255], np.uint8)

lower_blue = np.array([100, 70, 30], np.uint8)
upper_blue = np.array([130, 255, 255], np.uint8)

lower_green = np.array([50, 50, 30], np.uint8)
upper_green = np.array([90, 255, 255], np.uint8)

lower_yellow = np.array([10, 150, 80], np.uint8)
upper_yellow = np.array([40, 255, 255], np.uint8)
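To see which range a given color falls into without a camera, the bounds above can be tried on single RGB values. This sketch uses the standard-library colorsys module and rescales its output to OpenCV's 8-bit HSV convention (H in 0-179, S and V in 0-255); the rounding may differ by one unit from cv2.cvtColor, and the helper names are made up for illustration. Red needs two ranges because its hue wraps around 0/179.

```python
import colorsys

# (lower, upper) bounds in OpenCV HSV units, copied from color_utils.py
RANGES = {
    "red": [((0, 100, 100), (5, 255, 255)), ((160, 100, 100), (179, 255, 255))],
    "blue": [((100, 70, 30), (130, 255, 255))],
    "green": [((50, 50, 30), (90, 255, 255))],
    "yellow": [((10, 150, 80), (40, 255, 255))],
}


def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to approximately OpenCV-style HSV."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))


def classify(r, g, b):
    """Return the first color name whose range contains the pixel, else None."""
    hsv = rgb_to_opencv_hsv(r, g, b)
    for name, bounds in RANGES.items():
        for lower, upper in bounds:
            if all(lo <= x <= hi for lo, x, hi in zip(lower, hsv, upper)):
                return name
    return None


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (255, 255, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]:
        print(rgb, rgb_to_opencv_hsv(*rgb), classify(*rgb))
```

Sampling a few pixel values from real camera frames and running them through classify() is a quick way to tell whether the ranges or the lighting are responsible for a wrong color result.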

postprocess.py

import cv2
import numpy as np
from shapely.geometry import Polygon
import pyclipper

det_box_thresh = 0.2
min_size = 5
unclip_ratio = 1.5

character_str = ["blank"]
#with open("./data/symbol.txt", "rb") as fin:
with open("./data/en_dict_custom.txt", "rb") as fin:
    lines = fin.readlines()
    for line in lines:
        line = line.decode("utf-8").strip("\n").strip("\r\n")
        character_str.append(line)
character_str.append(" ")
ignored_token = [0]


def ocr_det_postprocess(det_output, original_image, det_input_size):
	outs = cv2.findContours((det_output * 255).astype(np.uint8), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
	if len(outs) == 3:
		contours = outs[1]
	elif len(outs) == 2:
		contours = outs[0]
	
	det_results = []
	for i in range(len(contours)):
		bounding_box = cv2.boundingRect(contours[i])
		if bounding_box[2] < min_size or bounding_box[3] < min_size:
			continue
		
		mask = np.ones((bounding_box[3], bounding_box[2]), dtype=np.uint8)
		tmp_det_output = det_output.reshape(det_input_size[0], det_input_size[1])
		score = cv2.mean(tmp_det_output[bounding_box[1]:bounding_box[1] + bounding_box[3], bounding_box[0]:bounding_box[0] + bounding_box[2]], mask)[0]
		if score < det_box_thresh:
			continue
		
		box = np.array(((bounding_box[0], bounding_box[1]),
                        (bounding_box[0] + bounding_box[2], bounding_box[1]),
                        (bounding_box[0] + bounding_box[2], bounding_box[1] + bounding_box[3]),
                        (bounding_box[0], bounding_box[1] + bounding_box[3])))
        
		poly = Polygon(box)
		distance = poly.area * unclip_ratio / poly.length
		offset = pyclipper.PyclipperOffset()
		offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
		expanded = offset.Execute(distance)
		tmp_box = np.array(expanded)
        
		xmin = max(int(np.min(tmp_box[0, :, 0]) / det_input_size[1] * original_image.shape[1]), 0)
		ymin = max(int(np.min(tmp_box[0, :, 1]) / det_input_size[0] * original_image.shape[0]), 0)
		xmax = min(int(np.max(tmp_box[0, :, 0]) / det_input_size[1] * original_image.shape[1] + 1), original_image.shape[1])
		ymax = min(int(np.max(tmp_box[0, :, 1]) / det_input_size[0] * original_image.shape[0] + 1), original_image.shape[0])
        
		det_results.append([xmin, ymin, xmax, ymax, score, 0])
        
	return det_results

def ocr_rec_postprocess(rec_output):
    rec_idx = rec_output.argmax(axis=1)
    rec_prob = rec_output.max(axis=1)
    
    selection = np.ones(len(rec_idx), dtype=bool)
    selection[1:] = rec_idx[1:] != rec_idx[:-1]
    selection &= rec_idx != ignored_token
    #print(f'rec_idx: {rec_idx}')
    #print(f'selection: {selection}')
    #print(f'rec_output: {rec_output}')
    #print(f'character_str: {character_str}')
    
    
    char_list = [character_str[text_id] for text_id in rec_idx[selection]]
    character_result = "".join(char_list)
    
    return character_result
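
For reference, here is a minimal standalone sketch of what ocr_rec_postprocess does (greedy CTC decoding: take the argmax per time step, collapse repeats, drop the blank). The three-letter dictionary and the `greedy_ctc_decode` and `one_hot` helpers are made up for illustration. The real code reads en_dict_custom.txt instead. It also makes one constraint explicit: the class dimension of rec_output has to equal the number of dict lines plus 2 (blank at index 0, space at the end).

```python
import numpy as np

# Hypothetical 3-line dictionary; the real one is en_dict_custom.txt.
dict_lines = ["a", "b", "c"]
alphabet = ["blank"] + dict_lines + [" "]  # same layout as character_str


def greedy_ctc_decode(rec_output, blank=0):
    """rec_output: array of shape (time_steps, num_classes)."""
    if rec_output.shape[1] != len(alphabet):
        raise ValueError(
            f"model emits {rec_output.shape[1]} classes, dict has {len(alphabet)}"
        )
    rec_idx = rec_output.argmax(axis=1)
    keep = np.ones(len(rec_idx), dtype=bool)
    keep[1:] = rec_idx[1:] != rec_idx[:-1]  # collapse consecutive repeats
    keep &= rec_idx != blank                # drop the blank token
    return "".join(alphabet[i] for i in rec_idx[keep])


def one_hot(indices):
    """Build a fake (time_steps, num_classes) score matrix for testing."""
    return np.eye(len(alphabet), dtype=np.float32)[indices]


healthy = one_hot([1, 1, 0, 1, 2, 2, 0, 3])
flat = np.zeros((8, len(alphabet)), dtype=np.float32)

print(repr(greedy_ctc_decode(healthy)))  # 'aabc'
print(repr(greedy_ctc_decode(flat)))     # '' because argmax of a flat row is index 0, the blank
```

So if the recognition output is flat (all zeros, or all classes equal) at every time step, the decoded string comes out empty even though the decoding code itself is fine. Printing rec_output.shape and rec_output.max() before decoding is a quick way to check for this.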

Finally, I downloaded your env and ran it:

$ mkdir myenv
$ tar -xzf myenv.tar.gz -C myenv
$ source myenv/bin/activate
$ cd ppocr_test
$ python test.py

Below are the screenshots

This is the dict I’m using to map indices to characters:
en_dict_custom.txt (196 Bytes)

It seems the detection is working, but the recognition is not. I’m not sure what went wrong. Is it a rec_output_size issue, or did something go wrong during the conversion?

I will email you our ppocr det and rec models, our converted ONNX model, and the converted so and adla models at louis.liu@wesion.com

Louis-Cheng-Liu · May 26, 2025, 9:16am

Hello @JietChoo ,

Bad news. For some layers, the weights are too small.

After quantization, those weights become zero, which makes the feature map become zero.
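
To illustrate how this can happen, here is a minimal sketch assuming symmetric per-tensor int8 quantization. That scheme is an assumption for illustration. The actual conversion tool may quantize differently. The `quantize_int8` helper and the synthetic weights are made up. With one large outlier setting the scale, every other weight is far below half a quantization step and rounds to zero.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (assumed scheme, for illustration)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


# Synthetic layer: 999 very small weights plus a single large outlier.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1e-4, size=1000).astype(np.float32)
weights[0] = 0.5

q, scale = quantize_int8(weights)
num_zero = int((q == 0).sum())

print(f"scale per int8 step: {float(scale):.6f}")
print(f"weights quantized to zero: {num_zero} of {q.size}")
```

Here the step size is about 0.004, so weights around 1e-4 cannot be represented and collapse to zero. Only the outlier survives.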

I suggest changing every self.weight_decay to zero in PaddleOCR/ppocr/optimizer/optimizer.py (I am not sure which optimizer your model uses), and then training a new model.

def __init__(
        self, learning_rate, momentum, weight_decay=None, grad_clip=None, **args
    ):
        super(Momentum, self).__init__()
        self.learning_rate = learning_rate
        self.momentum = momentum
-       self.weight_decay = weight_decay
+       self.weight_decay = 0
        self.grad_clip = grad_clip

If the quantized model still detects nothing, you will have to run inference with the ONNX model directly on the CPU. We will provide a demo for you.

Also, did you save a model at each epoch? You can check whether the weights of an earlier checkpoint are also very small. If they are not, you may be able to use that checkpoint.
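
One possible way to do that check, as a rough sketch: collect each layer's weights as NumPy arrays (for example from the ONNX file's initializers via onnx.numpy_helper.to_array) and flag layers whose largest absolute weight is below some cutoff. The `find_tiny_layers` helper, the layer names, and the 1e-4 threshold below are all hypothetical. What counts as "too small" depends on the quantization tool.

```python
import numpy as np


def find_tiny_layers(named_weights, threshold=1e-4):
    """Return (name, max_abs) for layers whose largest |weight| is below threshold.

    named_weights: dict mapping layer name -> array-like of weights.
    threshold: arbitrary cutoff chosen for illustration only.
    """
    flagged = []
    for name, w in named_weights.items():
        max_abs = float(np.abs(np.asarray(w, dtype=np.float64)).max())
        if max_abs < threshold:
            flagged.append((name, max_abs))
    return flagged


# Made-up example weights; replace with arrays loaded from a checkpoint.
example = {
    "conv1.weight": np.array([0.2, -0.1, 0.05]),
    "conv2.weight": np.array([1e-6, -2e-6, 5e-7]),
}
flagged = find_tiny_layers(example)
for name, max_abs in flagged:
    print(f"{name}: max |w| = {max_abs:.2e}")
```

Running this over several saved epochs would show whether the earlier checkpoints have the same problem.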

JietChoo · June 5, 2025, 5:41am

Alright, thank you. Will try it out.

JietChoo · July 1, 2025, 8:21am

Dear Louis,

Sorry for the late response. We have been busy with other things recently. I have tried your method, but the result is still the same. What else can we do?

Maybe we can try the method of inferring using the ONNX model directly on CPU?
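
For context, CPU inference with ONNX Runtime generally looks like the sketch below. This is not the official demo. It assumes the onnxruntime package is installed, and the `make_cpu_session` and `run_on_cpu` helpers, the file name, and the input shape are placeholders.

```python
import numpy as np


def make_cpu_session(model_path):
    """Create an ONNX Runtime session pinned to the CPU provider."""
    import onnxruntime as ort  # assumes `pip install onnxruntime`

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_on_cpu(session, batch):
    """Feed one preprocessed batch to the model's first input and return
    its first output as a NumPy array."""
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: batch.astype(np.float32)})
    return outputs[0]


# Usage (file name and input shape are placeholders, not the real values):
# session = make_cpu_session("rec.onnx")
# rec_output = run_on_cpu(session, np.zeros((1, 3, 48, 320), np.float32))
# print(rec_output.shape)
```

The batch still needs the same resizing and normalization as the NPU path, and the output may carry a leading batch dimension that has to be removed before it is passed to ocr_rec_postprocess.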

Louis-Cheng-Liu · July 1, 2025, 9:44am

Hello @JietChoo ,

I have reported this to the engineer. The ONNX model inference code will be provided to you this week.