Realtime Text Recognition with VIM4 and IMX415 MIPI Camera

JietChoo · January 22, 2025, 9:31am

Another example of a successful use case, using the PaddleOCR code:

py tools/infer_rec.py -c configs/rec/PP-OCRv4/en_PP-OCRv4_rec.yml -o Global.pretrained_model=pretrain_models/en_PP-OCRv4_rec_train/best_accuracy Global.infer_img=doc/imgs_en/K.png

K.png

Result

[2025/01/22 17:30:55] ppocr WARNING: Skipping import of the encryption module.
[2025/01/22 17:30:55] ppocr INFO: Architecture :
[2025/01/22 17:30:55] ppocr INFO:     Backbone :
[2025/01/22 17:30:55] ppocr INFO:         name : PPLCNetV3
[2025/01/22 17:30:55] ppocr INFO:         scale : 0.95
[2025/01/22 17:30:55] ppocr INFO:     Head :
[2025/01/22 17:30:55] ppocr INFO:         head_list :
[2025/01/22 17:30:55] ppocr INFO:             CTCHead :
[2025/01/22 17:30:55] ppocr INFO:                 Head :
[2025/01/22 17:30:55] ppocr INFO:                     fc_decay : 1e-05
[2025/01/22 17:30:55] ppocr INFO:                 Neck :
[2025/01/22 17:30:55] ppocr INFO:                     depth : 2
[2025/01/22 17:30:55] ppocr INFO:                     dims : 120
[2025/01/22 17:30:55] ppocr INFO:                     hidden_dims : 120
[2025/01/22 17:30:55] ppocr INFO:                     kernel_size : [1, 3]
[2025/01/22 17:30:55] ppocr INFO:                     name : svtr
[2025/01/22 17:30:55] ppocr INFO:                     use_guide : True
[2025/01/22 17:30:55] ppocr INFO:             NRTRHead :
[2025/01/22 17:30:55] ppocr INFO:                 max_text_length : 25
[2025/01/22 17:30:55] ppocr INFO:                 nrtr_dim : 384
[2025/01/22 17:30:55] ppocr INFO:         name : MultiHead
[2025/01/22 17:30:55] ppocr INFO:     Transform : None
[2025/01/22 17:30:55] ppocr INFO:     algorithm : SVTR_LCNet
[2025/01/22 17:30:55] ppocr INFO:     model_type : rec
[2025/01/22 17:30:55] ppocr INFO: Eval :
[2025/01/22 17:30:55] ppocr INFO:     dataset :
[2025/01/22 17:30:55] ppocr INFO:         data_dir : ./train_data/ic15_data/
[2025/01/22 17:30:55] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2025/01/22 17:30:55] ppocr INFO:         name : SimpleDataSet
[2025/01/22 17:30:55] ppocr INFO:         transforms :
[2025/01/22 17:30:55] ppocr INFO:             DecodeImage :
[2025/01/22 17:30:55] ppocr INFO:                 channel_first : False
[2025/01/22 17:30:55] ppocr INFO:                 img_mode : BGR
[2025/01/22 17:30:55] ppocr INFO:             MultiLabelEncode :
[2025/01/22 17:30:55] ppocr INFO:                 gtc_encode : NRTRLabelEncode
[2025/01/22 17:30:55] ppocr INFO:             RecResizeImg :
[2025/01/22 17:30:55] ppocr INFO:                 image_shape : [3, 48, 320]
[2025/01/22 17:30:55] ppocr INFO:             KeepKeys :
[2025/01/22 17:30:55] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2025/01/22 17:30:55] ppocr INFO:     loader :
[2025/01/22 17:30:55] ppocr INFO:         batch_size_per_card : 62
[2025/01/22 17:30:55] ppocr INFO:         drop_last : False
[2025/01/22 17:30:55] ppocr INFO:         num_workers : 4
[2025/01/22 17:30:55] ppocr INFO:         shuffle : False
[2025/01/22 17:30:55] ppocr INFO: Global :
[2025/01/22 17:30:55] ppocr INFO:     cal_metric_during_train : True
[2025/01/22 17:30:55] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2025/01/22 17:30:55] ppocr INFO:     checkpoints : None
[2025/01/22 17:30:55] ppocr INFO:     debug : False
[2025/01/22 17:30:55] ppocr INFO:     distributed : False
[2025/01/22 17:30:55] ppocr INFO:     epoch_num : 50
[2025/01/22 17:30:55] ppocr INFO:     eval_batch_step : [0, 2000]
[2025/01/22 17:30:55] ppocr INFO:     infer_img : doc/imgs_en/K.png
[2025/01/22 17:30:55] ppocr INFO:     infer_mode : False
[2025/01/22 17:30:55] ppocr INFO:     log_smooth_window : 20
[2025/01/22 17:30:55] ppocr INFO:     max_text_length : 25
[2025/01/22 17:30:55] ppocr INFO:     pretrained_model : pretrain_models/en_PP-OCRv4_rec_train/best_accuracy
[2025/01/22 17:30:55] ppocr INFO:     print_batch_step : 10
[2025/01/22 17:30:55] ppocr INFO:     save_epoch_step : 10
[2025/01/22 17:30:55] ppocr INFO:     save_inference_dir : None
[2025/01/22 17:30:55] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v4
[2025/01/22 17:30:55] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
[2025/01/22 17:30:55] ppocr INFO:     use_gpu : True
[2025/01/22 17:30:55] ppocr INFO:     use_space_char : True
[2025/01/22 17:30:55] ppocr INFO:     use_visualdl : False
[2025/01/22 17:30:55] ppocr INFO: Loss :
[2025/01/22 17:30:55] ppocr INFO:     loss_config_list :
[2025/01/22 17:30:55] ppocr INFO:         CTCLoss : None
[2025/01/22 17:30:55] ppocr INFO:         NRTRLoss : None
[2025/01/22 17:30:55] ppocr INFO:     name : MultiLoss
[2025/01/22 17:30:55] ppocr INFO: Metric :
[2025/01/22 17:30:55] ppocr INFO:     ignore_space : False
[2025/01/22 17:30:55] ppocr INFO:     main_indicator : acc
[2025/01/22 17:30:55] ppocr INFO:     name : RecMetric
[2025/01/22 17:30:55] ppocr INFO: Optimizer :
[2025/01/22 17:30:55] ppocr INFO:     beta1 : 0.9
[2025/01/22 17:30:55] ppocr INFO:     beta2 : 0.999
[2025/01/22 17:30:55] ppocr INFO:     lr :
[2025/01/22 17:30:55] ppocr INFO:         learning_rate : 0.0005
[2025/01/22 17:30:55] ppocr INFO:         name : Cosine
[2025/01/22 17:30:55] ppocr INFO:         warmup_epoch : 5
[2025/01/22 17:30:55] ppocr INFO:     name : Adam
[2025/01/22 17:30:55] ppocr INFO:     regularizer :
[2025/01/22 17:30:55] ppocr INFO:         factor : 3e-05
[2025/01/22 17:30:55] ppocr INFO:         name : L2
[2025/01/22 17:30:55] ppocr INFO: PostProcess :
[2025/01/22 17:30:55] ppocr INFO:     name : CTCLabelDecode
[2025/01/22 17:30:55] ppocr INFO: Train :
[2025/01/22 17:30:55] ppocr INFO:     dataset :
[2025/01/22 17:30:55] ppocr INFO:         data_dir : ./train_data/ic15_data/
[2025/01/22 17:30:55] ppocr INFO:         ds_width : False
[2025/01/22 17:30:55] ppocr INFO:         ext_op_transform_idx : 1
[2025/01/22 17:30:55] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2025/01/22 17:30:55] ppocr INFO:         name : MultiScaleDataSet
[2025/01/22 17:30:55] ppocr INFO:         transforms :
[2025/01/22 17:30:55] ppocr INFO:             DecodeImage :
[2025/01/22 17:30:55] ppocr INFO:                 channel_first : False
[2025/01/22 17:30:55] ppocr INFO:                 img_mode : BGR
[2025/01/22 17:30:55] ppocr INFO:             RecConAug :
[2025/01/22 17:30:55] ppocr INFO:                 ext_data_num : 2
[2025/01/22 17:30:55] ppocr INFO:                 image_shape : [48, 320, 3]
[2025/01/22 17:30:55] ppocr INFO:                 max_text_length : 25
[2025/01/22 17:30:55] ppocr INFO:                 prob : 0.5
[2025/01/22 17:30:55] ppocr INFO:             RecAug : None
[2025/01/22 17:30:55] ppocr INFO:             MultiLabelEncode :
[2025/01/22 17:30:55] ppocr INFO:                 gtc_encode : NRTRLabelEncode
[2025/01/22 17:30:55] ppocr INFO:             KeepKeys :
[2025/01/22 17:30:55] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2025/01/22 17:30:55] ppocr INFO:     loader :
[2025/01/22 17:30:55] ppocr INFO:         batch_size_per_card : 62
[2025/01/22 17:30:55] ppocr INFO:         drop_last : True
[2025/01/22 17:30:55] ppocr INFO:         num_workers : 8
[2025/01/22 17:30:55] ppocr INFO:         shuffle : True
[2025/01/22 17:30:55] ppocr INFO:     sampler :
[2025/01/22 17:30:55] ppocr INFO:         divided_factor : [8, 16]
[2025/01/22 17:30:55] ppocr INFO:         first_bs : 96
[2025/01/22 17:30:55] ppocr INFO:         fix_bs : False
[2025/01/22 17:30:55] ppocr INFO:         is_training : True
[2025/01/22 17:30:55] ppocr INFO:         name : MultiScaleSampler
[2025/01/22 17:30:55] ppocr INFO:         scales : [[320, 32], [320, 48], [320, 64]]
[2025/01/22 17:30:55] ppocr INFO: profiler_options : None
[2025/01/22 17:30:55] ppocr INFO: train with paddle 2.6.1 and device Place(gpu:0)
W0122 17:30:55.334977 53252 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.6, Runtime API Version: 11.7
W0122 17:30:55.342484 53252 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
[2025/01/22 17:31:02] ppocr INFO: load pretrain successful from pretrain_models/en_PP-OCRv4_rec_train/best_accuracy
[2025/01/22 17:31:02] ppocr INFO: infer_img: doc/imgs_en/K.png
[2025/01/22 17:31:05] ppocr INFO:        result: K      0.9910803437232971
[2025/01/22 17:31:05] ppocr INFO: success!

Louis-Cheng-Liu · January 22, 2025, 9:43am

Hello @JietChoo ,

Your model recognizes English characters, while the KSNN demo recognizes Chinese characters. Copy this file to the VIM4.

Then change the path to this file in postprocess.py:

character_str = ["blank"]
-with open("./data/ppocr_keys_v1.txt", "rb") as fin:
+with open("./data/en_dict.txt", "rb") as fin:
    lines = fin.readlines()
    for line in lines:
        line = line.decode("utf-8").strip("\n").strip("\r\n")
        character_str.append(line)
character_str.append(" ")
ignored_token = [0]
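
A minimal sketch of the same loading logic wrapped in a function, with a length check added. The names `load_character_list` and `check_class_count` are hypothetical and are not part of the KSNN demo; the check assumes that the last dimension of the recognition output equals the number of entries in the lookup list.

```python
def load_character_list(dict_path):
    """Build the lookup list like the snippet above: index 0 is the
    CTC blank, then one entry per dictionary line, then a space."""
    characters = ["blank"]
    with open(dict_path, "rb") as fin:
        for raw in fin.readlines():
            # Strip only line endings, so a line that holds a single
            # space survives as a real entry.
            characters.append(raw.decode("utf-8").strip("\r\n"))
    characters.append(" ")
    return characters


def check_class_count(characters, num_classes):
    """Fail early with a readable message instead of a late IndexError."""
    if len(characters) != num_classes:
        raise ValueError(
            f"character list has {len(characters)} entries, "
            f"but the model outputs {num_classes} classes"
        )
```

Calling `check_class_count(load_character_list("./data/en_dict.txt"), num_classes)` once at startup would show right away whether the dictionary and the model output agree.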

If the error persists, please provide your Paddle model and ADLA model, and we will try to reproduce the problem.

JietChoo · January 22, 2025, 9:49am

Hi, I’ve tried it and got this error:

Traceback (most recent call last):
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/ppocr-cap-mosen.py", line 173, in <module>
    det_results[i][5] = ocr_rec_postprocess(rec_output[0])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/postprocess.py", line 69, in ocr_rec_postprocess
    char_list = [character_str[text_id] for text_id in rec_idx[selection]]
                 ~~~~~~~~~~~~~^^^^^^^^^
IndexError: list index out of range

en_dict.txt

0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
@
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
\
]
^
_
`
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
{
|
}
~
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/

It seems I cannot attach the Paddle files and the ADLA file here.

Louis-Cheng-Liu · January 22, 2025, 9:53am

Hello @JietChoo ,

You can send me an email. My email: louis.liu@wesion.com

Louis-Cheng-Liu · January 22, 2025, 9:55am

Hello @JietChoo ,

Oh, I forgot to mention the output size.

Modify the output shape in ppocr-cap-960-544.py:

det_input_size = (544, 960) # (model height, model width)
rec_input_size = ( 48, 320) # (model height, model width)
-rec_output_size = (40, 6625)
+rec_output_size = (40, 97)
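
The second number in `rec_output_size` has to match the length of `character_str`, because each time step's arg-max is used as an index into that list. If the two disagree, a lookup such as `character_str[text_id]` can fail with the `IndexError` shown earlier. Below is a sketch of greedy CTC decoding in plain Python under that assumption; `greedy_ctc_decode` is a hypothetical helper, not the demo's `ocr_rec_postprocess`.

```python
def greedy_ctc_decode(flat_output, output_size, characters, blank=0):
    """Decode a flat score buffer with greedy CTC.

    flat_output: sequence of floats, length = steps * classes
    output_size: (steps, classes), e.g. (40, 97)
    characters:  lookup list whose entry at index `blank` is the blank token
    """
    steps, classes = output_size
    if len(flat_output) != steps * classes:
        raise ValueError("buffer length does not match output_size")
    if len(characters) != classes:
        raise ValueError("character list length does not match class count")

    text = []
    previous = blank
    for t in range(steps):
        row = flat_output[t * classes:(t + 1) * classes]
        best = max(range(classes), key=row.__getitem__)  # arg-max of this step
        # CTC rule: drop blanks and collapse consecutive repeats.
        if best != blank and best != previous:
            text.append(characters[best])
        previous = best
    return "".join(text)


# Toy example: 4 time steps, 3 classes (blank, "K", "O").
toy_scores = [
    0.1, 0.8, 0.1,    # K
    0.1, 0.8, 0.1,    # K again, collapsed
    0.9, 0.05, 0.05,  # blank
    0.1, 0.1, 0.8,    # O
]
toy_text = greedy_ctc_decode(toy_scores, (4, 3), ["blank", "K", "O"])
```

The two explicit checks are the point of the sketch: a mismatch between the model's class count and the dictionary is reported up front rather than surfacing as an out-of-range index in the middle of decoding.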

JietChoo · January 22, 2025, 10:04am

Hi, I have just sent you an email. However, I cannot send the Paddle files as they are too large. Is there any other way I can send them to you?

Louis-Cheng-Liu · January 22, 2025, 10:17am

Hello @JietChoo ,

You can use Google Drive and send me the download link.

JietChoo · January 22, 2025, 10:21am

Hi,

Here is the file on OneDrive; it’s the training model. Do you need the inference model as well?

ppocr.zip

JietChoo · January 22, 2025, 10:24am

I’ll just pass it to you anyway.

Here’s the inference model:

https://1drv.ms/u/c/7931b6f36f2554e7/EfwXbNA8B4tJpeilOG9tS7EBKiazaPHE9qxtUQL0tJoGYQ?e=DaPewV

Sorry, I sent the embedded link just now; this updated one is the correct one.

Louis-Cheng-Liu · January 22, 2025, 10:28am

Hello @JietChoo ,

Hmm, I cannot open either of the two links.

JietChoo · January 22, 2025, 10:38am

Hi Louis, I have shared it to your email. Please check whether you can open it.

JietChoo · January 23, 2025, 2:13am

Hi Louis, did you receive the files?

Louis-Cheng-Liu · January 23, 2025, 3:10am

Hello @JietChoo ,

The Paddle model link still fails to open, but judging from the ONNX and ADLA models, I think the problem is a model with low precision. A lower-precision model loses more precision after quantization.

Are these all of the training pictures? That is too few. And are the testing pictures the same as the training pictures? The testing pictures had better not be the same as the training ones; a separate test set helps you judge whether your model fits well or not.
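
As an illustration of keeping the two sets apart, here is a small sketch that splits a list of label lines into disjoint train and test lists. It assumes one sample per line (for example an image path and its text, as in files like `rec_gt_train.txt` and `rec_gt_test.txt`); the function name `split_label_lines` and the sample paths are made up for the example.

```python
import random


def split_label_lines(lines, test_fraction=0.2, seed=0):
    """Shuffle label lines reproducibly and split them into
    disjoint (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split every run
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical label lines: "<image path>\t<text>".
sample_lines = [f"crops/img_{i}.png\tK" for i in range(10)]
train_lines, test_lines = split_label_lines(sample_lines)
```

Each list can then be written to its own label file, so that evaluation only ever sees pictures the model was not trained on.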

JietChoo · January 23, 2025, 5:46am

Hi Louis,

Yes, the train set and the test set are the same. We train on top of the training model downloaded from PaddleOCR/docs/ppocr/model_list.md at main · PaddlePaddle/PaddleOCR · GitHub, namely the en_PP-OCRv4_rec training model. The original en_PP-OCRv4_rec model can already recognize words, but if we train on top of it with our own alphabet, shouldn’t it recognize our words more accurately?

JietChoo · January 23, 2025, 5:50am

I have tried sharing the model file link again

JietChoo · January 23, 2025, 5:58am

Or you can try this WeTransfer link.

Louis-Cheng-Liu · January 23, 2025, 7:59am

Hello @JietChoo ,

Sorry, I judged too hastily. The ONNX model performs much better than the int16 model, so the problem may occur during conversion. However, the Chinese Spring Festival is approaching and our engineer has already gone on holiday. This problem will be solved after the holiday (Feb 5).
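
The thread does not say how the two models were compared. One common way to put a number on such a gap, sketched here as an assumption rather than as the method actually used, is to feed the same image to both models and compare the raw recognition outputs. The helper names below are hypothetical, and both outputs are assumed to be flat lists of scores with the same (steps, classes) layout.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally long score vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def argmax_agreement(a, b, classes):
    """Fraction of time steps where both outputs pick the same class."""
    if not a or len(a) != len(b) or len(a) % classes != 0:
        raise ValueError("outputs must be non-empty, equal length, steps * classes")
    steps = len(a) // classes
    same = 0
    for t in range(steps):
        row_a = a[t * classes:(t + 1) * classes]
        row_b = b[t * classes:(t + 1) * classes]
        pick_a = max(range(classes), key=row_a.__getitem__)
        pick_b = max(range(classes), key=row_b.__getitem__)
        same += pick_a == pick_b
    return same / steps
```

A cosine similarity close to 1 together with a high arg-max agreement would suggest the conversion preserved the model; a clear drop in either points at the conversion or quantization step.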

JietChoo · January 23, 2025, 11:06am

Hi Louis,

Alright, noted!

We actually tried training with a whole new training set. We took scene photos containing all our alphabets and cropped them, trained a model on them, and evaluated it; it seems to work fine. This time the train set and the test set are different.

We have now converted it to ONNX and then to ADLA, and the result is still the same: it displays Chinese characters. So I assume the cause may be the ppocr_keys_v1.txt file used in postprocess.py.

So we changed postprocess.py to use en_dict.txt:

character_str = ["blank"]
-with open("./data/ppocr_keys_v1.txt", "rb") as fin:
+with open("./data/en_dict.txt", "rb") as fin:
    lines = fin.readlines()
    for line in lines:
        line = line.decode("utf-8").strip("\n").strip("\r\n")
        character_str.append(line)
character_str.append(" ")
ignored_token = [0]

But we got this error and the OpenCV window crashes:

Traceback (most recent call last):
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/ppocr-cap-mosen.py", line 173, in <module>
    det_results[i][5] = ocr_rec_postprocess(rec_output[0])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/postprocess.py", line 74, in ocr_rec_postprocess
    char_list = [character_str[text_id] for text_id in rec_idx[selection]]
                 ~~~~~~~~~~~~~^^^^^^^^^
IndexError: list index out of range

By the way, Happy Holidays Louis! We are also going to celebrate Chinese New Year next week here in Malaysia!

Happy New Year! Wishing you prosperity and the best of luck!

Louis-Cheng-Liu · January 24, 2025, 1:25am

Hello @JietChoo ,

Have you modified the output shape? I ran inference with your ADLA model and it gets the right result, but with low precision.

Louis-Cheng-Liu:

Oh, I forgot to mention the output size.

Modify the output shape in ppocr-cap-960-544.py:
det_input_size = (544, 960) # (model height, model width)
rec_input_size = ( 48, 320) # (model height, model width)
-rec_output_size = (40, 6625)
+rec_output_size = (40, 97)

JietChoo · January 27, 2025, 3:51am

Oh, I did not use the 960-544 file; maybe I’ll try it later.