Realtime Text Recognition with VIM4 and IMX415 MIPI Camera

Here is another successful example using the PaddleOCR code:

py tools/infer_rec.py -c configs/rec/PP-OCRv4/en_PP-OCRv4_rec.yml -o Global.pretrained_model=pretrain_models/en_PP-OCRv4_rec_train/best_accuracy Global.infer_img=doc/imgs_en/K.png

K.png

Result

[2025/01/22 17:30:55] ppocr WARNING: Skipping import of the encryption module.
[2025/01/22 17:30:55] ppocr INFO: Architecture :
[2025/01/22 17:30:55] ppocr INFO:     Backbone :
[2025/01/22 17:30:55] ppocr INFO:         name : PPLCNetV3
[2025/01/22 17:30:55] ppocr INFO:         scale : 0.95
[2025/01/22 17:30:55] ppocr INFO:     Head :
[2025/01/22 17:30:55] ppocr INFO:         head_list :
[2025/01/22 17:30:55] ppocr INFO:             CTCHead :
[2025/01/22 17:30:55] ppocr INFO:                 Head :
[2025/01/22 17:30:55] ppocr INFO:                     fc_decay : 1e-05
[2025/01/22 17:30:55] ppocr INFO:                 Neck :
[2025/01/22 17:30:55] ppocr INFO:                     depth : 2
[2025/01/22 17:30:55] ppocr INFO:                     dims : 120
[2025/01/22 17:30:55] ppocr INFO:                     hidden_dims : 120
[2025/01/22 17:30:55] ppocr INFO:                     kernel_size : [1, 3]
[2025/01/22 17:30:55] ppocr INFO:                     name : svtr
[2025/01/22 17:30:55] ppocr INFO:                     use_guide : True
[2025/01/22 17:30:55] ppocr INFO:             NRTRHead :
[2025/01/22 17:30:55] ppocr INFO:                 max_text_length : 25
[2025/01/22 17:30:55] ppocr INFO:                 nrtr_dim : 384
[2025/01/22 17:30:55] ppocr INFO:         name : MultiHead
[2025/01/22 17:30:55] ppocr INFO:     Transform : None
[2025/01/22 17:30:55] ppocr INFO:     algorithm : SVTR_LCNet
[2025/01/22 17:30:55] ppocr INFO:     model_type : rec
[2025/01/22 17:30:55] ppocr INFO: Eval :
[2025/01/22 17:30:55] ppocr INFO:     dataset :
[2025/01/22 17:30:55] ppocr INFO:         data_dir : ./train_data/ic15_data/
[2025/01/22 17:30:55] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_test.txt']
[2025/01/22 17:30:55] ppocr INFO:         name : SimpleDataSet
[2025/01/22 17:30:55] ppocr INFO:         transforms :
[2025/01/22 17:30:55] ppocr INFO:             DecodeImage :
[2025/01/22 17:30:55] ppocr INFO:                 channel_first : False
[2025/01/22 17:30:55] ppocr INFO:                 img_mode : BGR
[2025/01/22 17:30:55] ppocr INFO:             MultiLabelEncode :
[2025/01/22 17:30:55] ppocr INFO:                 gtc_encode : NRTRLabelEncode
[2025/01/22 17:30:55] ppocr INFO:             RecResizeImg :
[2025/01/22 17:30:55] ppocr INFO:                 image_shape : [3, 48, 320]
[2025/01/22 17:30:55] ppocr INFO:             KeepKeys :
[2025/01/22 17:30:55] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2025/01/22 17:30:55] ppocr INFO:     loader :
[2025/01/22 17:30:55] ppocr INFO:         batch_size_per_card : 62
[2025/01/22 17:30:55] ppocr INFO:         drop_last : False
[2025/01/22 17:30:55] ppocr INFO:         num_workers : 4
[2025/01/22 17:30:55] ppocr INFO:         shuffle : False
[2025/01/22 17:30:55] ppocr INFO: Global :
[2025/01/22 17:30:55] ppocr INFO:     cal_metric_during_train : True
[2025/01/22 17:30:55] ppocr INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2025/01/22 17:30:55] ppocr INFO:     checkpoints : None
[2025/01/22 17:30:55] ppocr INFO:     debug : False
[2025/01/22 17:30:55] ppocr INFO:     distributed : False
[2025/01/22 17:30:55] ppocr INFO:     epoch_num : 50
[2025/01/22 17:30:55] ppocr INFO:     eval_batch_step : [0, 2000]
[2025/01/22 17:30:55] ppocr INFO:     infer_img : doc/imgs_en/K.png
[2025/01/22 17:30:55] ppocr INFO:     infer_mode : False
[2025/01/22 17:30:55] ppocr INFO:     log_smooth_window : 20
[2025/01/22 17:30:55] ppocr INFO:     max_text_length : 25
[2025/01/22 17:30:55] ppocr INFO:     pretrained_model : pretrain_models/en_PP-OCRv4_rec_train/best_accuracy
[2025/01/22 17:30:55] ppocr INFO:     print_batch_step : 10
[2025/01/22 17:30:55] ppocr INFO:     save_epoch_step : 10
[2025/01/22 17:30:55] ppocr INFO:     save_inference_dir : None
[2025/01/22 17:30:55] ppocr INFO:     save_model_dir : ./output/rec_ppocr_v4
[2025/01/22 17:30:55] ppocr INFO:     save_res_path : ./output/rec/predicts_ppocrv3.txt
[2025/01/22 17:30:55] ppocr INFO:     use_gpu : True
[2025/01/22 17:30:55] ppocr INFO:     use_space_char : True
[2025/01/22 17:30:55] ppocr INFO:     use_visualdl : False
[2025/01/22 17:30:55] ppocr INFO: Loss :
[2025/01/22 17:30:55] ppocr INFO:     loss_config_list :
[2025/01/22 17:30:55] ppocr INFO:         CTCLoss : None
[2025/01/22 17:30:55] ppocr INFO:         NRTRLoss : None
[2025/01/22 17:30:55] ppocr INFO:     name : MultiLoss
[2025/01/22 17:30:55] ppocr INFO: Metric :
[2025/01/22 17:30:55] ppocr INFO:     ignore_space : False
[2025/01/22 17:30:55] ppocr INFO:     main_indicator : acc
[2025/01/22 17:30:55] ppocr INFO:     name : RecMetric
[2025/01/22 17:30:55] ppocr INFO: Optimizer :
[2025/01/22 17:30:55] ppocr INFO:     beta1 : 0.9
[2025/01/22 17:30:55] ppocr INFO:     beta2 : 0.999
[2025/01/22 17:30:55] ppocr INFO:     lr :
[2025/01/22 17:30:55] ppocr INFO:         learning_rate : 0.0005
[2025/01/22 17:30:55] ppocr INFO:         name : Cosine
[2025/01/22 17:30:55] ppocr INFO:         warmup_epoch : 5
[2025/01/22 17:30:55] ppocr INFO:     name : Adam
[2025/01/22 17:30:55] ppocr INFO:     regularizer :
[2025/01/22 17:30:55] ppocr INFO:         factor : 3e-05
[2025/01/22 17:30:55] ppocr INFO:         name : L2
[2025/01/22 17:30:55] ppocr INFO: PostProcess :
[2025/01/22 17:30:55] ppocr INFO:     name : CTCLabelDecode
[2025/01/22 17:30:55] ppocr INFO: Train :
[2025/01/22 17:30:55] ppocr INFO:     dataset :
[2025/01/22 17:30:55] ppocr INFO:         data_dir : ./train_data/ic15_data/
[2025/01/22 17:30:55] ppocr INFO:         ds_width : False
[2025/01/22 17:30:55] ppocr INFO:         ext_op_transform_idx : 1
[2025/01/22 17:30:55] ppocr INFO:         label_file_list : ['./train_data/ic15_data/rec_gt_train.txt']
[2025/01/22 17:30:55] ppocr INFO:         name : MultiScaleDataSet
[2025/01/22 17:30:55] ppocr INFO:         transforms :
[2025/01/22 17:30:55] ppocr INFO:             DecodeImage :
[2025/01/22 17:30:55] ppocr INFO:                 channel_first : False
[2025/01/22 17:30:55] ppocr INFO:                 img_mode : BGR
[2025/01/22 17:30:55] ppocr INFO:             RecConAug :
[2025/01/22 17:30:55] ppocr INFO:                 ext_data_num : 2
[2025/01/22 17:30:55] ppocr INFO:                 image_shape : [48, 320, 3]
[2025/01/22 17:30:55] ppocr INFO:                 max_text_length : 25
[2025/01/22 17:30:55] ppocr INFO:                 prob : 0.5
[2025/01/22 17:30:55] ppocr INFO:             RecAug : None
[2025/01/22 17:30:55] ppocr INFO:             MultiLabelEncode :
[2025/01/22 17:30:55] ppocr INFO:                 gtc_encode : NRTRLabelEncode
[2025/01/22 17:30:55] ppocr INFO:             KeepKeys :
[2025/01/22 17:30:55] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2025/01/22 17:30:55] ppocr INFO:     loader :
[2025/01/22 17:30:55] ppocr INFO:         batch_size_per_card : 62
[2025/01/22 17:30:55] ppocr INFO:         drop_last : True
[2025/01/22 17:30:55] ppocr INFO:         num_workers : 8
[2025/01/22 17:30:55] ppocr INFO:         shuffle : True
[2025/01/22 17:30:55] ppocr INFO:     sampler :
[2025/01/22 17:30:55] ppocr INFO:         divided_factor : [8, 16]
[2025/01/22 17:30:55] ppocr INFO:         first_bs : 96
[2025/01/22 17:30:55] ppocr INFO:         fix_bs : False
[2025/01/22 17:30:55] ppocr INFO:         is_training : True
[2025/01/22 17:30:55] ppocr INFO:         name : MultiScaleSampler
[2025/01/22 17:30:55] ppocr INFO:         scales : [[320, 32], [320, 48], [320, 64]]
[2025/01/22 17:30:55] ppocr INFO: profiler_options : None
[2025/01/22 17:30:55] ppocr INFO: train with paddle 2.6.1 and device Place(gpu:0)
W0122 17:30:55.334977 53252 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.6, Runtime API Version: 11.7
W0122 17:30:55.342484 53252 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
[2025/01/22 17:31:02] ppocr INFO: load pretrain successful from pretrain_models/en_PP-OCRv4_rec_train/best_accuracy
[2025/01/22 17:31:02] ppocr INFO: infer_img: doc/imgs_en/K.png
[2025/01/22 17:31:05] ppocr INFO:        result: K      0.9910803437232971
[2025/01/22 17:31:05] ppocr INFO: success!

Hello @JietChoo ,

Your model recognizes English characters, while the KSNN demo recognizes Chinese characters. Copy this file to the VIM4.

Then modify the path to this file in postprocess.py:

character_str = ["blank"]
-with open("./data/ppocr_keys_v1.txt", "rb") as fin:
+with open("./data/en_dict.txt", "rb") as fin:
    lines = fin.readlines()
    for line in lines:
        line = line.decode("utf-8").strip("\n").strip("\r\n")
        character_str.append(line)
character_str.append(" ")
ignored_token = [0]

If you still get an error, please provide your Paddle model and ADLA model, and we will try to reproduce the problem.
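
The dictionary-loading change above can also be read as a small helper. This is a minimal sketch, not the actual postprocess.py: the function name `load_character_list` and the throwaway demo dictionary are hypothetical and exist only for illustration. It shows how the lookup table is laid out, with "blank" at index 0, one entry per dictionary line, and an extra space appended at the end.

```python
import os
import tempfile


def load_character_list(dict_path):
    """Build the index-to-character table the same way the snippet above does."""
    characters = ["blank"]  # index 0 is reserved for the CTC blank token
    with open(dict_path, "rb") as fin:
        for raw_line in fin:
            # Strip only line endings, so a dictionary line that is a single space survives.
            characters.append(raw_line.decode("utf-8").strip("\r\n"))
    characters.append(" ")  # extra space class, appended last
    return characters


# Demo with a tiny throwaway dictionary (hypothetical content, not en_dict.txt).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_dict.txt")
    with open(demo_path, "wb") as fout:
        fout.write(b"a\r\nb\n \n")
    demo_table = load_character_list(demo_path)

print(demo_table)       # "blank" first, the extra space last
print(len(demo_table))  # number of classes the table can decode
```

Calling `len(load_character_list("./data/en_dict.txt"))` on the board should give the class count that the second dimension of the recognition output has to match.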

Hi, I’ve tried it and got this error:

Traceback (most recent call last):
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/ppocr-cap-mosen.py", line 173, in <module>
    det_results[i][5] = ocr_rec_postprocess(rec_output[0])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/postprocess.py", line 69, in ocr_rec_postprocess
    char_list = [character_str[text_id] for text_id in rec_idx[selection]]
                 ~~~~~~~~~~~~~^^^^^^^^^
IndexError: list index out of range
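
The IndexError means that at least one predicted class index has no matching entry in character_str, for instance when the output is still interpreted with more classes than the dictionary provides. A small diagnostic sketch, assuming the indices are available as a plain sequence of integers; `find_out_of_range` is a hypothetical helper and the sample values are made up:

```python
def find_out_of_range(indices, characters):
    """Return the predicted class indices that have no entry in the character table."""
    limit = len(characters)
    return sorted({int(i) for i in indices if not 0 <= int(i) < limit})


# Hypothetical table with 97 entries and made-up predictions for illustration.
demo_characters = ["blank"] + [f"char{i}" for i in range(95)] + [" "]
demo_indices = [0, 43, 96, 3120, 3120]

bad = find_out_of_range(demo_indices, demo_characters)
print(f"table size: {len(demo_characters)}, out-of-range indices: {bad}")
```

If the returned list is not empty, the model output and the dictionary disagree on the number of classes, so the lookup cannot succeed until the two match.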

en_dict.txt

0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
@
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
\
]
^
_
`
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
{
|
}
~
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/

It seems I cannot attach the Paddle files and the ADLA file here.

Hello @JietChoo ,

You can send me an email. My email: louis.liu@wesion.com

Hello @JietChoo ,

Oh, I forgot to mention the output size.

Modify the output shape in ppocr-cap-960-544.py:

det_input_size = (544, 960) # (model height, model width)
rec_input_size = ( 48, 320) # (model height, model width)
-rec_output_size = (40, 6625)
+rec_output_size = (40, 97)
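
The second value of rec_output_size has to equal the number of entries in character_str (the blank, the dictionary lines, and the trailing space), which is presumably why 6625 becomes 97 for the English dictionary. The sketch below illustrates that relationship with a plain-Python greedy CTC decoder. It is an assumption about how such a decoder can look, not the code in postprocess.py; `ctc_greedy_decode` and the tiny demo table are hypothetical.

```python
def ctc_greedy_decode(flat_output, output_size, characters):
    """Greedy CTC decoding of a flat score buffer of shape (time_steps, num_classes)."""
    time_steps, num_classes = output_size
    if num_classes != len(characters):
        raise ValueError(
            f"output has {num_classes} classes but the table has {len(characters)} entries"
        )
    if len(flat_output) != time_steps * num_classes:
        raise ValueError("buffer length does not match output_size")

    decoded = []
    previous = None
    for step in range(time_steps):
        row = flat_output[step * num_classes:(step + 1) * num_classes]
        best = max(range(num_classes), key=row.__getitem__)  # argmax over classes
        # CTC rule: collapse repeated indices, then drop the blank (index 0).
        if best != previous and best != 0:
            decoded.append(characters[best])
        previous = best
    return "".join(decoded)


# Demo: 5 time steps, 4 classes, one-hot scores for the index path 0, 1, 1, 0, 2.
demo_characters = ["blank", "K", "S", " "]
demo_path = [0, 1, 1, 0, 2]
demo_scores = []
for index in demo_path:
    demo_scores.extend(1.0 if cls == index else 0.0 for cls in range(4))

demo_text = ctc_greedy_decode(demo_scores, (5, 4), demo_characters)
print(demo_text)
```

Checking the class count up front turns a confusing IndexError deep in the lookup into an explicit message about the mismatch.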

Hi, I have just sent you an email. However, I cannot send the Paddle files because they are too large. Is there another way I can send them to you?

Hello @JietChoo ,

You can use Google Drive and send me the download link.

Hi,

Here is the file on OneDrive; it’s the training model. Do you need the inference model as well?

ppocr.zip

I’ll just pass it to you anyway.

Here’s the inference model:

https://1drv.ms/u/c/7931b6f36f2554e7/EfwXbNA8B4tJpeilOG9tS7EBKiazaPHE9qxtUQL0tJoGYQ?e=DaPewV

Sorry, I sent the embedded link just now; this updated link is the correct one.

Hello @JietChoo ,

Hmm, I cannot open either of the two links.

Hi Louis, I have shared it with your email address. Please check whether you can open it.

Hi Louis, did you receive the files?

Hello @JietChoo ,

The Paddle model link still fails to open, but judging from the ONNX and ADLA models, I think the problem is that the model itself has low precision. A lower-precision model loses even more precision after quantization.


Are these all of your training pictures? That is too few. And are the testing pictures the same as the training pictures? The testing pictures should be different from the training ones; that helps you judge whether your model fits well or not.
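
One way to keep the testing pictures separate from the training pictures is to split the label list once, before training. The sketch below assumes the usual PaddleOCR recognition label format of one "image path, tab, text" entry per line; the helper name `split_label_lines`, the 20% test fraction, and the sample entries are hypothetical.

```python
import random


def split_label_lines(lines, test_fraction=0.2, seed=0):
    """Shuffle label lines reproducibly and split them into disjoint train and test lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up label entries in the "image path<TAB>text" format.
demo_lines = [f"crops/img_{i:03d}.png\t{chr(ord('A') + i)}" for i in range(10)]
train_lines, test_lines = split_label_lines(demo_lines)

print(len(train_lines), "training entries")
print(len(test_lines), "testing entries")
```

The two lists can then be written to separate files, such as the rec_gt_train.txt and rec_gt_test.txt named in the config, so that no picture is used for both training and evaluation.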

Hi Louis,

Yes, the train set and the test set are the same. We trained on top of the training model downloaded from PaddleOCR/docs/ppocr/model_list.md at main · PaddlePaddle/PaddleOCR · GitHub, namely the en_PP-OCRv4_rec training model. The original en_PP-OCRv4_rec model can already recognize words, but if we train on top of it with our own alphabets, shouldn’t it recognize our words more accurately?

I have tried sharing the model file link again.

Or you can try this WeTransfer link.

Hello @JietChoo ,

Sorry, I judged too hastily. The ONNX model performs much better than the int16 model, so the problem may occur during conversion. However, the Chinese Spring Festival is approaching and our engineer has already gone on holiday. We will look into this problem after the holiday (Feb 5).

Hi Louis,

Alright, noted!

We actually tried training with a completely new training set. We took scene photos containing all our alphabets and cropped them, trained a model on them, and evaluated it; it seems to work fine. This time the train set and the test set are different.

We then converted it to ONNX and then to ADLA, but the result is still the same: it displays Chinese characters. So I assume the cause may be the ppocr_keys_v1.txt file used in postprocess.py.

So we changed postprocess.py to use en_dict.txt:

character_str = ["blank"]
-with open("./data/ppocr_keys_v1.txt", "rb") as fin:
+with open("./data/en_dict.txt", "rb") as fin:
    lines = fin.readlines()
    for line in lines:
        line = line.decode("utf-8").strip("\n").strip("\r\n")
        character_str.append(line)
character_str.append(" ")
ignored_token = [0]

But we got this error and the OpenCV window crashed:

Traceback (most recent call last):
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/ppocr-cap-mosen.py", line 173, in <module>
    det_results[i][5] = ocr_rec_postprocess(rec_output[0])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khadas/ksnn-vim4-mosen/examples/ppocr/postprocess.py", line 74, in ocr_rec_postprocess
    char_list = [character_str[text_id] for text_id in rec_idx[selection]]
                 ~~~~~~~~~~~~~^^^^^^^^^
IndexError: list index out of range

By the way, Happy Holidays Louis! We are also going to celebrate Chinese New Year next week here in Malaysia!

Happy New Year! Wishing you prosperity and great good fortune! 🧧🙏

Hello @JietChoo ,

Have you modified the output shape? I ran inference with your ADLA model and it gets the right result, but with low precision.

Oh, I did not use the 960-544 file. I will try it later.