VIM3 face_landmark_68_88.nb model detects 68 landmarks but has a null face detection bounding box

Summary

Build the face 68p landmark detection code

  1. Code is based on the example code from https://gitlab.com/khadas/aml_npu_nnsdk_app/person_detect_640x384
    git clone https://gitlab.com/khadas/aml_npu_nnsdk_app && cd aml_npu_nnsdk_app

  2. Create a new example for face landmark 68p detection that is based on the person_detect_640x384 code

  3. Notable parts of the 68p landmark detection code:
    aml_config config;
    … … …
    config.path = "face_landmark_68_88.nb";
    config.modelType = TENSORFLOW;
    void *context = aml_module_create(&config);

    … … …

    aml_output_config_t outconfig;
    outconfig.mdType = FACE_LANDMARK_68;

    … … …

    typedef struct __nn_face_landmark_68
    {
        unsigned int detNum;                 /* number of detected faces */
        detBox facebox[MAX_DETECT_NUM];      /* bounding box of each detected face */
        point_t pos[MAX_DETECT_NUM][68];     /* 68 (x,y) landmark points per face */
    } face_landmark68_out_t;

    const face_landmark68_out_t *pout = (face_landmark68_out_t *)aml_module_output_get(context, outconfig);
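
    For reference, a minimal end-to-end sketch tying these fragments together might look like the following (image loading and input feeding are elided; the nn_sdk.h header name and the zero-initialization of the config structs are assumptions, and only the SDK calls already shown in this post are used):

    #include <stdio.h>
    #include "nn_sdk.h"   /* assumed SDK header exposing aml_module_* and the output structs */

    int main(void)
    {
        aml_config config = {0};
        config.path = "face_landmark_68_88.nb";
        config.modelType = TENSORFLOW;

        void *context = aml_module_create(&config);
        if (context == NULL) {
            fprintf(stderr, "aml_module_create failed\n");
            return 1;
        }

        /* ... load the 60x60x3 picture and feed it to the module here ... */

        aml_output_config_t outconfig = {0};
        outconfig.mdType = FACE_LANDMARK_68;

        const face_landmark68_out_t *pout =
            (face_landmark68_out_t *)aml_module_output_get(context, outconfig);
        if (pout != NULL) {
            printf("Num of detected face landmark(s): %u\n", pout->detNum);
        }

        /* ... release the module here (destroy call elided) ... */
        return 0;
    }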

Run the face 68p landmark detection code

  1. Run the face_landmark_68_88.nb model with a 60x60x3 picture
    ./main.exe face_landmark_68_88.nb 4 60x60_face.png
  2. Input picture of a 60x60 face:
    http://gals-linux-01:8000/noveto/attachment/wiki/SOC/KhadasVIM3/NPU/AMLogic/060x060_face.png
  3. Output picture of a 60x60 face with the 68p landmarks (the x,y position points are drawn as a percentage of the 60x60 face image):
    http://gals-linux-01:8000/noveto/attachment/wiki/SOC/KhadasVIM3/NPU/AMLogic/060x060_face_output.bmp
  4. One face is detected with 68p landmark positions
    Info: Num of detected face landmark(s): 1
  5. However, the face detection bounding box output (facebox[0] from the output of type face_landmark68_out_t *) has zero width and height (a workaround sketch that derives a box from the landmark points follows the point list below):
    Info: detection number 0:
    i:0 x:0 w:0 y:0 h:0 score:0
    Error: bounding box has zero size!
  6. Note: the picture size is 60 x 60; however, some detected position points have X and Y values as high as 71:
    point # 15: (71.185448,19.491253)
  7. The detected 68 position points (which otherwise seem to be OK) are:
    point # 0: (0.000000,7.062048)
    point # 1: (0.000000,11.581758)
    point # 2: (0.000000,23.446001)
    point # 3: (0.000000,35.027760)
    point # 4: (0.000000,45.762074)
    point # 5: (0.000000,55.366459)
    point # 6: (7.909494,61.863541)
    point # 7: (16.666433,71.467926)
    point # 8: (27.118265,69.490555)
    point # 9: (36.157688,70.055519)
    point # 10: (48.869373,62.710991)
    point # 11: (55.083977,58.756241)
    point # 12: (61.581059,45.762074)
    point # 13: (64.970840,37.570095)
    point # 14: (68.360626,28.530676)
    point # 15: (71.750404,19.773735)
    point # 16: (70.337997,10.169350)
    point # 17: (0.282482,8.191976)
    point # 18: (10.734313,10.451831)
    point # 19: (16.383951,12.146723)
    point # 20: (21.468626,12.429205)
    point # 21: (29.943083,14.124096)
    point # 22: (35.875202,13.559133)
    point # 23: (42.937252,9.886868)
    point # 24: (49.151855,10.734313)
    point # 25: (59.321205,11.016795)
    point # 26: (66.383247,11.581758)
    point # 27: (33.332867,18.078844)
    point # 28: (35.592724,20.903662)
    point # 29: (32.767902,27.118265)
    point # 30: (33.897831,34.462795)
    point # 31: (22.598555,37.852577)
    point # 32: (27.118265,39.829952)
    point # 33: (29.943083,39.829952)
    point # 34: (33.615349,38.700024)
    point # 35: (39.829952,37.570095)
    point # 36: (7.627012,12.146723)
    point # 37: (11.864241,10.451831)
    point # 38: (19.208771,13.841615)
    point # 39: (22.598555,14.406578)
    point # 40: (17.796362,16.101471)
    point # 41: (13.276650,16.101471)
    point # 42: (39.547470,14.971541)
    point # 43: (46.327034,14.124096)
    point # 44: (50.846748,10.451831)
    point # 45: (56.778866,12.711687)
    point # 46: (49.434338,14.971541)
    point # 47: (46.044556,14.971541)
    point # 48: (13.559133,47.739449)
    point # 49: (19.773735,44.914627)
    point # 50: (27.118265,44.914627)
    point # 51: (29.943083,45.197109)
    point # 52: (34.745277,44.914627)
    point # 53: (41.524845,42.089806)
    point # 54: (46.609520,47.174480)
    point # 55: (41.807323,50.846748)
    point # 56: (38.135063,49.151855)
    point # 57: (30.225567,53.106602)
    point # 58: (24.575928,51.129230)
    point # 59: (19.208771,50.281784)
    point # 60: (15.536507,48.586891)
    point # 61: (27.118265,47.174480)
    point # 62: (29.095638,49.434338)
    point # 63: (34.745277,49.716820)
    point # 64: (46.044556,46.609520)
    point # 65: (36.440170,47.174480)
    point # 66: (30.508049,46.609520)
    point # 67: (28.813156,48.021927)
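
Since the reported facebox is all zeros, one possible workaround (a sketch only, assuming point_t carries float x and y members, as the printed values suggest) is to derive a bounding box from the 68 landmark points themselves:

    /* Derive a bounding box from the 68 landmark points of one face.
       The box is expressed in the same units as the points
       (here: percent of the input image). */
    static void bbox_from_points(const point_t pts[68],
                                 float *x, float *y, float *w, float *h)
    {
        float min_x = pts[0].x, max_x = pts[0].x;
        float min_y = pts[0].y, max_y = pts[0].y;
        for (int i = 1; i < 68; i++) {
            if (pts[i].x < min_x) min_x = pts[i].x;
            if (pts[i].x > max_x) max_x = pts[i].x;
            if (pts[i].y < min_y) min_y = pts[i].y;
            if (pts[i].y > max_y) max_y = pts[i].y;
        }
        *x = min_x;
        *y = min_y;
        *w = max_x - min_x;
        *h = max_y - min_y;
    }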

Add the missing input and output pictures above:
Input picture with a face:
[image: 060x060_face]
Output picture with the detected landmarks:
[image: 060x060_face_output]

@GalShalifNoveto This is probably related to the scale used in post-processing. Your shrinking ratio is likely incorrect. You can try scaling your post-processed data up or down.

At the same time, I do not recommend that you base this on my demo, because the post-processing of each model is different. You can refer to my resize and restore-resize process as a reference.
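
If the shrinking ratio is indeed the issue, the kind of rescaling Frank describes might look like the sketch below (the model input size and original image size are assumed to be known from the pre-processing resize step; point_t with float x, y members is taken from the struct earlier in this post):

    /* Rescale landmark points from the model's coordinate space back
       to the original image, undoing the pre-processing resize. */
    static void rescale_points(point_t pts[68],
                               float model_w, float model_h,
                               float img_w, float img_h)
    {
        const float sx = img_w / model_w;
        const float sy = img_h / model_h;
        for (int i = 0; i < 68; i++) {
            pts[i].x *= sx;
            pts[i].y *= sy;
        }
    }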

Hello Frank,

Thanks for your help.
However, the problem seems to be with the model and not with the pre-processing of the input picture, because one face is detected (by the model) even when the input picture has no face at all:

Run the face_landmark_68_88.nb model with a 60x60x3 picture without a face:

./main.exe face_landmark_68_88.nb 4 060x060_no_face.png

The input picture has no face; however, one face is detected by the model:

Info: Num of detected face landmark(s): 1

Reference:

The input picture 060x060_no_face.png without a face:
[image: 060x060_no_face]

The output picture (without a face) with the 68p landmarks (the x,y position points drawn as a percentage of the input picture):
[image: 060x060_no_face_output_with_landmarks_img]

@GalShalifNoveto I am not sure, but my idea is that if no confidence threshold is applied after the data is output, 68 points of data will still be produced even when there is no face. This is just a post-processing problem and has nothing to do with the model itself.
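
Following that idea, a post-processing confidence check might look like this sketch (assuming detBox has a float score member, as the "score:0" line in the log suggests; the 0.5 threshold is an arbitrary example value):

    /* Skip detections whose confidence is below a chosen threshold,
       so a picture without a face does not yield phantom landmarks. */
    #define SCORE_THRESHOLD 0.5f

    for (unsigned int i = 0; i < pout->detNum; i++) {
        if (pout->facebox[i].score < SCORE_THRESHOLD) {
            continue;  /* treat as "no face" */
        }
        /* ... use pout->pos[i] (the 68 landmark points) here ... */
    }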

If you think the output of this face model is not accurate, you can also train a model yourself and replace it.