clpeak benchmark does not work correctly

Which Khadas SBC do you use?

Khadas Vim3 Pro

Which system do you use? Android, Ubuntu, OOWOW or others?

Ubuntu

Which version of system do you use? Khadas official images, self built images, or others?

Khadas official images

Please describe your issue below:

I want to use the clpeak benchmark to profile the Mali GPU, but its output does not seem accurate. It does not recognize the ARM Mali-G52 GPU device, and it reports OpenCL driver version 3.0 while clinfo shows 2.0. Can anyone help resolve this issue? Thanks very much!

Post a console log of your issue below:

clpeak output:

khadas@Khadas:~/clpeak/build$ ./clpeak
Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device VIPNano-QI.7120.0000
    Driver version  : OpenCL 3.0 V6.4.8.7.415784 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 800 MHz

    Global memory bandwidth (GBPS)
      float   : 1.06
      float2  : 2.04
      float4  : 3.90
      float8  : 3.89
      float16 : 4.21

    Single-precision compute (GFLOPS)
      float   : 0.75
      float2  : 1.42
      float4  : 2.90
      float8  : 2.97
      float16 : 3.15

    Half-precision compute (GFLOPS)
      half   : 1.25
      half2  : 2.61
      half4  : 5.20
      half8  : 5.53
      half16 : 3.15

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 1.42
      int2  : 1.50
      int4  : 1.59
      int8  : 1.58
      int16 : 1.58

    Integer compute Fast 24bit (GIOPS)
      int   : 1.42
      int2  : 1.50
      int4  : 1.59
      int8  : 1.58
      int16 : 1.58

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 4.24
      enqueueReadBuffer               : 0.57
      enqueueWriteBuffer non-blocking : 4.25
      enqueueReadBuffer non-blocking  : 0.56
      enqueueMapBuffer(for read)      : 681.65
        memcpy from mapped ptr        : 0.56
      enqueueUnmap(after write)       : 53.85
        memcpy to mapped ptr          : 4.29

    Kernel launch latency : 160.39 us

clinfo output:

khadas@Khadas:~/clpeak/build$ clinfo
Number of platforms                               1
  Platform Name                                   ARM Platform
  Platform Vendor                                 ARM
  Platform Version                                OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_create_command_queue cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_shared_virtual_memory
  Platform Extensions function suffix             ARM

  Platform Name                                   ARM Platform
Number of devices                                 1
  Device Name                                     Mali-G52
  Device Vendor                                   ARM
  Device Vendor ID                                0x72120000
  Device Version                                  OpenCL 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Driver Version                                  2.0
  Device OpenCL C Version                         OpenCL C 2.0 git.c8adbf9.122c9daed32dbba4b3056f41a2f23c58
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)

@Yujie_Zhang I’m able to get proper output

but from your data it seems to be accessing the NPU (VeriSilicon Vivante)?
Could you give some other information, such as the kernel version?

Thanks for your response @Electr1. The kernel version and release of my system are listed below.

khadas@Khadas:~/clpeak/build$ uname -a
Linux Khadas 4.9.241 #4 SMP PREEMPT Thu Dec 29 04:03:11 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

khadas@Khadas:~/clpeak/build$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal

I also suspected that clpeak was testing the NPU, but I did not make any modifications to the source code (GitHub: krrishnarraj/clpeak). It is weird.

@Yujie_Zhang I just checked: my setup runs Debian and doesn’t have that issue, but others who used Ubuntu have noticed it in previous threads.

Example thread of similar behaviour:

It seems the CL driver for the NPU is getting probed somehow.
I’m unable to test with my device, but could you try

clinfo -l

This can help you identify the platform ID that the GPU uses during normal operation. Alternatively, when selecting a device, pass CL_DEVICE_TYPE_GPU as the device type. It should filter out the NPU from being accessible.
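To make the platform and device IDs easy to read off, here is a minimal Python sketch that parses the text printed by `clinfo -l`. The function names `parse_clinfo_list` and `find_device` are made up for illustration, and the parser assumes the `Platform #N:` / `Device #N:` layout that clinfo prints in list mode.

```python
import re

# Layout assumed from `clinfo -l`: "Platform #0: NAME" and "... Device #0: NAME".
PLATFORM_RE = re.compile(r"^Platform #(\d+): (.+)$")
DEVICE_RE = re.compile(r"Device #(\d+): (.+)$")


def parse_clinfo_list(text):
    """Return [(platform_index, platform_name, [(device_index, device_name), ...]), ...]."""
    platforms = []
    for line in text.splitlines():
        line = line.rstrip()
        match = PLATFORM_RE.match(line)
        if match:
            platforms.append((int(match.group(1)), match.group(2), []))
            continue
        match = DEVICE_RE.search(line)
        if match and platforms:
            # A device line belongs to the most recently seen platform.
            platforms[-1][2].append((int(match.group(1)), match.group(2)))
    return platforms


def find_device(platforms, needle):
    """Return (platform_index, device_index) of the first device whose name contains needle."""
    for p_idx, _p_name, devices in platforms:
        for d_idx, d_name in devices:
            if needle.lower() in d_name.lower():
                return p_idx, d_idx
    return None


# Sample text in the format `clinfo -l` prints for a single Mali device.
SAMPLE = """Platform #0: ARM Platform
 `-- Device #0: Mali-G52
"""

if __name__ == "__main__":
    print(find_device(parse_clinfo_list(SAMPLE), "mali"))
```

clpeak documents `-p`/`--platform` and `-d`/`--device` options that take such indexes, but they only help if clpeak enumerates the same platforms as clinfo, which should be checked first.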

@Electr1 I tried “clinfo -l” and got the following output. It seems that only the GPU platform is listed.

khadas@Khadas:~/clpeak/build$ clinfo -l
Platform #0: ARM Platform
 `-- Device #0: Mali-G52
khadas@Khadas:~/clpeak/build$ ./clpeak

Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device VIPNano-QI.7120.0000
    Driver version  : OpenCL 3.0 V6.4.8.7.415784 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 800 MHz

    Global memory bandwidth (GBPS)
      float   : 1.06
      float2  : 2.04
      float4  : 3.90
      float8  : 3.89
      float16 : 4.21

    Single-precision compute (GFLOPS)
      float   : 0.75
      float2  : 1.42
      float4  : 2.90
      float8  : 2.97
      float16 : 3.15

    Half-precision compute (GFLOPS)
      half   : 1.25
      half2  : 2.61
      half4  : 5.20
      half8  : 5.53
      half16 : 3.15

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 1.42
      int2  : 1.50
      int4  : 1.59
      int8  : 1.58
      int16 : 1.58

    Integer compute Fast 24bit (GIOPS)
      int   : 1.42
      int2  : 1.50
      int4  : 1.59
      int8  : 1.58
      int16 : 1.58

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 4.39
      enqueueReadBuffer               : 0.52
      enqueueWriteBuffer non-blocking : 4.41
      enqueueReadBuffer non-blocking  : 0.52
      enqueueMapBuffer(for read)      : 758.72

Based on the example thread of similar behaviour, clpeak on my side did test the NPU. Does the problem come from how clpeak is set up? Could you share your Debian setup?
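The mismatch between the two logs can also be checked mechanically. The sketch below is only an illustration with made-up helper names (`clpeak_targets`, `clinfo_platform_names`, `unlisted_targets`): it reads the `Platform:` / `Device:` header lines of a clpeak log and reports every target whose platform does not appear in the `clinfo -l` listing.

```python
import re


def clpeak_targets(clpeak_text):
    """Return [(platform_name, device_name), ...] from the header lines of a clpeak log."""
    targets = []
    platform = None
    for line in clpeak_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Platform:"):
            platform = stripped[len("Platform:"):].strip()
        elif stripped.startswith("Device:") and platform is not None:
            targets.append((platform, stripped[len("Device:"):].strip()))
    return targets


def clinfo_platform_names(clinfo_list_text):
    """Return the set of platform names found in `clinfo -l` output."""
    pattern = re.compile(r"^Platform #\d+: (.+)$", re.MULTILINE)
    return {m.group(1).strip() for m in pattern.finditer(clinfo_list_text)}


def unlisted_targets(clpeak_text, clinfo_list_text):
    """Return the clpeak targets whose platform is missing from the clinfo listing."""
    known = clinfo_platform_names(clinfo_list_text)
    return [t for t in clpeak_targets(clpeak_text) if t[0] not in known]


# Header lines copied from the logs in this thread.
CLPEAK_HEAD = """Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device VIPNano-QI.7120.0000
"""
CLINFO_LIST = """Platform #0: ARM Platform
 `-- Device #0: Mali-G52
"""

if __name__ == "__main__":
    for platform, device in unlisted_targets(CLPEAK_HEAD, CLINFO_LIST):
        print(f"clpeak ran on {device!r} ({platform!r}), which clinfo does not list")
```

A non-empty result means the two tools are not enumerating the same OpenCL platforms, which matches what the logs above show.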

Okay, I have just checked by installing Ubuntu on another device.

I was able to reproduce the issue there.

My debian setup is something like this:

(ignore the change of username; there are no other customizations involved)

Update: interesting find. The NPU driver is bound to the galcore kernel module. On Debian, unloading that module doesn’t affect GPU operation at all, and clpeak/clinfo work as intended. On Ubuntu, unloading galcore stops GPU operation, so I’m not sure what the software differences are.
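For anyone comparing the two systems, a small sketch like the following can report whether galcore is currently loaded. It parses text in the `/proc/modules` format (the same data `lsmod` shows); the helper names and the sample lines, including their sizes and addresses, are invented for illustration and are not taken from a real VIM3.

```python
def parse_proc_modules(text):
    """Map module name -> use count (None if the kernel does not report one)."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # /proc/modules fields: name, size, use count, dependencies, state, address.
        if len(fields) >= 3:
            modules[fields[0]] = int(fields[2]) if fields[2].isdigit() else None
    return modules


def module_status(text, name="galcore"):
    """Describe whether the named module appears in the given /proc/modules text."""
    modules = parse_proc_modules(text)
    if name not in modules:
        return "not loaded"
    return f"loaded, use count {modules[name]}"


# Invented sample in /proc/modules format, not real output from the board.
SAMPLE = """galcore 450560 0 - Live 0x0000000000000000 (O)
mali_kbase 512000 2 - Live 0x0000000000000000 (O)
"""

if __name__ == "__main__":
    # On the board itself, replace SAMPLE with open("/proc/modules").read().
    print("galcore:", module_status(SAMPLE))
```

Running it before and after unloading the module on each distribution, and repeating clinfo and clpeak each time, would show whether the difference follows the module.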

@Electr1 Thanks for your response. It is weird. What hardware did you use for the Debian setup, Khadas VIM3 Basic or Pro? If it is the Pro version, could you provide the clpeak output for reference?

Ubuntu was on the VIM3 Pro and Debian on the VIM3 Basic, but that is not responsible for any change in the output. The GPU memory mapping covers the full system RAM, so the amount of memory is the only thing that changes between Basic and Pro.

Reference image is available in my initial message of this thread.

Cheers

@Electr1 Thanks; I also thought the Basic or Pro version would not be the cause. Does the Pro or Basic version affect the clpeak output because of the different DRAM sizes?

@Yujie_Zhang The only change is how much memory your CL program can allocate. With the Pro you can use 2 GB more memory, that’s all.

@Electr1 Then it will not affect the clpeak benchmark results, right?

@Yujie_Zhang yes, it is not affected.