Does the VIM3 hardware encoder support multi-stream input?

I have started working with the VIM3 (A311D).
For my project, I plan to encode the output of 2 MIPI-CSI cameras as H.265/H.264 video streams and push them to a remote server.
Here are my questions:

  1. Does the hardware encoder support 2 input streams at the same time? Would I need to modify the source code?
  2. In the 2-camera case (let's assume each frame is 1280x800), do I get a single combined image (1280x800x2), or 2 separate images of 1280x800 each?

My current idea is to first combine the two images into one (1280x800x2) and then pass that single image to the hardware encoder. This gives one video stream and avoids the problem if the encoder does not support two input streams.
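
To make the combining step concrete, here is a minimal sketch in Python. It assumes each frame is a single-plane packed buffer (for example 8-bit grayscale) and places the two frames side by side, giving 2560x800. The function name `stack_side_by_side` and the side-by-side layout are only illustrative, not anything from the Khadas SDK; planar formats such as NV12 would need each plane handled separately.

```python
def stack_side_by_side(left: bytes, right: bytes,
                       width: int, height: int,
                       bytes_per_pixel: int = 1) -> bytes:
    """Join two equally sized packed frames into one frame of double width.

    Assumption: both frames are single-plane, row-major, with no row padding.
    """
    row = width * bytes_per_pixel      # bytes in one source row
    expected = row * height            # bytes in one source frame
    if len(left) != expected or len(right) != expected:
        raise ValueError("frame size does not match width/height")

    out = bytearray(2 * expected)
    for y in range(height):
        src = y * row                  # start of row y in each source frame
        dst = 2 * y * row              # start of row y in the combined frame
        out[dst:dst + row] = left[src:src + row]
        out[dst + row:dst + 2 * row] = right[src:src + row]
    return bytes(out)


# Two dummy 1280x800 8-bit frames standing in for the camera output.
cam0 = bytes(1280 * 800)
cam1 = bytes([255]) * (1280 * 800)
combined = stack_side_by_side(cam0, cam1, 1280, 800)
```

With these inputs `combined` is one 2560x800 buffer that would then be handed to the encoder as a single frame. Stacking top and bottom instead (1280x1600) would just be `cam0 + cam1` for this kind of single-plane buffer.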

I plan to work on Linux 4.9. Any tips appreciated.