
Low Latency Video Streaming Using VOXL2

Table of contents

  1. Overview
    1. Components of Glass-To-Glass Latency Using Wifi
      1. Typical Data Flow
      2. Sources of Latency on the Transmitter Side (VOXL2)
      3. Sources of Latency on the Receiver Side
    2. How to Measure Camera Pipeline Latency
    3. Optimizing Latency on the Transmitter Side
    4. Optimizing Latency on the Receiver Side
    5. Camera Pipeline Latency in Different Operating Modes
    6. Voxl Camera Server Configuration
      1. How to Check Which RAW Resolutions are Available
      2. IMX412 Operating Modes
      3. Voxl-camera-server.conf When Using Qualcomm ISP
      4. How to Confirm Which Camera Resolution Was Selected by the Pipeline
    7. Suggested Use Case for FPV Wifi Streaming
    8. How To Test End-to-end Latency
      1. Example of Image With Frame Counter Watermark
      2. Example of Streaming Delay Shown Using the Counter Watermark
      3. Display Low Latency H264 or H265 Stream Using a Ubuntu Desktop or Laptop
      4. Use FFplay with MJPEG from Voxl Portal

Overview

Components of Glass-To-Glass Latency Using Wifi

  • glass-to-glass means camera-to-screen, which represents the total end-to-end latency

Typical Data Flow

[data-flow diagram]

Sources of Latency on the Transmitter Side (VOXL2)

  • frame acquisition
    • frame exposure time (controlled by Auto Exposure algorithm or manual exposure)
    • frame readout time (fixed for a given camera streaming mode, depends on camera, frame size and configured transmission speed)
    • frame exposure and readout (in a rolling shutter camera) happen concurrently in a rolling pattern
  • image processing (RAW to RGB / YUV conversion + all other image processing)
    • Using Qualcomm ISP, or
    • Using ModalAI processing pipeline (MISP)
  • video encoding, typically hardware encoder (H264 or H265)
    • either directly using voxl-camera-server
    • or using voxl-streamer to encode uncompressed frames
  • video packaging
    • for example, RTSP stream, using voxl-streamer
  • wifi transmission
    • latency depends on encoded frame size and communication link speed
    • unreliable link will result in data loss and re-transmission, resulting in extra delay
    • UDP protocol may be used to remove re-transmission latency, but corrupted frames will be lost

Sources of Latency on the Receiver Side

  • reception of data via Wifi (the same latency as the transmission time on the transmitter, do not double count)
  • parsing of encoded packets and decoding the H264 / H265 stream into full image (using either SW or HW decoder)
  • buffering in order to ensure smooth playback (this is not good for latency!)
  • composition / rendering of the screens (if using a Desktop Manager)
  • transmission of the display buffer to the display (HDMI, DisplayPort, etc)
  • buffering on the display side, since the display is typically not frame-synchronized with the Receiver

How to Measure Camera Pipeline Latency

  • use voxl-inspect-cam to inspect the encoded stream
  • the latency figure is the total latency from when the frame starts transferring into VOXL2 (after exposure of the first line for rolling shutter cameras) until the encoded frame is ready
    • this does not include the frame exposure time
    • in order to calculate total TX latency of the center of the image, as an approximation, add 1/2 of the frame exposure time
  • frame timestamp (from voxl-camera-server) is the start of exposure of the first row of the image
  • running VOXL2 in Performance Mode (voxl-set-cpu-mode perf) should reduce the TX latency slightly
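  • for example, assuming the encoded stream is published as hires_small_encoded (adjust the pipe name to match your configuration):
    # reports per-frame latency for the encoded stream
    voxl-inspect-cam hires_small_encoded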

Optimizing Latency on the Transmitter Side

  • use the latest camera drivers, which reduce the readout time
    • for IMX412 Camera : https://storage.googleapis.com/modalai_public/temp/imx412_test_bins/20250311/imx412_fps_eis_20250311_drivers.zip
  • do not configure high-resolution streams if they are not used
    • this will allow selecting a lower-resolution camera mode, reducing readout time
  • set VOXL2 to performance mode : voxl-set-cpu-mode perf
  • if possible, remove any buffering on the video server side (RTSP, etc)

Optimizing Latency on the Receiver Side

  • A custom decoding / rendering pipeline may be required to achieve lowest latency
  • Use hardware-based decoder if available
  • Reduce buffering delay, typically added by decoder / player for smooth playback
  • the Display FPS will affect the total stream latency (higher FPS will result in lower latency)
  • Use a display that supports 120 or 240 FPS and configure your OS to run the display at its highest refresh rate (see the example at the end of this list)
    • running the display at highest possible rate will reduce the buffering time in OS and the display itself
    • there will still be extra frames of delay due to buffering, but the frame duration will be shorter at higher FPS
  • A desktop manager will add 1-2 frames of latency in order to compose and render the whole “desktop” before sending it to the display
    • if possible, reduce the delay of the desktop manager (e.g. enter full screen mode)
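  • on an Ubuntu receiver, the refresh rate can be checked and raised with xrandr; a minimal sketch, assuming the display is connected as DP-1 and supports 1920x1080 at 120 Hz (run xrandr with no arguments first to list the actual output names and modes):
    # list available outputs and modes, then select the highest refresh rate
    xrandr
    xrandr --output DP-1 --mode 1920x1080 --rate 120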

Camera Pipeline Latency in Different Operating Modes

  • IMX412 Camera with latest drivers
  • does not include encoded frame packaging / transmission
  • latency was measured using the voxl-inspect-cam tool and may include a small additional overhead
    • voxl-inspect-cam tool subscribes to the camera stream and reports the time between start of frame readout and receiving the (encoded) frame
  • ❗note that increasing source resolution will typically improve the down-scaled image quality, so if the additional delay is not critical, using higher input resolution may be desired
Operating Mode (IN -> OUT) | Readout (ms) | Processing MISP / ISP (ms) | Encoding (ms) | Total Latency MISP / ISP (ms)
3840x2160 -> 3840x2160     | 12           | 6-10 / 10-15               | 14-15         | 34-38 / 40-42
3840x2160 -> 1920x1080     | 12           | 4-6 / 6-8                  | 7-8           | 26-28 / 28-30
3840x2160 -> 1280x720      | 12           | 3-4 / 4-6                  | 4-5           | 23-24 / 26-27
1920x1080 -> 1920x1080     | 4            | 2-3 / 3-4                  | 4-5           | 15-17 / 18-20
1920x1080 -> 1280x720      | 4            | 1-2 / 2-3                  | 4-5           | 11-13 / 14-16

Voxl Camera Server Configuration

How to Check Which RAW Resolutions are Available

  • in this example we are using IMX412 drivers dated 20250311
    voxl-camera-server -l
    ...
    ANDROID_SCALER_AVAILABLE_RAW_SIZES:
    These are likely supported by the sensor
    4056 x 3040
    4040 x 3040
    4040 x 3040
    3840 x 2160
    3840 x 2160
    3840 x 2160
    1996 x 1520
    1996 x 1520
    1996 x 1520
    1936 x 1080
    1936 x 1080
    ..
    
  • multiple entries are shown for each resolution because each entry corresponds to a different FPS, but the FPS values are not shown
  • only request resolutions that are listed as available, and check the following table for the supported FPS:

IMX412 Operating Modes

  • using camera drivers from 20250311 (see link above)
  • note 4040x3040 @ 60 FPS is not stable, use 58 or lower (fix is coming)
   - 4056x3040 @ 30                     16.5ms readout time
   - 4040x3040 @ 30, **60**             16.5ms readout time
   - 3840x2160 @ 30, 60, 80             11.8ms readout time
   - 1996x1520 @ 30, 60, 120            5.5ms readout time
   - 1936x1080 @ 30, 60, 90, 120, 240   4.0ms readout time
   - 1996x480  @ 30, 480                1.8ms readout time *** experimental ***
   - 1996x240  @ 30, 800                0.9 - 1.0 ms readout time *** not always stable at 800 FPS ***

Voxl-camera-server.conf When Using Qualcomm ISP

  • using the latest camera drivers is not a must when using the Qualcomm ISP option
  • however, it is still important to pay attention to the input resolution vs. the output resolution
  • input resolution (preview_width) may need to be adjusted to 1920, depending on the camera driver used
  • using older drivers will result in slightly slower readout times (about 25% slower, which is not critical)
  • enabling en_raw_preview will enforce the specific camera mode, even if the preview stream is not actually consumed at run time
    • en_preview and en_raw_preview may be disabled if you want to just let the camera pipeline pick the best quality mode for the small_encoded video
  • typical voxl-camera-server.conf settings:
- fps: 60
- en_preview : true
- en_raw_preview : true
- en_misp : false
- preview_width : 1936 (or 1920)
- preview_height : 1080
- ae_mode : isp
- en_small_video : true
- en_large_video : false
- en_snapshot : false
- small_video_width : 1280
- small_video_height : 720
- small_venc_mode : h265
- small_venc_br_ctrl : cbr
- small_venc_mbps : 1.0
- small_venc_nPframes : 29
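  • to apply changes, edit the config file and restart the camera server; a minimal sketch, assuming the SDK-default config location (the path may differ on your image):
    # edit the camera server configuration, then restart the service so the settings take effect
    vi /etc/modalai/voxl-camera-server.conf
    systemctl restart voxl-camera-server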

How to Confirm Which Camera Resolution Was Selected by the Pipeline

  • double check in /vendor/etc/camera/camxoverridesettings.txt (see the example at the end of this list for how to append these settings)
    • maxRAWSizes=20 : this is needed to allow the camera pipeline to use all available RAW resolutions
    • systemLogEnable=1 : this is needed to see the selected resolution in the debug messages; this logging comes from the camera pipeline and goes to logcat, and it can be disabled after testing to slightly reduce CPU usage
  • run logcat | grep -i selected in one terminal on VOXL2
  • start voxl-camera-server in another terminal on VOXL2
  • the first terminal will print:
    03-19 22:22:35.774 27237 27237 I CHIUSECASE: [CONFIG ] chxextensionmodule.cpp:2358 InitializeOverrideSession() Session_parameters FPS range 60:60, previewFPS 0, videoFPS 0 BatchSize: 1 HALOutputBufferCombined 0 FPS: 60 SkipPattern: 2, cameraId = 2 selected use case = 1
    03-19 22:22:35.778 27237 27237 I CHIUSECASE: [CONFIG ] chxsensorselectmode.cpp:635 FindBestSensorMode() Selected Usecase: 7, SelectedMode W=1936, H=1080, FPS:60, NumBatchedFrames:0, modeIndex:10
    03-19 22:22:35.778 27237 27237 I CHIUSECASE: [CONFIG ] chxpipeline.cpp:371 CreateDescriptor() Pipeline[PreviewRaw] Pipeline pointer 0x55b9f62cf0 Selected sensor Mode W=1936, H=1080
    03-19 22:22:35.778 27237 27237 I CamX    : [CONFIG][CORE   ] camxsession.cpp:5907 SetRealtimePipeline() Session 0x55ba1299d0 Pipeline[PreviewRaw] Selected Sensor Mode W=1936  H=1080
    
  • pay attention to the resolution listed as Selected Sensor Mode
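  • for example, the override settings can be appended from a VOXL2 shell as shown below (a sketch; check the existing file first so entries are not duplicated):
    # append the override settings used in this guide, then restart voxl-camera-server
    echo "maxRAWSizes=20" >> /vendor/etc/camera/camxoverridesettings.txt
    echo "systemLogEnable=1" >> /vendor/etc/camera/camxoverridesettings.txt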

Suggested Use Case for FPV Wifi Streaming

  • IMX412 Camera is recommended for best low latency performance
  • output resolution 1280x720, down-scaled during processing
  • use a binned camera resolution, such as 1920x1080; use 1936x1080 for compatibility with MISP (1936x1080 input, 1280x720 output)
  • h265 encoder, CBR at 0.5 - 1.5 Mbps; number of P frames: 29 or 59 (or choose a lower value to shorten the dropout after a lost frame)
  • use a 60 FPS camera frame rate
  • ideally, use UDP-based transmission protocol (not discussed here)
  • for even lower latency, a lower resolution may be used, such as 960x540

How To Test End-to-end Latency

  • In order to eliminate measurement variability due to exposure length, use manual exposure and set it to a small value (1-2 ms)
  • An approximate measurement can be made by pointing the camera at a millisecond timer (on a phone or laptop screen) and taking a picture of the timer and the rendered streamed video side by side
    • the difference between the two times is the total latency
    • note that the refresh rate of phone displays can be variable
  • An ideal test would involve a hardware timer that updates at a very fast rate, such as every 1 ms (instead of a timer rendered on a screen)
  • A delay in number of frames can be measured by embedding a frame count or a timestamp directly into the transmitted image
    • this measurement will have a granularity of one frame
      • How to enable the frame number watermark (works only with the ISP pipeline):
      #add the following to /vendor/etc/camera/camxoverridesettings.txt and restart voxl-camera-server
      watermarkImage=TRUE
      watermarkOffset=20x20
      forceDisableUBWCOnIfeIpeLink=TRUE
      

Example of Image With Frame Counter Watermark

[watermark example image]

Example of Streaming Delay Shown Using the Counter Watermark

  • using voxl-portal to display the YUV stream, which is transmitted encoded as MJPEG
  • in this example, the delay is between 5-7 frames (probably 6 on average); at 60 FPS, 6 * 16.666 ms ≈ 100 ms
  • note that MJPEG encoding is slower than H264 / H265 and the display used is running at 60 Hz
  • however, in this case we are bypassing any buffering done by the RTSP server and client because voxl-portal renders MJPEG instantly

[watermark-delay image]

Display Low Latency H264 or H265 Stream Using a Ubuntu Desktop or Laptop

  • use the ISP pipeline in this example (not MISP)
  • start voxl-camera-server on VOXL2
  • start voxl-streamer on VOXL2:
    voxl-streamer -i hires_small_encoded
    
  • using ffmpeg / ffplay
    ffplay -fflags nobuffer -flags low_delay rtsp://<voxl-ip>:8900/live
    
  • testing of alternate ffplay flags may be needed
  • ffplay is not robust to packet drops, which may result in the video freezing
  • achieving low latency using H264 / H265 requires experimentation with the network transport as well as HW and SW selection and tuning on the receiver side
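  • as an alternative to ffplay, a GStreamer pipeline gives finer control over buffering; a minimal sketch, assuming an H265 stream and a software decoder (substitute your platform's hardware decoder element if one is available):
    # assumes H265; swap in rtph264depay / h264parse / avdec_h264 for an H264 stream
    gst-launch-1.0 rtspsrc location=rtsp://<voxl-ip>:8900/live latency=0 ! \
        rtph265depay ! h265parse ! avdec_h265 ! videoconvert ! autovideosink sync=false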

Use FFplay with MJPEG from Voxl Portal

    ffplay -fflags nobuffer -flags low_delay http://localhost:8080/video_raw/hires_small_color -f mjpeg -framerate 60
  • in this case, we have set up adb port forwarding (see the example below), so we can access the MJPEG stream over USB for testing
  • alternatively, replace localhost with the VOXL2 IP address
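  • a minimal sketch of the port forwarding, assuming voxl-portal is serving on port 80 on the VOXL2 (adjust the device-side port if your configuration differs):
    # forward local port 8080 to the assumed voxl-portal port on the device
    adb forward tcp:8080 tcp:80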