
Deep Learning on VOXL

VOXL 2 aims to deliver state-of-the-art machine learning performance through both GPU and NPU acceleration on the QRB5165. This is best illustrated by voxl-tflite-server, our server for on-device inference of TensorFlow Lite models.



voxl-tflite-server includes a helper script, voxl-configure-tflite, which will walk you through the available default configurations depending on your platform. This script will generate a config file at /etc/modalai/voxl-tflite-server.conf that will be used on initialization.

Available Params:

  • skip_n_frames - how many frames to skip between processed frames. For a 30Hz input frame rate, we recommend skipping 5 frames, resulting in 5Hz model output. For 30Hz (maximum) output, set to 0.
  • model - which model to use.
  • input_pipe - which camera to use (tracking, hires, or stereo).
  • delegate - optional hardware acceleration. If the selection is invalid for the current model/hardware, the server will quietly fall back to the base CPU delegate.
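
The relationship between skip_n_frames and the model output rate can be sketched as below. This is a minimal illustration, assuming the server processes one frame and then skips the next skip_n_frames frames, and that inference keeps up with the input. The function name is hypothetical and is not part of voxl-tflite-server.

```python
def output_rate_hz(input_rate_hz: float, skip_n_frames: int) -> float:
    """Approximate model output rate when one frame out of every
    (skip_n_frames + 1) input frames is processed."""
    if skip_n_frames < 0:
        raise ValueError("skip_n_frames must be >= 0")
    return input_rate_hz / (skip_n_frames + 1)


print(output_rate_hz(30, 5))  # 5.0
print(output_rate_hz(30, 0))  # 30.0
```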

VOXL 2 Additional Params:

  • allow_multiple - remove process handling and allow multiple instances of voxl-tflite-server to run. This makes it possible to run multiple models simultaneously.
  • output_pipe_prefix - if allow_multiple is set, create output pipes using the default names (tflite, tflite_data) with this prefix added. ONLY USED IF allow_multiple is set to true.


The output stream is set up as a normal camera pipe under /run/mpa/tflite. As such, it can be viewed with voxl-portal, converted to ROS with voxl_mpa_to_ros, and logged to disk with voxl-logger.

Each model provides a different custom overlay, depending on the task it performs. All overlays contain an FPS counter and an inference timer in the top left.


When running an object detection model, you will also see the pipe /run/mpa/tflite_data. This output pipe will provide some metadata about each object detected by the model. The format of this metadata is as follows:

// struct containing all relevant metadata to a tflite object detection
typedef struct ai_detection_t {
    uint32_t magic_number;
    int64_t timestamp_ns;
    uint32_t class_id;
    int32_t  frame_id;
    char class_name[BUF_LEN];
    char cam[BUF_LEN];
    float class_confidence;
    float detection_confidence;
    float x_min;
    float y_min;
    float x_max;
    float y_max;
} __attribute__((packed)) ai_detection_t;

The tflite_data output is also compatible with voxl_mpa_to_ros, or can be subscribed to with a custom MPA client of your own.


Stats were collected across 5000 inferences, using hires [640x480x3] camera as input.


Model | Task | Avg CPU Inference (ms) | Avg GPU Inference (ms) | Max Frames Per Second (fps) | Input Dimensions | Source
--- | --- | --- | --- | --- | --- | ---
MobileNetV2-SSDlite | Object Detection | 127.78 | 21.82 | 37.28560776 | [1,300,300,3] | link
MobileNetV1-SSD | Object Detection | 75.48 | 64.40 | 14.619883041 | [1,300,300,3] | link


Model | Task | Avg CPU Inference (ms) | Avg GPU Inference (ms) | Avg NNAPI Inference (ms) | Max Frames Per Second (fps) | Input Dimensions | Source
--- | --- | --- | --- | --- | --- | --- | ---
MobileNetV2-SSDlite | Object Detection | 33.89 | 24.68 | 34.42 | 34.86750349 | [1,300,300,3] | link
EfficientNet Lite4 | Classifier | 115.30 | 24.74 | 16.42 | 48.97159647 | [1,300,300,3] | link
FastDepth | Monocular Depth | 37.34 | 18.00 | 37.32 | 45.45454546 | [1,320,320,3] | link
Movenet SinglePose Lightning | Pose Estimation | 24.58 | 28.49 | 24.61 | 34.98950315 | [1,192,192,3] | link
YoloV5 | Object Detection | 88.49 | 23.37 | 83.87 | 36.53635367 | [1,320,320,3] | link
MobileNetV1-SSD | Object Detection | 19.56 | 21.35 | 7.72 | 85.324232082 | [1,300,300,3] | link

Custom Models

voxl-tflite-server is intended to serve as an example of how to use libmodal-pipe to ingest images and run inference with a trained model. It is not a fully fleshed-out ML library with extensive support for custom models. This is because the central logic for passing the input tensor and parsing the output tensor is specific to each model, and it would take a significant amount of time to build a library that handles this generally.


With that in mind, you can generally get a custom model to work with the existing voxl-tflite-server code if the following are true:

  • Your pipeline uses the same tensor preprocessing as voxl-tflite-server (see here).
  • Your pipeline uses the same tensor postprocessing as one of the provided model types (see here).
  • The output information you’d like to pass downstream is representable in an ai_detection_t (see here).
  • You’re able to quantize your model (described below).
  • When quantizing your model, you use a compatible opset for your platform. For VOXL, this server can run any TensorFlow Lite model using the v2.2.3 supported opsets (TFLITE_BUILTINS and custom opsets). See the TensorFlow Guide for more information on opsets. VOXL 2 depends on the v2.8.0 opset.

If any of the above are not true, then you will likely need to create a custom module in order to run your custom model. If there’s just a small difference between what’s provided and what you’d like to do, you should likely just fork voxl-tflite-server, make the necessary changes, and build and deploy the updated version using the instructions provided in the README. If what you’d like to do is entirely different, you can and should still leverage the logic in voxl-tflite-server for the core tasks of reading in images from a pipe, casting a model to a hardware-accelerated delegate, and publishing the results.

The steps below outline a general process for getting your custom model (which meets the above requirements) loaded into voxl-tflite-server.

Post-Training Quantization

To deploy a custom model on VOXL 2, start by validating your model on your desktop. Use Google’s Guide to learn how to develop a TensorFlow Lite model.

Once you have a model that is ready to convert to tflite format, you need to do some post-training quantization. This enables hardware acceleration on the GPU and ensures the model is compatible with voxl-tflite-server.

A simple Python script can be used for this process. Below is an example that converts the frozen inference graph of a TensorFlow object detection model to a tflite model using our standard options (float16 quantization for a GPU target):

# i.e., pip install tensorflow==2.2.3
import tensorflow as tf

# if you have a saved model and not a frozen graph, see:
# tf.compat.v1.lite.TFLiteConverter.from_saved_model()

# the input/output names and shapes below are model-specific; please check
# them by opening up your frozen graph/saved model in a tool like netron
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
  graph_def_file='/path/to/.pb/file/tflite_graph.pb',
  input_arrays=['normalized_input_image_tensor'],
  input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]},
  output_arrays=[
    'TFLite_Detection_PostProcess',
    'TFLite_Detection_PostProcess:1',
    'TFLite_Detection_PostProcess:2',
    'TFLite_Detection_PostProcess:3',
  ],
)

converter.use_experimental_new_converter = True
converter.allow_custom_ops = True
# float16 quantization (gpu target)
converter.target_spec.supported_types = [tf.float16]

# convert and write the resulting flatbuffer to disk
tflite_model = converter.convert()
with open('model_converted.tflite', 'wb') as f:
  f.write(tflite_model)

If you are using VOXL 2, the conversion process is the same, but you can use the current version of the converter API and TensorFlow v2.8.0.

Note that there are various delegates available depending on your platform, and each requires a different quantization type for optimal performance (e.g. when targeting the GPU, we typically use floating point models, but for the NPU we use integer models).
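
As a rough sketch of how the quantization type might be chosen per target with the TensorFlow 2.x converter API, the example below converts a saved model. It is an illustration under stated assumptions, not the conversion script used for the bundled models: the "gpu"/"npu" keys, the function names, and the paths are hypothetical, and the settings follow TensorFlow's general post-training quantization guidance (including setting converter.optimizations), which may differ from the options shown in the script above.

```python
# Hypothetical mapping for illustration; these keys are not taken from the
# voxl-tflite-server configuration.
QUANT_FOR_TARGET = {"gpu": "float16", "npu": "int8"}


def quantization_for(target: str) -> str:
    """Return the quantization type typically used for a given target."""
    try:
        return QUANT_FOR_TARGET[target.lower()]
    except KeyError:
        raise ValueError(f"unknown target: {target!r}") from None


def convert_saved_model(saved_model_dir, out_path, target="gpu",
                        representative_dataset=None):
    """Convert a TensorFlow saved model to a quantized .tflite file."""
    quant = quantization_for(target)
    if quant == "int8" and representative_dataset is None:
        raise ValueError("int8 quantization needs a representative_dataset")

    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if quant == "float16":
        converter.target_spec.supported_types = [tf.float16]
    else:
        # Full-integer quantization calibrates on sample inputs.
        converter.representative_dataset = representative_dataset
        converter.target_spec.supported_ops = [
            tf.lite.OpsSet.TFLITE_BUILTINS_INT8
        ]

    with open(out_path, "wb") as f:
        f.write(converter.convert())
```

For the integer path, representative_dataset is a generator function that yields lists of sample input arrays with the model's input shape.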

Implementing your Model in voxl-tflite-server

Once you have a model ready, it can be integrated into voxl-tflite-server via the InferenceHelper class. If your model has the same pre/post-processing as one of the already enabled models, setup is fairly straightforward. Note that voxl-tflite-server uses exact string matches to assign postprocessing functions, so you will need to move your model to /usr/bin/dnn/ on your VOXL and rename it to match the corresponding model (see here for more info). You will also need to do this for the labels file for your model.

Otherwise, you will likely just need to write your own postprocessing function. There are a few examples provided, each with a slightly different use case and a different output tensor to handle. If you are unsure of the output format for your specific model, tools like Netron can be extremely useful for determining input/output specifications. Another great resource is the TFLite examples page, which describes some of the most common TensorFlow Lite tasks and how to implement them. You can then use the string compare logic mentioned above to pass your model through.
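
To make the shape of a postprocessing step concrete, here is a minimal Python sketch for an SSD-style detector. It is not the server's implementation (which is C++), and the function name and threshold default are hypothetical. It assumes the four outputs of the TFLite_Detection_PostProcess op: boxes as normalized [y_min, x_min, y_max, x_max], class indices, scores, and a detection count. The output keys mirror the fields of ai_detection_t. Label indexing can differ between models, so check your labels file.

```python
from typing import Dict, List, Sequence


def ssd_to_detections(
    boxes: Sequence[Sequence[float]],
    class_ids: Sequence[float],
    scores: Sequence[float],
    count: float,
    labels: Sequence[str],
    img_w: int,
    img_h: int,
    threshold: float = 0.5,  # hypothetical default
) -> List[Dict]:
    """Turn raw SSD output tensors into pixel-space detections."""
    detections = []
    for i in range(int(count)):
        if scores[i] < threshold:
            continue  # drop low-confidence detections
        y_min, x_min, y_max, x_max = boxes[i]
        cid = int(class_ids[i])
        name = labels[cid] if 0 <= cid < len(labels) else "unknown"
        detections.append({
            "class_id": cid,
            "class_name": name,
            "detection_confidence": float(scores[i]),
            # Scale normalized coordinates to image pixels.
            "x_min": x_min * img_w,
            "y_min": y_min * img_h,
            "x_max": x_max * img_w,
            "y_max": y_max * img_h,
        })
    return detections
```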

The inference_worker function will likely need to be updated as well, since this is where the models are processed.

Running Multiple Models

On VOXL 2, we expose some extra parameters to enable running multiple instances of voxl-tflite-server simultaneously. Doing so will require the following configuration file modifications:

  • first set allow_multiple to true
  • then, to prevent overwriting pipe names, set the output_pipe_prefix to something unique (relative to the other instances of the tflite-server you will be running)
  • finally, you can start the server via a custom service file (see the base file for reference here) or via shell as normal
  • before starting another instance, update the output_pipe_prefix to prevent overwriting, and continue as before.

NOTE: The board will heat up much more quickly when running multiple models. Adding an external fan when not in flight can help combat the extra heat.
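
Before starting several instances, it can help to check that the chosen prefixes do not produce clashing pipe names. The sketch below is illustrative only: it assumes the prefix is prepended directly to the default names (tflite, tflite_data), and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

DEFAULT_PIPES = ("tflite", "tflite_data")


def output_pipes(prefix: str = "") -> List[str]:
    """Pipe names for one instance.
    ASSUMPTION: the prefix is prepended directly to the default names."""
    return [prefix + name for name in DEFAULT_PIPES]


def find_collisions(prefixes: Iterable[str]) -> List[str]:
    """Return pipe names that more than one instance would create."""
    counts = Counter(p for prefix in prefixes for p in output_pipes(prefix))
    return sorted(name for name, n in counts.items() if n > 1)


print(find_collisions(["hires_", "tracking_"]))  # []
print(find_collisions(["hires_", "hires_"]))     # ['hires_tflite', 'hires_tflite_data']
```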



Source code is available on GitLab

Frequently Asked Questions (FAQ)

I want to use my Python script to pre/post-process inputs/outputs to the model. How can I do this?
The model's inputs and outputs are both handled through libmodal-pipe, which only supports C/C++, so there isn’t a way to interface directly with Python. However, it’s possible to use libmodal-pipe to create a very simple wrapper that consumes from a pipe, runs a Python executable, and then passes information along to another pipe. In the libmodal-pipe repo, check out the examples dir for an example of setting up a basic pipe interface on VOXL.

Why is my model predicting less accurately when deployed on VOXL?
If you’re passing it to voxl-tflite-server, some preprocessing and postprocessing is done on the image tensors, which may differ from what you’ve done locally. Quantization can also reduce accuracy compared to a non-quantized model.
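
As a small illustration of why quantization can change results, the sketch below rounds a few values to 16-bit floats and back using Python's struct module. It only shows the per-value rounding error of float16 storage; it is not a measurement of model accuracy, and the sample values are arbitrary.

```python
import struct


def to_float16_and_back(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


for w in (0.1, 0.333, 1234.567):
    q = to_float16_and_back(w)
    print(f"{w!r} -> {q!r} (abs error {abs(w - q):.3g})")
```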