How to Convert Your Custom Model into TensorRT | by Ivan Ralašić | Jun, 2022

Ease your TensorRT pain with TPAT

If you’ve ever worked with TensorRT, then you’ve probably faced an error similar to this one, right?

[E] [TRT] UffParser: Validator error: resize/ResizeNearestNeighbor: Unsupported operation _ResizeNearestNeighbor

In this blog, we’ll show you how to convert your model with custom operators into TensorRT and how to avoid these errors!

TPAT automates the generation of TensorRT plug-ins, so the deployment of TensorRT models can be streamlined and no longer requires manual intervention.

The only inputs TPAT requires are the ONNX model and a name mapping for the custom operators. The TPAT optimization process is based on the TVM deep learning compiler, which performs auto-tuning on fixed-shape operators and automatically generates high-performance CUDA kernels. The necessary CUDA kernels and runtime parameters are written into a TensorRT plugin template and used to generate a dynamic link library, which can be loaded directly into TensorRT at runtime.

TPAT is really a fantastic tool since it offers the following benefits over handwritten plugins and native TensorRT operators:

improved operator coverage: supports all operators of ONNX, TensorFlow, and PyTorch

List of TPAT supported operators (source: https://github.com/Tencent/TPAT/blob/main/docs/Operators.md)

full automation: end-to-end fully automatic generation of user-specified TensorRT plugins

high performance: most operators outperform their handwritten or native TensorRT counterparts

TPAT vs. handwritten plugins performance comparison (source: https://github.com/Tencent/TPAT/blob/main/docs/Compare_handwritten.md)
TPAT vs. native TensorRT plugins performance comparison (source: https://github.com/Tencent/TPAT/blob/main/docs/Optimize_TensorRT.md)

In order to optimize your model using TPAT and TensorRT, and to run it on NVIDIA Jetson AGX Xavier, you should use the following Dockerfile instead of the one contained in the TPAT repo to successfully build the TPAT Docker image.

FROM nvcr.io/nvidia/l4t-tensorflow:r32.4.4-tf1.15-py3

RUN apt-get update && apt-get install build-essential cmake -y

RUN wget -O "clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz" https://github.com/llvm/llvm-project/releases/download/llvmorg-9.0.1/clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz \
    && tar -xvf clang+llvm-9.0.1-aarch64-linux-gnu.tar.xz && mkdir -p /usr/local/llvm/ \
    && mv clang+llvm-9.0.1-aarch64-linux-gnu/* /usr/local/llvm/

RUN python3 -m pip install --upgrade pip
RUN pip3 install buildtools onnx==1.10.0
RUN pip3 install pycuda nvidia-pyindex
RUN apt-get install git -y
RUN pip install onnx-graphsurgeon onnxruntime==1.9.0 tf2onnx xgboost==1.5.2

RUN git clone --recursive https://github.com/Tencent/TPAT.git /workspace/TPAT \
    && cd /workspace/TPAT/3rdparty/blazerml-tvm && mkdir build && cp cmake/config.cmake build && cd build
RUN sed -i 's/set(USE_LLVM OFF)/set(USE_LLVM \/usr\/local\/llvm\/bin\/llvm-config)/g' /workspace/TPAT/3rdparty/blazerml-tvm/build/config.cmake
RUN sed -i 's/set(USE_CUDA OFF)/set(USE_CUDA ON)/g' /workspace/TPAT/3rdparty/blazerml-tvm/build/config.cmake
RUN cd /workspace/TPAT/3rdparty/blazerml-tvm/build/ && cmake .. && make -j8

ENV TVM_HOME="/workspace/TPAT/3rdparty/blazerml-tvm/"
ENV PYTHONPATH="$TVM_HOME/python:${PYTHONPATH}"

You can build the Docker image using the following command:

sudo docker build . -t tpat:master

Note: you should attach external storage and build the Docker image there, since the image itself is quite large and the AGX has limited built-in storage.

After successfully building the image, you can run the Docker container using:

sudo docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it tpat:master

After starting the container, you should make sure that the compute capability set in the TPAT plugin Makefile corresponds to the compute capability of your device. In order to successfully build TPAT plugins on Jetson AGX Xavier, replace -arch=sm_75 with -arch=sm_72 in the Makefile’s nvcc flags.
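
If you are unsure which value to use on your device, you can query the GPU’s compute capability with pycuda, which is already installed by the Dockerfile above. A minimal sketch (device index 0 is assumed):

import pycuda.driver as cuda

cuda.init()
# Query the first CUDA device; Jetson AGX Xavier reports (7, 2), i.e. sm_72.
major, minor = cuda.Device(0).compute_capability()
print(f"compute capability: sm_{major}{minor}")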

Now you should have everything that is needed to generate custom plugins for your model automagically using TPAT!

In order to optimize operators in a model using TPAT on the Jetson AGX Xavier, follow the steps below.

First, you should run the onnx_to_plugin.py script, which expects the following parameters:

usage: onnx_to_plugin.py [-h] -i INPUT_MODEL_PATH -o OUTPUT_MODEL_PATH
                         [-n [NODE_NAMES [NODE_NAMES ...]]]
                         [-t [NODE_TYPES [NODE_TYPES ...]]]
                         [-p PLUGIN_NAME_DICT]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_MODEL_PATH, --input_model_path INPUT_MODEL_PATH
                        Please provide input onnx model path
  -o OUTPUT_MODEL_PATH, --output_model_path OUTPUT_MODEL_PATH
                        Please provide output onnx model path which used for
                        tensorrt
  -n [NODE_NAMES [NODE_NAMES ...]], --node_names [NODE_NAMES [NODE_NAMES ...]]
                        Please provide the operator name that needed to
                        generate tensorrt-plugin
  -t [NODE_TYPES [NODE_TYPES ...]], --node_types [NODE_TYPES [NODE_TYPES ...]]
                        Please provide the operator type that needed to
                        generate tensorrt-plugin
  -p PLUGIN_NAME_DICT, --plugin_name_dict PLUGIN_NAME_DICT
                        Please provide the dict of op name and plugin name
                        that will be generated by TPAT, such as : {"op_name" :
                        "plugin_name"}

We provide an example command which optimizes the loop_function_1/OneHotEncoding/one_hot operator in the model.onnx graph and outputs a model_tpat.onnx graph that contains the optimized tpat_onehot operator:

OPENBLAS_CORETYPE=ARMV8 python3 onnx_to_plugin.py \
  -i "model.onnx" \
  -o "model_tpat.onnx" \
  -p "{\"loop_function_1/OneHotEncoding/one_hot\" : \"tpat_onehot\"}"

The result of running this command is an optimized ONNX graph in which the unsupported operator is replaced by the TPAT-generated one. You can find the TPAT-generated operator’s dynamic library in TPAT/python/trt_plugin/lib/, and it should be named tpat_onehot.so.

Note: you should prepend OPENBLAS_CORETYPE=ARMV8 to the command that runs the TPAT conversion script in order to work around a known OpenBLAS issue that occurs on Jetson AGX Xavier devices.

OneHot operator in model.onnx graph vs. tpat_onehot operator in model_tpat.onnx graph (source: images by author generated using Netron)
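
You can also verify the substitution programmatically. A minimal sketch using the onnx package, assuming (as the Netron comparison above suggests) that the replacement node’s op type matches the plugin name:

import onnx

model = onnx.load("model_tpat.onnx")
# Count the nodes that were rewritten to the TPAT-generated plugin op.
plugin_nodes = [n for n in model.graph.node if n.op_type == "tpat_onehot"]
print(f"found {len(plugin_nodes)} tpat_onehot node(s)")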

trtexec is a tool to quickly utilize TensorRT without having to develop your own application. The trtexec tool has three main purposes:

  • benchmarking networks on random or user-provided input data.
  • generating serialized engines from models.
  • generating a serialized timing cache from the builder.

You can use the following trtexec command to convert a model into TensorRT plan format:

trtexec --onnx=model_tpat.onnx \
--saveEngine=model.plan \
--buildOnly --verbose --fp16 \
--workspace=6000 --noTF32 \
--plugins="./python/trt_plugin/lib/tpat_onehot.so"

Please note that you have to provide the path to your TPAT optimized operators.

After successful conversion of the model, you can use the following command to measure TensorRT model performance:

trtexec --loadEngine=model.plan \
--verbose --workspace=4096 \
--streams=16 --threads \
--plugins="./python/trt_plugin/lib/tpat_onehot.so"
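
If you want to use the serialized engine from your own application instead of trtexec, the plugin library has to be loaded into the process before the engine is deserialized. A minimal sketch using the TensorRT Python API (paths follow the example above; exact API details can vary between TensorRT versions):

import ctypes
import tensorrt as trt

# Load the TPAT-generated plugin so its creator gets registered with TensorRT.
ctypes.CDLL("./python/trt_plugin/lib/tpat_onehot.so")

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")

# Deserialize the engine built by trtexec above.
runtime = trt.Runtime(logger)
with open("model.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

print("engine loaded with", engine.num_bindings, "bindings")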

And that’s it: you’ve used TPAT to convert an operator that isn’t natively supported by TensorRT and built an optimized TensorRT engine. Try out the presented process and share your insights and results in the comments!

We hope you found this blog post useful. Please take a look at some other blogs written by our team at Forsight, and feel free to reach out to us at [email protected] if you have any questions!

