
[Feature Request] Use ONNX Runtime TensorRT execution provider with lean TensorRT runtime #23082

Open
jcdatin opened this issue Dec 11, 2024 · 1 comment
Labels: ep:TensorRT (issues related to TensorRT execution provider), feature request (request for unsupported feature or enhancement)

Comments

jcdatin commented Dec 11, 2024

Describe the feature request

I am running inference with the ONNX Runtime TensorRT Execution Provider by loading a cached TensorRT engine that was previously generated, with libnvinfer_builder_resource.so present, from an original .onnx model.
So at runtime I no longer need the huge 1 GB+ libnvinfer_builder_resource.so, and I can deploy my ONNX Runtime environment without it (tested).
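
For reference, the session setup looks roughly like this; a minimal sketch using the Python API (the deployment itself uses the C++ API, and the model/cache paths are placeholders), with the documented trt_engine_cache_enable / trt_engine_cache_path options of the TensorRT EP:

```python
# Minimal sketch (Python API, placeholder paths): create a session with the
# TensorRT EP and point it at a directory holding a previously built engine,
# so no engine build (and no libnvinfer_builder_resource.so) is needed.
import onnxruntime as ort

trt_options = {
    "trt_engine_cache_enable": True,         # reuse cached engines instead of rebuilding
    "trt_engine_cache_path": "./trt_cache",  # directory holding the prebuilt engine
}
sess = ort.InferenceSession(
    "model.onnx",
    providers=[("TensorrtExecutionProvider", trt_options)],
)
```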

I would also like to avoid deploying the 230 MB libnvinfer.so and ship the 30 MB libnvinfer_lean.so instead. NVIDIA TensorRT says this is possible if we no longer build the TensorRT engine (quoting https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html):

We provide the possibility to install TensorRT in three different modes:
- A full installation of TensorRT, including TensorRT plan file builder functionality. This mode is the same as the runtime provided before TensorRT 8.6.0.
- A lean runtime installation, significantly smaller than the full installation. It allows you to load and run engines built with the version-compatible builder flag, but it does not provide the functionality to build a TensorRT plan file.
- A dispatch runtime installation. This allows for deployments with minimum memory consumption. It can load and run engines built with the version-compatible builder flag and includes the lean runtime, but it does not provide the functionality to build a TensorRT plan file.
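
For context, the lean runtime can only load engines built with that version-compatible flag. Outside of ORT, this is roughly how such an engine is produced with the TensorRT Python API (a sketch assuming TensorRT >= 8.6; file names are placeholders):

```python
# Sketch: build a version-compatible TensorRT plan from an ONNX model, so it
# can later be deserialized by the lean runtime. TensorRT >= 8.6 assumed.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# VERSION_COMPATIBLE marks the plan as loadable by the lean/dispatch runtimes.
config.set_flag(trt.BuilderFlag.VERSION_COMPATIBLE)
plan = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(plan)
```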

However, when I redirect libnvinfer.so to libnvinfer_lean.so as advised, I get the following error from the ONNX Runtime TensorRT Execution Provider:
terminate called after throwing an instance of 'Ort::Exception'
what(): /tmp/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1419 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_tensorrt.so with error: /usr/local/lib/libonnxruntime_providers_tensorrt.so: undefined symbol: createInferBuilder_INTERNAL
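
The missing symbol is consistent with the lean library not exporting the builder entry point that the EP links against unconditionally. A quick probe along these lines illustrates it (the .so names/versions are assumptions for this TRT 8.6 setup):

```python
# Sketch: check both libraries for the builder entry point the EP needs.
# Library file names are assumptions for a TensorRT 8.6 installation.
import ctypes

for lib in ("libnvinfer.so.8", "libnvinfer_lean.so.8"):
    try:
        handle = ctypes.CDLL(lib)
        found = hasattr(handle, "createInferBuilder_INTERNAL")
        print(f"{lib}: createInferBuilder_INTERNAL {'found' if found else 'missing'}")
    except OSError as err:
        print(f"{lib}: could not load ({err})")
# Expected: found in libnvinfer.so.8, missing in libnvinfer_lean.so.8,
# matching the dlopen failure above.
```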

Am I making a mistake, or is the ONNX Runtime TRT EP currently not capable of using libnvinfer_lean.so alone instead of libnvinfer.so? If it is the latter, adding this improvement to ORT would be nice for reducing the ONNX Runtime GPU image footprint.

Describe scenario use case

1. Build a cached TRT engine from an ONNX model with libnvinfer_builder_resource.so deployed.
2. Deploy ONNX Runtime and the cached TRT engine in an environment where libnvinfer.so now points to libnvinfer_lean.so and libnvinfer_builder_resource.so has been removed.
3. Load the cached engine from the ONNX Runtime TRT EP: the engine should be loaded and inference run with libnvinfer_lean.so only (sketched below).
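
The end state of step 3 would look roughly like this; a sketch assuming the lean Python wheel (tensorrt_lean) mirrors the full runtime's deserialization API and that the cached engine was built with the version-compatible flag:

```python
# Sketch of the desired deployment: deserialize and run the cached engine with
# only the lean runtime present; no builder functionality is involved.
# tensorrt_lean and the engine file name are assumptions.
import tensorrt_lean as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("trt_cache/model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
# ... allocate device buffers and call context.execute_v2(...) as usual
```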

ORT version 1.18.1 on Linux SLES 15, compiled with GCC 11 and the --use_tensorrt_oss_parser flag, as in this build command:
CC=gcc-11 CXX=g++-11 ./build.sh \
  --skip_submodule_sync --nvcc_threads 2 \
  --config $ORT_BUILD_MODE --use_cuda \
  --cudnn_home /usr/local/cuda/lib64 \
  --cuda_home /usr/local/cuda/ --use_tensorrt \
  --use_tensorrt_oss_parser --tensorrt_home /usr/local/TensorRT \
  --build_shared_lib --parallel --skip_tests \
  --allow_running_as_root --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75" \
  --cmake_extra_defines "CMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-11"

CUDA 12.4
cuDNN 8.9.7.29
TensorRT 8.6.1

@jcdatin jcdatin added the feature request request for unsupported feature or enhancement label Dec 11, 2024
@github-actions github-actions bot added the ep:TensorRT issues related to TensorRT execution provider label Dec 11, 2024
chilo-ms (Contributor) commented:

Thanks for raising this request.
The TensorRT EP doesn't support the TRT lean runtime yet, but it's in our backlog.
We will discuss this internally.
