How to use onnxruntime with CUDAExecutionProvider in a multithreaded Python server
Describe the issue
How do I use onnxruntime with CUDAExecutionProvider in a multithreaded Python server?
To reproduce
from waitress import serve
from flask import Flask
import onnxruntime
import time

app = Flask(__name__)
session = onnxruntime.InferenceSession("best.onnx", providers=['CUDAExecutionProvider'])

@app.route('/')
def infer_model():
    ...  # request preprocessing elided
    t1 = time.time()
    outputs = session.run(["output"], {"input": img})[0]
    t2 = time.time()
    ts = t2 - t1
    print(ts)
    ...  # postprocessing and response elided

if __name__ == '__main__':
    serve(app, host='0.0.0.0', port=8080)  # case 1
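For reference, a minimal load-generation client of the kind described below might look like the sketch that follows. It is not part of the original report: the URL, thread count, and request count are illustrative assumptions, and although the report mentions POST requests, the route shown above accepts the default GET, so the sketch uses GET.

# Hypothetical load-test client (illustrative only, not from the original report).
import concurrent.futures
import requests

URL = "http://127.0.0.1:8080/"

def hit_endpoint(_):
    # The route above accepts the default GET; switch to requests.post(...)
    # if the real endpoint expects an image payload via POST.
    return requests.get(URL, timeout=30).status_code

if __name__ == "__main__":
    # Fire many requests from several client threads at once.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        statuses = list(pool.map(hit_endpoint, range(64)))
    print(statuses)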
I send requests from multiple client threads (roughly as in the sketch above); in case 1, ts ranges between 0.02 s and 0.4 s. But if I configure serve as follows:
if __name__ == '__main__':
    serve(app, host='0.0.0.0', port=8080, threads=1)  # case 2
then ts is approximately 0.025 s and very consistent.
How can I get results in case 1 that are as smooth and fast as in case 2?
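One possible mitigation to try (an assumption on my part, not an official ONNX Runtime recommendation): keep several waitress worker threads for request handling, but serialize the CUDA call with a lock, since multiple threads calling session.run on the same CUDAExecutionProvider session end up contending for the GPU, which is the likely source of the jitter in case 1. A minimal sketch, reusing the names from the report (best.onnx, "input", "output") and leaving the img preprocessing elided as in the original:

# Sketch of serializing GPU inference while keeping multiple server threads.
import threading
import time

import onnxruntime
from flask import Flask
from waitress import serve

app = Flask(__name__)
session = onnxruntime.InferenceSession("best.onnx", providers=["CUDAExecutionProvider"])
gpu_lock = threading.Lock()

@app.route("/")
def infer_model():
    img = ...  # request preprocessing elided, as in the original report
    t1 = time.time()
    with gpu_lock:  # only one thread touches the CUDA session at a time
        outputs = session.run(["output"], {"input": img})[0]
    ts = time.time() - t1
    print(ts)
    return {"latency_s": ts}  # illustrative response; elided in the original

if __name__ == "__main__":
    serve(app, host="0.0.0.0", port=8080, threads=4)

With the lock, GPU work is effectively serialized much like threads=1, but the remaining waitress threads can still overlap request parsing and I/O; whether that helps overall throughput depends on how much non-GPU work each request does.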
Urgency
No response
Platform
Windows
OS Version
win10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes