Enabling the fusion_quickgelu optimization (i.e., opt_level=1) crashes unexpectedly. The bug is triggered because self.model.get_constant_value(first_mul_node.input[1]) returns None.
Traceback (most recent call last):
File "/share_container/optfuzz/ONNX/bugs/bug4.py", line 8, in <module>
optimized_model = optimizer.optimize_model(model_path, opt_level=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/onnxruntime/build/Linux/Release/onnxruntime/transformers/optimizer.py", line 405, in optimize_model
optimizer = optimize_by_fusion(model, model_type, num_heads, hidden_size, optimization_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/software/onnxruntime/build/Linux/Release/onnxruntime/transformers/optimizer.py", line 260, in optimize_by_fusion
optimizer.optimize(optimization_options)
File "/software/onnxruntime/build/Linux/Release/onnxruntime/transformers/onnx_model_bert.py", line 340, in optimize
self.fuse_gelu()
File "/software/onnxruntime/build/Linux/Release/onnxruntime/transformers/onnx_model_bert.py", line 70, in fuse_gelu
fusion.apply()
File "/software/onnxruntime/build/Linux/Release/onnxruntime/transformers/fusion_base.py", line 71, in apply
self.fuse(node, input_name_to_nodes, output_name_to_node)
File "/software/onnxruntime/build/Linux/Release/onnxruntime/transformers/fusion_quickgelu.py", line 53, in fuse
approximation_value = self.model.get_constant_value(first_mul_node.input[1]).item()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'item'
The linked model uses float16 precision. When you run your provided script, you should see the following message.
This model uses float16 in the graph, use_gpu=False might cause extra Cast nodes. Most operators have no float16 implementation in CPU, so Cast nodes are added to compute them in float32. If the model is intended to use in GPU, please set use_gpu=True. Otherwise, consider exporting onnx in float32 and optional int8 quantization for better performance.
Because you are not setting use_gpu = True when calling optimize_model, the optimizer assumes the model is using float32 precision and inserts extra nodes for casting between precisions. This appears to break a fusion and produce the error that you see.
You can pass use_gpu = True as one of the arguments in optimize_model to fix your error. I was able to successfully run your script once I added it.
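For example, a minimal sketch of the fixed call (the model path and helper name are placeholders, not from the report; optimize_model does accept a use_gpu keyword):

```python
# Sketch of the suggested fix: pass use_gpu=True so the optimizer treats the
# float16 model as GPU-targeted and does not insert the extra Cast nodes
# that break the QuickGelu fusion.
try:
    from onnxruntime.transformers import optimizer
except ImportError:  # onnxruntime may not be installed in this environment
    optimizer = None

def optimize_fp16_model(model_path: str, output_path: str) -> None:
    """Optimize a float16 ONNX model, flagging it as GPU-targeted."""
    optimized = optimizer.optimize_model(model_path, opt_level=1, use_gpu=True)
    optimized.save_model_to_file(output_path)
```

With use_gpu=True the float16/CPU warning no longer applies, and the fusion pass operates on the original graph without the inserted Cast nodes.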
Here's the function signature of optimize_model for reference.
@kunal-vaishnavi
Thank you for the detailed explanation. The warning about poor performance when running float16 on CPU is helpful, but the unexpected crash remains a critical issue. It would be better to harden the implementation, either by fixing the root cause or by guarding the call (e.g., a None check or try-except) so the error is reported gracefully. Thanks again!
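The guard being requested could look like the following sketch. This is not the actual onnxruntime patch, and the helper name is hypothetical; it only illustrates skipping the fusion instead of crashing when the Mul input is not a constant initializer.

```python
# Hypothetical helper showing the None check that would avoid the crash in
# fusion_quickgelu: return None so the caller aborts this fusion match
# instead of calling .item() on a NoneType.
def get_quickgelu_constant(model, first_mul_node):
    """Return the approximation constant, or None if the fusion must be skipped."""
    value = model.get_constant_value(first_mul_node.input[1])
    if value is None:
        return None  # input is not a constant initializer; skip the fusion
    return value.item()
```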
Describe the issue
Enabling the fusion_quickgelu optimization (i.e., opt_level=1) crashes unexpectedly. The bug is triggered because self.model.get_constant_value(first_mul_node.input[1]) returns None.
To reproduce
Step 1: Download the model via this link
Step 2: run the following script:
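The script itself is not reproduced above; based on the traceback (bug4.py), a minimal reproduction would look like this sketch, where the model path is a placeholder for the file downloaded in Step 1:

```python
# Minimal reproduction reconstructed from the traceback; not the original
# bug4.py. Running this against the linked float16 model raises
# AttributeError: 'NoneType' object has no attribute 'item'.
def reproduce(model_path: str):
    """Trigger the crash: optimize the float16 model at opt_level=1."""
    from onnxruntime.transformers import optimizer
    # opt_level=1 runs fuse_gelu -> fusion_quickgelu, which calls
    # get_constant_value(...).item() without checking for None.
    return optimizer.optimize_model(model_path, opt_level=1)
```

Calling reproduce("path/to/downloaded/model.onnx") produces the traceback shown at the top of this issue.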
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
5c1b7cc (latest version at time of reporting)
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response