-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unexpected result of short integer SimdArray division #257
Comments
Vectors of |
This is the fix I applied to std-simd: https://github.com/VcDevel/std-simd/blob/5d66f8f31942c000c329d7865360a3b3db1a85ae/experimental/bits/simd_x86.h#L1428-L1462 |
What really surprised me that division by one should produce an erroneous result. This is the one case where multiplication with the reciprocal should be fine, and correct for all numbers. Instead, with g++ and -Ofast, all values in the result vector are one less than they should be: `#include <Vc/Vc> int main ( int argc , char * argv[] ) v_t lhs { 2 , 3 , 4 , 5 , 6 , 7, 8 , 9} ; auto result = lhs / rhs ; |
vrcpps xmm2, xmm1
vmulps xmm1, xmm2, xmm1
vmulps xmm1, xmm2, xmm1
vaddps xmm2, xmm2, xmm2
vsubps xmm2, xmm2, xmm1
vmulps xmm0, xmm0, xmm2 I agree this is surprising. |
So my error was to assume that forming the reciprocal of 1.0f would yield 1.0f.
I use -Ofast throughout vspline and pv, and so far this has gone well and it's noticeably faster. Most of my code is floating point arithmetic, though. I only noticed this issue because I implemented my own SIMD type (see https://bitbucket.org/kfj/vspline/src/master/simd_type.h) which I use as a stand-in for Vc::SimdArray when the target can't use it, or for fundamentals Vc won't vectorize. So I wrote test code to make sure that all the operator functions work as expected. The test code can run with Vc SimdArrays as well, and that coughed up the problem. |
I guess I should prioritize gather & scatter for you so you can move to std-simd. ;-) |
Hey, you still remember me! |
Sorry if you feel/felt ignored. I was occupied with stupid corner cases like from |
You have my sympathy... What I like about generic programming, like with my simd_type, is that one can get away without much of the nitty-gritty. But I suppose someone has to do it. And I'm also bad at multitasking. So now your code does work with clang++ - I just checked! And, indeed, the test code I wrote a year or so back comes out some 30% faster than the g++ binary - BTW with -Ofast. When I last tried, clang++ just spat out lots of errors. Now this is good news! |
Vc version / revision | Operating System | Compiler & Version | Compiler Flags | Assembler & Version | CPU
1.4.1 | ubuntu 20.04 | g++ 9.3.0 | -Ofast | | Haswell core i5
Testcase
Actual Results
when I compile this with
g++ -Ofast opdiv_bug.cc -lVc
And run it, I get:
$ ./a.out
a.out: opdiv_bug.cc:11: int main(int, char**): Assertion `result [ 0 ] == 2' failed.
Expected Results
I'd expect the assertion to hold. Compiling with -O3 does produce code which does not fail the assertion, and when using 'int' instead of 'short' it holds as well.
I'm not sure what to make of this, and it may not be a Vc problem, but rather an error of the optimizer. The problem first showed when processing C vectors of Vc::SimdArray with clang++, even though clang++ does not produce the error when used for the example above, so it does not seem to be occuring only with g++, but is easier to provoke with it.
Looking at the assembler code produced with -Ofast, there is no division instruction. fast math would use a multiplication with the reciprocal, and the assembler opcode to produce a reciprocal is indeed present, but the result should still be correct, because the reciprocal of one is one and multiplying with it should have no effect. Any ideas?
The text was updated successfully, but these errors were encountered: