Adding wrappers for __riscv_vget* and __riscv_vset* for non-tuple types #2345
Comments
Ooh, thanks for pointing that out! I agree this would be a great improvement. Think of
So to implement Get/Set we would mostly copy Does that make sense? |
Thanks! Let me have a look at it and see whether I can add the GET_VEC based on TRUNC and EXT. |
Regarding the VIRT, RISC-V only provides vget for non-fractional LMULs, and the best implementation of something similar for fractional LMUL is to use vslidedown to extract the upper half, like what we did in the concat operators. What I can get is something similar to the following, though I need to change the SlideDown operator to intrinsics. Here the _GET and _GET_VIRT data types are defined for non-fractional and fractional types.

```cpp
// Halves LMUL. (Use LMUL arg for the source so we can use _TRUNC.)
#define HWY_RVV_GET(BASE, CHAR, SEW, SEWD, SEWH, LMUL, LMULD, LMULH, SHIFT, \
                    MLEN, NAME, OP) \
  template <size_t kIndex> \
  HWY_API HWY_RVV_V(BASE, SEW, LMULH) NAME(HWY_RVV_V(BASE, SEW, LMUL) v) { \
    return __riscv_v##OP##_v_##CHAR##SEW##LMUL##_##CHAR##SEW##LMULH( \
        v, kIndex); /* no AVL */ \
  }
HWY_RVV_FOREACH(HWY_RVV_GET, Get, get, _GET)
#undef HWY_RVV_GET

#define HWY_RVV_GET_VIRT(BASE, CHAR, SEW, SEWD, SEWH, LMUL, LMULD, LMULH, \
                         SHIFT, MLEN, NAME, OP) \
  template <size_t kIndex> \
  HWY_API HWY_RVV_V(BASE, SEW, LMULH) NAME(HWY_RVV_V(BASE, SEW, LMUL) v) { \
    if constexpr (kIndex == 0) { \
      return Trunc(v); \
    } else { \
      static_assert(kIndex == 1); \
      return Trunc( \
          SlideDown(v, Lanes(DFromV<HWY_RVV_V(BASE, SEW, LMUL)>{}) / 2)); \
    } \
  }
HWY_RVV_FOREACH(HWY_RVV_GET_VIRT, Get, _, _GET_VIRT)
#undef HWY_RVV_GET_VIRT
```

What's your idea on this? Thanks! |
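To make the fractional-LMUL fallback concrete, here is a minimal scalar sketch of the same idea: index 0 is a plain truncation, and index 1 slides the upper half down before truncating. This is a portable model, not Highway code; `Vec`, `GetHalf`, and the array-based `Trunc` and `SlideDown` below are hypothetical stand-ins that only mimic the lane movement.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a vector with N lanes (assumption: N is even).
template <std::size_t N>
using Vec = std::array<std::uint8_t, N>;

// Keeps the lower half of the lanes, modelling a Trunc that halves LMUL.
template <std::size_t N>
Vec<N / 2> Trunc(const Vec<N>& v) {
  Vec<N / 2> out{};
  for (std::size_t i = 0; i < N / 2; ++i) out[i] = v[i];
  return out;
}

// Moves lane i + amount to lane i. Lanes with no source are zero in this model.
template <std::size_t N>
Vec<N> SlideDown(const Vec<N>& v, std::size_t amount) {
  Vec<N> out{};
  for (std::size_t i = 0; i + amount < N; ++i) out[i] = v[i + amount];
  return out;
}

// Returns half number kIndex of v: 0 is the lower half, 1 is the upper half.
template <std::size_t kIndex, std::size_t N>
Vec<N / 2> GetHalf(const Vec<N>& v) {
  static_assert(kIndex < 2, "only two halves exist");
  if constexpr (kIndex == 0) {
    return Trunc(v);
  } else {
    return Trunc(SlideDown(v, N / 2));
  }
}
```

The `if constexpr` works here only because `kIndex` is a template argument, so the unused branch is discarded at compile time.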
A suggestion: rather than go through RVV_V and then DFromV, we can construct the D directly via Also, I'm surprised you'd have to create a new _GET category. Because this is functionally the same as Trunc, just with an index, it should be possible to use the existing _TRUNC category, no? |
That makes sense and thanks for the suggestion.
Yes, using
The reason for creating a new I will go work on a pull request with these suggestions. Thanks! |
Hi, I still have questions regarding the Also, I realized that doing |
This is typically taken from a template argument. The idea is to allow users to specify a (power of two) cap on the max number of lanes, similar to avl. For concreteness, we can add a size_t N argument after kIndex. hm, bummer about |
For these It might make sense to let Then we can
Then, in the client code, for example, in the implementation of About the |
I agree that the if constexpr only works if we have a template argument. That could be arranged by adding a D argument, but it sounds like this is anyway not necessary. Your plan (full vectors only for Get/Set) sounds good to me 👍 |
RVV has provided the `__riscv_vset_v_*_*` and `__riscv_vget_v_*_*` intrinsics not only for tuple types but also for vector groups since v0.11. They are usually translated to whole-register move instructions (e.g., `vmv1r`), which are efficient on most microarchitectures and could potentially be eliminated by the register allocator as compilers become more advanced.

These operations are useful when implementing concat operators like `ConcatUpperLower` when LMUL is not fractional. For example, the current `ConcatUpperLower` is implemented with two slide operations. For `V=vuint8m2_t`, each of the two slide operations takes 4 cycles on x280. If we implement it with `vget` and `vset` instead, clang (trunk version) translates it to a program that takes 2 cycles.
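As a rough illustration of the `vget`/`vset` approach, the following portable sketch models an LMUL=2 group as two whole registers and builds `ConcatUpperLower` from one register extract and one register insert. It is a scalar model under stated assumptions, not the Highway implementation: `Reg`, `Group2`, `kLanesPerReg`, and the `Get`/`Set` helpers are hypothetical names, standing in for what `__riscv_vget_v_u8m2_u8m1` and `__riscv_vset_v_u8m1_u8m2` would do on real hardware.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption for the model only: 4 byte lanes per vector register.
constexpr std::size_t kLanesPerReg = 4;
using Reg = std::array<std::uint8_t, kLanesPerReg>;  // models vuint8m1_t
using Group2 = std::array<Reg, 2>;                   // models vuint8m2_t

// Models vget: copies one whole register out of the group.
template <std::size_t kIndex>
Reg Get(const Group2& v) {
  static_assert(kIndex < 2, "LMUL=2 has two registers");
  return v[kIndex];
}

// Models vset: returns dest with one whole register replaced by part.
template <std::size_t kIndex>
Group2 Set(Group2 dest, const Reg& part) {
  static_assert(kIndex < 2, "LMUL=2 has two registers");
  dest[kIndex] = part;
  return dest;
}

// Upper half comes from hi, lower half from lo: start from lo and overwrite
// register 1 with register 1 of hi. No lane-wise slides are involved.
Group2 ConcatUpperLower(const Group2& hi, const Group2& lo) {
  return Set<1>(lo, Get<1>(hi));
}
```

Because each step moves a whole register, this maps naturally onto whole-register moves rather than slides, which is where the expected saving comes from.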
However, I am not sure how to work with all the macros needed to add these operations to Highway. Any ideas or instructions on this?