wjc404

16 followers · 1 following

Beijing, China

Popular repositories

GEMM_AVX512F (Public)

SGEMM and DGEMM subroutines using AVX512F instructions.

C · 12 stars · 1 fork

GEMM_AVX2 (Public)

Fast AVX2/FMA3 DGEMM and SGEMM subroutines for medium to large matrices (larger than 2000×2000) on Haswell, Skylake, and Zen processors, with performance comparable to MKL.

C · 7 stars · 1 fork

Simple_CUDA_GEMM (Public)

SGEMM kernel function for NVIDIA Pascal GPUs, able to reach 60% of theoretical peak performance.

CUDA · 5 stars · 1 fork

GEMM_AVX2_FMA3 (Public archive)

SGEMM and DGEMM subroutines for large matrices that slightly outperform Intel MKL.

C · 1 star · 1 fork

COMPLEX_GEMM_AVX2_FMA3 (Public)

CGEMM and ZGEMM subroutines for large matrices, using AVX2 and FMA3 instructions, with performance comparable to MKL 2018.

C

cpu_gemm_opt (Public)

Forked from carlushuang/cpu_gemm_opt

How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

C++