wjc404

Popular repositories

  1. GEMM_AVX512F Public

    SGEMM and DGEMM subroutines using AVX512F instructions.

    C, 12 stars, 1 fork

  2. GEMM_AVX2 Public

    Fast AVX2/FMA3 DGEMM and SGEMM subroutines for medium to large matrices (larger than 2000×2000) on Haswell, Skylake, and Zen processors, with performance comparable to MKL.

    C, 7 stars, 1 fork

  3. Simple_CUDA_GEMM Public

    SGEMM kernel function for NVIDIA Pascal GPUs, able to reach 60% of theoretical peak performance.

    Cuda, 5 stars, 1 fork

  4. GEMM_AVX2_FMA3 Public archive

    SGEMM and DGEMM subroutines for large matrices that slightly outperform Intel MKL.

    C, 1 star, 1 fork

  5. COMPLEX_GEMM_AVX2_FMA3 Public

    CGEMM and ZGEMM subroutines for large matrices, using AVX2 and FMA3 instructions, with performance comparable to MKL 2018.

    C

  6. cpu_gemm_opt Public

    Forked from carlushuang/cpu_gemm_opt

    How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

    C++