- Huawei 2012 Labs., Compiler Lab.
- Beijing, China
- https://chenglong92.github.io
cutile-python Public
Forked from NVIDIA/cutile-python
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Python Other Updated Dec 10, 2025 -
TileGym Public
Forked from NVIDIA/TileGym
Helpful kernel tutorials and examples for tile-based GPU programming
Python Other Updated Dec 7, 2025 -
LLM4Compiler Public
The Next-generation Innovation of Code Optimization and Compilers in the Era of LLM
BSD 2-Clause "Simplified" License Updated Dec 7, 2025 -
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
C++ Other Updated Nov 24, 2025 -
PASA Public
PASA: Accelerating attention with low precision computing for large models
MIT License Updated Oct 31, 2025 -
chenglong92.github.io Public
This is my personal homepage.
JavaScript MIT License Updated Oct 23, 2025 -
lectures Public
Forked from gpu-mode/lectures
Material for gpu-mode lectures
Jupyter Notebook Apache License 2.0 Updated Jun 18, 2025 -
learn-cuda Public
Forked from gau-nernst/learn-cuda
Learn CUDA with PyTorch
Cuda Updated Jun 8, 2025 -
triton-cpu Public
Forked from triton-lang/triton-cpu
An experimental CPU backend for Triton
MLIR MIT License Updated Jun 3, 2025 -
extension-cpp Public
Forked from pytorch/extension-cpp
C++ extensions in PyTorch
Python Updated Aug 7, 2024 -
latent-diffusion Public
Forked from CompVis/latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Jupyter Notebook MIT License Updated Nov 21, 2023 -
FlashAttention Public
Flash Attention code study for Large Language Models (LLMs).
-
Mixed-precision-Computing Public
Mixed-precision Solver
GNU General Public License v3.0 Updated Aug 27, 2023 -
GiMMiK Public
Forked from PyFR/GiMMiK
Mako BSD 3-Clause "New" or "Revised" License Updated May 21, 2023 -
libxsmm Public
Forked from libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
C BSD 3-Clause "New" or "Revised" License Updated May 20, 2023 -
cpfloat Public
Forked from north-numerical-computing/cpfloat
Custom-Precision Floating-point numbers.
C GNU Lesser General Public License v2.1 Updated Mar 8, 2023 -
TemplateProgrammingCPP Public
Template programming with C++, or more precisely, template metaprogramming.
MIT License Updated Feb 21, 2023 -
OP2-Common Public
Forked from OP-DSL/OP2-Common
OP2: open-source framework for the execution of unstructured grid applications on clusters of GPUs or multi-core CPUs
C++ Other Updated Dec 10, 2022 -
Simple-CFD-Demo-with-DPCPP Public
This repo implements some simple CFD demos using Intel oneAPI (DPC++). The goal is to study the platform portability of CFD code with SYCL and C++.
C++ MIT License Updated Sep 27, 2022 -
chop Public
Forked from higham/chop
Round matrix elements to lower precision in MATLAB
MATLAB BSD 2-Clause "Simplified" License Updated Jun 14, 2022 -
FlatPlateCascadeSourceCode Public
This repository contains source code that computes the gust-cascade interaction in parallel. All modules are parallel except the forcing solver section, which uses LU decomposition on the …
Shell Updated Mar 18, 2022 -
-
Framework for performance-portable parallel computations on unstructured meshes
Python Other Updated Mar 15, 2022 -
microprocessor-trend-data Public
Forked from karlrupp/microprocessor-trend-data
Data repository for my blog series on microprocessor trend data.
Gnuplot Other Updated Feb 22, 2022 -
bempp-acoustic-tutorials Public
Forked from mscroggs/bempp-acoustic-tutorials
Tutorials and exercises for learning to use Bempp-cl for problems in acoustics
Jupyter Notebook MIT License Updated Jul 7, 2021 -
hpl-ai Public
Forked from RIKEN-RCCS/hpl-ai
An HPL-AI implementation for Fugaku
C++ Other Updated Jun 29, 2021 -
CAA-Basic-Test Public
This repository contains some numerical cases for CAA, including 1-D nonlinear wave propagation using the DRP scheme, Gauss-type pulse wave propagation in a 2-D free field, and the problem for CAA Worksho…
-
3D-Tylor-Green-Vortex Public
This code simulates 3D compressible viscous Taylor-Green vortex cases. The solver is written in Fortran 90 with MPICH. 3D domain decomposition is implemented to parallelize th…