Quantization Applications
- Post-training Quantization on Diffusion Models
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
- Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
- Q-DM: An Efficient Low-bit Quantized Diffusion Model
- A Survey on Efficient Inference for Large Language Models
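Several of the papers above build on the same baseline: post-training round-to-nearest quantization of weights to low-bit integers with a per-tensor scale. As a rough illustration (not the method of any specific paper listed), a minimal symmetric int8 quantizer looks like this; the function names and the per-tensor scaling choice are illustrative assumptions:

```python
def quantize_int8(weights):
    """Symmetric round-to-nearest int8 quantization (illustrative sketch).

    Maps floats to integers in [-128, 127] using a single per-tensor
    scale, so that w is approximately q * scale after dequantization.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.05, 0.98]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Methods such as GPTQ, SmoothQuant, and AWQ refine this baseline, e.g. by calibrating against activation statistics or rescaling channels to tame outliers, rather than rounding each weight independently as above.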