Quantization Applications
- Post-training Quantization on Diffusion Models
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
- Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
- Q-DM: An Efficient Low-bit Quantized Diffusion Model
- A Survey on Efficient Inference for Large Language Models
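Several of the papers above build on the same baseline: post-training round-to-nearest quantization of weights to low-bit integers with a per-tensor scale. As a rough illustration (not the method of any specific paper listed), a minimal symmetric int8 quantizer looks like this; the function names and the per-tensor scaling choice are illustrative assumptions:

```python
def quantize_int8(weights):
    """Symmetric round-to-nearest int8 quantization (illustrative sketch).

    Maps floats to integers in [-128, 127] using a single per-tensor
    scale, so that w is approximately q * scale after dequantization.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.05, 0.98]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Methods such as GPTQ, SmoothQuant, and AWQ refine this baseline, e.g. by calibrating against activation statistics or rescaling channels to tame outliers, rather than rounding each weight independently as above.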