Quantization
- A Survey of Quantization Methods for Efficient Neural Network Inference
- Post Training 4-bit Quantization of Convolutional Networks for Rapid-Deployment
- Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
- Up or Down? Adaptive Rounding for Post-Training Quantization
- Automated Log-Scale Quantization for Low-Cost Deep Neural Networks
- BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
- Data-Free Quantization Through Weight Equalization and Bias Correction
- Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
- Deep Learning with Limited Numerical Precision
- Loss Aware Post-training Quantization
- Learnable Companding Quantization for Accurate Low-bit Neural Networks
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- Learned Step Size Quantization (a minimal sketch of its fake-quantization primitive follows this list)
- MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
- Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks
- ZeroQ: A Novel Zero Shot Quantization Framework
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
- PACT: Parameterized Clipping Activation for Quantized Neural Networks
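
Most of the methods above build on one shared primitive: simulated ("fake") uniform quantization, where a tensor is clamped to an integer grid in the forward pass while gradients flow through a straight-through estimator. The sketch below illustrates that primitive with a learnable step size in the spirit of Learned Step Size Quantization (LSQ). The class name, the constant step initialization, and the omission of LSQ's gradient scaling factor are simplifying assumptions for illustration, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class FakeQuantize(nn.Module):
    """Simulated b-bit uniform quantization with a learnable step size,
    in the spirit of LSQ: clamp to the integer grid, round with a
    straight-through estimator, then rescale back to floating point."""

    def __init__(self, bits: int = 4, signed: bool = True):
        super().__init__()
        self.qmin = -(2 ** (bits - 1)) if signed else 0
        self.qmax = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
        # Learnable step size; LSQ initializes it from tensor statistics,
        # a constant is used here only to keep the sketch self-contained.
        self.step = nn.Parameter(torch.tensor(0.1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = torch.clamp(x / self.step, self.qmin, self.qmax)
        # Straight-through estimator: round in the forward pass, identity
        # in the backward pass, so gradients reach both x and the step size.
        q = (torch.round(q) - q).detach() + q
        return q * self.step

# A fake-quantized tensor can stand in anywhere a float tensor would:
x = torch.randn(8, 16, requires_grad=True)
fq = FakeQuantize(bits=4)
fq(x).sum().backward()   # gradients flow to x and to fq.step
```

PACT differs mainly in what is learned (the activation clipping threshold rather than the step size), while the post-training methods above fit the same grid parameters by calibration instead of backpropagation.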
- Quantization Applications
- Post-training Quantization on Diffusion Models
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (its scale-migration rule is sketched after this list)
- LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
- Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
- Q-DM: An Efficient Low-bit Quantized Diffusion Model
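
As one concrete example from this list, SmoothQuant migrates quantization difficulty from activations (which have extreme outlier channels in LLMs) to weights via a per-input-channel scale s_j = max|X_j|^α / max|W_j|^(1−α), which leaves the layer's output mathematically unchanged. The sketch below, with illustrative names and synthetic data, shows the rule in NumPy; in a real deployment the activation division is folded into the preceding operation (e.g. a LayerNorm) rather than applied at runtime.

```python
import numpy as np

def smooth_scales(x, w, alpha=0.5):
    """Per-input-channel smoothing factors from the SmoothQuant migration
    rule: s_j = max|X_j|^alpha / max|W_j|^(1-alpha).
    x: activations [tokens, c_in], w: weights [c_in, c_out]."""
    act_max = np.abs(x).max(axis=0)   # per-channel activation range
    wgt_max = np.abs(w).max(axis=1)   # per-channel weight range
    return act_max ** alpha / wgt_max ** (1.0 - alpha)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 64))
x[:, 3] *= 50.0                       # emulate an activation outlier channel
w = rng.normal(size=(64, 128))

s = smooth_scales(x, w, alpha=0.5)
x_s, w_s = x / s, w * s[:, None]      # mathematically equivalent product...
assert np.allclose(x @ w, x_s @ w_s)  # ...but x_s is far easier to quantize
```

After smoothing, both x_s and w_s have moderate per-channel ranges, so plain per-tensor INT8 quantization of each operand loses much less accuracy than quantizing the original outlier-heavy activations.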