Model Compression
- GhostNet: More Features from Cheap Operations
- MCUNet: Tiny Deep Learning on IoT Devices
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
- TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning
- A Survey of Deep Neural Network Compression and Acceleration (深度神经网络压缩与加速综述)
- BNN-related articles
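Of the general efficiency papers above, GhostNet's title points at a simple structural trick: produce a few feature maps with a normal convolution and generate the rest with cheap depthwise operations. A rough sketch of that idea; the class name `GhostModuleSketch`, the channel split, and the kernel sizes are my own illustrative choices, not settings from the paper.

```python
import torch
import torch.nn as nn

class GhostModuleSketch(nn.Module):
    """Ghost-style block sketch: a small primary convolution produces a few
    "intrinsic" feature maps, and a cheap depthwise convolution generates the
    remaining ("ghost") maps; the two sets are concatenated."""

    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        primary_ch = out_ch // ratio          # maps from the costly conv
        ghost_ch = out_ch - primary_ch        # maps from the cheap op
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True),
        )
        # Depthwise 3x3: one filter group per intrinsic map, far cheaper than
        # producing ghost_ch maps with a full convolution.
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, ghost_ch, kernel_size=3, padding=1,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(ghost_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        primary = self.primary(x)
        ghost = self.cheap(primary)
        return torch.cat([primary, ghost], dim=1)

# Toy usage: 16 -> 32 channels on a 32x32 feature map.
block = GhostModuleSketch(16, 32)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 32, 32, 32])
```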
  - Towards Accurate Binary Convolutional Neural Network
  - BATS: Binary ArchitecTure Search
  - Bayesian Optimized 1-Bit CNNs
  - Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
  - DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
  - Learning Frequency Domain Approximation for Binary Neural Networks
  - High-Capacity Expert Binary Networks
  - Learning Channel-wise Interactions for Binary Convolutional Neural Networks
  - ReCU: Reviving the Dead Weights in Binary Neural Networks
  - Training Binary Neural Networks with Real-to-Binary Convolutions
  - Training Binary Neural Networks through Learning with Noisy Supervision
  - Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
  - WRPN: Wide Reduced-Precision Networks
  - XNOR-Net
  - PokeBNN: A Binary Pursuit of Lightweight Accuracy
  - ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions
  - Bitwise Neural Networks
  - BiT: Robustly Binarized Multi-distilled Transformer
  - BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
  - BoolNet: Minimizing the Energy Consumption of Binary Neural Networks
  - An Empirical study of Binary Neural Networks' Optimisation
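Most of the binarization papers above share one primitive: quantize weights (and often activations) to ±1 with a sign function in the forward pass and pass gradients through a clipped identity in the backward pass (the straight-through estimator). A minimal generic sketch of that primitive, not any single paper's recipe; `BinarizeSTE`, `binary_linear`, and the single scalar scaling factor are my own simplifications.

```python
import torch
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """sign() forward, straight-through estimator backward: gradients pass
    through unchanged where |x| <= 1 and are zeroed elsewhere."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map to {-1, +1}; sign(0) is forced to +1 here (a common convention).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

def binary_linear(x, weight_fp):
    """Binarize real-valued 'latent' weights on the fly and rescale the output
    by their mean magnitude (a simplified, single-scalar version of the
    per-filter scaling used in XNOR-Net-style methods)."""
    w_bin = BinarizeSTE.apply(weight_fp)
    alpha = weight_fp.abs().mean()
    return F.linear(x, w_bin) * alpha

# Toy check that gradients reach the latent full-precision weights.
w = torch.randn(4, 8, requires_grad=True)
out = binary_linear(torch.randn(2, 8), w)
out.sum().backward()
print(w.grad.shape)  # torch.Size([4, 8])
```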
- Deployment
- Knowledge Distillation
- ML System
- NAS
  - Neural Architecture Search for Dense Prediction Tasks in Computer Vision
  - A Generic Graph-Based Neural Architecture Encoding Scheme for Predictor-Based NAS
  - Neural Predictor for Neural Architecture Search
  - NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search
  - A Generic Graph-based Neural Architecture Encoding Scheme with Multifaceted Information
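The predictor-based NAS entries above revolve around one loop: encode architectures, fit a surrogate that predicts accuracy, and use it to rank unseen candidates. A toy sketch under assumed simplifications: a fixed-length one-hot encoding of cell operations (loosely NAS-Bench-201-shaped) and synthetic accuracy labels instead of real benchmark data; real systems use richer graph encoders.

```python
import torch
import torch.nn as nn

NUM_EDGES, NUM_OPS = 6, 5   # assumed cell shape: 6 edges, 5 candidate ops

def encode(arch):
    """Flatten an architecture (one op index per edge) into a one-hot vector."""
    x = torch.zeros(NUM_EDGES, NUM_OPS)
    x[torch.arange(NUM_EDGES), torch.tensor(arch)] = 1.0
    return x.flatten()

# Synthetic "benchmark": random architectures with made-up accuracies.
archs = [torch.randint(NUM_OPS, (NUM_EDGES,)).tolist() for _ in range(256)]
accs = torch.rand(256, 1)

predictor = nn.Sequential(
    nn.Linear(NUM_EDGES * NUM_OPS, 64), nn.ReLU(), nn.Linear(64, 1)
)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
X = torch.stack([encode(a) for a in archs])

for _ in range(200):                       # fit the surrogate
    opt.zero_grad()
    loss = nn.functional.mse_loss(predictor(X), accs)
    loss.backward()
    opt.step()

# Search step: rank unseen candidates and pick the top one to actually train.
candidates = [torch.randint(NUM_OPS, (NUM_EDGES,)).tolist() for _ in range(100)]
scores = predictor(torch.stack([encode(c) for c in candidates])).squeeze(1)
print("candidate to evaluate next:", candidates[scores.argmax().item()])
```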
- Pruning
  - Architecture-Aware Network Pruning for Vision Quality Applications
  - Structured Pruning of Neural Networks with Budget-Aware Regularization
  - ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
  - Revisiting Random Channel Pruning for Neural Network Compression
  - DHP: Differentiable Meta Pruning via HyperNetworks
  - Universally Slimmable Networks and Improved Training Techniques
  - Slimmable Neural Networks
  - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
  - AutoSlim: Towards One-Shot Architecture Search for Channel Numbers
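As background for the structured/channel pruning papers above: the common baseline scores channels and removes the weakest under a budget. A minimal sketch of the classic L1-norm (magnitude) criterion on a single conv layer; `prune_conv_channels` and the keep ratio are illustrative, and rewiring the next layer plus fine-tuning are omitted.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Return a thinner Conv2d keeping the output channels whose filters have
    the largest L1 norm. A real pipeline would also rewire the next layer's
    input channels and fine-tune afterwards."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # Score each filter by the L1 norm of its weights: shape (out_channels,).
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.topk(scores, n_keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(16, 64, 3, padding=1)
thin = prune_conv_channels(conv, keep_ratio=0.25)
print(thin.weight.shape)  # torch.Size([16, 16, 3, 3])
```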
- Quantization
  - A Survey of Quantization Methods for Efficient Neural Network Inference
  - Post training 4-bit quantization of convolutional networks for rapid-deployment
  - Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
  - Up or Down? Adaptive Rounding for Post-Training Quantization
  - Automated Log-Scale Quantization for Low-Cost Deep Neural Networks
  - BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
  - Data-Free Quantization Through Weight Equalization and Bias Correction
  - Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks
  - Deep Learning with Limited Numerical Precision
  - Loss Aware Post-training Quantization
  - Learnable Companding Quantization for Accurate Low-bit Neural Networks
  - LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
  - Learned Step Size Quantization
  - MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
  - Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
  - Trained quantization thresholds for accurate and efficient fixed-point inference of deep neural networks
  - ZeroQ: A Novel Zero Shot Quantization Framework
  - NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
  - PACT: Parameterized Clipping Activation for Quantized Neural Networks
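As background for the post-training quantization papers above, a minimal sketch of uniform affine (asymmetric) quantization with naive min/max calibration, i.e. the baseline that rounding- and reconstruction-based methods in this list improve upon. The function names and the 8-bit setting are illustrative choices of mine.

```python
import torch

def calibrate_affine(x: torch.Tensor, n_bits: int = 8):
    """Pick scale and zero-point from the observed min/max range
    (naive calibration; PTQ papers replace this with smarter choices)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = x.min().item(), x.max().item()
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # keep 0 exactly representable
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point, qmin, qmax

def quantize(x, scale, zero_point, qmin, qmax):
    # Round-to-nearest, then clip to the integer grid.
    return torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

x = torch.randn(1000)
scale, zp, qmin, qmax = calibrate_affine(x, n_bits=8)
x_hat = dequantize(quantize(x, scale, zp, qmin, qmax), scale, zp)
print("mean abs quantization error:", (x - x_hat).abs().mean().item())
```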
- Quantization Applications
  - Post-training Quantization on Diffusion Models
  - SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
  - LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
  - Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
  - Q-DM: An Efficient Low-bit Quantized Diffusion Model
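A sketch of my reading of the scale-migration idea behind SmoothQuant: per-input-channel scales shrink activation outliers and are folded into the following weight matrix, so the matrix product is unchanged while both tensors become easier to quantize. The function name `migrate_scales`, the shapes, and the alpha value are illustrative, not taken from the paper's code.

```python
import torch

def migrate_scales(x: torch.Tensor, w: torch.Tensor, alpha: float = 0.5):
    """Per-input-channel smoothing sketch: divide activations by s and multiply
    the matching weight rows by s, so x @ w is unchanged but activation
    outliers shrink before quantization. x: (tokens, c_in), w: (c_in, c_out);
    alpha balances how much quantization difficulty moves to the weights."""
    act_max = x.abs().amax(dim=0)          # per input channel, over tokens
    w_max = w.abs().amax(dim=1)            # per input channel, over outputs
    s = (act_max.pow(alpha) / w_max.pow(1 - alpha)).clamp(min=1e-5)
    return x / s, w * s.unsqueeze(1)

x = torch.randn(32, 64)
x[:, 3] *= 50.0                            # fake an outlier channel
w = torch.randn(64, 16)
x_s, w_s = migrate_scales(x, w)
print(torch.allclose(x @ w, x_s @ w_s, atol=1e-3))    # same product (up to float error)
print(x.abs().max().item(), x_s.abs().max().item())   # outlier reduced
```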