ML Systems
- MCUNet: Tiny Deep Learning on IoT Devices
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
- TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning
- A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms
- Precious: Resource-Demand Estimation for Embedded Neural Network Accelerators
- A Learned Performance Model for Tensor Processing Units
- Learned TPU Cost Model for XLA Tensor Programs
- Efficient Mixed-Precision Large Language Model Inference with TurboMind