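The authors are Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Chengpeng Chen of Megvii. They sure love entering competitions... as you would expect from people who make money from algorithms (can they actually make money, though?).
Summary: This paper is the SOTA on SIDD and is worth some reference (the recipe is basically "just stack more modules"). It proposes the HIN block, which in the middle of the block passes half of the channels through Instance Normalization and leaves the other half untouched. HIN blocks and ResBlocks are plugged into a U-Net, two such U-Nets are chained as two stages to form one large network, and CSFF and SAM pass features and attention maps between the two networks.
- Instance Normalization is added inside the proposed HIN block, two U-Nets are chained as two stages into one large network with CSFF and SAM modules applied in between, and the result reaches SOTA performance on SIDD.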
1 Introduction
- BN and IBN in classification;
- Layer Normalization (DETR) and GroupNorm (FCOS) in detection;
- Instance Normalization in style/domain transfer.
2.1 Normalization in low-level computer vision tasks
2.2 Architectures for Image Restoration
- single-stage methods (complex architecture design):
  - Densely residual laplacian super-resolution, CVPR20
  - Density-aware single image de-raining using a multi-stream dense network, CVPR18
- multi-stage methods (decompose the original network into simple, small networks):
  - [Lightweight pyramid networks for image deraining, TNNLS19] introduces the mature Gaussian-Laplacian image pyramid decomposition technique into the neural network and uses a relatively shallow network to handle the learning problem at each pyramid level;
  - [Progressive image deraining networks: A better and simpler baseline, CVPR19] proposes a progressive recurrent network that repeatedly unfolds a shallow ResNet and introduces a recurrent layer to exploit the dependencies of deep features across stages;
  - [Deep stacked hierarchical multi-patch network for image deblurring, CVPR19] proposes a deep stacked hierarchical multi-patch network. Each level focuses on a different scale of the blur, and the finer level contributes its residual image to the coarser level;
  - [Multi-stage progressive image restoration, arxiv21] proposes a multi-stage progressive image restoration architecture with two encoder-decoder subnetworks and one original-resolution subnetwork, plus a supervised attention module (SAM) and a cross-stage feature fusion (CSFF) module between every two stages to enrich the features of the next stage.
3 Approach
3.1 HINet
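HINet consists of two subnetworks, each of which is a U-Net. Each subnetwork starts with a 3×3 convolution layer that extracts the initial features; its encoder is made up of 5 HIN blocks and its decoder of 5 Res blocks. Between the two subnetworks, a cross-stage feature fusion (CSFF) module and a supervised attention module (SAM) are used. The former fuses the features of different scales from the previous stage to enrich the features of the next stage. The latter passes the more useful information extracted by the previous stage on to the next stage, while the less informative features are masked out by the attention masks.
The HIN block is shown below: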
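To make the "normalize half of the channels" step concrete, here is a minimal, framework-free sketch of only that step. It is not the authors' implementation: the function names `instance_norm` and `half_instance_norm` are made up for illustration, the convolutions, activations, and shortcut of the full block are left out, the learnable scale/shift of Instance Normalization is omitted, and normalizing the first half (rather than the second) is an assumption.

```python
import math

def instance_norm(channel, eps=1e-5):
    """Normalize one channel of one image (a flat list of floats)
    to zero mean and unit variance. No learnable scale/shift here."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return [(v - mean) / math.sqrt(var + eps) for v in channel]

def half_instance_norm(features, eps=1e-5):
    """features: the C channels of a single image, each a flat list of
    H*W floats. The first C // 2 channels are instance-normalized
    (assumed choice of half); the rest pass through unchanged."""
    half = len(features) // 2
    normed = [instance_norm(ch, eps) for ch in features[:half]]
    kept = [list(ch) for ch in features[half:]]
    return normed + kept

# Toy feature map: 4 channels, 4 pixels each.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 10.0, 30.0, 30.0],
    [5.0, 6.0, 7.0, 8.0],
    [0.0, 0.0, 0.0, 1.0],
]
out = half_instance_norm(features)
```

In this toy example the first two channels of `out` end up with roughly zero mean and unit variance, while the last two keep their original values.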
- A PSNR loss is used; it looks like each stage outputs the removed noise (i.e. a residual)?
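A small sketch of what such a loss could look like, under the assumption guessed at above: each stage predicts a residual that is added back to the noisy input, and the loss is the negative sum of the per-stage PSNR values. The names `psnr` and `psnr_loss`, the flat-list image representation, and the `max_val=1.0` pixel range are illustrative choices, not taken from the paper's code.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flat lists of pixels."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def psnr_loss(noisy, residuals, target):
    """Negative sum of PSNR over stages.

    noisy:     degraded input image (flat list of floats)
    residuals: one predicted residual per stage (assumed output format)
    target:    clean ground-truth image
    """
    total = 0.0
    for residual in residuals:
        restored = [x + r for x, r in zip(noisy, residual)]
        total += psnr(restored, target)
    return -total

noisy = [0.5, 0.7]
target = [0.4, 0.6]
# Stage 1 predicts nothing, stage 2 removes half of the error.
residuals = [[0.0, 0.0], [-0.05, -0.05]]
loss = psnr_loss(noisy, residuals, target)
```

Minimizing this value maximizes the PSNR of every stage's restored image, so both stages are supervised directly.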
4 Experiments
- A benchmark on GoPro (deblur) that may come in handy in the future:
- A deraining benchmark that may likewise be useful:
4.4 Extension to HINet
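- Wider, Deeper: the HINet actually used is 2x wide, with two extra ResBlocks added to each of the encoder and the decoder (why not add HIN blocks? presumably because they did not work as well).
- Test Time Augmentation and Ensemble: augmentation is also applied at test time, and 3 models are ensembled, with their results averaged.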
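The test-time augmentation plus ensemble idea can be sketched as follows. These notes do not say which augmentations were used, so horizontal and vertical flips are assumed here; the models are stand-in callables on 2D lists, and all names (`predict_tta_ensemble`, `TRANSFORMS`, `brighten`) are hypothetical.

```python
def identity(img):
    return img

def hflip(img):
    """Mirror each row (left-right flip) of a 2D list image."""
    return [row[::-1] for row in img]

def vflip(img):
    """Reverse the row order (up-down flip) of a 2D list image."""
    return img[::-1]

# (transform, inverse) pairs; a flip is its own inverse.
TRANSFORMS = [(identity, identity), (hflip, hflip), (vflip, vflip)]

def average(images):
    """Pixel-wise mean of equally sized 2D list images."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(im[i][j] for im in images) / n for j in range(w)]
            for i in range(h)]

def predict_tta_ensemble(models, img):
    """Run every model on every augmented view, undo the augmentation
    on each output, and average all predictions."""
    outputs = []
    for model in models:
        for forward, inverse in TRANSFORMS:
            outputs.append(inverse(model(forward(img))))
    return average(outputs)

def brighten(img):
    """Stand-in 'model' that adds a constant offset to every pixel."""
    return [[v + 0.2 for v in row] for row in img]

img = [[1.0, 2.0], [3.0, 4.0]]
result = predict_tta_ensemble([identity, brighten], img)
```

With the two stand-in models above, each pixel of `result` is the input pixel plus about 0.1, the mean of the two models' offsets.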