2021/4/16 Results Feedback and Discussion Notes
Channel Widening Exp
- Experimental setup (?): XNOR-ResNet18, CIFAR10 dataset, 200-epoch run
- Accuracy computation: arithmetic mean of the validation accuracy over the last 10 epochs.

- Conclusion: widening channels improves accuracy up to a point, but the accuracy of FP ResNet18 remains hard to reach, and overly wide channels hurt accuracy.
- Possible cause: optimizer settings (e.g., weight decay); there may be overfitting.
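The reported metric above (mean validation accuracy over the last 10 of 200 epochs) can be sketched as follows; the per-epoch values are placeholders, not numbers from the actual XNOR-ResNet18 run:

```python
import numpy as np

# Placeholder per-epoch validation accuracies for a 200-epoch run
# (illustrative only; not the real experiment's curve).
val_acc = np.linspace(0.50, 0.85, 200)

# Reported accuracy: arithmetic mean over the final 10 epochs.
reported_acc = float(np.mean(val_acc[-10:]))
```

Averaging the tail of the curve smooths out epoch-to-epoch noise in the validation accuracy, which matters for binarized networks whose validation curves tend to be jumpy.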
BinaryDuo CDG Validation
- Experimental setup: 3-layer MLP (32/32/32), inputs drawn from N(0,1) with shape (1000000, 32), epsilon = 0.1

raw data:  
s_full = [1.0000, 1.0000, 1.0000, 1.0000]  
s_bina = [0.8934, 0.8968, 0.9629, 0.9046]  
s_tern = [0.9653, 0.9874, 0.9933, 0.9747]  
s_2_b =  [0.9752, 0.9755, 0.9847, 0.9747]  
- Experimental setup: 3-layer MLP (32/32/32), inputs drawn from N(0,1) with shape (1000000, 32), epsilon = 0.01

raw data:  
s_full = [1.0000, 1.0000, 1.0000, 1.0000]
s_bina = [0.8468, 0.8430, 0.9488, 0.9205]
s_tern = [0.9586, 0.9841, 0.9945, 0.9672]
s_2_b =  [0.9009, 0.9419, 0.9516, 0.9303]
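One way to read the s_* rows above is as cosine similarities between an estimated gradient and the analytic full-precision gradient. Below is a minimal sketch of the s_full case only, with a single linear layer standing in for the 3-layer MLP (the toy objective and all names are illustrative): a coordinate-wise finite-difference gradient with step epsilon should match the analytic gradient of a smooth loss almost exactly, giving similarity ~1.0. The binary/ternary/2-bit rows would substitute the corresponding quantized forward pass into `loss`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the experiment: inputs ~ N(0, 1), as in the setup above,
# but a single linear layer and a squared loss (illustrative assumptions).
x = rng.normal(size=(1000, 32))
w = rng.normal(size=32)
y = rng.normal(size=1000)

def loss(weights):
    r = x @ weights - y
    return 0.5 * np.mean(r ** 2)

def analytic_grad(weights):
    # exact gradient of the full-precision loss
    return x.T @ (x @ weights - y) / len(y)

def fd_grad(weights, eps):
    # central finite difference along each coordinate, step eps
    g = np.zeros_like(weights)
    for i in range(weights.size):
        e = np.zeros_like(weights)
        e[i] = eps
        g[i] = (loss(weights + e) - loss(weights - e)) / (2 * eps)
    return g

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s_full = cos_sim(fd_grad(w, 0.1), analytic_grad(w))  # ~1.0 on a smooth loss
```

For a quadratic loss the central difference is exact up to floating point, which matches the s_full row being 1.0000 at both epsilon values.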
Prof. Chen:
- loss descent
- NES (cheap; apply to a real model (ResNet18 on CIFAR): CDG as upper bound, STE as lower bound)
- loss descent plot (variance/mean)
- converges lower; fewer samples yet better estimation
- compare estimators (fast loss descent with few samples)
 
- Expected deliverable: an algorithm (what makes our setting special: gradients are available, which traditional algorithms such as NES lack)
- ResNet18 training accuracy -> overfitting?
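The NES estimator mentioned above can be sketched as follows; the toy quadratic objective, sample count, and step size are illustrative assumptions, not the ResNet18 setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def nes_grad(f, theta, sigma=0.1, n=64):
    # NES-style gradient estimate with antithetic sampling:
    # g ~= E[(f(theta + sigma*eps) - f(theta - sigma*eps)) / (2*sigma) * eps]
    g = np.zeros_like(theta)
    for _ in range(n):
        eps = rng.normal(size=theta.shape)
        g += (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2.0 * sigma) * eps
    return g / n

# Toy smooth objective (illustrative): squared distance to a fixed target.
target = np.ones(8)
f = lambda th: float(np.sum((th - target) ** 2))

theta = np.zeros(8)
for _ in range(200):
    theta = theta - 0.05 * nes_grad(f, theta)
# f(theta) should now be small: the noisy estimate still descends the loss.
```

Plotting f(theta) per step for NES versus an analytic-gradient run (mean/variance over seeds) would give exactly the loss-descent comparison suggested above, with CDG as the upper bound and STE as the lower bound on a model where both are computable.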
TO-DO
- WRPN related:
- Try the 2x channels / 3x channels settings and look at training accuracy and training loss (does overfitting appear? Prof. Chen says accuracy should strictly improve as channels widen?)
- NES related:
- Implement NES and see how it does on a toy model; plot the loss curve over batches, with CDG (an estimate of the true gradient) as the upper bound and STE as the baseline to compare against (or lower bound) (I haven't read the paper yet…)
- Apply NES to ResNet (CDG should be too expensive to run here, so compare directly against STE?)
- Framework related:
- Compare awnas with the current implementation's training setting to find out why training fails to reach good accuracy
- Make the interfaces a bit cleaner
- BMXNet baseline related:
- Is a GPU available to launch the model/lr runs for the tuned intra-block structure?