[Figure: panels for Dim = 16, 32, 64]
[Figure: one panel per combination of Dim = 8, 16, 32, 64 and Sample = 500, 5000, 50000]
Experiment setup:
Problem:
The NES gradient does not appear to point in a direction that decreases the loss.
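One way to check this claim is to compare the NES estimate against a known gradient on a toy loss. The sketch below is not the setup used in these notes: it assumes an antithetic NES estimator with Gaussian perturbations, a quadratic toy loss, and that Dim and Sample refer to the parameter dimension and the number of perturbations. All function names (nes_gradient, cosine, quadratic_loss, quadratic_grad) are hypothetical.

```python
import math
import random


def nes_gradient(loss, theta, sigma=0.1, num_samples=500, rng=None):
    """Antithetic NES estimate of the gradient of `loss` at `theta`.

    Assumption: Gaussian perturbations and antithetic pairs; the notes do
    not say which NES variant was actually used.
    """
    rng = rng or random.Random(0)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = loss([t + sigma * e for t, e in zip(theta, eps)])
        minus = loss([t - sigma * e for t, e in zip(theta, eps)])
        # Finite-difference weight for this perturbation direction.
        scale = (plus - minus) / (2.0 * sigma * num_samples)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return grad


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def quadratic_loss(theta):
    """Toy loss with a known gradient (an assumption for illustration)."""
    return sum(t * t for t in theta)


def quadratic_grad(theta):
    return [2.0 * t for t in theta]


theta = [1.0] * 8  # Dim = 8
est = nes_gradient(quadratic_loss, theta, num_samples=500)  # Sample = 500

# A cosine near 1 means the estimate agrees with the true gradient.
cos = cosine(est, quadratic_grad(theta))

# A small step against the estimate should lower the loss.
step = 1e-2
stepped = [t - step * g for t, g in zip(theta, est)]
descends = quadratic_loss(stepped) < quadratic_loss(theta)
```

If the cosine stays near zero or the loss does not drop on the real model as Sample grows, that would point to the estimator or the loss surface rather than to sampling noise alone.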
[Figure: panels for Dim = 8, 16, 32, 64]
[Figure: one panel per combination of Dim = 8, 16, 32, 64 and Sample = 500, 5000, 50000]
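Experiment setup:
ABOLISHED
Layer 1 (counting from the input), kernel size = 8*8*3*3
Layer 8 (counting from the input), kernel size = 16*16*3*3
Second-to-last layer (counting from the input), kernel size = 64*64*3*3
Layer 1 (counting from the input), kernel size = 16*16*3*3
Layer 8 (counting from the input), kernel size = 32*32*3*3
Second-to-last layer (counting from the input), kernel size = 256*256*3*3
Absent
Layer 1 (counting from the input), kernel size = 8*8*3*3
Layer 8 (counting from the input), kernel size = 16*16*3*3
Second-to-last layer (counting from the input), kernel size = 64*64*3*3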
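Using torch.abs(torch.max(A)) / torch.abs(torch.min(A)), map the range of A to [-1, 1], then add 1 and divide by 2 to obtain A in [0, 1] as prob. Set a temperature and draw relaxed Bernoulli samples, which take values in [0, 1]; subtract 0.5 and apply sign to obtain the stochastic rounding activation.
batch size = 640k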
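The stochastic rounding activation described above can be sketched in PyTorch as follows. The notes do not say exactly how the two absolute values are combined, so this sketch assumes A is divided by the larger of the two; the function name stochastic_sign and the default temperature are also assumptions.

```python
import torch
from torch.distributions import RelaxedBernoulli


def stochastic_sign(A, temperature=0.5):
    """Stochastic rounding of A to +/-1 via relaxed Bernoulli sampling."""
    # Assumption: scale by the larger of |max(A)| and |min(A)| so that
    # A lands in [-1, 1].
    scale = torch.max(torch.abs(torch.max(A)), torch.abs(torch.min(A)))
    A_unit = A / scale.clamp_min(1e-12)

    # Shift and rescale to [0, 1] so the values can serve as probabilities.
    prob = ((A_unit + 1.0) / 2.0).clamp(0.0, 1.0)

    # Relaxed Bernoulli samples lie in [0, 1]; rsample keeps the draw
    # reparameterized with respect to prob.
    dist = RelaxedBernoulli(torch.tensor(temperature), probs=prob)
    sample = dist.rsample()

    # Centre at zero and take the sign to get the binary activation.
    return torch.sign(sample - 0.5)


torch.manual_seed(0)
A = torch.randn(4, 3)
out = stochastic_sign(A)
```

Note that torch.sign has zero gradient almost everywhere, so training through this activation would need something extra such as a straight-through estimator; the notes do not say whether one is used.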
batch size = 256
batch size = 256
Mr. Chen:
Fei-ge (妃哥):