Essence

An extension of ResNet's cross-layer connection design.

The main difference from ResNet: in DenseNet the output of module B is not added to the output of module A as in ResNet, but concatenated with it along the channel dimension. With this design, the output of module A is fed directly into every layer after module B, which is why the architecture is called "densely connected".
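A minimal sketch of the difference, assuming a toy input X and a toy convolution blk standing in for module B (names here are illustrative only):

import torch
from torch import nn

X = torch.rand(1, 3, 8, 8)                        # toy input with 3 channels
blk = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in for module B, keeps the spatial size
Y = blk(X)

res_out = X + Y                       # ResNet: element-wise addition, shapes must match
dense_out = torch.cat((X, Y), dim=1)  # DenseNet: concatenation along the channel dimension
print(res_out.shape)    # torch.Size([1, 3, 8, 8])
print(dense_out.shape)  # torch.Size([1, 6, 8, 8])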

Dense block

A dense block consists of multiple conv_blocks, each using a "batch normalization, activation, convolution" structure. The number of output channels of each conv_block controls how much the dense block's output channel count grows relative to its input, so it is also called the growth rate.
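As a quick check of how the growth rate works (a sketch that assumes the DenseBlock class and conv_block function from the complete code below; the concrete numbers are only for illustration), a block with 2 conv_blocks, a 3-channel input, and a growth rate of 10 outputs 3 + 2 × 10 = 23 channels:

blk = DenseBlock(2, 3, 10)   # 2 conv_blocks, 3 input channels, growth rate 10
X = torch.rand(4, 3, 8, 8)
Y = blk(X)
print(Y.shape)               # torch.Size([4, 23, 8, 8]): 3 + 2 * 10 = 23 channels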

DenseNet

DenseNet first uses the same single convolutional layer and max pooling layer as ResNet.

Next, analogous to the 4 residual blocks in ResNet, it uses 4 dense blocks, each containing 4 convolutional layers; the number of output channels of each convolutional layer in a dense block (i.e., the growth rate) is set to 32. A transition layer between consecutive dense blocks halves the height, width, and number of channels.

Finally, a global average pooling layer and a fully connected layer produce the output.
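Tracing the channel count through this design (derived from the code below, starting from 64 channels with a growth rate of 32): 64 → 64 + 4×32 = 192 after dense block 1 → 96 after the transition layer → 224 → 112 → 240 → 120 → 248 after dense block 4, so the final fully connected layer maps 248 features to the 10 classes. A short check of the per-layer output shapes, assuming the net built in the code below:

X = torch.rand(1, 1, 96, 96)  # dummy input at the size used below (resize=96)
for name, layer in net.named_children():
    X = layer(X)
    print(name, 'output shape:', X.shape)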

Getting the data and training the model

Train DenseNet on the Fashion-MNIST dataset.

The complete code follows.

import time
import torch
from torch import nn, optim
import torch.nn.functional as F

import sys

sys.path.append("..")
import d2lzh_pytorch as d2l

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def conv_block(in_channels, out_channels):
    # "batch normalization, activation, convolution" structure
    blk = nn.Sequential(nn.BatchNorm2d(in_channels),
                        nn.ReLU(),
                        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
    return blk


class DenseBlock(nn.Module):
    def __init__(self, num_convs, in_channels, out_channels):
        super(DenseBlock, self).__init__()
        net = []
        for i in range(num_convs):
            # each conv_block sees the original input plus all previous outputs
            in_c = in_channels + i * out_channels
            net.append(conv_block(in_c, out_channels))
        self.net = nn.ModuleList(net)
        # total channels after concatenating every conv_block output
        self.out_channels = in_channels + num_convs * out_channels

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # concatenate input and output along the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X


def transition_block(in_channels, out_channels):
    # 1x1 convolution reduces the number of channels;
    # average pooling halves the height and width
    blk = nn.Sequential(nn.BatchNorm2d(in_channels),
                        nn.ReLU(),
                        nn.Conv2d(in_channels, out_channels, kernel_size=1),
                        nn.AvgPool2d(kernel_size=2, stride=2))
    return blk


# same single convolutional layer and max pooling layer as ResNet
net = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                    nn.BatchNorm2d(64), nn.ReLU(),
                    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
num_channels, growth_rate = 64, 32  # num_channels: current channel count
num_convs_in_dense_blocks = [4, 4, 4, 4]

for i, num_convs in enumerate(num_convs_in_dense_blocks):
    DB = DenseBlock(num_convs, num_channels, growth_rate)
    net.add_module("DenseBlock_%d" % i, DB)
    num_channels = DB.out_channels
    # a transition layer between dense blocks halves channels, height and width
    if i != len(num_convs_in_dense_blocks) - 1:
        net.add_module("transition_block_%d" % i,
                       transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2

net.add_module("BN", nn.BatchNorm2d(num_channels))
net.add_module("relu", nn.ReLU())
net.add_module("global_avg_pool", d2l.GlobalAvgPool2d())  # output: (batch, num_channels, 1, 1)
net.add_module("fc", nn.Sequential(d2l.FlattenLayer(),
                                   nn.Linear(num_channels, 10)))

batch_size = 256
# if an "out of memory" error occurs, reduce batch_size or the resize value
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size,
                                                    resize=96)
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer,
              device, num_epochs)

Results