pytorch-handbook/chapter2/2.4-cnn.ipynb
2021-08-07 23:08:20 +08:00

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.0.0'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import torch\n",
"torch.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2.4 卷积神经网络简介\n",
"卷积神经网络由一个或多个卷积层和顶端的全连通层也可以使用1x1的卷积层作为最终的输出组成一种前馈神经网络。一般的认为卷积神经网络是由Yann LeCun大神在1989年提出的LeNet中首先被使用但是由于当时的计算能力不够并没有得到广泛的应用到了1998年Yann LeCun及其合作者构建了更加完备的卷积神经网络LeNet-5并在手写数字的识别问题中取得成功LeNet-5的成功使卷积神经网络的应用得到关注。LeNet-5沿用了LeCun (1989) 的学习策略并在原有设计中加入了池化层对输入特征进行筛选 。LeNet-5基本上定义了现代卷积神经网络的基本结构其构筑中交替出现的卷积层-池化层被认为有效提取了输入图像的平移不变特征使得对于特征的提取前进了一大步所以我们一般的认为Yann LeCun是卷积神经网络的创始人。\n",
"\n",
"2006年后随着深度学习理论的完善尤其是计算能力的提升和参数微调fine-tuning等技术的出现卷积神经网络开始快速发展在结构上不断加深各类学习和优化理论得到引入2012年的AlexNet、2014年的VGGNet、GoogLeNet 和2015年的ResNet,使得卷积神经网络几乎成为了深度学习中图像处理方面的标配。\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.4.1 为什么要用卷积神经网络\n",
"对于计算机视觉来说每一个图像是由一个个像素点构成每个像素点有三个通道分别代表RGB三种颜色(不计算透明度)我们以手写识别的数据集MNIST举例每个图像的是一个长宽均为28channel为1的单色图像如果使用全连接的网络结构网络中的神经与相邻层上的每个神经元均连接那就意味着我们的网络有28 * 28 =784个神经元RGB3色的话还要*3hidden层如果使用了15个神经元需要的参数个数(w和b)就有28 * 28 * 15 * 10 + 15 + 10=117625个这个数量级到现在为止也是一个很恐怖的数量级一次反向传播计算量都是巨大的这还展示一个单色的28像素大小的图片如果我们使用更大的像素计算量可想而知。"
]
},
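{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter arithmetic above is easy to check in PyTorch, and comparing it with a small convolutional layer shows why weight sharing helps (a minimal sketch; the layer sizes follow the text, and the choice of 15 5x5 kernels is only an illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch.nn as nn\n",
"\n",
"# fully connected: 784 -> 15 -> 10, weights plus biases\n",
"fc = nn.Sequential(nn.Linear(28 * 28, 15), nn.Linear(15, 10))\n",
"print(sum(p.numel() for p in fc.parameters()))   # 11935\n",
"\n",
"# convolutional: 15 feature maps sharing 5x5 kernels\n",
"conv = nn.Conv2d(1, 15, kernel_size=5)\n",
"print(sum(p.numel() for p in conv.parameters())) # 15 * (5*5*1) + 15 = 390"
]
},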
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.4.2结构组成\n",
"上面说到传统的网络需要大量的参数,但是这些参数是否重复了呢,例如,我们识别一个人,只要看到他的眼睛,鼻子,嘴,还有脸基本上就知道这个人是谁了,只是用这些局部的特征就能做做判断了,并不需要所有的特征。\n",
"另外一点就是我们上面说的可以有效提取了输入图像的平移不变特征,就好像我们看到了这是个眼睛,这个眼镜在左边还是在右边他都是眼睛,这就是平移不变性。\n",
"我们通过卷积的计算操作来提取图像局部的特征,每一层都会计算出一些局部特征,这些局部特征再汇总到下一层,这样一层一层的传递下去,特征由小变大,最后在通过这些局部的特征对图片进行处理,这样大大提高了计算效率,也提高了准确度。\n",
"### 卷积层\n",
"#### 卷积计算\n",
"在介绍卷积层之前要先介绍一下卷积的计算,这里使用[知乎](https://www.zhihu.com/question/39022858)上的一张图片\n",
"![](9.gif)\n",
"我们会定义一个权重矩阵也就是我们说的W一般对于卷积来说称作卷积的核kernel也有有人称做过滤器filter这个权重矩阵的大小一般为`3 * 3` 或者`5 * 5`但是在LeNet里面还用到了比较大的`7 * 7`现在已经很少见了因为根据经验的验证3和5是最佳的大小。\n",
"我们以图上所示的方式,我们在输入矩阵上使用我们的权重矩阵进行滑动,每滑动一步,将所覆盖的值与矩阵对应的值相乘,并将结果求和并作为输出矩阵的一项,依次类推直到全部计算完成。\n",
"\n",
"上图所示,我们输入是一个 `5 * 5`的矩阵,通过使用一次`3 * 3`的卷积核计算得到的计算结果是一个`3 * 3`的新矩阵。\n",
"那么新矩阵的大小是如何计算的呢?\n",
"#### 卷积核大小 f\n",
"刚才已经说到了一个重要的参数就是核的大小我们这里用f来表示\n",
"\n",
"#### 边界填充 (p)adding\n",
"我们看到上图,经过计算后矩阵的大小改变了,如果要使矩阵大小不改变呢,我们可以先对矩阵做一个填充,将矩阵的周围全部再包围一层,这个矩阵就变成了`7*7`,上下左右各加1相当于 `5+1+1=7` 这时,计算的结果还是 `5 * 5`的矩阵保证了大小不变这里的p=1\n",
"\n",
"#### 步长 (s)tride\n",
"从动图上我们能够看到,每次滑动只是滑动了一个距离,如果每次滑动两个距离呢?那就需要使用步长这个参数。\n",
"\n",
"#### 计算公式\n",
"\n",
"n为我们输入的矩阵的大小$ \\frac{n-f+2p}{s} +1 $ 向下取整\n",
"\n",
"这个公式非常重要一定要记住\n",
"\n",
"#### 卷积层\n",
"在每一个卷积层中我们都会设置多个核,每个核代表着不同的特征,这些特征就是我们需要传递到下一层的输出,而我们训练的过程就是训练这些不同的核。\n",
"\n",
"### 激活函数\n",
"由于卷积的操作也是线性的所以也需要进行激活一般情况下都会使用relu。\n",
"\n",
"### 池化层pooling\n",
"池化层是CNN的重要组成部分通过减少卷积层之间的连接降低运算复杂程度池化层的操作很简单就想相当于是合并我们输入一个过滤器的大小与卷积的操作一样也是一步一步滑动但是过滤器覆盖的区域进行合并只保留一个值。\n",
"合并的方式也有很多种例如我们常用的两种取最大值maxpooling取平均值avgpooling\n",
"\n",
"池化层的输出大小公式也与卷积层一样由于没有进行填充所以p=0可以简化为\n",
"$ \\frac{n-f}{s} +1 $\n",
"\n",
"### dropout层\n",
"dropout是2014年 Hinton 提出防止过拟合而采用的trick增强了模型的泛化能力\n",
"Dropout随机失活是指在深度学习网络的训练过程中按照一定的概率将一部分神经网络单元暂时从网络中丢弃相当于从原始的网络中找到一个更瘦的网络说的通俗一点就是随机将一部分网络的传播掐断听起来好像不靠谱但是通过实际测试效果非常好。\n",
"有兴趣的可以去看一下原文[Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://jmlr.org/papers/v15/srivastava14a.html)这里就不详细介绍了。\n",
"\n",
"### 全连接层\n",
"全链接层一般是作为最后的输出层使用,卷积的作用是提取图像的特征,最后的全连接层就是要通过这些特征来进行计算,输出我们所要的结果了,无论是分类,还是回归。\n",
"\n",
"我们的特征都是使用矩阵表示的所以再传入全连接层之前还需要对特征进行压扁将他这些特征变成一维的向量如果要进行分类的话就是用sofmax作为输出如果要是回归的话就直接使用linear即可。"
]
},
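{
"cell_type": "markdown",
"metadata": {},
"source": [
"The building blocks above can be checked with a few lines of PyTorch (a minimal sketch; the sizes follow the `5 * 5` input example in the text):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"x = torch.randn(1, 1, 5, 5)            # batch=1, channel=1, 5x5 input\n",
"conv = nn.Conv2d(1, 1, kernel_size=3)  # f=3, p=0, s=1 -> (5-3+0)/1+1 = 3\n",
"y = F.relu(conv(x))                    # convolution is linear, so activate\n",
"print(y.shape)                         # torch.Size([1, 1, 3, 3])\n",
"pooled = F.max_pool2d(y, 2)            # f=2, s=2 -> (3-2)/2+1 = 1\n",
"print(pooled.shape)                    # torch.Size([1, 1, 1, 1])\n",
"flat = pooled.view(pooled.size(0), -1) # flatten before a fully connected layer\n",
"print(flat.shape)                      # torch.Size([1, 1])"
]
},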
{
"cell_type": "markdown",
"metadata": {},
"source": [
"以上就是卷积神经网络几个主要的组成部分,下面我们介绍一些经典的网络模型\n",
"## 2.4.3 经典模型"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### LeNet-5\n",
"1998 Yann LeCun 的 LeNet5 [官网](http://yann.lecun.com/exdb/lenet/index.html)\n",
"\n",
"卷积神经网路的开山之作麻雀虽小但五脏俱全卷积层、pooling层、全连接层这些都是现代CNN网络的基本组件\n",
" - 用卷积提取空间特征;\n",
" - 由空间平均得到子样本;\n",
" - 用 tanh 或 sigmoid 得到非线性;\n",
" - 用 multi-layer neural networkMLP作为最终分类器\n",
" - 层层之间用稀疏的连接矩阵,以避免大的计算成本。\n",
"![](lenet5.jpg)\n",
"\n",
"输入图像Size为32*32。这要比mnist数据库中最大的字母(28*28)还大。这样做的目的是希望潜在的明显特征,如笔画断续、角点能够出现在最高层特征监测子感受野的中心。\n",
"\n",
"输出10个类别分别为0-9数字的概率\n",
"\n",
"1. C1层是一个卷积层有6个卷积核提取6种局部特征核大小为5 * 5\n",
"2. S2层是pooling层下采样区域:2 * 2 )降低网络训练参数及模型的过拟合程度。\n",
"3. C3层是第二个卷积层使用16个卷积核核大小:5 * 5 提取特征\n",
"4. S4层也是一个pooling层区域:2*2\n",
"5. C5层是最后一个卷积层卷积核大小:5 * 5 卷积核种类:120\n",
"6. 最后使用全连接层将C5的120个特征进行分类最后输出0-9的概率\n",
"\n",
"以下代码来自[官方教程](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"LeNet5(\n",
" (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n",
" (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
" (fc1): Linear(in_features=400, out_features=120, bias=True)\n",
" (fc2): Linear(in_features=120, out_features=84, bias=True)\n",
" (fc3): Linear(in_features=84, out_features=10, bias=True)\n",
")\n"
]
}
],
"source": [
"import torch.nn as nn\n",
"class LeNet5(nn.Module):\n",
"\n",
" def __init__(self):\n",
" super(LeNet5, self).__init__()\n",
" # 1 input image channel, 6 output channels, 5x5 square convolution\n",
" # kernel\n",
" self.conv1 = nn.Conv2d(1, 6, 5)\n",
" self.conv2 = nn.Conv2d(6, 16, 5)\n",
" # an affine operation: y = Wx + b\n",
" self.fc1 = nn.Linear(16 * 5 * 5, 120) # the paper uses a conv layer here; the official tutorial uses a linear layer\n",
" self.fc2 = nn.Linear(120, 84)\n",
" self.fc3 = nn.Linear(84, 10)\n",
"\n",
" def forward(self, x):\n",
" # Max pooling over a (2, 2) window\n",
" x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n",
" # If the size is a square you can only specify a single number\n",
" x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n",
" x = x.view(-1, self.num_flat_features(x))\n",
" x = F.relu(self.fc1(x))\n",
" x = F.relu(self.fc2(x))\n",
" x = self.fc3(x)\n",
" return x\n",
"\n",
" def num_flat_features(self, x):\n",
" size = x.size()[1:] # all dimensions except the batch dimension\n",
" num_features = 1\n",
" for s in size:\n",
" num_features *= s\n",
" return num_features\n",
"\n",
"\n",
"net = LeNet5()\n",
"print(net)"
]
},
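{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to confirm the architecture is to push a dummy 32*32 input through the network (a sketch; it assumes the `net` instance from the cell above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"x = torch.randn(1, 1, 32, 32)  # a batch with one 32*32 single-channel image\n",
"out = net(x)                   # conv/pool twice, then the three linear layers\n",
"print(out.shape)               # torch.Size([1, 10]): one score per digit"
]
},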
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AlexNet\n",
"2012Alex Krizhevsky\n",
"可以算作LeNet的一个更深和更广的版本可以用来学习更复杂的对象 [论文](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)\n",
" - 用rectified linear unitsReLU得到非线性\n",
" - 使用 dropout 技巧在训练期间有选择性地忽略单个神经元,来减缓模型的过拟合;\n",
" - 重叠最大池,避免平均池的平均效果;\n",
" - 使用 GPU NVIDIA GTX 580 可以减少训练时间这比用CPU处理快了 10 倍,所以可以被用于更大的数据集和图像上。\n",
"![](alexnet.png)\n",
"虽然 AlexNet只有8层但是它有60M以上的参数总量Alexnet有一个特殊的计算层LRN层做的事是对当前层的输出结果做平滑处理这里就不做详细介绍了\n",
"Alexnet的每一阶段含一次卷积主要计算的算作一层可以分为8层\n",
"1. con - relu - pooling - LRN \n",
"要注意的是input层是227*227而不是paper里面的224这里可以算一下主要是227可以整除后面的conv1计算224不整除。如果一定要用224可以通过自动补边实现不过在input就补边感觉没有意义补得也是0这就是我们上面说的公式的重要性。\n",
"\n",
"2. conv - relu - pool - LRN \n",
"group=2这个属性强行把前面结果的feature map分开卷积部分分成两部分做\n",
"\n",
"3. conv - relu\n",
"\n",
"4. conv - relu\n",
"\n",
"5. conv - relu - pool\n",
"\n",
"6. fc - relu - dropout \n",
"dropout层在alexnet中是说在训练的以1/2概率使得隐藏层的某些neuron的输出为0这样就丢到了一半节点的输出BP的时候也不更新这些节点防止过拟合。\n",
"\n",
"7. fc - relu - dropout \n",
"\n",
"8. fc - softmax \n",
"\n",
"在Pytorch的vision包中是包含Alexnet的官方实现的我们直接使用官方版本看下网络"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AlexNet(\n",
" (features): Sequential(\n",
" (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))\n",
" (1): ReLU(inplace)\n",
" (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))\n",
" (4): ReLU(inplace)\n",
" (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (7): ReLU(inplace)\n",
" (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (9): ReLU(inplace)\n",
" (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (11): ReLU(inplace)\n",
" (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (classifier): Sequential(\n",
" (0): Dropout(p=0.5)\n",
" (1): Linear(in_features=9216, out_features=4096, bias=True)\n",
" (2): ReLU(inplace)\n",
" (3): Dropout(p=0.5)\n",
" (4): Linear(in_features=4096, out_features=4096, bias=True)\n",
" (5): ReLU(inplace)\n",
" (6): Linear(in_features=4096, out_features=1000, bias=True)\n",
" )\n",
")\n"
]
}
],
"source": [
"import torchvision\n",
"model = torchvision.models.alexnet(pretrained=False) #我们不下载预训练权重\n",
"print(model)"
]
},
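{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 60M+ parameter figure above is easy to verify on the torchvision model (it assumes the `model` from the cell above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n_params = sum(p.numel() for p in model.parameters())\n",
"print(n_params)  # 61100840, about 61M parameters"
]
},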
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### VGG\n",
"2015牛津的 VGG。[论文](https://arxiv.org/pdf/1409.1556.pdf)\n",
"\n",
" - 每个卷积层中使用更小的 3×3 filters并将它们组合成卷积序列\n",
" - 多个3×3卷积序列可以模拟更大的接收场的效果\n",
" - 每次的图像像素缩小一倍,卷积核的数量增加一倍\n",
" \n",
"VGG有很多个版本也算是比较稳定和经典的model。它的特点也是连续conv多计算量巨大这里我们以VGG16为例.[图片来源](https://www.cs.toronto.edu/~frossard/post/vgg16/)\n",
"![](vgg16.png) \n",
"VGG清一色用小卷积核结合作者和自己的观点这里整理出小卷积核比用大卷积核的优势\n",
"\n",
"根据作者的观点input8 -> 3层conv3x3后output=2等同于1层conv7x7的结果 input=8 -> 2层conv3x3后output=2等同于2层conv5x5的结果\n",
"\n",
"卷积层的参数减少。相比5x5、7x7和11x11的大卷积核3x3明显地减少了参数量\n",
"\n",
"通过卷积和池化层后,图像的分辨率降低为原来的一半,但是图像的特征增加一倍,这是一个十分规整的操作:\n",
"分辨率由输入的224->112->56->28->14->7\n",
"特征从原始的RGB3个通道-> 64 ->128 -> 256 -> 512\n",
"\n",
"这为后面的网络提供了一个标准我们依旧使用Pytorch官方实现版本来查看"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"VGG(\n",
" (features): Sequential(\n",
" (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU(inplace)\n",
" (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (3): ReLU(inplace)\n",
" (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (6): ReLU(inplace)\n",
" (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (8): ReLU(inplace)\n",
" (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (11): ReLU(inplace)\n",
" (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (13): ReLU(inplace)\n",
" (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (15): ReLU(inplace)\n",
" (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (18): ReLU(inplace)\n",
" (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (20): ReLU(inplace)\n",
" (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (22): ReLU(inplace)\n",
" (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (25): ReLU(inplace)\n",
" (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (27): ReLU(inplace)\n",
" (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (29): ReLU(inplace)\n",
" (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (classifier): Sequential(\n",
" (0): Linear(in_features=25088, out_features=4096, bias=True)\n",
" (1): ReLU(inplace)\n",
" (2): Dropout(p=0.5)\n",
" (3): Linear(in_features=4096, out_features=4096, bias=True)\n",
" (4): ReLU(inplace)\n",
" (5): Dropout(p=0.5)\n",
" (6): Linear(in_features=4096, out_features=1000, bias=True)\n",
" )\n",
")\n"
]
}
],
"source": [
"import torchvision\n",
"model = torchvision.models.vgg16(pretrained=False) #我们不下载预训练权重\n",
"print(model)"
]
},
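{
"cell_type": "markdown",
"metadata": {},
"source": [
"The stacked-small-kernel claim can be checked numerically (a minimal sketch; the channel count C=64 is an arbitrary illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"C = 64\n",
"three_3x3 = nn.Sequential(*[nn.Conv2d(C, C, 3) for _ in range(3)])\n",
"one_7x7 = nn.Conv2d(C, C, 7)\n",
"x = torch.randn(1, C, 8, 8)\n",
"print(three_3x3(x).shape, one_7x7(x).shape)  # both torch.Size([1, 64, 2, 2])\n",
"p3 = sum(p.numel() for p in three_3x3.parameters())  # 3 * (3*3*C*C + C)\n",
"p7 = sum(p.numel() for p in one_7x7.parameters())    # 7*7*C*C + C\n",
"print(p3, p7)  # 110784 vs 200768: same receptive field, far fewer parameters"
]
},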
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### GoogLeNet (Inception)\n",
"2014Google Christian Szegedy [论文](https://arxiv.org/abs/1512.00567)\n",
"- 使用1×1卷积块NiN来减少特征数量这通常被称为“瓶颈”可以减少深层神经网络的计算负担。\n",
"- 每个池化层之前,增加 feature maps增加每一层的宽度来增多特征的组合性\n",
"\n",
"googlenet最大的特点就是包含若干个inception模块所以有时候也称作 inception net。\n",
"googlenet虽然层数要比VGG多很多但是由于inception的设计计算速度方面要快很多。\n",
"\n",
"![](googlenet.png)\n",
"\n",
"不要被这个图吓到,其实原理很简单\n",
"\n",
"Inception架构的主要思想是找出如何让已有的稠密组件接近与覆盖卷积视觉网络中的最佳局部稀疏结构。现在需要找出最优的局部构造并且重复几次。之前的一篇文献提出一个层与层的结构在最后一层进行相关性统计将高相关性的聚集到一起。这些聚类构成下一层的单元且与上一层单元连接。假设前面层的每个单元对应于输入图像的某些区域这些单元被分为滤波器组。在接近输入层的低层中相关单元集中在某些局部区域最终得到在单个区域中的大量聚类在最后一层通过1x1的卷积覆盖。\n",
"\n",
"上面的话听起来很生硬,其实解释起来很简单:每一模块我们都是用若干个不同的特征提取方式,例如 3x3卷积5x5卷积1x1的卷积pooling等都计算一下最后再把这些结果通过Filter Concat来进行连接找到这里面作用最大的。而网络里面包含了许多这样的模块这样不用我们人为去判断哪个特征提取方式好网络会自己解决是不是有点像AUTO ML在Pytorch中实现了InceptionA-E还有InceptionAUX 模块。\n",
"\n"
]
},
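{
"cell_type": "markdown",
"metadata": {},
"source": [
"A toy inception-style block makes the filter-concat idea concrete (a minimal sketch, not the torchvision implementation; the channel counts are arbitrary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"class MiniInception(nn.Module):\n",
"    def __init__(self, in_ch):\n",
"        super(MiniInception, self).__init__()\n",
"        self.b1 = nn.Conv2d(in_ch, 16, 1)             # 1x1 branch\n",
"        self.b3 = nn.Conv2d(in_ch, 16, 3, padding=1)  # 3x3 branch\n",
"        self.b5 = nn.Conv2d(in_ch, 16, 5, padding=2)  # 5x5 branch\n",
"\n",
"    def forward(self, x):\n",
"        pool = F.max_pool2d(x, 3, stride=1, padding=1)  # pooling branch\n",
"        # filter concat: join every branch along the channel axis\n",
"        return torch.cat([self.b1(x), self.b3(x), self.b5(x), pool], 1)\n",
"\n",
"block = MiniInception(3)\n",
"out = block(torch.randn(1, 3, 32, 32))\n",
"print(out.shape)  # torch.Size([1, 51, 32, 32]): 16+16+16+3 channels"
]
},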
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Inception3(\n",
" (Conv2d_1a_3x3): BasicConv2d(\n",
" (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (Conv2d_2a_3x3): BasicConv2d(\n",
" (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (Conv2d_2b_3x3): BasicConv2d(\n",
" (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (Conv2d_3b_1x1): BasicConv2d(\n",
" (conv): Conv2d(64, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(80, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (Conv2d_4a_3x3): BasicConv2d(\n",
" (conv): Conv2d(80, 192, kernel_size=(3, 3), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (Mixed_5b): InceptionA(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch5x5_1): BasicConv2d(\n",
" (conv): Conv2d(192, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch5x5_2): BasicConv2d(\n",
" (conv): Conv2d(48, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_1): BasicConv2d(\n",
" (conv): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_2): BasicConv2d(\n",
" (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3): BasicConv2d(\n",
" (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_5c): InceptionA(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch5x5_1): BasicConv2d(\n",
" (conv): Conv2d(256, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch5x5_2): BasicConv2d(\n",
" (conv): Conv2d(48, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_1): BasicConv2d(\n",
" (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_2): BasicConv2d(\n",
" (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3): BasicConv2d(\n",
" (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_5d): InceptionA(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch5x5_1): BasicConv2d(\n",
" (conv): Conv2d(288, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch5x5_2): BasicConv2d(\n",
" (conv): Conv2d(48, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_1): BasicConv2d(\n",
" (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_2): BasicConv2d(\n",
" (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3): BasicConv2d(\n",
" (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_6a): InceptionB(\n",
" (branch3x3): BasicConv2d(\n",
" (conv): Conv2d(288, 384, kernel_size=(3, 3), stride=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_1): BasicConv2d(\n",
" (conv): Conv2d(288, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_2): BasicConv2d(\n",
" (conv): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3): BasicConv2d(\n",
" (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_6b): InceptionC(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_1): BasicConv2d(\n",
" (conv): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_2): BasicConv2d(\n",
" (conv): Conv2d(128, 128, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_3): BasicConv2d(\n",
" (conv): Conv2d(128, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_1): BasicConv2d(\n",
" (conv): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_2): BasicConv2d(\n",
" (conv): Conv2d(128, 128, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_3): BasicConv2d(\n",
" (conv): Conv2d(128, 128, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_4): BasicConv2d(\n",
" (conv): Conv2d(128, 128, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_5): BasicConv2d(\n",
" (conv): Conv2d(128, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_6c): InceptionC(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_1): BasicConv2d(\n",
" (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_2): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_3): BasicConv2d(\n",
" (conv): Conv2d(160, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_1): BasicConv2d(\n",
" (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_2): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_3): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_4): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_5): BasicConv2d(\n",
" (conv): Conv2d(160, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_6d): InceptionC(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_1): BasicConv2d(\n",
" (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_2): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_3): BasicConv2d(\n",
" (conv): Conv2d(160, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_1): BasicConv2d(\n",
" (conv): Conv2d(768, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_2): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_3): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_4): BasicConv2d(\n",
" (conv): Conv2d(160, 160, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_5): BasicConv2d(\n",
" (conv): Conv2d(160, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_6e): InceptionC(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_2): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7_3): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_2): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_3): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_4): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7dbl_5): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (AuxLogits): InceptionAux(\n",
" (conv0): BasicConv2d(\n",
" (conv): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (conv1): BasicConv2d(\n",
" (conv): Conv2d(128, 768, kernel_size=(5, 5), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(768, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (fc): Linear(in_features=768, out_features=1000, bias=True)\n",
" )\n",
" (Mixed_7a): InceptionD(\n",
" (branch3x3_1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3_2): BasicConv2d(\n",
" (conv): Conv2d(192, 320, kernel_size=(3, 3), stride=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7x3_1): BasicConv2d(\n",
" (conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7x3_2): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(1, 7), stride=(1, 1), padding=(0, 3), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7x3_3): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(7, 1), stride=(1, 1), padding=(3, 0), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch7x7x3_4): BasicConv2d(\n",
" (conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_7b): InceptionE(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(1280, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3_1): BasicConv2d(\n",
" (conv): Conv2d(1280, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3_2a): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3_2b): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_1): BasicConv2d(\n",
" (conv): Conv2d(1280, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_2): BasicConv2d(\n",
" (conv): Conv2d(448, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3a): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3b): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(1280, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (Mixed_7c): InceptionE(\n",
" (branch1x1): BasicConv2d(\n",
" (conv): Conv2d(2048, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(320, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3_1): BasicConv2d(\n",
" (conv): Conv2d(2048, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3_2a): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3_2b): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_1): BasicConv2d(\n",
" (conv): Conv2d(2048, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_2): BasicConv2d(\n",
" (conv): Conv2d(448, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3a): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(1, 3), stride=(1, 1), padding=(0, 1), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch3x3dbl_3b): BasicConv2d(\n",
" (conv): Conv2d(384, 384, kernel_size=(3, 1), stride=(1, 1), padding=(1, 0), bias=False)\n",
" (bn): BatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (branch_pool): BasicConv2d(\n",
" (conv): Conv2d(2048, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
" (bn): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (fc): Linear(in_features=2048, out_features=1000, bias=True)\n",
")\n"
]
}
],
"source": [
"# inception_v3需要scipy所以没有安装的话pip install scipy 一下\n",
"import torchvision\n",
"model = torchvision.models.inception_v3(pretrained=False) #我们不下载预训练权重\n",
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ResNet\n",
"2015Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun [论文](https://arxiv.org/abs/1512.03385)\n",
"Kaiming He 何凯明(音译)这个大神大家一定要记住,现在很多论文都有他参与(mask rcnn, focal loss)Jian Sun孙剑老师就不用说了现在旷视科技的首席科学家。\n",
"刚才的GoogLeNet已经很深了ResNet可以做到更深通过残差计算可以训练超过1000层的网络俗称跳连接\n",
"\n",
"#### 退化问题\n",
"网络层数增加但是在训练集上的准确率却饱和甚至下降了。这个不能解释为overfitting因为overfit应该表现为在训练集上表现更好才对。这个就是网络退化的问题退化问题说明了深度网络不能很简单地被很好地优化\n",
"\n",
"#### 残差网络的解决办法\n",
"深层网络的后面那些层是恒等映射那么模型就退化为一个浅层网络。那现在要解决的就是学习恒等映射函数了。让一些层去拟合一个潜在的恒等映射函数H(x) = x比较困难。如果把网络设计为H(x) = F(x) + x。我们可以转换为学习一个残差函数F(x) = H(x) - x。 只要F(x)=0就构成了一个恒等映射H(x) = x. 而且,拟合残差肯定更加容易。\n",
"\n",
"以上又很不好理解,继续解释下,先看图:\n",
"![](resnet.png)\n",
"\n",
"我们在激活函数前将上一层或几层的输出与本层计算的输出相加将求和的结果输入到激活函数中做为本层的输出引入残差后的映射对输出的变化更敏感其实就是看本层相对前几层是否有大的变化相当于是一个差分放大器的作用。图中的曲线就是残差中的shoutcut他将前一层的结果直接连接到了本层也就是俗称的跳连接。\n",
"\n",
"我们以经典的resnet18来看一下网络结构 [图片来源](https://www.researchgate.net/figure/Proposed-Modified-ResNet-18-architecture-for-Bangla-HCR-In-the-diagram-conv-stands-for_fig1_323063171)\n",
"![](resnet18.jpg)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ResNet(\n",
" (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n",
" (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n",
" (layer1): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (layer2): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (downsample): Sequential(\n",
" (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
" (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (layer3): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (downsample): Sequential(\n",
" (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
" (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (layer4): Sequential(\n",
" (0): BasicBlock(\n",
" (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (downsample): Sequential(\n",
" (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
" (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (1): BasicBlock(\n",
" (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" (relu): ReLU(inplace)\n",
" (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
" (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
" )\n",
" )\n",
" (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)\n",
" (fc): Linear(in_features=512, out_features=1000, bias=True)\n",
")\n"
]
}
],
"source": [
"import torchvision\n",
"model = torchvision.models.resnet18(pretrained=False) #我们不下载预训练权重\n",
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"那么我们该如何选择网络呢?\n",
"[来源](https://www.researchgate.net/figure/Comparison-of-popular-CNN-architectures-The-vertical-axis-shows-top-1-accuracy-on_fig2_320084139)\n",
"![](cnn.png)\n",
"以上表格可以清楚的看到准确率和计算量之间的对比。我的建议是小型图片分类任务resnet18基本上已经可以了如果真对准确度要求比较高再选其他更好的网络架构。\n",
"\n",
"**另外有句俗话叫穷人只能AlexNet富人才用Res**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}