pytorch-handbook/chapter2/2.2-deep-learning-basic-mathematics.ipynb
2019-03-11 09:09:49 +08:00

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
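{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2.2 Deep Learning Fundamentals and Mathematical Principles\n",
"Deep learning is not as hard as it may seem, and in some respects it is simpler than traditional machine learning. The mathematics it relies on is not especially advanced either. This chapter explains the basic theory of deep learning while implementing some of it hands-on in PyTorch. The topic is broad, so each part is only introduced briefly.\n"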
]
},
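{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.1 Supervised and Unsupervised Learning\n",
"Supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning are the four families of machine learning methods we encounter most often:\n",
"\n",
"- Supervised learning: train on existing samples (known inputs together with their corresponding outputs) to obtain an optimal model. The model belongs to some set of functions, and optimal means best under a chosen evaluation criterion. The model is then used to map inputs to their corresponding outputs.\n",
"- Unsupervised learning: unlike supervised learning, no labeled training samples are available in advance, so the data must be modeled directly.\n",
"- Semi-supervised learning: training combines a large amount of unlabeled data with a small amount of labeled data. Using the unlabeled data alongside the small labeled set can give a more accurate model than training on the labeled set alone.\n",
"- Reinforcement learning: we define a reward function and use it to judge whether the agent is getting closer to the goal. It resembles training a pet: reward it when it does the right thing and punish it when it does the wrong thing, until the training objective is reached.\n",
"\n",
"Here we focus on supervised learning, because the vast majority of the later chapters use supervised methods. During training and validation, the data contains both the input x and the corresponding output y, i.e. the correct answers are given in advance."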
]
},
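{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.2 Linear Regression\n",
"Linear regression is a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative dependence between two or more variables, and it is very widely used. It is expressed as y = w'x + e, where e is the error term, which follows a normal distribution with mean 0.  \n",
"\n",
"When a regression analysis involves only one independent variable and one dependent variable, and their relationship can be approximated by a straight line, it is called simple (univariate) linear regression. When it involves two or more independent variables, and the dependent variable is linearly related to them, it is called multiple linear regression.\n",
"Excerpted from [Baidu Baike](https://baike.baidu.com/item/线性回归/8190345)\n",
"\n",
"Put simply:\n",
"Linear regression assumes a mapping f from input x to output y, y = f(x), where f has the form ax + b. Here a and b are two adjustable parameters, and training means fitting these two parameters.\n",
"\n",
"Let's walk through this in detail with PyTorch code."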
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.0.0'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Imports\n",
"# Note: this uses the seaborn library. If the import fails because the package is missing, install it with: pip install seaborn\n",
"import torch\n",
"from torch.nn import Linear, Module, MSELoss\n",
"from torch.optim import SGD\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"torch.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we define a linear function, y = 5x + 7, where 5 and 7 are the parameters a and b mentioned above. First, let's visualize it with matplotlib."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x1cae5b6e630>]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"x=np.linspace(0,20,500)\n",
"y=5*x + 7\n",
"plt.plot(x,y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we generate some random points to use as training data."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"x = np.random.rand(256)\n",
"noise = np.random.randn(256) / 4\n",
"y = x * 5 + 7 + noise\n",
"df = pd.DataFrame()\n",
"df['x'] = x\n",
"df['y'] = y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the generated data."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\ProgramData\\Anaconda3\\envs\\pytorch1\\lib\\site-packages\\scipy\\stats\\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.\n",
" return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval\n"
]
},
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x1cadecb7780>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 360x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.lmplot(x='x', y='y', data=df)"
]
},
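{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have generated some random points. Next we build a linear model in PyTorch to fit them; this fitting is the training process. Since the model has only a single linear layer, we use `Linear` directly."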
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"model=Linear(1, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the loss function we use the mean squared error loss, MSELoss, which is described in detail later."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"criterion = MSELoss()"
]
},
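{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the optimizer we choose the most common method, SGD, which computes the gradient of a mini-batch at each iteration and then updates the parameters. The learning rate is 0.01. Optimizers are also covered later in this chapter."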
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"optim = SGD(model.parameters(), lr = 0.01)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train for 3000 iterations."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"epochs = 3000"
]
},
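{
"cell_type": "markdown",
"metadata": {},
"source": [
"w and b are the model parameters to be trained, with target values 5 and 7. We keep references to them so that we can display the trained values at the end."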
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"[w, b] = model.parameters()\n",
"x_train = x.reshape(-1, 1).astype('float32')\n",
"y_train = y.reshape(-1, 1).astype('float32')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start training."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 100, loss 0.4731859564781189\n",
"epoch 200, loss 0.06617910414934158\n",
"epoch 300, loss 0.0634840652346611\n",
"epoch 400, loss 0.06299427896738052\n",
"epoch 500, loss 0.06262210011482239\n",
"epoch 600, loss 0.062333136796951294\n",
"epoch 700, loss 0.062108736485242844\n",
"epoch 800, loss 0.06193448230624199\n",
"epoch 900, loss 0.06179914250969887\n",
"epoch 1000, loss 0.06169406324625015\n",
"epoch 1100, loss 0.06161244958639145\n",
"epoch 1200, loss 0.061549071222543716\n",
"epoch 1300, loss 0.06149986386299133\n",
"epoch 1400, loss 0.06146164610981941\n",
"epoch 1500, loss 0.061431970447301865\n",
"epoch 1600, loss 0.06140892207622528\n",
"epoch 1700, loss 0.06139101833105087\n",
"epoch 1800, loss 0.06137711927294731\n",
"epoch 1900, loss 0.06136634573340416\n",
"epoch 2000, loss 0.06135796383023262\n",
"epoch 2100, loss 0.06135144084692001\n",
"epoch 2200, loss 0.06134640425443649\n",
"epoch 2300, loss 0.06134248524904251\n",
"epoch 2400, loss 0.06133943423628807\n",
"epoch 2500, loss 0.06133706122636795\n",
"epoch 2600, loss 0.06133522093296051\n",
"epoch 2700, loss 0.061333801597356796\n",
"epoch 2800, loss 0.06133269891142845\n",
"epoch 2900, loss 0.06133183836936951\n",
"epoch 3000, loss 0.06133117899298668\n"
]
}
],
"source": [
"# Convert the inputs and labels once; they must be torch Tensors\n",
"inputs = torch.from_numpy(x_train)\n",
"labels = torch.from_numpy(y_train)\n",
"for i in range(1, epochs + 1):\n",
"    # Forward pass: predict with the model\n",
"    outputs = model(inputs)\n",
"    # Zero the gradients, otherwise they accumulate across iterations\n",
"    optim.zero_grad()\n",
"    # Compute the loss\n",
"    loss = criterion(outputs, labels)\n",
"    # Backpropagate\n",
"    loss.backward()\n",
"    # Update the parameters with the optimizer\n",
"    optim.step()\n",
"    if i % 100 == 0:\n",
"        # Print the loss every 100 iterations to monitor progress\n",
"        print('epoch {}, loss {}'.format(i, loss.item()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Training is complete. Let's look at the result.\n",
"\n",
"The expected values are w=5 and b=7, so we can compare."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4.924540996551514 7.020828723907471\n"
]
}
],
"source": [
"print(w.item(), b.item())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Visualize the model again together with the training data. If you prefer not to use seaborn, you can plot directly with matplotlib."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"predicted = model(torch.from_numpy(x_train)).detach().numpy()\n",
"plt.plot(x_train, y_train, 'go', label = 'data', alpha = 0.3)\n",
"plt.plot(x_train, predicted, label = 'predicted', alpha = 1)\n",
"plt.legend()\n",
"plt.show()"
]
},
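{
"cell_type": "markdown",
"metadata": {},
"source": [
"That was a simple example of linear regression with PyTorch. Next we explain the pieces used above in more detail.\n",
"## 2.2.3 Loss Function\n",
"A loss function measures the discrepancy between the model's predictions (outputs in our example) and the true values (y_train in our example). It is a non-negative real-valued function, and a smaller loss means the model fits the training data better.\n",
"Training a model means iterating with an optimization algorithm such as gradient descent so that the loss keeps decreasing. The smaller the loss, the closer the model is to optimal under that criterion.\n",
"\n",
"One important point: PyTorch computes on mini-batches, so by default the value returned by a loss function is already averaged over the mini-batch.\n",
"\n",
"Commonly used built-in PyTorch loss functions include:\n",
"### nn.L1Loss:\n",
"The mean absolute difference between input x and target y. x and y must have the same shape (vectors or matrices). With the default reduction the result is a scalar averaged over all elements; with `reduction='none'` the loss keeps the same shape as the input.\n",
"\n",
"$ loss(x,y)=\\frac{1}{n}\\sum_i|x_i-y_i| $\n"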
]
},
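{
"cell_type": "markdown",
"metadata": {},
"source": [
"### nn.NLLLoss:\n",
"The negative log-likelihood loss for multi-class classification. The input x is expected to contain log-probabilities, for example the output of LogSoftmax.\n",
"\n",
"$ loss(x, class) = -x[class]$\n",
"\n",
"If the `weight` argument is passed to NLLLoss, the loss is weighted per class and the formula becomes:\n",
"\n",
"$ loss(x, class) = -weights[class] * x[class] $"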
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### nn.MSELoss:\n",
"Mean squared error loss: the mean of the squared differences between input x and target y.\n",
"\n",
"$ loss(x,y)=\\frac{1}{n}\\sum_i(x_i-y_i)^2 $"
]
},
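{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the two regression losses concrete, here is a minimal plain-Python sketch that evaluates the `nn.L1Loss` and `nn.MSELoss` formulas above by hand. It is not PyTorch's implementation; the helper names `l1_loss` and `mse_loss` and the sample values are made up for illustration. Because the error is squared, the single large miss in the sample data dominates the MSE far more than it does the L1 loss."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plain-Python sketch of the L1 and MSE formulas (not PyTorch's implementation)\n",
"def l1_loss(preds, targets):\n",
"    # Mean of absolute differences: 1/n * sum(|x_i - y_i|)\n",
"    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)\n",
"\n",
"def mse_loss(preds, targets):\n",
"    # Mean of squared differences: 1/n * sum((x_i - y_i)^2)\n",
"    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)\n",
"\n",
"sample_preds = [1.0, 2.0, 3.0]    # hypothetical model outputs\n",
"sample_targets = [1.5, 2.0, 5.0]  # hypothetical true values\n",
"print(l1_loss(sample_preds, sample_targets), mse_loss(sample_preds, sample_targets))"
]
},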
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### nn.CrossEntropyLoss:\n",
"The cross-entropy loss for multi-class classification. It combines LogSoftmax and NLLLoss in a single class, so we can think of it as CrossEntropyLoss() = log_softmax() + NLLLoss().\n",
"\n",
"\n",
"$ \\begin{aligned} loss(x, class) &= -\\log\\frac{\\exp(x[class])}{\\sum_j \\exp(x[j])} \\\\ &= -x[class] + \\log\\left(\\sum_j \\exp(x[j])\\right) \\end{aligned} $\n",
"\n",
"Because it builds on NLLLoss, a `weight` argument can also be passed, in which case the loss becomes:\n",
"\n",
"$ loss(x, class) = weights[class] * \\left(-x[class] + \\log\\left(\\sum_j \\exp(x[j])\\right)\\right) $\n",
"\n",
"This is the loss function normally used for multi-class classification."
]
},
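{
"cell_type": "markdown",
"metadata": {},
"source": [
"The identity CrossEntropyLoss() = log_softmax() + NLLLoss() can be checked numerically. The following is a plain-Python sketch under assumed inputs, not the PyTorch code path; the function names and the sample scores are hypothetical. Both routes should print the same value up to floating-point rounding."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def log_softmax(scores):\n",
"    # Subtract the max before exponentiating for numerical stability\n",
"    m = max(scores)\n",
"    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))\n",
"    return [s - log_sum for s in scores]\n",
"\n",
"def nll(log_probs, target):\n",
"    # Negative log-likelihood: -x[class]\n",
"    return -log_probs[target]\n",
"\n",
"def cross_entropy(scores, target):\n",
"    # Direct formula: -x[class] + log(sum_j exp(x[j]))\n",
"    return -scores[target] + math.log(sum(math.exp(s) for s in scores))\n",
"\n",
"sample_scores = [2.0, 1.0, 0.1]  # hypothetical raw scores for 3 classes\n",
"sample_class = 0                 # hypothetical true class index\n",
"ce_composed = nll(log_softmax(sample_scores), sample_class)\n",
"ce_direct = cross_entropy(sample_scores, sample_class)\n",
"print(ce_composed, ce_direct)"
]
},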
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### nn.BCELoss:\n",
"Computes the binary cross-entropy between the output o and the target t.\n",
"\n",
"$ loss(o,t)=-\\frac{1}{n}\\sum_i(t[i]* log(o[i])+(1-t[i])* log(1-o[i])) $  \n",
"\n",
"As with NLLLoss, a weight argument can be added:  \n",
"\n",
"$ loss(o,t)=-\\frac{1}{n}\\sum_i weights[i]* (t[i]* log(o[i])+(1-t[i])* log(1-o[i])) $\n",
"\n",
"When using it, a Sigmoid function must be applied before this loss so that the outputs lie between 0 and 1."
]
},
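{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.4 Gradient Descent\n",
"When introducing loss functions we said that gradient descent is an optimization algorithm that makes the loss progressively smaller. When solving for the model parameters of a machine learning algorithm, i.e. an unconstrained optimization problem, gradient descent is one of the most commonly used methods. Gradient descent is therefore at the core of the machine learning discussed here, and understanding it goes a long way toward understanding how these algorithms learn.\n"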
]
},
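{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gradient\n",
"In calculus, taking the partial derivative ∂ of a multivariate function with respect to each of its variables and writing those partial derivatives as a vector gives the gradient.\n",
"For example, for a function f(x,y), taking the partial derivatives with respect to x and y gives the gradient vector (∂f/∂x, ∂f/∂y)^T, written grad f(x,y) or ∇f(x,y).\n",
"\n",
"Geometrically, the gradient points in the direction in which the function increases fastest, so moving along the gradient makes it easier to find a maximum of the function. Conversely, the function decreases fastest in the direction opposite to the gradient, so moving that way makes it easier to find a minimum.\n",
"\n",
"To minimize the loss function, we can iterate step by step with gradient descent to obtain the minimized loss and the corresponding model parameters.\n",
"### An Intuitive Explanation of Gradient Descent\n",
"Gradient descent is like walking down a mountain without knowing the path. We decide to take one step at a time: at each position we compute the gradient there and take a step in the negative gradient direction, which is the steepest way down from the current position. We then compute the gradient at the new position and again step in the steepest downhill direction. We keep going like this until we believe we have reached the foot of the mountain.\n",
"\n",
"As shown in the figure below (taken from Baidu Baike):\n",
"![](1.png)\n",
"\n",
"Proceeding this way, we may not reach the foot of the mountain but instead end up in a local low point (a local optimum).\n",
"\n",
"This problem could arise in earlier machine learning, where models had relatively few features and could easily get stuck in a local optimum. In deep learning, where models routinely have millions or even hundreds of millions of parameters, getting trapped in a poor local optimum is far less likely in practice, so we usually do not need to worry about it."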
]
},
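{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked example, one gradient descent step can be written out for the linear regression model from section 2.2.2 with the MSE loss. The learning rate is written here as $\\eta$, a symbol introduced only for this sketch, and $n$ is the number of samples in the batch:\n",
"\n",
"$ L(w,b)=\\frac{1}{n}\\sum_i (w x_i + b - y_i)^2 $\n",
"\n",
"$ \\frac{\\partial L}{\\partial w}=\\frac{2}{n}\\sum_i (w x_i + b - y_i)\\,x_i, \\qquad \\frac{\\partial L}{\\partial b}=\\frac{2}{n}\\sum_i (w x_i + b - y_i) $\n",
"\n",
"$ w \\leftarrow w-\\eta\\frac{\\partial L}{\\partial w}, \\qquad b \\leftarrow b-\\eta\\frac{\\partial L}{\\partial b} $\n",
"\n",
"Each step moves (w, b) a small distance against the gradient. For plain SGD without momentum, this is the kind of update that `loss.backward()` followed by `optim.step()` carries out in the training loop above."
]
},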
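{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mini-batch Gradient Descent\n",
"When running gradient descent on the entire training set, we must process the whole dataset before taking a single gradient step, i.e. every step requires one pass over all the training data. If the training set is large, this is slow, and the data may not fit into memory or GPU memory all at once. So we split the large dataset into small subsets and train on them one at a time; each such subset is called a mini-batch.\n",
"PyTorch trains in this way. See the introduction to the DataLoader in the previous chapter: its batch_size is the size of one mini-batch.\n",
"\n",
"To keep the explanation concise, the figures below use the lecture slides from Andrew Ng's [deeplearning.ai](https://www.deeplearning.ai/deep-learning-specialization/) course.\n"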
]
},
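{
"cell_type": "markdown",
"metadata": {},
"source": [
"With ordinary (batch) gradient descent, one epoch performs only one gradient step, whereas with mini-batch gradient descent one epoch performs as many gradient steps as there are mini-batches.\n",
"![](2.png)\n",
"The figure below shows how the cost function evolves under batch gradient descent and mini-batch gradient descent:\n",
"![](3.png)\n",
"- If the training set is small enough to be loaded into memory at once, mini-batches are not needed.\n",
"- If the training set is too large to be loaded into memory or GPU memory at once, mini-batches must be used to compute in portions.\n",
"- A rule of thumb for the mini-batch size: use a power of two, as large as memory allows.\n",
"![](4.png)\n"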
]
},
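{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference in update counts can be sketched in plain Python. The example below assumes noise-free data from y = 5x + 7, a batch size of 32, and a learning rate of 0.1; the names `w_hat`, `b_hat`, `xs`, and `ys` are illustrative and chosen so they do not overwrite the notebook's `w`, `b`, `x`, and `y`. With 256 samples, each epoch performs 256 / 32 = 8 parameter updates instead of one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"random.seed(0)\n",
"# Hypothetical noise-free data from y = 5x + 7\n",
"xs = [random.random() for _ in range(256)]\n",
"ys = [5 * x_i + 7 for x_i in xs]\n",
"\n",
"w_hat, b_hat = 0.0, 0.0     # parameters to learn\n",
"lr, batch_size = 0.1, 32    # assumed hyperparameters\n",
"steps = 0\n",
"for epoch in range(200):\n",
"    order = list(range(len(xs)))\n",
"    random.shuffle(order)   # draw a new mini-batch split each epoch\n",
"    for start in range(0, len(order), batch_size):\n",
"        batch = order[start:start + batch_size]\n",
"        # Prediction error and input for every sample in the mini-batch\n",
"        errs = [(w_hat * xs[i] + b_hat - ys[i], xs[i]) for i in batch]\n",
"        # Gradients of the mini-batch MSE with respect to w_hat and b_hat\n",
"        grad_w = 2 * sum(e * x_i for e, x_i in errs) / len(batch)\n",
"        grad_b = 2 * sum(e for e, _ in errs) / len(batch)\n",
"        w_hat -= lr * grad_w\n",
"        b_hat -= lr * grad_b\n",
"        steps += 1          # one parameter update per mini-batch\n",
"\n",
"print(steps, w_hat, b_hat)"
]
},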
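{
"cell_type": "markdown",
"metadata": {},
"source": [
"`torch.optim` is a library that implements a variety of optimization algorithms. Most commonly used optimizers are available and can be called directly.\n",
"### torch.optim.SGD\n",
"Stochastic gradient descent. Momentum is available as an optional parameter. An example:"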
]
},
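{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"# lr is the learning rate. For SGD, typical choices are 0.1, 0.01, or 0.001; how to set it is discussed in the later hands-on chapters\n",
"# Setting momentum gives SGD with momentum; it is optional\n",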
"optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)"
]
},
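{
"cell_type": "markdown",
"metadata": {},
"source": [
"### torch.optim.RMSprop\n",
"Besides gradient descent with momentum, RMSprop (root mean square prop) is another algorithm that can speed up gradient descent. RMSprop dampens the oscillations in dimensions whose gradient updates fluctuate strongly, which lets gradient descent progress faster."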
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# RMSprop is rarely used in this course, so only one example is given here\n",
"optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### torch.optim.Adam\n",
"Adam combines the ideas of Momentum and RMSprop into an optimization algorithm that works well across different deep learning architectures."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# The default values of lr, betas, and eps are usually fine, which makes Adam one of the easiest optimizers to use\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08)"
]
},
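{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.5 Variance/Bias\n",
"- Bias measures how far the learning algorithm's expected prediction deviates from the true result, i.e. it characterizes the fitting ability of the algorithm itself.\n",
"- Variance measures how much the learned performance changes when a training set of the same size is varied, i.e. it reflects the model's ability to generalize.\n",
"![](5.png)\n",
"\n",
"From the figure we can see:\n",
"- High bias is usually called underfitting: the model does not fit the existing data well.\n",
"- High variance is usually called overfitting: the model fits the training data too closely and loses the ability to generalize.\n",
"\n",
"How do we address these two situations?\n",
"\n",
"Underfitting:\n",
"- Enlarge the network, e.g. add more hidden layers;\n",
"- Train longer;\n",
"- Look for a more suitable network architecture, e.g. a larger network;\n",
"\n",
"Overfitting:\n",
"- Use more data;\n",
"- Apply regularization;\n",
"- Look for a more suitable network architecture;\n",
"\n",
"For the example above, we can compute how far the learned parameters are from the true values:"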
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.07545900344848633 -0.020828723907470703\n"
]
}
],
"source": [
"print(5 - w.item(), 7 - b.item())"
]
},
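{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.6 Regularization\n",
"Regularization addresses the high-variance problem by adding a regularization term to the cost function that penalizes model complexity. Here we briefly introduce the concept."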
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### L1 Regularization\n",
"Add the sum of the absolute values of the weights to the loss function:\n",
"\n",
"$ L=E_{in}+\\lambda{\\sum_j} \\left|w_j\\right|$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### L2 Regularization\n",
"Add the sum of the squares of the weights to the loss function:\n",
"\n",
"$ L=E_{in}+\\lambda{\\sum_j} w^2_j$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that L1 regularization yields sparse solutions more readily than L2.\n",
"\n",
"[Zhihu](https://www.zhihu.com/question/37096933/answer/70507353)"
]
},
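{
"cell_type": "markdown",
"metadata": {},
"source": [
"One informal way to see the sparsity effect is to compare how each penalty pulls on a single weight $w_j$. Using the same $\\lambda$ as in the two formulas above, and writing the sum index as $k$ to keep it apart from $j$:\n",
"\n",
"$ \\frac{\\partial}{\\partial w_j}\\left(\\lambda\\sum_k \\left|w_k\\right|\\right)=\\lambda\\,\\text{sign}(w_j) \\quad (w_j \\neq 0) $\n",
"\n",
"$ \\frac{\\partial}{\\partial w_j}\\left(\\lambda\\sum_k w_k^2\\right)=2\\lambda w_j $\n",
"\n",
"The L1 term pulls every non-zero weight toward zero with the same strength $\\lambda$ regardless of its size, so small weights can be driven all the way to zero. The L2 pull is proportional to $w_j$ and fades as the weight shrinks, so weights tend to become small without becoming exactly zero. This is a sketch of the intuition rather than a proof."
]
},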
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "pytorch 1.0",
"language": "python",
"name": "pytorch1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}