diff --git a/chapter1/2_autograd_tutorial.ipynb b/chapter1/2_autograd_tutorial.ipynb
index c7ead87..66c25b8 100644
--- a/chapter1/2_autograd_tutorial.ipynb
+++ b/chapter1/2_autograd_tutorial.ipynb
@@ -119,7 +119,7 @@
      "output_type": "stream",
      "text": [
       "tensor([[3., 3.],\n",
-      "        [3., 3.]], grad_fn=<AddBackward>)\n"
+      "        [3., 3.]], grad_fn=<AddBackward0>)\n"
      ]
     }
    ],
@@ -145,7 +145,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "<AddBackward object at 0x...>\n"
+      "<AddBackward0 object at 0x...>\n"
      ]
     }
    ],
@@ -171,7 +171,7 @@
      "output_type": "stream",
      "text": [
       "tensor([[27., 27.],\n",
-      "        [27., 27.]], grad_fn=<MulBackward>) tensor(27., grad_fn=<MeanBackward>)\n"
+      "        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)\n"
      ]
     }
    ],
@@ -202,7 +202,7 @@
      "text": [
       "False\n",
       "True\n",
-      "<SumBackward object at 0x...>\n"
+      "<SumBackward0 object at 0x...>\n"
      ]
     }
    ],
@@ -267,14 +267,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "得到矩阵 ``4.5``.调用 ``out``\n",
+    "得到矩阵 ``4.5``。将 ``out`` 叫做\n",
     "*Tensor* “$o$”.\n",
     "\n",
     "得到 $o = \\frac{1}{4}\\sum_i z_i$,\n",
-    "$z_i = 3(x_i+2)^2$ and $z_i\\bigr\\rvert_{x_i=1} = 27$.\n",
+    "$z_i = 3(x_i+2)^2$ 和 $z_i\\bigr\\rvert_{x_i=1} = 27$.\n",
     "\n",
     "因此,\n",
-    "$\\frac{\\partial o}{\\partial x_i} = \\frac{3}{2}(x_i+2)$, hence\n",
+    "$\\frac{\\partial o}{\\partial x_i} = \\frac{3}{2}(x_i+2)$, 则\n",
     "$\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=1} = \\frac{9}{2} = 4.5$.\n",
     "\n"
    ]
   },
@@ -283,8 +283,24 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "可以使用 autograd 做更多的操作\n",
-    "\n"
+    "在数学上,如果我们有向量值函数 $\\vec{y} = f(\\vec{x})$,那么 $\\vec{y}$ 关于 $\\vec{x}$ 的梯度就是一个雅可比矩阵(Jacobian matrix):\n",
+    "\n",
+    "$J = \\begin{pmatrix} \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{1}}{\\partial x_{n}} \\\\ \\vdots & \\ddots & \\vdots \\\\ \\frac{\\partial y_{m}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}} \\end{pmatrix}$\n",
+    "\n",
+    "一般来说,`torch.autograd` 就是用来计算 vector-Jacobian product 的工具:给定任一向量 $v=(v_{1}\\;v_{2}\\;\\cdots\\;v_{m})^{T}$,它计算乘积 $v^{T}\\cdot J$。如果 $v$ 恰好是标量函数 $l=g(\\vec{y})$ 的梯度,即 $v=(\\frac{\\partial l}{\\partial y_{1}}\\;\\cdots\\;\\frac{\\partial l}{\\partial y_{m}})^{T}$,那么根据链式法则,vector-Jacobian product 就是 $l$ 关于 $\\vec{x}$ 的梯度:\n",
+    "\n",
+    "$J^{T}\\cdot v = \\begin{pmatrix} \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{1}} \\\\ \\vdots & \\ddots & \\vdots \\\\ \\frac{\\partial y_{1}}{\\partial x_{n}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}} \\end{pmatrix} \\begin{pmatrix} \\frac{\\partial l}{\\partial y_{1}}\\\\ \\vdots \\\\ \\frac{\\partial l}{\\partial y_{m}} \\end{pmatrix} = \\begin{pmatrix} \\frac{\\partial l}{\\partial x_{1}}\\\\ \\vdots \\\\ \\frac{\\partial l}{\\partial x_{n}} \\end{pmatrix}$\n",
+    "\n",
+    "(注意:$v^{T}\\cdot J$ 给出的是一个行向量,将其转置即可得到列向量 $J^{T}\\cdot v$。)\n",
+    "\n",
+    "vector-Jacobian product 的这种特性,使得将外部梯度传回具有非标量输出的模型变得非常方便。\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "现在让我们来看一个 vector-Jacobian product 的例子:\n"
    ]
   },
   {
@@ -296,7 +312,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "tensor([-920.6895, -115.7301, -867.6995], grad_fn=<MulBackward>)\n"
+      "tensor([ 293.4463,   50.6356, 1031.2501], grad_fn=<MulBackward0>)\n"
      ]
     }
    ],
@@ -303,23 +319,30 @@
    "source": [
     "x = torch.randn(3, requires_grad=True)\n",
     "\n",
     "y = x * 2\n",
     "while y.data.norm() < 1000:\n",
     "    y = y * 2\n",
     "\n",
     "print(y)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "在这个情形中,`y` 不再是一个标量。`torch.autograd` 无法直接计算出完整的雅可比矩阵,但如果我们只想要 vector-Jacobian product,只需将向量作为参数传入 `backward`:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 11,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "tensor([ 51.2000, 512.0000, 0.0512])\n"
+      "tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])\n"
      ]
     }
    ],
@@ -395,7 +418,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.7"
+   "version": "3.7.5"
   }
  },
  "nbformat": 4,
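
For readers checking the `4.5` arithmetic in the markdown cell touched above: a minimal, self-contained sketch of the cells whose outputs appear in the first hunks. The cell bodies are not shown in this diff, so they are reconstructed here from the printed outputs; exact `grad_fn` class names vary slightly across PyTorch versions.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2        # printed as grad_fn=<AddBackward0> on recent PyTorch
z = y * y * 3    # z_i = 3 * (x_i + 2)^2 = 27 at x_i = 1
out = z.mean()   # o = (1/4) * sum(z_i) = 27

out.backward()   # do/dx_i = (3/2) * (x_i + 2) = 4.5 at x_i = 1
print(x.grad)    # tensor([[4.5000, 4.5000],
                 #         [4.5000, 4.5000]])
```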
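
The two markdown cells added in the middle of the diff describe passing a vector to `backward` when `y` is not a scalar. A runnable sketch of that example, assembled from the cell source visible in the hunks; `x` is random, so the printed values differ between runs (the recorded output `tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])` corresponds to `y = 512 * x`).

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2              # afterwards y == x * 2**k for some k >= 1

# y is not a scalar, so backward() takes a vector argument v and
# accumulates the vector-Jacobian product J^T . v into x.grad.
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

print(x.grad)              # v * 2**k; the run recorded in this diff has k = 9,
                           # i.e. tensor([51.2000, 512.0000, 0.0512])
```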
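
Not a cell from the notebook, but a quick way to sanity-check the $v^{T}\cdot J$ claim in the added markdown: build the full Jacobian row by row with `torch.autograd.grad` and compare it against what `backward(v)` leaves in `x.grad`.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                      # the Jacobian of this map is 2 * I

# Row i of J is the gradient of the scalar y[i] with respect to x.
J = torch.stack([
    torch.autograd.grad(y[i], x, retain_graph=True)[0]
    for i in range(len(y))
])

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                  # leaves J^T . v in x.grad

print(torch.allclose(x.grad, v @ J))   # True: the same numbers as v^T . J
```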