pytorch-handbook/chapter2/2.1.2-pytorch-basics-autograd.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.0.1.post2'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import torch\n",
"torch.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 使用PyTorch计算梯度数值"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch的Autograd模块实现了深度学习的算法中的向传播求导数在张量Tensor类上的所有操作Autograd都能为他们自动提供微分简化了手动计算导数的复杂过程。\n",
"\n",
"在0.4以前的版本中Pytorch 使用 Variable 类来自动计算所有的梯度。Variable类主要包含三个属性\n",
"data保存Variable所包含的Tensorgrad保存data对应的梯度grad也是个Variable而不是Tensor它和data的形状一样grad_fn指向一个Function对象这个Function用来反向传播计算输入的梯度。\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"从0.4起, Variable 正式合并入Tensor类通过Variable嵌套实现的自动微分功能已经整合进入了Tensor类中。虽然为了代码的兼容性还是可以使用Variable(tensor)这种方式进行嵌套,但是这个操作其实什么都没做。\n",
"\n",
"所以以后的代码建议直接使用Tensor类进行操作因为官方文档中已经将Variable设置成过期模块。\n",
"\n",
"要想通过Tensor类本身就支持了使用autograd功能只需要设置.requires_grad=True\n",
"\n",
"Variable类中的的grad和grad_fn属性已经整合进入了Tensor类中"
]
},
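{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the statement above (a small check added here, not part of the original text), the cell below wraps a tensor in Variable: the result is still an ordinary Tensor, confirming that the wrapping is a no-op."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wrapping a tensor in Variable has been a no-op since 0.4: the result is still a plain Tensor\n",
"from torch.autograd import Variable\n",
"\n",
"t = torch.ones(2, 2, requires_grad=True)\n",
"v = Variable(t)\n",
"print(type(v), v.requires_grad)"
]
},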
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Autograd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在张量创建时,通过设置 requires_grad 标识为Ture来告诉Pytorch需要对该张量进行自动求导PyTorch会记录该张量的每一步操作历史并自动计算"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[0.0403, 0.5633, 0.2561, 0.4064, 0.9596],\n",
" [0.6928, 0.1832, 0.5380, 0.6386, 0.8710],\n",
" [0.5332, 0.8216, 0.8139, 0.1925, 0.4993],\n",
" [0.2650, 0.6230, 0.5945, 0.3230, 0.0752],\n",
" [0.0919, 0.4770, 0.4622, 0.6185, 0.2761]], requires_grad=True)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.rand(5, 5, requires_grad=True)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[0.2269, 0.7673, 0.8179, 0.5558, 0.0493],\n",
" [0.7762, 0.9242, 0.2872, 0.0035, 0.4197],\n",
" [0.4322, 0.5281, 0.9001, 0.7276, 0.3218],\n",
" [0.5123, 0.6567, 0.9465, 0.0475, 0.9172],\n",
" [0.9899, 0.9284, 0.5303, 0.1718, 0.3937]], requires_grad=True)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = torch.rand(5, 5, requires_grad=True)\n",
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch会自动追踪和记录对与张量的所有操作当计算完成后调用.backward()方法自动计算梯度并且将计算结果保存到grad属性中。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(25.6487, grad_fn=<SumBackward0>)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"z=torch.sum(x+y)\n",
"z"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在张量进行操作后grad_fn已经被赋予了一个新的函数这个函数引用了一个创建了这个Tensor类的Function对象。\n",
"Tensor和Function互相连接生成了一个非循环图它记录并且编码了完整的计算历史。每个张量都有一个.grad_fn属性如果这个张量是用户手动创建的那么这个张量的grad_fn是None。\n",
"\n",
"下面我们来调用反向传播函数,计算其梯度"
]
},
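{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick verification of the claim above (added for clarity): `x` was created by hand, so its grad_fn is None, while `z` was produced by an operation and therefore carries a grad_fn."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Leaf tensors created by the user have no grad_fn; results of operations do\n",
"print(x.grad_fn)\n",
"print(z.grad_fn)"
]
},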
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 简单的自动求导"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]) tensor([[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]])\n"
]
}
],
"source": [
"z.backward()\n",
"print(x.grad,y.grad)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果Tensor类表示的是一个标量即它包含一个元素的张量则不需要为backward()指定任何参数但是如果它有更多的元素则需要指定一个gradient参数它是形状匹配的张量。\n",
"以上的 `z.backward()`相当于是`z.backward(torch.tensor(1.))`的简写。\n",
"这种参数常出现在图像分类中的单标签分类,输出一个标量代表图像的标签。"
]
},
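{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a small check of that equivalence (added for illustration; the tensors `a1` and `a2` are made up for this check): for a scalar output, backward() and backward(torch.tensor(1.)) produce the same gradient."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For a scalar output, backward() and backward(torch.tensor(1.)) are equivalent\n",
"a1 = torch.ones(3, requires_grad=True)\n",
"a2 = torch.ones(3, requires_grad=True)\n",
"a1.sum().backward()\n",
"a2.sum().backward(torch.tensor(1.))\n",
"print(a1.grad, a2.grad)  # both are tensors of ones"
]
},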
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 复杂的自动求导"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[3.3891e-01, 4.9468e-01, 8.0797e-02, 2.5656e-01, 2.9529e-01],\n",
" [7.1946e-01, 1.6977e-02, 1.7965e-01, 3.2656e-01, 1.7665e-01],\n",
" [3.1353e-01, 2.2096e-01, 1.2251e+00, 5.5087e-01, 5.9572e-02],\n",
" [1.3015e+00, 3.8029e-01, 1.1103e+00, 4.0392e-01, 2.2055e-01],\n",
" [8.8726e-02, 6.9701e-01, 8.0164e-01, 9.7221e-01, 4.2239e-04]],\n",
" grad_fn=<AddBackward0>)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.rand(5, 5, requires_grad=True)\n",
"y = torch.rand(5, 5, requires_grad=True)\n",
"z= x**2+y**3\n",
"z"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.2087, 1.3554, 0.5560, 1.0009, 0.9931],\n",
" [1.2655, 0.1223, 0.8008, 1.1127, 0.7261],\n",
" [1.1052, 0.2579, 1.8006, 0.1544, 0.3646],\n",
" [1.8855, 1.2296, 1.9061, 0.9313, 0.0648],\n",
" [0.5952, 1.6190, 0.8430, 1.9213, 0.0322]])\n"
]
}
],
"source": [
"#我们的返回值不是一个标量所以需要输入一个大小相同的张量作为参数这里我们用ones_like函数根据x生成一个张量\n",
"z.backward(torch.ones_like(x))\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们可以使用with torch.no_grad()上下文管理器临时禁止对已设置requires_grad=True的张量进行自动求导。这个方法在测试集计算准确率的时候会经常用到例如"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False\n"
]
}
],
"source": [
"with torch.no_grad():\n",
" print((x +y*2).requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"使用.no_grad()进行嵌套后,代码不会跟踪历史记录,也就是说保存的这部分记录会减少内存的使用量并且会加快少许的运算速度。"
]
},
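{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal, self-contained sketch of that evaluation pattern (added for illustration; the tiny nn.Linear model and the random data are stand-ins for a real network and test loader)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch.nn as nn\n",
"\n",
"# Made-up model and data standing in for a real network and test set\n",
"model = nn.Linear(10, 2)\n",
"images = torch.randn(8, 10)\n",
"labels = torch.randint(0, 2, (8,))\n",
"\n",
"with torch.no_grad():  # no computation graph is built inside this block\n",
"    outputs = model(images)\n",
"    predicted = outputs.argmax(dim=1)\n",
"    accuracy = (predicted == labels).float().mean().item()\n",
"\n",
"print(accuracy, outputs.requires_grad)  # requires_grad is False: no history was recorded"
]
},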
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Autograd 过程解析"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了说明Pytorch的自动求导原理我们来尝试分析一下PyTorch的源代码虽然Pytorch的 Tensor和 TensorBase都是使用CPP来实现的但是可以使用一些Python的一些方法查看这些对象在Python的属性和状态。\n",
" Python的 `dir()` 返回参数的属性、方法列表。`z`是一个Tensor变量看看里面有哪些成员变量。"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['__abs__',\n",
" '__add__',\n",
" '__and__',\n",
" '__array__',\n",
" '__array_priority__',\n",
" '__array_wrap__',\n",
" '__bool__',\n",
" '__class__',\n",
" '__deepcopy__',\n",
" '__delattr__',\n",
" '__delitem__',\n",
" '__dict__',\n",
" '__dir__',\n",
" '__div__',\n",
" '__doc__',\n",
" '__eq__',\n",
" '__float__',\n",
" '__floordiv__',\n",
" '__format__',\n",
" '__ge__',\n",
" '__getattribute__',\n",
" '__getitem__',\n",
" '__gt__',\n",
" '__hash__',\n",
" '__iadd__',\n",
" '__iand__',\n",
" '__idiv__',\n",
" '__ilshift__',\n",
" '__imul__',\n",
" '__index__',\n",
" '__init__',\n",
" '__init_subclass__',\n",
" '__int__',\n",
" '__invert__',\n",
" '__ior__',\n",
" '__ipow__',\n",
" '__irshift__',\n",
" '__isub__',\n",
" '__iter__',\n",
" '__itruediv__',\n",
" '__ixor__',\n",
" '__le__',\n",
" '__len__',\n",
" '__long__',\n",
" '__lshift__',\n",
" '__lt__',\n",
" '__matmul__',\n",
" '__mod__',\n",
" '__module__',\n",
" '__mul__',\n",
" '__ne__',\n",
" '__neg__',\n",
" '__new__',\n",
" '__nonzero__',\n",
" '__or__',\n",
" '__pow__',\n",
" '__radd__',\n",
" '__rdiv__',\n",
" '__reduce__',\n",
" '__reduce_ex__',\n",
" '__repr__',\n",
" '__reversed__',\n",
" '__rfloordiv__',\n",
" '__rmul__',\n",
" '__rpow__',\n",
" '__rshift__',\n",
" '__rsub__',\n",
" '__rtruediv__',\n",
" '__setattr__',\n",
" '__setitem__',\n",
" '__setstate__',\n",
" '__sizeof__',\n",
" '__str__',\n",
" '__sub__',\n",
" '__subclasshook__',\n",
" '__truediv__',\n",
" '__weakref__',\n",
" '__xor__',\n",
" '_backward_hooks',\n",
" '_base',\n",
" '_cdata',\n",
" '_coalesced_',\n",
" '_dimI',\n",
" '_dimV',\n",
" '_grad',\n",
" '_grad_fn',\n",
" '_indices',\n",
" '_make_subclass',\n",
" '_nnz',\n",
" '_values',\n",
" '_version',\n",
" 'abs',\n",
" 'abs_',\n",
" 'acos',\n",
" 'acos_',\n",
" 'add',\n",
" 'add_',\n",
" 'addbmm',\n",
" 'addbmm_',\n",
" 'addcdiv',\n",
" 'addcdiv_',\n",
" 'addcmul',\n",
" 'addcmul_',\n",
" 'addmm',\n",
" 'addmm_',\n",
" 'addmv',\n",
" 'addmv_',\n",
" 'addr',\n",
" 'addr_',\n",
" 'all',\n",
" 'allclose',\n",
" 'any',\n",
" 'apply_',\n",
" 'argmax',\n",
" 'argmin',\n",
" 'argsort',\n",
" 'as_strided',\n",
" 'as_strided_',\n",
" 'asin',\n",
" 'asin_',\n",
" 'atan',\n",
" 'atan2',\n",
" 'atan2_',\n",
" 'atan_',\n",
" 'backward',\n",
" 'baddbmm',\n",
" 'baddbmm_',\n",
" 'bernoulli',\n",
" 'bernoulli_',\n",
" 'bincount',\n",
" 'bmm',\n",
" 'btrifact',\n",
" 'btrifact_with_info',\n",
" 'btrisolve',\n",
" 'byte',\n",
" 'cauchy_',\n",
" 'ceil',\n",
" 'ceil_',\n",
" 'char',\n",
" 'cholesky',\n",
" 'chunk',\n",
" 'clamp',\n",
" 'clamp_',\n",
" 'clamp_max',\n",
" 'clamp_max_',\n",
" 'clamp_min',\n",
" 'clamp_min_',\n",
" 'clone',\n",
" 'coalesce',\n",
" 'contiguous',\n",
" 'copy_',\n",
" 'cos',\n",
" 'cos_',\n",
" 'cosh',\n",
" 'cosh_',\n",
" 'cpu',\n",
" 'cross',\n",
" 'cuda',\n",
" 'cumprod',\n",
" 'cumsum',\n",
" 'data',\n",
" 'data_ptr',\n",
" 'dense_dim',\n",
" 'det',\n",
" 'detach',\n",
" 'detach_',\n",
" 'device',\n",
" 'diag',\n",
" 'diag_embed',\n",
" 'diagflat',\n",
" 'diagonal',\n",
" 'digamma',\n",
" 'digamma_',\n",
" 'dim',\n",
" 'dist',\n",
" 'div',\n",
" 'div_',\n",
" 'dot',\n",
" 'double',\n",
" 'dtype',\n",
" 'eig',\n",
" 'element_size',\n",
" 'eq',\n",
" 'eq_',\n",
" 'equal',\n",
" 'erf',\n",
" 'erf_',\n",
" 'erfc',\n",
" 'erfc_',\n",
" 'erfinv',\n",
" 'erfinv_',\n",
" 'exp',\n",
" 'exp_',\n",
" 'expand',\n",
" 'expand_as',\n",
" 'expm1',\n",
" 'expm1_',\n",
" 'exponential_',\n",
" 'fft',\n",
" 'fill_',\n",
" 'flatten',\n",
" 'flip',\n",
" 'float',\n",
" 'floor',\n",
" 'floor_',\n",
" 'fmod',\n",
" 'fmod_',\n",
" 'frac',\n",
" 'frac_',\n",
" 'gather',\n",
" 'ge',\n",
" 'ge_',\n",
" 'gels',\n",
" 'geometric_',\n",
" 'geqrf',\n",
" 'ger',\n",
" 'gesv',\n",
" 'get_device',\n",
" 'grad',\n",
" 'grad_fn',\n",
" 'gt',\n",
" 'gt_',\n",
" 'half',\n",
" 'hardshrink',\n",
" 'histc',\n",
" 'ifft',\n",
" 'index_add',\n",
" 'index_add_',\n",
" 'index_copy',\n",
" 'index_copy_',\n",
" 'index_fill',\n",
" 'index_fill_',\n",
" 'index_put',\n",
" 'index_put_',\n",
" 'index_select',\n",
" 'indices',\n",
" 'int',\n",
" 'inverse',\n",
" 'irfft',\n",
" 'is_coalesced',\n",
" 'is_complex',\n",
" 'is_contiguous',\n",
" 'is_cuda',\n",
" 'is_distributed',\n",
" 'is_floating_point',\n",
" 'is_leaf',\n",
" 'is_nonzero',\n",
" 'is_pinned',\n",
" 'is_same_size',\n",
" 'is_set_to',\n",
" 'is_shared',\n",
" 'is_signed',\n",
" 'is_sparse',\n",
" 'isclose',\n",
" 'item',\n",
" 'kthvalue',\n",
" 'layout',\n",
" 'le',\n",
" 'le_',\n",
" 'lerp',\n",
" 'lerp_',\n",
" 'lgamma',\n",
" 'lgamma_',\n",
" 'log',\n",
" 'log10',\n",
" 'log10_',\n",
" 'log1p',\n",
" 'log1p_',\n",
" 'log2',\n",
" 'log2_',\n",
" 'log_',\n",
" 'log_normal_',\n",
" 'log_softmax',\n",
" 'logdet',\n",
" 'logsumexp',\n",
" 'long',\n",
" 'lt',\n",
" 'lt_',\n",
" 'map2_',\n",
" 'map_',\n",
" 'masked_fill',\n",
" 'masked_fill_',\n",
" 'masked_scatter',\n",
" 'masked_scatter_',\n",
" 'masked_select',\n",
" 'matmul',\n",
" 'matrix_power',\n",
" 'max',\n",
" 'mean',\n",
" 'median',\n",
" 'min',\n",
" 'mm',\n",
" 'mode',\n",
" 'mul',\n",
" 'mul_',\n",
" 'multinomial',\n",
" 'mv',\n",
" 'mvlgamma',\n",
" 'mvlgamma_',\n",
" 'name',\n",
" 'narrow',\n",
" 'narrow_copy',\n",
" 'ndimension',\n",
" 'ne',\n",
" 'ne_',\n",
" 'neg',\n",
" 'neg_',\n",
" 'nelement',\n",
" 'new',\n",
" 'new_empty',\n",
" 'new_full',\n",
" 'new_ones',\n",
" 'new_tensor',\n",
" 'new_zeros',\n",
" 'nonzero',\n",
" 'norm',\n",
" 'normal_',\n",
" 'numel',\n",
" 'numpy',\n",
" 'orgqr',\n",
" 'ormqr',\n",
" 'output_nr',\n",
" 'permute',\n",
" 'pin_memory',\n",
" 'pinverse',\n",
" 'polygamma',\n",
" 'polygamma_',\n",
" 'potrf',\n",
" 'potri',\n",
" 'potrs',\n",
" 'pow',\n",
" 'pow_',\n",
" 'prelu',\n",
" 'prod',\n",
" 'pstrf',\n",
" 'put_',\n",
" 'qr',\n",
" 'random_',\n",
" 'reciprocal',\n",
" 'reciprocal_',\n",
" 'record_stream',\n",
" 'register_hook',\n",
" 'reinforce',\n",
" 'relu',\n",
" 'relu_',\n",
" 'remainder',\n",
" 'remainder_',\n",
" 'renorm',\n",
" 'renorm_',\n",
" 'repeat',\n",
" 'requires_grad',\n",
" 'requires_grad_',\n",
" 'reshape',\n",
" 'reshape_as',\n",
" 'resize',\n",
" 'resize_',\n",
" 'resize_as',\n",
" 'resize_as_',\n",
" 'retain_grad',\n",
" 'rfft',\n",
" 'roll',\n",
" 'rot90',\n",
" 'round',\n",
" 'round_',\n",
" 'rsqrt',\n",
" 'rsqrt_',\n",
" 'scatter',\n",
" 'scatter_',\n",
" 'scatter_add',\n",
" 'scatter_add_',\n",
" 'select',\n",
" 'set_',\n",
" 'shape',\n",
" 'share_memory_',\n",
" 'short',\n",
" 'sigmoid',\n",
" 'sigmoid_',\n",
" 'sign',\n",
" 'sign_',\n",
" 'sin',\n",
" 'sin_',\n",
" 'sinh',\n",
" 'sinh_',\n",
" 'size',\n",
" 'slogdet',\n",
" 'smm',\n",
" 'softmax',\n",
" 'sort',\n",
" 'sparse_dim',\n",
" 'sparse_mask',\n",
" 'sparse_resize_',\n",
" 'sparse_resize_and_clear_',\n",
" 'split',\n",
" 'split_with_sizes',\n",
" 'sqrt',\n",
" 'sqrt_',\n",
" 'squeeze',\n",
" 'squeeze_',\n",
" 'sspaddmm',\n",
" 'std',\n",
" 'stft',\n",
" 'storage',\n",
" 'storage_offset',\n",
" 'storage_type',\n",
" 'stride',\n",
" 'sub',\n",
" 'sub_',\n",
" 'sum',\n",
" 'svd',\n",
" 'symeig',\n",
" 't',\n",
" 't_',\n",
" 'take',\n",
" 'tan',\n",
" 'tan_',\n",
" 'tanh',\n",
" 'tanh_',\n",
" 'to',\n",
" 'to_dense',\n",
" 'to_sparse',\n",
" 'tolist',\n",
" 'topk',\n",
" 'trace',\n",
" 'transpose',\n",
" 'transpose_',\n",
" 'tril',\n",
" 'tril_',\n",
" 'triu',\n",
" 'triu_',\n",
" 'trtrs',\n",
" 'trunc',\n",
" 'trunc_',\n",
" 'type',\n",
" 'type_as',\n",
" 'unbind',\n",
" 'unfold',\n",
" 'uniform_',\n",
" 'unique',\n",
" 'unsqueeze',\n",
" 'unsqueeze_',\n",
" 'values',\n",
" 'var',\n",
" 'view',\n",
" 'view_as',\n",
" 'where',\n",
" 'zero_']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dir(z)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"返回很多我们直接排除掉一些Python中特殊方法以__开头和结束的和私有方法以_开头的直接看几个比较主要的属性\n",
"`.is_leaf`:记录是否是叶子节点。通过这个属性来确定这个变量的类型\n",
"在官方文档中所说的“graph leaves”“leaf variables”都是指像`x``y`这样的手动创建的、而非运算得到的变量,这些变量称为创建变量。\n",
"像`z`这样的,是通过计算后得到的结果称为结果变量。\n",
"\n",
"一个变量是创建变量还是结果变量是通过`.is_leaf`来获取的。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x.is_leaf=True\n",
"z.is_leaf=False\n"
]
}
],
"source": [
"print(\"x.is_leaf=\"+str(x.is_leaf))\n",
"print(\"z.is_leaf=\"+str(z.is_leaf))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`x`是手动创建的没有通过计算,所以他被认为是一个叶子节点也就是一个创建变量,而`z`是通过`x`与`y`的一系列计算得到的,所以不是叶子结点也就是结果变量。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为什么我们执行`z.backward()`方法会更新`x.grad`和`y.grad`呢?\n",
"`.grad_fn`属性记录的就是这部分的操作,虽然`.backward()`方法也是CPP实现的但是可以通过Python来进行简单的探索。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`grad_fn`:记录并且编码了完整的计算历史"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AddBackward0 at 0x120840a90>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"z.grad_fn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`grad_fn`是一个`AddBackward0`类型的变量 `AddBackward0`这个类也是用Cpp来写的但是我们从名字里就能够大概知道他是加法(ADD)的反反向传播Backward看看里面有些什么东西"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['__call__',\n",
" '__class__',\n",
" '__delattr__',\n",
" '__dir__',\n",
" '__doc__',\n",
" '__eq__',\n",
" '__format__',\n",
" '__ge__',\n",
" '__getattribute__',\n",
" '__gt__',\n",
" '__hash__',\n",
" '__init__',\n",
" '__init_subclass__',\n",
" '__le__',\n",
" '__lt__',\n",
" '__ne__',\n",
" '__new__',\n",
" '__reduce__',\n",
" '__reduce_ex__',\n",
" '__repr__',\n",
" '__setattr__',\n",
" '__sizeof__',\n",
" '__str__',\n",
" '__subclasshook__',\n",
" '_register_hook_dict',\n",
" 'metadata',\n",
" 'name',\n",
" 'next_functions',\n",
" 'register_hook',\n",
" 'requires_grad']"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dir(z.grad_fn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`next_functions`就是`grad_fn`的精华"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((<PowBackward0 at 0x1208409b0>, 0), (<PowBackward0 at 0x1208408d0>, 0))"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"z.grad_fn.next_functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`next_functions`是一个tuple of tuple of PowBackward0 and int。\n",
"\n",
"为什么是2个tuple \n",
"因为我们的操作是`z= x**2+y**3` 刚才的`AddBackward0`是相加,而前面的操作是乘方 `PowBackward0`。tuple第一个元素就是x相关的操作记录"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['__call__',\n",
" '__class__',\n",
" '__delattr__',\n",
" '__dir__',\n",
" '__doc__',\n",
" '__eq__',\n",
" '__format__',\n",
" '__ge__',\n",
" '__getattribute__',\n",
" '__gt__',\n",
" '__hash__',\n",
" '__init__',\n",
" '__init_subclass__',\n",
" '__le__',\n",
" '__lt__',\n",
" '__ne__',\n",
" '__new__',\n",
" '__reduce__',\n",
" '__reduce_ex__',\n",
" '__repr__',\n",
" '__setattr__',\n",
" '__sizeof__',\n",
" '__str__',\n",
" '__subclasshook__',\n",
" '_register_hook_dict',\n",
" 'metadata',\n",
" 'name',\n",
" 'next_functions',\n",
" 'register_hook',\n",
" 'requires_grad']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xg = z.grad_fn.next_functions[0][0]\n",
"dir(xg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"继续深挖"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AccumulateGrad"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x_leaf=xg.next_functions[0][0]\n",
"type(x_leaf)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在PyTorch的反向图计算中`AccumulateGrad`类型代表的就是叶子节点类型,也就是计算图终止节点。`AccumulateGrad`类中有一个`.variable`属性指向叶子节点。"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[0.1044, 0.6777, 0.2780, 0.5005, 0.4966],\n",
" [0.6328, 0.0611, 0.4004, 0.5564, 0.3631],\n",
" [0.5526, 0.1290, 0.9003, 0.0772, 0.1823],\n",
" [0.9428, 0.6148, 0.9530, 0.4657, 0.0324],\n",
" [0.2976, 0.8095, 0.4215, 0.9606, 0.0161]], requires_grad=True)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x_leaf.variable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"这个`.variable`的属性就是我们的生成的变量`x`"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x_leaf.variable的id:4840553424\n",
"x的id:4840553424\n"
]
}
],
"source": [
"print(\"x_leaf.variable的id:\"+str(id(x_leaf.variable)))\n",
"print(\"x的id:\"+str(id(x)))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"assert(id(x_leaf.variable)==id(x))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"这样整个规程就很清晰了:\n",
"\n",
"1. 当我们执行z.backward()的时候。这个操作将调用z里面的grad_fn这个属性执行求导的操作。\n",
"2. 这个操作将遍历grad_fn的next_functions然后分别取出里面的FunctionAccumulateGrad执行求导操作。这部分是一个递归的过程直到最后类型为叶子节点。\n",
"3. 计算出结果以后将结果保存到他们对应的variable 这个变量所引用的对象x和y的 grad这个属性里面。\n",
"4. 求导结束。所有的叶节点的grad变量都得到了相应的更新\n",
"\n",
"最终当我们执行完c.backward()之后a和b里面的grad值就得到了更新。"
]
},
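{
"cell_type": "markdown",
"metadata": {},
"source": [
"The helper below is not part of PyTorch; it is a small sketch that recursively follows next_functions starting from z.grad_fn and prints the node types, making the AddBackward0 -> PowBackward0 -> AccumulateGrad structure described above visible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A small helper (not part of PyTorch) that prints the structure of a backward graph\n",
"def print_graph(fn, depth=0):\n",
"    print(\"  \" * depth + type(fn).__name__)\n",
"    for next_fn, _ in fn.next_functions:\n",
"        if next_fn is not None:\n",
"            print_graph(next_fn, depth + 1)\n",
"\n",
"print_graph(z.grad_fn)"
]
},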
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 扩展Autograd\n",
"如果需要自定义autograd扩展新的功能就需要扩展Function类。因为Function使用autograd来计算结果和梯度并对操作历史进行编码。\n",
"在Function类中最主要的方法就是`forward()`和`backward()`他们分别代表了前向传播和反向传播。\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"一个自定义的Function需要一下三个方法\n",
"\n",
" __init__ (optional)如果这个操作需要额外的参数则需要定义这个Function的构造函数不需要的话可以忽略。\n",
" \n",
" forward():执行前向传播的计算代码\n",
" \n",
" backward():反向传播时梯度计算的代码。 参数的个数和forward返回值的个数一样每个参数代表传回到此操作的梯度。\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# 引入Function便于扩展\n",
"from torch.autograd.function import Function"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# 定义一个乘以常数的操作(输入参数是张量)\n",
"# 方法必须是静态方法,所以要加上@staticmethod \n",
"class MulConstant(Function):\n",
" @staticmethod \n",
" def forward(ctx, tensor, constant):\n",
" # ctx 用来保存信息这里类似self并且ctx的属性可以在backward中调用\n",
" ctx.constant=constant\n",
" return tensor *constant\n",
" @staticmethod\n",
" def backward(ctx, grad_output):\n",
" # 返回的参数要与输入的参数一样.\n",
" # 第一个输入为3x3的张量第二个为一个常数\n",
" # 常数的梯度必须是 None.\n",
" return grad_output, None "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"定义完我们的新操作后,我们来进行测试"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a:tensor([[0.0118, 0.1434, 0.8669],\n",
" [0.1817, 0.8904, 0.5852],\n",
" [0.7364, 0.5234, 0.9677]], requires_grad=True)\n",
"b:tensor([[0.0588, 0.7169, 4.3347],\n",
" [0.9084, 4.4520, 2.9259],\n",
" [3.6820, 2.6171, 4.8386]], grad_fn=<MulConstantBackward>)\n"
]
}
],
"source": [
"a=torch.rand(3,3,requires_grad=True)\n",
"b=MulConstant.apply(a,5)\n",
"print(\"a:\"+str(a))\n",
"print(\"b:\"+str(b)) # b为a的元素乘以5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"反向传播,返回值不是标量,所以`backward`方法需要参数"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"b.backward(torch.ones_like(a))"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1., 1., 1.],\n",
" [1., 1., 1.],\n",
" [1., 1., 1.]])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.grad"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"梯度应为1"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ML with tensorflow and Pytorch",
"language": "python",
"name": "ml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": false,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "288px"
},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}