.. ---------------------------------------------------------------------------
.. Copyright 2015 Nervana Systems Inc.
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
..      http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
.. ---------------------------------------------------------------------------
Layers
======

To specify the architecture of a model, we can create a network by
concatenating layers in a list:

.. code-block:: python

    from neon.layers import Affine
    from neon.initializers import Gaussian
    from neon.transforms import Rectlin

    init = Gaussian()

    # add three affine (all-to-all) layers
    layers = []
    layers.append(Affine(nout=100, init=init, bias=init, activation=Rectlin()))
    layers.append(Affine(nout=50, init=init, bias=init, activation=Rectlin()))
    layers.append(Affine(nout=10, init=init, bias=init, activation=Rectlin()))

Each layer has several core methods:

.. csv-table::
    :header: "Method", "Description"
    :widths: 20, 40
    :escape: ~
    :delim: |

    ``configure(self, in_obj)`` | Define the layer's ``out_shape`` and ``in_shape``
    ``allocate(self, shared_outputs=None)`` | Allocate the ``output`` buffer (if needed)
    ``fprop(self, inputs, inference=False)`` | Forward propagate the activation based on the tensor ``inputs``. If ``inference``, do not store the outputs.
    ``bprop(self, error)`` | Backward propagate the tensor ``error`` and return the gradients

During model training, the provided training data is propagated through
the model's layers, calling the ``configure`` method to set the
appropriate layer shapes. Then, each layer's ``allocate`` method is
called to allocate any needed buffers.
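
To make this interface concrete, here is a minimal sketch of a
hypothetical custom layer. The ``Scale`` class, its ``alpha`` argument,
and the buffer handling shown are illustrative assumptions, not code
from the neon source.

.. code-block:: python

    from neon.layers.layer import Layer

    class Scale(Layer):
        """Hypothetical layer that multiplies its input by a constant."""

        def __init__(self, alpha=2.0, name=None):
            super(Scale, self).__init__(name=name)
            self.alpha = alpha

        def configure(self, in_obj):
            # let the base class record in_shape; scaling keeps the shape
            super(Scale, self).configure(in_obj)
            self.out_shape = self.in_shape
            return self

        def fprop(self, inputs, inference=False):
            # self.outputs is the buffer set up by allocate()
            self.outputs[:] = inputs * self.alpha
            return self.outputs

        def bprop(self, error):
            # scaling is linear, so the gradient scales by the same factor
            error[:] = error * self.alpha
            return error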
Layer taxonomy
--------------

The base classes :py:class:`neon.layers.Layer<neon.layers.layer.Layer>`, :py:class:`neon.layers.ParameterLayer<neon.layers.layer.ParameterLayer>`,
and :py:class:`neon.layers.CompoundLayer<neon.layers.layer.CompoundLayer>` form the classes from which all other
layers should inherit. These base classes are not meant to be directly
instantiated. The figure below is a taxonomy of all the layers
implemented in neon (:math:`B \leftarrow A` means that :math:`B` inherits from :math:`A`).

.. figure:: assets/LayerTaxonomy_v3.gif
Layer
-----

Because these layers do not have weights, they do *not* need to be
instantiated with a ``neon.initializers.Initializer``. Below is a
table of these layers, their key layer-specific parameters, and a
description.

.. csv-table::
    :header: "Layer", "Parameters", "Description"
    :widths: 20, 20, 40
    :delim: |

    :py:class:`neon.layers.Dropout<neon.layers.layer.Dropout>` | ``keep=0.5`` | At each ``fprop`` call, retains a random ``keep`` fraction of units
    :py:class:`neon.layers.Pooling<neon.layers.layer.Pooling>` | ``fshape, op, strides, padding`` | Pools over a window ``fshape`` (height, width, num_filters) with the operation ``op`` (either ``"max"`` or ``"avg"``)
    :py:class:`neon.layers.BatchNorm<neon.layers.layer.BatchNorm>` | ``rho=0.9`` | Z-scores each minibatch's input, then scales with :math:`f(z) = \gamma z + \beta`. See `Ioffe, 2015 <http://arxiv.org/abs/1502.03167>`__
    :py:class:`neon.layers.LRN<neon.layers.layer.LRN>` | ``alpha=1``, ``beta=0``, ``ascale=1``, ``bpower=1`` | Performs local response normalization (see Section 3.3 in `Krizhevsky, 2012 <http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf>`__)
    :py:class:`neon.layers.Activation<neon.layers.layer.Activation>` | ``transform`` | Applies ``transform`` (:py:class:`neon.transforms.Transform<neon.transforms.transform.Transform>`) to the input
    :py:class:`neon.layers.BranchNode<neon.layers.layer.BranchNode>` | | Inserts a branching node (see Layer containers)
    :py:class:`neon.layers.SkipNode<neon.layers.layer.SkipNode>` | | Layer that allows pass-through
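
For instance, several of these weight-free layers can be dropped
directly into a layer list. A brief sketch, where the window size,
strides, and keep fraction are illustrative choices rather than
defaults:

.. code-block:: python

    from neon.layers import Pooling, Activation, Dropout
    from neon.transforms import Rectlin

    layers = [
        Pooling(fshape=2, op="max", strides=2),  # 2x2 max-pooling window
        Activation(transform=Rectlin()),         # standalone rectified linear unit
        Dropout(keep=0.5),                       # randomly retain half of the units
    ]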
Parameter Layers
----------------

These layers with weights inherit from :py:class:`neon.layers.ParameterLayer<neon.layers.layer.ParameterLayer>`,
which handles the buffering and tracking of the weight parameters. They
should be initialized with an Initializer
(``neon.initializers.Initializer``). For example,

.. code-block:: python

    from neon.layers import Linear
    from neon.initializers import Gaussian

    layers = Linear(nout=100, init=Gaussian())

.. csv-table::
    :header: "Layer", "Parameters", "Description"
    :widths: 20, 20, 40
    :delim: |

    :py:class:`neon.layers.Linear <neon.layers.layer.Linear>` | ``nout`` | Linear all-to-all layer with ``nout`` units
    :py:class:`neon.layers.Convolution <neon.layers.layer.Convolution>` | ``fshape``, ``strides``, ``padding`` | Convolves the input with filters of size ``fshape`` (height, width, num_filters)
    :py:class:`neon.layers.Deconvolution <neon.layers.layer.Deconvolution>` | ``fshape``, ``strides``, ``padding`` | Applies deconvolution with filters of size ``fshape``
    :py:class:`neon.layers.LookupTable <neon.layers.layer.LookupTable>` | ``vocab_size``, ``embedding_dim`` | Embeds input with ``vocab_size`` number of unique symbols to ``embedding_dim`` dimensions
    :py:class:`neon.layers.Bias <neon.layers.layer.Bias>` | | Adds a learned bias to the input
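
As a sketch of how each of these layers takes an initializer, the
following builds a convolution and an embedding; the filter shape,
scale, and vocabulary sizes are illustrative assumptions:

.. code-block:: python

    from neon.initializers import Gaussian
    from neon.layers import Convolution, Bias, LookupTable

    init = Gaussian(scale=0.01)

    # 16 filters of size 5x5
    conv = Convolution(fshape=(5, 5, 16), strides=1, padding=0, init=init)
    bias = Bias(init=init)

    # map a 20000-symbol vocabulary to 100-dimensional embeddings
    embed = LookupTable(vocab_size=20000, embedding_dim=100, init=init)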
Compound Layers
---------------

Filtering or linear layers are often combined with a bias and an
activation function. For convenience, we use
:py:class:`neon.layers.CompoundLayer<neon.layers.layer.CompoundLayer>`, which is simply a list of layers, to
initialize these layers. For example,

.. code-block:: python

    from neon.layers import Conv
    from neon.initializers import Gaussian, Constant
    from neon.transforms import Rectlin

    layers = Conv((11, 11, 64), init=Gaussian(scale=0.01), bias=Constant(0),
                  activation=Rectlin(), name="myConv")

This code will create a convolution layer, followed by a bias layer and
a rectified linear activation layer. The convolution layer will be
given the name ``"myConv"``, the bias layer ``"myConv_bias"``, and the
activation layer ``"myConv_Rectlin"``.

.. csv-table::
    :header: "Layer", "Description"
    :widths: 20, 20
    :delim: |

    :py:class:`neon.layers.Affine <neon.layers.layer.Affine>` | ``Linear`` -> ``Bias`` -> ``Activation``
    :py:class:`neon.layers.Conv <neon.layers.layer.Conv>` | ``Convolution`` -> ``Bias`` -> ``Activation``
    :py:class:`neon.layers.Deconv <neon.layers.layer.Deconv>` | ``Deconvolution`` -> ``Bias`` -> ``Activation``
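
As a rough equivalence (a sketch of the behavior, not the internal
implementation), an ``Affine`` layer stands in for the explicit
three-layer stack:

.. code-block:: python

    from neon.initializers import Gaussian, Constant
    from neon.layers import Affine, Linear, Bias, Activation
    from neon.transforms import Rectlin

    init = Gaussian()

    # compound form
    compound = Affine(nout=100, init=init, bias=Constant(0), activation=Rectlin())

    # roughly the same network, written out layer by layer
    explicit = [Linear(nout=100, init=init),
                Bias(init=Constant(0)),
                Activation(Rectlin())]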
Recurrent Layers
----------------

Recurrent layers inherit from the base class :py:class:`neon.layers.Recurrent<neon.layers.recurrent.Recurrent>`.
The number of recurrent units is specified by the argument
``output_size``. These layers also require the arguments
``init (Initializer)`` and ``activation (Transform)`` to seed the
model's weights and the activation function for the input-to-hidden
connections. An optional argument is ``init_inner``, which initializes
the model's recurrent parameters. If absent, the initializer provided
with ``init`` will be used.

Additional layer-specific parameters are specified below:

.. csv-table::
    :header: "Layer", "Parameters", "Description"
    :widths: 20, 20, 40
    :delim: |

    :py:class:`neon.layers.Recurrent <neon.layers.recurrent.Recurrent>` | | Recurrent layer with all-to-all connections
    :py:class:`neon.layers.LSTM <neon.layers.recurrent.LSTM>` | ``gate_activation`` | Long Short-Term Memory (LSTM) implementation
    :py:class:`neon.layers.GRU <neon.layers.recurrent.GRU>` | ``gate_activation`` | Gated Recurrent Unit (GRU)
Example of a recurrent layer with tanh units:

.. code-block:: python

    from neon.initializers import Uniform, GlorotUniform
    from neon.layers import Recurrent, Affine, GRU
    from neon.transforms import Tanh, Softmax, Logistic

    init = Uniform(low=-0.08, high=0.08)

    # Recurrent layer with tanh units
    layers = [Recurrent(500, init, activation=Tanh()),
              Affine(1000, init, bias=init, activation=Softmax())]
LSTM layer for word analysis:

.. code-block:: python

    from neon.initializers import GlorotUniform
    from neon.layers import LSTM, RecurrentSum, Dropout, Affine
    from neon.transforms import Tanh, Logistic, Softmax

    g_uni = GlorotUniform()

    # LSTM layer followed by a summary layer, dropout, and a classifier
    layers = [
        LSTM(128, g_uni, activation=Tanh(),
             gate_activation=Logistic()),
        RecurrentSum(),
        Dropout(keep=0.5),
        Affine(2, g_uni, bias=GlorotUniform(), activation=Softmax())
    ]
Network with two stacked GRU layers:

.. code-block:: python

    from neon.initializers import GlorotUniform
    from neon.layers import GRU, LookupTable, Affine
    from neon.transforms import Tanh, Logistic, Softmax

    hidden_size = 256  # illustrative value
    init = GlorotUniform()

    # set common parameters
    rlayer_params = {"output_size": hidden_size, "init": init,
                     "activation": Tanh(), "gate_activation": Logistic()}

    # initialize two GRU layers
    rlayer1, rlayer2 = GRU(**rlayer_params), GRU(**rlayer_params)

    # build full model
    layers = [
        LookupTable(vocab_size=1000, embedding_dim=200, init=init),
        rlayer1,
        rlayer2,
        Affine(1000, init, bias=init, activation=Softmax())
    ]
Summary layers
~~~~~~~~~~~~~~

A recurrent layer can be followed by layers that collapse the output
over the time dimension in interesting ways. These layers do not have
weights/parameters and therefore do not undergo any learning.

.. csv-table::
    :header: "Layer", "Description"
    :widths: 20, 20
    :delim: |

    :py:class:`neon.layers.RecurrentSum <neon.layers.recurrent.RecurrentSum>` | Sums unit output over time
    :py:class:`neon.layers.RecurrentMean <neon.layers.recurrent.RecurrentMean>` | Averages unit output over time
    :py:class:`neon.layers.RecurrentLast <neon.layers.recurrent.RecurrentLast>` | Retains output from last time step only
If a recurrent layer is followed by, for example, an ``Affine`` layer,
and not one of the above summary layers, then the ``Affine`` layer has
connections to all the units from the different time steps.
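
For example, here is a sketch of a sequence classifier that keeps only
the final time step; the layer sizes and initializer are illustrative
assumptions:

.. code-block:: python

    from neon.initializers import GlorotUniform
    from neon.layers import LSTM, RecurrentLast, Affine
    from neon.transforms import Tanh, Logistic, Softmax

    init = GlorotUniform()

    layers = [
        LSTM(128, init, activation=Tanh(), gate_activation=Logistic()),
        RecurrentLast(),  # collapse the time dimension to the last step
        Affine(2, init, bias=init, activation=Softmax()),
    ]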