.. _manifold:

.. currentmodule:: sklearn.manifold

=================
Manifold learning
=================

.. rst-class:: quote

    | Look for the bare necessities
    | The simple bare necessities
    | Forget about your worries and your strife
    | I mean the bare necessities
    | Old Mother Nature's recipes
    | That bring the bare necessities of life
    |
    | -- Baloo's song [The Jungle Book]

.. figure:: ../auto_examples/manifold/images/plot_compare_methods_1.png
   :target: ../auto_examples/manifold/plot_compare_methods.html
   :align: center
   :scale: 60

Manifold learning is an approach to nonlinear dimensionality reduction.
Algorithms for this task are based on the idea that the dimensionality of
many data sets is only artificially high.

Introduction
============

High-dimensional datasets can be very difficult to visualize. While data
in two or three dimensions can be plotted to show the inherent
structure of the data, equivalent high-dimensional plots are much less
intuitive. To aid visualization of the structure of a dataset, the
dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking
a random projection of the data. Though this allows some degree of
visualization of the data structure, the randomness of the choice leaves much
to be desired. In a random projection, it is likely that the more
interesting structure within the data will be lost.

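
As a rough illustration of the random projection described above, the
following sketch projects data onto a few random orthonormal directions using
NumPy only. The helper name ``random_projection`` and the synthetic data are
assumptions made for this example; they are not part of ``sklearn.manifold``.

```python
import numpy as np


def random_projection(X, n_components, seed=0):
    """Project X (n_samples, n_features) onto random orthonormal directions."""
    rng = np.random.default_rng(seed)
    # Draw a random Gaussian matrix and orthonormalize its columns with QR.
    G = rng.standard_normal((X.shape[1], n_components))
    Q, _ = np.linalg.qr(G)
    return X @ Q


# Hypothetical 10-dimensional data set, reduced to 2 dimensions for plotting.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 10))
X_2d = random_projection(X, n_components=2)
print(X_2d.shape)
```

Nothing in this construction looks at the data when choosing the directions,
which is why interesting structure can easily be lost.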
.. |digits_img| image:: ../auto_examples/manifold/images/plot_lle_digits_1.png
    :target: ../auto_examples/manifold/plot_lle_digits.html
    :scale: 50

.. |projected_img| image:: ../auto_examples/manifold/images/plot_lle_digits_2.png
    :target: ../auto_examples/manifold/plot_lle_digits.html
    :scale: 50

.. centered:: |digits_img| |projected_img|

To address this concern, a number of supervised and unsupervised linear
dimensionality reduction frameworks have been designed, such as Principal
Component Analysis (PCA), Independent Component Analysis, Linear
Discriminant Analysis, and others. These algorithms define specific
rubrics to choose an "interesting" linear projection of the data.
These methods can be powerful, but often miss important nonlinear
structure in the data.

.. |PCA_img| image:: ../auto_examples/manifold/images/plot_lle_digits_3.png
    :target: ../auto_examples/manifold/plot_lle_digits.html
    :scale: 50

.. |LDA_img| image:: ../auto_examples/manifold/images/plot_lle_digits_4.png
    :target: ../auto_examples/manifold/plot_lle_digits.html
    :scale: 50

.. centered:: |PCA_img| |LDA_img|

Manifold learning can be thought of as an attempt to generalize linear
frameworks like PCA to be sensitive to nonlinear structure in data. Though
supervised variants exist, the typical manifold learning problem is
unsupervised: it learns the high-dimensional structure of the data
from the data itself, without the use of predetermined classifications.

.. topic:: Examples:

    * See :ref:`example_manifold_plot_lle_digits.py` for an example of
      dimensionality reduction on handwritten digits.

    * See :ref:`example_manifold_plot_compare_methods.py` for an example of
      dimensionality reduction on a toy "S-curve" dataset.

The manifold learning implementations available in sklearn are
summarized below.

Isomap
======

One of the earliest approaches to manifold learning is the Isomap
algorithm, short for Isometric Mapping. Isomap can be viewed as an
extension of Multi-dimensional Scaling (MDS) or Kernel PCA.
Isomap seeks a lower-dimensional embedding which maintains geodesic
distances between all points. Isomap can be performed with the object
:class:`Isomap`.

.. figure:: ../auto_examples/manifold/images/plot_lle_digits_5.png
   :target: ../auto_examples/manifold/plot_lle_digits.html
   :align: center
   :scale: 50

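
A minimal usage sketch for :class:`Isomap` on a toy "S-curve" dataset might
look as follows. It assumes a recent scikit-learn release, in which the output
dimension is passed as ``n_components``; that keyword, the sample size, and the
neighbor count are assumptions of this example and may not match the release
this page was written for.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# Toy 3-dimensional "S-curve" data; 300 points is an arbitrary choice.
X, color = make_s_curve(n_samples=300, random_state=0)

# Assumption: recent releases name the output dimension `n_components`.
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)
print(X.shape, "->", X_iso.shape)
```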
Complexity
----------

The Isomap algorithm comprises three stages:

1. **Nearest neighbor search.** Isomap uses
   :class:`sklearn.neighbors.BallTree` for efficient neighbor search.
   The cost is approximately :math:`O[D \log(k) N \log(N)]`, for :math:`k`
   nearest neighbors of :math:`N` points in :math:`D` dimensions.

2. **Shortest-path graph search.** The most efficient known algorithms
   for this are *Dijkstra's Algorithm*, which is approximately
   :math:`O[N^2(k + \log(N))]`, or the *Floyd-Warshall algorithm*, which
   is :math:`O[N^3]`. The algorithm can be selected by the user with
   the ``path_method`` keyword of ``Isomap``. If unspecified, the code
   attempts to choose the best algorithm for the input data.

3. **Partial eigenvalue decomposition.** The embedding is encoded in the
   eigenvectors corresponding to the :math:`d` largest eigenvalues of the
   :math:`N \times N` isomap kernel. For a dense solver, the cost is
   approximately :math:`O[d N^2]`. This cost can often be improved using
   the ``ARPACK`` solver. The eigensolver can be specified by the user
   with the ``eigen_solver`` keyword of ``Isomap``. If unspecified, the
   code attempts to choose the best algorithm for the input data.

The overall complexity of Isomap is
:math:`O[D \log(k) N \log(N)] + O[N^2(k + \log(N))] + O[d N^2]`.

* :math:`N` : number of training data points
* :math:`D` : input dimension
* :math:`k` : number of nearest neighbors
* :math:`d` : output dimension

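
The three stages can be sketched in a few lines of NumPy. This is a toy
illustration under simplifying assumptions, not the library implementation: it
uses a brute-force neighbor search instead of a ball tree, always uses
Floyd-Warshall, and always uses a dense eigensolver. The function name
``isomap_sketch`` is hypothetical.

```python
import numpy as np


def isomap_sketch(X, n_neighbors, n_components):
    """Toy Isomap: brute-force kNN, Floyd-Warshall, dense eigensolver."""
    n = X.shape[0]

    # Stage 1: nearest neighbor search (brute force, for illustration only).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    graph[rows, cols] = dist[rows, cols]
    graph = np.minimum(graph, graph.T)  # make the neighbor graph undirected

    # Stage 2: shortest paths with Floyd-Warshall, O[N^3].
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])

    # Stage 3: double-center the squared geodesic distances to obtain the
    # kernel, then keep the eigenvectors of the d largest eigenvalues.
    K = -0.5 * graph ** 2
    K = K - K.mean(axis=0, keepdims=True) - K.mean(axis=1, keepdims=True) + K.mean()
    eigvals, eigvecs = np.linalg.eigh(K)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Points on a half circle: a 1-dimensional manifold embedded in 2 dimensions.
theta = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y = isomap_sketch(X, n_neighbors=2, n_components=1)
print(Y.shape)
```

The sketch assumes the neighbor graph is connected; a disconnected graph
leaves infinite entries in the distance matrix and the eigendecomposition
fails.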
.. topic:: References:

   * `"A global geometric framework for nonlinear dimensionality reduction"
     <http://www.sciencemag.org/content/290/5500/2319.full>`_
     Tenenbaum, J.B.; De Silva, V.; & Langford, J.C. Science 290 (5500)

Locally Linear Embedding
========================

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data
which preserves distances within local neighborhoods. It can be thought
of as a series of local Principal Component Analyses which are globally
compared to find the best nonlinear embedding.

Locally linear embedding can be performed with the function
:func:`locally_linear_embedding` or its object-oriented counterpart
:class:`LocallyLinearEmbedding`.

.. figure:: ../auto_examples/manifold/images/plot_lle_digits_6.png
   :target: ../auto_examples/manifold/plot_lle_digits.html
   :align: center
   :scale: 50

Complexity
----------

The standard LLE algorithm comprises three stages:

1. **Nearest Neighbors Search**. See discussion under Isomap above.

2. **Weight Matrix Construction**. :math:`O[D N k^3]`.
   The construction of the LLE weight matrix involves the solution of a
   :math:`k \times k` linear equation for each of the :math:`N` local
   neighborhoods.

3. **Partial Eigenvalue Decomposition**. See discussion under Isomap above.

The overall complexity of standard LLE is
:math:`O[D \log(k) N \log(N)] + O[D N k^3] + O[d N^2]`.

* :math:`N` : number of training data points
* :math:`D` : input dimension
* :math:`k` : number of nearest neighbors
* :math:`d` : output dimension

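
The weight matrix construction of stage 2 can be sketched as one
:math:`k \times k` linear solve per neighborhood. The helper ``lle_weights``
below is hypothetical and written for illustration; the trace-relative
regularization and its default value ``reg=1e-3`` are assumptions of this
sketch rather than a description of the library code.

```python
import numpy as np


def lle_weights(X, neighbors, reg=1e-3):
    """Reconstruction weights from one k x k solve per neighborhood.

    X is (N, D); neighbors is an (N, k) integer array of neighbor indices.
    """
    n, k = neighbors.shape
    W = np.zeros((n, n))
    ones = np.ones(k)
    for i in range(n):
        Z = X[neighbors[i]] - X[i]  # neighbors relative to point i, (k, D)
        C = Z @ Z.T                 # local k x k Gram matrix
        trace = np.trace(C)
        # Assumed regularization, scaled by the trace of the local matrix.
        C = C + np.eye(k) * (reg * trace if trace > 0 else reg)
        w = np.linalg.solve(C, ones)
        W[i, neighbors[i]] = w / w.sum()  # weights sum to one per point
    return W


rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
neighbors = np.argsort(d2, axis=1)[:, 1:6]  # 5 nearest neighbors, brute force
W = lle_weights(X, neighbors)
print(W.shape)
```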
.. topic:: References:

   * `"Nonlinear dimensionality reduction by locally linear embedding"
     <http://www.sciencemag.org/content/290/5500/2323.full>`_
     Roweis, S. & Saul, L. Science 290:2323 (2000)

Modified Locally Linear Embedding
=================================

One well-known issue with LLE is the regularization problem. When the number
of neighbors is greater than the number of input dimensions, the matrix
defining each local neighborhood is rank-deficient. To address this, standard
LLE applies an arbitrary regularization parameter :math:`r`, which is chosen
relative to the trace of the local weight matrix. Though it can be shown
formally that as :math:`r \to 0`, the solution converges to the desired
embedding, there is no guarantee that the optimal solution will be found
for :math:`r > 0`. This problem manifests itself in embeddings which distort
the underlying geometry of the manifold.

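
The rank deficiency is easy to reproduce numerically. The sketch below builds
one hypothetical neighborhood with more neighbors than input dimensions and
shows that adding a trace-scaled multiple of the identity restores full rank.
The values of ``D``, ``k`` and ``r`` are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, k = 2, 5  # more neighbors than input dimensions
x = rng.standard_normal(D)
nbrs = x + 0.1 * rng.standard_normal((k, D))

Z = nbrs - x   # neighbors relative to the point, shape (k, D)
C = Z @ Z.T    # k x k local matrix, rank at most D
rank_raw = np.linalg.matrix_rank(C, tol=1e-10)

r = 1e-3       # regularization parameter, chosen relative to the trace
C_reg = C + r * np.trace(C) * np.eye(k)
rank_reg = np.linalg.matrix_rank(C_reg, tol=1e-10)

print(rank_raw, rank_reg)
```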
One method to address the regularization problem is to use multiple weight
vectors in each neighborhood. This is the essence of *modified locally
linear embedding* (MLLE). MLLE can be performed with the function
:func:`locally_linear_embedding` or its object-oriented counterpart
:class:`LocallyLinearEmbedding`, with the keyword ``method = 'modified'``.
It requires ``n_neighbors > out_dim``.

.. figure:: ../auto_examples/manifold/images/plot_lle_digits_7.png
   :target: ../auto_examples/manifold/plot_lle_digits.html
   :align: center
   :scale: 50

Complexity
----------

The MLLE algorithm comprises three stages:

1. **Nearest Neighbors Search**. Same as standard LLE.

2. **Weight Matrix Construction**. Approximately
   :math:`O[D N k^3] + O[N (k-D) k^2]`. The first term is exactly equivalent
   to that of standard LLE. The second term has to do with constructing the
   weight matrix from multiple weights. In practice, the added cost of
   constructing the MLLE weight matrix is relatively small compared to the
   cost of steps 1 and 3.

3. **Partial Eigenvalue Decomposition**. Same as standard LLE.

The overall complexity of MLLE is
:math:`O[D \log(k) N \log(N)] + O[D N k^3] + O[N (k-D) k^2] + O[d N^2]`.

* :math:`N` : number of training data points
* :math:`D` : input dimension
* :math:`k` : number of nearest neighbors
* :math:`d` : output dimension

.. topic:: References:

   * `"MLLE: Modified Locally Linear Embedding Using Multiple Weights"
     <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382>`_
     Zhang, Z. & Wang, J.

Hessian Eigenmapping
====================

Hessian Eigenmapping (also known as Hessian-based LLE: HLLE) is another method
of solving the regularization problem of LLE. It revolves around a
Hessian-based quadratic form at each neighborhood which is used to recover
the locally linear structure. Though other implementations note its poor
scaling with data size, ``sklearn`` implements some algorithmic
improvements which make its cost comparable to that of other LLE variants
for small output dimension. HLLE can be performed with the function
:func:`locally_linear_embedding` or its object-oriented counterpart
:class:`LocallyLinearEmbedding`, with the keyword ``method = 'hessian'``.
It requires ``n_neighbors > out_dim * (out_dim + 3) / 2``.

.. figure:: ../auto_examples/manifold/images/plot_lle_digits_8.png
   :target: ../auto_examples/manifold/plot_lle_digits.html
   :align: center
   :scale: 50

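
The neighbor-count requirements stated for MLLE and HLLE can be turned into a
small check before fitting. The helper ``min_neighbors`` is hypothetical and
simply encodes the two inequalities from the text.

```python
def min_neighbors(method, out_dim):
    """Smallest n_neighbors satisfying the inequalities stated in the text."""
    if method == "modified":
        # MLLE requires n_neighbors > out_dim.
        return out_dim + 1
    if method == "hessian":
        # HLLE requires n_neighbors > out_dim * (out_dim + 3) / 2.
        # out_dim * (out_dim + 3) is always even, so the division is exact.
        return out_dim * (out_dim + 3) // 2 + 1
    raise ValueError("unknown method: %r" % (method,))


for d in (1, 2, 3):
    print(d, min_neighbors("modified", d), min_neighbors("hessian", d))
```

The HLLE bound grows quadratically with the output dimension, so a
neighborhood size that works for a 2-dimensional embedding may be too small
for a higher-dimensional one.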
Complexity
----------

The HLLE algorithm comprises three stages:

1. **Nearest Neighbors Search**. Same as standard LLE.

2. **Weight Matrix Construction**. Approximately
   :math:`O[D N k^3] + O[N d^6]`. The first term reflects a similar
   cost to that of standard LLE. The second term comes from a QR
   decomposition of the local Hessian estimator.

3. **Partial Eigenvalue Decomposition**. Same as standard LLE.

The overall complexity of standard HLLE is
:math:`O[D \log(k) N \log(N)] + O[D N k^3] + O[N d^6] + O[d N^2]`.

* :math:`N` : number of training data points
* :math:`D` : input dimension
* :math:`k` : number of nearest neighbors
* :math:`d` : output dimension

.. topic:: References:

   * `"Hessian Eigenmaps: Locally linear embedding techniques for
     high-dimensional data" <http://www.pnas.org/content/100/10/5591>`_
     Donoho, D. & Grimes, C. Proc Natl Acad Sci USA. 100:5591 (2003)

Local Tangent Space Alignment
=============================

Though not technically a variant of LLE, local tangent space alignment (LTSA)
is algorithmically similar enough to LLE that it can be put in this category.
Rather than focusing on preserving neighborhood distances as in LLE, LTSA
seeks to characterize the local geometry at each neighborhood via its
tangent space, and performs a global optimization to align these local
tangent spaces to learn the embedding. LTSA can be performed with the function
:func:`locally_linear_embedding` or its object-oriented counterpart
:class:`LocallyLinearEmbedding`, with the keyword ``method = 'ltsa'``.

.. figure:: ../auto_examples/manifold/images/plot_lle_digits_9.png
   :target: ../auto_examples/manifold/plot_lle_digits.html
   :align: center
   :scale: 50

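
The four ``method`` values share one interface, so they can be compared in a
short loop. The sketch below assumes a recent scikit-learn release, where the
output dimension is passed as ``n_components`` and the fitted object exposes
``reconstruction_error_``; the data set, the neighbor count and the choice of
the dense eigensolver are assumptions made for this example.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_s_curve(n_samples=300, random_state=0)

embeddings = {}
for method in ("standard", "modified", "hessian", "ltsa"):
    # n_neighbors=12 satisfies the MLLE and HLLE requirements for 2 outputs.
    lle = LocallyLinearEmbedding(
        n_neighbors=12,
        n_components=2,  # assumption: current name of the output dimension
        method=method,
        eigen_solver="dense",
    )
    embeddings[method] = lle.fit_transform(X)
    print(method, embeddings[method].shape, lle.reconstruction_error_)
```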
Complexity
----------

The LTSA algorithm comprises three stages:

1. **Nearest Neighbors Search**. Same as standard LLE.

2. **Weight Matrix Construction**. Approximately
   :math:`O[D N k^3] + O[k^2 d]`. The first term reflects a similar
   cost to that of standard LLE.

3. **Partial Eigenvalue Decomposition**. Same as standard LLE.

The overall complexity of standard LTSA is
:math:`O[D \log(k) N \log(N)] + O[D N k^3] + O[k^2 d] + O[d N^2]`.

* :math:`N` : number of training data points
* :math:`D` : input dimension
* :math:`k` : number of nearest neighbors
* :math:`d` : output dimension

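
To get a feel for which term dominates, the complexity expressions quoted for
standard LLE, MLLE, HLLE and LTSA can be evaluated for a given problem size.
The helper ``lle_cost_terms`` is hypothetical. Big-O expressions drop constant
factors, so the numbers only indicate relative growth, not running time.

```python
import math


def lle_cost_terms(N, D, k, d, method="standard"):
    """Evaluate the complexity terms quoted in the text, constants ignored."""
    terms = {
        "neighbors": D * math.log(k) * N * math.log(N),
        "weights": D * N * k ** 3,
        "eigen": d * N ** 2,
    }
    # Method-specific extra term in the weight matrix construction.
    if method == "modified":
        terms["extra"] = N * (k - D) * k ** 2
    elif method == "hessian":
        terms["extra"] = N * d ** 6
    elif method == "ltsa":
        terms["extra"] = k ** 2 * d
    elif method != "standard":
        raise ValueError("unknown method: %r" % (method,))
    return terms


for method in ("standard", "modified", "hessian", "ltsa"):
    print(method, lle_cost_terms(N=1000, D=3, k=10, d=2, method=method))
```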
.. topic:: References:

   * `"Principal manifolds and nonlinear dimensionality reduction via
     tangent space alignment"
     <http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.3693>`_
     Zhang, Z. & Zha, H. Journal of Shanghai Univ. 8:406 (2004)

Tips on practical use
=====================

* Make sure the same scale is used over all features. Because manifold
  learning methods are based on a nearest-neighbor search, the algorithm
  may perform poorly otherwise. See :ref:`Scaler <preprocessing_scaler>`
  for convenient ways of scaling heterogeneous data.

* The reconstruction error computed by each routine can be used to choose
  the optimal output dimension. For a :math:`d`-dimensional manifold embedded
  in a :math:`D`-dimensional parameter space, the reconstruction error will
  decrease as ``out_dim`` is increased until ``out_dim == d``.

* Note that noisy data can "short-circuit" the manifold, in essence acting
  as a bridge between parts of the manifold that would otherwise be
  well-separated. Manifold learning on noisy and/or incomplete data is
  an active area of research.

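
The first tip can be illustrated with a plain NumPy standardization that
brings every feature to zero mean and unit variance before the neighbor
search. The helper ``standardize`` and the synthetic data are assumptions of
this sketch; the scaling utilities referenced above are the more convenient
route in practice.

```python
import numpy as np


def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unchanged
    return (X - mean) / std


# Two hypothetical features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0.0, 1.0, size=100),
    rng.normal(0.0, 1000.0, size=100),
])
X_scaled = standardize(X)
print(X_scaled.std(axis=0))
```

Without this step, Euclidean distances would be dominated by the second
feature and the neighborhoods would largely ignore the first.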