Install Dask
============

You can install Dask with ``conda``, with ``pip``, or by installing from source.

Conda
-----

Dask is installed by default in `Anaconda <https://www.anaconda.com/download/>`_.
You can update Dask using the `conda <https://www.anaconda.com/download/>`_ command::

    conda install dask

This installs Dask and **all** common dependencies, including Pandas and NumPy.
Dask packages are maintained both on the default channel and on `conda-forge <https://conda-forge.github.io/>`_.

Optionally, you can obtain a minimal Dask installation using the following command::

    conda install dask-core

This installs the minimal set of dependencies required to run Dask, similar to
(but not exactly the same as) ``python -m pip install dask`` below.

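
As a quick sanity check after either command, the installed version can be read
from package metadata. This is a minimal sketch, not part of Dask itself: the
helper name ``installed_version`` is hypothetical, and it assumes Python 3.8 or
later so that ``importlib.metadata`` is available.

.. code-block:: python

    from importlib import metadata


    def installed_version(distribution="dask"):
        """Return the installed version string, or None if it is not installed."""
        try:
            return metadata.version(distribution)
        except metadata.PackageNotFoundError:
            return None


    version = installed_version()
    if version is None:
        print("Dask is not installed in this environment")
    else:
        print("Dask version:", version)
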
Pip
---

You can install everything required for most common uses of Dask (arrays,
dataframes, ...). This installs both Dask and dependencies like NumPy, Pandas,
and so on that are necessary for different workloads. This is often the right
choice for Dask users::

    python -m pip install "dask[complete]"    # Install everything

You can also install only the Dask library. Modules like ``dask.array``,
``dask.dataframe``, ``dask.delayed``, or ``dask.distributed`` won't work until you also install NumPy,
Pandas, Toolz, or Tornado, respectively. This is common for downstream library
maintainers::

    python -m pip install dask                # Install only core parts of dask

We also maintain other dependency sets for different subsets of functionality::

    python -m pip install "dask[array]"       # Install requirements for dask array
    python -m pip install "dask[bag]"         # Install requirements for dask bag
    python -m pip install "dask[dataframe]"   # Install requirements for dask dataframe
    python -m pip install "dask[delayed]"     # Install requirements for dask delayed
    python -m pip install "dask[distributed]" # Install requirements for distributed dask

We have these options so that users of the lightweight core Dask scheduler
aren't required to download the more exotic dependencies of the collections
(NumPy, Pandas, Tornado, etc.).

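
After a core-only install it can be useful to see which Dask modules are usable
in the current environment. The sketch below checks whether the library named
for each module in this section can be found on the import path. The mapping is
copied from the text above; the names ``REQUIRED_BY_MODULE`` and
``usable_modules`` are hypothetical, and the check only looks for the library,
it does not import Dask or verify versions.

.. code-block:: python

    import importlib.util

    # Dask module -> library it needs, as listed in the Pip section.
    REQUIRED_BY_MODULE = {
        "dask.array": "numpy",
        "dask.dataframe": "pandas",
        "dask.delayed": "toolz",
        "dask.distributed": "tornado",
    }


    def usable_modules(requirements=REQUIRED_BY_MODULE):
        """Map each Dask module to True if its required library is importable."""
        return {
            module: importlib.util.find_spec(library) is not None
            for module, library in requirements.items()
        }


    for module, available in usable_modules().items():
        print(f"{module:18} {'ok' if available else 'missing dependency'}")
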
Install from Source
-------------------

To install Dask from source, clone the repository from `GitHub
<https://github.com/dask/dask>`_::

    git clone https://github.com/dask/dask.git
    cd dask
    python -m pip install .

You can also install all dependencies::

    python -m pip install ".[complete]"

You can view the list of all dependencies within the ``extras_require`` field
of ``setup.py``.

Or do a developer install by using the ``-e`` flag::

    python -m pip install -e .

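
Once Dask is installed, the dependency sets declared in ``extras_require`` are
also visible in the installed package metadata, where each optional requirement
carries an ``extra == "..."`` marker. The following sketch groups requirement
strings by that marker. The helper ``group_by_extra`` and the ``"(core)"`` label
are hypothetical, and the regular expression handles only the simple marker form
shown here, not the full marker grammar.

.. code-block:: python

    import re
    from collections import defaultdict
    from importlib import metadata

    # Matches markers such as: extra == "array"
    EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


    def group_by_extra(requirement_strings):
        """Group requirement strings by the extra named in their marker."""
        groups = defaultdict(list)
        for requirement in requirement_strings:
            spec, _, marker = requirement.partition(";")
            match = EXTRA_MARKER.search(marker)
            extra = match.group(1) if match else "(core)"
            groups[extra].append(spec.strip())
        return dict(groups)


    try:
        declared = metadata.requires("dask") or []
    except metadata.PackageNotFoundError:
        declared = []  # Dask is not installed; nothing to report

    for extra, requirements in group_by_extra(declared).items():
        print(extra, "->", ", ".join(requirements))
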
Anaconda
--------

Dask is included by default in the `Anaconda distribution <https://www.anaconda.com/download>`_.

Optional dependencies
---------------------

Specific functionality in Dask may require additional optional dependencies.
For example, reading from Amazon S3 requires ``s3fs``.
These optional dependencies and their minimum supported versions are listed below.

+---------------+----------+--------------------------------------------------------------+
| Dependency    | Version  | Description                                                  |
+===============+==========+==============================================================+
| bokeh         | >=1.0.0  | Visualizing dask diagnostics                                 |
+---------------+----------+--------------------------------------------------------------+
| cloudpickle   | >=0.2.2  | Pickling support for Python objects                          |
+---------------+----------+--------------------------------------------------------------+
| cityhash      |          | Faster hashing of arrays                                     |
+---------------+----------+--------------------------------------------------------------+
| distributed   | >=2.0    | Distributed computing in Python                              |
+---------------+----------+--------------------------------------------------------------+
| fastparquet   |          | Storing and reading data from parquet files                  |
+---------------+----------+--------------------------------------------------------------+
| fsspec        | >=0.6.0  | Used for local, cluster and remote data IO                   |
+---------------+----------+--------------------------------------------------------------+
| gcsfs         | >=0.4.0  | File-system interface to Google Cloud Storage                |
+---------------+----------+--------------------------------------------------------------+
| murmurhash    |          | Faster hashing of arrays                                     |
+---------------+----------+--------------------------------------------------------------+
| numpy         | >=1.13.0 | Required for dask.array                                      |
+---------------+----------+--------------------------------------------------------------+
| pandas        | >=0.23.0 | Required for dask.dataframe                                  |
+---------------+----------+--------------------------------------------------------------+
| partd         | >=0.3.10 | Concurrent appendable key-value storage                      |
+---------------+----------+--------------------------------------------------------------+
| psutil        |          | Enables a more accurate CPU count                            |
+---------------+----------+--------------------------------------------------------------+
| pyarrow       | >=0.14.0 | Python library for Apache Arrow                              |
+---------------+----------+--------------------------------------------------------------+
| s3fs          | >=0.4.0  | Reading from Amazon S3                                       |
+---------------+----------+--------------------------------------------------------------+
| sqlalchemy    |          | Writing and reading from SQL databases                       |
+---------------+----------+--------------------------------------------------------------+
| cytoolz/toolz | >=0.8.2  | Utility functions for iterators, functions, and dictionaries |
+---------------+----------+--------------------------------------------------------------+
| xxhash        |          | Faster hashing of arrays                                     |
+---------------+----------+--------------------------------------------------------------+

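
The minimum versions in the table can be compared against what is installed in
the current environment. This is an illustrative sketch only: the helper names
are hypothetical, ``toolz`` stands in for the ``cytoolz/toolz`` row, rows
without a minimum version are left out, and the comparison looks only at the
leading numeric part of each version string rather than applying full version
ordering rules.

.. code-block:: python

    import re
    from importlib import metadata

    # Minimum versions copied from the table above.
    MINIMUM_VERSIONS = {
        "bokeh": "1.0.0",
        "cloudpickle": "0.2.2",
        "distributed": "2.0",
        "fsspec": "0.6.0",
        "gcsfs": "0.4.0",
        "numpy": "1.13.0",
        "pandas": "0.23.0",
        "partd": "0.3.10",
        "pyarrow": "0.14.0",
        "s3fs": "0.4.0",
        "toolz": "0.8.2",
    }


    def release_tuple(version):
        """Leading numeric components of a version, e.g. '1.13.0rc1' -> (1, 13, 0)."""
        match = re.match(r"\d+(?:\.\d+)*", version)
        return tuple(int(part) for part in match.group().split(".")) if match else ()


    def meets_minimum(installed, minimum):
        """Compare release tuples, padding the shorter one with zeros."""
        have, need = release_tuple(installed), release_tuple(minimum)
        width = max(len(have), len(need))
        have += (0,) * (width - len(have))
        need += (0,) * (width - len(need))
        return have >= need


    def check_optional_dependencies(minimums=MINIMUM_VERSIONS):
        """Map each dependency to 'missing', 'too old', or 'ok'."""
        report = {}
        for name, minimum in minimums.items():
            try:
                installed = metadata.version(name)
            except metadata.PackageNotFoundError:
                report[name] = "missing"
                continue
            report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
        return report


    for name, status in check_optional_dependencies().items():
        print(f"{name:12} {status}")
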
Test
----

Test Dask with ``py.test``::

    cd dask
    py.test dask

Please be aware that installing Dask naively may not install all
requirements by default. Please read the ``pip`` section above, which discusses
requirements. You may choose to install the ``dask[complete]`` version, which includes
all dependencies for all collections. Alternatively, you may choose to test
only certain submodules depending on the libraries within your environment.
For example, to test only Dask core and Dask array we would run tests as
follows::

    py.test dask/tests dask/array/tests

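
Choosing which submodules to test can be automated by checking which libraries
are importable. The sketch below builds the ``py.test`` argument list that way.
The mapping ``TEST_DIRS`` and the helper ``select_test_paths`` are hypothetical;
``dask/dataframe/tests`` is included on the assumption that it follows the same
layout as the two directories named above.

.. code-block:: python

    import importlib.util

    # Test directory -> library its tests need (None means no extra library).
    TEST_DIRS = {
        "dask/tests": None,
        "dask/array/tests": "numpy",
        "dask/dataframe/tests": "pandas",  # assumed path, same layout as above
    }


    def library_available(name):
        """Return True if the named library can be found on the import path."""
        return importlib.util.find_spec(name) is not None


    def select_test_paths(test_dirs=TEST_DIRS, is_available=library_available):
        """Return the test directories whose required library is available."""
        return [
            path
            for path, library in test_dirs.items()
            if library is None or is_available(library)
        ]


    # Print the command to run from the root of the cloned repository.
    print("py.test " + " ".join(select_test_paths()))
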