Dask (software)
Dask is an open-source Python library for parallel computing.[2][3] Originally developed by Matthew Rocklin, Dask is a community project maintained and sponsored by developers and organizations.
Original author(s) | Matthew Rocklin |
---|---|
Developer(s) | Dask |
Initial release | October 28, 2018 |
Stable release | 2.25.0 / August 28, 2020 |
Repository | Dask Repository |
Written in | Python[1] |
Operating system | Linux, Microsoft Windows, macOS |
Available in | Python |
Type | Data analytics |
License | New BSD |
Website | dask |
Overview
Dask consists of two parts. The first is a task scheduling component that builds dependency graphs and schedules tasks. The second is a set of distributed data structures with APIs similar to pandas DataFrames and NumPy arrays. Dask supports a variety of use cases and can run on a single node or scale to thousand-node clusters.[4]
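The dependency-graph idea can be sketched in plain Python. The original Dask paper describes encoding parallel algorithms with primitive dictionaries, tuples, and callables. The example below follows that description but is only an illustrative sketch, not Dask's own scheduler. The helper names `is_task` and `evaluate` are hypothetical, keys are assumed to be strings, and tasks run sequentially rather than in parallel.

```python
from operator import add, mul


def is_task(value):
    """A task is a tuple whose first element is a callable."""
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def evaluate(graph, key, cache=None):
    """Compute the value of `key`, resolving its dependencies recursively.

    Hypothetical helper for illustration: a real scheduler would also run
    independent tasks in parallel, which this sketch does not attempt.
    """
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]

    value = graph[key]
    if is_task(value):
        func, *args = value
        # Assumption: string arguments that name a graph key are dependencies;
        # every other argument is passed through as a literal.
        resolved = [
            evaluate(graph, arg, cache)
            if isinstance(arg, str) and arg in graph
            else arg
            for arg in args
        ]
        result = func(*resolved)
    else:
        result = value  # plain data, nothing to compute

    cache[key] = result
    return result


# Keys map either to data or to tasks of the form (callable, arg1, arg2, ...).
graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),   # z = x + y
    "w": (mul, "z", 10),    # w = z * 10
}

print(evaluate(graph, "w"))  # prints 30
```

Because the graph is ordinary data, a scheduler is free to decide the order in which tasks run and where they run, as long as each task starts only after its dependencies have finished.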
References
- "Dask: Parallel Computation with Blocked algorithms and Task Scheduling" (PDF). "This paper introduces dask, a specification to encode parallel algorithms, using primitive Python dictionaries, tuples, and callables."
- Daniel, Jesse C. (2019). Data Science at Scale with Python and Dask. Manning Publications. ISBN 9781617295607.
- Rocklin, Matthew (2015). "Dask: Parallel Computation with Blocked algorithms and Task Scheduling". Proceedings of the 14th Python in Science Conference: 126–132. doi:10.25080/Majora-7b98e3ed-013.
- https://docs.dask.org/en/latest/
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.