
[![PyPI](https://img.shields.io/badge/PyPi-v1.5-f39f37.svg)](https://pypi.python.org/pypi/ctdl)
[![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://github.com/nikhilkumarsingh/content-downloader/blob/master/LICENSE.txt)

# content-downloader

**content-downloader**, a.k.a. **ctdl**, is a Python package with a **command line utility** and a **desktop GUI** for downloading files on any topic in bulk!

![](https://media.giphy.com/media/3oKIPlt7APHqWuVl3q/giphy.gif)
![](https://media.giphy.com/media/xUPGcIvGpH3KvEmlnG/giphy.gif)

## Features

- ctdl can be used as a command line utility as well as a desktop GUI.
- ctdl fetches file links related to a search query from **Google Search**.
- Files can be downloaded in parallel using multithreading.
- ctdl is compatible with both Python 2 and Python 3.
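
Parallel downloading with multithreading is a classic thread-pool pattern. Below is a minimal Python 3 sketch using `concurrent.futures`. The names `download_all` and the caller-supplied `fetch` function are hypothetical, and ctdl's own `download_parallel` (imported in `ctdl/ctdl.py`) may be structured differently, for example to drive its tqdm progress bars.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL using a pool of worker threads.

    `fetch` is a hypothetical per-file download function supplied by the
    caller. Returns a dict mapping each URL to fetch's result, or to the
    exception it raised, so one failed download does not stop the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every download up front; the pool runs them concurrently.
        futures = {pool.submit(fetch, url): url for url in urls}
        # Collect results in completion order, not submission order.
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results


# Usage with a stand-in fetch that just returns the file name:
print(download_all(["https://example.com/a.pdf", "https://example.com/b.pdf"],
                   fetch=lambda url: url.rsplit("/", 1)[-1]))
```

Threads suit this workload because downloads spend most of their time waiting on the network, not the CPU.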

## Installation

- To install content-downloader, simply run:
```
$ pip install ctdl
```
- There seem to be some issues with parallel progress bars in tqdm, which have been resolved in this [pull request](https://github.com/tqdm/tqdm/pull/385). Until that pull request is merged, please use my patch by running this command:
```
$ pip install -U git+https://github.com/nikhilkumarsingh/tqdm
```

## Desktop GUI usage

To use the **ctdl** desktop GUI, open a terminal and run:
```
$ ctdl-gui
```

## Command line usage

```
$ ctdl [-h] [-f FILE_TYPE] [-l LIMIT] [-d DIRECTORY] [-p] [-a] [-t]
       [-minfs MIN_FILE_SIZE] [-maxfs MAX_FILE_SIZE] [-nr]
       [query]
```

The positional `query` argument is the search query; it is not needed with `-a` or `-t`. Optional arguments are:
- `-h`: show the help message and exit.
- `-f FILE_TYPE`: set the file type (accepts values such as `ppt`, `pdf`, `xml`). Default: `pdf`.
- `-l LIMIT`: number of files to download. Default: `10`.
- `-d DIRECTORY`: directory in which to store the files. Default: a directory with the same name as the search query, in the current directory.
- `-p`: download files in parallel.
- `-a`: list the available file types.
- `-t`: list potentially high-threat file types.
- `-minfs MIN_FILE_SIZE`: minimum file size to download, in kilobytes (KB). Default: `0`.
- `-maxfs MAX_FILE_SIZE`: maximum file size to download, in kilobytes (KB). Default: `-1` (no maximum file size).
- `-nr`: disallow URL redirects while downloading. Default: `False`.
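
The synopsis and defaults above map directly onto Python's `argparse`. Below is a minimal sketch of an equivalent parser, built only from the flags and defaults documented here. The `dest` names (`file_type`, `limit`, `directory`, `query` echo the Python API keywords; `min_file_size`, `no_redirects`, `list_types` and the rest are assumptions) and the `parse` helper are hypothetical, not necessarily what ctdl's own parser uses.

```python
import argparse
import os


def build_parser():
    """Hypothetical argparse parser mirroring the documented ctdl options."""
    parser = argparse.ArgumentParser(prog="ctdl")  # argparse adds -h itself
    parser.add_argument("query", nargs="?", help="search query")
    parser.add_argument("-f", dest="file_type", default="pdf")
    parser.add_argument("-l", dest="limit", type=int, default=10)
    parser.add_argument("-d", dest="directory", default=None)
    parser.add_argument("-p", dest="parallel", action="store_true")
    parser.add_argument("-a", dest="list_types", action="store_true")
    parser.add_argument("-t", dest="list_threats", action="store_true")
    # argparse accepts multi-letter options with a single dash.
    parser.add_argument("-minfs", dest="min_file_size", type=int, default=0)
    parser.add_argument("-maxfs", dest="max_file_size", type=int, default=-1)  # -1: no maximum
    parser.add_argument("-nr", dest="no_redirects", action="store_true")
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    # Documented default: a folder named after the query in the current directory.
    if args.directory is None and args.query:
        args.directory = os.path.join(os.getcwd(), args.query)
    return args


args = parse(["-f", "ppt", "-l", "3", "health"])
print(args.file_type, args.limit, args.directory)
```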

## Examples

- To list the available file types:
```
$ ctdl -a
```
- To list potentially high-threat file types:
```
$ ctdl -t
```
- To download PDF files on the topic 'python':
```
$ ctdl python
```
This is the default behaviour: it downloads 10 PDF files into a folder named 'python' in the current directory.
- To download 3 PPT files on 'health':
```
$ ctdl -f ppt -l 3 health
```
- To explicitly specify the download folder:
```
$ ctdl -d /home/nikhil/Desktop/ml-pdfs machine-learning
```
- To download files in parallel:
```
$ ctdl -f pdf -p python
```
- To download, in parallel, 10 PDF files matching the search query "python algorithm", without allowing any URL redirects, and keeping only files between 10,000 KB (10 MB) and 100,000 KB (100 MB):
```
$ ctdl -f pdf -l 10 -minfs 10000 -maxfs 100000 -nr -p "python algorithm"
```
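
The `-minfs`/`-maxfs` window in the last example reduces to a range check on each file's size. Below is a minimal sketch, assuming sizes arrive in bytes (for instance from an HTTP `Content-Length` header) and that 1 KB = 1,000 bytes, which matches the 10,000 KB = 10 MB arithmetic above. `within_size_limits` is a hypothetical helper; ctdl's own check and unit may differ.

```python
def within_size_limits(size_bytes, min_kb=0, max_kb=-1, bytes_per_kb=1000):
    """Return True if a file of size_bytes fits the -minfs/-maxfs window.

    Hypothetical helper: sizes are compared in kilobytes, max_kb == -1 means
    "no maximum", and 1 KB is assumed to be 1,000 bytes; pass
    bytes_per_kb=1024 if binary kilobytes are wanted instead.
    """
    size_kb = size_bytes / float(bytes_per_kb)
    if size_kb < min_kb:
        return False
    if max_kb != -1 and size_kb > max_kb:
        return False
    return True


# The window from the example above: 10,000 KB to 100,000 KB
print(within_size_limits(50000000, min_kb=10000, max_kb=100000))  # → True (50,000 KB)
print(within_size_limits(5000000, min_kb=10000, max_kb=100000))   # → False (5,000 KB)
print(within_size_limits(500000000, min_kb=10000))                # → True (no maximum)
```

Both bounds are inclusive here; a file of exactly 10,000 KB passes.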

## Usage in Python files

```python
from ctdl import ctdl

ctdl.download_content(
    file_type='ppt',
    limit=5,
    directory='/home/nikhil/Desktop/ml-pdfs',
    query='machine learning using python')
```

## TODO

- [X] Prompt the user before downloading potentially harmful files
- [X] Create ctdl GUI
- [ ] Implement unit testing
- [ ] Use DuckDuckGo API as an option
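
The first TODO item, prompting before potentially harmful downloads, is already done. One way such a guard can work is sketched below in Python 3, under stated assumptions: `HIGH_THREAT_TYPES` is a hypothetical stand-in for ctdl's `THREAT_EXTENSIONS` (its real contents and type may differ), and `confirm_download` is an illustrative name, not ctdl's API.

```python
# Hypothetical stand-in for ctdl's THREAT_EXTENSIONS (defined in ctdl/utils.py);
# the real collection may hold different entries and have a different type.
HIGH_THREAT_TYPES = {"exe", "bat", "msi", "scr", "vbs"}


def confirm_download(file_type, ask=input):
    """Return True if downloading files of file_type should go ahead.

    Ordinary types pass straight through. High-threat types need an explicit
    "y" or "yes" from the user; anything else (including just Enter) declines.
    `ask` defaults to Python 3's input() and can be replaced for testing.
    """
    if file_type.lower() not in HIGH_THREAT_TYPES:
        return True
    answer = ask("{} files can be harmful. Download anyway? [y/N] ".format(file_type))
    return answer.strip().lower() in ("y", "yes")


print(confirm_download("pdf"))                      # → True, no prompt shown
print(confirm_download("exe", ask=lambda _: "no"))  # → False
```

Defaulting to "no" means an accidental Enter never triggers a risky download.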

## Want to contribute?

- Clone the repository:
```
$ git clone http://github.com/nikhilkumarsingh/content-downloader
$ cd content-downloader
```
- Install dependencies:
```
$ pip install -r requirements.txt
```
**Note:** There seem to be some issues with the current version of tqdm. If the progress bars do not behave as expected, try this patch:
```
$ pip uninstall tqdm
$ pip install git+https://github.com/nikhilkumarsingh/tqdm
```
- In `ctdl/ctdl.py`, remove the `.` prefix from `.downloader` and `.utils` in the following imports (relative imports fail when a module is run directly as a script), changing:
```python
from .downloader import download_series, download_parallel
from .utils import FILE_EXTENSIONS, THREAT_EXTENSIONS
```
to:
```python
from downloader import download_series, download_parallel
from utils import FILE_EXTENSIONS, THREAT_EXTENSIONS
```
- Run the Python file directly with `python ctdl/ctdl.py ___` (instead of `ctdl ___`).