- # Fork this project and send back a pull request.
- ## What to do
- Create a simple RESTful API that imports data from the GitHub Search API: https://developer.github.com/v3/search/
- Import repositories that have been written in Python and have more than 500 stars. Import just a few fields from GitHub Search API:
- `full_name`, `html_url`, `description`, `stargazers_count`, `language`.
-
- You may use any database or just store repositories in RAM.
- REQUIREMENTS:
- 1. Provide a script/function/endpoint to fill the database.
- 2. Add pagination to the API.
- 3. Add docstrings and comments to your code.
- 4. Describe your solution in the README.md file.
- ## Bonus 1
- Add sorting by `stars` to an API.
- ## Bonus 2
- Use Docker Compose to encapsulate all the services related to this app.
- ## Some help
- * How to fork https://confluence.atlassian.com/bitbucket/forking-a-repository-221449527.html
- * How to create a pull request https://confluence.atlassian.com/bitbucket/create-a-pull-request-774243413.html
- ## Describing my solution
- The task is to create a simple RESTful API server that works with the
- GitHub API. I decided to use the following tools:
- 1. Framework: I chose asyncio and aiohttp. First, both modules work
- asynchronously, so the server does not block while it waits for GitHub to
- respond. Second, asyncio is part of Python 3.4+ (we can use it "out of the
- box"). Finally, an API server with 1 endpoint is a minor task, so it makes
- no sense to use larger frameworks such as Django or Flask.
- 2. DB: according to GitHub's API manual, GitHub responds with data in JSON
- format. The best way to work with JSON in this case is to use MongoDB,
- because it is a document-oriented store and the data does not have to be converted before storing.
- DB schema:
- "github" - database
- "repositories" - collection for storing data from GitHub
- Create a user for connecting to the DB:
- "db.createUser({user:"admin", pwd:"admin123", roles:[{role:"root", db:"admin"}]})"
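Before storing, each item of the GitHub response has to be cut down to the five fields named in the task. A minimal sketch of that step, assuming the decoded response is a dict with an "items" list; the helper names `extract_fields` and `to_documents` are hypothetical and are not taken from the project's code:

```python
# Fields the task asks to import from each GitHub Search API item.
FIELDS = ("full_name", "html_url", "description", "stargazers_count", "language")


def extract_fields(item):
    """Return a new dict holding only the fields listed in FIELDS.

    Keys missing from ``item`` become None, so every stored document
    has the same shape.
    """
    return {field: item.get(field) for field in FIELDS}


def to_documents(search_response):
    """Turn a decoded Search API response into a list of documents to store."""
    return [extract_fields(item) for item in search_response.get("items", [])]


if __name__ == "__main__":
    # "id" stands in for any extra key GitHub returns; it is dropped.
    sample = {
        "items": [
            {
                "id": 1,  # hypothetical value, for illustration only
                "full_name": "Miserlou/Zappa",
                "html_url": "https://github.com/Miserlou/Zappa",
                "description": "Serverless Python",
                "stargazers_count": 7125,
                "language": "Python",
            }
        ]
    }
    print(to_documents(sample))
```

The resulting list can then be handed to whichever storage is used, for example the "repositories" collection described above.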
- 3. Pagination: According to the GitHub API manual for developers, the API will
- automatically paginate the requested items. "Different API calls respond
- with different defaults. For example, a call to list GitHub's public
- repositories provides paginated items in sets of 30, whereas a call to
- the GitHub Search API provides items in sets of 100".
- See https://developer.github.com/v3/#pagination
- For pagination I use the "page" and "per_page" parameters in requests, and
- in the response I add a "pagination" field with the Link header, because "The Link header includes pagination information".
- Example below:
- "Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
- <https://api.github.com/user/repos?page=50&per_page=100>; rel="last""
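A client that receives such a Link value usually wants the URL for each `rel`. A minimal sketch of one way to split it, assuming the value has the `<url>; rel="name"` shape shown above; the function name `parse_link_header` is hypothetical:

```python
import re

# One "<url>; rel="name"" pair inside a Link header value.
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Map each rel name ("next", "last", ...) to its URL.

    Returns an empty dict for an empty or missing header.
    """
    if not value:
        return {}
    return {rel: url for url, rel in _LINK_RE.findall(value)}


if __name__ == "__main__":
    header = (
        '<https://api.github.com/user/repos?page=3&per_page=100>; rel="next", '
        '<https://api.github.com/user/repos?page=50&per_page=100>; rel="last"'
    )
    links = parse_link_header(header)
    print(links["next"])
    print(links["last"])
```

The function expects only the header value, without the leading "Link: " name.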
- 4. Authentication: An authenticated user can send 5000 requests per hour and
- an unauthenticated user can send 60 requests per hour. Some endpoints of the GitHub
- Search API require that the request comes from an authenticated user.
- I wrote code which authorizes the user by credentials ('username' and 'password').
- But I don't have a user account for testing on Bitbucket (Atlassian), which is why I
- commented out lines 54-55
- """
- #auth = aiohttp.BasicAuth(login=config['username'],
- # password=config['password'])
- """
- and defined the variable "auth" as None.
- 5. How to check that the server works.
- Type in the browser:
- http://{HOST}:{PORT}/github/searh/v0.1?language=python&stars=600&per_page=100
- With my settings it looks like this:
- http://localhost:11071/github/searh/v0.1?language=python&stars=600&per_page=100
- And you should get a response like:
- {
- "items": [{
- "full_name": "vinta/awesome-python",
- "html_url": "https://github.com/vinta/awesome-python",
- "description": "A curated list of awesome Python frameworks, libraries, software and resources",
- "stargazers_count": 50726,
- "language": "Python"
- },
- ......
- , {
- "full_name": "Miserlou/Zappa",
- "html_url": "https://github.com/Miserlou/Zappa",
- "description": "Serverless Python",
- "stargazers_count": 7125,
- "language": "Python"
- }, {
- "full_name": "RaRe-Technologies/gensim",
- "html_url": "https://github.com/RaRe-Technologies/gensim",
- "description": "Topic Modelling for Humans",
- "stargazers_count": 7100,
- "language": "Python"
- }],
- "pagination": "<https://api.github.com/search/repositories?q=language%3Apython+stars%3A%3E500&sort=stars&page=2&per_page=100>; rel=\"next\", <https://api.github.com/search/repositories?q=language%3Apython+stars%3A%3E500&sort=stars&page=10&per_page=100>; rel=\"last\""
- }
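The "pagination" links in the response show the shape of the request sent to GitHub: a `q` parameter with `language:` and `stars:>` qualifiers, plus `sort`, `page` and `per_page`. A minimal sketch of building such a URL with the standard library; the function name `build_search_url` and its defaults are assumptions for illustration, not the project's actual code:

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language="python", stars=500, page=1, per_page=100):
    """Build a GitHub repository search URL.

    The query asks for repositories written in ``language`` with more
    than ``stars`` stars, sorted by stars.
    """
    query = {
        "q": "language:{} stars:>{}".format(language, stars),
        "sort": "stars",
        "page": page,
        "per_page": per_page,
    }
    # urlencode escapes ":" and ">" and turns the space into "+".
    return SEARCH_URL + "?" + urlencode(query)


if __name__ == "__main__":
    print(build_search_url(page=2))
```

With the defaults and `page=2` this produces the same URL as the `rel="next"` link in the example response.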
- 6. Solutions in my code that are NOT GOOD (in my opinion):
- - The construction
- "async with aiohttp.ClientSession(auth=auth) as session:" (line 60) is not
- recommended inside a request handler. Link to the manual:
- "https://docs.aiohttp.org/en/stable/client_advanced.html?highlight=ClientSession"
- But this server has only 1 endpoint and 1 handler (and I had little free time),
- so it made sense here.
- 7. How much time I spent:
- 2 hours - reading the documentation for the GitHub Search API and googling information about it.
- Most of that time went into looking for filters that would limit the response to the needed
- JSON fields ('full_name','html_url','description','stargazers_count','language').
- But I didn't find any.
- 1 hour - choosing technologies and tools.
- 11 hours - developing and debugging, plus fighting with launching MongoDB.
- PS: I had BLOCKERS: my laptop (OS X High Sierra) had Python 3.4 as the system
- default and I had not updated the OS for a long time. As a result, when I
- started updating Python 3.4 to Python 3.6.5 I ran into problems.
- I spent all of Friday evening fixing this.
- That's all)
- If you have any questions, you can write or call me. I will answer with pleasure.