# Fork this project and send back a pull request.

## What to do
Create a simple RESTful API that imports data from the GitHub Search API: https://developer.github.com/v3/search/

Import repositories that have been written in Python and have more than 500 stars. Import just a few fields from GitHub Search API:
 `full_name`, `html_url`, `description`, `stargazers_count`, `language`.
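
A minimal sketch of the import query and the field selection described above. It assumes the `q`, `sort`, `page` and `per_page` parameters of the repository search endpoint; the helper names `build_search_url` and `pick_fields` are hypothetical and not part of this repository.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"
FIELDS = ("full_name", "html_url", "description", "stargazers_count", "language")


def build_search_url(language="python", min_stars=500, page=1, per_page=100):
    """Return a search URL for repositories in `language` with more than `min_stars` stars."""
    query = f"language:{language} stars:>{min_stars}"
    params = {"q": query, "sort": "stars", "page": page, "per_page": per_page}
    return f"{SEARCH_URL}?{urlencode(params)}"


def pick_fields(item):
    """Keep only the fields the task asks for; missing keys become None."""
    return {name: item.get(name) for name in FIELDS}
```

Each element of the `items` array in a search response could be passed through `pick_fields` before it is stored.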
 
You may use any database or just store the repositories in RAM.

REQUIREMENTS:
1. Provide a script, function, or endpoint that fills the database.
2. Add pagination to the API.
3. Add docstrings and comments to your code.
4. Describe your solution in the README.md file.
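
For requirement 2, pagination over repositories that are already stored could look roughly like this. It is only a sketch: the function name `paginate` and the default page size of 30 are assumptions, not part of the task.

```python
def paginate(items, page=1, per_page=30):
    """Return the slice of `items` that belongs to a 1-based `page`."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    start = (page - 1) * per_page
    # Slicing past the end of the list simply yields a shorter (or empty) page.
    return items[start:start + per_page]
```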

## Bonus 1
Add sorting by `stars` to the API.
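
One possible way to sort stored repositories by stars, assuming each repository is a dict with a `stargazers_count` key. The function name `sort_by_stars` is hypothetical.

```python
def sort_by_stars(repositories, descending=True):
    """Return a new list ordered by `stargazers_count`; a missing count is treated as 0."""
    return sorted(
        repositories,
        key=lambda repo: repo.get("stargazers_count") or 0,
        reverse=descending,
    )
```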

## Bonus 2
Use Docker Compose to encapsulate all the services related to this app.

## Some help
* How to fork https://confluence.atlassian.com/bitbucket/forking-a-repository-221449527.html
* How to create a pull request https://confluence.atlassian.com/bitbucket/create-a-pull-request-774243413.html

## Describing my solution
The task is to create a simple RESTful API server that works with the GitHub
API. I decided to use the following tools:

1. Framework: I chose asyncio and aiohttp. First, both modules work
   asynchronously, so the server does not block while it waits for network
   I/O. Second, asyncio is part of the standard library in Python 3.4+, so it
   is available out of the box. Finally, an API server with a single endpoint
   is a small task, and it makes no sense to use a larger framework such as
   Django or Flask for it.
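
To illustrate the point about asynchronous work, here is a small sketch that uses only asyncio. The "requests" are simulated with `asyncio.sleep`, so `fake_request`, `fetch_all` and `timed_run` are hypothetical helpers and not code from this project.

```python
import asyncio
import time


async def fake_request(name, delay):
    """Stand-in for a network call: it sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return name


async def fetch_all():
    """Run three simulated requests concurrently and return their results in order."""
    return await asyncio.gather(
        fake_request("a", 0.1),
        fake_request("b", 0.1),
        fake_request("c", 0.1),
    )


def timed_run():
    """Return the results together with the wall-clock time the run took."""
    started = time.perf_counter()
    results = asyncio.run(fetch_all())
    return results, time.perf_counter() - started
```

Because the three waits overlap, the whole run should take about as long as one request rather than the sum of all three.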

2. DB: according to the GitHub API manual, GitHub responds with data in JSON
   format. The best way to work with JSON in this case is MongoDB, because it
   is a document-oriented store and the data does not have to be converted
   before it is stored.
   DB schema:
        "github" - database
        "repositories" - collection for storing data from GitHub
   Create a user for connecting to the DB:
        db.createUser({user:"admin", pwd:"admin123", roles:[{role:"root", db:"admin"}]})

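A connection string for the user and database above could be assembled as in the sketch below. The host, the port and the `authSource` value are assumptions, since they depend on where MongoDB runs and in which database the user was created; `build_mongo_uri` is a hypothetical helper.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017,
                    database="github", auth_source="admin"):
    """Build a MongoDB connection URI, percent-escaping the credentials."""
    # Escaping matters when the user name or password contains ':', '/' or '@'.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?authSource={auth_source}"
    )
```
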
3. Pagination: according to the GitHub API manual for developers, the API
   automatically paginates the requested items. "Different API calls respond
   with different defaults. For example, a call to list GitHub's public
   repositories provides paginated items in sets of 30, whereas a call to
   the GitHub Search API provides items in sets of 100".

   See https://developer.github.com/v3/#pagination

   For pagination I use the "page" and "per_page" parameters in requests, and
   in the response I add a "pagination" field that carries the value of
   GitHub's Link header ("The Link header includes pagination information").
   Example:
   Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
   <https://api.github.com/user/repos?page=50&per_page=100>; rel="last"
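
A client that receives such a Link value may want to split it into separate URLs. The following is a rough sketch that assumes the `<url>; rel="name"` layout shown in the example; `parse_link_header` is a hypothetical helper and does not handle every form the header can take.

```python
import re

# Matches one '<url>; rel="name"' entry of a Link header value.
LINK_PATTERN = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def parse_link_header(header):
    """Map each rel value ("next", "last", ...) to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(header or "")}
```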

4. Authentication: an authenticated user can send 5000 requests per hour, while
   an unauthenticated user can send only 60 requests per hour. Some endpoints
   of the GitHub Search API also require the request to come from an
   authenticated user.

   I wrote code that authenticates the user with credentials ('username' and
   'password'). However, I do not have a user account for testing in Bitbucket
   (Atlassian), so I commented out lines 54-55
   """
   #auth = aiohttp.BasicAuth(login=config['username'],
   #                          password=config['password'])
   """
   and defined the variable "auth" as None.
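
Instead of commenting lines in and out, the choice could depend on the configuration. The sketch below builds an HTTP Basic `Authorization` header value with the standard library only and returns None when no credentials are configured. The shape of `config` (a dict with `username` and `password` keys) follows the commented snippet above; the function name `basic_auth_header` is an assumption.

```python
import base64


def basic_auth_header(config):
    """Return a Basic Authorization header value, or None without credentials."""
    username = config.get("username")
    password = config.get("password")
    if not username or not password:
        return None  # fall back to unauthenticated requests
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```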

5. How to check that the server works.
   Type in the browser:
   http://{HOST}:{PORT}/github/searh/v0.1?language=python&stars=600&per_page=100

   With my settings it looks like this:
   http://localhost:11071/github/searh/v0.1?language=python&stars=600&per_page=100

   You should get a response like this:

   {
    "items": [{
        "full_name": "vinta/awesome-python",
        "html_url": "https://github.com/vinta/awesome-python",
        "description": "A curated list of awesome Python frameworks, libraries, software and resources",
        "stargazers_count": 50726,
        "language": "Python"
    },
   ......
    , {
        "full_name": "Miserlou/Zappa",
        "html_url": "https://github.com/Miserlou/Zappa",
        "description": "Serverless Python",
        "stargazers_count": 7125,
        "language": "Python"
    }, {
        "full_name": "RaRe-Technologies/gensim",
        "html_url": "https://github.com/RaRe-Technologies/gensim",
        "description": "Topic Modelling for Humans",
        "stargazers_count": 7100,
        "language": "Python"
    }],
    "pagination": "<https://api.github.com/search/repositories?q=language%3Apython+stars%3A%3E500&sort=stars&page=2&per_page=100>; rel=\"next\", <https://api.github.com/search/repositories?q=language%3Apython+stars%3A%3E500&sort=stars&page=10&per_page=100>; rel=\"last\""
   }

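The same check can be scripted. The sketch below only builds the URL from step 5 and verifies the shape of an already decoded response; it does not send a request. The names `build_check_url` and `looks_valid` are hypothetical, and the host and port defaults are the ones from my settings.

```python
from urllib.parse import urlencode

REQUIRED_FIELDS = {"full_name", "html_url", "description", "stargazers_count", "language"}


def build_check_url(host="localhost", port=11071, **params):
    """Build the URL of the search endpoint with the given query parameters."""
    return f"http://{host}:{port}/github/searh/v0.1?{urlencode(params)}"


def looks_valid(payload):
    """Check that a decoded response has an "items" list and a "pagination" field."""
    items = payload.get("items")
    if not isinstance(items, list) or "pagination" not in payload:
        return False
    # Every item must carry at least the five imported fields.
    return all(REQUIRED_FIELDS <= set(item) for item in items)
```
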
6. NOT GOOD solutions in my code (in my opinion):
   - The construction
     "async with aiohttp.ClientSession(auth=auth) as session:" (line 60) is not
     recommended inside a request handler, because it creates a new session
     for every request. Link to the manual:
     "https://docs.aiohttp.org/en/stable/client_advanced.html?highlight=ClientSession"

     However, this server has only 1 endpoint and 1 handler (and I had little
     free time), so it made sense here.
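
The usual alternative is to create one session when the application starts, reuse it in every handler, and close it on shutdown. The sketch below shows only that lifecycle. `FakeSession` is a hypothetical stand-in that counts how many sessions get created, and the dict `app` plays the role of the application object; with aiohttp the same idea would be wired into the application's startup and cleanup hooks.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for an HTTP client session."""

    created = 0  # number of sessions created so far

    def __init__(self):
        FakeSession.created += 1
        self.closed = False

    async def get(self, url):
        await asyncio.sleep(0)  # pretend to do network I/O
        return f"response for {url}"

    async def close(self):
        self.closed = True


async def handler(app, url):
    """Request handler that reuses the session stored on the application."""
    return await app["session"].get(url)


async def serve(urls):
    """Create the session once, handle every request with it, then close it."""
    app = {"session": FakeSession()}
    try:
        return await asyncio.gather(*(handler(app, url) for url in urls))
    finally:
        await app["session"].close()
```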

7. How much time I spent:
   2 hours - reading the GitHub Search API documentation and googling
   information about it. Most of this time went into looking for filters that
   would limit the JSON response to specific fields
   ('full_name','html_url','description','stargazers_count','language'),
   but I did not find any.
   1 hour - choosing technologies and tools.
   11 hours - developing and debugging, plus fighting with launching MongoDB.

   PS: I had BLOCKERS: my laptop (OSX High Sierra) had Python 3.4 as the system
   default, and I had not updated the OS for a long time. As a result, when I
   started updating Python 3.4 to Python 3.6.5, I ran into problems.
   I spent the whole Friday evening fixing this.


That's all.

If you have any questions, you can write or call me. I will answer with pleasure.