https://github.com/mrowl/filmdata
filmdata
========

Technically I use this to import lots of film data for my site, [filmlust](http://filmlust.com), but it's mostly just a playground for me. I see something that looks interesting, bring it in here, and build a prototype to experiment with it (mainly because that's how I learn). For instance, buried deep in the bowels of the code lie at least three scraper implementations (using tornado, gevent, and twisted; I think I'm going to settle on gevent, btw).

Sometimes things reach a quiescent point and I clean them up a bit (e.g. the dynamic sqlalchemy models for automagically making tables for plugins). I guess what I'm saying is: use at your own risk, and feel free to contribute, because there are probably many superior solutions out there and I'd like to see them.

Anyway, this thing will fetch raw data from the following sources:

* [imdb](http://www.imdb.com/interfaces)
* [netflix](http://developer.netflix.com/)
* [flixster/rotten tomatoes](http://developer.rottentomatoes.com/)

That data is then imported into one of the following sinks:

* [MongoDB](http://www.mongodb.org)

The following sinks are currently broken:

* [SQLAlchemy](http://www.sqlalchemy.org) (use your preferred relational db behind it)

Coming soon: freebase source? box office data? sqlalchemy working?

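To make the source/sink split above concrete, here is a rough sketch of the general pattern: a source yields raw records and a sink stores them. Every name in it (`ListSource`, `MemorySink`, `run_import`, the `title` key) is made up for illustration and is not taken from the filmdata code; the in-memory classes merely stand in for a real source such as imdb and a real sink such as MongoDB.

```python
class ListSource:
    """Hypothetical source: yields raw film records from a list.

    Stands in for a real fetcher (imdb, netflix, ...); not filmdata's API.
    """

    def __init__(self, records):
        self.records = records

    def fetch(self):
        # Yield records one at a time so a sink can consume them lazily.
        for record in self.records:
            yield record


class MemorySink:
    """Hypothetical sink: keeps records in a dict keyed by title.

    Stands in for a real store such as a MongoDB collection.
    """

    def __init__(self):
        self.store = {}

    def consume(self, record):
        # Later records with the same title overwrite earlier ones.
        self.store[record["title"]] = record


def run_import(source, sink):
    """Push every record from the source into the sink; return the count."""
    count = 0
    for record in source.fetch():
        sink.consume(record)
        count += 1
    return count


if __name__ == "__main__":
    source = ListSource([
        {"title": "Alien", "year": 1979},
        {"title": "Brazil", "year": 1985},
    ])
    sink = MemorySink()
    print(run_import(source, sink), "records imported")
```

Keeping the two sides behind small `fetch`/`consume` methods is one way to let a broken sink be swapped out without touching the fetchers; the actual plugin layout in this repo may well differ.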
Usage
-----

Copy `config_sample.ini` to `config.ini` and edit it to your liking. There are also a couple more static options in the base `__init__.py` file.

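If you would rather script the copy step, something like the following should do. It is only a sketch using Python 3's standard `configparser` and `shutil`; the helper name `ensure_config` is hypothetical, and it makes no assumption about which sections or options the sample file contains or about how filmdata itself loads the file.

```python
import configparser
import shutil
from pathlib import Path


def ensure_config(sample="config_sample.ini", target="config.ini"):
    """Copy the sample config into place (if needed) and parse it.

    Hypothetical helper, not part of filmdata. An existing target file
    is left untouched so local edits are never overwritten.
    """
    target_path = Path(target)
    if not target_path.exists():
        shutil.copyfile(sample, target_path)

    parser = configparser.ConfigParser()
    parser.read(target_path)
    return parser


if __name__ == "__main__":
    config = ensure_config()
    # List whatever sections the file defines, to check it parsed.
    print(config.sections())
```

You still need to open `config.ini` afterwards and fill in your own values.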
This will show you all the options for fetching and importing:

    python main.py --help