/helpers/hadoop/wikipedia/lighttag/spots.py

https://github.com/champ1/twittomatic
import sys
import acora


class SpotsFinder(object):
    """Finds known anchor strings ("spots") in text using an Aho-Corasick tree."""

    def __init__(self, spotfile='anchors-sorted.txt'):
        builder = acora.AcoraBuilder()
        # The spot file holds one anchor per line; blank lines are skipped
        # so that no empty keyword ends up in the tree.
        with open(spotfile, 'r') as inputfile:
            for line in inputfile:
                spot = line.rstrip("\n")
                if spot:
                    builder.add(spot)
        print("Building the tree")
        self.tree = builder.build()

    def findall(self, contents):
        # Yield (word, start, end) for every match; end is exclusive.
        for word, start in self.tree.findall(contents):
            yield word, start, len(word) + start


if __name__ == "__main__":
    # Check the argument first, since building the tree can be slow.
    if len(sys.argv) < 2:
        sys.exit("usage: spots.py TEXT")
    finder = SpotsFinder()
    text = sys.argv[1]
    for word, start, end in finder.findall(text):
        print("Found spot %s start: %d end: %d" % (word, start, end))
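
The shape of the tuples that `findall` yields can be tried out without installing acora or preparing an anchors file. The sketch below is only an illustration: `find_spots` is a hypothetical helper that is not part of the repository, and it swaps the Aho-Corasick tree for a naive `str.find` scan. It reports every occurrence of every keyword, including overlapping ones. Whether acora reports overlaps in the same way is an assumption that should be checked against its documentation.

```python
def find_spots(contents, keywords):
    """Naive stand-in for the tree search (hypothetical helper).

    Yields (word, start, end) for every occurrence of every keyword,
    with end exclusive, the same tuple shape SpotsFinder.findall yields.
    """
    for word in keywords:
        if not word:
            continue  # an empty keyword would match at every position
        start = contents.find(word)
        while start != -1:
            yield word, start, start + len(word)
            # Advance by one character so overlapping matches are kept.
            start = contents.find(word, start + 1)


if __name__ == "__main__":
    text = "new york city"
    for word, start, end in find_spots(text, ["new york", "york", "city"]):
        print("Found spot %s start: %d end: %d" % (word, start, end))
```

Because `end` is exclusive, `text[start:end]` always gives back the matched word. The naive scan costs one pass over the text per keyword, which is why the original builds a single tree for a large anchor list instead.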