spell-correct.py - This code implements a simple spell chec…


http://spell-correct-in-go.googlecode.com/ · Python · 36 lines · 27 code · 9 blank · 0 comment · 20 complexity · 5dfa56ceca4676533ad7a94d5161de51 MD5


from datetime import datetime
import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# open() works on both Python 2 and 3; the file() builtin was removed in Python 3.
NWORDS = train(words(open('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # All strings one edit away from word: one deletion, one transposition
    # of adjacent letters, one replacement, or one insertion.
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

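def correct(word):
    # Prefer the word itself if known, then known words one edit away,
    # then known words two edits away; otherwise return the word unchanged.
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)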

startTime = datetime.now()
for i in range(100):
    correct('korrecter')
print(datetime.now() - startTime)

Summary ✨

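This code implements a simple spell checker based on word frequencies. It reads a large text file (big.txt), counts how often each word occurs, and uses those counts to correct misspellings. A word that already appears in the model is returned as-is; otherwise the checker picks the most frequent known word within one edit (a deletion, transposition, replacement, or insertion), then within two edits, and returns the input unchanged if nothing matches. The script then times 100 corrections of a single misspelling, ‘korrecter’.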
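The behaviour described in the summary can be tried without big.txt. The following is a minimal Python 3 sketch, not the file's own code: `make_corrector` and `SAMPLE_TEXT` are hypothetical names introduced here, the corpus is a single made-up sentence, and `collections.Counter` is swapped in for the listing's `defaultdict(lambda: 1)`, so unseen words count as 0 rather than 1.

```python
import collections
import re

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'


def words(text):
    # Lowercase the text and extract runs of ASCII letters.
    return re.findall('[a-z]+', text.lower())


def edits1(word):
    # All strings exactly one edit away from word.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def make_corrector(text):
    # Hypothetical helper: builds a correct() function from an in-memory corpus.
    # Assumption: Counter replaces the listing's defaultdict(lambda: 1).
    counts = collections.Counter(words(text))

    def known(candidates):
        return set(w for w in candidates if w in counts)

    def correct(word):
        # Same priority order as the listing: the word itself, then one edit,
        # then two edits, then the word unchanged.
        candidates = (known([word])
                      or known(edits1(word))
                      or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                      or [word])
        return max(candidates, key=lambda w: counts[w])

    return correct


# Made-up sample corpus standing in for big.txt.
SAMPLE_TEXT = "the spelling corrector fixes spelling errors; the corrector is small"
correct = make_corrector(SAMPLE_TEXT)

print(correct('speling'))    # one insertion away from 'spelling'
print(correct('korrecter'))  # two replacements away from 'corrector'
```

With this tiny corpus, 'korrecter' has no known word one edit away, so the two-edit stage is what finds 'corrector'. That stage generates every edit of every edit, which is why the listing times it separately.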
