\name{cdk.lf, moe.lf, bci.lf}
\alias{cdk.lf}
\alias{moe.lf}
\alias{bci.lf}
\alias{ecfp.lf}
\alias{fps.lf}
\alias{jchem.binary.lf}
\title{
Functions to parse lines from fingerprint files
}
\description{
Each of these functions takes a single line and parses it to produce
a vector of integers representing the positions of the 'on' bits in
a fingerprint. This allows the user to use \code{read.fp} with arbitrary fingerprint
files: a new file format can be handled by defining a new line parser function.
Currently, \code{cdk.lf}, \code{moe.lf}, \code{bci.lf} and \code{fps.lf} process
fingerprint files obtained from the
CDK (\url{http://cdk.sourceforge.net}), MOE (\url{http://chemcomp.com}), BCI
(\url{http://www.digitalchemistry.co.uk/}) and the FPS format
(\url{http://code.google.com/p/chem-fingerprints/wiki/FPS}), respectively.
\code{ecfp.lf} can be used
for any fingerprint that generates hashed features (such as ECFPs or other
circular fingerprints). In these cases, features are assumed to be unsigned
integers, so string features are not handled.
Note that when \code{fps.lf} is used, items such as the number of bits
or the header flag do not need to be specified, as the FPS format requires a header block
containing some of these items.
}
\usage{
cdk.lf(line)
moe.lf(line)
bci.lf(line)
ecfp.lf(line)
fps.lf(line)
jchem.binary.lf(line)
}
\arguments{
\item{line}{
The line to parse
}
}
\value{
A list with three components. The first is the name associated with the fingerprint
(if available). The second is either a vector of integers representing the bits set to 1
(for the first three functions, \code{cdk.lf}, \code{moe.lf} and \code{bci.lf}) or a
vector of characters representing hashed features (characteristic of
circular fingerprints) or, more generally, any string feature. The third component is a
(possibly empty) list containing the remaining components of a line, when the format
allows items other than a title and the fingerprint (as the FPS format does). The content
of the third component depends on the line function being used.
}
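\examples{
# A minimal sketch of a user-defined line parser, as the description
# suggests for handling a new file format. It assumes a hypothetical
# format in which each line looks like "mol1:3 17 42": a name, a colon,
# then space-separated 1-based positions of the on bits. The name my.lf
# and this format are illustrative only and are not part of the package.
my.lf <- function(line) {
  parts <- strsplit(line, ":", fixed = TRUE)[[1]]
  name <- parts[1]
  # Bit positions are everything after the colon (possibly nothing)
  bits <- if (length(parts) > 1) {
    as.integer(strsplit(trimws(parts[2]), " +")[[1]])
  } else {
    integer(0)
  }
  # Return the three-component list described in the value section:
  # name, fingerprint, and a (here empty) list of extra items
  list(name, bits, list())
}

parsed <- my.lf("mol1:3 17 42")
str(parsed)

# Such a function could then be handed to read.fp. This assumes that the
# line parser argument of read.fp is called lf and that myfile.txt is a
# hypothetical file in the format above, so it is not run here.
\dontrun{
fps <- read.fp("myfile.txt", size = 64, lf = my.lf)
}
}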
\author{Rajarshi Guha \email{rajarshi.guha@gmail.com}}
\keyword{logic}