\name{load.molecules}
\title{
  Load Molecular Structures From Disk
}
\description{
The CDK can read a variety of molecular structure formats. This function
encapsulates the calls to the CDK API that load structures given their filenames.
}
\usage{
load.molecules(molfiles=NA, aromaticity = TRUE, typing = TRUE, isotopes = TRUE,
               verbose=FALSE)
iload.molecules(molfile, type="smi", aromaticity = TRUE, typing = TRUE, isotopes = TRUE,
                skip=TRUE)
}
\arguments{
  \item{molfiles}{A \code{character} vector of filenames. Note that the full
  path to the files should be provided. URLs can also be used as
  paths. In such a case, the URL should start with "http://".}
  \item{molfile}{A string containing the filename to load. Must be a local file.}
  \item{type}{Indicates whether the input file is SMILES or SDF. Valid values are
  "smi" or "sdf".}
  \item{aromaticity}{If \code{TRUE}, aromaticity detection is
  performed on all loaded molecules. If this fails for a given
  molecule, that molecule is set to \code{NA} in the returned list.}
  \item{typing}{If \code{TRUE}, atom typing is
  performed on all loaded molecules. The assigned types will be CDK
  internal types. If this fails for a given
  molecule, that molecule is set to \code{NA} in the returned list.}
  \item{isotopes}{If \code{TRUE}, atoms are configured with isotopic masses.}
  \item{verbose}{If \code{TRUE}, progress messages (such as file download
  progress) are printed.}
  \item{skip}{If \code{TRUE}, the reader continues reading even when it
  encounters an invalid molecule. If \code{FALSE}, the reader stops at the first
  invalid molecule.}
}
\value{
  \code{load.molecules} returns a list of CDK \code{Molecule} objects, which can be
  used in other rcdk functions.

  \code{iload.molecules} is an iterating version of the loader and is suited to
  large SMILES or SDF files. In contrast to \code{load.molecules}, it does not load
  all the molecules into memory at once, and as a result lets you process arbitrarily
  large structure files.
}
\details{
Note that if molecules are read in from formats that do not have rules for
handling implicit hydrogens (such as MDL MOL), the molecule will not have
implicit or explicit hydrogens. To add explicit hydrogens, make sure that the molecule
has been typed (\code{typing} is \code{TRUE} by default for this function) and then call
\code{\link{convert.implicit.to.explicit}}. For a format
such as SMILES, on the other hand, implicit or explicit hydrogens will be present.
}
\examples{
\dontrun{

## load a single file
mols <- load.molecules('mol1.sdf')

## load multiple files
mols <- load.molecules(c('mol1.sdf', 'mol2.smi',
          'https://github.com/rajarshi/cdkr/blob/master/data/set2/dhfr00008.sdf?raw=true'))
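
## A minimal sketch, not part of the original examples: drop entries for
## which aromaticity detection or atom typing failed. As described under
## Arguments, such entries are NA in the returned list. This assumes that
## successfully loaded molecules are rJava references of class "jobjRef".
loaded <- vapply(mols, function(m) inherits(m, "jobjRef"), logical(1))
mols <- mols[loaded]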

## iterate over a large file
moliter <- iload.molecules("big.sdf", type="sdf")
while(hasNext(moliter)) {
  mol <- nextElem(moliter)
  print(get.property(mol, "cdk:Title"))
}
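
## A minimal sketch of the hydrogen handling described under Details.
## 'mol1.sdf' is the placeholder file name used above, and the variable name
## sdfmols is illustrative. Atom typing is on by default, so explicit
## hydrogens can be added right after loading. convert.implicit.to.explicit
## is assumed here to modify each molecule in place.
sdfmols <- load.molecules('mol1.sdf')
for (m in sdfmols) {
  convert.implicit.to.explicit(m)
}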
}
}
\seealso{