/jodd-mail/src/main/java/jodd/mail/EmailAddress.java

https://bitbucket.org/cng1985/jodd
// Copyright (c) 2003-2012, Jodd Team (jodd.org). All Rights Reserved.

package jodd.mail;

import java.util.regex.Pattern;

/**
 * A utility class to parse, clean up, and extract email addresses from messages
 * per RFC 2822 syntax. Designed to integrate with JavaMail (this class will require that you
 * have a JavaMail mail.jar in your classpath), but you could easily change
 * the existing methods around to not use JavaMail at all. For example, if you're changing
 * the code, see the difference between getInternetAddress and getDomain: the latter doesn't
 * depend on any JavaMail code. This is all a by-product of what this class was written for,
 * so feel free to modify it to suit your needs.
 * <p>
 * For real-world addresses, this class is roughly 3-4 times slower than parsing with
 * InternetAddress, but it can handle a whole lot more. Because of sensible design tradeoffs
 * made in JavaMail, if InternetAddress has trouble parsing, it might throw an exception,
 * but often it will silently leave the entire original string in the result of
 * ia.getAddress(). This class can be trusted to only provide authenticated results.
 * <p>
 * This class has been tested on a few thousand real-world addresses, and is live in
 * production environments, but you may want to do some of your own testing to ensure
 * that it works for you. In other words, it's not beta, but it's not guaranteed yet.
 * <p>
 * Comments/Questions/Corrections welcome: java &lt;at&gt; caseyconnor.org
 * <p>
 * Started with code by Les Hazlewood:
 * <a href="http://www.leshazlewood.com">leshazlewood.com</a>.
 * <p>
 * Modified/added: removed some functions, added support for CFWS token,
 * corrected FWSP token, added some boolean flags, added getInternetAddress and
 * extractHeaderAddresses and other methods, some optimization.
 * <p>
 * Where Mr. Hazlewood's version was more for ensuring certain forms that were passed in during
 * registrations, etc., this handles more types of verifying as well as a few forms of
 * extracting the data in predictable, cleaned-up chunks.
 * <p>
 * Note: CFWS means the "comment folded whitespace" token from RFC 2822, in other words,
 * whitespace and comment text that is enclosed in ()'s.
 * <p>
 * <b>Limitations</b>: doesn't support nested CFWS (comments within (other) comments), doesn't
 * support mailbox groups except when flat-extracting addresses from headers or when doing
 * verification, doesn't support any of the obs-* tokens. Also: the getInternetAddress and
 * extractHeaderAddresses methods return InternetAddress objects; if the personal name has
 * any quotes or \'s in it at all, the InternetAddress object will always
 * escape the name entirely and put it in quotes, so
 * multiple-token personal names with those characters somewhere in them will always be munged
 * into one big escaped string. This is not really a big deal at all, but I mention it anyway.
 * (And you could get around it by a simple modification to those methods to not use
 * InternetAddress objects.) See the docs of those methods for more info.
 * <p>
 * Note: This does not do any header-length checking. There are no such limitations on the
 * email address grammar in RFC 2822, though email headers in general do have length
 * restrictions. So if the return path is 40000 unfolded characters long, but otherwise
 * valid under RFC 2822, this class will pass it.
 * <p>
 * Examples of passing (2822-valid) addresses, believe it or not:
 * <p>
 * <tt>bob @example.com</tt>
 * <BR><tt>&quot;bob&quot; @ example.com</tt>
 * <BR><tt>bob (comment) (other comment) @example.com (personal name)</tt>
 * <BR><tt>&quot;&lt;bob \&quot; (here) &quot; &lt; (hi there) &quot;bob(the man)smith&quot; (hi) @ (there) example.com (hello) &gt; (again)</tt>
 * <p>
 * (none of which are permitted by JavaMail, incidentally)
 * <p>
 * By using getInternetAddress(), you can retrieve an InternetAddress object that, when
 * toString()'ed, would reveal that the parser had converted the above into:
 * <p>
 * <tt>&lt;bob@example.com&gt;</tt>
 * <BR><tt>&lt;bob@example.com&gt;</tt>
 * <BR><tt>&quot;personal name&quot; &lt;bob@example.com&gt;</tt>
 * <BR><tt>&quot;&lt;bob \&quot; (here)&quot; &lt;&quot;bob(the man)smith&quot;@example.com&gt;</tt>
 * <P>(respectively)
 * <P>If parsing headers, however, you'll probably be calling extractHeaderAddresses().
 * <p>
 * A future improvement may be to use this class to extract info from corrupted
 * addresses, but for now, it does not permit them.
 * <p>
 * <b>Some of the configuration booleans allow a bit of tweaking
 * already. The source code can be compiled with these booleans in various
 * states. They are configured to what is probably the most commonly-useful state.</b>
 *
 * @author Les Hazlewood, Casey Connor, Igor Spasic
 */
public class EmailAddress {

    /**
     * This constant states that domain literals are allowed in the email address, e.g.:
     * <p><tt>someone@[192.168.1.100]</tt> or <br/>
     * <tt>john.doe@[23:33:A2:22:16:1F]</tt> or <br/>
     * <tt>me@[my computer]</tt></p>
     * <p>The RFC says these are valid email addresses, but most people don't like allowing them.
     * If you don't want to allow them, and only want to allow valid domain names
     * (<a href="http://www.ietf.org/rfc/rfc1035.txt">RFC 1035</a>, x.y.z.com, etc),
     * change this constant to <tt>false</tt>.
     * <p>Its default value is <tt>true</tt> to remain RFC 2822 compliant, but
     * you should set it depending on what you need for your application.
     */
    private static final boolean ALLOW_DOMAIN_LITERALS = true;

    /**
     * This constant states that quoted identifiers
     * (using quotes and angle brackets around the raw address) are allowed, e.g.:
     * <p><tt>"John Smith" &lt;john.smith@somewhere.com&gt;</tt>
     * <p>The RFC says this is a valid mailbox. If you don't want to
     * allow this because, for example, you only want users to enter
     * a raw address (<tt>john.smith@somewhere.com</tt> - no quotes or angle
     * brackets), then change this constant to <tt>false</tt>.
     * <p>Its default value is <tt>true</tt> to remain RFC 2822 compliant, but
     * you should set it depending on what you need for your application.
     */
    private static final boolean ALLOW_QUOTED_IDENTIFIERS = true;

    // RFC 2822 2.2.2 Structured Header Field Bodies
    private static final String wsp = "[ \\t]"; // space or tab
    private static final String fwsp = wsp + '*';

    // RFC 2822 3.2.1 Primitive tokens
    private static final String dquote = "\\\"";
    // ASCII control characters excluding white space:
    private static final String noWsCtl = "\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F";
    // all ASCII characters except CR and LF:
    private static final String asciiText = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";

    // RFC 2822 3.2.2 Quoted characters:
    // single backslash followed by a text char
    private static final String quotedPair = "(\\\\" + asciiText + ')';

    // RFC 2822 3.2.4 Atom:
    private static final String atext = "[a-zA-Z0-9\\!\\#\\$\\%\\&\\'\\*\\+\\-\\/\\=\\?\\^\\_\\`\\{\\|\\}\\~]";
    private static final String atom = fwsp + atext + '+' + fwsp;
    private static final String dotAtomText = atext + '+' + '(' + "\\." + atext + "+)*";
    private static final String dotAtom = fwsp + '(' + dotAtomText + ')' + fwsp;

    // RFC 2822 3.2.5 Quoted strings:
    // noWsCtl and the rest of ASCII except the doublequote and backslash characters:
    private static final String qtext = '[' + noWsCtl + "\\x21\\x23-\\x5B\\x5D-\\x7E]";
    private static final String qcontent = '(' + qtext + '|' + quotedPair + ')';
    private static final String quotedString = dquote + '(' + fwsp + qcontent + ")*" + fwsp + dquote;

    // RFC 2822 3.2.6 Miscellaneous tokens
    private static final String word = "((" + atom + ")|(" + quotedString + "))";
    private static final String phrase = word + '+'; // one or more words

    // RFC 1035 tokens for domain names:
    private static final String letter = "[a-zA-Z]";
    private static final String letDig = "[a-zA-Z0-9]";
    private static final String letDigHyp = "[a-zA-Z0-9-]";
    private static final String rfcLabel = letDig + '(' + letDigHyp + "{0,61}" + letDig + ")?";
    private static final String rfc1035DomainName = rfcLabel + "(\\." + rfcLabel + ")*\\." + letter + "{2,6}";

    // RFC 2822 3.4 Address specification
    // domain text - non white space controls and the rest of ASCII chars not including [, ], or \:
    private static final String dtext = '[' + noWsCtl + "\\x21-\\x5A\\x5E-\\x7E]";
    private static final String dcontent = dtext + '|' + quotedPair;
    private static final String domainLiteral = "\\[" + '(' + fwsp + dcontent + "+)*" + fwsp + "\\]";
    private static final String rfc2822Domain = '(' + dotAtom + '|' + domainLiteral + ')';

    private static final String domain = ALLOW_DOMAIN_LITERALS ? rfc2822Domain : rfc1035DomainName;

    private static final String localPart = "((" + dotAtom + ")|(" + quotedString + "))";
    private static final String addrSpec = localPart + '@' + domain;
    private static final String angleAddr = '<' + addrSpec + '>';
    private static final String nameAddr = '(' + phrase + ")?" + fwsp + angleAddr;
    private static final String mailbox = nameAddr + '|' + addrSpec;

    // now compile a pattern for efficient re-use,
    // depending on whether we're allowing quoted identifiers or not:
    private static final String patternString = ALLOW_QUOTED_IDENTIFIERS ? mailbox : addrSpec;
    public static final Pattern VALID_PATTERN = Pattern.compile(patternString);

    // class attributes
    private String text;
    // defaults to false, as documented on isBouncing()
    private boolean bouncing = false;
    private boolean verified;
    private String label;

    public EmailAddress() {
        super();
    }

    public EmailAddress(String text) {
        super();
        setText(text);
    }

    /**
     * Returns the actual email address string, e.g. <tt>someone@somewhere.com</tt>
     *
     * @return the actual email address string.
     */
    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    /**
     * Returns whether or not any emails sent to this email address come back as bounced
     * (undeliverable).
     * <p>Default is <tt>false</tt> for convenience's sake - if a bounced message is ever
     * received for this address, this value should be set to <tt>true</tt> until
     * verification can be made.
     *
     * @return whether or not any emails sent to this email address come back as bounced
     * (undeliverable).
     */
    public boolean isBouncing() {
        return bouncing;
    }

    public void setBouncing(boolean bouncing) {
        this.bouncing = bouncing;
    }

    /**
     * Returns whether or not the party associated with this email has verified that it is
     * their email address.
     * <p>Verification is usually done by sending an email to this
     * address and waiting for the party to respond or click a specific link in the email.
     * <p>Default is <tt>false</tt>.
     *
     * @return whether or not the party associated with this email has verified that it is
     * their email address.
     */
    public boolean isVerified() {
        return verified;
    }

    public void setVerified(boolean verified) {
        this.verified = verified;
    }

    /**
     * Party label associated with this address, for example, 'Home', 'Work', etc.
     *
     * @return a label associated with this address, for example 'Home', 'Work', etc.
     */
    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
    }

    /**
     * Returns whether or not the text represented by this object instance is valid
     * according to the <tt>RFC 2822</tt> rules.
     *
     * @return true if the text represented by this instance is valid according
     * to RFC 2822, false otherwise.
     */
    public boolean isValid() {
        return isValidText(getText());
    }

    /**
     * Utility method that checks to see if the specified string is a valid
     * email address according to the RFC 2822 specification.
     *
     * @param email the email address string to test for validity.
     * @return true if the given text is valid according to RFC 2822, false otherwise.
     */
    public static boolean isValidText(String email) {
        return (email != null) && VALID_PATTERN.matcher(email).matches();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o instanceof EmailAddress) {
            EmailAddress ea = (EmailAddress) o;
            // the text may be null when the no-arg constructor was used
            String thisText = getText();
            String otherText = ea.getText();
            return thisText == null ? otherText == null : thisText.equals(otherText);
        }
        return false;
    }

    @Override
    public int hashCode() {
        // null-safe, consistent with equals()
        String thisText = getText();
        return thisText == null ? 0 : thisText.hashCode();
    }

    @Override
    public String toString() {
        return getText();
    }
}
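
The class above builds its pattern by concatenating small RFC 2822 token strings and compiling the result once. The following is a minimal sketch of that composition technique in isolation, not the Jodd implementation: the class name `AddrSpecSketch` and its upper-case constants are hypothetical, and the grammar is deliberately reduced to a dot-atom local part and a dot-atom domain (no quoted strings, domain literals, or name-addr forms).

```java
import java.util.regex.Pattern;

// Hypothetical, simplified illustration of composing a regex from small tokens.
public final class AddrSpecSketch {

    // optional spaces or tabs around a token
    private static final String FWSP = "[ \\t]*";
    // characters allowed in an atom (RFC 2822 3.2.4)
    private static final String ATEXT = "[a-zA-Z0-9!#$%&'*+\\-/=?^_`{|}~]";
    // one or more atoms separated by single dots
    private static final String DOT_ATOM_TEXT = ATEXT + "+(\\." + ATEXT + "+)*";
    private static final String DOT_ATOM = FWSP + "(" + DOT_ATOM_TEXT + ")" + FWSP;

    // compiled once and reused, since Pattern instances are immutable and thread-safe
    private static final Pattern ADDR_SPEC = Pattern.compile(DOT_ATOM + "@" + DOT_ATOM);

    private AddrSpecSketch() {
    }

    // null-safe check that the whole input matches the simplified addr-spec
    public static boolean isValidText(String email) {
        return email != null && ADDR_SPEC.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidText("bob@example.com"));        // true
        System.out.println(isValidText("bob @example.com"));       // true: whitespace around atoms is allowed
        System.out.println(isValidText("bob..smith@example.com")); // false: empty atom between dots
        System.out.println(isValidText(null));                     // false
    }
}
```

Because `matches()` requires the entire input to match, no explicit `^` or `$` anchors are needed in the composed pattern, which is also how `isValidText` in the class above uses `VALID_PATTERN`.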