/testability-explorer/src/main/antlr/java15.g

http://testability-explorer.googlecode.com/

   1/* Java 1.5 Recognizer
   2 *
   3 * Run 'java Main [-showtree] directory-full-of-java-files'
   4 *
   5 * [The -showtree option pops up a Swing frame that shows
   6 *  the AST constructed from the parser.]
   7 *
   8 * Run 'java Main <directory full of java files>'
   9 *
  10 * Contributing authors:
  11 *		John Mitchell		johnm@non.net
  12 *		Terence Parr		parrt@magelang.com
  13 *		John Lilley		jlilley@empathy.com
  14 *		Scott Stanchfield	thetick@magelang.com
  15 *		Markus Mohnen		mohnen@informatik.rwth-aachen.de
  16 *		Peter Williams		pete.williams@sun.com
  17 *		Allan Jacobs		Allan.Jacobs@eng.sun.com
  18 *		Steve Messick		messick@redhills.com
  19 *		John Pybus		john@pybus.org
  20 *
  21 * Version 1.00 December 9, 1997 -- initial release
  22 * Version 1.01 December 10, 1997
  23 *		fixed bug in octal def (0..7 not 0..8)
  24 * Version 1.10 August 1998 (parrt)
  25 *		added tree construction
  26 *		fixed definition of WS,comments for mac,pc,unix newlines
  27 *		added unary plus
  28 * Version 1.11 (Nov 20, 1998)
  29 *		Added "shutup" option to turn off last ambig warning.
  30 *		Fixed inner class def to allow named class defs as statements
  31 *		synchronized requires compound not simple statement
  32 *		add [] after builtInType DOT class in primaryExpression
  33 *		"const" is reserved but not valid..removed from modifiers
  34 * Version 1.12 (Feb 2, 1999)
  35 *		Changed LITERAL_xxx to xxx in tree grammar.
  36 *		Updated java.g to use tokens {...} now for 2.6.0 (new feature).
  37 *
  38 * Version 1.13 (Apr 23, 1999)
  39 *		Didn't have (stat)? for else clause in tree parser.
  40 *		Didn't gen ASTs for interface extends.  Updated tree parser too.
  41 *		Updated to 2.6.0.
  42 * Version 1.14 (Jun 20, 1999)
  43 *		Allowed final/abstract on local classes.
  44 *		Removed local interfaces from methods
  45 *		Put instanceof precedence where it belongs...in relationalExpr
  46 *			It also had expr not type as arg; fixed it.
  47 *		Missing ! on SEMI in classBlock
  48 *		fixed: (expr) + "string" was parsed incorrectly (+ as unary plus).
  49 *		fixed: didn't like Object[].class in parser or tree parser
  50 * Version 1.15 (Jun 26, 1999)
  51 *		Screwed up rule with instanceof in it. :(  Fixed.
  52 *		Tree parser didn't like (expr).something; fixed.
  53 *		Allowed multiple inheritance in tree grammar. oops.
  54 * Version 1.16 (August 22, 1999)
  55 *		Extending an interface built a wacky tree: had extra EXTENDS.
  56 *		Tree grammar didn't allow multiple superinterfaces.
  57 *		Tree grammar didn't allow empty var initializer: {}
  58 * Version 1.17 (October 12, 1999)
  59 *		ESC lexer rule allowed 399 max not 377 max.
  60 *		java.tree.g didn't handle the expression of synchronized
  61 *		statements.
  62 * Version 1.18 (August 12, 2001)
  63 *	  	Terence updated to Java 2 Version 1.3 by
  64 *		observing/combining work of Allan Jacobs and Steve
  65 *		Messick.  Handles 1.3 src.  Summary:
  66 *		*  primary didn't include boolean.class kind of thing
  67 *	  	*  constructor calls parsed explicitly now:
  68 * 		   see explicitConstructorInvocation
  69 *		*  add strictfp modifier
  70 *	  	*  missing objBlock after new expression in tree grammar
  71 *		*  merged local class definition alternatives, moved after declaration
  72 *		*  fixed problem with ClassName.super.field
  73 *	  	*  reordered some alternatives to make things more efficient
  74 *		*  long and double constants were not differentiated from int/float
  75 *		*  whitespace rule was inefficient: matched only one char
  76 *		*  add an examples directory with some nasty 1.3 cases
  77 *		*  made Main.java use buffered IO and a Reader for Unicode support
  78 *		*  supports UNICODE?
   79 *		   Using Unicode charVocabulary makes code file big, but only
  80 *		   in the bitsets at the end. I need to make ANTLR generate
  81 *		   unicode bitsets more efficiently.
  82 * Version 1.19 (April 25, 2002)
  83 *		Terence added in nice fixes by John Pybus concerning floating
  84 *		constants and problems with super() calls.  John did a nice
  85 *		reorg of the primary/postfix expression stuff to read better
  86 *		and makes f.g.super() parse properly (it was METHOD_CALL not
  87 *		a SUPER_CTOR_CALL).  Also:
  88 *
  89 *		*  "finally" clause was a root...made it a child of "try"
  90 *		*  Added stuff for asserts too for Java 1.4, but *commented out*
  91 *		   as it is not backward compatible.
  92 *
  93 * Version 1.20 (October 27, 2002)
  94 *
  95 *	  Terence ended up reorging John Pybus' stuff to
  96 *	  remove some nondeterminisms and some syntactic predicates.
  97 *	  Note that the grammar is stricter now; e.g., this(...) must
  98 *	be the first statement.
  99 *
 100 *	  Trinary ?: operator wasn't working as array name:
 101 *		  (isBig ? bigDigits : digits)[i];
 102 *
 103 *	  Checked parser/tree parser on source for
 104 *		  Resin-2.0.5, jive-2.1.1, jdk 1.3.1, Lucene, antlr 2.7.2a4,
 105 *		and the 110k-line jGuru server source.
 106 *
 107 * Version 1.21 (October 17, 2003)
 108 *  Fixed lots of problems including:
 109 *  Ray Waldin: add typeDefinition to interfaceBlock in java.tree.g
 110 *  He found a problem/fix with floating point that start with 0
 111 *  Ray also fixed problem that (int.class) was not recognized.
 112 *  Thorsten van Ellen noticed that \n are allowed incorrectly in strings.
 113 *  TJP fixed CHAR_LITERAL analogously.
 114 *
 115 * Version 1.21.2 (March, 2003)
 116 *	  Changes by Matt Quail to support generics (as per JDK1.5/JSR14)
 117 *	  Notes:
 118 *	  * We only allow the "extends" keyword and not the "implements"
  119 *		keyword, since that's what JSR14 seems to imply.
 120 *	  * Thanks to Monty Zukowski for his help on the antlr-interest
 121 *		mail list.
 122 *	  * Thanks to Alan Eliasen for testing the grammar over his
 123 *		Fink source base
 124 *
 125 * Version 1.22 (July, 2004)
 126 *	  Changes by Michael Studman to support Java 1.5 language extensions
 127 *	  Notes:
 128 *	  * Added support for annotations types
 129 *	  * Finished off Matt Quail's generics enhancements to support bound type arguments
 130 *	  * Added support for new for statement syntax
 131 *	  * Added support for static import syntax
 132 *	  * Added support for enum types
 133 *	  * Tested against JDK 1.5 source base and source base of jdigraph project
 134 *	  * Thanks to Matt Quail for doing the hard part by doing most of the generics work
 135 *
 136 * Version 1.22.1 (July 28, 2004)
 137 *	  Bug/omission fixes for Java 1.5 language support
 138 *	  * Fixed tree structure bug with classOrInterface - thanks to Pieter Vangorpto for
 139 *		spotting this
 140 *	  * Fixed bug where incorrect handling of SR and BSR tokens would cause type
 141 *		parameters to be recognised as type arguments.
 142 *	  * Enabled type parameters on constructors, annotations on enum constants
 143 *		and package definitions
 144 *	  * Fixed problems when parsing if ((char.class.equals(c))) {} - solution by Matt Quail at Cenqua
 145 *
 146 * Version 1.22.2 (July 28, 2004)
 147 *	  Slight refactoring of Java 1.5 language support
 148 *	  * Refactored for/"foreach" productions so that original literal "for" literal
 149 *	    is still used but the for sub-clauses vary by token type
 150 *	  * Fixed bug where type parameter was not included in generic constructor's branch of AST
 151 *
 152 * Version 1.22.3 (August 26, 2004)
 153 *	  Bug fixes as identified by Michael Stahl; clean up of tabs/spaces
 154 *        and other refactorings
 155 *	  * Fixed typeParameters omission in identPrimary and newStatement
  156 *	  * Replaced GT reconciliation code with simple semantic predicate
 157 *	  * Adapted enum/assert keyword checking support from Michael Stahl's java15 grammar
 158 *	  * Refactored typeDefinition production and field productions to reduce duplication
 159 *
 160 * Version 1.22.4 (October 21, 2004)
  161 *    Small bug fixes
 162 *    * Added typeArguments to explicitConstructorInvocation, e.g. new <String>MyParameterised()
 163 *    * Added typeArguments to postfixExpression productions for anonymous inner class super
 164 *      constructor invocation, e.g. new Outer().<String>super()
 165 *    * Fixed bug in array declarations identified by Geoff Roy
 166 *
 167 * Version 1.22.5 (January 03, 2005)
 168 *    Small change to tree structure
 169 *    * Flattened classOrInterfaceType tree so IDENT no longer has children. TYPE_ARGUMENTS are now
 170 *      always siblings of IDENT rather than children. Fully.qualified.names trees now
 171 *      look a little less clean when TYPE_ARGUMENTS are present though.
 172 *
 173 * This grammar is in the PUBLIC DOMAIN
 174 */
 175
 176header {
 177package com.google.test.metric.javasrc;
 178}
 179
 180
 181class JavaRecognizer extends Parser;
 182options {
 183	k = 2;							// two token lookahead
 184	exportVocab=Java;				// Call its vocabulary "Java"
 185	codeGenMakeSwitchThreshold = 2;	// Some optimizations
 186	codeGenBitsetTestThreshold = 3;
 187	defaultErrorHandler = false;	// Don't generate parser error handlers
 188	buildAST = true;
 189}
 190
 191tokens {
 192	BLOCK; MODIFIERS; OBJBLOCK; SLIST; CTOR_DEF; METHOD_DEF; VARIABLE_DEF;
 193	INSTANCE_INIT; STATIC_INIT; TYPE; CLASS_DEF; INTERFACE_DEF;
 194	PACKAGE_DEF; ARRAY_DECLARATOR; EXTENDS_CLAUSE; IMPLEMENTS_CLAUSE;
 195	PARAMETERS; PARAMETER_DEF; LABELED_STAT; TYPECAST; INDEX_OP;
 196	POST_INC; POST_DEC; METHOD_CALL; EXPR; ARRAY_INIT;
 197	IMPORT; UNARY_MINUS; UNARY_PLUS; CASE_GROUP; ELIST; FOR_INIT; FOR_CONDITION;
 198	FOR_ITERATOR; EMPTY_STAT; FINAL="final"; ABSTRACT="abstract";
 199	STRICTFP="strictfp"; SUPER_CTOR_CALL; CTOR_CALL; VARIABLE_PARAMETER_DEF;
 200	STATIC_IMPORT; ENUM_DEF; ENUM_CONSTANT_DEF; FOR_EACH_CLAUSE; ANNOTATION_DEF; ANNOTATIONS;
 201	ANNOTATION; ANNOTATION_MEMBER_VALUE_PAIR; ANNOTATION_FIELD_DEF; ANNOTATION_ARRAY_INIT;
 202	TYPE_ARGUMENTS; TYPE_ARGUMENT; TYPE_PARAMETERS; TYPE_PARAMETER; WILDCARD_TYPE;
 203	TYPE_UPPER_BOUNDS; TYPE_LOWER_BOUNDS;
 204}
 205
 206{
 207	/**
 208	 * Counts the number of LT seen in the typeArguments production.
 209	 * It is used in semantic predicates to ensure we have seen
 210	 * enough closing '>' characters; which actually may have been
 211	 * either GT, SR or BSR tokens.
 212	 */
 213	private int ltCounter = 0;
 214}
 215
 216// Compilation Unit: In Java, this is a single file. This is the start
 217// rule for this parser
 218compilationUnit
 219	:	// A compilation unit starts with an optional package definition
 220		(	(annotations "package")=> packageDefinition
 221		|	/* nothing */
 222		)
 223
 224		// Next we have a series of zero or more import statements
 225		( importDefinition )*
 226
 227		// Wrapping things up with any number of class or interface
 228		// definitions
 229		( typeDefinition )*
 230
 231		EOF!
 232	;
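
// Illustrative sketch (not part of the original grammar): a small, made-up
// Java source file showing the order compilationUnit expects. The names
// com.example, Greeter and Level are hypothetical. Note that the grammar
// accepts an annotated package statement in any file; it does not check
// that the file is package-info.java.
//
//   @Deprecated
//   package com.example;             // packageDefinition -> PACKAGE_DEF
//
//   import java.util.List;           // importDefinition  -> IMPORT
//   import static java.lang.Math.*;  // importDefinition  -> STATIC_IMPORT
//
//   public class Greeter { }         // typeDefinition
//   ;                                // stray semicolons are accepted
//   enum Level { LOW, HIGH }         // typeDefinition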
 233
 234
 235// Package statement: optional annotations followed by "package" then the package identifier.
 236packageDefinition
 237	options {defaultErrorHandler = true;} // let ANTLR handle errors
 238	:	annotations p:"package"^ {#p.setType(PACKAGE_DEF);} identifier SEMI!
 239	;
 240
 241
 242// Import statement: import followed by a package or class name
 243importDefinition
 244	options {defaultErrorHandler = true;}
 245	{ boolean isStatic = false; }
 246	:	i:"import"^ {#i.setType(IMPORT);} ( "static"! {#i.setType(STATIC_IMPORT);} )? identifierStar SEMI!
 247	;
 248
 249// A type definition is either a class, interface, enum or annotation with possible additional semis.
 250typeDefinition
 251	options {defaultErrorHandler = true;}
 252	:	m:modifiers!
 253		typeDefinitionInternal[#m]
 254	|	SEMI!
 255	;
 256
 257// Protected type definitions production for reuse in other productions
 258protected typeDefinitionInternal[AST mods]
 259	:	classDefinition[#mods]		// inner class
 260	|	interfaceDefinition[#mods]	// inner interface
 261	|	enumDefinition[#mods]		// inner enum
 262	|	annotationDefinition[#mods]	// inner annotation
 263	;
 264
 265// A declaration is the creation of a reference or primitive-type variable
 266// Create a separate Type/Var tree for each var in the var list.
 267declaration!
 268	:	m:modifiers t:typeSpec[false] v:variableDefinitions[#m,#t]
 269		{#declaration = #v;}
 270	;
 271
 272// A type specification is a type name with possible brackets afterwards
 273// (which would make it an array type).
 274typeSpec[boolean addImagNode]
 275	:	classTypeSpec[addImagNode]
 276	|	builtInTypeSpec[addImagNode]
 277	;
 278
 279// A class type specification is a class type with either:
 280// - possible brackets afterwards
 281//   (which would make it an array type).
 282// - generic type arguments after
 283classTypeSpec[boolean addImagNode]
 284	:	classOrInterfaceType[false]
 285		(options{greedy=true;}: // match as many as possible
 286			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!
 287		)*
 288		{
 289			if ( addImagNode ) {
 290				#classTypeSpec = #(#[TYPE,"TYPE"], #classTypeSpec);
 291			}
 292		}
 293	;
 294
 295// A non-built in type name, with possible type parameters
 296classOrInterfaceType[boolean addImagNode]
 297	:	IDENT (typeArguments)?
 298		(options{greedy=true;}: // match as many as possible
 299			DOT^
 300			IDENT (typeArguments)?
 301		)*
 302		{
 303			if ( addImagNode ) {
 304				#classOrInterfaceType = #(#[TYPE,"TYPE"], #classOrInterfaceType);
 305			}
 306		}
 307	;
 308
 309// A specialised form of typeSpec where built in types must be arrays
 310typeArgumentSpec
 311	:	classTypeSpec[true]
 312	|	builtInTypeArraySpec[true]
 313	;
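
// Illustrative sketch (not from the original authors): the built-in
// alternative goes through builtInTypeArraySpec, which requires at least
// one "[]", so a bare primitive is not accepted as a type argument.
//
//   List<String>   accepted via classTypeSpec[true]
//   List<int[]>    accepted via builtInTypeArraySpec[true]
//   List<int>      rejected: "int" is not followed by brackets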
 314
 315// A generic type argument is a class type, a possibly bounded wildcard type or a built-in type array
 316typeArgument
 317	:	(	typeArgumentSpec
 318		|	wildcardType
 319		)
 320		{#typeArgument = #(#[TYPE_ARGUMENT,"TYPE_ARGUMENT"], #typeArgument);}
 321	;
 322
 323// Wildcard type indicating all types (with possible constraint)
 324wildcardType
 325	:	q:QUESTION^ {#q.setType(WILDCARD_TYPE);}
 326		(("extends" | "super")=> typeArgumentBounds)?
 327	;
 328
 329// Type arguments to a class or interface type
 330typeArguments
 331{int currentLtLevel = 0;}
 332	:
 333		{currentLtLevel = ltCounter;}
 334		LT! {ltCounter++;}
 335		typeArgument
 336		(options{greedy=true;}: // match as many as possible
 337			{inputState.guessing !=0 || ltCounter == currentLtLevel + 1}?
 338			COMMA! typeArgument
 339		)*
 340
 341		(	// turn warning off since Antlr generates the right code,
 342			// plus we have our semantic predicate below
 343			options{generateAmbigWarnings=false;}:
 344			typeArgumentsOrParametersEnd
 345		)?
 346
 347		// make sure we have gobbled up enough '>' characters
 348		// if we are at the "top level" of nested typeArgument productions
 349		{(currentLtLevel != 0) || ltCounter == currentLtLevel}?
 350
 351		{#typeArguments = #(#[TYPE_ARGUMENTS, "TYPE_ARGUMENTS"], #typeArguments);}
 352	;
 353
 354// this gobbles up *some* amount of '>' characters, and counts how many
 355// it gobbled.
 356protected typeArgumentsOrParametersEnd
 357	:	GT! {ltCounter-=1;}
 358	|	SR! {ltCounter-=2;}
 359	|	BSR! {ltCounter-=3;}
 360	;
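
// Worked trace (an illustrative sketch, assuming the lexer emits ">>" as a
// single SR token) for the type   Map<String, List<Integer>> m
//
//   token seen        rule instance          currentLtLevel   ltCounter after
//   <  (after Map)    outer typeArguments          0                 1
//   <  (after List)   inner typeArguments          1                 2
//   >> (SR)           inner ...ParametersEnd       1                 0
//
// The inner instance skips the final check because its currentLtLevel is
// not 0. The outer instance finds no '>' token left to match, so its
// optional typeArgumentsOrParametersEnd is skipped, and its check
// ltCounter == currentLtLevel (0 == 0) succeeds.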
 361
  362// Restriction on wildcard types based on a superclass or derived class
 363typeArgumentBounds
 364	{boolean isUpperBounds = false;}
 365	:
 366		( "extends"! {isUpperBounds=true;} | "super"! ) classOrInterfaceType[false]
 367		{
 368			if (isUpperBounds)
 369			{
 370				#typeArgumentBounds = #(#[TYPE_UPPER_BOUNDS,"TYPE_UPPER_BOUNDS"], #typeArgumentBounds);
 371			}
 372			else
 373			{
 374				#typeArgumentBounds = #(#[TYPE_LOWER_BOUNDS,"TYPE_LOWER_BOUNDS"], #typeArgumentBounds);
 375			}
 376		}
 377	;
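
// Illustrative sketch of the subtrees built for wildcard type arguments,
// in ANTLR's #(root child ...) notation, as read from the actions in
// typeArgument, wildcardType and typeArgumentBounds (the type names are
// just examples):
//
//   ?                  #(TYPE_ARGUMENT WILDCARD_TYPE)
//   ? extends Number   #(TYPE_ARGUMENT #(WILDCARD_TYPE #(TYPE_UPPER_BOUNDS Number)))
//   ? super Integer    #(TYPE_ARGUMENT #(WILDCARD_TYPE #(TYPE_LOWER_BOUNDS Integer)))
//
// The "extends" and "super" keywords themselves are dropped from the tree
// by the ! suffix.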
 378
 379// A builtin type array specification is a builtin type with brackets afterwards
 380builtInTypeArraySpec[boolean addImagNode]
 381	:	builtInType
 382		(options{greedy=true;}: // match as many as possible
 383			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!
 384		)+
 385
 386		{
 387			if ( addImagNode ) {
 388				#builtInTypeArraySpec = #(#[TYPE,"TYPE"], #builtInTypeArraySpec);
 389			}
 390		}
 391	;
 392
 393// A builtin type specification is a builtin type with possible brackets
 394// afterwards (which would make it an array type).
 395builtInTypeSpec[boolean addImagNode]
 396	:	builtInType
 397		(options{greedy=true;}: // match as many as possible
 398			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!
 399		)*
 400		{
 401			if ( addImagNode ) {
 402				#builtInTypeSpec = #(#[TYPE,"TYPE"], #builtInTypeSpec);
 403			}
 404		}
 405	;
 406
  407// A type name, which is either a (possibly qualified and parameterized)
 408// class name or a primitive (builtin) type
 409type
 410	:	classOrInterfaceType[false]
 411	|	builtInType
 412	;
 413
 414// The primitive types.
 415builtInType
 416	:	"void"
 417	|	"boolean"
 418	|	"byte"
 419	|	"char"
 420	|	"short"
 421	|	"int"
 422	|	"float"
 423	|	"long"
 424	|	"double"
 425	;
 426
 427// A (possibly-qualified) java identifier. We start with the first IDENT
 428// and expand its name by adding dots and following IDENTS
 429identifier
 430	:	IDENT ( DOT^ IDENT )*
 431	;
 432
 433identifierStar
 434	:	IDENT
 435		( DOT^ IDENT )*
 436		( DOT^ STAR )?
 437	;
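
// Illustrative sketch: DOT^ makes each dot the root of everything matched
// so far, so qualified names nest to the left.
//
//   java.util.List   #(DOT #(DOT java util) List)
//   java.util.*      #(DOT #(DOT java util) STAR)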
 438
 439// A list of zero or more modifiers. We could have used (modifier)* in
 440// place of a call to modifiers, but I thought it was a good idea to keep
 441// this rule separate so they can easily be collected in a Vector if
 442// someone so desires
 443modifiers
 444	:
 445		(
 446			//hush warnings since the semantic check for "@interface" solves the non-determinism
 447			options{generateAmbigWarnings=false;}:
 448
 449			modifier
 450			|
 451			//Semantic check that we aren't matching @interface as this is not an annotation
  452			//A cleaner way to do this would be welcome
 453			{LA(1)==AT && !LT(2).getText().equals("interface")}? annotation
 454		)*
 455
 456		{#modifiers = #([MODIFIERS, "MODIFIERS"], #modifiers);}
 457	;
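
// Illustrative sketch of the semantic predicate in modifiers, using a
// made-up declaration:
//
//   @Deprecated public @interface Marker { }
//
// "@Deprecated" is consumed as an annotation (the token after '@' is not
// "interface") and "public" as a modifier. At the second '@' the predicate
// fails because LT(2) is "interface", so the loop ends and
// annotationDefinition can match AT "interface".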
 458
 459// modifiers for Java classes, interfaces, class/instance vars and methods
 460modifier
 461	:	"private"
 462	|	"public"
 463	|	"protected"
 464	|	"static"
 465	|	"transient"
 466	|	"final"
 467	|	"abstract"
 468	|	"native"
 469	|	"threadsafe"
 470	|	"synchronized"
 471	|	"volatile"
 472	|	"strictfp"
 473	;
 474
 475annotation!
 476	:	AT! i:identifier ( LPAREN! ( args:annotationArguments )? RPAREN! )?
 477		{#annotation = #(#[ANNOTATION,"ANNOTATION"], i, args);}
 478	;
 479
 480annotations
 481    :   (annotation)*
 482		{#annotations = #([ANNOTATIONS, "ANNOTATIONS"], #annotations);}
 483    ;
 484
 485annotationArguments
 486	:	annotationMemberValueInitializer | anntotationMemberValuePairs
 487	;
 488
 489anntotationMemberValuePairs
 490	:	annotationMemberValuePair ( COMMA! annotationMemberValuePair )*
 491	;
 492
 493annotationMemberValuePair!
 494	:	i:IDENT ASSIGN! v:annotationMemberValueInitializer
 495		{#annotationMemberValuePair = #(#[ANNOTATION_MEMBER_VALUE_PAIR,"ANNOTATION_MEMBER_VALUE_PAIR"], i, v);}
 496	;
 497
 498annotationMemberValueInitializer
 499	:
 500		conditionalExpression | annotation | annotationMemberArrayInitializer
 501	;
 502
 503// This is an initializer used to set up an annotation member array.
 504annotationMemberArrayInitializer
 505	:	lc:LCURLY^ {#lc.setType(ANNOTATION_ARRAY_INIT);}
 506			(	annotationMemberArrayValueInitializer
 507				(
 508					// CONFLICT: does a COMMA after an initializer start a new
 509					// initializer or start the option ',' at end?
 510					// ANTLR generates proper code by matching
 511					// the comma as soon as possible.
 512					options {
 513						warnWhenFollowAmbig = false;
 514					}
 515				:
 516					COMMA! annotationMemberArrayValueInitializer
 517				)*
 518				(COMMA!)?
 519			)?
 520		RCURLY!
 521	;
 522
 523// The two things that can initialize an annotation array element are a conditional expression
 524// and an annotation (nested annotation array initialisers are not valid)
 525annotationMemberArrayValueInitializer
 526	:	conditionalExpression
 527	|	annotation
 528	;
 529
 530superClassClause!
 531	:	( "extends" c:classOrInterfaceType[false] )?
 532		{#superClassClause = #(#[EXTENDS_CLAUSE,"EXTENDS_CLAUSE"],c);}
 533	;
 534
 535// Definition of a Java class
 536classDefinition![AST modifiers]
 537	:	"class" IDENT
  538		// it _might_ have type parameters
 539		(tp:typeParameters)?
 540		// it _might_ have a superclass...
 541		sc:superClassClause
 542		// it might implement some interfaces...
 543		ic:implementsClause
 544		// now parse the body of the class
 545		cb:classBlock
 546		{#classDefinition = #(#[CLASS_DEF,"CLASS_DEF"],
 547								modifiers,IDENT,tp,sc,ic,cb);}
 548	;
 549
 550// Definition of a Java Interface
 551interfaceDefinition![AST modifiers]
 552	:	"interface" IDENT
  553		// it _might_ have type parameters
 554		(tp:typeParameters)?
 555		// it might extend some other interfaces
 556		ie:interfaceExtends
 557		// now parse the body of the interface (looks like a class...)
 558		ib:interfaceBlock
 559		{#interfaceDefinition = #(#[INTERFACE_DEF,"INTERFACE_DEF"],
 560									modifiers,IDENT,tp,ie,ib);}
 561	;
 562
 563enumDefinition![AST modifiers]
 564	:	"enum" IDENT
 565		// it might implement some interfaces...
 566		ic:implementsClause
 567		// now parse the body of the enum
 568		eb:enumBlock
 569		{#enumDefinition = #(#[ENUM_DEF,"ENUM_DEF"],
 570								modifiers,IDENT,ic,eb);}
 571	;
 572
 573annotationDefinition![AST modifiers]
 574	:	AT "interface" IDENT
 575		// now parse the body of the annotation
 576		ab:annotationBlock
 577		{#annotationDefinition = #(#[ANNOTATION_DEF,"ANNOTATION_DEF"],
 578									modifiers,IDENT,ab);}
 579	;
 580
 581typeParameters
 582{int currentLtLevel = 0;}
 583	:
 584		{currentLtLevel = ltCounter;}
 585		LT! {ltCounter++;}
 586		typeParameter (COMMA! typeParameter)*
 587		(typeArgumentsOrParametersEnd)?
 588
 589		// make sure we have gobbled up enough '>' characters
 590		// if we are at the "top level" of nested typeArgument productions
 591		{(currentLtLevel != 0) || ltCounter == currentLtLevel}?
 592
 593		{#typeParameters = #(#[TYPE_PARAMETERS, "TYPE_PARAMETERS"], #typeParameters);}
 594	;
 595
 596typeParameter
 597	:
 598		// I'm pretty sure Antlr generates the right thing here:
 599		(id:IDENT) ( options{generateAmbigWarnings=false;}: typeParameterBounds )?
 600		{#typeParameter = #(#[TYPE_PARAMETER,"TYPE_PARAMETER"], #typeParameter);}
 601	;
 602
 603typeParameterBounds
 604	:
 605		"extends"! classOrInterfaceType[false]
 606		(BAND! classOrInterfaceType[false])*
 607		{#typeParameterBounds = #(#[TYPE_UPPER_BOUNDS,"TYPE_UPPER_BOUNDS"], #typeParameterBounds);}
 608	;
 609
 610// This is the body of a class. You can have classFields and extra semicolons.
 611classBlock
 612	:	LCURLY!
 613			( classField | SEMI! )*
 614		RCURLY!
 615		{#classBlock = #([OBJBLOCK, "OBJBLOCK"], #classBlock);}
 616	;
 617
 618// This is the body of an interface. You can have interfaceField and extra semicolons.
 619interfaceBlock
 620	:	LCURLY!
 621			( interfaceField | SEMI! )*
 622		RCURLY!
 623		{#interfaceBlock = #([OBJBLOCK, "OBJBLOCK"], #interfaceBlock);}
 624	;
 625
  626// This is the body of an annotation. You can have annotation fields and extra semicolons.
  627// That's about it (until you see what an annotation field is...)
 628annotationBlock
 629	:	LCURLY!
 630		( annotationField | SEMI! )*
 631		RCURLY!
 632		{#annotationBlock = #([OBJBLOCK, "OBJBLOCK"], #annotationBlock);}
 633	;
 634
 635// This is the body of an enum. You can have zero or more enum constants
 636// followed by any number of fields like a regular class
 637enumBlock
 638	:	LCURLY!
 639			( enumConstant ( options{greedy=true;}: COMMA! enumConstant )* ( COMMA! )? )?
 640			( SEMI! ( classField | SEMI! )* )?
 641		RCURLY!
 642		{#enumBlock = #([OBJBLOCK, "OBJBLOCK"], #enumBlock);}
 643	;
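
// Illustrative sketch (hypothetical names) of source that exercises each
// part of enumBlock:
//
//   enum Planet {
//       MERCURY(3.3e23), VENUS(4.87e24),   // constants; a trailing comma is allowed
//       ;                                  // SEMI separates constants from members
//       private final double mass;         // classField (variable)
//       Planet(double mass) { this.mass = mass; }   // classField (constructor)
//   }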
 644
 645// An annotation field
 646annotationField!
 647	:	mods:modifiers
 648		(	td:typeDefinitionInternal[#mods]
 649			{#annotationField = #td;}
 650		|	t:typeSpec[false]		// annotation field
 651			(	i:IDENT				// the name of the field
 652
 653				LPAREN! RPAREN!
 654
 655				rt:declaratorBrackets[#t]
 656
 657				( "default" amvi:annotationMemberValueInitializer )?
 658
 659				SEMI
 660
 661				{#annotationField =
 662					#(#[ANNOTATION_FIELD_DEF,"ANNOTATION_FIELD_DEF"],
 663						 mods,
 664						 #(#[TYPE,"TYPE"],rt),
 665						 i,amvi
 666						 );}
 667			|	v:variableDefinitions[#mods,#t] SEMI	// variable
 668				{#annotationField = #v;}
 669			)
 670		)
 671	;
 672
  673//An enum constant may have optional parameters and may have
  674//a class body
 675enumConstant!
 676	:	an:annotations
 677		i:IDENT
 678		(	LPAREN!
 679			a:argList
 680			RPAREN!
 681		)?
 682		( b:enumConstantBlock )?
 683		{#enumConstant = #([ENUM_CONSTANT_DEF, "ENUM_CONSTANT_DEF"], an, i, a, b);}
 684	;
 685
 686//The class-like body of an enum constant
 687enumConstantBlock
 688	:	LCURLY!
 689		( enumConstantField | SEMI! )*
 690		RCURLY!
 691		{#enumConstantBlock = #([OBJBLOCK, "OBJBLOCK"], #enumConstantBlock);}
 692	;
 693
 694//An enum constant field is just like a class field but without
  695//the possibility of a constructor definition or a static initializer
 696enumConstantField!
 697	:	mods:modifiers
 698		(	td:typeDefinitionInternal[#mods]
 699			{#enumConstantField = #td;}
 700
 701		|	// A generic method has the typeParameters before the return type.
 702			// This is not allowed for variable definitions, but this production
 703			// allows it, a semantic check could be used if you wanted.
 704			(tp:typeParameters)? t:typeSpec[false]		// method or variable declaration(s)
 705			(	IDENT									// the name of the method
 706
 707				// parse the formal parameter declarations.
 708				LPAREN! param:parameterDeclarationList RPAREN!
 709
 710				rt:declaratorBrackets[#t]
 711
 712				// get the list of exceptions that this method is
 713				// declared to throw
 714				(tc:throwsClause)?
 715
 716				( s2:compoundStatement | SEMI )
 717				{#enumConstantField = #(#[METHOD_DEF,"METHOD_DEF"],
 718							 mods,
 719							 tp,
 720							 #(#[TYPE,"TYPE"],rt),
 721							 IDENT,
 722							 param,
 723							 tc,
 724							 s2);}
 725			|	v:variableDefinitions[#mods,#t] SEMI
 726				{#enumConstantField = #v;}
 727			)
 728		)
 729
 730	// "{ ... }" instance initializer
 731	|	s4:compoundStatement
 732		{#enumConstantField = #(#[INSTANCE_INIT,"INSTANCE_INIT"], s4);}
 733	;
 734
 735// An interface can extend several other interfaces...
 736interfaceExtends
 737	:	(
 738		e:"extends"!
 739		classOrInterfaceType[false] ( COMMA! classOrInterfaceType[false] )*
 740		)?
 741		{#interfaceExtends = #(#[EXTENDS_CLAUSE,"EXTENDS_CLAUSE"],
 742								#interfaceExtends);}
 743	;
 744
 745// A class can implement several interfaces...
 746implementsClause
 747	:	(
 748			i:"implements"! classOrInterfaceType[false] ( COMMA! classOrInterfaceType[false] )*
 749		)?
 750		{#implementsClause = #(#[IMPLEMENTS_CLAUSE,"IMPLEMENTS_CLAUSE"],
 751								 #implementsClause);}
 752	;
 753
 754// Now the various things that can be defined inside a class
 755classField!
 756	:	// method, constructor, or variable declaration
 757		mods:modifiers
 758		(	td:typeDefinitionInternal[#mods]
 759			{#classField = #td;}
 760
 761		|	(tp:typeParameters)?
 762			(
 763				h:ctorHead s:constructorBody // constructor
 764				{#classField = #(#[CTOR_DEF,"CTOR_DEF"], mods, tp, h, s);}
 765
 766				|	// A generic method/ctor has the typeParameters before the return type.
 767					// This is not allowed for variable definitions, but this production
 768					// allows it, a semantic check could be used if you wanted.
 769					t:typeSpec[false]		// method or variable declaration(s)
 770					(	IDENT				// the name of the method
 771
 772						// parse the formal parameter declarations.
 773						LPAREN! param:parameterDeclarationList RPAREN!
 774
 775						rt:declaratorBrackets[#t]
 776
 777						// get the list of exceptions that this method is
 778						// declared to throw
 779						(tc:throwsClause)?
 780
 781						( s2:compoundStatement | SEMI )
 782						{#classField = #(#[METHOD_DEF,"METHOD_DEF"],
 783									 mods,
 784									 tp,
 785									 #(#[TYPE,"TYPE"],rt),
 786									 IDENT,
 787									 param,
 788									 tc,
 789									 s2);}
 790					|	v:variableDefinitions[#mods,#t] SEMI
 791						{#classField = #v;}
 792					)
 793			)
 794		)
 795
 796	// "static { ... }" class initializer
 797	|	"static" s3:compoundStatement
 798		{#classField = #(#[STATIC_INIT,"STATIC_INIT"], s3);}
 799
 800	// "{ ... }" instance initializer
 801	|	s4:compoundStatement
 802		{#classField = #(#[INSTANCE_INIT,"INSTANCE_INIT"], s4);}
 803	;
 804
  805// Now the various things that can be defined inside an interface
 806interfaceField!
  807	:	// method or variable declaration
 808		mods:modifiers
 809		(	td:typeDefinitionInternal[#mods]
 810			{#interfaceField = #td;}
 811
 812		|	(tp:typeParameters)?
 813			// A generic method has the typeParameters before the return type.
 814			// This is not allowed for variable definitions, but this production
 815			// allows it, a semantic check could be used if you want a more strict
 816			// grammar.
 817			t:typeSpec[false]		// method or variable declaration(s)
 818			(	IDENT				// the name of the method
 819
 820				// parse the formal parameter declarations.
 821				LPAREN! param:parameterDeclarationList RPAREN!
 822
 823				rt:declaratorBrackets[#t]
 824
 825				// get the list of exceptions that this method is
 826				// declared to throw
 827				(tc:throwsClause)?
 828
 829				SEMI
 830
 831				{#interfaceField = #(#[METHOD_DEF,"METHOD_DEF"],
 832							 mods,
 833							 tp,
 834							 #(#[TYPE,"TYPE"],rt),
 835							 IDENT,
 836							 param,
 837							 tc);}
 838			|	v:variableDefinitions[#mods,#t] SEMI
 839				{#interfaceField = #v;}
 840			)
 841		)
 842	;
 843
 844constructorBody
 845	:	lc:LCURLY^ {#lc.setType(SLIST);}
 846			( options { greedy=true; } : explicitConstructorInvocation)?
 847			(statement)*
 848		RCURLY!
 849	;
 850
 851/** Catch obvious constructor calls, but not the expr.super(...) calls */
 852explicitConstructorInvocation
 853	:	(typeArguments)?
 854		(	"this"! lp1:LPAREN^ argList RPAREN! SEMI!
 855			{#lp1.setType(CTOR_CALL);}
 856		|	"super"! lp2:LPAREN^ argList RPAREN! SEMI!
 857			{#lp2.setType(SUPER_CTOR_CALL);}
 858		)
 859	;
 860
 861variableDefinitions[AST mods, AST t]
 862	:	variableDeclarator[getASTFactory().dupTree(mods),
 863							getASTFactory().dupList(t)] //dupList as this also copies siblings (like TYPE_ARGUMENTS)
 864		(	COMMA!
 865			variableDeclarator[getASTFactory().dupTree(mods),
 866							getASTFactory().dupList(t)] //dupList as this also copies siblings (like TYPE_ARGUMENTS)
 867		)*
 868	;
 869
 870/** Declaration of a variable. This can be a class/instance variable,
 871 *  or a local variable in a method
 872 *  It can also include possible initialization.
 873 */
 874variableDeclarator![AST mods, AST t]
 875	:	id:IDENT d:declaratorBrackets[t] v:varInitializer
 876		{#variableDeclarator = #(#[VARIABLE_DEF,"VARIABLE_DEF"], mods, #(#[TYPE,"TYPE"],d), id, v);}
 877	;
 878
 879declaratorBrackets[AST typ]
 880	:	{#declaratorBrackets=typ;}
 881		(lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);} RBRACK!)*
 882	;
 883
 884varInitializer
 885	:	( ASSIGN^ initializer )?
 886	;
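
// Illustrative sketch (an added example, not part of the original grammar).
// Imaginary and retyped nodes are shown by token type, leaves by their text;
// the MODIFIERS subtree is assumed to be supplied by the caller, and
// typeSpec[false] is assumed to return the bare type node. For the input
//	 int a = 1, b[];
// variableDefinitions should yield two sibling trees, each with its own copy
// of the modifiers and type:
//	 ( VARIABLE_DEF MODIFIERS ( TYPE int ) a ( ASSIGN ( EXPR 1 ) ) )
//	 ( VARIABLE_DEF MODIFIERS ( TYPE ( ARRAY_DECLARATOR int ) ) b )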
 887
 888// This is an initializer used to set up an array.
 889arrayInitializer
 890	:	lc:LCURLY^ {#lc.setType(ARRAY_INIT);}
 891			(	initializer
 892				(
 893					// CONFLICT: does a COMMA after an initializer start a new
 894					// initializer or start the option ',' at end?
 895					// ANTLR generates proper code by matching
 896					// the comma as soon as possible.
 897					options {
 898						warnWhenFollowAmbig = false;
 899					}
 900				:
 901					COMMA! initializer
 902				)*
 903				(COMMA!)?
 904			)?
 905		RCURLY!
 906	;
 907
 908
 909// The two "things" that can initialize an array element are an expression
 910// and another (nested) array initializer.
 911initializer
 912	:	expression
 913	|	arrayInitializer
 914	;
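
// Illustrative sketch (an added example, nodes shown by token type): both
//	 {1, 2}	 and	 {1, 2,}
// should produce ( ARRAY_INIT ( EXPR 1 ) ( EXPR 2 ) ), since every COMMA is
// dropped with '!' and the optional trailing comma adds no node. A nested
// initializer such as {{1}, {2}} should give
//	 ( ARRAY_INIT ( ARRAY_INIT ( EXPR 1 ) ) ( ARRAY_INIT ( EXPR 2 ) ) )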
 915
 916// This is the header of a constructor. It includes the name and parameters
 917// for the constructor.
 918// This also watches for a list of exception classes in a "throws" clause.
 919ctorHead
 920	:	IDENT // the name of the constructor
 921
 922		// parse the formal parameter declarations.
 923		LPAREN! parameterDeclarationList RPAREN!
 924
 925		// get the list of exceptions that this method is declared to throw
 926		(throwsClause)?
 927	;
 928
 929// This is a list of exception classes that the method is declared to throw
 930throwsClause
 931	:	"throws"^ identifier ( COMMA! identifier )*
 932	;
 933
 934// A list of formal parameters
 935//	 Zero or more parameters
 936//	 If a parameter is variable length (e.g. String... myArg) it is the right-most parameter
 937parameterDeclarationList
 938	// The syntactic predicate in the ( .... )* block is flagged as superfluous, and seems superfluous,
 939	// but it is the only way I could make this work. If my understanding is correct, this is a known bug.
 940	:	(	( parameterDeclaration )=> parameterDeclaration
 941			( options {warnWhenFollowAmbig=false;} : ( COMMA! parameterDeclaration ) => COMMA! parameterDeclaration )*
 942			( COMMA! variableLengthParameterDeclaration )?
 943		|
 944			variableLengthParameterDeclaration
 945		)?
 946		{#parameterDeclarationList = #(#[PARAMETERS,"PARAMETERS"],
 947									#parameterDeclarationList);}
 948	;
 949
 950// A formal parameter.
 951parameterDeclaration!
 952	:	pm:parameterModifier t:typeSpec[false] id:IDENT
 953		pd:declaratorBrackets[#t]
 954		{#parameterDeclaration = #(#[PARAMETER_DEF,"PARAMETER_DEF"],
 955									pm, #([TYPE,"TYPE"],pd), id);}
 956	;
 957
 958variableLengthParameterDeclaration!
 959	:	pm:parameterModifier t:typeSpec[false] TRIPLE_DOT! id:IDENT
 960		pd:declaratorBrackets[#t]
 961		{#variableLengthParameterDeclaration = #(#[VARIABLE_PARAMETER_DEF,"VARIABLE_PARAMETER_DEF"],
 962												pm, #([TYPE,"TYPE"],pd), id);}
 963	;
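
// Illustrative sketch (an added example; it assumes typeSpec[false] returns
// the bare type node). For the parameter list
//	 (final int a, String... rest)
// parameterDeclarationList should build roughly
//	 ( PARAMETERS
//		 ( PARAMETER_DEF ( MODIFIERS final ) ( TYPE int ) a )
//		 ( VARIABLE_PARAMETER_DEF MODIFIERS ( TYPE String ) rest ) )
// The TRIPLE_DOT token itself is dropped with '!'; only the node type
// VARIABLE_PARAMETER_DEF records that the parameter is variable-length.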
 964
 965parameterModifier
 966	//final can appear amongst annotations in any order - greedily consume any preceding
 967	//annotations to shut nondeterminism warnings off
 968	:	(options{greedy=true;} : annotation)* (f:"final")? (annotation)*
 969		{#parameterModifier = #(#[MODIFIERS,"MODIFIERS"], #parameterModifier);}
 970	;
 971
 972// Compound statement. This is used in many contexts:
 973// Inside a class definition prefixed with "static":
 974// it is a class initializer
 975// Inside a class definition without "static":
 976// it is an instance initializer
 977// As the body of a method
 978// As a completely independent braced block of code inside a method
 979// it starts a new scope for variable definitions
 980
 981compoundStatement
 982	:	lc:LCURLY^ {#lc.setType(SLIST);}
 983			// include the (possibly-empty) list of statements
 984			(statement)*
 985		RCURLY!
 986	;
 987
 988
 989statement
 990	// A list of statements in curly braces -- start a new scope!
 991	:	compoundStatement
 992
 993	// declarations are ambiguous with "ID DOT" relative to expression
 994	// statements. Must backtrack to be sure. Could use a semantic
 995	// predicate to test symbol table to see what the type was coming
 996	// up, but that's pretty hard without a symbol table ;)
 997	|	(declaration)=> declaration SEMI!
 998
 999	// An expression statement. This could be a method call,
1000	// assignment statement, or any other expression evaluated for
1001	// side-effects.
1002	|	expression SEMI!
1003
1004	//TODO: what about interfaces, enums and annotations?
1005	// class definition
1006	|	m:modifiers! classDefinition[#m]
1007
1008	// Attach a label to the front of a statement
1009	|	IDENT c:COLON^ {#c.setType(LABELED_STAT);} statement
1010
1011	// If-else statement
1012	|	"if"^ LPAREN! expression RPAREN! statement
1013		(
1014			// CONFLICT: the old "dangling-else" problem...
1015			// ANTLR generates proper code matching
1016			// as soon as possible. Hush warning.
1017			options {
1018				warnWhenFollowAmbig = false;
1019			}
1020		:
1021			"else"! statement
1022		)?
1023
1024	// For statement
1025	|	forStatement
1026
1027	// While statement
1028	|	"while"^ LPAREN! expression RPAREN! statement
1029
1030	// do-while statement
1031	|	"do"^ statement "while"! LPAREN! expression RPAREN! SEMI!
1032
1033	// get out of a loop (or switch)
1034	|	"break"^ (IDENT)? SEMI!
1035
1036	// do next iteration of a loop
1037	|	"continue"^ (IDENT)? SEMI!
1038
1039	// Return an expression
1040	|	"return"^ (expression)? SEMI!
1041
1042	// switch/case statement
1043	|	"switch"^ LPAREN! expression RPAREN! LCURLY!
1044			( casesGroup )*
1045		RCURLY!
1046
1047	// exception try-catch block
1048	|	tryBlock
1049
1050	// throw an exception
1051	|	"throw"^ expression SEMI!
1052
1053	// synchronize a statement
1054	|	"synchronized"^ LPAREN! expression RPAREN! compoundStatement
1055
1056	// asserts (Java 1.4 and later; this alternative is enabled here)
1057	|	"assert"^ expression ( COLON! expression )? SEMI!
1058
1059	// empty statement
1060	|	s:SEMI {#s.setType(EMPTY_STAT);}
1061	;
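
// Illustrative sketch (an added example) of the dangling-else resolution in
// the "if" alternative above: because the optional "else" subrule is matched
// as soon as possible, in
//	 if (a) if (b) x(); else y();
// the else attaches to the inner "if (b)", which is the binding Java requires.
// The "else" keyword itself is dropped with '!', so the result is roughly
//	 ( if ( EXPR a ) ( if ( EXPR b ) <x();> <y();> ) )
// where <...> stands for the subtree of the corresponding statement.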
1062
1063forStatement
1064	:	f:"for"^
1065		LPAREN!
1066			(	(forInit SEMI)=>traditionalForClause
1067			|	forEachClause
1068			)
1069		RPAREN!
1070		statement					 // statement to loop over
1071	;
1072
1073traditionalForClause
1074	:
1075		forInit SEMI!	// initializer
1076		forCond SEMI!	// condition test
1077		forIter			// updater
1078	;
1079
1080forEachClause
1081	:
1082		p:parameterDeclaration COLON! expression
1083		{#forEachClause = #(#[FOR_EACH_CLAUSE,"FOR_EACH_CLAUSE"], #forEachClause);}
1084	;
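
// Illustrative sketch (an added example): the (forInit SEMI)=> predicate in
// forStatement decides between the two clause forms.
//	 for (int i = 0; i < n; i++) ...	// forInit followed by ';' -> traditionalForClause
//	 for (;;) ...						// empty forInit followed by ';' -> traditionalForClause
//	 for (String s : names) ...			// ':' follows instead of ';' -> forEachClause
// The last form should build roughly
//	 ( for ( FOR_EACH_CLAUSE ( PARAMETER_DEF MODIFIERS ( TYPE String ) s ) ( EXPR names ) ) <body> )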
1085
1086casesGroup
1087	:	(	// CONFLICT: to which case group do the statements bind?
1088			// ANTLR generates proper code: it groups the
1089			// many "case"/"default" labels together then
1090			// follows them with the statements
1091			options {
1092				greedy = true;
1093			}
1094			:
1095			aCase
1096		)+
1097		caseSList
1098		{#casesGroup = #([CASE_GROUP, "CASE_GROUP"], #casesGroup);}
1099	;
1100
1101aCase
1102	:	("case"^ expression | "default") COLON!
1103	;
1104
1105caseSList
1106	:	(statement)*
1107		{#caseSList = #(#[SLIST,"SLIST"],#caseSList);}
1108	;
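
// Illustrative sketch (an added example, nodes shown by token type). For
//	 switch (x) { case 1: case 2: f(); break; default: g(); }
// consecutive labels are grouped greedily, so the tree should be roughly
//	 ( switch ( EXPR x )
//		 ( CASE_GROUP ( case ( EXPR 1 ) ) ( case ( EXPR 2 ) ) ( SLIST <f();> <break;> ) )
//		 ( CASE_GROUP default ( SLIST <g();> ) ) )
// where <...> stands for the subtree of the corresponding statement.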
1109
1110// The initializer for a for loop
1111forInit
1112		// if it looks like a declaration, it is
1113	:	((declaration)=> declaration
1114		// otherwise it could be an expression list...
1115		|	expressionList
1116		)?
1117		{#forInit = #(#[FOR_INIT,"FOR_INIT"],#forInit);}
1118	;
1119
1120forCond
1121	:	(expression)?
1122		{#forCond = #(#[FOR_CONDITION,"FOR_CONDITION"],#forCond);}
1123	;
1124
1125forIter
1126	:	(expressionList)?
1127		{#forIter = #(#[FOR_ITERATOR,"FOR_ITERATOR"],#forIter);}
1128	;
1129
1130// an exception handler try/catch block
1131tryBlock
1132	:	"try"^ compoundStatement
1133		(handler)*
1134		( finallyClause )?
1135	;
1136
1137finallyClause
1138	:	"finally"^ compoundStatement
1139	;
1140
1141// an exception handler
1142handler
1143	:	"catch"^ LPAREN! parameterDeclaration RPAREN! compoundStatement
1144	;
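
// Illustrative sketch (an added example, nodes shown by token type). For
//	 try { ... } catch (IOException e) { ... } finally { ... }
// tryBlock should build roughly
//	 ( try SLIST ( catch ( PARAMETER_DEF ... ) SLIST ) ( finally SLIST ) )
// Note that both (handler)* and ( finallyClause )? are optional, so this
// grammar also accepts a bare "try { ... }"; rejecting that would need a
// semantic check.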
1145
1146
1147// expressions
1148// Note that most of these expressions follow the pattern
1149//   thisLevelExpression :
1150//	   nextHigherPrecedenceExpression
1151//		   (OPERATOR nextHigherPrecedenceExpression)*
1152// which is a standard recursive definition for parsing an expression.
1153// The operators in java have the following precedences:
1154//	lowest  (13)  = *= /= %= += -= <<= >>= >>>= &= ^= |=
1155//			(12)  ?:
1156//			(11)  ||
1157//			(10)  &&
1158//			( 9)  |
1159//			( 8)  ^
1160//			( 7)  &
1161//			( 6)  == !=
1162//			( 5)  < <= > >=
1163//			( 4)  << >>
1164//			( 3)  +(binary) -(binary)
1165//			( 2)  * / %
1166//			( 1)  ++ -- +(unary) -(unary)  ~  !  (type)
1167//				  []   () (method call)  . (dot -- identifier qualification)
1168//				  new   ()  (explicit parenthesis)
1169//
1170// the last two are not usually on a precedence chart; I put them in
1171// to point out that new has a higher precedence than '.', so you
1172// can validly use
1173//	 new Frame().show()
1174//
1175// Note that the above precedence levels map to the rules below...
1176// Once you have a precedence chart, writing the appropriate rules as below
1177//   is usually very straightforward
1178
1179
1180
1181// the mother of all expressions
1182expression
1183	:	assignmentExpression
1184		{#expression = #(#[EXPR,"EXPR"],#expression);}
1185	;
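
// Illustrative sketch (an added example, operator nodes shown by token type).
// With the precedence layering described above,
//	 a = b + c * d
// should parse as
//	 ( EXPR ( ASSIGN a ( PLUS b ( STAR c d ) ) ) )
// because multiplicativeExpression (level 2) is reached through
// additiveExpression (level 3), which in turn is reached through
// assignmentExpression (level 13). The EXPR root is added by the
// expression rule itself.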
1186
1187
1188// This is a list of expressions.
1189expressionList
1190	:	expression (COMMA! expression)*
1191		{#expressionList = #(#[ELIST,"ELIST"], expressionList);}
1192	;
1193
1194
1195// assignment expression (level 13)
1196assignmentExpression
1197	:	conditionalExpression
1198		(	(	ASSIGN^
1199			|	PLUS_ASSIGN^
1200			|	MINUS_ASSIGN^
1201			|	STAR_ASSIGN^
1202			|	DIV_ASSIGN^
1203			|	MOD_ASSIGN^
1204			|	SR_ASSIGN^
1205			|	BSR_ASSIGN^
1206			|	SL_ASSIGN^
1207			|	BAND_ASSIGN^
1208			|	BXOR_ASSIGN^
1209			|	BOR_ASSIGN^
1210			)
1211			assignmentExpression
1212		)?
1213	;
1214
1215
1216// conditional test (level 12)
1217conditionalExpression
1218	:	logicalOrExpression
1219		( QUESTION^ assignmentExpression COLON! conditionalExpression )?
1220	;
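
// Illustrative sketch (an added example, operator nodes shown by token type):
// both rules above recurse on their right-hand side, so assignment and the
// conditional operator are right-associative.
//	 a = b = c			->  ( ASSIGN a ( ASSIGN b c ) )
//	 a ? b : c ? d : e	->  ( QUESTION a b ( QUESTION c d e ) )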
1221
1222
1223// logical or (||) (level 11)
1224logicalOrExpression
1225	:	logicalAndExpression (LOR^ logicalAndExpression)*
1226	;
1227
1228
1229// logical and (&&) (level 10)
1230logicalAndExpression
1231	:	inclusiveOrExpression (LAND^ inclusiveOrExpression)*
1232	;
1233
1234
1235// bitwise or non-short-circuiting or (|) (level 9)
1236inclusiveOrExpression
1237	:	exclusiveOrExpression (BOR^ exclusiveOrExpression)*
1238	;
1239
1240
1241// exclusive or (^) (level 8)
1242exclusiveOrExpression
1243	:	andExpression (BXOR^ andExpression)*
1244	;
1245
1246
1247// bitwise or non-short-circuiting and (&) (level 7)
1248andExpression
1249	:	equalityExpression (BAND^ equalityExpression)*
1250	;
1251
1252
1253// equality/inequality (==/!=) (level 6)
1254equalityExpression
1255	:	relationalExpression ((NOT_EQUAL^ | EQUAL^) relationalExpression)*
1256	;
1257
1258
1259// boolean relational expressions (level 5)
1260relationalExpression
1261	:	shiftExpression
1262		(	(	(	LT^
1263				|	GT^
1264				|	LE^
1265				|	GE^
1266				)
1267				shiftExpression
1268			)*
1269		|	"instanceof"^ typeSpec[true]
1270		)
1271	;
1272
1273
1274// bit shift expressions (level 4)
1275shiftExpression
1276	:	additiveExpression ((SL^ | SR^ | BSR^) additiveExpression)*
1277	;
1278
1279
1280// binary addition/subtraction (level 3)
1281additiveExpression
1282	:	multiplicativeExpression ((PLUS^ | MINUS^) multiplicativeExpression)*
1283	;
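
// Illustrative sketch (an added example, operator nodes shown by token type):
// in the (OPERATOR^ operand)* pattern, each '^' makes the new operator the
// root of everything built so far, so the binary operators at these levels
// are left-associative.
//	 a - b + c	->  ( PLUS ( MINUS a b ) c )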
1284
1285
1286// multiplication/division/modulo (level 2)
1287multiplicativeExpression
1288	:	unaryExpression ((STAR^ | DIV^ | MOD^ ) unaryExpression)*
1289	;
1290
1291unaryExpression
1292	:	INC^ unaryExpression
1293	|	DEC^ unaryExpression
1294	|	MINUS^ {#MINUS.setType(UNARY_MINUS);} unaryExpression
1295	|	PLUS^ {#PLUS.setType(UNARY_PLUS);} unaryExpression
1296	|	unaryExpressionNotPlusMinus
1297	;
1298
1299unaryExpressionNotPlusMinus
1300	:	BNOT^ unaryExpression
1301	|	LNOT^ unaryExpression
1302	|	(	// subrule allows option to shut off warnings
1303			options {
1304				// "(int" ambig with postfixExpr due to lack of sequence
1305				// info in linear approximate LL(k). It's ok. Shut up.
1306				generateAmbigWarnings=false;
1307			}
1308		:	// If typecast is built in type, must be numeric operand
1309			// Have to backtrack to see if operator follows
1310		(LPAREN builtInTypeSpec[true] RPAREN unaryExpression)=>
1311		lpb:LPAREN^ {#lpb.setType(TYPECAST);} builtInTypeSpec[true] RPAREN!
1312		unaryExpression
1313
1314		// Have to backtrack to see if operator follows. If no operator
1315		// follows, it's a typecast. No semantic checking needed to parse.
1316		// if it _looks_ like a cast, it _is_ a cast; else it's a "(expr)"
1317	|	(LPAREN classTypeSpec[true] RPAREN unaryExpressionNotPlusMinus)=>
1318		lp:LPAREN^ {#lp.setType(TYPECAST);} classTypeSpec[true] RPAREN!
1319		unaryExpressionNotPlusMinus
1320
1321	|	postfixExpression
1322	)
1323	;
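
// Illustrative sketch (an added example) of how the two syntactic predicates
// above separate casts from parenthesized expressions:
//	 (int) -x	builtInTypeSpec predicate succeeds: a cast applied to -x
//	 (Foo) x	classTypeSpec predicate succeeds: a cast applied to x
//	 (Foo) ~x	classTypeSpec predicate succeeds: ~x is a unaryExpressionNotPlusMinus
//	 (a) - b	"- b" is not a unaryExpressionNotPlusMinus, so both predicates
//				fail; (a) is parsed as a parenthesized primary and the whole
//				expression becomes the subtraction ( MINUS a b )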
1324
1325// qualified names, array expressions, method invocation, post inc/dec
1326postfixExpression
1327	:
1328		primaryExpression
1329
1330		(
1331			/*
1332			options {
1333				// the use of postfixExpression in SUPER_CTOR_CALL adds DOT
1334				// to the lookahead set, and gives loads of false non-det
1335				// warnings.
1336				// shut them off.
1337				generateAmbigWarnings=false;
1338			}
1339		:	*/
1340			//type arguments are only appropriate for parameterized method/ctor invocations
1341			//semantic check may be needed here to ensure that this is the case
1342			DOT^ (typeArguments)?
1343				(	IDENT
1344					(	lp:LPAREN^ {#lp.setType(METHOD_CALL);}
1345						argList
1346						RPAREN!
1347					)?
1348				|	"super"
1349					(	// (new Outer()).super() (create enclosing instance)
1350						lp3:LPAREN^ argList RPAREN!
1351						{#lp3.setType(SUPER_CTOR_CALL);}
1352					|	DOT^ (typeArguments)? IDENT
1353						(	lps:LPAREN^ {#lps.setType(METHOD_CALL);}
1354							argList
1355							RPAREN!
1356						)?
1357					)
1358				)
1359		|	DOT^ "this"
1360		|	DOT^ newExpression
1361		|	lb:LBRACK^ {#lb.setType(INDEX_OP);} expression RBRACK!
1362		)*
1363
1364		(	// possibly add on a post-increment or post-decrement.
1365			// allows INC/DEC on too much, but semantics can check
1366			in:INC^ {#in.setType(POST_INC);}
1367	 	|	de:DEC^ {#de.setType(POST_DEC);}
1368		)?
1369 	;
1370
1371// the basic element of an expression
1372primaryExpression
1373	:	identPrimary ( options {greedy=true;} : DOT^ "class" )?
1374	|	constant
1375	|	"true"
1376	|	"false"
1377	|	"null"
1378	|	newExpression
1379	|	"this"
1380	|	"super"
1381	|	LPAREN! assignmentExpression RPAREN!
1382		// look for int.class and int[].class
1383	|	builtInType
1384		( lbt:LBRACK^ {#lbt.setType(ARRAY_DECLARATOR);} RBRACK! )*
1385		DOT^ "class"
1386	;
1387
1388/** Match a, a.b.c refs, a.b.c(...) refs, a.b.c[], a.b.c[].class,
1389 *  and a.b.c.class refs. Also this(...) and super(...). Match
1390 *  this or super.
1391 */
1392identPrimary
1393	:	(ta1:typeArguments!)?
1394		IDENT
1395		// Syntax for method invocation with type arguments is
1396		// <String>foo("blah")
1397		(
1398			options {
1399				// .ident could match here or in postfixExpression.
1400				// We do want to match here. Turn off warning.
1401				greedy=true;
1402				// This turns the ambiguity warning of the second alternative
1403				// off. See below. (The "false" predicate makes it non-issue)
1404				warnWhenFollowAmbig=false;
1405			}
1406			// we have a new nondeterminism because of
1407			// typeArguments... only a syntactic predicate will help...
1408			// The problem is that this loop here conflicts with
1409			// DOT typeArguments "super" in postfixExpression (k=2)
1410			// A proper solution would require a lot of refactoring...
1411		:	(DOT (typeArguments)? IDENT) =>
1412				DOT^ (ta2:typeArguments!)? IDENT
1413		|	{false}?	// FIXME: this is very ugly but it seems to work...
1414						// this will also produce an ANTLR warning!
1415				// Unfortunately a syntactic predicate can only select one of
1416				// multiple alternatives on the same level, not break out of
1417				// an enclosing loop, which is why this ugly hack (a fake
1418				// empty alternative with always-false semantic predicate)
1419				// is necessary.
1420		)*
1421		(
1422			options {
1423				// ARRAY_DECLARATOR here conflicts with INDEX_OP in
1424				// postfixExpression on LBRACK RBRACK.
1425				// We want to match [] here, so greedy. This overcomes
1426				// limitation of linear approximate lookahead.
1427				greedy=true;
1428			}
1429		:	(	lp:LPAREN^ {#lp.setType(METHOD_CALL);}
1430				// if the input is valid, only the last IDENT may
1431				// have preceding typeArguments... rather hacky, this is...
1432				{if (#ta2 != null) astFactory.addASTChild(currentAST, #ta2);}
1433				{if (#ta2 == null) astFactory.addASTChild(currentAST, #ta1);}
1434				argList RPAREN!
1435			)
1436		|	( options {greedy=true;} :
1437				lbc:LBRACK^ {#lbc.setType(ARRAY_DECLARATOR);} RBRACK!
1438			)+
1439		)?
1440	;
1441
1442/** object instantiation.
1443 *  Trees are built as illustrated by the following input/tree pairs:
1444 *
1445 *  new T()
1446 *
1447 *  new
1448 *   |
1449 *   T --  ELIST
1450 *		   |
1451 *		  arg1 -- arg2 -- .. -- argn
1452 *
1453 *  new int[]
1454 *
1455 *  new
1456 *   |
1457 *  int -- ARRAY_DECLARATOR
1458 *
1459 *  new int[] {1,2}
1460 *
1461 *  new
1462 *   |
1463 *  int -- ARRAY_DECLARATOR -- ARRAY_INIT
1464 *								  |
1465 *								EXPR -- EXPR
1466 *								  |	  |
1467 *								  1	  2
1468 *
1469 *  new int[3]
1470 *  new
1471 *   |
1472 *  int -- ARRAY_DECLARATOR
1473 *				|
1474 *			  EXPR
1475 *				|
1476 *				3
1477 *
1478 *  new int[1][2]
1479 *
1480 *  new
1481 *   |
1482 *  int -- ARRAY_DECLARATOR
1483 *			   |
1484 *		 ARRAY_DECLARATOR -- EXPR
1485 *			   |			  |
1486 *			 EXPR			 1
1487 *			   |
1488 *			   2
1489 *
1490 */
1491newExpression
1492	:	"new"^ (typeArguments)? type
1493		(	LPAREN! argList RPAREN! (classBlock)?
1494
1495			//java 1.1
1496			// Note: This will allow bad constructs like
1497			//	new int[4][][3] {exp,exp}.
1498			//	There needs to be a semantic check here...
1499			// to make sure:
1500			//   a) [ expr ] and [ ] are not mixed
1501			//   b) [ expr ] and an init are not used together
1502
1503		|	newArrayDeclarator (arrayInitializer)?
1504		)
1505	;
1506
1507argList
1508	:	(	expressionList
1509		|	/*nothing*/
1510			{#argList = #[ELIST,"ELIST"];}
1511		)
1512	;
1513
1514newArrayDeclarator
1515	:	(
1516			// CONFLICT:
1517			// newExpression is a primaryExpression which can be
1518			// followed by an array index reference. This is ok,
1519			// as the generated code will stay in this loop as
1520			// long as it sees an LBRACK (proper behavior)
1521			options {
1522				warnWhenFollowAmbig = false;
1523			}
1524		:
1525			lb:LBRACK^ {#lb.setType(ARRAY_DECLARATOR);}
1526				(expression)?
1527			RBRACK!
1528		)+
1529	;
1530
1531constant
1532	:	NUM_INT
1533	|	CHAR_LITERAL
1534	|	STRING_LITERAL
1535	|	NUM_FLOAT
1536	|	NUM_LONG
1537	|	NUM_DOUBLE
1538	;
1539
1540
1541//----------------------------------------------------------------------------
1542// The Java scanner
1543//----------------------------------------------------------------------------
1544class JavaLexer extends Lexer;
1545
1546options {
1547	exportVocab=Java;		// call the vocabulary "Java"
1548	testLiterals=false;		// don't automatically test for literals
1549	k=4;					// four characters of lookahead
1550	charVocabulary='\u0003'..'\uFFFF';
1551	// without inlining some bitset tests, couldn't do unicode;
1552	// I need to make ANTLR generate smaller bitsets; see
1553	// bottom of JavaLexer.java
1554	codeGenBitsetTestThreshold=20;
1555}
1556
1557{
1558	/** flag for enabling the "assert" keyword */
1559	private boolean assertEnabled = true;
1560	/** flag for enabling the "enum" keyword */
1561	private boolean enumEnabled = true;
1562
1563	/** Enable the "assert" keyword */
1564	public void enableAssert(boolean shouldEnable) { assertEnabled = shouldEnable; }
1565	/** Query the "assert" keyword state */
1566	public boolean isAssertEnabled() { return assertEnabled; }
1567	/** Enable the "enum" keyword */
1568	public void enableEnum(boolean shouldEnable) { enumEnabled = shouldEnable; }
1569	/** Query the "enum" keyword state */
1570	public boolean isEnumEnabled() { return enumEnabled; }
1571}
1572
1573// OPERATORS
1574QUESTION		:	'?'		;
1575LPAREN			:	'('		;
1576RPAREN			:	')'		;
1577LBRACK			:	'['		;
1578RBRACK			:	']'		;
1579LCURLY			:	'{'		;
1580RCURLY			:	'}'		;
1581COLON			:	':'		;
1582COMMA			:	','		;
1583//DOT			:	'.'		;
1584ASSIGN			:	'='		;
1585EQUAL			:	"=="	;
1586LNOT			:	'!'		;
1587BNOT			:	'~'		;
1588NOT_EQUAL		:	"!="	;
1589DIV				:	'/'		;
1590DIV_ASSIGN		:	"/="	;
1591PLUS			:	'+'		;
1592PLUS_ASSIGN		:	"+="	;
1593INC				:	"++"	;
1594MINUS			:	'-'		;
1595MINUS_ASSIGN	:	"-="	;
1596DEC				:	"--"	;
1597STAR			:	'*'		;
1598STAR_ASSIGN		:	"*="	;
1599MOD				:	'%'		;
1600MOD_ASSIGN		:	"%="	;
1601SR				:	">>"	;
1602SR_ASSIGN		:	">>="	;
1603BSR				:	">>>"	;
1604BSR_ASSIGN		:	">>>="	;
1605GE				:	">="	;
1606GT				:	">"		;
1607SL				:	"<<"	;
1608SL_ASSIGN		:	"<<="	;
1609LE				:	"<="	;
1610LT				:	'<'		;
1611BXOR			:	'^'		;
1612BXOR_ASSIGN		:	"^="	;
1613BOR				:	'|'		;
1614BOR_ASSIGN		:	"|="	;
1615LOR				:	"||"	;
1616BAND			:	'&'		;
1617BAND_ASSIGN		:	"&="	;
1618LAND			:	"&&"	;
1619SEMI			:	';'		;
1620
1621
1622// Whitespace -- ignored
1623WS	:	(	' '
1624		|	'\t'
1625		|	'\f'
1626			// handle newlines
1627		|	(	options {generateAmbigWarnings=false;}
1628			:	"\r\n"	// Evil DOS
1629			|	'\r'	// Macintosh
1630			|	'\n'	// Unix (the right way)
1631			)
1632			{ newline(); }
1633		)+
1634		{ _ttype = Token.SKIP; }
1635	;
1636
1637// Single-line comments
1638SL_COMMENT
1639	:	"//"
1640		(~('\n'|'\r'))* ('\n'|'\r'('\n')?)
1641		{$setType(Token.SKIP); newline();}
1642	;
1643
1644// multiple-line comments
1645ML_COMMENT
1646	:	"/*"
1647		(	/*	'\r' '\n' can be matched in one alternative or by matching
1648				'\r' in one iteration and '\n' in another. I am trying to
1649				handle any flavor of newline that comes in, but the language
1650				that allows both "\r\n" and "\r" and "\n" to all be valid
1651				newline is ambiguous. Consequently, the resulting grammar
1652				must be ambiguous. I'm shutting this warning off.
1653			 */
1654			options {
1655				generateAmbigWarnings=false;
1656			}
1657		:
1658			{ LA(2)!='/' }? '*'
1659		|	'\r' '\n'		{newline();}
1660		|	'\r'			{newline();}
1661		|	'\n'			{newline();}
1662		|	~('*'|'\n'|'\r')
1663		)*
1664		"*/"
1665		{$setType(Token.SKIP);}
1666	;
1667
1668
1669// character literals
1670CHAR_LITERAL
1671	:	'\'' ( ESC | ~('\''|'\n'|'\r'|'\\') ) '\''
1672	;
1673
1674// string literals
1675STRING_LITERAL
1676	:	'"' (ESC|~('"'|'\\'|'\n'|'\r'))* '"'
1677	;
1678
1679
1680// escape sequence -- note that this is protected; it can only be called
1681// from another lexer rule -- it will not ever directly return a token to
1682// the parser
1683// There are various ambiguities hushed in this rule. The optional
1684// '0'...'9' digit matches should be matched here rather than letting
1685// them go back to STRING_LITERAL to be matched. ANTLR does the
1686// right thing by matching immediately; hence, it's ok to shut off
1687// the FOLLOW ambig warnings.
1688protected
1689ESC
1690	:	'\\'
1691		(	'n'
1692		|	'r'
1693		|	't'
1694		|	'b'
1695		|	'f'
1696		|	'"'
1697		|	'\''
1698		|	'\\'
1699		|	('u')+ HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
1700		|	'0'..'3'
1701			(
1702				options {
1703					warnWhenFollowAmbig = false;
1704				}
1705			:	'0'..'7'
1706				(
1707					options {
1708						warnWhenFollowAmbig = false;
1709					}
1710				:	'0'..'7'
1711				)?
1712			)?
1713		|	'4'..'7'
1714			(
1715				options {
1716					warnWhenFollowAmbig = false;
1717				}
1718			:	'0'..'7'
1719			)?
1720		)
1721	;
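
// Illustrative sketch (an added example): the two octal alternatives above
// cap an octal escape at \377.
//	 "\377"	-> a single escape (a first digit of 0..3 allows up to three digits)
//	 "\477"	-> the escape \47 followed by the ordinary character '7'
//			   (a first digit of 4..7 allows at most two digits)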
1722
1723
1724// hexadecimal digit (again, note it's protected!)
1725protected
1726HEX_DIGIT
1727	:	('0'..'9'|'A'..'F'|'a'..'f')
1728	;
1729
1730
1731// a dummy rule to force vocabulary to be all characters (except special
1732// ones that ANTLR uses internally (0 to 2))
1733protected
1734VOCAB
1735	:	'\3'..'\377'
1736	;
1737
1738
1739// an identifier. Note that testLiterals is set to true! This means
1740// that after we match the rule, we look in the literals table to see
1741// if it's a literal or really an identifier
1742IDENT
1743	options {testLiterals=true;}
1744	:	('a'..'z'|'A'..'Z'|'_'|'$') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'$')*
1745		{
1746			// check if "assert" keyword is enabled
1747			if (assertEnabled && "assert".equals($getText)) {
1748				$setType(LITERAL_assert); // set token type for the rule in the parser
1749			}
1750			// check if "enum" keyword is enabled
1751			if (enumEnabled && "enum".equals($getText)) {
1752				$setType(LITERAL_enum); // set token type for the rule in the parser
1753			}
1754		}
1755	;
1756
1757
1758// a numeric literal
1759NUM_INT
1760	{boolean isDecimal=false; Token t=null;}
1761	:	'.' {_ttype = DOT;}
1762			(
1763				(('0'..'9')+ (EXPONENT)? (f1:FLOAT_SUFFIX {t=f1;})?
1764				{
1765				if (t != null && t.getText().toUpperCase().indexOf('F')>=0) {
1766					_ttype = NUM_FLOAT;
1767				}
1768				else {
1769					_ttype = NUM_DOUBLE; // assume double
1770				}
1771				})
1772				|
1773				// JDK 1.5 token for variable length arguments
1774				(".." {_ttype = TRIPLE_DOT;})
1775			)?
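			// Illustrative sketch (an added example) of the token types this first
			// alternative should produce; FLOAT_SUFFIX is defined outside this
			// excerpt and is assumed to accept 'f':
			//	 .		-> DOT
			//	 .5		-> NUM_DOUBLE
			//	 .5f	-> NUM_FLOAT
			//	 ...	-> TRIPLE_DOT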
1776
1777	|	(	'0' {isDecimal = true;} // special case for just '0'
1778			(	('x'|'X')
1779				(											// hex
1780					// the 'e'|'E' and float suffix stuff look
1781					// like hex digits, hence the (...)+ doesn't
1782					// know when to stop: ambig. ANTLR resolves
1783					// it correctly by matching immediately. It
1784					// is therefore ok to hush warning.
1785					options {
1786						warnWhenFollowAmbig=false;
1787					}
1788				:	HEX_DIGIT
1789				)+
1790
1791			|	//float or double with leading zero
1792				(('0'..'9')+ ('.'|EXP
