Issue

I am looking to make a rule for a regex character class that is of the form:

 character_range
   : '[' literal '-' literal ']'
   ;

For example, with [1-5]+ I could match the string "1234543" but not "129". However, I'm having a hard time figuring out how to define a "literal" in ANTLR4. Normally I would use [a-zA-Z], but that is ASCII-only and won't include a character such as é. How would I do that?

Solution

You don't actually want to match an entire literal, because a literal can be more than one character. Each end of a range is a single character, so a single-character token is all you need.

In the parser:

character_range: OPEN_BRACKET LETTER DASH LETTER CLOSE_BRACKET;

And in the lexer:

OPEN_BRACKET: '[';
CLOSE_BRACKET: ']';
DASH: '-';
LETTER: [\p{L}];

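The \p{L} character class used in the LETTER lexer rule matches every Unicode letter, as described in ANTLR's Unicode documentation. The other available classes are the general categories listed in UAX #44 of the Unicode Character Database. To cover all possible regex character classes you will likely need more of them, such as Numbers (\p{N}), Punctuation (\p{P}) or Separators (\p{Z}). Note that LETTER as written does not match digits, so the [1-5] example from the question needs a Numbers class as well.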
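One way the extra categories could be wired in is sketched below as a small combined grammar. The grammar name CharRange, the DIGIT token and the range_char helper rule are illustrative assumptions, not part of the original answer. It accepts a letter or a number on either side of the dash.

```antlr
// Sketch only: CharRange, DIGIT and range_char are hypothetical names.
grammar CharRange;

// A range is two single characters separated by a dash, inside brackets.
character_range: OPEN_BRACKET range_char DASH range_char CLOSE_BRACKET;

// Either end of the range may be a letter or a number.
range_char: LETTER | DIGIT;

OPEN_BRACKET: '[';
CLOSE_BRACKET: ']';
DASH: '-';
LETTER: [\p{L}];   // Unicode general category L (letters), e.g. a, Z, é
DIGIT: [\p{N}];    // Unicode general category N (numbers), e.g. 1, 5
```

With these rules, inputs such as [1-5] and [a-é] should both parse as a character_range. The grammar does not check that the two ends are of the same kind or that the first comes before the second. That check would have to happen after parsing, for example in a listener or visitor.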

Answered By - Mike Lischke

Answer Checked By - Clifford M. (PHPFixing Volunteer)

Tuesday, December 13, 2022

[FIXED] How to capture a literal in antlr4?
