Issue
I am looking to make a rule for a regex character class that is of the form:
character_range
: '[' literal '-' literal ']'
;
For example, with [1-5]+ I could match the string "1234543" but not "129". However, I'm having a hard time figuring out how to define a "literal" in ANTLR4. Normally I would use [a-zA-Z], but that is ASCII-only and won't include a character such as é. So how would I do that?
Solution
Actually, you don't want to match an entire literal, because a literal can be more than one character. Instead you only need a single character for the match.
In the parser:
character_range: OPEN_BRACKET LETTER DASH LETTER CLOSE_BRACKET;
And in the lexer:
OPEN_BRACKET: '[';
CLOSE_BRACKET: ']';
DASH: '-';
LETTER: [\p{L}];
The character class used in the LETTER lexer rule is the Unicode Letter category (\p{L}), as described in ANTLR's Unicode documentation. Other available properties are listed in UAX #44, the annex describing the Unicode Character Database. You may need others, such as Numbers, Punctuation or Separators, to cover all possible regex character classes.
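As a rough sketch of how those extra categories could be combined, the grammar below widens the single-character token to accept letters and numbers, so that a digit range such as [1-5] from the question is also covered (digits are not in \p{L}). The grammar name CharacterRange and the token name CHAR are hypothetical, chosen only for illustration, and the sketch assumes ANTLR 4.7 or later, where Unicode property escapes are available inside lexer character sets.

```antlr
// Hypothetical combined grammar; the names CharacterRange and CHAR are illustrative only.
grammar CharacterRange;

// A bracketed range such as [a-z], [é-ü] or [1-5].
character_range: OPEN_BRACKET CHAR DASH CHAR CLOSE_BRACKET;

OPEN_BRACKET: '[';
CLOSE_BRACKET: ']';
DASH: '-';

// One code point from the Unicode Letter (L) or Number (N) general categories.
// Add further properties here (for example \p{P} for punctuation) as needed.
CHAR: [\p{L}\p{N}];
```

Note that the grammar only checks the shape of the range. Whether the first character actually sorts before the second (so that [5-1] is rejected) would have to be verified separately, for example in a listener or visitor.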
Answered By - Mike Lischke Answer Checked By - Clifford M. (PHPFixing Volunteer)