PHPFixing

Wednesday, June 29, 2022

[FIXED] What counts as a newline for Raku *source* files?

Tags: comments, newline, raku, rakudo, unicode

Issue

I was somewhat surprised to observe that the following code

# comment 
say 1;
# comment 
say 2;
# comment say 3;
# comment say 4;

prints 1, 2, 3, and 4.

Here are the relevant characters after "# comment":

say "\x[2029]\x[2028]\x[0B]\x[0C]".uninames.raku;
# OUTPUT: «("PARAGRAPH SEPARATOR", "LINE SEPARATOR", "<control-000B>", "<control-000C>").Seq»

Note that most or all of these characters are invisible in most fonts. At least in my editor, none of them causes the following text to be displayed on a new line. And at least one (<control-000C>, aka Form Feed, sometimes displayed as ^L) is fairly widely used in Vim and Emacs as a section separator.
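Because the characters are invisible, the program is hard to reproduce by copy and paste. The Raku sketch below rebuilds it with explicit \x[...] escapes and runs it with EVAL. Pairing each separator with a particular comment, in the order listed above, is an assumption; the order should not affect the result. Under Rakudo it should print 1 through 4, as described above.

```raku
use MONKEY-SEE-NO-EVAL;   # EVAL of a non-literal string requires this pragma

# The question's program, with each invisible separator written as an escape.
# Assumption: the separators follow the comments in the order listed above.
my $program = "# comment\x[2029]say 1;\n"
            ~ "# comment\x[2028]say 2;\n"
            ~ "# comment\x[0B]say 3;\n"
            ~ "# comment\x[0C]say 4;\n";

# Had any separator failed to end its comment, that "say" would be commented out.
EVAL $program;
```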

This raises a few questions:

  1. Is this intentional, or a bug?
  2. If intentional, what's the use case (other than winning obfuscated code contests!)?
  3. Is it just these 4 characters, or are there others? (I found these because they share the mandatory break (BK) value of the Unicode Line_Break property. Does that property, or some other Unicode property, govern what Raku considers a newline?)
  4. Just, really, wow.

(I realize #4 is not technically a question, but I feel it needed to be said).


Solution

Raku's syntax is defined by a Raku grammar. The rule for parsing such a comment is:

token comment:sym<#> {
   '#' {} \N*
}

That is, it consumes everything after the # up to, but not including, the next newline character. As with all built-in character classes in Raku, \n and its negation \N are Unicode-aware. The language design docs state:

\n matches a logical (platform independent) newline, not just \x0a. See TR18 section 1.6 for a list of logical newlines.

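That is a reference to Unicode Technical Standard #18 (Unicode Regular Expressions), whose section 1.6 (Line Boundaries) lists the characters that should be recognized as line terminators.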
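The rule can be checked directly. The sketch below, assuming Rakudo, applies the same '#' \N* pattern to a comment followed by each candidate character and reports whether the comment stops there; ends-comment is a hypothetical helper name, not part of any API. The candidates are the line terminators as I read TR18's recommended set (U+000A through U+000D, NEL U+0085, LINE SEPARATOR and PARAGRAPH SEPARATOR; the CRLF pair also counts but is omitted here), with TAB and SPACE added for contrast. As far as I can tell, these seven code points are exactly those whose Line_Break property is BK, CR, LF or NL. If your Rakudo agrees, then besides the four characters in the question, CR, LF and NEL also end a comment.

```raku
# TR18 line terminators (as read above), then TAB and SPACE for comparison.
my @candidates = 0x0A, 0x0B, 0x0C, 0x0D, 0x85, 0x2028, 0x2029, 0x09, 0x20;

# True if the pattern from comment:sym<#> ('#' followed by \N*) stops right
# before the given character, so the text after it would still be parsed as code.
sub ends-comment(Int $cp --> Bool) {
    my $src = "# comment" ~ $cp.chr ~ "say 3;";
    ($src ~~ / '#' \N* /).Str eq '# comment'
}

for @candidates -> $cp {
    printf "U+%04X  %-22s  %s\n", $cp, $cp.chr.uniname,
        ends-comment($cp) ?? 'ends the comment' !! 'stays inside the comment';
}
```

Each of the seven terminators should report that it ends the comment, while TAB and SPACE stay inside it.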

I somewhat doubt there was ever a specific language design discussion along the lines of "let's enable all the kinds of newlines in Unicode, it'll be cool!" Rather, the decisions were that Raku should follow the Unicode regex technical report, and that Raku's syntax would be defined by a Raku grammar and would therefore use the Unicode-aware character classes. Support for a range of different newline characters is a consequence of consistently following those principles.



Answered By - Jonathan Worthington
Answer Checked By - Timothy Miller (PHPFixing Admin)
Copyright © PHPFixing