Saturday, July 16, 2022

[FIXED] How to avoid warnings in Perl regex substitution with alternatives?

Tags: perl, regex, substitution, warnings

Issue

I have this regex.

$string =~ s/(?<!["\w])(\w+)(?=:)|(?<=:)([\w\d\\.+=\/]+)/"$1$2"/g;

The regex itself works fine.

But since I am substituting alternatives (and globally), I always get a warning that $1 or $2 is uninitialized. These warnings clutter my logfile.

What can I do to avoid this warning? Is my best option really just to turn the warning off? I doubt it.

Side question: Is there a better way of doing this, e.g. without using a regex at all? What I am doing is fixing JSON in which some key:value pairs lack double quotes, which makes the JSON module fail when decoding.
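
The warning described above can be reproduced with a short script. This is only a sketch: the sample input is hypothetical (it is not from the question), and the warnings are collected through a $SIG{__WARN__} handler so they can be counted instead of printed.

```perl
use strict;
use warnings;

# Hypothetical sample input: JSON-like text with unquoted keys and values.
my $string = '{name:foo,path:a/b.txt}';

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# The substitution from the question: each match sets only one of $1/$2,
# so interpolating both always touches one undefined value.
$string =~ s/(?<!["\w])(\w+)(?=:)|(?<=:)([\w\d\\.+=\/]+)/"$1$2"/g;

print "$string\n";
print scalar(@warnings), " warning(s) collected\n";
```

The quoting itself succeeds; the warnings come only from the replacement string referring to the group that did not take part in the match.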


Solution

There are a couple of approaches to get around this.

If you intend to use capture groups:

  • When each branch of the alternation is captured in its entirety,
    merge the capture groups into one group that wraps the whole alternation.

     (                             # (1 start)
          (?<! ["\w] )
          \w+ 
          (?= : )
       |  
          (?<= : )
          [\w\d\\.+=/]+ 
     )                             # (1 end)
    

    s/((?<!["\w])\w+(?=:)|(?<=:)[\w\d\\.+=\/]+)/"$1"/g

  • Use a Branch Reset construct (?| aaa ).
    This makes the capture groups in each branch of the alternation start
    numbering from the same point.

     (?|
          (?<! ["\w] )
          ( \w+ )                       # (1)
          (?= : )
       |  
          (?<= : )
          ( [\w\d\\.+=/]+ )             # (1)
     )
    

    s/(?|(?<!["\w])(\w+)(?=:)|(?<=:)([\w\d\\.+=\/]+))/"$1"/g

  • Use named capture groups, which can be reused (similar to a branch reset).
    Reuse the same names in each branch, and make the group that is not relevant to a branch an empty group.
    This works because the substitution refers to the groups by name instead of by number.

        (?<! ["\w] )
        (?<V1> \w+ )                  # (1)
        (?<V2> )                      # (2)
        (?= : )
     |  
        (?<= : )
        (?<V1> )                      # (3)
        (?<V2> [\w\d\\.+=/]+ )        # (4)
    

    s/(?<!["\w])(?<V1>\w+)(?<V2>)(?=:)|(?<=:)(?<V1>)(?<V2>[\w\d\\.+=\/]+)/"$+{V1}$+{V2}"/g
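
The three substitutions above can be compared side by side. The following is a minimal sketch, assuming a hypothetical sample input and Perl 5.10 or later (needed for branch reset and named captures); it collects warnings through a $SIG{__WARN__} handler to check that none are raised.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the question.
my $input = '{name:foo,path:a/b.txt}';

# Record any warning so the three variants can be checked for silence.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# 1. One capture group wrapped around the whole alternation.
(my $single = $input) =~
    s/((?<!["\w])\w+(?=:)|(?<=:)[\w\d\\.+=\/]+)/"$1"/g;

# 2. Branch reset: the group in either branch is numbered 1.
(my $reset = $input) =~
    s/(?|(?<!["\w])(\w+)(?=:)|(?<=:)([\w\d\\.+=\/]+))/"$1"/g;

# 3. Reused names, with an empty group standing in for the unused name.
(my $named = $input) =~
    s/(?<!["\w])(?<V1>\w+)(?<V2>)(?=:)|(?<=:)(?<V1>)(?<V2>[\w\d\\.+=\/]+)/"$+{V1}$+{V2}"/g;

print "$_\n" for $single, $reset, $named;
print scalar(@warnings), " warning(s) collected\n";
```

All three variants refer only to groups that are defined in whichever branch matched, which is why the replacement no longer touches an undefined value.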


The empty-group idea from the named-capture approach can be combined with a
branch reset when a branch of the alternation contains more than one capture group.
The example below uses capture group numbers.

The idea is to put empty (dummy) capture groups in each branch to
"pad" it, so that every branch has as many groups as the branch with the most groups.

This padding is also needed to avoid a bug in the Perl regex engine that could cause a crash.

 (?|                    # Branch Reset
                             # ------ Br 1 --------
      ( )                    # (1)
      ( \d{4} )              # (2)
      ABC294
      ( [a-f]+ )             # (3)
   |  
                             # ------ Br 2 --------          
      ( :: )                 # (1)
      ( \d+ )                # (2)
      ABC555
      ( )                    # (3)
   |  
                             # ------ Br 3 --------
      ( == )                 # (1)
      ( )                    # (2)
      ABC18888
      ( )                    # (3)
 )

s/(?|()(\d{4})ABC294([a-f]+)|(::)(\d+)ABC555()|(==)()ABC18888())/"$1$2$3"/g
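
To see the padding at work, the substitution above can be run against one input per branch. This is a sketch only: the three input strings are hypothetical, chosen so that each one matches a different branch, and warnings are again collected through a $SIG{__WARN__} handler.

```perl
use strict;
use warnings;

# Hypothetical inputs, one for each branch of the padded pattern.
my @inputs = ('2022ABC294cafe', '::17ABC555', '==ABC18888');

# Record any warning raised while the substitutions run.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my @results;
for my $text (@inputs) {
    # Every branch defines groups 1 to 3, so "$1$2$3" is always safe.
    (my $copy = $text) =~
        s/(?|()(\d{4})ABC294([a-f]+)|(::)(\d+)ABC555()|(==)()ABC18888())/"$1$2$3"/g;
    push @results, $copy;
}

print "$_\n" for @results;
print scalar(@warnings), " warning(s) collected\n";
```

Because each branch supplies all three groups, the padded groups contribute an empty string rather than an undefined value. The literal ABC... text is not captured, so it is dropped from the replacement.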



Answered By - user557597
Answer Checked By - Marilyn (PHPFixing Volunteer)