PHPFixing

Sunday, November 6, 2022

[FIXED] How to generalize this regex so that it starts capturing substrings at the beginning of a string or if it is followed by some other word?

Tags: python, python-3.x, regex, regex-group, string

Issue

import re

name = "John"

#In these examples it works fine
input_sense_aux = "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer"
#input_sense_aux = "Do you know if John with the others could come this afternoon?"

#In these examples it does not work well
#input_sense_aux = "John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "Can you help us, otherwise it will be waiting for a while longer for John"
#input_sense_aux = "sorry! can you help us? otherwise it will be waiting for a while longer for John"



regex_patron_m1 = r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # Check whether the regex matches, i.e. whether to enter the block below
if m1:
    something_1, something_2 = m1.groups()

    something_1 = something_1.strip()
    something_2 = something_2.strip()

    print(repr(something_1))
    print(repr(something_2))

I need the regex to grab the content before "John" like this:

(start of sentence|¿|¡|,|;|:|(|[|.) \s* "content for something_1" \s* John

And then:

John \s* "content for something_2" \s* (end of sentence|?|!|,|;|:|)|]|.)

In the first examples, the regex works fine:

'these teams are too many but I know that'
'can help us'
'Do you know if'
'with the others could come this afternoon'

But for the last three examples, the regex returns nothing.

I need help generalizing my regex so that it covers all these cases while still respecting the conditions above for extracting the content of something_1 and something_2.

For the last three examples, the expected results are:

''
' can help us'
' otherwise it will be waiting for a while longer for '
''
' otherwise it will be waiting for a while longer for '
''
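
A note on why the last three inputs fail: both capture groups in the pattern above, ((?:\w\s*)+), require at least one word character, so the name cannot sit at the very start or the very end of the string. A minimal sketch that reproduces this with the question's pattern unchanged; the names `original`, `inputs` and `results` are only illustrative:

```python
import re

name = "John"
# Pattern from the question: each group needs at least one \w character.
original = r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"

inputs = [
    "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer",
    "John can help us, otherwise it will be waiting for a while longer",
    "Can you help us, otherwise it will be waiting for a while longer for John",
]

# Map each input to its stripped groups, or to None when there is no match.
results = {}
for text in inputs:
    m = re.search(original, text, re.IGNORECASE)
    results[text] = tuple(g.strip() for g in m.groups()) if m else None

for text, groups in results.items():
    print(repr(text[:25]), "->", groups)
```

Only the first input yields a match; the other two give None because nothing precedes or follows the name.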

Solution

You can use

import re

name = "John"

input_sense_auxs = [
    "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer",
    "Do you know if John with the others could come this afternoon?",

    "John can help us, otherwise it will be waiting for a while longer",
    "Can you help us, otherwise it will be waiting for a while longer for John",
    "sorry! can you help us? otherwise it will be waiting for a while longer for John"]

regex_patron_m1 = fr'(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)?{name}(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).])'
# r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
for input_sense_aux in input_sense_auxs:
    print(f'--- {input_sense_aux} ---')
    m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # Check whether the regex matches, i.e. whether to enter the block below
    if m1:
        something_1, something_2 = m1.groups()

        something_1 = something_1.strip() if something_1 else ""
        something_2 = something_2.strip() if something_2 else ""

        print(repr(something_1))
        print(repr(something_2))

Output:

--- These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer ---
'I think'
'can help us'
--- These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- Do you know if John with the others could come this afternoon? ---
'Do you know if'
'with the others could come this afternoon'
--- John can help us, otherwise it will be waiting for a while longer ---
''
'can help us'
--- Can you help us, otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
--- sorry! can you help us? otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
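
If the name may contain regex metacharacters (for example "J.R."), interpolating it directly into the pattern changes what the pattern means. A hedged sketch that wraps the same pattern in a helper and passes the name through re.escape; the function name `split_around_name` is hypothetical and not part of the original answer:

```python
import re

def split_around_name(text, name):
    """Return (before, after) around `name`, or None when nothing matches."""
    pattern = (
        r'(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)?'
        + re.escape(name)  # treat the name as literal text
        + r'(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).])'
    )
    m = re.search(pattern, text, re.IGNORECASE)
    if m is None:
        return None
    # Optional groups that did not participate come back as None; use "" instead.
    before, after = (g or "" for g in m.groups())
    return before, after

print(split_around_name("John can help us, otherwise it will be waiting for a while longer", "John"))
```

Returning None for "no match" keeps that case distinct from a match where both sides are empty.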


Details:

  • (?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)? - the prefix, the left-hand side part, that matches
    • (?:^|[?!¿¡,;:([.]) - either start of string or a char from the ?!¿¡,;:([. set
    • \s* - zero or more whitespaces
    • (?:(\w+(?:\s+\w+)*)\s*)? - an optional occurrence of
      • (\w+(?:\s+\w+)*) - Group 1: one or more word chars and then zero or more sequences of one or more whitespaces and one or more word chars
      • \s* - zero or more whitespaces
  • {name} (here, John) - the name
  • (?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).]) - the right-hand part:
    • (?:\s*(\w+(?:\s+\w+)*))? - an optional occurrence of
      • \s* - zero or more whitespaces
      • (\w+(?:\s+\w+)*) - Group 2: one or more word chars and then zero or more sequences of one or more whitespaces and one or more word chars
    • \s* - zero or more whitespaces
    • (?:$|[]?!,;:).]) - either end of string or a char from the ]?!,;:). set.
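
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores unescaped whitespace outside character classes and allows # comments. A sketch, assuming the name is fixed to "John"; the names `PATTERN` and `m` are illustrative and not from the original answer:

```python
import re

name = "John"
PATTERN = re.compile(
    rf"""
    (?:^|[?!¿¡,;:([.])          # start of string or an opening delimiter
    \s*
    (?:(\w+(?:\s+\w+)*)\s*)?    # optional Group 1: words before the name
    {re.escape(name)}           # the name, matched literally
    (?:\s*(\w+(?:\s+\w+)*))?    # optional Group 2: words after the name
    \s*
    (?:$|[]?!,;:).])            # end of string or a closing delimiter
    """,
    re.IGNORECASE | re.VERBOSE,
)

m = PATTERN.search("Do you know if John with the others could come this afternoon?")
print(m.groups())
```

The compiled pattern behaves like the one-line version; only the layout differs.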




Answered By - Wiktor Stribiżew
Answer Checked By - Katrina (PHPFixing Volunteer)