PHPFixing

Sunday, November 6, 2022

[FIXED] How to set capturing groups to extract and replace with re.sub()

Tags: python, python-3.x, regex, regex-group, string

Issue

import re


#input_text_substring = "a partir de las 04:00 am 2022-09-02 hasta a las 04:15 pm 2022-09-04"
input_text_substring = "a partir de las 04:00 am 2022-09-02 fuimos a caminar hasta las 10 montanas, de alli hasta a las 04:15 pm 2022-09-04"


time_in_numbers = r"(\d{2})[\s|]*(?::|)[\s|]*(\d{2})[\s|]*(?:am|pm)"
date_in_numbers = r"\d{4}[\s|]*-[\s|]*\d{2}[\s|]*-[\s|]*\d{2}"
some_text = r"(.*?)" #any substring or no character (the condition will be set by the rest of the large regex)

regexp1 = r"(?:desde|apartir de|a partir de)[\s|]*(?:a esa de|a eso de|a|)[\s|]*(?:las|la|)[\s|]*" + time_in_numbers + r"[\s|]*(?:de|)[\s|]*" + date_in_numbers + r"[\s|]*" + some_text + r"[\s|]*hasta[\s|]*(?:a esa de|a eso de|a|)[\s|]*(?:las|la|)[\s|]*" + time_in_numbers + r"[\s|]*(?:de|)[\s|]*" + date_in_numbers

#Here you should place the capture groups obtained from the previous pattern
replacement1 = r"[(\2 \1)to(\5 \4)][\3]" #I need fix that!!

input_text_substring = re.sub(regexp1, replacement1, input_text_substring)


print(repr(input_text_substring))

The output I need has the format '[(XXXX-XX-XX XX:XX (am|pm))to(XXXX-XX-XX XX:XX (am|pm))][some_text]', where X is any digit. For this input it should look like this:

'[(2022-09-02 04:00 am)to(2022-09-04 04:15 pm)][fuimos a caminar hasta las 10 montanas, de alli]'

The problem I'm having is that the original string is printed unmodified: either this regex pattern does not match the input, or the replacement with re.sub() is never applied.
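A quick way to tell those two cases apart is to run re.search() with the same pattern and list what each numbered group holds. The following is only a diagnostic sketch: it reuses the sub-patterns from the question unchanged, and the variable names text and match are introduced here for illustration.

```python
import re

text = ("a partir de las 04:00 am 2022-09-02 fuimos a caminar hasta las "
        "10 montanas, de alli hasta a las 04:15 pm 2022-09-04")

# Sub-patterns copied from the question, unchanged.
time_in_numbers = r"(\d{2})[\s|]*(?::|)[\s|]*(\d{2})[\s|]*(?:am|pm)"
date_in_numbers = r"\d{4}[\s|]*-[\s|]*\d{2}[\s|]*-[\s|]*\d{2}"
some_text = r"(.*?)"

regexp1 = r"(?:desde|apartir de|a partir de)[\s|]*(?:a esa de|a eso de|a|)[\s|]*(?:las|la|)[\s|]*" + time_in_numbers + r"[\s|]*(?:de|)[\s|]*" + date_in_numbers + r"[\s|]*" + some_text + r"[\s|]*hasta[\s|]*(?:a esa de|a eso de|a|)[\s|]*(?:las|la|)[\s|]*" + time_in_numbers + r"[\s|]*(?:de|)[\s|]*" + date_in_numbers

match = re.search(regexp1, text)
if match is None:
    print("pattern did not match")
else:
    # Groups are numbered by the position of their opening parenthesis.
    for number, value in enumerate(match.groups(), start=1):
        print(number, repr(value))
```

With the pattern from the question, this should list five groups: the hour and the minute of each time as separate groups, plus the free text. Neither the dates nor am/pm are captured, so no combination of \1 to \5 can rebuild the target format.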


Solution

I haven't checked whether this pattern could be shortened or made more efficient, but a few small changes were enough to get it working (at least for this example):

import re

#input_text_substring = "a partir de las 04:00 am 2022-09-02 hasta a las 04:15 pm 2022-09-04"
input_text_substring = "a partir de las 04:00 am 2022-09-02 fuimos a caminar hasta las 10 montanas, de alli hasta a las 04:15 pm 2022-09-04"

# hh:mm is one capturing group, am/pm is a second one
time_in_numbers = r"(\d{2}[\s|]*(?::|)[\s|]*\d{2})[\s|]*(am|pm)"
# the whole date is one capturing group
date_in_numbers = r"(\d{4}[\s|]*-[\s|]*\d{2}[\s|]*-[\s|]*\d{2})"
some_text = r"(.*?)" #any substring or no character (the condition will be set by the rest of the large regex)

regexp1 = r"(?:desde|apartir de|a partir de)[\s|]*(?:a esa de|a eso de|a|)[\s|]*(?:las|la|)[\s|]*" + time_in_numbers + r"[\s|]*(?:de|)[\s|]*" + date_in_numbers + r"[\s|]*" + some_text + r"[\s|]*hasta[\s|]*(?:a esa de|a eso de|a|)[\s|]*(?:las|la|)[\s|]*" + time_in_numbers + r"[\s|]*(?:de|)[\s|]*" + date_in_numbers

# groups 1-3: start time, am/pm, date; group 4: free text; groups 5-7: end time, am/pm, date
replacement1 = r"[(\3 \1 \2)to(\7 \5 \6)][\4]"

input_text_substring = re.sub(regexp1, replacement1, input_text_substring)

print(repr(input_text_substring))

Output:

'[(2022-09-02 04:00 am)to(2022-09-04 04:15 pm)][fuimos a caminar hasta las 10 montanas, de alli]'

Check out the pattern at Regex101
The changes I made:

  1. Surround date_in_numbers with () to make it its own capturing group.
  2. Make (am|pm) a capturing group by replacing (?:...) with (...).
  3. In time_in_numbers, the two digits before and after the colon were each their own capturing group. I merged them into a single capturing group for the whole time.
  4. Adjust the group numbers in replacement1 to match the new groups.
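Change 4 is needed because adding or merging a group shifts the number of every group after it. One way to avoid renumbering is to use named groups, referenced as \g<name> in the replacement. The sketch below is not the pattern from the answer. It makes a few assumptions: the helper functions and their prefix parameter are hypothetical (a prefix is needed because a group name may appear only once per pattern), the group names are made up for illustration, [\s|]* is simplified to \s* (which no longer accepts a literal |), and (?::|) is written as :?.

```python
import re

def time_in_numbers(prefix):
    # hh:mm as one named group, am/pm as another (names are hypothetical)
    return rf"(?P<{prefix}_time>\d{{2}}\s*:?\s*\d{{2}})\s*(?P<{prefix}_ampm>am|pm)"

def date_in_numbers(prefix):
    # the whole date as one named group
    return rf"(?P<{prefix}_date>\d{{4}}\s*-\s*\d{{2}}\s*-\s*\d{{2}})"

# optional "a esa de / a eso de / a" followed by optional "las / la"
article = r"\s*(?:a esa de|a eso de|a)?\s*(?:las|la)?\s*"

pattern = (
    r"(?:desde|apartir de|a partir de)" + article
    + time_in_numbers("start") + r"\s*(?:de)?\s*" + date_in_numbers("start")
    + r"\s*(?P<text>.*?)\s*hasta" + article
    + time_in_numbers("end") + r"\s*(?:de)?\s*" + date_in_numbers("end")
)

# \g<name> refers to a named group, so the order of groups no longer matters
replacement = (
    r"[(\g<start_date> \g<start_time> \g<start_ampm>)"
    r"to(\g<end_date> \g<end_time> \g<end_ampm>)][\g<text>]"
)

text = "a partir de las 04:00 am 2022-09-02 fuimos a caminar hasta las 10 montanas, de alli hasta a las 04:15 pm 2022-09-04"
result = re.sub(pattern, replacement, text)
print(repr(result))
```

For the example input this should give the same result as the numbered version above. The trade-off is a longer pattern in exchange for a replacement string that stays valid when groups are added or reordered.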


Answered By - Rabinzel
Answer Checked By - Clifford M. (PHPFixing Volunteer)