Sunday, November 20, 2022

[FIXED] How to get part of a URL?

November 20, 2022 php, preg-replace, regex, regex-greedy, regex-group No comments

Issue

How can I remove every part of a URL except the base URL and the first path segment? The number of segments is not fixed, and the base URL varies. I tried some regexes, but in vain.

$url = 'http://www.example.com/part1/part2/part3/part4';
$base_url = parse_url($url, PHP_URL_HOST); // Returns www.example.com

$desired_output = 'http://www.example.com/part1';

Solution

Here we can use a preg_replace, with a simple expression, maybe similar to:

(.+\.com\/.+?\/).+

where we are capturing our desired output using this capturing group:

(.+\.com\/.+?\/)

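and then we match through to the end of the string and replace the whole match with $1. Note that the captured group ends with a slash, so the result is http://www.example.com/part1/ rather than http://www.example.com/part1.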

Test

$re = '/(.+\.com\/.+?\/).+/m';
$str = 'http://www.example.com/part1/part2/part3/part4';
$subst = '$1';

$result = preg_replace($re, $subst, $str);

echo $result;

For all domains .com or not, we might be able to solve it with this expression:

(.+\..+?\/.+?\/).+

Test

$re = '/(.+\..+?\/.+?\/).+/m';
$str = 'http://www.example.com/part1/part2/part3/part4';
$subst = '$1';

$result = preg_replace($re, $subst, $str);

echo $result;
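
For comparison, here is a minimal Python sketch of the same task that avoids the regex altogether. It assumes the input always carries a scheme and a host; the function name first_segment_url is hypothetical. Because it splits the URL with urllib.parse, it does not depend on the top-level domain and leaves no trailing slash.

```python
from urllib.parse import urlsplit


def first_segment_url(url: str) -> str:
    """Return scheme://host plus the first path segment, if there is one."""
    parts = urlsplit(url)
    # Drop empty segments produced by leading or doubled slashes.
    segments = [s for s in parts.path.split("/") if s]
    base = f"{parts.scheme}://{parts.netloc}"
    return f"{base}/{segments[0]}" if segments else base


print(first_segment_url("http://www.example.com/part1/part2/part3/part4"))
# http://www.example.com/part1
```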

Demo

Answered By - Emma

Answer Checked By - Mary Flores (PHPFixing Volunteer)

[FIXED] How to remove 2 last characters with preg_replace?

November 20, 2022 php, preg-replace, regex, regex-group, substr No comments

Issue

I have a code like : 784XX . XX could be a character or number and I need an expression to remove the last 2 characters (XX) using ( and only ) preg_replace.

How can I do that?

For example, the output of :

782A3 is 782,

0012122 is 00121,

76542A is 7654,

333333CD is 333333,

Solution

You could use the substr function.

But if you must use preg_replace, you can do this:

$val = preg_replace('/[\w\d]{2}$/', '', $val);
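
As a cross-check, the same replacement can be sketched in Python with re.sub. The function name drop_last_two is hypothetical. Since \w already includes digits, the class [\w\d] can be shortened to \w without changing what it matches.

```python
import re


def drop_last_two(value: str) -> str:
    # \w covers letters, digits and underscore; \Z anchors at the very end.
    return re.sub(r"\w{2}\Z", "", value)


for code in ["782A3", "0012122", "76542A", "333333CD"]:
    print(code, "->", drop_last_two(code))
```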

Answered By - Vitalii

Answer Checked By - Terry (PHPFixing Volunteer)

[FIXED] How to loop through, match and replace?

November 20, 2022 php, preg-match-all, preg-replace, regex, regex-group No comments

Issue

I have a string containing several placeholders in double curly braces, and I want to replace them one at a time: on the first iteration the first occurrence, on the second iteration the second occurrence, and so on until every placeholder has been handled.

<?php

include_once("con.php");
$db = new Da();

$con = $db->con();

$String = "{{ONE}} {{TWO}} {{THREE}} {{FOUR}} {{FIVE}} {{SIX}}";

 $Count = 1;
 if(preg_match_all("/\{\{[^{}]+\}\}/", $lclString, $matches)) {

    foreach ($matches[0] as $match) {
        $Count++;
        $Query = "SELECT link FROM student WHERE linkVal = '".$match."'";
        $Result = $con->query($Query);

        if($row = $Result->fetch(PDO::FETCH_ASSOC)) {

            $NewValue = preg_replace("/\{\{[^{}]+\}\}/", $row["link"], $String);

        }
    }

        echo json_encode($NewValue);

 } 


?>

The first occurrence, {{ONE}}, should be replaced with its value from $row["link"], then {{TWO}} with its own value, and so on.

Solution

Inside the loop, preg_replace with the generic pattern replaces every placeholder at once. I suggest using str_replace instead, so that only the current match is replaced:

if(preg_match_all("/\{\{[^{}]+\}\}/", $String, $matches)) {
    $NewValue = $String;
    foreach ($matches[0] as $match) {
        $Count++;
        $Query = "SELECT link FROM student WHERE linkVal = '".$match."'";
        $Result = $con->query($Query);

        if($row = $Result->fetch(PDO::FETCH_ASSOC)) {
            $NewValue = str_replace($match, $row["link"], $NewValue);
            //          ^^^^^^^^^^^^^^^^^^
        }
    }
    echo json_encode($NewValue);
}
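
The same idea can be sketched in Python with a replacement callback, which is invoked once per match. This is only an illustration: the LINKS dictionary is a hypothetical stand-in for the database lookup, and fill_placeholders is not a name from the original code.

```python
import re

# Hypothetical stand-in for the `student` table lookup.
LINKS = {"{{ONE}}": "link-1", "{{TWO}}": "link-2"}


def fill_placeholders(text: str, links: dict) -> str:
    # The callback runs once per match; unknown placeholders stay unchanged.
    return re.sub(
        r"\{\{[^{}]+\}\}",
        lambda m: links.get(m.group(0), m.group(0)),
        text,
    )


print(fill_placeholders("{{ONE}} {{TWO}} {{THREE}}", LINKS))
# link-1 link-2 {{THREE}}
```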

Answered By - Toto

Answer Checked By - Clifford M. (PHPFixing Volunteer)

[FIXED] How can I create a VSCode snippet to automatically insert namespace of files inside my src folder?

November 06, 2022 php, regex, regex-group, visual-studio-code, vscode-snippets No comments

Issue

First off, I have the following psr-4 declaration for the src folder inside my composer.json file:

"autoload": {
        "psr-4": {
            "Src\\": "src/"
        }
    },

I would like to build a VSCode snippet that autogenerates the namespace of a new file that resides inside the src folder.

So far I have this inside php.json:

    "Namespace Src": {
        "prefix": "ns_src",
        "body": [
            "namespace ${RELATIVE_FILEPATH};",
        ],
        "description": "Namespace for file routes inside the src folder"
    },

Taking as an example the following file:

src/MyEntity/Application/MyUseCase.php

The desired result would be:

namespace Src\MyEntity\Application;

And right now this is what it returns to me:

namespace src/MyEntity/Application/MyUseCase.php;

So I need to tackle:

Upper case of src.
Replacement of forward slashes / into back slashes \.
Removal of everything that is after the last forward slash /.

I know there has to be a way to do this with regex. I have read this similar problem (VSCODE snippet PHP to fill namespace automatically) but I haven't quite got the hang of it after reading it. And I think the upper case problem could maybe be solved with \F as in here: https://www.regular-expressions.info/replacecase.html#:~:text=In%20the%20regex%20%5CU%5Cw,is%20not%20a%20word%20character.

Is this the right approach? Could you give me any tips on this problem?

Thank you so much.

Solution

You can use

"Namespace Src": {
    "prefix": "ns_src",
    "body": [
        "namespace ${RELATIVE_FILEPATH/^(?:.*[\\\\\\/])?(src)(?=[\\\\\\/])|[\\\\\\/][^\\\\\\/]*$|([\\\\\\/])/${1:+Src}${2:+\\\\}/g};",
     ],
    "description": "Namespace for file routes inside the src folder"
},

See the regex demo. Details:

^(?:.*[\\\/])?(src)(?=[\\\/]) - start of string (^), then an optional occurrence of any zero or more chars as many as possible (.*) and a \ or / char ([\\\/]), and then src captured into Group 1, that is immediately followed with / or \
| - or
[\\\/][^\\\/]*$ - a \ or / char and then zero or more chars other than / and \ till end of string
| - or
([\\\/]) - Group 2: a \ or / char.

The ${1:+Src}${2:+\\\\} replacement replaces in this way:

${1:+Src} - if Group 1 matched, replace with Src
${2:+\\\\} - if Group 2 matched, replace with \.
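
To see what the transform does outside the editor, the three alternatives can be emulated in Python. This is a sketch, not how VS Code evaluates snippets: the callback imitates the conditional replacements ${1:+Src} and ${2:+\\\\}, and the name to_namespace is hypothetical.

```python
import re

# Same three alternatives as the snippet transform, with single-level escaping.
PATTERN = re.compile(r"^(?:.*[\\/])?(src)(?=[\\/])|[\\/][^\\/]*$|([\\/])")


def to_namespace(relative_path: str) -> str:
    def repl(m):
        if m.group(1):   # leading "src" segment -> "Src"
            return "Src"
        if m.group(2):   # any remaining separator -> backslash
            return "\\"
        return ""        # trailing "/FileName.php" -> removed

    return PATTERN.sub(repl, relative_path)


print(to_namespace("src/MyEntity/Application/MyUseCase.php"))
# Src\MyEntity\Application
```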

Answered By - Wiktor Stribiżew

Answer Checked By - Candace Johnson (PHPFixing Volunteer)

[FIXED] How to get the match for the following use cases from this regex pattern?

November 06, 2022 javascript, regex, regex-group, regex-lookarounds No comments

Issue

I have the Regex to match the following patterns,

Link to the Use case: https://regex101.com/r/wnp1k4/1

How can i get the match for the same by modifying the Regex? please help.

(?:^|(?<=[\D;a-zA-Z(),.:;?!"'`>]))(?!000|666|9)(?<![Oo][Rr][Dd][Ee][Rr].)(?<![Oo][Rr][Dd][Ee][Rr]..)(?<![Oo][Rr][Dd][Ee][Rr]...)(?<![Oo][Rr][Dd][Ee][Rr].[Nn][Uu][Mm][Bb][Ee][Rr].)(?<![Oo][Rr][Dd][Ee][Rr].[Nn][Uu][Mm][Bb][Ee][Rr]..)(?<![Oo][Rr][Dd][Ee][Rr].[Nn][Uu][Mm][Bb][Ee][Rr]...)(?<![Xx])\d{3}[ -.=\n\r]{0,10}(?!00)\d{2}[ -.=\n\r]{0,10}(?!0000)\d{4}(?:$|(?=[\Da-zA-Z(),.:;?!"'`<= ]))

Order numbers should not get detected if 'X or x' precedes the number. so this is working fine.

x123456789

X123456789

x123-456-789

X123-456-789

123-456-789

I need to modify the pattern so that it also matches order numbers written like the ones below. The words "order number" should be matched case-insensitively.

ordernumber123-456-789

order number123-456789

order number 123456789

123-456789

123456789

ordernumber-123456787

ordernumber - 123456789

ordernumber #123456789

ordernumber anysplcharacter123456789

Solution

Converting my comment to answer so that solution is easy to find for future visitors.

You may use this regex:

(?<!\d)(?!000|666|9)(order\W?number)?\W*(?<!x)\d{3}[ .=-]{0,10}(?!000)\d{3}[ .=-]{0,10}(?!000)\d{3}

RegEx Demo

RegEx Details:

(?<!\d): Make sure that previous character is not a digit
(?!000|666|9): Make sure that we don't have 000 or 666 or 9 at the next position
(order\W?number)?: Match order and number optionally separated with a non-word character
\W*: Match 0 or more non-word characters
(?<!x): Make sure previous character is not x
\d{3}: Match 3 digits
[ .=-]{0,10}: Match 0 to 10 instances of given separators
(?!000): Make sure we don't have 000 at next position
\d{3}: Match 3 digits
[ .=-]{0,10}: Match 0 to 10 instances of given separators
(?!000): Make sure we don't have 000 at next position
\d{3}: Match 3 digits
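
A quick way to try the pattern is a small Python loop over a few of the inputs from the question. This sketch assumes the case-insensitive flag is what the demo used; the name ORDER_RE and the choice of samples are illustrative only.

```python
import re

ORDER_RE = re.compile(
    r"(?<!\d)(?!000|666|9)(order\W?number)?\W*(?<!x)"
    r"\d{3}[ .=-]{0,10}(?!000)\d{3}[ .=-]{0,10}(?!000)\d{3}",
    re.IGNORECASE,
)

samples = [
    "ordernumber123-456-789",
    "order number 123456789",
    "ordernumber #123456789",
    "123456789",
    "x123456789",
    "X123-456-789",
]

for s in samples:
    # The last two samples are preceded by x/X and should not match.
    print(f"{s!r}: {bool(ORDER_RE.search(s))}")
```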

Answered By - anubhava

Answer Checked By - Mildred Charles (PHPFixing Admin)

[FIXED] How does regex capture groups work on this particular log statement

November 06, 2022 regex, regex-group No comments

Issue

I have constructed a regex for the log statement below and am trying to add capture groups so that I can assign each group to a variable and print them. I get null values when I add parentheses as capture groups. Is there a way to add capture groups to the regex below?

Log:

2022-02-09 10:00:52,785 EST|2022-02-09 10:00:52.785 CST 48767a165b22 [INFO ] CorrelationId=d0b0005a-56aa-4e23-a00e-7b22bc41d001 ApplicationName=ATSystems [http-nio-8080-exec-8] com.jivasciences.jrx.mo.bundle.app.transform.handler.bundleHandler - ReceivedDate=2022-02-09 10:00:56

from which I want to capture the following as groups:

48767a165b22
INFO
d0b0005a-56aa-4e23-a00e-7b22bc41d001
ATSystems
com.jivasciences.jrx.mo.bundle.app.transform.handler.bundleHandler
2022-02-09 10:00:56

My regex so far:

([0-9]+(-[0-9]+)+) [0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]{1,3})?,[0-9]+ [a-zA-Z]+|[0-9]{4}-[0-9]{2}-[0-9]{2} (([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?(:([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?)+) [a-zA-Z]+ ([0-9]+([a-zA-Z]+[0-9]+)+) \[[^\]]*] CorrelationId=[{]?[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}[}]? ApplicationName=[a-zA-Z]+ \[[^\]]*] [a-zA-Z]+(\.[a-zA-Z]+)+ - ReceivedDate=([0-9]+(-[0-9]+)+) ([0-9]+(:[0-9]+)+)

Solution

Try this:

[0-9 ,.:-]+ [A-Z]{3}\|[0-9 ,.:-]+ [A-Z]{3} ([0-9a-f]+) \[(\w+) *].*?=([0-9a-f-]+).*?=(\w+).*?\] ([\w.]+).*?=([0-9-]+ [0-9:]+)

See live demo.
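
As a sanity check, the pattern can be run against the sample log line in Python. This is a sketch only; the variable names given to the six groups (host_id, level and so on) are assumptions, not names from the question.

```python
import re

LOG_RE = re.compile(
    r"[0-9 ,.:-]+ [A-Z]{3}\|[0-9 ,.:-]+ [A-Z]{3} ([0-9a-f]+) \[(\w+) *]"
    r".*?=([0-9a-f-]+).*?=(\w+).*?\] ([\w.]+).*?=([0-9-]+ [0-9:]+)"
)

line = (
    "2022-02-09 10:00:52,785 EST|2022-02-09 10:00:52.785 CST 48767a165b22 [INFO ] "
    "CorrelationId=d0b0005a-56aa-4e23-a00e-7b22bc41d001 ApplicationName=ATSystems "
    "[http-nio-8080-exec-8] "
    "com.jivasciences.jrx.mo.bundle.app.transform.handler.bundleHandler - "
    "ReceivedDate=2022-02-09 10:00:56"
)

match = LOG_RE.search(line)
if match:
    # Hypothetical labels for the six capture groups, in order.
    host_id, level, correlation_id, app_name, logger, received = match.groups()
    for value in match.groups():
        print(value)
```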

[FIXED] Why does this regex execution not return the match

November 06, 2022 javascript, regex, regex-group No comments

Issue

My regex is:

let a = new RegExp("(?:https?:)?\/\/(?:www\.)?(?:facebook|fb)\.com\/(?<profile>(?![A-z]+\.php)(?!marketplace|gaming|watch|me|messages|help|search|groups)[\w.\-]+)\/?", "g")

It's basically a modification of the one seen here for facebook to extract the username from a facebook url.

My test string is https://facebook.com/peterparker and my code is:

a.exec("https://facebook.com/peterparker")

When I try this in RegExr, it works fine. It shows the correct group captured (peterparker).

Yet, when I try the same code in Google Chrome's console, the code returns null:

Why doesn't it show up in the chrome console?

Solution

Since you're creating your regex from a string, you have to escape your backslashes. In a string literal, \w becomes a plain w, so the character class [\w.\-] turns into [w.-], which cannot match peterparker.

let a = new RegExp("(?:https?:)?\\/\\/(?:www\\.)?(?:facebook|fb)\\.com\\/(?<profile>(?![A-z]+\\.php)(?!marketplace|gaming|watch|me|messages|help|search|groups)[\\w.\\-]+)\\/?", "g")
console.log(a.exec("https://facebook.com/peterparker"))

Creating it inline does not have this problem.

let a = /(?:https?:)?\/\/(?:www\.)?(?:facebook|fb)\.com\/(?<profile>(?![A-z]+\.php)(?!marketplace|gaming|watch|me|messages|help|search|groups)[\w.\-]+)\/?/g
console.log(a.exec("https://facebook.com/peterparker"))

Answered By - Liftoff

Answer Checked By - David Goodson (PHPFixing Volunteer)

[FIXED] How can I modify my regex so that it includes 1171

November 06, 2022 regex, regex-group No comments

Issue

https://regex101.com/r/QTdaAT/1

My current regex matches all numbers that have 117 except for 1171. I am trying to modify the regex so that it includes 1171 1711 7111. I included a link that provides examples of the matches that are made and missed with the regex I am using. Any help will be greatly appreciated.

"\b(?=[02-9]1[02-9]1[02-9]\b)(?=\d{3})\d7\d*\b"

example:

matches 1172, 1173 1174

Needs to include 1171.

Solution

To match all numbers that contain at least two 1s and at least one 7, this simplified pattern works:

\b(?=\d*1\d*1)(?=\d*7)\d+\b

The first lookahead (?=\d*1\d*1) checks for two 1 digits.
The second lookahead (?=\d*7) checks for a 7 digit.
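
The two lookaheads can be checked with a short Python sketch. The sample string is made up for illustration; numbers with fewer than two 1s or without a 7 should be skipped.

```python
import re

PATTERN = re.compile(r"\b(?=\d*1\d*1)(?=\d*7)\d+\b")

sample = "1171 1711 7111 1172 117 171 1234 17"
print(PATTERN.findall(sample))
# ['1171', '1711', '7111', '1172', '117', '171']
```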

Answered By - LukStorms

Answer Checked By - Senaida (PHPFixing Volunteer)

[FIXED] Why do I get the first capture group only?

November 06, 2022 perl, regex, regex-group No comments

Issue

(https://stackoverflow.com/a/2304626/6607497 and https://stackoverflow.com/a/37004214/6607497 did not help me)

Analyzing a problem with /proc/stat in Linux I started to write a small utility, but I can't get the capture groups the way I wanted. Here is the code:

#!/usr/bin/perl
use strict;
use warnings;

if (open(my $fh, '<', my $file = '/proc/stat')) {
    while (<$fh>) {
        if (my ($cpu, @vals) = /^cpu(\d*)(?:\s+(\d+))+$/) {
            print "$cpu $#vals\n";
        }
    }
    close($fh);
} else {
    die "$file: $!\n";
}

For example with these input lines I get the output:

> cat /proc/stat
cpu  2709779 13999 551920 11622773 135610 0 194680 0 0 0
cpu0 677679 3082 124900 11507188 134042 0 164081 0 0 0
cpu1 775182 3866 147044 38910 135 0 15026 0 0 0
cpu2 704411 3024 143057 37674 1272 0 8403 0 0 0
cpu3 552506 4025 136918 38999 160 0 7169 0 0 0
intr 176332106  ...

So the match actually works, but I don't get the capture groups into @vals (perls 5.18.2 and 5.26.1).

Solution

Replacing

    while (<$fh>) {
        if (my ($cpu, @vals) = /^cpu(\d*)(?:\s+(\d+))+$/) {

with

    while (<$fh>) {
        my @vals;
        if (my ($cpu) = /^cpu(\d*)(?:\s+(\d+)(?{ push(@vals, $^N) }))+$/) {

does what I wanted (requires perl 5.8 or newer).
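
The underlying behaviour is that a repeated capture group keeps only the text of its last iteration. The Python sketch below shows that effect and one alternative that needs no embedded code: capture the whole run of numbers once and split it afterwards. The function name parse_cpu_line is hypothetical.

```python
import re

# A repeated group only remembers its final iteration.
last_only = re.match(r"cpu(\d*)(?:\s+(\d+))+$", "cpu0 1 2 3")
print(last_only.groups())  # ('0', '3')


def parse_cpu_line(line: str):
    # Capture the whole run of numbers once, then split it.
    m = re.match(r"cpu(\d*)((?:\s+\d+)+)\s*$", line)
    if not m:
        return None
    return m.group(1), [int(v) for v in m.group(2).split()]


print(parse_cpu_line("cpu1 775182 3866 147044 38910 135 0 15026 0 0 0"))
```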

Answered By - U. Windl

Answer Checked By - Terry (PHPFixing Volunteer)

[FIXED] How do I replace the last character of the selected regex?

November 06, 2022 javascript, regex, regex-group No comments

Issue

I want this string {Rotation:[45f,90f],lvl:10s} to turn into {Rotation:[45,90],lvl:10}.

I've tried this:

const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d)\w+/g
console.log(bar.replace(regex, '$&'.substring(0, -1)))

I've also tried to just select the letter at the end using $ but I can't seem to get it right.

Solution

You can use

bar.replace(/(\d+)[a-z]\b/gi, '$1')

See the regex demo. Here,

(\d+) - captures one or more digits into Group 1
[a-z] - matches any letter
\b - at the word boundary, ie. at the end of the word
gi - all occurrences, case insensitive

The replacement is Group 1 value, $1.

See the JavaScript demo:

const bar = `{Rotation:[45f,90f],lvl:10s}`
const regex = /(\d+)[a-z]\b/gi
console.log(bar.replace(regex, '$1'))
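
For readers working in Python, the same pattern carries over directly to re.sub. This is a sketch of the equivalent call, with \1 as the group reference instead of $1.

```python
import re

bar = "{Rotation:[45f,90f],lvl:10s}"
# Keep the digits (group 1) and drop the single trailing letter.
cleaned = re.sub(r"(\d+)[a-z]\b", r"\1", bar, flags=re.IGNORECASE)
print(cleaned)
# {Rotation:[45,90],lvl:10}
```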

Answered By - Wiktor Stribiżew

Answer Checked By - Candace Johnson (PHPFixing Volunteer)

[FIXED] How to write a regex capture group which matches a character 3 or 4 times before a delimiter?

November 06, 2022 java, regex, regex-group No comments

Issue

I'm trying to write a regex that splits elements out according to a delimiter. The regex also needs to ensure there are ideally 4, but at least 3 colons : in each match.

Here's an example string:

"Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, Chess, misc:White:Coke:Florida:A, :::U"

From this, there should be 4 matches:

Checkers, etc:Blue::C
Backgammon, I say:Green::Pepsi:P
Chess, misc:White:Coke:Florida:A
:::U

Here's what I've tried so far:

([^:]*:[^:]*){3,4}(?:, )

Regex 101 at: https://regex101.com/r/O8iacP/8

I tried setting up a non-capturing group for ,

Then I tried matching a group of any character that's not a :, a :, and any character that's not a : 3 or 4 times.

The code I'm using to iterate over these groups is:

String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "([^:]*:[^:]*){3,4}(?:, )";

  // Create a Pattern object
  Pattern r = Pattern.compile(pattern);

  // Now create matcher object.
  Matcher matcher = r.matcher(line);
  while (matcher.find()) {
        System.out.println(matcher.group(1));
    }

Any help is appreciated!

Edit

Using @Casimir's regex, it's working. I had to change the above code to use group(0) like this:

String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}

Now prints:

Checkers, etc:Blue::C
Backgammon, I say::Pepsi:P
Chess:White:Coke:Florida:A
:::U

Thanks again!

Solution

I suggest this pattern:

(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])

The negative lookaheads keep leading and trailing delimiters out of the match. The second one in particular forces the match to be followed by the delimiter or by the end of the string (that is, not followed by a character that isn't a comma).

demo

Note that the pattern doesn't have capture groups, so the result is the whole match (or group 0).
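
The pattern is not Java-specific. As an illustration, here is a Python sketch that applies it to the example string from the question; the backslashes are single because a raw string is used.

```python
import re

PATTERN = re.compile(r"(?![\s,])(?:[^:]*:){3}\S*(?![^,])")

line = (
    "Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, "
    "Chess, misc:White:Coke:Florida:A, :::U"
)

# No capture groups, so findall returns the whole matches.
for item in PATTERN.findall(line):
    print(item)
```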

Answered By - Casimir et Hippolyte

Answer Checked By - Katrina (PHPFixing Volunteer)

[FIXED] What does the expression : Select `(column1|column2|column3)?+.+` from Table in SQL means?

November 06, 2022 apache-spark-sql, pyspark, regex-group, sql No comments

Issue

I am trying to convert a SQL Code into Pyspark SQL. While selecting the columns from a table , the Select Statement has something as below :

Select a.`(column1|column2|column3)?+.+`,trim(column c)  from Table a;

I would like to understand what

a.`(column1|column2|column3)?+.+`

expression resolves to and what it actually implies? How to address this while converting the sql into pyspark?

Solution

That is a way of selecting columns by regular expression. The regex matches every column name except column1, column2 and column3, so the query returns all columns of a other than those three.

It is the Spark's equivalent of the Hive's Quoted Identifiers. See also Spark's documentation.

Be aware that, for enabling this behavior, it is first necessary to run the following command:

spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true").show(false)

Answered By - horcrux

Answer Checked By - Candace Johnson (PHPFixing Volunteer)

[FIXED] How to select only numbers/digits from a given string and skip text using python regex?

November 06, 2022 pandas, python, regex, regex-group No comments

Issue

Given Strings:

57 years, 67 daysApr 30, 1789

61 years, 125 daysMar 4, 1797

57 years, 325 daysMar 4, 1801

57 years, 353 daysMar 4, 1809

58 years, 310 daysMar 4, 1817

In regex101:

Pattern = (?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})

Output: Output of Regex Pattern

In Python (Jupyter Notebook), the same pattern yields only NaN values in the DataFrame. How can I solve this?

Solution

FYI, your code ran perfectly for me, maybe you have some whitespace issues in your dataframe:

import pandas as pd
import numpy as np

from io import StringIO

st = StringIO("""57 years, 67 daysApr 30, 1789

61 years, 125 daysMar 4, 1797

57 years, 325 daysMar 4, 1801

57 years, 353 daysMar 4, 1809

58 years, 310 daysMar 4, 1817""")

df = pd.read_csv(st, sep=r'\s\s\s+', header=None, engine='python')

Pattern = r'(?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})'

df[0].str.extract(Pattern)

Output:

  Years Days   Month  Year
0    57   67  Apr 30  1789
1    61  125   Mar 4  1797
2    57  325   Mar 4  1801
3    57  353   Mar 4  1809
4    58  310   Mar 4  1817

Answered By - Scott Boston

Answer Checked By - Dawn Plyler (PHPFixing Volunteer)

[FIXED] How to get all occurances of group with regex in Python?

November 06, 2022 python, regex, regex-group No comments

Issue

From code:

myStr = "one two four five"
myPattern = r'\w*\s(two\s|three\s|four\s)*\w*'
matched = re.search(myPattern, myStr)

if matched:
    res = matched.group(1)
    print(res)

I get "four ", but I want to get ["two ", "four "]

How can I do it?

Solution

As all your surrounding word characters are optional, you can use re.findall, assert a whitespace char to the left, and match the trailing one:

(?<=\s)(?:two|three|four)\s

Regex demo

import re

myStr = "one two four five"
pattern = r"(?<=\s)(?:two|three|four)\s"

print(re.findall(pattern, myStr))

Output

['two ', 'four ']

Answered By - The fourth bird

Answer Checked By - Clifford M. (PHPFixing Volunteer)

[FIXED] How do I create a regex that continues to match only if there is a comma or " and " after the last one?

November 06, 2022 python, python-re, regex, regex-group, string No comments

Issue

What this code does is extract a verb and the information that follows after it. Then create a .txt file with the name of the verb and write the information inside.

I have to run to win the race

import re, os

regex_patron_01 = r"\s*\¿?(?:have to|haveto|must to|mustto)\s*((?:\w\s*)+)\?"
n = re.search(regex_patron_01, input_text_to_check, re.IGNORECASE)

if n:
    word, = n.groups()
    try:
        word = word.strip()
    except AttributeError:
        print("no verb specified!!!")

    regex_patron_01 = r"\s*((?:\w+)?) \s*((?:\w\s*)+)\s*\??"
    n = re.search(regex_patron_01, word, re.IGNORECASE)
    if n:
        #This will have to be repeated for all the verbs that are present in the sentence.
        verb, order_to_remember = n.groups()
        verb = verb.strip()
        order_to_remember = order_to_remember.strip()

        target_file = target_file + verb + ".txt"
    
        with open(target_file, 'w') as f:
            f.write(order_to_remember)

This creates "run.txt" and writes into it: "to win the race"

but now I need that in addition to that, the regex can be extended to the possibility that there is more than one verb, for example

I have to run, jump and hurry to win the race

In that case it should create 3 files, one named "run.txt", another named "jump.txt", and another named "hurry.txt", and write the line "to win the race" in each of them.

The problem I'm having is how to make it repeat the process whenever a comma (,) or an "and" follows a verb.

Other example:

I have to dance and sing more to be a pop star

And make 2 files, "dance.txt" and "sing.txt", and both with the line "more to be a pop star"

Solution

I simplified the search and conditions somewhat and did this:

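import re

def fun(x):
    match = re.search(r"(?<=have to) ([\w\s,]+) (to [\w\s]+)", x)
    if match:
        # Split on commas or the whole word "and", dropping surrounding whitespace
        for i in re.split(r"\s*(?:,|\band\b)\s*", match[1]):
            if i:
                with open(f'{i}.txt', 'w') as file:
                    file.write(match[2])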

If there is a match, the function will create one or more 'txt' files, with the caught verbs as its names. If there is no match - it'll do nothing.

The regex I used is looking for two groups. The first must be preceded by "have to" and may contain words and whitespaces separated by comma or "and". The second group should start with "to " and can contain only words and whitespaces.

match[0] is a whole match
match[1] is the first group
match[2] is the second group

The 'for' loop iterates through the list obtained by separating the first group using comma and 'and' as separators. At each iteration a file with the name from this list is created.

Answered By - Иван Балван

Answer Checked By - Robin (PHPFixing Admin)

[FIXED] How To Capture Positive Lookahead

November 06, 2022 regex, regex-group, regex-lookarounds No comments

Issue

I am trying to figure out how to capture the positive lookahead group in the following regex:

(((Initial commit)|(Merge [^\r\n]+)|(((build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test|BREAKING CHANGE)(\(\w+\))?!?: ([\w ]+))(\r|\n|\r\n){0,2}((?:\w|\s|\r|\n|\r\n)+)(?=(((\r|\n|\r\n){2}([\w-]+): (\w+))|$)))))

My sample dataset I am trying to match with is as follows:

build(Breaking): la asdf asdf asdf

asdfasdf asdf asdf
asdf
asdf
asdf

asdf
asdf

asdf

aef asdf asdf

build(Breaking): la asdf asdf asdf

asdfasdf asdf asdf
asdf
asdf
asdf

asdf
asdf

asdf

aef asdf asdf

asdf-asdf: asdf

I successfully capture all fields preceding the positive lookahead of asdf-asdf: asdf, whether or not it is there. But even when the positive lookahead finds asdf-asdf: asdf, the overall match doesn't seem to include it.

What should I be doing in order to accomplish this goal, or what am I doing wrong?

Solution

Your regex is very long, but the problem is essentially that the text checked by the positive lookahead is not part of the match, because lookaheads are zero-width: they assert that something follows without consuming it. A simpler example is bad (?=tea), which matches only bad (with the trailing space) in bad tea, not the whole string. However, bad (?=(tea))\1 does match the entire string, because the backreference \1 consumes the text that the lookahead captured. The corrected regex is

(((Initial commit)|(Merge [^\r\n]+)|(((build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test|BREAKING CHANGE)(\(\w+\))?!?: ([\w ]+))(\r|\n|\r\n){0,2}((?:\w|\s|\r|\n|\r\n)+)(?=(((\r|\n|\r\n){2}([\w-]+): (\w+))|$))\12)))

You simply add \12 (or just replicate whatever string is inside the positive lookahead) after the lookahead itself.
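
The bad tea example can be verified with a few lines of Python. This is a sketch of the zero-width behaviour only, not of the full commit-message regex.

```python
import re

text = "bad tea"

# The lookahead asserts "tea" follows but does not consume it.
without_backref = re.search(r"bad (?=tea)", text)
# The backreference consumes what the lookahead's group captured.
with_backref = re.search(r"bad (?=(tea))\1", text)

print(repr(without_backref.group(0)))  # 'bad '
print(repr(with_backref.group(0)))     # 'bad tea'
print(with_backref.group(1))           # tea
```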

Answered By - Subienay Ganesh

Answer Checked By - Mildred Charles (PHPFixing Admin)

[FIXED] How to capture a regex group after the specific word?

November 06, 2022 c#, regex, regex-group No comments

Issue

I am stuck with regex again. I have this example:

 3, 
SomethingThatShouldntMatch: 3, 5
Divisions: 1, 2, 3, 5, 13
Divisions: 33
Divisions: 3, 22
Divisions: 4, 55
Divisions: 10, 31

I want to find lines beginning with "Divisions:" and only in these lines capture the first specified number.

For example, if I put number "3" to search pattern it should return me only lines beginning with "Divisions:" and number 3 somewhere after the word.

Not 13, not 33, not 30. Only 3 or nothing. This pattern is planned to use in Regex.Replace().

The main problem I've ran into is that it won't continue searching if the specified number is not first after the word. "Divisions: 1, 2, 3" won't match but it should.

However, "Divisions: 3, 5" matches. I've tried about a dozen of expressions but I can't figure it out. What am I doing wrong? Why it doesn't search after the first comma?

[\n\r].*Divisions:?(\D3\D)?

https://regex101.com/r/ydqLGB/1

Solution

The pattern [\n\r].*Divisions:?(\D3\D)? does not do what you want because it matches Divisions followed by an optional colon, and then, directly after that, it optionally matches a 3 surrounded by non-digits.

The only string where that matches in the examples is Divisions: 3, as the 3 is directly following and the \D matches the space before it and the comma after it.

Note that \D actually matches a character and the [\n\r] matches a mandatory newline.

If a single line like Divisions: 3 should also match, you might write the pattern as:

^Divisions:[ ,0-9]*\b(3)\b

^ Start of string (or of each line, with the multiline option)
Divisions: Match literally
[ ,0-9]* Match optional spaces, comma or digit 0-9
\b(3)\b Capture a 3 char between word boundaries

See a regex demo

If the colon is optional:

^Divisions:?[ ,0-9]*\b(3)\b
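
To check the word-boundary behaviour against the sample data, here is a Python sketch that filters lines one at a time, so ^ needs no multiline flag. The function name lines_with_division and its number parameter are hypothetical; the question itself targets C# Regex.Replace().

```python
import re


def lines_with_division(text: str, number: int) -> list:
    # \b on both sides rejects 13, 33 and 31 when searching for 3.
    pattern = re.compile(rf"^Divisions:[ ,0-9]*\b({number})\b")
    return [line for line in text.splitlines() if pattern.search(line)]


data = """ 3, 
SomethingThatShouldntMatch: 3, 5
Divisions: 1, 2, 3, 5, 13
Divisions: 33
Divisions: 3, 22
Divisions: 4, 55
Divisions: 10, 31"""

print(lines_with_division(data, 3))
# ['Divisions: 1, 2, 3, 5, 13', 'Divisions: 3, 22']
```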

Answered By - The fourth bird

Answer Checked By - Pedro (PHPFixing Volunteer)

[FIXED] How to build this regex so that it extracts a word that starts with a capital letter if only if it appears after a previous pattern?

November 06, 2022 python, python-3.x, regex, regex-group, string No comments

Issue

I need a regex that extracts all the names (we will consider that they are all the words that start with a capital letter and respect having certain conditions prior to their appearance within the sentence) that are in a sentence. This must be done respecting the pattern that I clarify below, also extracting the content before and after this name, so that it can be printed next to the name that was extracted within that sequence or pattern.

This is the pseudo-regex pattern that I need:

the beginning of the input sentence or (,|;|.|y)

associated_sense_1: "some character string (alphanumeric)" or "nothing"

(con |juntos a |junto a |en compania de )

identified_person: "some word that starts with a capital letter (the name that I must extract)" and it ends when the regex find one or more space

associated_sense_2: "some character string (alphanumeric)" or "nothing"

the end of the input sentence or (,|;|.|y |con |juntos a |junto a |en compania de )

the (,|;|.|y) are just person connectors that are used to build a regex pattern, but they do not provide information beyond indicating the sequence of belonging, then they can be eliminated with a .replace( , "")

And with this regex I need extract this 3 string groups

associated_sense_1

identified_person

associated_sense_2


associated_sense = associated_sense_1 + " " + associated_sense_2

This is the proto-code:

import re

#Example 1
sense = "puede ser peligroso ir solas, quizas sea mejor ir con Adrian y seguro que luego podemos esperar por Melisa, Marcos y Lucy en la parada"
#Example 2
#sense = "Adrian ya esta en la parada; y alli probablemente esten Lucy y May en la parada esperandonos"

person_identify_pattern = r"\s*(con |por |, y |, |,y |y )\s*[A-Z][^A-Z]*"
#person_identify_pattern = r"\s*(con |por |, y |, |,y |y )\s*[^A-Z]*"


for identified_person in re.split(person_identify_pattern, sense):
    identified_person = identified_person.strip()
    if identified_person:
        try:
            print(f"Write '{associated_sense}' to {identified_person}.txt")
        except:
            associated_sense = identified_person

The wrong output I get...

Write 'puede ser peligroso ir solas, quizas sea mejor ir' to con.txt
Write 'puede ser peligroso ir solas, quizas sea mejor ir' to Melisa.txt
Write 'puede ser peligroso ir solas, quizas sea mejor ir' to ,.txt
Write 'puede ser peligroso ir solas, quizas sea mejor ir' to Lucy en la parada.txt

Correct output for example 1:

Write 'quizas sea mejor ir con' to Adrian.txt
Write 'y seguro que luego podemos esperar por en la parada' to Melisa.txt
Write 'y seguro que luego podemos esperar por en la parada' to Marcos.txt
Write 'y seguro que luego podemos esperar por en la parada' to Lucy.txt

Correct output for example 2:

Write 'ya esta en la parada' to Adrian.txt
Write 'alli probablemente esten en la parada esperandonos' to Lucy.txt
Write 'alli probablemente esten en la parada esperandonos' to May.txt

I was trying with this other regex but I still have problems with this code:

import re

sense = "puede ser peligroso ir solas, quizas sea mejor ir con Adrian y seguro que luego podemos esperar por Melisa, Marcos y Lucy en la parada"

person_identify_pattern = r"\s*(?:,|;|.|y |con |juntos a |junto a |en compania de |)\s*((?:\w\s*)+)\s*(?<=con|por|a, | y )\s*([A-Z].*?\b)\s*((?:\w\s*)+)\s*(?:,|;|.|y |con |juntos a |junto a |en compania de )\s*"

for m in re.split(person_identify_pattern, sense):
    m = m.strip()
    if m:
        try:
            print(f"Write '{content}' to {m}.txt")
        except:
            content = m

But I keep getting this wrong output

Write 'puede ser peligroso ir solas' to quizas sea mejor ir con Adrian y seguro que luego podemos esperar por.txt
Write 'puede ser peligroso ir solas' to Melisa,.txt
Write 'puede ser peligroso ir solas' to Marcos y Lucy en la parad.txt

Solution

import re

sense = "puede ser peligroso ir solas, quizas sea mejor ir con Adrian y seguro que luego podemos esperar por Melisa, Marcos y Lucy en la parada"
if match := re.findall(r"(?<=con|por|a, | y )\s*([A-Z].*?\b)", sense):
    print(match)

Result: ['Adrian', 'Melisa', 'Marcos', 'Lucy']

Answered By - RedApple

Answer Checked By - Terry (PHPFixing Volunteer)

[FIXED] How do I group every each 'HELLO THERE WORLD' lines?

November 06, 2022 python, regex, regex-group No comments

Issue

I want to capture the 'HELLO THERE WORLD' lines, but only those between the start and end lines. However, my regex captures just the last line.

regex: start\n(((\w+) (.+) (.+))\n)+end

examples:

abcd 123 123
start
abcd 123 123
abcd 123 123
abcd 123 123
end
abcd 123 123

In the example, I want all the text between start and end to be captured in 3 groups per line (group1=abcd, group2=123, group3=123).

Solution

(?s)(?!.*?start)^(\w+)\s(\w+)\s(\w+)(?=.*?end)

https://regex101.com/r/fDcMJd/1
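The original attempt returns only the last line because a repeated capturing group keeps just its final iteration. Below is a minimal Python sketch of two ways around that, using the sample text from the question. It assumes the answer's pattern is run with `re.MULTILINE` so that `^` matches at every line start. The two-step alternative and the names `block_re`, `line_re` and `rows` are illustrative and do not come from the original answer.

```python
import re

text = """abcd 123 123
start
abcd 123 123
abcd 123 123
abcd 123 123
end
abcd 123 123"""

# The answer's pattern; re.MULTILINE is assumed so that ^ matches at each line start.
answer_re = r"(?s)(?!.*?start)^(\w+)\s(\w+)\s(\w+)(?=.*?end)"
single_pass = re.findall(answer_re, text, re.MULTILINE)
print(single_pass)

# Alternative: first isolate each start...end block, then split its lines into groups.
block_re = re.compile(r"^start\n(.*?)^end$", re.MULTILINE | re.DOTALL)
line_re = re.compile(r"^(\w+) (\S+) (\S+)$", re.MULTILINE)

rows = [line_re.findall(block) for block in block_re.findall(text)]
print(rows)
```

The lookahead-based pattern relies on there being no later "start" in the text, so the two-step version may be the safer choice when the input contains more than one start/end block.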

Answered By - ZygD

Answer Checked By - Marie Seifert (PHPFixing Admin)

[FIXED] How to generalize this regex so that it starts capturing substrings at the beginning of a string or if it is followed by some other word?

November 06, 2022 python, python-3.x, regex, regex-group, string No comments

Issue

import re

name = "John"

#In these examples it works fine
input_sense_aux = "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer"
#input_sense_aux = "Do you know if John with the others could come this afternoon?"

#In these examples it does not work well
#input_sense_aux = "John can help us, otherwise it will be waiting for a while longer"
#input_sense_aux = "Can you help us, otherwise it will be waiting for a while longer for John"
#input_sense_aux = "sorry! can you help us? otherwise it will be waiting for a while longer for John"



regex_patron_m1 = r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # check whether the regex matches before entering the block
if m1:
    something_1, something_2 = m1.groups()

    something_1 = something_1.strip()
    something_2 = something_2.strip()

    print(repr(something_1))
    print(repr(something_2))

I need the regex to grab the content before "John" like this:

(start of sentence|¿|¡|,|;|:|(|[|.) \s* "content for something_1" \s* John

And then:

John \s* "content for something_2" \s* (end of sentence|?|!|,|;|:|)|]|.)

In the first examples, the regex works fine:

'these teams are too many but I know that'
'can help us'

'Do you know if'
'with the others could come this afternoon'

For the last 3 examples, however, the regex does not return anything.

I need help generalizing my regex so that it covers all these cases while still respecting the conditions under which it must extract the content of something_1 and something_2.

For the 3 last examples, the expected results are:

''
' can help us'

' otherwise it will be waiting for a while longer for '
''

' otherwise it will be waiting for a while longer for '
''

Solution

You can use

import re

name = "John"

input_sense_auxs = [
    "These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer",
    "These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer",
    "Do you know if John with the others could come this afternoon?",

    "John can help us, otherwise it will be waiting for a while longer",
    "Can you help us, otherwise it will be waiting for a while longer for John",
    "sorry! can you help us? otherwise it will be waiting for a while longer for John"]

regex_patron_m1 = fr'(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)?{name}(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).])'
# r"\s*((?:\w\s*)+)\s*?" + name + r"\s*((?:\w\s*)+)\s*\??"
for input_sense_aux in input_sense_auxs:
    print(f'--- {input_sense_aux} ---')
    m1 = re.search(regex_patron_m1, input_sense_aux, re.IGNORECASE)  # check whether the regex matches before entering the block
    if m1:
        something_1, something_2 = m1.groups()

        something_1 = something_1.strip() if something_1 else ""
        something_2 = something_2.strip() if something_2 else ""

        print(repr(something_1))
        print(repr(something_2))

Output:

--- These sound system are too many, I think John can help us, otherwise it will be waiting for a while longer ---
'I think'
'can help us'
--- These sound system are too many but I know that John can help us, otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- These sound system are too many but I know that John can help us. otherwise it will be waiting for a while longer ---
'These sound system are too many but I know that'
'can help us'
--- Do you know if John with the others could come this afternoon? ---
'Do you know if'
'with the others could come this afternoon'
--- John can help us, otherwise it will be waiting for a while longer ---
''
'can help us'
--- Can you help us, otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''
--- sorry! can you help us? otherwise it will be waiting for a while longer for John ---
'otherwise it will be waiting for a while longer for'
''

See the Python demo.

Details:

(?:^|[?!¿¡,;:([.])\s*(?:(\w+(?:\s+\w+)*)\s*)? - the prefix, the left-hand side part, that matches
- (?:^|[?!¿¡,;:([.]) - either start of string or a char from the ?!¿¡,;:([. set
- \s* - zero or more whitespaces
- (?:(\w+(?:\s+\w+)*)\s*)? - an optional occurrence of
  - (\w+(?:\s+\w+)*) - Group 1: one or more word chars and then zero or more sequences of one or more whitespaces and one or more word chars
  - \s* - zero or more whitespaces
{name} - the name (here, John)
(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[]?!,;:).]) - the right-hand part:
- (?:\s*(\w+(?:\s+\w+)*))? - an optional occurrence of
  - \s* - zero or more whitespaces
  - (\w+(?:\s+\w+)*) - Group 2: one or more word chars and then zero or more occurrences of one or more whitespaces followed by one or more word chars
- \s* - zero or more whitespaces
- (?:$|[]?!,;:).]) - end of string or a char from the ]?!,;:). set.

See the regex demo.
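If the name comes from user input, it may contain regex metacharacters (for example the dots in "J. R."). A minimal sketch of the same pattern wrapped in a helper that passes the name through `re.escape` follows. The function name `split_around_name` and the constants `LEFT` and `RIGHT` are hypothetical, and the brackets inside the character classes are escaped here purely for readability.

```python
import re

# Left-hand part: start of string or an opening punctuation char, then optional Group 1.
LEFT = r"(?:^|[?!¿¡,;:(\[.])\s*(?:(\w+(?:\s+\w+)*)\s*)?"
# Right-hand part: optional Group 2, then end of string or a closing punctuation char.
RIGHT = r"(?:\s*(\w+(?:\s+\w+)*))?\s*(?:$|[\]?!,;:).])"

def split_around_name(sentence, name):
    """Return (text_before, text_after) around name, or None if there is no match."""
    pattern = LEFT + re.escape(name) + RIGHT  # escape the name so it is matched literally
    m = re.search(pattern, sentence, re.IGNORECASE)
    if m is None:
        return None
    before, after = m.groups()
    # Unmatched optional groups come back as None, so fall back to "".
    return (before or "").strip(), (after or "").strip()

print(split_around_name("Do you know if John with the others could come this afternoon?", "John"))
print(split_around_name("I think J. R. can help us", "J. R."))
```

Returning None when nothing matches lets the caller tell "no match" apart from a match in which both sides are empty.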

Answered By - Wiktor Stribiżew

Answer Checked By - Katrina (PHPFixing Volunteer)
