Issue
I am trying to figure out how to remove a word if it contains a number. For example, I have the sentence "Lorum ipsum: 7-Dt Dolor Sit, Amet 8-AM. Consectetur adipiscing". I want to remove "7-Dt" and "8-AM".
I have tried:
$str = 'Lorum ipsum: 7-Dt Dolor Sit, Amet 8-AM. Consectetur adipiscing';
$arr = preg_replace("/[^a-zA-Z\']/"," ",$str);
echo($arr);
// Outputs: Lorum ipsum Dt Dolor Sit Amet AM Consectetur adipiscing
With my solution, only the numbers are removed, not the letters attached to them.
Preferably, I would like to wrap this in a function so I can reuse it.
Solution
To remove words that consist of letters, digits or underscores and contain a digit, you may use
preg_replace("/[^\W\d]*\d\w*/", " ",$str)
To remove chunks of non-whitespace characters that contain a digit, use
preg_replace("/[^\s\d]*\d\S*/", " ",$str)
If a chunk should be removed only when a digit is mixed with other chars, use
preg_replace("/(?:[^\W\d]+\d|\d+[^\W\d])\w*/", " ",$str)
preg_replace("/(?:[^\s\d]+\d|\d+[^\s\d])\S*/", " ",$str)
In your concrete case, since you also want to keep the trailing punctuation, you may use
preg_replace("/(?:[^\s\d]+\d|\d+[^\s\d])\S*\b/", " ",$str)
See the PHP demo. The \b word boundary requires the last char matched by \S* to be a word char. In some cases, you may also need to make sure there is no word char after it; then replace \b with \b(?!\w).
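Since the question asks for a reusable function, the last pattern can be wrapped as in the sketch below. The function name removeWordsWithDigits is hypothetical, and the whitespace clean-up step is an assumption that is not part of the original answer.

```php
<?php
// Hypothetical helper name; the first pattern is the one from the answer above.
function removeWordsWithDigits(string $str): string
{
    // Remove chunks where a digit is mixed with other non-whitespace chars,
    // ending at the last word char so trailing punctuation is kept.
    $result = preg_replace('/(?:[^\s\d]+\d|\d+[^\s\d])\S*\b/', ' ', $str);

    // preg_replace() returns null on a regex error; fall back to the input.
    $result = $result ?? $str;

    // Assumption: collapse the runs of spaces left behind by the removal.
    return trim(preg_replace('/\s+/', ' ', $result) ?? $result);
}

echo removeWordsWithDigits('Lorum ipsum: 7-Dt Dolor Sit, Amet 8-AM. Consectetur adipiscing');
// Outputs: Lorum ipsum: Dolor Sit, Amet . Consectetur adipiscing
```

Note that the period after "8-AM" is kept, so a space remains in front of it. If that is unwanted, an extra clean-up step would be needed.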
Pattern details
[^\s\d]* - zero or more chars other than whitespace and digits
\d - a digit
\S* - 0 or more non-whitespace chars
[^\W\d]*\d\w* - matches 0 or more chars other than non-word and digit chars, then a digit, and then 0+ word chars ("word" char is a letter, digit or _)
(?:[^\s\d]+\d|\d+[^\s\d]) - matches either of the two alternatives:
    [^\s\d]+\d - 1+ chars other than a whitespace and digit and then a digit
    | - or
    \d+[^\s\d] - 1+ digits and then a char other than a whitespace and digit
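To see how the patterns differ, the sketch below lists what each one matches. The sample string and the helper name listMatches are hypothetical: the string adds a bare number ("42") and a letter+digit word ("x9") to the kind of input used in the question.

```php
<?php
// Hypothetical sample, not taken from the question.
$sample = 'Room 42 x9 7-Dt Dolor, Amet 8-AM.';

// Hypothetical helper: returns every full match of $pattern in $subject.
function listMatches(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

$patterns = [
    '/[^\W\d]*\d\w*/',                      // 42 | x9 | 7 | 8
    '/[^\s\d]*\d\S*/',                      // 42 | x9 | 7-Dt | 8-AM.
    '/(?:[^\W\d]+\d|\d+[^\W\d])\w*/',       // x9
    '/(?:[^\s\d]+\d|\d+[^\s\d])\S*/',       // x9 | 7-Dt | 8-AM.
    '/(?:[^\s\d]+\d|\d+[^\s\d])\S*\b/',     // x9 | 7-Dt | 8-AM
];

foreach ($patterns as $pattern) {
    echo $pattern, ' => ', implode(' | ', listMatches($pattern, $sample)), PHP_EOL;
}
```

The "mixed" variants skip the bare number 42, and only the \b variant leaves the final period out of the match.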
Answered By - Wiktor Stribiżew Answer Checked By - Willingham (PHPFixing Volunteer)