Issue
I have some text from which I need to clean out certain characters. These characters are shown in the pictures I attached to the question. I want to replace them with a space (\x20).
My attempt was to use preg_replace:
$result = preg_replace("/[\xef\x82\xac\x09|\xef\x81\xa1\x09]/", "\x20", $string);
For some inputs this approach works, but for others it doesn't: for example, I had a text with a comma where the pattern matched \x82 and removed it from that text.
How can I write the regex so that it searches for exactly the sequence ef 82 ac 09, or the other one, ef 81 a1 09, rather than for each byte separately (ef, 82, ac, 09)?
Solution
1.) Your character class matches any single one of the 6 different hex bytes, or the pipe character. You probably wanted a group, (?:...|...), to match the different byte sequences as a whole.
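A minimal sketch of the difference, assuming a made-up input that ends with the euro sign "€" (U+20AC, UTF-8 bytes e2 82 ac), which shares the bytes 82 and ac with the first sequence from the question. The character class replaces those individual bytes and corrupts the euro sign, while the alternation group leaves the text alone because the complete sequence never occurs:

```php
<?php
// Made-up sample: "Total: 10 €", with the euro sign written as raw UTF-8 bytes.
$text = "Total: 10 \xe2\x82\xac";

// Character class: matches ANY single byte listed (and a literal "|").
$byClass = preg_replace("/[\xef\x82\xac\x09|\xef\x81\xa1\x09]/", "\x20", $text);

// Non-capturing group with alternation: matches only the complete sequences.
$byGroup = preg_replace("/(?:\xef\x82\xac\x09|\xef\x81\xa1\x09)/", "\x20", $text);

echo bin2hex($byClass), "\n"; // ends in e22020: the euro sign's 82 and ac bytes became spaces
echo bin2hex($byGroup), "\n"; // ends in e282ac: the text is unchanged
```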
2.) Also, the byte sequences do not match the image. It seems you swapped two bytes. The picture shows ef 82 a1 09 and ef 81 ac 09, whereas your attempt uses \xef\x82\xac\x09 | \xef\x81\xa1\x09.
3.) When testing your input sample by dumping each character together with its hex bytes:
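// The question's sample text; the unwanted characters it contained are invisible here.
$str = "de la nouvelle; Fourniture $ Option :";
// Split into UTF-8 characters (skipping the empty pieces at both ends)
// and dump each character with its hex bytes.
foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $v) {
    var_dump($v, bin2hex($v));
    echo "\n";
}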
it turned out that the 09 was superfluous. The characters to be removed are actually ef81ac and ef82a1.
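Each of these is one complete 3-byte UTF-8 character (lead byte 1110xxxx followed by two 10xxxxxx continuation bytes). A small sketch that decodes them by hand shows which code points they are; the helper name decodeUtf8ThreeBytes is hypothetical, for illustration only, and it assumes a valid 3-byte sequence as input:

```php
<?php
// Hypothetical helper: decode one 3-byte UTF-8 sequence to its code point.
// Assumes strlen($seq) === 3 and that the bytes form a valid sequence.
function decodeUtf8ThreeBytes(string $seq): int
{
    return ((ord($seq[0]) & 0x0F) << 12)   // 4 payload bits from the lead byte
         | ((ord($seq[1]) & 0x3F) << 6)    // 6 bits from the first continuation byte
         |  (ord($seq[2]) & 0x3F);         // 6 bits from the second continuation byte
}

foreach (["\xef\x81\xac", "\xef\x82\xa1"] as $seq) {
    printf("%s -> U+%04X\n", bin2hex($seq), decodeUtf8ThreeBytes($seq));
}
// ef81ac -> U+F06C
// ef82a1 -> U+F0A1
```

Both code points lie in the Unicode Private Use Area (U+E000 to U+F8FF), so they have no standard meaning and usually render as boxes or font-specific symbols.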
So the correct regex is (?:\xef\x81\xac|\xef\x82\xa1):
$result = preg_replace("/(?:\xef\x81\xac|\xef\x82\xa1)/", "\x20", $string);
See test at eval.in
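Two alternatives that should give the same result, sketched on a made-up input: str_replace() with an array of search strings avoids regex entirely, and with the /u modifier PCRE works on code points instead of bytes, so a character class over \x{F06C} and \x{F0A1} matches whole characters:

```php
<?php
// Made-up input containing both unwanted characters.
$string = "Option\xef\x81\xacA\xef\x82\xa1B";

// 1) No regex: str_replace() accepts an array of search strings.
$viaStrReplace = str_replace(["\xef\x81\xac", "\xef\x82\xa1"], "\x20", $string);

// 2) UTF-8 mode: the class matches code points, not single bytes.
//    Note: preg_replace() returns null here if $string is not valid UTF-8.
$viaUnicodeClass = preg_replace('/[\x{F06C}\x{F0A1}]/u', "\x20", $string);

var_dump($viaStrReplace === $viaUnicodeClass); // bool(true)
echo $viaStrReplace, "\n";                     // Option A B
```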
Answered By - Jonny 5 Answer Checked By - Gilberto Lyons (PHPFixing Admin)