Issue
I would like to strip cantillation marks from Hebrew strings while keeping the nikkud (vowel points). I found this JS code. How do I do this in PHP?
function stripCantillation(str){
return str.replace(/[\u0591-\u05AF]/g,"").replace("׀", "").replace("׃","").replace("־","");
}
Hebrew text with Cantillation
בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ
Hebrew text without Cantillation
בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ
Solution
This is the PHP-friendly regex pattern that covers all of the Unicode characters in your JavaScript version:
/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]/u
To express these Unicode characters, each four-digit hexadecimal code point is wrapped in curly brackets {} and prefixed with \x. The u flag must follow the closing delimiter so that the pattern and the subject are treated as UTF-8. The character class (the part between the square brackets []) contains a range of characters followed by three individual characters.
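As an aside, the structure of the character class can be checked one code point at a time. The sketch below is only an illustration: the sample list and its labels are my own, and the "\u{...}" string escapes assume PHP 7 or later.

```php
<?php
// Same character class as the pattern above, without a quantifier.
$pattern = '/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]/u';

// Sample code points written with PHP 7+ "\u{...}" escapes (labels are illustrative).
$samples = [
    'U+0591 etnahta (first in range)'      => "\u{0591}",
    'U+05AF masora circle (last in range)' => "\u{05AF}",
    'U+05BE maqaf'                         => "\u{05BE}",
    'U+05C0 paseq'                         => "\u{05C0}",
    'U+05C3 sof pasuq'                     => "\u{05C3}",
    'U+05B0 sheva (nikkud)'                => "\u{05B0}",
    'U+05D0 alef (letter)'                 => "\u{05D0}",
];

foreach ($samples as $label => $char) {
    // preg_match() returns 1 when the single code point falls inside the class.
    $verdict = preg_match($pattern, $char) === 1 ? 'stripped' : 'kept';
    echo "{$label}: {$verdict}\n";
}
```

The first five samples are reported as stripped; the sheva and the alef are kept, which is the point of limiting the class to the accent range plus the three punctuation marks.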
The snippet under "Code" below runs the pattern in PHP and prints output that depends on whether any replacements were made. If you don't need to count the replacements, you can simply reassign the input string to the return value of preg_replace() and omit the fourth and fifth parameters ($limit and $count).
Code:
$inputs = [
'בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָֽרֶץ',
'בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ',
];
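foreach ($inputs as $input) {
    // $limit = -1 means no limit; $count receives the number of matches replaced
    $output = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]+/u', '', $input, -1, $count);
    // Print the before/after strings only when at least one replacement was made
    echo !$count ? "no change" : "Replacement Count: {$count}\nBefore: {$input}\n After: {$output}";
    echo "\n---\n";
}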
Output:
no change
---
Replacement Count: 6
Before: בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ
After: בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָֽרֶץ
---
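One detail about the count shown above: because the pattern in the snippet ends the character class with +, a run of adjacent marks is replaced as a single match, so $count reports runs rather than individual characters. In the sample verse each accent stands alone, so both readings give 6. The sketch below uses an artificial string of my own (two accents in a row) to show the difference; the variable names are hypothetical.

```php
<?php
// Artificial input: alef, two adjacent accents (U+0596, U+05A5), then bet.
$text = "\u{05D0}\u{0596}\u{05A5}\u{05D1}";

// With "+", the two adjacent accents form one match.
$withPlus = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]+/u', '', $text, -1, $countRuns);

// Without "+", every accent is its own match.
$withoutPlus = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]/u', '', $text, -1, $countChars);

echo "With +:    {$countRuns}\n";   // 1
echo "Without +: {$countChars}\n";  // 2
echo $withPlus === $withoutPlus ? "Same result string\n" : "Different result strings\n";
```

Either form produces the same cleaned string; the choice only matters if the count itself is meaningful to you.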
The characters that will be replaced appear in the Unicode Hebrew code chart (the source of the highlighted table in the original answer): http://unicode.org/charts/PDF/U0590.pdf
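If the replacement count is not needed, the simplified call mentioned earlier could be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the function name stripCantillation() is borrowed from the JavaScript in the question, and throwing an exception on failure is my own assumption about how errors should be handled.

```php
<?php
// Hypothetical helper mirroring the JavaScript function from the question.
function stripCantillation(string $text): string
{
    // Range U+0591-U+05AF plus U+05BE, U+05C0 and U+05C3; nikkud is left untouched.
    $result = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]/u', '', $text);

    // preg_replace() returns null on failure, e.g. when $text is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException(
            'preg_replace() failed with error code ' . preg_last_error()
        );
    }

    return $result;
}

// Alef + tsere (nikkud) + merkha (accent) + tav: only the accent is removed.
$clean = stripCantillation("\u{05D0}\u{05B5}\u{05A5}\u{05EA}");
echo $clean === "\u{05D0}\u{05B5}\u{05EA}" ? "accent removed, nikkud kept\n" : "unexpected result\n";
```

Checking for null matters here because the u flag makes preg_replace() reject subjects that are not valid UTF-8 instead of returning them unchanged.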
Answered By - mickmackusa Answer Checked By - Timothy Miller (PHPFixing Admin)