PHPFixing

Sunday, November 20, 2022

[FIXED] How to replace Hebrew Cantillation (not Nikkud) characters from string with PHP?

 November 20, 2022     hebrew, php, preg-replace, regex, unicode

Issue

I would like to strip Cantillation from Hebrew strings, but not Nikkud. I found this JS code. How do I do this in PHP?

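function stripCantillation(str) {
    // replace() with a string argument removes only the first occurrence, so all target
    // characters (U+0591-U+05AF, maqaf ־, paseq ׀, sof pasuq ׃) go in one global class.
    return str.replace(/[\u0591-\u05AF\u05BE\u05C0\u05C3]/g, "");
}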

Hebrew text with Cantillation

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ

Hebrew text without Cantillation

בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ


Solution

This is the PHP-friendly regex pattern that includes all of your Unicode characters:

/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]/u

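To express these Unicode characters in a PHP (PCRE) pattern, each four-digit hexadecimal code point is wrapped in curly braces {} and prefixed with \x. The u modifier must follow the closing delimiter: it puts PCRE into UTF-8 mode, and without it any \x{...} value above FF is rejected. The contents of the character class (between the square brackets []) begin with a range of characters, \x{0591}-\x{05AF}, followed by three individual characters: \x{05BE} (maqaf), \x{05C0} (paseq) and \x{05C3} (sof pasuq).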
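A minimal check of why the u modifier matters. It assumes PHP 7.0 or later for the "\u{...}" double-quoted string escape, which is only used here to build the test character (it is PHP string syntax, separate from the regex's \x{...}):

```php
<?php
// "\u{05C3}" is the UTF-8 string for U+05C3 HEBREW PUNCTUATION SOF PASUQ.
$sofPasuq = "\u{05C3}";

// With the u modifier, \x{05C3} is read as a Unicode code point and matches.
$withU = preg_match('/\x{05C3}/u', $sofPasuq);

// Without u, PCRE rejects \x{...} values above FF when compiling the pattern,
// so preg_match() returns false (the @ suppresses the compilation warning).
$withoutU = @preg_match('/\x{05C3}/', $sofPasuq);

var_dump($withU);    // int(1)
var_dump($withoutU); // bool(false)
```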

The following snippet executes the regex pattern with PHP and displays the output depending on whether any replacements were actually made. If you don't need to count the replacements, you can simply reassign the input string with the return value of preg_replace() and omit the 4th and 5th arguments ($limit and $count).

Code:

$inputs = [
    'בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָֽרֶץ',
    'בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ',
];
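foreach ($inputs as $input) {
    // Strip every run of cantillation marks; $count receives the number of replacements.
    $output = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]+/u', '', $input, -1, $count);
    // Print "no change" when nothing was removed, otherwise show before and after.
    echo !$count ? "no change" : "Replacement Count: {$count}\nBefore: {$input}\n After: {$output}";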
    echo "\n---\n";
}

Output:

no change
---
Replacement Count: 6
Before: בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ
 After: בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָֽרֶץ
---
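As noted above, when the replacement count isn't needed the call reduces to reassigning the string. A minimal sketch wrapping that in a function: the name stripCantillation() is borrowed from the question's JavaScript and is purely illustrative, and throwing when preg_replace() returns null (which it does on failure, e.g. malformed UTF-8 under the u modifier) is an added assumption, not part of the original answer.

```php
<?php
// Illustrative helper, named after the JS function in the question.
function stripCantillation(string $str): string
{
    // Cantillation marks U+0591-U+05AF plus maqaf, paseq and sof pasuq.
    $result = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]+/u', '', $str);
    if ($result === null) {
        // preg_replace() returns null on failure, e.g. malformed UTF-8 with /u.
        throw new InvalidArgumentException('preg_replace() failed, error code ' . preg_last_error());
    }
    return $result;
}

// No count needed: just reassign the string with the return value.
$verse = 'בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים';
$verse = stripCantillation($verse);
echo $verse, "\n";
```

Nikkud such as qamats (U+05B8), dagesh (U+05BC) and meteg (U+05BD) lie outside the class, so they survive.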

The characters that will be replaced are listed in the Unicode Hebrew code chart, which was the source of the answer's highlighted table: http://unicode.org/charts/PDF/U0590.pdf
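To list the covered characters without opening the chart, a small sketch using only core PHP can walk the Hebrew block (U+0590-U+05FF) and record which code points the answer's pattern matches. utf8TwoByte() is a hypothetical helper written for this example; every code point in that block encodes as two UTF-8 bytes, so no mbstring or intl extension is assumed. It should report 34 code points: the 31 in U+0591-U+05AF plus U+05BE, U+05C0 and U+05C3.

```php
<?php
// Hypothetical helper: encodes a code point in U+0080-U+07FF (which covers the
// whole Hebrew block) as its two UTF-8 bytes, 110xxxxx 10xxxxxx.
function utf8TwoByte(int $cp): string
{
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

$pattern = '/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]/u';

// Walk the Hebrew block and keep the code points the pattern matches.
$matched = [];
for ($cp = 0x0590; $cp <= 0x05FF; $cp++) {
    if (preg_match($pattern, utf8TwoByte($cp)) === 1) {
        $matched[] = sprintf('U+%04X', $cp);
    }
}

echo count($matched), " code points\n";
echo implode(' ', $matched), "\n";
```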




Answered By - mickmackusa
Answer Checked By - Timothy Miller (PHPFixing Admin)
Copyright © PHPFixing