PHPFixing

Sunday, July 24, 2022

[FIXED] How to convert unicode characters into corresponding emojis?

Tags: javascript, json, regex, unicode

Issue

I'm doing something similar to this website with my data. My data contains Unicode escapes in the format below, and this code, which turns a string of such escapes into the properly decoded UTF-8 string, works:

function decodeFBEmoji (fbString) {
  // Convert the string to an array of byte values
  const codeArray = (
    fbString  // starts as '\u00f0\u009f\u0098\u00a2'
    .split('')
    .map(char => (
      char.charCodeAt(0)  // convert '\u00f0' to 0xf0
    ))
  );  // result is [0xf0, 0x9f, 0x98, 0xa2]

  // Convert plain JavaScript array to Uint8Array
  const byteArray = Uint8Array.from(codeArray);

  // Decode byte array as a UTF-8 string
  return new TextDecoder('utf-8').decode(byteArray);  // '😢'
}

I am trying to extract the Unicode escapes from a text string and replace them with their decoded form, so that they display as a proper emoji. I tried to use a regex to extract the escape sequences, but the string prints as garbage characters and the regex match returns null. How can I replace the escapes with the emoji without changing the rest of the text?

function replaceEmoji(text){
      let str = "lorem ipsum lorem ipsum \u00e2\u009d\u00a4\u00ef\u00b8\u008f lorem ipsum"; 
      let res = str.match(/[\\]\w+/g); 
      console.log(str);
      console.log(res); //Result is null
}

[Image: console output of the above code]

Edit: [Image: the regex pattern I tested]


Solution

You're trying to decode some UTF-8, but you're mixing up JS string escapes and bytes.

When you type \uXXXX in a string literal, you are writing an escape for a Unicode character (strictly, a UTF-16 code unit), just as \n is an escape for a newline. So, for instance, this is true: "\u0041" == "A"
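A minimal sketch of that point, using the regex from the question. The variable names `escaped` and `literal` are made up for illustration; `String.raw` is used only to build a string that really does contain backslashes, for comparison.

```javascript
// In a string literal, \uXXXX is resolved when the source is parsed,
// so the resulting string holds the character itself, not a backslash.
const escaped = "\u00e2\u009d\u00a4";            // 3 characters

// String.raw keeps the escape text exactly as typed, backslashes included.
const literal = String.raw`\u00e2\u009d\u00a4`;  // 18 characters

console.log("\u0041" === "A");                  // true
console.log(escaped.length);                    // 3
console.log(literal.length);                    // 18
console.log(escaped.match(/[\\]\w+/g));         // null
console.log(literal.match(/[\\]\w+/g).length);  // 3
```

Only the second string contains anything for a backslash-based pattern to find.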

Your regex cannot match anything because the string contains no backslash at all. It's not clear in what form your UTF-8 data arrives, but if it is exactly as you wrote it, then each character of the string holds one UTF-8 byte, and the series of bytes needs to be decoded like so:

const utf8 = new Uint8Array(
    Array.prototype.map.call(
        "lorem ipsum lorem ipsum \u00e2\u009d\u00a4\u00ef\u00b8\u008f lorem ipsum", 
        c => c.charCodeAt(0)
    )
);
console.log(new TextDecoder('utf8').decode(utf8));
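To replace only the emoji and leave the rest of the text untouched, one option is to re-decode just the runs of characters in the range U+0080 to U+00FF. This is a sketch, not part of the original answer: `fixMojibake` is a hypothetical helper name, and it assumes every such run is either a complete UTF-8 byte sequence or genuine Latin-1 text that should be left alone.

```javascript
// Hypothetical helper: re-decodes runs of "byte-like" characters
// (U+0080..U+00FF) as UTF-8 and leaves everything else unchanged.
function fixMojibake(text) {
  // fatal: true makes decode() throw on invalid UTF-8 instead of
  // silently inserting U+FFFD replacement characters.
  const decoder = new TextDecoder('utf-8', { fatal: true });
  return text.replace(/[\u0080-\u00ff]+/g, run => {
    // Each character in the run stands for one byte.
    const bytes = Uint8Array.from(run, ch => ch.charCodeAt(0));
    try {
      return decoder.decode(bytes);
    } catch {
      return run;  // not valid UTF-8, so keep the original characters
    }
  });
}

const input = "lorem ipsum lorem ipsum \u00e2\u009d\u00a4\u00ef\u00b8\u008f lorem ipsum";
console.log(fixMojibake(input));  // lorem ipsum lorem ipsum ❤️ lorem ipsum
```

Unlike converting the whole string to a Uint8Array, this leaves characters above U+00FF intact, since they are never passed through the byte conversion.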



Answered By - Lucero
Answer Checked By - Marie Seifert (PHPFixing Admin)