PHPFixing

Wednesday, November 2, 2022

[FIXED] How to unescape non-usa, en, ASCII type characters using grep?

Tags: escaping, facebook-graph-api, grep, java, unicode

Issue

I am using grep to parse a friend list obtained via the Facebook Graph API. I can mostly do what I want with the following command, issued in bash:

grep -aiPo '"name":"(.*?)","id":"[[:digit:]]*"' friends?blahblah-access-token-stuff

which yields a list which looks like:

"name":"John Day","id":"--id omitted--"
"name":"Andria Cast\u00f1eda","id":"--id omitted--" // let me draw your attention here
"name":"Jane Doe","id":"--id omitted--"

Names were changed above to preserve privacy.

Notice the escaped sequence \u00f1 in the middle entry, which corresponds to ñ (n with a tilde). Is there an easy way to feed such characters into a Java program (my primary intention) so that Java understands that \u00f1 is the Unicode escape for ñ?

I would prefer not to solve this problem by parsing the string in Java and manually unescaping the Unicode. I would very much prefer to instruct grep to handle this situation, or to use another GNU or open source tool that is widely available for bash.

At that point, I would feed the entire input as a file to a Java program without having to worry about OMG, is that a Unicode escape sequence!!? Java would naturally detect the Unicode characters and map them to their corresponding internal representation.

Thanks in advance!


Solution

Java understands Unicode. In Java source code, you write a Unicode escape in the following manner:

String str = "\u00F6";

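So if you write a string literal such as "Andria Cast\u00f1eda" in Java source, the compiler translates the escape and no additional handling is required. Note that this translation happens at compile time: text that Java reads at run time from a file or standard input and that contains the six literal characters \u00f1 is not converted automatically.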

Here's also a very brief, but easy to understand introduction:

Unicode in Java

If you're still not convinced, try this class:

public class UnicodeExample {

    public static void main(String[] args) {
        // Written as an escape; the compiler turns it into the single character U+00F1.
        String escaped = "\u00f1";
        // Written literally; the source file encoding must match what javac expects.
        String unescaped = "ñ";
        System.out.println(escaped);
        System.out.println(unescaped);

        if (escaped.equals(unescaped)) {
            System.out.println("The strings are the same!");
        } else {
            System.out.println("The strings are different!");
        }
    }
}
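
The class above compares two string literals, so the compiler has already processed the escape. For text that arrives at run time, such as the grep output in the question, the backslash, the letter u and the four hex digits reach Java as six separate characters. The following is a minimal sketch of one way to convert them, not a full JSON decoder. The class name UnicodeUnescaper and the method unescape are hypothetical names chosen for illustration, and the sketch assumes the input contains no escaped backslashes (a doubled backslash before the u would be misread as an escape).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescaper {

    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces each matched escape with the UTF-16
    // code unit it encodes. Assumes no escaped backslashes in the input.
    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(input, last, m.start());
            // Parse the four hex digits and append them as one char.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final match.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as literal text,
        // which is how it would look when read from a file.
        String raw = "\"name\":\"Andria Cast\\u00f1eda\"";
        System.out.println(unescape(raw));
    }
}
```

Because each escape becomes one UTF-16 code unit, a character outside the Basic Multilingual Plane that is written as two consecutive escapes ends up as a surrogate pair in the resulting String. For anything beyond a quick filter, a real JSON parser is the more robust choice, since it also handles the other JSON string escapes.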


Answered By - Michael
Answer Checked By - Cary Denson (PHPFixing Admin)