PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Tuesday, August 30, 2022

[FIXED] How can I clean a TSV file having record or fields separators in one of its fields?

 August 30, 2022     csv     No comments   

Issue

Given a TSV file with col2 that contains either a field or record separator (FS/RS) being respectively a tab or a carriage return which are escaped/surrounded by quotes.

$ printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \cat -vet
col1^Icol2^Icol3$
1^I"A^IB"^I1234$
2^I"CD$
EF"^I567$

+------+---------+------+
| col1 | col2    | col3 |
+------+---------+------+
| 1    | "A   B" | 1234 |
| 2    | "CD     | 567  |
|      | EF"     |      |
+------+---------+------+

Is there a way in sed/awk/perl or even (preferably) miller/mlr to transform those pesky characters into spaces in order to generate the following result:

+------+---------+------+
| col1 | col2    | col3 |
+------+---------+------+
| 1    | "A B"   | 1234 |
| 2    | "CD EF" | 567  |
+------+---------+------+

I cannot get miller 6.2 to make the proper transformation (tried with DSL put/gsub) because it doesn't recognize the tab or CR/LF being part of the columns which breaks the field number:

$ printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | mlr --opprint --barred --itsv cat
mlr :  mlr: CSV header/data length mismatch 3 != 4 at filename (stdin) line  2.

Solution

if you run

printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \
mlr --c2t --fs "\t" clean-whitespace
col1    col2    col3
1       A B     1234
2       CD EF   567

I'm using mlr 6.2.

A way to do it in miller 5 is to use simply the put verb:

printf '%b\n' 'col1\tcol2\tcol3' '1\t"A\tB"\t1234' '2\t"CD\nEF"\t567' | \
mlr --tsv  put -S 'for (k in $*) {$[k] = gsub($[k], "\n", " ")}' then clean-whitespace


Answered By - aborruso
Answer Checked By - Dawn Plyler (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing