PHPFixing

Thursday, December 15, 2022

[FIXED] What do $f, "\t", and "\n" mean when linearizing FASTA using awk?

Tags: awk, shell, syntax

Issue

I am trying to linearize a FASTA file using awk. I am totally new to it. I have this script:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}'  < $f | tr "\t" "\n" > ${f/.fasta/_lin.fasta}

I don't understand anything in the < $f | tr "\t" "\n" > ${f/.fasta/_lin.fasta} part. What is $f? What are tr, \t, and \n? Where exactly am I supposed to give the input file? Can someone please elaborate?


Solution

Let's step through that code piece by piece. First, I'll add some white space to make it more legible:

awk '
  /^>/ {
    printf("%s%s\t", (N>0?"\n":""), $0);
    N++;
    next;
  }
  {
    printf("%s",$0);
  }
  END {
    printf("\n");
  }
' < $f \
  | tr "\t" "\n" \
  > ${f/.fasta/_lin.fasta}

Okay. First, $f is a shell variable holding the name of your input file, so you supply the input by setting it before running the command (for example f=myfile.fasta). The code's author expects the name to contain .fasta, presumably at the end, like myfile.fasta. The < operator redirects that file to awk's standard input. It is redundant in this particular case, because awk also accepts the file name as an argument (unless the name contains an equals sign, which awk may interpret as a variable assignment).
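To make the "where does the input go" part concrete, here is a minimal sketch that runs the script from the question end to end. The sample records, the scratch directory, and the name myfile.fasta are made up for illustration, and the ${f/.fasta/_lin.fasta} expansion assumes the shell is bash.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical sample input (two records; the first sequence spans two lines).
printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\n' > "$workdir/myfile.fasta"

# This is where the input file is supplied: the script reads whatever $f names.
f="$workdir/myfile.fasta"

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < "$f" \
  | tr "\t" "\n" > "${f/.fasta/_lin.fasta}"

# Show the linearized result: each header is followed by one sequence line.
cat "$workdir/myfile_lin.fasta"
```

Quoting "$f" and the output name, as done here, keeps the command working when the file name contains spaces.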

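awk then matches lines that start with > (the FASTA headers). On those lines, it prints a newline (if N > 0, that is, for every header except the first) or else nothing, followed by the contents of the line and a tab. It then increments N, and next skips the remaining rules and moves on to the next input line. All other lines (the sequence data) are printed as they're seen, without a trailing newline, which is what joins each record's sequence into a single line. After reading all of the lines of $f, a final newline is printed.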
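One way to see what awk itself produces, before tr runs, is to make the tab visible. The sketch below feeds a made-up two-record input to the awk program from the question and swaps each tab for a | character purely for display.

```shell
# Hypothetical two-record input, fed to the original awk program on stdin.
# tr replaces each tab with '|' only so the tab can be seen in the terminal.
intermediate=$(
  printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\n' |
    awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' |
    tr '\t' '|'
)
printf '%s\n' "$intermediate"
```

Each record comes out as a single line of the form header, tab, joined sequence, so the tab is the only thing left separating the header from its sequence.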

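This awk code is not very legible. It could be rewritten like this, writing the newline after each header directly so that the tab and the tr step are no longer needed:

awk '
  /^>/ && N++ {
    printf "\n";
  }
  /^>/ {
    print;
    next;
  }
  {
    printf "%s", $0;
  }
  END {
    printf "\n";
  }
'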

The only tricky piece here is that N is initially zero, so when you say N++ the first time, it returns the value before incrementing (zero = false) and therefore that condition does not trigger. When you say it the second time, it returns the value before the next incrementing (one = true) and therefore that condition triggers. Anything that is not an empty string or a zero evaluates as true.
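The post-increment behaviour is easy to check in isolation. In this small sketch (the three input lines and the message text are arbitrary), the pattern N++ is false for the first record and true for every later one.

```shell
# N starts out unset, which awk treats as 0 in a numeric context.
# N++ yields the old value and then increments, so the pattern is
# false for the first record and true for every later one.
flagged=$(printf 'a\nb\nc\n' | awk 'N++ { print "record " NR " matched" }')
printf '%s\n' "$flagged"
```

Only records 2 and 3 are reported, which is why the newline is not printed before the very first header.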

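On one line, and more golfed, that could be awk '/^>/&&N++{printf"\n"}/^>/{print;next}{printf"%s",$0}END{printf"\n"}' (header lines are printed whole with print, while sequence lines are appended with printf and no newline, which is what joins them).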

After awk, the output is piped to tr, which translates all tabs (\t) into newlines (\n). The result is then redirected with the > operator into a file named by the bash parameter expansion ${f/.fasta/_lin.fasta}, which replaces the first instance of .fasta in $f with _lin.fasta, so our example input file myfile.fasta yields the output file myfile_lin.fasta.
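Both pieces can be tried on their own. The sketch below assumes bash for the ${var/pattern/replacement} form. The second file name is a made-up edge case showing that only the first .fasta is replaced, and the ${g%.fasta} variant is offered as a possible alternative that strips the suffix instead, not as what the original script does.

```shell
f=myfile.fasta
out=${f/.fasta/_lin.fasta}        # bash: replace the first ".fasta"
printf '%s\n' "$out"              # myfile_lin.fasta

g=my.fasta.copy.fasta             # hypothetical name containing ".fasta" twice
first=${g/.fasta/_lin.fasta}      # only the first match is replaced
suffix=${g%.fasta}_lin.fasta      # strip a trailing ".fasta", then append
printf '%s\n' "$first" "$suffix"

# tr maps every tab to a newline, one character for one character.
split=$(printf 'header\tSEQUENCE\n' | tr "\t" "\n")
printf '%s\n' "$split"
```

For a name like myfile.fasta the two expansions give the same result; they differ only when .fasta also appears earlier in the name.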



Answered By - Adam Katz
Answer Checked By - David Marino (PHPFixing Volunteer)