Issue
The Python code here gets me the output I want. However, I need help with limiting the result to first 20 lines.
Input example is shown below,
gi|170079688|ref|YP_001729008.1| bifunctional riboflavin kinase/FMN adenylyltransferase [Escherichia coli str. K-12 substr. DH10B] MKLIRGIHNLSQAPQEGCVLTIGNFDGVHRGHRALLQGLQEEGRKRNLPVMVMLFEPQPLELFATDKAPA RLTRLREKLRYLAECGVDYVLCVRFDRRFAALTAQNFISDLLVKHLRVKFLAVGDDFRFGAGREGDFLLL QKAGMEYGFDITSTQTFCEGGVRISSTAVRQALADDNLALAESLLGHPFAISGRVVHGDELGRTIGFPTA NVPLRRQVSPVKGVYAVEVLGLGEKPLPGVANIGTRPTVAGIRQQLEVHLLDVAMDLYGRHIQVVLRKKI RNEQRFASLDELKAQIARDELTAREFFGLTKPA gi|170079689|ref|YP_001729009.1| isoleucyl-tRNA synthetase [Escherichia coli str. K-12 substr. DH10B] MSDYKSTLNLPETGFPMRGDLAKREPGMLARWTDDDLYGIIRAAKKGKKTFILHDGPPYANGSIHIGHSV NKILKDIIVKSKGLSGYDSPYVPGWDCHGLPIELKVEQEYGKPGEKFTAAEFRAKCREYAATQVDGQRKD FIRLGVLGDWSHPYLTMDFKTEANIIRALGKIIGNGHLHKGAKPVHWCVDCRSALAEAEVEYYDKTSPSI DVAFQAVDQDALKAKFAVSNVNGPISLVIWTTTPWTLPANRAISIAPDFDYALVQIDGQAVILAKDLVES VMQRIGVTDYTILGTVKGAELELLRFTHPFMGFDVPAILGDHVTLDAGTGAVHTAPGHGPDDYVIGQKYG LETANPVGPDGTYLPGTYPTLDGVNVFKANDIVVALLQEKGALLHVEKMQHSYPCCWRHKTPIIFRATPQ WFVSMDQKGLRAQSLKEIKGVQWIPDWGQARIESMVANRPDWCISRQRTWGVPMSLFVHKDTEELHPRTL ELMEEVAKRVEVDGIQAWWDLDAKEILGDEADQYVKVPDTLDVWFDSGSTHSSVVDVRPEFAGHAADMYL EGSDQHRGWFMSSLMISTAMKGKAPYRQVLTHGFTVDGQGRKMSKSIGNTVSPQDVMNKLGADILRLWVA STDYTGEMAVSDEILKRAADSYRRIRNTARFLLANLNGFDPAKDMVKPEEMVVLDRWAVGCAKAAQEDIL KAYEAYDFHEVVQRLMRFCSVEMGSFYLDIIKDRQYTAKADSVARRSCQTALYHIAEALVRWMAPILSFT ADEVWGYLPGEREKYVFTGEWYEGLFGLADSEAMNDAFWDELLKVRGEVNKVIEQARADKKVGGSLEAAV TLYAEPELSAKLTALGDELRFVLLTSGATVADYNDAPADAQQSEVLKGLKVALSKAEGEKCPRCWHYTQD VGKVAEHAEICGRCVSNVAGDGEKRKFA gi|170079690|ref|YP_001729010.1| lipoprotein signal peptidase [Escherichia coli str. K-12 substr. DH10B] MSQSICSTGLRWLWLVVVVLIIDLGSKYLILQNFALGDTVPLFPSLNLHYARNYGAAFSFLADSGGWQRW FFAGIAIGISVILAVMMYRSKATQKLNNIAYALIIGGALGNLFDRLWHGFVVDMIDFYVGDWHFATFNLA DTAICVGAALIVLEGFLPSRAKKQ
import re
id = None
header = None
seq = ''
a_file = open('e_coli.faa')
for line in a_file:
m = re.match(">(\S+)\s+(.+)", line.rstrip())
if m:
if id is not None:
print("{0} length:{1} {2}".format(id, len(seq),header))
id, header = m.groups()
seq = ''
else:
seq += line.rstrip()
Solution
In the very top, add c = 0
. Then, change
print("{0} length:{1} {2}".format(id, len(seq),header))
to
if c < 10:
print("{0} length:{1} {2}".format(id, len(seq),header))
c += 1
Result with a few adjustments:
import re
id = None
header = None
seq = ''
with open('e_coli.faa') as a_file:
for line in a_file:
m = re.match(">(\S+)\s+(.+)", line.rstrip())
if m:
if id and c < 20:
print("{0} length:{1} {2}".format(id, len(seq),header))
c += 1
id, header = m.groups()
seq = ''
else:
seq += line.rstrip()
To read the first 20 lines of the file.
you can use readlines()
:
Instead of:
for line in a_file:
use:
for line in a_file.readlines()[:20]:
Answered By - Ann Zen Answer Checked By - Gilberto Lyons (PHPFixing Admin)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.