PHPFixing

Monday, August 29, 2022

[FIXED] How to extract data matrix using pandas?

Tags: csv, pandas, python, txt

Issue

I have a CSV file with 6901 rows x 42 columns. 39 of these columns form a data matrix that I would like to analyze. I do not know how to extract this data from pandas as a plain numerical matrix, without the index.

df1=pd.read_csv(fileName, sep='\\t',lineterminator='\\r', engine='python', header='infer')
df1.info()

<bound method DataFrame.info of Protein.IDs ... Ratio.H.L.33
0          A0A024QZP7;P06493;P06493-2;E5RIU6;A0A087WZZ9  ...     47.88100
1                          A0A024QZX5;A0A087X1N8;P35237  ...      0.13615
2                A0A024R0T9;K7ER74;P02655;Q6P163;V9GYJ8  ...          NaN
3     A0A024R4E5;Q00341;Q00341-2;H0Y394;H7C0A4;C9J5E...  ...      5.97650
4      A0A087WZA9;A0A024R4K9;A0A087X266;Q9BXJ8-2;Q9BXJ8  ...          NaN
                                        ...  ...          ...
6896                                             V9GYT7  ...          NaN
6897                                             V9GZ54  ...          NaN
6898  X5CMH5;A0A140T9S0;A0A0G2JLV0;A0A087WYD6;E7ENX8...  ...          NaN
6899                               X6RAL5;H7BZW6;U3KPY7  ...          NaN
6900                                             X6RJP6  ...          NaN

[6901 rows x 42 columns]>

Then I would like to use columns 4 to 42 as an ordinary numerical matrix for computation. Does anyone know how to do it?


Solution

pandas provides everything you need. :) You don't need to convert the data to a NumPy array. By staying with a DataFrame you keep a number of handy pandas methods :)
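As a rough illustration of staying inside pandas, the sketch below builds a tiny made-up frame shaped like the one in the question (three annotation columns, then numeric ratio columns) and works on the numeric part directly. The column names "Gene" and "Note" and all values are hypothetical, and the positional slice with iloc is just one way to pick the numeric block from a frame that is already loaded.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question:
# 3 annotation columns followed by 2 numeric ratio columns.
df1 = pd.DataFrame({
    "Protein.IDs": ["A0A024QZP7;P06493", "A0A024R0T9", "V9GYT7"],
    "Gene": ["CDK1", "APOC2", "n/a"],
    "Note": ["x", "y", "z"],
    "Ratio.H.L.1": [47.881, None, 5.9765],
    "Ratio.H.L.2": [1.5, 0.25, None],
})

# Positional slice: every row, all columns after the first three.
numeric = df1.iloc[:, 3:]

# DataFrame methods skip NaN by default, which suits sparse ratio data.
col_means = numeric.mean()
n_valid = numeric.count()  # number of non-missing values per column
```

Here numeric is still a DataFrame, so column labels stay attached to every result.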

You have a .csv file, which stands for "comma-separated values". The name is historical: nowadays the values may be separated by other characters, which pandas calls separators (sep for short). Common ones are commas, semicolons and tabs.
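To make the role of sep concrete, here is a small sketch that writes the same made-up table with three different separators and reads each version back. The data and the names rows and frames are purely illustrative.

```python
import io
import pandas as pd

# The same tiny table, serialized with three different separators.
rows = [["a", "b"], ["1", "2"], ["3", "4"]]

frames = {}
for sep in [",", ";", "\t"]:
    text = "\n".join(sep.join(row) for row in rows)
    # Passing the matching separator makes every variant parse identically.
    frames[sep] = pd.read_csv(io.StringIO(text), sep=sep)
```

All three frames end up with the same two integer columns, so only the sep argument has to change when the file format does.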

The semicolons in your output sit inside the Protein.IDs values, and the frame already parses into 42 columns with your tab separator, so keep sep as it is. Use sep=';' only if the file itself turns out to be semicolon-delimited.

As I understood it, you want to ignore the first 3 columns. For that, set the usecols (= use columns) argument of pd.read_csv:

usecols=range(3,42)

usecols expects you to say exactly which columns you want to use. Column positions are counted from 0, so columns 4 to 42 are positions 3 to 41. You can pass a range from 3 to 42 (the end value is excluded), or you can pass a list

a=[3,4,5,6,....,41]

which is only really handy if you want to pick specific columns. The Python function range does this tedious job for you.

So your command should look like this: df1=pd.read_csv(fileName, sep='\\t',lineterminator='\\r', engine='python', header='infer',usecols=range(3,42))
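A minimal sketch of the usecols approach, run on a small made-up tab-separated table instead of the real file. The sample text, the names sample, n_cols, matrix and col_means, and the use of the default parser engine are assumptions for illustration. Note that usecols positions are zero-based, so skipping the first three columns means starting at position 3.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: 3 annotation columns, then 2 numeric ones.
sample = (
    "Protein.IDs\tGene\tNote\tRatio.H.L.1\tRatio.H.L.2\n"
    "A0A024QZP7;P06493\tCDK1\tx\t47.881\t1.5\n"
    "A0A024R0T9;K7ER74\tAPOC2\ty\t\t0.25\n"
    "V9GYT7\tNA1\tz\t5.9765\t\n"
)

n_cols = 5  # 42 in the question's file

# Read only positions 3 .. n_cols-1; empty fields become NaN.
df = pd.read_csv(io.StringIO(sample), sep="\t", usecols=range(3, n_cols))

# If a bare numeric matrix is wanted after all, to_numpy() drops index and labels.
matrix = df.to_numpy(dtype=float)

# NaN-aware column means on the plain array.
col_means = np.nanmean(matrix, axis=0)
```

For the file in the question, the same idea would use range(3, 42), which yields the 39 data columns.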

Best regards



Answered By - Jakob
Answer Checked By - Clifford M. (PHPFixing Volunteer)