PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Sunday, August 28, 2022

[FIXED] How to compare two CSV files in Python?

 August 28, 2022     csv, python, python-3.x     No comments   

Issue

I've two CSV file named as file1.csv and file2.csv in file2.csv there is only one column which contain only five records and in file1.csv I've three column which contain more than thousand records I want to get those records which contain in file2.csv for example this is my file1.csv

'A J1, Jhon1',jhon1@jhon.com, A/B-201 Test1
'A J2, Jhon2',jhon2@jhon.com, A/B-202 Test2
'A J3, Jhon3',jhon3@jhon.com, A/B-203 Test3
'A J4, Jhon4',jhon4@jhon.com, A/B-204 Test4
.......and more records

and inside my file2.csv I've only five records right now but in future it can be many

A/B-201 Test1
A/B-2012 Test12
A/B-203 Test3
A/B-2022 Test22

so I've to find records from my file1.csv at index[2] or index[-1]

this is what I did but it not giving me any output it just returning empty list

import csv 

file1 = open('file1.csv','r')
file2 = open('file2.csv','r')

f1 = list(csv.reader(file1))
f2 = list(csv.reader(file2))


new_list = []

for i in f1:
  if i[-1] in f2:
     new_list.append(i)

print('New List : ',new_list)

it gives me output like this

New List :  []

Please help if I did any thing wrong correct me.


Solution

Method 1: pandas

This task can be done with relative ease using pandas. DataFrame documentation here.

Example:

In the example below, the two CSV files are read into two DataFrames. The DataFrames are merged using an inner join on the matching columns.

The output shows the merged result.

import pandas as pd

df1 = pd.read_csv('file1.csv', names=['col1', 'col2', 'col3'], quotechar="'", skipinitialspace=True)
df2 = pd.read_csv('file2.csv', names=['match'])

df = pd.merge(df1, df2, left_on=df1['col3'], right_on=df2['match'], how='inner')

The quotechar and skipinitialspace parameters are used as the first column in file1 is quoted and contains a comma, and there is leading whitespace after the comma before the last field.

Output:

    col1            col2            col3
0   A J1, Jhon1     jhon1@jhon.com  A/B-201 Test1
1   A J3, Jhon3     jhon3@jhon.com  A/B-203 Test3

If you choose, the output can easily be written back to a CSV file as:

df.to_csv('path/to/output.csv')

For other DataFrame operations, refer to the documentation linked above.


Method 2: Core Python

The method below does not use any libraries, only core Python.

  1. Read the matches from file2 into a list.
  2. Iterate over file1 and search each line to determine if the last value is a match for an item in file2.
  3. Report the output.

Any subsequent data cleaning (if required) will be up to your personal requirements or use-case.

Example:

output = []

# Read the matching values into a list.
with open('file2.csv') as f:
    matches = [i.strip() for i in f]

# Iterate over file1 and place any matches into the output.
with open('file1.csv') as f:
    for i in f:
        match = i.split(',')[-1].strip()
        if any(match == j for j in matches):
            output.append(i)

Output:

["'A J1, Jhon1',jhon1@jhon.com, A/B-201 Test1\n",
 "'A J3, Jhon3',jhon3@jhon.com, A/B-203 Test3\n"]


Answered By - S3DEV
Answer Checked By - Candace Johnson (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing