Monday, August 29, 2022

[FIXED] How to remove lines that have a duplicated value in a column from a CSV file?

Issue

I have a txt file with this format:

 - 01, Spain
 - 02, USA
 - 03, India
 - 01, Italy
 - 01, Portugal
 - 04, Brasil

I need to check whether the numbers are repeated. In this example, the number "01" appears with Spain, Italy and Portugal. When two or more lines share the same number, I need to keep only the first of those lines and discard the rest. The output file should then contain:

 - 01, Spain
 - 02, USA
 - 03, India
 - 04, Brasil

Solution

# Read the entire file into memory, since the same file is rewritten below.
my_file = 'my_file.txt'
with open(my_file) as f_in:
    content = f_in.readlines()

# Keep track of the numbers that have already appeared
# while rewriting the content back to the file.
numbers = set()
with open(my_file, 'w') as f_out:
    for line in content:
        # The number is everything before the first comma, so a comma
        # inside the country name does not break the parsing.
        number = line.split(',', 1)[0].strip()
        if number not in numbers:
            f_out.write(line)
            numbers.add(number)

I hope this is the easiest approach to understand.
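The same idea can also be sketched as a small generator that never holds the whole file in memory. This is only an illustration under a few assumptions: the function name `first_per_number` and the `sep` parameter are made up for this example, blank lines are skipped, and the number is taken to be everything before the first comma.

```python
def first_per_number(lines, sep=","):
    """Yield only the first line seen for each leading number (hypothetical helper)."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines carry no data and are dropped
        number = line.split(sep, 1)[0].strip()
        if number not in seen:
            seen.add(number)
            yield line


# The sample data from the question, one string per line.
sample = [
    "01, Spain\n",
    "02, USA\n",
    "03, India\n",
    "01, Italy\n",
    "01, Portugal\n",
    "04, Brasil\n",
]
kept = list(first_per_number(sample))
print("".join(kept), end="")
```

Because an open file object is itself an iterable of lines, it can be passed in place of `sample`, with the yielded lines written to a separate output file, so the input is not overwritten while it is still being read.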



Answered By - Quang Nguyen
Answer Checked By - Timothy Miller (PHPFixing Admin)
