PHPFixing

Saturday, October 29, 2022

[FIXED] How to remove double lines from a join statement in R?

Tags: left-join, r, subset

Issue

I have done a fuzzy self-join on my client table to compare every address with every other address:

library(tidyverse)
library(lubridate)
library(stringr)
library(stringdist)
library(fuzzyjoin)

# Fuzzy self-join: match each client address against all addresses in the
# same table using Jaro-Winkler distance ("jw"). trimData() is my own helper.
doTheJoin <- function(threshold) {
  trimData(d_client_squashed) %>%
    stringdist_left_join(
      trimData(d_client_squashed),
      by = c(address_full = "address_full"),
      distance_col = "distance",
      max_dist = threshold,
      method = "jw"
    )
}

d_client_squashed has the following structure (both columns contain strings):

Client_Reference  address_full
C01               Client1 Name, Street, Zipcode, Town
C02               Client2 Name, Street2, Zipcode2, Town2
...               ...

I then run the join and view only the matches between different clients:

sensible_matches <- doTheJoin(0.2)
View(sensible_matches %>% filter(Client_Reference.x != Client_Reference.y))

This gives the following output:

Client_Reference.x  address_full.x                          Client_Reference.y  address_full.y                          distance
C01                 Client1 Name, Street, Zipcode, Town     C02                 Client2 Name, Street2, Zipcode2, Town2  0.05486
C02                 Client2 Name, Street2, Zipcode2, Town2  C01                 Client1 Name, Street, Zipcode, Town     0.05486
...                 ...                                     ...                 ...                                     ...

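Every match appears twice: once as C01/C02 and once as C02/C01, with the x and y columns swapped. I cannot simply deduplicate on the distance column, because the same distance value can occur for different pairs. How can I subset the data frame so that each pair appears only once?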
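The doubling follows from the distance being symmetric: comparing address A with address B gives the same value as comparing B with A, so a self-join keeps both orientations of every pair under the threshold. The base-R sketch below illustrates this on hypothetical toy data. It swaps in utils::adist() (generalized Levenshtein distance, normalized here by the longer string's length) for the Jaro-Winkler distance that stringdist_left_join(method = "jw") computes, so the numbers will differ from the output above. Keeping only the upper triangle of the distance matrix (i < j) yields one row per unordered pair and excludes self-matches.

```r
# Hypothetical toy addresses standing in for d_client_squashed
clients <- data.frame(
  Client_Reference = c("C01", "C02", "C03"),
  address_full = c("Client1 Name, Street, Zipcode, Town",
                   "Client2 Name, Street2, Zipcode2, Town2",
                   "Other Co, Main Road, 99999, Elsewhere"),
  stringsAsFactors = FALSE
)

# Pairwise edit distances scaled to [0, 1] by the longer string's length.
# adist() is a stand-in here for stringdist's Jaro-Winkler ("jw") distance.
d <- adist(clients$address_full)
len <- nchar(clients$address_full)
d_norm <- d / outer(len, len, pmax)

# The matrix is symmetric, so (i, j) and (j, i) always share a distance:
# this is why a self-join reports every match twice.
stopifnot(isSymmetric(d_norm))

# Keep only the upper triangle (i < j): one row per unordered pair,
# and the diagonal (each client matched with itself) is excluded.
idx <- which(upper.tri(d_norm) & d_norm <= 0.2, arr.ind = TRUE)
pairs <- data.frame(
  Client_Reference.x = clients$Client_Reference[idx[, "row"]],
  Client_Reference.y = clients$Client_Reference[idx[, "col"]],
  distance = d_norm[idx],
  stringsAsFactors = FALSE
)
print(pairs)
```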


Solution

To remove rows that contain the same data, sort the values inside each row. Two rows holding the same pair of Client_Reference values (in either order) then become identical, and duplicated() can drop the second copy. After that, filter out the self-matches as you did before.

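# Sort the values within each row so mirrored rows become identical, then drop repeats
sensible_matches <- sensible_matches[!duplicated(t(apply(sensible_matches, 1, sort))), ]
# Drop self-matches (a client matched with itself)
View(sensible_matches %>% filter(Client_Reference.x != Client_Reference.y))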
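The sketch below runs the row-sorting approach on a small hypothetical data frame shaped like the joined output, including the two self-matches a self-join always produces. For comparison it also shows a different, simpler filter (not part of the answer above): keep only rows where Client_Reference.x sorts before Client_Reference.y. That assumes Client_Reference uniquely identifies each client, and it removes both the mirrored copy and the self-match in one step without sorting the address and distance columns.

```r
# Hypothetical toy version of the joined output: one mirrored match plus
# two self-matches (distance 0).
a1 <- "Client1 Name, Street, Zipcode, Town"
a2 <- "Client2 Name, Street2, Zipcode2, Town2"
sensible_matches <- data.frame(
  Client_Reference.x = c("C01", "C01", "C02", "C02"),
  address_full.x     = c(a1, a1, a2, a2),
  Client_Reference.y = c("C01", "C02", "C01", "C02"),
  address_full.y     = c(a1, a2, a1, a2),
  distance           = c(0, 0.05486, 0.05486, 0),
  stringsAsFactors   = FALSE
)

# Answer's approach: sort each row's values so mirrored rows match,
# drop the later copy, then drop self-matches.
sorted_rows <- t(apply(sensible_matches, 1, sort))
by_sort <- sensible_matches[!duplicated(sorted_rows), ]
by_sort <- by_sort[by_sort$Client_Reference.x != by_sort$Client_Reference.y, ]

# Alternative filter: keep one orientation per pair by comparing references.
by_order <- sensible_matches[
  sensible_matches$Client_Reference.x < sensible_matches$Client_Reference.y, ]

print(by_sort)
print(by_order)
```

With dplyr loaded, the alternative can be written as filter(Client_Reference.x < Client_Reference.y).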


Answered By - Giulio Mattolin
Answer Checked By - Gilberto Lyons (PHPFixing Admin)
Copyright © PHPFixing