Issue

I have done a fuzzy join of a data frame with itself to compare its addresses with each other:

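library(tidyverse)
library(lubridate)
library(stringr)
library(stringdist)
library(fuzzyjoin)

# Fuzzy self-join: compare every address with every other address.
# trimData() is a helper defined elsewhere (not shown).
doTheJoin <- function(threshold) {
  trimData(d_client_squashed) %>%
    stringdist_left_join(
      trimData(d_client_squashed),
      by = c(address_full = "address_full"),
      distance_col = "distance",  # column that stores the computed distance
      max_dist = threshold,       # keep only pairs at most this far apart
      method = "jw"               # Jaro-Winkler distance
    )
}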

d_client_squashed has the following structure, and both columns contain strings:

Client_Reference	address_full
C01	Client1 Name, Street, Zipcode, Town
C02	Client2 Name, Street2, Zipcode2, Town2
...	...

The following operation:

sensible_matches <- doTheJoin(0.2)
View(sensible_matches %>% filter(Client_Reference.x != Client_Reference.y))

Results in the following output:

Client_Reference.x	address_full.x	Client_Reference.y	address_full.y	distance
C01	Client1 Name, Street, Zipcode, Town	C02	Client2 Name, Street2, Zipcode2, Town2	0.05486
C02	Client2 Name, Street2, Zipcode2, Town2	C01	Client1 Name, Street, Zipcode, Town	0.05486
...	...	...	...	...

Every match appears twice in the output, once in each direction, with the client information swapped (C01/C02 and C02/C01). The distance value is not unique either, so I cannot deduplicate on it alone. How can I subset the data frame to remove these duplicate rows?
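As a hedged illustration of where the doubled rows come from: a self-join compares every row with every row, so before any max_dist filtering, n clients produce n × n pairs. n of those are self-matches, and the rest come in mirrored pairs. Assuming the distance is symmetric (the identical 0.05486 values for C01/C02 and C02/C01 suggest it is), both directions of a pair pass or fail the threshold together. The sketch below uses base R merge(..., by = NULL), a plain Cartesian join rather than the fuzzy join itself, on three made-up clients.

```r
# Hypothetical toy data standing in for d_client_squashed
d <- data.frame(
  Client_Reference = c("C01", "C02", "C03"),
  address_full = c("Client1 Name, Street, Zipcode, Town",
                   "Client2 Name, Street2, Zipcode2, Town2",
                   "Client3 Name, Street3, Zipcode3, Town3"),
  stringsAsFactors = FALSE
)

# Cartesian self-join: every row is paired with every row, including itself.
# merge() adds the .x / .y suffixes to the shared column names.
pairs <- merge(d, d, by = NULL)

# Self-matches: a client compared with itself
n_self <- sum(pairs$Client_Reference.x == pairs$Client_Reference.y)

# Every other combination appears twice, once in each direction
n_mirrored <- nrow(pairs) - n_self

cat("total:", nrow(pairs), "self:", n_self, "mirrored:", n_mirrored, "\n")
```

With three clients this yields 9 rows: 3 self-matches and 6 mirrored rows, that is, 3 distinct pairs each listed twice.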

Solution

To remove the mirrored rows, sort the values within each row so that two rows containing the same pair of Client_Reference values (in either order) become identical, and then delete the duplicates. After that, filter out the rows where both Client_Reference values are the same, as you already did.
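A minimal sketch of the same idea, under the assumption that the two Client_Reference columns alone identify a pair: instead of sorting every value in the row, build an order-independent key from the two references with pmin()/pmax() and keep the first row for each key. The data frame below is a made-up stand-in for sensible_matches, not real output. This variant avoids apply(), which converts the whole data frame to a character matrix, and it does not depend on the address or distance columns.

```r
# Hypothetical stand-in for sensible_matches (values made up for illustration)
sensible_matches <- data.frame(
  Client_Reference.x = c("C01", "C02", "C01", "C02", "C03"),
  Client_Reference.y = c("C02", "C01", "C01", "C02", "C03"),
  distance = c(0.05486, 0.05486, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Put the two references of each row in a fixed order (smaller first),
# so C01/C02 and C02/C01 produce the same key
lo <- pmin(sensible_matches$Client_Reference.x, sensible_matches$Client_Reference.y)
hi <- pmax(sensible_matches$Client_Reference.x, sensible_matches$Client_Reference.y)
pair_key <- paste(lo, hi, sep = "|")

# Keep the first row for each unordered pair, then drop self-matches
deduped <- sensible_matches[!duplicated(pair_key), ]
deduped <- deduped[deduped$Client_Reference.x != deduped$Client_Reference.y, ]
print(deduped)
```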

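# Sort each row's values so mirrored pairs (C01/C02, C02/C01) become identical, then drop repeats
sensible_matches <- sensible_matches[!duplicated(t(apply(sensible_matches, 1, sort))), ]
# Drop self-matches (a client compared with itself)
View(sensible_matches %>% filter(Client_Reference.x != Client_Reference.y))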
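Another option, assuming the distance is symmetric (the identical 0.05486 values for C01/C02 and C02/C01 suggest it is) and that each client has a unique Client_Reference: keep only the rows where Client_Reference.x sorts before Client_Reference.y. This keeps exactly one direction of each pair and discards self-matches in the same step. The data frame is again a hypothetical stand-in for sensible_matches.

```r
# Hypothetical stand-in for sensible_matches (values made up for illustration)
sensible_matches <- data.frame(
  Client_Reference.x = c("C01", "C02", "C01", "C02", "C03"),
  Client_Reference.y = c("C02", "C01", "C01", "C02", "C03"),
  distance = c(0.05486, 0.05486, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Keep one direction per pair: rows where x sorts strictly before y.
# Self-matches (x == y) fail the strict comparison and are dropped too.
one_way <- sensible_matches[
  sensible_matches$Client_Reference.x < sensible_matches$Client_Reference.y, ]
print(one_way)
```

In the tidyverse style of the question, the equivalent would be sensible_matches %>% filter(Client_Reference.x < Client_Reference.y). String comparison with < follows the locale's collation order, which is unambiguous for references such as C01 and C02.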

Answered By - Giulio Mattolin

Answer Checked By - Gilberto Lyons (PHPFixing Admin)

Saturday, October 29, 2022

[FIXED] How to clean double lines from joint statement in R?
