PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0
Showing posts with label match. Show all posts
Showing posts with label match. Show all posts

Tuesday, December 13, 2022

[FIXED] How to change a cells colour based on a cell on a seperate sheet being empty

 December 13, 2022     conditional-formatting, formula, google-sheets-formula, match, syntax     No comments   

Issue

I'm struggling to nail this. I'm trying to use conditional formatting on google sheets to change the colour of a group of cells based on a cell in a different sheet being empty.

So for example I have a sheet called "Controls" with an empty cell, M13, to write a persons name. Once that cell has had text inputted I would like a group of cells, BE2:BN25, to change colour to white so as effectively 'unlock' them for data entry.

I've tried =NOT(ISBLANK("Controls!M13")) but it's not changing colour once some text has been entered to that cell.

Is this possible on google sheets or am I just making a glaringly obvious error?

Thanks in advance for your help.


Solution

try:

=NOT(ISBLANK(INDIRECT("Controls!M13")))


Answered By - player0
Answer Checked By - David Goodson (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

[FIXED] How does Match function treat an array of search key as input?

 December 13, 2022     arrays, google-sheets, google-sheets-formula, match, syntax     No comments   

Issue

enter image description here

I have put formula =match($A$4:$A,$B$4:$B,0) in column C & =match($A4:$A,$B$4:$B,0) in Column D. The only difference between both is of a $.

In Column D Match formula search key $A4:$A is treated same as $A4 ($A4, $A5, $A6...) but in Column C Match formula search key $A$4:$A is not treated same as $A$4.

I understand absolute & relative cell references, but I really don't understand how come Column C & Column D have same output or in simple terms,
How does the same formula =match($A$4:$A,$B$4:$B,0) at different cells: C4, C5, C6, C7 & C8 give different outputs? How exactly the Match function treat an array of search key ($A$4:$A) as inputs?

Google Sheet Link


Solution

your formula is incomplete.

you are using arrays/ranges within a MATCH but there should be ARRAYFORMULA/INDEX wrapping around MATCH to process it as an array. also, since this is an array the locking mechanism with $ is pointless

use:

=ARRAYFORMULA(IFNA(MATCH(A4:A, B4:B, 0)))

enter image description here


update:

let's see your row 5 as an example... in C5 you have

=MATCH($A$4:$A, $B$4:$B, 0)

and in D5 you have:

=MATCH($A5:$A, $B$4:$B, 0)

what you need to realize is that both above formulae are exactly the same as this one:

=MATCH(A5, B4:B, 0)

why? let's analyze the above functionality... we are trying to find the exact position of A5 within range B4:B the result is 2 because A5 is found in B5 and that's the 2nd cell/position from range B4:B. this formula is syntax-wise correct and it is meant to be used in one single cell with the possibility of dragging it up or down.

now let's analyze =MATCH($A5:$A, $B$4:$B, 0) - everything is the same because 1st argument of MATCH was not activated as array/range. in other words, as soon as you wrap the MATCH into ARAYFORMULA the 1s argument of MATCH (which is $A5:$A) gets activated by ARRAYFORMULA wrapping and it gets recognized as an actual valid range. without ARRAYFORMULA wrapping $A5:$A is recognized only as A5 despite it being written as a multi-cell range. eg MATCH is able to read it only as A5

at this point you may say "ok, so far it makes sense but how about when the 1st argument is A4:A why the behavior is same?" let's take a look... first we need to agree that A4:A is the same as

$A$4:$A
A$4:A
$A4:$A
$A$4:A
A4:$A
A$4:$A
INDIRECT("A4:A")

the lock $ we may or may not use to lock the column, row, or both or none - does not matter. the lock $ is useful only when we are about to drag the formula down, up, or to the sides so we could keep the intended window of reference

so for simplicity, we are left with A4:A but since the formula =MATCH($A4:$A, $B$4:$B, 0) sits in row 5, the range A4:A is understood as it being as A5. the row on which the formula is located has higher priority over range like A4:A because MATCH recognizes it that A4:A was not activated as a valid range from some "outside force" (ARRAYFORMMULA) so even if you write it as A4:A the MATCH formula sees only the row it sits on and nothing beyond.

therefore even =MATCH($A$4:$A, $B$4:$B, 0) is exactly same as =MATCH(A5, B4:B, 0)



Answered By - player0
Answer Checked By - Gilberto Lyons (PHPFixing Admin)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Tuesday, November 22, 2022

[FIXED] how to match or get dynamic value from another cell based on multiple criteria in google spreadsheets

 November 22, 2022     google-sheets, match, multiple-conditions, vlookup     No comments   

Issue

This sheet i want to fill out and based on the values i choose here, the value in cell G should be copied to the sheet Sittard

This sheet i want to fill out and based on the values i choose here, the value in cell G
should be copied to the sheet Sittard

The sheet Sittard looks like this Sittard Sheet where the value should be based


Solution

use in D2:

=ARRAYFORMULA(IFNA(VLOOKUP(A2:A&B2:B&C2:C; 
 {Inkoop!D2:D&Inkoop!E2:E&Inkoop!F2:F\ Inkoop!G2:G}; 2; 0)))

or in D1:

={"Stukprijs"; ARRAYFORMULA(IFNA(VLOOKUP(A2:A&B2:B&C2:C; 
 {Inkoop!D2:D&Inkoop!E2:E&Inkoop!F2:F \Inkoop!G2:G}; 2; 0)))}


Answered By - player0
Answer Checked By - Gilberto Lyons (PHPFixing Admin)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Tuesday, November 1, 2022

[FIXED] How to get specific values of items from other sheets based on column names in the master sheet?

 November 01, 2022     google-sheets, indexing, match, textjoin, vlookup     No comments   

Issue

To describe my problem here is my sheet

Master sheet

Here, From Column E onwards there are different attributes for each of the items in column A.

Now, I have a subsheet based on the type of the items as listed in column B, here

Subsheet

Here we can see from Column C onwards attributes of the items in column A. In this case, the other attributes from the master sheet above that are not mentioned do not apply to these items and therefore will be blank in the master sheet.

My problem is to populate the attributes of the corresponding items from the subsheet that are in column C to F, in the master sheet where all the attributes are listed. I only want to add those values that exist in the sub sheet and leave the rest blank.

Here is a sample sheet with the desired output. There is a Master sheet and an Subsheet named DEF. The Master sheet has to lookup the values from the Subsheet and filled the appropriate cells based on the reference from column A and B of the Master sheet.

https://docs.google.com/spreadsheets/d/1lNPPhTr7_N0N8CA7BLyO9JhYGd1Hx0c6Q0Uq4Wdt_GY/edit?usp=sharing


Solution

try in D2:

=INDEX(IF(A2:A="",,IFERROR(VLOOKUP(A2:A&B2:B, {DEF!A:A&DEF!B:B, DEF!C:Z}, 
 IFERROR(MATCH(D1:P1, DEF!A1:Z1, 0)-1, 100), 0))))

enter image description here



Answered By - player0
Answer Checked By - Marie Seifert (PHPFixing Admin)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

[FIXED] How to improve performance of dataframe slices matching?

 November 01, 2022     match, numpy, pandas, performance, python     No comments   

Issue

I need to improve the performance of the following dataframe slices matching. What I need to do is find the matching trips between 2 dataframes, according to the sequence column values with order conserved.

My 2 dataframes:

>>>df1
  trips sequence
0   11  a
1   11  d
2   21  d
3   21  a
4   31  a
5   31  b
6   31  c

>>>df2
  trips sequence
0   12  a
1   12  d
2   22  c
3   22  b
4   22  a
5   32  a
6   32  d

Expected output:

['11 match 12']

This is the following code I' m using:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'trips': [11, 11, 21, 21, 31, 31, 31], 'sequence': ['a', 'd', 'd', 'a', 'a', 'b', 'c']})

df2 = pd.DataFrame({'trips': [12, 12, 22, 22, 22, 32, 32], 'sequence': ['a', 'd', 'c', 'b', 'a', 'a', 'd']})

route_match = []
for trip1 in df1['trips'].drop_duplicates():
    for trip2 in df2['trips'].drop_duplicates():
        route1 = df1[df1['trips'] == trip1]['sequence']
        route2 = df2[df2['trips'] == trip2]['sequence']
        if np.array_equal(route1.values,route2.values):
            route_match.append(str(trip1) + ' match ' + str(trip2))
            break
        else:
            continue

Despite working, this is very time costly and unefficient as my real dataframes are longer. Any suggestions?


Solution

You can aggregate each trip as tuple with groupby.agg, then merge the two outputs to identify the identical routes:

out = pd.merge(df1.groupby('trips', as_index=False)['sequence'].agg(tuple),
               df2.groupby('trips', as_index=False)['sequence'].agg(tuple),
               on='sequence'
              )

output:

   trips_x sequence  trips_y
0       11   (a, d)       12
1       11   (a, d)       32

If you only want the first match, drop_duplicates the output of df2 aggregation to prevent unnecessary merging:

out = pd.merge(df1.groupby('trips', as_index=False)['sequence'].agg(tuple),
               df2.groupby('trips', as_index=False)['sequence'].agg(tuple)
                  .drop_duplicates(subset='sequence'),
               on='sequence'
              )

output:

   trips_x sequence  trips_y
0       11   (a, d)       12


Answered By - mozway
Answer Checked By - Candace Johnson (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Friday, October 28, 2022

[FIXED] How to match elements between dataframes based on conditions in base R or dplyr?

 October 28, 2022     dplyr, left-join, match, merge, r     No comments   

Issue

I am trying to match these two data frames, nCode and grpRnk, as illustrated below.

enter image description here

Using the code shown at the bottom, I've been able to get the below output whereby the last column on the right shows the correct values, but I don't need those extra columns from grpRnk$Name through $subGrp (columns 6 -10) and I had specified a name for the last column of grpRnk but I instead get $grpRnk. Am I using this merge() function correctly? Is there a more correct or efficient way to do this sort of multi-factor matching?

  Name  Group nmCnt seqBase subGrp grpRnk$Name $Group $nmCnt $seqBase $subGrp $grpRnk
  <chr> <dbl> <int>   <int>  <int> <chr>        <dbl>  <int>    <int>   <int>   <int>
1 R         0     1       1      0 B                0      1        1       0      NA
2 R         0     2       2      0 R                0      1        1       0      NA
3 B         0     1       1      0 R                0      2        2       0      NA
4 R         0     3       3      0 R                0      3        3       0      NA
5 X         1     1       1      1 X                1      1        1       1       1
6 X         1     2       1      2 X                1      2        1       2       1

The code:

library(dplyr)

myDF1 <- 
  data.frame(
    Name = c("R","R","B","R","X","X"),
    Group = c(0,0,0,0,1,1)
  )

nCode <-  myDF1 %>%
  group_by(Name) %>%
  mutate(nmCnt = row_number()) %>%
  ungroup() %>%
  mutate(seqBase = ifelse(Group == 0 | Group != lag(Group), nmCnt,0)) %>%
  mutate(seqBase = na_if(seqBase, 0)) %>%
  group_by(Name) %>%
  fill(seqBase) %>%
  mutate(seqBase = match(seqBase, unique(seqBase))) %>%
  ungroup() %>%
  mutate(subGrp = as.integer(ifelse(Group > 0, sapply(1:n(), function(x) sum(Name[1:x]==Name[x] & Group[1:x] == Group[x])),0))) 

grpRnk <- nCode %>% select(Name,Group,nmCnt) %>% 
  filter(Group > 0) %>% 
  group_by(Name,Group) %>% 
  slice(which.min(Group)) %>% 
  ungroup() %>%
  arrange(nmCnt) %>%
  mutate(grpRnk = dense_rank(nmCnt)) %>%
  select (-nmCnt)

nCode %>% mutate(grpRnk = merge(nCode,grpRnk, by=c("Name","Group"), all.x=T))

Solution

You need to specify what column you want to extract into your newly created grpRnk column. You can achieve this by adding $grpRnk to the end of your merge() statement.

Here I have first a solution with merge():

library(tidyverse)

myDF1 <- 
  data.frame(
    Name = c("R","R","B","R","X","X"),
    Group = c(0,0,0,0,1,1)
  )

nCode <-  myDF1 %>%
  group_by(Name) %>%
  mutate(nmCnt = row_number()) %>%
  ungroup() %>%
  mutate(seqBase = ifelse(Group == 0 | Group != lag(Group), nmCnt,0)) %>%
  mutate(seqBase = na_if(seqBase, 0)) %>%
  group_by(Name) %>%
  fill(seqBase) %>%
  mutate(seqBase = match(seqBase, unique(seqBase))) %>%
  ungroup() %>%
  mutate(subGrp = as.integer(ifelse(Group > 0, sapply(1:n(), function(x) sum(Name[1:x]==Name[x] & Group[1:x] == Group[x])),0))) 

grpRnk <- nCode %>% select(Name,Group,nmCnt) %>% 
  filter(Group > 0) %>% 
  group_by(Name,Group) %>% 
  slice(which.min(Group)) %>% 
  ungroup() %>%
  arrange(nmCnt) %>%
  mutate(grpRnk = dense_rank(nmCnt)) %>%
  select (-nmCnt)

nCode %>% mutate(grpRnk = merge(nCode,grpRnk, by=c("Name","Group"), all.x = TRUE)$grpRnk)
#> # A tibble: 6 × 6
#>   Name  Group nmCnt seqBase subGrp grpRnk
#>   <chr> <dbl> <int>   <int>  <int>  <int>
#> 1 R         0     1       1      0     NA
#> 2 R         0     2       2      0     NA
#> 3 B         0     1       1      0     NA
#> 4 R         0     3       3      0     NA
#> 5 X         1     1       1      1      1
#> 6 X         1     2       1      2      1

However, you can achieve the same result using only merge() like this:

merge(nCode,grpRnk, by=c("Name","Group"), all.x = TRUE)
#>   Name Group nmCnt seqBase subGrp grpRnk
#> 1    B     0     1       1      0     NA
#> 2    R     0     1       1      0     NA
#> 3    R     0     2       2      0     NA
#> 4    R     0     3       3      0     NA
#> 5    X     1     1       1      1      1
#> 6    X     1     2       1      2      1

And here is another solution using left_join() from the dplyr package:

left_join(nCode, grpRnk, by = c("Name", "Group"))
#> # A tibble: 6 × 6
#>   Name  Group nmCnt seqBase subGrp grpRnk
#>   <chr> <dbl> <int>   <int>  <int>  <int>
#> 1 R         0     1       1      0     NA
#> 2 R         0     2       2      0     NA
#> 3 B         0     1       1      0     NA
#> 4 R         0     3       3      0     NA
#> 5 X         1     1       1      1      1
#> 6 X         1     2       1      2      1

Created on 2022-09-19 with reprex v2.0.2



Answered By - Noah
Answer Checked By - Katrina (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Thursday, August 4, 2022

[FIXED] Why doesn't out of range indexes throw an exception for Regex GroupCollection?

 August 04, 2022     c#, exception, match, regex     No comments   

Issue

Ordinarily if you try to access an array member that doesn't exist you get an "IndexOutOfRangeException".

However, for some reason, I don't get that for a Regex GroupCollection.

For example:

    var match = Regex.Match("", "a");
    var test = match.Groups[6];

Does not throw an exception, despite match having only 1 group.

Why does this happen?


Solution

This is by design.

See the GroupCollection documentation:

If groupnum is not the index of a member of the collection, or if groupnum is the index of a capturing group that has not been matched in the input string, the method returns a Group object whose Group.Success property is false and whose Group.Value property is String.Empty.

I bolded the beginning of the sentence since it describes your current scenario: a regex pattern finds no matches in an empty string, the match is empty, and there is no group with ID = 6.



Answered By - Wiktor Stribiżew
Answer Checked By - Pedro (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Monday, July 18, 2022

[FIXED] How to check if a particular file already exists by using an uploaded file to find a match in PHP

 July 18, 2022     file-upload, gif, match, pattern-matching, php     No comments   

Issue

giphy.com, during an upload, somehow recognizes if a particular GIF animation has already been uploaded to their servers. and I was wondering how I can replicate that using PHP in a very efficient (low-server intensive) manner?

The only way I can think of doing this is by looping between all the files on my server (which I would think would be a highly server intensive task) and trying to find a match (and by match, I don't mean filename; I mean if the exact same gif was uploaded (the exact file regardless of the filename)).

But I'm not too sure of what particular PHP functions I'd have to use.


Solution

Use a database. At each upload you calculate a md5 hash of the file ( using md5_file ) and you store that in the database. Looking up the database to see if the hash already exists will be very fast.



Answered By - Lorenz Meyer
Answer Checked By - Candace Johnson (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Sunday, July 17, 2022

[FIXED] Why does a match expression not report an error for a catch-all arm (_) prior to other arms?

 July 17, 2022     match, rust, warnings     No comments   

Issue

Rust has a construct called match which looks very similar to switch in other languages. However I observed a very peculiar behavior of match.

let some_u32_value = 3;
match some_u32_value {
    _ => (),
    3 => println!("three"),
}

match respects the order in which the cases/patterns are mentioned. Why does it not report an error if the default (_) case is at the top? It does give a warning though:

warning: unreachable pattern
 --> src/main.rs:5:9
  |
5 |         3 => println!("three"),
  |         ^
  |
  = note: #[warn(unreachable_patterns)] on by default

A similar construct in Java, the switch, does not preserve any order, so having a default prior to other cases is not an error (ignoring the fall through behavior).

int x = 0;

switch (x) {
  default:
    System.out.println("default");
    break;
  case 0:
      System.out.println("Zero");
} 

Is there some purpose for doing this explicitly?


Solution

An unreachable pattern is not strictly an error, I mean: it does not prevent the compiler from "understanding" the code nor does it make the code unsafe.

Similarly, in C, for example, you can return a reference to a local variable without triggering an error (at least with gcc):

#include <stdio.h>

int* foo() {
    int x = 0;

    return &x;
}

int main() {
    printf("%d", *foo());

    return 0;
}

Generally, you should not consider a warning as "oh, that's only a warning, I don't care". A warning is an actual useful advice/information given by the compiler.

I like the definition given in warnings:

A warning is often issued on recognizing a potential high-risk situation, a probable misunderstanding, degraded service or imminent failure.

because it helps to understand the difference between an error and a warning:

  • An error is an error,
  • A warning is a potential/probable error or a problematic thing.

In this situation, the last arm of the match is some dead code, so the compiler reports it accordingly.



Answered By - Boiethios
Answer Checked By - Candace Johnson (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Wednesday, May 18, 2022

[FIXED] How to check if two string are a partial match in C#?

 May 18, 2022     c#, match, partial, string     No comments   

Issue

Possible Duplicate:
Are there any Fuzzy Search or String Similarity Functions libraries written for C#?

I am creating an application which will except user input of a Song or Artist or Album name and then will look through a String Array or ArrayList for any possible matches.

The auto suggestions will be calculated based on the match percentage.

For example

If user types link prk it should find Linkin Park or Link 80 or Link Wray but the match percentage will be different for all

Assume that the collection will only search for Artist names in Artist Collection and Song name in song collection.

(Percentage figures are just for explanation)

Linkin Park - 98%
Link Wray -82%
Link 80 - 62%

Solution does not have to be C# code, any regex or pseudo code will be good but should be implementable in C#.


Solution

Usually an implementation of the Levenshtein distance also called edit distance is used for this. This will find matches based on the minimum number of edits needed to transform one string into the other, counting all insertions, deletions, or substitutions of a single character as a measure for the "cost" - candidates are then strings that have the minimum cost.

Here's a link to an article with a generic implementation in C#.



Answered By - BrokenGlass
Answer Checked By - Senaida (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

[FIXED] How to search for partial matches in numbers using SlickGrid header row?

 May 18, 2022     header, match, partial, slickgrid     No comments   

Issue

I'm trying to perform partial matches in the header row example for SlickGrid.

I used the answer here: How to perform partial matches when filtering Slickgrid using column-level headers?

Which does do partial matches on text but will not do partial matches on numbers. Does anyone know how to do partial matches on both? Some of my columns have text and some have numbers and I want to be able to filter by each column.


Solution

Store your numbers as text in the grid data. Then this method should work for numbers as well. Same applies to boolean and dates as well.

If you want to keep using numbers though, you could check for the variable type in your filter using the typeof function and adding a new condition to handle number matches.



Answered By - ganeshk
Answer Checked By - Marie Seifert (PHPFixing Admin)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

Tuesday, May 17, 2022

[FIXED] How to match two data frames based on substrings within groups

 May 17, 2022     grouping, match, partial, r, string     No comments   

Issue

I want to merge two dataframes which are grouped by the same identifier. The variable in the first dataframe (valueA) should match with a substring of a variable in the second data frame (valueB), but only within groups.

I could manage to match the matching variables, but I'm struggling to limit the matching to the grouping variable. Here are the sample data frames and the matching code:

df1 <- data.frame(report = c('Report1','Report1','Report1','Report1','Report1','Report1'),
            identifier = c('Abraham', 'Abraham', 'Abraham','Barack', 'Barack','Barack'),
            variableA = c('V1','V2','V3','V1','V2', 'V3'),
            value = c('CDKN2A/B','PALB2','KRAS','TP53','RB1','KRAS'))

df2 <- data.frame(report = c('Report1','Report1','Report1','Report1','Report1','Report1','Report1'),
            identifier = c('Abraham', 'Abraham', 'Abraham','Abraham','Barack', 'Barack','Barack'),
            variableB = c('F1','F2','F3','F4','F1','F2', 'F3'),
            valueB = c('CDKN2A/B LOSS','PALB2 P1111FS*13','KRAS G12R','PALB2 N540FS*1','RB1 SPLICE SITE 2325+1G>A','KRAS G13C','TP53 C238F'))

Here is the code I've tried, but which is not working for groups

idx2 <- sapply(df1$value, grep, df2$valueB)
idx1 <- sapply(seq_along(idx2), function(i) rep(i, length(idx2[[i]])))
idx3 <- cbind(df1[unlist(idx1),,drop=F], df2[unlist(idx2),,drop=F])

Expected output is (code for data frame)

df3 <- data.frame(report=c('Report1','Report1','Report1','Report1','Report1','Report1','Report1'),
                  identifier = c('Abraham', 'Abraham', 'Abraham','Abraham','Barack', 'Barack','Barack'),
                  variableA = c('V1','V2','V3','V2','V1','V2', 'V3'),
                  value = c('CDKN2A/B','PALB2','KRAS','PALB2','TP53','RB1','KRAS'),
                  variableB = c('F1','F2','F3','F4','F1','F2', 'F3'),
                  valueB = c('CDKN2A/B LOSS','PALB2 P1111FS*13','KRAS G12R','PALB2 N540FS*1','TP53 C238F','RB1 SPLICE SITE 2325+1G>A','KRAS G13C'))

resulting dataframe

report  identifier  variableA   value   variableB   valueB
Report1 Abraham     V1      CDKN2A/B    F1  CDKN2A/B LOSS
Report1 Abraham     V2      PALB2   F2  PALB2   P1111FS*13
Report1 Abraham     V3      KRAS    F3  KRAS    G12R
Report1 Abraham     V2      PALB2   F4  PALB2   N540FS*1
Report1 Barack      V1      TP53    F1  TP53    C238F
Report1 Barack      V2      RB1 F2  RB1 SPLICE SITE 2325+1G>A
Report1 Barack      V3      KRAS    F3  KRAS    G13C

Hope this makes sense, many thanks for your help!


Solution

You can use the fuzzyjoin-package for this:

fuzzy_inner_join(df2, df1, by = c("valueB" = "valueA", "identifier" = "identifier"), match_fun = list(str_detect, `==`)) %>%
  select(report.x, identifier.x, variableA, valueA, variableB, valueB)

  report.x identifier.x variableA   valueA variableB                    valueB
1  Report1      Abraham        V1 CDKN2A/B        F1             CDKN2A/B LOSS
2  Report1      Abraham        V2    PALB2        F2          PALB2 P1111FS*13
3  Report1      Abraham        V3     KRAS        F3                 KRAS G12R
4  Report1      Abraham        V2    PALB2        F4            PALB2 N540FS*1
5  Report1       Barack        V2      RB1        F1 RB1 SPLICE SITE 2325+1G>A
6  Report1       Barack        V3     KRAS        F2                 KRAS G13C
7  Report1       Barack        V1     TP53        F3                TP53 C238F

This way you can apply different matching functions for different columns. In this case we used str_detect() for your fuzzy matching column and == for your grouping column.



Answered By - Humpelstielzchen
Answer Checked By - Katrina (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

[FIXED] How to subset multiple columns from df including grep match

 May 17, 2022     dataframe, match, partial, r, subset     No comments   

Issue

I have a very large data set that includes multiple columns with common portions of their names (e.g ctq_1, ctq_2, ctq_3 and also panas_1, panas_2, panas_3). I'd like to subset some of those columns (e.g. only those containing 'panas' in the column name) alongside certain other columns from the same data frame that have unique names (e.g. id, group).

I tried using a grep function inside square brackets, which worked nicely: panas <- bigdata[ , grep('panas', colnames(bigdata))] but now I need to work out how to also include the other two columns that I need, which are id and group. I tried: panas <- bigdata[ , c('id', 'group', grep('panas', colnames(bigdata)))] but I get this error: Error: Can't find columns 114, 115, 116, 117, 118, … (and 15 more) in .data. Call rlang::last_error() to see a backtrace.

How can I achieve what I want to with the simplest code possible? I am an R newbie so avoiding fancy functions would be ideal!

Here is a reproducible example.


> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> newframe <- iris[ , grep('Petal', colnames(iris))] # This works

> newframe <- iris[ , c('Species', grep('Petal', colnames(iris)))] # This doesn't work

This time, the error is:

Error in [.data.frame(iris, , c("Species", grep("Petal", colnames(iris)))) : undefined columns selected


Solution

Assuming I understood what you would like to do, a possible solution that may not be useful and/or may be redundant:

my_selector <- function(df,partial_name,...){
  positional_names <- match(...,names(df))
  df[,c(positional_names,grep(partial_name,names(df)))]
}
my_selector(iris, partial_name = "Petal","Species")

A "simpler" option would be to use grep and the like to match the target names at once:

iris[grep("Spec.*|Peta.*", names(iris))]

Or even simpler, as suggested by @akrun , we can simply do:

iris[grep("(Spec|Peta).*", names(iris))]

For more columns, we could do something like:

my_selector(iris, partial_name = "Petal",c("Species","Sepal.Length"))
       Species Sepal.Length Petal.Length Petal.Width
1       setosa          5.1          1.4         0.2
2       setosa          4.9          1.4         0.2

Note however that in the above function, the columns are selected counter-intuitively in that the names supplied last are selected first.

Result for the first part(truncated):

         Species Petal.Length Petal.Width
1       setosa          1.4         0.2
2       setosa          1.4         0.2
3       setosa          1.3         0.2
4       setosa          1.5         0.2
5       setosa          1.4         0.2
6       setosa          1.7         0.4
7       setosa          1.4         0.3


Answered By - NelsonGon
Answer Checked By - Senaida (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg

[FIXED] How to merge two data frames with specific string match in columns in R?

 May 17, 2022     dataframe, match, merge, partial, r     No comments   

Issue

I have two dataframes data1 and data2 which have information like below:

dput(data1)

structure(list(ProfName = c("Hua (Christine) Xin", "Dereck Barr-Pulliam", 
"Lisa M. Blum", "Russell  Williamson", "William D. Stout", "Michael F. Wade", 
"Sheila A.  Johnston", "Julie Huang", "Alan Attaway", "Alan Levitan", 
"Benjamin P. Foster", "Carolyn M.  Callahan"), Title = c(" PhD", 
" PhD", " LLM", " PhD", " PhD", " CPA", " MS", " PhD", " PhD", 
" PhD", " PhD", " PhD"), Profession = c("Assistant Professor", 
"Assistant Professor", "Instructor", "Assistant Professor", "Associate Professor and Director", 
"Instructor", "Instructor", "Associate Professor", "Professor", 
"Professor", "Professor", "Brown-Forman Professor of Accountancy"
)), row.names = c(8L, 18L, 25L, 36L, 49L, 50L, 56L, 69L, 71L, 
82L, 88L, 89L), class = "data.frame")

It looks like below:

enter image description here

dput(data2)

structure(list(ProfName = c("Blandford, K     ", "Okafor, A     ", 
"Johnston, S     ", "Rolen, R     ", "Attaway, A     ", "Xin, H     ", 
"Huang, Y     ", "Stout, W     ", "Williamson, R     ", "Callahan, C     ", 
"Foster, B     ", "Blum, L     ", "Levitan, A     ", "Barr-Pulliam, D     ", 
"Wade, M     ")), row.names = c(NA, -15L), class = "data.frame")

data2 looks like below:

enter image description here

I wanted to merge two dataframes, but the names look different. Only a specific string is matching between two dataframes with column ProfName. The data should be merged and if the names don't have any information it should be empty. If they don't have any information in the columns Title and Profession, both ProfName and New columns should have the same name.

I tried using merge, but it didn't give the desired output.

merge(data1, data2, by="ProfName", all.x=TRUE, all.y = TRUE)

The output should look like below:

enter image description here


Solution

Here's a simple solution:

library(stringr)
library(dplyr)
library(tidyr)
library(magrittr)

data1 %<>% mutate(lname = str_extract(ProfName, "[A-Za-z\\-]+$"))
data2 %<>% mutate(lname = str_extract(ProfName, "^[A-Za-z\\-]+"))

df <- merge(data1, data2, all.y = TRUE, by = "lname")

head(df)

#          lname           ProfName.x Title                            Profession           # ProfName.y
# 1      Attaway         Alan Attaway   PhD                             Professor      Attaway, A     
# 2 Barr-Pulliam  Dereck Barr-Pulliam   PhD                   Assistant Professor Barr-Pulliam, D     
# 3    Blandford                 <NA>  <NA>                                  <NA>    Blandford, K     
# 4         Blum         Lisa M. Blum   LLM                            Instructor         Blum, L     
# 5     Callahan Carolyn M.  Callahan   PhD Brown-Forman Professor of Accountancy     Callahan, C     
# 6       Foster   Benjamin P. Foster   PhD                             Professor       Foster, B 


Answered By - Dunois
Answer Checked By - Pedro (PHPFixing Volunteer)
Read More
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Older Posts Home
View mobile version

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
All Comments
Atom
All Comments

Copyright © PHPFixing