Issue
I'm a newbie to the world of coding/programming. Loving the challenge so far but I've hit a bit of a roadblock.
I'm attempting to automate genotyping preparation for my job as a lab technician.
Unnecessary background:
I take care of a colony of 500-600 mice with 40-50 given genotype constructs at any given time. Whenever I get new litters (depending on the genotype of the parents) I have to extract DNA and confirm the genotype of the offspring. The task and the challenge of trouble-shooting the process were fun at first but it's getting very mundane and repetitive now. So I've started using R to automate certain parts of my job.
TL/DR: My job is getting repetitive, I want R to help out with that.
So, in essence, I have an archive of mice as follows. I grouped them by the temperature requirements for the genotyping process and by the number of times I need to genotype the DNA samples.
Mouse ID  Genotype    Gender  Age  Litter_ID  PCR_Temp  Rxns
ZDP658    zDC.Cre     F       4.9  B23844-1   Z         1
ZDP659    zDC.Cre     F       4.9  B23844-1   Z         1
ZDP631    Villin.Cre  F       4.9  B23745-2   Y         1
ZDP575    K14.CreER   M       5.3  B23744-2   Z         1
ZDO931    K14.CreER   M       8.6  B23744-1   Z         1
ZDO932    K14.CreER   M       8.6  B23744-1   Z         1
ZDO933    K14.CreER   M       8.6  B23744-1   Z         1
ZDQ31     Rosa.TSLP   M       3.4  B23701-2   Z         2
ZDQ32     Rosa.TSLP   M       3.4  B23701-2   Z         2
My goal is to receive an output of the individual Mouse_ID's in an 8x6 grid grouped by their "PCR_Temps" and multiplied by their "Rxns" in a zigzag order if possible with two extra spaces per genotype group.
My vision of the output is as follows. I would want to input the Litter_ID of the litters that need genotyping and receive the following.
The honeycomb structure is not necessary. A simple rectangular grid works perfectly fine. The same goes for the zigzag format. Both of those aspects of the output format would be nice but aren't a requirement.
Every group of genotypes would need one space for positive control samples and one space for Wild type/Neg control samples. The genotypes that have a value of "2" or more would be repeated as many times as their "Rxns" value states.
I'm sorry if this question is too dense to follow or code for. I have so far been working with dplyr and ggplot to manipulate and visualize my mouse archives but this particular problem has me at a loss.
If anyone could even point me in the direction of a package that could get me started I would really appreciate it.
So far I have tried some combinations of dplyr and purrr with no success. I have thought of ways to use for loops but have come up empty.
Thank you in advance for any advice.
Solution
Here's an approach with the help of dplyr, grid and gridExtra.
Please excuse the hot mess that is my variable naming convention.
Your data was not complex enough to make a good system, so I generated some random data. Find that at the very end.
First, let's define our litters and filter the mouse data.
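library(dplyr)
library(grid)
library(gridExtra)
# Litters that need genotyping
geno.litters <- c("B23701-2", "B23744-1", "B23844-1", "B23944-1")
# Keep those litters, sort them, and split into one data frame per PCR temperature
mice <- data %>%
  filter(Litter_ID %in% geno.litters) %>%
  arrange(Litter_ID, MouseID) %>%
  split(., .$PCR_Temp)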
mice is now a list of the mice split into plates by PCR temperature.
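To make the shape of that list concrete, here is a minimal base-R sketch of the same filter, sort and split step. The toy data frame, its IDs and the name wanted are made up for illustration; only the column names follow the answer.

```r
# Toy data; the IDs and values below are made up for illustration only.
toy <- data.frame(
  MouseID   = c("M3", "M1", "M2", "M4"),
  Litter_ID = c("L-1", "L-1", "L-2", "L-3"),
  PCR_Temp  = c("Z", "Z", "Y", "Z"),
  stringsAsFactors = FALSE
)

wanted <- c("L-1", "L-2")

# Keep only the requested litters, then sort by litter and mouse ID.
kept <- toy[toy$Litter_ID %in% wanted, ]
kept <- kept[order(kept$Litter_ID, kept$MouseID), ]

# split() returns a named list with one data frame per PCR_Temp value.
plates <- split(kept, kept$PCR_Temp)

names(plates)      # "Y" "Z"
plates$Z$MouseID   # "M1" "M3"
```

Each list element can then be processed on its own, which is what makes lapply a natural fit for the later steps.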
Let's define a custom function to add positive and negative controls and duplicate rows for those genotypes that need duplicates. We can apply that function to every list element with lapply.
addControlSlots <- function(x){
  genotypes <- unique(x$Genotype)
  genotype.dfs <- list()
  for (i in seq_along(genotypes)){
    litter.mice <- x[x$Genotype == genotypes[i],]
    litter <- litter.mice[1,"Litter_ID"]
    Temp <- litter.mice[1,"PCR_Temp"]
    # Repeat each mouse as many times as its Rxns value states (also covers Rxns > 2)
    litter.mice <- litter.mice[rep(seq_len(nrow(litter.mice)), times = litter.mice$Rxns),]
    litter.mice <- litter.mice[order(litter.mice$MouseID),]
    # One positive and one negative control slot per genotype
    control.rows <- data.frame(Litter_ID = litter, MouseID = c("PosCont","NegCont"),Gender = NA,Genotype = genotypes[i], PCR_Temp = Temp, Rxns = 1)
    genotype.dfs[[i]] <- rbind(litter.mice,control.rows)
  }
  do.call(rbind,genotype.dfs)
}
processed.temps <- lapply(mice,addControlSlots)
processed.temps[[2]]
#$Z
# Litter_ID MouseID Gender Genotype PCR_Temp Rxns
#1 B23701-2 ZO960 F zDC.Cre Z 1
#2 B23701-2 ZP810 F zDC.Cre Z 1
#3 B23701-2 ZP992 M zDC.Cre Z 1
#4 B23701-2 PosCont <NA> zDC.Cre Z 1
#5 B23701-2 NegCont <NA> zDC.Cre Z 1
#...15 more rows
We now have controls after every genotype.
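The question asks for samples to be repeated as many times as their Rxns value states. One way to do that for any positive whole-number Rxns, shown here as a small sketch on made-up data, is to index the rows with rep(). The toy IDs and values are assumptions for illustration.

```r
# Toy data; IDs and Rxns values are made up for illustration only.
toy <- data.frame(
  MouseID = c("M1", "M2", "M3"),
  Rxns    = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# rep() with a 'times' vector repeats row index i exactly Rxns[i] times.
idx <- rep(seq_len(nrow(toy)), times = toy$Rxns)
expanded <- toy[idx, ]
rownames(expanded) <- NULL

expanded$MouseID   # "M1" "M2" "M2" "M2" "M3" "M3"
```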
Now let's define a function to fill in the PCR plate. And again apply it to the list.
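makePCRPlate <- function(x){
  mouse.vector <- as.character(x$MouseID)
  # A 6 x 8 plate holds at most 48 samples
  stopifnot(length(mouse.vector) <= 6*8)
  plate.vector <- rep(NA,6*8)
  plate.vector[seq_along(mouse.vector)] <- mouse.vector
  # Fill pairs of wells column by column, then stack three 2 x 8 bands
  wide <- matrix(plate.vector,nrow=2,byrow = FALSE)
  rbind(wide[,1:8],wide[,9:16],wide[,17:24])
}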
pcr.plates <- lapply(processed.temps,makePCRPlate)
pcr.plates[[2]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] "ZO960" "ZP992" "NegCont" "ZO214" "ZP333" "ZP455" "ZP478" "ZQ130"
#[2,] "ZP810" "PosCont" "ZO214" "ZP333" "ZP455" "ZP478" "ZQ130" "ZQ875"
#[3,] "ZQ875" "NegCont" NA NA NA NA NA NA
#[4,] "PosCont" NA NA NA NA NA NA NA
#[5,] NA NA NA NA NA NA NA NA
#[6,] NA NA NA NA NA NA NA NA
We can see that the samples have been filled in in a zig-zag pattern: down each pair of wells, then across to the next column.
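To see the fill order on its own, the same matrix reshaping can be run on the numbers 1 to 48 instead of mouse IDs. This is only an illustration of the technique; the names fill.order and plate are made up here.

```r
# Number the 48 wells in the order samples would be placed.
fill.order <- seq_len(6 * 8)

# Fill a 2 x 24 matrix column by column: each column is one pair of wells.
wide <- matrix(fill.order, nrow = 2, byrow = FALSE)

# Stack three 2 x 8 bands to obtain the 6 x 8 plate.
plate <- rbind(wide[, 1:8], wide[, 9:16], wide[, 17:24])

plate[1:2, 1:3]
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
plate[3, 1]   # 17: the third row starts once the first band is full
```

Samples 1 and 2 sit in the first column, 3 and 4 in the second, and so on, so consecutive samples stay next to each other on the plate.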
Now let's use grid to make a .pdf file with the layouts.
pdf("MyPCRPlates.pdf")
for(i in seq_along(pcr.plates)){
  grid.newpage()
  grid.table(pcr.plates[[i]])
  grid.text(paste0("PCR Temp ",names(pcr.plates)[i]),y = unit(0.9,"npc"))
}
dev.off()
The .pdf file should have a page for each temperature.
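If the printed table is easier to read with blank empty wells and labelled rows and columns, the matrix can be tidied before it is drawn. The sketch below works on a made-up stand-in plate, and the A-F / 1-8 well labels are an assumption, not something the question specifies.

```r
# A small stand-in plate; sample names are made up for illustration only.
plate <- matrix(NA_character_, nrow = 6, ncol = 8)
plate[1:2, 1:2] <- c("M1", "M2", "PosCont", "NegCont")

# Show empty wells as blanks rather than missing values.
plate[is.na(plate)] <- ""

# Hypothetical well labels: rows A-F, columns 1-8.
dimnames(plate) <- list(LETTERS[1:6], as.character(1:8))

plate["B", "2"]   # "NegCont"
# The labelled matrix could then be passed to grid.table(plate).
```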
Data
set.seed(1)
data1 <- data.frame("MouseID" = paste0("Z",sample(c("O","P","Q"),size = 50,replace = TRUE),round(runif(50,1,999))),
                    Litter_ID = sample(c("B23701-2", "B23744-1", "B23744-2", "B23745-2", "B23844-1","B23944-1", "B23944-2", "B23951-1"),size=50, replace = TRUE),
                    Gender = sample(c("F","M"), size = 50, replace = TRUE))
data2 <- data.frame(Genotype = c("zDC.Cre","Villin.Cre","Villin.Cre","zDC.Cre","K14.CreER","Rosa.TSLP","Rosa.TSLP","K14.CreER"),
                    Litter_ID = c("B23701-2", "B23744-1", "B23744-2", "B23745-2", "B23844-1","B23944-1", "B23944-2", "B23951-1"),
                    PCR_Temp = c("Z","Y","Y","Z","Y","Z","Z","Y"),
                    Rxns = c(1,1,1,1,1,2,2,1))
# merge() joins the two data frames on their shared column, Litter_ID
data <- merge(data1,data2)
data
# Litter_ID MouseID Gender Genotype PCR_Temp Rxns
#1 B23701-2 ZP810 F zDC.Cre Z 1
#2 B23701-2 ZP992 M zDC.Cre Z 1
#3 B23701-2 ZO960 F zDC.Cre Z 1
#4 B23744-1 ZO122 F Villin.Cre Y 1
#5 B23744-1 ZQ259 F Villin.Cre Y 1
#... 45 more rows
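The merge call above relies on the two data frames sharing a Litter_ID column. A minimal sketch of that behaviour on made-up data (the IDs and genotype names here are placeholders):

```r
# Toy per-mouse and per-litter tables; all values are made up.
mice <- data.frame(
  MouseID   = c("M1", "M2", "M3"),
  Litter_ID = c("L-1", "L-2", "L-1"),
  stringsAsFactors = FALSE
)
litters <- data.frame(
  Litter_ID = c("L-1", "L-2"),
  Genotype  = c("GenoA", "GenoB"),
  Rxns      = c(1, 2),
  stringsAsFactors = FALSE
)

# With no 'by' argument, merge() joins on the columns the two frames share,
# here Litter_ID, and places that key column first in the result.
joined <- merge(mice, litters)

names(joined)   # "Litter_ID" "MouseID" "Genotype" "Rxns"
nrow(joined)    # 3
```

This is why Litter_ID appears as the first column in the printed data above.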
Answered By - Ian Campbell Answer Checked By - David Marino (PHPFixing Volunteer)