PHPFixing

Saturday, October 8, 2022

[FIXED] How can I put zero or NA in a data frame that has two date columns using R?

October 08, 2022 | dataset, r, statistics

Issue

The basic idea is to add rows filled with 0 or NA wherever a variable has no value for a given day.

The function should align the two date variables, it should work for the price as well, and it should put NA in the variables that have no value for that day.

This is my dataset. I want to add 0 or NA where values are missing, for example in the periods where the variables Date and futures have no entry.

|   | DATA       | Date       | futures |
| 1 | 2021-12-23 | 2021-12-23 | 1388.17 |
| 2 | 2021-12-22 | 2021-12-22 | 1432.36 |
| 3 | 2021-12-21 | 2021-12-21 | 1508.98 |
| 4 | 2021-12-20 | 2021-12-20 | 1493.13 |
| 5 | 2021-12-19 | 2021-12-17 | 1379.97 |
| 6 | 2021-12-18 | 2021-12-16 | 1597.91 |

The function should work more or less like this:

|   | DATA       | Date       | futures |
| 1 | 2021-12-23 | 2021-12-23 | 1388.17 |
| 2 | 2021-12-22 | 2021-12-22 | 1432.36 |
| 3 | 2021-12-21 | 2021-12-21 | 1508.98 |
| 4 | 2021-12-20 | 2021-12-20 | 1493.13 |
| 5 | 2021-12-19 | NA         | NA      |
| 6 | 2021-12-18 | NA         | NA      |

I thought about using a for loop, but I was not able to make it work.

Thanks a lot for your help.


Solution

I created some arbitrary data and randomly removed rows. This is edited based on the changes you made to your question. You didn't specifically state this, but I assume that if the first date field is NA, you wanted to keep the price.

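library(tidyverse)

set.seed(42) # optional: fix the seed so the random example is reproducible

# every calendar day in the range, and a pool of random prices
daten <- seq(as.Date("2020/10/23"), as.Date("2021/10/12"), "days")
price <- round(runif(350, 1000, 1500), digits = 2)

# two date columns, each missing a different random set of days
dff <- data.frame(dateOne = sort(sample(daten, size = 325, replace = FALSE), 
                                 decreasing = TRUE),
                  dateTwo = sort(sample(daten, size = 325, replace = FALSE), 
                                 decreasing = TRUE),
                  futures = sample(price, size = 325, replace = TRUE))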

The ordering() function assumes that the dates in each column are in order.
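The ordering assumption can be checked before the data is passed on. The following is only a minimal sketch: `is_ordered()` is a hypothetical helper, and the two small date vectors are made up for illustration. Neither is part of the original answer. The uniqueness check at the end reflects an additional assumption (no repeated dates within a column) that the answer's sample data happens to satisfy because it is drawn without replacement.

```r
# Hypothetical helper (not from the original answer):
# TRUE if x is sorted either ascending or descending and has no NAs
is_ordered <- function(x) {
  v <- as.numeric(x)  # Dates become day counts
  !anyNA(v) && (!is.unsorted(v) || !is.unsorted(rev(v)))
}

# Made-up example vectors
desc_dates  <- as.Date(c("2021-12-23", "2021-12-22", "2021-12-21"))
mixed_dates <- as.Date(c("2021-12-23", "2021-12-21", "2021-12-22"))

ordered_ok   <- is_ordered(desc_dates)   # sorted descending
ordered_fail <- is_ordered(mixed_dates)  # not sorted in either direction

# Assumed extra requirement: no date appears twice in a column
unique_ok <- anyDuplicated(desc_dates) == 0
```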

ordering <- function(d, dT, df1){ # two fields with dates and the data frame
  
  # get indices of the date columns, in the order (d, dT)
  tellMe <- match(c(d, dT), colnames(df1))
  
  # create ranking (to return original sorting)
  df1$rank <- 1:nrow(df1)
  
  # separate and sort date columns
  # (local assignment, so nothing leaks into the global environment)
                # the first date; the rank
  dc  <- df1[, c(tellMe[[1]], ncol(df1))] %>% arrange(.data[[d]])

                # everything *except* the first date field & rank
  dTc <- df1[, -c(tellMe[[1]], ncol(df1))] %>% arrange(.data[[dT]])

  # identify the index of the second date column in dTc
  tellMe2 <- which(colnames(dTc) == dT)

  # find differences
         # missing in the first date field
             # the index of the date is in tellMe2
  dfd2 <- dTc[!dTc[, tellMe2] %in% dc[, 1], tellMe2] 

         # missing in the second date field
  dfd  <- dc[!dc[, 1] %in% dTc[, tellMe2], 1] # since date is in column 1
  
  # find indices of where the NA's need to be placed
  # (for each missing date: how many existing dates are <= it)
  dcInt <- lapply(dfd2,
                  findInterval,
                  unlist(dc[, 1])) %>% 
    unlist()
  dTcInt <- lapply(dfd,
                   findInterval,
                   unlist(dTc[, tellMe2])) %>% 
    unlist()

  # build up with differences as NA
  # preceding index provided, offset by index number - 1
  # seq_along() skips the loop when nothing is missing
  for(i in seq_along(dcInt)){
    k <- dcInt[[i]] + i - 1 # rows kept before the new NA row
    dc <- rbind(dc[seq_len(k), ], # everything before
                rep(NA, times = ncol(dc)),
                dc[seq_len(nrow(dc) - k) + k, ], # everything after (may be empty)
                make.row.names = FALSE)
  }
  
  # preceding index provided, offset by index number - 1
  for(j in seq_along(dTcInt)){
    k <- dTcInt[[j]] + j - 1 # rows kept before the new NA row
    dTc <- rbind(dTc[seq_len(k), ], # everything before
                 rep(NA, times = ncol(dTc)), 
                 dTc[seq_len(nrow(dTc) - k) + k, ], # everything after (may be empty)
                 make.row.names = FALSE)
  } 
  
  # reassemble the data, in the original order
  df2 <- cbind(dc, dTc) %>%
    select(colnames(df1), rank)
  
  # check row order
  # if none of the original ranks 1:10 appear in the first 10 rows,
  # the input was sorted in decreasing order and must be reversed
  if(!any(df2$rank[1:10] %in% 1:10)){
    # add a new ranking variable
    df2$rank2 <- 1:nrow(df2)
    # reverse the new ranking variable and delete both ranking variables
    df2 <- arrange(df2, -rank2) %>% select(-rank, -rank2)
  } else {
    # delete the ranking variable; they are already in the right order
    df2 <- select(df2, -rank)
  }
  return(df2)
}
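The index arithmetic inside the function can be easier to follow on a tiny example. The sketch below uses made-up vectors (`have` and `missing_dates` are hypothetical names, not taken from the answer) to show what `findInterval()` returns and why each insertion position is offset by the number of NAs already inserted.

```r
# Made-up toy data: dates that exist in one column, and dates found only in the other
have          <- as.Date(c("2021-12-16", "2021-12-17", "2021-12-20", "2021-12-21"))
missing_dates <- as.Date(c("2021-12-18", "2021-12-19"))

# For each missing date, count how many existing (sorted) dates are <= it
pos <- findInterval(missing_dates, have)

# Insert one NA per missing date. Every earlier insertion shifts the
# later positions down by one row, hence the "+ i - 1" offset.
out <- have
for (i in seq_along(pos)) {
  k   <- pos[i] + i - 1                  # elements kept before the new NA
  out <- c(out[seq_len(k)], NA, out[seq_len(length(out) - k) + k])
}
out
```

Both missing dates fall after the second existing date, so the two NA placeholders end up in positions 3 and 4 of `out`.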

Now you can use this function with the data.

tryIt <- ordering("dateOne", "dateTwo", dff)
head(tryIt)

The function returns a data frame. Whether the dates were sorted in increasing or decreasing order when passed to the function, the result comes back in that same order.


tail(tryIt, n = 15)
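As a cross-check that is not part of the answer above, the same kind of alignment can be sketched with a full outer join in base R. This is a different technique from the `ordering()` function, shown here on the six sample rows from the question. The helper column `key` is a hypothetical name introduced only for the join, and the sketch assumes that dates are unique within each column.

```r
# The question's sample rows, rebuilt by hand as two separate pieces
left  <- data.frame(DATA = as.Date(c("2021-12-23", "2021-12-22", "2021-12-21",
                                     "2021-12-20", "2021-12-19", "2021-12-18")))
right <- data.frame(Date = as.Date(c("2021-12-23", "2021-12-22", "2021-12-21",
                                     "2021-12-20", "2021-12-17", "2021-12-16")),
                    futures = c(1388.17, 1432.36, 1508.98,
                                1493.13, 1379.97, 1597.91))

# Hypothetical helper column `key`: a copy of each date column to join on,
# so that both original date columns survive the merge
left$key  <- left$DATA
right$key <- right$Date

# Full outer join: every date from either side gets a row, gaps become NA
aligned <- merge(left, right, by = "key", all = TRUE)

# Restore the question's decreasing date order and drop the helper column
aligned <- aligned[order(aligned$key, decreasing = TRUE),
                   c("DATA", "Date", "futures")]
rownames(aligned) <- NULL
aligned
```

With this sample, the rows for 2021-12-19 and 2021-12-18 get NA in Date and futures, while 2021-12-17 and 2021-12-16 get NA in DATA and keep their price.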




Answered By - Kat
Answer Checked By - Timothy Miller (PHPFixing Admin)