PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Monday, August 29, 2022

[FIXED] How to read .csv-data containing thousand separators and special handling of zeros (in R)?

 August 29, 2022     csv, r, zero     No comments   

Issue

R version 3.2.2 on Ubuntu 14.04

I am trying to read in R .csv-data (two columns: "id" and "variable1") containing the thousand separator ",". So far no problem. I am using read.csv2 and the data looks like that:

> data <- read.csv2("data.csv", sep = ";", stringsAsFactors = FALSE, dec = ".")
> data[1000:1010, ]
     id        variable1
         1     2,001
     1,001     2,002
     1,002     2,001
     1,003     2,002
     1,004     2,001
     1,005     2,002
     1,006     2,001
     1,007     2,002
     1,008     2,001
     1,009     2,002
      1,01     2,001

After that first I tried to use gsub() to remove the commas:

data[, c("id", "variable1")] <- sapply(data[, c("id", "variable1")],
          function(x) {as.numeric(gsub("\\,","", as.character(x)))})
> data[1000:1010, ]
     id      variable1
        1      2001
     1001      2002
     1002      2001
     1003      2002
     1004      2001
     1005      2002
     1006      2001
     1007      2002
     1008      2001
     1009      2002
      101      2001

I think my problem is already obvious in the first output, because there is a thousand separator, but the "ending zeros" are missing. Like number "1000" is just displayed as "1" and "1010" as "1,01" for the "id"-variable in the data (also in the .csv-data). Of course, R can't identify this.

So my question is: Is there are way to tell R that every number must have three numbers after the thousand separator when reading in the data (or maybe after that), so that I have the correct numbers? The data should look like this:

> data[1000:1010, ]
     id      variable1
     1000      2001
     1001      2002
     1002      2001
     1003      2002
     1004      2001
     1005      2002
     1006      2001
     1007      2002
     1008      2001
     1009      2002
     1010      2001

Edit: Thanks you all for your answers. Unfortunately the suggestions will work for this example but not for my data, because I think I chose bad example rows. Other rows in the data can look like this:

       id1 variable1
1        1     2,001
999    999     1,102
1000     1     2,001
1001 1,001     2,002
1002 1,002     2,001

Of course, there is twice the number "1". The first is really a "1", but the second should be a "1000". But now I think I can't solve my problem with R. Maybe I need a better export of the original data, because the problem appears also in the .csv data.


Solution

If "," is the only separator, i.e. all of the numbers are integers, you can set the dec argument of csv2 (or read.csv) to "," and multiply by 1000:

data <- read.csv2(
  text = "id    ; variable1
          1     ; 2,001
          1,008 ; 2,001
          1,009 ; 2,002
          1,01  ; 2,001
          1,3   ; 2,0",
  sep = ";",
  stringsAsFactors = FALSE,
  header = TRUE,
  dec = "," )

.

> 1000*data
    id variable1
1 1000      2001
2 1008      2001
3 1009      2002
4 1010      2001
5 1300      2000
> 


Answered By - mra68
Answer Checked By - Gilberto Lyons (PHPFixing Admin)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing