Issue
R version 3.2.2 on Ubuntu 14.04
I am trying to read in R .csv-data (two columns: "id" and "variable1") containing the thousand separator ",". So far no problem. I am using read.csv2 and the data looks like that:
> data <- read.csv2("data.csv", sep = ";", stringsAsFactors = FALSE, dec = ".")
> data[1000:1010, ]
id variable1
1 2,001
1,001 2,002
1,002 2,001
1,003 2,002
1,004 2,001
1,005 2,002
1,006 2,001
1,007 2,002
1,008 2,001
1,009 2,002
1,01 2,001
After that first I tried to use gsub() to remove the commas:
data[, c("id", "variable1")] <- sapply(data[, c("id", "variable1")],
function(x) {as.numeric(gsub("\\,","", as.character(x)))})
> data[1000:1010, ]
id variable1
1 2001
1001 2002
1002 2001
1003 2002
1004 2001
1005 2002
1006 2001
1007 2002
1008 2001
1009 2002
101 2001
I think my problem is already obvious in the first output, because there is a thousand separator, but the "ending zeros" are missing. Like number "1000" is just displayed as "1" and "1010" as "1,01" for the "id"-variable in the data (also in the .csv-data). Of course, R can't identify this.
So my question is: Is there are way to tell R that every number must have three numbers after the thousand separator when reading in the data (or maybe after that), so that I have the correct numbers? The data should look like this:
> data[1000:1010, ]
id variable1
1000 2001
1001 2002
1002 2001
1003 2002
1004 2001
1005 2002
1006 2001
1007 2002
1008 2001
1009 2002
1010 2001
Edit: Thanks you all for your answers. Unfortunately the suggestions will work for this example but not for my data, because I think I chose bad example rows. Other rows in the data can look like this:
id1 variable1
1 1 2,001
999 999 1,102
1000 1 2,001
1001 1,001 2,002
1002 1,002 2,001
Of course, there is twice the number "1". The first is really a "1", but the second should be a "1000". But now I think I can't solve my problem with R. Maybe I need a better export of the original data, because the problem appears also in the .csv data.
Solution
If "," is the only separator, i.e. all of the numbers are integers, you can set the dec
argument of csv2
(or read.csv
) to "," and multiply by 1000:
data <- read.csv2(
text = "id ; variable1
1 ; 2,001
1,008 ; 2,001
1,009 ; 2,002
1,01 ; 2,001
1,3 ; 2,0",
sep = ";",
stringsAsFactors = FALSE,
header = TRUE,
dec = "," )
.
> 1000*data
id variable1
1 1000 2001
2 1008 2001
3 1009 2002
4 1010 2001
5 1300 2000
>
Answered By - mra68 Answer Checked By - Senaida (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.