Issue

My text file looks like the following:

"
file1
cols=
col1
col2
# this is a comment
col3

data
a,b,c
d,e,f
"

As you can see, the data only starts after the data tag, and the lines before it list the column names. There may also be comment lines, so the number of lines before the data tag varies.

How can I parse that in R? Possibly with some tidy tools? Expected output is:

# A tibble: 2 x 3
  col1  col2  col3 
  <chr> <chr> <chr>
1 a     b     c    
2 d     e     f

Thanks!

Solution

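Here is a base R way with scan(). With sep = "\n", each line is read as one element. Blank lines are skipped by default (blank.lines.skip = TRUE), strip.white = TRUE trims leading and trailing whitespace from each line, and comment.char = "#" discards everything from a # to the end of the line, so lines that contain only a comment are dropped.

text <- scan("test.txt", what = "", sep = "\n", strip.white = TRUE, comment.char = "#")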
text
# [1] "file1" "cols=" "col1"  "col2"  "col3"  "data"  "a,b,c" "d,e,f"

ind1 <- which(text == "cols=")
ind2 <- which(text == "data")
df <- read.table(text = paste(text[-seq(ind2)], collapse = "\n"),
                 sep = ",", col.names = text[(ind1 + 1):(ind2 - 1)])

df
#   col1 col2 col3
# 1    a    b    c
# 2    d    e    f
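
The same idea can be wrapped in a small function so that it is easy to reuse and to check against the sample from the question. This is only a sketch: read_tagged_file is a hypothetical name, not part of any package, and it assumes the file has exactly one "cols=" line followed later by one "data" line, with comma-separated rows after it. It uses readLines() instead of scan() and reads every column as character, which matches the <chr> columns in the expected output.

```r
# Hypothetical helper (not from any package): parse a file whose column
# names are listed between a "cols=" line and a "data" line, followed by
# comma-separated rows.
read_tagged_file <- function(path) {
  lines <- trimws(readLines(path))
  # Drop blank lines and comment lines starting with "#"
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]

  cols_at <- which(lines == "cols=")[1]
  data_at <- which(lines == "data")[1]
  if (is.na(cols_at) || is.na(data_at) || data_at <= cols_at + 1) {
    stop("expected a 'cols=' line, at least one column name, then a 'data' line")
  }

  col_names <- lines[(cols_at + 1):(data_at - 1)]
  rows <- lines[-seq_len(data_at)]

  # colClasses = "character" keeps every column as text
  read.table(text = rows, sep = ",", col.names = col_names,
             colClasses = "character")
}

# Write the sample from the question to a temporary file and parse it
tmp <- tempfile(fileext = ".txt")
writeLines(c("file1", "cols=", "col1", "col2", "# this is a comment", "col3",
             "", "data", "a,b,c", "d,e,f"), tmp)
df <- read_tagged_file(tmp)
df
```

If a tibble is wanted, as in the question, tibble::as_tibble(df) converts the result, assuming the tibble package is installed.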

Answered By - Darren Tsai

Answer Checked By - David Goodson (PHPFixing Volunteer)

Friday, August 26, 2022

[FIXED] how to parse a text file that contains the column names at the beginning of the file?
