Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
843 views
in Technique[技术] by (71.8m points)

split - R semicolon delimited a column into rows

I am using RStudio 2.15.0 and have created an object from Excel using XLConnect with 3000+ rows and 12 columns I am trying to delimit/split a column into the rows but don't know if this is possible or how to do it. Example of the data below using the 3 columns in connection. any help on this would be grand.

Code that is working for 2 of the columns is below.

v1 <- with(df, tapply(PolId, Description,  FUN= function(x) {
x1 <- paste(x, collapse=";")
gsub('(\b\S+\b)(?=.*\b\1\b.*);', '',     x1, perl=TRUE)}))
library(stringr)
Description <- rep(names(v1),  str_count(v1, '\w+'))
PolId <- scan(text=gsub(';+', ' ', v1), what='', quiet=TRUE)
data.frame(PolId, Description)  

Sample data

PolId   Description  Document.Type
ABC123;ABC456;ABC789;   TEST1  Pol1
ABC123;ABC456;ABC789;   TEST1  Pol1
ABC123;ABC456;ABC789;   TEST1  Pol1
AAA123; TEST1  End1
AAA123; TEST2  End2
ABB123;ABC123;  TEST3  End1
ABB123;ABC123;  TEST3  End1

I want the output to be like this (replacing the duplicate Polid's)

PolId   Description  Document.Type
ABC123  TEST1        Pol1
ABC456  TEST1        Pol1
ABC789  TEST1        Pol1
AAA123  TEST1        End1
AAA123  TEST2        End2
ABB123  TEST3        End1
ABC123  TEST3        End1
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here is a base R solution. Split the PolId field using strplit and for each such split field cbind it with the corresponding Description. This gives a list of matrices which we rbind together. Finally set the column names.

out <- do.call(rbind, Map(cbind, strsplit(DF$PolId, ";"), DF$Description))
colnames(out) <- colnames(DF)

giving:

> out
      PolId    Description
 [1,] "ABC123" "TEST1"    
 [2,] "ABC456" "TEST1"    
 [3,] "ABC789" "TEST1"    
 [4,] "ABC123" "TEST1"    
 [5,] "ABC456" "TEST1"    
 [6,] "ABC789" "TEST1"    
 [7,] "ABC123" "TEST1"    
 [8,] "ABC456" "TEST1"    
 [9,] "ABC789" "TEST1"    
[10,] "AAA123" "TEST1"    
[11,] "AAA123" "TEST2"    
[12,] "ABB123" "TEST3"    
[13,] "ABC123" "TEST3"    
[14,] "ABB123" "TEST3"    
[15,] "ABC123" "TEST3" 

Note: We used this as the input:

DF <-
structure(list(PolId = c("ABC123;ABC456;ABC789;", "ABC123;ABC456;ABC789;", 
"ABC123;ABC456;ABC789;", "AAA123;", "AAA123;", "ABB123;ABC123;", 
"ABB123;ABC123;"), Description = c("TEST1", "TEST1", "TEST1", 
"TEST1", "TEST2", "TEST3", "TEST3")), .Names = c("PolId", "Description"
), class = "data.frame", row.names = c(NA, -7L))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...