Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Row-wise sort then concatenate across specific columns of data frame

(Related question that does not include sorting. It's easy to just use paste when you don't need to sort.)

I have a less-than-ideally-structured table with character columns that have generic names such as "item1", "item2", etc. I would like to create a new character variable that is the alphabetized, comma-separated concatenation of these columns. For example, if row 5 has item1 = "milk", item2 = "eggs", and item3 = "butter", the new variable in row 5 should be "butter, eggs, milk".

I wrote a function f() below that works on two character variables. However, I am having trouble with:

  • Using mapply or other "vectorization" (I know it's really just a for loop)
  • Generalizing the function to an arbitrary number of columns

Any help much appreciated.

df <- data.frame(a =c("foo","bar"), 
                 b= c("baz","qux"))   
paste(df$a,df$b, sep=", ")
# returns [1] "foo, baz" "bar, qux" ... but I want [1] "baz, foo" "bar, qux"

f <- function(a,b) paste(c(a,b)[order(c(a,b))],collapse=", ")
f("foo","baz") 
# returns [1] "baz, foo" ... which is what I want ... how to vectorize?

df$new_var <- mapply(f, df$a, df$b)
df 
#     a   b new_var      <- new_var is not what I want
# 1 foo baz    1, 2
# 2 bar qux    1, 2

# Interestingly, the same mapply call works in data.table,
# which keeps strings as character instead of converting them to factors
library(data.table)
dt <- data.table(a =c("foo","bar"), 
                 b= c("baz","qux"))  
dt[,new_var:=mapply(f, a, b)]
dt
#     a    b  new_var    <- new var IS what I want
# 1: foo baz baz, foo
# 2: bar qux bar, qux

1 Reply


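Just use apply() over the rows (MARGIN = 1):

apply(df, 1, function(x) {
  # sort the values in this row, then join them with ", " as in the question
  paste(sort(x), collapse = ", ")
})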

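Wrap it in a function if you want. You will either have to specify which columns to pass or assume all of them, e.g. apply(df[, 2:3], 1, f), where f is the wrapped function taking one row as a single vector (pass f itself, not f()).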
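One way such a wrapper might look is sketched below. The name sort_paste_rows and its cols and sep parameters are made up for illustration and are not part of the original answer; the sketch assumes the selected columns hold character (or factor) values.

```r
# Hypothetical helper: sort the values in each row, then join them.
# `cols` and `sep` are illustrative parameters, not from the original answer.
sort_paste_rows <- function(df, cols = names(df), sep = ", ") {
  # as.matrix() on character or factor columns yields a character matrix
  m <- as.matrix(df[, cols, drop = FALSE])
  # unname() drops any row names that apply() might carry over
  unname(apply(m, 1, function(x) paste(sort(x), collapse = sep)))
}

df <- data.frame(a = c("foo", "bar"),
                 b = c("baz", "qux"),
                 stringsAsFactors = FALSE)

# Name the columns explicitly so that new_var itself is never included
df$new_var <- sort_paste_rows(df, cols = c("a", "b"))
df$new_var
# [1] "baz, foo" "bar, qux"
```

Selecting columns by name rather than by position keeps the call stable if columns are later added or reordered.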

sort(x) gives the same result as x[order(x)], except that sort() drops NA values by default while x[order(x)] keeps them at the end.
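As for the mapply attempt in the question: the "1, 2" output is consistent with a and b being stored as factors, which data.frame() created from strings by default before R 4.0.0, so f() ended up pasting integer codes rather than labels. A minimal sketch of a mapply version that sidesteps this and accepts any number of columns follows; f_any, cols, and chars are hypothetical names introduced here.

```r
# stringsAsFactors = TRUE reproduces the pre-4.0.0 data.frame() default
df <- data.frame(a = c("foo", "bar"),
                 b = c("baz", "qux"),
                 stringsAsFactors = TRUE)

# Hypothetical variadic version of the question's f(): takes any number of values
f_any <- function(...) paste(sort(c(...)), collapse = ", ")

cols  <- c("a", "b")                      # columns to combine
chars <- lapply(df[cols], as.character)   # convert factor columns to character

# do.call() passes each column to mapply() as a separate argument;
# unname() keeps column names from clashing with mapply's own argument names
df$new_var <- do.call(mapply,
                      c(list(FUN = f_any, USE.NAMES = FALSE), unname(chars)))
df$new_var
# [1] "baz, foo" "bar, qux"
```

Converting with as.character() first means the result does not depend on whether the columns arrived as factors or as plain strings.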

