Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Replacing specific values with NA in a dataframe

I am trying to replace all * values in my dataset with NA but I get errors. Here is what I do:

strt = as.POSIXct("2021-01-08")
end  = as.POSIXct("2021-01-12")

time = seq.POSIXt(strt, end, by = "day")
x = c(1,2,3,'*','*')
y = c('*',2,3,4,5)

df = data.frame(time, x, y)

df[df == '*'] = NA  #This doesn't work
df[df[-1] == '*'] = NA #Same as above
df[df[,-1] == '*'] = NA #Same as above

There is a problem with the POSIXct variable (time). The error is:

Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format

I tried excluding the time variable by writing df[df[-1] == '*'], but then I get another error:

Error in `[<-.data.frame`(`*tmp*`, df[, -1] == "*", value = NA) : unsupported matrix index in replacement

So now I'm stuck. Does anyone know what the problem is here, and why R can't handle all types of variables consistently?



1 Reply

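Try this. The error comes from the date-time variable: df == '*' compares every column with '*', and for the POSIXct column R tries to parse '*' as a date-time, which fails. Using dplyr, you can convert all columns to character first: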

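library(dplyr)

# Convert every column to character so the comparison with '*' is safe,
# replace the '*' cells with NA, then turn time back into a date
new <- df %>%
  mutate(across(everything(), ~ as.character(.))) %>%
  replace(. == '*', NA) %>%
  mutate(time = as.Date(time))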

Output:

        time    x    y
1 2021-01-08    1 <NA>
2 2021-01-09    2    2
3 2021-01-10    3    3
4 2021-01-11 <NA>    4
5 2021-01-12 <NA>    5
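
If converting time to character and back is not wanted, a variant that only touches the character columns may be simpler. This is a minimal sketch, assuming dplyr 1.0.0 or later (for across() and where()) and that x and y are character columns; the name new2 is only illustrative and does not come from the answer above.

```r
library(dplyr)

# Rebuild the example data from the question
time <- seq(as.POSIXct("2021-01-08"), as.POSIXct("2021-01-12"), by = "day")
df <- data.frame(time,
                 x = c(1, 2, 3, '*', '*'),
                 y = c('*', 2, 3, 4, 5),
                 stringsAsFactors = FALSE)

# na_if() turns every '*' into NA; only character columns are touched,
# so time keeps its POSIXct class
new2 <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, '*')))
```

Note that x and y stay character here, just as in the version above; they still need as.numeric() if numbers are wanted.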

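The base R way is to leave out the time column on both sides of the assignment, so that the logical index has the same dimensions as the data it indexes:

#Base R
# df[-1] drops the time column; the comparison and the replacement
# then both work on the x and y columns only
df[-1][df[-1] == '*'] <- NA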

Output:

        time    x    y
1 2021-01-08    1 <NA>
2 2021-01-09    2    2
3 2021-01-10    3    3
4 2021-01-11 <NA>    4
5 2021-01-12 <NA>    5
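
To see why the second attempt in the question fails, it may help to look at the shape of the logical index. The sketch below uses base R only; the names cols and mask are illustrative. It shows that the comparison on the non-time columns gives a 5 x 2 matrix while df has three columns, which is why that matrix cannot index df directly, and then converts x and y to numeric once the '*' values are gone.

```r
# Rebuild the example data from the question
time <- seq(as.POSIXct("2021-01-08"), as.POSIXct("2021-01-12"), by = "day")
df <- data.frame(time,
                 x = c(1, 2, 3, '*', '*'),
                 y = c('*', 2, 3, 4, 5),
                 stringsAsFactors = FALSE)

cols <- c("x", "y")        # every column except time
mask <- df[cols] == '*'    # logical matrix, TRUE where a cell is '*'

dim(mask)  # 5 rows, 2 columns
dim(df)    # 5 rows, 3 columns: the mask does not match df itself

# Index the same two columns on the left-hand side, then assign NA
df[cols][mask] <- NA

# With the '*' values gone, the columns can be converted to numbers
df[cols] <- lapply(df[cols], as.numeric)
```

Selecting the columns by name rather than with -1 is a design choice: it keeps working if the position of time in the data frame changes.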
