Programing

data.frame에 열 추가

crosscheck 2020. 8. 10. 07:41
반응형

data.frame에 열 추가


아래에 data.frame이 있습니다. h_noh_no 1,2,3,4의 첫 번째 시리즈는 클래스 1이고 두 번째 시리즈 h_no는 클래스 2 등이 되도록 열 1 ( )에 따라 데이터를 분류하는 열을 추가하고 싶습니다 . 마지막 열에 표시된 것과 같은.

h_no  h_freq  h_freqsq
1     0.09091 0.008264628 1
2     0.00000 0.000000000 1
3     0.04545 0.002065702 1
4     0.00000 0.000000000 1  
1     0.13636 0.018594050 2
2     0.00000 0.000000000 2
3     0.00000 0.000000000 2
4     0.04545 0.002065702 2
5     0.31818 0.101238512 2
6     0.00000 0.000000000 2
7     0.50000 0.250000000 2 
1     0.13636 0.018594050 3 
2     0.09091 0.008264628 3
3     0.40909 0.167354628 3
4     0.04545 0.002065702 3

다양한 기술을 사용하여 데이터에 열을 추가 할 수 있습니다. 아래 인용문은 관련 도움말 텍스트의 "세부 정보"섹션에서 가져온 것입니다 [[.data.frame.

데이터 프레임은 여러 모드에서 인덱싱 할 수 있습니다. [[[단일 벡터 인덱스 ( x[i]또는 x[[i]]) 와 함께 사용 되면 데이터 프레임이 목록 인 것처럼 인덱싱됩니다.

my.dataframe["new.col"] <- a.vector
my.dataframe[["new.col"]] <- a.vector

의 data.frame 메서드 는 목록으로 $처리 x됩니다.

my.dataframe$new.col <- a.vector

경우 [[[두 지수 (함께 사용 x[i, j]하고 x[[i, j]])들이 매트릭스를 인덱싱처럼 행동

my.dataframe[ , "new.col"] <- a.vector

의 방법은 data.frame열 또는 행으로 작업하는지 여부를 지정하지 않으면 열을 의미한다고 가정합니다.


예를 들어 다음과 같이 작동합니다.

# make some fake data
your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))

# find where one appears and 
from <- which(your.df$no == 1)
to <- c((from-1)[-1], nrow(your.df)) # up to which point the sequence runs

# generate a sequence (len) and based on its length, repeat a consecutive number len times
get.seq <- mapply(from, to, 1:length(from), FUN = function(x, y, z) {
            len <- length(seq(from = x[1], to = y[1]))
            return(rep(z, times = len))
         })

# when we unlist, we get a vector
your.df$group <- unlist(get.seq)
# and append it to your original data.frame. since this is
# designating a group, it makes sense to make it a factor
your.df$group <- as.factor(your.df$group)


   no     h_freq   h_freqsq group
1   1 0.40998238 0.06463876     1
2   2 0.98086928 0.33093795     1
3   3 0.28908651 0.74077119     1
4   4 0.10476768 0.56784786     1
5   1 0.75478995 0.60479945     2
6   2 0.26974011 0.95231761     2
7   3 0.53676266 0.74370154     2
8   4 0.99784066 0.37499294     2
9   5 0.89771767 0.83467805     2
10  6 0.05363139 0.32066178     2
11  7 0.71741529 0.84572717     2
12  1 0.10654430 0.32917711     3
13  2 0.41971959 0.87155514     3
14  3 0.32432646 0.65789294     3
15  4 0.77896780 0.27599187     3
16  5 0.06100008 0.55399326     3

쉽게 : 데이터 프레임은 A입니다.

b <- A[,1]
b <- b==1
b <- cumsum(b)

그런 다음 열 b를 얻습니다.


질문을 올바르게 이해하면가 h_no증가하지 않는 시기를 감지 한 다음 class. (이 문제를 어떻게 해결했는지 살펴 보겠습니다. 마지막에는 자체 기능이 있습니다.)

우리 h_no는 현재 열에 대해서만 신경 을 쓰므로 데이터 프레임에서 추출 할 수 있습니다.

> h_no <- data$h_no

We want to detect when h_no doesn't go up, which we can do by working out when the difference between successive elements is either negative or zero. R provides the diff function which gives us the vector of differences:

> d.h_no <- diff(h_no)
> d.h_no
 [1]  1  1  1 -3  1  1  1  1  1  1 -6  1  1  1

Once we have that, it is a simple matter to find the ones that are non-positive:

> nonpos <- d.h_no <= 0
> nonpos
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
[13] FALSE FALSE

In R, TRUE and FALSE are basically the same as 1 and 0, so if we get the cumulative sum of nonpos, it will increase by 1 in (almost) the appropriate spots. The cumsum function (which is basically the opposite of diff) can do this.

> cumsum(nonpos)
 [1] 0 0 0 1 1 1 1 1 1 1 2 2 2 2

But, there are two problems: the numbers are one too small; and, we are missing the first element (there should be four in the first class).

The first problem is simply solved: 1+cumsum(nonpos). And the second just requires adding a 1 to the front of the vector, since the first element is always in class 1:

 > classes <- c(1, 1 + cumsum(nonpos))
 > classes
  [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3

Now, we can attach it back onto our data frame with cbind (by using the class= syntax, we can give the column the class heading):

 > data_w_classes <- cbind(data, class=classes)

And data_w_classes now contains the result.

Final result

We can compress the lines together and wrap it all up into a function to make it easier to use:

classify <- function(data) {
   cbind(data, class=c(1, 1 + cumsum(diff(data$h_no) <= 0)))
}

Or, since it makes sense for the class to be a factor:

classify <- function(data) {
   cbind(data, class=factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
}

You use either function like:

> classified <- classify(data) # doesn't overwrite data
> data <- classify(data) # data now has the "class" column

(This method of solving this problem is good because it avoids explicit iteration, which is generally recommend for R, and avoids generating lots of intermediate vectors and list etc. And also it's kinda neat how it can be written on one line :) )


In addition to Roman's answer, something like this might be even simpler. Note that I haven't tested it because I do not have access to R right now.

# Note that I use a global variable here
# normally not advisable, but I liked the
# use here to make the code shorter
index <<- 0
new_column = sapply(df$h_no, function(x) {
  if(x == 1) index = index + 1
  return(index)
})

The function iterates over the values in n_ho and always returns the categorie that the current value belongs to. If a value of 1 is detected, we increase the global variable index and continue.


Data.frame[,'h_new_column'] <- as.integer(Data.frame[,'h_no'], breaks=c(1, 4, 7))

Approach based on identifying number of groups (x in mapply) and its length (y in mapply)

mytb<-read.table(text="h_no  h_freq  h_freqsq group
1     0.09091 0.008264628 1
2     0.00000 0.000000000 1
3     0.04545 0.002065702 1
4     0.00000 0.000000000 1  
1     0.13636 0.018594050 2
2     0.00000 0.000000000 2
3     0.00000 0.000000000 2
4     0.04545 0.002065702 2
5     0.31818 0.101238512 2
6     0.00000 0.000000000 2
7     0.50000 0.250000000 2 
1     0.13636 0.018594050 3 
2     0.09091 0.008264628 3
3     0.40909 0.167354628 3
4     0.04545 0.002065702 3", header=T, stringsAsFactors=F)
mytb$group<-NULL

positionsof1s<-grep(1,mytb$h_no)

mytb$newgroup<-unlist(mapply(function(x,y) 
  rep(x,y),                      # repeat x number y times
  x= 1:length(positionsof1s),    # x is 1 to number of nth group = g1:g3
  y= c( diff(positionsof1s),     # y is number of repeats of groups g1 to penultimate (g2) = 4, 7
        nrow(mytb)-              # this line and the following gives number of repeat for last group (g3)
          (positionsof1s[length(positionsof1s )]-1 )  # number of rows - position of penultimate group (g2) 
      ) ) )
mytb

I believe that using "cbind" is the simplest way to add a column to a data frame in R. Below an example:

    myDf = data.frame(index=seq(1,10,1), Val=seq(1,10,1))
    newCol= seq(2,20,2)
    myDf = cbind(myDf,newCol)

you can add first an empty column to your data.frame then specify your conditions for the new column,

agtoexcel2$NONloan <- NA
agtoexcel2$NONloan[agtoexcel2$haveloan==2 & agtoexcel2$ifoloan==2 & agtoexcel2$both==0 ] <- 1
agtoexcel2$NONloan[agtoexcel2$haveloan==1 | agtoexcel2$ifoloan==1 | agtoexcel2$both==1 ] <- 0 

참고URL : https://stackoverflow.com/questions/10150579/adding-a-column-to-a-data-frame

반응형