How to succinctly write a formula with many variables from a data frame?


crosscheck 2020. 7. 25. 10:33

Suppose I have a response variable and data containing three covariates (as a toy example):

y = c(1,4,6)
d = data.frame(x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))

I want to fit a linear regression to the data:

fit = lm(y ~ d$x1 + d$x2 + d$x3)

Is there a way to write the formula so that I don't have to write out each individual covariate? For example, something like

fit = lm(y ~ d)

(I want each variable in the data frame to be a covariate.) I'm asking because I actually have 50 variables in my data frame, so I want to avoid writing out x1 + x2 + x3 + etc.

There is a special identifier that can be used in a formula to mean all the variables: the . identifier.

y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)

You can also do things like this, to use all variables but one (in this case x3 is excluded):

mod <- lm(y ~ . - x3, data = d)

Technically, . means all variables not already mentioned in the formula. For example

lm(y ~ x1 * x2 + ., data = d)

where . would only refer to x3, as x1 and x2 are already in the formula.
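One way to see what . stands for is to ask terms() to expand it against the data frame. The following is a minimal sketch on the toy data from the question; the object names (all_terms, minus_x3, with_inter) are illustrative and not part of the original answer.

```r
# Toy data from the question, with the response stored in the data frame
d <- data.frame(y  = c(1, 4, 6),
                x1 = c(4, -1, 3),
                x2 = c(3, 9, 8),
                x3 = c(4, -4, -2))

# terms() expands "." against the columns of `data`;
# the "term.labels" attribute lists the resulting right-hand-side terms
all_terms  <- attr(terms(y ~ ., data = d), "term.labels")
minus_x3   <- attr(terms(y ~ . - x3, data = d), "term.labels")
with_inter <- attr(terms(y ~ x1 * x2 + ., data = d), "term.labels")

print(all_terms)   # every column except the response
print(minus_x3)    # x3 has been dropped
print(with_inter)  # main effects plus the x1:x2 interaction
```

Inspecting the term labels before fitting is a cheap check that a formula with 50 covariates contains exactly the terms you expect.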

A slightly different approach is to create your formula from a string. In the formula help page you will find the following example:

## Create a formula for a model with a large number of variables:
xnam <- paste("x", 1:25, sep="")
fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+")))

Then if you look at the generated formula, you will get:

R> fmla
y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + 
    x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20 + x21 + 
    x22 + x23 + x24 + x25
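The same string-building idea can be applied to the column names of an existing data frame rather than to generated names. This is only a sketch under the assumption that the response column is called y and that all column names are syntactically valid; response and predictors are hypothetical helper variables.

```r
# Toy data from the question, with the response as a column
d <- data.frame(y  = c(1, 4, 6),
                x1 = c(4, -1, 3),
                x2 = c(3, 9, 8),
                x3 = c(4, -4, -2))

response   <- "y"                          # assumed name of the response column
predictors <- setdiff(names(d), response)  # every other column is a covariate

# Paste the pieces into "y ~ x1 + x2 + x3" and convert to a formula object
fmla <- as.formula(paste(response, "~", paste(predictors, collapse = " + ")))
print(fmla)

fit <- lm(fmla, data = d)
```

Using setdiff() on names(d) means the formula follows the data frame if columns are added or removed.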

Yes, of course: just add the response y as the first column of the data frame and call lm() on it:

> d2 <- data.frame(y, d)
> d2
  y x1 x2 x3
1 1  4  3  4
2 4 -1  9 -4
3 6  3  8 -2
> lm(d2)

Call:
lm(formula = d2)

Coefficients:
(Intercept)           x1           x2           x3  
    -5.6316       0.7895       1.1579           NA
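Passing a data frame to lm() works because R can derive a formula from a data frame, treating the first column as the response. The sketch below makes that implied formula visible and compares the fit with the explicit y ~ . version; the names implied, fit_implicit and fit_explicit are illustrative.

```r
y  <- c(1, 4, 6)
d  <- data.frame(x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))
d2 <- data.frame(y, d)

# formula() on a data frame uses the first column as the response
# and the remaining columns as additive terms
implied <- formula(d2)
print(implied)

fit_implicit <- lm(d2)                # relies on the column order
fit_explicit <- lm(y ~ ., data = d2)  # states the response by name

print(coef(fit_implicit))
print(coef(fit_explicit))
```

The explicit form is arguably easier to read later, since it does not depend on y happening to be the first column.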

Also, my information about R points out that assignment with <- is recommended over =.

An extension of juba's method is to use reformulate, a function which is explicitly designed for such a task.

## Create a formula for a model with a large number of variables:
xnam <- paste("x", 1:25, sep="")

reformulate(xnam, "y")
y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + 
    x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20 + x21 + 
    x22 + x23 + x24 + x25

For the example in the OP, the easiest solution here would be

# add y variable to data.frame d
d <- cbind(y, d)
reformulate(names(d)[-1], names(d[1]))
y ~ x1 + x2 + x3

mod <- lm(reformulate(names(d)[-1], names(d[1])), data=d)

Note that adding the dependent variable to the data.frame in d <- cbind(y, d) is preferred not only because it allows for the use of reformulate, but also because it allows for future use of the lm object in functions like predict.
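To illustrate the point about predict, here is a small sketch comparing a model fitted with data = d against one fitted with d$ references. It uses a single covariate to keep the output short, and new_obs holds made-up values chosen only for illustration.

```r
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3))

# Fitted with data = d: the formula refers to plain column names
fit <- lm(y ~ x1, data = d)

# Hypothetical new observations to predict for
new_obs <- data.frame(x1 = c(0, 2))
pred <- predict(fit, newdata = new_obs)

# Fitted with d$ references: the terms are "d$x1", which newdata cannot supply,
# so predict() falls back to the original d and warns about the row mismatch
fit_dollar  <- lm(d$y ~ d$x1)
pred_dollar <- suppressWarnings(predict(fit_dollar, newdata = new_obs))

print(length(pred))         # one prediction per row of new_obs
print(length(pred_dollar))  # one value per row of the original d
```

The second model silently ignores new_obs apart from the warning, which is why keeping the response and covariates together in one data frame is the safer habit.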

I built this solution because reformulate does not take care of variable names that contain white space.

add_backticks = function(x) {
    paste0("`", x, "`")
}

x_lm_formula = function(x) {
    paste(add_backticks(x), collapse = " + ")
}

build_lm_formula = function(x, y){
    if (length(y)>1){
        stop("y needs to be just one variable")
    }
    as.formula(        
        paste0("`",y,"`", " ~ ", x_lm_formula(x))
    )
}

# Example
df <- data.frame(
    y = c(1,4,6), 
    x1 = c(4,-1,3), 
    x2 = c(3,9,8), 
    x3 = c(4,-4,-2)
    )

# Model Specification
columns = colnames(df)
y_cols = columns[1]
x_cols = columns[2:length(columns)]
formula = build_lm_formula(x_cols, y_cols)
formula
# output
# y ~ x1 + x2 + x3

# Run Model
lm(formula = formula, data = df)
# output
# Call:
# lm(formula = formula, data = df)
#
# Coefficients:
# (Intercept)           x1           x2           x3  
#     -5.6316       0.7895       1.1579           NA  
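The example above uses names without spaces, so the backticks are not strictly needed there. The sketch below applies the same quoting idea to column names that do contain spaces; the data frame df_sp, its column names and the helper quote_name are invented for illustration.

```r
# A data frame whose column names contain spaces (hypothetical names)
df_sp <- data.frame(c(1, 4, 6), c(4, -1, 3), c(3, 9, 8))
names(df_sp) <- c("my y", "x one", "x two")

# Wrap each name in backticks so the parser treats it as a single symbol
quote_name <- function(x) paste0("`", x, "`")

rhs  <- paste(quote_name(c("x one", "x two")), collapse = " + ")
fmla <- as.formula(paste(quote_name("my y"), "~", rhs))
print(fmla)

# all.vars() returns the underlying names, without backticks
print(all.vars(fmla))

fit <- lm(fmla, data = df_sp)
```

Without the backticks, as.formula() would fail to parse a string such as "my y ~ x one + x two".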

You can check the package leaps, and in particular the function regsubsets(), for model selection. As stated in the documentation:

Model selection by exhaustive search, forward or backward stepwise, or sequential replacement

I suggest:

fit = lm(y ~ ., data = d[,c(1,2,3)])

Where c(1,2,3) is a vector of the column numbers you want to train the model on. Don't forget to include the response variable's column.
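A variant of this suggestion is to select the columns by name instead of by position, which tends to survive column reordering. This is a sketch on the toy data, assuming the response is stored in d as column y; keep is a hypothetical helper vector.

```r
d <- data.frame(y  = c(1, 4, 6),
                x1 = c(4, -1, 3),
                x2 = c(3, 9, 8),
                x3 = c(4, -4, -2))

# Columns to train on, including the response
keep <- c("y", "x1", "x2")

fit <- lm(y ~ ., data = d[, keep])

# The "." was expanded using only the selected columns
used_terms <- attr(terms(fit), "term.labels")
print(used_terms)
```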

Reference URL: https://stackoverflow.com/questions/5251507/how-to-succinctly-write-a-formula-with-many-variables-from-a-data-frame
