Programing

R, xlsx 또는 xls로 Excel 파일 가져 오기

crosscheck 2020. 9. 18. 07:38
반응형

R, xlsx 또는 xls로 Excel 파일 가져 오기


누군가가 Excel 2007 (.xlsx) 파일을 R로 가져 오는 가장 좋은 방법을 도와 줄 수 있습니다. 여러 가지 방법을 시도했지만 작동하지 않는 것 같습니다. 2.13.1, Windows XP, xlsx 0.3.0으로 업그레이드했는데 오류가 계속 발생하는 이유를 모르겠습니다. 나는 시도했다 :

AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx","DNA_Tag_Numbers")

또는

AB<-read.xlsx("C:/AB_DNA_Tag_Numbers.xlsx",1)

하지만 오류가 발생합니다.

 Error in .jnew("java/io/FileInputStream", file) : 
  java.io.FileNotFoundException: C:\AB_DNA_Tag_Numbers.xlsx (The system cannot find the file specified)

감사합니다.


이상한 외부 종속성 *이없는 솔루션의 경우 다음이 있습니다 readxl.

readxl 패키지를 사용하면 Excel에서 R로 데이터를 쉽게 가져올 수 있습니다. 많은 기존 패키지 (예 : gdata, xlsx, xlsReadWrite)에 비해 readxl은 외부 종속성이 없으므로 모든 운영 체제에서 쉽게 설치하고 사용할 수 있습니다. 단일 시트에 저장된 테이블 형식 데이터와 함께 작동하도록 설계되었습니다.

Readxl은 레거시 .xls 형식과 최신 xml 기반 .xlsx 형식을 모두 지원합니다. .xls 지원은 with libxls C 라이브러리를 가능하게하여 기본 바이너리 형식의 많은 복잡성을 추상화합니다. .xlsx를 구문 분석하기 위해 RapidXML C ++ 라이브러리를 사용합니다.

다음과 같이 설치할 수 있습니다.

install.packages("readxl") # CRAN version

또는

devtools::install_github("hadley/readxl") # development version

용법

library(readxl)

# read_excel reads both xls and xlsx files
read_excel("my-old-spreadsheet.xls")
read_excel("my-new-spreadsheet.xlsx")

# Specify sheet with a number or name
read_excel("my-spreadsheet.xls", sheet = "data")
read_excel("my-spreadsheet.xls", sheet = 2)

# If NAs are represented by something other than blank cells,
# set the na argument
read_excel("my-spreadsheet.xls", na = "NA")

* 엄격히 사실이 아닙니다. Rtools (Windows 용) 또는 Xcode (OSX 용)를 필요로하는 Rcpp패키지 가 필요합니다 . 이는 R 외부의 종속성입니다.하지만 경로 등을 조작 할 필요가 없습니다. Java 및 Perl 종속성보다 유리합니다.

업데이트 이제 rexcel 패키지가 있습니다. 이를 통해 Excel 파일에서 R로 Excel 서식, 기능 및 기타 여러 종류의 정보를 얻을 수 있습니다.


XLConnect 패키지를 사용해 볼 수도 있습니다. xlsx보다 운이 좋았습니다 (또한 .xls 파일도 읽을 수 있음).

library(XLConnect)
theData <- readWorksheet(loadWorkbook("C:/AB_DNA_Tag_Numbers.xlsx"),sheet=1)

또한 파일을 찾을 수없는 경우 문제가있는 경우 file.choose ()로 선택해보십시오.


나는 확실히 패키지 read.xls기능을 시도 할 것인데 gdata, 이는 xlsx패키지 보다 훨씬 더 성숙 합니다. Perl이 필요할 수 있습니다 ...


최신 정보

아래 답변은 이제 다소 구식이므로 readxl 패키지에 주의를 기울일 것 입니다. Excel 시트의 형식이 잘 지정되어 있거나 정리 되어 있으면 이제 readxl사용 하여 통합 문서에서 읽습니다. 시트의 형식이 잘못 지정 read.csv()되거나 배치 된 경우 여전히 CSV로 내 보낸 다음 R에서 문제를 처리합니다 readLines().

실물

My preferred way is to save individual Excel sheets in comma separated value (CSV) files. On Windows, these files are associated with Excel so you don't loose the double-click-open-in-Excel "feature".

CSV files can be read into R using read.csv(), or, if you are in a location or using a computer set up with some European settings (where , is used as the decimal place), using read.csv2().

These functions have sensible defaults that makes reading appropriately formatted files simple. Just keep any labels for samples or variables in the first row or column.

Added benefits of storing files in CSV are that as the files are plain text they can be passed around very easily and you can be confident they will open anywhere; one doesn't need Excel to look at or edit the data.


Example 2012:

library("xlsx")
FirstTable <- read.xlsx("MyExcelFile.xlsx", 1 , stringsAsFactors=F)
SecondTable <- read.xlsx("MyExcelFile.xlsx", 2 , stringsAsFactors=F)
  • I would try 'xlsx' package for it is easy to handle and seems mature enough
  • worked fine for me and did not need any additionals like Perl or whatever

Example 2015:

library("readxl")
FirstTable  <- read_excel("MyExcelFile.xlsx", 1)
SecondTable <- read_excel("MyExcelFile.xlsx", 2)
  • nowadays I use readxl and have made good experience with it.
  • no extra stuff needed
  • good performance

This new package looks nice http://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf It doesn't require rJava and is using 'Rcpp' for speed.


If you are running into the same problem and R is giving you an error -- could not find function ".jnew" -- Just install the library rJava. Or if you have it already just run the line library(rJava). That should be the problem.

Also, it should be clear to everybody that csv and txt files are easier to work with, but life is not easy and sometimes you just have to open an xlsx.


I recently discovered Schaun Wheeler's function for importing excel files into R after realising that the xlxs package hadn't been updated for R 3.1.0.

https://gist.github.com/schaunwheeler/5825002

The file name needs to have the ".xlsx" extension and the file can't be open when you run the function.

This function is really useful for accessing other peoples work. The main advantages over using the read.csv function are when

  • Importing multiple excel files
  • Importing large files
  • Files that are updated regularly

Using the read.csv function requires manual opening and saving of each Excel document which is time consuming and very boring. Using Schaun's function to automate the workflow is therefore a massive help.

Big props to Schaun for this solution.


For me the openxlx package worked in the easiest way.

install.packages("openxlsx")
library(openxlsx)
rawData<-read.xlsx("your.xlsx");

What's your operating system? What version of R are you running: 32-bit or 64-bit? What version of Java do you have installed?

I had a similar error when I first started using the read.xlsx() function and discovered that my issue (which may or may not be related to yours; at a minimum, this response should be viewed as "try this, too") was related to the incompatability of .xlsx pacakge with 64-bit Java. I'm fairly certain that the .xlsx package requires 32-bit Java.

Use 32-bit R and make sure that 32-bit Java is installed. This may address your issue.


You have checked that R is actually able to find the file, e.g. file.exists("C:/AB_DNA_Tag_Numbers.xlsx") ? – Ben Bolker Aug 14 '11 at 23:05

Above comment should've solved your problem:

require("xlsx")
read.xlsx("filepath/filename.xlsx",1) 

should work fine after that.


You may be able to keep multiple tabs and more formatting information if you export to an OpenDocument Spreadsheet file (ods) or an older Excel format and import it with the ODS reader or the Excel reader you mentioned above.


As stated by many here, I am writing the same thing but with an additional point!

At first we need to make sure that our R Studio has these two packages installed:

  1. "readxl"
  2. "XLConnect"

In order to load a package in R you can use the below function:

install.packages("readxl/XLConnect")
library(XLConnect)
search()

search will display the list of current packages being available in your R Studio.

Now another catch, even though you might have these two packages but still you may encounter problem while reading "xlsx" file and the error could be like "error: more columns than column name"

To solve this issue you can simply resave your excel sheet "xlsx" in to

"CSV (Comma delimited)"

and your life will be super easy....

Have fun!!


I have tried very hard on all the answers above. However, they did not actually help because I used a mac. The rio library has this import function which can basically import any type of data file into Rstudio, even those file using languages other than English!

Try codes below:

    library(rio)
    AB <- import("C:/AB_DNA_Tag_Numbers.xlsx")
    AB <- AB[,1]

Hope this help. For more detailed reference: https://cran.r-project.org/web/packages/rio/vignettes/rio.html

참고URL : https://stackoverflow.com/questions/7049272/importing-excel-files-into-r-xlsx-or-xls

반응형