【R】tidyxl

1. はじめに

tidyxlは、Exxcelから非テーブルのデータを読み込み、扱いやすいようにtidyな構造にしてくれるパッケージです。xlsxに対応し、xls形式には対応していません。

2. インストール

Githubからインストールできます。

devtools::install_github("nacnudus/tidyxl")

3. つかってみる

パッケージに例として含まれている‘titanic.xlsx’を使ってみます。

library("tidyxl")
library("tidyverse")

titanic <- system.file("extdata/titanic.xlsx", package = "tidyxl")
readxl::read_excel(titanic)
> readxl::read_excel(titanic)
New names:
* `` -> ...1
* `` -> ...2
* `` -> ...5
* `` -> ...7
# A tibble: 10 x 7
   ...1  ...2   Age      Child ...5  Adult ...7 
   <chr> <chr>  <chr>    <chr> <chr> <chr> <chr>
 1 NA    NA     Survived No    Yes   No    Yes  
 2 Class Sex    NA       NA    NA    NA    NA   
 3 1st   Male   NA       0     5     118   57   
 4 NA    Female NA       0     1     4     140  
 5 2nd   Male   NA       0     11    154   14   
 6 NA    Female NA       0     13    13    80   
 7 3rd   Male   NA       35    13    387   75   
 8 NA    Female NA       17    14    89    76   
 9 Crew  Male   NA       0     0     670   192  
10 NA    Female NA       0     0     3     20   
x <- xlsx_cells(titanic)
glimpse(x)
> glimpse(x)
Rows: 60
Columns: 21
$ sheet               <chr> "Sheet1", "Sheet1", "Sheet1", "Sheet1", "Sheet1", "Sheet1", "Sheet1", "~
$ address             <chr> "C1", "D1", "E1", "F1", "G1", "C2", "D2", "E2", "F2", "G2", "A3", "B3",~
$ row                 <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,~
$ col                 <int> 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 1, 2, 1, 2, 4, 5, 6, 7, 1, 2, 4, 5, 6, 7,~
$ is_blank            <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL~
$ data_type           <chr> "character", "character", "blank", "character", "blank", "character", "~
$ error               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ logical             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ numeric             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 5, 118, 57, ~
$ date                <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ character           <chr> "Age", "Child", NA, "Adult", NA, "Survived", "No", "Yes", "No", "Yes", ~
$ character_formatted <list> [<tbl_df[1 x 14]>], [<tbl_df[1 x 14]>], <NULL>, [<tbl_df[1 x 14]>], <N~
$ formula             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ is_array            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F~
$ formula_ref         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ formula_group       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ comment             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ height              <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,~
$ width               <dbl> 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38,~
$ style_format        <chr> "Normal", "Normal", "Normal", "Normal", "Normal", "Normal", "Normal", "~
$ local_format_id     <int> 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1,~

このように、特定のセルを取り出すこともできます。

x[x$row == 4, c("date", "address", "width", "numeric")]
> x[x$row == 4, c("date", "address", "width", "numeric")]
# A tibble: 6 x 4
  date   address width numeric
  <dttm> <chr>   <dbl>   <dbl>
1 NA     A4       8.38      NA
2 NA     B4       8.38      NA
3 NA     D4       8.38       0
4 NA     E4       8.38       5
5 NA     F4       8.38     118
6 NA     G4       8.38      57

4. さいごに

Excelのデータって結構使いますが、これをRでもっと楽に使えるようになるといいですね。

Add a Comment

メールアドレスが公開されることはありません。