【R】tidyxl
2021年8月22日
1. はじめに
tidyxl
は、Exxcelから非テーブルのデータを読み込み、扱いやすいようにtidyな構造にしてくれるパッケージです。xlsxに対応し、xls形式には対応していません。
2. インストール
Githubからインストールできます。
devtools::install_github("nacnudus/tidyxl")
3. つかってみる
パッケージに例として含まれている‘titanic.xlsx’を使ってみます。
library("tidyxl") library("tidyverse") titanic <- system.file("extdata/titanic.xlsx", package = "tidyxl") readxl::read_excel(titanic)
> readxl::read_excel(titanic)
New names:
* `` -> ...1
* `` -> ...2
* `` -> ...5
* `` -> ...7
# A tibble: 10 x 7
...1 ...2 Age Child ...5 Adult ...7
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NA NA Survived No Yes No Yes
2 Class Sex NA NA NA NA NA
3 1st Male NA 0 5 118 57
4 NA Female NA 0 1 4 140
5 2nd Male NA 0 11 154 14
6 NA Female NA 0 13 13 80
7 3rd Male NA 35 13 387 75
8 NA Female NA 17 14 89 76
9 Crew Male NA 0 0 670 192
10 NA Female NA 0 0 3 20
x <- xlsx_cells(titanic) glimpse(x)
> glimpse(x)
Rows: 60
Columns: 21
$ sheet <chr> "Sheet1", "Sheet1", "Sheet1", "Sheet1", "Sheet1", "Sheet1", "Sheet1", "~
$ address <chr> "C1", "D1", "E1", "F1", "G1", "C2", "D2", "E2", "F2", "G2", "A3", "B3",~
$ row <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,~
$ col <int> 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 1, 2, 1, 2, 4, 5, 6, 7, 1, 2, 4, 5, 6, 7,~
$ is_blank <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL~
$ data_type <chr> "character", "character", "blank", "character", "blank", "character", "~
$ error <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ logical <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ numeric <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 5, 118, 57, ~
$ date <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ character <chr> "Age", "Child", NA, "Adult", NA, "Survived", "No", "Yes", "No", "Yes", ~
$ character_formatted <list> [<tbl_df[1 x 14]>], [<tbl_df[1 x 14]>], <NULL>, [<tbl_df[1 x 14]>], <N~
$ formula <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ is_array <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F~
$ formula_ref <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ formula_group <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ comment <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ height <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,~
$ width <dbl> 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38, 8.38,~
$ style_format <chr> "Normal", "Normal", "Normal", "Normal", "Normal", "Normal", "Normal", "~
$ local_format_id <int> 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1,~
このように、特定のセルを取り出すこともできます。
x[x$row == 4, c("date", "address", "width", "numeric")]
> x[x$row == 4, c("date", "address", "width", "numeric")]
# A tibble: 6 x 4
date address width numeric
<dttm> <chr> <dbl> <dbl>
1 NA A4 8.38 NA
2 NA B4 8.38 NA
3 NA D4 8.38 0
4 NA E4 8.38 5
5 NA F4 8.38 118
6 NA G4 8.38 57
4. さいごに
Excelのデータって結構使いますが、これをRでもっと楽に使えるようになるといいですね。