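[R] inspectdf
February 22, 2021
1. Introduction
inspectdf is a package for inspecting data frames in a variety of ways. It is useful for getting an overview of a dataset before you start processing it.
2. Installation
It can be installed from CRAN.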
install.packages("inspectdf")
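3. Trying it out
We will walk through the functions using the starwars data that comes with the package.
First, load the packages and look at the description of the data.
library(inspectdf)
library(dplyr)  # provides the %>% pipe used below
?starwars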
starwars {dplyr} R Documentation
Starwars characters
Description
This data comes from SWAPI, the Star Wars API, https://swapi.dev/
Usage
starwars
Format
A tibble with 87 rows and 13 variables:
name
Name of the character
height
Height (cm)
mass
Weight (kg)
hair_color,skin_color,eye_color
Hair, skin, and eye colors
birth_year
Year born (BBY = Before Battle of Yavin)
sex
The biological sex of the character, namely male, female, hermaphroditic, or none (as in the case for Droids).
gender
The gender role or gender identity of the character as determined by their personality or the way they were programmed (as in the case for Droids).
homeworld
Name of homeworld
species
Name of species
films
List of films the character appeared in
vehicles
List of vehicles the character has piloted
starships
List of starships the character has piloted
Examples
starwars
A summary of the column types is available through the inspect_types() function.
starwars %>% inspect_types()
> starwars %>% inspect_types()
# A tibble: 4 x 4
type cnt pcnt col_name
<chr> <int> <dbl> <named list>
1 character 8 57.1 <chr [8]>
2 list 3 21.4 <chr [3]>
3 numeric 2 14.3 <chr [2]>
4 integer 1 7.14 <chr [1]>
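To make the columns of this summary concrete, here is a minimal base R sketch that tabulates column types in the same spirit. It is not inspectdf's implementation, and the `toy` data frame and all variable names are hypothetical.

```r
# Hypothetical toy data frame (not the starwars data) with mixed column types.
toy <- data.frame(
  name   = c("a", "b", "c"),
  hair   = c("blond", NA, "none"),
  height = c(172L, 167L, 96L),
  mass   = c(77, 75, 32),
  stringsAsFactors = FALSE
)

# First class of each column, e.g. "character", "integer", "numeric".
col_types <- vapply(toy, function(x) class(x)[1], character(1))

# Count the columns of each type and express the counts as percentages.
type_cnt <- sort(table(col_types), decreasing = TRUE)
type_summary <- data.frame(
  type = names(type_cnt),
  cnt  = as.integer(type_cnt),
  pcnt = 100 * as.integer(type_cnt) / ncol(toy)
)

# Column names grouped by type, similar in spirit to the col_name list column.
cols_by_type <- split(names(col_types), col_types)

print(type_summary)
print(cols_by_type)
```

In the starwars output above, pcnt is read the same way: 8 character columns out of 14 gives 57.1.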
Categorical columns can be summarized with inspect_cat().
star_cat <- starwars %>% inspect_cat()
star_cat
> star_cat
# A tibble: 8 x 5
col_name cnt common common_pcnt levels
<chr> <int> <chr> <dbl> <named list>
1 eye_color 15 brown 24.1 <tibble [15 x 3]>
2 gender 3 masculine 75.9 <tibble [3 x 3]>
3 hair_color 13 none 42.5 <tibble [13 x 3]>
4 homeworld 49 Naboo 12.6 <tibble [49 x 3]>
5 name 87 Ackbar 1.15 <tibble [87 x 3]>
6 sex 5 male 69.0 <tibble [5 x 3]>
7 skin_color 31 fair 19.5 <tibble [31 x 3]>
8 species 38 Human 40.2 <tibble [38 x 3]>
You can also see what share of each column is taken up by its most common value.
inspect_imb(starwars)
> inspect_imb(starwars)
# A tibble: 8 x 4
col_name value pcnt cnt
<chr> <chr> <dbl> <int>
1 gender masculine 75.9 66
2 sex male 69.0 60
3 hair_color none 42.5 37
4 species Human 40.2 35
5 eye_color brown 24.1 21
6 skin_color fair 19.5 17
7 homeworld Naboo 12.6 11
8 name Ackbar 1.15 1
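As a rough illustration of what value, pcnt, and cnt mean here, the following base R sketch finds the most common value of each column and its share of all rows. It assumes, as the hair_color row above suggests (37 of 87 rows is 42.5%), that missing values count toward the denominator. The `toy` data and the `most_common()` helper are hypothetical, and this is not how inspectdf itself is written.

```r
# Hypothetical toy data (not the starwars data); NA is kept as a possible value.
toy <- data.frame(
  sex        = c("male", "male", "female", "male", NA),
  hair_color = c("none", "brown", "none", NA, "black"),
  stringsAsFactors = FALSE
)

# For one column: the most frequent value, its share of all rows, and its count.
most_common <- function(col_name, x) {
  counts <- sort(table(x, useNA = "ifany"), decreasing = TRUE)
  data.frame(
    col_name = col_name,
    value    = names(counts)[1],
    pcnt     = 100 * as.integer(counts[1]) / length(x),
    cnt      = as.integer(counts[1]),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every column and sort by imbalance, largest first.
imbalance <- do.call(rbind, Map(most_common, names(toy), toy))
rownames(imbalance) <- NULL
imbalance <- imbalance[order(-imbalance$pcnt), ]

print(imbalance)
```

A column whose pcnt is close to 100 carries almost no information, which is why this view is handy before modelling.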
To show the result as a plot:
inspect_imb(starwars) %>% show_plot()

You can also check how much memory each column uses.
inspect_mem(starwars)
> inspect_mem(starwars)
# A tibble: 14 x 4
col_name bytes size pcnt
<chr> <int> <chr> <dbl>
1 films 20008 19.54 Kb 35.9
2 starships 7448 7.27 Kb 13.4
3 name 6280 6.13 Kb 11.3
4 vehicles 5944 5.8 Kb 10.7
5 homeworld 3608 3.52 Kb 6.48
6 species 2952 2.88 Kb 5.30
7 skin_color 2656 2.59 Kb 4.77
8 eye_color 1608 1.57 Kb 2.89
9 hair_color 1440 1.41 Kb 2.59
10 sex 976 976 bytes 1.75
11 gender 872 872 bytes 1.57
12 mass 744 744 bytes 1.34
13 birth_year 744 744 bytes 1.34
14 height 400 400 bytes 0.718
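For comparison, a similar per-column breakdown can be sketched in base R with object.size(). This is only an approximation of the idea: inspect_mem() may measure sizes differently, and the `toy` data frame below is hypothetical.

```r
# Hypothetical toy data frame with columns of different storage types.
toy <- data.frame(
  id    = rep(1L, 1000),           # integer: 4 bytes per element
  score = rep(1, 1000),            # double: 8 bytes per element
  label = rep(c("x", "y"), 500),   # character
  stringsAsFactors = FALSE
)

# Size of each column in bytes, as reported by base R.
bytes <- vapply(toy, function(x) as.numeric(object.size(x)), numeric(1))

# Assemble a table with each column's share of the total, largest first.
mem <- data.frame(
  col_name = names(bytes),
  bytes    = bytes,
  pcnt     = 100 * bytes / sum(bytes),
  row.names = NULL
)
mem <- mem[order(-mem$bytes), ]

print(mem)
```

As in the starwars output, where the list column films dominates, this kind of table quickly shows which columns are worth dropping or converting when memory is tight.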
That gives us the overall structure of the data. Next, we examine the contents of individual columns.
star_cat$levels$hair_color
> star_cat$levels$hair_color
# A tibble: 13 x 3
value prop cnt
<chr> <dbl> <int>
1 none 0.425 37
2 brown 0.207 18
3 black 0.149 13
4 NA 0.0575 5
5 white 0.0460 4
6 blond 0.0345 3
7 auburn 0.0115 1
8 auburn, grey 0.0115 1
9 auburn, white 0.0115 1
10 blonde 0.0115 1
11 brown, grey 0.0115 1
12 grey 0.0115 1
13 unknown 0.0115 1
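The value, prop, and cnt columns form a plain frequency table, which can be sketched in base R as follows. The `hair` vector is hypothetical toy data, and the final recoding step is just one possible follow-up to spotting near-duplicate levels such as "blond" and "blonde" in the listing above.

```r
# Hypothetical toy vector (not the starwars column), including a missing value.
hair <- c("none", "brown", "none", NA, "blond", "blonde", "brown", "none")

# Frequency of each distinct value, counting NA as its own level.
counts <- sort(table(hair, useNA = "ifany"), decreasing = TRUE)

levels_tbl <- data.frame(
  value = names(counts),
  prop  = as.integer(counts) / length(hair),
  cnt   = as.integer(counts),
  stringsAsFactors = FALSE
)
print(levels_tbl)

# One possible clean-up: fold the "blonde" spelling into "blond".
# NA entries stay NA because the comparison yields NA for them.
hair_clean <- ifelse(hair == "blonde", "blond", hair)
print(table(hair_clean, useNA = "ifany"))
```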
The contents of each column can also be plotted.
star_cat %>% show_plot(col_palette = 1)

Correlations between numeric columns can be inspected as well.
inspect_cor(starwars)
> inspect_cor(starwars)
# A tibble: 3 x 7
col_1 col_2 corr p_value lower upper pcnt_nna
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 birth_year mass 0.478 0.00602 0.177 0.697 41.4
2 birth_year height -0.400 0.0114 -0.625 -0.113 49.4
3 mass height 0.134 0.316 -0.127 0.377 67.8
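To help read the columns of this output, here is a minimal base R sketch that computes a comparable row for one pair of columns using only the rows where both values are present. It assumes that pcnt_nna is the percentage of rows with no missing value in either column. The `toy` data is hypothetical, and inspect_cor() may compute its p-values and intervals differently from cor.test().

```r
# Hypothetical toy data with a missing value in each column.
toy <- data.frame(
  height = c(172, 167, 96, 202, 150, 178, NA, 165),
  mass   = c(77, 75, 32, 136, 49, 120, 84, NA)
)

# Keep only the rows where both values are observed.
ok <- complete.cases(toy$height, toy$mass)

# Pearson correlation with its p-value and 95% confidence interval.
ct <- cor.test(toy$height[ok], toy$mass[ok])

result <- data.frame(
  col_1    = "height",
  col_2    = "mass",
  corr     = unname(ct$estimate),
  p_value  = ct$p.value,
  lower    = ct$conf.int[1],
  upper    = ct$conf.int[2],
  pcnt_nna = 100 * mean(ok)  # share of rows usable for this pair
)

print(result)
```

A low pcnt_nna, as for birth_year and mass above, is a reminder that the correlation rests on fewer than half of the rows.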
4. Conclusion
inspectdf is a good package for getting a bird's-eye view of your data.