[R] modeldata
July 16, 2021
1. Introduction
modeldata is an R package that bundles a variety of datasets.
2. Installation
It can be installed from CRAN.
install.packages("modeldata")
3. Trying it out
The package includes the following datasets.
Chicago | Chicago ridership data |
Sacramento | Sacramento CA home prices |
Smithsonian | Smithsonian museums |
ad_data | Alzheimer’s disease data |
ames | Ames Housing Data |
attrition | Job attrition |
biomass | Biomass data |
bivariate | Example bivariate classification data |
car_prices | Kelly Blue Book resale data for 2005 model year GM cars |
cells | Cell body segmentation |
check_times | Execution time data |
concrete | Compressive strength of concrete mixtures |
covers | Raw cover type data |
credit_data | Credit data |
crickets | Rates of Cricket Chirps |
drinks | Sample time series data |
grants | Grant acceptance data |
hpc_cv | Class probability predictions |
hpc_data | High-performance computing system data |
lending_club | Loan data |
meats | Fat, water and protein content of meat samples |
mlc_churn | Customer churn data |
oils | Fatty acid composition of commercial oils |
okc | OkCupid data |
parabolic | Parabolic class boundary data |
pathology | Liver pathology data |
pd_speech | Parkinson’s disease speech classification data set |
penguins | Palmer Station penguin data |
scat | Morphometric data on scat |
small_fine_foods | Fine foods example data |
solubility_test | Solubility predictions from MARS model |
stackoverflow | Annual Stack Overflow Developer Survey Data |
tate_text | Tate Gallery modern artwork metadata |
two_class_dat | Two class data |
two_class_example | Two class predictions |
wa_churn | Watson churn data |
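The list above can also be produced from within R. The following is a minimal sketch that assumes modeldata is already installed; the variable names `dataset_info` and `dataset_index` are only illustrative, and the exact set of datasets may differ depending on the installed package version.

```r
# data(package = ...) returns an object whose $results element is a
# character matrix with the columns Package, LibPath, Item and Title.
dataset_info <- data(package = "modeldata")

# Keep only the dataset name (Item) and its short description (Title).
dataset_index <- as.data.frame(dataset_info$results[, c("Item", "Title")])

head(dataset_index)
```

Looking the list up this way shows what the locally installed version actually ships, which is more reliable than a copied table.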
As an example, let's look at the Chicago dataset.
library(modeldata)
data(Chicago)
head(Chicago)
> head(Chicago)
ridership Austin Quincy_Wells Belmont Archer_35th Oak_Park Western Clark_Lake Clinton Merchandise_Mart Irving_Park
1 15.732 1.463 8.371 4.599 2.009 1.421 3.319 15.561 2.403 6.481 3.744
2 15.762 1.505 8.351 4.725 2.088 1.429 3.344 15.720 2.402 6.477 3.853
3 15.872 1.519 8.359 4.684 2.108 1.488 3.363 15.558 2.367 6.405 3.861
4 15.874 1.490 7.852 4.769 2.166 1.445 3.359 15.745 2.415 6.489 3.843
5 15.423 1.496 7.621 4.720 2.058 1.415 3.271 15.602 2.416 5.798 3.878
6 2.425 0.693 0.911 2.274 0.624 0.426 1.111 2.413 0.814 0.858 1.735
Washington_Wells Harlem Monroe Polk Ashland Kedzie Addison Jefferson_Park Montrose California temp_min temp temp_max
1 7.560 2.655 5.672 2.481 1.319 3.013 2.500 6.595 1.836 0.756 15.1 19.45 30.0
2 7.576 2.760 6.013 2.436 1.314 3.020 2.570 6.750 1.915 0.781 25.0 30.45 36.0
3 7.620 2.789 5.786 2.526 1.324 2.982 2.587 6.967 1.977 0.812 19.0 25.00 28.9
4 7.364 2.812 5.959 2.450 1.350 3.013 2.528 7.013 1.979 0.776 15.1 22.45 27.0
5 7.089 2.732 5.769 2.573 1.355 3.085 2.557 6.922 1.953 0.789 21.0 27.00 32.0
6 0.786 1.034 1.044 0.006 0.566 1.130 0.800 2.765 0.772 0.370 19.0 24.80 30.0
temp_change dew humidity pressure pressure_change wind wind_max gust gust_max percip percip_max weather_rain
1 14.9 13.45 78.0 30.43 0.12 5.2 10.4 0 0.0 0 0 0
2 11.0 25.00 79.0 30.19 0.18 8.1 11.5 0 0.0 0 0 0
3 9.9 18.00 81.0 30.16 0.23 10.4 19.6 0 0.0 0 0 0
4 11.9 10.90 66.5 30.44 0.16 9.8 16.1 0 0.0 0 0 0
5 11.0 21.90 84.0 29.91 0.65 12.7 19.6 0 25.3 0 0 0
6 11.0 15.10 71.0 30.28 0.49 12.7 17.3 0 26.5 0 0 0
weather_snow weather_cloud weather_storm Blackhawks_Away Blackhawks_Home Bulls_Away Bulls_Home Bears_Away Bears_Home
1 0.0000000 0.7083333 0.00000000 0 0 0 0 0 0
2 0.0000000 1.0000000 0.20833333 0 0 0 1 0 0
3 0.2142857 0.3571429 0.07142857 0 0 1 0 0 0
4 0.0000000 0.2916667 0.04166667 0 0 0 0 0 0
5 0.5161290 0.4516129 0.45161290 0 0 0 0 0 0
6 0.0400000 0.6400000 0.24000000 0 0 0 1 0 0
WhiteSox_Away WhiteSox_Home Cubs_Away Cubs_Home date
1 0 0 0 0 2001-01-22
2 0 0 0 0 2001-01-23
3 0 0 0 0 2001-01-24
4 0 0 0 0 2001-01-25
5 0 0 0 0 2001-01-26
6 0 0 0 0 2001-01-27
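Because the `head()` output is so wide, a more compact overview can be easier to read. The sketch below uses only base R and the column names visible in the output above (`ridership`, `temp_min`, `date`); `chicago_dim` is an illustrative variable name.

```r
library(modeldata)
data(Chicago)

# Number of rows and columns in the data frame.
chicago_dim <- dim(Chicago)
chicago_dim

# Period covered by the data.
range(Chicago$date)

# Summary statistics for a few columns of interest.
summary(Chicago[, c("ridership", "temp_min", "date")])
```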
Let's plot the minimum temperature (temp_min) over time.
library(tidyverse)
Chicago %>%
  ggplot(aes(date, temp_min)) +
  geom_point()
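A scatter plot of daily values can be dense, so one possible variation is to average by month first. This is only a sketch and not part of the original walkthrough: it assumes dplyr and ggplot2 are installed, and the names `monthly_temp`, `month`, and `mean_temp_min` are made up for illustration.

```r
library(modeldata)
library(dplyr)
library(ggplot2)

data(Chicago)

# Collapse each date to the first day of its month, then average temp_min.
monthly_temp <- Chicago %>%
  mutate(month = as.Date(format(date, "%Y-%m-01"))) %>%
  group_by(month) %>%
  summarise(mean_temp_min = mean(temp_min, na.rm = TRUE), .groups = "drop")

# Draw the monthly averages as a line to make the seasonal cycle visible.
p <- ggplot(monthly_temp, aes(month, mean_temp_min)) +
  geom_line()
p
```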
As you can see, there is quite a variety of data available.
4. Conclusion
These datasets are well suited to a bit of data-processing practice.