In this post I’ll look at using the caret package in R for determining the optimal parameters for a given model. The caret package was developed by Max Kuhn, who also developed the C50 package for decision trees, which I talked about in a previous post.
Parameter Estimation
We can use caret::train() to determine the optimal parameters for a model. This function repeatedly resamples the data set to estimate how each candidate parameter value affects performance. When it is done, it reports the optimal parameters, an estimated accuracy, and an estimated standard deviation of that accuracy.
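To make the resampling idea concrete, here is a rough hand-rolled version of what caret::train() does for a random forest. It draws bootstrap samples, fits one model per candidate mtry value on each sample, scores it on the rows left out of that sample, and then averages. This is a simplified sketch and not caret's actual code. The candidate values (2, 3, 4) and the 25 repetitions are my choice, picked to mirror what caret reports for this model. The variable names are hypothetical.

```r
library(randomForest)  # provides randomForest()

data(iris)
set.seed(318)

candidates <- c(2, 3, 4)  # candidate mtry values (assumed, to mirror caret's grid here)
n_reps <- 25              # number of bootstrap resamples (assumed)

# One row per resample, one column per candidate mtry
acc <- matrix(NA_real_, nrow = n_reps, ncol = length(candidates))

for (r in seq_len(n_reps)) {
  # Bootstrap sample: draw as many rows as the data set has, with replacement
  in_bag <- sample(nrow(iris), replace = TRUE)
  # Rows never drawn ("out-of-bag") act as the hold-out set for this resample
  held_out <- setdiff(seq_len(nrow(iris)), in_bag)
  for (j in seq_along(candidates)) {
    fit <- randomForest(Species ~ ., data = iris[in_bag, ], mtry = candidates[j])
    pred <- predict(fit, newdata = iris[held_out, ])
    acc[r, j] <- mean(pred == iris$Species[held_out])
  }
}

# Mean accuracy and its standard deviation across resamples, per candidate
summary_tbl <- data.frame(
  mtry       = candidates,
  Accuracy   = unname(colMeans(acc)),
  AccuracySD = unname(apply(acc, 2, sd))
)
print(summary_tbl)

# Keep the candidate with the largest mean accuracy
best_mtry <- candidates[which.max(summary_tbl$Accuracy)]
print(best_mtry)
```

Because bootstrap samples differ between runs, the exact numbers will not match caret's output. The shape of the result should be the same: one mean accuracy and one standard deviation per candidate.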
We can either pass a feature matrix x together with a vector of class labels y, or pass an R formula that uses the variable names together with the data set. Finally, we specify the machine learning algorithm through the method argument. A complete list of the algorithms supported by the caret::train() function can be found in the caret documentation.
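Here is a short sketch of the matrix/vector interface, alongside two caret helpers for exploring the supported algorithms. modelLookup() shows which tuning parameters caret exposes for a method. names(getModelInfo()) lists the method names caret knows about. Using columns 1 to 4 of iris as x is my own choice for illustration.

```r
library(caret)         # train(), modelLookup(), getModelInfo()
library(randomForest)  # backend for method = "rf"

data(iris)

# Tuning parameters caret exposes for the random forest method
print(modelLookup("rf"))

# All method names caret knows about
all_methods <- names(getModelInfo())
print(length(all_methods))

# Matrix/vector interface: predictors in x, class labels in y (no formula, no data=)
set.seed(318)
m_xy <- caret::train(x = iris[, 1:4], y = iris$Species, method = "rf")
print(m_xy)
```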
Here is an example of using the caret::train() function on Edgar Anderson’s iris data set with the Random Forests algorithm:
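library(caret)         # train() and its helpers
library(randomForest)  # backend used by method = "rf"
data(iris)
set.seed(318)          # make the resampling reproducible
m <- caret::train(Species ~ ., data = iris, method = "rf")
m                      # print the tuning summary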
The cryptic Species ~ . tells R that we want to model the Species variable of the data set using all of the other variables. The tilde separates the response on its left from the predictors on its right, and the dot stands for every column not already named in the formula. The second argument, data = iris, tells the function to use the iris data set. The third, method = "rf", selects the random forest algorithm from the randomForest package.
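If the dot feels too magical, base R's terms() can show what it expands to when given the data set. The sketch below also writes the same formula out explicitly. It then shows a formula that names only a subset of predictors; picking the petal measurements for that subset is my own choice.

```r
data(iris)

# terms() with a data argument expands "." into the actual predictor names
f <- Species ~ .
tt <- terms(f, data = iris)
print(attr(tt, "term.labels"))
# -> the four measurement columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width

# The explicit formula describes the same set of predictors
f_explicit <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
print(identical(attr(terms(f_explicit), "term.labels"), attr(tt, "term.labels")))

# A formula can also name just some predictors, e.g. only the petal measurements
f_petals <- Species ~ Petal.Length + Petal.Width
print(attr(terms(f_petals), "term.labels"))
```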
This produces the following output:

Random Forest

150 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 150, 150, 150, 150, 150, 150, ...
Resampling results across tuning parameters:

  mtry  Accuracy  Kappa  Accuracy SD  Kappa SD
  2     0.938     0.907  0.0348       0.0523
  3     0.941     0.910  0.0358       0.0537
  4     0.935     0.902  0.0370       0.0554

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 3.
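The printed summary is also available as fields of the object that train() returns, and trainControl() and tuneGrid let you change the resampling scheme and the candidate values. In the sketch below, 10-fold cross-validation and the grid mtry = 1 to 4 are illustrative choices of mine, not settings from the run above. The predictions are made on the training data purely to show the call.

```r
library(caret)         # train(), trainControl()
library(randomForest)  # backend for method = "rf"

data(iris)
set.seed(318)
m <- caret::train(Species ~ ., data = iris, method = "rf")

print(m$bestTune)    # data frame holding the selected mtry
print(m$results)     # one row per candidate mtry: Accuracy, Kappa and their SDs
print(m$finalModel)  # the randomForest fit refit on all rows with the chosen mtry

# Predictions from the final model (on the training data, for illustration only)
pred <- predict(m, newdata = iris)
print(table(predicted = pred, actual = iris$Species))

# Same tuning, but with 10-fold cross-validation and an explicit grid of candidates
ctrl <- trainControl(method = "cv", number = 10)
grid <- expand.grid(mtry = 1:4)
set.seed(318)
m_cv <- caret::train(Species ~ ., data = iris, method = "rf",
                     trControl = ctrl, tuneGrid = grid)
print(m_cv$bestTune)
```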
Caveat
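If you get this error when using caret,

Loading required namespace: e1071
Error in loadNamespace(name) : there is no package called ‘e1071’

then try installing the e1071 package. It turns out e1071 is a grab bag of miscellaneous statistical functions on CRAN (http://cran.r-project.org/web/packages/e1071/index.html), which caret loads on demand. Installing it fixed the problem for me:

install.packages("e1071")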
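If you would rather not hit these errors one at a time, a small helper can install only the packages that are missing. This is a sketch. The function name ensure_packages is hypothetical, and it relies on base R's requireNamespace() and install.packages().

```r
# Hypothetical helper: install any of `pkgs` that cannot be loaded,
# and return (invisibly) the names of the packages that were missing
ensure_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!have]
  if (length(absent) > 0) {
    install.packages(absent)
  }
  invisible(absent)
}
```

Calling ensure_packages(c("caret", "randomForest", "e1071")) before running the examples in this post would install whichever of the three is not yet available.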