In this post I’ll look at using the caret package in R to determine the optimal parameters for a given model. The caret package was developed by Max Kuhn, who also developed the C50 package for decision trees, which I talked about in a previous post.
We can use caret::train() to determine the optimal parameters for a model. This function repeatedly resamples the data set in order to estimate the effect of different parameter values. When it is done, it reports the optimal parameters, an estimated accuracy, and an estimated standard deviation for that accuracy.
We may pass in the feature matrix X and a vector of class labels y, or we may pass an R formula using the variable names. In the formula case we then specify the data set, and in either case we finish by naming the machine learning algorithm. A complete list of the algorithms supported by the caret::train() function may be found in the caret documentation.
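The matrix-and-labels form can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes the built-in iris data set and the "rf" method, and the names X, y and m_xy are just illustrative.

```r
library(caret)
library(randomForest)

data(iris)
set.seed(318)

# Feature matrix: the four numeric measurement columns.
X <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Class labels: a factor with three levels.
y <- iris$Species

# x/y interface: no formula and no data argument are needed here.
m_xy <- caret::train(x = X, y = y, method = "rf")
m_xy
```

The formula form carries the same information in a different shape: the left-hand side of the formula plays the role of y, and the right-hand side picks the columns of X.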
Here is an example of using the caret::train() function on Edgar Anderson’s iris data set using the Random Forests algorithm.
    library( caret )
    library( randomForest )
    data( iris )
    set.seed(318)
    m <- caret::train( Species ~ ., data=iris, method="rf" )
    m
Species ~ . tells R that we want to model the Species variable of the data set, using all of the other variables. (The tilde can be read as "modeled using", and the dot matches all of the other variables.) The next argument tells the function to use the iris data set. The third argument specifies the algorithm, here "rf" for Random Forests.
This produces the following output:

    Random Forest

    150 samples
      4 predictor
      3 classes: 'setosa', 'versicolor', 'virginica'

    No pre-processing
    Resampling: Bootstrapped (25 reps)

    Summary of sample sizes: 150, 150, 150, 150, 150, 150, ...

    Resampling results across tuning parameters:

      mtry  Accuracy  Kappa  Accuracy SD  Kappa SD
      2     0.938     0.907  0.0348       0.0523
      3     0.941     0.910  0.0358       0.0537
      4     0.935     0.902  0.0370       0.0554

    Accuracy was used to select the optimal model using the largest value.
    The final value used for the model was mtry = 3.
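The printed summary can also be read programmatically, and the resampling scheme can be changed from the default bootstrap. The sketch below assumes repeated 10-fold cross-validation and a grid of mtry values 2 to 4. Both are my own illustrative choices rather than anything the run above used, and the names ctrl, grid and m_cv are hypothetical.

```r
library(caret)
library(randomForest)

data(iris)
set.seed(318)

# Assumption: 10-fold cross-validation repeated 3 times,
# instead of the default 25 bootstrap resamples.
ctrl <- caret::trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Candidate values for mtry, the tuning parameter of method = "rf".
grid <- expand.grid(mtry = 2:4)

m_cv <- caret::train(Species ~ ., data = iris, method = "rf",
                     trControl = ctrl, tuneGrid = grid)

# The selected tuning parameter, as a one-row data frame.
m_cv$bestTune

# One row per candidate mtry, with the resampled accuracy and kappa.
m_cv$results
```

The exact numbers depend on the seed, the resampling scheme, and the package versions, so they will generally differ from the table above.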
If you get this error when using caret::train(),

    Loading required namespace: e1071
    Error in loadNamespace(name) : there is no package called ‘e1071’

then try installing e1071 with install.packages("e1071"). I’m not sure what e1071 is, but this fixed the problem for me.
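One way to guard against that error up front is to check for the package before training. This is only a sketch: the CRAN mirror URL is an assumption (any mirror should do), and the variable name has_e1071 is illustrative.

```r
# Check whether e1071 is available without attaching it.
has_e1071 <- requireNamespace("e1071", quietly = TRUE)

# Install it only if it is missing.
# Assumption: the cloud.r-project.org mirror is reachable.
if (!has_e1071) {
  install.packages("e1071", repos = "https://cloud.r-project.org")
}
```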