In this post I’ll look at using the `caret` package in R for determining the optimal parameters for a given model. The `caret` package was developed by Max Kuhn, who also developed the `C50` package for decision trees, which I talked about in a previous post.

# Parameter Estimation

We can use `caret::train()` to determine the optimal parameters for a model. This function repeatedly resamples the data set in order to estimate the effect of different parameter values. When it is done, it reports the optimal parameters, an estimated accuracy, and an estimated standard deviation for the accuracy.
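To make the resampling idea concrete, here is a rough sketch of the same kind of loop written by hand. It is not what `caret::train()` does internally, only an illustration under some assumptions: I swap in k-nearest neighbours from the `class` package (which ships with R) instead of a random forest, and the tuning grid `candidate_k` and the variable names are made up for the example.

```r
library(class)  # recommended package shipped with R; provides knn()

data(iris)
set.seed(318)

X <- as.matrix(iris[, 1:4])   # feature matrix
y <- iris$Species             # class labels
candidate_k <- c(1, 5, 9)     # hypothetical tuning grid
n_reps <- 25                  # number of bootstrap replicates

# acc[i, j] holds the out-of-bag accuracy of candidate j on replicate i
acc <- matrix(NA_real_, nrow = n_reps, ncol = length(candidate_k))

for (i in seq_len(n_reps)) {
  # Draw a bootstrap sample; rows never drawn form the held-out set
  in_bag  <- sample(nrow(X), replace = TRUE)
  out_bag <- setdiff(seq_len(nrow(X)), in_bag)
  for (j in seq_along(candidate_k)) {
    pred <- knn(train = X[in_bag, ], test = X[out_bag, ],
                cl = y[in_bag], k = candidate_k[j])
    acc[i, j] <- mean(pred == y[out_bag])
  }
}

# Summarise each candidate by its mean accuracy and standard deviation
resampled <- data.frame(k          = candidate_k,
                        Accuracy   = colMeans(acc),
                        AccuracySD = apply(acc, 2, sd))
best_k <- resampled$k[which.max(resampled$Accuracy)]
print(resampled)
```

Each candidate value is scored on the rows left out of each bootstrap sample, and the value with the highest mean accuracy wins. That is the same shape of summary that `caret` reports.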

We may input the feature matrix `X` and a vector of class labels `y`, or we may pass an R formula using the variable names. With a formula, we also specify the data set. Finally, we specify the machine learning algorithm. A complete list of the algorithms supported by the `caret::train()` function may be found in the `caret` documentation.
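As a quick sketch of the two calling styles, the snippet below fits the same kind of model both ways on the iris data. The variable names `m_formula` and `m_xy` are only for illustration, and I make no claim that the two fits give identical numbers.

```r
library(caret)
library(randomForest)

data(iris)

# Formula interface: response on the left, predictors on the right,
# with the data set passed separately.
set.seed(318)
m_formula <- caret::train(Species ~ ., data = iris, method = "rf")

# x/y interface: predictors as x, class labels as y.
X <- iris[, 1:4]
y <- iris$Species
set.seed(318)
m_xy <- caret::train(x = X, y = y, method = "rf")

m_formula$bestTune
m_xy$bestTune
```

The formula style is convenient when the data already lives in a data frame. The `x`/`y` style is handy when the features and labels are stored separately.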

Here is an example of using the `caret::train()` function on Edgar Anderson’s iris data set using the Random Forests algorithm.

    library(caret)
    library(randomForest)
    data(iris)
    set.seed(318)
    m <- caret::train(Species ~ ., data = iris, method = "rf")
    m

The cryptic `Species ~ .` tells R that we want to model the `Species` variable of the data set, using all of the other variables. (The tilde means *using*, and the dot matches all of the other variables.) The next argument tells the function to use the iris data set. The third argument, `method="rf"`, selects the random forest algorithm from the `randomForest` package.
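If the dot still feels like magic, base R can show what it expands to. This small sketch uses `terms()` with the iris data to list the predictors that `Species ~ .` stands for. The names `f` and `predictors` are just for the example.

```r
data(iris)

# Expand the dot against the columns of iris
f <- Species ~ .
predictors <- attr(terms(f, data = iris), "term.labels")
print(predictors)

# The same model written out by hand, without the dot
f_explicit <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
```

So for this data set, `Species ~ .` and the longer `f_explicit` formula ask for the same four predictors.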

This produces the following output:

    Random Forest

    150 samples
      4 predictor
      3 classes: 'setosa', 'versicolor', 'virginica'

    No pre-processing
    Resampling: Bootstrapped (25 reps)

    Summary of sample sizes: 150, 150, 150, 150, 150, 150, ...

    Resampling results across tuning parameters:

      mtry  Accuracy  Kappa  Accuracy SD  Kappa SD
      2     0.938     0.907  0.0348       0.0523
      3     0.941     0.910  0.0358       0.0537
      4     0.935     0.902  0.0370       0.0554

    Accuracy was used to select the optimal model using the largest value.
    The final value used for the model was mtry = 3.
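The printed summary is also available as pieces of the fitted object, which is useful if you want the numbers rather than the text. The sketch below refits the model so that it runs on its own. The exact column names in `results` may differ between `caret` versions, so treat the comments as a guide rather than a guarantee.

```r
library(caret)
library(randomForest)

data(iris)
set.seed(318)
m <- caret::train(Species ~ ., data = iris, method = "rf")

m$results     # data frame with one row per mtry value tried
m$bestTune    # the mtry value that was selected
m$finalModel  # the model refit on the full data set with that mtry

# Predict classes for new rows (here, just the first few rows of iris)
predict(m, newdata = head(iris))
```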

# Caveat

If you get this error when using `caret`,

    Loading required namespace: e1071
    Error in loadNamespace(name) : there is no package called ‘e1071’

then try installing `e1071`. I’m not sure what `e1071` is, but this fixed the problem for me.

    install.packages("e1071")

Thanks, I had to run `install.packages("e1071")` .. this helped 🙂

Awesome! Apparently it’s a grab bag of functions, hahaha: http://cran.r-project.org/web/packages/e1071/index.html