Tag Archives: R

The Pearson Chi-Squared Test with Python and R

December 31, 2014 cjohnson318

In this post I’ll discuss how to use Python and R to calculate the Pearson chi-squared test for goodness of fit. The chi-squared goodness-of-fit test measures how well the observed counts of a categorical variable fit some hypothesized distribution. We assume that the categories are mutually exclusive and completely cover the sample space: every possible outcome fits into exactly one category, with no exceptions. For example, suppose we flip a coin to determine whether it is fair. The outcomes of this experiment fit into exactly two categories, heads and tails. The same goes for rolling a die to determine its fairness; each roll results in exactly one of six outcomes, or categories. The test is only meaningful when the categories are mutually exclusive.
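As a minimal sketch of the computation (not the code from the full post), Pearson's statistic is the sum of (observed - expected)^2 / expected over all categories. The counts below, 60 heads and 40 tails in 100 flips, are made up for illustration. With two categories there is one degree of freedom, and in that case the chi-squared upper-tail probability has the closed form erfc(sqrt(x/2)), so Python's standard library is enough; with more categories you would typically use `scipy.stats.chisquare` in Python or `chisq.test` in R instead.

```python
import math


def chi_squared_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-squared(1), so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical experiment: 100 flips of a coin we suspect is unfair.
observed = [60, 40]  # heads, tails
expected = [50, 50]  # a fair coin: 100 * 0.5 of each

stat = chi_squared_statistic(observed, expected)
p = p_value_one_df(stat)  # 2 categories -> 2 - 1 = 1 degree of freedom
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
# prints: chi-squared = 4.00, p = 0.0455
```

At the conventional 0.05 level this would (narrowly) reject the hypothesis that the coin is fair.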

Installing gstat for R on a Mac

November 25, 2014 cjohnson318

I got the following error today when trying to use install.packages("gstat"):

> install.packages("gstat")

   package ‘gstat’ is available as a source package but not as a binary

Warning in install.packages :
  package ‘gstat’ is not available (for R version 3.1.1)

ARIMA Forecasting in R

November 24, 2014 cjohnson318

This is a follow-up to my previous post. Here I take a closer look at ARIMA models in R, using the same data set.

Time Series Forecasting in Python and R

November 23, 2014 cjohnson318

A friend recently made a prediction about the price of oil for the next three months. I thought I would perform some time series forecasting on the West Texas Intermediate prices and see if his numbers were reasonable from a dumb-numbers canned-forecasting perspective. I’m not making the claim that one can reasonably and accurately forecast oil prices with traditional time series techniques. (That’s bogus.) I’m simply doing this to learn more about forecasting.

Monthly petroleum prices can be found at the Energy Information Administration. Ever relevant, Wikipedia has a great write-up on recent trends in oil prices. There is also a Times article on the 2008 spike and drop, which offered this apt summary:

[Oil prices are] the product of an extremely volatile mixture of speculation, oil production, weather, government policies, the global economy, the number of miles the average American is driving in any given week and so on. But the daily price is actually set — or discovered, in economic parlance — on the futures exchange.

Creating a Presentation with R

August 31, 2014 cjohnson318

In this post I’ll look at creating a presentation using the R ecosystem. I’ve used beamer before, and I love it, but I haven’t used the knitr R package yet. Incidentally, the creator of knitr, Yihui Xie, does not like beamer. That’s fine; I’ve been wrong about technology before. I recall thinking in college that Facebook was for losers and would never catch on. Anyway, Yihui’s work is really impressive, and I strongly suggest checking it out.

Modeling in R with the caret Package

August 30, 2014 cjohnson318

In this post I’ll look at using the caret package in R to find the optimal tuning parameters for a given model. The caret package was developed by Max Kuhn, who also developed the C50 package for decision trees, which I covered in a previous post.

Decision Trees in R using the C50 Package

August 29, 2014 cjohnson318

In this post I’ll walk through an example of using the C50 package for decision trees in R. The package implements the C5.0 algorithm, an extension of C4.5. We’ll use some totally unhelpful credit data from the UCI Machine Learning Repository that has been sanitized and anonymized beyond all recognition.

tidyr and pandas: Gather and Melt

August 28, 2014 cjohnson318

In this post I’ll look at replicating Hadley Wickham’s gather() tool from his tidyr package using the pandas melt() function. Why would anyone want to do this? Well, Dr. Wickham’s work is beautiful, and pandas.melt() is not as elegant as tidyr::gather(). Dr. Wickham’s pre-print paper on the subject is worth reading as well.
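To make the reshaping concrete, here is a minimal pure-Python sketch of what gather() and melt() both do: every non-identifier column is turned into a (variable, value) pair on its own row, with the identifier columns copied along. The helper name `melt_rows` and the sample data are hypothetical; with pandas the equivalent would be roughly `pd.melt(df, id_vars=["name"], var_name="treatment", value_name="result")`, and with tidyr `gather(df, treatment, result, -name)`.

```python
def melt_rows(rows, id_vars, var_name="variable", value_name="value"):
    """Reshape wide records (a list of dicts) into long format.

    Each input row yields one output row per non-id column, carrying the
    id columns along unchanged. Rows are emitted column by column, i.e.
    all records for the first value column, then all for the second.
    """
    value_vars = [k for k in rows[0] if k not in id_vars] if rows else []
    long_rows = []
    for var in value_vars:  # outer loop over the columns being gathered
        for row in rows:  # inner loop over the original records
            new_row = {k: row[k] for k in id_vars}
            new_row[var_name] = var
            new_row[value_name] = row[var]
            long_rows.append(new_row)
    return long_rows


# Hypothetical wide data: one row per person, one column per treatment.
wide = [
    {"name": "Alice", "treatment_a": 5, "treatment_b": 3},
    {"name": "Bob", "treatment_a": 2, "treatment_b": 7},
]

long_form = melt_rows(wide, id_vars=["name"], var_name="treatment", value_name="result")
for r in long_form:
    print(r)
# {'name': 'Alice', 'treatment': 'treatment_a', 'result': 5}
# {'name': 'Bob', 'treatment': 'treatment_a', 'result': 2}
# {'name': 'Alice', 'treatment': 'treatment_b', 'result': 3}
# {'name': 'Bob', 'treatment': 'treatment_b', 'result': 7}
```

Two wide rows with two value columns become four long rows, one observation per row, which is the "tidy" shape both tools aim for.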

Updating R from the Command Line

August 28, 2014 cjohnson318

This is a tiny post, but if I lumped it into a longer post as an aside, I might never find it again. If you’re trying to keep up with Hadley Wickham, you might need to update R from time to time. The installr package, which targets Windows, is there to help you keep up with the Wickhams. To update R, run the following:

# Install and load installr, then run its interactive R updater
install.packages("installr")
library(installr)
updateR()

For further information, check out this r-statistics post on the topic.

Hypothesis Testing in R

August 13, 2014 cjohnson318

In this post I’ll look at different statistical hypothesis tests in R. Statistical tests can be tricky because each has its own assumptions that must be met before you can use it. Some tests require samples to be normally distributed, others require two samples to have the same variance, and others are less restrictive.

We’ll begin with testing for normality. Then we’ll look at testing for equality of variance, with and without an assumption of normality. Finally we’ll look at testing for equality of mean, under different assumptions regarding normality and equal variance.
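As one concrete instance of the last step, Welch's t-test compares two means without assuming equal variances. The sketch below uses only the standard library and made-up samples; it computes the Welch statistic and the Welch-Satterthwaite degrees of freedom. Turning those into a p-value needs the t distribution's CDF, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)` in Python, or simply `t.test(x, y)` in R, whose default is the Welch version.

```python
import math
import statistics


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Sample variances (n - 1 denominator), as the t-test uses.
    vx, vy = statistics.variance(x), statistics.variance(y)
    se2_x, se2_y = vx / nx, vy / ny  # squared standard errors of each mean
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df


# Hypothetical samples whose variances clearly differ (2.5 vs 10).
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

t, df = welch_t(x, y)
print(f"t = {t:.3f}, df = {df:.3f}")
# prints: t = -1.897, df = 5.882
```

Note that the degrees of freedom come out non-integer and smaller than the pooled value nx + ny - 2 = 8; that is the price Welch's test pays for dropping the equal-variance assumption.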

Connor Johnson

Blog about math, programming, and data.