In this post I’ll look at different statistical hypothesis tests in R. Statistical tests can be tricky because they all have different assumptions that must be met before you can use them. Some tests require samples to be normally distributed, others require two samples to have the same variance, while others are not as restrictive.
We’ll begin with testing for normality. Then we’ll look at testing for equality of variance, with and without an assumption of normality. Finally we’ll look at testing for equality of mean, under different assumptions regarding normality and equal variance.
Continue reading Hypothesis Testing in R
In college I read about the advantages of conjoint analysis over the more intuitive method of using a Likert scale–the familiar rate-this-thing-from-one-to-five or whatever scale. It turns out that people get bored with Likert scales, and end up either reporting everything as extremes, or the median. It has been shown that you can get a better reading on people by asking them about their preference regarding two items. In this post I’d like to share the beginning of a framework for modelling these sorts of situations. Specifically, I’d like to model agents with specific behaviors, and see if those behaviors are apparent through conjoint analysis, i.e, I’d like to test conjoint methods under different controlled circumstances.
Continue reading Building an Agent Based Simulation for Conjoint Analysis
I was reading Nate Silver’s The Signal and the Noise today and I ran across his idea of Bayesland, a place where people walk around with sandwich boards listing things on which they would place a bet. If they meet a person whose odds on an event differs substantially from their own odds on that event, then the two will make a bet.
I thought that it would be neat to simulate this idea and see what would happen. In my example, we have three agents that bet on the probability of a coin coming up heads. The coin is simulated through a Bernoulli process, and the true probability or heads is 60%.
Continue reading Agent Based Modeling: A Simple Market
In this post I’ll discuss how to perform incremental updates to a simple statistical model using PyMC. The short answer is that you have to create a new model each time. In this example, I’ll use a Bernoulli random variable from
scipy.stats to generate coin flips, and I will use PyMC to model a prior and likelihood distribution, and produce a posterior distribution as output.
Continue reading Performing Incremental Updates using PyMC
The beta distribution requires two parameters, usually referred to as a and b, or alpha and beta. If you are considering a Bernoulli process, a sequence of binary outcomes (success or failure) with a constant probability of success, then you could use a beta distribution, setting the parameter
a equal to the number of successes, and setting the parameter
b equal to the number of failures. The neat thing about the Beta distribution is that the greater the total number of trials (the sum of the successes and failures) the more peaked, or narrow, the distribution becomes.
Continue reading Modeling with Beta Distributions
One of my favorite things about pandas is that you can easily combine temporal data sets using different time scales. Behind the scenes, pandas will fill in the empty gaps with null values, and then quietly ignore those null values when you want to make a scatter plot or do some other computation, like a rolling mean. It takes so much tedious book-keeping out of the data analysis process.
Continue reading Concatenating and Visualizing Data in Pandas
I explained in a previous post how to quickly and easily grab data from Excel files using pandas. I use this approach when I know that there’s a ton of data in the Excel file and I want it in a pandas DataFrame. If I am only extracting a handful of values, I like to use a lower level module,
Continue reading Working with Excel Files using Python
In this post I will discuss an implementation of sequential Gaussian simulation (SGS) from the field of geostatistics. Geostatistics is simply a statistical consideration of spatially distributed data. Sequential Gaussian simulation is a technique used to “fill in” a grid representing the area of interest using a smattering of observations, and a model of the observed trend. The basic workflow incorporates three steps:
- Modeling the measured variation using a semivariogram
- Using the semivariogram to perform interpolation by kriging
- Running simulations to estimate the spatial distribution of the variable(s) of interest
Continue reading Two Dimensional Sequential Gaussian Simulation in Python
In this post, I’ll describe how to change the color of an anode RGB LED with a potentiometer. I’ll be using an Arduino UNO, and components from this RadioShack components kit. The motivation for this post was to have an LED change color in response to the reading from a thermistor next to my stove, but when I read about how I’d first need to calibrate the thermistor with some kind of thermometer, my motivation scurried under the sofa like a terrier in a thunderstorm. As a compromise I substituted the thermistor with a trim-pot, reasoning that a variable resistance was a variable resistance.
Continue reading Controlling an RGB LED with a Potentiometer