TL;DR: the negative binomial distribution counts the number of trials needed to reach the Nth success.

I had a problem where we were considering running some very expensive tests with a known success rate, and we wanted to know, given that success rate and the cost, whether we should run them at all. To make things more interesting, we only needed a set number of successes, so we could stop all testing as soon as we reached that number. My initial thought was to use the binomial distribution, but the binomial assumes a fixed number of trials and doesn’t “cut off” after a set number of successes. It turns out that we needed a version of the negative binomial distribution.
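
A minimal sketch of the calculation in Python, assuming each test succeeds independently with the same probability $p$. In the trials-counting form of the negative binomial, the probability that the $r$-th success arrives on exactly trial $n$ is $\binom{n-1}{r-1} p^r (1-p)^{n-r}$, and the expected number of trials is $r/p$. The function names and the example numbers below are hypothetical, chosen for illustration rather than taken from the original analysis.

```python
from math import comb


def prob_rth_success_on_trial(n: int, r: int, p: float) -> float:
    """P(the r-th success arrives on exactly trial n), trials-counting form."""
    if n < r:
        return 0.0  # r successes cannot happen in fewer than r trials
    # Choose which r-1 of the first n-1 trials succeeded; trial n is the r-th success.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)


def prob_done_within(budget: int, r: int, p: float) -> float:
    """P(reaching r successes using at most `budget` tests)."""
    return sum(prob_rth_success_on_trial(n, r, p) for n in range(r, budget + 1))


def expected_cost(r: int, p: float, cost_per_test: float) -> float:
    """Expected spend when testing stops at the r-th success (mean trials = r / p)."""
    return cost_per_test * r / p
```

For example, with $r = 3$ required successes, $p = 0.25$, and a hypothetical cost of 1000 per test, `expected_cost(3, 0.25, 1000)` returns `12000.0`, and `prob_done_within(12, 3, 0.25)` gives the chance of finishing within a 12-test budget. Because the distribution stops at the $r$-th success, tests after that point never enter the cost, which is exactly the cut-off the plain binomial lacks.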

It took me a while to get Sphinx documentation set up correctly. Because Sphinx is so configurable, it is also easy to misconfigure. In this guide I’ll assume that you’re using a Python virtual environment and that the source code you want to document is in a directory called `src/`. I’ll walk through installing and configuring what you need to generate documentation from docstrings written in the Google or NumPy style, and to create API documentation for a Flask server. I’ll state explicitly which directory I’m in whenever a command depends on the working directory.
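
A minimal `conf.py` sketch for that setup, assuming the file lives in a `docs/` directory next to `src/` (with a separate `source/` directory the path would be `../../src`). `sphinx.ext.autodoc` and `sphinx.ext.napoleon` ship with Sphinx; the Flask extension comes from the separately installed `sphinxcontrib-httpdomain` package. The project name is hypothetical, and this is one reasonable configuration rather than the only one.

```python
# docs/conf.py: minimal sketch, assuming docs/ sits next to src/.
import os
import sys

# Sphinx executes conf.py with its containing directory as the working
# directory, so this puts ../src on the import path for autodoc.
sys.path.insert(0, os.path.abspath("../src"))

project = "my-project"  # hypothetical project name

extensions = [
    "sphinx.ext.autodoc",  # pull documentation from docstrings
    "sphinx.ext.napoleon",  # parse Google- and NumPy-style docstrings
    "sphinxcontrib.autohttp.flask",  # Flask routes; needs sphinxcontrib-httpdomain
]

# Accept both docstring styles; turn one off if the codebase uses only one.
napoleon_google_docstring = True
napoleon_numpy_docstring = True

# Document module members by default in automodule/autoclass directives.
autodoc_default_options = {"members": True}
```

With that in place, an `.rst` page can pull in a module with `.. automodule:: mypackage.module` and document a Flask app’s routes with `.. autoflask:: mypackage.app:app`; both module paths are hypothetical placeholders for your own code under `src/`.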

I was working with a friend to grab comments from YouTube. We initially thought of using lynx or w3m, but the comments section always showed up as “Loading…”. Next we tried BeautifulSoup, but that didn’t work either, for the same reason: it only parses the HTML it is given and doesn’t run the JavaScript that loads the comments. Finally, we tried Selenium, because it drives a real browser that executes the page’s JavaScript.
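
A minimal sketch of the Selenium approach, assuming Selenium 4 with Firefox installed (recent Selenium releases can locate the browser driver themselves). The CSS selector `ytd-comment-thread-renderer #content-text`, the scroll counts, and the function names are assumptions for illustration; YouTube’s markup changes often, so check the selector in the browser’s developer tools before relying on it.

```python
import time

# Assumed selector for comment bodies; verify it against the live page.
COMMENT_SELECTOR = "ytd-comment-thread-renderer #content-text"


def clean_comments(raw_texts):
    """Strip whitespace, drop empty strings and exact duplicates, keep order."""
    seen = set()
    cleaned = []
    for text in raw_texts:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def fetch_comments(video_url, max_scrolls=5, pause=2.0):
    """Open the page in a real browser so its JavaScript loads the comments."""
    # Imported here so clean_comments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(video_url)
        # Comments load lazily below the video, so scroll a little first.
        driver.execute_script("window.scrollTo(0, 600);")
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, COMMENT_SELECTOR))
        )
        for _ in range(max_scrolls):
            # Scrolling to the bottom asks the page for another batch.
            driver.execute_script(
                "window.scrollTo(0, document.documentElement.scrollHeight);"
            )
            time.sleep(pause)
        elements = driver.find_elements(By.CSS_SELECTOR, COMMENT_SELECTOR)
        return clean_comments(el.text for el in elements)
    finally:
        driver.quit()
```

Calling `fetch_comments("https://www.youtube.com/watch?v=...")` with a real video URL would return the comment texts as a list of strings. The fixed sleep between scrolls is the simplest option; waiting until the comment count stops growing would be more robust on slow connections.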

## Blog about math, programming, and data.