Using InfluxDB with Django and Docker

Today I worked out an example of using InfluxDB from Django in Docker. Using Docker containers to run databases greatly reduces the amount of database configuration you need to worry about when you’re trying to work out a proof of concept.

InfluxDB is a great tool for storing timestamped data. Storing a timestamp and a set of measurements, one timestamp per row, in a Postgres database is possible but inefficient. InfluxDB lets you store, for each timestamp, a set of values (which it calls fields) and a set of indexed metadata tags.

For example, if you’re collecting hourly production data from multiple wells, you can store the rates as data values and the well names as indexed tags. Looking up the production from a set of wells over some time period is then very efficient, because the tags are indexed. Looking up wells by production rate, however, would be very inefficient unless you stored the rate data as a tag and the well names as values. The InfluxDB documentation covers this in more detail.
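To make the tag and value split concrete, here is a minimal sketch that formats one hourly reading as an InfluxDB line protocol string (measurement, then tags, then fields, then a nanosecond timestamp). The measurement name production, the tag well, and the fields oil_rate and water_rate are made up for illustration, and the helper to_line_protocol is hypothetical, not part of any InfluxDB client. It assumes names and tag values contain no spaces, commas, or equals signs, so it does no escaping.

```python
from datetime import datetime, timezone


def to_line_protocol(measurement, tags, fields, when):
    """Format one point as: measurement,tag=value field=value timestamp_ns.

    Hypothetical helper for illustration; assumes no characters that
    would need escaping and that every field value is numeric.
    """
    # Tags are the indexed metadata (here, which well the reading is from).
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    # Fields are the measured values; they are written as floats here.
    field_part = ",".join(
        f"{key}={float(value)}" for key, value in sorted(fields.items())
    )
    # Line protocol timestamps default to nanoseconds since the Unix epoch.
    timestamp_ns = int(when.timestamp()) * 1_000_000_000
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "production",
    tags={"well": "A-1"},
    fields={"oil_rate": 120.5, "water_rate": 30.0},
    when=datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc),
)
print(line)
```

Because well is a tag, a query that filters on the well name and a time range can use the index. Filtering on oil_rate cannot, since it is stored as a field.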


Jenks Natural Breaks Optimization

A co-worker was interested in segmenting a list of data points, and I went down a rabbit hole on one-dimensional segmentation. I found an article on the Jenks natural breaks optimization on Wikipedia, and another article that had some examples. The method bins data points so that points belonging to the same cluster always end up in the same bin. There is an iterative method that takes unordered data, but this implementation just sorts the data before binning.
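As a rough illustration of the sort-then-bin idea, here is a minimal sketch. It is not the implementation discussed in the post: it sorts the data and then uses a simple dynamic program to pick the contiguous bins that minimize the total within-bin sum of squared deviations, which is the quantity natural breaks tries to make small. The function name jenks_bins is hypothetical, and the triple loop is only practical for small inputs.

```python
def jenks_bins(values, n_classes):
    """Split values into n_classes contiguous bins of the sorted data,
    minimizing the total within-bin sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums of x and x^2 give the cost of any slice in O(1).
    sum_x, sum_x2 = [0.0], [0.0]
    for x in data:
        sum_x.append(sum_x[-1] + x)
        sum_x2.append(sum_x2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = sum_x[j] - sum_x[i]
        return (sum_x2[j] - sum_x2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k bins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # split[k][j]: start index of the last bin in that best split.
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the split table backwards to recover the bins.
    bins = []
    j = n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        bins.append(data[i:j])
        j = i
    bins.reverse()
    return bins


bins = jenks_bins([12, 1, 50, 2, 11, 3, 51, 10], 3)
print(bins)
```

The input order does not matter because the data is sorted first, and every bin is a contiguous run of the sorted values.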
