Debugging MSSQL Connections from Ububtu

You might need to install a fair bit of stuff,

sudo apt-get install unixodbc unixodbc-dev tdsodbc freetds-dev freetds-bin -y

There’s a neat trick you can do to automatically configure FreeTDS on you Ubuntu machine (discussion here: https://gist.github.com/rduplain/1293636)

sudo dpkg-reconfigure tdsodbc

This will write data to /etc/odbcinst.ini for you. This file configures which drivers FreeTDS will use. You could write this by hand, but this operation reduces the risk for error.

I was trying to access a remote server using pyodbc and I was providing a connection string like,

import pyodbc
conn = pyodbc.connection('DSN=<dsn>;UID=john.doe;PWD=<password>')

For some reason, I wasn’t able to connect at all. It turns out that I needed to put a space after UID… it’s weird, but it worked. Maybe this is because there was a dot in my UID? I’m not sure. You can also set the UID and PWD in /etc/obcd.ini, but that’s probably not ideal.

If you’re stuck, you can list the connections on a given host with this,

tsql -H <host> -L

You’ll probably need to install tsql, which I think is located in the freetds-bin package listed above.

Using InfluxDB with Django and Docker

Today I worked out an example of using InfluxDB from Django in Docker. Using Docker containers to run databases greatly reduces the amount of database configuration you need to worry about when you’re trying to work out a proof of concept.

InfluxDB is a great tool for storing timestamped data. Storing a timestamp and a set of measurements, one timestamp per row, in a Postgres database is possible, but inefficient. InfluxDB offers you a way to store a set of values, and a set of indexed meta-data tags per row.

For example, if you’re collecting hourly production data from multiple wells, you can store the rates as data values, and wells as indexed tags. Then looking up the production from a set of wells over some time period becomes very efficient due to indexing. Looking up wells by production rates, however, would be very inefficient, unless you stored rate data as a tag, and well names as values. Learn more here from the InfluxDB documentation.

Continue reading

Jenk’s Natural Breaks Optimization

A co-worker was interested in segmenting a list of data points, and I went down a rabbit hole on one dimensional segmentation. I found an article on the Jenk’s natural breaks optimization on Wikipedia. I found another article that had some examples. This is used to bin data points so that clusters are always binned together. There is an iterative method that takes unordered data, but this implementation just sorts the data before binning.

Continue reading