Taming the Docker Blob

Or understanding how to best use Docker.

Docker is a great way to build services with modular and changeable components without borking your server / computer. I like to think of Docker containers as a system version of Python’s virtual environment – you can build a stack of services and applications through a Dockerfile, and then easily use this on different computers.

However, Docker can become an amorphous blob if you are not careful. Containers and volumes can multiply, until your computer starts freezing because you have used up 100GB of space.

There are two tricks I have learnt to manage my Docker-based systems:

  • Working out the difference between images and containers, and understanding the lifecycle of the latter; and
  • Clever use of volumes.


Images

Images are like ISO disk images, the difference being that they are built layer-by-layer such that layers may be shared between images. An image may be thought of as a class definition.

Images are created when you issue a docker build command. To organise them, make sure you build an image using the -t tag option (e.g. -t image_name). Images are normally identified by an ID in the form of a hash, so giving your image a name is useful. To view the images on your computer use: docker images.
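
For example (image_name is just an illustrative tag):

# Build an image from the Dockerfile in the current directory and tag it
docker build -t image_name .

# List the images on this machine
docker images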

Containers

Containers are the running systems created from images. They can be thought of as virtual machines, or as instances of a class definition.

One image may be used to create multiple container instances. A container is created when you use the docker run command and pass an image name. I recommend also using the --name option to create a name for your container, e.g. --name my_container.

Running containers may be viewed using the docker ps command. What took me a while to work out is that stopped and exited containers also stick around. These can be viewed using the -a option, i.e. docker ps -a.

Another tip is to use the --rm flag to automatically remove temporary containers after use. Beware though: removing a container will also delete all data generated during the running of the container, unless that data is stored in a separate volume.

Containers are really designed to run as continual background processes. If you are working on a desktop or laptop you may want to turn your machine off and on again. If a container has exited, you can restart it using docker restart my_container.
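
To recap the container lifecycle (my_container and image_name are illustrative names):

docker run --name my_container image_name   # create and start a named container
docker run --rm image_name                  # throwaway container, removed on exit
docker ps                                   # list running containers
docker ps -a                                # list all containers, including exited ones
docker restart my_container                 # restart a stopped container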

Volumes

Volumes are chunks of file system that are managed by Docker. They can be mounted into multiple containers.

It’s good practice to explicitly create a volume, so that it is easier to keep track of. To do this use the docker volume create vol_name command.

I had a problem where very large databases were filling up a local solid state drive (SSD), and I wanted to move them to a 1TB hard drive. The easiest way I found to store volumes on a different drive is to create a symlink (by right-clicking in the Nautilus file manager or using ln -s) to the location where you want to store the data (e.g. /HDD/docker_volumes/vol_1), rename the link to match the proposed volume name (e.g. vol_1), place it in the Docker volumes directory (typically /var/lib/docker/volumes), and then create the volume, e.g. docker volume create vol_1. The volume will now be managed by Docker but the data will be stored in the linked folder.
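
As a sketch, the steps look something like this (assuming the default volume directory of /var/lib/docker/volumes; the paths are illustrative):

# Create the real storage directory on the large drive
mkdir -p /HDD/docker_volumes/vol_1

# Symlink it into the Docker volumes directory (root usually required)
sudo ln -s /HDD/docker_volumes/vol_1 /var/lib/docker/volumes/vol_1

# Create the volume - Docker will follow the symlink when storing data
docker volume create vol_1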

To use a volume with a container, use the -v flag with the docker run command, e.g. docker run -v vol_1:/path/in/container --name my_container my_image (the path inside the container must be absolute).

Checking Disk Usage

Once you get the hang of all this, a good check on disk use may be performed using docker system df -v. This provides a full breakdown of your Docker disk usage.

A Mongo Example

Here is an example that pulls this all together. The situation is that we want to store some data in a Mongo database. Instead of installing Mongo locally we can use the mongo Docker image.

Now through detective work (docker inspect mongo) I worked out that the mongo Docker image is designed to work with two volumes: one that is mapped to the database directory /data/db in a container; and one that is mapped to the configuration database directory /data/configdb. If you don’t explicitly specify volumes to map to these locations, Docker creates anonymous ones locally. Now these directories can grow quite large. I thus created two Docker volumes, mongo_data and mongo_config, with symlinks to a larger hard disk drive as described above.

To download the image and start a container with your data you can then use: docker run -v mongo_data:/data/db -v mongo_config:/data/configdb -p 27017:27017 --name mongo_container mongo. The -p flag maps the local port 27017 to the exposed Mongo port of the container, so we can connect to the Mongo database using localhost and the default port.
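
To check the database is up, you can open a Mongo shell inside the running container (on newer images the client is mongosh rather than mongo):

docker exec -it mongo_container mongo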

If you stop the container (e.g. to restart or switch off your server/computer), you can restart it using docker restart mongo_container. You can even delete the container and still keep the data, which is stored in the mongo_data volume. (Although I recommend backing up that data just in case you aggressively prune the wrong volumes!)


Using Alembic to Migrate SQLAlchemy Databases

There are several advantages to using SQLAlchemy as a wrapper for an SQL database, including stability with large numbers of data records, a class/object-oriented approach, and plug-and-play underlying databases. However, one under-documented disadvantage is poor change management. If you add a field or table you generally need to regenerate the entire database. This is a pain if you are constantly tinkering.

There are a number of tools to help with change management.

If you are using SQLAlchemy as part of a Flask application, your best bet is Flask-Migrate. This allows you to easily initialise, upgrade and migrate database definitions. Also the tutorial within the docs is great – generally this works without further modification.
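
For reference, the usual workflow looks something like this (the exact commands depend on your Flask-Migrate version; the message text is illustrative):

flask db init                      # create the migrations directory
flask db migrate -m "add field"    # autogenerate a migration script
flask db upgrade                   # apply it to the database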

If you are using SQLAlchemy outside of a Flask application, one option is to use Alembic. (Flask-Migrate is a wrapper for various Alembic functions.)

Alembic requires a little more setup. The documentation is good but a little intense, and wading through it to work out an easy implementation is a bit of a struggle. However, once you realise how things work it can be rather easy*. It’s a bit like git, but for databases.

First install Alembic in your current Python environment:

pip install alembic

Then navigate to your project directory and initialise:

alembic init [dir_name, e.g. alembic]

This creates a directory structure within your project directory. You may want to add the [dir_name] to your .gitignore file.

You then need to edit two configuration settings.

First, go into the alembic.ini file created in your project directory. Now add the “sqlalchemy.url”. For me this was:

sqlalchemy.url = sqlite:///[DB_name].db

Second, you need to add your database model’s metadata object to the “env.py” file in the [dir_name] directory. As my Python package isn’t installed I also needed a hack to add the parent directory to the Python “sys.path” list. My added lines in this file are:

import os, sys

#Add the parent directory so the data models can be imported
parent_dir = os.path.abspath(os.path.join(os.getcwd()))
sys.path.append(parent_dir)
from datamodels import Base
target_metadata = Base.metadata

To add a new revision you use a “revision” command, much like “git commit”. The key is the “--autogenerate” flag. This automatically determines the changes to your database based on changes to your data models as defined in (for me) a “datamodels.py” file. So to start, run:

alembic revision --autogenerate -m "message"

Then you can update your database by running:

alembic upgrade head
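
If a revision turns out to be a mistake, Alembic can also step back one revision:

alembic downgrade -1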

*Thanks go to Mathieu Rodic and his post here for helping me work this out.

Doing Useful Things with WeMo Motion

Using both the WeMo Motion rules and IFTTT allows you to do certain things with this motion detector. However, to expand our possibilities it would help if we could store our motion data and make it accessible to the programs that we write.

To store our motion data in a database we need a bit of a convoluted process. It goes something like this:

First we set up an IFTTT recipe to send an email to a Gmail account when motion is detected.

Having done this you will get a series of emails.

I recommend setting up a separate Gmail account for automation to avoid spamming yourself with IFTTT emails. It would also make things more secure for the next steps. To make things easier when using multiple IFTTT recipe emails, I set up a filtering rule in Gmail to automatically label all emails like this as “Motion”.

The next step is to write some Python code to access our emails, process messages and store data in an SQLite database.

  • The email processing makes use of the imaplib module and the HeaderParser class from the email.parser module; and
  • The database processing makes use of the sqlite3 module.

A first function accesses all unread emails with a particular label and returns an array of the subject lines of those emails.

import imaplib
from email.parser import HeaderParser

def read_subjects(label):
	obj = imaplib.IMAP4_SSL('imap.gmail.com', '993')
	obj.login('username@gmail.com', 'password')
	obj.select(label)  # select the mailbox matching the Gmail label
	typ, data = obj.search(None, 'UnSeen')

	subjects = []

	for num in data[0].split():
		typ, msg_data = obj.fetch(num, '(BODY[HEADER])')

		header_data = msg_data[0][1]

		parser = HeaderParser()
		msg = parser.parsestr(header_data)
		subjects.append(msg['Subject'])

	return subjects

A second set of functions then processes each subject line to extract a ‘datetime’ at which the motion occurred and the motion sensor’s name.

import datetime

def store_motion(subjects):
	#Initialise temporary array for data
	rows = []

	#Process and store unread mail items
	for subject in subjects:
		#Extract date/time of last motion from the subject line
		extracted_date = extract_date(subject)
		#Parse into a datetime object
		motion_time = datetime.datetime.strptime(extracted_date, "%B %d, %Y at %I:%M%p")
		#Extract sensor name
		s_name = extract_sensor(subject)
		#Add (motion time, sensor name) tuple to rows
		rows.append((motion_time, s_name))
	storemotioninsql(rows)

def extract_date(word):
	#The date follows "motion: " in the subject line
	date_index_start = word.find("ion: ")+5
	#Take 11 more characters from " at" to cover " at HH:MMAM"
	date_index_end = word.find(" at")+11
	date_out = word[date_index_start:date_index_end]
	return date_out

def extract_sensor(word):
	#The sensor name is quoted at the start of the subject line
	name_end = word.find("' ")
	word_out = word[1:name_end]
	return word_out

A third function stores the prepared ‘datetime’ and motion sensor name in an SQLite database.

import sqlite3 as lite

def storemotioninsql(rows):
	#Save in database
	con = lite.connect('motion.db')

	#The with block commits (or rolls back) the transaction for us
	with con:

	    cur = con.cursor()

	    #Create a motion table if it doesn't already exist
	    cur.execute('CREATE TABLE IF NOT EXISTS motion (r_datetime TIMESTAMP, r_s_name TEXT)')

	    for row_values in rows:
	    	cur.execute('INSERT INTO motion VALUES(?,?)', (row_values[0], row_values[1]))

All that remains is to set these functions up in a Python script and then use cron to schedule it to run every 15 minutes.
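
For example, a crontab entry along these lines should work (edit with crontab -e; the interpreter and script paths here are illustrative):

*/15 * * * * /usr/bin/python /home/pi/motion_mail.py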

Accessing Our Sensor Data: PHP + SQLite + SVG

So we have our sensor(s) dutifully logging their data to an SQLite database. Now we need to process and view this data.

LASP

First we need to set our Raspberry Pi up as a web server. This typically requires Linux, Apache, MySQL and PHP – the LAMP stack. There is a handy guide: here (thanks Dave!).

In my case I wanted a light-weight alternative to MySQL so I decided to use SQLite. You can also swap Apache for something like Lighttpd if you feel like it (I may in the future but didn’t this time).

On Raspbian (and in summary of the guide in Dave’s link) I installed the following packages:

sudo apt-get update
sudo apt-get install apache2 php5 libapache2-mod-php5
sudo apt-get install php5-sqlite

To avoid permission complications I set up my web directory in my home directory. This meant I could easily access and upload files from the iPad. To do this, edit /etc/apache2/sites-available/default, e.g.:

sudo nano /etc/apache2/sites-available/default

In the section:

DocumentRoot /var/www
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <Directory /var/www/>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        allow from all
    </Directory>
 

Change /var/www to /home/[yourusername]/[newfolder], e.g.:

DocumentRoot /home/jimbojones/www
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <Directory /home/jimbojones/www/>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        allow from all
    </Directory>
 

Hurray! Now you should be ready to write some PHP.

Accessing the Database

In PHP versions 5.3+, SQLite is enabled by default. To check, run:

<?php phpinfo(); ?>

and search the page for ‘sqlite’.

I found that getting this working took a bit of fiddling. There are several different versions of the access code floating about, and not all of them work. If you google, look only at recent entries.

However, eventually this bit of PHP worked for me:

$db = new SQLite3('EnergyDB/energymonitor.db');

$results = $db->query('SELECT * FROM readings');
while ($row = $results->fetchArray())
{
	var_dump($row);
}

Fun with SQL

Now that we have database access working, we can experiment with some SQL queries. While testing, it is easier to do this via the SQLite command line. This way we can obtain a set of working queries without the added burden of PHP debugging. These working queries can then be added to our PHP code for testing.
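
To open the database in the SQLite command-line shell (assuming the energymonitor.db file from earlier):

sqlite3 energymonitor.db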

My first attempt was:

SELECT * FROM readings WHERE r_datetime > (SELECT DATETIME('now', '-1 day'))

This worked!

You can change ‘day’ to ‘hour’ to get the last hour’s report:

SELECT * FROM readings WHERE r_datetime > (SELECT DATETIME('now', '-1 hour'))

Of course, when copying to the PHP remember to change the inner single quotes to double quotes (d’oh!).

Graph It Up

No-one likes dry numbers. Everyone loves a pretty graph. Now we will create our own using PHP and SVG (scalable vector graphics).

HTML5 supports SVG out of the box, which makes life easy for us. I found this summary from IBM useful; W3Schools is also worth a look. To knock up a quick graph I took the following steps:

  • Dump the data from the SQL query into data arrays;
  • Set the dimensions of our graph;
  • Set scale factors for our y-axes (basically: chart_height / max(array_value));
  • Draw our x, y axes;
  • For each data value:
    • Draw an x ticker, and
    • Determine a y value; and
  • Plot polylines for each set of data values.

And hey presto:

[12 hours]

[24 hours]

The PHP code is as follows:

<?php
	
	$db = new SQLite3('EnergyDB/energymonitor.db');
	
	if (!$db) die('Cannot open database.');

	$statement = $db->prepare('SELECT * FROM readings WHERE r_datetime > (SELECT DATETIME("now", "-24 hour"))');
	$results = $statement->execute();
	if (!$results) die("Cannot execute statement.");
	$i = 0;
	$watts = array();
	$temp = array();
	while ($row = $results->fetchArray())
	{
		$watts[$i] = $row['r_watts'];
		$temp[$i] = $row['r_temp'];
		$i = $i + 1;
	}
	$maxwatts = max($watts);
	$maxtemp = max($temp);
	
	/*Set parameters for width & height of chart*/
	$width = 800;
	$height = 600;
	$tickerlength = 5;
	$scalefactorwatts = $height / $maxwatts;
	$scalefactortemp = $height / $maxtemp;
	$totalheight= $height + (2*$tickerlength);
	$xticker = floor($width/count($watts));
	
	echo '<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="'.$width.'" height="'.$totalheight.'" >
		<line x1="0" y1="'.$height.'" x2="'.$width.'" y2="'.$height.'" 
					           style="stroke:black;stroke-width:2"/>
		<line x1="0" y1="0" x2="0" y2="'.$height.'" 
					           style="stroke:black;stroke-width:2"/>';
	$polystringwatts = '';
	$polystringtemp = '';
	for ($j = 0; $j < count($watts); $j++) {
		$x = $j*$xticker;
		$tickerheight = $height+$tickerlength;
		echo '<line x1="'.$x.'" y1="'.$height.'" x2="'.$x.'" y2="'.$tickerheight.'" style="stroke:black;stroke-width:1"/>';
		
		$ywatts = $height - $watts[$j]*$scalefactorwatts;
		$polystringwatts = $polystringwatts." ".$x.",".$ywatts;
		
		$ytemp = $height - $temp[$j]*$scalefactortemp;
		$polystringtemp = $polystringtemp." ".$x.",".$ytemp;
		
	}
	echo '<polyline points="'.$polystringwatts.'" style="fill:none;stroke:red;stroke-width:4"/>';
	echo '<polyline points="'.$polystringtemp.'" style="fill:none;stroke:green;stroke-width:4"/>';
	echo '</svg>';
	
?>

Databases for Sensor Data in Python

Here are some options to persistently store sensor data in a database using Python.

Anydbm

For simple key-value pairs the anydbm module looks the best. (In Python 3 this module was renamed dbm.)

It stores values as strings. Because of this you may need to use another module called pickle to convert non-string values into strings.

For example:

import anydbm, pickle
from datetime import datetime

#Open a database - if it doesn't exist: create it
db = anydbm.open('test_data.db', 'c')

time_now_in = datetime.now()
string_version_in = pickle.dumps(time_now_in)

#Store value
db['1'] = string_version_in
#Close database
db.close()

#Open database
db2 = anydbm.open('test_data.db')

string_version_out = db2['1']
time_now_out = pickle.loads(string_version_out)
print time_now_out

db2.close()

SQLite

For more advanced database features the next step is sqlite3.

A tutorial on sqlite3 can be found here.

You may need to install sqlite3:

sudo apt-get install sqlite3

Then in Python you can issue SQL statements using code such as:

import sqlite3 as lite

#... - data in variables

#Set up a new database connection - db will be created if it doesn't exist
con = lite.connect('energymonitor.db')

#With the db connection (simplifies commit etc.)
with con:

	cur = con.cursor()

	#Create a READINGS table if it doesn't already exist
	cur.execute('CREATE TABLE IF NOT EXISTS readings (r_datetime TIMESTAMP, r_watts INT, r_temp INT)')

	#Store sensor readings
	cur.execute('INSERT INTO readings VALUES(?,?,?)', (time, watts, temp))

This then stores my energy monitor readings in a table called (unoriginally) "readings".
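
To get readings back out again, here is a minimal sketch along the same lines (assuming the readings table created above):

import sqlite3 as lite

con = lite.connect('energymonitor.db')
with con:
	cur = con.cursor()
	#Fetch the last day of readings
	cur.execute("SELECT * FROM readings WHERE r_datetime > DATETIME('now', '-1 day')")
	for row in cur.fetchall():
		print row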