Using Alembic to Migrate SQLAlchemy Databases

There are several advantages to using SQLAlchemy as a wrapper for an SQL database: stability with large numbers of data records, a class/object-oriented approach, and plug-and-play underlying databases. However, one under-documented disadvantage is poor change management. If you add a field or table you generally need to regenerate the entire database, which is a pain if you are constantly tinkering.

There are a number of tools to help with change management.

If you are using SQLAlchemy as part of a Flask application, your best bet is Flask-Migrate. This allows you to easily initialise, upgrade and migrate database definitions. Also the tutorial within the docs is great – generally this works without further modification.

If you are using SQLAlchemy outside of a Flask application, one option is to use Alembic. (Flask-Migrate is a wrapper for various Alembic functions.)

Alembic requires a little more setup. The documentation is good but a little dense, and wading through it to work out a simple implementation is a bit of a struggle. However, once you realise how things work it can be rather easy*. It’s a bit like git, but for databases.

First install Alembic in your current Python environment:

pip install alembic

Then navigate to your project directory and initialise:

alembic init [dir_name, e.g. alembic]

This creates a directory structure within your project directory. You may want to add the [dir_name] to your .gitignore file.

You then need to edit two configuration settings.

First, open the “alembic.ini” file in the newly-created directory and add the “sqlalchemy.url” setting. For me this was:

sqlalchemy.url = sqlite:///[DB_name].db

Second, you need to add your database model’s metadata object to the “env.py” file in the [dir_name] directory. As my Python package isn’t installed I also needed a hack to add the parent directory to the Python “sys.path” list. My added lines in this file are:

import os
import sys

# make the project directory (which holds datamodels.py) importable
parent_dir = os.path.abspath(os.getcwd())
sys.path.append(parent_dir)

from datamodels import Base
target_metadata = Base.metadata
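For reference, a minimal “datamodels.py” exposing that Base object might look like the sketch below. The table and column names here are purely illustrative, not from my actual project:

```python
# Hedged sketch of a minimal datamodels.py; Record/records/id/name are
# made-up names for illustration only.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    name = Column(String)
```

Anything defined on this Base (and therefore registered in Base.metadata) is what Alembic will compare against the live database when autogenerating revisions.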

To add a new revision you use the “revision” command, much like “git commit”. The key is the “--autogenerate” flag. This automatically determines the changes to your database based on changes to your data models as defined in (for me) a “datamodels.py” file. So to start, run:

alembic revision --autogenerate -m "message"

Then you can update your database by running:

alembic upgrade head

*Thanks go to Mathieu Rodic and his post here for helping me work this out.

Quickpost: Adding a Custom Path to Conda Environment

I have a few Python applications in development in a custom ‘projects’ directory. I want to be able to run these using ‘python -m [appname]’.

The easiest way to do this is by adding a .pth file to the site-packages folder of my Python environment (for me ‘/[userdirpath]/anaconda3/envs/[projectname]/lib/python3.5/site-packages/’).

For example, I added a file called ‘custom.pth’ that had one line containing the full path to my ‘projects’ directory. I can then import the apps.
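As a sketch of what the standard-library site machinery does with such a file (the temporary directories below stand in for the real site-packages and ‘projects’ paths):

```python
import os
import site
import sys
import tempfile

# A .pth file is just a text file whose lines are paths to add to sys.path.
# Temporary dirs stand in for the real site-packages and 'projects' paths.
site_dir = tempfile.mkdtemp()      # stands in for site-packages
projects_dir = tempfile.mkdtemp()  # stands in for the 'projects' directory

with open(os.path.join(site_dir, "custom.pth"), "w") as f:
    f.write(projects_dir + "\n")

# Process any .pth files found in site_dir, as Python does at startup
# for the real site-packages directory.
site.addsitedir(site_dir)

print(projects_dir in sys.path)  # True: the projects dir is now importable
```

At interpreter startup Python does this automatically for the real site-packages directory, which is why dropping a ‘custom.pth’ file there is enough.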

Starting a Python Project with Anaconda

It just so happens that on a few systems I have been using Anaconda to allow painless Python coding. For example, on Windows or non-Debian Linux I have struggled to compile packages from source. Anaconda provides a useful wrapper for the main functionality that just works on these operating systems (on my Ubuntu machine or the Raspberry Pi I just use virtualenv and pip in the usual way).

Anaconda also has the advantage of being a quick shortcut to install Python and a bucketful of useful libraries for big data and artificial intelligence experimentation. To start, head over to the download page for Anaconda here. The installer is wrapped in a bash script – just download, verify and run. On my ten-year-old laptop running Puppy Linux (which was in the loft for a year or so covered in woodlouse excrement) this simply worked painlessly. No compiling from source. No version errors. No messing with pip. Previously, libraries like numpy or nltk had been a headache to install.

I find that Jupyter (formerly IPython) notebooks are a great way to code iteratively. You can test out ideas block by block, shift stuff around, and output and document all in the same tool. You can also easily export to HTML with one click (hence this post). To start a notebook after installing Anaconda, run the following:

jupyter notebook

This will start the notebook server on your local machine and open your browser. By default the notebooks are served at localhost:8888. To access them across a local network, use the --ip flag with your IP address (e.g. --ip 192.168.1.2) and then point your browser at [your-ip]:8888 (use --port to change the port).

My usual way of working is to play around with my code in a Jupyter notebook before officially starting a project. I find notebooks a better way to initially iteratively test and develop algorithms than coding and testing on the command line.

Once I have some outline functionality in a notebook it is time to create a new project. My workflow for this is as follows:

  1. Create a new empty repository on GitHub, with a Python .gitignore file, a basic README file and an MIT License.
  2. Clone the new empty repository into my local projects directory. I have set up SSH keys so this just involves:
     git clone git@github.com:[username]/[repositoryname].git 
  3. Change directory into the newly cloned project directory:
     cd [repositoryname] 
  4. Create a new Conda environment. Conda is the command line package manager element of Anaconda. This page is great for working out the Conda commands equivalent to virtualenv and pip commands.
     conda create --name [repositoryname] python
  5. Activate new environment:
     source activate [repositoryname] 
  6. Create requirements.txt file:
     conda list --export > requirements.txt 
  7. Install required libraries (you can take these from your Jupyter notebook imports), e.g.:
     conda install nltk 
  8. Create a new .py file for your main program, move across your functions from your notebook and perform a first git commit and sync with GitHub:
     git add .
     git commit -m "First Commit"
     git push origin master

Hey presto. You are ready to start developing your project.
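The eight steps above can be sketched as a small dry-run helper. Nothing here is executed – it only builds the command strings, and “someuser”/“myproject” are placeholder names:

```python
# Dry-run sketch of the project-setup workflow above: builds the shell
# commands as strings only, with placeholder username/repository names.
def new_project_commands(username, repo):
    return [
        f"git clone git@github.com:{username}/{repo}.git",
        f"cd {repo}",
        f"conda create --name {repo} python",
        f"source activate {repo}",
        "conda list --export > requirements.txt",
        "git add .",
        'git commit -m "First Commit"',
        "git push origin master",
    ]

for cmd in new_project_commands("someuser", "myproject"):
    print(cmd)
```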

Quick Post: Structuring a Python Program

One thing I’ve found hard about programming in Python is the jump from small scripts or IPython (now Jupyter) notebooks to fully functional programs.

Many examples and online tutorials only require a single “.py” file or a series of command-line or notebook entries. However, as you get more advanced and start looking at complete Flask applications or libraries to upload to PyPI (for pip install), there is a big jump in complexity. Now you are looking at a dozen or so files with various naming standards. You also need to set up virtual environments and upload code to GitHub using git. This can quickly become overwhelming.

Help is at hand though.

For help when you move beyond “rank amateur” with Python, I’m a big fan of Jeff Knupp. He has written many great tutorials. My favourites are:

I am also a fan of Kenneth Reitz’s guide on Structuring Your (Python) Project. This fits in nicely with the latter two tutorials above – it explains a basic directory structure and gives an example on GitHub. I found that by comparing Kenneth’s and Jeff’s examples I could get a feel for what is required.

Of course the challenge now is to practice, practice, practice and start getting some libraries in a production ready standard and uploaded to PyPI.