Fixing Errors on Apache-Served Flask Apps

This is just a quick post to remind me of the steps to resolve errors on an Apache-served Flask app. I’m using Anaconda because I’m on Puppy Linux (an old PC) and some packages fail to compile from source. Stuff in square brackets is for you to fill in.

Log into remote server (I use ssh keys):

ssh -p [MyPort] [user]@[server]

Check the error logs (the name of the log is set in the app configuration):

nano /var/log/apache2/[my_app_error].log
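
To follow new entries as they occur, tailing the log also works:

tail -f /var/log/apache2/[my_app_error].log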

On a local machine, clone the production Flask app (again, I have SSH keys set up):

git clone git@github.com:[user]/[project].git
cd [project]

Set up a local virtual environment (with the right version of Python):

conda create -n [project] python=2.7

Activate the environment:

source activate [project]

Install requirements:

pip install -r requirements.txt

[Use ‘conda install X’ for stuff that has trouble compiling (‘lxml’ is a nightmare).]

Setup environment variables:

Add ‘etc/conda/activate.d’ and ‘etc/conda/deactivate.d’ folders in the Anaconda environment’s directory and create an ‘env_vars.sh’ file in each folder:

mkdir -p ~/anaconda3/envs/[project]/etc/conda/activate.d
touch ~/anaconda3/envs/[project]/etc/conda/activate.d/env_vars.sh
mkdir -p ~/anaconda3/envs/[project]/etc/conda/deactivate.d
touch ~/anaconda3/envs/[project]/etc/conda/deactivate.d/env_vars.sh

(The ‘-p’ flag in ‘mkdir’ also creates the required parent directories.)

In the ‘activate.d/env_vars.sh’ set the environment variables:

#!/bin/sh
cd [project_path]
export HOST="127.0.0.1"
export PORT="80"
export MY_VAR='customvalue'

In the ‘deactivate.d/env_vars.sh’ clear the environment variables:

#!/bin/sh
unset MY_VAR

Now you should be able to run the app and have it hosted locally.
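
For reference, here is a minimal sketch of a Flask entry point that picks up the variables set above (the ‘run.py’ name and route are my own assumptions – your app will differ):

# run.py - minimal sketch only; the real app layout will differ
import os
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    # MY_VAR is set by activate.d/env_vars.sh on environment activation
    return os.environ.get('MY_VAR', 'MY_VAR not set')

if __name__ == '__main__':
    app.run(host=os.environ.get('HOST', '127.0.0.1'),
            port=int(os.environ.get('PORT', '5000')))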

You can then test and fix the bug. Then add, commit and push the updates.

Then log back into the remote server, go to the project directory, pull the updates from GitHub and restart the server.

cd [project]
git pull origin master
sudo service apache2 restart


Using Alembic to Migrate SQLAlchemy Databases

There are several advantages to using SQLAlchemy as a wrapper for an SQL database: it is stable with large numbers of records, it offers a class/object-oriented approach, and it lets you swap the underlying database. However, one under-documented disadvantage is poor change management. If you add a field or table you generally need to regenerate the entire database. This is a pain if you are constantly tinkering.

There are a number of tools to help with change management.

If you are using SQLAlchemy as part of a Flask application, your best bet is Flask-Migrate. This allows you to easily initialise, upgrade and migrate database definitions. The tutorial within the docs is also great – it generally works without further modification.

If you are using SQLAlchemy outside of a Flask application, one option is to use Alembic. (Flask-Migrate is a wrapper for various Alembic functions.)

Alembic requires a little more setup. The documentation is good but a little intense, and wading through it to work out an easy implementation is a bit of a struggle. However, once you realise how things work it can be rather easy*. It’s a bit like git, but for databases.

First install Alembic in your current Python environment:

pip install alembic

Then navigate to your project directory and initialise:

alembic init [dir_name, e.g. alembic]

This creates a directory structure within your project directory. You may want to add the [dir_name] to your .gitignore file.

You then need to edit two configuration settings.

First, go into the ‘alembic.ini’ file created by the init command and add the ‘sqlalchemy.url’ value. For me this was:

sqlalchemy.url = sqlite:///[DB_name].db

Second, you need to add your database model’s metadata object to the “env.py” file in the [dir_name] directory. As my Python package isn’t installed I also needed a hack to add the parent directory to the Python “sys.path” list. My added lines in this file are:

import os
import sys

# Make the project directory importable so the models can be loaded
parent_dir = os.path.abspath(os.getcwd())
sys.path.append(parent_dir)
from datamodels import Base
target_metadata = Base.metadata

To add a new revision you use the ‘revision’ command, much like ‘git commit’. The key is the ‘--autogenerate’ flag. This automatically determines the changes to your database based on changes to your data models as defined in (for me) a ‘datamodels.py’ file. So to start run:

alembic revision --autogenerate -m "message"

Then you can update your database by running:

alembic upgrade head
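
Like git, Alembic also lets you inspect and roll back revisions. A few standard commands that are useful here:

alembic current
alembic history
alembic downgrade -1

(‘downgrade -1’ steps back one revision; ‘upgrade +1’ steps forward one.)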

*Thanks go to Mathieu Rodic and his post here for helping me work this out.

Quickpost: Adding a Custom Path to Conda Environment

I have a few Python applications in development in a custom ‘projects’ directory. I want to be able to run these using ‘python -m [appname]’.

The easiest way to do this is by adding a .pth file to the site-packages folder of my Python environment (for me ‘/[userdirpath]/anaconda3/envs/[projectname]/lib/python3.5/site-packages/’).

For example, I added a file called ‘custom.pth’ that had one line containing the full path to my ‘projects’ directory. I can then import the apps.
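
The same thing can be done programmatically – a minimal sketch, assuming a conda environment where ‘site.getsitepackages()’ is available and using a hypothetical projects path:

import os
import site

# Write a .pth file into the active environment's site-packages so that
# Python adds the projects directory to sys.path on startup
site_dir = site.getsitepackages()[0]
with open(os.path.join(site_dir, 'custom.pth'), 'w') as f:
    f.write('/home/[user]/projects\n')  # hypothetical path - use your own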

Starting a Python Project with Anaconda

It just so happens that on a few systems I have been using Anaconda to allow painless Python coding. For example, on Windows or non-Debian Linux I have struggled to compile packages from source. Anaconda provides a useful wrapper for the main functionality that just works on these operating systems (on my Ubuntu machine or the Raspberry Pi I just use virtualenv and pip in the usual way).

Anaconda also has the advantage of being a quick shortcut to install Python and a bucketful of useful libraries for big data and artificial intelligence experimentation. To start head over to the download page for Anaconda here. The installer is wrapped in a bash script – just download, verify and run. On my ten-year-old laptop running Puppy Linux (which was in the loft for a year or so covered in woodlouse excrement) this simply worked painlessly. No compiling from source. No version errors. No messing with pip. Previously, libraries like numpy or nltk had been a headache to install.

I find that Jupyter (formerly IPython) notebooks are a great way to iteratively code. You can test out ideas block by block, shift stuff around, and output and document everything in the same tool. You can also easily export to HTML with one click (hence this post). To start a notebook once Anaconda is installed, run the following:

jupyter notebook

This will start the notebook server on your local machine and open your browser. By default the notebooks are served at localhost:8888. To access the server across a local network, pass your IP address via the --ip flag (e.g. --ip=192.168.1.2) and then point your browser at [your-ip]:8888 (use --port to change the port).
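
For example (assuming your machine’s LAN address is 192.168.1.2):

jupyter notebook --ip=192.168.1.2 --port=8888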

My usual way of working is to play around with my code in a Jupyter notebook before officially starting a project. I find notebooks a better way to initially iteratively test and develop algorithms than coding and testing on the command line.

Once I have some outline functionality in a notebook it is time to create a new project. My workflow for this is as follows:

  1. Create a new empty repository on GitHub, with a Python .gitignore file, a basic ReadMe file and an MIT License.
  2. Clone the new empty repository into my local projects directory. I have set up SSH keys so this just involves:
     git clone git@github.com:[username]/[repositoryname].git 
  3.  Change directory into the newly cloned project directory:
     cd [repositoryname] 
  4. Create a new Conda environment. Conda is the command line package manager element of Anaconda. This page is great for working out the Conda commands equivalent to virtualenv and pip commands.
     conda create --name [repositoryname] python
  5. Activate new environment:
     source activate [repositoryname] 
  6. Create requirements.txt file:
     conda list --export > requirements.txt 
  7. Install required libraries (you can take these from your Jupyter notebook imports), e.g.:
     conda install nltk 
  8. Create a new .py file for your main program, move across your functions from your notebook and perform a first git commit and sync with GitHub.
    git add . 
    git commit -m "First Commit" 
    git push origin master 

Hey presto. You are ready to start developing your project.

Quick Post: Structuring a Python Program

One thing I’ve found hard about programming in Python is the jump from small scripts or iPython (now Jupyter) notebooks to fully functional programs.

Many examples and online tutorials only require a single “.py” file or a series of command line or notebook entries. However, as you get more advanced and start looking at complete Flask applications or libraries to upload to PyPI (for pip install), there is a big jump in complexity. Now you are looking at a dozen or so files with various naming standards. You also need to set up virtual environments and upload code to GitHub using git. This can quickly become overwhelming.

Help is at hand though.

For help when you move beyond “rank amateur” with Python, I’m a big fan of Jeff Knupp. He has written many great tutorials. My favourites are:

I am also a fan of Kenneth Reitz’s guide on Structuring Your (Python) Project. This fits in nicely with the latter two tutorials above – it explains a basic directory structure and gives an example on GitHub. I found that by comparing Kenneth’s and Jeff’s examples I could get a feel for what is required.

Of course the challenge now is to practice, practice, practice and start getting some libraries in a production ready standard and uploaded to PyPI.


Quick Post: Recursive Function to Search Multi-Level Dictionary

Many APIs return JSON that is converted into a multilevel dictionary (e.g. EPO OPS). The following code snippet helps find a key (e.g. “id”) that is nested within the dictionary.

The function is based on this answer. To find more than the first occurrence “return” may be converted to “yield”.

def keysearch(d, key):
    """Recursive function to look for first occurrence of key in multi-level dict.

    param dict d: dictionary (or list) to process
    param string key: key to locate"""
    if isinstance(d, dict):
        if key in d:
            return d[key]
        # Key not at this level - recurse into each value
        for k in d:
            found = keysearch(d[k], key)
            if found:
                return found
    elif isinstance(d, list):
        # Lists may contain dictionaries - recurse into each item
        for item in d:
            found = keysearch(item, key)
            if found:
                return found
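
A quick usage example on a nested structure of the kind an API might return (the data here is made up):

result = {
    'response': {
        'items': [
            {'meta': {'id': 'EP1234567'}},
        ]
    }
}
print(keysearch(result, 'id'))  # prints EP1234567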

Twitter Robots on a Raspberry Pi

Or: how to get write-restricted by Twitter very quickly.

This is a short guide to playing around with the Twitter API using Python on a Raspberry Pi (or any other Linux machine).

Overview

The process has four general steps:

  1. Setup a new Twitter account and create a new Twitter app;
  2. Setup the Raspberry Pi to access Twitter;
  3. Write code to return tweets associated with given search terms; and
  4. Write code to post tweets based on the given search terms.

Step 1 – Create New Twitter Account and App

First, it helps to have an email alias to avoid spam in your main email account. I found the site www.33mail.com, which offers you multiple email addresses for free in the form [X]@[your-username].33mail.com.

Next, register a new Twitter account using the email alias. To do this simply log out of any active Twitter accounts, then go to www.twitter.com and sign up for a new account. It’s pretty quick and straightforward. Skip all the ‘suggested follows’ rubbish.

I used public domain images from WikiMedia Commons (this British Library collection on Flickr is also great) for the profile and added a relevant bio, location and birthday for my Robot.

Once the new Twitter account is set up, log in. Then go to http://dev.twitter.com and create a new application. Once the new application is created you can view the associated consumer key and secret (under the ‘Access Keys’ tab). You can also request an access token and secret on the same page.

Some points to note:

  • You need to register a phone number with your new Twitter account before it will allow you to create a new application. One phone number can be linked with up to 10 Twitter accounts. Beware that SMS notifications etc. will be sent to the most recently added account – this was fine by me as I typically avoid being pestered by SMS.
  • You are asked to enter a website for your application. However, this can just be a placeholder. I used “http://www.placeholder.com”.

Step 2 – Setup Raspberry Pi

I have a headless Raspberry Pi I access via SSH with my iPad. Any old Linux machine will do though.

To configure the computer do the following:

  • Create a GitHub repository (mine is here).
  • SSH into the computer.
  • Clone the remote repository (e.g. ‘git clone [repository link]’). I find using SSH for communication with the GitHub servers easier (see this page for how to set up SSH keys).
  • CD into the newly generated folder (e.g. ‘cd social-media-bot’).
  • Initialise a new virtual environment using virtualenv and virtualenvwrapper. I found this blogpost very helpful for this. (Once you have installed those two tools via ‘pip’, use ‘mkvirtualenv social-media-bot’ to set up, then ‘workon social-media-bot’ to work within the initialised virtual env. For other commands (I haven’t used any yet) see here.)
  • Install Twitter tools and other required libraries. This was as simple as typing ‘pip install twitter’ (within the ‘social-media-bot’ virtualenv).

Step 3 – Write Search Code

As with previous posts I decided to use ConfigParser (or configparser with Python 3+) to hide specific secrets from GitHub uploads.

My Python script thus uses a settings.cfg file structured as follows:
----
[twitter_settings]
ACCESS_TOKEN = [Your token here]
ACCESS_SECRET = [Your secret here]
CONSUMER_KEY = [Your key here]
CONSUMER_SECRET = [Your secret here]

[query_settings]
query_term = [Your query term here]
last_tweet_id = 0

[response_settings]
responses =
 'String phrase 1.';
 'String phrase 2.';
 'String phrase 3.'
----

Create this file in the directory with the Python code. 

  • The first section (‘twitter_settings’) stores the Twitter app access keys that you copy and paste from the ‘Access Keys’ tab of the Twitter developer webpage. 
  • The second section (‘query_settings’) stores the query term (e.g. ‘patent’) and a variable that keeps track of the highest tweet ID returned by the last search.
  • The third section (‘response_settings’) contains string phrases that I can randomly select for automated posts.
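
For illustration, loading this file might look something like the following sketch (Python 2 ConfigParser, as used at the time; the parsing of ‘responses’ assumes the ‘;’-separated format above):

import ConfigParser

parser = ConfigParser.ConfigParser()
parser.read('settings.cfg')
query_term = parser.get('query_settings', 'query_term')
last_tweet_id = parser.getint('query_settings', 'last_tweet_id')
# Split the multi-line responses value and strip quotes/whitespace
responses = [r.strip(" '\n") for r in
             parser.get('response_settings', 'responses').split(';')]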

The Python code for accessing Twitter is called twitter_bot.py. Have a look on GitHub – https://github.com/benhoyle/social-media-bot.

The comments should hopefully make the code self-explanatory. Authentication, which is often fairly tricky, is a doddle with the Python ‘twitter’ library – simply create a new ‘oauth’ object using the access keys loaded from settings.cfg and use this to initiate the connection to the Twitter API:

from twitter import Twitter, OAuth

settings_dict = dict(parser.items('twitter_settings'))
oauth = OAuth(settings_dict['access_token'], settings_dict['access_secret'],
              settings_dict['consumer_key'], settings_dict['consumer_secret'])

# Initiate the connection to the Twitter REST API
twitter = Twitter(auth=oauth)

The script then searches for tweets containing the query term. 


tweets = twitter.search.tweets(q=expanded_query_term, lang='en', result_type='recent', count='10', since_id=last_tweet_id)['statuses']

Points to note:

  • There is a lot of ‘noise’ in the form of retweets and replies on Twitter. I wanted to look for original, stand-alone tweets. To filter out retweets add ” -RT” to the query string. To filter out replies use ” -filter:replies” (this isn’t part of the documented API but appeared to work).
  • I found that search terms often meant something else in languages other than English. Using lang='en' limited the search to English-language posts.
  • The parameters for the API function map directly onto the API parameters as found here: https://dev.twitter.com/rest/reference/get/search/tweets.
  • The ‘since_id’ parameter searches from a given tweet ID. The code saves the highest tweet ID from each search so that only new results are found.

Step 4 – Posting Replies

A reply is then posted from the account associated with the token, key and secrets. The response text randomly selects one of the string phrases from the ‘response_settings’ section and is posted as a reply to a tweet containing the query term.


# Extract tweet ID, username and text of tweets returned from search
tweets = [{
    'id_str': tweet['id_str'],
    'screen_name': tweet['user']['screen_name'],
    'original_text': tweet['text'],
    'response_text': '@' + tweet['user']['screen_name'] + ' ' + random.choice(responses)
    } for tweet in tweets if query_term in tweet['text'].lower()]

# Post the replies on Twitter
for tweet in tweets:
    twitter.statuses.update(status=tweet['response_text'],
                            in_reply_to_status_id=tweet['id_str'])
    # print tweet['original_text'], '\n', tweet['response_text'], tweet['id_str']
    # Leave a random pause (between 75 and 120s) between posts to avoid rate limits
    time.sleep(random.randint(75, 120))

Finally, there is a little bit of code to extract the maximum tweet ID from the search results and save it in the settings.cfg file.


# Record the highest tweet ID returned by the search so that the next
# search can start from it (passed via the since_id parameter)
id_ints = [int(t['id_str']) for t in tweets]

# Add highest tweet ID to settings.cfg file
parser.set('query_settings', 'last_tweet_id', str(max(id_ints)))

# Write updated settings.cfg file
with open('settings.cfg', 'wb') as configfile:
    parser.write(configfile)

I have the script scheduled as a cron job that runs every 20 minutes. I had read that the rate limits were around one tweet per minute, so you will see above that I leave a random gap of at least a minute between each reply. I had to hard-code the path to the settings.cfg file to get this cron job working – you may need to modify this for your own path.

I also found that it wasn’t clear-cut how to run a cron command within a virtual environment. After a bit of googling I found a neat little trick to get this to work: use the Python executable in the ‘bin’ folder of the project’s ‘.virtualenvs’ directory. Hence the command to add to the crontab was:


~/.virtualenvs/social-media-bot/bin/python ~/social-media-bot/twitter_bot.py
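
Putting it together, the crontab entry (added via ‘crontab -e’) for a run every 20 minutes would look something like:

*/20 * * * * ~/.virtualenvs/social-media-bot/bin/python ~/social-media-bot/twitter_bot.py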

Result

This all worked rather nicely. For about an hour. Then Twitter automatically write-restricted my application. 

A bit of googling took me to this article: https://support.twitter.com/articles/76915. It appears you are only allowed to post replies to users if you are a large multi-national airline. Never mind.

Maybe automated tweet ‘quoting’, favouriting or plain posting would work better next time. Still, it was an enjoyable play-around with the dynamics of the Twitter API. It should be easy to incorporate tweets into future projects.