Use OpenShift to Map River Levels With Flask, MongoDB, Leaflet, and OpenStreetMaps

The application I'm going to write about here actually had its start a few years back. I have a number of friends in the whitewater paddling community around Boston and at the time, I had started playing around with some of the "Web 2.0" (now that's a dated term!) shiny-ness such as Google Maps and the Google Maps API. What if, one of those friends asked, I wrote an app that would let us see at a glance the water levels for the local rivers we paddle?

I did eventually create that application and you can see it running on OpenShift at http://wwos-bitmason.rhcloud.com/

Back then, though, I spent a fair bit of time poking at the idea but never got to a working application. There were just too many moving parts. The USGS, which maintains the gauges, didn't have an easy way to get at the data; I would have had to scrape it off Web pages. I would have had to run a back-end PHP application and set up an associated database to periodically do the scraping. And the Google Maps API, which I was planning to use to display the data, was in relatively early days. I periodically thought about revisiting the idea, but all those moving parts made it all seem like too much bother just to get to the point where I could, well, code the app.

Fast forward to OpenShift community day in Boston this past June. If you're reading this, you probably know what OpenShift is. But in case you don't, it's Red Hat's Platform-as-a-Service, a cloud computing service model that aims to make life easier for developers by abstracting away underlying infrastructure details that aren't typically relevant to writing Web applications (such as the provisioning and patching of operating systems). For our purposes here, I'm going to focus on the hosted Online version of OpenShift, although there are also enterprise and community (OpenShift Origin) versions.

Anyway, my colleague Steven Citron-Pousty showed me a demo he had put together that displayed the location of national parks as pins on a map that could be scaled and zoomed. Jaw drops. This was exactly what I had been trying to do! I still had some details to work out--such as getting the data--but SteveCP had shown me a basic approach that also, not incidentally, used OpenShift to greatly simplify getting a lot of the infrastructure set up and operating. It doesn't eliminate the supporting components--and, indeed, one of the nice things about OpenShift is that it gives you a rich choice of languages, frameworks, and other tooling. But it eliminates much of the unproductive busywork associated with installing and configuring software on bare metal or on a bare VM. 

But enough preliminaries. Onto the app! (I also encourage you to check out Part 1 and Part 2 of this series of posts by SteveCP which cover overlapping topics. I'll move relatively quickly over material detailed in those posts.)

1. Architecture

I'll start by briefly discussing how everything fits together so that you can better understand why we're doing what we're doing when we get into the details.

The application runs on a small gear on OpenShift Online. A gear is an isolated application container and you get three small gears with a free OpenShift account. The application is written in Python 2.6 and Flask, a "micro framework" for Python based on Werkzeug and Jinja 2. (A couple of utility programs I wrote to manipulate the data are also written in Python.) Flask templating is used to display an HTML/CSS/JS file that uses Leaflet, "a modern open-source JavaScript library for mobile-friendly interactive maps," to display the pins and map tiles (which come from OpenStreetMap). The data is obtained from a USGS Web service (returning JSON) and stored in a MongoDB database.

As you can see, that is a lot of moving parts but it's surprisingly manageable with OpenShift. Indeed, I probably spent more time wrestling with complexities of data formats and retrieval than I did with the rest of the code. For the purposes of this post, I'm going to focus on the application and only deal briefly with the particulars of the USGS data which I'll cover in more detail in a future post.

2. Building the Foundation

First we set up the infrastructure. I assume that you have the rhc client tools and git installed and have a basic understanding of working with and pushing code to the OpenShift service. The name of the application is wwos.

SteveCP's posts listed above provide some more detail, but here are the basic steps:

Create the application with the Python 2.6 cartridge:

rhc app create -t python-2.6 -a wwos

Use the Flask quickstart (or go to the repository and clone the files manually):

cd wwos
git remote add upstream -m master git://github.com/openshift/openshift-mongo-flask-example.git
git pull -s recursive -X theirs upstream master

(Ideally Flask would be packaged in a cartridge. We just haven't done so yet. However, with the new v2 cartridge architecture, it should now be quicker and easier to create new cartridges.)

Add MongoDB:

rhc cartridge add mongodb-2.2 -a wwos

Add Cron:

rhc cartridge add cron-1.4 -a wwos

Now… Actually, that's all there is to getting things set up. Not bad. Onto coding.

3. Create a File with the Initial Data

The USGS won't let you pull data associated with all their gauges at one time; you have to specify at least one "major filter." It turns out, for reasons I won't delve into in this post, that the best approach seems to be to create a list of two-letter lowercase state abbreviations (plus DC and Puerto Rico) and iterate over the list thusly:

import json
import urllib2

statelist = ["al","ak","az","ar","ca","co","ct","de","dc","fl","ga","hi","id","il","in","ia","ks","ky","la","me","md","ma","mi","mn","ms","mo","mt","ne","nv","nh","nj","nm","ny","nc","nd","oh","ok","or","pa","ri","sc","sd","tn","tx","ut","vt","va","wa","wv","wi","wy","pr"]

for i in statelist:
    requesturl = "http://waterservices.usgs.gov/nwis/iv/?format=json,1.1&stateCd=" + i + "&parameterCd=00060,00065&siteType=ST"
    req = urllib2.Request(requesturl)
    opener = urllib2.build_opener()
    f = opener.open(req)
    entry = json.loads(f.read())

Each request returns all the gauges in the state of type "ST" (stream).
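To give a feel for what the service hands back, here is a heavily trimmed sketch of the response shape, inferred from the fields the code in this post reads. The values shown are illustrative, and real responses carry far more metadata:

{ "value" : { "timeSeries" : [
    { "sourceInfo" : {
          "siteName" : "SOME RIVER NEAR SOMEWHERE MA",
          "siteCode" : [ { "value" : "01234567" } ],
          "geoLocation" : { "geogLocation" : { "latitude" : 42.66, "longitude" : -71.57 } } },
      "variable" : { "variableCode" : [ { "variableID" : 45807197 } ] },
      "values" : [ { "value" : [ { "value" : "251", "dateTime" : "2013-08-01T12:15:00" } ] } ] }
] } }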

For each entry, we then iterate through the individual gauges and save the values.

count = len(entry['value']['timeSeries']) - 1

while count >= 0:
    agaugenum = entry['value']['timeSeries'][count]['sourceInfo']['siteCode'][0]['value']
    asitename = entry['value']['timeSeries'][count]['sourceInfo']['siteName']
    alat = entry['value']['timeSeries'][count]['sourceInfo']['geoLocation']['geogLocation']['latitude']
    along = entry['value']['timeSeries'][count]['sourceInfo']['geoLocation']['geogLocation']['longitude']

    agauge = {
        "sitename": asitename,
        "pos": [along, alat],
        "flow": 0,
        "height": 0,
        "timestamp": 0,
        "statecode": i
    }

    # output is an empty dictionary created outside all loops
    output[agaugenum] = agauge
    count = count - 1

A few comments:

  • Yes, the JSON returned by the USGS is quite baroque. That's why I'm saving it for another post.
  • count indexes the gauge records for each state; we start at the last record and count down to zero.
  • Longitude and latitude are saved the way they are for reasons that will become clear in the next section.
  • Flow, height, and timestamp are placeholders for time-variant data.

Finally, once we've iterated through all the states (and all the gauges within each state), we write out the data to a JSON-ish file. (Mongoimport is a bit fussy about formats and I couldn't get the built-in Python dump functions to work properly.)

fileout = open('gaugesall.json', 'w')

for k in output:
    agaugestr = '{ "_id" : "' + k + '",'
    asitenamestr = ' "sitename" : "' + output[k]["sitename"] + '" ,'
    astatecodestr = ' "statecode" : "' + output[k]["statecode"] + '" ,'
    aposstr = ' "pos" : [' + str(output[k]["pos"][0]) + ', ' + str(output[k]["pos"][1]) + '] ,'
    aflowstr = ' "flow" : "' + str(output[k]["flow"]) + '" ,'
    aheightstr = ' "height" : "' + str(output[k]["height"]) + '" ,'
    atimestampstr = ' "timestamp" : "' + str(output[k]["timestamp"]) + '" }'

    outstr = agaugestr + asitenamestr + astatecodestr + aposstr + aflowstr + aheightstr + atimestampstr

    fileout.write(outstr)
    fileout.write("\n")

fileout.close()

Your file should look like this but with a whole lot more lines.

{ "_id" : "08072760", "sitename" : "Langham Ck at W Little York Rd nr Addicks, TX" , "statecode" : "tx" , "pos" : [-95.646612, 29.86717035] , "flow" : "0" , "height" : "0" , "timestamp" : "0" }{ "_id" : "11055500", "sitename" : "PLUNGE C NR EAST HIGHLANDS CA" , "statecode" : "ca" , "pos" : [-117.141704, 34.11834458] , "flow" : "0" , "height" : "0" , "timestamp" : "0" }

4. Load the Data into MongoDB

Because OpenShift is a platform as a service and not infrastructure as a service, there are limited places where you can write data on the server. One of those locations is app-root/data (under the application's home directory). That is where we will scp our data file to:

scp gaugesall.json YOURSSH@wwos-YOURDOMAIN.rhcloud.com:app-root/data/

Now that we have the file on the server we can SSH in and then run the mongoimport command:

mongoimport -d gauges -c gaugepoints --type json --file app-root/data/gaugesall.json -h $OPENSHIFT_MONGODB_DB_HOST -u admin -p $OPENSHIFT_MONGODB_DB_PASSWORD

This should import about 9,500 points into a collection called gaugepoints in the gauges database.

Now create a 2d index to spatially enable your database.

mongo

>use gauges

>db.gaugepoints.ensureIndex( { pos : "2d" } )

(This is why we stored the longitude and latitude as we did previously.)
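While you're in the shell, a couple of quick sanity checks (my own suggestions, not part of the original workflow) will confirm both the import and the index:

>db.gaugepoints.count()

>db.gaugepoints.find( { pos : { $near : [-71.06, 42.35] } } ).limit(3)

The first should report roughly 9,500 documents; the second, which only works with the 2d index in place, returns the three gauges nearest Boston. (Note the longitude-first ordering, matching how we stored pos.)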

5. Create a Utility to Update the Database as a Cron Job

We now want to write some code to add the flow, height, and creation time to the gauges. For our purposes here, I'm going to describe doing so with a Python program that gets executed as a cron job by OpenShift. One could also use virtually identical code executed through Flask in response to an HTTP request, which allows for manual or scheduled updates from outside OpenShift. I'll discuss this briefly in the next section.

Most of the code in this utility is the same as that used to create the gaugesall.json file so I'm only going to describe the differences.

First, you need to establish a connection to the database.

import os

import pymongo

conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])
db = conn.gauges

Then we iterate through the states and get the count for a given state as before, but now the code within that inner loop is as follows. (The try/except code was to deal with some weird glitches with some gauges.)

while count >= 0:
    agaugenum = entry['value']['timeSeries'][count]['sourceInfo']['siteCode'][0]['value']
    variablecode = str(entry['value']['timeSeries'][count]['variable']['variableCode'][0]['variableID'])

    try:
        variablevalue = str(entry['value']['timeSeries'][count]['values'][0]['value'][0]['value'])
    except:
        variablevalue = ""

    try:
        creationtime = str(entry['value']['timeSeries'][count]['values'][0]['value'][0]['dateTime'])
    except:
        creationtime = ""

    # Gage ht. ft. variableID 45807202
    if variablecode == '45807202':
        db.gaugepoints.update({"_id": agaugenum}, {"$set": {"height": variablevalue}})

    # Discharge cfs variableID 45807197
    if variablecode == '45807197':
        db.gaugepoints.update({"_id": agaugenum}, {"$set": {"flow": variablevalue}})

    db.gaugepoints.update({"_id": agaugenum}, {"$set": {"timestamp": creationtime}})

    count = count - 1

# close the connection once all the states have been processed
conn.close()

A couple of things to note. 

  • When we created the collection, we saved the gauge number (agaugenum) as _id. This automatically makes it the primary index in MongoDB, which makes sense because we are making all the changes against that index.
  • The business with the variablecode comes about because each JSON record returned by the USGS Web service is actually just for one variable. Thus, for most stream gauges, two records are returned: one for the height and one for the flow. (Other variables are ignored.)

You can now add this utility to the .openshift/cron/daily directory in your repo and push it to OpenShift. (See this post for more information on cron.)
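If the utility itself lives elsewhere in your repo, a thin wrapper script in cron/daily does the trick. Here's a minimal sketch; the file names are purely illustrative:

#!/bin/bash
# .openshift/cron/daily/update_gauges -- runs the updater once a day
python $OPENSHIFT_REPO_DIR/misc/update_gauges.py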

6. Filling out the Flask Framework

SteveCP discusses Flask in one of the earlier linked posts and I won't repeat what he writes here. As a way of testing things out, you probably want to write a couple of simple Flask functions. For example, this function returns JSON for all the records in the collection when you go to http://wwos-YOURDOMAIN.rhcloud.com/ws/gauges.

@app.route("/ws/gauges")def gauges():    #setup the connection to the gauges database    conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])    db = conn.gauges

#query the DB for all the gaugepoints    result = db.gaugepoints.find()

#Now turn the results into valid JSON    return str(json.dumps({'results':list(result)},default=json_util.default))
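Once pushed, that endpoint makes for an easy smoke test from your own machine (substituting your domain, of course):

curl http://wwos-YOURDOMAIN.rhcloud.com/ws/gauges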

You can also write a function that will update a single state thusly in response to, say, http://wwos-YOURDOMAIN.rhcloud.com/ws/gauges/update/state?st=ma

@app.route("/ws/gauges/update/state")def updatestate():

   statelist = ["al","ak","az","ar","ca","co","ct","de","dc","fl","ga","hi","id","il","in","ia","ks","ky","la","me","md","ma","mi","mn","ms","mo","mt","ne","nv","nh","nj","nm","ny","nc","nd","oh","ok","or","pa","ri","sc","sd","tn","tx","ut","vt","va","wa","wv","wi","wy","pr"]  #setup the connection to the gauges database    conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])    db = conn.gauges

    i = request.args.get('st')

    requesturl = "http://waterservices.usgs.gov/nwis/iv/?format=json,1.1&stateCd=" + i +"&parameterCd=00060,00065&siteType=ST"

This is followed by the code we wrote for the earlier update function. (Updates through the Web service interface seem to be more robust if you do them state by state to allow for failures of individual requests--easy enough to do in a script file, as sketched below.)
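Here's a minimal sketch of such a script; the hostname is a placeholder and the state list is abbreviated:

#!/bin/bash
# hit the update Web service once per state; -f makes curl treat HTTP errors as failures
for st in al ak az ar ca co ct; do
    curl -fs "http://wwos-YOURDOMAIN.rhcloud.com/ws/gauges/update/state?st=$st" || echo "update failed for $st"
done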

However, what we really need Flask to do for us is to return the points within a bounding box defined by longitude/latitude pairs.

# find gauges within a lat/long bounding box passed in as query parameters
# (within?lat1=45.5&lon1=-82&lat2=42&lon2=-84)
@app.route("/ws/gauges/within")
def within():
    # setup the connection
    conn = pymongo.Connection(os.environ['OPENSHIFT_MONGODB_DB_URL'])
    db = conn.gauges

    # get the request parameters
    lat1 = float(request.args.get('lat1'))
    lon1 = float(request.args.get('lon1'))
    lat2 = float(request.args.get('lat2'))
    lon2 = float(request.args.get('lon2'))

    # use the request parameters in the query
    result = db.gaugepoints.find({"pos": {"$within": {"$box": [[lon1, lat1], [lon2, lat2]]}}})

    # turn the results into valid JSON
    return str(json.dumps(list(result), default=json_util.default))

What's going on here is that the framework is being passed a request with query parameters and a result is returned using one of MongoDB's geo functions (which is why we had to spatially enable the pos field in our collection earlier). But where does this query come from? To answer that, I'm going to show you one more very short chunk of code from our app and follow where that takes us.

If you were to type http://wwos-YOURDOMAIN.rhcloud.com/, you'd end up here:

@app.route("/")def mainapp():    return render_template("index.html")

7. The Map

Flask uses Jinja2 templating and that's about all I'm going to say on that subject as our use of templates here is extremely simple. Suffice it to say that, under your application's wsgi directory (where a file called application calls your customized code), you create a directory called templates and you put your index.html file there. This means that if you type http://wwos-YOURDOMAIN.rhcloud.com into a browser, you get to index.html (by way of Flask). As mentioned earlier, this application uses Leaflet to display the map and pins. I'm not going to go through all the details--Leaflet's API information is pretty good--but I'll walk you through the overall flow in index.html, leaving out a lot of the CSS/styling/etc.
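For reference, the relevant layout ends up looking roughly like this (names per the Flask quickstart; treat it as a sketch):

wsgi/
    application        <-- WSGI entry point that hooks in the Flask code
    templates/
        index.html     <-- the Leaflet map page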

First we load the Leaflet JavaScript:

 <script src="http://cdn.leafletjs.com/leaflet-0.5.1/leaflet.js"></script> 

Then we create a map and add a layer group to that map. (I center it near Boston. You could write code to try to use your location.)

var map = L.map('map').setView([42.35, -71.06], 10);
var markerLayerGroup = L.layerGroup().addTo(map);

We use OpenStreetMap tiles. (Leaflet lets you use a variety of tile sources.)

L.tileLayer('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    maxZoom: 18,
    attribution: 'Map data &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors, <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a> Written by <a href="http://www.bitmasons.com">Gordon Haff</a>. Running on OpenShift by Red Hat. <a href="http://bitmason.blogspot.com/p/blog-page_18.html">About.</a>'
}).addTo(map);

Now--drumroll please--we tie the whole application together. In response to the appropriate events (the map.on lines), we call getPins, which sends an HTTP request back to the very application that got us to index.html in the first place.

function getPins(e) {
    bounds = map.getBounds();
    url = "/ws/gauges/within?lat1=" + bounds.getNorthEast().lat + "&lon1=" + bounds.getNorthEast().lng + "&lat2=" + bounds.getSouthWest().lat + "&lon2=" + bounds.getSouthWest().lng;
    $.get(url, pinTheMap, "json");
}

Finally, we draw the pins in pinTheMap (the callback passed to $.get above) using the data retrieved from that request. (We check how far out the map is zoomed because, with over 9,000 points, performance becomes unacceptable if you try to draw too many on the screen. A minimum zoom level of 8--which corresponds to an area covering a couple of average-sized states on most devices--seemed about right.)

[Screenshot: the river gauge map, with gauges displayed as pins]

8. An Application is Never Really Done

As I mentioned earlier, I'll be writing another post that gets into some details about the USGS data and some of the challenges I encountered and (mostly) overcame dealing with it. Beyond this particular dataset, you could also use this same basic code to write applications that show all manner of places, whether points of interest or something else.

One thing that I should add to the application is the ability to optionally center the map on the user's location. Because some browsers don't always throw locationerror events, this turned out to be harder to do well than anticipated, so I left it out for now.

Coming back to OpenShift, one thing I hope you take away is that--while this app certainly has some complexities--setting up the basic infrastructure was pretty straightforward and let me jump pretty much right into retrieving data and coding. At the same time, OpenShift gave me lots of options about how to go about developing this application, from language to database to framework.

Here's the code on GitHub.
