Spatial MongoDB in OpenShift, be the next FourSquare - Part 1

Building a spatial mobile application on OpenShift with MongoDB and JBoss

 

One of the hottest areas in technology right now is SoLoMo (social, local, and mobile). In my next series of blog posts I will be showing you how to bring up the infrastructure and server-side application to handle a mobile "checkin" service. In reality, this application could be used for social checkins but could also be used in an enterprise setting for asset management or field form submissions. The basic idea is an application where a mobile user can look for things near them and then submit their own notes or even locations of interest.

Today's post will cover setting up JBoss with MongoDB and then explore working with spatial data in MongoDB. The next post will show how to build a very simple REST-like service on top of our MongoDB store, and the final post will build a simple Android application to interact with the service. Let's get started with building our spatial data store.

[UPDATE] Here is a 30 minute presentation I gave on the topic at MongoSF 2012 [/UPDATE]

 

[UPDATE] Or if you are just too darn lazy to read the blog post check out the video below [/UPDATE]

Spinning up the OpenShift Infrastructure

Bring up an instance of Java + MongoDB on OpenShift using the command line

rhc domain create -d -n spmongo -l <yourusername>
rhc app create -t jbossas-7 -a parks -l <yourusername>
rhc cartridge add mongodb-2.0 -a parks -l <yourusername>

All done with setting up our "server"

Data Entry into Mongo

Now we need to import the data into our MongoDB instance. There are some particular ways that MongoDB expects to receive spatial data. While this structure is much less rigid than what is found in relational databases, there is still a specified structure.

Per the instructions on the MongoDB site about spatial data:

In order to use the index, you need to have a field in your object that is an array where the first 2 elements are x,y coordinates (or y,x - just be consistent)

Therefore, when we format our data for either import or insert, we need to make sure that our coordinates are in an array.

Our data for this application is Federal Parks in the U.S. and Canada. It was originally a CSV file but I converted it to JSON. The coordinates are put into an array within an attribute named "pos".
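
To make the target shape concrete, here is a minimal JavaScript sketch that turns one CSV row into a document with a "pos" array. The column order (name, latitude, longitude), the sample row, and the helper name rowToDoc are assumptions made up for this example and are not the layout of the actual park file; only the "Name" and "pos" attribute names come from this post. The coordinates are stored x first (longitude, then latitude), which matches the points used later in this post.

```javascript
// Hypothetical helper: convert one CSV row into a MongoDB-ready document.
// Assumed column order (not taken from the real data file): name, latitude, longitude.
function rowToDoc(csvLine) {
  const [name, lat, lon] = csvLine.split(",").map((s) => s.trim());
  return {
    Name: name,
    // x (longitude) first, then y (latitude); stay consistent across all documents
    pos: [parseFloat(lon), parseFloat(lat)],
  };
}

// Made-up sample row, for illustration only
const doc = rowToDoc("Example Park, 44.5, -110.25");
console.log(JSON.stringify(doc)); // {"Name":"Example Park","pos":[-110.25,44.5]}
```

Writing one such JSON document per line gives a file in the form that mongoimport reads by default.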

One other thing to remember is that Mongo spatial indices are not aware of units, so all your results will be in the units of your data. You could just as easily use other coordinates, such as the inches from a corner on an electronics transistor board or the dimensions of your backyard. Since our data is in degrees of latitude and longitude, all our measurements will be returned in degrees. Using degrees can cause problems with distance (and area) measurements, since the distance covered by one degree of longitude shrinks as you move towards the poles. For the exercise today it doesn't matter, but if you want to do more with the coordinates you should use the Spherical Model in MongoDB.
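
To see why degrees are a poor distance unit, the following JavaScript sketch compares a flat distance in degrees with a great-circle (haversine) distance for one degree of longitude at two latitudes. It is only an illustration and not MongoDB's own code; the function names and the 6371 km mean Earth radius are assumptions of this example.

```javascript
// Haversine great-circle distance in kilometres between two [lon, lat] points.
// Assumes a spherical Earth with a mean radius of 6371 km.
const EARTH_RADIUS_KM = 6371;

function haversineKm([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Flat distance: plain Euclidean distance in whatever units the data uses.
function planarDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// One degree of longitude at the equator and at 60 degrees north
const atEquator = [[0, 0], [1, 0]];
const at60North = [[0, 60], [1, 60]];

console.log(planarDistance(...atEquator), planarDistance(...at60North)); // 1 1
console.log(haversineKm(...atEquator).toFixed(1)); // about 111.2 km
console.log(haversineKm(...at60North).toFixed(1)); // about 55.6 km
```

Both pairs are exactly one degree apart in flat terms, yet the pair at 60 degrees north is only about half as far apart on the ground.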

Download the attached JSON, which contains all the park data. To get the data into MongoDB we are going to use the command line mongoimport tool. There are two ways to use this tool with OpenShift.

  1. If you already have mongoimport installed on your local machine you can port forward and use the local copy.
  2. You can use scp or sftp to copy the file from your local machine to the OpenShift "server". From there you SSH into your machine and run mongoimport locally on your "server".

Since I don't want to assume that you have Mongo installed locally we will go with option 2.

Because OpenShift is platform as a service and not infrastructure as a service there are limited places where you can write data to the server. One of those locations is ~/<AppName>/data so that is where we will scp our data.

scp parkcoord.json 0b49d9219f2847c6a236820959f9a7a6@parks-spmongo.rhcloud.com:parks/data/

Now that we have the file on the server we can SSH in and then run the mongoimport command

mongoimport -d parks -c parkpoints --type json --file app-root/data/parkcoord.json  -h $OPENSHIFT_NOSQL_DB_HOST  -u admin -p $OPENSHIFT_NOSQL_DB_PASSWORD

The final line should show up as:

     imported 547 objects

Now the final step is to create a spatial index on our data. Creating this index will allow us to ask simple spatial questions of our data.

  1. To create the index we need to open a mongo terminal:

mongo -u admin -p <xxxxx> 127.7.183.1/parks

  2. and then create a 2d index:

db.parkpoints.ensureIndex( { pos : "2d" } );

And now we are done getting our data into MongoDB with it being spatially enabled.

 

Query, Insert, and Update

Queries

That is all it takes to get spatially enabled functionality in MongoDB. Look how easy it is to query for parks near a point, here [-37,50]:

db.parkpoints.find( { pos : { $near : [-37,50] } } )

The results will be returned in order of increasing distance from the point in the query, nearest first. You can also combine a position with another attribute. For example, this query does a case-insensitive regular expression search in Name for lincoln, near the location we tried before:

db.parkpoints.find( { Name : /lincoln/i, pos : { $near : [-37,50] }} )
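
To make the ordering and the combined filter concrete, here is a pure-JavaScript emulation of a flat $near query over an in-memory array. It is a sketch for illustration only: MongoDB answers these queries from the 2d index and does not work this way, and the function name near and the sample documents are made up for this example.

```javascript
// Pure-JavaScript emulation of a flat $near query, for illustration only.
// This is NOT how MongoDB implements it (MongoDB uses the 2d index).
function near(docs, [x, y], nameFilter) {
  return docs
    .filter((d) => Array.isArray(d.pos)) // documents without pos cannot be matched
    .filter((d) => !nameFilter || nameFilter.test(d.Name))
    .map((d) => ({ doc: d, dist: Math.hypot(d.pos[0] - x, d.pos[1] - y) }))
    .sort((a, b) => a.dist - b.dist) // nearest first
    .map((entry) => entry.doc);
}

// Made-up sample documents
const parks = [
  { Name: "Far Park", pos: [-120, 35] },
  { Name: "Lincoln Example Site", pos: [-90, 40] },
  { Name: "Near Park", pos: [-40, 48] },
  { Name: "No Location Park" },
];

// Position only: [ 'Near Park', 'Lincoln Example Site', 'Far Park' ]
console.log(near(parks, [-37, 50]).map((d) => d.Name));

// Position plus case-insensitive name match: [ 'Lincoln Example Site' ]
console.log(near(parks, [-37, 50], /lincoln/i).map((d) => d.Name));
```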

From now on you can find the Federal parks nearest to any coordinates that are passed in. The location can come from where a user clicks on a map, from where they checkin, from the GPS unit on their phone, or any other technology that gives you a location on earth.

There is documentation on how to use a geospatial query (both distance and containment) on the MongoDB site.

Inserts and Updates

For inserts and updates we will build a new collection of the user locations (userloc).

db.createCollection("userloc")
db.userloc.ensureIndex( { pos : "2d" } );

Let's go ahead and insert our first document:

db.userloc.insert({ "created" : new Date(), "Notes" : 'just landed', "pos" : [-76.7302 , 25.5332 ] })

This document has a created attribute, set to the date and time of the server when the record was inserted; a Notes attribute; and a pos attribute, which is an array that stores the coordinates of the user's location.

Let's do a quick query to make sure the spatial indexing works:

db.userloc.find( { pos : { $near : [-37,50] } } )

Because of the schema-less nature of MongoDB we could have just inserted the document and then created the 2D index.  Since MongoDB just holds documents, instead of a table, there is no schema to declare for the documents which are being stored. Instead, MongoDB just accepts the document and if it has a pos attribute it will add it to the index, if not then there is no addition to the index.

Updates

MongoDB has two ways to do an update. The first is the normal update, where you change the values on an existing document. For example, let's change the note for our first document to "our first note".

  1. Since we only have one document, we can:

db.userloc.findOne();

  2. We take the _id from this document (this id will be different in your database if you are doing the steps at home):

"_id" : ObjectId("4f95e12ad12c5fe7ef0b3dac")

  3. Now we can use the key in an update statement:

db.userloc.update({"_id" : ObjectId("4f95e12ad12c5fe7ef0b3dac")},{"$set" : {"Notes": 'our first note'}})

  4. To check do a:

db.userloc.findOne()

and see the new note.

The other way to do an update is an upsert. An upsert first checks whether a document exists; if it does, the document is updated, otherwise it is inserted as a new document. There are two ways to do an upsert: use a .save or use a .update with upsert=true. For today we are just going to use save, first to insert a new document and then to update the note on the original document.

  1. Save a new document - this one is closer to the point we use in our find query:

db.userloc.save({ "created" : new Date(), "Notes": 'that was a big step', "pos" : [-37.7302 , 40.5332 ]})

  2. If you do the find from above you will see that the new point is closer:

db.userloc.find( { pos : { $near : [-37,50] } } )

  3. Now let's update the original document's note. Be careful with this, as any attribute not listed in the document will not be retained in the updated document:

db.userloc.save({"_id" : ObjectId("4f95e12ad12c5fe7ef0b3dac"),"Notes": 'really the landing', "pos" :  [-76.7302 , 25.5332  ] })

  4. If you do a db.userloc.find() you will notice that this record is now missing the "created" attribute. By doing the upsert this way, we are completely redefining the document. In effect, what we are saying is that the original document is replaced with this new document but the "_id" stays the same. To change just a single attribute we would use a .update with upsert=true and set just the attribute we are interested in changing. Instead what we wanted to do was:

db.userloc.save({"_id" : ObjectId("4f95e12ad12c5fe7ef0b3dac"),"created" : new Date(), "Notes": 'really the landing', "pos" :  [-76.7302 , 25.5332  ] })

  5. One unintended side effect of doing the update this way is that the date is now changed. Changing just the notes field on the record is better achieved through the following code:

myDoc = db.userloc.findOne({"_id" : ObjectId("4f95e12ad12c5fe7ef0b3dac")});
myDoc.Notes = "really the landing";
db.userloc.save(myDoc);
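
The difference between replacing a whole document and changing a single attribute can be sketched with a small in-memory emulation in plain JavaScript. Nothing here is MongoDB code: the Map stands in for the collection, the save and updateSet helpers are hypothetical names that only mimic the behavior described in the steps above, and the _id and date values are placeholders.

```javascript
// In-memory emulation of the two update styles, for illustration only.
// The Map stands in for a collection; none of this is MongoDB's own code.
const collection = new Map();

// Like a save: replace the whole document stored under _id (or insert it)
function save(doc) {
  collection.set(doc._id, { ...doc });
}

// Like an update with $set: change only the listed attributes
function updateSet(id, fields) {
  const existing = collection.get(id);
  if (existing) collection.set(id, { ...existing, ...fields });
}

// Placeholder _id and date values
save({ _id: 1, created: "2012-04-23", Notes: "just landed", pos: [-76.7302, 25.5332] });

// Whole-document replacement: "created" is gone because it was not listed
save({ _id: 1, Notes: "really the landing", pos: [-76.7302, 25.5332] });
console.log("created" in collection.get(1)); // false

// Start over, then change a single attribute: "created" survives
save({ _id: 1, created: "2012-04-23", Notes: "just landed", pos: [-76.7302, 25.5332] });
updateSet(1, { Notes: "really the landing" });
console.log(collection.get(1).created); // 2012-04-23
```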

As you can see MongoDB has many ways to handle document inserts and updates. I strongly recommend reading the documentation on inserts, updates, and upserts to fully understand which technique is better for your application.

Conclusion

I hope this post has shown you how to:

  • Set up and use MongoDB from OpenShift
  • Import data into MongoDB on OpenShift
  • Work with spatial data in MongoDB
  • Do basic inserts, queries, and updates of documents in MongoDB

Now that we are familiar with how MongoDB works we can move on to writing some server-side code that leverages the power of MongoDB. In the next blog post in the series I will show you how to make some REST style services to do basic CRUD operations with spatial data and MongoDB. These services will be used to create a National Park Finder and Checkin application. As always, if you have questions feel free to write them below, post them in the forums, or ask them on IRC in #openshift on freenode.
