Create a Spatial REST Service Using Lucene Spatial and JEE


Greetings Shifters, the time has come to write the final piece in my series on using Lucene for spatial applications. Today we take the index we made in the second post in this series and put a nice REST-based search interface in front of it. My goal was to make this REST service similar to the one I built for my JEE + MongoDB blog post. By doing this: 1) it would be easier to compare the code, and 2) since the REST signatures would be the same for the calls I cared about, I could choose my server-side implementation based on what made more sense for my application.

Let's go ahead and dive into the code.

Creating the application and getting the index on OpenShift

Since we are going to use JEE, we need to create a JBoss EAP application.

rhc app create lucenespatial jbosseap-6

This will create our application with JBoss EAP 6 all ready to run.

The source code for the application and the index built in the previous blog post can be found on GitHub.

Move into the directory named lucenespatial, which was created inside the directory where you ran the command above. Then add the GitHub repo as a remote of the lucenespatial application, merge its code, and push it back up to your OpenShift gear.

cd lucenespatial
git remote add github -m master https://github.com/thesteve0/LuceneSpatial.git
git pull -s recursive -X theirs github master   
git push

Now we need to do one more step to put the index where the application expects to find it. The index is stored in the git repository for the code that builds the index. You don't need lukeall-4.4.0.jar unless you want to use Luke to look at your Lucene index; the rest of the files in that directory make up your index. Our application expects the files to be in $OPENSHIFT_DATA_DIR/indexDir, so we need to scp them up to the server.

I am going to give the instructions assuming you do NOT have the index already on your machine. Go to a directory where you want the Indexing git repo to appear on your machine. From there:

git clone https://github.com/thesteve0/SpatialLuceneIndexer.git
 
cd SpatialLuceneIndexer
 
scp -r indexDir {uuid}@lucenespatial-{your domain}.rhcloud.com:app-root/data

Now just restart the application so it picks up the Lucene index:

rhc app restart lucenespatial

You should now be able to hit:

http://lucenespatial-{your domain}.rhcloud.com/ws/parks and see a bunch of lovely JSON on your screen.

Let's look at the code

If you are not familiar with JAX-RS or CDI, I highly recommend reading about them in my JEE post, since I will not repeat that overview here.

There are only three classes that make up our application.

FileHandler.java

Since opening a file handle is "costly", I created an injectable class that opens a file handle to the index, gets an IndexSearcher, and holds on to it. I load this class into application scope so we can reuse the IndexSearcher throughout the application.

All searches in Lucene, as well as any queries about the index's metadata, go through the IndexSearcher. In other words, for a Lucene search application the IndexSearcher is the main class required for searching.
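In the application, the open-once behaviour comes from CDI's application scope. As a container-free sketch of the same idea, the hypothetical `SearcherHolder` below opens an expensive handle lazily, exactly once, and hands the same instance to every caller. `ExpensiveHandle` is a made-up stand-in for the IndexSearcher, not a Lucene class, and the holder idiom is a plain-Java substitute for what the container does for you.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an IndexSearcher: counts how many times it is "opened".
final class ExpensiveHandle {
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    private ExpensiveHandle() {
        OPEN_COUNT.incrementAndGet(); // pretend this is the costly file open
    }

    static ExpensiveHandle open() {
        return new ExpensiveHandle();
    }
}

// Initialization-on-demand holder: the JVM runs Holder's static initializer
// once, on first access, and guarantees that initialization is thread-safe.
final class SearcherHolder {
    private SearcherHolder() {}

    private static final class Holder {
        static final ExpensiveHandle INSTANCE = ExpensiveHandle.open();
    }

    static ExpensiveHandle get() {
        return Holder.INSTANCE;
    }
}

class SearcherHolderDemo {
    public static void main(String[] args) {
        ExpensiveHandle a = SearcherHolder.get();
        ExpensiveHandle b = SearcherHolder.get();
        System.out.println("same instance: " + (a == b)
                + ", opened " + ExpensiveHandle.OPEN_COUNT.get() + " time(s)");
    }
}
```

Every caller shares one handle, which is the whole point: the costly open happens on first use and never again for the life of the application.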

lucenews.java

This is the JAX-RS class where I map URLs to method calls. In this application I wanted a very clear separation of concerns, so this class is very lightweight and is basically a router to the different queries. An added benefit is that lucenews contains no Lucene code, so we are free to swap out the search implementation without rewriting this class.

The queries return ArrayLists, which JAX-RS (through Jackson) is quite happy to serialize to JSON. With Lucene Spatial it was quite easy to reproduce all the spatial queries I did with MongoDB.

I added a new signature for querying a rectangle, since I will use it in the JavaScript front end I am building for this application. I also added a circle search so queries can limit the radius of a search. We need this because the near query in Lucene Spatial returns all the documents, sorted by distance. Finally, I show a mixed free-text and spatial search so you can see how to combine the two types of queries.
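To make the "router with no Lucene in it" idea concrete, here is a minimal sketch in which the REST-facing class depends only on an interface. `ParkSearch`, `InMemoryParkSearch`, and `ParksRouter` are hypothetical names; in the real class, JAX-RS annotations such as `@Path` and `@GET` plus CDI injection would wire things together, and they are left out so the sketch compiles with the JDK alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical search contract: the REST layer only ever sees this interface.
interface ParkSearch {
    List<Map<String, Object>> allParks();
}

// One possible backend; a Lucene- or MongoDB-backed class could implement the same interface.
final class InMemoryParkSearch implements ParkSearch {
    private final List<String> names;

    InMemoryParkSearch(List<String> names) {
        this.names = names;
    }

    @Override
    public List<Map<String, Object>> allParks() {
        List<Map<String, Object>> results = new ArrayList<>();
        for (String name : names) {
            Map<String, Object> park = new HashMap<>();
            park.put("name", name);
            results.add(park);
        }
        return results;
    }
}

// Stand-in for the JAX-RS resource: it routes requests and contains no search code.
final class ParksRouter {
    private final ParkSearch search;

    ParksRouter(ParkSearch search) {
        this.search = search;
    }

    List<Map<String, Object>> getParks() {
        return search.allParks();
    }
}

class ParksRouterDemo {
    public static void main(String[] args) {
        ParksRouter router = new ParksRouter(new InMemoryParkSearch(List.of("Park A", "Park B")));
        System.out.println(router.getParks());
    }
}
```

Swapping datastores then means passing a different `ParkSearch` implementation; the router, and the URL signatures it exposes, stay the same.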
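Because a near query hands back every document ordered by distance, one way to picture a circle search is as a cutoff on that ordered list: keep hits until the first one lies beyond the radius. The sketch below illustrates only that idea and assumes the hits are already sorted and carry a precomputed distance; `Hit`, `CircleCutoff`, and `withinRadius` are hypothetical names, and the application's own circle query code is not shown in this post.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical search hit: a park name plus its distance from the query point.
final class Hit {
    final String name;
    final double distanceKm;

    Hit(String name, double distanceKm) {
        this.name = name;
        this.distanceKm = distanceKm;
    }
}

final class CircleCutoff {
    private CircleCutoff() {}

    // Assumes hits are sorted by increasing distance, as a "near" query returns them,
    // so we can stop at the first hit that falls outside the radius.
    static List<Hit> withinRadius(List<Hit> sortedHits, double radiusKm) {
        return sortedHits.stream()
                .takeWhile(h -> h.distanceKm <= radiusKm)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> sorted = List.of(
                new Hit("Park A", 2.0),
                new Hit("Park B", 8.0),
                new Hit("Park C", 15.0));
        for (Hit h : withinRadius(sorted, 10.0)) {
            System.out.println(h.name + " " + h.distanceKm);
        }
    }
}
```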

QueryHandler.java

This class handles all the fun Lucene Spatial action for the application. The first little bit of code handles the spatial pieces that are common to all the queries.

    private void setupSpatial(){
        // true = a geospatial context (longitude/latitude on a sphere)
        this.spatialContext = new SpatialContext(true);
 
        //We also need a Lucene strategy that matches the strategy used to build the index
        // (a geohash prefix tree with 11 levels, indexed under the "position" field)
        SpatialPrefixTree grid = new GeohashPrefixTree(spatialContext, 11);
        this.strategy = new RecursivePrefixTreeStrategy(grid, "position");
    }

The SpatialContext comes from Spatial4j and is used to create the spatial objects we are going to use in our queries. For example, when we do a rectangle query, we need to create a spatially aware rectangle before we can pass it to Lucene.

Let's start with the basic query that matches all the documents.

The method is called getAllParks and there is only one line that matters:

    TopDocs returnedDocs = searcher.search(new MatchAllDocsQuery(), searcher.getIndexReader().numDocs());

By using a MatchAllDocsQuery we tell Lucene not to apply any selection criteria, and by passing numDocs() as the limit we ask for every document in the collection rather than a limited number. We get back a TopDocs, which holds a ScoreDoc (a document id plus a score) for each hit; since every document matches equally, there is no meaningful relevance order here. Any fields you stored when creating the index are available by loading each document with searcher.doc(scoreDoc.doc).

We then loop through the returned documents, build a HashMap for each one with just the information we want in the format we want, add it to an ArrayList, and return the ArrayList. I really should have extracted this loop into a separate method, since it is repeated in every query.
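Pulling that repeated loop into one helper could look like the sketch below. To keep it runnable without Lucene on the classpath, each stored document is modelled as a plain `Map<String, String>`; with Lucene you would instead call `searcher.doc(scoreDoc.doc)` for each entry in `returnedDocs.scoreDocs` and read fields with `Document.get(...)`. The helper name `toResults` and the field names `name` and `position` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ResultMapper {
    private ResultMapper() {}

    // Copies only the fields we want to expose from each stored document into a
    // fresh HashMap, preserving the order in which the search returned the hits.
    // Field names are hypothetical; use whatever the index actually stores.
    static List<Map<String, String>> toResults(List<Map<String, String>> storedDocs) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Map<String, String> doc : storedDocs) {
            Map<String, String> out = new HashMap<>();
            out.put("name", doc.get("name"));
            out.put("position", doc.get("position"));
            results.add(out);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("name", "Example Park");
        doc.put("position", "-122.4,37.8");
        doc.put("internalNote", "not exposed");
        System.out.println(toResults(List.of(doc)));
    }
}
```

Each query method would then shrink to building its Query, running the search, and handing the hits to this one helper.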

The rectangle query

This function is called getABoxOfPoints. All the remaining queries use the SpatialContext to create the spatial objects used in the search. For the rectangle, we create a rectangle from the corners passed in, wrap it in a SpatialArgs with an intersection test, and use the filter made from that SpatialArgs in the Lucene search. Intersection means anything contained in or touching our rectangle matches the query.

    Shape ourRectangle = new RectangleImpl(minX, maxX, minY, maxY, spatialContext);
    SpatialArgs spatialArgs = new SpatialArgs(SpatialOperation.Intersects, ourRectangle);
    Filter filter = strategy.makeFilter(spatialArgs);
    TopDocs returnedDocs = searcher.search(new MatchAllDocsQuery(), filter, searcher.getIndexReader().numDocs());
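For point documents, "intersects" boils down to an inclusive bounds check, so a park sitting exactly on the rectangle's edge still matches. The hypothetical `Box` class below mirrors the `(minX, maxX, minY, maxY)` argument order used with `RectangleImpl`, with x as longitude and y as latitude. It is a simplification that assumes the box does not cross the 180° meridian, a case a geo-aware library has to handle.

```java
final class Box {
    final double minX, maxX, minY, maxY;

    Box(double minX, double maxX, double minY, double maxY) {
        this.minX = minX;
        this.maxX = maxX;
        this.minY = minY;
        this.maxY = maxY;
    }

    // A point "intersects" the box if it lies inside it or on its boundary.
    // Assumes minX <= maxX, i.e. the box does not cross the dateline.
    boolean intersects(double x, double y) {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }

    public static void main(String[] args) {
        Box box = new Box(-123.0, -121.0, 37.0, 38.0);
        System.out.println(box.intersects(-122.0, 37.5)); // true: inside
        System.out.println(box.intersects(-121.0, 38.0)); // true: on a corner
        System.out.println(box.intersects(-120.0, 37.5)); // false: outside
    }
}
```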

Finding parks near a location with a certain name

This function is called getNameNear. It uses a different spatial technique and also uses a standard Lucene Query.

To handle the spatial piece, we sort the results by distance rather than by relevance, so all the results come back in order of increasing distance.

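    // Note the argument order: longitude (x) first, then latitude (y)
    Point ourCenterPoint = new PointImpl(lon, lat, spatialContext);
    ValueSource valueSource = strategy.makeDistanceValueSource(ourCenterPoint);
    // reverse = false sorts ascending, so the nearest documents come first
    Sort distSort = new Sort(valueSource.getSortField(false)).rewrite(searcher);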
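To see what "sort by distance" means without Lucene, the sketch below orders points by great-circle distance from a centre using the haversine formula. It is an illustration under assumptions: `Place`, `haversineKm`, and the 6371 km mean Earth radius are choices made here, while Lucene's distance value source computes distances with Spatial4j's own calculator and units, so absolute values will differ even though the ordering idea is the same. Longitude comes first, matching the `PointImpl(lon, lat, ...)` order above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DistanceSort {
    private DistanceSort() {}

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (assumption)

    // Hypothetical place with longitude/latitude in degrees (lon first, like PointImpl).
    static final class Place {
        final String name;
        final double lon, lat;

        Place(String name, double lon, double lat) {
            this.name = name;
            this.lon = lon;
            this.lat = lat;
        }
    }

    // Great-circle distance between two lon/lat points via the haversine formula.
    static double haversineKm(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns a copy of the places sorted by increasing distance from (lon, lat).
    static List<Place> sortByDistance(List<Place> places, double lon, double lat) {
        List<Place> sorted = new ArrayList<>(places);
        sorted.sort(Comparator.comparingDouble(p -> haversineKm(lon, lat, p.lon, p.lat)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Place> places = List.of(
                new Place("far", 10.0, 10.0),
                new Place("near", 0.5, 0.5),
                new Place("middle", 3.0, 3.0));
        for (Place p : sortByDistance(places, 0.0, 0.0)) {
            System.out.println(p.name);
        }
    }
}
```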

Then, instead of a MatchAllDocsQuery, we construct a standard Lucene Query. The important step is that you have to parse the string passed in using the same Analyzer that was used to create the index, against a field that was indexed (or indexed and stored).

    String nameField = "name";
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
    QueryParser queryParser = new QueryParser(Version.LUCENE_44, nameField, analyzer);
 
    Query query = queryParser.parse(name);
 
    // Ask for at most as many hits as there are documents with the "name" field filled in,
    // since only those documents can match a query on that field
    int numDocs = (int) searcher.collectionStatistics(nameField).docCount();
 
    TopDocs returnedDocs = searcher.search(query, numDocs, distSort);
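Why does the analyzer have to match? The index holds the tokens the analyzer produced, not the raw text, so the query string has to be turned into tokens the same way. The toy `analyze` method below roughly approximates two things StandardAnalyzer does (splitting into words and lowercasing; it ignores stop words and Lucene's actual tokenization rules), and `matches` uses any-term semantics to mirror QueryParser's default OR operator. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ToyAnalysis {
    private ToyAnalysis() {}

    // Toy analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A document matches if any query token appears among its indexed tokens
    // (like QueryParser's default OR operator).
    static boolean matches(List<String> indexedTokens, List<String> queryTokens) {
        return queryTokens.stream().anyMatch(indexedTokens::contains);
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Yosemite National Park");
        // Query analyzed the same way as the index: matches.
        System.out.println(matches(indexed, analyze("YOSEMITE"))); // true
        // Raw, unanalyzed query term: the capital letter no longer lines up.
        System.out.println(matches(indexed, List.of("Yosemite"))); // false
    }
}
```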

And with that we covered the most complicated query in our application.

Conclusion

I hope this series of blog posts has shown you how easy it is to build an index with Lucene and Lucene Spatial and then create and host an application on top of it. We have built a powerful free-text search index over spatial documents and then built a nice little REST-style service to give an API to those documents. This example app should give you all the building blocks you need to create more exciting and interesting spatial applications with Lucene. I would love to see someone fork this and add the ability to add new documents and update the index.

As for this REST service, I will also be writing it using PostGIS + Hibernate Spatial so we have some nice comparison apps. From there I would like to see how they perform under load, so I will make them scalable applications and load test them. Finally, I want to sum up a comparison of all three and give recommendations on where you might want to use each datastore.

What's Next