Enable Full Text Search in your OpenShift Application with Searchify

Full Text Search in your OpenShift Application

Today, full text search has become one of the most important features of a web application. Open source projects such as Apache Lucene, Apache Solr, and ElasticSearch have made it easy and cheap to add full text search to your application. However, developers must still manage the Solr or ElasticSearch cluster themselves. To relieve developers and IT teams of this workload, we can use a cloud service like Searchify. Searchify is a full-text search service that makes it easy to add custom full text search without the cost or complexity of managing search servers.

Prerequisites

Before we can start building the application, we have a few setup tasks to do:

  1. Sign up for an OpenShift account. It is completely free, and Red Hat gives every user three free gears on which to run their applications. At the time of this writing, the combined resources allocated to each user are 1.5 GB of memory and 3 GB of disk space.

  2. Install the rhc client tool on your machine. The rhc tool is a Ruby gem, so you need Ruby 1.8.7 or above on your machine. To install rhc, run:

sudo gem install rhc

If you already have rhc installed, make sure it is the latest version. To update rhc, run the command shown below.

sudo gem update rhc

For additional assistance setting up the rhc command-line tool, see the following page: https://openshift.redhat.com/community/developers/rhc-client-tools-install

  3. Set up your OpenShift account using the rhc setup command. This command helps you create a namespace and upload your SSH keys to the OpenShift server.

  4. Sign up for a Searchify trial account. The trial account works for 30 days and lets you store up to 100,000 documents.

  5. After signing up, log in to your account, go to the dashboard, and click "Create another index".

  6. Create an index named test_index. It will then appear on the dashboard.

Let's Get Started

After we have completed all the prerequisites, it's time to start building the application.

Create OpenShift Application

We will start by creating a Tomcat 7 application using code from my Searchify quickstart GitHub repository.

rhc app create searchifydemo tomcat-7 --from-code=git://github.com/shekhargulati/searchify-openshift-quickstart.git

This will create an application container for us, called a gear, and set up all of the required SELinux policies and cgroup configuration. OpenShift will also set up a private git repository with code from the quickstart GitHub repository and clone the repository to your local system. Finally, OpenShift will propagate the DNS worldwide. The application will be accessible at http://searchifydemo-{$domainName}.rhcloud.com/. Replace {$domainName} with your own unique domain name.

Look at the application source code

The application is a simple Spring MVC application that exposes two RESTful endpoints -- one to add data to the Searchify index and another to get data from the Searchify index for a given Lucene query.

Bootstrapping application context

All the Spring context configuration is defined in the ApplicationConfig class. This application uses Java-based Spring application context configuration, so the class is annotated with @Configuration. It defines two beans: an IndexTankClient, which wraps the Searchify REST API, and a MappingJacksonJsonView, which renders JSON output. Spring MVC is enabled by the @EnableWebMvc annotation.

@Configuration
@ComponentScan(basePackages = "com.openshift.searchifydemo")
@EnableWebMvc
public class ApplicationConfig {
 
    private static final String SEARCHIFY_API_URL = "http://xxxxx.api.searchify.com";
 
    @Bean
    public IndexTankClient indexTankClient(){
        IndexTankClient indexTankClient = new IndexTankClient(SEARCHIFY_API_URL);
        return indexTankClient;
    }
 
    @Bean
    public MappingJacksonJsonView jsonView() {
        MappingJacksonJsonView jsonView = new MappingJacksonJsonView();
        jsonView.setPrefixJson(true);
        return jsonView;
    }
 
 
}

The IndexTankClient instance requires the Searchify API URL, which is unique per user. You can get this URL from the dashboard of your Searchify account.
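
Because the API URL is private to your account, you may prefer not to commit it to the repository. The following is a minimal sketch of one way to do that, assuming the URL is supplied through an environment variable. The variable name SEARCHIFY_API_URL and the SearchifyUrlResolver class are hypothetical and are not part of the quickstart.

```java
import java.util.Map;

// Sketch: resolve the Searchify API URL from the environment instead of hardcoding it.
// SEARCHIFY_API_URL is a hypothetical variable name chosen for this example.
final class SearchifyUrlResolver {

    static final String ENV_NAME = "SEARCHIFY_API_URL";

    // Returns the trimmed URL stored under ENV_NAME, or fails fast with a clear message.
    static String resolve(Map<String, String> env) {
        String url = env.get(ENV_NAME);
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalStateException(ENV_NAME + " is not set");
        }
        return url.trim();
    }

    public static void main(String[] args) {
        try {
            // In the bean method this value would be passed to new IndexTankClient(...).
            System.out.println(resolve(System.getenv()));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the environment in as a Map keeps the lookup easy to test; in the configuration class you would call resolve(System.getenv()).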

Searchify client

The second class in this application is SearchifyClient, which wraps the Searchify IndexTankClient Java API. It exposes three methods: createIndex, addToIndex, and search.

@Service
public class SearchifyClient {
 
    private IndexTankClient client;
    private static final String INDEX_NAME = "test_index";
 
    @Autowired
    public SearchifyClient(IndexTankClient indexTankClient) {
        this.client = indexTankClient;
    }
 
    @PostConstruct
    public Index createIndex() {
        IndexConfiguration configuration = new IndexConfiguration();
        configuration.enablePublicSearch(false);
        try {
            Index index = client.createIndex(INDEX_NAME, configuration);
            while (!index.hasStarted()) {
                Thread.sleep(300);
            }
            return index;
        } catch (IndexAlreadyExistsException e) {
            System.out
                    .println("Index already exists so skipping this exception");
            return client.getIndex(INDEX_NAME);
        } catch (MaximumIndexesExceededException e) {
            throw new RuntimeException("You have exceeded the limit ", e);
        } catch (Exception e) {
            throw new RuntimeException("Unable to create index because of  ", e);
        }
    }
 
    public void addToIndex(String documentId, Map<String, String> fields){
        try {
            Index index = client.getIndex(INDEX_NAME);
            index.addDocument(documentId, fields);
        } catch (Exception e) {
            throw new RuntimeException(
                    "Exception occurred while adding document to index .. ", e);
        }
    }
 
    public Set<String> search(String query) {
        Set<String> documentIds = new LinkedHashSet<String>();
        try {
            Index index = client.getIndex(INDEX_NAME);
            SearchResults results = index.search(Query.forString(query));
            System.out.println("Matches: " + results.matches);
            for (Map<String, Object> document : results.results) {
                System.out.println("doc id: " + document.get("docid"));
                documentIds.add((String)document.get("docid"));
            }
        } catch (Exception e) {
            throw new RuntimeException(
                    "Exception occurred while searching for results with query "
                            + query, e);
        }
        return documentIds;
    }
 
}
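
The loop in createIndex polls index.hasStarted() every 300 ms with no upper bound, so application startup would hang if the index never starts. Below is a minimal sketch of a bounded polling helper using only the JDK. The BoundedWait class and its waitUntil method are hypothetical names for illustration, and the BooleanSupplier stands in for the hasStarted() call.

```java
import java.util.function.BooleanSupplier;

// Sketch: poll a condition with a timeout instead of looping forever.
final class BoundedWait {

    // Polls `ready` every pollMillis until it returns true or timeoutMillis elapses.
    // Returns true if the condition became true, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!ready.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for index.hasStarted(): becomes true on the third poll.
        boolean started = waitUntil(() -> ++calls[0] >= 3, 5_000, 10);
        System.out.println("started=" + started + " after " + calls[0] + " polls");
    }
}
```

In createIndex, a false return could be turned into a RuntimeException so that a stuck index surfaces as a clear startup failure.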

REST API

The SearchifyController exposes two RESTful operations:

  • The first allows the addition of a document to the index
  • The second allows searching documents with a Lucene query, as shown below.

@Controller
@RequestMapping("/searchify")
public class SearchifyController {
 
    @Autowired
    private SearchifyClient searchifyClient;
 
    @RequestMapping(value = "/add", method = RequestMethod.POST, consumes = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<String> addToIndex(@RequestBody Document document) {
        Map<String, String> fields = new HashMap<String, String>();
        fields.put("text", document.getText());
        searchifyClient.addToIndex(document.getId(), fields);
 
        ResponseEntity<String> responseEntity = new ResponseEntity<String>(
                HttpStatus.CREATED);
        return responseEntity;
 
    }
 
    @RequestMapping(value = "/search", method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
    public @ResponseBody Set<String> search(@RequestParam("query") String query) {
        Set<String> documentIds = searchifyClient.search(query);
        return documentIds;
    }
 
}
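
The Document type bound by @RequestBody is not shown in this post. The following is a minimal sketch of what such a class could look like, assuming only the two accessors the controller calls (getId and getText) and String fields. The quickstart's real class may differ, and in a real project it would be a public class in its own file.

```java
// Hypothetical shape of the JSON request body; not taken from the quickstart source.
class Document {

    private String id;   // becomes the Searchify document id
    private String text; // indexed under the "text" field

    // No-argument constructor so a JSON mapper can instantiate the class.
    Document() {
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }
}
```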

Push the code to cloud

Now we can push the code to OpenShift and see our application running in the cloud.

git push

The application will be running at https://searchifydemo-domain-name.rhcloud.com/. Please replace domain-name with your own namespace.

Test the API

To test the REST web service we will use curl to add data to our index and fetch documents.

To add the documents, use the following curl commands:

curl -i -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d '{"id":1,"text":"this is a first test document"}' http://searchifydemo-ideas.rhcloud.com/api/searchify/add
 
curl -i -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d '{"id":2,"text":"this is a second test document"}' http://searchifydemo-ideas.rhcloud.com/api/searchify/add
 
 
curl -i -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d '{"id":3,"text":"this is a third test document"}' http://searchifydemo-ideas.rhcloud.com/api/searchify/add

To fetch all the documents that contain the word "test", use the following curl command:

curl -i -H "Accept: application/json" http://searchifydemo-ideas.rhcloud.com/api/searchify/search?query=text:test
HTTP/1.1 200 OK
Date: Tue, 30 Apr 2013 07:36:14 GMT
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Vary: Accept-Encoding
Transfer-Encoding: chunked
 
["3","2","1"]
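
The same two calls can also be made from Java. The sketch below assumes Java 11 or later for the java.net.http API and reuses the paths from the curl commands above. The SearchifyDemoRequests class is a hypothetical name. It only builds the requests, and the main method prints them without contacting the server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: Java equivalents of the curl commands, using the JDK 11+ HTTP client types.
final class SearchifyDemoRequests {

    // Builds the POST that adds one document; `json` is the body used in the curl example.
    static HttpRequest addRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/searchify/add"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Builds the GET that searches the index; the query is URL-encoded first.
    static HttpRequest searchRequest(String baseUrl, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/searchify/search?query=" + encoded))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String base = "http://searchifydemo-ideas.rhcloud.com";
        HttpRequest add = addRequest(base, "{\"id\":1,\"text\":\"this is a first test document\"}");
        HttpRequest search = searchRequest(base, "text:test");
        System.out.println(add.method() + " " + add.uri());
        System.out.println(search.method() + " " + search.uri());
        // To send a request:
        // HttpClient.newHttpClient().send(search, HttpResponse.BodyHandlers.ofString())
    }
}
```

Encoding the query matters once it contains spaces or other reserved characters, which the unencoded curl URL above would not handle.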

Conclusion

As you can see, integrating third-party services with OpenShift is easy and takes minimal effort. So what are you waiting for? Sign up for OpenShift and start building cool applications.
