PostgreSQL 9.2 Comes to OpenShift

There have been a lot of requests for a more recent version of PostgreSQL than version 8.4. Well, as of the latest release you now have access to PostgreSQL 9.2.4 (while 8.4 remains available). In this blog post I will give a short history of the upgrade, show how to move your data, and then cover some of the cool new database features you can now use.

History

Very early on we knew as a team that we needed a more recent version of PostgreSQL, but we also knew that providing support and bug fixes for anything shipped with our platform was just as important. For that reason, all of our default packages for languages and databases were the ones provided with Red Hat Enterprise Linux 6.

We saw the votes come in, the forum threads, and the tweets. Then Red Hat announced its Software Collections (SCL) product, and there was a way forward that got us new packages AND support. It was decided that PostgreSQL would be the first software package to get upgraded (Python 2.7 is in progress and you can track the status here on our Trello page). So with the OpenShift Online release last week you now have access to PostgreSQL 9.2.

Using the new PostgreSQL

Creating an application with PostgreSQL 9.2 is just as easy as before. Here is an example of creating an application called "pg92" that uses Python 2.6 for the web language and PostgreSQL 9.2:

rhc app create pg92 python-2.6 postgresql-9.2

If you already have an application, again named pg92, and it doesn't have a database, you can add PostgreSQL as a cartridge in the following manner:

rhc cartridge add postgresql-9.2 pg92

That's it.

Please note, if you are in the git repository for your application on your local machine, you can omit the application name when adding a cartridge. The new rhc command line tools are aware of the application based upon the git repository.
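For example, assuming you cloned the pg92 application's repository into a directory named pg92, this is enough:

cd pg92
rhc cartridge add postgresql-9.2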

Migrating your existing 8.4 data to 9.2

As always, when migrating between major releases in PostgreSQL (version 8.x to 9.x) you need to dump and reimport your data.

Here are the steps to migrate your data to the newer version of PostgreSQL within the same application:

Do a database dump to your $OPENSHIFT_DATA_DIR

On the application with the PostgreSQL 8.4 cartridge, execute the following command:

pg_dumpall > $OPENSHIFT_DATA_DIR/eightfourbackup

This will export the data from your database into a file called eightfourbackup that can later be imported into the 9.2 database.

[Optional] Download the data to your local machine

The only reasons to do this are if you want a local backup of your data or if you are going to use the data in a completely new application.
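If you do want a local copy, one way to grab it (a sketch with placeholder values; your gear's SSH user and host are shown by rhc app show pg92, and $OPENSHIFT_DATA_DIR maps to app-root/data on the gear) is plain scp:

scp <gear-uuid>@pg92-<namespace>.rhcloud.com:app-root/data/eightfourbackup .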

Remove the PostgreSQL 8.4 cartridge

This is as simple as:

rhc cartridge remove postgresql-8.4 <appname>

Now add in the PostgreSQL 9.2 cartridge

rhc cartridge add postgresql-9.2 <appname>

Now reimport the data

If this is a new application, you will need to upload the file you dumped earlier to the $OPENSHIFT_DATA_DIR of the new application. If you just upgraded the cartridge, your data file should still be in place.

To restore the data just execute the following command:

psql -f $OPENSHIFT_DATA_DIR/eightfourbackup postgres

Finally, if needed, update your application to use the new username and password from the new cartridge.
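The new cartridge exposes its connection details as environment variables on the gear. A quick way to list them (a sketch; the exact variable names come from the PostgreSQL cartridge) is:

rhc ssh pg92
env | grep POSTGRESQL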

At this point you should have a functional 9.2 database.
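Assuming you are still in the gear's shell (as in the restore step above), a quick sanity check is:

psql -c 'SELECT version();'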

Cool new features I like

Here are the pages that show you the new features in PostgreSQL 9.0, PostgreSQL 9.1, and PostgreSQL 9.2, and there is a lot of information to cover. There is also a nice feature matrix showing which features are available in which versions. Today, I am just going to pick out some of my favorite features to show off here.

With 9.2 you actually have two NoSQL-style data types: hstore (key-value) and json (storing JSON documents). There is some discussion out there about the use cases for each. Let's go ahead and start with JSON.

Native JSON document stores

With the native json type you can store entire JSON documents in a column in your table and query them back as whole documents.

Why use it

One of the main use cases is when you are talking to a client that already sends and receives JSON. The json type actually parses the submitted JSON for validity. The drawback is that you can't query into the JSON you store; it is basically a JSON store with validity checking. It is nice when you want flexibility in the document you store with each row. For example, you could have a table of products with just an ID and a JSON document, and inside the document you can store different "types" as necessary per item. The examples below borrow from the following nice article on JSON.

How to create

CREATE TABLE products (product_id serial, facts json);

How to insert

INSERT INTO products (facts) VALUES ('{"name":"sword","buy":"500","sell":"200","description":"basic sword","attack":"10"}');
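Because the json type validates its input, malformed JSON is rejected at insert time. For example, leaving off the closing brace fails with something along these lines (the exact error text may vary):

INSERT INTO products (facts) VALUES ('{"name":"sword"');
ERROR:  invalid input syntax for type json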

How to query

SELECT * FROM products;
 
 product_id |                                        facts
------------+---------------------------------------------------------------------------------------
          1 | {"name":"sword","buy":"500","sell":"200","description":"basic sword","attack":"10"}
          2 | {"name":"shield","buy":"200","sell":"80","description":"basic shield","defense":"7"}
(2 rows)

There is also a row_to_json() function that can take a row from a table and cast it to JSON. The article above gives several examples, but the easiest use is to just return the whole row. For example, if we had a table people:

CREATE TABLE people
(
  id serial NOT NULL,
  name character varying,
  age integer,
  CONSTRAINT firstpk PRIMARY KEY (id)
);

we would just call

SELECT row_to_json(people) FROM people;

and we get nice JSON returned - perfect for your mobile application.
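To make that concrete, here is a rough sketch of what comes back, assuming a single row has been inserted into the people table above:

INSERT INTO people (name, age) VALUES ('Dave', 35);

SELECT row_to_json(people) FROM people;
           row_to_json
----------------------------------
 {"id":1,"name":"Dave","age":35}
(1 row)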

Links

Data type documentation

Hstore extension

The hstore extension can also turn a column into a key-value store. The keys and values are all treated as text, but you can store an arbitrary number of key-value pairs per row. As usual with a key-value store, each key must be unique within a row.

The key-value functionality is provided as an extension, so to enable it you need to execute the following command in your database as the database owner:

CREATE EXTENSION hstore;

Once that is done you are all set to start using it.
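You can confirm the extension is enabled by listing the extensions installed in your database:

SELECT extname, extversion FROM pg_extension;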

Why use it

Both json and hstore give you a flexible way to store arbitrary values in your PostgreSQL tables. Unlike json, hstore keys and values can be indexed, which can give you very fast query performance.

How to create

Remember to first create the extension in your database.

After that you can add an hstore column just like any other column:

CREATE TABLE products (product_id serial, facts hstore);

How to insert

The most common insertion looks like this:

INSERT INTO products(facts) VALUES ('author=>Dave, date=>"Dec 2012", price=>"500", currency=>"dollar"');

Notice we have inserted multiple key=>value pairs into one record in the database.

How to query

You can query by key (rows where the key 'pages' exists):

SELECT product_id FROM products WHERE facts ? 'pages';

or by value (rows where the value for 'pages', cast to an integer, is greater than 100):

SELECT product_id FROM products WHERE (facts -> 'pages')::integer > 100;
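To get the fast lookups mentioned earlier, you can put a GIN (or GiST) index on the hstore column; it speeds up the key-existence and containment operators. A sketch against the products table from this section, with a hypothetical index name:

CREATE INDEX products_facts_idx ON products USING GIN (facts);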

You can also cast a whole row into an hstore, just like with row_to_json:

SELECT hstore(people) FROM people;

Links

The main documentation on hstore in PostgreSQL

A nice introduction from the fine folks at Heroku

Range data types

A range data column allows you to store a range of values (dates, integers, and so on) directly in the column. There is a great introduction that talks about the uses of ranges.

Here is a list of the range types:

-  4-byte integer range, int4range
-  8-byte integer range, int8range
-  numeric range, numrange
-  range of timestamp without timezone, tsrange
-  range of timestamp with timezone, tstzrange
-  range of date, daterange

Why use it

It is perfect for storing things like the beginning and end dates of a record, or the maximum and minimum values that an object takes. With the range in place you can then query for all the records whose range includes a test value.

How to create

You just declare your data type to be one of the types above and you are good to go. Let's create a reservations table for a car rental system (multi-day rentals):

CREATE TABLE reservations
(
  id serial NOT NULL,
  name character varying,
  reservedtime daterange,
  CONSTRAINT res_pk PRIMARY KEY (id)
);

How to insert

INSERT INTO reservations  (name, reservedtime) VALUES ('Ferrari', '[2013-09-15, 2013-09-16)');

How to query

To find any cars being used during a time range:

SELECT * FROM reservations WHERE reservedtime && '[2013-08-16, 2013-09-16)';

Or, if you wanted to see if a car is not being used on a certain day:

SELECT * FROM reservations WHERE NOT reservedtime @> '[2013-08-16, 2013-08-17)';
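One more trick that falls out of range types: an exclusion constraint that keeps the same car from being double-booked for overlapping dates. This is a sketch with a hypothetical constraint name, and it assumes the btree_gist extension (listed later in this post) can be enabled in your database:

CREATE EXTENSION btree_gist;

ALTER TABLE reservations
  ADD CONSTRAINT no_double_booking
  EXCLUDE USING gist (name WITH =, reservedtime WITH &&);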

Links

Here is the documentation for range types

Documentation for range operators

Unlogged tables

Unlogged tables skip write-ahead logging, so records are written straight into the table. The benefit is a large increase in write speed; the drawback is that if something goes wrong before the records are written to disk, the unwritten records will be lost.

Why use it

They can give you a performance improvement over normal tables at the expense of data safety: if there is a crash, your table will be truncated. The degree of speed-up depends on your use case. In simple cases I was seeing about a 10% improvement on OpenShift, but I would love to hear other tales from the field.

How to create

All you do is add UNLOGGED to the CREATE TABLE statement:

CREATE UNLOGGED TABLE testu (a int);
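If you want a rough sense of the speed difference on your own gear, a short psql session like this (a sketch; the numbers will vary) compares loading the same data into a logged and an unlogged table:

\timing on
CREATE TABLE test_logged (a int);
CREATE UNLOGGED TABLE test_unlogged (a int);
INSERT INTO test_logged SELECT generate_series(1, 100000);
INSERT INTO test_unlogged SELECT generate_series(1, 100000);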

Links

The announcement of the feature

Read about it in the create table syntax

More discussion on unlogged tables versus temporary tables

Features still being worked on

Extensions

With the arrival of 9.1, PostgreSQL gained extensions: a way to add functionality to your database without recompiling PostgreSQL itself. An extension is placed into the shared library space of the PostgreSQL installation, and database owners can then enable it in their database with CREATE EXTENSION. Given the way extensions are currently implemented by PostgreSQL, database users cannot install the underlying extension packages themselves, so on OpenShift the system administrators need to compile them and add them to PostgreSQL.

Here are the extensions that are already installed:

SELECT name, default_version, installed_version FROM pg_available_extensions ORDER BY name;
 
        name        | default_version | installed_version 
--------------------+-----------------+-------------------
 adminpack          | 1.0             | 
 autoinc            | 1.0             | 
 btree_gin          | 1.0             | 
 btree_gist         | 1.0             | 
 chkpass            | 1.0             | 
 citext             | 1.0             | 
 cube               | 1.0             | 
 dblink             | 1.0             | 
 dict_int           | 1.0             | 
 dict_xsyn          | 1.0             | 
 earthdistance      | 1.0             | 
 file_fdw           | 1.0             | 
 fuzzystrmatch      | 1.0             | 
 hstore             | 1.1             | 1.1
 insert_username    | 1.0             | 
 intagg             | 1.0             | 
 intarray           | 1.0             | 
 isn                | 1.0             | 
 lo                 | 1.0             | 
 ltree              | 1.0             | 
 moddatetime        | 1.0             | 
 pageinspect        | 1.0             | 
 pg_buffercache     | 1.0             | 
 pg_freespacemap    | 1.0             | 
 pg_stat_statements | 1.1             | 
 pg_trgm            | 1.0             | 
 pgcrypto           | 1.0             | 
 pgrowlocks         | 1.0             | 
 pgstattuple        | 1.0             | 
 plperl             | 1.0             | 
 plperlu            | 1.0             | 
 plpgsql            | 1.0             | 1.0
(47 rows)

We realize these are not all the extensions people are interested in using.

PostGIS 2.0.x

In talking to the people at PostgreSQL and from our own feedback, we know that PostGIS is the most widely used extension not on our current list. PostGIS brings advanced spatial functionality to PostgreSQL. Never fear though: maxamillion, aka Adam Miller, has packaged up PostGIS with all of its crazy build dependencies, and it should be available in the next release. I will have a blog post to follow up on that release. If you want to follow the progress you can always look at the Trello card for this development story.

Are there any other extensions, particularly foreign data wrapper (FDW) extensions, that you would like to see installed?

phpPgAdmin

Another piece we are still missing is a web interface to manage the PostgreSQL database, which usually takes the form of phpPgAdmin. Another member of the OpenShift engineering team came to the rescue: Hiro Ashari took his own personal time and built phpPgAdmin for OpenShift. We are working it through the build and QA process and should have it available in the next few releases.

Next steps

Given all this great new functionality, I feel like the first step is to spin up an application with PostgreSQL and find your favorite feature. Then give us some feedback on our implementation, or show us the great new app you built.
