TMP Part 2 -- Using Clojure Tree with ActiveRecord

Tree structure

In the first article we introduced the Team Management Project (TMP) as the start of this series. Today we will take a look at tree structures, Closure Tree library and will lay out the basic ideas for a permission system.

Tree structures

One of the basic structures in IT, and generally in the world, is the tree structure. You can see it everywhere, from biology and scientific classification, to social hierarchies - even company structures are commonly organized hierarchically, i.e. in a tree.

Due to the fundamental nature of this structure, developers often have a need to store hierarchical data.

NoSQL

NoSQL databases are often considered great fit for these use-cases, because they simplify the storage of hierarchical data. Let's take MongoDB as an example of a NoSQL database. MongoDB is a Document Oriented database, that means it does not care about relations between (or among) entities, it just stores an entity as a document with a free internal structure. The relations are handled by the application framework or manually by the developer.

People tend to use the simplest solution without really thinking about the consequences. MongoDB has a free structure (JSON-like) with the ability to embed documents. Hmm, that sounds like a nice way to store hierarchies and it will just work. Let's see an example on article and comments.

{ "title": "TMP - part 2",
  "content": "...",
  "comments": [
    { "title": "Great",
      "content": "Great article",
      "comments": [
        { "title": "Are you joking?",
          "content": "This article is just ****. How can you like it?",
          "comments": []
        }
      ]
    }
  ]
}

In this example we have one article, with one direct comment and one response on the direct comment. Simple. In one shot I can get all the comments. However, than the problems arise? How to paginate the comments? How to efficiently insert nested comments? And many more problems.

So, let's take a look at the official documentation Model tree structures in MongoDB. There we see, that the MongoDB team encourages people to use the same techniques we know from relational databases. Funny, isn't it?

Tree structures

There is a lot of possible ways to store tree structures. The most well known and used (in regard to databases) are

Adjacency list

This is the most basic solution. Every entity has a reference to the parent node. When this reference is NULL, the entity is considered root.

+------------------+
| id       INT     |
| name     VARCHAR |
| parent   INT     |
+------------------+
 
+--------------------+
| Id | Name | Parent |
+--------------------+
| 1  | Bob  | NULL   |
| 2  | Tom  | 1      |
| 3  | Jim  | NULL   |
| 4  | Kim  | 2      |
+--------------------+

In this example, Kim works for Tom and Tom works for Bob, Jim works alone. It is a very simple, and very naive solution.

The problem comes, when I want to query more levels in the tree than one. And that is pretty often.

Materialized path

Is an extension to the previous one (may not be, but I see it that way). Instead of storing the link to the parent, it stores the path in the whole tree.

+------------------+
| id       INT     |
| name     VARCHAR |
| path     VARCHAR |
+------------------+
 
+--------------------+
| Id | Name | Parent |
+--------------------+
| 1  | Bob  | 1,     |
| 2  | Tom  | 1,2,   |
| 3  | Jim  | 3,     |
| 4  | Kim  | 1,2,4, |
+--------------------+

Using string matching it is possible to do multi-level tree queries. However, we store the relations as a string which will be inefficient for large datasets. On the other hand, it's human-readable.

Nested set / Pre-ordered traversal tree

Very popular structure used to store hierarchical data in a relational database. The idea can be nicely illustrated on an XML file

<Bob> (1)
  <Tom> (2)
    <Kim> (3)
    </Kim> (4)
  </Tom> (5)
</Bob> (6)
 
<Jim> (7)
</Jim> (8)

This looks like our hierarchy from previous examples. The numbers in brackets are helpers used by Nested set approach to determine the position in a tree. When we talk about an opening tag, it's called Left, and whenever we talk about closing tag it's called Right. Let's get back to SQL

+------------------+
| id       INT     |
| name     VARCHAR |
| left     INT     |
| right    INT     |
+------------------+
 
+--------------------------+
| Id | Name | Left | Right |
+--------------------------+
| 1  | Bob  | 1    | 6     |
| 2  | Tom  | 2    | 5     |
| 3  | Jim  | 7    | 8     |
| 4  | Kim  | 3    | 4     |
+--------------------------+

This a solid solution for not-too-often-changed data. Whenever we modify the data set, it's required to recalculate the structures for consistency. It's simpler in case the set has only one root, with multiple root, it get more challenging.

This structure allows complex queries. If I want all descendants from Bob, I just query add condition to my query "WHERE left > 1 AND right < 6".

Closure table

The most complex one a the one "Closure Tree" library is using. For every table with tree structure create a new one, that stores the link from each node to itself and to all descendants, every link also contains the length, i.e. over how many nodes you get to this descendant. Again let's use the same dataset from previous examples

+------------------+
| id       INT     |
| name     VARCHAR |
| parent   INT     |
+------------------+
 
+--------------------+
| Id | Name | Parent |
+--------------------+
| 1  | Bob  | NULL   |
| 2  | Tom  | 1      |
| 3  | Jim  | NULL   |
| 4  | Kim  | 2      |
+--------------------+
 
+-----------------+
| ancestor    INT |
| descendant  INT |
| length      INT |
+-----------------+
 
+--------------------------------+
| Ancestor | Descendant | Length |
+--------------------------------+
| 1        | 1          | 0      |
| 1        | 2          | 1      |
| 1        | 4          | 2      |
| 2        | 2          | 0      |
| 2        | 4          | 1      |
| 3        | 3          | 0      |
| 4        | 4          | 0      |
+--------------------------------+

This method allows all the complex queries, and does not add such a huge overhead in case of modifications in the dataset. Also it is possible to utilize referential integrity on the data.

For more information I would recommend to use Google, there is some reading at the bottom, however this introduction is intended just for the purpose of showing basic information on the methods, not deep-diving in each of those.

Closure Tree

Now you understand how the hierarchical data is being stored in the tables, let's take a look at the Closure Tree library that uses Closure Table approach and provides simple methods to handle hierarchies in a Ruby-way.

Add closure_tree to your Gemfile:

gem 'closure_tree'

And install the gem:

bundle install

Call the acts_as_tree method on the model that should be tree-enabled

class Role < ActiveRecord::Base
 
  acts_as_tree
 
end

Generate a new migration to add parent_id to the model, and to create new a table and index for the hierarchies:

class CreateRoleHierarchyDependencies < ActiveRecord::Migration
 
  def change
    add_column :roles, :parent_id, :integer
 
    create_table :role_hierarchies, :id => false do |t|
      t.integer  :ancestor_id, :null => false
      t.integer  :descendant_id, :null => false
      t.integer  :generations, :null => false
    end
 
    add_index :role_hierarchies, [:ancestor_id, :descendant_id], :unique => true
    add_index :role_hierarchies, [:descendant_id]
  end
 
end

And finally, migrate the database:

rake db:migrate

Basic usage

We will use this Role model to build a simple Role-based system.

First we need to create some Roles. Let's start by creating a new role for Guest - an unauthenticated user

guest = Role.create(:name => 'Guest')

In a second step, we will create a new Role for an authenticated user called User and add it as child to the Guest role

user = Role.create(:name => 'User')
guest.children << user

In the last step let's create a Role for Administrator:

admin = Role.new
admin.name = 'Administrator'
admin.parent = user
admin.save

This is a different approach, where we specify a parent rather than by appending the model to the children collection of a parent. Now our structure looks like this:

    Guest
      ^
    User
      ^
Administrator

Connecting with Users

Let's assume we have n-n relation between Role and User models with the help of UserRole for the mapping. Something like this:

class User < ActiveRecord::Base
 
  has_many :user_roles, :dependent => :destroy
  has_many :roles, :through => :user_roles
 
end
 
class Role < ActiveRecord::Base
 
  acts_as_tree
 
  has_many  :user_roles, :dependent => :destroy
  has_many  :users, :through => :user_roles
 
end

Next, let's make a user an administrator:

user = User.where(...).first
user.roles << Role.where(:name => 'Administrator').first

Now user.roles will include the administrator role. Simple! Now, I guess you are asking why the hell we spent the time with the Closure Tree library. Because it allows us to build inheritance into the Role based system.

Update the User model once again:

class User < ActiveRecord::Base
 
  has_many :user_roles, :dependent => :destroy
  has_many :roles, :through => :user_roles
  has_many :effective_roles, :through => :roles, :source => :self_and_ancestors
 
end

We added a new line. Seems complex? Well, a bit. self_and_ancestors is a relation on Role model provided by Closure Tree library that contains the the role itself and all it's ancestors. As we combined with the existing relation roles it extends the existing relation to include also the inherited roles.

In our case, user.effective_roles will return array of 3 roles "Administrator", "User" and "Guest" - because "User" and "Guest" are inherited from "Administrator". And, it will be done in one single query to the database server. Nice, isn't it?

Conclusion

This was a quick introduction into the tree hierarchies problem and into a elegant solution for ActiveRecord based applications. We also build a basic foundation for a Role-based permission system that allows inheritance of Roles. In the next article we will move further.

Resources

  1. Closure Tree
  2. Models for hierarchical data
  3. Model tree structures in MongoDB