First Steps with Titan using Rexster and Scala

Titan is a distributed graph database that runs on top of Cassandra or HBase to achieve both massive data scale and fast graph traversal queries. Titan is useful even on a single server, and it scales up seamlessly from there. It’s great to know that Titan scales, but when first starting out you may only need a single server, either for local development or to power a small production application. However, Titan has so many deployment options and associated tools and technologies that it can be difficult to know where to start.

This post assumes the standard architecture for a web application: a database running on a server that is remote from where the application runs. Therefore the application needs to communicate with the database over the network. The application could be a web site, a RESTful web service, or any of a variety of different services. The main point is that the database and the application exist on different servers.

Since Scala is the language of choice at Pongr, we’ll also be writing code in Scala and managing the project with sbt.

In addition to the application communicating with the database, we would also like to have interactive command-line access to send ad-hoc commands and queries to the database. This is incredibly useful to test out queries and make sure the application is storing the correct data.

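We will accomplish the goals above using the following approach:

  • Titan Server runs Cassandra, Titan, and Rexster together on the database server
  • The Rexster Console provides interactive command-line access for ad-hoc Gremlin queries
  • The Scala application sends Gremlin queries to Titan over the network using the RexPro client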

The example application and more details are available at https://github.com/zcox/rexster-titan-scala.

Titan Server

Titan Server provides a very convenient all-in-one package to get Titan up-and-running quickly. Cassandra is used as the underlying data storage. Titan provides graph database functionality on top of Cassandra. Rexster exposes the Titan graph to remote applications via the network. All three of these systems run within the same JVM so calls between them are performant.

These simple commands download Titan Server 0.3.1 and fire it up:

wget http://s3.thinkaurelius.com/downloads/titan/titan-cassandra-0.3.1.zip
unzip titan-cassandra-0.3.1.zip
cd titan-cassandra-0.3.1
bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties

When everything has started you should notice a process bound to port 8184. This is the RexPro port and it is ready to receive incoming connections.
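If you would rather check the port from code than from the shell, here is a minimal sketch in Scala using only the JDK. The object and method names (PortCheck, isPortOpen) and the default timeout are made up for illustration. It only verifies that something accepts TCP connections on the port, not that it actually speaks RexPro.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper, not part of Titan or Rexster.
object PortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def isPortOpen(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Connection refused, unreachable host, and timeouts are all IOExceptions.
      case _: IOException => false
    } finally {
      socket.close()
    }
  }
}
```

With Titan Server running locally, PortCheck.isPortOpen("localhost", 8184) should return true once RexPro is listening.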

The provided titan-server-rexster.xml and titan-server-cassandra.properties files contain good basic defaults, but would require modifications for production deployments. You would also want to run titan.sh from something like Upstart as a daemon.

Rexster Console

Next we use the Rexster Console to quickly test out our new Titan server and create a simple graph in it. Here are commands to download and start the console:

wget http://tinkerpop.com/downloads/rexster/rexster-console-2.3.0.zip
unzip rexster-console-2.3.0.zip
cd rexster-console-2.3.0
bin/rexster-console.sh

Inside the shell, you write Gremlin queries to interact with the graph in Titan, much as you would write SQL to interact with a MySQL database via the MySQL command-line client. Let’s create 3 nodes connected by 2 edges.

        (l_(l
(_______( 0 0
(        (-Y-) <woof>
l l-----l l
l l,,   l l,,
opening session [127.0.0.1:8184]
?h for help

rexster[groovy]> g = rexster.getGraph("graph")
==>titangraph[embeddedcassandra:null]
rexster[groovy]> v1 = g.addVertex([name:"Zach"])
==>v[4]
rexster[groovy]> v2 = g.addVertex([name:"Scala"])
==>v[8]
rexster[groovy]> e1 = g.addEdge(v1, v2, "likes", [since: 2009])
==>e[n-4-2F0LaTPQAS][4-likes->8]
rexster[groovy]> v3 = g.addVertex([name:"NOS"])
==>v[12]
rexster[groovy]> e2 = g.addEdge(v1,v3,"likes",[since:2012])
==>e[z-4-2F0LaTPQAS][4-likes->12]
rexster[groovy]> g.commit()
==>null
rexster[groovy]> g.V.name
==>Zach
==>Scala
==>NOS
rexster[groovy]> g.V('name','Zach').out('likes').name
==>Scala
==>NOS
rexster[groovy]> ?q
closing session with Rexster [ip-10-152-185-66.ec2.internal:8184]--> done

Note that after modifying the graph we need to commit the transaction, so our changes are visible to other clients.
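To make it clearer what the last traversal computes, here is a rough sketch of the same tiny graph using plain Scala collections. This is not how Titan stores or queries data. The case classes and the outNames helper are invented for illustration, and the vertex ids are simply copied from the console session above.

```scala
// Toy in-memory model of the graph built in the console session.
object LikesGraph {
  final case class Vertex(id: Long, name: String)
  final case class Edge(from: Long, label: String, to: Long, since: Int)

  val vertices: Seq[Vertex] =
    Seq(Vertex(4L, "Zach"), Vertex(8L, "Scala"), Vertex(12L, "NOS"))

  val edges: Seq[Edge] =
    Seq(Edge(4L, "likes", 8L, 2009), Edge(4L, "likes", 12L, 2012))

  // Rough equivalent of g.V('name', name).out(label).name:
  // find the start vertices by name, follow outgoing edges with the
  // given label, and collect the names of the vertices they point to.
  def outNames(name: String, label: String): Seq[String] =
    for {
      start  <- vertices if start.name == name
      edge   <- edges if edge.from == start.id && edge.label == label
      target <- vertices if target.id == edge.to
    } yield target.name
}
```

Calling LikesGraph.outNames("Zach", "likes") walks the two "likes" edges out of the Zach vertex, just like the Gremlin query does.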

RexPro in Scala

Now that we have some simple data in our remote Titan database, let’s write some Scala code to query it. The first step is to add the rexster-protocol dependency to our build.sbt file:

libraryDependencies += "com.tinkerpop.rexster" % "rexster-protocol" % "2.3.0"

Now we can use the RexsterClientFactory to obtain a RexsterClient instance, and use that to send Gremlin queries to Titan:

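import com.tinkerpop.rexster.client.RexsterClientFactory
// Assumed import: implicit Java <-> Scala collection conversions, needed for toSeq and the parameter Map
import scala.collection.JavaConversions._

// Connect to the graph named "graph" on localhost (RexPro port 8184 by default)
val client = RexsterClientFactory.open("localhost", "graph")

// Names of all vertices; debug is assumed to come from the app's logging trait
val names: Seq[String] = client.execute("g.V.name").toSeq
debug("%d names: %s" format (names.size, names.mkString("[", ",", "]")))

// "Zach" is bound to the script variable name instead of being spliced into the query string
val zachLikes: Seq[String] = client.execute("g.V('name',name).out('likes').name", Map("name" -> "Zach")).toSeq
debug("Zach likes %d things: %s" format (zachLikes.size, zachLikes.mkString("[", ",", "]")))

client.close()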

Note that the raw Gremlin queries are defined in Strings. This may feel like old-school JDBC, but it is currently the way to do things using RexPro on the JVM. Rexster does provide a way to write code against the Gremlin API directly, called Extensions. An Extension runs on the Titan/Rexster server, so it has no remote communication with Titan, and it is then made available to client-side code via Rexster. There are also several “Object-Graph Mapper” libraries available, such as Bulbs and Thunderdome, that let you write client-side code at a higher level than Gremlin queries in Strings. I’d really like to experiment with such an approach using Scala, and will definitely write a follow-up blog post with more details and options for client-side use of Rexster.

Also note that the Scala compiler needs a few hints as to how to handle the type returned from execute(). This is common when the Java and Scala type systems collide, and would best be encapsulated in a Scala RexsterClient adapter.
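Here is one possible shape for such an adapter. It is only a sketch and is not part of the example project. ScriptExecutor and GremlinAdapter are hypothetical names, and ScriptExecutor is a stand-in that mirrors the shape of RexsterClient's execute(script, parameters) so the snippet compiles without rexster-protocol on the classpath.

```scala
import scala.collection.JavaConverters._

// Hypothetical stand-in mirroring the shape of RexsterClient.execute(script, parameters).
trait ScriptExecutor {
  def execute[T](script: String, params: java.util.Map[String, Object]): java.util.List[T]
}

// Hypothetical adapter: Scala collections in, Scala collections out,
// so calling code never touches java.util types directly.
class GremlinAdapter(executor: ScriptExecutor) {
  def query[T](script: String, params: Map[String, Any] = Map.empty): Seq[T] = {
    // Box values to AnyRef and convert to the java.util.Map the client expects.
    val javaParams: java.util.Map[String, Object] =
      params.map { case (k, v) => k -> v.asInstanceOf[AnyRef] }.asJava
    // Copy the returned java.util.List into an immutable Scala Vector.
    executor.execute[T](script, javaParams).asScala.toVector
  }
}

// With rexster-protocol available, the wiring might look like this (untested):
//   val client = RexsterClientFactory.open("localhost", "graph")
//   val titan = new GremlinAdapter(new ScriptExecutor {
//     def execute[T](s: String, p: java.util.Map[String, Object]) = client.execute[T](s, p)
//   })
//   val zachLikes = titan.query[String]("g.V('name',name).out('likes').name", Map("name" -> "Zach"))
```

The explicit type parameter on query is the single "hint" the caller has to provide. The conversions in both directions stay hidden inside the adapter.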

You can run the example app and see results of the queries:

$ sbt run
[info] Set current project to rexster-titan-scala (in build file:/home/zcox/dev/rexster-titan-scala/)
[info] Running com.pongr.Main 
2013-05-14 16:54:53,293 INFO  c.t.r.client.RexsterClientFactory - Create RexsterClient instance: [hostname=localhost
graph-name=graph
port=8184
timeout-connection-ms=8000
timeout-write-ms=4000
timeout-read-ms=16000
max-async-write-queue-size=512000
message-retry-count=16
message-retry-wait-ms=50
language=groovy
graph-obj-name=g
transaction=true
channel=2]
2013-05-14 16:54:53,925 DEBUG com.pongr.Main$ - 3 names: [Zach,Scala,NOS]
2013-05-14 16:54:54,004 DEBUG com.pongr.Main$ - Zach likes 2 things: [Scala,NOS]
[success] Total time: 4 s, completed May 14, 2013 4:54:54 PM

EC2

While the above walkthrough ran everything locally, this also works remotely on two separate EC2 instances using a few modifications to support the app and Titan on different servers:

  • Make sure the server instance has port 8184 open to the client instance
  • Titan Server: use private IP as <server-host> in titan-server-rexster.xml
  • Rexster Console: bin/rexster-console.sh -rh [internal-hostname of titan-server]
  • Scala code: Replace localhost in src/main/scala/main.scala with internal-hostname of titan-server
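Rather than editing the hostname in the source for each environment, the app could read it from configuration. A minimal sketch, assuming a made-up TITAN_HOST environment variable (the example project does not define one):

```scala
// Hypothetical configuration helper; TITAN_HOST is an invented variable name.
object TitanConfig {
  val DefaultHost = "localhost"

  // Uses TITAN_HOST when it is set and non-blank, otherwise falls back to localhost.
  def titanHost(env: Map[String, String] = sys.env): String =
    env.get("TITAN_HOST").map(_.trim).filter(_.nonEmpty).getOrElse(DefaultHost)
}

// The client would then be opened with something like:
//   RexsterClientFactory.open(TitanConfig.titanHost(), "graph")
```

On the client instance you would export TITAN_HOST with the internal hostname of the Titan server before running sbt run. Local development keeps working without any setup.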

Next Steps

One of Titan’s major features is vertex-centric indexes, so we would definitely want to set those up, either via the Rexster Console or the RexPro client. Titan also supports external indexes, which would be very valuable to many applications.

Since Titan scales out for high availability and data size, it would be useful to know how that affects both the Rexster Console and the RexPro client in application code.

From an operations perspective, the stock Titan Server configs need some adjustment for production use. For example, by default the data is stored in /tmp, which you would definitely want to relocate, perhaps to a mounted EBS volume. Automated, periodic backups to S3 would also be advised, as would a proper Upstart script and perhaps a Titan Server Debian package.

Conclusion

Hopefully this blog post has shown how to easily get started using a remote Titan server, both from an interactive shell and from application code in Scala. We’ve only scratched the surface though, so be sure to read all of the linked documentation.

11 thoughts on “First Steps with Titan using Rexster and Scala”

  1. Hi,

    Great article. Would you be able to shed some light on why you went with Titan instead of the more popular Neo4j? I am trying to decide between Neo4j and Titan and would appreciate any input.

    Thanks

    • Neo4j is a great graph database and is a good choice for many applications. Titan is interesting to me for a few reasons: its vertex-centric indexes can greatly improve query performance, it can use ElasticSearch for external indexes to quickly find vertexes to start queries at, it scales-out simply thanks to Cassandra, and integrates tightly with Faunus for global graph processing via map-reduce.

  2. Your article helped me a lot. Thanks~ I have a question about the client module for Rexster server.
    The RexPro client (RexsterClient class) only provides a synchronous-style API. The calling thread blocks during network I/O. Blocking database calls don’t fit well with Scala-style programming (especially Play Framework).
    I want to use an asynchronous-style RexPro client API that fits well with non-blocking web frameworks (Play Framework, Node.js, etc.). Do you have any ideas or information about an asynchronous Rexster client database driver for Java or Scala? Thank you for generously sharing your knowledge of Titan 😀

    • Currently there are no async Rexster clients. I wish one existed. That would also fit well with the rest of our architecture (we use Akka and Spray). At this point, you’ll have to create such a client yourself. 🙂 We may very well create one too in the near future.

  3. Are you really sure about this: “Note that after modifying the graph we need to commit the transaction, so our changes are visible to other clients.” If I leave the commit out and terminate the console and the server and restart everything again my changes are still present.

    • I have a feeling that when you terminate the console, a shutdown handler is closing the graph which commits the open transaction (or something like that). Try this: open 2 separate consoles, perform the operations in the 1st console but don’t close the tx, then query the graph in the 2nd console. Does the 2nd console see changes from the 1st console while that tx is still open?

      If you see any problems, I would highly recommend posting on the Titan mailing list (https://groups.google.com/forum/#!forum/aureliusgraphs) they are very responsive.

  4. Hi Zack,

    Thank you for the excellent post. Would you mind sharing how to use Akka and Spray to connect to and query the Titan server?

  5. I’m trying to set up an application that’s going to be doing very frequent updates to a very large graph, and looking at your example above, I’m struggling with how to set up transactions properly. For example, nodes will have unique keys, so I need to define an operation to 1) create two nodes if they don’t already exist, 2) create an edge between them if it doesn’t exist, and 3) increment a property of the edge. I’m kind of at a loss as to how to go about this. Can I define an entire Gremlin script, including transaction boundaries and exception handling, to run remotely on the Rexster server and run it using client.execute? How on earth is this performant?

  6. Pingback: Rexster and Scala | Tales from a Trading Desk

  7. Pingback: Scala – Building a Rexster API With Spray | The Security Diaries

  8. Hi…I have two questions:
    1) What is the difference between rexsterPro and rexsterGraph?
    2) Can I do multiline queries? For instance, declare a vertex variable and then add the edge on another line?

    Thanks so much for the article…it is hard to find documentation about tinkergraph components (compared to Neo4j)…
