Titan is a distributed graph database that runs on top of Cassandra or HBase to achieve both massive data scale and fast graph traversal queries. Titan is useful even on a single server, and it scales up seamlessly from there. It’s great to know that Titan scales, but when first starting out you may only need it on a single server, either for local development or to power a small production application. However, there are so many Titan deployment options and associated tools and technologies that it can be difficult to know where to start.
This post assumes the standard architecture for a web application: a database running on a server that is remote from where the application runs. Therefore the application needs to communicate with the database over the network. The application could be a web site, a RESTful web service, or any of a variety of different services. The main point is that the database and the application exist on different servers.
In addition to the application communicating with the database, we would also like interactive command-line access for sending ad-hoc commands and queries to the database. This is incredibly useful for testing queries and for checking that the application is storing the correct data.
We will accomplish the goals above using the following approach:
- Titan Server: Titan + Cassandra + Rexster as the database server
- Rexster Console: shell access to the remote Titan server via Gremlin
- RexPro Java client library: send Gremlin queries to the remote Titan server from Java (or Scala)
The example application and more details are available at https://github.com/zcox/rexster-titan-scala.
Titan Server provides a convenient all-in-one package for getting Titan up and running quickly. Cassandra is the underlying data store, Titan provides graph database functionality on top of Cassandra, and Rexster exposes the Titan graph to remote applications over the network. All three run within the same JVM, so calls between them avoid network round trips.
These simple commands download Titan Server 0.3.1 and fire it up:
```shell
wget http://s3.thinkaurelius.com/downloads/titan/titan-cassandra-0.3.1.zip
unzip titan-cassandra-0.3.1.zip
cd titan-cassandra-0.3.1
bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
```
When everything has started, you should see a process listening on port 8184. This is the RexPro port, ready to accept incoming connections.
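If you would rather check from code than with a tool like netstat, a minimal Scala sketch is to attempt a plain TCP connection to the RexPro port. The object name `RexProPortCheck` and the 2-second timeout are illustrative assumptions, and a successful connect only shows that something is listening on 8184, not that RexPro itself is healthy.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object RexProPortCheck {
  /** True if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
  def isListening(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers connection refused, timeouts and unresolvable host names
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // 8184 is the RexPro port Titan Server listens on in this walkthrough
    println(s"RexPro port 8184 reachable: ${isListening("localhost", 8184)}")
  }
}
```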
The provided titan-server-rexster.xml and titan-server-cassandra.properties files contain good basic defaults, but would need changes for a production deployment. You would also want to run titan.sh as a daemon under something like Upstart.
Next we use the Rexster Console to quickly test out our new Titan server and create a simple graph in it. Here are commands to download and start the console:
```shell
wget http://tinkerpop.com/downloads/rexster/rexster-console-2.3.0.zip
unzip rexster-console-2.3.0.zip
cd rexster-console-2.3.0
bin/rexster-console.sh
```
Inside the shell, you write Gremlin queries to interact with the graph in Titan, much as you would write SQL to interact with a MySQL database via the MySQL command-line client. Let’s create three vertices connected by two edges.
```
        (l_(l
(_______( 0 0
(        (-Y-) <woof>
l l-----l l
l l,,   l l,,

opening session [127.0.0.1:8184]
?h for help

rexster[groovy]> g = rexster.getGraph("graph")
==>titangraph[embeddedcassandra:null]
rexster[groovy]> v1 = g.addVertex([name:"Zach"])
==>v[4]
rexster[groovy]> v2 = g.addVertex([name:"Scala"])
==>v[8]
rexster[groovy]> e1 = g.addEdge(v1, v2, "likes", [since: 2009])
==>e[n-4-2F0LaTPQAS][4-likes->8]
rexster[groovy]> v3 = g.addVertex([name:"NOS"])
==>v[12]
rexster[groovy]> e2 = g.addEdge(v1,v3,"likes",[since:2012])
==>e[z-4-2F0LaTPQAS][4-likes->12]
rexster[groovy]> g.commit()
==>null
rexster[groovy]> g.V.name
==>Zach
==>Scala
==>NOS
rexster[groovy]> g.V('name','Zach').out('likes').name
==>Scala
==>NOS
rexster[groovy]> ?q
closing session with Rexster [ip-10-152-185-66.ec2.internal:8184]--> done
```
Note that after modifying the graph we need to commit the transaction, so our changes are visible to other clients.
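For readers new to Gremlin, the last query in the session reads left to right as a pipeline: start from vertices whose name is Zach, follow outgoing likes edges, then emit the name of each vertex reached. The sketch below models the same three vertices and two edges as plain Scala collections purely to illustrate those semantics. The `Vertex` and `Edge` case classes are hypothetical, unrelated to Titan’s API, and the code never talks to Titan.

```scala
object TinyGraphModel {
  // Hypothetical stand-ins for Titan vertices and edges (illustration only, not Titan's API)
  final case class Vertex(id: Long, name: String)
  final case class Edge(outId: Long, label: String, inId: Long, since: Int)

  // Same data as the console session; ids 4, 8 and 12 mirror the edge output [4-likes->8], [4-likes->12]
  val vertices: Seq[Vertex] = Seq(Vertex(4, "Zach"), Vertex(8, "Scala"), Vertex(12, "NOS"))
  val edges: Seq[Edge] = Seq(Edge(4, "likes", 8, 2009), Edge(4, "likes", 12, 2012))

  private val byId: Map[Long, Vertex] = vertices.map(v => v.id -> v).toMap

  /** Plain-Scala analogue of g.V.name */
  def allNames: Seq[String] = vertices.map(_.name)

  /** Plain-Scala analogue of g.V('name', name).out(label).name */
  def outNames(name: String, label: String): Seq[String] =
    vertices
      .filter(_.name == name)                                               // g.V('name', name)
      .flatMap(v => edges.filter(e => e.outId == v.id && e.label == label)) // outgoing edges with that label
      .map(e => byId(e.inId).name)                                          // .out(label).name

  def main(args: Array[String]): Unit =
    println(outNames("Zach", "likes"))
}
```

Unlike this model, Titan makes no promise about the order in which g.V returns vertices, so code should not depend on the ordering seen in the session.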
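RexPro in Scala
To send Gremlin queries from Scala, add the RexPro client library to the sbt build:
```scala
libraryDependencies += "com.tinkerpop.rexster" % "rexster-protocol" % "2.3.0"
```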
Now we can use the RexsterClientFactory to obtain a RexsterClient instance, and use that to send Gremlin queries to Titan:
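```scala
import com.tinkerpop.rexster.client.RexsterClientFactory
// Implicit java.util <-> Scala collection conversions (Scala 2.10-era API), needed for
// .toSeq on the returned java.util.List and for passing a Scala Map as query bindings
import scala.collection.JavaConversions._

// Connect to RexPro on localhost (port 8184 by default) and select the graph named "graph"
val client = RexsterClientFactory.open("localhost", "graph")

// Fetch the name property of every vertex
val names: Seq[String] = client.execute("g.V.name").toSeq
// debug(...) is a logger method of the enclosing object (not shown)
debug("%d names: %s" format (names.size, names.mkString("[", ",", "]")))

// Pass "Zach" as a binding named `name` instead of splicing it into the script string
val zachLikes: Seq[String] =
  client.execute("g.V('name',name).out('likes').name", Map("name" -> "Zach")).toSeq
debug("Zach likes %d things: %s" format (zachLikes.size, zachLikes.mkString("[", ",", "]")))

client.close()
```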
Note that the raw Gremlin queries are defined as Strings. This may feel like old-school JDBC, but it is currently the way to use RexPro on the JVM. Rexster does provide a way to write code against the Gremlin API directly, called Extensions. An Extension runs on the Titan/Rexster server, so it makes no remote calls to Titan, and Rexster then exposes it to client-side code. There are also several “Object-Graph Mapper” libraries for Python, such as Bulbs and Thunderdome, that let you write client-side code at a higher level than Gremlin queries in Strings. I’d really like to experiment with such an approach in Scala, and will write a follow-up post with more details and options for client-side use of Rexster.
Also note that the Scala compiler needs a few hints about the type returned from execute(). This is common where the Java and Scala type systems meet, and would best be encapsulated in a Scala adapter around RexsterClient.
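One possible shape for such an adapter, sketched under assumptions: `JavaGremlinClient` is a hypothetical trait standing in for the Java client’s execute(script, bindings) method (check the real RexsterClient signatures in the Rexster 2.3.0 javadoc before relying on them), and the conversions use `scala.jdk.CollectionConverters` from Scala 2.13+, whose Scala 2.10-era counterpart is `scala.collection.JavaConverters`. Callers pass a Scala `Map` of bindings and get back an immutable `Seq` with the element type they ask for.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for the Java client's execute(script, bindings) method;
// verify the real RexsterClient signatures before wiring this to it.
trait JavaGremlinClient {
  def execute[T](script: String, bindings: java.util.Map[String, AnyRef]): java.util.List[T]
}

// Scala-facing adapter: takes a Scala Map of bindings and returns an immutable Seq,
// so callers no longer deal with java.util collections or conversion imports.
final class ScalaGremlinClient(underlying: JavaGremlinClient) {
  def query[T](script: String, bindings: Map[String, Any] = Map.empty): Seq[T] = {
    // Treat every binding value as an object so it fits Java's Map<String, Object>
    val javaBindings: java.util.Map[String, AnyRef] =
      bindings.map { case (k, v) => k -> v.asInstanceOf[AnyRef] }.asJava
    // T is erased at runtime, so a wrong element type only fails when an element is used
    underlying.execute[T](script, javaBindings).asScala.toList
  }
}
```

In the application, a small class delegating to a RexsterClient could implement `JavaGremlinClient`, so the first query above would become `client.query[String]("g.V.name")`.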
You can run the example app and see results of the queries:
```
$ sbt run
[info] Set current project to rexster-titan-scala (in build file:/home/zcox/dev/rexster-titan-scala/)
[info] Running com.pongr.Main
2013-05-14 16:54:53,293 INFO c.t.r.client.RexsterClientFactory - Create RexsterClient instance: [hostname=localhost graph-name=graph port=8184 timeout-connection-ms=8000 timeout-write-ms=4000 timeout-read-ms=16000 max-async-write-queue-size=512000 message-retry-count=16 message-retry-wait-ms=50 language=groovy graph-obj-name=g transaction=true channel=2]
2013-05-14 16:54:53,925 DEBUG com.pongr.Main$ - 3 names: [Zach,Scala,NOS]
2013-05-14 16:54:54,004 DEBUG com.pongr.Main$ - Zach likes 2 things: [Scala,NOS]
[success] Total time: 4 s, completed May 14, 2013 4:54:54 PM
```
While the walkthrough above ran everything locally, the same setup also works with the app and Titan on two separate EC2 instances, given a few modifications:
- Make sure the server instance has port 8184 open to the client instance
- Titan Server: use private IP as <server-host> in titan-server-rexster.xml
- Rexster Console: bin/rexster-console.sh -rh [internal-hostname of titan-server]
- Scala code: Replace localhost in src/main/scala/main.scala with internal-hostname of titan-server
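Rather than editing the source for each environment, one option is to read the Titan host from an environment variable and fall back to localhost for local development. `TITAN_HOST` is a hypothetical variable name chosen for this sketch; the resolved value would take the place of the hard-coded "localhost" passed to RexsterClientFactory.open.

```scala
object TitanEndpoint {
  // Hypothetical variable name; use whatever your deployment tooling sets
  val HostVar = "TITAN_HOST"

  /** Titan/Rexster host from the environment, or "localhost" if unset or blank. */
  def resolveHost(env: Map[String, String] = sys.env): String =
    env.get(HostVar).map(_.trim).filter(_.nonEmpty).getOrElse("localhost")

  def main(args: Array[String]): Unit =
    println(s"Connecting to Titan at ${resolveHost()}")
}
```

Taking the environment as a parameter keeps the lookup easy to test without touching real environment variables.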
One of Titan’s major features is vertex-centric indexes, so we would definitely want to set those up, either via the Rexster Console or the RexPro client. Titan also supports external indexes, which would be very valuable to many applications.
Since Titan scales out for high availability and larger data sizes, it would be useful to know how a multi-server deployment affects both the Rexster Console and the RexPro client in application code.
From an operations perspective, the stock Titan Server configs need some adjustment for production use. For example, by default the data is stored in /tmp, which you would definitely want to relocate, perhaps to a mounted EBS volume. Automated, periodic backups to S3 would also be advised, as would a proper Upstart script and perhaps a Titan Server Debian package.
Hopefully this post has shown how easy it is to get started with a remote Titan server, both from an interactive shell and from application code in Scala. We’ve only scratched the surface, though, so be sure to read the linked documentation.