Zach's Blog

Just another WordPress.com weblog

First Steps with Titan using Rexster and Scala

with 10 comments

Titan is a distributed graph database that runs on top of Cassandra or HBase to achieve both massive data scale and fast graph traversal queries. Titan also works well on just a single server, and it seamlessly scales up from there. When first starting out you may only need it on a single server, either for local development or for powering a small production application. However, there are so many Titan deployment options and associated tools & technologies that it can be difficult to know where to get started.

This post assumes the standard architecture for a web application: a database running on a server that is remote from where the application runs. Therefore the application needs to communicate with the database over the network. The application could be a web site, a RESTful web service, or any of a variety of different services. The main point is that the database and the application exist on different servers.

Since Scala is the language of choice at Pongr, we’ll also be writing code in Scala and managing the project with sbt.

In addition to the application communicating with the database, we would also like to have interactive command-line access to send ad-hoc commands and queries to the database. This is incredibly useful to test out queries and make sure the application is storing the correct data.

We will accomplish the goals above using the following approach:

  • Titan Server runs Titan on top of Cassandra and exposes the graph over the network via Rexster
  • The Rexster Console gives us an interactive shell for sending ad-hoc Gremlin queries to the remote graph
  • The RexPro client library lets our Scala application send Gremlin queries to Titan over the network

The example application and more details are available at https://github.com/zcox/rexster-titan-scala.

Titan Server

Titan Server provides a very convenient all-in-one package to get Titan up-and-running quickly. Cassandra is used as the underlying data storage. Titan provides graph database functionality on top of Cassandra. Rexster exposes the Titan graph to remote applications via the network. All three of these systems run within the same JVM so calls between them are performant.

These simple commands download Titan Server 0.3.1 and fire it up:

wget http://s3.thinkaurelius.com/downloads/titan/titan-cassandra-0.3.1.zip
unzip titan-cassandra-0.3.1.zip
cd titan-cassandra-0.3.1
bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties

When everything has started you should notice a process bound to port 8184. This is the RexPro port and it is ready to receive incoming connections.

The provided titan-server-rexster.xml and titan-server-cassandra.properties files contain good basic defaults, but would require modifications for production deployments. You would also want to run titan.sh from something like Upstart as a daemon.

Rexster Console

Next we use the Rexster Console to quickly test out our new Titan server and create a simple graph in it. Here are commands to download and start the console:

wget http://tinkerpop.com/downloads/rexster/rexster-console-2.3.0.zip
unzip rexster-console-2.3.0.zip
cd rexster-console-2.3.0
bin/rexster-console.sh

Inside the shell, you write Gremlin queries to interact with the graph in Titan, much as you would write SQL to interact with a MySQL database via the MySQL command-line client. Let’s create 3 nodes connected by 2 edges.

        (l_(l
(_______( 0 0
(        (-Y-) <woof>
l l-----l l
l l,,   l l,,
opening session [127.0.0.1:8184]
?h for help

rexster[groovy]> g = rexster.getGraph("graph")
==>titangraph[embeddedcassandra:null]
rexster[groovy]> v1 = g.addVertex([name:"Zach"])
==>v[4]
rexster[groovy]> v2 = g.addVertex([name:"Scala"])
==>v[8]
rexster[groovy]> e1 = g.addEdge(v1, v2, "likes", [since: 2009])
==>e[n-4-2F0LaTPQAS][4-likes->8]
rexster[groovy]> v3 = g.addVertex([name:"NOS"])
==>v[12]
rexster[groovy]> e2 = g.addEdge(v1,v3,"likes",[since:2012])
==>e[z-4-2F0LaTPQAS][4-likes->12]
rexster[groovy]> g.commit()
==>null
rexster[groovy]> g.V.name
==>Zach
==>Scala
==>NOS
rexster[groovy]> g.V('name','Zach').out('likes').name
==>Scala
==>NOS
rexster[groovy]> ?q
closing session with Rexster [ip-10-152-185-66.ec2.internal:8184]--> done

Note that after modifying the graph we need to commit the transaction, so our changes are visible to other clients.

RexPro in Scala

Now that we have some simple data in our remote Titan database, let’s write some Scala code to query it. The first step is to add the rexster-protocol dependency to our build.sbt file:

"com.tinkerpop.rexster" % "rexster-protocol" % "2.3.0"

Now we can use the RexsterClientFactory to obtain a RexsterClient instance, and use that to send Gremlin queries to Titan:

import scala.collection.JavaConversions._ //implicit conversions between the Java and Scala collections used by execute()
import com.tinkerpop.rexster.client.RexsterClientFactory

val client = RexsterClientFactory.open("localhost", "graph")

//debug() is just a logging helper in the example app
val names: Seq[String] = client.execute("g.V.name").toSeq
debug("%d names: %s" format (names.size, names.mkString("[", ",", "]")))

val zachLikes: Seq[String] = client.execute("g.V('name',name).out('likes').name", Map("name" -> "Zach")).toSeq
debug("Zach likes %d things: %s" format (zachLikes.size, zachLikes.mkString("[", ",", "]")))

client.close()

Note that the raw Gremlin queries are defined in Strings. This may seem similar to old school JDBC but is currently the way to do things using RexPro on the JVM. Rexster does provide a way to write code using the Gremlin API directly, called Extensions. An Extension runs on the Titan/Rexster server, so it has no remote communication with Titan. The Extension is then available to client-side code via Rexster. There are also several “Object-Graph Mapper” libraries available such as Bulbs and Thunderdome that allow you to write client-side code at a higher-level than Gremlin queries in Strings. I’d really like to experiment with such an approach using Scala, and will definitely write a follow-up blog post with more details and options for client-side use of Rexster.

Also note that the Scala compiler needs a few hints as to how to handle the type returned from execute(). This is common when the Java and Scala type systems collide, and would best be encapsulated in a Scala RexsterClient adapter.
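
As a rough idea, here is what such an adapter could look like; this is just a sketch under the assumption that execute() returns a java.util.List, and is not code from the example app:

import scala.collection.JavaConverters._
import com.tinkerpop.rexster.client.RexsterClient

//hypothetical wrapper: hides the Java collections returned by execute()
//and the Scala-to-Java Map conversion for parameterized scripts
class ScalaRexsterClient(underlying: RexsterClient) {
  def query[T](gremlin: String): Seq[T] =
    underlying.execute[T](gremlin).asScala.toSeq

  def query[T](gremlin: String, params: Map[String, AnyRef]): Seq[T] =
    underlying.execute[T](gremlin, params.asJava).asScala.toSeq

  def close(): Unit = underlying.close()
}

With something like that in place, the snippets above would collapse to query[String]("g.V.name") and query[String]("g.V('name',name).out('likes').name", Map("name" -> "Zach")).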

You can run the example app and see results of the queries:

$ sbt run
[info] Set current project to rexster-titan-scala (in build file:/home/zcox/dev/rexster-titan-scala/)
[info] Running com.pongr.Main 
2013-05-14 16:54:53,293 INFO  c.t.r.client.RexsterClientFactory - Create RexsterClient instance: [hostname=localhost
graph-name=graph
port=8184
timeout-connection-ms=8000
timeout-write-ms=4000
timeout-read-ms=16000
max-async-write-queue-size=512000
message-retry-count=16
message-retry-wait-ms=50
language=groovy
graph-obj-name=g
transaction=true
channel=2]
2013-05-14 16:54:53,925 DEBUG com.pongr.Main$ - 3 names: [Zach,Scala,NOS]
2013-05-14 16:54:54,004 DEBUG com.pongr.Main$ - Zach likes 2 things: [Scala,NOS]
[success] Total time: 4 s, completed May 14, 2013 4:54:54 PM

EC2

While the above walkthrough ran everything locally, the same setup also works across two separate EC2 instances, with a few modifications to support the app and Titan on different servers:

  • Make sure the server instance has port 8184 open to the client instance
  • Titan Server: use private IP as <server-host> in titan-server-rexster.xml
  • Rexster Console: bin/rexster-console.sh -rh [internal-hostname of titan-server]
  • Scala code: Replace localhost in src/main/scala/main.scala with internal-hostname of titan-server

Next Steps

One of Titan’s major features is vertex-centric indexes, so we would definitely want to set those up, either via the Rexster Console or the RexPro client. Titan also supports external indexes, which would be very valuable to many applications.
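
For example, a graph-wide key index on the name property could be created over the same RexPro connection using the standard Blueprints call below. This is only a sketch: Titan requires key indexes to be defined before the key is first used, so it belongs in setup code against a fresh graph, and vertex-centric indexes are defined through Titan’s own type-maker API (see the Titan docs).

//sketch only: define a graph-wide key index on "name" before that key is first used
//(vertex-centric indexes use Titan's type-maker API instead)
client.execute("g.createKeyIndex('name', Vertex.class)")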

Since Titan scales out for high-availability and data size, it would be useful to know how that affects both Rexster Console as well as the RexPro client in application code.

From an operations perspective, the stock Titan Server configs need some adjustment for production use. For example, by default the data is stored in /tmp, which you would definitely want to relocate, perhaps to a mounted EBS volume. Automated, periodic backups to S3 would also be advised, as would a proper Upstart script and perhaps a Titan Server Debian package.

Conclusion

Hopefully this blog post has shown how easy it is to get started with a remote Titan server, both from an interactive shell and from application code in Scala. We’ve only scratched the surface though, so be sure to read all of the linked documentation.

Written by Zach Cox

May 15, 2013 at 5:42 pm

Posted in Scala

Hosting a Private Apt Repository on S3

with 2 comments

If you are building Debian packages for your own libraries and applications (that are not open source), you are probably familiar with the pain of managing and hosting your own private Apt repository. This post shows how to  easily manage your own Apt repo and host it on S3 so that only your servers can access it. You will not have to maintain any Apt index files, nor will you have to expose your Apt repo using a web server. While this approach works especially well when you are running everything on EC2, that is not required.

To accomplish this, we will use:

  • reprepro to manage the Apt repo
  • s3cmd to store the Apt repo in a private S3 bucket
  • apt-s3 to use that S3 bucket as a private Apt repo

Note that if you want to provide public access to your Apt repo, you can just set all of the files in S3 to have an anyone-readable ACL. Then anyone can just access the files in the S3 bucket using HTTP. If this is all you need to do, then here is a great blog post showing how to set that up.

However, when you need a private Apt repo to host your company’s important libs & apps, there is no way to have Apt provide the proper request signing for S3 security over HTTP. In this case, we need a custom Apt protocol, which is provided by apt-s3. This protocol lets you specify your AWS access & secret keys to access your private Apt repo files.

Warning: apt-s3 is pretty new and you’ll have to pick a fork on GitHub to compile yourself, but it has worked well so far in my tests. The castlabs fork seems to be the most mature at the time of this writing.

Setup

Here is a great blog post describing how to set up reprepro. We’ll just abbreviate some of the commands.

Install reprepro; it should already be in the standard Ubuntu repos. Create a new empty directory, create a conf subdir in it, then create a conf/distributions file something like this:

Codename: production
Components: main
Architectures: i386 amd64

The reprepro docs provide full detail on what should go in this file.

Install s3cmd; it should already be in the standard Ubuntu repos. Set it up by running:

s3cmd --configure
s3cmd mb s3://my-repo/

The first command will ask you for your AWS keys. The second command creates the S3 bucket. If you already have the S3 bucket then just omit this step.

Publish .deb

Next you’ll need a .deb to put in your new repo. Creating these is outside the scope of this post. Presumably your .deb will either be built by hand, or some Continuous Integration server such as Jenkins will build it for you.

Let’s assume you end up with mysuperapp_1.0_i386.deb. This is all you need to do to add it to the repo:

reprepro -b /path/to/repo includedeb production mysuperapp_1.0_i386.deb

That’s it! Look in your Apt repo dir and you’ll find that reprepro has created tons of stuff for you.

Now let’s get that repo on S3:

s3cmd --verbose --delete-removed --follow-symlinks sync /path/to/repo/ s3://my-repo/

This command ensures that your S3 bucket contains all of the same files as your local Apt repo, in the proper directory structure. The --delete-removed should be obvious. --follow-symlinks is useful if you are using suites with reprepro. Note that all of the files on S3 are private and only readable/writable by you.

Install .deb

Next you’ll need to compile apt-s3 from source to create your own binary. Make sure you compile it for the architecture on which you’ll eventually use it (i386 for 32-bit, amd64 for 64-bit, etc). I used these commands on a fresh Ubuntu EC2 instance:

sudo apt-get install -y git libapt-pkg-dev libcurl4-openssl-dev make build-essential
git clone https://github.com/castlabs/apt-s3
cd apt-s3
make

You’ll end up with an s3 executable in the src dir. Move this to the /usr/lib/apt/methods dir on the server on which you want to use your S3-based Apt repo.

Next, tell Apt where to find your repo. Add a line to /etc/apt/sources.list (or a file in sources.list.d) like this one, substituting your AWS keys and your S3 bucket name:

deb s3://ACCESS_KEY:[SECRET_KEY]@s3.amazonaws.com/BUCKET production main

Finally, you should be able to tell Apt to find your new repo and install packages from it:

sudo apt-get update
sudo apt-get install mysuperapp

Note that we’ve skipped over signing your .debs and Apt repo, so that last line may give warnings. In a full production setting you’ll probably want to sign these things, but that is left as an exercise to the reader. You’ll also probably want to just build apt-s3 once and make it easily accessible to put on a new server.

And that’s it! The next time you have a new .deb package, just run the reprepro command above to add it to the local repo, then run s3cmd sync as above to update the files in the S3 bucket. If Jenkins is building your .deb files, just have it perform these commands as well. Then on your servers, you can apt-get update and apt-get install mysuperapp to upgrade to the new version.

Written by Zach Cox

August 13, 2012 at 5:40 pm

Posted in Devops

The Power of Programmatic Builds

leave a comment »

Using a build tool that provides the full power of a real programming language makes many tasks much easier than they would be with build tools that use a declarative markup language. To cut to the chase, I’m referring to sbt vs Maven (or Ant). Just wanted to write a quick post that shows how we have customized our sbt builds in two important ways: using different versions of dependencies based on which version of Scala we’re using, and publishing to different snapshot and release repositories depending on the project’s current version.

In the land of Scala, it’s common to publish library jars compiled with multiple versions of Scala (long story). So sbt supports cross-building. In this project’s build.properties file, we specify to compile it with both Scala 2.9.1 and 2.8.1:

build.scala.versions=2.9.1 2.8.1

That’s great and all, but one of the dependencies we want to use, Specs, only goes up to version 1.6.8 with Scala 2.8.1, then switches to version 1.6.9 for Scala 2.9.1. Might seem like a problem, but because we define our build in Scala, we just throw in an if-statement:

class Project(info: ProjectInfo) extends DefaultProject(info) {
  //specs version depends on Scala version
  val specsVersion = if ("2.9.1" == buildScalaVersion) "1.6.9" else "1.6.8"
  val specs = "org.scala-tools.testing" %% "specs" % specsVersion % "test"
}

Run +update on that and you’ll get Specs 1.6.8 for Scala 2.8.1 and Specs 1.6.9 for Scala 2.9.1.

It’s standard practice to use separate Maven repositories for snapshot versions and release versions. Case in point: scala-tools snapshot & release repos. When your project is at version 0.9-SNAPSHOT you should publish its jars to the snapshot repo. Then when version 0.9 is released, publish to the release repo.

Again, because we’re defining the build in Scala, we just do some programming to set this up exactly how we want it:

class Project(info: ProjectInfo) extends DefaultProject(info) {
  //publish destination depends on build version
  override def managedStyle = ManagedStyle.Maven
  def suffix = if (version.toString endsWith "-SNAPSHOT") "snapshots/" else "releases/"
  val publishTo = "Scala Tools Nexus" at "http://nexus.scala-tools.org/content/repositories/" + suffix
  Credentials(Path.userHome / ".ivy2" / ".scala_tools_credentials", log)
}

Pretty simple examples, but the point is that we’re not locked in to the XML elements & attributes that some Maven plugin author exposed to us. With sbt we can do basically whatever we need to in the build file to customize our build to meet project requirements.

Written by Zach Cox

October 24, 2011 at 11:12 am

Posted in Scala

Database Actors in Lift

with 2 comments

We’ve been slowly migrating our Lift web app to more of an event-driven architecture. This approach offloads non-essential processing out of the HTTP request/response cycle into actors, and makes things a lot more flexible on the back-end. However, as we’ve discovered along the way, there are several things to be aware of when doing database processing in actors. In this post we’ll examine those problems and present our current solution, which is also demonstrated in an example Lift app and was recently discussed on the Lift mailing list.

Background

The app is very simple: users say things and these quotes show up on the home page. Other users can then “like” the quotes. The most-liked quotes are also shown on the home page.

Let’s take a look at the Like link in more detail. When you click it, a function in a snippet runs on the server-side.

This function toggles the like status between a Quote and a User. So after you like a quote, the link changes to Unlike so you can take back your like.

    def likeLink(q: Quote): NodeSeq = a(likeLinkText(q), "id" -> likeLinkId(q)) { 
      for (u <- User.currentUser)
        u toggle q
      Replace(likeLinkId(q), likeLink(q))
    }

This toggling makes its way to the QuoteLike meta mapper, to either the like or unlike method. In both cases, after the action is processed we send a message to the QuotePopularity actor.

object QuoteLike extends QuoteLike with LongKeyedMetaMapper[QuoteLike] with Logger {
...
  def like(u: User, q: Quote) = 
    if (!exists_?(u, q)) {
      val ql = QuoteLike.create.user(u).quote(q).saveMe
      debug("User " + u.id.is + " liked Quote " + q.id.is)
      QuotePopularity !<> q
      Full(ql)
    } else 
      Empty

  def unlike(u: User, q: Quote) = 
    for (ql <- find(u, q)) {
      ql.delete_!
      debug("User " + u.id.is + " unliked Quote " + q.id.is)
      QuotePopularity !<> q
    }

  def toggle(u: User, q: Quote) = if (exists_?(u, q)) unlike(u, q) else like(u, q)
}

This actor updates that quote’s popularity. The Popular Quotes section on the home page renders the top 10 quotes by descending popularity.

object QuotePopularity extends DbActor with Logger {
  protected def messageHandler = {
    case q: Quote => 
      val p = q.likeCount
      q.popularity(p).save
      debug("Quote " + q.id.is + " popularity = " + p)
  }
}

While we could easily have done this update in the QuoteLike like and unlike methods, we chose to offload this processing to an actor, since it is not essential to the AJAX response after the user clicks the Like/Unlike link. It’s a simple calculation in this example app, but imagine a more complex app where a lot of number-crunching must take place to determine trending items (*cough*Pongr*ahem*). We don’t want the AJAX response delayed, so we let an actor update the popularity sometime later. And the popularity of quotes is then updated & cached for the next home page load.

Problem

While this is a very common use case for actors (asynchronous “later” processing), what’s not immediately obvious is the dreaded database transaction. It’s standard in Lift to wrap every HTTP request inside a transaction. This is configured in Boot:

S.addAround(DB.buildLoanWrapper)

So at the end of our AJAX HTTP req/resp cycle due to the user clicking the Like/Unlike link, the new (or deleted) QuoteLike object is committed to the database, and can be read by other parts of our app. So far, so good.

However, by default, Lift actors are not wrapped in database transactions. So as soon as you send that message to the QuotePopularity actor, it may start updating the quote’s popularity. We have no guarantees as to when that actor will execute; it may run immediately, in which case it won’t see the new/deleted QuoteLike, or it may happen to run after the QuoteLike is committed.

Another problem occurs if this actor makes changes to the database itself. Since it’s executing outside of a transaction, these changes are committed immediately, leaving some partially updated entities open to discovery by other parts of the app. Definitely not something we want to happen.

Solution

Our approach to solving this problem is the following simple DbActor trait:

trait DbActor extends LiftActor {
  override protected def aroundLoans = List(DB.buildLoanWrapper)

  def !<>(msg: Any): Unit = DB.performPostCommit { this ! msg }
}

We now follow these two best practices for actors that use the database:

  1. Extend DbActor instead of LiftActor
  2. Only send messages to the actor using the !<> method

So #1 ensures that this actor’s messageHandler method is executed inside a transaction. LiftActor has this awesome aroundLoans method, where we can simply wrap the actor in a DB.buildLoanWrapper (just like HTTP requests). Database changes made by the actor will now all be committed when the actor is finished. Our QuotePopularity actor above extends DbActor. Actors executing inside database transactions: check.

The !<> method in #2 ensures that this actor will only execute after the current database transaction commits. Again, Lift comes to the rescue with the DB.performPostCommit method, which does exactly what we want. So the QuotePopularity actor above will only execute after the AJAX HTTP request that creates/deletes a QuoteLike has been committed to the database. This way, we know for sure that the QuotePopularity actor will see the QuoteLike changes. Actors executing after the current transaction commits: check.

Conclusion

So just remember: if your Lift actor does anything with the database, follow the two best practices above. Wrap actor execution in a transaction, and send messages to the actor so that the actor executes after the current transaction commits.

It’s worth pointing out that we’re hard-coding the notification of the QuotePopularity actor above in the like and unlike methods. This is OK, but a better solution would be a generic pub-sub event system, where those methods would publish “quote liked/unliked” events, and QuotePopularity would just subscribe to those events. Similarly, QuotePopularity could publish a “quote popularity updated” event when it’s done, and something else like a comet actor could receive that event and update the Popular Quotes section of the home page. But that’s a topic for another blog post…

Written by Zach Cox

January 18, 2011 at 5:42 pm

Posted in Scala

Lift + Ostrich

with 6 comments

We’ve been doing some profiling on http://pongr.com recently and have started using Ostrich.  Our site uses Lift and I thought I’d put together a brief tutorial showing how to use Ostrich in a Lift-based web app.  Our approach, heavily inspired by the usage of Ostrich in ESME, is described below and is available on GitHub.

In this example we’ll use Lift 2.1 and Ostrich 2.2.10.  Update: As Steve commented, there is a newer release of Ostrich (2.3.3 as of this post) but it requires Scala 2.8.1, so to use it you would also need to use Lift 2.2.  You’d also need to add the Twitter maven repo to the sbt project.
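
For the curious, the sbt project changes for that newer release would look roughly like this; the repository URL and artifact name are assumptions, so check the Twitter repo for the exact coordinates:

//assumed coordinates for Ostrich 2.3.3 on Scala 2.8.1 / Lift 2.2
val twitterRepo = "twitter" at "http://maven.twttr.com"
val newerOstrich = "com.twitter" % "ostrich_2.8.1" % "2.3.3" //would replace the ostrich line in the Setup below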

Setup

First off, we need to add the JBoss maven repo and the Ostrich dependency to the sbt project:


val jbossRepo = "jboss" at "http://repository.jboss.org/nexus/content/groups/public/"

override def libraryDependencies = Set(
  "net.liftweb" %% "lift-webkit" % liftVersion % "compile->default",
  "net.liftweb" %% "lift-mapper" % liftVersion % "compile->default",
  "org.mortbay.jetty" % "jetty" % "6.1.22" % "test->default",
  "junit" % "junit" % "4.5" % "test->default",
  "org.scala-tools.testing" %% "specs" % "1.6.5" % "test->default",
  "com.h2database" % "h2" % "1.2.138",
  "com.twitter" % "ostrich_2.8.0" % "2.2.10"
  ) ++ super.libraryDependencies

Ostrich will provide stats via HTTP, so we need to define the port it will use in default.props:


#Enable and set port for Ostrich web access (JSON output)
admin_http_port=9990

Ostrich needs to be started when the web app boots up:


// Ostrich setup
val runtime = new RuntimeEnvironment(getClass)
val config = new Config
config("admin_http_port") = Props.getInt("admin_http_port") openOr 9990
ServiceTracker.startAdmin(config, runtime)

As far as configuration goes, that’s it!  Ostrich is now ready to collect data and provide stats.

Counters

The first type of data we’ll collect is the count of something.  Could be number of users registered, number of new blog posts, etc.  For this example we’ll count each time the HelloWorld.howdy snippet is rendered:


def howdy(in: NodeSeq): NodeSeq = {
  Stats.incr("howdy-renders")
  Helpers.bind("b", in, "time" -> date.map(d => Text(d.toString)))
}

Gauges

A gauge is just some value of something at a particular point in time, perhaps a measurement from a sensor or some calculated value.  In our example we’ll use WTFs/min:


import scala.math._
Stats.makeGauge("wtfs-per-min") { rint(random * 10) }

We set up this particular gauge in Boot.scala and just generate random data.  You just name a gauge and provide a function that always returns the current value.

Timings

Ostrich can also record how long it takes to execute certain blocks of code.  Simply wrap the code you want to time in a call to Stats.time.  For extra convenience, it will return whatever your code block returns.

In this example, we suspect our important number calculation is slowing down page loads, so we’ll time it:


def importantNumber(in: NodeSeq): NodeSeq = {
  val n = Stats.time("important-number-calculation") {
    import scala.math._
    Thread.sleep(round(2000*random + 1000))
    7
  }
  <span>{n}</span>
}

Stats

Now that we’re collecting all three types of data, let’s pull some stats from Ostrich.  After loading our home page at http://localhost:8080 several times, we can get the latest stats from http://localhost:9990/stats.

{"counters":{"howdy-renders":7},"timings":{"important-number-calculation":{"count":7,"standard_deviation":621,"p75":3405,"histogram":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"maximum":2878,"p9999":3405,"p90":3405,"p25":2015,"p99":3405,"average":2179,"minimum":1447,"p50":2619,"p999":3405}},"jvm":{"nonheap_committed":46530560,"heap_max":1908932608,"thread_peak_count":54,"heap_committed":120782848,"uptime":436119,"nonheap_max":224395264,"thread_daemon_count":12,"num_cpus":8,"thread_count":54,"nonheap_used":45256464,"start_time":1292277570344,"heap_used":9656904},"gauges":{"wtfs-per-min":5.0}}

By default, Ostrich returns stats in JSON format which is great for parsing for display in another app or viewing with JSONView.  However, for this blog post perhaps the plaintext version at http://localhost:9990/stats.txt is more readable:

counters:
  howdy-renders: 7
gauges:
  wtfs-per-min: 2.0
jvm:
  heap_committed: 120782848
  heap_max: 1908932608
  heap_used: 8911680
  nonheap_committed: 46530560
  nonheap_max: 224395264
  nonheap_used: 45252880
  num_cpus: 8
  start_time: 1292277570344
  thread_count: 54
  thread_daemon_count: 12
  thread_peak_count: 54
  uptime: 430468
timings:
  important-number-calculation: (average=2179, count=7, maximum=2878, minimum=1447, p25=2015, p50=2619, p75=3405, p90=3405, p99=3405, p999=3405, p9999=3405, standard_deviation=621)

We can see that the howdy snippet was rendered 7 times and we’re currently seeing 2 WTFs/min.  We also have collected 7 timings for the important number calculation and we see various stats like min/max/avg, etc.  Now we know precisely how much time is being spent calculating important numbers, and we can choose to optimize if needed.

While not as detailed or comprehensive as a profiling tool like VisualVM, Ostrich is a great, simple tool for collecting performance data in specific parts of your Scala app.  And integration with Lift really could not be easier.

Written by Zach Cox

December 13, 2010 at 6:27 pm

Posted in Scala

Jersey + Guice + Scala

with 6 comments

At Pongr, our RESTful web services are built using Jersey, Guice and Scala (among many other technologies). Here’s a quick post that shows how to set up an example project that uses all three. By the end we’ll be able to declare Guice bindings in our own custom module and have them injected into Jersey resource classes.

We’ll manage everything with Maven, so first create your pom.xml. I know it looks like a lot, but it only has three Jersey dependencies (and one on the Servlet 2.5 API to make things compile). The rest of this junk is mainly configuring plugins for Scala – don’t worry about it. The upshot is that you can a) run this all using mvn jetty:run, and b) generate an Eclipse project using mvn eclipse:eclipse.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.pongr</groupId>
    <artifactId>jerseyguicescala</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>jerseyguicescala</name>
    <properties>
        <jersey-version>1.1.1-ea</jersey-version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.sun.jersey</groupId>
            <artifactId>jersey-server</artifactId>
            <version>${jersey-version}</version>
        </dependency>
        <dependency>
            <groupId>com.sun.jersey.contribs</groupId>
            <artifactId>jersey-guice</artifactId>
            <version>${jersey-version}</version>
        </dependency>
        <dependency>
            <groupId>com.sun.jersey.contribs</groupId>
            <artifactId>jersey-scala</artifactId>
            <version>${jersey-version}</version>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
            <version>2.5</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- Run with "mvn jetty:run" -->
            <plugin>
                <groupId>org.mortbay.jetty</groupId>
                <artifactId>maven-jetty-plugin</artifactId>
                <version>6.1.19</version>
                <configuration>
                    <contextPath>/</contextPath>
                </configuration>
            </plugin>
            <!-- Begin scala plugins, inspired by: http://scala-tools.org/mvnsites/maven-scala-plugin/usage_java.html -->
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-eclipse-plugin</artifactId>
                <configuration>
                    <sourceIncludes>
                        <sourceInclude>**/*.scala</sourceInclude>
                    </sourceIncludes>
                    <buildcommands>
                        <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
                    </buildcommands>
                    <additionalProjectnatures>
                        <!-- This nature gets put after org.eclipse.jdt.core.javanature in .project so Eclipse has a J badge on the project instead of an S -->
                        <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
                    </additionalProjectnatures>
                    <classpathContainers>
                        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                        <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
                    </classpathContainers>
                </configuration>
            </plugin>
            <!-- http://groups.google.com/group/liftweb/browse_thread/thread/3dac7002f9e59546/3918bba2f7a92cd3?pli=1 -->
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>build-helper-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <id>add-source</id>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>add-source</goal>
                        </goals>
                        <configuration>
                            <sources>
                                <source>src/main/scala</source>
                            </sources>
                        </configuration>
                    </execution>
                    <execution>
                        <id>add-test-source</id>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>add-test-source</goal>
                        </goals>
                        <configuration>
                            <sources>
                                <source>src/test/scala</source>
                            </sources>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <!-- End scala plugins -->
        </plugins>
    </build>
</project>

Next up, create the src/main/webapp/WEB-INF/web.xml file. This just registers our special GuiceConfig class that, uh, configures Guice.

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/j2ee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_5.xsd" metadata-complete="false" version="2.5">
    <listener>
        <listener-class>com.pongr.GuiceConfig</listener-class>
    </listener>
    <filter>
        <filter-name>guiceFilter</filter-name>
        <filter-class>com.google.inject.servlet.GuiceFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>guiceFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
</web-app>

We’ll create an example Guice module in the src/main/java/com/pongr/ExampleModule.java file. It just binds a message String that we’ll inject into our resource classes later.

package com.pongr;

import com.google.inject.*;
import com.google.inject.name.*;

public class ExampleModule extends AbstractModule
{
    @Override
    protected void configure()
    {
        bindConstant().annotatedWith(Names.named("message")).to("Hello, World!");
    }
}

Next up is src/main/java/com/pongr/GuiceConfig.java where we connect Jersey to Guice. This is where we create the Guice Injector using our ExampleModule and configure any Jersey properties, like the package(s) that contain our resource classes.

package com.pongr;

import java.util.*;

import com.google.inject.*;
import com.google.inject.servlet.*;
import com.sun.jersey.api.core.*;
import com.sun.jersey.guice.spi.container.servlet.*;

public class GuiceConfig extends GuiceServletContextListener
{
    @Override
    protected Injector getInjector()
    {
        final Map<String, String> params = new HashMap<String, String>();
        params.put(PackagesResourceConfig.PROPERTY_PACKAGES, "com.pongr");
        return Guice.createInjector(new ExampleModule(), new ServletModule()
        {
            @Override
            protected void configureServlets()
            {
                serve("/*").with(GuiceContainer.class, params);
            }
        });
    }
}

Now for the fun! This src/main/java/com/pongr/JavaResource.java file gets the message String injected and displays it.

package com.pongr;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

import com.google.inject.*;
import com.google.inject.name.*;

@Path("java")
public class JavaResource
{
    private String message;

    @Inject
    public JavaResource(@Named("message") String message)
    {
        this.message = message;
    }

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String get()
    {
        return "From Java: " + message;
    }
}

And for good measure here’s src/main/scala/com/pongr/ScalaResource.scala. Same as JavaResource, except in Scala. The Guice @Inject and @Named annotations can be a bit tricky for constructor injection, so here’s how it’s done.

package com.pongr

import javax.ws.rs._
import javax.ws.rs.core._

import com.google.inject._
import com.google.inject.name._

@Path("scala")
class ScalaResource @Inject() (@Named("message") message: String) {
  @GET
  @Produces(Array(MediaType.TEXT_PLAIN))
  def get(): String = "From Scala: " + message
}

That’s it! Now just run mvn jetty:run on your command line and hit http://localhost:8080/java and http://localhost:8080/scala in your browser to see these resources in action. As always, you can also visit http://localhost:8080/application.wadl to get an overview of the available resources. Enjoy!

Written by Zach Cox

September 22, 2009 at 10:03 pm

Posted in Java, Scala

Dynamically Generating Zip Files in Jersey

with one comment

We often need to pull a large number of rows from a database table, split those rows up into n groups, and write each group out to a separate text file.  These text files are then processed by another application.  Each text file starts with the number of rows in the file on the first line, and then contains each row on its own line after that.  It became a pain to generate these files by hand, so I added a new resource to our Jersey-based web service that would generate all of the files and wrap them all up into a single .zip file.  The processing app also used to be run by hand, but it’s now totally automated, so it obtains the .zip file, unzips it, and processes each individual file.

Here is the resource we ended up with, except for this blog I’m just generating a list of 100 random strings instead of pulling rows from a real database:

import java.io.*;
import java.util.*;
import java.util.zip.*;

import javax.ws.rs.*;

@Path("/files")
public class FilesResource
{
    @GET
    @Produces("application/x-zip-compressed")
    public InputStream getZipFile(@QueryParam("per_file") @DefaultValue("25") final int perFile)
            throws IOException
    {
        //we write to the PipedOutputStream
        //that data is then available in the PipedInputStream which we return
        final PipedOutputStream sink = new PipedOutputStream();
        PipedInputStream source = new PipedInputStream(sink);

        //apparently we need to write to the PipedOutputStream in a separate thread
        Runnable runnable = new Runnable()
        {
            public void run()
            {
                //PrintStream => BufferedOutputStream => ZipOutputStream => PipedOutputStream
                ZipOutputStream zip = new ZipOutputStream(sink);
                PrintStream writer = new PrintStream(new BufferedOutputStream(zip));

                try
                {
                    //break the strings up into multiple files
                    List<String> strings = getStrings();
                    int stringCount = strings.size();
                    int fileCount = (int) Math.ceil((double) stringCount / (double) perFile);
                    for (int file = 0; file < fileCount; file++)
                    {
                        zip.putNextEntry(new ZipEntry("file" + (file + 1) + ".txt"));
                        int first = file * perFile;
                        int last = Math.min((file + 1) * perFile, stringCount);
                        int imagesInFile = last - first;
                        writer.println(imagesInFile);
                        for (int i = first; i < last; i++)
                            writer.println(strings.get(i));
                        writer.flush();
                        zip.closeEntry();
                    }

                    //also include a single file with all strings
                    //start the zip entry before writing its contents
                    zip.putNextEntry(new ZipEntry("file.txt"));
                    writer.println(stringCount);
                    for (int i = 0; i < stringCount; i++)
                        writer.println(strings.get(i));
                    writer.flush();
                    zip.closeEntry();
                }
                catch (IOException e)
                {
                    //ignored: this typically means the reading side closed the pipe early
                }
                writer.flush();
                writer.close();
            }
        };
        Thread writerThread = new Thread(runnable, "FileGenerator");
        writerThread.start();

        return source;
    }

    private List<String> getStrings()
    {
        List<String> strings = new ArrayList<String>();
        for (int i = 0; i < 100; i++)
            strings.add(String.valueOf(Math.random()));
        return strings;
    }
}

The getZipFile() method will be called in response to a GET /files request.  I also registered the .zip URI extension so it will also respond to GET /files.zip.  By default it will split up the strings into groups of 25, but the per_file query parameter can be specified in the request to change that, like GET /files.zip?per_file=50 to split them into groups of 50.

The easiest way to create a .zip file in Java is using ZipOutputStream.  Once you have that created you call its putNextEntry() method to start a new file within the .zip file.  We also wrapped the ZipOutputStream in a PrintStream, and write the contents of the text files by calling println() on the PrintStream (there’s also a BufferedOutputStream in there for good measure).

Jersey is smart enough to read the data from an InputStream and use it as the HTTP response body, so ultimately we need to return an InputStream.  By hooking the ZipOutputStream up to a PipedOutputStream, and then connecting the PipedOutputStream to a PipedInputStream, Jersey can read all of this zip file data from the PipedInputStream.  It’s best to write-to and read-from these piped streams in different threads, so we start a new writing thread and return the PipedInputStream right away.
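
Boiled down, the pattern looks like this; a stripped-down Scala sketch of the same idea (not the production resource above), with hypothetical names:

import java.io.{PipedInputStream, PipedOutputStream, PrintStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

//sketch: write the zip on a separate thread and hand the readable end back immediately
def zipOf(lines: Seq[String]): PipedInputStream = {
  val sink = new PipedOutputStream
  val source = new PipedInputStream(sink)
  new Thread(new Runnable {
    def run() {
      val zip = new ZipOutputStream(sink)
      val out = new PrintStream(zip)
      zip.putNextEntry(new ZipEntry("file.txt"))
      out.println(lines.size)                      //first line: number of rows
      lines foreach { line => out.println(line) }  //then one row per line
      out.flush()
      zip.closeEntry()
      out.close()                                  //closes the zip and the underlying pipe
    }
  }, "FileGenerator").start()
  source
}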

This ends up working perfectly: the resource at http://site.com/files.zip feels like a static zip file, but is really generated dynamically on every request.  It can also now be accessed over HTTP from any machine and processed automatically.  If it was really expensive to generate we could cache it and re-generate only when one of the rows in the database changed, but for our purposes it’s a cheap .zip file to generate.

Written by Zach Cox

August 26, 2009 at 7:43 am

Posted in Java
