CPU Resources in Docker, Mesos and Marathon

We’ve been learning a lot about Docker, Mesos and Marathon lately at Banno and have big plans to use these technologies in our infrastructure. Mesos will let us treat all of our servers as one uniform pool of resources, on which we’ll run our applications packaged into convenient Docker containers, with Marathon figuring out the details of running these applications. Many of our applications are “reactive”, built on tools like Akka and Spray, and are thus inherently multi-threaded, able to spread computations across many CPUs concurrently to speed up certain operations.

When you tell Marathon to create an application for you on the Mesos cluster, you include a cpus parameter which is typically a number like 0.1 or 3. I was very curious what this parameter controlled exactly, but was unable to find it documented anywhere. It seems quite important, especially since our reactive apps will use as many CPUs as they are given. In this blog post, we’ll dig down into this cpus parameter and learn about CPU resources in Docker, Mesos and Marathon.

Note: these are very fast-moving projects and the information in this post may become outdated rather quickly. It should be accurate as of Docker 1.2.0, Mesos 0.20.0 and Marathon 0.7.0.

Our goal will be to answer the following questions:

  • What does Marathon’s cpus setting actually mean? What does 0.1 cpus, or 2 cpus, mean?
  • How many CPUs does a process running in a Docker container on a Mesos slave think it has?
  • How does this relate to total CPUs on the Mesos slave?
  • How do processes running in separate Docker containers on the same Mesos slave interact/interfere/share the machine’s CPUs?

First off, we need a Mesos cluster running Marathon. My colleague Nic Grayson created a great project that will run a local Vagrant box with Zookeeper, Mesos master, Mesos slave, Marathon and Docker on it, so that’s what I will use in this post. Currently it’s a private project, but hopefully we can open source it in the future. Mesosphere also provides convenient tools to set up similar clusters on AWS and Google Cloud Platform.

I’m running all of this on a quad-core MacBook Pro, and I gave the Vagrant box all 8 logical CPUs and 4GB of memory. As a baseline, let’s check out the resources on our Mesos slave machine. Here’s what the Mesos web UI at http://192.168.22.22:5050 shows for our cluster resources:

[Screenshot: cluster resources in the Mesos web UI]

If I ssh in to this VM and run htop, we see 8 CPUs:

[Screenshot: htop on the Mesos slave showing 8 CPUs]

Also /proc/cpuinfo shows 8 CPUs:

vagrant@all-in-one-1404:~$ grep processor /proc/cpuinfo
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7

So our Mesos slave machine has 8 CPUs.

Now let’s get a Docker container running in Mesos, then get a shell inside that container and poke around. The following curl will create an app in Marathon using an Ubuntu Docker image that just echoes “hello world” forever:

curl -X POST -H "Content-Type: application/json" http://192.168.22.22:8080/v2/apps -d@helloworld.json

Here are the contents of helloworld.json. Note that we request 0.1 cpus for this application.

{
    "id": "helloworld",
    "container": {
        "docker": {
            "image": "ubuntu:14.04"
        },
        "type": "DOCKER",
        "volumes": []
    },
    "cmd": "while true; do echo hello world; sleep 1; done",
    "cpus": 0.1,
    "mem": 32.0,
    "instances": 1
}
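For readers who would rather script this than use curl, here is a minimal Python sketch of the same request. It assumes the Marathon address of the Vagrant box used in this post; the helper name build_marathon_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Address of Marathon on the Vagrant box used in this post.
MARATHON_APPS_URL = "http://192.168.22.22:8080/v2/apps"


def build_marathon_request(app_definition, url=MARATHON_APPS_URL):
    """Build (but do not send) the POST that creates a Marathon app.

    Hypothetical helper for illustration; it mirrors the curl command above.
    """
    body = json.dumps(app_definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same content as helloworld.json.
helloworld = {
    "id": "helloworld",
    "container": {
        "docker": {"image": "ubuntu:14.04"},
        "type": "DOCKER",
        "volumes": [],
    },
    "cmd": "while true; do echo hello world; sleep 1; done",
    "cpus": 0.1,
    "mem": 32.0,
    "instances": 1,
}

request = build_marathon_request(helloworld)
# To actually submit it against a running cluster:
# urllib.request.urlopen(request)
```

Keeping the app definition as a plain dictionary makes it easy to vary cpus between experiments without editing a JSON file by hand.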

It may take a few minutes to pull that ubuntu:14.04 Docker image, but eventually Mesos will run the Docker container. You can see it by running sudo docker ps on the Mesos slave.

To get a shell inside this container, we’ll use the excellent nsenter tool, via its docker-enter helper script. Then we’ll examine the CPU resources available to the container:

vagrant@all-in-one-1404:~$ sudo docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
8c67cd81d13a        ubuntu:14.04        "/bin/sh -c 'while t   3 minutes ago       Up 3 minutes                            mesos-5e500b93-48b1-4e1f-b87c-2f84adc4b46e
vagrant@all-in-one-1404:~$ sudo docker-enter 8c67cd81d13a bash
root@all-in-one-1404:/# grep processor /proc/cpuinfo
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7

So far, all evidence suggests that this container has access to all 8 of the Mesos slave’s CPUs, even though we only requested 0.1.

Next let’s go a step further and see what an application running in the JVM sees for CPUs. I wrote a simple Scala application that just repeatedly prints the number of available processors:

package com.banno.cpucount

object Main extends App {
  while (true) {
    println(s"${Runtime.getRuntime.availableProcessors} available processors")
    Thread.sleep(1000)
  }
}

Using the banno-sbt-plugin’s Docker support, it’s very simple to package this app into a Docker image and push it to our private Docker registry: just run docker and dockerPush in sbt. Then we tell Marathon to run this application on our Mesos cluster, using the following JSON:

{
    "id": "cpucount",
    "container": {
        "docker": {
            "image": "registry.banno-internal.com/cpu-count:1-SNAPSHOT"
        },
        "type": "DOCKER",
        "volumes": []
    },
    "cpus": 0.1,
    "mem": 32.0,
    "instances": 1
}

Again we’re only requesting 0.1 cpus, but once this app runs and we look at its stdout, we see it also has access to all 8 CPUs:

Registered executor on all-in-one-1404.vagrantup.com
Starting task cpucount.a4fe3968-3ebe-11e4-9944-56847afe9799
/bin/sh -c exit `docker wait mesos-9dfe2385-e7ab-44ae-9637-98680d8727a1` 
Forked command at 1976
8 available processors
8 available processors
8 available processors
8 available processors
8 available processors
8 available processors

To summarize so far, a process in a Docker container running on a Mesos slave appears to have access to all CPUs of that slave machine, regardless of the cpus parameter we submit to Marathon when creating the application. This seems great for reactive apps, as they can spread computations across multiple CPUs. So what does this Marathon/Mesos cpus parameter do exactly?
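The same check is easy to repeat outside the JVM. The following is a rough Python sketch, not something that was run as part of this experiment; the function name visible_cpu_count is made up for illustration. Inside the container above it would be expected to report 8 as well, since nothing so far has hidden any CPUs from the process.

```python
import os


def visible_cpu_count():
    """Return how many CPUs this process can see and is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of CPUs this process may be scheduled on.
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the machine's CPU count.
    return os.cpu_count() or 1


print(f"{visible_cpu_count()} available processors")
```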

Let’s take a look at how Mesos actually runs a Docker container. Mesos builds up a docker run command, converting the cpus value into a value for Docker’s --cpu-shares setting, which according to the Docker documentation is just a priority weight for that container relative to all others on the machine. So this cpus parameter is a relative weight that the OS uses when scheduling processes onto the CPUs. An application run with cpus=2 should receive twice the priority of one using cpus=1, but that weighting only matters when containers are actually competing for CPU time.
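To make the conversion concrete, here is a small Python sketch of how a cpus value could map to a --cpu-shares weight. The constants are assumptions rather than something verified against the Mesos source: 1024 is Docker’s default share value and is assumed here to correspond to one full CPU, and 2 is the smallest cpu.shares value the Linux kernel accepts (Mesos may use a different floor). The function name cpus_to_cpu_shares is hypothetical.

```python
# Assumption: one full CPU maps to Docker's default weight of 1024.
SHARES_PER_CPU = 1024
# Smallest cpu.shares value the Linux kernel accepts; Mesos' own floor may differ.
MIN_SHARES = 2


def cpus_to_cpu_shares(cpus, shares_per_cpu=SHARES_PER_CPU, min_shares=MIN_SHARES):
    """Convert a Marathon/Mesos cpus value into a relative cpu-shares weight."""
    return max(int(cpus * shares_per_cpu), min_shares)


for cpus in (0.1, 1, 2):
    print(f"cpus={cpus} -> --cpu-shares={cpus_to_cpu_shares(cpus)}")
```

Under these assumptions only the ratio between two containers’ values matters: cpus=2 yields exactly twice the weight of cpus=1, whatever the absolute numbers are.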

Another thing to note is the Mesos cluster’s resource state while our application is running:

[Screenshot: Mesos web UI resource totals while the application is running]

There are a total of 8 CPUs but our app is using 0.1 of them, so 7.9 CPUs are left available. Mesos will only allow a task to run on a slave if that slave has enough CPU capacity left to accommodate that task’s requested CPU value. The remaining capacity starts off at the total number of CPUs on the machine and decreases by the amount requested by each task assigned to it. This is another effect that the cpus parameter has: it specifies the CPU capacity used up by the application.
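The bookkeeping described above can be modeled in a few lines of Python. This is only a toy model of the behavior observed in the web UI, not Mesos’ actual allocator; the class name SlaveCpuCapacity and its methods are invented for illustration. Capacity is tracked in thousandths of a CPU to avoid floating-point drift.

```python
class SlaveCpuCapacity:
    """Toy model of a slave's remaining CPU capacity (not the real Mesos allocator)."""

    def __init__(self, total_cpus):
        # Work in milli-CPUs so repeated subtraction stays exact.
        self.total_millicpus = round(total_cpus * 1000)
        self.used_millicpus = 0

    @property
    def available_cpus(self):
        return (self.total_millicpus - self.used_millicpus) / 1000

    def try_launch(self, requested_cpus):
        """Reserve capacity for a task; return False if it does not fit."""
        requested = round(requested_cpus * 1000)
        if requested > self.total_millicpus - self.used_millicpus:
            return False
        self.used_millicpus += requested
        return True


slave = SlaveCpuCapacity(total_cpus=8)
slave.try_launch(0.1)        # the cpucount app from above
print(slave.available_cpus)  # remaining capacity after one 0.1-cpus task
```

In this model a second task asking for cpus=8 would be refused, even though the first task may be nearly idle, because only the requested amounts are counted.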

Note that this cpus parameter is not a direct limit on the number of CPUs available to the Docker container, nor a limit on the speed of those CPUs. We already saw that the Docker container could access all 8 CPUs, so when we request cpus=0.1 Mesos is not giving our Docker container 1 of those CPUs, or 0.1 of one; the container can run on all 8. Mesos just seems to keep track of total CPU resources, and subtract from that capacity however many CPUs your application says it needs.

Let’s go back to our original list of questions and fill in the answers:

  • Marathon’s cpus setting is both a relative weight for scheduling all Docker containers across all of the Mesos slave’s CPUs and an amount of the Mesos slave’s available CPU capacity to use up
  • A process running in a Docker container on a Mesos slave thinks it has the same number of CPUs as the underlying machine
  • The OS should give relative weight to the Docker containers running on a Mesos slave according to their cpus values

Given what we’ve discovered, cpus seems like a bit of a vague, or even misleading, name for this parameter. Maybe cpu-capacity or cpu-weight would be more descriptive of what it actually does?

Now that we know a bit more about CPU resources in Docker, Mesos and Marathon, and the effects of this cpus parameter specifically, we can make more informed choices for its value when creating applications in Marathon. If we always choose a low value like 0.1 we risk over-allocating tasks on the Mesos slaves: we could end up running too many processes, and each process won’t get enough CPU time. If we always choose high values like 5 or 10 we risk under-allocating the Mesos slaves, leaving expensive CPUs sitting idle, or worse, we may not have a single Mesos slave in the cluster with enough available CPU capacity to run our tasks.
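A quick back-of-the-envelope calculation illustrates both extremes on the 8-CPU slave used in this post. The Python sketch below is illustrative only; pack_tasks is a made-up helper that counts how many identical tasks fit by capacity accounting alone and how much capacity is left unclaimed.

```python
def pack_tasks(total_cpus, cpus_per_task):
    """Return (number of tasks that fit, unclaimed CPU capacity) for one slave."""
    if cpus_per_task <= 0:
        raise ValueError("cpus_per_task must be positive")
    # Work in milli-CPUs so values like 0.1 divide evenly.
    total = round(total_cpus * 1000)
    per_task = round(cpus_per_task * 1000)
    count = total // per_task
    unclaimed = (total - count * per_task) / 1000
    return count, unclaimed


for cpus in (0.1, 5, 10):
    count, unclaimed = pack_tasks(8, cpus)
    print(f"cpus={cpus}: {count} task(s) fit, {unclaimed} CPUs unclaimed")
```

With cpus=0.1 the slave accepts 80 tasks, which is fine if they are mostly idle but leaves each one very little CPU time if they all become busy at once. With cpus=5 only one task fits and 3 CPUs’ worth of capacity goes unclaimed, and with cpus=10 nothing fits at all.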

Hopefully this has been an informative blog post for you. If I’ve missed any details or made any mistakes, please let me know in the comments!

Update 2014-09-29: Christos Kozyrakis from Mesosphere was kind enough to provide some clarification on the information above, which I will paraphrase:

  • The CPU behavior described above all stems from Mesos’ current isolator, which uses cgroups
  • Once there are multiple Docker containers running on a Mesos slave, most Linux distros will use the Completely Fair Scheduler (CFS) to give those processes running time on the machine’s CPUs, using the relative weights from Marathon’s cpus parameter
  • Mesos provides an API for various isolator implementations to use, and one that should be available in the future would be based on cpusets and would allow you to truly restrict a certain process to certain CPUs on the Mesos slave. Docker also exposes this functionality via its --cpuset option. This will provide additional flexibility and may be better than CPU shares for certain use cases, but just like choosing the value for your cpus parameter, you would need to choose the cpuset isolation carefully or you may see low utilization.
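As a rough illustration of what weight-based sharing means in practice, the Python sketch below estimates how CPU time would be divided when every container is fully CPU-bound at the same time. It is an idealized model that assumes each container has enough runnable threads to use whatever it is given; it is not a description of the scheduler’s internals, and the function name cpu_time_under_contention is invented for this example.

```python
def cpu_time_under_contention(cpus_by_app, machine_cpus):
    """Split a machine's CPUs in proportion to each app's cpus weight.

    Idealized: assumes every app is busy and can use all the time it is offered.
    """
    total_weight = sum(cpus_by_app.values())
    if total_weight <= 0:
        raise ValueError("at least one app with a positive cpus value is required")
    return {
        app: machine_cpus * weight / total_weight
        for app, weight in cpus_by_app.items()
    }


# Two busy apps on the 8-CPU slave: the cpus=2 app gets twice the time of the cpus=1 app.
print(cpu_time_under_contention({"app-a": 1, "app-b": 2}, machine_cpus=8))

# A lone busy app can use the whole machine, no matter how small its cpus value.
print(cpu_time_under_contention({"helloworld": 0.1}, machine_cpus=8))
```

The second call is the situation measured earlier in this post: with no competition, a container requesting 0.1 cpus is free to use all 8 CPUs.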

5 thoughts on “CPU Resources in Docker, Mesos and Marathon”

  1. Nice research! I started to wonder when my multithreaded application gained no improvement when I increased the cpus parameter in the Marathon config.

  2. Thanks a lot! Nice research! However, can you shed some light on the default CPU values? What value will be assigned if I don’t specify a value explicitly?

    • @Rishabh I’m pretty sure you have to specify a value for cpus when creating an application in Marathon. If you omit that field, I think it will just respond with an error.

      • The GUI probably won’t allow you to leave the field empty. If you are using a JSON file to post an application (say, a Docker-based application), cpus isn’t a mandatory field. A colleague of mine was saying it is 1 unit by default. I haven’t tried it myself yet.
