We’ve been learning a lot about Docker, Mesos and Marathon lately at Banno and have big plans to use these technologies in our infrastructure. Mesos will let us treat all of our servers as one uniform pool of resources, on which we’ll run our applications packaged into convenient Docker containers, with Marathon figuring out the details of running these applications. Many of our applications are “reactive”, built on tools like Akka and Spray, and are thus inherently multi-threaded, able to spread computations across many CPUs concurrently to speed up certain operations.
When you tell Marathon to create an application for you on the Mesos cluster, you include a cpus parameter which is typically a number like 0.1 or 3. I was very curious what this parameter controlled exactly, but was unable to find it documented anywhere. It seems quite important, especially since our reactive apps will use as many CPUs as they are given. In this blog post, we’ll dig down into this cpus parameter and learn about CPU resources in Docker, Mesos and Marathon.
Note: these are very fast-moving projects and the information in the post may become outdated rather quickly. It should be accurate as of Docker 1.2.0, Mesos 0.20.0 and Marathon 0.7.0.
Our goal will be to answer the following questions:
- What does Marathon’s cpus setting actually mean? What does 0.1 cpus, or 2 cpus mean?
- How many CPUs does a process running in a Docker container on a Mesos slave think it has?
- How does this relate to total CPUs on the Mesos slave?
- How do processes running in separate Docker containers on the same Mesos slave interact/interfere/share the machine’s CPUs?
First off, we need a Mesos cluster running Marathon. My colleague Nic Grayson created a great project that will run a local Vagrant box with Zookeeper, Mesos master, Mesos slave, Marathon and Docker on it, so that’s what I will use in this post. Currently it’s a private project, but hopefully we can open source it in the future. Mesosphere also provides convenient tools to set up similar clusters on AWS and Google Cloud Platform.
I’m running all of this on a quad-core MacBook Pro, and I gave the Vagrant box all 8 logical CPUs and 4GB of memory. As a baseline, let’s check out the resources on our Mesos slave machine. Here’s what the Mesos web UI at http://192.168.22.22:5050 shows for our cluster resources:
If I ssh in to this VM and run htop, we see 8 CPUs:
Also, /proc/cpuinfo shows 8 CPUs:

    vagrant@all-in-one-1404:~$ grep processor /proc/cpuinfo
    processor : 0
    processor : 1
    processor : 2
    processor : 3
    processor : 4
    processor : 5
    processor : 6
    processor : 7

So our Mesos slave machine has 8 CPUs.
Now let’s get a Docker container running in Mesos, then get a shell inside that container and poke around. The following curl command will create an app in Marathon using an Ubuntu Docker image that just echoes “hello world” forever:
curl -X POST -H "Content-Type: application/json" http://192.168.22.22:8080/v2/apps -d@helloworld.json
Here are the contents of helloworld.json. Note that we request 0.1 cpus for this application.

    {
      "id": "helloworld",
      "container": {
        "docker": {
          "image": "ubuntu:14.04"
        },
        "type": "DOCKER",
        "volumes": []
      },
      "cmd": "while true; do echo hello world; sleep 1; done",
      "cpus": 0.1,
      "mem": 32.0,
      "instances": 1
    }

It may take a few minutes to pull that ubuntu:14.04 Docker image, but eventually Mesos will run the Docker container. You can see it by running sudo docker ps on the Mesos slave.
To get a shell inside this container, we’ll use the excellent nsenter tool. Then we’ll examine the CPU resources available to the container:

    vagrant@all-in-one-1404:~$ sudo docker ps
    CONTAINER ID   IMAGE          COMMAND                CREATED         STATUS         PORTS   NAMES
    8c67cd81d13a   ubuntu:14.04   "/bin/sh -c 'while t   3 minutes ago   Up 3 minutes           mesos-5e500b93-48b1-4e1f-b87c-2f84adc4b46e
    vagrant@all-in-one-1404:~$ sudo docker-enter 8c67cd81d13a bash
    root@all-in-one-1404:/# grep processor /proc/cpuinfo
    processor : 0
    processor : 1
    processor : 2
    processor : 3
    processor : 4
    processor : 5
    processor : 6
    processor : 7

So far, all evidence suggests that this container has access to all 8 of the Mesos slave’s CPUs, even though we only requested 0.1.
Next let’s go a step further and see what an application running in the JVM sees for CPUs. I wrote a simple Scala application that just repeatedly prints the number of available processors:

    package com.banno.cpucount

    object Main extends App {
      while (true) {
        println(s"${Runtime.getRuntime.availableProcessors} available processors")
        Thread.sleep(1000)
      }
    }

Using the banno-sbt-plugin’s Docker support it’s very simple to package this app into a Docker image and push it to our private Docker registry: just run docker and dockerPush in sbt. Then we tell Marathon to run this application on our Mesos cluster, using the following JSON:

    {
      "id": "cpucount",
      "container": {
        "docker": {
          "image": "registry.banno-internal.com/cpu-count:1-SNAPSHOT"
        },
        "type": "DOCKER",
        "volumes": []
      },
      "cpus": 0.1,
      "mem": 32.0,
      "instances": 1
    }

Again we’re only requesting 0.1 cpus, but once this app runs and we look at its stdout, we see it also has access to all 8 CPUs:

    Registered executor on all-in-one-1404.vagrantup.com
    Starting task cpucount.a4fe3968-3ebe-11e4-9944-56847afe9799
    /bin/sh -c exit `docker wait mesos-9dfe2385-e7ab-44ae-9637-98680d8727a1`
    Forked command at 1976
    8 available processors
    8 available processors
    8 available processors
    8 available processors
    8 available processors
    8 available processors

To summarize so far, a process in a Docker container running on a Mesos slave appears to have access to all CPUs of that slave machine, regardless of the cpus parameter we submit to Marathon when creating the application. This seems great for reactive apps, as they can spread computations across multiple CPUs. So what does this Marathon/Mesos cpus parameter do exactly?
Let’s take a look at how Mesos actually runs a Docker container. Mesos builds up a docker run command, converting the cpus value into a value for Docker’s --cpu-shares setting, which according to the Docker documentation is just a priority weight for that process relative to all others on the machine. So this cpus parameter is a relative weight that the OS uses when scheduling processes’ time on the CPUs. An application run with cpus=2 should receive twice the priority of one using cpus=1.
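To make the "relative weight" idea concrete, here is a rough sketch of the arithmetic. It assumes one full CPU maps to 1024 shares (Docker's default share value) and that weights only matter while containers are actually competing for CPU time. The function names cpu_shares and cpu_time_percent are made up for illustration; they are not part of Docker, Mesos or Marathon.

```shell
#!/bin/sh
# Assumption: one full CPU maps to 1024 shares (Docker's default weight).
# Any minimum value a real implementation enforces is not modelled here.
# Usage: cpu_shares <cpus>
cpu_shares() {
  LC_ALL=C awk -v cpus="$1" 'BEGIN { printf "%d\n", int(cpus * 1024) }'
}

# Percentage of CPU time the first container can expect when every listed
# container is CPU-bound at the same time.
# Usage: cpu_time_percent <my_cpus> <other_cpus>...
cpu_time_percent() {
  echo "$@" | LC_ALL=C awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i   # sum of all weights
    printf "%.1f\n", 100 * $1 / total       # this weight over the sum
  }'
}

cpu_shares 0.1            # 102
cpu_shares 2              # 2048
cpu_time_percent 2 1      # 66.7
cpu_time_percent 0.1 0.1  # 50.0
```

Note the last line: two containers that each asked for only 0.1 cpus still split the whole machine evenly when both are busy, because only the ratio between the weights matters. When the other containers are idle, a container can use more than its percentage.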
Another thing to note is the Mesos cluster’s resource state while our application is running:
There are a total of 8 CPUs but our app is using 0.1 of them, so 7.9 CPUs are left available. Mesos will only allow a task to run on a slave if that slave has enough CPU capacity left to accommodate that app’s requested CPU value. The remaining capacity starts off at the total number of CPUs on the machine and decreases by the amount requested by each task assigned to it. This is another effect that the cpus parameter has: it specifies the CPU capacity used up by the application.
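The bookkeeping described above can be sketched in a few lines of shell. This is only a model of the accounting, not how Mesos implements it, and the helper names remaining_cpus and task_fits are hypothetical.

```shell
#!/bin/sh
# Remaining capacity = total CPUs minus the cpus requested by every task
# already placed on the slave.
# Usage: remaining_cpus <total_cpus> <task_cpus>...
remaining_cpus() {
  echo "$@" | LC_ALL=C awk '{
    left = $1
    for (i = 2; i <= NF; i++) left -= $i
    printf "%.1f\n", left
  }'
}

# Succeeds (exit status 0) when a task requesting <requested_cpus> still fits.
# Usage: task_fits <remaining_cpus> <requested_cpus>
task_fits() {
  LC_ALL=C awk -v left="$1" -v want="$2" 'BEGIN { exit !(want <= left) }'
}

remaining_cpus 8 0.1    # 7.9
task_fits 7.9 2 && echo "a cpus=2 task fits"
task_fits 7.9 8 || echo "a cpus=8 task does not fit"
```

With our single 0.1 cpus app running, a task asking for 2 cpus would still be placed on this slave, while one asking for all 8 would have to wait for capacity elsewhere.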
Note that this cpus parameter is not a direct limitation on the number of CPUs available to the Docker container, nor some kind of limit on the speed of the CPUs. We already saw that the Docker container could access all 8 CPUs, so when we request cpus=0.1 Mesos is not just giving our Docker container 1 of those CPUs, or 0.1 of them; it has all 8 CPUs. Mesos just seems to keep track of total CPU resources, and subtracts from that capacity however many CPUs your application says it needs.
Let’s go back to our original list of questions and fill in the answers:
- Marathon’s cpus setting is both a relative weight for scheduling all Docker containers across all of the Mesos slave’s CPUs and an amount of the Mesos slave’s available CPU capacity to use up
- A process running in a Docker container on a Mesos slave thinks it has the same number of CPUs as the underlying machine
- The OS should give relative weight to the Docker containers running on a Mesos slave according to their cpus values
Given what we’ve discovered, cpus seems like a bit of a vague, or even misleading, name for this parameter. Maybe cpu-capacity or cpu-weight would be more descriptive of what it actually does?
Now that we know a bit more about CPU resources in Docker, Mesos and Marathon, and the effects of this cpus parameter specifically, we can make more informed choices for its value when creating applications in Marathon. If we always choose a low value like 0.1 we risk over-allocating tasks on the Mesos slaves: we could end up running too many processes, and each process won’t get enough CPU time. If we always choose high values like 5 or 10 we risk under-allocating the Mesos slaves, leaving expensive CPUs sitting idle, or worse, we may not have a Mesos slave in the cluster with enough available CPU capacity to run our tasks at all.
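To put rough numbers on that trade-off, the sketch below counts how many identical tasks would fit on our 8-CPU slave for a given cpus value. It assumes CPU capacity is the only constraint (memory and other resources are ignored), and tasks_per_slave is a hypothetical helper name.

```shell
#!/bin/sh
# How many tasks of a given size fit on one slave if CPU capacity were the
# only constraint (memory, disk and ports are ignored here).
# Usage: tasks_per_slave <total_cpus> <cpus_per_task>
tasks_per_slave() {
  LC_ALL=C awk -v total="$1" -v each="$2" 'BEGIN {
    # Work in whole milli-CPUs so that 8 / 0.1 does not suffer from rounding.
    t = int(total * 1000 + 0.5)
    e = int(each * 1000 + 0.5)
    print int(t / e)
  }'
}

tasks_per_slave 8 0.1   # 80
tasks_per_slave 8 5     # 1
tasks_per_slave 8 10    # 0
```

At cpus=0.1 the slave could accept 80 tasks, and if all of them were busy at once each would end up with roughly a tenth of one CPU. At cpus=5 only one task fits and 3 CPUs of capacity go unused, and at cpus=10 nothing can be placed on this slave at all.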
Hopefully this has been an informative blog post for you. If I’ve missed any details or made any mistakes, please let me know in the comments!
Update 2014-09-29: Christos Kozyrakis from Mesosphere was kind enough to provide some clarification on the information above, which I will paraphrase:
- The CPU behavior described above all stems from Mesos’ current isolator, which uses cgroups
- Once there are multiple Docker containers running on a Mesos slave, most Linux distros will use the CFS Scheduler to give those processes running time on the machine’s CPUs, using the relative weights from Marathon’s cpus parameter
- Mesos provides an API for various isolator implementations to use, and one that should be available in the future would be based on cpusets and would allow you to truly restrict a certain process to certain CPUs on the Mesos slave. Docker also exposes this functionality via its --cpuset option. This will provide additional flexibility and may be better than cpu shares for certain use cases, but just like choosing the value for your cpus parameter, you would need to choose the cpuset isolation carefully or you may see low utilization.
There’s a little more information on cpu shares at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
Nice research! I started to wonder when my multithreaded application gained no improvement when i increased the cpu-parameter in the marathon config.
Thanks a lot! Nice research! However, can you throw some light on the default CPU values? What value will be assigned if I dont specify a value explicitly?
@Rishabh I’m pretty sure you have to specify a value for cpus when creating an application in Marathon. If you omit that field, I think it will just respond with an error.
The GUI probably wont allow you to leave the space empty. In case you are using a json file to post an application (say a docker based application) , in that case, CPU isn’t a mandatory field. A colleague of mine was saying it is 1 unit by default. I havent yet tried by myself.