Deploying a multi-node application to AWS using chef-provisioning

This post is for people who are getting started with chef-provisioning and want to use it to deploy to AWS. It will take you through creating a couple of machines and deploying a simple application to them. In a future post, I’ll extend this to cover setting up some networking and infrastructure (VPC, subnets, security groups), but this post will assume you are using the default VPC created by AWS.

If you’re just looking to try chef-provisioning and you use Vagrant, you may want start with my other post: Deploying a multi-node application to Vagrant using chef-provisioning.

For an overview of chef-provisioning ( (formerly known as chef-metal), take a look at this Chef-Provisioning: Infrastructure as Code blog post. Also, see the Chef provisioning docs for more details.

Getting setup with chef-provisioning

Chef-provisioning is included in the latest ChefDK (0.3.6 at time of writing). Make sure you have this version or later installed by typing:

chef --version

If not, you can download or upgrade it here.

Create a new Chef repository to explore chef-provisioning:

cd ~
chef generate repo chefprov

We are going to use chef-client in local mode to run our provisioning recipes, so we want to set up a .chef directory that will be used specifically for this repo.

cd ~/chefprov
mkdir .chef
cd .chef

In the .chef directory, create a knife.rb file containing the following:

log_level                :info
current_dir = File.dirname(__FILE__)
node_name                "provisioner"
client_key               "#{current_dir}/dummy.pem"
validation_client_name   "validator"

Our workstation is going to behave like a chef-client talking to the local-mode server on our workstation,  so it needs a node name and a key. The key can be any well-formed key as the local-mode server will not validate it. For example:

ssh-keygen -f dummy.pem

Check the setup is working by performing an empty chef-client run:

chef-client -z

This will perform a local mode chef-client run with no recipes, using the built-in chef-zero server running on port 8889. You should see output similar to:

Starting Chef Client, version 11.18.0
[2015-01-31T16:16:43-06:00] INFO: *** Chef 11.18.0 ***
[2015-01-31T16:16:43-06:00] INFO: Chef-client pid: 14113
[2015-01-31T16:16:44-06:00] INFO: Run List is []
[2015-01-31T16:16:44-06:00] INFO: Run List expands to []
[2015-01-31T16:16:44-06:00] INFO: Starting Chef Run for provisioner
[2015-01-31T16:16:44-06:00] INFO: Running start handlers
[2015-01-31T16:16:44-06:00] INFO: Start handlers complete.
[2015-01-31T16:16:44-06:00] INFO: HTTP Request Returned 404 Not Found : Object not found: /reports/nodes/provisioner/runs
[2015-01-31T16:16:44-06:00] WARN: Node provisioner has an empty run list.
Converging 0 resources
[2015-01-31T16:16:44-06:00] INFO: Chef Run complete in 0.032696323 seconds
Running handlers:
[2015-01-31T16:16:44-06:00] INFO: Running report handlers
Running handlers complete
[2015-01-31T16:16:44-06:00] INFO: Report handlers complete
Chef Client finished, 0/0 resources updated in 1.117898047 seconds

If you’re curious, take a look at the ‘nodes/provisioner.json’ file. This is where the local-mode server stores its node data. You can also run commands like:

knife node show provisioner -z

This command will query the local-mode server and show summary details that it has about your provisioner node (i.e. your workstation).

Preparing the AWS client

To use chef-provisioning, you need to have the AWS CLI client installed. Follow the AWS CLI setup instructions to download and install the client, and to obtain your access keys.

If you are using an existing AWS account, please take appropriate precautions to make sure that you are working in a ‘sandbox’ that minimizes the chance of bad things when you get a script wrong. For example, you might configure your AWS client default to use a region where you do not have existing resources. To do this, edit the ~/.aws/config file and make sure  your selected region is in the default stanza. The following example sets us-west-2 (Oregon) as the default region:

region = us-west-2

Also check that you have the right access keys are configured as default. I prefer to separate these out into the ~/.aws/credentials file, rather than put them in the config file:

aws_access_key_id = AMADEUPACCESSKEY
aws_secret_access_key = AMadeUPSecreTACcesSKEYXXYyyyZzzZ1234

To make sure the AWS client is working, run the following command:

aws ec2 describe-availability-zones

It should give you a list of availability zones in the region you are using.

If, like me, you are a little more paranoid, you may want to create an IAM user with limited access to resources. I won’t cover this in detail, but below is an example policy that may be useful as a basis to restrict access. Feel free to skip over this to the next section!

  "Version": "2012-10-17",
  "Statement": [
            "Sid": "AllowDescribeAndBasicSetup",
            "Effect": "Allow",
            "Action": ["ec2:Describe*", 
                "ec2:ModifyInstanceAttribute" ],
            "Resource": "*"
            "Sid": "AllowInstanceResourceActions",
            "Effect": "Allow",
            "Action": ["ec2:RunInstances"],
            "Resource": [
            "Sid": "AllowOtherInstanceActions",
            "Effect": "Allow",
            "Action": [
            "Resource": "arn:aws:ec2:us-west-2:632055226646:instance/*"
            "Sid": "AllowToSeeWhatCantDo",
            "Effect": "Allow",
            "Action": [
            "Resource": "*"

Lines 3-8 allow the user to perform most query operations on any region, import key pairs (see later section on SSH access), and create tags (which is something the chef-provisioning resources like ‘machine’ do by default).

Lines 9-21 only allow the user to create instances and associate them with resources in the us-west-2 region. Lines 22-31 allow the user to manage the instances after creation.

Lines 32-38 are optional but can be useful. If the access policy is too restrictive, you will get a ‘You are not authorized to perform this operation’ message. Sometimes this will include an encoded message which gives you information about what you were not authorized to do. With the above authorization, you can run:

aws sts decode-authorization-message --encoded-message xxxxxxxxxxxxxxxx

Where “xxxxxxxxxxxxxxxx” is the encoded message.

Preparing SSH access into AWS

In order to run chef-client on the instances that you are going to create in AWS, you need to enable SSH access to those instances. There are two main things you need to do:

  • Setup a key-pair
  • Enable SSH access from your IP address

Setup a key-pair

Use the EC2 console to create a keypair in the region you are using. Download the private key (‘test2_aws.pem’) and save it in ~/.ssh. Make sure its permissions are read-only:

chmod 400 ~/.ssh/test2_aws.pem

You will also need the public key. You can retrieve this from the private key by running:

ssh-keygen -y -f ~/.ssh/test2_aws.pem >

and giving it the name of the file.

If you are using an IAM user without a console logon, generate a keypair using ssh-keygen then import it using the AWS CLI:

aws ec2 import-key-pair --key-name test2_aws --public-key-material file://

The ‘file://’ method of loading the file ensures that the key is base64 encoded, which is required to upload a key via the CLI.

Enable SSH access from your IP address

By default, AWS does not enable SSH from external sources into its VPCs. You need to use the EC2 console to allow inbound SSH access from your IP address, by adding a rule to a security group.

This post assumes you can add this rule to the default security group for the default VPC in the region you are using. This will allow immediate access to the machines we will create with chef-provisioning.  If you can’t do this, the examples won’t work without some manual intervention – i.e. you will need to add the security group to the created instances before you can run recipes on them.

We also need to let chef-provisioning know about the keys. Add the following to your ./chef/knife.rb file:

knife[:ssh_user] = "ubuntu"
knife[:aws_ssh_key_id] = 'test2_aws'
private_keys     'test2_aws' => '/home/christine/.ssh/test2_aws.pem'
public_keys      'test2_aws' => '/home/christine/.ssh/'

Line 1 is the user name to use when SSH’ing to the instance. For the standard Ubuntu image, this should be ‘ubuntu’. Line 2 specifies which key name to use for AWS, and Lines 3 & 4 setup the locations of the private and public keys.

Enable external access to the application

Our test application requires TCP access on port 3001. Open this port by adding a Custom TCP rule to the security group for the default VPC, allowing access from any IP address (CIDR block ‘’).

The inbound rules should now look something like this:

Creating the AWS instances

Create basic machine provisioning recipe

Our first pass at the chef-provisioning recipes will just create the instances, with nothing on them.

We will create two recipes. The first will set up the AWS-specific details. The second will create the machines.


require 'chef/provisioning/aws_driver'
with_driver 'aws'

  with_machine_options :bootstrap_options => {
  :key_name => 'test2_aws',
  :instance_type => 't1.micro',
  :associate_public_ip_address => true

Lines 4-8 specify what sort of instances we want to create.

Line 2 tells chef to use the ‘chef-provisioning-aws’ provider. This provider is one of two AWS providers distributed with ChefDK, and is an alternative to the more established chef-provisioning-fog driver. I am using it because of its support for a growing range of other AWS resources (VPCs, security groups, S3, and others). To use the fog driver, replace ‘aws’ with ‘fog:aws’. You may also need to make other changes, for example ‘:instance_type’ is ‘flavor_id’ in the fog driver.

In Line 7, we choose the smallest and cheapest type of instance to experiment with.

Line 8 associates a public IP address with the instance, so that chef can SSH to it.

We are using the default AMI, which is currently Ubuntu 14.04.

The full set of ‘:bootstrap_options’ corresponds to the options listed for the AWS create-instance method.

The second recipe specifies a simple topology with two machines in it:


require 'chef/provisioning'
machine 'db'
machine 'appserver'

This recipe will create and start the machines, and bootstrap the chef-client onto them.

UPDATE: chef-provisioning-aws 1.2.1 introduces new default AMIs. If the command above fails with:

AWS::EC2::Errors::InvalidParameterCombination: Non-Windows instances with a 
virtualization type of 'hvm' are currently not supported for this instance type.

then replace t1.micro with t2.micro in the above:

  with_machine_options :bootstrap_options => {
  :key_name => 'test2_aws',
  :instance_type => 't2.micro',
  :associate_public_ip_address => true

UPDATE: If the above command fails with:

         Unexpected Error:
         ChefZero::ServerNotFound: No socketless chef-zero server on given port 8889

then add the following to each machine resource:

machine 'db' do
  chef_server( :chef_server_url => 'http://localhost:8889') 
machine 'appserver' do
 chef_server( :chef_server_url => 'http://localhost:8889') 

or add the following in the setup recipe:

with_chef_server "http://localhost:8889"

This problem exists in chefDK 6.0 to 6.2.

Run the recipe

Before proceeding, be aware that you will be charged for the resources that these recipes create. Make sure you delete any instances after you are done. I will tell you how to do that using chef-provisioning, but I advise you to logon to the EC2 console and making sure you have no instances left running when you are done.

To run the recipes, enter:
chef-client -z aws_setup.rb topo.rb

For each of the two machines, you should see the chef-client run create a node, wait for the machine to become connectable (this may take a while), bootstrap the chef-client and perform an empty run.

If you go to the EC2 console, you should see both machines (named ‘db’ and ‘appserver’) are up and running.

Working around SSH issue

If you are trying this with ChefDK 0.3.6 on Ubuntu, you may encounter the following error:

         Chef encountered an error attempting to load the node data for "db"

         Unexpected Error:
         NoMethodError: undefined method `gsub' for nil:NilClass

This is a known issue with chef-provisioning providing a bad URL for the local-mode server. If you can upgrade to chefDK 0.4.0, this problem has been fixed (but be aware that chefDK 0.4 embeds Chef 12 and not Chef 11).

A workaround for chefDK 0.3.6 is to create the following Gemfile in your chefprov directory:

source ''

gem 'chef-dk'
gem 'chef-provisioning'
gem 'chef-provisioning-aws'
gem 'net-ssh', '=2.9.1'

and then run chef-client using:

bundle exec chef-client -z aws_setup.rb topo.rb

This will run the chef-client using a previous version of ‘net-ssh’, which avoids the problem.

You will likely need to use ‘bundle exec’ in front of all of the chef-client runs described in this post.

Setup and deploy the Application

Get the application cookbooks

The basic application we will install can be found in the ‘test-repo’ for the ‘knife-topo’ plugin on Github.

First, download the latest release of  the knife-topo repository and unzip it.

Then we will use ‘berks vendor’ to assemble the cookbooks we need to deploy this application:

cd knife-topo-0.0.11/test-repo
berks vendor
cp -R berks-cookbooks/* ~/chefprov/cookbooks

Line 2 uses the Berksfile to assemble all of the necessary cookbooks into the ‘berks-cookbooks’ directory.

Line 3 copies them into our ‘chefprov’ repo, where the local-mode server will look for them when it runs the chef-provisioning recipes.

Extend machine provisioning to include runlists

Now change the topo.rb provisioning recipe as follows:

require 'chef/provisioning'

machine 'db' do
  run_list ['apt','testapp::db']

machine 'appserver' do
  run_list ['apt','testapp::appserver']

and rerun the chef-client:
chef-client -z aws_setup.rb topo.rb

This time, the chef-client running on the two instances will execute the specified recipes, installing nodejs on ‘appserver’ and mongodb on ‘db’.

Deploy the application

We will now create a third recipe to deploy the application. We could have included this as part of the ‘topo.rb’ recipe, but I chose to make it a separate recipe, so it can be run independently.

Here’s what the recipe looks like:


require 'chef/provisioning'

machine 'appserver' do
 run_list ['testapp::deploy']
 attribute ['testapp', 'user'], 'ubuntu'
 attribute ['testapp', 'path'], '/var/opt'
 attribute ['testapp', 'db_location'], lazy { search(:node, "name:db").first['ipaddress'] }

ruby_block "print out public IP" do
 block do
 appservernode = search(:node, "name:appserver").first"Application can be accessed at http://#{appservernode['ec2']['public_ipv4']}:3001")

Line 4 runs the recipe to deploy the application.

Lines 5 to 7 set attributes on the node that customize the test application. For example, Line 7 sets the attribute node[‘testapp’][‘db_location’] to the IP address of the database server, which it looks up using a search for node information stored in the local-mode Chef server (i.e. in the ‘chefprov/nodes’ directory).

In Line 5, ‘lazy’ is used so that the search occurs during the converge phase of the chef-run, not during the compile phase. This is important if the ‘topo.rb’ and ‘deploy.rb’ recipes are run in a single runlist, because the IP address of the database server will only be known after the db machine resource has actually been executed in the converge phase.

Lines 8-13 print out the URL for the application, which uses the public IP address of the application server. This is executed in a ‘ruby_block’ resource so that it occurs in the converge phase once the application server has been created and configured.

Run the chef-client:
chef-client -z aws_setup.rb deploy.rb

At the end of the run, you should see something like:

  * ruby_block[print out public IP] action run[2015-01-31T21:28:38-06:00] INFO: Processing ruby_block[print out public IP] action run (@recipe_files::/home/christine/chefprov/deploy.rb line 9)
[2015-01-31T21:28:38-06:00] INFO: Application can be accessed at
[2015-01-31T21:28:38-06:00] INFO: ruby_block[print out public IP] called

    - execute the ruby block print out public IP
[2015-01-31T21:28:38-06:00] INFO: Chef Run complete in 21.74813493 seconds

Running handlers:
[2015-01-31T21:28:38-06:00] INFO: Running report handlers
Running handlers complete
[2015-01-31T21:28:38-06:00] INFO: Report handlers complete
Chef Client finished, 2/2 resources updated in 23.594399737 seconds

Browse to the application URL, and you should see something like:

 Congratulations! You have installed a test application using the knife topo plugin.

 Here are some commands you can run to look at what the plugin did:

    knife node list
    knife node show dbserver01
    knife node show appserver01
    knife node show appserver01 -a normal
    knife data bag show topologies test1
    cat cookbooks/testsys_test1/attributes/softwareversion.rb

Go to the knife-topo plugin on Github

Ignore the example commands as we did not use the knife-topo plugin.

Destroy the machines

To destroy the machines, create a recipe:


require 'chef/provisioning'
machine 'db' do

machine 'appserver' do

And run it:
chef-client -z destroy.rb

You should see messages like:

  * machine[appserver] action destroy[2015-02-01T09:20:43-06:00] INFO: Processing machine[appserver] action destroy (@recipe_files::/home/christine/chefprov/destroy.rb line 7)

    - Terminate appserver (i-93a8db50) in us-west-1 ...[2015-02-01T09:20:46-06:00] INFO: Processing chef_node[appserver] action delete (basic_chef_client::block line 26)

    - delete node appserver at http://localhost:8889[2015-02-01T09:20:46-06:00] INFO: Processing chef_client[appserver] action delete (basic_chef_client::block line 30)
[2015-02-01T09:20:46-06:00] INFO: chef_client[appserver] deleted client appserver at http://localhost:8889

    - delete client appserver at clients

For both ‘db’ and ‘appserver’. If the run succeeds but you do not see these messages, you may have specified the wrong machine name.

Until you are confident in your scripts, you may want to use the EC2 console to make sure you have terminated the instances (don’t forget to navigate to the right region). You may also want to remove the added rules from the VPC default security group.


Avoiding the possible pitfalls of derived attributes

What this post is about

This post is intended for folks who are comfortable with the basics of attributes in Chef, but want to understand some of the subtleties better. It focusses on one specific aspect – derived or computed attributes – and how to make sure they end up with the value you intend. I’m going to cover four topics:

This post is with huge thanks to the many people at the 2014 Chef Summit and online who helped me with this topic, including but not limited to Noah Kantrowski, Julian Dunn and Lamont Granquist. The good ideas are theirs, any mistakes are mine.

Attribute precedence in practice

First: it’s not all bad news. Much of the time the attribute precedence scheme in Chef 11, although complex, will do what you want it to. The complexity is there because Chef supports multiple different approaches to customizing attribute values, particularly (1) using wrapper cookbooks versus (2) using roles and environments. You can see some of these tradeoffs in this description of Chef 11 attribute changes.

Here’s a reminder of the attribute precedence scheme. The highest numbers indicate the highest precedence:

Image linked from Chef documentation.

One benefit of the above scheme is that you can override default attributes with new values in a wrapper cookbook at default level. You do not need to use a higher priority level. This is important because you can wrapper the wrapper if you have to, without suffering “attribute priority inflation”. Why wrapper a wrapper? It can be very useful when you need multiple levels of specialization, e.g. to set defaults for an organization; override some of those defaults for a business system, and then do further customizations for a specific deployment of that business system.

A benefit of the precedence scheme when working with roles and environments is that you can set default attributes in a role or environment, and they will override the default attributes in cookbooks. The mental model is that your cookbooks are designed to be generally reusable, and have the least context-awareness. An environment provides additional context such as implementing the policies specific to your organization. Similarly, roles can configure the recipes to meet a specific functional purpose.

The possible pitfalls of derived attributes

So where can it go wrong? Let’s use a simple example, consisting of a base cookbook called “app”, and a wrapper cookbook called “my_app” which has a dependency on “app” in its metadata.rb. The contents of those cookbooks are:


  default["app"]["name"] = "app"
  default["app"]["install_dir"] = "/var/#{node["app"]["name"]}"


ruby_block "Executing resource in recipe" do
  block do "Executing recipe, app name is: #{node['app']['name']};" +
      " install_dir is #{node['app']['install_dir']}"


  default["app"]["name"] = "my_app"


  include_recipe "app::default"

And they are uploaded to the server using:

knife cookbook upload app my_app

The base “app” cookbook has an application “name” attribute which defaults to “app”, and an “install_dir” attribute which is calcyulated from the application name. For simplicity, the recipe which would actually deploy “app” just prints out the value of the attributes using a ruby block so that we see the values that would be used when the resources are run. The wrapper “my_app” cookbook changes the application name attribute from “app” to “my_app”.

What happens if we run the wrapper cookbook?

sudo chef-client -r 'recipe[my_app]' -linfo


The “name” attribute is set to “my_app”, however the derived “install_dir” attribute still has its old value of “/var/app”, which is probably not what was intended.  This is not a question of priority: if the wrapper contained override["app"]["name"] = "my_app", we would get the same result. To understand why this happens, we need to look at the order of evaluation of the attributes.

What happens during this chef-client run in this example is as follows:

  1. As there are no roles or environments, the “compile” phase starts by evaluating attribute files based on the cookbooks in the runlist and their dependencies.
  2. The first cookbook in the list is “my_app”, which has a dependency on “app”. Dependencies are loaded first, so the default “name” attribute is set to “app” and the default “install_dir” attribute is set to “/var/app”.
  3. The “my_app” wrapper attribute file is loaded second and updates the default “name” attribute to “my_app”. The “install_dir” attribute is not updated and therefore keeps its value of “/var/app”.
  4. After that, the recipe files are loaded and the ruby_block resource is added to the resource collection, instantiated with the current values of the “name” and “install_dir” attributes.
  5. The “converge” phase executes the resources in the resource collection, printing out the values “my_app” and “/var/app”.

How attribute values are determined

Basic model

The following diagrams may help explain how attribute values are evaluated.

First, let’s work with a runlist like the following, consisting of three recipes in three cookbooks (cb1, wcb, cb3). The second recipe(wcb::rc2) is a wrapper of a recipe in a fourth cookbook (cb2::r2). Each cookbook has a single attribute file (a_cb1, etc).


The diagram below illustrates how the values of the attributes in this example change through the run. The attribute files are evaluated in runlist order but with dependencies (from metadata.rb) evaluated first. In this case, ‘a_cb2’ is evaluated after ‘a_cb1’ but before ‘a_wcb’. As the attribute files are evaluated, attribute values are put into “buckets” based on the attribute name and priority, e.g. node.default['x'] updates the value of ‘x’ in the default bucket for ‘x’. Each subsequent update to the same attribute and priority replaces the value in that bucket.


When the recipes are run and they access an attribute e.g. node['x'], the value that is passed back is that of the highest priority bucket that has a value in it.

Here’s an example showing the problem with a derived attribute. In the first step when “y” is calculated, the value of “x” is “app” and so “y” is set to “/var/app”. The value of “x” is set to “my_app” in the second step. When recipe r2 retrieves the attribute values in the third step, it therefore gets “my_app” and “/var/app”:


This diagram shows why using a higher priority does not solve the problem. Again, “y” is calculated in the first step and “x” is not set to “my_app” until the second step:


There is a wrinkle that you should be aware of. If you chose to set “normal” precedence rather than “override” in the above, the first run would still give the same result as above, but subsequent runs would “work”. Normal attributes are special because they persist across chef-client runs. If the wrapper cookbook contained normal['x']="my_app", “y” would still be computed as “/var/app” on the first run. On the second run, however, it would change to “/var/my_app”, because “my_app” would be in the “normal” bucket at the start of evaluation and would be used in the first step to calculate “y”.

Model including roles

Roles introduce two changes to our model:

  1. Role attributes have a higher precedence than those in cookbooks, effectively creating two new rows of buckets labelled as “role_default” and “role_override” in the diagram below
  2. Role attributes are always evaluated before cookbook attributes, regardless of their runlist position


These precedence rules mean that you can use roles to avoid the derived attribute problem, as shown below:


Setting the default value of “x” to “my_app” in the role guarantees that the value of “my_app” will be present when “y” is evaluated in cookbook cb2. “my_app” will be used rather than “app” because a role default value takes precedence over the cookbook default (it is in a higher priority bucket).

Model including environments

Environments add two new precedence levels, one between default and role_default; one after role_override and before “automatic”. Like roles, they are always evaluated before cookbook attributes.

Some ways to solve the problem

As a user of a cookbook with a derived attribute

As a user of a cookbook with a derived attribute and you do not have the option of modifying the base cookbook, you have two basic choices:

  • Always set any computed attributes if you change the attributes that they are derived from
  • Use a role or environment

Set computed attributes

The simplest approach is to make sure you set all of the attributes that are derived from attributes that you want to change. In our original example, we would specify both “name” and “install_dir”, e.g.:


  default["app"]["name"] = "my_app"
  default["app"]["install_dir"] = "/var/my_app"

This is probably the approach you will want to take if you use the wrapper cookbook approach.

Use a role or environment

As explained in the Model including roles section, attributes in roles have priority over attributes in cookbooks, and are also always evaluated before them. If you use roles, then setting an attribute in a role will also change any computed attributes. In our original example, we could define myapp role as:

  "name": "myapp",
  "default_attributes": {
    "app": {
      "name": "my_app"

knife role from file roles/myapp.json

Then run with a modified runlist:

sudo chef-client -r 'recipe[app]','role[myapp]' -linfo

The result would be to set both the “name” attribute to “my_app”, and the “install_dir” attribute to “/var/my_app”.

As an author of a cookbook with a derived attribute

As an author of a cookbook, you may prefer not to rely on users noticing derived attributes and handling them appropriately. Here are some possibilities to make life easier for your users:

  • Use a variable and not an attribute
  • Use delayed evaluation in the recipe
  • Use conditional assignment in the recipe

A gist for this example.

Use a variable and not an attribute

If the derived value should always be calculated, then don’t use an attribute, use a ruby variable in the recipe. In our original example, if “install_dir” should always be “/var” followed by the application name, remove the derived attribute and instead do the following in the recipe:


  default["app"]["name"] = "app"


install_dir = "/var/#{node["app"]["name"]}"
ruby_block "Executing resource in recipe" do
  block do "Executing recipe, application name is: #{node['app']['name']};" +
      " install_dir is #{install_dir}"

Similarly, if the user needs to be able to change the root path for the install directory but the application should always be installed in a directory with the application name, create two attributes for “root_path” and “name”, and combine them using a variable:


  default["app"]["name"] = "app"
  default["app"]["root_path"] = "/var"


install_dir = "#{node["app"]["root_path"]}/#{node["app"]["name"]}"
ruby_block "Executing resource in recipe" do
  block do "Executing recipe, application name is: #{node['app']['name']};" +
      " install_dir is #{install_dir}"

Use delayed evaluation in the recipe

Noah Kantrowitz proposed an approach for delaying evaluation of the derived attribute into the recipe, whilst still allowing it to be defined and overridden in the attribute file.

This approach sets up a template for the derived attribute in the attribute file, using the ruby %{} operator to define a placeholder. It then uses the ruby % operator in the recipe file to perform string interpolation, i.e. to substitute the actual value of the placeholder. In our original example, this would look like:


  default["app"]["name"] = "app"
  default["app"]["install_dir"] = "/var/%{name}"


install_dir = node["app"]["install_dir"] % { name: node["app"]["name"]}
ruby_block "Executing resource in recipe" do
  block do "Executing recipe, application name is: #{node['app']['name']};" +
      " install_dir is #{install_dir}"

node["app"]["install_dir"] % { name: node["app"]["name"]} causes Ruby to substitute the value of the “name” attribute wherever the placeholder “%{name}” appears in the “install_dir” attribute. Because this substitution is delayed until the recipe is evaluated, the “name” attribute has already been set by the wrapper cookbook, and “install_dir” will be set to “/var/my_app”.

One consequence of this approach is that the “install_dir” attribute will have a value of “/var/%{name}” in the node object at the end of the run. This may not be desirable if “install_dir” was something you used in node searches. It also means that any cookbooks that reference the “install_dir” attribute need to perform the placeholder substitution before using it.

Use conditional assignment in the recipe

This approach is based on something suggested by Lamont Granquist. It uses conditional logic in the recipe that will only set the default value if no other value has been provided in a wrapper cookbook. Our example would look like this:


  default["app"]["name"] = "app"
  # default["app"]["install_dir"] = "/var/#{node['app']['name']" 


install_dir = node["app"]["install_dir"] || "/var/#{node['app']['name']"
ruby_block "Executing resource in recipe" do
  block do "Executing recipe, application name is: #{node['app']['name']};" +
      " install_dir is #{install_dir}"

The line for “install_dir” in the attribute file is commented out, so that it does not take effect but a user can see that the attribute exists and can be overridden. The line install_dir = node["app"]["install_dir"] || "/var/#{node['app']['name']" will take any overridden value of the node attribute, but otherwise will set it based on the “name” attribute. The conditional assignment is important because otherwise it would overwrite an assignment done in the wrapper cookbook.

With this code, the “install_dir” attribute saved in the node object will be null unless it has been overridden. If you want the actual value used to be saved, you may want to conditionally set the node attribute rather than a variable, e.g. node.default["app"]["install_dir"] = "/var/#{node['app']['name']" unless node["app"]["install_dir"].

Spend less time waiting, more time Cheffing

Vagrant is wonderful, but I hate waiting for my virtual machines to come up. Here’s some things I have done to reduce that wait:

Use Vagrant’s locally managed boxes

When you are starting up a machine based on a completely new box, the most painful wait is usually for the box to download. Make sure you don’t have to wait more than once: use vagrant box add to add it into vagrant’s locally managed boxes. For example, to add Ubuntu 12.04:

vagrant box add precise64

This will give you a box that you can now refer to as “precise64” in your Vagrantfiles. You can use whatever name you want for the first parameter (‘precise64’) in the example. The second parameter is the URL to obtain the box, see this list of base boxes.

In this example, your Vagrantfile will contain something like line 1 in the following: ="precise64";
  config.vm.box_url = "";

Line 2 is entirely optional, and tells vagrant where to get the box if it is not found in the local cache. It’s useful for when you reuse your Vagrant file on another machine, share it with someone else, or when you forget where you got the box from!

Use vagrant-cachier with vagrant-omnibus

After storing the box locally, my next longest waiting time was for the omnibus installer to download the Chef image. I looked into how to do a knife bootstrap from a local image, but that involved replacing the entire chef bootstrap script. Matt Stratton instead pointed me at vagrant-cachier, a plugin that provides a shared package cache. You can also use it to cache other packages like apt and gems that you are installing on the virtual machine. What it does is configure the package manager to use a package cache that is a shared folder between host and guest. This cache is used by the vagrant-omnibus plugin to bootstrap chef onto the virtual machine. Make sure you have recent versions of both plugins. Here’s how to install them:

vagrant plugin install vagrant-omnibus
vagrant plugin install vagrant-cachier

It seems that vagrant-omnibus is quite specific about when it uses the cache. Here’s what worked for me.

if Vagrant.hasPlugin?("vagrant-cachier")
  config.cache.auto_detect = true
  config.cache.scope = :machine
  config.omnibus.cache_packages = true
  config.omnibus.chef_version = "11.16.0"

Line 1 makes sure that you dont get errors if the cachier plugin isn’t installed.

Line 2 enables caching for all types of packages. Beware of this – if you’re actually trying to test downloading from various package repositories, this setting may not work for you. I tried enabling only Chef with config.cache.auto_detect = false and config.cache.enable :chef but it seems like this doesn’t work for the omnibus installer, only for things like cookbooks that would be placed in ‘/var/chef/cache’ during a chef-client run.

Line 3 restricts package sharing to a specific machine. Although it’s tempting to use the :box setting and share across all machines using the same box, it’s dangerous (if you ever run more than one machine at once you may well get locking problems – the package managers will all treat the cache as if its a local filesystem). Further, the omnibus plugin appears to require a scope of machine before it will use the cache.

Line 4 is needed to tell omnibus that you really do want to use the cache.

Line 5 is optional and sets the specific Chef version you want to use.

With this configuration, the packages once downloaded are stored on the host machine in ‘.vagrant/machines/dbserver/cache’ (where ‘dbserver’ is the machine name in the Vagrantfile). You may need to go to this folder from time to time, to check on the cache size or to clear it out. They are shared with the guest in ‘/tmp/vagrant-cache’.

Run vbguest plugin only when needed

Plugins can make a difference to startup time. Case in point: early on I had some problems with Vagrant shared folders caused by different Guest Addition version on the guest versus Virtualbox, and so I started using the vagrant-vbguest plugin to resolve that problem. It’s great, but it takes a little while to apply the kernel updates when creating a new machine. If you are really shaving time off getting a new machine running, you may reconsider updating guest additions automatically.

Turn auto-update off for vbguest plugin

Much of the time, the mismatch between guest additions and Vagrant is not a showstopper, so you may want to start by just reporting the mismatch, i.e. turning auto_update off for the vbguest plugin (if you have it).

  config.vbguest.auto_update = false

If you decide to update guest additions, change the property in the Vagrantfile and vagrant up, or simply run:

  vagrant vbguest --do install

You are advised to reboot the virtual machine afterwards, e.g. using vagrant reload.

Remove vbguest plugin

Use vagrant plugin list to see which plugins you have, and sudo vagrant plugin remove vagrant-vbguest to remove the vbguest plugin if you have it.

Package a custom box

When I have a stable setup, I sometimes package my own box with the right version of Guest Additions and Chef. To do this, get a machine setup with the Chef and Virtual Additions that you want (plus anything else). Halt the machine, then package it as a box using something like:

vagrant package dbserver --output
vagrant add myprecise64 ./

Where ‘dbserver’ is the name of the machine in the Vagrantfile from which to create the box, and ‘’ is the filename to output the box to. Now replace “precise64” with “myprecise64” in your Vagrantfile, and your machines will (at least for now) have the right version of Chef and Guest Additions.

Confirm before navigating away from unsaved edits, using javascript and Backbone

Here’s an approach to putting up a confirmation dialog whenever the user navigates away from a view containing unsaved edits, so that the user doesn’t unintentionally lose those edits.

A working example is available on github.

Use Case

My use case is as follows. Our application is a single page application written in JavaScript that makes use of Backbone routing to navigate between major tasks. Each of those major tasks is implemented by one or more view managers (basically Backbone Views which implement a view container and controller for multiple views that work on a defined set of data). We allow the user to make a set of edits and explicitly decide when to save them to the server, rather than push every edit to the server as it happens. We don’t want the user to accidentally lose those edits before they are saved. Some of the ways that could occur are:

  • User selects an in-application navigation that triggers Backbone routing
  • User refreshes URL or types a new URL
  • User closes the browser window or tab


The basic approach I took was inspired by Kevin Yao’s post, Page Closing Confirmation with jQuery Custom Event. I liked his idea of listening for the window ‘beforeunload’ event and triggering a custom event that any interested party (in our case, the view managers) could respond to. However, I wanted to use Backbone for the events rather than plain JQuery, and I needed to catch in-application navigation as well.

Catching the ‘beforeunload’ event

The following code is called as part of our application overall setup.

$(window).bind('beforeunload', function(){
  var status = {
    unsavedEdits: false

  Backbone.trigger('myapp:navigate', status);
  if (status.unsavedEdits) {
    return status.msg || "There are unsaved edits. Continuing will discard these edits.";

Line 1 listens for the ‘beforeunload’ event which happens when the user closes the tab, reloads or enters a new URL.

Lines 2-5 set up a ‘status’ object in the event will be used by the view managers to communicate if they have unsaved edits, and to provide a message to be displayed.

Line 6 triggers a custom event ‘myapp:navigate’, using the Backbone object as a global event bus.

Lines 7-9 checks whether any of the ‘myapp:navigate’ handlers have set the status.unsavedEdits flag. If so, it returns a non-empty string, which the browser may or may not choose to display (Firefox does not – see WindowEventHandlers.onbeforeunload).

Communicating if there are unsaved edits

The following code is in the view managers:

this.listenTo(Backbone, "myapp:navigate", function(status){
  if (this.unsavedEdits()){
    status.unsavedEdits = true;
    status.msg = "There are unsaved edits to XXX. Continuing will discard these edits.";

Line 1 listens for the custom ‘myapp:navigate’ event on the global event bus. When it occurs, lines 2-5 check whether there are unsaved edits, and if so, set the status.unsavedEdits flag, and optionally a message to be displayed.

Catching in-application navigation

To catch in-application navigation, I added code to the router to check for unsaved edits before routing to a new URL. Arguably, it may be preferable to intercept any navigation events before they reach the router. However, I opted to handle the logic in a single place, rather than adding logic wherever there is navigation.

The following code is in a routine that gets triggered for any route change resulting in a change of view managers (i.e. where there is a potential for unsaved edits to be lost).

var status = { 
  unsavedEdits: false
Backbone.trigger('myapp:navigate', status);

if(status.unsavedEdits) {
  var confirmed = window.confirm(status.msg || "There are unsaved edits. Continuing will discard these edits.");
  if (!confirmed){
    var history = this.history;
    if (history.length > 1){
      this.navigate(history[history.length-2], {replace: true});							

Lines 1-6 are the same as before… set up a status object, trigger a custom ‘myapp:navigate” event on the global event bus, and check whether the unsavedEdits flag has been set.

If the flag is set, line 7 uses a windows confirmation dialog to warn the user and get their response.

If the user does not want to proceed, we now have the issue that the URL has been added to the history and needs to be removed (or else subsequent navigation to that URL may be disabled). Lines 9-12 achieve this by replacing the newly added URL with its predecessor, using navigate with the replace option.

Getting started with Chef report and exception handlers

This post is for people who want to use or write their first Chef report handler, but aren’t sure where to begin. I first attempted to write a handler just after I learned how to write basic Chef recipes. It was hard, because I only had a rudimentary understanding of Chef mechanics, and I was learning Ruby as I went. This article would have got me started faster. However, in the end you are going to be writing Ruby, and probably you’ll need to get deeper into Chef too.

The three types of handler

There are three types of handler:

  • Start handler – Runs at the beginning of a chef-client run
  • Report handler – Runs at the end of a chef-client run, after all of the recipes have successfully completed
  • Exception handler – Runs at the end of a chef-client run if it exits with an error code

I am going to gloss over start handlers: they are less common and somewhat more complex because you have to get the handler in place before the chef-client run happens (so you can’t distribute them in a recipe).

Report handlers are useful for gathering information about a chef-client run (e.g. what cookbook versions were run, what resources were updated). Exception handlers are useful to capture information about failed runs, or to perform cleanup on exception (e.g. cleaning up frozen filesystem resources).

Using the built-in handlers

The most basic place to start is being able to run one of the built-in handlers. At current time of writing, there are two handlers built in to the chef-client: the json_file handler, and the error_report handler. The benefit of starting here is you don’t have to worry about how to get the handler code onto the nodes that you want to run them on – they’re distributed with the chef-client.

Running the error_report handler

Let’s start with the error_report handler. To run this, all you need to do is use the ‘chef_handler’ cookbook and add the following into a recipe in your runlist:

include_recipe "chef_handler"

chef_handler "Chef::Handler::ErrorReport" do
  source "chef/handler/error_report"
  action :enable

After a chef-client run, this will create a file called ‘failed_run_data.json’ in the chef-client cache (typically ‘/var/chef/cache’) on the node it is running on.

Despite its name, this handler can be useful whether or not the run fails. Assuming your run succeeded, here’s what you’ll find in the ‘failed_run_data.json’ file.

"node": {
  "name": "m2",
  "chef_environment": "_default",
  "json_class": "Chef::Node",
  "automatic": {
  "kernel": {
  "name": "Linux",

The JSON data starts off with details about the node attributes, similar to what you get with ‘knife node show m2 -l -Fjson'.

  "success": true,
  "start_time": "2014-08-31 20:43:53 +0000",
  "end_time": "2014-08-31 20:44:28 +0000",
  "elapsed_time": 34.995100522,

It then lists some basic details about the chef-client run.

  "all_resources": [
      "json_class": "Chef::Resource::ChefHandler",
  "updated_resources": [
      "json_class": "Chef::Resource::ChefHandler",

The next JSON elements describe all of the resources that were part of the chef-client run, and which were updated.

  "exception": null,
  "backtrace": null,
  "run_id": "53e02623-1bc9-4b33-a08d-eeb89936feca"

And then finally there is information about the exception that occurred (which is null in this case, a successful run).

Running error_report handler when an exception occurs

Let’s create an exception by adding an invalid resource to the recipe:

include_recipe "chef_handler"

execute "brokenthing"

chef_handler "Chef::Handler::ErrorReport" do
  source "chef/handler/error_report"
  action :enable

The chef-client run will fail with an error:

Errno::ENOENT: No such file or directory - brokenthing

But if you copied my recipe, the error report won’t have been updated! Why?

The problem is that the failing ‘execute’ resource happened before the error report handler was enabled. If you move the ‘execute’ resource after the chef_handler resource, the error report will be created. So one takeaway is that it is good practice to define handlers in a recipe that you put at the start of the runlist. But that’s not all that you want to do.

If the handler is the first resource encountered in the run, then it will report any errors happening when subsequent resources are executed. But sometimes exceptions happen before this, in what’s called the ‘compile’ phase (see About the chef-client Run). This is the phase where the chef-client constructs a list of all of the resources and actions that it is meant to perform, which it then executes in the ‘converge’ phase. For a much deeper explanation of this, see The chef resource run queue.

What we want to do is to enable the error report handler as early as possible in the compile phase, so it can catch errors occurring during this phase too. Here’s how to do that:

include_recipe "chef_handler"

execute "brokenthing"

chef_handler "Chef::Handler::ErrorReport" do
  source "chef/handler/error_report"
  action :nothing

The end.run_action(:enable) tells Chef to do the “enable” action immediately on encountering the resource (i.e. during ‘compile’). The action :nothing tells Chef that it does not need to do anything during the ‘converge’ phase (as its already been enabled).

With this change, now you will find that the end of the error report has exception and backtrace information:

  "exception": "Errno::ENOENT: execute[brokenthing] (testapp::handlers line 11) had an error: Errno::ENOENT: No such file or directory - brokenthing",
  "backtrace": [
    "/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/mixlib-shellout-1.4.0/lib/mixlib/shellout/unix.rb:320:in `exec'",
    "/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/mixlib-shellout-1.4.0/lib/mixlib/shellout/unix.rb:320:in `block in fork_subprocess'",

Only running error_report on exception

Perhaps you do not want to run the error_report handler on every run, just when an exception occurs. To do this, we override the default supports attribute on the resource to specify it should be used with exceptions only:

chef_handler "Chef::Handler::ErrorReport" do
  source "chef/handler/error_report"
  action :nothing
  supports :exception=>true

Replace :exception with :report if you only want the handler to run when the run is successful.

Running the json_file handler

The json_file handler is like the error_report handler, but puts the results into a timestamped file. The following chef_handler resource will result in data being written to /var/chef/reports/chef-run-report-20140831204047.json, for a run that started at 20:40:47 on 31st August 2014.

chef_handler 'Chef::Handler::JsonFile' do
  source 'chef/handler/json_file';
  arguments :path => '/var/chef/reports'
  action :nothing

This example illustrates how to pass parameters into a handler.

You can also choose to use the ‘json_file’ recipe in the chef_handler cookbook to achieve the same result.

An alternative – using the client.rb file

An alternative (as described in the Chef docs) to enabling the handler using the chef_handler resource is to add the following to the client.rb file (or solo.rb file):

require 'chef/handler/json_file'
report_handlers << => "/var/chef/reports")
exception_handlers << => "/var/chef/reports")

Using a custom handler

The next step is to use a custom handler, i.e. one that is not built into the chef-client. The extra dimension here is that you need to get the handler code onto the node before it is run. You can do this by putting your handler source in the ‘files’ directory of your cookbook and using a cookbook_file resource to transfer it to the node, e.g.:

cookbook_file "#{node["chef_handler"]["handler_path"]}/your_handler.rb" do
  source "your_handler.rb"
  owner "root"
  group "root"
  mode 00755
  action :nothing

The expression #{node["chef_handler"]["handler_path"]} gives you the directory in which the chef_handler resource expects to find your handler. As previously, we run the cookbook_file resource immediately to ensure the handler file is created during the compile phase, before the handler itself is run. If the handler doesn’t run until the converge phase, you can replace the last two lines with:

  action :create

The chef_handler::default recipe can also be used to transfer handlers to the target node. You will need to make a copy of the chef_handlers cookbook and place your handlers in the ‘files/default/handlers’ directory of that cookbook (or copy the code from the default recipe into your handler cookbook).

To enable this handler, you would then define a chef_handler resource that refers to the transferred handler file:

chef_handler 'YourHandlerModule::YourHandler' do
  source "#{node["chef_handler"]["handler_path"]}/your_handler.rb";
  action :nothing

For a real example you can try using, see Julian Dunn’s cookbook version handler.

Writing your own handler

To write a handler, you need to create a Ruby class that inherits from Chef::Handler and has a report method:

module YourHandlerModule
  class YourHandler < Chef::Handler
    def report
       # Handler code goes here

Put this code in the ‘your_handler.rb’ file in the ‘files’ directory of your handler cookbook.

Within the handler you can write arbitrary Ruby code, and you can use ‘run_status’ information available from the chef-client run. ‘run_status’ is basically the same information that is output by the ‘error_report’ handler. The information can be accessed through the following methods in the Ruby code:

  • data – a hash containing the run status
  • start_time, end_time, elapsed_time – times for the chef-client run
  • updated_resources, all_resources – the resources in the chef-client run
  • success?, failed? – methods that indicate if the chef-client run succeeded or failed

In the handler, you can access these directly or via the run_status object, e.g. ‘run_status.success?’ is equivalent to ‘success?’.

So for example, I can write the following handler:

require 'chef/log'

module TestApp
  module Handlers
     class UpdatedResources < Chef::Handler
      def report
        if success?
'Running Updated Resources handler after successful chef-client run')
'Running Updated Resources handler after failed chef-client run')
        end'Updated resources are:')
        updated_resources.each do |resource|


In the above, I use the success? method to test whether the run succeeded or not, and the updated_resources to loop through each resource updated during the run, and print it out as a string.

If you run ‘chef-client -linfo’ on the target node, you will see output similar to:

[2014-09-01T14:48:02+00:00] INFO: Running Updated Resources handler after successful chef-client run
[2014-09-01T14:48:02+00:00] INFO: Updated resources are:
[2014-09-01T14:48:02+00:00] INFO: chef_handler[Chef::Handler::JsonFile]
[2014-09-01T14:48:02+00:00] INFO: chef_handler[Chef::Handler::ErrorReport]
[2014-09-01T14:48:02+00:00] INFO: chef_handler[TestApp::Handlers::UpdatedResources]
  - TestApp::Handlers::UpdatedResources
Running handlers complete

In this case, the only updated resources in the chef-client run are the three chef_handlers in my handlers recipe. ‘chef_handler[TestApp::Handlers::UpdatedResources]’ is how the UpdatedResources handler is represented as a string.

You can also use the ‘to_hash’ method in place to ‘to_s’ in the above, to see what information is available about the resource. If you did this, you would see something like the following:

2014-09-01T14:54:25+00:00] INFO: {:name=>"TestApp::Handlers::UpdatedResources", :noop=>nil, 
:before=>nil, :params=>{}, :provider=>nil, :allowed_actions=>[:nothing, :enable, :disable], 
:action=>[:nothing], :updated=>true, :updated_by_last_action=>false, 
:supports=>{:report=>true, :exception=>true}, :ignore_failure=>false, :retries=>0, :retry_delay=>2, 
:source_line=>"/var/chef/cache/cookbooks/testapp/recipes/handlers.rb:41:in `from_file'", 
:guard_interpreter=>:default, :elapsed_time=>0.000480376, :resource_name=>:chef_handler, 
:cookbook_name=>"testapp", :recipe_name=>"handlers", 
:source=>"/var/chef/handlers/testapp_handlers.rb", :arguments=>[], 

From this, you might decide just to print out the name of the resource, e.g.:

        updated_resources.each do |resource|
"Updated resource with name: " + resource.to_hash[:name])

Don’t expect ‘to_hash’ to work in all cases – it depends on whether it has been implemented in the relevant Ruby class. And be aware it may not be the best way to access the data. For example, the above can better be achieved using the name method on the resource, i.e. ‘’ in place of ‘resource.to_hash[:name]’. Over time, you’ll want to get comfortable with using debuggers to look at the data and reading the chef source code to understand how best to access it.

As well as ‘run_status’, you also have access to the run_context. I’ve included links to a couple of examples using the run context below.

Chef Handler with a parameter

If you want to pass a parameter into your handler, you need to add an initialize method to the handler, as below. This allows you to set the arguments attribute in chef_handler, which is passed to the initialize method as a parameter called ‘config’.

    class UpdatedResourcesToFile < Chef::Handler

      def initialize(config={})
        @config = config
        @config[:path] ||= "/var/chef/reports"
      def report[:path], "lastrun-updated-resources.json"), "w") do |file|
          updated_resources.each do |resource|

What the above initialize method does is store the parameters in an attribute of the handler class called ‘@config’, and also sets a default value for the ‘path’ parameter. The ‘@’ indicates an attribute and means that the value is available to other methods in the class. The ‘report’ method can then access the path parameter using ‘@config[:path]’.

To enable the above handler, put something like the following in your handler recipe. You will also need to add a resource to create the path, if it doesn’t already exist.

chef_handler "TestApp::Handlers::UpdatedResourcesToFile" do
  source "#{node["chef_handler"]["handler_path"]}/testapp_handlers.rb"
  arguments :path => '/tmp'
  action :nothing
  supports :report=>true, :exception=>true

In the above, I am overriding the default path value so that the file is written out to ‘/tmp’ instead.

Examples of handlers

Here are some examples of handlers that you may find useful:

Managing software versions in a multi-node topology with knife-topo

Here’s a simple example of using knife-topo with Chef to manage the versions of software deployed in a multi-node topology. This post assumes you have Vagrant and ChefDK installed.

The example topology that we’ll work with consists of three nodes as shown below:


However, in this post, we’ll only define and deploy two of the nodes using knife-topo, Chef and Vagrant. I’ll introduce the third node in a later blog to illustrate some more complex uses of the topology JSON (such as conditional attributes). You can also look at the full example in the test-repo in knife-topo’s github repository.

Describing the topology

Define the nodes

The first step is to describe the topology that we want. Below is a minimal topology JSON file describing the two nodes in the topology (‘test1’). The ‘name’ property for the nodes defines the node name that will be used in Chef. The ‘ssh_host’ property specifies the address to use in bootstrapping the nodes.

  "name": "test1",
  "nodes": [
      "name": "appserver01",
      "ssh_host": ""
      "name": "dbserver01",
      "ssh_host": ""

This is sufficient to allow knife-topo to bootstrap the appserver and dbserver nodes, so that they will be managed by Chef. You can try this out by following the instructions later to download a test repo, run Vagrant and chef-zero, however, the results may be rather underwhelming. A data bag describing the topology will be created, Chef will be bootstrapped onto the two nodes, and they will register themselves with the server, but that’s all. You will also get warnings during bootstrap that the nodes have no runlists.

Define the runlists

Let’s fix the missing runlists now:

  "name": "test1",
  "nodes": [
      "name": "appserver01",
      "ssh_host": "",
      "run_list": [
      "name": "dbserver01",
      "ssh_host": "",
      "run_list": [

The recipes in these runlistsinstall the software (mongodb, nodejs, and a test application) on the nodes, and are provided in the test repo, along with a Berksfile to manage the dependencies. However, they do not specify what versions of the software should be installed. If you try running the example now, the latest versions will be chosen. But we want to get control of what’s on our topology. We can do this by defining the software versions as attributes in an environment cookbook specific to our topology.

Defining specific software versions as attributes in an environment cookbook

Here’s the first addition that’s needed to deploy specific software versions. We add a “cookbook_attributes” section, specify the name of the environment cookbook (‘testsys_test1’) and the attribute file we want (‘softwareversion’), and the necessary version attributes to install NodeJS version 0.10.28, and MongoDB version 2.6.1.

  "name": "test1",
  "nodes": [
  "cookbook_attributes": [
      "cookbook": "testsys_test1",
      "filename": "softwareversion",
      "normal": {
        "nodejs": {
          "version": "0.10.28",
          "checksum_linux_x64": "5f41f4a90861bddaea92addc5dfba5357de40962031c2281b1683277a0f75932"
        "mongodb": {
          "package_version": "2.6.1"

The second change is to add the environment cookbook to the runlists for the two nodes:

  "name": "test1",
  "nodes": [
      "name": "appserver01",
      "ssh_host": "",
      "run_list": [
      "name": "dbserver01",
      "ssh_host": "",
      "run_list": [
  "cookbook_attributes": [

The topology JSON is now ready to use – go ahead and try it.

A word of warning if you ran knife-topo with runlists but no specific software versions – there’s currently a bug in the mongodb cookbook that means it can’t handle downgrades. So you may want to destroy the virtual machines (‘vagrant destroy’) and start again (or see troubleshooting for an alternative).

Running the example

These instructions may be enough to get you started. If you want more details or encounter problems, see these instructions and the knife-topo readme.

Setting up the environment

Install Vagrant and ChefDK if you do not already have them. Neither are required for knife-topo, but this post uses them.

Install knife-topo using gem. You may need to use ‘sudo’.

gem install knife-topo

To get the test repo, it’s easiest to download and unzip the latest knife-topo release from github (replace ‘0.0.7’ with the latest release number):

unzip -d

The test-repo gives you a multi-node Vagrantfile similar to what that I described in my previous post. Use that to create the virtual machines:

cd ~/knife-topo-0.0.7/test-repo
vagrant up

When the virtual machines have started (this can take a while the first time), you can start chef-zero. If you have chefDK, you can use the embedded chef-zero, or you can install chef-zero as a gem. Here’s how to start the embedded chef-zero on Ubuntu:

/opt/chefdk/embedded/bin/chef-zero -H

Importing the pre-requisites

Leave chef-zero running, open a new terminal and go to the test-repo, then upload the required cookbooks using berkshelf:

cd ~/knife-topo-0.0.7/test-repo
berks install
berks upload

Save your topology file as “mytopo.json” in the test-repo, then import your topology into your Chef workspace:

knife topo import mytopo.json

Do not name the file “test1.json” or it will cause an error later on when we load the test1 topology from the data bag file (knife looks for data bag files in the current directory BEFORE it looks in the data bag directory).

If you are using the final topology JSON, ‘knife-topo import’ will have generated some useful artifacts for you: in particular, an attribute file in the topology cookbook containing the specified software versions. This file is in “knife-topo-0.0.7/test-repo/cookbooks/testsys_test1/attributes/softwareversion.rb” and should look like:

# Cookbook Name:: testsys_test1
# Attribute File:: softwareversion.rb
# Copyright 2014, YOUR_COMPANY_NAME
normal['nodejs']['version'] = "0.10.28"
normal['nodejs']['checksum_linux_x64'] = "5f41f4a90861bddaea92addc5dfba5357de40962031c2281b1683277a0f75932"
normal['mongodb']['package_version'] = "2.6.1"

Bootstrapping the topology

knife topo create test1 --bootstrap -xvagrant -Pvagrant --sudo

This command uploads the topology cookbook, creates the test topology in the Chef server and bootstraps all nodes that provide an ‘ssh_host’. After it finishes, you should have a working two node topology with the specified software versions. The test application welcome screen is at: http://localhost:3031

Multi-node topologies using Vagrant and chef-zero

I’ve been bootstrapping a lot of multi-node topologies recently. Here’s some things I learned about making it easier with Vagrant and chef-zero. I also wrote a Chef knife plugin (knife-topo) – but more of that in a later post.

Multiple VM Vagrant files

The basics of a multi-VM vagrant file as described in the Vagrant documentation is that you have a “config.vm.define” statement for each machine, and setup the machine-specific configuration within its block. This looks something like:

Vagrant.configure("2") do |config| = "ubuntu64"
  config.vm.box_url = ""
  config.vm.synced_folder "/ypo", "/ypo"
  config.vm.synced_folder ".", "/vagrant", disabled: true

  # setup appserver
  config.vm.define "appserver" do |appserver_config|
    appserver_config.vm.hostname = "appserver"
    # other VM configuration for appserver goes here

  # setup dbserver
  config.vm.define "dbserver" do |dbserver_config|
    dbserver_config.vm.hostname = "dbserver"
    # other VM configuration for dbserver goes here

Lines 2-3 tell Vagrant what to put on the machines, and where to get it. Lines 4-5 are optional: line 4 sets up a directory on the host machine which will be accessible to each guest virtual machine; line 5 disables the default share.

Lines 7-10 and 13-15 each define a virtual machine (appserver and dbserver).  The two config variables (appserver_config and dbserver_config) are like the overall ‘config’ variable, but scoped to a specific machine.

This is fine when you don’t have much configuration information for each node, but it gets hard to read, laborious and error-prone to maintain when you start having lots of configuration statements or lots of nodes.

Separating out the configuration of the virtual machines

I got this approach from a post by Joshua Timberman in the days before I learned Ruby, and it was a great help in cleaning up my Vagrantfile.  Separating out the machine (node) configuration makes it so much easier to understand and update the Vagrant file.

First, we define a structure (Ruby hash) that describes how we want the nodes to be configured:

nodes = {
  :dbserver => {
    :hostname => "dbserver",
    :ipaddress => "",
    :run_list => [ "role[base-ubuntu]", "recipe[ypo::db]" ]
  :appserver => {
    :hostname => "appserver",
    :ipaddress => "",
    :run_list => [ "role[base-ubuntu]", "recipe[ypo::appserver]"],
      :forwardport => {
        :guest => 3001,
        :host => 3031

The above defines two nodes, dbserver and appserver.  Lines 3-5 and 8-10 set up their hostnames, IP addresses and the run lists to use with Chef provisioning. Lines 11-13 set up port forwarding, so that what the application server listens for on port 3001 will be accessible through port 3031 on the host machine.

Second, we use the nodes hash to configure the VMs, with code something like this:

Vagrant.configure("2") do |config|
  nodes.each do |node, options|
    config.vm.define node do |node_config| :private_network, ip: options[:ipaddress]
      if options.has_key?(:forwardport) :forwarded_port, guest: options[:forwardport][:guest], host: options[:forwardport][:host]
    node_config.vm.hostname = options[:hostname]
    # Chef provisioning options go here (see below)

Line 2 loops through the nodes hash, defining the configuration of each machine. Line 4 sets up the machine on a private network with a fixed IP address as specified in the nodes hash. Lines 5-7 setup port forwarding, if configured in the nodes hash. Line 9 sets up the hostname from the nodes hash.

Multiple nodes with a single chef-zero

chef-zero is an in-memory Chef server – a replacement for chef-solo. It’s a great way to test recipes, because it is just like a ‘real’ chef server. By default, it listens only locally, on However, you can run it so that other nodes can access it – for example, all of the VMs in the Vagrantfile.  Doing this lets you use recipes requiring search of other nodes’ attributes, e.g. for dynamic configuration of connections between nodes. When you’re done testing, you just stop chef-zero – no cleanup of the server required.

You can install chef-zero as a gem:

gem install chef-zero

The machines defined above are all on the 10.0.1.x private network. So all we need to do is start chef-zero on that network, using:

chef-zero -H

and configure the VMs with a chef server URL of “”. Because the Vagrantfile is reading configuration from the knife.rb file, all we need to do is set variables in knife.rb – something like:

node_name                "workstation"
client_key               "#{current_dir}/dummy.pem"
validation_client_name   "validator"
validation_key           "#{current_dir}/dummy.pem"
chef_server_url          ""

The dummy.pem can be any validly formatted .pem file.

Using Chef provisioning with Vagrant

Running chef-client using Vagrant provisioning

Sometimes it is very useful to have Vagrant provision your machines using chef-client. The Vagrant documentation provides you with the basics for doing this, but not a lot else. The first useful tip (again from Joshua Timberman) is to use your knife.rb configuration file rather than hard-coding the server URL and authentication information. To do this, add the following at the top of the Vagrantfile:

require 'chef'
require 'chef/config'
require 'chef/knife'
current_dir = File.dirname(__FILE__)

These five lines read in the configuration from your knife.rb file (change ‘../chef/knife.rb” to be the path to your knife.rb) and make it available to be used as below, which shows code to that should be inside the “nodes.each” block described previously:

node_config.vm.provision :chef_client do |chef|
  chef.chef_server_url = Chef::Config[:chef_server_url]
  chef.validation_key_path = Chef::Config[:validation_key]
  chef.validation_client_name = Chef::Config[:validation_client_name] 
  chef.node_name = options[:hostname]
  chef.run_list = options[:run_list]
  chef.provisioning_path = "/etc/chef"
  chef.log_level = :info

Lines 2-4 setup the connection to the Chef server. Lines 5-6 setup the node name and run list from the options hash we defined earlier. Setting the provisioning_path to “/etc/chef” puts the client.rb in the place that chef-client expects it, and setting log_level to “:info” provides a decent level of output (change to “:debug” if you have problems with the provisioning).

Cleaning up the Chef Server on vagrant destroy

When you destroy a Vagrant VM that you have provisioned using a Chef Server, you need to delete the node and client on the Chef server before you reprovision it in Vagrant. The following lines added into the node_config.vm.provision block above should do this for you:

chef.delete_node = true
chef.delete_client = true

However, there is an issue with this – it does not work for me, so I always have to use knife to manually clean up:

knife node delete appserver
knife client delete appserver

Controlling the version of chef-client

Initially I was running into issues because the chef client used by the Vagrant chef provisioner was back-level. This post on StackOverflow solved the problem for me. Install the chef-omnibus Vagrant plugin:

vagrant plugin install vagrant-omnibus

and specify the version of chef-client you want to use in the node configuration part of the Vagrant file:

node_config.omnibus.chef_version = "11.12.8"


node_config.omnibus.chef_version = :latest

to install the latest version.