You're not alone

One thing I find often happens at tech conferences is that most of us walk away excited about the potential but feeling defeated by seeing how well others are doing. Here is a little secret: if you pull those presenters aside, you will find a stack of things that they also wish were better.
I am constantly ashamed when I talk to others about the state of our Chef repo, and since I started using Chef over four years ago there is a lot of history in there. For the first two years I was the only guy managing any of our Chef, along with being the better-looking half of a two-man team managing our datacenter, servers, network, and internal IT. The second guy on the team hated Chef and would always push things off to me, and to be fair I did the same to him with the network. The downside is that four and a half years later we are just now starting to get more eyes on the Chef repo and the overall workings of Chef.
Personally I love it when someone finds a better way to do something, even when they start cussing at me about how much they hate (the company formerly known as Opscode) Chef. I have been pushing for others to be more involved, and I don’t fear the skeletons in my Chef closet, but rather ask for help in cleaning them out. I love it when the engineering team comes to me and says “hey, I wrote this cookbook, can we talk about it?”, or when they say “I see you did x in cookbook y, why didn’t you do z?” This discussion has helped move our repo forward, and the pace of change is only getting faster. In the process we are working out the bugs that a one-man team does not even think about.
How did this change happen? Well, honestly, it was a pain, not only because people did not want to learn, but because while Chef is easy to start with, joining an existing system is a little harder. I have a personal Chef account I play with all the time, as I love doing this stuff, but developers just want to get their code out and not worry about the infrastructure that manages it. Like everyone today, a good engineer’s time is overcommitted, and it is hard to find time to learn a new skill that is not directly related to the problem at hand. Besides, those Ops guys can do it for them! We finally hit a point where I could not keep up, and we had a few engineers who had the desire. The other ops team members were making minor changes, but they were still very limited in their skills. To help, I wrote pages of documentation, but no one would read it; they just wanted to complain about what they didn’t know! Finally I went to the VP of Eng and said “let’s bring in a trainer, and I need your guys for three days”. Luckily he saw the advantage of being able to move faster and of having developers help manage the infrastructure, and he agreed. Three days of training later we had a ton of questions and ideas, but we are at least on the same page now. I am not saying everything is perfect, and I still get a lot of “why did you do that?”, but we have a team working together to improve, and that is what Chef is about to me.
Want to know the funny part? I have always felt like we were the only company with such an ugly Chef repo, and I was wrong! Even better, it does not matter if you’re using Chef, Puppet, Salt, or some homegrown system: there is someone in the same boat as you. They may not talk about it on stage, but pull those guys aside and you will find that while they have solved many of the problems you have, they often have problems you have already solved. The tech ops field focuses on problems, not successes; I mean, how often do you hear about ops unless the site goes down? The point I am trying to make is that yes, there is much focus on your failures, but you’re not alone, and maybe you should step back a second and think about your successes.

You're doing it all wrong AGAIN

The joy of rapidly changing software is that if you’re doing it right today, there is a high likelihood you’re doing it wrong tomorrow. Well, that is one way to look at it, or you could just say “I am doing what works for me”. When I started using Opscode Chef, one of the things that really stood out to me was a stage conversation between Luke and Adam. To summarize that conversation: Luke stated that if you don’t do it the defined way you’re doing it wrong, to which Adam replied, unless you need to do it another way. That has pretty much summed up my whole approach, and it might be part of why I enjoy Chef so much and why Adam is a great front man. Nothing technical here, but it is one thing to keep in mind when you’re working on a new Chef deploy: there are best practices, but they are not always right for your instance. Just a few mumblings as I explore the move away from roles, which I agree makes sense, but I have four years of roles history to refactor.

Sometimes automation is a frigging pain

I have been killing myself with PostgreSQL 9.1, pgpool, and heartbeat under Chef configuration management and have pretty much hit a wall. The current situation has to do with idempotency: when I manually remove the postgresql-9.1 package with apt-get purge postgresql-9.1, it happily does its job. The problem is that when I run Chef again I expect it to install the package again, but it does not. The more Chef you use, the more you will learn to love and hate it. FYI, the Coroutine team has done some of the work for you on PostgreSQL replication in Chef if you need that kind of stuff.

https://github.com/coroutine/chef-postgresql

Not very useful stuff today, but I have been starting to look at Private Chef and might have some posts coming about that soon.

Good luck out there, and if you are looking at this you might also be interested in working at Rdio: Senior-systems/Operations-engineer

Using Ohai dmidecode variables for conditional runs

A simple conditional that some people may not be aware exists.

Problem: We have a type of dell server “PowerEdge R710” that we want to run a certain cron on.

Solution: We use node[:dmi][:system][:product_name] as a conditional for our cron job

Example: mycookbook/recipes/crons.rb

cron "check_it" do
  user "root"
  minute "0"
  hour "*/4"
  day "*"
  month "*"
  weekday "*"
  command "/path/check.sh >> /dev/null 2>&1"
  only_if do node[:dmi][:system][:product_name] == "PowerEdge R710" end
end
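
One thing to watch with a conditional like this is a box where Ohai reports no DMI data at all, in which case chaining [:dmi][:system] would hit nil. Here is a minimal sketch of a nil-safe lookup in plain Ruby. The helper names (product_name, r710?) are mine, and a plain Hash stands in for the node object; whether your Chef version's node supports dig is something I am assuming you will check.

```ruby
# A plain Hash stands in for Chef's node object here (an assumption for
# illustration; this is not the Chef API itself).
def product_name(node)
  # Hash#dig returns nil instead of raising when an intermediate key is missing.
  node.dig(:dmi, :system, :product_name)
end

# Hypothetical helper: true only for the hardware model we care about.
def r710?(node)
  product_name(node) == "PowerEdge R710"
end

physical = { dmi: { system: { product_name: "PowerEdge R710" } } }
no_dmi   = {}

puts r710?(physical)
puts r710?(no_dmi)
```

The second call returns false rather than blowing up, which is the behavior you would want from a guard on a cron resource.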

Chef restore from backup

So I was testing my restore from backup for Chef and ran into a few problems. The first problem I encountered was that my nginx load balancer config files are dynamically created based on the roles assigned to boxes. After my restore, one of the first boxes I tested was one of the LB boxes, and to my horror, even though the systems were listed when I did a chef node list, it seems that until they have checked in to the restored Chef server they are not counted. This means that my nginx config server pools were empty … bummer. The easy fix here was to have my servers move over to the restored Chef server instance from the bottom up, i.e. SQL boxes, web boxes, then edge LB stuff. Not a huge problem, but it does mean that if you ever have to restore a Chef box, stop all clients before you bring it up.

The other odd problem I had was that one node with a local variable assigned to it did not pull the var over. Now, the variable in question had not changed in months, and my daily backups should have contained this info. I got lucky: even though it was a DB access password for the system, I had removed the notify restart from a lot of services before the restore to minimize the impact of changes. Overall it went pretty well.

My backups … tar zcvf `date +%Y%m%d`.`hostname`.chef.tar.gz /var/lib/couchdb/ /etc/chef

Restore: build the server, install chef-server, stop chef-server, drop the tar into place, and start chef-server.
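
If you would rather drive the backup from a script than a shell one-liner, here is a rough Ruby sketch that builds the same dated file name and tar arguments as my backup line. The function names are made up for illustration, and Socket.gethostname is my stand-in for the `hostname` command; it only prints the command, it does not run it.

```ruby
require "socket"

# Build the archive name in the same YYYYMMDD.hostname.chef.tar.gz shape
# as the shell backup line.
def backup_name(time, host)
  "#{time.strftime('%Y%m%d')}.#{host}.chef.tar.gz"
end

# Return the tar invocation as an argument list (hypothetical helper).
def backup_command(time: Time.now, host: Socket.gethostname)
  ["tar", "zcvf", backup_name(time, host), "/var/lib/couchdb/", "/etc/chef"]
end

# Print the command rather than executing it, so this is safe to try anywhere.
puts backup_command.join(" ")
```

Passing the result to system(*backup_command) would actually run it, assuming tar is on the PATH and you have read access to both directories.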

One final thought: I had to restore a 0.8.16 system after 0.9.8 was out, which turns out to be a problem, as the latest bootstrap files do not work with 0.8.16. Luckily I had a local copy of the bootstrap that I used for 0.8.x installs and was able to run from there. I suggest you back up any files you use for installs locally, just in case.

Chef error: marshal data too short

WARN: HTTP Request Returned 500 Internal Server Error: marshal data too short … what to do?

jmiller@srv-101-29:~$ sudo chef-client
[Tue, 10 Aug 2010 12:36:13 -0700] INFO: Starting Chef Run
[Tue, 10 Aug 2010 12:36:28 -0700] WARN: HTTP Request Returned 500 Internal Server Error: marshal data too short
/usr/lib/ruby/1.8/net/http.rb:2097:in `error!': 500 "Internal Server Error" (Net::HTTPFatalError)
    from /usr/lib/ruby/1.8/chef/rest.rb:216:in `api_request'
    from /usr/lib/ruby/1.8/chef/rest.rb:267:in `retriable_rest_request'
    from /usr/lib/ruby/1.8/chef/rest.rb:197:in `api_request'
    from /usr/lib/ruby/1.8/chef/rest.rb:100:in `get_rest'
    from /usr/lib/ruby/1.8/chef/client.rb:270:in `sync_cookbooks'
    from /usr/lib/ruby/1.8/chef/client.rb:86:in `run'
    from /usr/lib/ruby/1.8/chef/application/client.rb:215:in `run_application'
    from /usr/lib/ruby/1.8/chef/application/client.rb:207:in `loop'
    from /usr/lib/ruby/1.8/chef/application/client.rb:207:in `run_application'
    from /usr/lib/ruby/1.8/chef/application.rb:62:in `run'
    from /usr/bin/chef-client:25
jmiller@srv-101-29:~$

So looking at this I thought it was a checksum error on the client and deleted the /var/chef/cache directory, without luck. After digging around I found that stopping the Chef server, deleting /var/chef/cache/checksums, and then restarting the Chef server fixed the problem. Easy fix but odd problem. Chef 0.8.16

MegaCLI Raid6 Array creation

I am using Ubuntu Karmic on a Dell R610 to access MD1200 storage devices, and since (until recently) OpenManage was not an option for the H800 SAS RAID adapters, I had to explore the wonderful megacli utility!

# Find unused disks

root@srv-103-27:/opt/MegaRAID/MegaCli# ./MegaCli64 -PDList -a0 | grep -B14 Unconfigured | grep -e '^Enclosure Device ID:' -e '^Slot Number:'
Enclosure Device ID: 41
Slot Number: 11
Enclosure Device ID: 80
Slot Number: 0
Enclosure Device ID: 80
Slot Number: 1
Enclosure Device ID: 80
Slot Number: 2
Enclosure Device ID: 80
Slot Number: 3
Enclosure Device ID: 80
Slot Number: 4
Enclosure Device ID: 80
Slot Number: 5
Enclosure Device ID: 80
Slot Number: 6
Enclosure Device ID: 80
Slot Number: 7
Enclosure Device ID: 80
Slot Number: 8
Enclosure Device ID: 80
Slot Number: 9
Enclosure Device ID: 80
Slot Number: 10
Enclosure Device ID: 80
Slot Number: 11
Enclosure Device ID: 106
Slot Number: 0
Enclosure Device ID: 106
Slot Number: 1
Enclosure Device ID: 106
Slot Number: 2
Enclosure Device ID: 106
Slot Number: 3
Enclosure Device ID: 106
Slot Number: 4
Enclosure Device ID: 106
Slot Number: 5
Enclosure Device ID: 106
Slot Number: 6
Enclosure Device ID: 106
Slot Number: 7
Enclosure Device ID: 106
Slot Number: 8
Enclosure Device ID: 106
Slot Number: 9
Enclosure Device ID: 106
Slot Number: 10
Enclosure Device ID: 106
Slot Number: 11
root@srv-103-27:/opt/MegaRAID/MegaCli#
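
Typing a long list of enclosure:slot pairs by hand is an easy place to fat-finger something, so here is a small Ruby sketch that turns output shaped like the listing above into the bracketed drive list MegaCli takes. The function names are my own invention, and the sample text is a trimmed copy of the listing, not real parsing of every MegaCli output format.

```ruby
# Parse "Enclosure Device ID: N" / "Slot Number: M" line pairs into
# [enclosure, slot] integer pairs.
def parse_drives(text)
  drives = []
  enclosure = nil
  text.each_line do |line|
    case line
    when /^Enclosure Device ID:\s*(\d+)/
      enclosure = $1.to_i
    when /^Slot Number:\s*(\d+)/
      drives << [enclosure, $1.to_i] unless enclosure.nil?
    end
  end
  drives
end

# Build the "[E:S,E:S,...]" argument for one enclosure and a set of slots.
def drive_list(drives, enclosure:, slots:)
  picked = drives.select { |e, s| e == enclosure && slots.include?(s) }
  "[" + picked.map { |e, s| "#{e}:#{s}" }.join(",") + "]"
end

sample = <<~OUT
  Enclosure Device ID: 80
  Slot Number: 0
  Enclosure Device ID: 80
  Slot Number: 1
  Enclosure Device ID: 106
  Slot Number: 0
OUT

puts drive_list(parse_drives(sample), enclosure: 80, slots: 0..10)
```

In practice you would feed it the real grep output instead of the sample string and paste the result into the create command.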

# Create Raid 6 Volume

root@srv-103-27:/opt/MegaRAID/MegaCli# ./MegaCli64 -CfgLdAdd -r6 [80:0,80:1,80:2,80:3,80:4,80:5,80:6,80:7,80:8,80:9,80:10] -a0

Adapter 0: Created VD 5

Adapter 0: Configured the Adapter!!

Exit Code: 0x00
root@srv-103-27:/opt/MegaRAID/MegaCli#

# add dedicated hot spares, we use dedicated as they stay with the array/shelf
root@srv-103-27:/opt/MegaRAID/MegaCli# ./MegaCli64 -PDHSP -Set -Dedicated -Array5 -PhysDrv [80:11] -a0

Adapter: 0: Set Physical Drive at EnclId-80 SlotId-11 as Hot Spare Success.

Exit Code: 0x00

Chef Performance Tuning — Part 1

I am reading more of the chef-server code and think I overcomplicated this a bunch, but I am checking with others to confirm … this works but may not be the best solution.

It turns out chef-server is a CPU hog. I am not sure why, as all it’s really doing is attribute storage and file pushing. I started noticing that my 66-node Chef server farm was seeing longer and longer chef-client runs. At first I looked at disk, as I did not think the Chef server would have problems with this small of a farm. After much consideration that did not seem to be the problem; then I noticed while watching top that chef-server was using 99% of one core 85% of the time. While I do not claim to be an expert, here is the solution that worked for me.

One workaround is to run additional merb worker processes. To do this on a gems install, edit /etc/service/chef-server/run.

I have added -c 8. From the help output: -c, --cluster-nodes NUM_MERBS Number of merb daemons to run.

jmiller@srv-101-03:~$ cat /etc/service/chef-server/run
#!/bin/sh
PATH=/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/lib/ruby/gems/1.8/bin
exec 2>&1
exec /usr/bin/env chef-server -c 8 -N -p 4000 -e production -P /var/run/chef/server.%s.pid
jmiller@srv-101-03:~$

Then restart the chef server:

sudo /etc/init.d/chef-server restart

This will spawn 8 workers starting at port 4000 (port 4040 is the chef-server-webui):

jmiller@srv-101-03:~$ ps -eaf |grep merb |grep -v grep
root 3559 12380 0 Jun14 ? 00:00:37 merb : worker (port 4040)
root 3623 12342 0 14:08 ? 00:00:07 merb : spawner (ports 4000)
root 3638 3623 12 14:08 ? 00:18:40 merb : worker (port 4004)
root 3639 3623 12 14:08 ? 00:18:47 merb : worker (port 4005)
root 3640 3623 11 14:08 ? 00:17:17 merb : worker (port 4006)
root 3641 3623 12 14:08 ? 00:18:30 merb : worker (port 4007)
root 10890 1 64 Jun14 ? 11:18:07 merb : worker (port 4000)
root 10891 1 5 Jun14 ? 00:54:46 merb : worker (port 4001)
root 10892 1 4 Jun14 ? 00:51:57 merb : worker (port 4002)
root 10893 1 4 Jun14 ? 00:51:03 merb : worker (port 4003)
jmiller@srv-101-03:~$

Apply the correct recipes to the Chef server:

"recipe[apache2]",
"recipe[apache2::mod_status]",
"recipe[apache2::mod_proxy]",
"recipe[apache2::mod_proxy_http]",
"recipe[apache2::mod_proxy_balancer]",
"recipe[apache2::mod_rewrite]",
"recipe[apache2::mod_headers]",

Now that we have the workers, we need to load-balance requests to them. Opscode provides examples for an Apache LB, so that is what I chose to use. Port 4080 was a random choice that works in my env:

jmiller@srv-101-03:~$ cat /etc/apache2/sites-available/chef.example.com
Listen 4080
<VirtualHost *:4080>
  ServerName chef.example.com
  DocumentRoot /usr/share/chef-server/public

  <Proxy balancer://chef_server>
    BalancerMember http://127.0.0.1:4000
    BalancerMember http://127.0.0.1:4001
    BalancerMember http://127.0.0.1:4002
    BalancerMember http://127.0.0.1:4003
    BalancerMember http://127.0.0.1:4004
    BalancerMember http://127.0.0.1:4005
    BalancerMember http://127.0.0.1:4006
    BalancerMember http://127.0.0.1:4007
    Order deny,allow
    Allow from all
  </Proxy>

  LogLevel info
  ErrorLog /var/log/apache2/chef_server-error.log
  CustomLog /var/log/apache2/chef_server-access.log combined

  RewriteEngine On
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://chef_server%{REQUEST_URI} [P,QSA,L]
</VirtualHost>
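
If you change the number of merb workers later, the BalancerMember list has to change with it. Here is a tiny Ruby sketch that generates those lines from a base port and a worker count; the function name and keyword arguments are hypothetical, and the defaults just mirror the values used in this post.

```ruby
# Generate one BalancerMember line per merb worker, counting up from base_port.
# (Hypothetical helper; host defaults to the loopback address used above.)
def balancer_members(base_port:, count:, host: "127.0.0.1")
  (0...count).map { |i| "BalancerMember http://#{host}:#{base_port + i}" }
end

# Eight workers starting at port 4000, matching the -c 8 setting.
puts balancer_members(base_port: 4000, count: 8)
```

The same method could feed an ERB template if you manage the vhost with Chef, which keeps the worker count and the balancer list from drifting apart.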

Enable the new vsite:

jmiller@srv-101-03:~$ sudo a2ensite chef.example.com
Site chef.example.com already enabled
jmiller@srv-101-03:~$

Restart apache:

jmiller@srv-101-03:~$ sudo /etc/init.d/apache2 reload
* Reloading web server config apache2
Warning: DocumentRoot [/usr/share/chef-server/public] does not exist
[Tue Jun 15 16:48:21 2010] [warn] NameVirtualHost *:443 has no VirtualHosts
...done.
jmiller@srv-101-03:~$

Make sure it works. First update the port in your knife chef_server_url:

jmiller@srv-101-03:~$ cat .chef/knife.rb
log_level :warn
#log_location "/home/jmiller/.chef/knife.log"
node_name 'jmiller'
client_key '/home/jmiller/.chef/jmiller.pem'
validation_client_name 'chef-validator'
validation_key '/home/jmiller/.chef/chef-validator.pem'
chef_server_url 'http://srv-101-03.example.com:4080'
cache_type 'BasicFile'
cache_options( :path => '/home/jmiller/.chef/checksums' )
cookbook_path [ '/home/jmiller/site-cookbooks' ]

jmiller@srv-101-03:~$

Before adding the workers this was taking 2 to 9 seconds; the run below is the slowest I have seen since the change 🙂

jmiller@srv-101-03:~$ time knife role list
[
  "APACHE_ROLE",

  "WEBSERVER_ROLE"
]

real 0m0.673s
user 0m0.310s
sys 0m0.100s
jmiller@srv-101-03:~$

Now you will need to update all the /etc/chef/client.rb files on the systems and restart the chef-client daemon; I suggest you use Chef to do it.
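
The change itself is a one-line edit to each client.rb. As a rough illustration of that edit (not the Chef way of doing it, where a template would be nicer), here is a plain Ruby sketch that swaps the chef_server_url value in a config file's text. The function name is made up, and the URLs are example values.

```ruby
# Replace the quoted value on the chef_server_url line, keeping whichever
# quote style the file already uses. Raises if no such line exists.
def set_chef_server_url(config_text, new_url)
  pattern = /^(\s*chef_server_url\s+)(["']).*?\2/
  raise ArgumentError, "no chef_server_url line found" unless config_text =~ pattern
  config_text.sub(pattern) { "#{$1}#{$2}#{new_url}#{$2}" }
end

before = <<~CONF
  log_level :info
  chef_server_url "http://chef.example.com:4000"
CONF

puts set_chef_server_url(before, "http://chef.example.com:4080")
```

Only the URL between the quotes changes; every other line in the file passes through untouched.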

Thank you to Josh Timberman, Adam Jacob, and holoway for many pointers.

Automated role updates with knife

In this example we want to update a role. These are the basics; you will need to automate the actual edit of the JSON file in whatever language you like.

List the roles; there is no sample role yet:

joshua-millers-macbook-pro:chef jmiller$ knife role list
[
  "APACHE_ROLE",
  "APPBASE_ROLE",
  "APTREPO_ROLE",
  "WEBSERVER_ROLE"
]
joshua-millers-macbook-pro:chef jmiller$

Dump the BASE_ROLE so we can use it to create a new role

joshua-millers-macbook-pro:chef jmiller$ knife role show BASE_ROLE > SAMPLE_ROLE.json
joshua-millers-macbook-pro:chef jmiller$

Edit the role; going to do it manually here but could be done with perl …

joshua-millers-macbook-pro:chef jmiller$ cat SAMPLE_ROLE.json
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [
  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "authorization": {
      "sudo": {
        "groups": [
          "dev"
        ],
        "users": [

        ]
      }
    },
    "chef": {
      "client_splay": "20",
      "client_interval": "900",
      "server_fqdn": "chef.example.com"
    },
    "postfix": {
      "myorigin": "mail.example.com",
      "relayhost": "mailrelay.example.com",
      "mydomain": "example.com"
    },
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$

I am creating the role, so it is going to generate a "Not Found" error:

joshua-millers-macbook-pro:chef jmiller$ knife role from file SAMPLE_ROLE.json
WARN: HTTP Request Returned 404 Not Found: Cannot load role SAMPLE_ROLE
WARN: Updated Role SAMPLE_ROLE!
joshua-millers-macbook-pro:chef jmiller$

Sample role created:

joshua-millers-macbook-pro:chef jmiller$ knife role list | grep SAMPLE
"SAMPLE_ROLE",
joshua-millers-macbook-pro:chef jmiller$

Here is what we have:

joshua-millers-macbook-pro:chef jmiller$ knife role show SAMPLE_ROLE
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "authorization": {
      "sudo": {
        "groups": [
          "dev"
        ],
        "users": [

        ]
      }
    },
    "chef": {
      "client_splay": "20",
      "client_interval": "900",
      "server_fqdn": "chef.example.com"
    },
    "postfix": {
      "myorigin": "mail.example.com",
      "relayhost": "mailrelay.example.com",
      "mydomain": "example.com"
    },
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$

I update the role (this could be automated with a script) and update Chef:

joshua-millers-macbook-pro:chef jmiller$ vi SAMPLE_ROLE.json

joshua-millers-macbook-pro:chef jmiller$ cat SAMPLE_ROLE.json
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [
  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$ knife role from file SAMPLE_ROLE.json
WARN: Updated Role SAMPLE_ROLE!
joshua-millers-macbook-pro:chef jmiller$ knife role show SAMPLE_ROLE
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$
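
Since the manual vi step is the part you would want to script, here is a minimal Ruby sketch of the edit itself: parse the role JSON, change it in a block, and write it back out as JSON ready for knife role from file. The function name update_role_json is hypothetical, the sample role is a cut-down version of the one above, and I pass create_additions: false on the assumption that you do not want the json_class key to trigger object creation.

```ruby
require "json"

# Parse role JSON, let the caller modify the resulting Hash, and return
# pretty-printed JSON. (Hypothetical helper, not part of knife or Chef.)
def update_role_json(text)
  # create_additions: false keeps "json_class" as a plain string key.
  role = JSON.parse(text, create_additions: false)
  yield role
  JSON.pretty_generate(role)
end

sample = <<~ROLE
  {
    "name": "SAMPLE_ROLE",
    "json_class": "Chef::Role",
    "chef_type": "role",
    "description": "All nodes wiil get this base",
    "override_attributes": {
      "ntp": { "is_server": false, "servers": ["time01.example.com"] }
    }
  }
ROLE

updated = update_role_json(sample) do |role|
  role["description"] = "A sample role"
  role["override_attributes"]["ntp"]["servers"] << "time02.example.com"
end

puts updated
```

In a real run you would read the text from SAMPLE_ROLE.json, write the result back to the same file, and then upload it with knife as shown above.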

It looks like we should be able to use the following to do the role edit on the Chef server itself, or we could create another client pem for just this task:

root@chef:~# knife role show SAMPLE_ROLE -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {

  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
root@chef:~# knife role show SAMPLE_ROLE -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem > SAMPLE_ROLE.json
root@chef:~# vi SAMPLE_ROLE.json
root@chef:~# knife role from file SAMPLE_ROLE.json -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem
WARN: Updated Role SAMPLE_ROLE!
root@chef:~# knife role show SAMPLE_ROLE -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {

  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "A sample role",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
root@chef:~#

Chef 0.8.x Deb and Upstart

So my Chef clients have been crashing, and it’s always a bummer to SSH in and restart them. I could just have my monitoring system start them, but why bother when Ubuntu has a wonderful, built-in way to make sure the service stays up!

First I downloaded the chef recipe from opscode, then I added the following.

joshua-millers-macbook-pro:site-cookbooks jmiller$ cat chef/recipes/client-deb.rb
#
# Author:: Joshua Miller
# Cookbook Name:: chef
# Recipe:: client-deb
#
# Copyright 2008-2010, Fitsnips.net
#
# Licensed under the Apache License, Version 2.0 (the “License”);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an “AS IS” BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Since I already have the deb installed by this point, I don't install it here.
case node[:platform]
when "ubuntu"
  # Upstart is on Karmic and above by default ... not sure about lower versions
  if node[:platform_version].to_f >= 9.10

    # My chef server is installed with gems, but for ease of auto install I am using debs
    # in my kickstart build with a local apt-mirror. Because of that I have added a check
    # for chef-server, and if it's there we don't make any changes.
    template "/etc/init.d/chef-client" do
      source "chef-client-upstartjob.erb"
      owner "root"
      group "root"
      mode 0774
      backup 0
      not_if do File.symlink?("/etc/init.d/chef-server") end
    end

    service "chef-client" do
      provider Chef::Provider::Service::Upstart
      supports :restart => true, :reload => true
    end

    template "/etc/default/chef-client" do
      source "default-chef-client.erb"
      owner "root"
      group "root"
      # octal, so it needs the leading zero (a bare 644 is read as decimal)
      mode 0644
      backup 0
      not_if do File.symlink?("/etc/init.d/chef-server") end
    end

    template "/etc/init/chef-client.conf" do
      source "upstart-chef-client.conf.erb"
      owner "root"
      group "root"
      mode 0644
      backup 0
      notifies :start, resources(:service => "chef-client")
      not_if do File.symlink?("/etc/init.d/chef-server") end
    end

  end

end

Then we create a few templates:

joshua-millers-macbook-pro:site-cookbooks jmiller$ cat chef/templates/default/upstart-chef-client.conf.erb
start on runlevel [2345]

script
  exec /usr/bin/env chef-client -c /etc/chef/client.rb -i <%= @node[:chef][:client_interval] %> -s <%= @node[:chef][:client_splay] %>
end script

# Restart the process if it dies with a signal
# or exit code not given by the 'normal exit' stanza.
respawn

# Give up if restart occurs 10 times in 90 seconds.
respawn limit 10 90

Let's make it easy on the other admins who are not used to Upstart:

joshua-millers-macbook-pro:site-cookbooks jmiller$ cat chef/templates/default/chef-client-upstartjob.erb
#!/bin/sh -e
# upstart-job
#
# Symlink target for initscripts that have been converted to Upstart.

set -e

INITSCRIPT="$(basename "$0")"
JOB="${INITSCRIPT%.sh}"

if [ "$JOB" = "upstart-job" ]; then
    if [ -z "$1" ]; then
        echo "Usage: upstart-job JOB COMMAND" 1>&2
        exit 1
    fi

    JOB="$1"
    INITSCRIPT="$1"
    shift
else
    if [ -z "$1" ]; then
        echo "Usage: $0 COMMAND" 1>&2
        exit 1
    fi
fi

COMMAND="$1"
shift

if [ -z "$DPKG_MAINTSCRIPT_PACKAGE" ]; then
    ECHO=echo
else
    ECHO=:
fi

$ECHO "Rather than invoking init scripts through /etc/init.d, use the service(8)"
$ECHO "utility, e.g. service $INITSCRIPT $COMMAND"

case $COMMAND in
status)
    $ECHO
    $ECHO "Since the script you are attempting to invoke has been converted to an"
    $ECHO "Upstart job, you may also use the $COMMAND(8) utility, e.g. $COMMAND $JOB"
    $COMMAND "$JOB"
    ;;
start|stop|restart)
    $ECHO
    $ECHO "Since the script you are attempting to invoke has been converted to an"
    $ECHO "Upstart job, you may also use the $COMMAND(8) utility, e.g. $COMMAND $JOB"
    PID=$(status "$JOB" 2>/dev/null | awk '/[0-9]$/ { print $NF }')
    if [ -z "$PID" ] && [ "$COMMAND" = "stop" ]; then
        exit 0
    elif [ -n "$PID" ] && [ "$COMMAND" = "start" ]; then
        exit 0
    elif [ -z "$PID" ] && [ "$COMMAND" = "restart" ]; then
        start "$JOB"
        exit 0
    fi
    $COMMAND "$JOB"
    ;;
reload|force-reload)
    $ECHO
    $ECHO "Since the script you are attempting to invoke has been converted to an"
    $ECHO "Upstart job, you may also use the $COMMAND(8) utility, e.g. $COMMAND $JOB"
    reload "$JOB"
    ;;
*)
    $ECHO
    $ECHO "The script you are attempting to invoke has been converted to an Upstart" 1>&2
    $ECHO "job, but $COMMAND is not supported for Upstart jobs." 1>&2
    exit 1
esac