Chef AWS

OK, so you have your AWS account set up and you're ready to launch an EC2 instance with Hosted Chef. What do you need to know? This may seem simple to someone who has been doing it for a long time, but it took me a few hours to figure out exactly what I needed. Here are a few quick notes on what worked for me on my MacBook Air, although it should work just fine on Linux too. I assume you have a working chef install and can upload cookbooks to your Hosted Chef server. I am using the chef-dk and had to install the knife-ec2 plugin. Assuming you have set up the chef-dk as they recommend, all you should need to do is run the chef gem install command. This installs the ruby gems into your home directory under ~/.chefdk, so you will not need sudo access.

jmiller11:fcs2-chef-repo jmiller$ ls -l ~/.chefdk/
total 0
drwxr-xr-x 3 jmiller staff 102 May 3 13:35 gem
jmiller11:fcs2-chef-repo jmiller$

Here is the command to install the plugin:

chef gem install knife-ec2

Append the following to your .chef/knife.rb

# AWS support
knife[:aws_access_key_id] = ENV['AWS_ACCESS_KEY_ID']
knife[:aws_secret_access_key] = ENV['AWS_SECRET_ACCESS_KEY']
# Optional if you're using Amazon's STS
#knife[:aws_session_token] = ENV['AWS_SESSION_TOKEN']
knife[:aws_ssh_key_id] = ENV['AWS_MYPEM']
knife[:region] = ENV['AWS_REGION']
knife[:bootstrap_version] = '11.12.4-1'

Append the following to your ~/.bash_profile

AWS_ACCESS_KEY_ID=XXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXX
# note the AWS_MYPEM does not have .pem extension listed
# it found my key that was in ~/.ssh/ and is chmod 600
AWS_MYPEM=XXXXXXXX
AWS_REGION=us-east-1
# Optional if you're using Amazon's STS
#AWS_SESSION_TOKEN=""
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_MYPEM AWS_REGION

Source your bash profile to make sure the new variables are active:

jmiller11:fcs2-chef-repo jmiller$ . ~/.bash_profile
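
Since knife.rb reads everything from the environment, an empty variable only shows up later as a confusing knife error. Here is a rough sketch of a check you could paste into your shell first. The function name check_aws_env is my own invention, not part of knife or the chef-dk; the variable names are the four exported above.

```shell
# check_aws_env: report any variable that knife.rb reads from ENV
# but that is unset or empty. Returns 0 when all four are present.
# (Hypothetical helper, not part of knife.)
check_aws_env() {
    missing=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_MYPEM AWS_REGION; do
        # Indirect lookup of the variable called $name (POSIX-safe).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return $missing
}

# Usage: check_aws_env && knife ec2 server list
```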

OK, let's test that you are set up correctly by running "knife ec2 server list". This will likely be empty for you, but as long as it returns the header you're fine:

jmiller11:fcs2-chef-repo jmiller$ knife ec2 server list
Instance ID Name Public IP Private IP Flavor Image SSH Key Security Groups IAM Profile State
i-b9ea45e9 base1204-1 m1.small ami-0145d268 aws-jmiller default terminated
i-88e34cd8 m1.small ami-0145d268 aws-jmiller default terminated
i-25e84775 base1204-1 m1.small ami-0145d268 aws-jmiller default terminated
i-c7e44b97 base1204-1 m1.small ami-0145d268 aws-jmiller default terminated
i-41e14e11 base1204-1 54.227.113.203 10.236.185.159 m1.small ami-0145d268 aws-jmiller www, default running
i-7dfc532d base1204-2 54.237.5.212 10.151.112.113 m1.small ami-0145d268 aws-jmiller www, default running
i-53b57800 webserver1 t1.micro ami-3202f25b me default terminated
jmiller11:fcs2-chef-repo jmiller$

Launch Command:

OK, here I am using a simple role called "BASE" that was uploaded to my Hosted Chef account. All it does at this point is set up chef to run as a cron job to save memory. The AMI is Ubuntu 12.04 running on an m1.small instance, with the "default" and "www" security groups, an easy-to-read name of "base1204-1", and the ssh key file for my AWS key. I need to figure out if there is a better way than defining the ssh key on the command line, but this works for now.

jmiller11:fcs2-chef-repo jmiller$ knife ec2 server create -r 'role[BASE]' -I ami-0145d268 -f m1.small -x ubuntu -G default,www -N base1204-1 -i ~/.ssh/aws-jmiller

The command will run and you should see something like this (shown here for a second node, base1204-2):

jmiller11:fcs2-chef-repo jmiller$ knife ec2 server create -r 'role[BASE]' -I ami-0145d268 -f m1.small -x ubuntu -G default,www -N base1204-2 -i ~/.ssh/aws-jmiller
Instance ID: i-7dfc532d
Flavor: m1.small
Image: ami-0145d268
Region: us-east-1
Availability Zone: us-east-1a
Security Groups: default, www
Tags: Name: base1204-2
SSH Key: aws-jmiller

Waiting for instance…………………
Public DNS Name: ec2-54-237-5-212.compute-1.amazonaws.com
Public IP Address: 54.237.5.212
Private DNS Name: ip-10-151-112-113.ec2.internal
Private IP Address: 10.151.112.113

Waiting for sshd….done
Connecting to ec2-54-237-5-212.compute-1.amazonaws.com
ec2-54-237-5-212.compute-1.amazonaws.com Installing Chef Client…

ec2-54-237-5-212.compute-1.amazonaws.com Chef Client finished, 7/12 resources updated in 14.133802702 seconds

Instance ID: i-7dfc532d
Flavor: m1.small
Image: ami-0145d268
Region: us-east-1
Availability Zone: us-east-1a
Security Groups: default, www
Security Group Ids: default
Tags: Name: base1204-2
SSH Key: aws-jmiller
Root Device Type: ebs
Root Volume ID: vol-cc24df85
Root Device Name: /dev/sda1
Root Device Delete on Terminate: true
Public DNS Name: ec2-54-237-5-212.compute-1.amazonaws.com
Public IP Address: 54.237.5.212
Private DNS Name: ip-10-151-112-113.ec2.internal
Private IP Address: 10.151.112.113
Environment: _default
Run List: role[BASE]
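
Since only the node name and security groups change between launches, a tiny wrapper saves retyping the rest. This is just a sketch: ec2_create_cmd is a made-up name, and the role, AMI, flavor, user and key path are simply the values from the commands above. It prints the command rather than running it, so you can look it over first.

```shell
# ec2_create_cmd NAME SECURITY_GROUPS
# Print the knife command for a new node. Only the name and the
# comma-separated security groups vary; everything else is hard-coded
# to the values used in this post. (Hypothetical helper.)
ec2_create_cmd() {
    name=$1
    sec_groups=$2
    echo "knife ec2 server create -r 'role[BASE]' -I ami-0145d268 -f m1.small -x ubuntu -G $sec_groups -N $name -i ~/.ssh/aws-jmiller"
}

# Usage (review the output, then run it):
#   ec2_create_cmd base1204-3 default,www
#   eval "$(ec2_create_cmd base1204-3 default,www)"
```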

OK, that's the basics. If you get this far you might want to check out chef-metal.

Chef restore from backup

So I was testing my restore from backup for chef and ran into a few problems. The first problem was that my nginx load balancer config files are dynamically created based on the roles assigned to boxes. After my restore, one of the first boxes I tested was one of the LB boxes, and to my horror, even though the systems were listed when I listed the nodes in chef, it seems that until they have checked in to the restored chef server they are not counted. This meant that my nginx server pools were empty … bummer. The easy fix was to have my servers move over to the restored chef server instance from the bottom up, i.e. sql boxes, web boxes, then the edge LB stuff. Not a huge problem, but it does mean that if you ever have to restore a chef box, stop all clients before you bring it up.
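
The bottom-up check-in can be scripted, roughly like this. It is only a sketch: the function name and the sql.txt / web.txt / lb.txt files (one hostname per line) are assumptions of mine, and it assumes you can ssh to each box and run chef-client with sudo.

```shell
# checkin_bottom_up DIR
# Run chef-client on every host, tier by tier: sql first, then web,
# then the load balancers, so the LB configs see populated pools.
# Expects DIR/sql.txt, DIR/web.txt and DIR/lb.txt (hypothetical names).
checkin_bottom_up() {
    dir=$1
    for tier in sql web lb; do
        while read -r host; do
            [ -n "$host" ] || continue
            echo "== $tier: $host"
            # -n keeps ssh from swallowing the rest of the host list on stdin.
            ssh -n "$host" 'sudo chef-client' || return 1
        done < "$dir/$tier.txt"
    done
}

# Usage: checkin_bottom_up /home/operations/servers
```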

The other odd problem I had was that one node with a local variable assigned to it did not pull the variable over. The variable in question had not changed in months, and my daily backups should have contained it. Even though it was a db password for the system, I got lucky: I had removed the notify restart from a lot of services before the restore to minimize the impact of changes. Overall it went pretty well.

My backups … tar zcvf `date +%Y%m%d`.`hostname`.chef.tar.gz /var/lib/couchdb/ /etc/chef

To restore: build the server, install chef-server, stop chef-server, drop the tar into place, and start chef-server.
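
Here is the backup and the "drop the tar into place" step as a pair of small functions. Treat it as a sketch: the function names are mine, and stopping and starting chef-server around the restore is left out because the service command depends on how you installed it. tar strips the leading / when archiving, which is why the restore unpacks relative to / by default.

```shell
# Archive name built the same way as the one-liner above.
chef_backup_name() {
    echo "$(date +%Y%m%d).$(hostname).chef.tar.gz"
}

# chef_backup DEST_DIR PATH...
# Archive the given paths into DEST_DIR under the dated name.
chef_backup() {
    dest=$1
    shift
    tar zcf "$dest/$(chef_backup_name)" "$@"
}

# chef_restore ARCHIVE [ROOT]
# Unpack the archive relative to ROOT (default /), putting files back
# at their original absolute paths.
chef_restore() {
    tar zxf "$1" -C "${2:-/}"
}

# Usage:
#   chef_backup /backups /var/lib/couchdb/ /etc/chef
#   chef_restore /backups/20100810.myhost.chef.tar.gz
```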

One final thought: I had to restore a 0.8.16 system after 0.9.8 was out, which turned out to be a problem as the latest bootstrap files do not work with 0.8.16. Luckily I had a local copy of the bootstrap that I used for 0.8.x installs and was able to run from there. I suggest you keep a local backup of any files you use for installs, just in case.

Chef error: marshal data too short

WARN: HTTP Request Returned 500 Internal Server Error: marshal data too short … what to do?

jmiller@srv-101-29:~$ sudo chef-client
[Tue, 10 Aug 2010 12:36:13 -0700] INFO: Starting Chef Run
[Tue, 10 Aug 2010 12:36:28 -0700] WARN: HTTP Request Returned 500 Internal Server Error: marshal data too short
/usr/lib/ruby/1.8/net/http.rb:2097:in `error!’: 500 “Internal Server Error” (Net::HTTPFatalError)
from /usr/lib/ruby/1.8/chef/rest.rb:216:in `api_request’
from /usr/lib/ruby/1.8/chef/rest.rb:267:in `retriable_rest_request’
from /usr/lib/ruby/1.8/chef/rest.rb:197:in `api_request’
from /usr/lib/ruby/1.8/chef/rest.rb:100:in `get_rest’
from /usr/lib/ruby/1.8/chef/client.rb:270:in `sync_cookbooks’
from /usr/lib/ruby/1.8/chef/client.rb:86:in `run’
from /usr/lib/ruby/1.8/chef/application/client.rb:215:in `run_application’
from /usr/lib/ruby/1.8/chef/application/client.rb:207:in `loop’
from /usr/lib/ruby/1.8/chef/application/client.rb:207:in `run_application’
from /usr/lib/ruby/1.8/chef/application.rb:62:in `run’
from /usr/bin/chef-client:25
jmiller@srv-101-29:~$

Looking at this, I thought it was a checksum error on the client, so I deleted the /var/chef/cache directory there, without luck. After digging around I found that stopping the chef server, deleting /var/chef/cache/checksums on the server, then restarting the chef server fixed the problem. Easy fix but odd problem. This was on Chef 0.8.16.
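
The fix boils down to three steps in a fixed order, sketched below. The function name is made up, and the service command is passed in as an argument because I am only assuming an init script such as /etc/init.d/chef-server; use whatever starts and stops the server on your box.

```shell
# clear_checksum_cache CACHE_DIR SERVICE_CMD...
# Stop the server, remove CACHE_DIR/checksums, start the server again.
# SERVICE_CMD is called with "stop" and then "start" appended.
# (Hypothetical helper; the init script path in the usage line is an assumption.)
clear_checksum_cache() {
    cache_dir=$1
    shift
    "$@" stop || return 1
    rm -rf "$cache_dir/checksums"
    "$@" start
}

# Usage: clear_checksum_cache /var/chef/cache sudo /etc/init.d/chef-server
```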

chef, knife, and ssh – loving it!

Opscode added an ssh subcommand to the knife utility which, when used with the search syntax, can be very nice. A few minor examples below.

jmiller@srv-101-03: $ knife ssh role:APACHE_ROLE uptime
srv-101-18.example.com  02:07:24 up 140 days, 23:23,  1 user,  load average: 0.00, 0.00, 0.00
srv-101-17.example.com  02:07:24 up 125 days, 10:53,  1 user,  load average: 0.03, 0.06, 0.02

jmiller@srv-101-03:~/operations/chef/roles$ knife ssh "role:BASE_ROLE" 'grep paranoia /etc/nscd.conf'
srv-101-01.example.com # paranoia
srv-101-01.example.com paranoia no
srv-101-14.example.com # paranoia
srv-101-14.example.com paranoia yes
srv-201-22.example.com # paranoia
srv-201-22.example.com paranoia yes
srv-201-01.example.com # paranoia
srv-201-01.example.com paranoia yes
srv-201-26.example.com # paranoia
srv-201-26.example.com paranoia yes
srv-101-04.example.com # paranoia
….
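
With output like that, the interesting part is usually the odd one out. A small awk filter picks out the hosts where the active (uncommented) setting is "no". This is a sketch that assumes the output format shown above, with the hostname in the first column; paranoia_off is my own name for it.

```shell
# paranoia_off: read the knife ssh output above on stdin and print
# hosts whose uncommented setting is "paranoia no".
# Comment lines have "#" in column 2, so they never match.
paranoia_off() {
    awk '$2 == "paranoia" && $3 == "no" { print $1 }'
}

# Usage: knife ssh "role:BASE_ROLE" 'grep paranoia /etc/nscd.conf' | paranoia_off
```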

Backup chef roles

I like to keep my chef roles in git, so I dump them and check them in when I make changes. Very nice if you remove something and cannot recall what it was as you jump around.

#!/bin/bash

####
#
# Must be run from a server that has knife and your key i.e. chef.int.rdio
#
###

# List of all roles:

knife role list | sed 's/"//g' | sed 's/,//' | egrep -v '\]|\[' > ./rolelist.txt

# Dump each role definition to its own JSON file

for i in `cat rolelist.txt`; do echo $i; knife role show $i > $i.json; done

Quick and dirty server list from chef

So I have always used a simple bash loop to do quick tasks on lots of servers:

Example:

for i in `cat server.list`; do ssh $i 'hostname;uptime'; done

We can use chef to build a list of servers by role, and a list of all servers in our farm, as long as they are managed by chef 🙂

#!/bin/bash

####
#
# Must be run from a server that has knife and your key i.e. chef.server.com
#
###

# I think I am going to make this a recipe
# but for now…

#Generate a list of all chef controlled servers

knife node list | sed 's/"//g' | sed 's/,//' | egrep -v '\]|\[' > /home/operations/servers/all.txt

# List of all roles:

knife role list | sed 's/"//g' | sed 's/,//' | egrep -v '\]|\[' > /home/operations/servers/roles.txt

# Generate a file for each role containing the servers in that role
# Tetsu likes the files lower case … works for me 🙂

for i in `cat /home/operations/servers/roles.txt`; do echo $i; z=`echo $i | tr '[:upper:]' '[:lower:]'`; knife search node role:$i -i > /home/operations/servers/$z.txt; done
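
The sed and grep chain is doing one job: turning knife's bracketed, quoted, comma-separated listing into one bare name per line. Pulled out into functions it is easier to reuse and to test without a chef server. This is a sketch; the function names are mine, and it assumes the listing looks like a JSON array with one quoted name per line, which is what the filters in the script imply.

```shell
# json_list_to_lines: read a one-item-per-line JSON array on stdin and
# print the bare names, one per line, with no quotes, commas,
# brackets or leading whitespace.
json_list_to_lines() {
    sed -e 's/"//g' -e 's/,//' |
        grep -v '[][]' |
        sed -e 's/^[[:space:]]*//' -e '/^$/d'
}

# to_lower: lower-case stdin, used for the per-role file names.
to_lower() {
    tr '[:upper:]' '[:lower:]'
}

# Usage: knife role list | json_list_to_lines > roles.txt
```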

Joy with Chef 0.8 – and user error!!!

Maybe not really ready for prime time: chef 0.8 is a major step forward … but the lack of good docs makes it feel like half a step back.

The install of Chef 0.8.6 on Ubuntu 9.10 (karmic) was not bad on a clean machine. Then I went and did the dumb thing of updating to 0.8.8, and now it's busted!

root@srv-101-03:~# chef-server
Loading init file from /usr/lib/ruby/gems/1.8/gems/chef-server-0.8.8/config/init.rb
Loading /usr/lib/ruby/gems/1.8/gems/chef-server-0.8.8/config/environments/development.rb
/usr/local/lib/site_ruby/1.8/rubygems.rb:230:in `activate’: can’t activate chef (= 0.8.8, runtime) for [“chef-solr-0.8.8”], already activated chef-0.8.6 for [] (Gem::LoadError)
from /usr/local/lib/site_ruby/1.8/rubygems.rb:246:in `activate’

jmiller@srv-101-03:~$ knife node list
/usr/lib/ruby/1.8/net/http.rb:2097:in `error!’: 500 “Internal Server Error” (Net::HTTPFatalError)
from /usr/lib/ruby/gems/1.8/gems/chef-0.8.8/lib/chef/rest.rb:296:in `run_request’
from /usr/lib/ruby/gems/1.8/gems/chef-0.8.8/lib/chef/rest.rb:106:in `get_rest’
from /usr/lib/ruby/gems/1.8/gems/chef-0.8.8/lib/chef/node.rb:363:in `list’
from /usr/lib/ruby/gems/1.8/gems/chef-0.8.8/lib/chef/knife/node_list.rb:35:in `run’
from /usr/lib/ruby/gems/1.8/gems/chef-0.8.8/lib/chef/application/knife.rb:110:in `run’
from /usr/lib/ruby/gems/1.8/gems/chef-0.8.8/bin/knife:26
from /usr/bin/knife:19:in `load’
from /usr/bin/knife:19
jmiller@srv-101-03:~$

OK, here was the dumb mistake and the quick fix: the failure was that I ran gem upgrade chef … not the command below. Ruby is stupid!

gem install chef -v '=0.8.8'

Or maybe not .. that only fixed the error “can’t activate chef”

More progress, thanks to the mailing list: the webui is back and running, but node lists are still messed up:

After looking at your stack trace, you are using Merb 1.1 which is not compatable with Chef .8.8, you should downgrade Merb back to 1.0.15 if you want the webui to work at all.

Damn

root@srv-101-03:~# gem list

*** LOCAL GEMS ***

abstract (1.0.0)
amqp (0.6.7)
bundler (0.9.13)
bunny (0.6.0)
chef (0.8.8)
chef-server (0.8.8)
chef-server-api (0.8.8)
chef-server-webui (0.8.8)
chef-solr (0.8.8)

merb-assets (1.1.0)
merb-core (1.1.0)
merb-haml (1.1.0)
merb-helpers (1.1.0)
merb-param-protection (1.1.0)
merb-slices (1.1.0)

gem uninstall -aIx merb-assets merb-core merb-haml merb-helpers merb-param-protection merb-slices

gem install merb-assets merb-core merb-haml merb-helpers merb-param-protection merb-slices -v '~> 1.0.0'
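
To double check which gems still need the downgrade, you can filter the gem list output instead of eyeballing it. A sketch, assuming the "name (version)" format shown above; gems_at is a hypothetical helper name.

```shell
# gems_at NAME_PREFIX VERSION_PREFIX
# Read `gem list` output on stdin and print the gems whose name starts
# with NAME_PREFIX and whose first listed version starts with VERSION_PREFIX.
gems_at() {
    awk -v name="$1" -v ver="$2" \
        'index($1, name) == 1 && index($2, "(" ver) == 1 { print $1 }'
}

# Usage: gem list | gems_at merb- 1.1.
# Anything printed is still on Merb 1.1.x and needs to go back to 1.0.x.
```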

I love the chef mailing list. They pointed me to http://tickets.opscode.com/browse/CHEF-1069

On Tue, Mar 30, 2010 at 5:15 PM, Joshua Miller wrote:
I did a dump of the chef couchdb and am sure this is the problem but do not know enough about couchdb to fix it .. doing research but if anyone just knows the answer.

{"chef_type": "node", "name": null, "_rev": "1-d40f879d3cbf5d93099b75619d03c8cf", "defaults": {}, "run_list": [], "attributes": {}, "json_class": "Chef::Node", "_id": "61250eb6-62da-450e-a90a-97856291a2ee", "overrides": {}}^M
--==954fdeac87864055bc0716669a22d711==^M
Content-ID: 72b00c98-75c8-4ac8-8aed-723c60686d1c^M
Content-Length: 351^M
Content-MD5: mcsIj4Vf9ssbVNz8jjub2w==^M
Content-Type: application/json;charset=utf-8^M

Joshua,
you probably want to access CouchDB’s webui which is available from a
URL like http://localhost:5984/_utils/

On most installations, CouchDB configured to listen *only* on the
localhost/loopback interface, so you’ll most likely want to set up an
SSH tunnel from port 5984 on your box to localhost:5984. From there,
you can navigate to the “chef” database and then select the nodes >
all_id view. This URL will probably work for that:
http://localhost:5984/_utils/database.html?chef/_design/nodes/_view/all_id

Then find the one with a null/blank id and delete it.

HTH,
Dan DeLeo

Once I deleted the offending node all works again! So happy to have my chef 0.8.8 running again and a big thank you to Dan DeLeo
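
For the record, the SSH tunnel Dan describes looks roughly like this. It is a sketch: couch_tunnel is just a name I picked, and the host is whatever box runs your chef server's CouchDB. The port is the 5984 from his mail.

```shell
# couch_tunnel HOST
# Forward local port 5984 to CouchDB listening on HOST's loopback.
#   -N  do not run a remote command, just hold the tunnel open
#   -L  local_port:target_host:target_port as seen from HOST
couch_tunnel() {
    ssh -N -L 5984:localhost:5984 "$1"
}

# Usage: couch_tunnel chef.example.com
# then browse to http://localhost:5984/_utils/ on your own machine.
```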

chef-client 0.7.16 [BUG] Segmentation fault

So I have been having problems on one of my chef boxes for the last week. It only showed up on this one system and it was driving me nuts. After a bit of time messing around with it, it seems to be a known ruby issue, and an updated ubuntu package finally came out!

http://tickets.opscode.com/browse/CHEF-530

Here is what it looked like .. it would start then die within 10 seconds:
jmiller@somerandomname:~$ sudo /etc/init.d/chef-client start
* Starting chef-client chef-client
…done.
jmiller@somerandomname:~$ /usr/lib/ruby/1.8/ohai/plugins/linux/virtualization.rb:58: [BUG] Segmentation fault
ruby 1.8.7 (2009-06-12 patchlevel 174) [x86_64-linux]

Running chef-client from the command line would complete just fine.
jmiller@somerandomname:~$ sudo chef-client

But the same segfault would occur when adding the daemonize flag. Here is an excerpt from the log files:

jmiller@somerandomname:~$ sudo chef-client -d -l debug

[Thu, 11 Mar 2010 17:05:54 -0800] DEBUG: ---- End uname -m STDERR ----
[Thu, 11 Mar 2010 17:05:54 -0800] DEBUG: Ran (uname -m) returned 0
[Thu, 11 Mar 2010 17:05:54 -0800] DEBUG: Loading plugin virtualization
[Thu, 11 Mar 2010 17:05:54 -0800] DEBUG: Loading plugin linux::virtualization

I had to update the following ruby packages:

libmixlib-cli-ruby libmixlib-cli-ruby1.8 libmixlib-config-ruby libmixlib-config-ruby1.8 libmixlib-log-ruby libmixlib-log-ruby1.8 libohai-ruby libohai-ruby1.8

Now life is all good again … man that sucked.

Opscode adds training but will anyone care?

So as you can see I have enjoyed Opscode's chef a lot, but now I see they have added training. While this is a logical next step, I feel they need to focus on getting 0.8 out before they even start to worry about training. I have held off on suggesting chef to a lot of people due to all the change coming in 0.8. As a user of almost 8 months now, I feel the changes are so extreme that it's not worth starting with chef at this point. While the recipes and basic stuff you're pushing out will move forward, a lot of logic changes happen in 0.8. First there are the new databags, which will let you rethink how you use shared data. Then there is the joy of roles in roles, which I love by the way. While these are really minor changes, I hate to think about going over all my roles and recipes and reworking them for 0.8. Not because you have to, but because I like to have a common pattern in execution, and I assure you that I will be using these new features in new additions to my chef tool kit. Oh, then there is knife … umm, yeah, world changer there. So in summary: hold off on training, Opscode, and get 0.8 out the door.

Chef 0.8 almost here?

Been busy as heck around here at Rdio, Inc. Still loving chef, but cannot wait for 0.8.

Some features I am looking forward to:

Knife: a command-line utility used to interact with a Chef server directly through the RESTful API.
One of the best parts of this that I have seen is that it will make multiple admins much easier to deal with. My favorite command so far: cookbook upload

OpenID is no longer the only option for logins: in fact the whole login setup has changed, and with knife there will be less reason than ever to log in to the UI. This is a major change, as the whole auth stuff is in flux right now.

Better search: this one I have not played with much, but they say it will be much better, based partially on the databag addition.

Databags: Data bags are arbitrary stores of JSON data on the server that get indexed for search.
This will help you store data that is used across recipes with less effort.

I am sure there are more, but those are the ones I have played with so far. It's starting to feel like 0.8 will never ship, and I just don't feel it's ready to run in production yet based on the lack of documentation, but I have tasted enough to know I want it.