Chef Performance Tuning — Part 1

It turns out chef-server is a CPU hog. I am not sure why, since all it is really doing is attribute storage and file pushing. I started noticing that chef-client runs on my 66-node Chef farm were taking longer and longer. At first I looked at disk, as I did not think chef-server would have problems with a farm this small. After much investigation that did not seem to be the problem; then I noticed, while watching top, that chef-server was using 99% of one core 85% of the time. While I do not claim to be an expert, here is the solution that worked for me.

I am reading more of the chef-server code and think I may have overcomplicated this quite a bit, but I am checking with others to confirm … this works but may not be the best solution.


One workaround is to run additional merb workers. To do this on a gems install, edit /etc/service/chef-server/run:

I have added -c 8 (-c, --cluster-nodes NUM_MERBS: number of merb daemons to run):

jmiller@srv-101-03:~$ cat /etc/service/chef-server/run
#!/bin/sh
PATH=/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/lib/ruby/gems/1.8/bin
exec 2>&1
exec /usr/bin/env chef-server -c 8 -N -p 4000 -e production -P /var/run/chef/server.%s.pid
jmiller@srv-101-03:~$

Then restart the chef server:

sudo /etc/init.d/chef-server restart

This will spawn 8 worker processes on ports 4000 through 4007 (port 4040 is the chef-server-webui):

jmiller@srv-101-03:~$ ps -eaf |grep merb |grep -v grep
root 3559 12380 0 Jun14 ? 00:00:37 merb : worker (port 4040)
root 3623 12342 0 14:08 ? 00:00:07 merb : spawner (ports 4000)
root 3638 3623 12 14:08 ? 00:18:40 merb : worker (port 4004)
root 3639 3623 12 14:08 ? 00:18:47 merb : worker (port 4005)
root 3640 3623 11 14:08 ? 00:17:17 merb : worker (port 4006)
root 3641 3623 12 14:08 ? 00:18:30 merb : worker (port 4007)
root 10890 1 64 Jun14 ? 11:18:07 merb : worker (port 4000)
root 10891 1 5 Jun14 ? 00:54:46 merb : worker (port 4001)
root 10892 1 4 Jun14 ? 00:51:57 merb : worker (port 4002)
root 10893 1 4 Jun14 ? 00:51:03 merb : worker (port 4003)
jmiller@srv-101-03:~$
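A quick way to confirm that every worker came up is to scan the ps output for the ports you expect. The following is a minimal sketch, assuming the process titles look exactly like the listing above ("merb : worker (port NNNN)"); the function names are my own and not part of chef-server.

```python
import re

# Matches the process titles shown in the ps listing above,
# e.g. "merb : worker (port 4004)". The spawner line does not match.
WORKER_RE = re.compile(r"merb : worker \(port (\d+)\)")


def worker_ports(ps_output):
    """Return the sorted list of ports that merb workers report in ps output."""
    return sorted(int(m.group(1)) for m in WORKER_RE.finditer(ps_output))


def missing_workers(ps_output, base_port=4000, count=8):
    """Return the expected worker ports that do not appear in the ps output."""
    running = set(worker_ports(ps_output))
    return [port for port in range(base_port, base_port + count)
            if port not in running]


if __name__ == "__main__":
    # A few lines copied from the listing above, for illustration only.
    sample = (
        "root 3559 12380 0 Jun14 ? 00:00:37 merb : worker (port 4040)\n"
        "root 3638 3623 12 14:08 ? 00:18:40 merb : worker (port 4004)\n"
        "root 10890 1 64 Jun14 ? 11:18:07 merb : worker (port 4000)\n"
    )
    print(worker_ports(sample))
    print(missing_workers(sample))
```

Feed it the output of the ps command above; an empty list from missing_workers means all eight ports are accounted for.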

Apply the following Apache recipes to the Chef server's run list:

"recipe[apache2]",
"recipe[apache2::mod_status]",
"recipe[apache2::mod_proxy]",
"recipe[apache2::mod_proxy_http]",
"recipe[apache2::mod_proxy_balancer]",
"recipe[apache2::mod_rewrite]",
"recipe[apache2::mod_headers]",

Now that we have the workers, we need to load-balance requests across them. Opscode provides examples for Apache load balancing, so that is what I chose to use. Port 4080 was an arbitrary choice that works in my environment:

jmiller@srv-101-03:~$ cat /etc/apache2/sites-available/chef.example.com
Listen 4080
<VirtualHost *:4080>
  ServerName chef.example.com
  DocumentRoot /usr/share/chef-server/public

  <Proxy balancer://chef_server>
    BalancerMember http://127.0.0.1:4000
    BalancerMember http://127.0.0.1:4001
    BalancerMember http://127.0.0.1:4002
    BalancerMember http://127.0.0.1:4003
    BalancerMember http://127.0.0.1:4004
    BalancerMember http://127.0.0.1:4005
    BalancerMember http://127.0.0.1:4006
    BalancerMember http://127.0.0.1:4007
    Order deny,allow
    Allow from all
  </Proxy>

  LogLevel info
  ErrorLog /var/log/apache2/chef_server-error.log
  CustomLog /var/log/apache2/chef_server-access.log combined

  RewriteEngine On
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://chef_server%{REQUEST_URI} [P,QSA,L]
</VirtualHost>
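If you change the -c value later, the BalancerMember list has to change with it. Here is a small sketch that generates those lines from a base port and a worker count; the helper name and its defaults are my own, chosen to match the values used in this post.

```python
def balancer_members(base_port=4000, count=8, host="127.0.0.1"):
    """Return one BalancerMember line per worker port, in port order."""
    return ["BalancerMember http://%s:%d" % (host, port)
            for port in range(base_port, base_port + count)]


if __name__ == "__main__":
    # Prints the eight lines used in the site file above.
    print("\n".join(balancer_members()))
```

The output can be pasted into the site file, or the same loop can live in a template if you manage the Apache config with Chef.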

Enable the new virtual host:

jmiller@srv-101-03:~$ sudo a2ensite chef.example.com
Site chef.example.com already enabled
jmiller@srv-101-03:~$

Reload Apache:

jmiller@srv-101-03:~$ sudo /etc/init.d/apache2 reload
* Reloading web server config apache2
Warning: DocumentRoot [/usr/share/chef-server/public] does not exist
[Tue Jun 15 16:48:21 2010] [warn] NameVirtualHost *:443 has no VirtualHosts
...done.
jmiller@srv-101-03:~$

Make sure it works. First, update the port in your knife chef_server_url:

jmiller@srv-101-03:~$ cat .chef/knife.rb
log_level :warn
#log_location "/home/jmiller/.chef/knife.log"
node_name 'jmiller'
client_key '/home/jmiller/.chef/jmiller.pem'
validation_client_name 'chef-validator'
validation_key '/home/jmiller/.chef/chef-validator.pem'
chef_server_url 'http://srv-101-03.example.com:4080'
cache_type 'BasicFile'
cache_options( :path => '/home/jmiller/.chef/checksums' )
cookbook_path [ '/home/jmiller/site-cookbooks' ]

jmiller@srv-101-03:~$

Before adding the workers this was taking 2 to 9 seconds. This is the highest I have seen since the change 🙂

jmiller@srv-101-03:~$ time knife role list
[
  "APACHE_ROLE",

  "WEBSERVER_ROLE"
]

real 0m0.673s
user 0m0.310s
sys 0m0.100s
jmiller@srv-101-03:~$

Now you will need to update /etc/chef/client.rb on every system and restart the chef-client daemon. I suggest you use Chef to do it.
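Whichever tool pushes the change, the edit itself is a one-line substitution. Below is a rough sketch of it, assuming client.rb carries a chef_server_url line in the same quoted form as the knife.rb above; the function name is hypothetical and the regular expression only handles that simple form.

```python
import re

# Captures the part before the URL, the URL itself, and the rest of the line.
URL_LINE = re.compile(r"""^(\s*chef_server_url\s+["'])([^"']+)(["'].*)$""")


def set_server_url(config_text, new_url):
    """Replace the URL on every chef_server_url line; other lines are untouched.

    The result always ends with a newline, even if the input did not.
    """
    out = []
    for line in config_text.splitlines():
        match = URL_LINE.match(line)
        if match:
            line = match.group(1) + new_url + match.group(3)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = 'log_level :info\nchef_server_url "http://chef.example.com:4000"\n'
    print(set_server_url(before, "http://chef.example.com:4080"))
```

Remember that the chef-client daemon still needs a restart afterwards to pick up the new URL.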

Thank you to Josh Timberman, Adam Jacob, and holoway for many pointers.

Automated role updates with knife

In this example we want to update a role. These are the basics; you will need to automate the actual edit of the JSON file in whatever language you like.


List the roles; there is no sample role yet:

joshua-millers-macbook-pro:chef jmiller$ knife role list
[
  "APACHE_ROLE",
  "APPBASE_ROLE",
  "APTREPO_ROLE",
  "WEBSERVER_ROLE"
]
joshua-millers-macbook-pro:chef jmiller$

Dump the BASE_ROLE so we can use it to create a new role

joshua-millers-macbook-pro:chef jmiller$ knife role show BASE_ROLE > SAMPLE_ROLE.json
joshua-millers-macbook-pro:chef jmiller$

Edit the role. I am going to do it manually here, but it could be done with perl …

joshua-millers-macbook-pro:chef jmiller$ cat SAMPLE_ROLE.json
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [
  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "authorization": {
      "sudo": {
        "groups": [
          "dev"
        ],
        "users": [

        ]
      }
    },
    "chef": {
      "client_splay": "20",
      "client_interval": "900",
      "server_fqdn": "chef.example.com"
    },
    "postfix": {
      "myorigin": "mail.example.com",
      "relayhost": "mailrelay.example.com",
      "mydomain": "example.com"
    },
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$

Because I am creating the role, the upload generates a “Not Found” error before the role is created:

joshua-millers-macbook-pro:chef jmiller$ knife role from file SAMPLE_ROLE.json
WARN: HTTP Request Returned 404 Not Found: Cannot load role SAMPLE_ROLE
WARN: Updated Role SAMPLE_ROLE!
joshua-millers-macbook-pro:chef jmiller$

Sample role created:

joshua-millers-macbook-pro:chef jmiller$ knife role list | grep SAMPLE
  "SAMPLE_ROLE",
joshua-millers-macbook-pro:chef jmiller$

Here is what we have:

joshua-millers-macbook-pro:chef jmiller$ knife role show SAMPLE_ROLE
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "authorization": {
      "sudo": {
        "groups": [
          "dev"
        ],
        "users": [

        ]
      }
    },
    "chef": {
      "client_splay": "20",
      "client_interval": "900",
      "server_fqdn": "chef.example.com"
    },
    "postfix": {
      "myorigin": "mail.example.com",
      "relayhost": "mailrelay.example.com",
      "mydomain": "example.com"
    },
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$

I update the role (this could be automated with a script) and upload it to Chef:

joshua-millers-macbook-pro:chef jmiller$ vi SAMPLE_ROLE.json

joshua-millers-macbook-pro:chef jmiller$ cat SAMPLE_ROLE.json
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [
  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$ knife role from file SAMPLE_ROLE.json
WARN: Updated Role SAMPLE_ROLE!
joshua-millers-macbook-pro:chef jmiller$ knife role show SAMPLE_ROLE
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {
  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
joshua-millers-macbook-pro:chef jmiller$
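The manual vi step above is the part worth scripting. Here is a minimal sketch of the same edit, which keeps only the ntp section of override_attributes, using Python's standard json module; the function names are hypothetical, and it assumes the file is the plain JSON that knife role show writes out.

```python
import json


def prune_overrides(role, keep):
    """Return a copy of the role keeping only the named override_attributes sections."""
    updated = dict(role)
    overrides = role.get("override_attributes", {})
    updated["override_attributes"] = {
        name: value for name, value in overrides.items() if name in keep
    }
    return updated


def rewrite_role_file(path, keep):
    """Load a role JSON file, prune its overrides, and write it back in place."""
    with open(path) as handle:
        role = json.load(handle)
    with open(path, "w") as handle:
        json.dump(prune_overrides(role, keep), handle, indent=2)
        handle.write("\n")


if __name__ == "__main__":
    # In-memory example shaped like the role shown above.
    role = {
        "name": "SAMPLE_ROLE",
        "override_attributes": {
            "postfix": {"mydomain": "example.com"},
            "ntp": {"is_server": False, "service": "ntpd"},
        },
    }
    print(json.dumps(prune_overrides(role, ["ntp"]), indent=2))
```

After running rewrite_role_file("SAMPLE_ROLE.json", ["ntp"]), the file can be uploaded with the same knife role from file command used above.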

It looks like we should be able to use the following to do the role edit on the Chef server itself … or create another client pem just for this task …

root@chef:~# knife role show SAMPLE_ROLE -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {

  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "All nodes wiil get this base",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
root@chef:~# knife role show SAMPLE_ROLE -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem > SAMPLE_ROLE.json
root@chef:~# vi SAMPLE_ROLE.json
root@chef:~# knife role from file SAMPLE_ROLE.json -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem
WARN: Updated Role SAMPLE_ROLE!
root@chef:~# knife role show SAMPLE_ROLE -s http://chef.example.com:4000 -u chef-webui -k /etc/chef/webui.pem
{
  "name": "SAMPLE_ROLE",
  "default_attributes": {

  },
  "json_class": "Chef::Role",
  "run_list": [

  ],
  "description": "A sample role",
  "chef_type": "role",
  "override_attributes": {
    "ntp": {
      "is_server": false,
      "service": "ntpd",
      "servers": [
        "time01.example.com",
        "time02.example.com"
      ]
    }
  }
}
root@chef:~#
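The show, edit, upload round trip above can be wrapped in a script. The sketch below only builds and runs the same knife commands seen in the session, with the -s, -u and -k options; the helper names are my own, and it assumes knife is on the PATH and that the key is readable by the user running the script.

```python
import subprocess


def knife_command(args, server, user, key):
    """Build a knife argv list with the -s/-u/-k options used in the session above."""
    return ["knife"] + list(args) + ["-s", server, "-u", user, "-k", key]


def show_role(name, server, user, key):
    """Run 'knife role show NAME' and return its output as text."""
    cmd = knife_command(["role", "show", name], server, user, key)
    return subprocess.check_output(cmd).decode("utf-8")


def upload_role(path, server, user, key):
    """Run 'knife role from file PATH'; raises CalledProcessError on failure."""
    cmd = knife_command(["role", "from", "file", path], server, user, key)
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Only prints the command; nothing is sent to the server here.
    print(" ".join(knife_command(
        ["role", "show", "SAMPLE_ROLE"],
        "http://chef.example.com:4000", "chef-webui", "/etc/chef/webui.pem")))
```

Passing the arguments as a list avoids shell quoting problems if a role name or path ever contains spaces.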