Chef Performance Tuning — Part 1

It turns out chef-server is a CPU hog. I am not sure why, since all it is really doing is attribute storage and file pushing. I started noticing that chef-client runs across my 66-node farm were taking longer and longer. At first I looked at disk, as I did not think chef-server would have problems with such a small farm. After much consideration that did not seem to be the problem; then, while watching top, I noticed that chef-server was using 99% of one core 85% of the time. While I do not claim to be an expert, here is the solution that worked for me.

I have been reading more of the chef-server code and suspect I overcomplicated this quite a bit; I am checking with others to confirm. This works, but it may not be the best solution.

One workaround is to run additional merb workers. On a gems install, edit /etc/service/chef-server/run:

I added -c 8 to the exec line. The relevant option is: -c, --cluster-nodes NUM_MERBS  Number of merb daemons to run.

jmiller@srv-101-03:~$ cat /etc/service/chef-server/run
#!/bin/sh
PATH=/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/lib/ruby/gems/1.8/bin
exec 2>&1
exec /usr/bin/env chef-server -c 8 -N -p 4000 -e production -P /var/run/chef/server.%s.pid
jmiller@srv-101-03:~$
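The -c and -p flags together decide which ports the workers listen on, and the Apache balancer configuration later in this post has to list exactly those ports. Rather than typing them by hand, they can be derived from the run file. A minimal sketch, assuming (as the ps listing shows) that workers take consecutive ports starting at the -p value; worker_ports is a hypothetical helper, and its defaults for a missing -c or -p are my assumptions, not documented chef-server behavior.

```ruby
require 'optparse'
require 'shellwords'

# Hypothetical helper: given the chef-server command line from the runit
# run file, return the ports the merb workers are expected to listen on.
# Assumption: workers take consecutive ports starting at the -p value.
def worker_ports(command_line)
  args = Shellwords.split(command_line)
  # Keep only the arguments that follow the chef-server executable.
  args = args.drop_while { |arg| arg != 'chef-server' }.drop(1)

  count = 1         # assumed default when -c is absent
  base_port = 4000  # assumed default when -p is absent
  parser = OptionParser.new do |opts|
    opts.on('-c', '--cluster-nodes NUM_MERBS', Integer) { |n| count = n }
    opts.on('-p PORT', Integer) { |port| base_port = port }
    # Declared only so that parsing the rest of the line does not fail.
    opts.on('-N')
    opts.on('-e ENVIRONMENT')
    opts.on('-P PIDFILE')
  end
  parser.parse(args)

  (base_port...(base_port + count)).to_a
end

run_line = 'exec /usr/bin/env chef-server -c 8 -N -p 4000 -e production -P /var/run/chef/server.%s.pid'
worker_ports(run_line).each { |port| puts "BalancerMember http://127.0.0.1:#{port}" }
```

Run against the exec line above, it prints one BalancerMember line per worker, 4000 through 4007, ready to paste into the balancer block.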

Then restart the chef server:

sudo /etc/init.d/chef-server restart

This spawns 8 merb worker processes on ports 4000 through 4007 (the worker on port 4040 is the chef-server-webui):

jmiller@srv-101-03:~$ ps -eaf |grep merb |grep -v grep
root 3559 12380 0 Jun14 ? 00:00:37 merb : worker (port 4040)
root 3623 12342 0 14:08 ? 00:00:07 merb : spawner (ports 4000)
root 3638 3623 12 14:08 ? 00:18:40 merb : worker (port 4004)
root 3639 3623 12 14:08 ? 00:18:47 merb : worker (port 4005)
root 3640 3623 11 14:08 ? 00:17:17 merb : worker (port 4006)
root 3641 3623 12 14:08 ? 00:18:30 merb : worker (port 4007)
root 10890 1 64 Jun14 ? 11:18:07 merb : worker (port 4000)
root 10891 1 5 Jun14 ? 00:54:46 merb : worker (port 4001)
root 10892 1 4 Jun14 ? 00:51:57 merb : worker (port 4002)
root 10893 1 4 Jun14 ? 00:51:03 merb : worker (port 4003)
jmiller@srv-101-03:~$
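To check that every expected worker is actually running, and to see which one is burning the CPU, the listing can be parsed. A minimal sketch, assuming the standard ps -eaf column layout (UID PID PPID C STIME TTY TIME CMD) and the merb process titles shown above; merb_workers, missing_ports and cpu_seconds are hypothetical helper names.

```ruby
# Hypothetical helpers for summarizing `ps -eaf | grep merb` output.
# Assumed column layout: UID PID PPID C STIME TTY TIME CMD.
PS_MERB_LINE = /\A\S+\s+(\d+)\s+(\d+)\s+\d+\s+\S+\s+\S+\s+(\S+)\s+merb : (worker|spawner) \(ports? (\d+)\)/

# Convert a ps TIME value ([DD-]HH:MM:SS) to seconds.
def cpu_seconds(time)
  days, clock = time.include?('-') ? time.split('-', 2) : ['0', time]
  hours, minutes, seconds = clock.split(':').map(&:to_i)
  days.to_i * 86_400 + hours * 3600 + minutes * 60 + seconds
end

# Map each worker port to its pid, parent pid and accumulated CPU seconds.
# Spawner lines and anything that is not a merb process are skipped.
def merb_workers(ps_output)
  ps_output.each_line.each_with_object({}) do |line, workers|
    m = PS_MERB_LINE.match(line)
    next unless m && m[4] == 'worker'
    workers[m[5].to_i] = { pid: m[1].to_i, ppid: m[2].to_i, cpu_seconds: cpu_seconds(m[3]) }
  end
end

# Expected ports that have no worker in the listing.
def missing_ports(workers, expected_ports)
  expected_ports - workers.keys
end
```

Fed the listing above, missing_ports(workers, (4000..4007).to_a) comes back empty, and the worker with the most accumulated CPU time is the one on port 4000.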

Apply the following apache2 cookbook recipes to the chef server:

"recipe[apache2]",
"recipe[apache2::mod_status]",
"recipe[apache2::mod_proxy]",
"recipe[apache2::mod_proxy_http]",
"recipe[apache2::mod_proxy_balancer]",
"recipe[apache2::mod_rewrite]",
"recipe[apache2::mod_headers]"
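One way to apply that run list is to wrap it in a role. A minimal sketch that prints the role in Chef's JSON role format; the role name chef_server_proxy and its description are hypothetical. Redirect the output to a file, load it with knife (for example knife role from file, if your knife version has it), and add the role to the chef server's run list.

```ruby
require 'json'

# Run list from the post; the role name and description are hypothetical.
CHEF_SERVER_RUN_LIST = [
  'recipe[apache2]',
  'recipe[apache2::mod_status]',
  'recipe[apache2::mod_proxy]',
  'recipe[apache2::mod_proxy_http]',
  'recipe[apache2::mod_proxy_balancer]',
  'recipe[apache2::mod_rewrite]',
  'recipe[apache2::mod_headers]'
].freeze

# Build the JSON form of a Chef role carrying that run list.
def chef_server_role(name = 'chef_server_proxy')
  {
    'name' => name,
    'description' => 'Apache balancer in front of the chef-server merb workers',
    'json_class' => 'Chef::Role',
    'chef_type' => 'role',
    'default_attributes' => {},
    'override_attributes' => {},
    'run_list' => CHEF_SERVER_RUN_LIST
  }
end

puts JSON.pretty_generate(chef_server_role)
```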

Now that we have multiple workers, we need to load balance requests across them. Opscode provides example Apache load-balancer configurations, so that is what I chose to use. Port 4080 was an arbitrary choice that works in my environment:

jmiller@srv-101-03:~$ cat /etc/apache2/sites-available/chef.example.com
Listen 4080
<VirtualHost *:4080>
  ServerName chef.example.com
  DocumentRoot /usr/share/chef-server/public

  <Proxy balancer://chef_server>
    BalancerMember http://127.0.0.1:4000
    BalancerMember http://127.0.0.1:4001
    BalancerMember http://127.0.0.1:4002
    BalancerMember http://127.0.0.1:4003
    BalancerMember http://127.0.0.1:4004
    BalancerMember http://127.0.0.1:4005
    BalancerMember http://127.0.0.1:4006
    BalancerMember http://127.0.0.1:4007
    Order deny,allow
    Allow from all
  </Proxy>

  LogLevel info
  ErrorLog /var/log/apache2/chef_server-error.log
  CustomLog /var/log/apache2/chef_server-access.log combined

  RewriteEngine On
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://chef_server%{REQUEST_URI} [P,QSA,L]
</VirtualHost>
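The balancer block sets no lbmethod, so mod_proxy_balancer uses its default, byrequests, and no loadfactor, so every member gets the default weight of 1. To make the resulting distribution concrete, here is a simplified Ruby model of the weighted request counting that the Apache documentation describes for byrequests. It is a sketch: Member and pick_member are hypothetical names, and the real module's handling of failed or disabled members is left out.

```ruby
# Simplified model of mod_proxy_balancer's default "byrequests" method
# (weighted request counting). Failed/disabled members are not modeled.
Member = Struct.new(:url, :lbfactor, :lbstatus)

# Pick the member for the next request and update the counters.
def pick_member(members)
  total = 0
  candidate = nil
  members.each do |member|
    member.lbstatus += member.lbfactor
    total += member.lbfactor
    # Strictly greater: in this sketch a tie goes to the earlier member.
    candidate = member if candidate.nil? || member.lbstatus > candidate.lbstatus
  end
  candidate.lbstatus -= total
  candidate
end

# Eight equally weighted members, as in the config above.
members = (4000..4007).map { |port| Member.new("http://127.0.0.1:#{port}", 1, 0) }
16.times { puts pick_member(members).url }
```

With eight equal members this degenerates into plain round robin across ports 4000 through 4007; a member given loadfactor=2 would receive twice as many requests as one left at the default.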

Enable the new virtual host:

jmiller@srv-101-03:~$ sudo a2ensite chef.example.com
Site chef.example.com already enabled
jmiller@srv-101-03:~$

Reload Apache:

jmiller@srv-101-03:~$ sudo /etc/init.d/apache2 reload
* Reloading web server config apache2
Warning: DocumentRoot [/usr/share/chef-server/public] does not exist
[Tue Jun 15 16:48:21 2010] [warn] NameVirtualHost *:443 has no VirtualHosts
…done.
jmiller@srv-101-03:~$
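The DocumentRoot warning is harmless for this setup: since that directory does not exist, the RewriteCond !-f test always passes and every request is proxied to the balancer. Before pointing knife at the new port, it is worth confirming that Apache on 4080 and all eight workers accept connections. A minimal sketch; listening? is a hypothetical helper, and it only checks that something accepts a TCP connection, not that chef-server answers correctly.

```ruby
require 'socket'

# Hypothetical helper: true if something accepts TCP connections on host:port.
def listening?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Apache on 4080 plus the eight merb workers.
([4080] + (4000..4007).to_a).each do |port|
  puts "#{port}: #{listening?('127.0.0.1', port) ? 'up' : 'DOWN'}"
end
```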

To make sure it works, first update the port in your knife chef_server_url:

jmiller@srv-101-03:~$ cat .chef/knife.rb
log_level :warn
#log_location "/home/jmiller/.chef/knife.log"
node_name 'jmiller'
client_key '/home/jmiller/.chef/jmiller.pem'
validation_client_name 'chef-validator'
validation_key '/home/jmiller/.chef/chef-validator.pem'
chef_server_url 'http://srv-101-03.example.com:4080'
cache_type 'BasicFile'
cache_options( :path => '/home/jmiller/.chef/checksums' )
cookbook_path [ '/home/jmiller/site-cookbooks' ]

jmiller@srv-101-03:~$

Before adding the workers, this command took 2 to 9 seconds; the run below is the slowest I have seen since the change 🙂

jmiller@srv-101-03:~$ time knife role list
[
"APACHE_ROLE",

"WEBSERVER_ROLE"
]

real 0m0.673s
user 0m0.310s
sys 0m0.100s
jmiller@srv-101-03:~$
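A single timing can be noisy, so for a fairer before-and-after comparison it helps to run the command several times. A minimal sketch; time_runs is a hypothetical helper that measures wall-clock time only.

```ruby
require 'benchmark'

# Hypothetical helper: run the block n times and summarize the wall-clock
# times in seconds, so a comparison does not rest on a single run.
def time_runs(n)
  times = Array.new(n) { Benchmark.realtime { yield } }
  { min: times.min, max: times.max, mean: times.sum / n }
end
```

For example, time_runs(10) { system('knife', 'role', 'list', out: File::NULL) } runs knife role list ten times and returns the fastest, slowest and mean wall-clock time.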

Finally, update chef_server_url in /etc/chef/client.rb on every node to use the new port, and restart the chef-client daemon. I suggest using Chef itself to do it.
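Whatever mechanism pushes the change (managing client.rb from a cookbook template is a common Chef route), the edit itself is small. A minimal sketch of just that edit in plain Ruby; repoint_chef_server_url is a hypothetical helper, the sample client.rb contents are illustrative, and it assumes the setting is written as chef_server_url 'http://host:port'.

```ruby
require 'uri'

# Hypothetical helper: rewrite the port in the chef_server_url line of a
# client.rb or knife.rb body, leaving every other line untouched.
# Assumes the form: chef_server_url 'http://host:port' (either quote style).
def repoint_chef_server_url(config_text, new_port)
  config_text.gsub(/^(\s*chef_server_url\s+)(["'])(.+?)\2/) do
    prefix, quote, url = Regexp.last_match.captures
    uri = URI.parse(url)
    uri.port = new_port
    "#{prefix}#{quote}#{uri}#{quote}"
  end
end

client_rb = <<~RUBY
  log_level :info
  chef_server_url 'http://srv-101-03.example.com:4000'
RUBY
puts repoint_chef_server_url(client_rb, 4080)
```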

Thank you to Josh Timberman, Adam Jacob, and holoway for many pointers.
