Kitchen Soap

Thoughts on capacity planning and web operations.


Web Ops Visualizations Group on Flickr

December 16th, 2008 · No Comments

Like lots of operations people, we’re quite addicted to data pr0n here at Flickr. We’ve got graphs for pretty much everything, and we add new ones all the time. We’ve blogged about how and why we do some of this.

One thing we’re in the habit of is screenshotting these graphs when things go wrong, right, or indifferent, and adding them to a group on Flickr. I’ve decided to make a public group for this sort of screenshot, for anyone to contribute to:

http://flickr.com/groups/webopsviz/

Before posting anything here, think about whether you want everyone in the world to see it. I’ve put a quick FAQ on the group’s page, but I’ll repeat it here:

Q: What is this?
A: This group is for sharing visualizations of web operations metrics. For the most part, this means graphs of systems and application metrics, from software like ganglia, cacti, hyperic, etc.

Q: Who gets to see this?
A: This is a semi-public group, so don’t post anything you don’t want others to see.
For now, only members can post and view. Ideally, I think it’d be great to share some of these things publicly.

Q: What’s interesting to post here?
A: Spikes, dips, patterns. Things with colors. Shiny things. Donuts. Ponies.

Q: My company will fire me if I show our metrics!
A: Don’t be dense: don’t post your pageview, revenue, or other super-secret stuff that you think would be sensitive. Your mileage may vary.

So: you’ve got something to brag about? How many requests per second can your awesome new solid-state-disk database do? You got spikes? Post them!

Tags: flickr · tools

2009 Velocity Conference submissions are open!

November 20th, 2008 · 1 Comment

The CFP for next year’s Velocity Conference is up now, so all you ops and performance ninjas: submit your ideas for talks.

I’m lucky enough to be on the program committee this year, and I think the conference is a huge opportunity to spread the ops love on all kinds of topics. There’s a list on the O’Reilly page to get you thinking about what might make for a good submission:

- How to tie web performance and operations to the bottom line
- Real-world incident management – getting “tight like a pit crew”
- Making websites as fast and reliable as desktop apps
- Networking, DNS, and load balancing
- Profiling’s not just on the backend: JavaScript, CSS, and the network
- Managing web services – flaming disasters you survived and lessons learned
- The intersection between performance and design
- Wicked cool (and actionable) metrics
- Ads, ads, ads – the performance killer?
- Troubleshooting in production
- How to scale and be fast on the social web
- Capacity planning and load testing
- Establishing performance and operations best practices within your organization
- Configuration management best (and worst) tools and practices
- Monitoring and instrumentation: Open Source, as a service, commercially supported solutions
- Using multiple CDNs to improve customer experience and reduce cost

Think for a minute: Do you have a bunch of sweet ops hacks that you’re really proud of? Do you and your dev teams collaborate on making things easy to manage? Do you face unique challenges that other ops folks could learn from?

If so, don’t be lame: submit a proposal!

Tags: talks · webops

Code Swarm for Config Management

October 21st, 2008 · 3 Comments

Gil Raphaelli, one of the guys on our Flickr Ops team, put together a Code Swarm animation for the configuration/deployment management tool we use at Flickr to manage our infrastructure. Myles Grant did this for our bug reporting system as well. Check it out:

Our automated config management system is called Gemstone, but conceptually you can think of it as a pretty extensible SystemImager/Puppet/cfengine-style system. In the animation, the dots are changes made by the ops person shown. The legend is:

transforms: which clusters should have which packages, files, actionable scripts, etc.
raw: actual files, like apache/memcached/squid configs, which get munged depending on which cluster they’re in
conf: which boxes/clusters are subsets or supersets of which other clusters
code: ops-written tools/utilities
misc: stuff that doesn’t fit into the above. :)
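Gemstone itself isn’t public, so here’s only a hypothetical sketch of how the layers in that legend can fit together: a conf-style map of which clusters inherit from which, transforms-style lists of what each cluster gets, and a raw config file munged per cluster. Every cluster name, package, and setting below is invented for illustration; this is not how Gemstone is actually implemented.

```python
from string import Template

# conf (hypothetical): each cluster lists the clusters it is a superset of.
CLUSTER_PARENTS = {
    "base": [],
    "web": ["base"],
    "web-photos": ["web"],
}

# transforms (hypothetical): what each cluster should have installed.
CLUSTER_PACKAGES = {
    "base": ["ntp", "ganglia-gmond"],
    "web": ["httpd"],
    "web-photos": ["squid"],
}

# raw (hypothetical): a real config file with per-cluster values substituted in.
RAW_HTTPD_CONF = Template("ServerName $hostname\nMaxClients $max_clients\n")

CLUSTER_SETTINGS = {
    "base": {},
    "web": {"max_clients": "256"},
    "web-photos": {"max_clients": "512"},
}


def lineage(cluster: str) -> list[str]:
    """Return the cluster and all of its ancestors, most general first."""
    chain: list[str] = []
    for parent in CLUSTER_PARENTS[cluster]:
        for c in lineage(parent):
            if c not in chain:
                chain.append(c)
    chain.append(cluster)
    return chain


def packages_for(cluster: str) -> list[str]:
    """Union of packages from the cluster and everything it inherits from."""
    pkgs: list[str] = []
    for c in lineage(cluster):
        for p in CLUSTER_PACKAGES[c]:
            if p not in pkgs:
                pkgs.append(p)
    return pkgs


def render_raw(cluster: str, hostname: str) -> str:
    """Munge the raw config; more specific clusters override general ones."""
    values: dict[str, str] = {}
    for c in lineage(cluster):
        values.update(CLUSTER_SETTINGS[c])
    values["hostname"] = hostname
    return RAW_HTTPD_CONF.substitute(values)
```

Applying the layers from most general to most specific is one simple way to let a sub-cluster like `web-photos` override a value (here `MaxClients`) it would otherwise inherit from `web`.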

Tags: tools · webops

I wrote a book about common sense.

October 6th, 2008 · 1 Comment

The Art of Capacity Planning: Scaling Web Resources

Whew. That took longer than I thought.

Todd Hoff over at the High Scalability blog has an email interview with me about a book I wrote, called “The Art of Capacity Planning: Scaling Web Resources”. I’m still just happy that I got it done at all, seeing as it was due the same week my son was born.

This book happened because of a LOT of people. You know who you are, because you’re in the Acknowledgements. :)

Tags: "capacity planning" · book

More back-of-envelope-math…

September 18th, 2008 · No Comments

Via kottke: some good examples of doing rough math in your head, which forces you to make (and question) assumptions all along the way.

IMHO, being able to do this is one of the things that makes a good web ops person. The examples might be “useless”, but the process is invaluable.
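To make that process concrete, here’s a minimal sketch of a back-of-envelope capacity estimate. Every input (daily pageviews, requests per pageview, peak-to-average ratio, per-server throughput, headroom) is a hypothetical assumption picked for illustration, not a Flickr number; the value is in writing each assumption down where it can be questioned.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def servers_needed(
    pageviews_per_day: float,
    requests_per_pageview: float,
    peak_to_average: float,
    requests_per_server_per_sec: float,
    headroom: float,
) -> int:
    """Rough server count for peak load, rounded up to a whole box.

    All arguments are assumptions you should be ready to defend.
    """
    # Spread the day's traffic evenly to get an average rate...
    avg_rps = pageviews_per_day * requests_per_pageview / SECONDS_PER_DAY
    # ...then scale up, because traffic is never evenly spread.
    peak_rps = avg_rps * peak_to_average
    # Only plan to run each server at `headroom` (e.g. 0.7 = 70%) of its ceiling.
    usable_rps_per_server = requests_per_server_per_sec * headroom
    return math.ceil(peak_rps / usable_rps_per_server)
```

With 100 million pageviews a day, one request per pageview, a 2x peak, 100 requests/sec per server, and 70% headroom, `servers_needed(100_000_000, 1, 2, 100, 0.7)` comes out to 34 boxes. Change any one assumption and you can see right away how sensitive the answer is to it.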

Tags: "capacity planning" · random · webops

Internet-Scale Efficiency

September 16th, 2008 · No Comments

James Hamilton’s excellent LADIS 2008 presentation has lots of great material on internet-scale efficiency, including some cool stats.

Tags: "capacity planning" · webops

Why we use GraphicsMagick

September 2nd, 2008 · 6 Comments

Speed: http://www.graphicsmagick.org/www/BENCHMARKS.html

Also, it looks like the GM devs are working on adding OpenMP (parallelism) to GM processing, which will be a huge boon for multicore boxes. Yay!

Tags: tools

Everything isn’t about the Knuth quote

August 24th, 2008 · 2 Comments

It’s hard to describe how tiring it is to hear someone quote Donald Knuth (or Tony Hoare) in the wrong context. I’m not the only one annoyed by this. In “Structured Programming with go to Statements”, Knuth says:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

After having read Knuth’s paper containing this quote, I can agree that it’s certainly a brilliant piece of advice in the context of programming. What is irritating to me is the blanket application of this pearl of wisdom to anything that has to do with computers, especially systems performance, web operations and architecture decisions.

For the record: I firmly believe in these principles:

  1. Done >= perfect.
  2. Don’t waste time building elaborate simulations for what the future might bring to your capacity.
  3. Performance tuning is better left outside the capacity planning process.

But I think folks sometimes lean on the Knuth/Hoare quote way too much, in the wrong situations. This was meant to be a blog post of my own, but this article pretty much sums up my current feelings about it.

Tags: Uncategorized

Untitled Metric #1202345227

July 17th, 2008 · 2 Comments



Untitled Metric #1202345227, originally uploaded by straup.

Our philosophy in Flickr Operations Engineering.

Tags: Uncategorized

Slides from Velocity

June 25th, 2008 · 8 Comments

Here are the slides from my talk at the Velocity Conference.

Tags: flickr · talks · webops