Etsy’s Chef Repo, 2010 from jspaw on Vimeo. Delicious InfoViz courtesy of Gource....
UPDATE, 10/17/2017: This post hasn’t aged well, and needs some patching. The title should be “TTR is more important than TBF (for most types of F)”. Why? Because taking the statistical mean of TTR or TBF makes absolutely no sense, whatsoever. Incidents and events simply are not comparable in that way, and even if they were, the time...
Last month I had the honor of speaking at the Surge Conference in Baltimore, put together by OmniTI. It was a most excellent conference, and the expertise levels were ridiculously high. I count myself lucky to be considered in the same league as the rest of the presenters. I did give a Keynote talk, and I...
Protip: if you’re getting Nagios alerts on an iPhone, and you have your contact set as: xxx-xxx-xxxx@txt.att.net, you’ll get messages from a ‘sender’ that looks like: “1 (410) 000-173”. This is not someone in Maryland; it’s a special address so that AT&T can route a reply back to the sender if need be. The side...
At the Velocity Conference last year, I was talking to Mike Loukides from O’Reilly about the topics being presented and how it was so great to see such successful veterans of the field come out from behind the curtain and share their experiences. Mike said that there was interest in doing a book on the...
I guess I’m late on getting to this, but How Complex Systems Fail by Richard Cook is excellent. Let me start with this: I don’t think I can overstate how right-on this paper is, with respect to the challenges, solutions, observations, and concerns involved with operating a medium to large web infrastructure. I found this...
Like all sane web organizations, we gather metrics about our infrastructure and applications. As many metrics as we can, as often as we can. These metrics, given the right context, help us figure out all sorts of things about our application, infrastructure, processes, and business. Things such as… What: …did we do before (historical trending,...
UPDATE: blip.tv has the video of the talk as well, below. Jeez I have some major bed-head. That was a blast! I had never done a ‘duet’ talk before. Here are the slides: 10+ Deploys Per Day: Dev and Ops Cooperation at Flickr …and the video of it is here:...
That was a pretty good time. Saw lots of good and wicked smaht people, and I got a lot of great questions after my talk. The slides are up on slideshare, and here are the PDF slides. Operational Efficiency Hacks Web20 Expo2009 View more presentations from John Allspaw. UPDATE: Gil Raphaelli has posted his python...
Moving one of our eight photoserving farms from hardware Layer7 URL hash balancing (expensive, has limits) to L4 DSR balancing with CARP (cheap and simple) and figuring out how to juggle 18,000 requests/second while we do it. Built yet more automated query analysis reporting (with some yummy MySQLProxy). Added yet another aggregated graph of...