Protip: if you’re getting Nagios alerts on an iPhone, and you have your contact set as xxx-xxx-xxxx@txt.att.net, you’ll get messages from a ‘sender’ that looks like: “1 (410) 000-173”. This is not someone in Maryland; it’s a special address so that AT&T can route a reply back to the sender if need be.
The side effect of this is that when/if you get a boatload of alerts (which can happen in cascading-failure scenarios where you don’t have any Nagios dependencies or event handlers set up), you’re gonna have to spend a proportional boatload of time swiping and deleting those alerts one by one.
This, of course, is a major bummer. 🙁
A solution is to set your contact info in Nagios instead to xxx-xxx-xxxx@mms.att.net, which will properly set a “from” address on your iPhone, so when it comes time to delete the boatload of messages, you can do it in a single ‘delete conversation’ swipe.
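In a standard Nagios object configuration, that change is just the `email` directive on your contact definition. A hedged sketch (the contact name and notification commands here are illustrative; the phone-number placeholder stays as in the text):

```cfg
define contact {
    contact_name                  oncall-iphone
    alias                         On-call iPhone
    service_notification_commands notify-service-by-email
    host_notification_commands    notify-host-by-email
    # mms.att.net sets a proper "from" address, so alerts thread into
    # one conversation you can delete in a single swipe
    email                         xxx-xxx-xxxx@mms.att.net
}
```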
Caveat: If you do this (set to mms.att.net, instead of txt.att.net) you’ll lose the ability to reply to a Nagios alert. This presumably will affect those smart folks who have set up the ability to acknowledge an alert from their phone via a reply/procmail mechanism.
Bonus protip: make it so that you don’t ever get boatloads of Nagios alerts at once. That will help, too.
Implied bonus protip: event handlers and dependencies are the sign of an evolved ops organization. It’s not too difficult to set up, and you’ll feel joy after you do!
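As an illustration of how little config a dependency takes, here’s a hedged sketch of a Nagios `servicedependency` that suppresses HTTP alerts on web hosts while the database they rely on is down. Host and service names are made up for the example:

```cfg
define servicedependency {
    host_name                     db1
    service_description           MySQL
    dependent_host_name           web1,web2
    dependent_service_description HTTP
    # don't notify for the dependent HTTP checks while MySQL is in
    # WARNING, UNKNOWN, or CRITICAL state
    notification_failure_criteria w,u,c
}
```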
Good tip!
In order to try and avoid an SMS avalanche, I hacked together this system to route outbound emails away from the primary contact address (SMS) to a secondary non-SMS address (email) if there is a large batch of them at the same time: http://warwickp.com/2010/03/ruby-self-throttling-nagios-alerts/
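The self-throttling idea is simple enough to sketch in a few lines: count alerts in a sliding window, and once the count crosses a threshold, route to the non-SMS address. This is a minimal Python sketch of the concept, not the linked Ruby script; the threshold, window, and addresses are illustrative.

```python
import time
from collections import deque

THRESHOLD = 5    # alerts allowed within the window before we fall back
WINDOW = 60      # sliding window, in seconds
SMS_ADDR = "xxx-xxx-xxxx@txt.att.net"   # primary (SMS) contact
EMAIL_ADDR = "oncall@example.com"       # secondary (non-SMS) fallback

recent = deque()  # timestamps of alerts seen within the window

def route_alert(now=None):
    """Record one alert and return the address it should go to."""
    now = time.time() if now is None else now
    recent.append(now)
    # drop timestamps that have aged out of the window
    while recent and now - recent[0] > WINDOW:
        recent.popleft()
    return EMAIL_ADDR if len(recent) > THRESHOLD else SMS_ADDR
```

A burst of alerts in quick succession starts going to email once the sixth one arrives inside the window.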
Alternatively, use something like http://www.ServerDensity.com and their iPhone or Android app and configure push alerts. Our devops guys use ServerDensity + PagerDuty to make sure we know someone’s looking at an issue and that it’s been fixed – we’re quite a small team and we’ve found it really effective so far.
pagerduty.com FTW. Phone calls that say “you have 5 triggered nagios alerts” are infinitely better than 5 texts in rapid succession.
I don’t get boatloads of alerts anymore, but it was a huge pain to delete each SMS when I did.
There was also an issue with AT&T and delayed SMS messages. Depending on your throughput, it would trigger modifications to your account that controlled how often SMS messages were delivered. Probably a bunch of BS to cover for issues with their saturated backend network. I don’t have that issue anymore either; I’m on Sprint/Android.
@edwin: We experienced this throttling as well – emails reported as successfully received by AT&T’s mail servers, but never delivered to the on-call engineer’s phone. Not cool.
Another great idea is to aggregate alerts by service. For instance, if you’ve got 500 hosts running the same service and they all blow up simultaneously, you don’t really want to get 500 pages. There are a few ways to solve this; I’ll leave the solution(s) up to your readers 🙂
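One of several possible approaches, sketched here for the curious: buffer alerts for a short interval, then collapse them by service so 500 identical check failures become one page. This is an illustrative Python sketch (host/service names made up), not a drop-in Nagios notification command.

```python
from collections import defaultdict

def aggregate(alerts):
    """Collapse (host, service) alert tuples into one summary line per service."""
    by_service = defaultdict(list)
    for host, service in alerts:
        by_service[service].append(host)
    summaries = []
    for service, hosts in sorted(by_service.items()):
        if len(hosts) == 1:
            summaries.append(f"{service} CRITICAL on {hosts[0]}")
        else:
            # many hosts failing the same check -> one page, with a count
            summaries.append(f"{service} CRITICAL on {len(hosts)} hosts")
    return summaries
```

A real setup would collect alerts for, say, 60 seconds before calling something like this and sending the result as a single message.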
Real protip – receive them by email (PUSH IMAP, obviously), not SMS which is what I assume you’re doing. Then select all, and delete. Of course, as you say, Nagios dependencies are a preferable solution, solving the problem at the source. Using email, you can also acknowledge by replying.
I can’t imagine why you would use SMS for Nagios notifications. Wasn’t it the thing you used to *have to* use when your phone didn’t do email?
Jesse: we also are indeed PagerDuty customers, and so far we like it. We’re at the moment only using it to escalate human-initiated alerts (support and community staff, looking for the on-call developer and ops engineers) and the escalation is excellent.
We haven’t done Nagios integration with PagerDuty yet, but it’s likely that we will. We’re talking with PagerDuty about some feature requests, and we’ll move slowly towards it.
Acknowledging via reply is an interesting scenario that we had at Flickr as well, but it does require receiving inbound email in a way that gets it onto the Nagios server, which unfortunately isn’t yet straightforward with our current mail architecture. It’s something we want, and we’ll get there, via PagerDuty or not.
That is a good tip. I had the same problem, and my solution was to set up text messaging with a Google Voice account using a Python script I found. I’m not familiar with Nagios, but I imagine you can trigger a script just as easily as you can send an email.
http://code.google.com/p/pygooglevoice/
In November, iOS 4.2 should be released, which will allow you to set custom text tones, so you can have a unique alert when a text comes from your Google Voice number.
Ciaran: True SMS (read: not anything carried by AT&T or Verizon, etc) is way more reliable than SMTP. Time for the blang blang, get out your Skytels, bitchez!