October 27th Downtime Postmortem

Hi, my name is Matthew Lyon and I’m responsible for rolling out changes to the PHP Fog website and orchestrating changes across our service infrastructure, and this is not what I wanted to be writing.

I wanted to write about how we’ve rolled out our new logging service. I wanted to tell you how it lets you view your app’s access and error logs in real time from our web console (or from other HTTP tools such as curl), how it aggregates logs from across multiple servers, and how we’ll be able to tie it into third-party log archiving and search services in the coming weeks.

Instead I’m writing about our downtime on the evening of Thursday, October 27th, and how we’re working to prevent it from happening again.

What Went Wrong

Platform as a service is a complex, interconnected system. Deploying a complex feature like a unified logging system, which collects and centralizes logs for tens of thousands of different applications, exposed some unforeseen scaling issues. Although we worked hard to predict and plan around them, developed and ran this feature in a cloned QA environment, and did load testing, we missed a few small details that brought our systems down temporarily.

It is our responsibility to provide the most reliable and robust platform for web development out there. Since reaching general availability, our reliability has been pretty terrific, which makes this outage really painful for us. We are using this opportunity to put the proper pieces in place to prevent this from ever happening again.

How We’re Going to Prevent This From Happening Again

We’re going to announce deployments before we do them. We are going to do a much better job letting you know when there will be maintenance on our site that may affect your sites.

We’re implementing staged deployments for new features. This will allow us to catch problems that get past our other safeguards before they affect all of our customers.
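
To make the idea concrete, here is a minimal sketch in Python of what a staged deployment can look like. It is an illustration only, not PHP Fog's actual tooling: the function name `staged_rollout`, the stage percentages, and the `deploy` and `is_healthy` callbacks are all hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_percents: Sequence[int] = (5, 25, 100),  # assumed stage sizes
) -> Tuple[List[str], bool]:
    """Deploy to a growing share of servers, halting at the first bad stage.

    Returns the servers that received the change and whether the rollout
    completed. All names here are hypothetical, for illustration only.
    """
    updated: List[str] = []
    for percent in stage_percents:
        # Cumulative number of servers that should be updated after this
        # stage, rounded up with integer arithmetic.
        target = (len(servers) * percent + 99) // 100
        for server in servers[len(updated):target]:
            deploy(server)
            updated.append(server)
        # Check every server updated so far before widening the rollout.
        if not all(is_healthy(server) for server in updated):
            return updated, False
    return updated, True
```

With 20 servers and the assumed stages, a problem that only shows up on the fourth server stops the rollout after 5 servers have been touched, leaving the other 15 on the old version.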

We’re improving our tools for deploying complex features like the logging service daemon. We are built on the cloud and we provide a cloud service, so we can take advantage of that. We are going to bake new service features into our server templates, fire up new servers for everyone’s apps, ensure consistency and reliability of data, flip a switch so that those new servers go live, and then turn the old servers off.
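
The sequence described above (new servers from an updated template, a consistency check, a switch, then retiring the old servers) is commonly called a blue-green deployment. A minimal Python sketch of that flow follows. Everything in it is hypothetical: `Pool`, `Router`, and the `provision`, `verify`, and `retire` callbacks stand in for whatever real infrastructure calls would be used, and this is not a description of PHP Fog's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pool:
    """A named group of servers (hypothetical model)."""
    name: str
    servers: List[str] = field(default_factory=list)


class Router:
    """Tracks which pool currently receives live traffic (hypothetical)."""

    def __init__(self, live: Pool) -> None:
        self.live = live

    def switch_to(self, pool: Pool) -> None:
        self.live = pool


def blue_green_deploy(
    router: Router,
    provision: Callable[[], Pool],   # fire up servers from the new template
    verify: Callable[[Pool], bool],  # e.g. data consistency checks
    retire: Callable[[Pool], None],  # turn servers off
) -> bool:
    """Bring up a new pool, verify it, switch traffic, retire the old pool."""
    old_pool = router.live
    new_pool = provision()
    if not verify(new_pool):
        # The old pool keeps serving traffic; discard the failed pool.
        retire(new_pool)
        return False
    router.switch_to(new_pool)
    retire(old_pool)
    return True
```

The point of the design is that a failed verification never touches the servers that are currently live.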

We’re improving our load testing for services that run on your servers to ensure they can handle organic real-world load. While many people are content to use ab or httperf and call it a day, we know that’s not enough, and apparently our load testing process wasn’t enough, either. We’re improving our load testing processes to better handle the needs of real-world applications, and will be testing them on our own applications before they are put to general use.
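
One way load testing can get closer to real-world traffic than hitting a single URL is to replay a weighted mix of requests concurrently and look at errors and tail latency. The Python sketch below shows that idea under stated assumptions: `run_load_test`, the `send` callback, and the traffic mix are hypothetical, and `send` would need to be replaced with a real request function to test an actual service.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def run_load_test(
    send: Callable[[str], bool],            # hypothetical: True on success
    traffic_mix: Sequence[Tuple[str, float]],  # (path, relative weight)
    total_requests: int,
    concurrency: int = 8,
    seed: int = 0,
) -> Dict[str, float]:
    """Send a weighted mix of requests concurrently and summarize results."""
    rng = random.Random(seed)
    paths = [path for path, _ in traffic_mix]
    weights = [weight for _, weight in traffic_mix]
    # Draw the request sequence up front so runs are repeatable.
    plan: List[str] = rng.choices(paths, weights=weights, k=total_requests)

    def timed(path: str) -> Tuple[float, bool]:
        start = time.perf_counter()
        try:
            ok = send(path)
        except Exception:
            ok = False  # count exceptions as failed requests
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, plan))

    latencies = sorted(elapsed for elapsed, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    # Simple 95th-percentile estimate by index into the sorted latencies.
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))] if latencies else 0.0
    return {"requests": len(results), "errors": errors, "p95_seconds": p95}
```

A mix taken from real access logs (rather than invented weights) would make the replay more representative, which is the gap the paragraph above describes.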

Reliability has always been my first goal for PHP Fog, and is our primary goal as a service. We’re very sorry about the outage and are working to ensure that our service is more reliable than ever.

I look forward to being able to write about our new logging service soon.

 

  • http://www.spicermatthews.com/ Spicer Matthews

    Thanks for being honest about what happened. Better, thanks for offering up a credit. Any rational person knows companies make mistakes and are willing to accept the mistakes. However, most companies are not rational with their aftermath. Being honest and paying up for the mistake is key. Kick a**! Thanks.

  • Autobot

    Still not sure what exactly went wrong, but that’s irrelevant. Thanks for being honest n dealing with it comprehensively. You’ve got a member for the life of my project. Bigi cheers!

  • http://picdish.com Haitham Al-Beik

    Could not have said it better than Spicer Matthews. Thanks for being open about what happened and at the same time giving us a preview of great things to come from PHPFog in terms of reliability and features.

    Keep up the great work. :)

  • Sharan

    Hats off for your honest explanation of the downtime. It’s really great.

    Cheers !

  • http://thorerik.net Thor Erik

    *bows* Thanks for being honest. I wish more companies were open about what’s going on, not only about security issues but in general like this :)

  • http://tomaszsobczak.com Tomasz Sobczak

    If someone is REALLY BIG and it’s also honest and quickly responsive about failures it must be BEST ONE in near future :)

  • Bodom78

    With tech there will always be issues, anyways this is a great blog post. It could have just been some PR styled paragraph but a more human approach and explanation makes all the difference.

  • Aspcorpo

    Please, dont let this thing happen again! i will die! :’( 

  • http://www.callum-macdonald.com/ Callum Macdonald

    Awesomely handled guys, really impressive. This incident has enhanced my view of PHPFog.

  • http://nspeaks.com Navjot Singh

    Giving a $10 credit as compensation was a nice gesture. But one thing I didn’t understand is why free accounts were offered credit. I have an account with you which is lying unused. Still I got $10 credit. One more suggestion – you should send separate newsletters to free and paid customers, in my opinion. This line at the bottom of the newsletter: “You are receiving this email because you are either a paid or free customer of PHP Fog.” looks a bit unprofessional. Just my 2 cents.

  • http://tomaszsobczak.com Tomasz Sobczak

    Give someone additional credits to compensate for service downtime and they will call you unprofessional. Strange.

  • http://nspeaks.com Navjot Singh

    I am not calling them unprofessional. I am just saying that offering $10 to free accounts that have never paid, and may not pay in the future, is very generous, but I found it surprising. That’s all. And I was just suggesting that newsletters should differentiate between a paid and a free customer. At times you would like to send upgrade offers to a free customer, but from what I see the newsletters being sent are the same.

  • http://tomaszsobczak.com Tomasz Sobczak

    From my experience in big companies like this, it’s just cheaper to credit all users than to send 2 different newsletters (especially after such a failure, when informing users ASAP is the priority). I am also sure that some users have both types of plans (paid + free) linked to the same account, so that’s another condition to handle when sending a newsletter.

    Anyway, I am really happy that I was credited even for one of my free accounts, so I can test the new paid features, and I am probably a bit closer to deciding to use the paid version.

  • Anonymous

    What’s not to understand about giving free accounts credits? Just because the account is free doesn’t mean the account owners don’t utilize the account for a legitimate reason. You could easily run a small, low traffic application on the free account. Just because I don’t pay for a service that they offer for free means that I shouldn’t get something for the time that my application was down? What if I make money off of my application? Be it from ad impressions, or through a payment gateway. What if I lost money due to the outage??

    That being said, I don’t understand the logic in questioning great customer service. Most of the companies out there WOULDN’T credit their free customers. To me PHP Fog is demonstrating amazing customer service, it’s just rude to question that.

    And… I’m sorry, but saying that the email looks unprofessional is a bit ridiculous in my opinion. These guys provide a great service whether you pay for the account or you have a free account. Everyone that signs up for this service knows that there is a free account; it’s not a secret. They are including that line because it’s newsletter service 101. Always let the customer know why they’re receiving the email. I’m willing to bet the reason they explicitly say “… a paid or free customer …” is because when offering free accounts you’re going to have people that sign up for a free account and do nothing with it. I, for example, do not have a site on PHP Fog. I signed up for the free account to see what it was like and to see what the CakePHP support was like. I don’t even remember how long ago I signed up for the account, but if the email had just said “… because you are a customer of PHP Fog …” I seriously would have questioned whether I’m paying these guys and forgot about it (or maybe even thought they’re one of those sleazy companies that profit from people who forget to cancel a service and eventually get billed).

    Suggesting they spend the time and the resources to create two separate mailing lists, no matter how simple/small of a task that is, is just nitpicking.

  • James

    Nice to see a company own up to its mistakes (everyone makes mistakes) instead of trying to claim “oh it wasn’t us, must be something on your side.”

  • Anonymous

    Hi Navjot,

    This is Lucas, founder and CEO of PHP Fog.

    Giving our free customers credit was intentional: just because you do not pay us does not mean you do not rely on our service. Although we do not yet have a site-wide SLA, I felt it was the right thing to do as an apology, since the outage did not just affect paying users. I am sorry this did not come across better in our communications.

    Thanks for your time and understanding!

  • http://geekanddad.wordpress.com/ Dad

    bravo! well handled. perfect response – Apology, what we learned, how we are going to make our system better so this won’t happen again, or anything like it.  I especially appreciate the details about what you’re going to do because that educates us all and is more valuable than the $10 credit in the long run.
