Sorry about the slowness and downtime last night, and again for 10 minutes just now.
We’ve seen network problems inside our data center too many times.
We chose them 2 years ago because they’re a PCI specialist data center. They know a lot about PCI compliance and working with auditing firms, and they provide 24/7 staff who work on our systems and whom we can call on. They offer fully managed hosting, focused on PCI and several other forms of compliance.
But as I mentioned above, data center network problems are hurting us and our merchants. They come in clumps… no problems for a month or two, then problems last night and again just now.
It’s incredibly frustrating to have our systems running fine, but the world can’t reach our app, or they can reach it but only slowly.
A few of you graciously tweeted with us last night, saying the problem is probably related to IPv6… longer IP addresses that data centers have to upgrade their equipment to support. That’s definitely possible, even probable, but we haven’t been told whether that’s the case.
We’ve spent a few months researching different data centers, finding out what works for other notable ecommerce companies… and which notable data centers / cloud providers they tried that still failed to deliver high uptime. We need to move, but we want to be deliberate about it and learn from the experience of others.
We’ve also been searching for a few months for a full-time systems engineer to focus on this area and make it his or her life.
We were light on systems engineering, but we started changing that earlier this year.
We have a contractor who was recommended by one of you, plus we use bandwidth from our development team. The market for top-notch systems engineers is tight and it takes months to find good candidates and then narrow the list, and of course they have to want us, too.
Thank you to our partners and merchants who’ve sent us leads on potential candidates.
We’re in the later stages of our search and we plan to fill the position soon. We’ll transfer all of the knowledge and contacts we’ve got so far and let this new person run with the project of standing up a new Chargify installation at a new data center.
We’ll keep both data centers and load balance across them.
Another of our large merchants was very giving with their knowledge when we met them earlier this year in Austin. They’ve “been there, done that” with multiple data centers. Another of you also tweeted about similar knowledge last night. Data synchronization becomes the hard part, but it’s a problem that others have tackled before us.
Also note that each installation has to be audited by an outside auditing firm for PCI Level 1 compliance. That adds cost and time, and it’s one of the reasons that we don’t think lightly of moving between data centers / cloud providers.
A related idea, though all of this costs money, of course, is to stand up *2* new data centers, giving us 3 total. That would allow us to monitor them all for, say, 6 months, and then choose the 2 best ones to keep. I really like that idea.
It’s a matter of cost versus risk. Having 2 data centers, even if each of them has occasional short downtime, should result in nearly perfect long-term uptime for our app and for you.
Once you get to 3 or more data centers, then each additional data center delivers a diminishing return in uptime, but there’s no break on the cost… each data center adds linearly to our cost.
Thus, 2 really good data centers located in different geographic areas, using different major internet backbones, should deliver great network uptime and be able to withstand a natural disaster in one place.
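To put rough numbers on that cost-versus-risk tradeoff, here is a minimal sketch. It assumes each data center fails independently of the others and that the app is reachable whenever at least one of them is up. The 99.9% per-site figure and the function names are hypothetical, picked only for illustration; they are not measurements from our hosting.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_site: float, sites: int) -> float:
    """Chance that at least one of `sites` data centers is up.

    Assumes every site has the same availability and that failures
    are independent (no shared backbone, provider, or region).
    """
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("per_site must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # All sites must be down at once for the app to be unreachable.
    return 1.0 - (1.0 - per_site) ** sites


def expected_downtime_hours(per_site: float, sites: int) -> float:
    """Expected hours per year with every site down at the same time."""
    return (1.0 - combined_availability(per_site, sites)) * HOURS_PER_YEAR


# Hypothetical per-site availability, for illustration only.
PER_SITE = 0.999

for n in range(1, 5):
    avail = combined_availability(PER_SITE, n)
    hours = expected_downtime_hours(PER_SITE, n)
    # Cost grows linearly with n; the downtime saved shrinks with each step.
    print(f"{n} data center(s): availability {avail:.9f}, "
          f"about {hours:.6f} hours down per year, cost x{n}")
```

Under these assumptions, going from 1 to 2 data centers removes almost all of the expected downtime, while a 3rd or 4th removes only a tiny remainder at the same added cost. The independence assumption is the weak point: two sites that share a provider or a backbone can fail together, and then the real gain is smaller than this arithmetic suggests.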
Some of you will ask if we’ll just go with multiple AWS regions or something similar from Rackspace, etc. Maybe. Definitely easier to work with one company.
But after some conversations we’ve had recently with some of you, it sounds better to split our infrastructure across 2 or more companies, thereby relying on even fewer common failure points. All of the large data center / cloud providers have had their share of big outages, so we’d feel more comfortable spreading ourselves across at least 2 of them.
So that’s it for now.
We’re aware of the weakness of depending on our 1 data center, and we’re laying the foundation to solve it.
Thanks for being our customers.
-- Lance Walley, co-founder/CEO
-- Cell: +1 415 244 0349
-- +1 800 401 2414