Tech: Three reasons Healthcare.gov failed all of us

Since going live early this month, Healthcare.gov has been universally panned for being unstable, unusable, and close to an outright disaster. As the publicity around the problems has intensified, it’s become clear that the project was mismanaged and probably sloppily developed. The team certainly faced daunting challenges, like going from zero to full capacity overnight, but with a reported budget of over $50 million and three years to execute, it’s hard to have too much sympathy. That said, without knowing the project guidelines, business constraints, and other considerations, it would be irresponsible to armchair quarterback and claim that different technical decisions would have performed any better. However, on the non-technical side there are some strategic and architectural decisions that are disappointing.

The Team

As with any large project, success is ultimately driven by the strength of the team behind it. The experience, structure, and mindset of the team building the project are the key drivers of what tradeoffs are made, how things are architected, and of course how development is executed. Unfortunately for us, according to the Washington Post, CGI Federal, the company that built Healthcare.gov, is a relative newcomer to US government IT consulting and has a spotty track record of building large, commercial web applications.

I’ll be the first to admit that I have no background in government procurement, but it strikes me as odd that the Obama administration didn’t handpick the team that was going to build the software for their watershed legislation. That’s especially notable since the sweeping Democratic win in 2008 has largely been attributed to the Obama campaign’s exceptional technology resources. Brad Feld actually suggested bringing in Harper Reed, who served as the CTO of the campaign, to fix the problems, but why wasn’t he leading the project from the outset?

Transparency (or lack thereof)

For an administration that campaigned on transparency (let’s ignore NSA/drones), the lack of transparency surrounding Healthcare.gov has been severely disappointing. The lack of information has left media outlets reporting outlandish claims, such as that the site is 500 million lines of code or that Verizon has been brought in to help triage the situation. Apart from technical details, the White House has also remained completely silent on visitor and enrollment statistics, even though they clearly have them, since the site is running a slew of analytics and monitoring services. Leaving the public in the dark has driven rampant speculation and left everyone wondering, “What if they aren’t releasing data because it’s THAT bad?”

Another area where Healthcare.gov has been disappointingly secretive is the actual source code of the project. Although some people, including Fred Wilson, are recommending open sourcing the code as a potential triage measure, the code really should have been available from the start. Even if the proprietary licensed technology had remained private, taxpayer dollars financed the site, and open sourcing it from the outset would have added some much needed oversight and accountability. Looking at the site, it’s using several open source libraries anyway, including Twitter Bootstrap, jQuery, and BackboneJS, and is apparently violating the OSS license for jQuery.Datatables.

No Plan B

One of the accepted truths in software engineering is that shit is going to break and that bugs will crop up in even the most heavily tested applications. Mix in things like legacy integrations and third-party APIs and issues become almost a guarantee. Despite this, looking at the rollout of Healthcare.gov, it’s clear that the team had no contingency plans for handling the various error scenarios. As pieces of the application started to fail, they had no way to mitigate the poor user experience, avoid losing data, or even fall back to simple “fail whale” style error messages. And although it was bad at launch, the situation seems just as bad three weeks later, with their band-aid being to simply boot users once the site has reached its concurrent session limit.
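To make the "Plan B" idea concrete, here is a minimal sketch of one gentler alternative to booting users at the session limit: hold them in a first come, first served line and admit them as slots free up. This is purely illustrative and says nothing about how the site is actually built. The `WaitingRoom` class, its method names, and the capacity of 2 are all hypothetical, and a real deployment would need shared state, timeouts, and persistence.

```python
from collections import deque


class WaitingRoom:
    """Hypothetical admission control: queue users at capacity instead of dropping them."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed concurrent-session limit
        self.active = set()       # users currently holding a session
        self.waiting = deque()    # users in line, first come first served

    def arrive(self, user_id):
        """Return ('admitted', 0) or ('queued', position_in_line)."""
        if user_id in self.active:
            return ("admitted", 0)
        if user_id in self.waiting:
            # Returning visitors keep their place in line.
            return ("queued", self.waiting.index(user_id) + 1)
        if len(self.active) < self.capacity:
            self.active.add(user_id)
            return ("admitted", 0)
        self.waiting.append(user_id)
        return ("queued", len(self.waiting))

    def leave(self, user_id):
        """End a session; admit and return the next waiting user, or None."""
        self.active.discard(user_id)
        if self.waiting and len(self.active) < self.capacity:
            next_user = self.waiting.popleft()
            self.active.add(next_user)
            return next_user
        return None


room = WaitingRoom(capacity=2)
print(room.arrive("alice"))  # admitted
print(room.arrive("bob"))    # admitted
print(room.arrive("carol"))  # queued at position 1, not booted
print(room.leave("alice"))   # carol is admitted in alice's place
```

The point isn't this particular data structure. A user who is told "you're number 1 in line" has a very different experience from one who is kicked out with no explanation.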

Building a high-traffic web app that interfaces with legacy systems, is highly available, and is also easy to use is undoubtedly still difficult, but the failures of Healthcare.gov haven’t just been technological; they’ve been procedural and organizational as well. Hopefully someone has the experience and political clout to right the ship before the website’s troubled launch defines the healthcare law it was built to implement.