Why Atlassian Failed So Hard – InApps 2022
On April 4, Atlassian’s web services went down for about 400 customers, representing anywhere from 50,000 to 400,000 users. So far, so what: web and cloud services go down all the time. It’s never any fun, but it happens. This time, though, Atlassian‘s woes kept going and going. Ten days later, some users still don’t have access to Jira Software, Jira Work Management, Jira Service Management, Confluence, Opsgenie Cloud, Statuspage, and Atlassian Access.
Since Atlassian Jira’s market share is just over 84% of the bug-and-issue-tracking market, this is truly annoying. I mean, Atlassian’s business is all about bug tracking! And, for the longest time, Atlassian barely had a thing to say about the problem. The first comment came two days later and said little, “While running a maintenance script, a small number of sites were disabled unintentionally. We’re sorry for the frustration this incident is causing, and we are continuing to move through the various stages for restoration.”
Not a Cyberattack
Then, there was largely silence. In a purported note, Atlassian CEO Scott Farquhar said, “On Tuesday morning (April 5 PDT), we conducted a maintenance procedure designed to clean up old data from legacy capabilities. As a result, some sites were unintentionally deactivated, which removed access to our products for you and a small subset of our customers. We can confirm this incident was not the result of a cyberattack and there has been no unauthorized access to your data.”
That, plus the obligatory “This is our top priority and we have mobilized hundreds of engineers across the organization to work around the clock to rectify the incident,” was it.
Why the silent treatment? Part of the reason, suggested developer and writer Gergely Orosz, is that “Atlassian staff and customers turned their attention to Atlassian’s flagship annual event, Team 22. Held in Las Vegas, many company employees, much of the leadership team, and many Atlassian partners traveled to attend the event in person.” Despite the system failure, Atlassian appeared to stay focused on Team 22.
The affected Atlassian customers were not happy. As one tweeted, “What happened there? It’s not a small hiccup with a few minutes or an hour-long downtime, Confluence and Jira are literally down all day.”
And, still, Atlassian remained quiet. Finally, on April 12, Atlassian CTO Sri Viswanath gave some details about what had gone wrong.
What Went Wrong
The company had deactivated its standalone legacy app, “Insight — Asset Management,” for Jira Service Management and Jira Software on customer sites. The process went badly wrong because of two critical problems:
- Communication gap. First, there was a communication gap between the team that requested the deactivation and the team that ran the deactivation. Instead of providing the IDs of the intended app being marked for deactivation, the team provided the IDs of the entire cloud site where the apps were to be deactivated.
- Faulty script. Second, the script Atlassian used provided both a “mark for deletion” capability used in normal day-to-day operations (where recoverability is desirable) and a “permanently delete” capability required to permanently remove data for compliance reasons. The script was executed with the wrong execution mode and the wrong list of IDs. The result: sites for approximately 400 customers were improperly deleted.
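That failure mode, one script that can both soft-delete and hard-delete, fed the wrong kind of ID, is a classic operations-tooling hazard. As a minimal sketch of the defensive pattern (the function and argument names here are hypothetical, not Atlassian’s actual tooling), a safer script forces the caller to state which kind of ID it is getting, defaults to a dry run, and flatly refuses the most dangerous combination:

```python
def run_deletion(ids: list[str], id_kind: str, mode: str = "mark",
                 dry_run: bool = True) -> list[str]:
    """Return the deletion actions that would be (or were) performed.

    id_kind: 'app' or 'site'. Making the caller name the kind of ID
    catches an app-ID/site-ID mix-up before anything destructive runs.
    mode: 'mark' (recoverable soft delete) or 'purge' (permanent delete).
    dry_run: defaults to True, so real deletion must be opted into.
    """
    if id_kind not in ("app", "site"):
        raise ValueError(f"unknown id kind: {id_kind}")
    if mode not in ("mark", "purge"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "purge" and id_kind == "site":
        # Permanently deleting whole sites should never be a routine
        # operation; that belongs in a separate, heavily reviewed tool.
        raise ValueError("refusing to permanently delete entire sites")

    actions = [f"{mode} {id_kind} {i}" for i in ids]
    prefix = "DRY RUN: " if dry_run else ""
    for action in actions:
        print(prefix + action)
    return actions
```

Splitting soft and hard deletion into separate entry points, rather than a mode flag on one script, would have made Atlassian’s exact mistake (right script, wrong mode, wrong IDs) much harder to commit.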
But didn’t Atlassian have backups? Well, yes, they did. They maintained both a synchronous standby replica in multiple AWS Availability Zones (AZ) and separate immutable backups designed to enable recovery to a previous point in time.
So, what’s the problem? The backups are, in a word, messy.
If they restored from the checkpoint, the troubled 400 customers would get their data back, but every other customer would lose all the data created since that backup was made. So Atlassian had to manually pull the affected customers’ data out of the backups instead.
Viswanath explained, “What we have not (yet) automated is restoring a large subset of customers into our existing (and currently in use) environment without affecting any of our other customers.”
They’re now automating it, but even so, it’s slow. “Currently, we are restoring customers in batches of up to 60 tenants at a time. End-to-end, it takes between four and five elapsed days to hand a site back to a customer. Our teams have now developed the capability to run multiple batches in parallel, which has helped to reduce our overall restore time.”
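Viswanath’s description, batches of up to 60 tenants with multiple batches run in parallel, is a standard fan-out pattern. A rough sketch of the orchestration (the `restore_tenant` function is a hypothetical stand-in for Atlassian’s internal per-tenant restore, which is not public):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 60  # per Atlassian: up to 60 tenants restored per batch

def restore_tenant(tenant_id: str) -> str:
    # Placeholder for the real work: pulling one tenant's data out of
    # the immutable backup and merging it into the live environment
    # without touching any other tenant.
    return f"restored {tenant_id}"

def restore_in_batches(tenants: list[str], parallel_batches: int = 4) -> list[str]:
    """Split tenants into fixed-size batches and run several batches at once."""
    batches = [tenants[i:i + BATCH_SIZE]
               for i in range(0, len(tenants), BATCH_SIZE)]

    def run_batch(batch: list[str]) -> list[str]:
        return [restore_tenant(t) for t in batch]

    results: list[str] = []
    with ThreadPoolExecutor(max_workers=parallel_batches) as pool:
        for batch_result in pool.map(run_batch, batches):
            results.extend(batch_result)
    return results
```

With roughly 400 affected customers and 60-tenant batches, that is seven batches; at four to five days per batch, running them serially is what produced the weeks-long worst case, and running batches in parallel is what pulled the overall restore time down.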
Eventually, Viswanath promises, “we will conduct and share a post-incident review with our findings and next steps. This report will be public.” In the meantime, the remaining out-of-service companies are unhappy with both the failure and Atlassian’s poor communication.
Featured image via Pixabay.