At Threat Stack, we often talk about visibility. We have promoted visibility from an operations perspective, and our intrusion detection platform gives our customers visibility into their environments. But when it comes to change management, how do we give ourselves the same level of visibility into changes to our own internal processes? That question became very real as we rolled out our Type 2 SOC 2 program over the last year, and the answer turned out to be sockembot, an automated SOC 2 compliance-checking bot that we describe in this blog post.
I started at Threat Stack about a year ago, just as we were beginning our Type 2 SOC 2 project. (For an account of Threat Stack’s SOC 2 journey, take a look at Threat Stack Successfully Completes Type 2 SOC 2 Examination.) One of our biggest concerns at the time was how to implement change management procedures.
Without the proper infrastructure in place before we began preparing for our SOC 2 examination, adding change management procedures could have diverted hundreds of hours of engineering time to verifying compliance by hand, a process that would have been both error-prone and wasteful.
Patrick Cable, my coworker and Threat Stack’s Senior Infrastructure Engineer, had already designed a bot that the operations team used to automate away some of the pain points we had seen during our internal test period of SOC 2 procedures. From our viewpoint, the bot helped operations move more quickly, gave us more confidence in our compliance status, and, above all, gave us visibility into the process we would soon be expected to follow throughout our six-month SOC 2 examination.
One of my first projects was to redesign Pat’s bot so that the entire engineering organization could take advantage of this compliance checking. The bot, eventually named sockembot, became a way to check compliance at every step of our gating process. Before any code was released into production, sockembot displayed a digest that pulled together everything we needed to know about the code’s compliance status. If the code was compliant, the digest let the user inspect the changes that were about to ship to production. If it was not, sockembot blocked the noncompliant code from shipping.
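To make the gating step concrete, here is a minimal Python sketch of a check that produces a digest plus a ship/block verdict. The `Ticket` and `MergeRequest` shapes, the `check_compliance` function, and the rule itself (at least one linked ticket, and every linked ticket approved) are assumptions made for illustration; the post does not describe sockembot’s actual data model or rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ticket:
    """A change ticket linked to a merge request (hypothetical shape)."""
    key: str
    summary: str
    approved: bool


@dataclass
class MergeRequest:
    """The parts of a merge request this check reads (hypothetical shape)."""
    title: str
    commits: list[str]
    tickets: list[Ticket] = field(default_factory=list)


def check_compliance(mr: MergeRequest) -> tuple[bool, str]:
    """Return (compliant, digest) for a merge request.

    Assumed rule: at least one ticket is linked and every linked
    ticket is approved. Anything else blocks the release.
    """
    problems = []
    if not mr.tickets:
        problems.append("no change ticket is linked")
    for t in mr.tickets:
        if not t.approved:
            problems.append(f"{t.key} is not approved")

    # Build the human-readable digest shown before release.
    lines = [f"Digest for: {mr.title}"]
    for t in mr.tickets:
        status = "approved" if t.approved else "NOT approved"
        lines.append(f"  ticket {t.key}: {t.summary} ({status})")
    for c in mr.commits:
        lines.append(f"  commit {c}")
    if problems:
        lines.append("BLOCKED: " + "; ".join(problems))
    else:
        lines.append("OK to ship")
    return (not problems, "\n".join(lines))


mr = MergeRequest("Fix login timeout", ["a1b2c3d"],
                  [Ticket("OPS-101", "Raise session timeout", approved=False)])
ok, digest = check_compliance(mr)
print(ok)  # False
print(digest)
```

Returning the digest in both cases mirrors the behavior described above: a compliant change still gets a digest to inspect, while a noncompliant one gets the same digest with the blocking reasons appended.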
When we finally integrated sockembot into our workflow, the response was uniformly positive. The operations team had already been working with a similar tool, but the developers, once they adjusted to the stricter checking, reported that sockembot helped them do their jobs more effectively.
What Does Visibility Get You?
Without the proper tooling, SOC 2 can be tremendously painful for developers and operations alike. At its core, SOC 2 verifies that your practices follow your policies. Ensuring that JIRA tickets are opened, approved, and closed at the right points in the develop-test-deploy workflow sounds trivial, but in the world of SaaS (Software as a Service), a few considerations make it significantly harder than one might first think.
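As a hedged illustration of what “the right times” might mean, the sketch below checks one assumed ordering: a ticket is opened, then approved, then the change is deployed, and the ticket is closed only afterward. The function name `lifecycle_violations` and this ordering rule are hypothetical; a real change management policy would define its own checkpoints.

```python
from __future__ import annotations

from datetime import datetime
from typing import Optional

# Hypothetical ordering rule, assumed for illustration only:
# opened -> approved -> deployed -> (optionally) closed.


def lifecycle_violations(opened: datetime,
                         approved: Optional[datetime],
                         deployed: datetime,
                         closed: Optional[datetime]) -> list[str]:
    """Return a list of ordering problems; an empty list means compliant."""
    problems = []
    if approved is None:
        problems.append("ticket was never approved")
    else:
        if approved < opened:
            problems.append("approved before it was opened")
        if approved > deployed:
            problems.append("deployed before approval")
    if closed is not None and closed < deployed:
        problems.append("closed before the change was deployed")
    return problems


opened = datetime(2020, 3, 1, 9, 0)
approved = datetime(2020, 3, 1, 11, 0)
deployed = datetime(2020, 3, 1, 10, 30)  # shipped before approval
print(lifecycle_violations(opened, approved, deployed, None))
# ['deployed before approval']
```

Returning every problem rather than stopping at the first one keeps the output useful as a report, which fits the visibility theme of the post.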
With a fast release cycle, developers want peace of mind that what they push into production meets our change management standards for deploying code. This is where operations comes in. Our job is to let developers move as quickly as possible without having to think much about how the software is deployed, and sockembot let them see right away whether their changes followed process.
This was good from a change management perspective, of course, but it was also great from a security point of view. Part of our process required a code review and approval before anything was pushed to production. This ensured that every merge into production had at least two sets of eyes on it; otherwise, the merge was blocked from release through our git service’s API. From a process perspective, this is where change management automation becomes extremely important for any company. Not only does it put more eyes on the code being deployed, but it also prevents accidental violations of our change management policy.
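The “two sets of eyes” rule reduces to a small predicate. This sketch assumes the git service can report a merge request’s author and the set of approvers; `merge_decision` and its `min_reviewers` parameter are hypothetical names. Actually enforcing a block through the git service’s API is left out, since the post does not say which service or endpoint was used.

```python
from __future__ import annotations


def merge_decision(author: str, approvers: set[str], min_reviewers: int = 1) -> str:
    """Return "allow" if enough people other than the author approved, else "block".

    The author plus one independent reviewer gives two sets of eyes,
    hence the default of min_reviewers=1.
    """
    independent = approvers - {author}  # a self-approval adds no second pair of eyes
    return "allow" if len(independent) >= min_reviewers else "block"


print(merge_decision("alice", {"alice"}))         # block
print(merge_decision("alice", {"alice", "bob"}))  # allow
```

Subtracting the author from the approver set is what turns this into a check for independent review rather than a plain approval count.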
Another consideration was the “single point of failure” problem. Handing compliance checking off to a single piece of software can be risky, because from a process point of view that software becomes the single point of failure. What happens, for example, if bugs are introduced into it? This was a question we grappled with early on.
The answer turned out to be more visibility. When something unexpected happened in sockembot, we caught the exception and displayed it, along with the stack trace, where the ticket digest would normally appear. This clearly showed that something needed to be fixed on the process side and encouraged developers to reach out instead of merging code changes blindly. It gave us a definite indication that something had broken, instead of silent failures and confusion about whether sockembot was still generating a report. Ultimately, this failure report led the development team to report bugs faster and gave everyone greater clarity around failure cases.
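The failure-report idea can be sketched with Python’s standard `traceback` module: wrap digest generation in a `try` block and, on any unexpected exception, return the formatted stack trace in the digest’s place. `render_report`, `broken_digest`, and the message wording are hypothetical, not sockembot’s actual code.

```python
from __future__ import annotations

import traceback
from typing import Callable


def render_report(build_digest: Callable[[], str]) -> str:
    """Return the normal digest, or a loud failure report if building it raises."""
    try:
        return build_digest()
    except Exception:
        # Show the full stack trace where the digest would normally appear,
        # so a bug in the bot is visible instead of failing silently.
        return ("sockembot hit an unexpected error; please contact operations "
                "before merging.\n\n" + traceback.format_exc())


def broken_digest() -> str:
    tickets: dict[str, str] = {}
    return tickets["OPS-101"]  # simulated bug: raises KeyError


report = render_report(broken_digest)
print(report.splitlines()[0])
# sockembot hit an unexpected error; please contact operations before merging.
```

Catching `Exception` rather than a narrower type is deliberate here: the point is that any bug in the bot surfaces in the same place users already look for the digest.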
What Made sockembot Successful?
Feature creep: If you’re in software, you’ve almost certainly heard this term before. It refers to adding features that detract and distract from the original goal of a software tool or product. The temptation to give in to feature creep can be very strong, particularly in the land of automation.
sockembot was originally designed as a compliance-checking mechanism that could be applied to all relevant projects in our git service. It was meant to do that job and do it well. Something we often forget in software is that while features can be a great asset, new functionality by definition introduces complexity. We decided early on not to turn sockembot into a business logic engine and instead to focus exclusively on SOC 2 compliance. This meant we did not add some requested features, but it let us stabilize the code base more quickly, write more targeted tests once the common use cases were established, and deploy updates with more confidence.
Beyond sockembot — The Human Factors
So there you have it: On the technical side, sockembot brought automation, speed, and visibility to the development cycle laid out in our SOC 2 change management procedure. The result was stronger code quality and better security. But there is one more thing: the human factors.
Nothing can erase the need for human cooperation when new processes are being introduced. When I talked with Pete Cheslock, Threat Stack’s Senior Director of Operations, about sockembot’s place in our process, he was emphatic that no amount of automated protection can help if everyone in the organization is not working together toward a common goal.
For helping us get there, everyone in operations would like to thank the Threat Stack engineering team for their dedication and hard work in following process, reporting bugs in the early days of sockembot, putting up with my ASCII Clippy image on every noncompliant merge request, and always asking for clarification before releasing if something seemed abnormal.
This is the culture we strive for at Threat Stack: freedom that comes with personal responsibility, including the responsibility to ask for clarification before acting when the implications of a release are uncertain. This applies to our approach to security as well as to process. Without collaboration and hard work from everyone, our accomplishment would not have been possible, with or without sockembot.
If you’d like to learn more about Threat Stack and its intrusion detection platform, sign up for a demo today.