Who Watches the Watchmen? Securing Configuration Management Systems

This is part of a series we’re calling ‘Securing Modern Infrastructure’, where we explore the implications of modern development and operations pipelines from a security perspective.

It’s rare to walk into a modern operations team and not see Configuration Management (CM) systems driving the infrastructure. For those of you who have been living under a rock for the past few years, Configuration Management software directly enables the DevOps concept of treating infrastructure as code. Using CM correctly is a prerequisite for building systems that scale, performing many deployments per day, and generally being “Agile.”

When it comes to designing your Configuration Management infrastructure, there’s a lot to consider. There are many open source and commercial options in the CM world (Chef, Puppet, Ansible and Salt, to name a few) with a variety of architectures. Some are typically deployed as a centralized server with agents installed on the machines being instrumented, and some use existing infrastructure (e.g. SSH) in order to instrument servers.

The architecture choice has an impact on operations, but it also has implications for security risk:

  • Does a central server prove to be a central point of failure?
  • What additional risks should be considered and balanced with the benefits of adopting SaaS/hosted CM?
  • Without a centralized server, what’s a good way to audit the changes being executed through your CM?

Regardless of architecture, these systems share one thing in common: by nature, they must be able to execute arbitrary code on your infrastructure in order to manage it. Configuration Management systems must have root access. Given that reality, how are modern operations teams thinking about the security risks of their CM systems of choice?

I thought it would be interesting to poll real flesh and blood operations teams on Twitter in order to get their feedback. The answers I received back will shock and amaze you!

…Ok, so maybe the responses weren’t THAT shocking and amazing. What I did receive was a variety of responses on Twitter and email with a combination of tools, techniques, and software that people employ in the real world, along with the philosophical underpinnings of their CM security strategies.  

The tweet that started it all:

The community response:

Dogfooding: Using CM to harden CM

One interesting idea proposed was having your Configuration Management system enforce its own security hardening. Engineer Eduardo Urias (@larsx2), for example, pointed out on Twitter that Telekomlabs gives away Chef recipes and Puppet modules that help operations teams harden their Chef and Puppet servers (along with base hardening modules).   

Certificates Relied On Heavily

Server/client-based CM architectures rely heavily on PKI for authentication and authorization.

To protect such credentials, one idea proposed was to implement heavy rotation/expiration of certificates, so a certificate left on an old developer laptop or dev machine cannot remain an attack vector for years after its existence is forgotten.
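As a rough illustration of the rotation idea, here is a minimal Python sketch that flags certificates older than a maximum age. The 90-day policy, the function name, and the inventory of client names and issue dates are all assumptions made up for this example; a real deployment would read issue dates from the CM server's own certificate store.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rotation policy: certificates older than this are due for rotation.
MAX_CERT_AGE = timedelta(days=90)


def certs_due_for_rotation(issued_at_by_name, now=None, max_age=MAX_CERT_AGE):
    """Return the sorted names of certificates issued more than max_age ago."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, issued_at in issued_at_by_name.items()
        if now - issued_at > max_age
    )


if __name__ == "__main__":
    # Hypothetical inventory mapping client names to certificate issue dates.
    inventory = {
        "build-server": datetime(2015, 1, 10, tzinfo=timezone.utc),
        "old-dev-laptop": datetime(2013, 6, 1, tzinfo=timezone.utc),
    }
    for name in certs_due_for_rotation(inventory):
        print(f"rotate certificate for {name}")
```

Run on a schedule, a check like this turns "we forgot that laptop existed" into a list of names to revoke or reissue.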

A corollary for the Ansible case (which uses SSH as the backbone for its deployments) would be to rotate SSH keys frequently, and to use File Integrity Monitoring to watch the SSH key files and configurations for unauthorized changes. Fortunately, you can use CM systems to manage these types of key rotations as well.
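To make the File Integrity Monitoring idea concrete, here is a minimal Python sketch: record a SHA-256 digest for each watched file, then compare a later snapshot against that baseline. The function names and the sample key contents are invented for illustration. Real FIM tools also watch permissions, ownership, and access events, which this sketch does not attempt.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(paths):
    """Map each path to its digest, or to None if the file is missing."""
    return {
        str(p): (fingerprint(p) if Path(p).is_file() else None)
        for p in paths
    }


def changed_files(baseline, current):
    """Return the sorted paths whose digest differs between two snapshots."""
    return sorted(
        p
        for p in baseline.keys() | current.keys()
        if baseline.get(p) != current.get(p)
    )


if __name__ == "__main__":
    # Demo against a throwaway file standing in for an authorized_keys file.
    with tempfile.TemporaryDirectory() as tmp:
        keys = Path(tmp) / "authorized_keys"
        keys.write_text("ssh-ed25519 AAAA... alice\n")
        baseline = snapshot([keys])
        keys.write_text("ssh-ed25519 AAAA... alice\nssh-ed25519 BBBB... mallory\n")
        print(changed_files(baseline, snapshot([keys])))
```

A deleted file shows up as a change too, since its digest goes from a value to None.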

Monitoring to Mitigate the ‘Insider Threat’

Given the scope of damage a malicious “insider” or an attacker with valid CM credentials could do, I probed in detail how engineers thought about these types of risks.

Chris Read (@cread), Ops and Dev at DRW, replied to my inquiry with a combination of helpful pieces of advice (“Use Chef hooks that integrate with Sensu for monitoring”) and also with the interesting idea that the careful systems monitoring they do could also ferret out any nefarious activity:

“Most types of malware would become apparent through that mechanism though as we pay close attention to CPU, disk and network load…” 

H. “Waldo” Grunenwald (@gWaldo), Pipeline Team Lead at CommerceHub, described in detail over email a specific view he’s created in Kibana to audit and monitor Chef changes:

“As far as metrics, monitoring and auditing, Graphite fulfills its usual purpose, Sensu provides functional visibility, and we have created a Kibana dashboard called “chef_whodunnit” that watches the Chef API log for successful PUT and DELETE requests at the “^/environments/*”, “^/roles/*”, “^/nodes/*”, and “^cookbooks/*” endpoints.  This displays who changed what and when, which we can correlate to git commits, jira tickets, and such.”
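The filter behind a dashboard like that can be sketched in a few lines of Python. This is not @gWaldo's actual Kibana query: the one-line log format (timestamp, user, method, path, status) is a simplified assumption invented for this example, and the sketch assumes every watched path begins with a slash.

```python
import re

# Hypothetical simplified log line: "<timestamp> <user> <METHOD> <path> <status>".
LINE = re.compile(
    r"^(?P<when>\S+) (?P<who>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)

# Endpoints named in the quote above (assumed here to all start with a slash).
WATCHED = re.compile(r"^/(environments|roles|nodes|cookbooks)/")


def whodunnit(lines):
    """Yield (who, method, path, when) for successful PUT/DELETE requests."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed format
        if m["method"] not in ("PUT", "DELETE"):
            continue
        if not 200 <= int(m["status"]) < 300:
            continue  # only successful requests changed anything
        if WATCHED.match(m["path"]):
            yield (m["who"], m["method"], m["path"], m["when"])


if __name__ == "__main__":
    sample = [
        "2015-03-01T12:00:00Z alice PUT /roles/web 200",
        "2015-03-01T12:01:00Z bob GET /nodes/db1 200",
        "2015-03-01T12:02:00Z carol DELETE /nodes/db1 403",
    ]
    for who, method, path, when in whodunnit(sample):
        print(f"{when} {who} {method} {path}")
```

The output is the "who changed what and when" list, ready to be matched against commits and tickets.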

Alex Norbert (@nobert), Director of DevOps at Minted, uses logstash (with a masterless architecture) so every change is audited:

For people looking for a commercial, industrial grade solution, ScriptRock’s GuardRail was mentioned by Apollo Catlin (@apollocatlin) and others as a worthy contender:



Keep Your Secrets Safe

Our Director of Operations and Support here at Threat Stack, Pete Cheslock (@petecheslock), reminded us how important it is to manage infrastructure secrets (like passwords and keys) in CM securely:

“The nature of configuration management and storing secrets often (nearly always) requires the secrets to live on disk.  In the case of Chef™, you can use tools like chef-vault, which encrypts the secrets using those nodes public keys (thus only the nodes that require access to the secret have access to the secret).  Using File integrity monitoring that comes with the Threat Stack agent allows me as an Ops person to see when services that I don’t explicitly allow have touched a secret on the disk.  I can then integrate our monitoring (Sensu) with the Threat Stack API to alert (and wake up people) when access falls outside of our allowed process list.”
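The allowed-process check Pete describes boils down to a simple filter. The Python sketch below does not use the Threat Stack API or Sensu; the event dictionaries, the secrets directory, and the allowed process list are all hypothetical stand-ins for whatever your file integrity monitoring actually emits.

```python
# Hypothetical policy: where secrets live and which processes may touch them.
SECRETS_DIR = "/etc/chef/secrets/"
ALLOWED_PROCESSES = frozenset({"chef-client"})


def unexpected_secret_access(events, secrets_dir=SECRETS_DIR, allowed=ALLOWED_PROCESSES):
    """Return file-access events on secrets made by processes outside the allowed list.

    Each event is assumed to be a dict with "path" and "process" keys.
    """
    return [
        event
        for event in events
        if event["path"].startswith(secrets_dir) and event["process"] not in allowed
    ]


if __name__ == "__main__":
    # Hypothetical events, shaped the way a FIM feed might report them.
    events = [
        {"path": "/etc/chef/secrets/db_password", "process": "chef-client"},
        {"path": "/etc/chef/secrets/db_password", "process": "curl"},
        {"path": "/var/log/syslog", "process": "tail"},
    ]
    for event in unexpected_secret_access(events):
        print(f"ALERT: {event['process']} touched {event['path']}")
```

Anything this returns is a candidate for the kind of alert that wakes people up.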

Monitoring To Enable A “Trust, but Verify” Culture

@cread concluded with an idea that resonates with many in the DevOps community: trust, but use monitoring to verify.

“To me one of the important parts of DevOps is creating and maintaining a high trust environment. That does not mean people are allowed to ignore security, just that there is no real way to define what an ‘authorized action’ is.”

@gWaldo confirms the trust-but-verify sentiment wholeheartedly:

“What is far more likely to happen is that a well-intentioned and properly-privileged individual will make a mistake.  With that in mind, I feel that it’s much more important and productive to enable a ‘trust-but-verify (when needed)’ environment (discovery/auditing), and to rapidly produce change (remediate) when things go wrong.”

Conclusion

From a security perspective, Configuration Management systems should be treated like any other high-impact system. Every business is different, and the risk that CM systems pose needs to be weighed against the benefits they provide.

Most importantly, it’s essential to manage this risk within the context of DevOps, which means trusting fellow developers and enabling fast deployments. To balance this against the needs of security, frequent security-minded auditing and monitoring are crucial, just as it is crucial to ensure that bad deployments don’t hurt performance or system health.

I hope this was insightful! Stay tuned for the next post in our series where we explore how people think about storing and securing IAM and other credentials for your infrastructure.