How We Integrated Rust Into Threat Stack’s Operations Workflow


Note: The following post is related to Sensu, a monitoring tool for internal infrastructure health and alerting. If you use Sensu (https://sensuapp.org/) for internal monitoring of your own infrastructure health, this could be useful for you. However, this tool does not integrate with Threat Stack services and is not intended or supported for any such use case. It is a tool that we use internally, and we have released this with the intention that it may be helpful to the wider open source community.


Tooling is an integral part of operations at Threat Stack. On the Operations team, our job is to enable both ourselves and the Development team to work more effectively. When I started at Threat Stack almost a year ago, my role primarily centered on improving our tooling to create more granular control over our environment. My first project was creating “shush,” an operations tool for temporarily silencing monitoring checks in Sensu during maintenance. Up to that point, we had had less granularity in our check silencing capabilities for routine maintenance. While we could silence groups of checks and checks coming from a particular node, we were not able to silence single checks or a subset of checks on these hosts. After we discussed the requirements for this tool, I ultimately suggested that it be written in Rust.

In this post I describe our experience integrating Rust and also cover the benefits of using Rust in an operations workflow both technically and from a human factors perspective.

For those who are not familiar with Rust, it is a newer language backed by Mozilla. It incorporates strong typing and functional programming features into an imperative language, which has led some to describe it as partway between Haskell and C. While this doesn’t really give the full picture, it does describe the balance that Rust strikes, which has excited many developers in the community. Rust is a systems-capable language with type checking around low-level primitives, even down to the pointer level. When the developer is unable to accomplish a task in what Rust calls “safe code,” there is an option to disable some of these checks by using “unsafe code,” which is checked less rigidly than safe code. The general idea of unsafe code is that “you are allowed to do this, but you had better know what you are doing.”
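To make the safe/unsafe distinction concrete, here is a minimal sketch. The function name `first_via_raw_pointer` is made up for illustration and is not taken from shush; it only shows that dereferencing a raw pointer has to be wrapped in an `unsafe` block, while the equivalent safe call needs no such marker.

```rust
/// Reads the first element of a slice through a raw pointer.
/// (Hypothetical helper, for illustration only.)
fn first_via_raw_pointer(values: &[u32]) -> Option<u32> {
    if values.is_empty() {
        return None;
    }
    // Creating a raw pointer is allowed in safe code...
    let ptr: *const u32 = values.as_ptr();
    // ...but dereferencing it is not: the compiler cannot prove the pointer
    // is valid, so the programmer takes responsibility with `unsafe`.
    // SAFETY: the slice is non-empty, so `ptr` points at an initialized u32.
    Some(unsafe { *ptr })
}

fn main() {
    let values = vec![10, 20, 30];

    // Safe code: the standard library performs the emptiness check for us.
    assert_eq!(values.first().copied(), Some(10));

    // Unsafe code: same result, but the validity argument is ours to make.
    assert_eq!(first_via_raw_pointer(&values), Some(10));
    assert_eq!(first_via_raw_pointer(&[]), None);
}
```

In day-to-day operations tooling the safe version is almost always enough; the unsafe block is shown only to illustrate where the compiler's checks stop.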

Rust fulfilled our requirements for binary executable output and robust HTTP handling, so this seemed like a perfect opportunity to test out a new programming language in our environment.

Lessons From Rust

Type Safety

One element that initially drew me to Rust was its type system and its compile-time checking around ownership and the scope of variables. At first, this introduced compilation challenges while I was exploring the features of a new language. Once I got the hang of it, though, I realized that it took less time to deploy my code. The time lost getting the code to compile was significantly less than the time needed to fix bugs, which includes diagnosing the problem, writing a patch, testing it manually, writing unit tests to ensure that the bug does not recur, getting the code reviewed, and finally deploying to production.

Rust’s compile-time checking allows you to detect discarded errors from functions that can fail, typos that lead to passing the wrong variable into a function as a parameter, and unused variables. While many C compilers can provide warnings for each of the problems mentioned above, Rust provides stricter type checking and requires type conversions to be explicit. Ultimately, this forces you to think about these problems earlier, for example by defining how one type converts to another. This stands in stark contrast to C’s ability to cast pointers from one type to another, which leaves the underlying data unaltered and still type checks. Additionally, Rust provides guarantees around thread safety through the language-level equivalent of a read-write lock, along with requirements that memory be initialized and in scope when it is used, all before the program runs. Invalid input values that pass type checking are outside the scope of the type system and can still introduce bugs, but designing a program whose overall flow relies on a dedicated type for each data use case allows the type checker to verify the program’s behavior more effectively at compile time.
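A small sketch of these checks follows. All names here (`ClientName`, `CheckName`, `silence_key`, `parse_expiry`) are assumptions made up for the example and are not shush's actual code. It shows a dedicated type per data use case, an explicit fallible conversion, and a `Result` that the caller has to deal with.

```rust
use std::convert::TryFrom;

// Hypothetical newtypes: both wrap a String, but the compiler treats them
// as distinct types, so they cannot be swapped by accident.
#[derive(Debug)]
struct ClientName(String);
#[derive(Debug)]
struct CheckName(String);

/// Builds an illustrative "client/check" key.
fn silence_key(client: &ClientName, check: &CheckName) -> String {
    format!("{}/{}", client.0, check.0)
}

/// Parses an expiry in seconds. The i64 -> u32 conversion is never implicit:
/// `TryFrom` makes the possible failure part of the function's type.
fn parse_expiry(input: &str) -> Result<u32, String> {
    let secs = input
        .trim()
        .parse::<i64>()
        .map_err(|e| format!("not a number: {}", e))?;
    u32::try_from(secs).map_err(|_| format!("out of range: {}", secs))
}

fn main() {
    let client = ClientName("web-01".to_string());
    let check = CheckName("disk-usage".to_string());
    assert_eq!(silence_key(&client, &check), "web-01/disk-usage");

    // Swapping the arguments is a compile error (mismatched types):
    // silence_key(&check, &client);

    // The Result must be inspected; ignoring it outright triggers an
    // `unused_must_use` warning from the compiler.
    assert_eq!(parse_expiry("3600"), Ok(3600));
    assert!(parse_expiry("-5").is_err());
    assert!(parse_expiry("soon").is_err());
}
```

Note that `parse_expiry("0")` still type checks even if a zero-second expiry makes no sense for the tool, which is the kind of invalid-but-well-typed input that still needs tests.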

Testing Framework and Documentation

None of the above, however, is an argument against unit tests. After all, we need to actually put data into our application to see how it will act. Rust’s build and packaging framework, Cargo, provides a very low entry barrier to writing comprehensive test suites. It supports both unit and integration testing that the user can easily compile and run through the cargo command.
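As a rough illustration of how little ceremony this takes, the sketch below puts a function and its unit tests in the same file. The helper `duration_to_seconds` is a hypothetical example, not part of shush. Running `cargo test` compiles the module marked `#[cfg(test)]` and runs every function annotated with `#[test]`; integration tests conventionally live in a separate `tests/` directory.

```rust
/// Converts strings such as "45s", "30m" or "2h" into seconds.
/// (Hypothetical helper, for illustration only.)
pub fn duration_to_seconds(input: &str) -> Option<u64> {
    let input = input.trim();
    // Split off the final character as the unit; empty input returns None.
    let (unit_start, _) = input.char_indices().last()?;
    let (number, unit) = input.split_at(unit_start);
    let value = number.parse::<u64>().ok()?;
    match unit {
        "s" => Some(value),
        "m" => value.checked_mul(60),
        "h" => value.checked_mul(3600),
        _ => None,
    }
}

fn main() {
    println!("{:?}", duration_to_seconds("30m"));
}

// Only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_supported_units() {
        assert_eq!(duration_to_seconds("45s"), Some(45));
        assert_eq!(duration_to_seconds("30m"), Some(1800));
        assert_eq!(duration_to_seconds("2h"), Some(7200));
    }

    #[test]
    fn rejects_malformed_input() {
        assert_eq!(duration_to_seconds(""), None);
        assert_eq!(duration_to_seconds("h"), None);
        assert_eq!(duration_to_seconds("10d"), None);
    }
}
```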

Cargo similarly supports inline documentation as comments in Markdown and a simple interface for generating documentation as easily readable static HTML pages.
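For example, documentation comments start with `///` and are written in Markdown, and `cargo doc` renders them into HTML pages. The function below is a hypothetical illustration (its name and key format are assumptions, not Sensu's or shush's actual behavior); the point is only the shape of the comments.

```rust
/// Builds the key that identifies what is being silenced.
///
/// # Arguments
///
/// * `client` - name of the host the check runs on
/// * `check` - name of a single check, or `None` to cover **every**
///   check on that host
///
/// # Returns
///
/// A string of the form `client/check`, or just `client` when no
/// check is given.
pub fn silence_key(client: &str, check: Option<&str>) -> String {
    match check {
        Some(check) => format!("{}/{}", client, check),
        None => client.to_string(),
    }
}

fn main() {
    println!("{}", silence_key("web-01", Some("disk-usage")));
    println!("{}", silence_key("web-01", None));
}
```

Because the documentation sits directly above the code it describes, it is reviewed in the same diff as the code change, which makes it harder for the two to drift apart.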

Zero Cost Abstractions

When I began working on shush, the program was self-contained. All the HTTP client code was incorporated in shush. As I designed more tools in Rust using various HTTP REST APIs, I found I was writing the same boilerplate code in each tool. Eventually, we made the decision to open source our second tool, teatime. This is a library meant to handle API flows in a more domain-specific manner than hyper, the main HTTP library in Rust. I designed it to provide some templates for making standard HTTP API requests and to provide default implementations for the common code. These defaults can be overridden through a language construct in Rust called a “trait,” a contract defining which methods will be implemented for any data structure that implements the trait.
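The sketch below shows the general pattern of a trait with default implementations that individual clients can override. It is not teatime's actual API: the trait `ApiClient`, its methods, and the example hosts are all assumptions invented for illustration, and no real HTTP requests are made.

```rust
/// Hypothetical contract for an HTTP API client.
trait ApiClient {
    /// Required: every implementor must say where its API lives.
    fn base_url(&self) -> String;

    /// Default implementation: shared by every implementor that does not
    /// override it.
    fn endpoint(&self, path: &str) -> String {
        format!(
            "{}/{}",
            self.base_url().trim_end_matches('/'),
            path.trim_start_matches('/')
        )
    }
}

/// Uses the default `endpoint` as-is.
struct MonitoringApi {
    host: String,
}

impl ApiClient for MonitoringApi {
    fn base_url(&self) -> String {
        format!("https://{}", self.host)
    }
}

/// Overrides the default to insert a version prefix.
struct VersionedApi;

impl ApiClient for VersionedApi {
    fn base_url(&self) -> String {
        "https://api.example.com".to_string()
    }

    fn endpoint(&self, path: &str) -> String {
        format!("{}/v2/{}", self.base_url(), path.trim_start_matches('/'))
    }
}

fn main() {
    let monitoring = MonitoringApi {
        host: "monitoring.example.com".to_string(),
    };
    assert_eq!(
        monitoring.endpoint("/silenced"),
        "https://monitoring.example.com/silenced"
    );
    assert_eq!(VersionedApi.endpoint("users"), "https://api.example.com/v2/users");
}
```

Each new tool then only has to supply the parts that differ, while the shared boilerplate lives in one place.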

Traits are a very good example of the Rust goal of “zero cost abstractions.” Rust uses type information for checking and optimization during compilation and, by default, does not carry it into the running program, so the type of an object is generally not available at runtime and Rust does not really have a method for type introspection. This significantly reduces overhead for features like traits. Any code that uses a trait is already guaranteed by the compiler to have the appropriate methods implemented, rather than having this validated at runtime. (Dynamic dispatch through trait objects is available, but it is something you opt into explicitly.) In this way, Rust is able to provide some of the same high-level features as other programming languages while performing the validation before your code even starts running.
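A minimal sketch of what this looks like in practice, using made-up names (`Check`, `DiskCheck`, `HeartbeatCheck`, `summarize`): the generic function below accepts any type that implements the trait, and the compiler generates a specialized copy for each concrete type it is called with, so the method calls are resolved at compile time.

```rust
/// Hypothetical trait describing a monitoring check.
trait Check {
    fn name(&self) -> String;

    /// Default: checks are not critical unless they say so.
    fn is_critical(&self) -> bool {
        false
    }
}

struct DiskCheck;
struct HeartbeatCheck;

impl Check for DiskCheck {
    fn name(&self) -> String {
        "disk".to_string()
    }
}

impl Check for HeartbeatCheck {
    fn name(&self) -> String {
        "heartbeat".to_string()
    }

    fn is_critical(&self) -> bool {
        true
    }
}

/// `T: Check` is verified by the compiler at every call site; no type
/// lookup happens while the program runs.
fn summarize<T: Check>(check: &T) -> String {
    let level = if check.is_critical() { "critical" } else { "normal" };
    format!("{} ({})", check.name(), level)
}

fn main() {
    assert_eq!(summarize(&DiskCheck), "disk (normal)");
    assert_eq!(summarize(&HeartbeatCheck), "heartbeat (critical)");

    // A type without a `Check` implementation is rejected at compile time:
    // summarize(&42u32); // error: the trait bound `u32: Check` is not satisfied
}
```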

The most important thing that Rust has taught me is the idea of making code explicit rather than implicit without sacrificing performance. Rust’s tooling ecosystem exemplifies this across testing, documentation, and type checking, and this can lower the costs of both development time and maintenance. One thing we often forget about in the field is the human factor in technical operations. It is entirely possible to write “perfect,” bug-free code in any language. The question becomes “How do we practically streamline every aspect of the development process?” This includes not only writing code with fewer bugs, but also encouraging a culture of documentation and testing. If this is an area of interest, Rust may just be a good solution to consider in your ops workflow.

If you’re interested in shush for Sensu silencing or teatime for building your own operations tooling, check them out on GitHub.
