Threat Stack enables businesses of all sizes to securely leverage the benefits of cloud computing by identifying and verifying insider threats, external attacks, and data loss in real time. Purpose-built for today’s infrastructure, Threat Stack’s comprehensive intrusion detection platform combines continuous security monitoring and risk assessment to help companies gain an unparalleled level of visibility at the speed and scale of today’s business. Located in Boston, Massachusetts, Threat Stack works with nearly 400 security-minded customers.
At Threat Stack, we’re building a continuous monitoring platform specifically targeted at the challenges of cloud security for elastic infrastructure. This platform gives our customers deep visibility into their systems’ behaviors and helps to identify potentially anomalous actions of users and processes.
We’ll be honest: Threat Stack isn’t receiving the same number of events per second as some larger consumer services. We do, however, have many thousands of servers sending us large payloads of critical security data every second. This means that security and reliability are key. Our customers depend on us to show them the who, what, where, when, and why of security events in their infrastructure. Excellent infrastructure operations are critical to the success of the company as a whole.
As an Infrastructure Engineer, you’ll work with a team of talented backend developers and operations staff to ensure that the Threat Stack intrusion detection platform is available for internal and external customers.
We are interested in talking with engineers of any experience level to fill this position.
As a Junior, your ability to learn new skills, ask questions, and troubleshoot problems will be more important than having domain expertise in a specific technology. We do expect junior engineers to have some familiarity with a configuration management system: you’ve used one before, either in a structured setting (e.g., in training or a class) or outside of one.
As a Senior, you would have the same skills as a Junior engineer, as well as deeper experience in running systems with high availability and performance requirements. You should be able to understand and speak about trade-offs when building and managing a distributed application.
Some of the following skills are also particularly interesting for us:
- Linux kernel performance tuning
- Cassandra or another NoSQL datastore
- Chef, though we would expect you to have opinions about and experience with at least one modern CM tool
- Experience running systems on AWS, with an understanding of the performance implications of the different features
- BGP Routing, Networking
- Performance tuning of data systems like Cassandra, PostgreSQL, Elasticsearch, and/or Kafka
If you’re interested in debugging complex problems across a large platform, find (technical) root cause analysis interesting, enjoy systems thinking, know your way around a Unix shell, and feel comfortable writing code, this position may be for you.
At a high level, you will:
- Design, build, and maintain the core infrastructure used by Threat Stack’s Engineering team
- Debug production issues across the platform
- Find ways to optimize the platform, with a focus on cost effectiveness and scale
- Plan for growth of the infrastructure
- Build tools to enable our developers to better manage the applications they deliver
- Help teams improve the way they deploy their code and systems
Some of the projects that the Infrastructure Engineering team is thinking about and working on are:
- Next Generation Platform: We’ve achieved incredible growth with our data processing platform. We’re starting to work on the next iteration, and you’d be a big part of helping us scale to handle hundreds of billions of events per day.
- Load Testing: We’ve been thinking about how best to load test our platform. We have some tooling, but it’s not entirely representative of what we’d like to see.
- Service Discovery: We currently have this solved with a few different tools based on our needs, but we’d like to better standardize and improve how services find each other within the platform.
- Tooling for Developers: We’re always improving our tooling to make it easier for developers to do Ops tasks. Some examples of this include making CLI tools for silencing alerts when necessary, or building a tool to pull temporary credentials from Vault, or building a tool to send you a message on Slack when you connect to some important bits of our infrastructure.
- “The DevOps”: At the end of the day, we maintain a close relationship with our developers. That means the infrastructure team is open to walk-in advice. We offer help when we see Ops-focused conversations in chat.
Because we’re a startup, our typical day often involves working with people across the whole Engineering organization. The people you’ll work most closely with each day include:
- John Baublitz, who is currently working on building out operational tooling and improving our automation infrastructure.
- Patrick Cable, our Security Engineer, who focuses on infrastructure projects and tools that help us meet our security requirements.
- Apollo Catlin, who works on the build and deployment pipeline. This includes other system integration, code testing, and data storage tasks as well.
- Nate St. Germain, our newest hire, who’s currently working on improving the durability and availability of the platform.
- Pete Cheslock, our Senior Director of Ops and Support. He helps the team as needed, and is primarily interested in monitoring/service metrics, and continually optimizing the platform.
To apply, send an email to [email protected]