My Journey in Scala, Part 3: None is Better Than Undefined

Here’s the situation: At Threat Stack we consume a torrent of security event data every day, and as new customers come on board, the amount of data we need to ingest, transform, store, and retrieve just keeps growing. About a year ago, we implemented a caching layer, powered by ElasticSearch aggregates, to let us display more aggregated information to customers on our Dashboard.

Fortunately, our Ops team has given us the ability to instrument and measure almost anything, which lets us see, and predict, the pressure we are putting on our cluster. With the aggregates from ElasticSearch, we were essentially trying to solve a counting problem, so we turned to another service we use internally: Apache Spark.

Our Spark implementation was a mix of pure Java libraries and intermingled Java and Scala code. There were myriad null checks all over the code, and while these may be a necessary evil in some scenarios, they made adding new tasks incredibly time-consuming and tedious. After hitting 1,400 NullPointerExceptions one at a time, I table flipped and rewrote the entire thing in Scala.

The Process

I can’t count the number of times I’ve rolled some new JavaScript code, gotten all ready to test, and had IntelliJ display my most favoritest error ever:

Cannot read property 'fooBar' of undefined

What a sight for sore eyes. I haven’t done a lot in JavaScript in a while, but does that ever bring me back. While revamping one of our internal projects, I ran into its Java counterpart:

NullPointerException

I immediately realized I hadn’t seen one of these in Scala in a very long time and was incredibly grateful. Scala doesn’t eliminate NullPointerExceptions, but it does give you a heads-up, in a very obvious fashion, that a variable could be null, and a mechanism to deal with it. I had used that mechanism so much that I couldn’t believe it: Scala had rid me of an incredibly annoying problem, and I had forgotten all about it!

The following is a simple but concise example of the problem:

var obj = {a: ["ok"]};

If I’m trying to access a property on obj, let’s say b, with the expectation that it is a list of strings, and I do the following:

obj.b.forEach(function(str){
    console.log(str);
});

I’ll get our wonderful friend from above:

obj.b.forEach(function(str){
undefined
     ^
TypeError: Cannot read property 'forEach' of undefined

Now I have to code in guards to make sure I’m returning an empty list and I won’t error out:

if (Array.isArray(obj.b)){
    obj.b.forEach(function(str){
        console.log(str);
    });
} else {
    console.log([]);
}

In the past, I always just worked through it in JavaScript. I didn’t really understand the pain I was going through; it’s just what had to be done. If you’re working on websites, this is probably going to be a part of your reality. There are ways to handle this with helper libs, but I prefer to use as many native functions as possible, especially in JavaScript. You never know when one of your dependencies will get hit by the left-pad problem.

In Scala the undefined problem has an excellent solution:

val myOption: Option[Thing] = Option(thing)

myOption can either be Some(Thing) or None. That stops a function from returning multiple different types, so you don’t have to validate its response beyond whether the value exists or not. You no longer need to type check. Take the following JavaScript code:

var makePie = function(obj){
    if (obj !== undefined && obj.type !== undefined){
        return obj.type;
    } else {
        return false;
    }
};

var pieObj = {type: "apple"};

var pie = makePie(pieObj);

if (pie !== false){
    // throw pie
} else {
    // be sad, no pie
}

The problem here is that while we have defined pie and the makePie function, makePie could be updated in the future to return any number of things: true, 13, a list, etc. So how do we handle this in Scala? To match the JS above:

def makePie(pie: Map[String, String]): Option[String] = {
  pie.get("type")
}

val pieMap = Map("type" -> "apple")

makePie(pieMap) match {
  case Some(pie) => //throw pie
  case None => //be sad no pie
}

So what’s different here? We’ve boiled makePie down to returning either our value or None. This is important from a refactoring point of view. We could expand makePie to 3,000 lines doing all sorts of stuff, and it would still always return either Some(String) or None, in the form of an Option[String].
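And because the result is an Option[String], pattern matching isn’t the only way to consume it; the standard map and getOrElse combinators cover the common “transform if present, fall back if not” case. A small sketch reusing the same makePie:

```scala
def makePie(pie: Map[String, String]): Option[String] =
  pie.get("type")

// map only runs when a value is present; getOrElse supplies the fallback
val label: String = makePie(Map("type" -> "apple"))
  .map(kind => s"$kind pie")
  .getOrElse("no pie")
// label == "apple pie"

val noLabel: String = makePie(Map.empty).getOrElse("no pie")
// noLabel == "no pie"
```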

A more concise Scala example:

Map("type" -> "apple").get("type") match {
  case Some(pie) => //throw pie
  case None => //be sad no pie
}
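Option.apply is also worth knowing here: it’s the standard way to lift a possibly-null reference, say one coming back from a Java API, into this Some/None world. A quick sketch:

```scala
// Option.apply converts a possibly-null reference into Some or None
val present: Option[String] = Option("pie")        // Some("pie")
val absent: Option[String]  = Option(null: String) // None

// Both cases must be handled explicitly before the value can be used
absent match {
  case Some(pie) => println(s"throw $pie")
  case None      => println("be sad, no pie")
}
```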

The Lesson

Seems cool, but how does this end up helping me in the real world beyond just being “how it’s done” in Scala? Great question, and I’m glad you asked.

In my efforts to reduce ElasticSearch query pressure (mind you, we’re searching over billions of documents while our cluster takes a massive write load), I planned to add 15 new jobs to Spark. Due to the implementation (changing this would require a fairly extensive architecture overhaul), Spark currently runs every 10 minutes. Dealing with NullPointerExceptions one by one every 10 minutes is incredibly tedious. Given the long build and deploy times, I was left with about 2 ½ minutes to debug, test, and deploy, or I would miss a window.

After I flipped the table and picked up the pieces, I decided to rewrite the application to use Options instead of null checks. Now Spark could execute ALL of the tasks, now 25, without runtime crashes. I could instantly verify that the existing jobs were intact and working properly, and I could debug all 15 new tasks at once instead of error by error. Removing runtime errors meant I could move faster and more efficiently. In addition, because I was returning None, I knew EXACTLY where to add logging.
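That pattern is easy to make concrete: every site that produces a None is a candidate log point. A minimal sketch, where lookupCount and logWarn are purely illustrative stand-ins for a real job and a real logger:

```scala
// Stand-in for whatever logging framework is actually in use
def logWarn(msg: String): Unit = println(s"WARN: $msg")

// Wraps a lookup so the None case, and only the None case, emits a log line
def lookupCount(results: Map[String, Long], key: String): Option[Long] =
  results.get(key) match {
    case found @ Some(_) => found
    case None =>
      logWarn(s"no count found for key '$key'")
      None
  }
```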

Another wonderful part of Option comes into play when dealing with databases. All of the Scala database libraries we have tested support transforming an Option that resolves to None into null when inserting into or updating a database. From a developer’s perspective, this simple, quick step is invaluable. You don’t have to litter your SQL statement preparation with if conditions, which leaves you debugging your inputs, not your method.
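The standard library even ships with Option.orNull for exactly this boundary; a quick sketch (the column values are illustrative, not tied to any particular database library):

```scala
val middleName: Option[String] = None
val nickname: Option[String]   = Some("Hawkeye")

// Option.orNull unwraps Some(x) to x and None to null, which is
// what JDBC-style nullable column setters expect
val middleNameCol: String = middleName.orNull // null
val nicknameCol: String   = nickname.orNull   // "Hawkeye"
```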

Stay Tuned . . .

Stay tuned for the next post which will be a deep dive into rewriting our Apache Spark application and all of the lessons learned about distributed applications, monitoring performance, and improving scalability.

Related Posts

If you’re interested in other posts in this series, have a look at:

My Journey in Scala, Part 1: Awakenings

My Journey in Scala, Part 2: Tips for Using IntelliJ IDEA

If you’re interested in other Scala posts, these offer up a lot of practical, real-world experience and guidance:

Scala @ Scale, Part 1: Leaving Unhandled Errors Behind

Scala @ Scale, Part 2: Compose Yourself!

Useful Scala Compiler Options for Better Scala Development, Part 1

Useful Scala Compiler Options, Part 2: Advanced Language Features

SELECT This! Scala Data Access Library Review, Part 1