First Vavr Tutorial

A few days ago a status message on Twitter led to a coding battle.

twitter-quiz

(See the full conversation here and Lukas' great article about his SQL solution here.)

In fact, competitions are a great way for Vavr to improve. We learn about expressiveness, bottlenecks and missing features.

Such a non-trivial example is also an opportunity for me to teach how to solve a problem using an object-functional style in Java.

A tutorial is worth a thousand specs.

(or the like)

So it happened that I decided to publish the first Vavr tutorial. More will follow on an irregular basis.

Substring Palindromes

Our mission is to find all substrings of a given word that are palindromes, i.e. symmetric like 'abba' within 'black sabbath'. Such a substring is described as tuple of indices (start, end), e.g. (7, 10) in the previous example.

As a bonus, the performance of the solution should be optimized from, say, O(n^2) to O(n).

Getting in touch

In the following, I will show how I typically solve such problems. I like it to start by describing or exploring a problem using symbolic languages, like drawings or math. The lines below describe the solution as set of elements that meet a specific condition.

Let n be the length of a given word s.

substringPalindromes(s) :=  
  { (i, j) | i, j ∈ [0..n-1], i < j, s[i:j] is palindrome }
= { (i, j) | i ∈ [0..n-2], j ∈ [i+1..n-1], isPalindrome(s, i, j) }

We intentionally did not define the function isPalindrome. Solutions can be found on the web.

For now, we will focus on the core of the algorithm, the non-trivial part that needs to be understood. The good thing about functional programming is that we are able to defer the unpleasant work by using functions that can be implemented later. I do this all the time ;)

Deferring work is not meant to be not doing the work at all. Functions support us in keeping the focus on the current spot and prevent us in recursing down to the next site.

Translating the above to imperative Java (and mutable collections) is straightforward:

meep

Set<int[]> substringPalindromes(String s) {  
    int n = s.length();
    Set<int[]> result = new HashSet<>();
    for (int i = 0; i <= n - 2; i++) {
        for (int j = i + 1; j <= n - 1; j++) {
            if (isPalindrome(s, i, j)) {
                int[] palindrome = new int[] { i, j };
                result.add(palindrome);
            }
        }
    }
    return result;
}

This should work - but wait! I bet you've already spotted the trap.

Java has no native notion for tuples or pairs, so we used an array of int to describe a palindrome. The Set that holds the resulting palindrome indices internally uses the elements' hashCode/equals methods. But the equality of int[] is defined to be referential, not by content as one might expect. In general, this could lead to dupicate entries.

Of course this is a solvable problem, we could use a List instead of a Set or define our own Pair class or include some 3rd party dependency that solves it for us or even invent our shiny new framework that completely generalizes the case. No - stop!

It might seem a bit artificial, but these are exactly the daily, self-made problems of software developers. It is way too easy to get distracted by language 'specialities' instead of focusing on the original tasks.

Vavr was designed to remove many of these language 'irregularities'. It enables us to create robust Java programs by encouraging side-effect free programming.

Vavr helps us to eliminate null pointer exceptions and exception handling. It includes persistent data structures and tuples that work 'as expected', e.g. they are equal by content, thread-safe without the need of synchronization, serializable, ...

But we have to accomplish a mission, so let's go one step back again.

Improving our solution

Our solution already looks okay, but the complexity of the two for-loops seems to be quadratic, i.e. O(n^2). Is it really a problem? Words tend to be very short. Also it makes a difference to have read-only operations instead of creating garbage during checks. Beware of premature optimization, it is your choice!

Let's move on. How can we do better? As always in math or algorithm design, we need an idea. Sketches and formulas help us to get an understanding of the underlying domain. What are the properties of our domain objects?

Basically, a palindrome is symmetric. We could enlarge a given substring by one character to both directions. Because of symmetry, a greater substring might only be a palindrome, if the smaller one is a palindrome. Given that, we are able to early stop one enlargement and skip to the next one.

  • black sabbath - true
  • black sabbath - true
  • black sabbath - false

The next enlargement would be

  • black sabbath - false

Then

  • black sabbath - false

And so on. Because i < j, both inclusive, we always start enlarging substrings of two or three characters.

initialSubstrings :=  
  { (i, i+1) | i ∈ [0..n-2] } ∪ { (i, i+2) | i ∈ [0..n-3] }
≘ { bl, bla, la, lac, ..., ath, th }

Namely we find the initial substrings in (n-1)+(n-2) = 2n-3 steps. This takes O(n). In many cases we perform only a few checks when enlarging an initial substring. Then the effort is effectively O(n*c) = O(n), where c is a constant factor.

In the worst case, e.g. when the word consists of same characters only, it is relatively easy to verify that the overall effort of searching substring palindromes is triangular, read: O(n(n+1)/2*c) = O(n^2). Our intuition tells us, it has to be quadratic, otherwise we would have skipped checks. That is what our optimization is all about, skipping unnecessary checks as early as possible.

Translating it to object-functional Java

Now we got a feeling for the solution and you most probably already know how to implement it - the way you encode your thoughts, day-by-day.

languages

I want to show you how to do it the object-functional way in Java. To recap, we described the improved solution like this:

  • the solution consists of all pairs of indices such that the indices are taken from the set of enlarged indices and an index represents a substring palindrome
  • with enlarged indices, defined as ... based on initial indices
  • with initial indices, defined as ... based on string indices
  • with a function isPalindrome, defined as ...

In math, we describe the solution instead of calculating it (that would take a while for infinite sets). The big benefit is, that it is very concise. We start with the result and refine only those details that are not obvious to the reader.

Above, we used the familiar declarative math set notion. We could also have used a functional notion like the following Java methdod:

meep

List<int[]> substringPalindromes(s) {  
    return filterPalindromes(s,
        enlargedIndices(s,
            initialIndices(s)));
}

Cascading method calls soon get messy. Some like it to make types explicit.

List<int[]> substringPalindromes(s) {  
    List<int[]> initialIndices = initialIndices(s);
    List<List<int[]>> enlargedIndices =
        enlargedIndices(s, initialIndices);
    return filterPalindromes(s, enlargedIndices);
}

In fact, functional and imperative Java are more or less the same, it is only a matter of notion :)

Did you recognize that we reversed the order of method occurrences, visually? That is typical for calculations resp. non-declarative programs.

Maybe you also spotted that in the last code snippet our optimization got lost. This is because Java eagerly evaluates the enlarged indices. Functional languages often are lazily evaluated by default.

However, given lazy evaluation (and recursive data structures), the code above would calculate enlargements of initial strings on-the-fly, during palindrome checks.

Using Vavr

The data structures make the big difference. Vavr has rich data structures. They provide us with simple methods that can be composed to solve complex problems.

We could use a lazy linked list (aka io.vavr.collection.Stream) to achieve laziness. Then the example above would work as expected and inherently reflect our optimization.

But it would not help us writing concise code. We still deferred the dirty, error prone work. For example we need to iterate over indices and calculate lists of new indices and so on.

Here comes Vavr into play. Such low-level programming tasks can be subsumed to transformations of inputs to outputs. If a transformation is parameterized with a function, it is called higher-order function.

Many of the Java Stream's intermediate operations are higher-order functions, for example map, flatMap and filter. However, I do not recommend to pass Java's Streams around outside of an internal context - they are stateful. It will lead to unexpected behavior.

Vavr's collections are persistent. Operations return new, immutable versions while the original version remains untouched. Because of memory sharing this is still efficient.

Operations return new versions, which allows us to fluently apply transformations. This is the essence of using Vavr.

Seq<Tuple2<Integer, Integer>> substringPalindromes(String s) {  
    final int n = s.length();
    return Vector.range(0, n-1)
        .flatMap(i -> Vector.of(Tuple.of(i, i+1), Tuple.of(i, i+2)))
        .flatMap(startIdx -> Vector.unfold(startIdx, currIdx -> {
            final int i = currIdx._1;
            final int j = currIdx._2;
            if (0 <= i && j < n && isPalindrome(s, i, j)) {
                final Tuple2<Integer, Integer> nextIdx =
                        Tuple.of(i-1, j+1);
                return Option.some(Tuple.of(nextIdx, currIdx));
            } else {
                return Option.none();
            }
        });
}

The code above does the following:

  1. The range creates a sequence of string indices
  2. These indices are transformed to initial indices, as defined above
  3. The unfold enlarges each initial index until the palindrome check fails

The code is still a bit bulky.

  • By using tuples, the int indices will be boxed/unboxed
  • We need to manually deconstruct a tuple to variables (i, j)
  • The unfold operation uses the Option type to mark the end of enlargement

But this is what Vavr offers today. I've created an issue that targets a better unfold operation. But what will really make a difference are the upcoming features that are planned by the Java language architects:

  • pattern matching will reduce the amount of code we write
  • value types will reduce the memory footprint and speed up our applications (beside being a very concise notion for data types)
  • primitive generics will eliminate boxing/unboxing when possible

With my wish of having case integrated into Java as partial function, our solution to the substring palindromes could look like this:

Seq<Tuple2<int, int>> substringPalindromes(String s) {  
    var n = s.length();
    return Vector.range(0, n-1)
        .flatMap(i -> Vector(Tuple(i, i+1), Tuple(i, i+2)))
        .flatMap(startIdx -> Vector.unfold0(startIdx,
            case Tuple2(var i, var j) && 0 <= i && j < n
                && isPalindrome(s, i, j) -> Tuple(i-1, j+1)
        ));
}

That would be the next level of programming in Java.

I hope you enjoyed the first Vavr tutorial. We created a solution for 'substring palindromes' and showed how to express it using different techniques.

Thanks,

Daniel