Category Archives: Functional

Continuing the Pipeline with tee

I always seem to end up writing the same adapter functions over and over again when starting a new project. The functions are usually one or two lines and could easily just be used as lambdas throughout the application. I have found that giving these small functions a name can help make things more succinct. A lot of those functions can be found in FSharpx.Extras, specifically in Prelude.fs.

tee

You’ve written your application and have a nice clean pipeline to define the flow of your operations. All of a sudden you need to add a side-effect in the middle of that pipeline. It could be a log to file, console, or even send some events. We have some choices on how to handle this. We can break the pipes into multiple blocks with let expressions and then evaluating the side-effect. There is nothing wrong with this approach, and may be the best solution. In some simple cases we can use the tee function.

let tee (f : 'a -> unit) x = 
    f x
    x

The tee function acts like a plumbing “T” connector allowing the contents to pass through the side-effect without breaking the pipeline. We can see tee is a higher order function that takes a function as a parameter. It then calls the passed in function with the secondary parameter as its argument. We are disregarding the resulting evaluation and then returning x.

A small example would be adding debug logging.

let processData (outputLogger : 'a -> unit) someKey =
    someKey
    |> getData
    |> transformData
    |> tee outputLogger
    |> finalProcessing

We have a processData function that takes a logging function as an argument. We then wedge the outputLogger into our pipeline using the tee adapter function.

We could have used a lambda to hook in the logging. I’ve done that in the past and still on occasion. I do believe there is value in knowing the names of these small and specialized functions. They have one purpose and do it well. I find I don’t have the mental overhead of entering the lambda when coming back to the code at a later date.

FParsec: Alternative or Choice

We left off with a working solution for parsing either “po/k”, “x4/0”, and “s12” into our choice type of Operation. We can always go back and look at our original solution here. We really want to be able to parse a bunch of operations separated by a comma.

Parsing Many

When we want to parse a large blob of operations separated by a comma we can use the sepBy function.

  • sepBy : Parser<'a, 'u> -> Parser<'b, 'u> -> Parser<'a list, 'u>

We can see by the function signature, it takes two parsers as arguments. The first parser determines the result. The second parser determines the separator. Our operations are separated by a comma so we’ll use pchar ',' parser. Let’s combine the sepBy with our previous function parseOperation and can now parse an entire blob of operations.

"po/k,x4/0,s12,x7/6" |> run (sepBy parseOperation (pchar ','))

Alternative or Choice

Revisiting the parserOperation function of

let parseOperation = pspin <|> pexchange <|> ppartner

we could have used the choice : seq> -> Parser<'a,'u> function. choice takes a collection of parsers and evaluates them in order to find a match. This can help with readability when having multiple alternative parsers. Instead of weaving the <|> operator between them. We can re-write our function as:

let parseOperation = choice [ pspin; pexchange; ppartner ]

Summary

We’ve replaced our alternative operator <|> with the choice function for readability. We’ve also introduced the sepBy function to parse a block of comma separated operations. The simplest way to test our parsers is by using the run function as it does not require or use any state.

Full Code

Starting Small with FParsec

I had used regular expression to process the inputs for Advent of Code 2017 Day 16. I thought it would be a good learning example to try FParsec. My goal was to take the sample strings given, and convert them into a choice type (discriminated unions).

Sample String

We’ll be processing inputs for three different operations. A sample input looks like this “po/k,x4/0,s12”. There are three operations which need to be parsed. We’ll create a choice type (discriminated union or sum type) to represent our operations.

We would like to parse the inputs accordingly.

"s12"  -> Spin of int
"x4/0" -> Exchange of int * int
"po/k" -> Partner of char * char

FParsec Operators

Parsing the first sample of "s12", we can use the function below.

let pspin = pchar 's' >>. pint32 |>> Spin

We are matching the first character using the pchar function. We combine it using the (>>.) operator with the pint32 function. This is all followed by (|>>) into the Spin constructor.

Breaking down the functions:

  • pchar : char -> Parser<char,'u> attempts to match a character to the one supplied. In this case we are looking for the character ‘s’.
  • pint32 : Parser<int32,'u> will convert the text to an integer providing it is a legitimate.
  • (>>.) : Parser<'a,'u> -> Parser<'b,'u> -> Parser<'b,'u> combines 2 parsers and returns the second parser result, pint32 in our case.
  • (|>>) : Parser<'a,'u> -> ('a -> 'b) -> Parser<'b,'u> is like a map function. It unwraps the value in the parser, applies the function supplied, and packs it back into the parser. We take the integer value in the parser and create a Spin choice

Parsing the Exchange

We can parse “x4/0” and convert it to an Exchange type with the function:

let pexchange = 
    pchar 'x' 
      >>. tuple2 
            (pint32 .>> pchar '/') 
            pint32 
      |>> Exchange

There are some familiar functions of (>>.), (|>>), pchar, pint32, and (|>>).

Let’s explore the new functions.

  • tuple2 : Parser<'a,'u> -> Parser<'b,'u> -> Parser<('a * 'b), 'u> takes two separate parsers and packs their results into a tuple.
  • (.>>) : Parser<'a,'u> -> Parser<'b,'u> -> Parser<'a,'u> similar to what we used earlier, however this one returns the first parser result.

Parsing the Partner

Our final sample “po/k” to convert to a Partner type.

let ppartner = 
    pchar 'p' 
      >>. tuple2 
            (asciiLetter .>> pchar '/') 
            asciiLetter 
      |>> Partner

This is fairly similar to the Exchange parser function pexchange. We did introduce the new function of asciiLetter. As the function name implies, this parser accepts any letter.

Putting them Together

We have our 3 parser functions, each handling their specific sample input. Now it’s time to combine them into a parser that can handle either sample and return the appropriate choice. We’ll do that using the (<|>) operator.

let parseOperation = pspin <|> pexchange <|> ppartner

The (<|>) operator allows us to define alternative parsers. This can be thought of like a pattern match for parsers.

Summary

Parser combinators let us build up our parsing functions by piecing smaller functions together to create the overall parser. We’ve covered quite a few functions in this small example. Regex will likely work as well for this type of example, however I’m sure there are more complex solutions that can be elegantly built with parser combinators.

Full Parser Example

FParsec and FSI

I like exploring ideas and new libraries using the REPL (Read Evaluate Print Loop). There’s something about being able to get started quickly without having to plan out an entire project. I started looking at parser combinators. It’s been on my “to learn” list for some time now. I started with FParsec for .Net and hit a couple of snags getting things running with F# Interactive. The FParsec documentation has the solutions to these issues but I thought I would put them here so I can remember.

Library Load Order Matters

FParsec uses 2 libraries, FParsecCS.dll and FParsec.dll. Loading them up in FSI should be fairly straight forward, except you must load them up in the correct order.

#r "FParsecCS.dll"
#r "FParsec.dll"

Failing to do so and you will be greeted with a message similar to this:

error FS0074: The type referenced through'FParsec.CharStream`1' is defined in an assembly that is not referenced.

Source: FParsec Tutorial – Two DLLs
"If you reference the DLLs in the F# Interactive console, you need to reference FParsecCS.dll before you reference FParsec.dll."

Value Restriction

A value restriction occurs when the Type Inference determines the type to be too generic. When working with the REPL the first thing to do is add type annotations. With FParsec, the recommended way to alleviate the value restriction is to add a type alias and use it in the type annotation. I wasn’t using any State while learning to parse the text. I created a type alias for unit and added it to the type annotation. My code looked like below.

type NodeState = unit

let pname : Parser = manyChars asciiLetter .>> spaces

Source: FParsec Tutorial – Value Restriction

Summary

I hit two small hurdles when using FParsec with the REPL. The load order of the two FParsec libraries matter. The other hurdle was a value restriction in which I created a type alias to annotate the parsing state for the combinators. After dealing with the library load order and value restriction, I was on to learning FParsec in F# interactive.

Advent of Code 2017 – Day 3 Part 2

Advent of Code 2017 Day 3 part 2 was quite the gruelling exercise. We needed to locate the first memory value larger than the input we are given. The memory locations store an accumulative value (still spiralled) based on its existing neighbours. This is the example given.

147  142  133  122   59
304    5    4    2   57
330   10    1    1   54
351   11   23   25   26
362  747  806--->   ...

Location one still starts at 1. Location two only has one neighbour with a value of 1, so it get’s a value of 1.

1   1

Location three gets a value of 2 since it has neighbours one and two in which both have a value of 1.

    2
1   1

Location four has a value of 4. This is because it has three, two, and 1 as neighbours. We then continue on to generate the rings.

Once we get passed the initial ring, we can use some general rules to generate the values. At most a memory location will have 4 active neighbours. It came down to identifying the unique location requirements.

Since a location may potentially have neighbours in its own ring and the previous ring, we need to keep the last ring generated. I initially thought about the convoluted code that would need to be created, but decided to use Active Patterns.

I was able to determine the points of interest by using some of the data I had obtained from part 1. Specifically the row length. This allowed me to identify whether I was in a corner, before the corner, or after the corner. Once that was established I needed to handle the first and last element in the ring. Everything else was considered general middle elements.

I’ve included a couple of helper functions to handle wrapping the previous array’s index and generating the index offset from the current ring to the previous.

I opted to use mutation for generating a ring. I have my active patterns in the pattern match to help with code readability.

With all the building blocks in place I can now create a sequence of memory values based on the spiral structure.

memoryValues |> Seq.take 12 |> Seq.toList;;
val it : int list = [1; 1; 2; 4; 5; 10; 11; 23; 25; 26; 54; 57]

The part was to take the first element greater than my input.

Advent of Code 2017 – Day 3 Part 1

Day 3 part 1 was about calculating the Manhattan Distance for a specific memory location to memory location 1. The memory blocks are stored in a spiralling out formation with the first block being in the middle of the grid. The memory blocks would look like this:

17  16  15  14  13
18   5   4   3  12
19   6   1   2  11
20   7   8   9  10
21  22  23---> ...

I decided to try and figure out an equation for calculating the locations and distance instead of using the brute force method. (It wasn’t until later I learned about the squares in the bottom right hand corner… oops)

Generating the Rings

The equation I came up with for generating a ring ended up being:
ringNumber * 2 * 4 + (sum of previous rings).

I would generate the sequence of rings with an index and the final value.

I could calculate the number of elements in a ring wall (middle and 1 corner) by taking the ring number and multiplying by 2.

Identifying location 1 as ring Zero. I could say ring 1 would have 2 elements to make the middle and one corner. To make the full cube I could multiply it by 4. This gives us 8 elements in ring 1. Ring 2 would have 2 * 2 * 4 elements to make the square, 16. In the example below, there are 8 ones in ring 1, and 16 twos in ring 2.

2   2   2   2   2
2   1   1   1   2
2   1   0   1   2
2   1   1   1   2
2   2   2   2   2

To get the final memory location I would add the previous ring’s final location to the number of elements required for the current ring.

Running the code would result in:

rings |> Seq.take 6 |> Seq.toList;;
[(0, 1); (1, 9); (2, 25); (3, 49); (4, 81); (5, 121)]

Determining the Distance

The manhattan distance can be calculated by the ring id and the element offset from the centre of a ring wall. Here is the code I used to determine the distance.

The premise of this part is to locate the ring before the passed location. We then determine whether the location is on a corner or on the low side of centre. This will determine the amount of steps taken to get to the centre of the ring wall. We then add the number of rings we need to step through to get to location 1.

Bonus

Computing the rings using square:

let rings = Seq.initInfinite (fun i -> (i, pown (i * 2 + 1) 2))

The Immutable Tool

Immutability is a useful tool. Think of it as an extension of your favourite editor or ReSharper. Using immutable values helps eliminate some of the easy to make mistakes when programming. Even if your language doesn’t fully support immutability you can still take advantage of the benefits.

What Is Immutability?

Immutability is something which does not change. You set a value and it doesn’t change through its lifetime. There are absolutely no mutations of that value. Unexpected mutations are a very common cause of issues in a program. Mutations can happen seemingly anywhere. They can happen concurrently from another thread causing all sorts of issues which are difficult to track down. During my talk I said “there is nothing wrong with global state, the problem is when global state changes.” I was trying to make them think about what the real danger was in their applications. A value which cannot change is immutable.

Benefits

The main benefit with using immutable values is the removal of unexpected changes. This leads to easier to reason about code. Future readers will have confidence that the values set don’t change later in the function or by any function. We limit the mutation or changes to specific areas of the code. This in turn let’s us define locations as being dangerous areas. Solutions using immutable values require less effort to move to a concurrent paradigm. Mutation is the bane of concurrent programming. Limiting mutation lowers the complexity of understanding the solution.

Disadvantages

The main disadvantage of immutable values is always having to make a copy of the values or object to apply the changes. This is not a big disadvantage if the language supports immutability. It is tedious in a language that doesn’t as this would have to be implemented by the developer. There are also some performance issues that may be noticed in mutations of many values in a collection. This is due to all the copying, updating, and garbage collection. This again can be handled by the language if supported. Mutations are a form of optimization.

Levels of Immutability

There are many functional languages that have rigid no mutations allowed. Some notable ones are Haskell and Erlang. If you come from a language that heavily relies on mutating variables, then these rigid languages can really force you to think of different ways to solve the task at hand. There are middle ground languages that support both immutable and mutable values. F# and Clojure are good languages that are immutable by default. It’s up to the developer to mark the values as mutable and accept the responsibility of that decision. Using an immutable by default language will allow you to design your solution using immutability and later modify to using mutations if the performance is required.

What’s nice about Clojure and F# is you get visual cues as to the dangerous areas in your code. F# uses the mutable keyword and a left arrow <- to define mutation.

let mutable x = 5
x <- 6

Clojure uses the earmuffs operator.

(def ^:dynamic *there-be* "dragons")

Summary

Immutability is a tool we can use to help avoid easy to make mistakes of unintentional mutations. This can lead to solutions that are easier to reason about as the code is not littered with mutations. Limiting mutations to specific areas allowing the reader to focus on those dangerous areas instead of worrying about unexpected changes. Like any tool, there are times when it doesn't fit. If performance is paramount to the solution, using mutation may be the way to go. In saying that, it's generally a good practice to make the code correct and then add mutations to make it fast.

Experimenting with Partial Application

Partial application helps us reuse functions by allowing us to create new functions based on the original. This is achieved by passing fewer arguments to the original function and binding the new function that is created. You can read more on this at Currying and Applying Arguments. We’re going to experiment with partial application and understand when execution happens.

Short notes: currying is responsible for taking a function with multiple parameters and turning it into a sequence of functions where each function only takes one argument. Partial Application is the process of applying arguments to unwrap the layers of functions until there are no more functions only the result.

Our Test Code

We’re going to start with a simple key lookup. We’ll have a function which takes a list of integers and another integer to verify it is in the list of keys. We’ve created some sample data for testing our functions.

let datapoints = [1 .. 400000]
let keys = [1; -4; 45000; 2501; 56000]

// seq -> int -> bool
let seekKey keys key =
    printfn "Creating index"
    let index = Set.ofSeq keys
    printfn "Seeking key: %d" key
    index |> Seq.contains key

Looking at our starting code, we are taking in a sequence of keys and turning them into a Set. We then look in the set of keys for verification of the key we are looking for. We’ve added some print statements so we can see the execution.

Let’s turn #time on in our REPL to give us some statistics of execution.

When we execute the function with the data points and a key we should see results similar to this:

> seekKey datapoints 67;;

Creating index
Seeking key: 67
Real: 00:00:00.653, CPU: 00:00:00.650 ...

// Your times will differ.

These are the results we expected. We are creating an index, using a Set, and searching the index. Next we’ll create a partially applied function so we can do multiple lookups.

> let seek = seekKey dataPoints;;

Real: 00:00:00.000, CPU: 00:00:00.000 ...

Again the results aren’t to shocking here. We are unwrapping the first layer of currying to reveal the function below. Now let’s use our partially applied function seek to verify a list of keys.

> keys |> List.map (seekKey datapoints);;

Creating index
Seeking key: 1
Creating index
Seeking key: -4
Creating index
Seeking key: 45000
Creating index
Seeking key: 2501
Creating index
Seeking key: 56000
Real: 00:00:03.396, CPU: 00:00:03.512 ...

We’ve revealed a problem with our function. We are creating the index on every lookup. The reason for creating an index is to improve the performance for multiple lookups. We have not accomplished this.

Multiple Seek Improvement

Let’s rewrite our function. Instead of relying on currying, we’ll have our function return a function. This will allow us to inject some execution before performing the lookup.

// seq -> (int -> int)
let seekKey' keys = 
    printfn "Creating index"
    let index = Set.ofSeq keys
    (fun key -> 
        printfn "Seeking key: %d" key
        index |> Seq.contains key)

> seekKey' datapoints 67;;

Creating index
Seeking key: 67
Real: 00:00:00.659, CPU: 00:00:00.656 ...

The type signature of the new function is close to the original function. In fact, we can call the new function the exact same way as the original.

Look at what happens when we apply only the first argument.

> let seek' = seekKey' datapoints;;

Creating index
Real: 00:00:00.690, CPU: 00:00:00.737 ...

We can see the index is created immediately. How is this going to affect multiple key lookups?

> keys |> List.map (seekKey' datapoints);;

Creating index
Seeking key: 1
Seeking key: -4
Seeking key: 45000
Seeking key: 2501
Seeking key: 56000
Real: 00:00:00.710, CPU: 00:00:00.705

That’s a more performant lookup for multiple values. What we’ve done is made the creation of the index happen after applying the first argument. The index then gets wrapped in a closure to the returned function. This means we don’t have to re-create the index every time we apply the second argument.

What more can we do?

We’re close to how we want the function to work. There’s one last thing. It would be nice to create the index only if we use it. We can accomplish this using a lazy computation.

// seq -> (int -> int)
let seekKey'' keys =
    let index = lazy (
                  printfn "Creating index"
                  Set.ofSeq keys)
    (fun key ->
        printfn "Seeking key: %d" key
        index.Value |> Seq.contains key)

We’ve placed the index creation in the lazy expression and bind it to the label index. We then call index.Value when we want to use the result of the lazy computation.

Let’s run through our testing scenarios.

> seekKey'' datapoints 67;;

Seeking key: 67
Creating index
Real: 00:00:00.679, CPU: 00:00:00.732 ...

This revealed an interesting result. The times are fairly equivalent to the first two code snippets, but in this case the print statements are in a different order. We are printing the “Seeking” before the “Creating”. This is due to the lazy and not evaluating until we tell it to.

Applying one argument:

> let seek'' = seekKey'' datapoint;;

Real: 00:00:00.000, CPU: 00:00:00.000 ...

Yes! This is what we wanted. Immediate return of the function without creating the index. This allows us to apply the first argument without any penalty if we don’t use the returned function later in processing.

For completeness, running multiple lookups.

> keys |> List.map (seekKey' datapoints);;

Seeking key: 1
Creating index
Seeking key: -4
Seeking key: 45000
Seeking key: 2501
Seeking key: 56000
Real: 00:00:00.734, CPU: 00:00:00.727 ...

Again the index is created just before the seek of the first key and the index isn’t created again.

Summary

What we’ve done is take a curried function and turn it into a higher order function. By making this change we’ve improved the performance of our function when doing multiple lookups. We used the lazy computation to remove the initial index creation if we don’t do any lookups. This was a fun experiment.

Full Source Code

Improving an Old Pattern Continued

The active patterns we’ve seen so far are enough to take some complex code and make it clean and readable. Let’s take the active patterns we know and improve the pattern matching code we created earlier.

Reviewing the Code

We were reviewing a chunk of code to determine whether a client requires an update based on the version numbers. Here is where we left off with the pattern match.

let checkForUpdate (rMaj, rMin, rBld) userVer = 
    match userVer with
    | (uMaj, _, _) 
        when rMaj > uMaj                                     
            -> Major
    | (uMaj, uMin, _) 
        when rMaj = uMaj && rMin > uMin                   
            -> Minor
    | (uMaj, uMin, uBld) 
        when rMaj = uMaj && rMin = uMin && rMin > uBld 
            -> Build
    | _     -> NoUpdate

The match is checking the major version, followed by the minor, and the build. The issue I have with this pattern match is we keep having to compare values we already know. As the match falls through the patterns we end up checking the major version three times. This isn’t very readable.

Adding a Partial Active Pattern

The first thing we can do is add a partial active pattern to match on all the cases that don’t require an update. We can leverage the fact that F# does structural equality by default that we can do this quite easily.

let (|NoUpdatePartial|_|) (relVer, userVer) =
    if relVer <= userVer then Some ()
    else None

With this partial pattern we've now eliminated the need to do multiple comparisons of any individual values.

Combining Active Patterns

We'll take the partial active pattern and use it in a multicase active pattern. Our multicase pattern needs only be concerned with matching the individual version values. This is providing we add the partial active pattern before the individual value patterns. It will look something like this:

let (|Major|Minor|Build|NoUpdate|) (relVer, userVer) =
    match (relVer, userVer) with
    | NoUpdatePartial                        -> NoUpdate
    | (rMaj,_,_),(uMaj,_,_) when rMaj > uMaj -> Major
    | (_,rMin,_),(_,uMin,_) when rMin > uMin -> Minor
    | _                                      -> Build

We now have a complete active pattern we can use when determining what kind of update is required.

let checkForUpdate relVer userVer =
    match relVer, userVer with
    | Major    -> doMajorUpdate ()
    | Minor    -> doMinorUpdate () 
    | Build    -> doBuildUpdate ()
    | NoUpdate -> doNoUpdate ()

Summary

We’ve greatly increased the expressiveness of our pattern matching by using Active Patterns. We’ve used a Partial Active Pattern and a Multi-Case Active Pattern. Active pattern can be combined together to create larger patterns all in the name of readability. All of this helps lead us to correct and expressive solutions.

Partial Active Patterns

There are four kinds of active patterns. We’ve already looked at the Single and Multicase Active Patterns. We’ll now look at Partial Active Pattern.

Partial Active Pattern

A partial active pattern is used when the match expression needs to be transformed in different ways for the individual patterns. Partial active patterns are mutually exclusive in that they don’t necessarily have to relate to the other patterns.

We define the partial active pattern using the banana clips, a label, and an underscore. We identify a successful match by returning Some value and a non match with None.

let (|StartsWithCapital|_|) (v : string) = 
    if v.[0] |> System.Char.IsUpper then Some v
    else None

We can use the active pattern in our match expressions for readability.

let processFile (filePath : string) =
    filePath |> System.IO.File.ReadAllLines
      |> Seq.filter (function StartsWithCapital _ -> true
                            | _                   -> false)

Our processFile function takes in a file, reads all the lines, and then gives us a sequence of lines that begin with a capital letter. There is an issue though. We are using an indexer (v.[0]) on the string in our active pattern which will fail if the value is null or empty. Let’s add another partial active pattern.

let (|NullOrEmptyLine|_|) (v : string) = 
    if System.String.IsNullOrEmpty v then Some ()
    else None

The filter of the processFile will now look like:

Seq.filter (function NullOrEmptyLine     -> false
                   | StartsWithCapital _ -> true
                   | _                   -> false)

The order in which we place the pattern expressions matters. We have to do the EmptyLine pattern match before the StartsWithCapital or we will end up with the same error as before.

Putting it to use

We’ve received some specifications for interfacing with an old system. The system streams a sentence per line. A sentence is identified by the starting capital letter of the sentence. There are some issues with the system. Every once in a while the system will send a sentence in reverse or it will send garbage data (no starting or ending capital letter). Some lines may be null and they should be converted to newlines. Format the incoming data where all sentences are not reversed and include the empty lines.

Based on the specs above we have four cases:

  • Normal sentence
  • Reversed sentence
  • Empty line
  • Garbage data

We seem to already have 2 partial patterns for this exercise which means we only need one more to fulfill our requirements. The reversed sentence.

let (|EndsWithCapital|_|) (v : string) =
    if v.[v.Length - 1] |> System.Char.IsUpper then Some v
    else None

With the final partial active pattern in place we can setup our pattern match to fix the streamed data.

let reverseString : string -> string = 
    Seq.rev >> Seq.toArray >> (fun s' -> new string(s'))

let mungeLine = function
    | NullOrEmptyLine     -> Some "\n"
    | StartsWithCapital s -> Some s
    | EndsWithCapital s   -> s |> reverseString |> Some
    | _                   -> None

Taking my word that the funny syntax in the reverseString function actually reverses a string, we can see the mungeLine function uses our partial active patterns for validating the data. Partial active patterns have made out match expressive and readable to non coders.

Summary

Partial active patterns give us a means to do pattern matching when it may seem like the patterns have nothing in common. Partial active patterns allow us to transform the value we are matching before evaluating the pattern. We have not added partial active patterns to our collection of tools to create readable and expressive code.