Friday, 24 April 2015

Constructing a Trie in F#

This post might get a bit more context next week, but essentially, in continuation of looking at data structures in F# and C# I picked prefix trees (tries) as the next step. A trie is similar to a BST except the search is cumulative. A simple example would be building a spellchecker. The spell checker can use the structure to see if a word exists quickly and efficiently.

It works by creating a new node for each part of the word, using existing nodes if they're already in place. You can see in this diagram an example structure after an insertion of several words.

In the above diagram the words ANT, AND, BATS have been added into the data structure. You'll notice that the word BATS can also be the word BAT and that becomes a part of the structure, setting a value at each node to state if this is a place that a word terminates.


1
2
3
4
 // Node type for storing a trie.  
 type TrieNode =   
   | Node of char * TrieNode list * bool  
   | Root of TrieNode list  

I defined the type to have 2 options. The root node, which we want to basically ignore after we've looked through it's children and the node for storing data. It has a char for storing the value, a list of sub nodes and a bool to state whether this is where a word terminates.

The first part is to get inserting nodes into the structure sorted. There are some similarities to the BST, but my first attempt was very convoluted, and eventually was cut down into this:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Basic check for equality on a node against a character.  
 let matchChar c = (fun node -> match node with  
                 | Root(_) -> false  
                 | Node(v, _, _) -> v = c)  
    
 // Insert a new node into the trie.  
 let rec Insert (chars: char list) = function  
   // If this is the root node and it starts on a known character continue, otherwise add a new child, then continue.  
   | Root(list) -> if List.exists(matchChar chars.Head) list = false  
             then Root(list @ [(Node(chars.Head, [], false) |> Insert chars)])  
           else Root(List.map (fun node -> Insert chars node) list)  
   // If we've ran out letters it's now a word. If there's no children, make some, as we have more letters.  
   | Node(c, list, isword) when c = chars.Head -> if chars.Tail = []   
                             then Node(c, list, true)  
                           else if list = []   
                             then Node(c, [(Node(chars.Tail.Head, [], false) |> Insert chars.Tail)], isword)  
                           // If we need to add a child to continue, do so.  
                           else if List.exists(matchChar chars.Tail.Head) list = false   
                             then Node(c, list @ [(Node(chars.Tail.Head, [], false) |> Insert chars.Tail)], isword)  
                           // Continue down the tree.  
                           else Node (c, List.map (fun node -> Insert chars.Tail node) list, isword)  
   // Default case for and existing node.  
   | Node(c, list, isword) -> Node(c, list, isword)

There are probably some ways to simplify this further, but it does state the intention of the function clearly.

Much like with the BST, insertion was the trickier part of the important steps, and looking up a word was similar to insertion but simpler. We just want to keep going down the tree until we either run out of letters for the word we're looking up, in which case we want to check for a flag to see if this is a word termination node, or until we run out of tree, in which case it's obviously not a word.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
let Exists (word: string, tree: TrieNode) =   
   // Turn the string into a list of char.  
   let wordChars = [for character in word -> character]  
    
   // Look through the tree for the word.  
   let rec find (chars: char list) = function  
     // If it's the root, look at all the child nodes.  
     | Root(list) -> if list = []   
               then false  
             else list |> List.exists (fun node -> find chars node)  
     // If we've found the last character, is it a word? If there's no more tree, it's not a word, otherwise keep searching.  
     | Node(c, list, isword) when c = chars.Head -> if chars.Tail = []  
                               then isword  
                             else if list = []  
                               then false  
                             else list |> List.exists (fun node -> find chars.Tail node)  
     | Node(c, list, isword) -> false  
    
   // Find the word.  
   find wordChars tree  

Some testing verified that it behaved as expected, with a fairly simple program and a few breakpoints.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
 [<EntryPoint>]  
 let main argv =   
   let root = Root([]);  
   let b = Insert ['a';'b'] root  
   let c = Insert ['b';'c'] b  
   let d = Insert ['a';'b';'c'] c  
   let e = Insert ['a';'b';'d'] d  
   let f = Insert ['a';'b';'c';'d';'e';'f';'g';'h'] e  
   let exists = Exists ("abd", f)  
   
   0 // return an integer exit code  

Hopefully in the next post, we'll use it for something a bit more interesting.

Wednesday, 22 April 2015

The Harper Lee Experiment

Harper Lee, author of 'To Kill a Mockingbird' wrote the book after a friend gifted her 1 years salary to write whatever she wanted. No-one would argue that that time or money was misspent. It made me think however, when I randomly remembered it earlier in the day about how with some changed variables it could apply to a developer. Then it became a bit of a thought experiment.

Let's say you're random developer with some experience. Not necessarily lots, but enough to be comfortable. On new years eve, a friend (whose reasoning skills you'd likely and understandably question at this point) offers you 1 years salary to create whatever you like under the understanding that after December 31st that this creation would be your livelihood. Through some sort of magic your day job is a thing of the past and you are free to do as you please.

We'll also say as a matter of co-incidence you happen to have been sitting on an idea for some software that a blue box inhabiting man has informed you is a guaranteed hit. You have your idea and 365 days, where do you start?

Getting Started
Well, how much time do you have? That's a good start.

52 weeks - 4 weeks because your friend insisted you take a break.
48 weeks of 5 day weeks, to generalise working for a comfortable 7.5 hours a day.

That gives us a round 1,800 hours of time. It's feeling a bit tight already isn't it? So we already have an idea about the scope of what it might be best to undertake. For example, an application for uploading pictures with captions for your friends to see, that's monetised through some sort of advertising (The requirements stated we have to live off it, remember?).

From there, we can extrapolate a rough feature set, there's going to be some authentication, storage, relationships between user accounts and a phone based UI for your platform of choice. Your at this point unbelievable friend also happens to be a marketing and UI expert and will complete these elements of your work when appropriate to the best of their ability and free of charge, so you can just focus on the software. What's next?

Considerations
How are you going to manage your workload, Kanban perhaps?
Will you start with the back-end first, the front-end, or develop them in a scenario based format in parallel?
Is December 31st the release day or is that the day you expect to be running smoothly?
Is there beta testing? How long for?
How will you handle security?
What kind of testing do you intend to undertake with your already stretched resources?
How will you know that the service is running smoothly after launch?

You might have a rough plan for what is to be done now to successfully unveil your masterpiece to the world. How does it stand against a few test cases?

Test Cases
1) 3 days after launch your primary chosen cloud platform drops off because of a time-related bug and doesn't return for a day, as a result of this are you missing out on ad revenue or is there a contingency?
2) You decide to take a break for a week, as the sole developer you ask a friend to take the keys to your newly up and running service and keep an eye on it...
    a) Something vague goes wrong and users are sometimes failing to upload photos, can your friend solve it easily and how long did it take them to find out?
    b) That massive new feature you checked in just before you left, it turns out, doesn't work, did it make it through to production and can your friend fix it, and do they even know there's a problem?
3) Generic tech review site X found your app and wrote a front page article about how world changing it is and your user-base explodes like you didn't know was possible, will it scale out succesfully?
4) Someone got into your user database, how bad is the damage? Are they looking at password 'Hunter2' or did you go to some lengths to secure it.
5) It turns out that after your app has been running for 3 minutes it runs out of memory, did you catch it during testing and if not, how long will it take to get the bug-fix to the user?

Outcome
It's quite easy to underestimate how much time/effort it takes to get something relatively simple to be reliable and the costs involved in doing so. A year seems like a long time until you realise you probably have less than 6 months of product development time when you factor in testing, security, stability, etc. You could just not test it I suppose, and hope that if it falls down your friend isn't too upset about their failed investment...



Monday, 20 April 2015

Trees: I still can't be-leaf them

Carrying on from the last post, I wanted to add deletion to the BST. My first attempt at this was in F#. I decided the best way to do it was to find the node that contained the value and if it only had one child node, replace it with that otherwise add the tree from the right to the tree from the left. The impact of this decision will be that the height of the tree is increased by an unknown amount and will affect performance, but binary search trees are unbalanced anyway and it's just for fun. All of the cases were simple up until the point where the right tree might need to be added to the left.


1
2
3
4
5
6
7
8
 // Remove a given value fron the tree.  
 let rec Remove value = function  
   | Leaf -> Leaf  
   | Node(v, left, right) when value <> v -> Node(v, Remove value left, Remove value right)  
   | Node(v, left, Leaf) -> left  
   | Node(v, Leaf, right) -> right  
   | Node(v, Leaf, Leaf) -> Leaf  
   | Node(_, _, _) as node -> node  


For that, I'd need a new function that would do the same as insertion but attach an existing node instead of just a value. To do that, we'd need to decompose the root node of the tree which will be appended so that it's value can be compared to each node of the tree it is being appended to.


1
2
3
4
5
6
// Add the provided branch to the most appropriate part of the tree.  
 let rec Append value = function  
   | Leaf -> value  
   | Node(v, left, right) -> match value with   
                | Node(a, _, _) -> if a < v then Node(v, Append value left, right) else Node(v, left, Append value right)  
                | Leaf -> Node(v, left, right)  


So now we can add the case for a populated left and right node into the Remove function.


1
2
3
4
5
6
7
8
9
// Remove a given value fron the tree.  
 let rec Remove value = function  
   | Leaf -> Leaf  
   | Node(v, left, right) when value <> v -> Node(v, Remove value left, Remove value right)  
   | Node(v, left, right) -> Append right left  
   | Node(v, left, Leaf) -> left  
   | Node(v, Leaf, right) -> right  
   | Node(v, Leaf, Leaf) -> Leaf  
   | Node(_, _, _) as node -> node  


As previously stated, it will have a varied affect on the performance depending on the size of the trees beneath the value being removed but it was relatively straight forward. On to the C# implementation.

With a similar approach, the Append method is similarly trivial.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/// <summary>  
 /// Adds a Node to the most appropriate place in the tree.  
 /// </summary>  
 /// <param name="node">The <see cref="BSTNode"/> to add.</param>  
 private void Append(BSTNode<t> node)  
 {  
   if (node.Value.CompareTo(Value) == 0)  
   {  
     if (Left != null)  
     {  
       Left.Append(node);  
       return;  
     }  
   
       Left = node;  
       return;  
     }  
     else  
     {  
       if (Right != null)  
       {  
         Right.Append(node);  
         return;  
       }  
   
       Right = node;  
       return;  
     }  
   }  
 }  


It looks very similar to the Insert method from the last post but instead of adding a new node, we're just using the one we already have. The next part is where it would get tricky. As you would expect from object oriented code, the Remove method, like the others would live inside the class. It would operate on itself and conditionally then on it's left and right nodes. This becomes a problem when trying to remove a value, as we have not made room for the idea of a 'parent' node, to go back up the chain.

The functional approach, in a way, goes from the bottom up where the approach taken in the C# implementation goes bottom down. This is fine in almost all cases except for the root node, as we can just operate on the children and hope that the value of 'this' has been processed by the node above. In the case that it was the parent node I opted for returning a node that represented the new base of the tree, containing the concatenation of the left and right nodes. For the cases of the children I applied the same logic that would be applied internally but one level down. A better way of doing this would be to do it outside of the structure itself, but it would be pretty similar.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
/// <summary>  
 /// Removes a value from a tree.  
 /// </summary>  
 /// <param name="value">The value to remove.</param>  
 /// <returns>null if the value is stored in a child, otherwise if the value is in the root, it returns the new tree.</returns>  
 public BSTNode<t> Remove(int value)  
 {  
   // If the root node is the value, remove it, sort the children and return the new root.  
   if (value.CompareTo(Value) == 0)  
   {  
      // If the left and right nodes have values, make them into 1 tree and return that as the root.  
     if (Left != null && Right != null)  
     {  
       Left.Append(Right);  
       return Left;  
     }  
     else if (Left != null)  
     {  
       return Left;  
     }  
     else  
     {  
       return Right;  
     }  
   }  
   
   // If the left node has a value.  
   if (Left != null)  
   {  
     // And the left value is equal to the one to be removed.  
     if (Left.Value.CompareTo(value) == 0)  
     {  
       // And it has both child nodes, add the right tree to the left and use the left nodes left node as the left node.  
       if (Left.Right != null && Left.Left != null)  
       {  
         Left.Left.Append(Left.Right);  
         Left = Left.Left;  
       }  
       else if (Left.Right != null)  
       {  
         Left = Left.Right;  
       }  
       else if (Left.Left != null)  
       {  
         Left = Left.Left;  
       }  
       else  
       {  
         Left = null;  
       }  
     }  
     else if (value.CompareTo(Value) < 0)  
     {  
       Left.Remove(value);  
       return null;  
     }  
   }  
   
   if (Right != null)  
   {  
     if (Right.Value.CompareTo(value) == 0)  
     {  
       if (Right.Right != null && Right.Left != null)  
       {  
         Right.Left.Append(Right.Right);  
         Right = Right.Left;  
       }  
       else if (Right.Right != null)  
       {  
         Right = Right.Right;  
       }  
       else if (Right.Left != null)  
       {  
         Right = Right.Left;  
       }  
       else  
       {  
         Right = null;  
       }  
     }  
     else if (value.CompareTo(Value) > 0)  
     {  
       Right.Remove(value);  
       return null;  
     }  
   }  
   
   return null;  
 }  


There is some duplication here, we essentially do the same thing to the left and the right node and it could be simplified by abstracting that out into another method and doing it that way, or having the child nodes as an enumerable and iterating through. There's some new features in C# 6 that would make this much tidier, I'll re-visit it at some point.

Friday, 17 April 2015

Trees: I can't be-leaf them

It's been a while since I've really dived into functional programming like I had when I was working on the parser project. In an attempt to better express myself functionally I'm going to run through the common algorithms/data structures in both C# and F# using each as intended.

A few posts back I posted an implementation of merge sort. Merge sort, in it's divide & conquer nature lends itself well to functional programming because after division a new container is returned with the sorted items much like how you would approach a problem in an immutable fashion. I'll come back to other sorting algorithms at another point, but I want to dive into trees.

Starting with a binary search tree and working through seemed like a sensible place to start. I wanted to start with the F# version and follow with the C# version.


1
2
3
4
 // A tree node.  
 type 'a BSTNode =   
     | Node of 'a * 'a BSTNode * 'a BSTNode  
     | Leaf  

So far so good, concise discriminated union to store our data and we can use any type to populate the value. The first obvious thing to do is insert values into the tree. After a bit of thought I came up with this:


1
2
3
4
5
6
 // Insert a new value into the tree.  
 let rec Insert value = function  
   | Leaf -> Node(value, Leaf, Leaf);  
   | Node(v, left, right) when value < v -> Node(v, Insert value left, right);  
   | Node(v, left, right) when value > v -> Node(v, left, Insert value right);  
   | Node(_, _, _) as node -> node  

Pattern match the node in a recursive function, if it's empty, make the empty node a new node with the value. If it's not empty, decompose the node, compare the value against the one we want to insert and run the insert function on the most appropriate node, and then handle the default case.

Finding a value in the tree is pretty much the same idea except that instead of creating a new node we just want to give back the one we found, so it doesn't take much of an adjustment to get something like this:


1
2
3
4
5
6
7
 // Find a specific value within the tree.   
 let rec Find value = function  
   | Leaf -> Leaf  
   | Node(v, left, right) as node when v = value -> node  
   | Node(v, left, right) when value > v -> Find value right  
   | Node(v, left, right) when value < v -> Find value left  
   | Node(_, _, _) -> Leaf  

It's very concise and it works just fine.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
 [<entrypoint>]  
 let main argv =   
   // Create a root node.  
   let root = Node(5, Leaf, Leaf);  
   // Add some more items.  
   let root = Insert 3 root;  
   let root = Insert 4 root;  
   let root = Insert 2 root;  
   let root = Insert 1 root;  
   // Find the four.  
   let four = Find 4 root;  
   0  


Of course, the C# implementation needs a different approach, as it's an object oriented language. It seems sensible to start with the same 3 important points, the value and the children. We're missing the type inference we get in F# but we do have generics.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
 public class BSTNode<t> where T  
 {  
   private T Value;  
   private BSTNode<t> Left;  
   private BSTNode<t> Right;  
   public BSTNode(T value)  
   {  
     Value = value;  
     Left = null;  
     Right = null;  
   }  
 }  

As before, insert seems like the sensible next thing to implement. However, there's a problem. You can't apply the standard operators to generic types but there's a bit of a workaround in adding a constraint to T to IComparable:


1
 public class BSTNode<t> where T : IComparable  

 and using CompareTo.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
 /// <summary>  
 /// Add a value below the current node.  
 /// </summary>  
 /// <param name="value"></param>  
 public void Insert(T value)  
 {  
   if (value.CompareTo(Value) == 0)  
   {  
     return;  
   }  
   else if (value.CompareTo(Value) < 0)  
   {  
     if (Left == null)  
     {  
       Left = new BSTNode<T>(value);  
       return;  
     }  
     Left.Insert(value);  
   }  
   else  
   {  
     if (Right == null)  
     {  
       Right = new BSTNode<t>(value);  
       return;  
     }  
     Right.Insert(value);  
   }  
 }  

That's a lot more code than the F# implementation but they work in a similar way, something that it still has in common with Find:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
 /// <summary>  
 /// Finds the Node with a given value, if it exists.  
 /// </summary>  
 /// <param name="value">The <see cref="int"/> value to find.</param>  
 /// <returns>The <see cref="BSTNode"/> containing the value, or null.</returns>  
 public BSTNode<t> Find(int value)  
 {  
   if (value.CompareTo(Value) == 0)  
   {  
     return this;  
   }  
   else if (value.CompareTo(Value) < 0)  
   {  
     if (Left != null)  
     {  
       return Left.Find(value);  
     }  
     return null;  
   }  
   else  
   {  
     if (Right != null)  
     {  
       return Right.Find(value);  
     }  
     return null;  
   }  
 }  

It's pretty clear to read and still works fine.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
 class Program  
 {  
   static void Main(string[] args)  
   {  
     BSTNode<int> root = new BSTNode<int>(10);  
     root.Insert(5);  
     root.Insert(15);  
     root.Insert(7);  
     root.Insert(6);  
   }  
 }  



Continued in the next post with deletion of nodes.

Wednesday, 15 April 2015

F# Poker Hand Analysis and GitHub

The F# Poker Hand Analysis code I'd written about in previous posts is now public on GitHub. I tidied it up slightly and put it up so that A) I don't lose it, because I thought it was interesting work and B) So that perhaps I can come back to it now and again, which is much easier when it's not lurking on a USB stick at the bottom of my bag. It's functionality includes:
- Reads in files containing hand histories and parses them form the perspective of the user.
- The information stored is complete, detailing every action made by each player.
- Unit tested extensively using existing and real hand history files.
- Can calculate a players 'VPIP' (voluntarily put in pot, % of times the player willingly engages in action) over a provided data set.

It's implemented with F#'s native functional features and as such lends itself well to concurrency. I'm hoping to continue by calculating players hand ranges, but if not I have some ideas about similar work in the domain.

The project exists here: https://github.com/bretcolloff/PokerHandAnalysis

Monday, 13 April 2015

Functional Programming and Algorithms

I found an old post I made sitting in my Gists on Github where I'd implemented merge sort in F#. At the time I was quite interested in the way algorithms change when you're 'confined' to a stateless world. For some it's easier than others and I might do some experimentation with them at some point. I do have a book on it that I'd planned to read at some point in the depths of my Kindle. Here was the code I found anyway:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
 // Learn more about F# at http://fsharp.net  
 // Performs Merge Sort on a list of strings.  
 let rec MergeSort (input: int list) =   
   // Merges 2 lists together in ascended sorted order.  
   let Merge (left: int list, right: int list) =  
     let rec mergeLists (left: int list, right: int list, output: int list) =  
       match (left, right, output) with  
         | ([], right, output) -> output@right  
         | (left, [], output) -> output@left  
         | (left, right, output) when left.Head < right.Head -> mergeLists (left.Tail, right, output@[left.Head])  
         | (left, right, output) ->mergeLists (left, right.Tail, output@[right.Head])  
     mergeLists (left, right, [])  
   // Process the input.  
   if input.Length = 0 then []  
   else if input.Length = 1 then input  
   else if input.Length = 2 then  
     if input.[0] > input.[1] then [input.[1]; input.[0]]  
     else input  
   else  
     // Valid list size found, sort and merge.  
     let left = MergeSort (input |> Seq.take (input.Length / 2) |> Seq.toList)  
     let right = MergeSort (input |> Seq.skip (input.Length / 2) |> Seq.toList)  
     Merge (left, right)  

It could be more compact perhaps but given that I don't think there's a performance issue I like it in the slightly more readable format.

Friday, 10 April 2015

My Slippery Mind

I completely forgot that this existed. I only noticed when the domain came up for renewal and I thought "I should put a blog there" and realised I already had, and even posted to it. Weird. It's been a pretty busy 2 years since the previous post. My team in Microsoft, Mediaroom moved over to our new home in Ericsson and continued working on and announcing the project that had been keeping us busy all that time. I'll be back.