Skip to main content

Constructing a Trie in F#

This post might get a bit more context next week, but essentially, in continuation of looking at data structures in F# and C# I picked prefix trees (tries) as the next step. A trie is similar to a BST except the search is cumulative. A simple example would be building a spellchecker. The spell checker can use the structure to see if a word exists quickly and efficiently.

It works by creating a new node for each part of the word, using existing nodes if they're already in place. You can see in this diagram an example structure after an insertion of several words.

In the above diagram the words ANT, AND, BATS have been added into the data structure. You'll notice that the word BATS can also be the word BAT and that becomes a part of the structure, setting a value at each node to state if this is a place that a word terminates.


1
2
3
4
 // Node type for storing a trie.  
 type TrieNode =   
   | Node of char * TrieNode list * bool  
   | Root of TrieNode list  

I defined the type to have 2 options. The root node, which we want to basically ignore after we've looked through it's children and the node for storing data. It has a char for storing the value, a list of sub nodes and a bool to state whether this is where a word terminates.

The first part is to get inserting nodes into the structure sorted. There are some similarities to the BST, but my first attempt was very convoluted, and eventually was cut down into this:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Basic check for equality on a node against a character.  
 let matchChar c = (fun node -> match node with  
                 | Root(_) -> false  
                 | Node(v, _, _) -> v = c)  
    
 // Insert a new node into the trie.  
 let rec Insert (chars: char list) = function  
   // If this is the root node and it starts on a known character continue, otherwise add a new child, then continue.  
   | Root(list) -> if List.exists(matchChar chars.Head) list = false  
             then Root(list @ [(Node(chars.Head, [], false) |> Insert chars)])  
           else Root(List.map (fun node -> Insert chars node) list)  
   // If we've ran out letters it's now a word. If there's no children, make some, as we have more letters.  
   | Node(c, list, isword) when c = chars.Head -> if chars.Tail = []   
                             then Node(c, list, true)  
                           else if list = []   
                             then Node(c, [(Node(chars.Tail.Head, [], false) |> Insert chars.Tail)], isword)  
                           // If we need to add a child to continue, do so.  
                           else if List.exists(matchChar chars.Tail.Head) list = false   
                             then Node(c, list @ [(Node(chars.Tail.Head, [], false) |> Insert chars.Tail)], isword)  
                           // Continue down the tree.  
                           else Node (c, List.map (fun node -> Insert chars.Tail node) list, isword)  
   // Default case for and existing node.  
   | Node(c, list, isword) -> Node(c, list, isword)

There are probably some ways to simplify this further, but it does state the intention of the function clearly.

Much like with the BST, insertion was the trickier part of the important steps, and looking up a word was similar to insertion but simpler. We just want to keep going down the tree until we either run out of letters for the word we're looking up, in which case we want to check for a flag to see if this is a word termination node, or until we run out of tree, in which case it's obviously not a word.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
let Exists (word: string, tree: TrieNode) =   
   // Turn the string into a list of char.  
   let wordChars = [for character in word -> character]  
    
   // Look through the tree for the word.  
   let rec find (chars: char list) = function  
     // If it's the root, look at all the child nodes.  
     | Root(list) -> if list = []   
               then false  
             else list |> List.exists (fun node -> find chars node)  
     // If we've found the last character, is it a word? If there's no more tree, it's not a word, otherwise keep searching.  
     | Node(c, list, isword) when c = chars.Head -> if chars.Tail = []  
                               then isword  
                             else if list = []  
                               then false  
                             else list |> List.exists (fun node -> find chars.Tail node)  
     | Node(c, list, isword) -> false  
    
   // Find the word.  
   find wordChars tree  

Some testing verified that it behaved as expected, with a fairly simple program and a few breakpoints.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
 [<EntryPoint>]  
 let main argv =   
   let root = Root([]);  
   let b = Insert ['a';'b'] root  
   let c = Insert ['b';'c'] b  
   let d = Insert ['a';'b';'c'] c  
   let e = Insert ['a';'b';'d'] d  
   let f = Insert ['a';'b';'c';'d';'e';'f';'g';'h'] e  
   let exists = Exists ("abd", f)  
   
   0 // return an integer exit code  

Hopefully in the next post, we'll use it for something a bit more interesting.

Popular posts from this blog

A Docker Experiment

Containers are a topic that have been rising in occurrence for me, for the last year or so. From using them as part of the architecture at work or for various pet projects friends have been working on. I figured it was time to experiment myself and get to grips with what people have been talking about. It seemed like a good idea to find some introductory material that would give me an overview of its uses without delving too far into the details so I could see what it actually was, so I found a PluralSight course about Docker, specifically “Docker and Containers:The Big Picture”. This gave a nice overview of what Docker is, and more importantly what it was trying to achieve and how to use it. With a bit more of an understanding, I wanted to use it for something, preferably something familiar. I decided to try setting up ElasticSearch and Kibana containers, where Kibana would visualize the ElasticSearch data. I used bits of this article along the way as a guide, if you'd prefer a more…

Self-Producing an EP Pt.5: Mixing

After some discussion we opted to mix it ourselves for a few reasons.

It's our first EP, it's not going to be heard by a huge amount of people and while it's important that it sounds good, it's expensive to get done properly and we might be able to manage 'good enough' on our own with a lot of work.Mixing as an online service seems to be the main way to go for small/starting out bands and that seems too creatively detached. There are many stories floating around of people who've sent things off for mixing and the mixer has a much different idea of how it sounds.
Maybe it's not the best choice, but it seemed an acceptable risk that we would at least attempt to have it 'look' how we want and it be a bit wonky than potentially end up with something we're not happy with in a different way. The benefit would have been being able to just ship the stems off and make it someone else's problem instead of weeks of being unsure while swimming in the…