Wednesday, 10 April 2013

Identifying and Processing Flexible Structured Data

In writing a card game history parser, I came across an interesting problem. At first it seemed like it might be mostly straightforward, as these types of problems often do. When you consider a game like Texas Hold’em, there are several rounds of betting, referred to as streets. These streets as you expect are sequential. Preflop always leads to the flop, the flop always leads to the turn, the turn always leads to the river and if more than one player got that far, the player with the strongest hand takes the money. The problem I ran into is in the ‘if’ part of that last sentence. When a sufficient number of players during any part of the hand folds such that there is only 1 remaining, the hand is then short-circuited to the summary, which states some brief information about what happened. When you consider the possible combinations here:

Preflop -> Summary
Preflop -> Flop -> Summary
Preflop -> Flop -> Turn -> Summary
Preflop -> Flop -> Turn -> River -> Summary
Preflop -> Flop -> Turn -> River -> Showdown -> Summary

Each of these stages has an identifying string, so it’s easy to check where they begin. The first approach I took to this was not particularly efficient, and excessively complicated. In this implementation the following happened (warning, much code ahead):

 let GetLinesBetweenMatches (input: string list, startMatch: string, endMatch: string) =  
  let indexOfStart = (input.TakeWhile (fun x -> Regex.Matches(x, startMatch).Count = 0)).Count()  
  let indexOfEnd = (input.TakeWhile (fun x -> Regex.Matches(x, endMatch).Count = 0)).Count()  
  let numberOfLines = indexOfEnd - indexOfStart  
  if startMatch.Length = 0 then  
  input.Skip(indexOfStart + 1).Take(numberOfLines - 1)  
 let flopExists = (List.forall (fun (x: string) -> Regex.Matches(x, "\*\*\*\sFLOP\s\*\*\*").Count = 0) input) = false  
 let turnExists = (List.forall (fun (x: string) -> Regex.Matches(x, "\*\*\*\sTURN\s\*\*\*").Count = 0) input) = false  
 let riverExists = (List.forall (fun (x: string) -> Regex.Matches(x, "\*\*\*\sRIVER\s\*\*\*").Count = 0) input) = false  
 let gotToShowdown = (List.forall (fun (x: string) -> Regex.Matches(x, "\*\*\*\sSHOW\sDOWN\s\*\*\*").Count = 0) input) = false  
 // Process the preflop action  
 let preflopStrings = if flopExists then GetLinesBetweenMatches (input, "Dealt to", "\*\*\*\sFLOP\s\*\*\*")  
    else if gotToShowdown then GetLinesBetweenMatches (input, "Dealt to", "\*\*\*\sSHOW\sDOWN\s\*\*\*")  
    else GetLinesBetweenMatches (input, "Dealt to", "\*\*\*\sSUMMARY\s\*\*\*")  
 let pfact = [for x in (ProcessStreet (Seq.toList preflopStrings, players)) -> x] |> (fun x -> x)  
 let preflop = { actions = pfact.ToList(); streetType = Streets.Preflop }  
 // Process the flop action  
 let flopStrings = if turnExists then GetLinesBetweenMatches (input, "\*\*\*\sFLOP\s\*\*\*", "\*\*\*\sTURN\s\*\*\*")  
    else if gotToShowdown then GetLinesBetweenMatches (input, "\*\*\*\sFLOP\s\*\*\*", "\*\*\*\sSHOW\sDOWN\s\*\*\*")  
    else GetLinesBetweenMatches (input, "\*\*\*\sFLOP\s\*\*\*", "\*\*\*\sSUMMARY\s\*\*\*")  
 let flact = [for x in (ProcessStreet (Seq.toList flopStrings, players)) -> x] |> (fun x -> x)  
 let flop = if flopExists then Some({ actions = flact.ToList(); streetType = Streets.Flop })  
    else None  
 if flop = None then  
  if gotToShowdown then  
  { metaData = metaData; players = players.ToList(); streets = ([preflop; showdown.Value]).ToList() }  
  { metaData = metaData; players = players.ToList(); streets = ([preflop]).ToList() }  
 else if turn = None then  
  if gotToShowdown then   
   { metaData = metaData; players = players.ToList(); streets = ([preflop; flop.Value; showdown.Value]).ToList() }  
   { metaData = metaData; players = players.ToList(); streets = ([preflop; flop.Value]).ToList() }  
 /// ...skipping the turn and river parsing and processing code, you get the idea that it's a branching nightmare and it would be at least double this length.  

As you can see, it’s complicated, inelegant and difficult to modify. I was aware of this at the time, but wanted to sketch something out so that I understood the problem much better. I considered what the better approach would be for a little while, as it was evident that I was trying to solve the problem in an object oriented manner, when it wasn’t an object oriented problem. I mostly wrote this up to demonstrate the difference a small ‘a-ha!’ moment can make, and probably the first time I’d come to a much better answer through the use of this new paradigm.
The improved code is shown below:

 /// Maps a street identifier to each action.  
 let MapHand (input: List<string>) =  
  let street = ref Streets.MetaData  
  seq { for line in input do  
  if Regex.Matches(line, "Seat \d: (?<name>.+) \(\$(?<stack>\d+(\.\d{2})?) in chips\)").Count > 0 then street := Streets.Players  
  else if Regex.Matches(line, "posts small blind").Count > 0 then street := Streets.Blinds  
  else if !street = Streets.Hero then street := Streets.Preflop  
  else if Regex.Matches(line, "Dealt to (?<hero>.+)\s\[(?<leftcard>.{2})\s(?<rightcard>.{2})\]").Count > 0 then street := Streets.Hero  
  else if Regex.Matches(line, "\*\*\*\sFLOP\s\*\*\*").Count > 0 then street := Streets.Flop  
  else if Regex.Matches(line, "\*\*\*\sTURN\s\*\*\*").Count > 0 then street := Streets.Turn  
  else if Regex.Matches(line, "\*\*\*\sRIVER\s\*\*\*").Count > 0 then street := Streets.River  
  else if Regex.Matches(line, "\*\*\*\sSHOW\sDOWN\s\*\*\*").Count > 0 then street := Streets.ShowDown  
  else if Regex.Matches(line, "\*\*\*\sSUMMARY\s\*\*\*").Count > 0 then street := Streets.Summary  
  yield (!street, line) } |> Seq.toList  
 let ProcessHand (input: (Streets * string) list) =  
  let FilterByStreet lines filterStreet =  
  lines |> List.filter (fun (street, _) -> street = filterStreet)  
  |> (fun (_, action) -> action)  
  let result = ProcessResult (FilterByStreet input Streets.Summary)  
  let filtered = input |> List.filter (fun (_, action) -> FilterAction action)  
  let metaData = ProcessMetaData (FilterByStreet filtered Streets.MetaData)  
  let players = GetPlayers (FilterByStreet filtered Streets.Players)  
  let hero = GetHero ((FilterByStreet filtered Streets.Hero), players)  
  let blinds = { actions = ProcessStreet ((FilterByStreet filtered Streets.Blinds), players); streetType = Streets.Blinds }  
  let preflop = { actions = ProcessStreet ((FilterByStreet filtered Streets.Preflop), players); streetType = Streets.Preflop }  
  let flop = { actions = ProcessStreet ((FilterByStreet filtered Streets.Flop), players); streetType = Streets.Flop }  
  let turn = { actions = ProcessStreet ((FilterByStreet filtered Streets.Turn), players); streetType = Streets.Turn }  
  let river = { actions = ProcessStreet ((FilterByStreet filtered Streets.River), players); streetType = Streets.River }  
  let showdown = { actions = ProcessStreet ((FilterByStreet filtered Streets.ShowDown), players); streetType = Streets.ShowDown }  
  let summary = { actions = ProcessSummary ((FilterByStreet filtered Streets.Summary), players); streetType = Streets.Summary }  
  let streets = [blinds; preflop; flop; turn; river; showdown; summary] |> List.filter (fun x -> x.actions.Length <> 0)  
  { metaData = metaData; players = players; streets = streets; result = result }  
 /// ...back into the main processing function.  
 seq { for x in lines do yield MapHand x} |> Seq.toList  
       |> (fun hand -> try Some (ProcessHand hand) with | InvalidHand(_) -> None)  
       |> List.filter (fun processedHand -> processedHand.IsSome)  
       |> (fun hand -> hand.Value)  

In this solution, there’s a value that states the street, each street is paired to a regular expression string. It iterates through the lines of the history and yields a tuple of the line and the street. This sequence of tuples is then given to the process hand function which the code for is in the snippet above which creates values for all streets. If those streets don’t exist, the ProcessStreet function will give us an empty list, and all of the streets are added into a single list, then filtered for empty lists. It’s not purely functional, as there is a mutable variable used in processing the file, but with a recursive function the mutable value wouldn't be necessary anyway, as we’d just give the next use of the function the new value we want to use instead.
In this solution, the readability has improved and so has the extensibility, if a new street is invented in some sort of variant, it’s very simple to include without any branching.
If I’d written a solution to the above problem in C#, I’d imagine I’d have come to a similar solution eventually, but it was the awareness of the functional concepts of map and filter that were important here, and steered me back to describing what I wanted instead of writing the steps to get it myself.

Edit: I'm guessing I'm going to need to write an F# add-on to that syntax highlighter.