Wednesday, 17 April 2013

Irregular Expressions

To parse the input for my hand history parser, I considered a couple of options for reading the text. Given the dynamic nature of the file format, and despite my limited use of regular expressions up to that point, I decided they would be the best tool for the job, and they would also give me an opportunity to learn a bit more about them. You can start as simply as checking whether a string matches a pattern:

 bool match = Regex.IsMatch("exampletext", ".+");  
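As a slightly more useful sketch of the same call, IsMatch can act as a filter that picks the interesting lines out of a log before any detailed parsing happens. The LineFilter class, the IsWinLine helper and the sample lines are made up for illustration and are not part of the parser itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFilter
{
    // Hypothetical helper: true when a line contains a "won $x.xx" summary.
    public static bool IsWinLine(string line)
    {
        // whitespace, "won", whitespace, a literal dollar sign, then an amount with two decimals
        return Regex.IsMatch(line, @"\swon\s\$\d+\.\d{2}");
    }
}
```

With this, LineFilter.IsWinLine("Player1 won $1.90 with Ah Jh") is true, while LineFilter.IsWinLine("Player2 folds") is false.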

But when a line of text contains several pieces of information, you will have to stray into grouped matching. Let’s say a line contains some summary information: a player has won some amount of money at the end of a hand, and so their cards are displayed, e.g.

 "Player1 won $1.90 with Ah Jh"  

Grouped matching allows you to take all the useful information at once. Each named group, written (?<name>...), captures one piece of the line, and the values can then be read back by name:

 // Needs: using System.Globalization; using System.Text.RegularExpressions;
 string input = "Player1 won $1.90 with Ah Jh";
 string matchString = @"(?<player>[\w\s\.]+)\swon\s\$(?<amount>\d+\.\d{2})\swith\s(?<leftcard>.{2})\s(?<rightcard>.{2})";
 var match = Regex.Match(input, matchString);  // on real input, check match.Success before reading groups
 string playerName = match.Groups["player"].Value;
 // Invariant culture so that "1.90" parses the same way on every machine
 float amount = Convert.ToSingle(match.Groups["amount"].Value, CultureInfo.InvariantCulture);
 string leftCard = match.Groups["leftcard"].Value;
 string rightCard = match.Groups["rightcard"].Value;
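Real log files will contain plenty of lines that are not win summaries, so it may help to wrap the extraction in a helper that reports failure rather than throwing. The following is only a sketch: the WinLineParser class and TryParse method are hypothetical names, and the amount is read as a decimal here (my substitution for float, since money values tend to suffer from float rounding).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class WinLineParser
{
    // Same pattern as in the post, constructed once and reused for every line.
    private static readonly Regex WinLine = new Regex(
        @"(?<player>[\w\s\.]+)\swon\s\$(?<amount>\d+\.\d{2})\swith\s(?<leftcard>.{2})\s(?<rightcard>.{2})");

    // Hypothetical helper: returns false instead of throwing when the line does not match.
    public static bool TryParse(string line, out string player, out decimal amount,
                                out string leftCard, out string rightCard)
    {
        player = leftCard = rightCard = string.Empty;
        amount = 0m;

        Match m = WinLine.Match(line);
        if (!m.Success)
        {
            return false;
        }

        player = m.Groups["player"].Value;
        // Invariant culture keeps "." as the decimal point regardless of machine settings.
        amount = decimal.Parse(m.Groups["amount"].Value, CultureInfo.InvariantCulture);
        leftCard = m.Groups["leftcard"].Value;
        rightCard = m.Groups["rightcard"].Value;
        return true;
    }
}
```

Calling TryParse on "Player1 won $1.90 with Ah Jh" returns true with player "Player1", amount 1.90 and cards "Ah" and "Jh"; a line such as "Player1 folds" simply returns false.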

Let’s suppose the string gets a little more troublesome, and the log marks the player’s role at the table (dealer button, small blind, big blind) and nothing for the rest, e.g.

 "Player1 (dealer) won $1.90 with Ah Jh"  
 "Player1 (small blind) won $1.90 with Ah Jh"  
 "Player1 (big blind) won $1.90 with Ah Jh"  
 "Player1 won $1.90 with Ah Jh"  

That match string will fail for ¾ of the cases supplied above, so now we need to use optional matching. Because the role is wrapped in brackets in the log, and unescaped brackets start a group in a regular expression, the literal ones have to be escaped as \( and \).

Let’s say we start with just the first and last possibilities, dealer and normal player. After the player group we add (\(dealer\)\s|), the | allowing matching of the text on either side of it (‘(dealer) ’, or nothing extra). If it was just the small and big blind, it could be modified to have \((small|big)\sblind\)\s, and if we want to catch all of these possibilities, (\(((big|small) blind|dealer)\)\s|), giving us the full string:

 @"(?<player>[\w\s\.]+)\s(\(((big|small) blind|dealer)\)\s|)won\s\$(?<amount>\d+\.\d{2})\swith\s(?<leftcard>.{2})\s(?<rightcard>.{2})"  

Which will happily match all of my potential options.
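To check a pattern like this against the four sample lines, something along the following lines could be used. It is a sketch under a few assumptions of mine: the role appears in escaped literal brackets exactly as in the sample lines, the role text is captured in an extra named group called role, and the RoleLineParser class and TryGetRole method are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RoleLineParser
{
    // Optional "(dealer) ", "(small blind) " or "(big blind) " between the player name and "won".
    // The trailing | inside the group is the "or nothing" alternative.
    private static readonly Regex WinLine = new Regex(
        @"(?<player>[\w\s\.]+)\s(\((?<role>(big|small) blind|dealer)\)\s|)won\s\$(?<amount>\d+\.\d{2})\swith\s(?<leftcard>.{2})\s(?<rightcard>.{2})");

    // Hypothetical helper: false when the line does not match at all;
    // role is an empty string when the player has no special role.
    public static bool TryGetRole(string line, out string player, out string role)
    {
        player = role = string.Empty;

        Match m = WinLine.Match(line);
        if (!m.Success)
        {
            return false;
        }

        player = m.Groups["player"].Value;
        // A named group that took no part in the match has an empty Value.
        role = m.Groups["role"].Value;
        return true;
    }
}
```

For the four sample lines this gives the player "Player1" each time, with the role "dealer", "small blind", "big blind" and an empty string respectively.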