My company deals with file parsing quite a bit, and I've been playing around with F# to see if I could write a better file parsing library using it instead of what we currently use in the C# world. This experimentation led me to F# Active Patterns.
F#, along with other languages, has a concept of Pattern Matching. Pattern matching allows a programmer to transform data by matching patterns in the shape of the data automatically, without a lot of if-then-else branching logic.
Pattern matching, using the match keyword, functions a lot like a switch statement in C#. A typical pattern matching statement in F# would look something like:
    let filter1 x =
        match x with
        | 1 -> printfn "The value is 1."
        | _ -> printfn "The value is not 1."

    filter1 5 // The value is not 1.
    filter1 1 // The value is 1.

This code defines a function, filter1, that takes an integer. It checks whether the integer is the constant value 1. If it is, it prints a message telling us it is 1. If it isn't, it tells us the value is not 1. This code illustrates a constant pattern, and there are many additional options. The other options are described in the F# language documentation on pattern matching.
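To give a feel for a few of those other options, here is a minimal sketch that combines tuple, variable, and wildcard patterns with a guard. The describe function and its messages are made up for illustration.

```fsharp
// Hypothetical helper illustrating several built-in pattern kinds.
let describe (point: int * int) =
    match point with
    | (0, 0) -> "origin"                          // constant patterns inside a tuple pattern
    | (x, 0) -> sprintf "on the x-axis at %d" x   // variable pattern binds x
    | (0, y) -> sprintf "on the y-axis at %d" y   // variable pattern binds y
    | (x, y) when x = y -> "on the diagonal"      // guard adds an extra condition
    | _ -> "somewhere else"                       // wildcard catches everything else

printfn "%s" (describe (3, 0)) // on the x-axis at 3
```

The cases are tried from top to bottom, so the more specific patterns go first.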
The example above, like the other basic patterns, is great, but it is not enough for some of my file parsing needs. Many of the files that we need to parse are fixed-width ASCII text files. Parsing the data requires defining fields in a particular order, each with a specified size. This aspect is unavoidable, but I wanted to find an elegant way to approach the problem.
This is where I found Active Patterns. Active Patterns allow you to define a named pattern and apply it in a match statement. This pattern will match and parse the data if defined correctly. It is also the mechanism for matching with regular expressions, which lend themselves well to my file parsing problem. An Active Pattern looks like this:
    open System.Text.RegularExpressions

    // Partial active pattern: yields the domain when the input looks like an email address.
    let (|EmailMatchActivePattern|_|) (input: string) =
        let m = Regex.Match(input, "(.*)@(.*)")
        if m.Success then Some m.Groups.[2].Value else None

This construct creates an Active Pattern named EmailMatchActivePattern that takes the input to work against. Inside the definition, I'm using a regex for a basic email pattern, (.*)@(.*). If the regex matches successfully, I return the second group wrapped in Some; otherwise I return None, which tells the match expression that the pattern did not apply. (The indexing starts at 0, but index 0 contains the entire text that the regex matched. The following indices contain the captured groups from the regex.)
Now that I have a named pattern defined, I can use it in a normal pattern matching scenario to match a line from a file and parse the data in that line. I do that with code like:
    let parseLine line =
        match line with
        | EmailMatchActivePattern (domainName) -> printfn "This is the domain in the email %s" domainName
        | _ -> printfn "Not an email address."

This code parses a line of text to find an email address. If the line matches, it parses the domain portion of the email address into domainName. I can then reference domainName in the subsequent function call (a print statement). This code matches and parses the data at the same time.
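The same idea can be generalized so that the regular expression is passed in as an argument instead of being hard-coded. The following is only a sketch: the RegexGroups pattern, the describeEmail function, and the stricter regex are illustrative names and choices, not part of the code above.

```fsharp
open System.Text.RegularExpressions

// Hypothetical parameterized partial active pattern: the regex is an argument,
// and a successful match yields the captured groups (skipping group 0) as a list.
let (|RegexGroups|_|) (pattern: string) (input: string) =
    let m = Regex.Match(input, pattern)
    if m.Success then
        Some [ for g in Seq.skip 1 (Seq.cast<Group> m.Groups) -> g.Value ]
    else
        None

// The list pattern [ user; domain ] only matches when exactly two groups were captured.
let describeEmail line =
    match line with
    | RegexGroups @"^(.+)@(.+)$" [ user; domain ] -> sprintf "user=%s domain=%s" user domain
    | _ -> "not an email address"

printfn "%s" (describeEmail "jane@example.com") // user=jane domain=example.com
```

With this shape, each file format could supply its own regular expressions while reusing a single active pattern definition.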
I took this concept further to explore the file parsing that I described at the beginning. In this exploration, I read lines from a file, match the first two characters of the line to determine a record type, and then parse the line into tuples with individual values. This construct allowed me to define my file format as a couple of predefined regular expressions and to match the records and parse them at the same time. The source code for this is below.
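As a minimal sketch of that approach (not the original listing), assume a made-up fixed-width format in which the first two characters identify the record type: "HD" followed by an 8-digit date is a header, and "DT" followed by a 10-character name and a 6-digit amount is a detail record. The record codes, field widths, and function names are all assumptions for illustration.

```fsharp
open System.Text.RegularExpressions

// Hypothetical fixed-width layout (assumed for this sketch):
//   "HD" + 8-digit date            -> header record
//   "DT" + 10-char name + 6 digits -> detail record
let (|Header|_|) (line: string) =
    let m = Regex.Match(line, @"^HD(\d{8})$")
    if m.Success then Some m.Groups.[1].Value else None

let (|Detail|_|) (line: string) =
    let m = Regex.Match(line, @"^DT(.{10})(\d{6})$")
    if m.Success then
        // Trim the padding from the name and convert the amount to an int.
        Some (m.Groups.[1].Value.TrimEnd(), int m.Groups.[2].Value)
    else
        None

// Match the record type and parse its fields in one step.
let parseRecord line =
    match line with
    | Header date -> sprintf "header dated %s" date
    | Detail (name, amount) -> sprintf "detail %s: %d" name amount
    | _ -> "unknown record"

[ "HD20240131"; "DTWidget    000125"; "ZZ???" ]
|> List.iter (parseRecord >> printfn "%s")
```

Each regular expression encodes both the record type prefix and the field widths, so the file format lives in one place and the match expression stays short.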
Active Patterns is an F# construct that can be used to dynamically match and parse data. This is a powerful construct that can be used to build tools to parse files and other data structures with a minimal amount of code.