- Setup
- The types
- Decoders
- Generating the histogram
- Finding “Julie days”
- Generating ranges
- Counting tweets within a range
- Printing the histogram
- A difference in our histograms
- 57 minutes
For our four-year “anniversary” of becoming Twitter pals, I decided to see how this program looks written with a different library. I chose the sv library (available on Hackage), which didn’t even exist when Chris wrote the first version of this program. Twitter no longer provides your archives in CSV format, and I don’t have an archive of my own tweets that is old enough to be in CSV format, so I couldn’t analyze my own tweets with this program. For those reasons, I used Chris’s old tweet archive, and my goal with this program was to produce a histogram that matched Chris’s original.
This is my first time writing a CSV-processing program. I had, of course, read Chris’s original code before I started writing this, but to be quite honest, I found it extremely difficult to read and understand. I have never been a Scala or Java programmer, and I don’t think in terms of one big main where all the action happens. It’s difficult for me to read such programs, and it’s nearly impossible for me to write them. So what I have done here has ended up being quite different from his original program, and even fairly different from his refactored program. I didn’t read his refactored program until I was nearly finished writing this, and I was surprised to find that, despite how different so much of our programs look, some parts look exactly the same.
I was very happy with my decision to use the sv library here. It’s not really a CSV-parsing library; it’s a set of combinators and wrappers around a CSV-parsing library. It uses a library called hw-dsv (on Hackage) for the parsing, and there is an sv-cassava package (also on Hackage) that provides the sv set of combinators and types but uses cassava for the parsing. As such, I’m not going to be discussing how the parsing gets done at all, instead focusing on using the sv package.
Setup
I started off the project with two modules. For a bigger project, I likely would have wanted more, but it’s not always clear to me how many I want from the start, so I usually divide things up later. However, at a minimum I want to follow Haskell’s example and keep my IO separate from my pure functions. I called my two modules Main and Parse, where Main has the main executable and imports Parse. The latter probably isn’t a very good name for this module, but it’s fine.
The Main module also contains a couple of supporting definitions for the main executable. The sv library supports other delimiter-separated value file types and can work with or without headers, so the first things I added to Main were those options, along with the necessary imports.
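A minimal sketch of what such an options definition could look like, assuming the sv package's top-level Data.Sv module and its ParseOptions and defaultParseOptions exports. The name opts is the one this post uses later; the explicit import list is only an illustrative choice.

```haskell
-- Assumption: both names are exported by sv's top-level Data.Sv module.
import Data.Sv (ParseOptions, defaultParseOptions)

-- Parse options for the tweet archive, left at the library defaults.
opts :: ParseOptions
opts = defaultParseOptions
```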
The default parse options there are for comma-separated values with headers, which is perfect for the Twitter data we’re working with.
Next, I chose to define a variable for the filepath.
We’re not making that file available. Change the file path appropriately if you’re following along with some other Twitter data.
Then comes the main parsing function from sv: parseDecodeFromFile, which, according to the documentation, loads a file, parses it, and decodes it. By decode they seem to mean the process of turning the parsed CSV into “a list of your Haskell datatype.” Although sv offers some other parse functions for different situations, this one has the basic functionality we’re looking for. It takes three arguments: a decoder, some parse options (defined above as opts), and a file path (defined above as file). It returns an m (DecodeValidation ByteString [a]); the m is constrained by MonadIO, and I made it concrete as IO. So, my main parsing function looks like this.
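Here is a hedged sketch of how such a function can be assembled from the three arguments just described. The Tweet type and tweetsDecoder below are hypothetical one-field stand-ins so that the block is complete on its own (the real ones belong to the Parse module), the file path is a placeholder, and the name tweets is an assumption made for this sketch.

```haskell
import Data.ByteString (ByteString)
import Data.Sv
  ( Decode'
  , DecodeValidation
  , ParseOptions
  , defaultParseOptions
  , parseDecodeFromFile
  )
import qualified Data.Sv.Decode as D

-- Hypothetical stand-in: a "tweet" that holds one raw field.
newtype Tweet = Tweet ByteString
  deriving Show

-- Hypothetical stand-in decoder: takes a field's contents undecoded.
tweetsDecoder :: Decode' ByteString Tweet
tweetsDecoder = Tweet <$> D.contents

-- Comma-separated values with a header row (the library defaults).
opts :: ParseOptions
opts = defaultParseOptions

-- Placeholder path; point this at your own archive.
file :: FilePath
file = "tweets.csv"

-- The MonadIO-constrained m from the library's type is fixed to IO here.
tweets :: IO (DecodeValidation ByteString [Tweet])
tweets = parseDecodeFromFile tweetsDecoder opts file
```

Fixing m to IO in the signature keeps the later case expression in main simple, since the result can be bound directly in a do block.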
The decoder, here called tweetsDecoder, is something I have to provide, and it amounts to specific instructions for how to read each field into my Tweet datatype. I wrote that in the other module, along with the Tweet type that this will make a list of.
If you’re already familiar with the Validation type, then you may already wonder if DecodeValidation is a reference to that, and it is! DecodeValidation is a type synonym for Validation, so it shares the same Applicative instance. (We have written about the Validation type and its Applicative previously.) I love working with the Validation type, so I was pretty pleased with this. You can see in Chris’s writeup that the equivalent part of his program returns an Either, so in his main, he’s case matching on Right and Left, but since I’m using a Validation type, mine will have Success and Failure for its two cases. It ended up not having any practical ramifications, because I never ended up having any errors, but, nevertheless, I always appreciate that, if I did, Validation has the ability to tell me all of the errors in one error message, instead of only the first one it failed on.
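To make that difference concrete, here is a small sketch that needs only base. It defines a local stand-in for Validation (the real type comes from the validation package), and the two field checks are hypothetical examples, not part of the tweet program.

```haskell
-- Local stand-in for the Validation type, so this sketch needs only base.
data Validation e a = Failure e | Success a
  deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- When both sides fail, the errors are combined with (<>).
-- That is what allows every failure to be reported at once.
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical check: the field must be a non-empty run of digits.
checkId :: String -> Validation [String] Int
checkId s
  | not (null s) && all (`elem` ['0' .. '9']) s = Success (read s)
  | otherwise = Failure ["bad id: " ++ show s]

-- Hypothetical check: the field must not be empty.
checkText :: String -> Validation [String] String
checkText s
  | null s    = Failure ["empty text"]
  | otherwise = Success s

-- Convert to Either, whose Applicative stops at the first Left.
toEither :: Validation e a -> Either e a
toEither (Failure e) = Left e
toEither (Success a) = Right a

-- Both checks fail, and both messages are kept.
bothV :: Validation [String] (Int, String)
bothV = (,) <$> checkId "abc" <*> checkText ""

-- The same two failing checks, but only the first message survives.
bothE :: Either [String] (Int, String)
bothE = (,) <$> toEither (checkId "abc") <*> toEither (checkText "")
```

Evaluating bothV in GHCi gives a Failure carrying two messages, while bothE gives a Left carrying only the first.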
OK, so my first iteration of main looked like this.
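A sketch of the shape such a main can take, assuming the Validation constructors from the validation package's Data.Validation module. The helper name mainWith, the wording of the failure message, and the Success branch (which only prints a count) are placeholders for illustration. The decoding action is passed in as an argument so that the block does not depend on any other definition.

```haskell
import Data.Validation (Validation (Failure, Success))
import System.Exit (exitFailure)

-- Run a decoding action and branch on its two possible outcomes.
mainWith :: (Show e, Show a) => IO (Validation e [a]) -> IO ()
mainWith decodeAll = do
  result <- decodeAll
  case result of
    -- Report that decoding failed, show the errors, and exit.
    Failure e -> do
      putStrLn "Failed to parse and decode:"
      print e
      exitFailure
    -- Placeholder: this is the branch that gets swapped out
    -- while trying different functions on the decoded list.
    Success a ->
      print (length a)
```

With a parsing action of type IO (DecodeValidation ByteString [Tweet]), main would then be mainWith applied to that action.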
The Failure case won’t ever need to change, I think; all it does is tell me there was a failure, print a list of the errors, and then exit. The Success case, on the other hand, changed often as I worked through the program, because I changed it each time to “test” various functions that I wrote. Once I had some basic decoding functions in place and the tweetsDecoder function working, I could, for example, print a tweet record by indexing into the list:
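As a hedged sketch of that kind of check: the direct form is print (a !! 0) in the Success branch, which throws if the list is shorter than the index. The helper names below are hypothetical and only show a variant that handles a missing index without crashing.

```haskell
-- Look up the record at a zero-based position, if there is one.
recordAt :: Int -> [a] -> Maybe a
recordAt i xs
  | i < 0 = Nothing
  | otherwise =
      case drop i xs of
        (x : _) -> Just x
        []      -> Nothing

-- Print the record at that position, or a note when it is missing.
printRecordAt :: Show a => Int -> [a] -> IO ()
printRecordAt i xs =
  maybe (putStrLn ("no record at index " ++ show i)) print (recordAt i xs)
```

In the Success branch this would be called as printRecordAt 0 a.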
I do this a lot, and then usually run main in GHCi, because it helps me see what I’m doing. I need to see what the outputs of different steps look like; I need to see what I’m working with. So, while ghcid is useful and I do keep it running to keep the fast typechecking going, I also run main in GHCi a lot. In this case, I haven’t told you what the tweetsDecoder looks like yet, so for now it doesn’t work, but it will soon!