- The types
- Generating the histogram
- Finding “Julie days”
- Generating ranges
- Counting tweets within a range
- Printing the histogram
- A difference in our histograms
- 57 minutes
For our four-year “anniversary” of becoming Twitter pals, I decided to see how this program looks written with a different library. I chose the sv library, which didn’t even exist when Chris wrote the first version of this program. Twitter no longer provides your archives in CSV format, and I don’t have an archive of my own tweets that is old enough to be in CSV format, so I couldn’t analyze my own tweets with this program. For those reasons, I used Chris’s old tweet archive, and my goal was to produce a histogram that matched Chris’s original.
This is my first time writing a CSV-processing program. I had, of course, read Chris’s original code before I started writing this, but to be quite honest, I found it extremely difficult to read and understand. I have never been a Scala or Java programmer, and I don’t think in terms of one big main where all the action happens. It’s difficult for me to read such programs, and it’s nearly impossible for me to write them. So, what I have done here has ended up being quite different from his original program, and even fairly different from his refactored program. I didn’t read his refactored program before I wrote this, at least not until near the end, and I was surprised to find that, despite how different so much of our programs look, some of it looks exactly the same.
I was very happy with my decision to use the sv library here. It’s not really a CSV-parsing library; it’s a set of combinators and wrappers around a CSV-parsing library. It uses a library called hw-dsv for the parsing, and there is an sv-cassava package that provides the sv set of combinators and types but uses cassava for the parsing. As such, I’m not going to be discussing how the parsing gets done at all, instead focusing on using the sv library.
I started off the project with two modules. For a bigger project, I likely would have wanted more, but it’s not always clear to me how many I want from the start, so I usually divide things up later. However, at a minimum I want to follow Haskell’s example and keep my IO separate from my pure functions. I called my two modules Main and Parse. Main has the main executable and imports Parse. The latter probably isn’t a very good name for this module, but it’s fine.
The Main module also contains a couple of supporting definitions for the main executable. The sv library supports other delimiter-separated value file types and can work with or without headers, so the first things I added to Main were those options, along with the necessary imports.
The default parse options there are for comma-separated values with headers, which is perfect for the Twitter data we’re working with.
Next, I chose to define a variable for the filepath.
We’re not making that file available. Change the file path appropriately if you’re following along with some other Twitter data.
Then comes the main parsing function from sv, parseDecodeFromFile, which, according to the documentation, loads a file, parses it, and decodes it. By decode they seem to mean the process of turning the parsed CSV into “a list of your Haskell datatype.” Although sv offers some other parse functions for different situations, this seems to have the basic functionality we’re looking for. It takes three arguments: a decoder, some parse options (defined above as opts), and a file path (defined above as file). It returns an m (DecodeValidation ByteString [a]); the m is constrained by MonadIO and I made it concrete as IO. So, my main parsing function looks like this.
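A minimal sketch of how the options, the file path, and the parsing call could fit together, assuming the Data.Sv and Data.Sv.Decode modules of the sv package. The path tweets.csv and the name decodeFile are hypothetical, and the decoder is taken as an argument here only because tweetsDecoder lives in the other module. It is meant to be loaded into GHCi rather than compiled as an executable.

```haskell
import Data.ByteString (ByteString)
import Data.Sv (Decode', ParseOptions, defaultParseOptions, parseDecodeFromFile)
import qualified Data.Sv.Decode as D

-- The defaults: comma-separated values with a header row.
opts :: ParseOptions
opts = defaultParseOptions

-- Hypothetical path; change it to wherever your own data lives.
file :: FilePath
file = "tweets.csv"

-- parseDecodeFromFile with its MonadIO-constrained m made concrete as IO.
-- The decoder is a parameter in this sketch (an assumption, so that the
-- block stands alone without the Tweet type or tweetsDecoder).
decodeFile :: Decode' ByteString a -> IO (D.DecodeValidation ByteString [a])
decodeFile decoder = parseDecodeFromFile decoder opts file
```

With a real decoder in scope, decodeFile tweetsDecoder would have the type IO (DecodeValidation ByteString [Tweet]).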
The decoder, here called tweetsDecoder, is something I have to provide, and it amounts to specific instructions for how to read each field into my Tweet datatype. I wrote that in the other module, along with the Tweet type that this will make a list of.
If you’re already familiar with the Validation type, then you may be wondering whether DecodeValidation is a reference to that, and it is! DecodeValidation is a type synonym for Validation, so it shares the same Applicative instance. We have written about the Validation type and its Applicative previously. I love working with the Validation type, so I was pretty pleased with this. You can see in Chris’s writeup that the equivalent part of his program returns an Either, so in his main, he’s case matching on Left and Right, but since I’m using a Validation type, mine will have Failure and Success for its two cases. It ended up not having any practical ramifications, because I never ended up having any errors, but, nevertheless, I always appreciate that, if I did, Validation has the ability to tell me all of the errors in one error message, instead of only the first one it failed on.
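To make the difference between the two Applicative instances concrete, here is a small sketch that needs only base. The Validation type below is a stand-in written for this example (the real one comes from the validation package), and checkId, checkText, viaValidation, and viaEither are hypothetical names that have nothing to do with sv itself.

```haskell
import Data.Char (isDigit)

-- A stand-in for Validation, defined locally so the sketch is self-contained.
data Validation e a = Failure e | Success a
  deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Unlike Either, two failures are combined with (<>) instead of
-- stopping at the first one.
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

toEither :: Validation e a -> Either e a
toEither (Failure e) = Left e
toEither (Success a) = Right a

-- Hypothetical field checks: an id must be all digits, text must be non-empty.
checkId :: String -> Validation [String] Int
checkId s
  | not (null s) && all isDigit s = Success (read s)
  | otherwise = Failure ["bad id: " ++ show s]

checkText :: String -> Validation [String] String
checkText s
  | null s = Failure ["empty text"]
  | otherwise = Success s

-- Both checks run, and every error is kept.
viaValidation :: String -> String -> Validation [String] (Int, String)
viaValidation i t = (,) <$> checkId i <*> checkText t

-- The same checks through Either: the first Left wins.
viaEither :: String -> String -> Either [String] (Int, String)
viaEither i t = (,) <$> toEither (checkId i) <*> toEither (checkText t)
```

In GHCi, viaValidation "abc" "" reports both problems in a single Failure, while viaEither "abc" "" returns a Left containing only the bad id message.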
OK, so my first iteration of main looked like this.
The Failure case won’t ever need to change, I think; all it does is tell me there was a failure, print a list of the errors, and then exit. The Success case, on the other hand, changed often as I worked through the program because I changed it each time to “test” various functions that I wrote. Once I had some basic decoding functions in place and the tweetsDecoder function working, I could, for example, print a tweet record by indexing into the list.
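As a rough sketch of that two-case shape (not the exact code from this program), the handling of the decoding result could be written as a function over any Validation, assuming the Data.Validation module from the validation package. The name report is hypothetical, and the sketch pattern matches on the list instead of indexing with (!!) so that an empty result doesn’t crash.

```haskell
import Data.Validation (Validation (..))
import System.Exit (exitFailure)

-- Handle the result of decoding: report every error and exit on Failure,
-- or show the first decoded record on Success.
report :: (Show e, Show a) => Validation e [a] -> IO ()
report (Failure errs) = do
  putStrLn "Failed to parse and decode:"
  print errs
  exitFailure
report (Success rows) =
  case rows of
    []          -> putStrLn "No rows decoded"
    (first : _) -> print first  -- swap this line to inspect other things
```

Since DecodeValidation is a synonym for Validation, the value that parseDecodeFromFile produces can be passed straight to a function like this, and the Success branch is the part to keep editing while exploring in GHCi.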
I do this a lot, and then usually run main in GHCi, because it helps me see what I’m doing. I need to see what the outputs of different steps look like; I need to see what I’m working with. So, while ghcid is useful and I do keep it running to keep the fast typechecking going, I also run main in GHCi a lot. In this case, I haven’t told you what tweetsDecoder looks like yet, so for now it doesn’t work, but it will soon!