- 20 videos, 126 minutes total
The original program was one of the first things I ever produced in Haskell, and it looks very different from something I’d write today. In this lesson, I walk through the process of cleaning it up, thinking carefully about what makes a well-designed program.
A few of the general ideas covered here:
- Splitting up long definitions
- Separating pure functions from I/O
- Avoiding the use of partial functions
- Printing aggregated error information
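As a small illustration of the third point: Prelude's minimum throws an exception when it is given an empty list, and the code shown later in this lesson calls minimum on julieDays. A total variant reports the empty case in its return type instead. The name minimumMaybe is hypothetical, and this is only a sketch, not necessarily the approach the lesson ends up taking.

```haskell
-- Hypothetical helper: a total alternative to Prelude's partial `minimum`.
-- The empty case becomes Nothing instead of a runtime exception.
minimumMaybe :: Ord a => [a] -> Maybe a
minimumMaybe [] = Nothing
-- Non-empty: fold with the head as the starting value, so no partial call is needed.
minimumMaybe (x : xs) = Just (foldr min x xs)
```

A caller then has to decide explicitly what an empty input means, for example by printing an error message rather than crashing.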
Semigroup in Prelude
This program was written in 2016 and it is now 2020, so I anticipate that it may need some small adjustments to bring it up to date with the latest libraries. Fortunately, it does all still compile.
The only thing I see when I load this code into GHCi with -Wall is one warning:
    warning: [-Wunused-imports]
        The import of ‘Data.Monoid’ is redundant
       |
    13 | import Data.Monoid ((<>))
       | ^^^^^^^^^^^^^^^^^^^^^^^^^
I had imported the <> operator because it was not yet in Prelude at the time. As of GHC 8.4 in 2018, <> is in Prelude, so we can remove this import.
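For instance, a module like the following should compile on GHC 8.4 or later with no Data.Monoid import at all. The helper label is hypothetical; it just builds the same "bin count" line format that the program prints.

```haskell
-- No `import Data.Monoid ((<>))` here: since GHC 8.4 (base-4.11),
-- Prelude exports the Semigroup class together with its (<>) operator.

-- Hypothetical helper producing a "bin count" output line.
label :: Int -> Int -> String
label bin count = show bin <> " " <> show count
```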
Parts of main
To be honest, I can’t immediately tell what this code is doing. I think the biggest problem is that nearly all of it is in one big main definition, which I attribute to the indiscretions of my youth. The first thing I do when I find something like this is start to break it up into smaller definitions.
I do at least remember that this program follows a classic three-step pattern: read some stuff from a file, interpret the data, and print the results. So I would like main to be built from exactly those three steps.
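A minimal sketch of that three-step shape, assuming hypothetical names (Report, interpret, render, program) and a stand-in interpretation step that merely counts lines. Reading and printing stay in IO, while interpreting the data and formatting the result are pure functions.

```haskell
-- Hypothetical sketch of a read / interpret / print program.
-- Counting lines is only a stand-in for the real interpretation step.

-- The result of interpreting the input: a plain value, not an action.
newtype Report = Report Int
  deriving (Eq, Show)

-- Pure interpretation of the file contents.
interpret :: String -> Report
interpret = Report . length . lines

-- Turning the result into output text is pure as well.
render :: Report -> String
render (Report n) = "lines: " <> show n <> "\n"

-- Only reading the file and printing the output involve I/O.
program :: FilePath -> IO ()
program path = do
  contents <- readFile path          -- 1. read some stuff from a file
  let report = interpret contents    -- 2. interpret the data (pure)
  putStr (render report)             -- 3. print the results
```

In a real module, main would then be little more than program "tweets.csv", and the pure middle step can be tested without touching the file system.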
Unfortunately, the program as I had written it doesn’t decompose this way. Look at what I had done:
    -- ...
    in
      sequence_ $ do
        bin <- bins
        let count =
              julieDays
                & mfilter (liftA2 (&&) (>= bin) (< nextBin bin))
                & length
        return $ putStrLn $ show bin <> " " <> show count
This never produces a value that represents the output. Instead, what we have here is an imperative-style loop that prints each line of output as it goes: the data processing and the output printing are intertwined. So I’m going to abandon this attempt to simplify main for the moment, and hope I can come back around to it eventually.
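One possible way to untangle a loop like this, sketched under assumptions: first compute a plain value (a list of (bin, count) pairs), then render that value to a String, and print it once. The names countInBin, binCounts and renderCounts are hypothetical; bins and nextBin become parameters because their definitions are elided above, and a list stands in for whatever container julieDays really is.

```haskell
-- Hypothetical names: countInBin, binCounts, renderCounts.
-- `nextBin` and `bins` are parameters because their real definitions are
-- elided in the lesson; plain lists stand in for the real container.

-- How many values fall in the half-open interval [bin, nextBin bin).
countInBin :: Ord a => (a -> a) -> [a] -> a -> Int
countInBin nextBin xs bin =
  length (filter (\x -> x >= bin && x < nextBin bin) xs)

-- Pure step: one (bin, count) pair per bin, in the order of `bins`.
binCounts :: Ord a => (a -> a) -> [a] -> [a] -> [(a, Int)]
binCounts nextBin bins xs = [(bin, countInBin nextBin xs bin) | bin <- bins]

-- Rendering step: the same "bin count" line format the loop printed.
renderCounts :: Show a => [(a, Int)] -> String
renderCounts = unlines . map (\(bin, count) -> show bin <> " " <> show count)
```

The only I/O left would then be a single call such as putStr (renderCounts (binCounts nextBin bins julieDays)), and binCounts can be checked on its own.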
I do think that everything that follows Right rows -> in the definition of main is begging to be written as its own top-level function.
    main = do
      bs <- Bs.readFile "tweets.csv"
      let parsed = (Csv.decode Csv.HasHeader bs) :: Either String (Vector [Text])
      case parsed of
        Left err -> putStrLn err
        Right rows -> processDataAndPrintOutput rows

    processDataAndPrintOutput rows =
      let
        julieDays = findJulieDays rows
        firstDay = minimum julieDays
        -- ...
I have given it an awkwardly long name to reflect my irritation that it does two things.
GHCi provides the type signature for the new function:
But I’m going to simplify it because I know that m is the Vector of tweets that we get from parsing the CSV file.
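The signature GHCi inferred isn't shown here, so the following only illustrates the general move. A definition that applies functions such as mfilter and length to an unspecified container gets a type with a type variable for that container plus class constraints; pinning the variable to one concrete type leaves the body unchanged but gives a simpler signature. Given the annotation on parsed, the concrete argument type for processDataAndPrintOutput would presumably be Vector [Text]. In this self-contained sketch a list stands in for Vector (which would need the vector package), and both function names are hypothetical.

```haskell
import Control.Monad (MonadPlus, mfilter)

-- Hypothetical helper with the kind of general type GHC infers when a
-- definition uses `mfilter` and `length` on an unspecified container `m`.
countAtLeastGeneral :: (Foldable m, MonadPlus m, Ord a) => a -> m a -> Int
countAtLeastGeneral lo xs = length (mfilter (>= lo) xs)

-- The same body with `m` pinned to one concrete container. A list stands
-- in for Vector here so the sketch needs only `base`.
countAtLeast :: Ord a => a -> [a] -> Int
countAtLeast lo xs = length (mfilter (>= lo) xs)
```

The specialised version gives up generality that the program never uses, in exchange for a signature that says plainly what the function operates on.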