- Semigroup in Prelude
- Parts of main
- Process-and-print
- Specialization
- Where
- Count and filter
- Range type
- A list of ranges
- Process, then print
- With input
- Read, decode, fail
- Day parsing
- Crashing concern
- Interpret, filter, group
- Zipping ranges with numbers
- Error aggregation
- Failure values
- Using the header
- Final result
- 20 videos, 126 minutes total
The original program was one of the first things I ever produced in Haskell, and it looks very different from something I’d write today. In this lesson, I walk through the process of cleaning it up, thinking carefully about what makes a well-designed program.
A few of the general ideas covered here:
- Splitting up long definitions
- Separating pure functions from I/O
- Avoiding the use of partial functions
- Printing aggregated error information
Semigroup in Prelude
This program was written in 2016 and it is now 2020, so I anticipate that it may need some small adjustments to bring it up to date with the latest libraries. Fortunately, it does all still compile.
The only thing I see when I load this code into GHCi with -Wall is one warning:
warning: [-Wunused-imports]
    The import of ‘Data.Monoid’ is redundant
   |
13 | import Data.Monoid ((<>))
   | ^^^^^^^^^^^^^^^^^^^^^^^^^
I had imported the <> operator because it was not yet in Prelude at the time. As of GHC 8.4 in 2018, <> is now in Prelude, so we can remove this import.
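As a small illustration, assuming GHC 8.4 or later, the following compiles with no import at all. The helper formatLine is a hypothetical name, shaped like the output lines this program prints; it is not taken from the original code.

```haskell
-- Assumes GHC 8.4 or later (base 4.11+), where Prelude exports (<>).
-- No import of Data.Monoid or Data.Semigroup is needed.

-- Hypothetical helper: joins a label and a count with a space.
formatLine :: String -> Int -> String
formatLine label count = label <> " " <> show count
```

On an older compiler the same definition would fail with a "not in scope" error for <>, which is why the import was there in 2016.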
Parts of main
To be honest, I can’t immediately tell what this code is doing. I think the biggest problem is that nearly all of it is in one big main definition, which I attribute to the indiscretions of my youth. The first thing I do when I find something like this is start to break it up into smaller definitions.
I do at least remember that this program follows a classic three-step pattern: read some stuff from a file, interpret the data, and print the results. So this is what I want main to look like:
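The snippet for this step is not reproduced in this text. As a rough sketch of the three-step shape only, with every name hypothetical (readInput, interpret, render, run) and a toy line-counting task standing in for the real tweet processing, it might look something like this:

```haskell
import Data.List (group, sort)

-- Step 1 (hypothetical): read the raw input. This is the only I/O on the way in.
readInput :: FilePath -> IO String
readInput = readFile

-- Step 2 (hypothetical): a pure interpretation of the data.
-- Here: count how many times each distinct line occurs.
interpret :: String -> [(String, Int)]
interpret input = [ (x, length g) | g@(x : _) <- group (sort (lines input)) ]

-- Step 3a (hypothetical): render the result as text, still pure.
render :: [(String, Int)] -> String
render = unlines . map (\(key, n) -> key <> " " <> show n)

-- Step 3b: the top-level action just chains the three steps together.
run :: FilePath -> IO ()
run path = do
  input <- readInput path
  putStr (render (interpret input))
```

The point of the shape is that interpret and render can be tested in GHCi without touching a file.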
Unfortunately, the program as I had written it doesn’t decompose this way. Look at what I had done:
  -- ...
  in sequence_ $ do
       bin <- bins
       let count = julieDays
             & mfilter (liftA2 (&&) (>= bin) (< nextBin bin))
             & length
       return $ putStrLn $ show bin <> " " <> show count
This doesn’t ever produce a value that represents the output. Instead what we have here is an imperative-style loop that prints each line of output as it goes. The data processing and the output printing are intertwined. So I’m going to abandon this attempt to simplify main for the moment, and hope I can come back around to it eventually.
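To make the problem concrete, here is a minimal sketch of what a separated version could look like. It is not the refactoring this lesson arrives at. The names inRange, binCounts, and renderCounts are hypothetical, plain lists stand in for whatever container julieDays really is, and the next argument plays the role of nextBin. The expression liftA2 (&&) p q combines two predicates into one that holds when both do.

```haskell
import Control.Applicative (liftA2)

-- Half-open range test [lo, hi), using the same liftA2 (&&) trick
-- as the original loop to combine two predicates.
inRange :: Ord a => a -> a -> a -> Bool
inRange lo hi = liftA2 (&&) (>= lo) (< hi)

-- Pure step (hypothetical): pair each bin with the number of values in it.
-- 'next' is assumed to give the start of the following bin.
binCounts :: Ord a => (a -> a) -> [a] -> [a] -> [(a, Int)]
binCounts next bins xs =
  [ (bin, length (filter (inRange bin (next bin)) xs)) | bin <- bins ]

-- Pure step (hypothetical): one output line per bin, same format as before.
renderCounts :: Show a => [(a, Int)] -> [String]
renderCounts = map (\(bin, count) -> show bin <> " " <> show count)
```

With these in place, printing would be a single separate step, along the lines of mapM_ putStrLn (renderCounts (binCounts nextBin bins julieDays)), assuming julieDays were a list.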
Process-and-print
I do think that everything that follows Right rows -> ... in the definition of main is begging to be written as its own top-level function.
main = do
  bs <- Bs.readFile "tweets.csv"
  let parsed = (Csv.decode Csv.HasHeader bs)
        :: Either String (Vector [Text])
  case parsed of
    Left err -> putStrLn err
    Right rows ->
      processDataAndPrintOutput rows

processDataAndPrintOutput rows =
  let julieDays = findJulieDays rows
      firstDay = minimum julieDays
  -- ...
I have given it an awkwardly long name to reflect my irritation that it does two things.
GHCi provides the type signature for the new function:
But I’m going to simplify it because I know that m is the Vector of tweets that we get from parsing the CSV file.