Data classes

Python classes are “a means of bundling data and functionality together” (Classes, in The Python Tutorial). Data classes are a recent addition to Python (they first appeared in Python 3.7, released in June 2018) that place more emphasis on the data than on the functionality. This invites a pleasant comparison with Haskell datatypes, which exhibit a more distinct separation between data and functionality.

The following example defines a type representing a point in three dimensions as the conjunction of three integers:
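A minimal sketch of such a class. The field names x, y, and z are an assumption made for illustration:

```python
from dataclasses import dataclass


# A point in three-dimensional space: the conjunction of three integers.
# The field names x, y, z are assumed for illustration.
@dataclass
class Point:
    x: int
    y: int
    z: int
```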

This class looks and behaves quite similarly to the following Haskell datatype defined using record syntax:

Data classes can also have all the complexities of regular Python classes, and the dataclass decorator is a good deal more feature-packed than we discuss here. We will focus on the primary purpose of data classes: the field list and the special methods that are generated based on the field list.

Init

__init__ is a special method (special methods are sometimes called “dunder” methods, for the double underscores in their names) that initializes the instance; it is called when a value is first constructed. For the Point class in the example above, the dataclass decorator generates the following method:
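For a class this simple, the generated method behaves roughly like the following hand-written version. This is a sketch of what the decorator produces, not its literal output, and the class name HandWrittenPoint is hypothetical, chosen only to keep it apart from the data class:

```python
# A hand-written class whose __init__ does roughly what @dataclass
# generates for Point: one parameter per field, in declaration order,
# each stored as an attribute on the new instance.
class HandWrittenPoint:
    def __init__(self, x: int, y: int, z: int) -> None:
        self.x = x
        self.y = y
        self.z = z
```

Note that __init__ returns None: the instance already exists by the time it runs, and __init__ only fills in its attributes.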

The effect is roughly akin to what the constructor of a Haskell record does. Here’s how we create a new Point value, in Python and in Haskell:
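On the Python side, a sketch with Point redefined (assumed fields x, y, z) so the snippet runs on its own:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int
    z: int


# Calling the class with one argument per field constructs a new value.
p = Point(3, 4, 5)
print(p)  # Point(x=3, y=4, z=5)

# Keyword arguments named after the fields work as well.
print(p == Point(x=3, y=4, z=5))  # True
```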

Types vs constructors

In both languages, one may find it slightly unsettling that the word “Point” describes two things:

  1. The type;
  2. A function that returns a value of that type.

How can “Point” be overloaded to mean both of those things? The answer varies between the two languages. The Python and Haskell REPL sessions above look more or less identical, but if you’re interested in the subtle distinctions about what’s going on “under the hood”, as they say, read on.

The Python expression Point, by itself, refers to the class. We can see this by typing Point into the REPL:
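A self-contained sketch, again assuming the fields x, y, and z:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int
    z: int


# The bare name Point evaluates to the class object itself;
# run as a script or in the REPL, this prints <class '__main__.Point'>.
print(Point)

# The decorator does not change the metaclass: Point is an ordinary class.
print(type(Point))  # <class 'type'>
```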

The reason we can write the expression Point(3, 4, 5) is that Python classes are callable; that is, they have the special method __call__. The Python documentation describes how a call expression desugars:

if this method is defined, x(arg1, arg2, ...) is a shorthand for x.__call__(arg1, arg2, ...).

We can verify in the REPL that this method is in fact defined on Point:

And we can also evaluate the desugared form and see that it does produce the same result.
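A sketch of both checks, with Point redefined (assumed fields x, y, z). Strictly speaking, Point does not define __call__ in its own namespace; the attribute is found on its metaclass, type, which is what makes every class callable. The desugared call therefore also works in the form type(Point).__call__(Point, ...):

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int
    z: int


# Point is callable, and looking up __call__ on it succeeds...
print(callable(Point))             # True
print(hasattr(Point, "__call__"))  # True

# ...but the method lives on the metaclass type, not in Point's own namespace.
print("__call__" in vars(Point))   # False
print("__call__" in vars(type))    # True

# The desugared forms build the same value as the ordinary call.
print(Point.__call__(3, 4, 5) == Point(3, 4, 5))               # True
print(type(Point).__call__(Point, 3, 4, 5) == Point(3, 4, 5))  # True
```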

In Haskell, the distinction between Point the type and Point the function is made by context. We will use the following function definition to illustrate. This function accepts one Point argument, and returns a new Point where the x coordinate is reduced by one. Interpreted according to the convention that the axis labeled X in three-dimensional space is oriented with lesser numbers to the left and greater numbers to the right, this function’s output is one unit to the left of its input, and so we have chosen to name it moveLeft.

“Point” appears multiple times here: in a type context, in a pattern context, and in an expression context.

  • What comes after two colons (::) is a type.
  • What comes before the equal symbol (=) is a pattern. As a pattern, a constructor splits up a value into its constituent parts.
  • What comes after the equal symbol (=) is an expression. As an expression, a constructor builds a value from its constituent parts.

From the contexts in which they appear, then, we can tell:

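  1. In the type signature, after the two colons, “Point” refers to the Point type.
  2. In the pattern and in the expression, “Point” is the constructor: a function that constructs a value of the Point type (or, as a pattern, takes one apart).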

Deriving

The other special methods that data classes generate automatically correspond to Haskell typeclass deriving.

  • __repr__ – equivalent to deriving Show.
  • __eq__ – equivalent to deriving Eq.
  • __lt__, __le__, __gt__, and __ge__ (if the dataclass function is called with order=True) – equivalent to deriving Ord.
  • __hash__ (if the dataclass function is called with frozen=True) – equivalent to deriving Hashable, which we discuss below.
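A sketch of these generated methods in action, reusing the assumed x, y, z fields. order=True and frozen=True are real parameters of the dataclass decorator:

```python
from dataclasses import dataclass


# order=True requests __lt__, __le__, __gt__, and __ge__;
# frozen=True (with the default eq=True) makes Python generate __hash__.
@dataclass(order=True, frozen=True)
class Point:
    x: int
    y: int
    z: int


a = Point(1, 2, 3)
b = Point(1, 3, 0)

print(repr(a))                          # Point(x=1, y=2, z=3)  (like Show)
print(a == Point(1, 2, 3))              # True  (like Eq)
print(a < b)                            # True  (like Ord)
print(hash(a) == hash(Point(1, 2, 3)))  # True  (like Hashable)
```

The comparison methods compare fields in declaration order, as if comparing tuples, which matches how a derived Ord instance compares fields from left to right.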

Replacing field values

The replace function (dataclasses.replace) lets you construct a new instance based on an existing one, with one or more field values modified.
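A short sketch using the Point class with its assumed fields:

```python
from dataclasses import dataclass, replace


@dataclass
class Point:
    x: int
    y: int
    z: int


p = Point(3, 4, 5)

# replace builds a new Point, copying every field except those given by keyword.
q = replace(p, x=10)

print(q)  # Point(x=10, y=4, z=5)
print(p)  # Point(x=3, y=4, z=5)  (the original is unchanged)
```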

Somewhat uncharacteristically, this is not accomplished with a function in Haskell, but instead is a feature of the built-in record syntax.

Hashing

Hashing is important in Python because it is the basis of the set and dict structures. For some variety we’ll switch up the example; this time we’ll define a data class called Color:

Suppose we want to make a set of the primary colors.
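A sketch, assuming Color holds three integer channels named red, green, and blue (the field names are our own). A plain @dataclass generates __eq__ but sets __hash__ to None, so the attempt fails:

```python
from dataclasses import dataclass


# Hypothetical fields: the red, green, and blue channel intensities.
@dataclass
class Color:
    red: int
    green: int
    blue: int


# With eq=True (the default) and frozen=False, __hash__ is set to None,
# so Color values are unhashable and cannot be set elements.
print(Color.__hash__)  # None

try:
    primaries = {Color(255, 0, 0), Color(0, 255, 0), Color(0, 0, 255)}
except TypeError as err:
    print(err)  # a message reporting that 'Color' is an unhashable type
```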

Colors can’t go into a set unless they’re hashable. We can get that by requesting a frozen data class, which entreats Python to generate an implementation of the __hash__ method.

Now we have a hash function for Color:

And thus now we can put Colors into a set.
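A sketch of the frozen version, with the same hypothetical red, green, and blue fields:

```python
from dataclasses import FrozenInstanceError, dataclass


# The same hypothetical Color, now frozen, so Python generates __hash__.
@dataclass(frozen=True)
class Color:
    red: int
    green: int
    blue: int


# Equal values produce equal hashes.
print(hash(Color(255, 0, 0)) == hash(Color(255, 0, 0)))  # True

primaries = {Color(255, 0, 0), Color(0, 255, 0), Color(0, 0, 255)}
print(len(primaries))                 # 3
print(Color(0, 255, 0) in primaries)  # True

# The price of hashability: fields can no longer be reassigned.
try:
    Color(255, 0, 0).red = 0
except FrozenInstanceError:
    print("frozen")
```

dataclasses.replace still works on frozen instances, because it builds a new value rather than mutating the old one.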

Haskell doesn’t have hashing as a built-in concept. Instead of hash-based collections, we more often use the Ord-based collections found in the containers library (see the containers package on Hackage).

But we do have hash-based collections in Haskell as well (HashSet, for example, is defined in the unordered-containers package). If we want to store a collection of colors in a HashSet, then the Color type must have an instance of the Hashable typeclass. We will need to turn to the hashable library (see the hashable package on Hackage) and a few language extensions. For more about what’s going on here, see our page on deriving strategies.

Default values

Python’s data classes can have default values for their fields, just like Python functions can have defaults for their parameters. Haskell does not have a concept of parameter defaults, so this is something we have to reckon with.

Let’s look at the first example from PEP 557:
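The example as given in PEP 557, with the import it needs added:

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0  # defaults to 0 when omitted

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
```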

The default value of 0 for quantity_on_hand in the Python data class allows us to construct an InventoryItem with only two arguments instead of three:
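A sketch with the PEP 557 fields (the method is omitted here) and a made-up item name:

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


# Two arguments: quantity_on_hand falls back to its default of 0.
item = InventoryItem("widget", 3.0)
print(item)  # InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)

# All three arguments may still be given explicitly.
print(InventoryItem("widget", 3.0, 10).quantity_on_hand)  # 10
```

As with function parameters, fields with defaults must come after fields without them; declaring a defaulted field before unit_price would make the decorator raise a TypeError.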

Our corresponding Haskell type declaration has no ability to specify default values for its fields. Note that this example uses floating-point numbers to represent money, which is not generally a good idea due to the possibility of rounding error. Consider using the safe-money package if you deal with money.

As an analogue in Haskell, we might write a function with two parameters.

Or more tersely using record wildcards:

Now we can use the inventoryItem function like we used the Python InventoryItem constructor:

Instance methods

There is one more piece of the example from PEP 557 that we haven’t mentioned yet: the total_cost method on the InventoryItem class.
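A sketch of calling it, using the PEP 557 class and made-up item values:

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


item = InventoryItem("widget", 2.5, 4)  # made-up example values

# Called as a method on an instance...
print(item.total_cost())               # 10.0

# ...which is the same as calling the function stored on the class,
# passing the instance explicitly as its first argument.
print(InventoryItem.total_cost(item))  # 10.0
```

Python's * accepts a float and an int together, so the method needs no explicit numeric conversion.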

We do not typically attach functions to data in this way in Haskell. The total_cost function becomes a top-level function in the Haskell version. We use the fromIntegral function to convert from Integer to Double (or, more generally, from any integer-like type to any other type of number).

Alternatively, using the RecordWildCards extension:
