Paul Buis' Blog: Programming Languages: Discriminated Unions are called Enums with Attributes in Swift

In C, there was this scary thing called a union which was declared like a struct, but the fields of a union all shared the same address instead of being a separate offesets from the beginning of a struct. The fields could be of different types which made type errors easy to commit. Normally, one wrapped a union in a struct that had a "tag" field that would let the programmer tell which field name to use (and hence avoid type errors). Typically, one would use a switch on the tag and then have a case of for each of the possible tag values.

In more advanced languages, the compiler would make sure that when a specific tag value was present, only the associated field name of a union could be referenced. E.g., the field reference needed to be nested inside an "if" or a "switch/case" that had checked on the tag. If a "switch" was being used in such a language for this purpose, all possible tag values needed to have associated "case" labels. In these languages the data structure was referred to as a "discriminated" union and it was considered to be type safe.

Microsoft's F#, having descended from ML, has such a feature with this name. Interestingly, when accessing an F# discriminated union from C#, one sees the discriminated union type as an abstract base class with no data value, and each of the fields as an inner subclass of the abstract base class. In Apple's new Swift programming language, "enums with attributes" look like they get used with the "enum" part playing the role of a tag and each tag's attributes playing the role of field values in a discriminated union. Since enums are more common now than unions, this should make explaining them easier, but without an understanding of unions and how they are used, it that explanation will need to be done in a vacuum. Java also has enums that have attributes, but each possible enum value ends up having the same attribute type(s), so one has much less flexibility and Java enums should not be considered equivalent to discriminated unions. Swift's allowance for each enum value to have a different attribute types gives its enum types a different spin than Java's enums and makes them equivalent to ML/F# discriminated unions.

If one has both many (N) possible field values and many (M) possible operations one wants to perform you see two different patterns emerge if you have designed this kind of thing with discriminated unions or with a flat inheritance hierarchy with virtual methods. With discriminated unions, you end up with M functions, each "fat" with a big switch statement checking N possible field values. With OO inheritance, you have N classes, each with M virtual functions override a pure virtual function in the base class. Unless one field value is much more common then the rest and you check it at the top of each switch statement the discriminated union approach of sequentially having N machine level branches will probably be slower than virtual function dispatch which takes longer than a single branch, but can still be done in constant time.

The question of which is easier to maintain and expand, however, depends on what happens after the early design and implementation phase. Each time one adds a new potential field value, the discriminated union approach requires that each fat function be modified to insert a new case branch, and the compiler will complain if the new case branch isn't added. Each time one adds a new potential field value to the OO design, one needs a whole new class with a new virtual function for each operation. The compiler should you if you've failed to override all the pure virtuals from the base class as long as the inheriting subclass is not itself abstract. One way a single large chunk of code is inserted, in the other a bunch of small pieces have to be inserted scattered.

Maintenance and expansion goes is the other way around if you are adding a new operation to the design. This time, with the discriminated union approach, you need to add one big fat function including a switch statement and a long chain of cases. With the OO design, you need to add one virtual to each subclass. Again, the difference is between a single large chunk of code and a bunch of scattered pieces, but this time the what you have to do is flip-flopped the other way around between the two design approaches.

In most realistically complex problems, it could be argued that the OO design is really not a flat hierarchy and many subclasses in the system would not need to override the behavior of their super-classes. However, in that case the discriminated union approach becomes a set of nested discriminated unions and each function needs to cover fewer cases, so they both simplify. It just isn't clear to me that one approach is fundamentally better than the other, although my OO-loving colleagues would, I'm sure, try to tell me discriminated unions are bad because, for them, axiomatically OO="good."

Paul Buis' Blog

Thursday, June 12, 2014

Programming Languages: Discriminated Unions are called Enums with Attributes in Swift

No comments:

Post a Comment