The Phantom Type Menace

This is the fourth in a series of posts covering concepts in functional programming:

This time around we're talking about phantom types.

Important Disclaimers

Please note that:

This post is based on a famous article by Heiko Seeberger. His approach is the best I'm aware of for writing Phantom Types without needing to resort to external libraries or move to more purist languages like Haskell. So credit where it's due...
At time of writing the IntelliJ IDE reports errors when using the 'type sandwiches' described below. But the code compiles as it should.

Warmup - Generics and Type Safely

We're all familiar with the idea of using Generic types to enhance the safety of our code. In modern compiled languages type parameters are used to control what inputs are allowed into functions and hence what operations are permitted.

The standard options are set out in the Java example below:

public class Program {
	public void demo() {
		tst1(new LinkedList<Employee>());
		tst1(new LinkedList<Manager>());
		tst1(new LinkedList<Director>());

		//tst2(new LinkedList<Employee>()); //Will not work
		tst2(new LinkedList<Manager>());
		tst2(new LinkedList<Director>());

		tst3(new LinkedList<Employee>());
		tst3(new LinkedList<Manager>());
		//tst3(new LinkedList<Director>()); //Will not work
	}
	private void tst1(List<?> input) {
		//input.add(????);           //We can't add anything safely here
		Object obj = input.get(0);   //The returned type can only be Object
	}
	private void tst2(List<? extends Manager> input) {
		//input.add(????);           //We still can't add anything safely
		Manager emp = input.get(0);  //But the returned type can be Manager
	}
	private void tst3(List<? super Manager> input) {
		input.add(new Manager());    //We can add managers
		input.add(new Director());   //We can add directors
		Object obj = input.get(0);   //But the returned type can only be Object
	}
}

When we use the wildcard by itself in tst1 we are saying that any type of input is permitted, which means that nothing can be added and only references of type Object are returned.

When we use an upper bounded wildcard (via the extends keyword) in tst2 we will still be unable to add items, but now the return type can be Manager. So obviously all the methods of the Manager type and its super-types are available.

Finally when we add a lower bounded wildcard (via the super keyword) in tst3 we will be allowed to add items but the return type becomes Object once again.

Everyone finds these rules confusing initially. It helps to remember that with an upper bound the set of matching types is infinite. So for example in the current case there is no limit to the number of classes we write that extend from Manager.

Since there is no 'bottom limit' to the set of types we cannot add a Manager in case the type of T is Director and we cannot add a Director in case someone derives a new class such as CEO. With a lower bound the set of possible types is constrained, in this case to Manager, Employee and Object. So we can insert with confidence but not extract.

For this reason the guideline is to use an upper bound when we want to access items and a lower bound when we want to add them. Joshua Block in Effective Java invented the PECS mnemonic (short for producer extends, consumer super) to help remember this idiom.

Generics and State Transitions

What we are not used to doing with Generics is to think of them as a way to validate business rules within the object itself.

For example let's say that within our Employee type there are states we wish to model, these being allocated, available and sick. The first two states can be considered as sub-states of a working state. The methods of our type will transition the instance between these states, as shown in the following UML statechart:

Of special interest to us is the fact that certain methods may only be invoked on an employee object when it is in a particular state. For example we do not want to be able to allocate jobs of work to sick employees.

The Phantom Type Pattern in Scala

We can take the statechart above and implement it in Scala using bounded types.

The type declarations work as follows:

We create a Job class to represent jobs of work
We create an Employee type, made up of a generic class and companion object
The companion object declares a number of traits, none of which will ever be instantiated (thats the phantom bit)
The traits represent the states from our statechart
The traits all inherit from State and those other than Sick inherit from Working
The traits we will use directly are placed in a nested object for safety and auto-completion in the IDE
The companion object declares a single factory method for employees. The returned employee is in the Available state
Within the Employee type we will have a name field and methods to represent the state transitions
The class has a single private constructor, ensuring that new instances can only be created via the factory method
Employee objects will be immutable - each method that appears to change the state will return a copy instead

class Job(val name: String)

object Employee {
  sealed trait State
  object EmpStates {
    sealed trait Working extends State
    sealed trait Sick extends State
    sealed trait Allocated extends Working
    sealed trait Available extends Working
  }
  def employ(name: String): Employee[EmpStates.Available] = new Employee(name)
}

class Employee[S <: Employee.State] private(val name: String) {
  import Employee.EmpStates

  def assignJob[T >: S <: EmpStates.Available](job: Job): Employee[EmpStates.Allocated] = new Employee(name)

  def reassign[T >: S <: EmpStates.Allocated](): Employee[EmpStates.Available] = new Employee(name)

  def markAsSick[T >: S <: EmpStates.Working](): Employee[EmpStates.Sick] = new Employee(name)

  def markAsWell[T >: S <: EmpStates.Sick](): Employee[EmpStates.Available] = new Employee(name)
}

If you look at the methods of Employee you can see that each uses a type sandwich where we are constraining the type of the input.

For example in assignJob the T >: S <: EmpStates.Available type declaration is saying:

That there must be a type T which satisfies the lower type bound T >: S. That is to say T must be a super-type of S where S is the type of the current instance.
This type T must also satisfy the upper type bound S <: EmpStates.Available. That is to say T must be sub-type of Available (note the set of sub-types includes Available itself)

Each method returns a new Employee instance in an altered state. So for example assignJob returns an Allocated employee and markAsSick a Sick one.

We can now write code that uses the Employee type, in full confidence that the compiler will prevent us from making invalid state transitions.

Let's list some valid and invalid transitions before examining the code:

A new employee should be available, and can therefore be assigned jobs
Once an employee can been allocated a new job they cannot be allocated again
However an employee who has been reassigned can be assigned a new job
Employees who are working (with or without a job) can be marked as sick
Sick employees cannot have jobs assigned to them until marked as well

object Program {
  def main(args: Array[String]): Unit = {
    val jane = Employee.employ("Jane")
    println("We just employed " + jane.name)

    //OK - Jane is 'Available' by default
    val janeAllocated = jane.assignJob(new Job("Write the code"))

    //WILL NOT COMPILE - Jane already has a job
    //janeAllocated.assignJob(new Job("Run the tests"))

    //OK - Jane can be reassigned
    val janeAvailable = janeAllocated.reassign()
    val janeReAllocated = janeAvailable.assignJob(new Job("Run the tests"))

    //OK - Working people can become ill
    val janeSickOne = janeAvailable.markAsSick()
    val janeSickTwo = janeReAllocated.markAsSick()

    //WILL NOT COMPILE - You cannot go on sick leave
    // while you are already on sick leave
    janeSickOne.markAsSick()
    janeSickTwo.markAsSick()

    //WILL NOT COMPILE - You cant be assigned jobs
    // while you are on sick leave
    janeSickOne.assignJob(new Job("Write more code"))
    janeSickTwo.assignJob(new Job("Write more code"))

    //OK - Once you are back from sick leave
    // you can be assigned jobs again
    val janeWellAndAvailable = janeSickOne.markAsWell()
    janeWellAndAvailable.assignJob(new Job("Write more code"))
  }
}

As you can see from the comments the compiler itself will prevent you from making method calls that would result in invalid state transitions. Because this is all done at compile time, and the types representing states are never instantiated, there is no runtime penalty. Another way to think of this is as a version of the Design By Contract philosophy, championed by the Eiffel language and implemented in Java via assertions (since V1.4).

Conclusions and further directions

This post has presented a 'Hello World' example of Phantom Types in Scala. Heiko Seeberger's original post covers a second syntax, based on implicit parameters, which is shorter and gives clearer compiler messages. However it is much more magical and non-intuitive unless you've been spending quality time with Scala.

Phantom types can be represented more succinctly in languages further up the scale of functional purity, F# and Haskell being the main examples. A closely related area which is slowly gaining popularity is that of Type Driven Development (TDD), also known as Type First Development (TFD) to avoid confusion with the Agile technique.

In TFD you model the states of your type in a manner similar to what we have done, but the compiler and IDE provides first class support for specifying the various states. For example the IDE might provide a shortcut to generate a conditional for all the valid states that an instance might be in. Edwin Brady at St. Andrews has created Idris - the first language to directly support TFD, which went to version 1.0 in April and has an accompanying book.