This is the fourth in a series of posts covering concepts in functional programming:
This time around we're talking about phantom types.
Important Disclaimers
Please note that:
- This post is based on a famous article by Heiko Seeberger. His approach is the best I'm aware of for writing Phantom Types without needing to resort to external libraries or move to more purist languages like Haskell. So credit where it's due...
- At time of writing the IntelliJ IDE reports errors when using the 'type sandwiches' described below. But the code compiles as it should.
Warmup - Generics and Type Safely
We're all familiar with the idea of using Generic types to enhance the safety of our code. In modern compiled languages type parameters are used to control what inputs are allowed into functions and hence what operations are permitted.
The standard options are set out in the Java example below:
public class Program {
public void demo() {
tst1(new LinkedList<Employee>());
tst1(new LinkedList<Manager>());
tst1(new LinkedList<Director>());
//tst2(new LinkedList<Employee>()); //Will not work
tst2(new LinkedList<Manager>());
tst2(new LinkedList<Director>());
tst3(new LinkedList<Employee>());
tst3(new LinkedList<Manager>());
//tst3(new LinkedList<Director>()); //Will not work
}
private void tst1(List<?> input) {
//input.add(????); //We can't add anything safely here
Object obj = input.get(0); //The returned type can only be Object
}
private void tst2(List<? extends Manager> input) {
//input.add(????); //We still can't add anything safely
Manager emp = input.get(0); //But the returned type can be Manager
}
private void tst3(List<? super Manager> input) {
input.add(new Manager()); //We can add managers
input.add(new Director()); //We can add directors
Object obj = input.get(0); //But the returned type can only be Object
}
}
When we use the wildcard by itself in tst1
we are saying that any type of input is permitted, which means that nothing can be added and only references of type Object
are returned.
When we use an upper bounded wildcard (via the extends
keyword) in tst2
we will still be unable to add items, but now the return type can be Manager
. So obviously all the methods of the Manager
type and its super-types are available.
Finally when we add a lower bounded wildcard (via the super
keyword) in tst3
we will be allowed to add items but the return type becomes Object
once again.
Everyone finds these rules confusing initially. It helps to remember that with an upper bound the set of matching types is infinite. So for example in the current case there is no limit to the number of classes we write that extend from Manager
.
Since there is no 'bottom limit' to the set of types we cannot add a Manager
in case the type of T is Director
and we cannot add a Director
in case someone derives a new class such as CEO
. With a lower bound the set of possible types is constrained, in this case to Manager
, Employee
and Object
. So we can insert with confidence but not extract.
For this reason the guideline is to use an upper bound when we want to access items and a lower bound when we want to add them. Joshua Block in Effective Java invented the PECS mnemonic (short for producer extends, consumer super) to help remember this idiom.
Generics and State Transitions
What we are not used to doing with Generics is to think of them as a way to validate business rules within the object itself.
For example let's say that within our Employee
type there are states we wish to model, these being allocated, available and sick. The first two states can be considered as sub-states of a working state. The methods of our type will transition the instance between these states, as shown in the following UML statechart:
Of special interest to us is the fact that certain methods may only be invoked on an employee object when it is in a particular state. For example we do not want to be able to allocate jobs of work to sick employees.
The Phantom Type Pattern in Scala
We can take the statechart above and implement it in Scala using bounded types.
The type declarations work as follows:
- We create a
Job
class to represent jobs of work - We create an Employee type, made up of a generic class and companion object
- The companion object declares a number of traits, none of which will ever be instantiated (thats the phantom bit)
- The traits represent the states from our statechart
- The traits all inherit from
State
and those other thanSick
inherit fromWorking
- The traits we will use directly are placed in a nested object for safety and auto-completion in the IDE
- The companion object declares a single factory method for employees. The returned employee is in the
Available
state - Within the Employee type we will have a
name
field and methods to represent the state transitions - The class has a single private constructor, ensuring that new instances can only be created via the factory method
- Employee objects will be immutable - each method that appears to change the state will return a copy instead
class Job(val name: String)
object Employee {
sealed trait State
object EmpStates {
sealed trait Working extends State
sealed trait Sick extends State
sealed trait Allocated extends Working
sealed trait Available extends Working
}
def employ(name: String): Employee[EmpStates.Available] = new Employee(name)
}
class Employee[S <: Employee.State] private(val name: String) {
import Employee.EmpStates
def assignJob[T >: S <: EmpStates.Available](job: Job): Employee[EmpStates.Allocated] = new Employee(name)
def reassign[T >: S <: EmpStates.Allocated](): Employee[EmpStates.Available] = new Employee(name)
def markAsSick[T >: S <: EmpStates.Working](): Employee[EmpStates.Sick] = new Employee(name)
def markAsWell[T >: S <: EmpStates.Sick](): Employee[EmpStates.Available] = new Employee(name)
}
If you look at the methods of Employee
you can see that each uses a type sandwich where we are constraining the type of the input.
For example in assignJob
the T >: S <: EmpStates.Available
type declaration is saying:
- That there must be a type
T
which satisfies the lower type boundT >: S
. That is to sayT
must be a super-type ofS
whereS
is the type of the current instance. - This type T must also satisfy the upper type bound
S <: EmpStates.Available
. That is to sayT
must be sub-type ofAvailable
(note the set of sub-types includesAvailable
itself)
Each method returns a new Employee
instance in an altered state. So for example assignJob
returns an Allocated
employee and markAsSick
a Sick
one.
We can now write code that uses the Employee
type, in full confidence that the compiler will prevent us from making invalid state transitions.
Let's list some valid and invalid transitions before examining the code:
- A new employee should be available, and can therefore be assigned jobs
- Once an employee can been allocated a new job they cannot be allocated again
- However an employee who has been reassigned can be assigned a new job
- Employees who are working (with or without a job) can be marked as sick
- Sick employees cannot have jobs assigned to them until marked as well
object Program {
def main(args: Array[String]): Unit = {
val jane = Employee.employ("Jane")
println("We just employed " + jane.name)
//OK - Jane is 'Available' by default
val janeAllocated = jane.assignJob(new Job("Write the code"))
//WILL NOT COMPILE - Jane already has a job
//janeAllocated.assignJob(new Job("Run the tests"))
//OK - Jane can be reassigned
val janeAvailable = janeAllocated.reassign()
val janeReAllocated = janeAvailable.assignJob(new Job("Run the tests"))
//OK - Working people can become ill
val janeSickOne = janeAvailable.markAsSick()
val janeSickTwo = janeReAllocated.markAsSick()
//WILL NOT COMPILE - You cannot go on sick leave
// while you are already on sick leave
janeSickOne.markAsSick()
janeSickTwo.markAsSick()
//WILL NOT COMPILE - You cant be assigned jobs
// while you are on sick leave
janeSickOne.assignJob(new Job("Write more code"))
janeSickTwo.assignJob(new Job("Write more code"))
//OK - Once you are back from sick leave
// you can be assigned jobs again
val janeWellAndAvailable = janeSickOne.markAsWell()
janeWellAndAvailable.assignJob(new Job("Write more code"))
}
}
As you can see from the comments the compiler itself will prevent you from making method calls that would result in invalid state transitions. Because this is all done at compile time, and the types representing states are never instantiated, there is no runtime penalty. Another way to think of this is as a version of the Design By Contract philosophy, championed by the Eiffel language and implemented in Java via assertions (since V1.4).
Conclusions and further directions
This post has presented a 'Hello World' example of Phantom Types in Scala. Heiko Seeberger's original post covers a second syntax, based on implicit parameters, which is shorter and gives clearer compiler messages. However it is much more magical and non-intuitive unless you've been spending quality time with Scala.
Phantom types can be represented more succinctly in languages further up the scale of functional purity, F# and Haskell being the main examples. A closely related area which is slowly gaining popularity is that of Type Driven Development (TDD), also known as Type First Development (TFD) to avoid confusion with the Agile technique.
In TFD you model the states of your type in a manner similar to what we have done, but the compiler and IDE provides first class support for specifying the various states. For example the IDE might provide a shortcut to generate a conditional for all the valid states that an instance might be in. Edwin Brady at St. Andrews has created Idris - the first language to directly support TFD, which went to version 1.0 in April and has an accompanying book.
If you're enjoying these posts, why not subscribe to our newletter or check out our Fast-track to Scala and Scala Programming for Java Developers courses.