A Cure for Primitive Obsession

What is Primitive Obsession?

To start with, primitives are the basic data types available in most languages. These include strings, numbers (integers, floats), and booleans.

Primitive obsession is a code smell in which primitive data types are used excessively to represent your data models. The problem with primitives is that they are very general. For example, a string could represent a name, an address, or even an ID. Why is this a problem?

- They can't contain any model-specific logic or behaviour, meaning that any logic must be stored in the containing class. This means you end up with classes containing lots of unrelated logic, which violates the Single Responsibility Principle (read more here).
- They lose their type safety. A string is a string. The compiler doesn't know if you're passing it a string representing a name or an address. This means it is easy to accidentally assign a value to the wrong field. The code will compile and run fine, and you may not notice until the data is completely messed up.
  - On a side note, an area where I have commonly experienced this in my career is with IDs. It's common to represent IDs as integers, and if you are trying to perform some filtering logic on nested objects (each with their own ID), it is surprisingly easy to accidentally use the wrong ID. The code compiles fine, and often appears to work to start with, but then weird bugs start cropping up that are really hard to track down.
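
As a minimal sketch of the ID mix-up described above, consider the following C#. The Order and OrderLine types, their properties, and the OrderQueries helper are all hypothetical, made up purely for illustration. The filter compares the wrong ID, yet the compiler has no way to notice because both values are plain ints:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, for illustration only.
public class OrderLine
{
    public int Id { get; set; }
    public int ProductId { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();
}

public static class OrderQueries
{
    // Intended: return the lines of an order that refer to the given product.
    // Bug: this compares the line's own Id instead of its ProductId.
    // Both are ints, so the compiler cannot tell them apart.
    public static List<OrderLine> LinesForProduct(Order order, int productId) =>
        order.Lines.Where(line => line.Id == productId).ToList();
}
```

With small test data, where line IDs and product IDs may happen to coincide, a method like this can even appear to return the right results.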

Curing Primitive Obsession

Below is an example of a Person class that suffers from Primitive Obsession:

public class Person
{
    public Person(string id, string firstName, string lastName, string address, string postcode, string city, string country)
    {
        // initialisation logic
    }

    public string Id { get; set; }

    public string FirstName { get; set; }
    public string LastName { get; set; }

    public string Address { get; set; }
    public string PostCode { get; set; }
    public string City { get; set; }
    public string Country { get; set; }

    public void ChangeAddress(string address, string postcode, string city, string country)
    {
        // change address logic
    }
}

What is wrong here? Well, we have a class that consists entirely of string properties. The constructor takes a long list of string parameters - I can guarantee that at some point a value will be passed in the wrong parameter slot! We also have a method to change the address, but really this logic shouldn't be the responsibility of the Person class. Finally, the ID is also a string, so it could accidentally be used as the ID for other types.
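
To make the parameter-slot risk concrete, here is a small sketch. The constructor body is one plausible reading of the "initialisation logic" placeholder, the setters are dropped for brevity, and the Example class with its sample values is hypothetical. The city and postcode arguments are swapped, and the code still compiles:

```csharp
public class Person
{
    public Person(string id, string firstName, string lastName, string address, string postcode, string city, string country)
    {
        Id = id;
        FirstName = firstName;
        LastName = lastName;
        Address = address;
        PostCode = postcode;
        City = city;
        Country = country;
    }

    public string Id { get; }
    public string FirstName { get; }
    public string LastName { get; }
    public string Address { get; }
    public string PostCode { get; }
    public string City { get; }
    public string Country { get; }
}

// Hypothetical caller, for illustration only.
public static class Example
{
    // "London" lands in the postcode slot and "N1 9GU" in the city slot.
    // Every argument is a string, so the compiler accepts the call.
    public static Person CreateWithSwappedArguments() =>
        new Person("1", "Jane", "Doe", "12 Example Street", "London", "N1 9GU", "UK");
}
```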

So what can we do?

Well, the first thing we can look at is whether there are any properties that make sense to be grouped together. A good test for this is to ask which of these properties are likely to be updated together. In our example, it is clear that the Address, PostCode, City and Country fields should be grouped (if you move house, it is likely that all or most of these properties will be updated together). Let's refactor these properties into their own class then.

public class Address
{
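    // The street line is named Street because C# does not allow a member
    // to share the name of its enclosing type (compiler error CS0542).
    public Address(string street, string postCode, string city, string country)
    {
        // initialisation logic
    }

    public string Street { get; set; }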
    public string PostCode { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}

public class Person
{
    public Person(string id, string firstName, string lastName, Address address)
    {
        // initialisation logic
    }

    public string Id { get; set; }

    public string FirstName { get; set; }
    public string LastName { get; set; }

    public Address Address { get; set; }
}

Already our Person class is looking much better. All of the logic to do with the address is now encapsulated in the Address class, and we've managed to eliminate a lot of string parameters from the constructor.
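
To illustrate what "logic to do with the address" could look like once it has a home, here is a hedged sketch of an Address class that validates and normalises its own values. The validation rules, the Require helper, and the choice to make the properties read-only are all assumptions for illustration, not part of the original example. The street line is called Street here because C# does not allow a member to share the name of its enclosing type:

```csharp
using System;

// A sketch only: the validation rules below are assumptions.
public class Address
{
    public Address(string street, string postCode, string city, string country)
    {
        Street = Require(street, nameof(street));
        // Normalise postcodes to upper case so that equal postcodes compare equal.
        PostCode = Require(postCode, nameof(postCode)).ToUpperInvariant();
        City = Require(city, nameof(city));
        Country = Require(country, nameof(country));
    }

    public string Street { get; }
    public string PostCode { get; }
    public string City { get; }
    public string Country { get; }

    // Rejects null, empty, or whitespace-only values and trims the rest.
    private static string Require(string value, string name) =>
        string.IsNullOrWhiteSpace(value)
            ? throw new ArgumentException("Value is required.", name)
            : value.Trim();
}
```

With a design like this, changing a person's address becomes a matter of assigning a new Address to the Person, and none of the checks need to live in the Person class.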

You may have noticed that, overall, we still have the same number of string properties. This is fine, as ultimately it is likely that you will need to store your data in a primitive. The important part of avoiding Primitive Obsession is encapsulating those primitives into well-defined objects that actually represent their meaning.

The next thing that we need to deal with is the ID. To bring type safety back to the ID, we can create what is known as a strongly-typed ID. In short, this is just the primitive value wrapped in a container object specific to that entity. For example, an implementation of a PersonId object might look like (adapted from here):

using System;

public readonly struct PersonId : IComparable<PersonId>, IEquatable<PersonId>
{
    public string Value { get; }

    public PersonId(string value)
    {
        Value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public static PersonId New() => new PersonId(Guid.NewGuid().ToString());

    // Ordinal comparisons keep Equals, CompareTo and == consistent with each other.
    // The static string helpers also cope with default(PersonId), whose Value is null.
    public bool Equals(PersonId other) => string.Equals(Value, other.Value, StringComparison.Ordinal);
    public int CompareTo(PersonId other) => string.CompareOrdinal(Value, other.Value);

    public override bool Equals(object obj) => obj is PersonId other && Equals(other);

    public override int GetHashCode() => Value?.GetHashCode() ?? 0;
    public override string ToString() => Value ?? string.Empty;

    public static bool operator ==(PersonId a, PersonId b) => a.Equals(b);
    public static bool operator !=(PersonId a, PersonId b) => !(a == b);
}

public class Person
{
    public Person(PersonId id, string firstName, string lastName, Address address)
    {
        // initialisation logic
    }

    public PersonId Id { get; set; }

    public string FirstName { get; set; }
    public string LastName { get; set; }

    public Address Address { get; set; }
}

This is good, as the Person class now uses its own PersonId for the ID. There is no way it could be used accidentally as the ID for another type, as that type will have its own strongly typed ID.

However, you may have noticed that this PersonId definition is somewhat large, and it would be cumbersome to have to declare this for every strongly typed ID that you wish to create. To be honest, this hurdle is likely the reason that strongly typed IDs still aren't that common in C# development.

Fortunately, C# 9 introduced the record type, which makes defining strongly typed IDs much easier, so hopefully their usage will become far more common (you can read more about records here).

// PersonId as a record
public record PersonId(string Value);

// how to initialise
var personId = new PersonId("my-id");

Yes, that's literally it. This declares a PersonId record with a string property called Value, which can be passed via the constructor. Records automatically base equality on the values of their properties, so there is no need to add anything else in the declaration.
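
As a quick sketch of what this buys us, the example below adds a second, hypothetical ID type (OrderId) and a hypothetical PersonLookup.Describe method, neither of which is part of the original example. Two PersonId values with the same string compare as equal, while an OrderId is a different type altogether and cannot be passed where a PersonId is expected:

```csharp
public record PersonId(string Value);

// OrderId is a hypothetical second ID type, added here only for contrast.
public record OrderId(string Value);

public static class PersonLookup
{
    // Hypothetical method: it only accepts a PersonId.
    public static string Describe(PersonId id) => $"Person {id.Value}";
}

// Usage, shown as comments so the block stays declaration-only:
//
//   new PersonId("abc") == new PersonId("abc")      // true: value-based equality
//   PersonLookup.Describe(new PersonId("abc"));     // compiles
//   PersonLookup.Describe(new OrderId("abc"));      // does not compile: an OrderId is not a PersonId
```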

Primitive Obsession in F#

Although this article has focussed mainly on C#, and all the principles applied above can be used in F#, there is a unique feature in F# that I think is worth noting.

Discriminated Unions are a data type in F# (and many other languages, but not C#) that allow different types of data to be returned depending on the situation (to learn more about discriminated unions, please see here). A common use case is error handling:

type Result<'a> =
    | Data of 'a
    | Error of string

The above example would allow a function to return some generic data type but, if there is an error, return a string instead.

However, a single-case discriminated union can also act as a wrapper for a primitive type, giving type safety to your functions. For example, we may have this Person record:

type Person = {
    FirstName: string;
    LastName: string;
    EmailAddress: string;
}

The problem here is that an email address isn't really just a string - it has a specific format, and so we might want to design some specific validation logic for it. Because of this, it is best to encapsulate it in its own type. With single-case discriminated unions this is really easy:

type EmailAddress = EmailAddress of string

type Person = {
    FirstName: string;
    LastName: string;
    EmailAddress: EmailAddress;
}

Now the Person record will only accept email addresses of the EmailAddress type.

Conclusion

In this article, I have introduced the concept of primitive obsession, what problems it may cause, and how to fix them. Hopefully, you can take away some of this information and use it in your own codebase to create safer, more maintainable code.