Versioning Codable types in Swift apps (without tearing your hair out)

If you work on codebases any larger than a few lines, at some point you will have to deal with data stored in some sort of document. Inevitably, the type behind that document will change, and you’ll need to add, remove, and rearrange fields while still retaining compatibility with documents stored in earlier versions of the format.

This is a problem I’ve recently encountered on my side project, Unspool. Even though I’m the only person ever to have used the app, I still found myself iterating on the data structure to make more logical sense—leading to decoding errors and un-openable documents, along with lots of boilerplate that wouldn’t have been sustainable in the long run.

So for this month’s Unspool devlog post, I’m going to look at how I arrived at an incremental migration solution for document model versioning using Swift’s type system.

I’ve open-sourced the code I wrote for it as a Swift package, and you can now use it as VersionedCodable (I released version 1.0 today!)

The documentation in the package should provide enough for you to get started. But if you want a closer look at the inner workings, or are interested in why and how I arrived at this solution, then read on…

Problem statement

The problem we’re trying to solve here is how to change a document type whilst retaining compatibility with existing documents. For our purposes, a document is a data structure that:

  • Is encoded in a specific format, or schema
  • Must be openable and usable at some point in the future, even though by that time the document schema may have changed
  • Can’t be migrated to a new schema along with every other document all at once, and so needs to be migrated opportunistically when you load it. This could be for any number of reasons:
    • They don’t all exist in a central location. This covers documents opened by apps on your computer or phone (e.g. word processor documents, image editor files, spreadsheets, etc.) that live in cloud storage, or on the machine’s internal/external storage, and can be copied around, archived, sent via email, hosted on the Web, etc.
    • They do all exist in a central location, but you still can’t migrate them all at once—for instance, because you can’t afford to take the system down for 2 hours to migrate everything and test it worked, or because you have a heavily distributed system in high demand where things are constantly changing. This could include any non-relational database or storage system—e.g. patient profiles for a doctor’s surgery, or order details for a retailer.

For the sake of argument, let’s imagine we’re building an app to compile collections of people’s favourite scraps of poetry, and will then save these in a JSON file. If we were quickly knocking out a proof of concept, the initial version of our document format might look like this:

{
	"author": "McGonagall, William Topaz",
	"poem": [
		"And the morning I sailed from the city of New York",
		"My heart it felt as light as a cork."
	]
}

In the next version, we might add a star rating system. We might also decide that we shouldn’t store our poem as an array of lines, instead storing the poem as a big string where we preserve the poet’s original formatting (e.g. representing new lines with the conventional character \n):

{
	"author": "McGonagall, William Topaz",
	"poem": "And the morning I sailed from the city of New York\nMy heart it felt as light as a cork.",
	"starRating": 1
}

And by the following version, we might decide that nobody is using star ratings accurately (everyone’s rating everything either 1 or 5 stars) so instead we just want to store whether you love it, hate it, or have no opinion:

{
	"author": "McGonagall, William Topaz",
	"poem": "And the morning I sailed from the city of New York\nMy heart it felt as light as a cork.",
	"rating": "hate"
}

The Codable type for this in Swift is relatively straightforward:

struct Poem: Codable {
	var author: String
	var poem: String
	var rating: Rating
	
	enum Rating: String, Codable {
		case love, meh, hate
	}
}

This will, however, not decode if we then present it with an older version of the type…

let data = """
{
	"author": "Anonymous",
	"poem": "An epicure dining at Crewe...",
	"starRating": 1
}
""".data(using: .utf8)!

let poem = try JSONDecoder().decode(Poem.self, from: data) // throws a `DecodingError`
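
To see exactly what goes wrong, you can catch the error and inspect it. This is a minimal sketch using the Poem type from above; oldDocument and missingKey are names I’ve made up for the example. I’d expect the missing rating key to be what trips the decoder up, because the synthesised initialiser requires every non-optional property to be present:

```swift
import Foundation

struct Poem: Codable {
	var author: String
	var poem: String
	var rating: Rating

	enum Rating: String, Codable {
		case love, meh, hate
	}
}

// A version 2 document: it has `starRating`, but no `rating`.
let oldDocument = Data("""
{
	"author": "Anonymous",
	"poem": "An epicure dining at Crewe...",
	"starRating": 1
}
""".utf8)

// Records which key the decoder complained about, if any.
var missingKey: String?

do {
	_ = try JSONDecoder().decode(Poem.self, from: oldDocument)
} catch DecodingError.keyNotFound(let key, _) {
	// The synthesised initialiser could not find a required key.
	missingKey = key.stringValue
} catch {
	print("Some other decoding failure: \(error)")
}
```

The unknown starRating key is silently ignored; it’s only the absent rating that causes the failure.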

The ‘easy’ solution: Restricting yourself to adding fields and deprecating old ones

Three versions of the Poem type next to each other. Each one has some new field added and an old one deprecated. From the first to the second, there's a new field called poemString which is a String, and poem (an array of Strings) has been deprecated. From the second to the third, a new field called rating appears, replacing the previous starRating optional integer which is now deprecated.

This might seem like an attractive solution at first glance, but quickly becomes a source of developer frustration and technical debt—particularly in large types with regular changes. Even for our poem type, where we’ve made a grand total of two changes, a backwards-compatible definition is already unwieldy:

struct Poem {
	var author: String
	var poemString: String
	var rating: Rating

	@available(*, deprecated, message: "Please use `rating` instead")
	var starRating: Int?

	@available(*, deprecated, message: "Please use `poemString` instead and treat poems as one single string")
	var poem: [String]
	
	enum Rating: String, Codable {
		case love, meh, hate
	}
}

This also does not handle actual mappings between the old and new types, including instances where the contents of a field change in format even though the name does not. In the above example, I’d still need to specify on decoding what to do with a starRating, and I’d still need to handle poems separated by slashes and newlines. So we now need to override the synthesised Codable conformance, which then means the faff of specifying coding keys:

extension Poem: Codable {
	init(from decoder: Decoder) throws {
		let container = try decoder.container(keyedBy: CodingKeys.self)
		
		author = try container.decode(String.self, forKey: .author)
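		if let lines = try container.decodeIfPresent([String].self, forKey: .poem) {
			// Legacy documents stored the poem as an array of lines.
			poem = lines
			poemString = lines.joined(separator: "\n")
		} else {
			// Every stored property needs a value, even the deprecated ones.
			poem = []
			poemString = try container.decode(String.self, forKey: .poemString)
		}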
				
		// Handle legacy star ratings. If there's a star rating persisted, that
		// takes priority over anything in the `rating` field.
		if let starRating = try container
			.decodeIfPresent(Int.self, forKey: .starRating) {
			switch starRating {
			case 0...2:
				rating = .hate
			case 3:
				rating = .meh
			case 4...5:
				rating = .love
			default:
				throw DecodingError
					.dataCorruptedError(forKey: .starRating,
										in: container,
										debugDescription: "We shouldn't have a starRating outside 0...5")
			}
		} else {
			rating = try container.decode(Rating.self, forKey: .rating)
		}
	}

	enum CodingKeys: CodingKey {
		case author
		case poem
		case poemString
		case rating
		case starRating
	}
}

For these two somewhat minimal migrations, I’ve now had to write around 40 lines of code. These init(from:) initialisers are, by their nature, big lumps of logic. This makes it hard to reason about what the initialiser actually does at first glance, and that’s with only a few, relatively small changes!

Clearly, in the long run, this becomes unsustainable. It also makes it a pain from a developer experience perspective. If I’m adopting this type for the first time, I still have to comb through deprecated fields to find the one I’m actually supposed to use/populate, even if the developer has provided deprecation warnings. In many projects where people are accustomed to ignoring little yellow warnings, this can be a problem.

Applying versioning to our schema

A good first step is to start thinking about the version of the type. Whenever we make a change that isn’t backwards-compatible, we increase that version number. When we encode the type (to be written to storage) we should store the version number, so we then know what type it is on decoding. For instance:

{
	"author": "McGonagall, William Topaz",
	"poem": "And the morning I sailed from the city of New York\nMy heart it felt as light as a cork.",
	"rating": "hate",
	"version": 3
}
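
Independently of any library, one way to picture this is a hand-written encode(to:) that adds the version number, plus a tiny type that reads only that number back. This is a rough sketch rather than how VersionedCodable does it: the static version constant, the extra coding key, and VersionProbe are all assumptions made for the example.

```swift
import Foundation

struct Poem: Encodable {
	// Hypothetical constant for this sketch: the schema version we write out.
	static let version = 3

	var author: String
	var poem: String
	var rating: Rating

	enum Rating: String, Codable {
		case love, meh, hate
	}

	// `version` has a key but no stored property, so we encode it by hand.
	enum CodingKeys: String, CodingKey {
		case author, poem, rating, version
	}

	func encode(to encoder: Encoder) throws {
		var container = encoder.container(keyedBy: CodingKeys.self)
		try container.encode(author, forKey: .author)
		try container.encode(poem, forKey: .poem)
		try container.encode(rating, forKey: .rating)
		try container.encode(Self.version, forKey: .version)
	}
}

// Decodes only the version number and ignores every other field.
// `version` comes back as nil when the key is absent.
struct VersionProbe: Decodable {
	let version: Int?
}

let poem = Poem(author: "McGonagall, William Topaz",
				poem: "My heart it felt as light as a cork.",
				rating: .hate)
let data = try JSONEncoder().encode(poem)
let storedVersion = try JSONDecoder().decode(VersionProbe.self, from: data).version
```

Reading the version first means we can decide which type to decode before touching any of the fields that might have changed shape.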

But we still have to think about how we’re going to deal with migrations. An obvious way to do this would be to define a protocol that specifies how you make your current type out of the previous version:

protocol MigratableToCurrentPoem {
	var asPoem: Poem { get }
}

extension PoemV2: MigratableToCurrentPoem {
	var asPoem: Poem {
		// this is where you do your migration
	}
}

However, this may end up giving you more work than you realise. Each time you want to change the type, you end up having to change all the previous migrators (and write a new one for the new version of the type.) This is a pain when you only have a few versions to deal with, but could quickly become unmanageable with, say, 50 different versions.

In principle, the migration is a sequential operation—a migration from V1 to V5 should be functionally the same as migrating from V1, to V2, to V3, to V4, to V5. So we can use this property of a migration process to do incremental migrations, saving ourselves work and helping us reason about our migrations.

How incremental migrations work

Incremental migration, for our purposes, means upgrading the data in stages. This means we write migration logic to transform each old version of the schema to the next-newest one. For instance, we would write logic to migrate from version 1 to version 2, from version 2 to version 3, and so on.

The three Poem types linked by green arrows. To get from the first to the second, you migrate by joining the Poem field into a single string. To get from the second to the third, you turn starRating into a Rating (from 0-2 make it ‘hate’, for 3 or nil set it to ‘meh’, and for 4-5 set it to ‘love’.)
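
To make the two steps in the diagram concrete, here’s how they might look as plain initialisers, before any protocol gets involved. The field lists for PoemV1 and PoemV2 are my assumption based on the JSON examples earlier, and out-of-range star ratings are simply folded into the nearest bucket to keep the sketch short:

```swift
struct PoemV1 {
	var author: String
	var poem: [String]
}

struct PoemV2 {
	var author: String
	var poem: String
	var starRating: Int?

	// Version 1 to 2: join the lines into a single string.
	init(from old: PoemV1) {
		author = old.author
		poem = old.poem.joined(separator: "\n")
		starRating = nil // version 1 had no ratings at all
	}
}

struct Poem {
	enum Rating: String {
		case love, meh, hate
	}

	var author: String
	var poem: String
	var rating: Rating

	// Version 2 to 3: turn the star rating into a `Rating`.
	init(from old: PoemV2) {
		author = old.author
		poem = old.poem
		switch old.starRating ?? 3 { // no rating is treated like 3 stars
		case ...2: rating = .hate
		case 3: rating = .meh
		default: rating = .love
		}
	}
}

let original = PoemV1(author: "McGonagall, William Topaz",
					  poem: ["And the morning I sailed from the city of New York",
							 "My heart it felt as light as a cork."])

// Migrating from version 1 to 3 is just the two steps applied in order.
let current = Poem(from: PoemV2(from: original))
```

Each initialiser only knows about the version immediately before it, so adding a fourth version would mean writing one new initialiser and leaving these two alone.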

This reduces the developer workload for a new version to the following:

  • Define the new version of the type
  • Write a test to be sure you can decode the new version
  • Write a migrator from the previous version

I achieved this in Swift with a new protocol called VersionedCodable, which extends Codable. To conform to it, you specify the previous version as an associated type, the current version number, and an initialiser that accepts the previous version you just defined. For instance:

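extension Poem: VersionedCodable {
	static let version: Int? = 3
	typealias PreviousVersion = PoemV2

	init(from old: PoemV2) {
		// handle your migration here, e.g. (assuming `PoemV2` has
		// `author`, `poem`, and an optional `starRating`):
		author = old.author
		poem = old.poem
		switch old.starRating ?? 3 { // no rating is treated like 3 stars
		case ...2: rating = .hate
		case 3: rating = .meh
		default: rating = .love
		}
	}
}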

We also need to account for the oldest version of the type, where there are no previous versions to try decoding. In this case, you use a special type, NothingEarlier:

extension PoemV1: VersionedCodable {
	static let version: Int? = 1
	typealias PreviousVersion = NothingEarlier
	// No need to specify an initialiser here
}

Inside the guts of VersionedCodable, on decoding, it does the following:

  1. Decodes the version key from the data (which may be nil, to account for documents that may have been stored before you adopted VersionedCodable.)
  2. If version matches the version field on the type, it decodes it.
  3. If PreviousVersion is NothingEarlier, it throws an error because we have an unsupported version.
  4. Otherwise, it repeats the process by comparing the version of PreviousVersion with the key in the data, decoding, delegating further back along the PreviousVersion chain, etc.
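
If it helps to see those four steps as code, here is a much-simplified sketch of the same idea. It is not the package’s actual source: Versioned, EndOfChain (standing in for NothingEarlier), UnsupportedVersion, VersionProbe, decodeVersioned and walkChain are all names invented for this example, and it assumes JSON with the version stored under a top-level version key.

```swift
import Foundation

// Hypothetical protocol for this sketch, loosely modelled on the steps above.
protocol Versioned: Decodable {
	associatedtype Previous: Versioned
	static var version: Int? { get }
	init(from old: Previous) throws
}

struct UnsupportedVersion: Error {
	var found: Int?
}

// Stand-in for `NothingEarlier`: marks the end of the chain.
struct EndOfChain: Versioned {
	typealias Previous = EndOfChain
	static let version: Int? = nil
	init(from old: EndOfChain) throws { self = old }
}

// Step 1: read only the `version` key; it is nil when the key is absent.
struct VersionProbe: Decodable {
	let version: Int?
}

func decodeVersioned<T: Versioned>(_ type: T.Type, from data: Data) throws -> T {
	let found = try JSONDecoder().decode(VersionProbe.self, from: data).version
	return try walkChain(type, from: data, found: found)
}

func walkChain<T: Versioned>(_ type: T.Type, from data: Data, found: Int?) throws -> T {
	// Step 2: the stored version matches this type, so decode it directly.
	if found == T.version {
		return try JSONDecoder().decode(T.self, from: data)
	}
	// Step 3: nothing older is left to try.
	if T.Previous.self == EndOfChain.self {
		throw UnsupportedVersion(found: found)
	}
	// Step 4: decode as the previous version, then migrate one step forward.
	let older = try walkChain(T.Previous.self, from: data, found: found)
	return try T(from: older)
}

struct PoemV1: Versioned {
	typealias Previous = EndOfChain
	static let version: Int? = 1
	var author: String
	var poem: [String]
	// Never reached by `walkChain`; it exists only to satisfy the protocol.
	init(from old: EndOfChain) throws { throw UnsupportedVersion(found: nil) }
}

struct PoemV2: Versioned {
	typealias Previous = PoemV1
	static let version: Int? = 2
	var author: String
	var poem: String
	init(from old: PoemV1) throws {
		author = old.author
		poem = old.poem.joined(separator: "\n")
	}
}

let stored = Data(#"{"version": 1, "author": "Anonymous", "poem": ["An epicure", "dining at Crewe"]}"#.utf8)
let migrated = try decodeVersioned(PoemV2.self, from: stored)
```

Asking for a PoemV2 from a version 1 document decodes it as a PoemV1 and then runs the one migration step. A document with a version that nothing in the chain claims ends up at step 3 and throws.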

You can have as many previous versions as the call stack will allow. I haven’t tried to break it in this way yet. 😉

This diagram might help you to understand how the relationships between different types work:

Three type definitions next to each other: Poem, PoemV1, and PoemPreV1. Poem has a `static let version = 2` and has a reference to PoemV1 as its `PreviousVersion`. PoemV1's version is 1 and its PreviousVersion is PoemPreV1, whose version is nil. There's also an initializer that allows a PoemV1 to be initialized from a PoemPreV1, and a Poem from a `PoemV1`.

Don’t forget the tests!!

Although VersionedCodable and Swift’s type system make it easy enough for you to do step-by-step migrations, they aren’t a substitute for a good testing strategy.

Whenever you’re decoding anything, it’s always a good idea to have some test cases to be sure you can decode something that meets the spec. The same goes here, except it’s also a good idea to make sure you can decode each previous version of your type; that way, you have a suite of confidence tests that can help you be sure you’re not going to suddenly be unable to open anyone’s old document.
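
One way to structure that suite is to keep a stored example for every historical version and run them all through the same decoding path. The sketch below is deliberately library-agnostic: Fixture, undecodableFixtures and AuthorOnly are hypothetical names, and the closure is where your real decoding call would go.

```swift
import Foundation

// One stored example per historical version of the document.
struct Fixture {
	var name: String
	var json: String
}

// Returns the names of the fixtures that `decode` could not handle.
func undecodableFixtures<T>(in fixtures: [Fixture],
							decode: (Data) throws -> T) -> [String] {
	var failures: [String] = []
	for fixture in fixtures {
		do {
			_ = try decode(Data(fixture.json.utf8))
		} catch {
			failures.append(fixture.name)
		}
	}
	return failures
}

// Hypothetical stand-in type; in a real suite this would be your document type.
struct AuthorOnly: Decodable {
	var author: String
}

let fixtures = [
	Fixture(name: "v1", json: #"{"author": "Anonymous", "poem": ["An epicure dining at Crewe..."]}"#),
	Fixture(name: "v2", json: #"{"author": "Anonymous", "poem": "An epicure dining at Crewe...", "starRating": 1}"#),
	Fixture(name: "v3", json: #"{"author": "Anonymous", "poem": "An epicure dining at Crewe...", "rating": "hate"}"#),
	Fixture(name: "wrong type", json: #"{"author": 42}"#),
]

let failures = undecodableFixtures(in: fixtures) {
	try JSONDecoder().decode(AuthorOnly.self, from: $0)
}
```

In a real test target you would leave out the deliberately broken fixture and assert that the returned list is empty, so a failure names the exact version you’ve just made un-openable.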

I strongly recommend turning on test coverage in Xcode, and checking to see if your VersionedCodable initialisers are covered by the tests—if not, you should really get some tests in to be sure you’re not going to break any of your migration logic.

Observations about doing this with the Swift type system

This is a pattern that I’ve seen and used many times before, but this is the first chance I’ve had to try implementing it in Swift.

Even after the best part of a decade of writing and enjoying Swift, I’m always surprised by how much you can encode into the type system. Rather than storing a list of known version numbers and their associated types, I can specify these in extensions. I was able to provide a default initialiser where the PreviousVersion is NothingEarlier, to reduce boilerplate.

And the joy of Swift’s type-safe encoding and decoding is that most real clangers will get caught at compile time; you usually won’t end up encoding something with the wrong key by accident. This, combined with some comprehensive test coverage, has given me the confidence to make changes to Unspool’s document model without making it feel like a chore.

One thing I did find myself wishing for was some kind of optional associated type. The reason NothingEarlier exists is solely to make the compiler work, and give us a type to assign to PreviousVersion where there is no previous version. I did toy with using Never for this, but didn’t like that it would involve writing extensions on Never (making it conform to VersionedCodable) that then pollute the whole namespace, so I stuck with a new supporting type.

This is also the first time I’ve open-sourced a Swift package, and I was pleasantly surprised by how easy it was to set up on GitHub Actions, and by the quality of Swift-DocC, which truly made writing documentation feel like less of a chore. It was relatively easy to get my VersionedCodable package’s documentation to build and deploy to GitHub Pages every time I tagged a release.

I’m now happy enough with the API I have to consider it stable, and I released version 1.0.0 earlier today (only to find a performance issue which I then resolved in v1.0.1.) It’s capital-F Free software under the MIT licence, so you’re welcome to use VersionedCodable in your own projects, fork it, extend it, submit issues/pull requests/whatever.