On Documentation – thinks that i thought

I have yet to work anywhere that had either a good documentation tool or a good documentation workflow, and as a result, I’ve yet to work somewhere that had good documentation. Indeed, based on my own personal experiences, there are only two kinds of docs: out-of-date and non-existent.

Because of this, I can’t say for sure what good tooling or workflows would look like, but if I squint just right at all the wrongness I’ve been exposed to, I can start to see the fuzzy outline of rightness taking shape. I’ll do my best to describe it here.

First and foremost, as anyone who’s done it can tell you, writing is hard. Both the tooling and the workflow for creating docs need to recognize that, and do their absolute best to reduce the friction involved in getting words from your head to the page. One of the biggest sources of this friction that I’ve personally encountered is categorization. Docs are usually just about as well organized as the kitchen junk drawer, and often along the same principle: “as long as we can get the drawer closed, it’s fine”. This means that checking to see if there’s existing documentation on a subject, trying to update that doc if it does exist, and figuring out where to add it if it doesn’t exist, all carry some frustrating cognitive load. The fix for this is to have an agreed-upon structure for the docs, whether it’s a hierarchy of categories or just an accepted tagging taxonomy. Knowing where to add new docs and where to find existing ones needs to be effortless.

In an ideal world, having an agreed-upon structure would mean no one ever creates redundant documents. In the real world, tooling needs to account for human error, and help users identify possible duplicates when creating or updating docs. It also needs to help prevent duplicating data, whether within documents or from external sources: it should be at least possible, if not easy, to import data from other documents, source control, build systems, databases, etc. Every time you copy and paste data into a document from an external source, you’re violating DRY; “don’t repeat yourself” applies just as readily to documents and data as it does to code.
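To make the duplicate check a little more concrete, here’s a minimal sketch of what it might look like, assuming all we have to go on is document titles. The function name `possible_duplicates` and the 0.8 threshold are made up for illustration; a real tool would presumably compare document bodies too.

```python
from difflib import SequenceMatcher


def possible_duplicates(new_title, existing_titles, threshold=0.8):
    """Return existing titles that look similar to new_title, best match first.

    Hypothetical helper: the threshold is an arbitrary starting point,
    not a tuned value.
    """
    def similarity(a, b):
        # ratio() is in [0, 1]; compare case-insensitively so that
        # "Deploy Guide" and "deploy guide" count as the same title.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    scored = [(similarity(new_title, title), title) for title in existing_titles]
    return [title for score, title in sorted(scored, reverse=True) if score >= threshold]
```

The idea would be to run something like this when a new doc is created and ask “did you mean one of these?” before letting a near-duplicate through.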

And in the spirit of not repeating yourself: there needs to be an agreed-upon style guide. It seems pretty common for docs to have ad hoc conventions for differentiating code, shell commands, config options, etc. – and often those conventions aren’t even consistently observed within a single document, meaning you have to remake those style decisions every time they come up. Agreeing on a style guide as a group means that those decisions get made once, and you just consult the guide as needed while you write. Even better would be tooling that provides formatting help and document templates that make it easy to adhere to the guide.

Another major sticking point is that docs are often centralized in some neglected wiki or corporate CMS of some kind. Even the simplest changes require multiple round trips to the server, and there’s too much latency on every round trip. You can’t use your preferred editor, and the editor you can use is crappy. A good documentation tool needs to treat online editing and offline editing as equals. Think of what GitHub is to Git: it’s easy to get things done online, and just as easy to get them done offline, even if you don’t have a network connection. Since offline editing should let you use the editor of your choice, we probably want to use a portable text format. Markdown is probably the path of least resistance there, though there are other options. If we go a step further and say it should be just as easy to read the docs offline as online, then we’ve covered our bases for when our centralized doc repository goes offline during a disaster and suddenly no one can access the disaster recovery docs.

You might think having a revision history and being able to diff revisions would go without saying, but in practice it’s still often overlooked, so let’s state explicitly that this is a must-have. Another must is a way to mark terms as being synonymous, and its counterpart, a way to disambiguate terms that have multiple meanings. It would also be great to have a system to suggest potential cross-references, i.e. when there’s a document title that matches what you’ve typed, you’re asked if you want to link to that document. I’ve only seen this feature once, but it was amazing and should be standard.
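I don’t know how the one tool I saw implemented its cross-reference suggestions, but a rough sketch of the idea, assuming a plain list of known document titles, could be as simple as the following. The name `suggest_links` is hypothetical.

```python
import re


def suggest_links(text, titles):
    """Return the known titles mentioned in text, in order of first appearance.

    Hypothetical helper: matching is case-insensitive and on whole words only,
    which assumes titles start and end with a letter or digit.
    """
    hits = []
    for title in titles:
        # \b keeps a title like "Backup" from matching inside "Backups".
        match = re.search(r"\b" + re.escape(title) + r"\b", text, re.IGNORECASE)
        if match:
            hits.append((match.start(), title))
    return [title for _, title in sorted(hits)]
```

An editor plugin could call this as you type and offer to turn each hit into a link.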

I’d also love to see some really first-rate search become more common. Not just full-text search, which is starting to become the norm, but solid presentation of the search results: document title, a snippet showing the query match in context, and all the relevant document metadata such as author, last modified time, category/taxonomy, etc. Further, metadata fields need to be searchable and sortable. Being able to search and sort on last modified time is a godsend when it comes to finding out-of-date docs, as is being able to search for docs authored by past employees. That metadata is often handy for the users reading your docs, too.
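As a sketch of why searchable, sortable metadata matters, here’s roughly what those two queries look like once the metadata exists. The `Doc` record and both function names are assumptions for illustration, not any real tool’s API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical metadata record; a real tool would carry more fields.
    title: str
    author: str
    last_modified: date


def stale_docs(docs, cutoff):
    """Docs last modified before cutoff, oldest first."""
    old = (doc for doc in docs if doc.last_modified < cutoff)
    return sorted(old, key=lambda doc: doc.last_modified)


def docs_by_authors(docs, authors):
    """Docs written by any of the given authors, e.g. people who have left."""
    wanted = set(authors)
    return [doc for doc in docs if doc.author in wanted]
```

Neither query is hard; the hard part is having a tool that records the metadata and lets you ask.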

And now that we’ve dealt with all of these hurdles to getting docs written and keeping them updated, we can get to the heart of the matter: documentation needs to be treated as a necessity. Code shouldn’t pass review if the commit doesn’t have an accompanying update for the docs. New department policies can’t be announced via terse emails; they have to be codified as actual documentation. But unless the sticking points from above are sanded down first, you’ll have a hard time getting people to agree to these policies, and a harder time getting people to follow them. You can mandate whatever you want, but it absolutely won’t happen if following the mandate is too much of a hassle. Better to skip the hassle and pick tools that make good practice practicable.
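For the code review rule, the check itself can be tiny. Here’s a sketch, assuming docs live under a `docs/` directory and that the code is Python; the function name, the prefix, and the suffixes are all made-up defaults you’d adjust for your own repository.

```python
def missing_doc_update(changed_paths, docs_prefix="docs/", code_suffixes=(".py",)):
    """True if a change touches code but no file under the docs directory.

    Hypothetical review check: it only looks at file paths, so it cannot
    tell whether the doc change is actually relevant to the code change.
    """
    touches_code = any(path.endswith(code_suffixes) for path in changed_paths)
    touches_docs = any(path.startswith(docs_prefix) for path in changed_paths)
    return touches_code and not touches_docs
```

A check like this can nag, but it only helps if updating the docs is easy enough that people don’t just route around it.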

Sadly, I don’t know of a tool set that encompasses even most of those features, let alone all of them, but there’s a small fortune waiting for the first person to put all these pieces together.