My DNA needs some refactoring

Recently, I had a personal epiphany when I realised that clean code and good code are very different. This fact struck me when thinking about DNA. To a programmer's mind the proteins in a cell make up a virtual machine, and DNA is the code they execute. Furthermore, natural selection guarantees that this is "good code" in the sense that it is proven to be effective in the real world. The problem is this code is crap. I'm assured by biologists that junk DNA was a poorly chosen term and that recent research is proving that these base pairs are useful, but even so our bodies contain an appendix, a tailbone and other vestigial structures - known to programmers as cruft. In short, despite being the recipe for the biological marvel that is the human body, our genomes could do with a refactoring.

I'm joking a little, but the parallel with ugly, but useful, code in "real-world" programming is a good one. You dream up a piece of software, write 1.0, and the code is beautiful. You release it and it is useless. It's incompatible with Word 97 files, so you put in a fix; it crashes in the presence of a virus scan, so you put in a fix; it is . . . and before you know it your fixes are overrunning your code, so you get to work straight away on refactoring. Unfortunately the emails keep coming and this is a battle you are destined to lose. Constant refactoring is important to keep your code maintainable, but it is unlikely to keep it beautiful. Never mind though; writing useful software is an exercise in dealing with the special cases, and inevitably your codebase will reflect that lesson. These hacks and kludges are the code equivalents of your appendix and tailbone. Their existence is unjustifiable, and in fact they are better out than in, but they go hand in hand with any well-developed real-world system.

I can already hear your complaints - this comparison is unfair. If our cells are a virtual machine, then DNA is more accurately compared to object code, not the source code it is compiled from, and object code needn't be tidy, it needs to be efficient. Good object code is efficient and good source code is beautiful. This leads me to my true, and admittedly obvious, epiphany. The sole job of a programming language is to bridge the gap between ugly, but efficient, object code, and beautiful source code.

I think this obvious point is often lost when discussing programming languages, especially when people are evangelising for functional languages. No one cares that a functional language makes it possible to write provably correct code. What we do care about is the flexibility that a functional flavour adds to a language, especially when a language is flexible enough for Domain Specific Languages.

Haskell's Parsec module was my lure into the world of functional programming, and it is still the best example I can give for the "why should I learn a functional language?" question. It is a parser library which allows you to rewrite the composition of grammatical elements as the composition of functions and build a parser directly from BNF without needing to rely on an external system to autogenerate code. The beauty is that it doesn't feel like a parser library, it feels like a programming language specifically for writing parsers.
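To make that concrete, here is a minimal sketch of what "building a parser directly from BNF" can look like. The grammar (a toy arithmetic language) and the names expr, term, factor, number, symbol and evalString are my own illustration, not anything Parsec ships with; only the combinators imported from Text.Parsec are the library's. Each BNF rule sits in a comment directly above the function that implements it.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical toy grammar, evaluated while it is parsed.

-- expr   ::= term ('+' term)*
expr :: Parser Integer
expr = sum <$> term `sepBy1` symbol '+'

-- term   ::= factor ('*' factor)*
term :: Parser Integer
term = product <$> factor `sepBy1` symbol '*'

-- factor ::= number | '(' expr ')'
factor :: Parser Integer
factor = number <|> between (symbol '(') (symbol ')') expr

-- number ::= digit+
number :: Parser Integer
number = read <$> many1 digit <* spaces

-- A single character followed by optional whitespace.
symbol :: Char -> Parser Char
symbol c = char c <* spaces

-- Parse and evaluate a whole input string; trailing garbage is an error.
evalString :: String -> Either ParseError Integer
evalString = parse (spaces *> expr <* eof) "<input>"
```

Loaded into GHCi, evalString "1 + 2 * (3 + 4)" should give Right 15, while an incomplete input such as "1 +" gives a Left carrying a parse error. The point is less the arithmetic than the shape: the code reads like the grammar, with sepBy1 standing in for the repetition and <|> for the alternation.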

The first parser you write with Parsec is the first parser you will write in which your code will reflect the structure of the data being parsed, not the highly efficient finite state machine it will compile to. If you want to know why Haskell is a "better" programming language than C/C++/C#/Java, this is why. In the "curly braces and classes" family of languages you have the choice between writing the state machine from scratch, autogenerating your code with bison/lemon or, if you are lucky, a cunning but ugly compile-time hack (sorry Boost::Spirit). In Haskell your code is a work of art. It reflects the grammar being parsed, which makes it concise and self-documenting. Nevertheless the compiler bridges the gap between this abstract elegance and the real world by churning out a blazingly fast executable.

Posted on 8 October 2012

Slide to code blog is licensed under a Creative Commons Attribution 3.0 Unported License