The genesis of a perfect programming language

Ask a room of young programmers "what is the best programming language?" and each will immediately spring to the defence of his or her favourite development tool, loudly explaining how it triumphs over all others in all situations. This will continue until the elder statesman of the office, disturbed by the cacophony of youthful conviction, swings around from his terminal, clears his throat and, to the hushed awe of his colleagues, announces that the best programming language is the best one for the job. He is right, of course, but in the family tree of programming languages certain branches have been more fruitful than others, and it is these I would like to discuss.

If I were to design a programming language, my starting point would be to ask myself: what are my favourite programming language features? To avoid offence, I will present this as a list of features I admire, rather than features I don't. A list of dos, not do nots, for language designers. Before this, though, I will summarise the two most significant problems these language designers should tackle.

  • Parallelisability: With my laptop's 2 CPU cores and 72 GPU cores, it is tough to max out just the CPU, let alone unlock the power of the GPU. Mutexes, semaphores, and CUDA make it possible, but unnecessarily difficult, and in the new age of multicores threading is too important to be left to the standard library. This isn't only relevant to backend languages; no application should ever again leave the user thumping their keyboard in frustration as what could easily be a background task locks up the interface. Garbage collection was once a fringe feature, and it is now unthinkable to leave it out of a language's core. Soon it will be the same with threading support.
  • Typed languages with the feel of a scripting language: The well-trodden path to glory for an internet entrepreneur is to write version 1 of the next great product in Python, or Ruby, or JavaScript, launch it to high acclaim, instantly run into scaling problems, and use VC money to hire a small army of C++ programmers with pony tails and faded Star Wars T-shirts to beef up their backend.

    I've no complaint with this, but it leads to the mistaken impression that great applications are written in dynamic languages. They aren't. Great prototypes are written in dynamic languages, but the lasting replacement will be written in a dull typed language like Java/C++/C#. Facebook went as far as writing HipHop to translate their many millions of lines of PHP into many more millions of lines of C++ to avoid writing one to throw away. A language with Ruby's agility and C++'s suitability to large scale software would be a far better solution to this problem.

I love Scheme for its simplicity. It is nothing more than the minimal set of language constructs and predefined forms necessary for a language to be usable, yet somehow it achieves a terseness to rival a syntax heavyweight such as Perl. Whereas Lisp's industrial descendants - Scala and F# - arm programmers with the reusable software objects they need to build useful applications, Scheme focusses on supplying the building blocks for a programming language, and the flexibility to adapt it perfectly to your task - it is a hacker's language. So far my fondness for Scheme has coexisted with disappointment at its unsuitability for real world tasks, but the emerging R7RS standard, with its minimal core, new package system, and vast standard library looks set to provide us with a language true to the Scheme spirit but also usable for real applications.

As I mentioned above, a new language must focus on multithreading, and right now the leader in this regard is Erlang. Its creator, Joe Armstrong, realised that clever multithreading libraries treat the symptom, not the cause - shared state between threads. He wove the Actor model of concurrency into the Erlang language itself, importing techniques from functional programming to eliminate shared state and the entire class of associated threading bugs, creating what is without a doubt the sturdiest foundation for large server applications today.
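To make the actor idea concrete for readers who have not met it, here is a rough C++ sketch. It is an illustration only, not how Erlang works internally: a hypothetical CounterActor keeps its counter on a single thread and changes it only in response to messages taken from a mailbox, so callers never touch the state directly. The class, its message type and its method names are all invented for this example. Note that the mailbox itself still needs a mutex and a condition variable, which is precisely the plumbing Erlang moves out of sight and into the language.

```cpp
#include <condition_variable>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical actor: the counter lives on one thread and is only
// ever modified by that thread, in response to messages.
class CounterActor {
public:
    CounterActor() : worker_([this] { run(); }) {}

    ~CounterActor() {
        send(Message{Message::Stop, 0, nullptr});
        worker_.join();
    }

    // Asynchronously ask the actor to add `amount` to its counter.
    void add(int amount) { send(Message{Message::Add, amount, nullptr}); }

    // Ask for the current total; the reply comes back through a future.
    int total() {
        auto reply = std::make_shared<std::promise<int>>();
        std::future<int> result = reply->get_future();
        send(Message{Message::Get, 0, reply});
        return result.get();
    }

private:
    struct Message {
        enum Kind { Add, Get, Stop } kind;
        int amount;
        std::shared_ptr<std::promise<int>> reply;  // only used by Get
    };

    void send(Message message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            mailbox_.push(std::move(message));
        }
        ready_.notify_one();
    }

    // The actor's message loop: one message at a time, in arrival order.
    void run() {
        int count = 0;  // private state, never shared with other threads
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return !mailbox_.empty(); });
            Message message = std::move(mailbox_.front());
            mailbox_.pop();
            lock.unlock();

            if (message.kind == Message::Stop) return;
            if (message.kind == Message::Add) count += message.amount;
            if (message.kind == Message::Get) message.reply->set_value(count);
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Message> mailbox_;
    std::thread worker_;  // declared last so the mailbox exists before the thread starts
};
```

A caller would create a CounterActor, call add() as often as it likes from any thread, and read the result with total(). The only lock in the program guards the mailbox; the counter itself needs none because only one thread can ever see it.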

My attraction to C++ is straightforward; I like native code. JIT compilers vastly increased the efficiency of the CLR and the JVM, so why not go the next step and move the native code compiler to the compilation stage? After all, the "write-once run-anywhere" upside of managed code has been made redundant by web applications and JavaScript. Once again, this isn't uniquely a concern for back office software engineers scraping the last dregs of performance out of their big iron. Nothing kills an interface like a spinning beachball, and it is a lot easier to write fast interface code in a fast language.

Much of my day to day programming is in Objective-C, and although I have grown fond of the language as a good balance between the usability of Java and the efficiency of C++, the only Objective-C feature I miss when coding in other languages is named arguments (e.g. [self drawRectOfSize:size atPoint:point inColour:colour]). Doxygen, rdoc, javadoc and similar tools do a great job of generating API documentation from source, but I prefer the self-documenting nature of Objective-C method calls. In fact I am so attached to the concept of self-documenting code that I would be happy with compiler warnings for every variable and function name not composed from a valid word or phrase.
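C++ has no named arguments, but the effect can be approximated. The sketch below is one workaround among several, assuming nothing beyond the standard library: each argument gets a small wrapper type, so the role of every value is visible at the call site. The types Size, Point and Colour and the function drawRect are hypothetical stand-ins for the Objective-C call above, and the function returns a description instead of drawing anything so that it stays self-contained.

```cpp
#include <string>

// Hypothetical wrapper types: each argument carries its role in its type.
struct Size   { int width; int height; };
struct Point  { int x; int y; };
struct Colour { std::string name; };

// Returns a description rather than drawing, to keep the sketch self-contained.
std::string drawRect(Size ofSize, Point atPoint, Colour inColour) {
    return inColour.name + " " +
           std::to_string(ofSize.width) + "x" + std::to_string(ofSize.height) +
           " at (" + std::to_string(atPoint.x) + "," + std::to_string(atPoint.y) + ")";
}
```

A call then reads drawRect(Size{30, 20}, Point{5, 10}, Colour{"red"}), and passing a Point where a Size is expected fails to compile. It is wordier than the Objective-C original, which is rather the point of wanting the feature in the language.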

Notably I have passed over object orientation. It is a common and desirable feature, but object orientation for me serves two, often conflated, purposes. One is to encapsulate and package code so that teams of programmers in a large organisation can work together. With C++ a small group of senior coders can lay out interfaces and object models, leaving their junior counterparts to fill in the blanks. This is not unique to object oriented languages; Monads play this role in Haskell, and Erlang, another non-OOP language, has an elegant package system for this purpose.

Object orientation is also valuable as a mechanism for polymorphism. Sharing common interfaces across disparate datatypes is the foundation of code reuse today, but object orientation is just one approach to polymorphism; Haskell achieves it via typeclasses or type inference. Even C++, OOP's flagwaver-in-chief, now supports templating, which is effectively a competing compile time strategy for writing polymorphic code.
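The two C++ strategies can be put side by side. This is a minimal sketch with invented names (Shape, Square, Triangle, doubledArea, doubledAreaOf): the first function is polymorphic through a virtual method on a shared base class, the second through a template that accepts any type with an area() member, related or not.

```cpp
// Run-time polymorphism: a shared base class with a virtual method.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double length) : side(length) {}
    double area() const override { return side * side; }
    double side;
};

// Works for anything that inherits from Shape; dispatch happens at run time.
double doubledArea(const Shape& shape) { return 2.0 * shape.area(); }

// Deliberately unrelated to Shape: no base class, no virtual methods.
struct Triangle {
    double base;
    double height;
    double area() const { return 0.5 * base * height; }
};

// Compile-time polymorphism: works for any type with an area() member;
// the compiler generates one version per type used.
template <typename T>
double doubledAreaOf(const T& shape) { return 2.0 * shape.area(); }
```

doubledArea accepts a Square but rejects a Triangle, because Triangle does not inherit from Shape. doubledAreaOf accepts both, and the check that area() exists happens when the template is instantiated.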

Templating is popular, but it is a poor man's version of the type inference of ML, Haskell, Scala et al. Consider the function

    int abs(int val) { if (val < 0) return -val; return val; }

Is it really necessary to type the function's return type? Why not let the compiler infer its type from the type of the input? Even better, why not leave out the type of the input and let the compiler instantiate all the necessary types from the following template

    abs(val) { if (val < 0) return -val; return val; }

It is certainly a lot prettier than the C++ template version.
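For comparison, this is roughly what the C++ template version looks like, followed by the closest that C++14 gets to the untyped form. The names absTemplate and absDeduced are mine, chosen to avoid clashing with the standard library's abs. This is a sketch of the syntax, not a replacement for std::abs.

```cpp
// The classic C++ template version: the type parameter T has to be
// spelled out for both the argument and the return value.
template <typename T>
T absTemplate(T val) {
    if (val < 0) return -val;
    return val;
}

// C++14 generic lambda: neither the input type nor the return type is
// written down; the compiler instantiates a version per argument type.
// Caveat: both return statements must deduce the same type, so this
// compiles for int and double but not for short, where -val is an int.
auto absDeduced = [](auto val) {
    if (val < 0) return -val;
    return val;
};
```

Both absTemplate(-3) and absDeduced(-2.5) compile without any type being named at the call site. The caveat in the comment shows how far C++ deduction still is from the inference of the ML family.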

I have omitted a language's ecosystem from consideration - its standard library, its development tools, and the continuing evolution of the language. I have ignored these because although indescribably important to someone who wishes to use, rather than study, the language, the quality of a standard library isn't attributable to a design choice so much as it is to the corporate sponsor's deep pockets. C# and Objective-C are two examples of developing languages with great standard libraries and development tools, unsurprising considering their champions, Microsoft and Apple respectively.

The Opa project deserves an honourable mention for its efforts to end the practice of writing a web application's frontend in JavaScript, and the backend in a web application framework. At heart AJAX is a protocol for remote procedure calls, so why not code it as such? With Opa you write one application, designating some code to be executed on the server, and some to be executed on the client. If this isn't enough, Opa is type inferred. It may be immature, but surely Opa is the future of web programming.

Other miscellaneous features I admire are: Python's use of whitespace to ensure that the layout of the source code matches the structure of the software; Alice ML's elegant inclusion of parallel futures, a simple but exciting approach to multithreading; Haskell's laziness - laziness is a mandatory trait for any good hacker, why not the language they code in too?
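For anyone who has not used futures, the nearest C++ equivalent is std::async with std::future. The sketch below is only an approximation of the Alice ML idea, where futures are woven far more transparently into the language. The function name parallelSum is invented for the example.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Sum a vector by handing the first half to a future and summing
// the second half on the calling thread in the meantime.
long parallelSum(const std::vector<int>& values) {
    auto middle = values.begin() + values.size() / 2;

    // Start the first half in the background; `firstHalf` is a placeholder
    // for a result that may not exist yet.
    std::future<long> firstHalf = std::async(std::launch::async, [&values, middle] {
        return std::accumulate(values.begin(), middle, 0L);
    });

    long secondHalf = std::accumulate(middle, values.end(), 0L);

    // get() blocks only if the background computation has not finished.
    return firstHalf.get() + secondHalf;
}
```

The appeal is that there is no mutex in sight: the two halves share nothing but read-only input, and the only synchronisation point is the call to get().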

By compiling a list of features for my perfect programming language I have fallen victim to the same trap that the novice programmers did in my introduction. Instead of choosing a problem, and searching for the best solution, I am cherry-picking features from existing languages and laying the foundation for an uneasy mix of paradigms. Erlang may be an excellent example of how to multithread, but it is a better example of how a language designer has picked a problem - in this case multithreading - and pieced together a language around that need. A language designer should mimic the focus of Erlang's creators, not necessarily the features they chose to solve their problem.

Another excellent guiding principle for a language designer is a conviction that a language should shackle the programmer to good habits. The syntax of a good language compels the programmer to code "well". For example, Python enforces the use of whitespace for formatting, Objective-C ensures that all arguments to a function are documented, and Erlang guarantees that you can't write race conditions. I mentioned above that laziness is a trait of good hackers, so where possible the programming language should act as their chaperone and coerce them into good habits. This is the root of my preference for static type systems. If it can be sorted at compile time, whether it is a potential type conflict or a case of poorly laid out code, it should be sorted out at compile time.

This post was intended not so much as an article, but as an organisation of my thoughts. It is inconceivable a hacker could work with any language for a lengthy period of time and not dream about improving it, but I am increasingly convinced of the need for a multithreading-centric, statically typed, type inferred language that compiles to native code but possesses the feel, readability and programmer efficiency of Python. That said, I have had a Go tutorial on my reading list for some time now. Its Wikipedia description fits my criteria. Maybe it is the one?

EDIT: corrected go! to go

Posted on 06 May 2012

Slide to code blog is licensed under a Creative Commons Attribution 3.0 Unported License.
Based on a work at http://slidetocode.com/blog/.