Saturday 17 September 2011

Immutable Code

Did I really say Immutable Code? Is it not rather Immutable Object?

Yes and no. Yes, because Immutable Code can obviously be stored in Immutable Objects. No, because Immutable Code is about code and not data. Oh wait, but we all know that code is data, don't we? So what is this thing and what is it good for?

Well, I must admit that this is a purely hypothetical, probably not new and somewhat vague idea. For demonstration purposes I will assume a programming language where a program is a list of definitions. What a definition really is does not matter for the moment. It may be a function, a class, a global variable, anything that has a unique name and a body. Definitions refer to each other by using these names within their bodies.

What is the problem?

Any serious programmer knows that code changes over time. This is one of the most difficult things to deal with when you do real-world programming. Programs are organized into files, directories, modules, libraries, executables, manifests, plugins, projects, etc. Programmers assign version numbers to these parts and specify which ones are compatible. This process is complicated, time-consuming and error-prone. Some operating systems provide tools to solve this problem at the library level. Unfortunately that is quite coarse-grained and very tedious to use.

Another difficulty is that programs have become so complex that they can only be implemented through the collaboration of many people. Programmers work on different parts of a program and these parts change incompatibly over time. Programmers also have different preferences in terms of versions, and they don't like backwards-incompatible changes.

These are the points where Immutable Code could provide a completely different (and arguably better) solution. Unfortunately nothing comes for free and in this case the price is pretty high.

How much is it?

Code can no longer be stored easily in plain text file formats. It becomes difficult to find a format that remains human-readable and editable. This means that most of the current programming tools become unusable without dramatic changes. Editing code requires new tools that natively support the idea of Immutable Code. Even compilers may need to be updated, because source code might look so different.

How does it work?

Obviously Immutable Code doesn't mean that you can't change code; that would be useless. It also doesn't mean that every time you press a key you are effectively creating a new definition.

Immutable Code does mean that any definition that is made available to others cannot be changed afterwards. Published code is carved into stone.

The above statement has a number of very important implications. First of all, a definition now not only has a name and a body but it has a version too. Definitions no longer refer to each other using only their names. Every reference to a definition in all definition bodies must also specify the exact version of the target definition.
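To make this a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `Definition`, `Store` and `publish`, the use of a plain integer as the version, and the choice to keep the body as an opaque string. It only shows the two rules stated above: a published definition can never be replaced, and every reference names an exact version.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical reference format: (name, version).
Ref = Tuple[str, int]


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Definition:
    name: str
    version: int
    body: str                   # opaque here; what a body is does not matter
    refs: Tuple[Ref, ...] = ()  # exact versions of the definitions it uses


class Store:
    """Append-only collection of published definitions."""

    def __init__(self) -> None:
        self._defs: Dict[Ref, Definition] = {}

    def publish(self, d: Definition) -> Ref:
        key = (d.name, d.version)
        if key in self._defs:
            # Published code is carved into stone.
            raise ValueError(f"{d.name}@{d.version} is already published")
        for name, version in d.refs:
            if (name, version) not in self._defs:
                raise KeyError(f"unknown reference {name}@{version}")
        self._defs[key] = d
        return key

    def get(self, ref: Ref) -> Definition:
        return self._defs[ref]


store = Store()
square_v1 = store.publish(Definition("square", 1, "x * x"))
store.publish(
    Definition("sum_sq", 1, "square(a) + square(b)", refs=(square_v1,))
)
```

Changing `square` in this sketch means publishing `square` with version 2; `sum_sq` version 1 keeps pointing at version 1 until somebody publishes a new `sum_sq`.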

Notice that I did not say anything about what a version really is. In the simplest case it may be an integer number incremented with each public change. On the other hand a definition version may also specify the publisher, the date of publishing, the branch name, various publisher specified tags, etc.
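As a rough illustration of the richer case, a version could be a small immutable record instead of a bare integer. The field names below (`number`, `publisher`, `published`, `branch`, `tags`) are my own assumptions, not part of any specification; the only real requirement in this sketch is that a version is hashable, so that a (name, version) pair can identify a definition.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)  # frozen dataclasses with eq are hashable
class Version:
    number: int
    publisher: Optional[str] = None
    published: Optional[date] = None
    branch: Optional[str] = None
    tags: FrozenSet[str] = frozenset()


# The simplest case: just an incremented integer.
simple = Version(3)

# A richer version; all values here are made up for the example.
rich = Version(
    3,
    publisher="alice",
    published=date(2011, 9, 17),
    branch="main",
    tags=frozenset({"stable"}),
)

# Both can identify a definition, and they identify different ones.
index = {("square", simple): "x * x", ("square", rich): "x ** 2"}
```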

What is the bad news?

Any time you change a definition publicly and want the change to be used in your own program, you must also publish new versions of all the definitions that refer to it, directly or indirectly, up to the program's entry point. Moreover, other people also need to update all of their definitions up to their programs' entry points if they want to use your new version.

In such an environment, migrating a program to new versions of the definitions it uses means changing the references in your definitions from the old versions to the new, desired ones. This could be done manually for every single reference, but that is really not what one would do. Luckily updating references could be automated by writing and publishing migration programs, so it could be more or less transparent.
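A migration program of this kind could be sketched as follows. This is only an illustration under simplifying assumptions: bodies are left out entirely, a program is just a dictionary from a (name, version) pair to the tuple of references it contains, versions are integers, and `migrate` and `next_version` are names I made up. The function walks down from an entry point, and wherever something below has changed it publishes a new version of the referring definition instead of touching the old one.

```python
from typing import Dict, Optional, Tuple

Ref = Tuple[str, int]             # hypothetical (name, version) reference
Defs = Dict[Ref, Tuple[Ref, ...]]  # definition -> the references in its body


def next_version(defs: Defs, name: str) -> int:
    """Smallest integer version not yet used for this name."""
    return max(v for (n, v) in defs if n == name) + 1


def migrate(defs: Defs, root: Ref, old: Ref, new: Ref,
            memo: Optional[Dict[Ref, Ref]] = None) -> Ref:
    """Return the definition to use instead of `root` once every
    reference to `old` below it is replaced by `new`.
    Existing entries are never modified; new versions are added."""
    if memo is None:
        memo = {}
    if root == old:
        return new
    if root in memo:
        return memo[root]
    new_refs = tuple(migrate(defs, r, old, new, memo) for r in defs[root])
    if new_refs == defs[root]:
        result = root  # nothing below changed, keep the old version
    else:
        result = (root[0], next_version(defs, root[0]))
        defs[result] = new_refs  # publish a new version of this definition
    memo[root] = result
    return result


defs: Defs = {
    ("helper", 1): (),
    ("helper", 2): (),  # the newly published change
    ("parse", 1): (("helper", 1),),
    ("format", 1): (),
    ("main", 1): (("parse", 1), ("format", 1)),
}

new_main = migrate(defs, ("main", 1), ("helper", 1), ("helper", 2))
```

Here `new_main` is `("main", 2)`, which refers to a new `("parse", 2)` and to the untouched `("format", 1)`. The old `("main", 1)` is still there and still behaves as before.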

This also means that the amount of publicly available code grows pretty fast, because all definition versions are available until somebody or some process makes them inaccessible. The well-known term garbage collection gets a new meaning with Immutable Code.
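One possible reading of garbage collection here is plain reachability: a definition version stays alive as long as some entry point that somebody still cares about refers to it, directly or indirectly. The sketch below assumes exactly that policy, which is my assumption and only one of many conceivable ones. As before, bodies are omitted, and `reachable` and `collect` are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple

Ref = Tuple[str, int]             # hypothetical (name, version) reference
Defs = Dict[Ref, Tuple[Ref, ...]]  # definition -> the references in its body


def reachable(defs: Defs, roots: Iterable[Ref]) -> Set[Ref]:
    """All definition versions that the given entry points depend on."""
    seen: Set[Ref] = set()
    stack = list(roots)
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue
        seen.add(ref)
        stack.extend(defs[ref])
    return seen


def collect(defs: Defs, roots: Iterable[Ref]) -> Defs:
    """Return a copy of the store without unreachable versions."""
    live = reachable(defs, roots)
    return {ref: refs for ref, refs in defs.items() if ref in live}


defs: Defs = {
    ("helper", 1): (),
    ("helper", 2): (),
    ("parse", 1): (("helper", 1),),
    ("parse", 2): (("helper", 2),),
    ("format", 1): (),
    ("main", 1): (("parse", 1), ("format", 1)),
    ("main", 2): (("parse", 2), ("format", 1)),
}

# Only main@2 is still deployed anywhere, so only its versions survive.
kept = collect(defs, [("main", 2)])
```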

What does code look like?

What does code look like on the screen? It depends on who is looking at it and why. Code editors must be able to hide version information completely, so that we get what we have today. They must also be able to present code with version information included. This is the main reason why a pure text editor cannot be used effectively.
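The two views could be produced from the same stored body, for example like this. The sketch assumes a body is stored as a sequence of tokens, where a token is either literal text or a (name, version) reference; that representation, the `name@version` notation and the `render` function are all invented for the example.

```python
from typing import Sequence, Tuple, Union

Ref = Tuple[str, int]    # hypothetical (name, version) reference
Token = Union[str, Ref]  # literal text or a reference


def render(body: Sequence[Token], show_versions: bool = False) -> str:
    """Turn a stored body into text, with or without version information."""
    parts = []
    for token in body:
        if isinstance(token, str):
            parts.append(token)
        else:
            name, version = token
            parts.append(f"{name}@{version}" if show_versions else name)
    return "".join(parts)


body = [("square", 1), "(a) + ", ("square", 1), "(b)"]

plain = render(body)                          # what we have today
versioned = render(body, show_versions=True)  # the full picture
```

With this body, `plain` is `square(a) + square(b)` and `versioned` is `square@1(a) + square@1(b)`.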

What does code look like on disk? Well, I think it doesn't matter much. It could be text or it could be some binary format, although storing code as text probably makes the tool developers' life harder: text files would need more complicated parsing and unparsing while gaining little from the world of text editors.

What is the good news?

Programmers of this hypothetical programming language (and environment) can be sure that their programs do not change over time unexpectedly. For example, updating definitions required by a program definitely doesn't affect another program, even if it is using the very same definitions.

Moreover, assuming a distributed, persistent programming environment (just like a database) where programmers collaborate using Immutable Code, one can be sure that a program does not change no matter where or when it is run.

Notice that collaboration and version control are done at the definition level, and a definition may be as small as a single function. It becomes possible to use multiple incompatible versions of the same definition in different places in the program. Remember, programs are implemented by several collaborating people with different knowledge and preferences (in terms of definitions and their versions). This is a really fine-grained and effective collaboration between programmers, be it global or local to a company.
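A toy example of two incompatible versions living in the same program might look like this. It is a sketch only: the `registry` dictionary, the `round_price` definition and its two behaviours are made up, and ordinary Python functions stand in for definitions of the hypothetical language.

```python
from typing import Callable, Dict, List, Tuple

Ref = Tuple[str, int]  # hypothetical (name, version) reference

registry: Dict[Ref, Callable[[float], int]] = {}

# Version 1 truncates; version 2 rounds half up (for non-negative input).
# The two are incompatible, yet both stay available forever.
registry[("round_price", 1)] = lambda price: int(price)
registry[("round_price", 2)] = lambda price: int(price + 0.5)


def invoice_total(prices: List[float]) -> int:
    # Written by someone who depends on the old truncating behaviour.
    round_price = registry[("round_price", 1)]
    return sum(round_price(p) for p in prices)


def report_total(prices: List[float]) -> int:
    # Written later by someone who prefers the new behaviour.
    round_price = registry[("round_price", 2)]
    return sum(round_price(p) for p in prices)


prices = [1.6, 2.6]
old_style = invoice_total(prices)  # 1 + 2
new_style = report_total(prices)   # 2 + 3
```

Neither author has to convince the other, and publishing version 2 cannot break `invoice_total`.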

Backward compatibility becomes a rather different matter, because the old versions are always there and cannot be changed. Deploying a program from the test environment to the production environment becomes safe. As long as the main entry point has the same version, the program behaves the same. Reverting a production system to an earlier version is left as an exercise to the reader.

So what?

You may have noticed that some existing programming languages (e.g. the ones from the functional programming family) are more easily adaptable to this idea than others. I did not mention any of them on purpose, because that is not important to me. I know that this idea is far from even being roughly specified, but I think my point should be clear now.