Programming Musings

In this post I'm going to talk about projectional editors, how they work and what are they good for. I also present a simple prototype (implemented in Common Lisp) and a raw demonstration video.

Let me talk a little about text editors first

We all know that text editors are great tools for editing arbitrary text. The simplest data structure used by a text editor (e.g. Microsoft Notepad) is an ordered sequence of characters including line breaks. A somewhat more sophisticated data structure extends this with style related data such as colors and fonts. A word processing editor (e.g. Microsoft Word) further extends the data structure with chapters, bulleted lists, tables, pictures, etc.

A projectional editor differs from a text editor in that it is a general purpose editor with respect to the data structure being edited, and also how the data structure is edited. According to this view, a text editor is basically a projectional editor specialized for the text document data structure. The simplest projection that a text editor provides is that it writes characters to the screen from left to right. If line breaks are also included in the document, or if word wrapping is used, then the editor also writes characters to the screen from top to bottom. There are several text editors that provide more complicated projections, but only a few of them provide multiple different projections.

The term structure(d) editor is often used to describe editors that are cognizant of the underlying data structure that is being edited. I think this terminology is somewhat misleading in that it suggests that text editors are not structured editors. People might say that a text editor is not cognizant of the edited data structure, because the actual meaning is hidden in the user's mind. I think this distinction is wrong, because no matter how clever an editor is, there can always be an extra projection in the user's mind. The editor has no way of knowing what's in the user's mind. According to this view a text editor is a perfectly valid structured editor for the text document data structure.

So what is a projectional editor then?

It's a general purpose editor (framework) that provides multiple ways (notations) to edit arbitrary data structures. The difficulty lies in the implementation of the projectional editor, because it needs to be in par with the very high usability of text editors. In my terminology a projectional editor provides all the features and usability of text editors and word processing editors.

It's called projectional editor, because it provides its editing capabilities using bidirectional multistage projections between problem domains. The more domains it knows, the more kind of documents it can edit. The more projections it knows, the more alternative ways of editing and notations it provides.

I'd like to stress the importance of that it should be able to provide the same textual editing behavior that standard text editors have. This should be possible for all kind of structured documents that have textual representation. For example, it should provide editing for XML structured documents in the same way as text editors do as far as the user is concerned. That is, it should support invalid syntax, incomplete syntax, etc. while still having all the benefits of projectional editors.

What does the projection editor offer compared to text editors?

It allows to reuse existing generic projections that can be applied on the documents of any problem domain. The important thing to note here is that the projection is not simply applied to the document only once. It is rather applied to the document continuously while the document is being edited. Here are some examples:

filtering - it may be used to focus only some parts of the document while the rest is hidden (e.g. show the chapter titles only from a text document, show some elements from an XML document, etc.)
ordering - it may be used to sort the content (e.g. sort the functions in alphabetical order in Java code document, sort the calendar events in chronological order, etc.)
grouping - it may be used to arrange the content into groups (e.g. group the rows of a table according to the value in the cells of the first column, etc.)

It allows to reuse existing projections by building on top of them. For example, there might be multiple different tree projections that allows editing the tree data strcutures through textual (e.g. using separators and delimiters) or graphical notations. The new projection can reuse them allowing to use the multiple different notations for the new problem domain.

It allows to reuse existing generic documents as part of the documents of any new problem domain.

text - stylized text similar to word processing documents
table - two dimensional arrangements
tree - collapsible hierarchical drawings
graph - drawings with nodes and edges

It allows to reuse existing domains by mixing them in the new problem domain. For example, there might be documents and projections for word processing documents with text style and word wrapping. The new document can reuse them by composition allowing the already known notations to be part of the new problem domain.

How does the editor work internally?

It communicates with the user in a similar way to the well known and widely used read-eval-print loop. Simply speaking the REPL is a very powerful and generic way to talk to the computer, but I think Wikipedia explains it better than me. The main difference from the standard REPL is that it has to handle different kind of input and output devices simultaneously. These IO devices include among others a keyboard, a mouse and a graphical display as a bare minimum.

The standard REPL prints characters to a single character output device. The generalized REPL prints (projects) the editor's state to all output devices simultaneously. For example, printing to a graphical display needs a projection from the domain being edited to the graphics domain (the domain of the output device).

The standard REPL reads characters from a single character input device. The generalized REPL reads (projects) from all input devices simultaneously to the editor's state. For example, reading from a keyboard needs a projection from the keyboard domain (the domain of the input device) to the domain being edited.

What is a domain after all?

It describes the data structures of documents, selections and operations that belong to a problem domain. Domains are mainly important, because they can be analyzed, described and implemented independently. Here are some problem domain examples:

data structures - boolean, integer, number, string, pair, tuple, list, tree, graph, set, map, etc.
data formats - XML, JSON, ASN.1, etc.
programming languages - C code, Java code, Lisp code, etc.
text related - paragraphs, styled text, word processing with chapters, spreadsheet, presentation, graphics, etc.
databases - relational data, object oriented data, etc.

You might surprised that integer is listed as a domain. In fact, an integer is also a compound data structure consisting of digits or bits if you like, and as such it can be edited.

What is a document data structure?

It describes the possible states of domain specific documents. Here are some document data structure examples:

integer domain - an integer consists of a sequence of bits
string domain - a string consists of a sequence of characters
list domain - a list consists of a sequence of arbitrary elements
XML domain - an XML element consists of a list of XML attributes and a list of XML child elements, an XML attribute consists of a name string and a value string, etc.
graphics domain - a point consists of a 2D coordinate and a color, a line segment consists of two 2D coordinates, a color and a thickness, etc.

What is a document?

It's a piece of data that is being edited. Here are some document examples:

true, 12, "", "hello", "hello world", (1, 1, 2, 3, 5), etc.
, {"name": "levy" birthday: "08/05"}, etc.
public static void main(String[] args) { }, int main() { return 0; }, etc.

What is a selection data structure?

It describes the possible states of domain specific selections. Here are some selection data structure examples:

string domain - a character position selection consists of a string reference and an integer in the range [0, length]
XML domain - an element range selection consists of a parent element reference and two integers in the range [0, child count)
Java code domain - a method selection consists of a method reference

What is a selection?

It specifies some part of a document that is being edited. Here are some selection examples:

integer domain - a bit selection referring to the 1st bit of the integer 42
string domain - a character position selection referring to the 3rd position (between the 'l' characters) of the string "hello world"
graph domain - a subgraph selection referring to vertices A, B and C including the edges between them

What is an operation data structure?

It describes the possible states of domain specific operations. Here are some operation data structure examples:

integer domain - bit negation operation consists of a bit selection
string domain - delete character operation consists of a character selection
list domain - replace element range operation consists of an element range selection and a replacement list document

What is an operation?

It specifies a way of modifying a document using selections and other documents. Here are some operation examples:

integer domain - a negate bit operation of a bit referred by a bit selection.
graph domain - an add edge operation between two vertices referred by two vertex selections.

What is a projection?

Simply speaking it's a transformation from one domain to another. A projection transforms the domain specific data structures of documents, selections and operations.

The projection is used by both the printer and the reader in two distinct algorithm. These algorithms are purely functional, they don't have any side effect on the document, or the input, or the output devices. This aspect is very important, because it makes them very good candidates for various optimizations such as constraint based change propagation and parallel execution.

There are two different kind of projections: primitive and compound projections. Primitive projections always directly transform between two problem domains. Here are a few examples:

styled string to graphics - transforms a styled string to graphics document directly related to API calls
tree to string - transforms a tree of nodes to a flat string using delimiters and separators
XML to tree - transforms an XML document to a tree with attributes and elements being nodes

On the other hand, compound projections require other projections to combine them (primitive or compound).

sequential - applies projections one after the other
parallel - applies projections on the same input resulting in a list of outputs
recursive - applies one projection recursively
type dispatching - applies the selected projection based on the type of the input
predicate dispatching - applies the selected projection based on an arbitrary predicate on the input
nested - applies projections one after the other nestedly

Prototype

The projectional editor prototype is implemented in Common Lisp. If you want to take a look at it you can browse the source online. If you want to actually get the source, then you will need the git version control system.

Demonstration

The following video demonstrates the projectional editor for several different problem domains. The currently supported domains are:

string - a sequence of characters similar to the primitive types found in programming languages
styled text - a sequence of characters with colors and fonts organized into a list of paragraphs
graphics - point, line, rectangle, circle, text, image similar to what can be found in the graphics APIs of programming languages
list - a one dimensional array of arbitrary objects
table - a two dimensional array of arbitrary objects
XML - element, attribute, text, etc.
JSON - null, boolean, number, string, array, object, etc.
Java code - class, method, argument, variable, for statement, function call, etc.
Common Lisp code - function definition, function call, argument, lexical binding, etc.
and combinations of the above

The currently supported projections are:

string to styled string
styled string to string
styled string to graphics
tree of styled string to flat styled string
table of styled string to flat styled string
walked lisp form to lisp form
lisp form to tree of styled string
XML to tree of styled string
JSON to tree of styled string
Java code to tree of styled string
combining multiple projections sequentially
nesting multiple projections

This video demonstrates how the editor keeps track of the selection while navigating in the document. The domain specific selection is represented as a Common Lisp code fragment. If the selection is applied to the document, then it returns the actual thing (or a representation in cases such as character position) that is selected. The document is projected to the graphics domain in a domain specific way.

Did I really say Immutable Code? Is it not rather Immutable Object?

Yes and no. Yes, because Immutable Code can be obviously stored in Immutable Objects. No, because Immutable Code is about code and not data. Oh wait, but we all know that code is data, don't we? So what is this thing and what is it good for?

Well, I must admit that this is a purely hypothetical, probably not new and somewhat vague idea. For demonstration purposes I will assume a programming language where a program is a list of definitions. What a definition really is, does not matter for the moment. It may be a function, a class, a global variable, anything that has a unique name and a body. Obviously definitions refer to each other using their names within their bodies.

What is the problem?

Any serious programmer knows that code changes over time. This is one of the most difficult things to deal with when you do real world programming. Programs are organized into files, directories, modules, libraries, executables, manifests, plugins, projects, etc. Programmers assign version numbers to these parts and specify which ones are compatible. This process is complicated, time consuming and error prone. Some operating systems provide tools to solve this problem at the library level. Unfortunately that is quite coarse-grained and very tedious to use.

Another difficulty is that programs became so complex that they can only be implemented through the collaboration of many people. Programmers work on different parts of the programs and these parts change incompatibly over time. Programmers also have different preferences in terms of versions, and they don't like backwards incompatible changes.

These are the points where Immutable Code could provide a completely different (and arguably better) solution. Unfortunately nothing comes for free and in this case the price is pretty high.

How much is it?

Code cannot be easily stored in plain text file formats anymore. It becomes difficult to find a format that remains human readable and editable. This mean that most of the current programming tools become unusable without dramatic changes. Editing code requires new tools that natively support the idea of Immutable Code. Even compilers may need to be updated, because source code might become so much different.

How does it work?

Obviously Immutable Code doesn't mean that you can't change code, that would be useless. It also doesn't mean that any time you press a key you are effectively creating a new definition.

Immutable Code does mean that any definition that is made available to others can not be changed afterwards. Published code is carved into stone.

The above statement has a number of very important implications. First of all, a definition now not only has a name and a body but it has a version too. Definitions no longer refer to each other using only their names. Every reference to a definition in all definition bodies must also specify the exact version of the target definition.

Notice that I did not say anything about what a version really is. In the simplest case it may be an integer number incremented with each public change. On the other hand a definition version may also specify the publisher, the date of publishing, the branch name, various publisher specified tags, etc.

What are the bad news?

Anytime you change a definition publicly and want that to be used in your own program, you must also change all of your definitions up to the program's entry point. Moreover, other people also need to update all of their definitions up to their programs' entry points if they want to use your new version.

In such an environment migrating a program to new versions of some used definitions becomes changing the references in your definitions from the old versions to the new, desired versions. This process may be done manually for every single reference, but that is really not what one would do. Luckily updating references could be automated by writing and publishing migration programs, so it could be more or less transparent.

This also means that the amount of publicly available code grows pretty fast, because all definition versions are available until somebody or some process makes them inaccessible. The well-known term garbage collection gets a new meaning with Immutable Code.

How does code look like?

How does code look like on the screen? It depends on who is looking at it and why. Code editors must be able to support hiding version information completely, so that we could get what we have today. Besides they must also be able to present code with version information included. This is the main reason why a pure text editor cannot be used effectively.

How does code look like on the disk? Well, I think it doesn't matter much. It could be text or it could be some binary format. Although storing code in text format probably makes the tool developers' life more complicated. Text files would need more complicated parsing/unparsing while keeping little benefit from the text editors' world.

What are the good news?

Programmers of this hypothetical programming language (and environment) can be sure that their programs do not change over time unexpectedly. For example, updating definitions required by a program definitely doesn't affect another program, even if it is using the very same definitions.

Moreover, assuming a distributed, persistent programming environment (just like a database) where programmers collaborate using Immutable Code, one can be sure that a program does not change be it run anywhere and anytime.

Notice that collaboration and version control is done on the definition level which may actually be as small as a single function definition. It becomes possible to use multiple incompatible versions of the same definition in different places of the program. Remember, programs are implemented by several collaborating people with different knowledge and preferences (in terms of definitions and their versions). This is a really fine-grained and effective collaboration between programmers be it global or local to a company.

Backward compatibility becomes pretty much different, because the old versions are always there and cannot be changed. Deploying a program from the test environment to the production environment becomes safe. As long as the main entry point has the same version, the program behaves the same. Reverting a production system to an earlier version is left as an exercise to the reader.

So what?

You may have noticed that some existing programming languages (e.g. the ones from the functional programming family) are more easily adaptable to this idea than others. I did not mention any of them on purpose, because that is not important to me. I know that this idea is far from even being roughly specified, but I think my point should be clear now.