Saturday, February 16, 2008

An Arc-Tangent

There has been a lot of talk going on about Arc, Paul Graham's LISP derivative. I've been watching this discussion with some interest, not because I'm a LISPer, but because it is bringing up some interesting questions about programming languages and software development.

First, I want to be clear that this is not a critique of Arc. As I said, I'm not a LISPer. I would not be qualified to say much about the language itself. I have a great amount of respect for what Paul is trying to do: he is trying to make an appreciable difference in how software is built. Whether I agree with him or not does not matter in that context. He is putting his sweat into his beliefs. In that context, even if I totally disagreed with everything he said and did, he has still done it.

What is driving this post is the question of what makes a language "best." For instance, the primary tenet driving Paul's development is code size:
...making programs short is what high level languages are for. It may not be 100% accurate to say the power of a programming language is in inverse proportion to the length of programs written in it, but it's damned close.

I wholeheartedly agree that brevity is an important aspect of high level programming languages. I've switched from Java to Python in the majority of my work for exactly that reason. However, I do not think that the primary aspect of importance is brevity. In fact, I would go so far as to say that there is no primary driver of a good language.

So what would my ideal language look like? Like I said, brevity is important, but just as important is self-documentation. With those two items, you have a very good start to a language. However, those are not the only important aspects, in my mind. Added to brevity and self-documentation are clear flow control, a strong set of built-in libraries, and an emphasis on simplicity. In the end, what I'm looking for is an efficient language.

There is no doubt there is a relationship between code size, program grok-ability, and development speed. Code that is too long requires too many page faults to understand. Development speed is similarly related, where longer code has more constructs that need to be developed, tested, and debugged. Quite literally, the more you can do with a single line of code, the less opportunity you have to introduce a problem.

But can that go too far? For example, can you say that code that is too short slows down development? Can you put too much on a single line of code? At some point, I think the answer is 'yes'. At some point, code reaches a sufficiently dense size that it requires multiple mental translation phases to expand to a reasonable vocabulary, and therefore reasonable understanding. The litmus test there is "Do I think I need to comment this code to understand WHAT it is doing?" (as opposed to WHY it is doing it, which may be valid in any case). This is very hard to measure because the line of "too short" is going to vary significantly by developer and experience with a particular language, but I do firmly believe it is there. Furthermore, for code that somebody else will have to read (including yourself in the future), if you pass that point, you are going to pay a penalty in the future.
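To make that litmus test concrete, here is a small Python sketch. The function names and the word-counting task are my own invention for illustration, not anything taken from Arc or from Paul's essays. Both functions return the same result; the first packs everything onto one line, while the second spends a few more lines so that the WHAT is readable without a comment.

```python
from collections import Counter


def top_words_dense(text, n):
    # Dense version: correct, but the reader has to unpack it mentally.
    return sorted(Counter(text.lower().split()).items(), key=lambda p: (-p[1], p[0]))[:n]


def top_words_clear(text, n):
    # Same logic, one step per line.
    words = text.lower().split()
    counts = Counter(words)
    # Highest count first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]
```

Where the line falls between the two is exactly the judgment call described above: a seasoned Python developer may read the dense version without trouble, while a newcomer will probably want the longer one.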

So, regarding brevity, the ideal code length is the point where the grok-ability and development speed curves, taken together, have the most area under them. Java, in my opinion, is high on grok-ability but very low on development speed. Perl is low on grok-ability but high on development speed. Python has come in just right, for me. My short experience with Ruby and Groovy seems to put them in that category as well. Your mileage may vary.

Grok-ability brings up an important point, which is self-documentation. To what level does a language encourage self-documentation? If a language requires constant commenting or some other form of context switch to understand it, then it is probably not a highly efficient language. This, of course, is also impacted by one's knowledge of said language: the better you know the language, the easier it is to understand, and the more self-documenting it becomes. This is most important in the context of interacting with others' code.

An interesting challenge with self-documentation is understanding all of the ways code flows through the system. Branches, loops, goto's, breaks, labeled breaks, exceptions, come from's, alter, signals, continuations, function calls and undoubtedly others that I've never heard of, all increase the difficulty in understanding what the code does. Obviously branching and loops are necessary, but at some point, the complexity of it all might just overwhelm. As an example, a great many people consider exceptions to be evil. I'm not one of those, but a language ought to understand the implications of its flow control on the people who are both writing and reading the code.

If brevity were everything that mattered, APL would have a much larger mind-share than it does. Included above are some of the things I think are important, but they are not all of them. As you go through your process of choosing a language, make sure that you understand your needs. And, if Arc is the right one for you, happy trails to you.
