Portrait of Yegge

Let me start by saying that I very often find Steve Yegge’s points insightful. His latest post, about commenting code and type information, contains a few things I take issue with, though. I should point out that my area of research is software verification, so some of what he brings up touches directly on what I do.

A large part of his argument concerns metadata and its usefulness in programming. I wouldn’t say he is totally against metadata, but his objections to it are a major part of his argument. That is a major oversimplification, but it covers a lot of what he said.

Metadata is just a great topic. If I ever get cornered in a dangerous situation, I think I could get out of it just by putting my opponent to sleep talking about metadata, public policy, or XMPP. It’s an interesting topic (though I’m nowhere near as into it as the people who wrote the Wikipedia article I linked to), mainly because we interact with metadata all the time without thinking about it.

Facebook’s photo application is an example of making metadata useful for people (and I think many people would argue that half of Facebook itself is really just metadata). Many current photo applications let you add keywords to photos, but Facebook lets you tag the actual people in a photo. Both are metadata, but at different levels. You can also add comments to photos, which is metadata too, but noticeably different. There are plenty of examples like this, and I doubt Facebook is unique in this respect, but it’s the one that comes to mind. The point is that metadata is there, and few people think of it as “metadata”. Once a photo is tagged, that information can be searched and used. But remember: the photo is the data; the tagging is the metadata.

Steve breaks metadata for coding down into three main situations. I’ll leave the third one (annotations) alone, simply because I don’t have enough experience with it to say one thing or the other. One form of metadata is commenting, and this is the part where I agree with him most. Commenting is a way of annotating code for future programmers. While I’ve never seen “n00b”-commenting as bad as the examples he gives, there is certainly a point of diminishing returns. It’s important to remember the purpose comments serve. A programmer looking at the code can see its nitty-gritty details, because they’re right there in front of them. Comments let us take a step back and understand the big picture.

Here’s a common example of what I’m talking about (in the example, a and b are ints):

a = a - b;
b = b + a;
a = b - a;

This is only three lines of code, yet it takes a bit of thought to see what it does. We can augment it to help ourselves and future readers:

// Swaps a and b
a = a - b;
b = b + a;
a = b - a;

With one line of comment, we’ve made it much more understandable. Some people might add a comment to each line so readers can follow the math. I think that’s fine, but I’d personally rather leave that to the person reading the code. I don’t think anything I’ve said here really contradicts Steve Yegge’s main point about commenting, but I could be wrong. Others might suggest writing swap as a separate function, in a sense abstracting away how the swap itself is done. You certainly won’t hear a complaint about that from me.
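A minimal sketch of the separate-function approach in C, assuming `int` operands; the name `swap_ints` is hypothetical. This version uses a temporary rather than the arithmetic trick, so the name states what happens and the body has no arithmetic to overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: exchanges the values that a and b point to.
   A temporary holds one of them, so no arithmetic (and no overflow)
   is involved, and swap_ints(&x, &x) simply leaves x unchanged. */
static void swap_ints(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Callers then write `swap_ints(&a, &b);`, and the function name carries the meaning that the “Swaps a and b” comment carried before.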

This leads us to the second form of metadata Steve mentions: type information. He doesn’t seem to be a big fan of it, at least in the static sense. Incidentally, the comments on his post link to a good summary of static and dynamic typing. [Note: I happen to find dynamically typed languages fascinating in their own right; there are a lot of interesting things that can be done with them. But in the end I’m trying to verify software, and I’ll be able to verify something written in a static language long before I can verify something written in a dynamic one.]

I actually included type information in my example: I said that a and b are ints. But suppose I hadn’t, and you assumed instead that we were using some objects with + and - defined. Here’s the question I spend my time on: is the code above correct?

That depends on the language. Let’s start with a language like C or C++, where ints are bounded, say between min and max, with min < 0 < max (for our purposes). Then this code is definitely not correct. What if a is max and b is -1? The first line computes max - (-1) = max + 1, which overflows.
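To make the failure concrete, here is a C sketch that refuses to run the three-line swap when it would overflow, using `INT_MIN` and `INT_MAX` from `<limits.h>` as min and max. The function names are hypothetical. Only the first step needs checking: if a - b fits in an int, the next two lines compute exactly the original a and b, which are already ints.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True if x - y cannot be represented as an int. */
static bool sub_overflows(int x, int y)
{
    return (y < 0 && x > INT_MAX + y) || (y > 0 && x < INT_MIN + y);
}

/* Hypothetical checked version of the three-line arithmetic swap.
   Returns false, leaving *a and *b untouched, if the first step would
   overflow. Like the original, it still assumes a and b are distinct
   variables; if they alias, the value is zeroed. */
static bool arith_swap_checked(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    if (sub_overflows(*a, *b))
        return false;
    *a = *a - *b;   /* a = a - b; */
    *b = *b + *a;   /* b = b + a;  b now holds the old a */
    *a = *b - *a;   /* a = b - a;  a now holds the old b */
    return true;
}
```

With a = `INT_MAX` and b = -1, the call returns false before anything is modified; with a = 3 and b = 7 it swaps them.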

[Note: There are, of course, other ways this can break. Many more implementations (usually ones based on bitwise XOR) break when a and b are the same variable, for example when swapping an array element with itself. There is, of course, a perfectly good version of swap; I’ll leave that one up to the reader.]
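The XOR variant from the note can be sketched in C as below; `xor_swap` is a hypothetical name, and unsigned operands are used so the bitwise operations are plainly well defined. Equal values in two different variables swap fine. What breaks it is aliasing: when both pointers refer to the same variable, the first step XORs the value with itself and zeroes it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical XOR-based swap: no temporary needed, but wrong when a and b
   point to the same object, because *a ^= *b then zeroes that object. */
static void xor_swap(unsigned *a, unsigned *b)
{
    assert(a != NULL && b != NULL);
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

A common guard is to return early when `a == b`.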

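In C++, signed integer overflow is actually undefined behavior, so the language makes no promises at all. In practice it typically won’t crash: on most platforms the first line simply wraps a around to a value that is related to the original values, but useless nonetheless. Some other static languages throw an exception instead. My feelings on exceptions are another matter, so I’ll leave it at that.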

What about a dynamic language like Smalltalk? Smalltalk will just change types on you. Consider this common example:

1 class maxVal + 1.

So what the heck just happened there? We asked 1 for its class, and it answered SmallInteger. Okay, good so far. Then we asked the SmallInteger class for its maximum value; in my copy of Squeak, I get 1073741823 back. Finally, we added one to that number. This would seemingly overflow, but Smalltalk doesn’t work that way: it returns an object of class LargePositiveInteger with the value 1073741824 (i.e., one more than what we were told was the largest value).
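Plain C offers nothing like this, but the idea can be imitated by hand. The sketch below is a hypothetical illustration, not how Squeak implements it: `Number` and `add_promoting` are made-up names, and it assumes `long long` has enough extra range that the sum of any two ints fits in it. It adds in the wider type and reports a “small” result only when the value fits back into an `int`.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: long long holds at least twice int's range, so adding two
   ints in long long can never overflow. */
#if LLONG_MAX / 2 < INT_MAX
#error "this sketch assumes long long is much wider than int"
#endif

/* Hypothetical tagged result: "small" when the value fits in an int,
   "large" otherwise, loosely mirroring SmallInteger vs. LargePositiveInteger. */
typedef struct {
    bool is_small;
    int small_val;        /* meaningful only when is_small is true */
    long long large_val;  /* meaningful only when is_small is false */
} Number;

/* Adds two ints in long long, then picks the narrowest representation
   that holds the result instead of overflowing. */
static Number add_promoting(int x, int y)
{
    long long wide = (long long)x + (long long)y;
    Number n = { false, 0, 0 };
    if (wide >= INT_MIN && wide <= INT_MAX) {
        n.is_small = true;
        n.small_val = (int)wide;
    } else {
        n.large_val = wide;
    }
    return n;
}
```

With a 32-bit int, `add_promoting(INT_MAX, 1)` comes back as a large value, 2147483648, much as Squeak’s 1073741823 + 1 comes back as a LargePositiveInteger.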

Smalltalk can’t always find another, more appropriate type. Sometimes an exception is raised instead, simply because you did something wrong (which brings us back to my feelings about exceptions, a topic for another time).

But let’s go back to Steve Yegge’s original point about type information being metadata, and more of a nuisance than anything. Well, you know what? That’s true. But it’s a good nuisance, in my book. In my field, the whole point is to statically prove that your code does what you say it does. I’m not convinced we couldn’t do this in a dynamic language, but in the end you might well need to include the same information we already include in a static language, for the same purpose.

It’s true that languages like C++ let us abuse type systems to the point where they are, in practice, useless. But that is a fault of C++, not of the idea of typing itself. C++ turns out to be a bad example of just about anything. But that’s what it was designed for, so this shouldn’t be surprising. When I use C++ in a disciplined way, or better yet write in a language that doesn’t allow me to do anything else, static typing saves me every day at the compilation stage alone. I’m not a great programmer, but I’m certainly not a bad one, either.

In the end, I think Steve’s issue is not with metadata itself, but with how it is used. Most, if not all, current static programming languages have features that work against their own type systems. They also don’t exploit the full potential a static type system can offer (showing correctness). Part of this is because the field is young compared to others (we’ve had higher-level programming languages for what, 50 years now?). Part of it is because people like me tend to work slowly. But please, don’t knock the metadata.
