Firebird News

Monday, January 17, 2005

Firebird Coding Style Again

Jim started a new thread about coding style (flamewar is on)

I'd like to explain why I use the coding style that I do, and why I think other people should use it as well. Many people argue that coding style isn't important as long as it's consistent. I disagree. Consistency is important and costs little, but other factors are even more important.

I’d also like to establish a rule for this discussion. Firebird is our project. We should choose a coding style based on our needs, not on the rules of some other authority. Coding style is an important element of software engineering that can be discussed rationally using logic and analysis rather than reference to authority, whether that authority was your professor, the editor of K&R, or Linus Torvalds. If any of those people wants to explain his preferred style, he should do it himself. If he doesn't want to defend it, I want to hear his arguments, not his name.

In a project like Firebird, a piece of code is written once, modified maybe a dozen times, but it will be scanned thousands of times by hundreds of individuals. Like most large software projects, Firebird has little comprehensive and accurate internal documentation. The basic structure is described in various places, but the final source of information is the code itself.

Some structure can be deduced from the naming conventions used for modules and functions, but old names are very terse because global symbols were constrained to 8 characters when I wrote the original code. In line comments certainly help, and I'm very happy to note a significant increase in comments since my original "terse" implementation. But the bottom line is that the best information on the code is the code itself, and anything we can do to make the code more accessible benefits everyone.

People study code in two distinctly different modes. Most often, they peruse - scan - the code quickly to get the general picture or to locate a particular piece of functionality. Much less often, someone will analyze a piece of code closely to isolate a bug or to revise or extend the function. Just about anyone can understand a modest piece of reasonably well-written code by detailed analysis. Close analysis is time consuming however the code is structured. Not all that much can be done to accelerate it. Code perusal, on the other hand, can be accelerated by reasonable coding practices.

The key to code perusal is following the flow of control. In most cases knowing that a particular block of code is handling a specific condition is sufficient. The actual handling isn't important until after the big picture has been mastered.

The primary tool for presenting control flow is judiciously used white space -- indentation and blank lines. I don't know who invented the idea of code paragraphing, but I suspect it evolved with Algol-60. Earlier languages like Fortran and COBOL had a strong line orientation. Algol introduced the concept of arbitrarily nested statements that, left to the highly creative, rapidly became unreadable. To an Algol compiler, an Algol program is a stream of lexemes. To a human however, something else was necessary to expose the structure of a program to a casual observer (or even to the author a week or two later). The solution that evolved was paragraphing.

The essence of code paragraphing is childishly simple --

1) code conditionally executed is indented under the code that controls its execution.

2) white space separates blocks of conditional code from the flow around it.

Trading white space for clarity was a bargain even when the cost of white space was more punch cards.

I've outlined what I consider sound coding practices in the Vulcan Rules document at the top of the Vulcan CVS tree. I'd like to go over some of the rules and explain why I consider them important.

Rule: Comments should have the same indentation as the code they document, and should be preceded and followed by blank lines.

Explanation: The indentation convention should be obvious -- comments need to follow paragraphing rules for the same reason code does. When you are scanning for the end of a block, you should not have to read every comment and code line. The purpose for separating comments from code with blank lines is less obvious. Good comments are not restatements of code. Over the years you’ve been programming, you’ve read a lot of comments like this:

mov r3,r4 ; move r3 to r4

When you are trying to understand a large body of code, comments that simply restate the code stop being just useless and become really annoying. If you need to know the implementation of the algorithm, read the code. If you need to why the value in r3 needs to be put into r4, that information should be available from the comment. Useful comments describe the motivation for a block of code -- what it's trying to do, not how it's trying to do it. These comments are clearer when set apart from the code. Likewise, the code is clearer when set apart from the comments. For a reader, the code and comments serve different purposes, and each needs to stand by itself.

Rule: The bodies of compound statements should be indented beneath the code that controls it. This means that the following is preferred:

if (trace)
printf ("something happened\n");

The style most commonly taught in programming courses is:

if (trace) {
printf ("something happened\n");
}

To my mind, this violates both graphic clarity and language structure. The substatement of the "if" is a compound statement itself containing a single statement. The substatement is subordinate to the "if" and belongs underneath and indented. Putting part of the compound statement on the same line as the "if" suggests that the brace is part of the "if". It isn't.

A more serious problem is that even the most dedicated brace-manglers sometimes omit the brace, which requires the reader, who just wants to scan the code checking flow control, to stop and read the entire "if" line, scrolling horizontally if necessary, to check for a brace to find the end of the "if" statement. This is an unreasonable amount of work. [A small note: Working software engineers almost never teach software engineering. Computer science curricula are run by a self-perpetuating, inbred organization unsullied with practical experience.]

Another school thought would write this snippet:

if (trace)
{
printf ("something happened\n")'
}

Again, this violates the logic of paragraphing. The braces are subordinate to the "if" and should be indented; the next line at the same level as the "if" should be line common to the various control paths through the "if" block. I also don't like braces where they aren't necessary as it reduces the amount of code one can see at a glance. After 40 years of programming, I'm afraid by reaction to a simple statement enclosed in braces is "Huh? Did the guy miss school the day they covered statements?” The same applies to:

if (trace)
{
printf ("something happened\n");
}

Ok, this is clear, the indentation makes logical sense, but the braces add nothing. While the purpose of paragraphing to clarify the code, it is reasonable to assume the reader knows the language.

Indentation is generally important, but absolutely critical for try/catch blocks. A careful reader is going to be constantly scrolling down to see how a thrown error is going to be handled. In many too many places in Firebird this involves counting braces on one's fingers and hoping for a comment on the critical brace indicating it's the end of the "try" block. In structured code, these comments are completely unnecessary.

The practice of putting "case" branches at the same level of indentation as "switch" is terribly confusing, particularly when used with nested switches.

Someone may argue that consistent indenting under if, for, while, switch, try, and catch moves the code too far to the right. A large block of white space on the left is the software gods’ way of saying that your code is too complicated. Rethink it. Consider a subroutine.

During the Borland years, the average length of a function went from maybe a dozen lines to over a hundred. In almost all cases, this is the result of bad design and bad organization. Simple code almost can't fail. Complex code with many levels of conditionals is almost always buggy.

Blank lines are an under-used resource. Vulcan rules put blank lines before and after all if statements, and most loops, switch, try, and catch statements. Research has shown that most bugs occur when two code branches converge with inconsistent state. The danger point is not the top of a conditional statement, but the bottom.

While doing Vulcan, I tried to change as little as possible. Time after time, however, I found myself re-paragraphing code so I could understand what it was doing. After a while, it became a habit -- if a block of code wasn't crystal clear, I cleaned it up. In many cases, I was reverting code to the style in which it was written. I will concede that the phrase "grrr -- code vandals --- grrr" passed though my mind during the process, particularly when some well-meaning soul rendered a piece of code unreadable.

Two things are clear with regards to coding style. First, there is great disagreement about what the proper style should be. And second, there will be great agreement that I'm completely wrong on all counts. So I'm going to ask people on the project to do two things. One is to take the time to understand my arguments, even you don't agree with them. The other is that you explain why an alternative is better, not who likes the alternative.

--

Jim Starkey

No comments: