4.0 Program Unit Organization
A program unit is any procedure, function, coroutine, iterator, subroutine,
subprogram, routine, or other term that describes a section of code that
abstracts a set of common operations on the computer. This text will simply use
the term procedure or routine to describe these concepts.
Routines are closely related to modules, since they tend to be the major
component of a module (along with data, constants, and types). Hence, many of
the attributes that apply to a module also apply to routines. The following
paragraphs, at the risk of being redundant, repeat the earlier definitions so
you don't have to flip back to the previous sections.
4.1 Routine Cohesion
Routines exhibit the following kinds of cohesion (listed from good to
bad):
- Functional or logical cohesion exists if the routine accomplishes exactly
one (simple) task.
- Sequential or pipelined cohesion exists when a routine does several
sequential operations that must be performed in a certain order with the data
from one operation being fed to the next in a "filter-like" fashion.
- Global or communicational cohesion exists when a routine performs a set of
operations that make use of a common set of data, but are otherwise unrelated.
- Temporal cohesion exists when a routine performs a set of operations that
need to be done at the same time (though not necessarily in the same order). A
typical initialization routine is an example of such code.
- Procedural cohesion exists when a routine performs a sequence of
operations in a specific order, but the only thing that binds them together is
the order in which they must be done. Unlike sequential cohesion, the
operations do not share data.
- State cohesion occurs when several different (unrelated) operations appear
in the same routine and a state variable (e.g., a parameter) selects the
operation to execute. Typically such routines contain a case (switch) or
if..elseif..elseif... statement.
- No cohesion exists if the operations in a routine have no apparent
relationship with one another.
The first three forms of cohesion above are generally acceptable in a
program. The fourth (temporal) is probably okay, but you should rarely use it.
The last three forms should almost never appear in a program. For some
reasonable examples of routine cohesion, you should consult "Code Complete".
- Guideline:
- All routines should exhibit good cohesiveness. Functional cohesiveness is
best, followed by sequential and global cohesiveness. Temporal cohesiveness is
okay on occasion. You should avoid the other forms.
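The difference between the best and one of the worst forms above can be sketched in a few lines. The sketch uses C rather than assembly purely for brevity, and every name in it (sum_of, largest_of, do_array_op) is hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Functional cohesion: each routine accomplishes exactly one task. */
int sum_of(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}

int largest_of(const int *values, size_t count)
{
    assert(values != NULL && count > 0);
    int largest = values[0];
    for (size_t i = 1; i < count; ++i)
        if (values[i] > largest)
            largest = values[i];
    return largest;
}

/* State cohesion (avoid): a selector parameter chooses between
   operations that only happen to live in the same routine. */
enum op { OP_SUM, OP_LARGEST };

int do_array_op(enum op which, const int *values, size_t count)
{
    switch (which) {
    case OP_SUM:     return sum_of(values, count);
    case OP_LARGEST: return largest_of(values, count);
    }
    return 0; /* unknown selector */
}
```

A caller of do_array_op must know the selector values as well as the data, whereas a caller of sum_of or largest_of needs to know only the one task that routine performs.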
4.1.1 Routine Coupling
Coupling refers to the way that two routines communicate with one another.
There are several criteria that define the level of coupling between two
routines:
- Cardinality- the number of objects communicated between two routines. The
fewer objects the better (i.e., fewer parameters).
- Intimacy- how "private" is the communication? Parameter lists are the most
private form; private data fields in a class or object are the next level;
public data fields in a class or object come after that; global variables are
even less intimate; and passing data in a file or database is the least
intimate connection. Well-written routines exhibit a high degree of intimacy.
- Visibility- this is somewhat related to intimacy above. It refers to how
visible the data that you pass between two routines is to the entire system.
For example, passing data in a parameter list is direct and very visible (you
always see the data the caller is passing in the call to the routine); passing
data in global variables makes the transfer less visible (you could have set
up the global variable long before the call to the routine). Another example
is passing simple (scalar) variables rather than loading up a bunch of values
into a structure/record and passing that structure/record to the callee.
- Flexibility- This refers to how easy it is to make the connection between
two routines that may not have been originally intended to call one another.
For example, suppose you pass a structure containing three fields into a
function. If you want to call that function but you only have three data
objects, not the structure, you would have to create a dummy structure, copy
the three values into the fields of that structure, and then call the routine.
On the other hand, had you simply passed the three values as separate
parameters, you could still pass in structures (by specifying each field) as
well as call the routine with separate values.
A function is loosely coupled if it exhibits low cardinality, high intimacy,
high visibility, and high flexibility. Often, these features are in conflict
with one another (e.g., increasing the flexibility by breaking out the fields
from a structure [a good thing] will also increase the cardinality [a bad
thing]). It is the traditional goal of any engineer to choose the appropriate
compromises for each individual circumstance; therefore, you will need to
carefully balance each of the four attributes above.
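The trade-off between flexibility and cardinality can be sketched as follows. This is a minimal illustration in C (used here only for brevity; the same trade-off applies to parameters passed in registers or on the stack in assembly). The struct date type and both routine names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */
struct date { int year; int month; int day; };

/* Low cardinality (one parameter) but less flexible: a caller that
   holds three loose values must first build a struct date. */
int date_key_from_struct(const struct date *d)
{
    assert(d != NULL);
    return d->year * 10000 + d->month * 100 + d->day;
}

/* More flexible but higher cardinality (three parameters): callers can
   pass loose values or the individual fields of a structure. */
int date_key(int year, int month, int day)
{
    return year * 10000 + month * 100 + day;
}
```

A caller that already has a struct date can still use the second form by writing date_key(d.year, d.month, d.day), while a caller with only three integers cannot use the first form without creating a temporary structure.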
A program that uses loose coupling generally contains fewer errors per KLOC
(thousands of lines of code). Furthermore, routines that exhibit loose coupling
are easier to reuse (both in the current and future projects). For more
information on coupling, see the appropriate chapter in "Code Complete".
- Guideline:
- Coupling between routines in source code should be loose.
4.1.2 Routine Size
Sometime in the 1960s, someone decided that programmers could only look at
one page in a listing at a time; therefore, routines should be a maximum of one
page long (66 lines, at the time). In the 1970s, when interactive computing
became popular, this was adjusted to 24 lines -- the size of a terminal screen.
In fact, there is very little empirical evidence to suggest that small routine
size is a good attribute. Several studies on code containing artificial
constraints on routine size indicate just the opposite -- shorter routines often
contain more bugs per KLOC[10].
A routine that exhibits functional cohesiveness is the right size, almost
regardless of the number of lines of code it contains. You shouldn't
artificially break up a routine into two or more subroutines (e.g., sub_partI
and sub_partII) just because you feel a routine is getting to be too long.
First, verify that your routine exhibits strong cohesion and loose coupling. If
this is the case, the routine is not too long. Do keep in mind, however, that a
long routine is probably a good indication that it is performing several actions
and, therefore, does not exhibit strong cohesion.
Of course, you can take this too far. Most studies on the subject indicate
that routines in excess of 150-200 lines of code tend to contain more bugs and
are more costly to fix than shorter routines. Note, by the way, that you do not
count blank lines or lines containing only comments when counting the lines of
code in a program.
Also note that most studies involving routine size deal with HLLs. A
comparable assembly language routine will contain more lines of code than the
corresponding HLL routine. Therefore, you can expect your routines in assembly
language to be a little longer.
- Guideline:
- Do not let artificial constraints affect the size of your routines. If a
routine exceeds about 200-250 lines of code, make sure the routine exhibits
functional or sequential cohesion. Also look to see if there aren't some
generic subsequences in your code that you can turn into stand alone routines.
- Rule:
- Never shorten a routine by dividing it into n parts that you would always
call one after another in the same sequence.
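The distinction between extracting a generic subsequence and splitting a routine arbitrarily can be sketched as follows. The sketch is in C for brevity, and the names clamp_all and clamped_sum are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A generic subsequence worth extracting: it performs one complete
   task and is useful to callers other than the routine it came from. */
void clamp_all(int *values, size_t count, int low, int high)
{
    assert(low <= high);
    for (size_t i = 0; i < count; ++i) {
        if (values[i] < low)  values[i] = low;
        if (values[i] > high) values[i] = high;
    }
}

/* The original routine calls the extracted helper where it needs it. */
int clamped_sum(int *values, size_t count, int low, int high)
{
    clamp_all(values, count, low, high);
    int total = 0;
    for (size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}

/* By contrast, splitting a routine into process_partI() followed by
   process_partII(), where nobody would ever call either half on its
   own, hides the routine's length without improving its cohesion. */
```

The test for a worthwhile extraction is whether the new routine makes sense when called by itself; clamp_all does, while the two halves of an arbitrary split do not.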
4.2 Placement of the Main Procedure and Data
As noted earlier, you should name the main procedure main and place it in the
source file bearing the same name as the executable file. If this module is
rather long, it can still be difficult to locate the main program. A good
solution is to always place the main procedure at the same point in the source
file. By convention (meaning everyone expects it this way), most programmers
make their main program the first or last procedure in an assembly language
program. Either position is fine. Putting the main program anywhere else makes
it hard to find.
- Rule:
- Always make the main procedure the first or last procedure in a source
file.
MASM, because it is a multiphase assembler, does not require that you define
a symbol before you use it. This is necessary because many instructions (like
JMP) need to refer to symbols found later in the program. In a similar manner,
MASM doesn't really care where you define your data - before or after its
use[11]. However, most programmers "grew up" with high level languages that
require the definition of a symbol before its first use. As
a result, they expect to be able to find a variable declaration by looking
backwards in the source file. Since everyone expects this, it is a good
tradition to continue in an assembly language program.
- Rule:
- You should declare all variables, constants, and macros prior to their use
in an assembly language program.
- Rule:
- You should define all static variables (those you declare in a segment) at
the beginning of the source module.