Assembly Language Style Guidelines


Assembly Language Style Guidelines

You can read and download this Assembly Language Style Guidelines coding standard and code convention for assembly and asm from here for free at your own risk. All trademarks, registered trademarks, product names and company names or logos mentioned herein are the property of their respective owners.

Assembly Language Style Guidelines

Style Guidelines for Assembly Language Programmers
1.0 - Introduction
1.1 - ADDHEX.ASM
1.2 - Graphics Example
1.3 - S.COM Example
1.4 - Intended Audience
1.5 - Readability Metrics
1.6 - How to Achieve Readability
1.7 - How This Document is Organized
1.8 - Guidelines, Rules, Enforced Rules, and Exceptions
1.9 - Source Language Concerns
2.0 - Program Organization
2.1 - Library Functions
2.2 - Common Object Modules
2.3 - Local Modules
2.4 - Program Make Files
3.0 - Module Organization
3.1 - Module Attributes
3.1.1 - Module Cohesion
3.1.2 - Module Coupling
3.1.3 - Physical Organization of Modules
3.1.4 - Module Interface
4.0 - Program Unit Organization
4.1 - Routine Cohesion
4.1.1 - Routine Coupling
4.1.2 - Routine Size
4.2 - Placement of the Main Procedure and Data
5.0 - Statement Organization
6.0 - Comments
6.1 - What is a Bad Comment?
6.2 - What is a Good Comment?
6.3 - Endline vs. Standalone Comments
6.4 - Unfinished Code
6.5 - Cross References in Code to Other Documents
7.0 - Names, Instructions, Operators, and Operands
7.1 - Names
7.1.1 - Naming Conventions
7.1.2 - Alphabetic Case Considerations
7.1.3 - Abbreviations
7.1.4 - The Position of Components Within an Identifier
7.1.5 - Names to Avoid
7.2 - Instructions, Directives, and Pseudo-Opcodes
7.2.1 - Choosing the Best Instruction Sequence
7.2.2 - Control Structures
7.2.3 - Instruction Synonyms
8.0 - Data Types
8.1 - Defining New Data Types with TYPEDEF
8.2 - Creating Array Types
8.3 - Declaring Structures in Assembly Language
8.4 - Data Types and the UCR Standard Library

Style Guidelines for Assembly Language Programmers

1.0 Introduction

Most people consider assembly language programs difficult to read. While there are a multitude of reasons why people feel this way, the primary reason is that assembly language does not make it easy for programmers to write readable programs. This doesn't mean it's impossible to write readable programs, only that it takes an extra effort on the part of an assembly language programmer to produce readable code.

To demonstrate some common problems with assembly language programs, consider the following programs or program segments. These are actual programs written in assembly language taken from the internet. Each example demonstrates a separate problem. (By the way, the choice of these examples is not intended to embarass the original authors. These programs are typical of assembly language source code found on the Internet.)

1.1 ADDHEX.ASM



%TITLE "Sums TWO hex values"

        IDEAL
        DOSSEG
        MODEL   small
        STACK   256

        DATASEG

exitCode        db      0
prompt1         db      'Enter value 1: ', 0
prompt2         db      'Enter value 2: ', 0
string          db      20 DUP (?)

        CODESEG

        EXTRN   StrLength:proc
        EXTRN   StrWrite:proc, StrRead:proc, NewLine:proc
        EXTRN   AscToBin:proc, BinToAscHex:proc

Start:
        mov     ax,@data
        mov     ds,ax
        mov     es,ax
        mov     di, offset prompt1
        call    GetValue
        push    ax
        mov     di, offset prompt2
        call    GetValue
        pop     bx
        add     ax,bx
        mov     cx,4
        mov     di, offset string
        call    BinToAscHex
        call    StrWrite
Exit:
        mov     ah,04Ch
        mov     al,[exitCode]
        int     21h

PROC    GetValue
        call    StrWrite
        mov     di, offset string
        mov     cl,4
        call    StrRead
        call    NewLine
        call    StrLength
        mov     bx,cx
        mov     [word bx + di], 'h'
        call    AscToBin
        ret
ENDP    GetValue

        END     Start

Well, the biggest problem with this program should be fairly obvious - it has absolutely no comments other than the title of the program. Another problem is the fact that strings that prompt the user appear in one part of the program and the calls that print those strings appear in another. While this is typical assembly language programming, it still makes the program harder to read. Another, relatively minor, problem is that it uses TASM's "less-than" IDEAL syntax[1].

This program also uses the MASM/TASM "simplified" segment directives. How typically Microsoft to name a feature that adds complexity to a product "simplified." It turns out that programs that use the standard segmentation directives will be easier to read[2].

Before moving one, it is worthwhile to point out two good features about this program (with respect to readability). First, the programmer chose a reasonable set of names for the procedures and variables this program uses (I'll assume the author of this code segment is also the author of the library routines it calls). Another positive aspect to this program is that the mnemonic and operand fields are nicely aligned.

Okay, after complaining about how hard this code is to read, how about a more readable version? The following program is, arguably, more readable than the version above. Arguably, because this version uses the UCR Standard Library v2.0 and it assumes that the reader is familiar with features of that particular library.



;**************************************************
;
; AddHex-
;
; This simple program reads two integer values from
; the user, computes their sum, and prints the
; result to the display.
;
; This example uses the "UCR Standard Library for
; 80x86 Assembly Language Programmers v2.0"
;
; Randall Hyde
; 12/13/96

                title           AddHex
                .xlist
                include         ucrlib.a
                includelib      ucrlib.lib
                .list



cseg            segment para public 'code'
                assume  cs:cseg

; GetInt-
;
; This function reads an integer value from the keyboard and
; returns that value in the AX register.
;
; This routine traps illegal values (either too large or
; incorrect digits) and makes the user re-enter the value.

GetInt          textequ <call GetInt_p>
GetInt_p        proc
                push    dx              ;DX hold error code.

GetIntLoop:     mov     dx, false       ;Assume no error.
                try                     ;Trap any errors.

                FlushGetc               ;Force input from a new line.
                geti                    ;Read the integer.

                except  $Conversion     ;Trap if bad characters.
                print   "Illegal numeric conversion, please
re-enter", nl
                mov     dx, true

                except  $Overflow       ;Trap if # too large.
                print   "Value out of range, please re-enter.",nl
                mov     dx, true


                endtry
                cmp     dx, true
                je      GetIntLoop
                pop     dx
                ret
GetInt_p        endp



Main            proc

                InitExcept

                print   'Enter value 1: '
                GetInt
                mov     bx, ax

                print   'Enter value 2: '
                GetInt
                print   cr, lf, 'The sum of the two values is '
                add     ax, bx
                puti
                putcr

Quit:           CleanUpEx
                ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      256 dup (?)
sseg            ends


zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

It is well worth pointing out that this code does quite a bit more than the original AddHex program. In particular, it validates the user's input; something the original program did not do. If one were to exactly simulate the original program, the program could be simplified to the following:



                print   nl, 'Enter value 1: '
                Geti
                mov     bx, ax

                print   nl, 'Enter value 2: '
                Geti
                add     ax, bx
                putcr
                puti
                putcr

In this example, the two sample solutions improved the readability of the program by adding comments, formatting the program a little bit better, and by using the high-level features of the UCR Standard Library to simplify the coding and keep output string literals with the statements that print them.

1.2 Graphics Example

The following program segment comes from a much larger program named "MODEX.ASM" on the net. It deals with setting up the color graphics display.



;===================================
;SET_POINT (Xpos%, Ypos%, ColorNum%)
;===================================
;
; Plots a single Pixel on the active display page
;
; ENTRY: Xpos     = X position to plot pixel at
;        Ypos     = Y position to plot pixel at
;        ColorNum = Color to plot pixel with
;
; EXIT:  No meaningful values returned
;

SP_STACK    STRUC
                DW  ?,? ; BP, DI
                DD  ?   ; Caller
    SETP_Color  DB  ?,? ; Color of Point to Plot
    SETP_Ypos   DW  ?   ; Y pos of Point to Plot
    SETP_Xpos   DW  ?   ; X pos of Point to Plot
SP_STACK    ENDS

        PUBLIC SET_POINT

SET_POINT   PROC    FAR

    PUSHx   BP, DI              ; Preserve Registers
    MOV     BP, SP              ; Set up Stack Frame

    LES     DI, d CURRENT_PAGE  ; Point to Active VGA Page

    MOV     AX, [BP].SETP_Ypos  ; Get Line # of Pixel
    MUL     SCREEN_WIDTH        ; Get Offset to Start of Line

    MOV     BX, [BP].SETP_Xpos  ; Get Xpos
    MOV     CX, BX              ; Copy to extract Plane # from
    SHR     BX, 2               ; X offset (Bytes) = Xpos/4
    ADD     BX, AX              ; Offset = Width*Ypos + Xpos/4

    MOV     AX, MAP_MASK_PLANE1 ; Map Mask & Plane Select Register
    AND     CL, PLANE_BITS      ; Get Plane Bits
    SHL     AH, CL              ; Get Plane Select Value
    OUT_16  SC_Index, AX        ; Select Plane

    MOV     AL,[BP].SETP_Color  ; Get Pixel Color
    MOV     ES:[DI+BX], AL      ; Draw Pixel

    POPx    DI, BP              ; Restore Saved Registers
    RET     6                   ; Exit and Clean up Stack

SET_POINT        ENDP

Unlike the previous example, this one has lots of comments. Indeed, the comments are not bad. However, this particular routine suffers from its own set of problems. First, most of the instructions, register names, and identifiers appear in upper case. Upper case characters are much harder to read than lower case letters. Considering the extra work involved in entering upper case letters into the computer, it's a real shame to see this type of mistake in a program[3]. Another big problem with this particular code segment is that the author didn't align the label field, the mnemonic field, and the operand field very well (it's not horrible, but it's bad enough to affect the readability of the program.

Here is an improved version of the program:



;===================================
;
;SetPoint (Xpos%, Ypos%, ColorNum%)
;
;
; Plots a single Pixel on the active display page
;
; ENTRY: Xpos     = X position to plot pixel at
;        Ypos     = Y position to plot pixel at
;        ColorNum = Color to plot pixel with
;
;        ES:DI    = Screen base address (??? I added this without really
;                                        knowing what is going on here
[RLH]).
;
; EXIT:  No meaningful values returned
;
dp              textequ <dword ptr>

Color           textequ <[bp+6]>
YPos            textequ <[bp+8]>
XPos            textequ <[bp+10]>

                public  SetPoint
SetPoint        proc    far
                push    bp
                mov     bp, sp
                push    di
                les     di, dp CurrentPage      ;Point at active VGA Page

                mov     ax, YPos                ;Get line # of Pixel
                mul     ScreenWidth             ;Get offset to start of
line

                mov     bx, XPos                ;Get offset into line
                mov     cx, bx                  ;Save for plane
computations
                shr     bx, 2                   ;X offset (bytes)= XPos/4
                add     bx, ax                  ;Offset=Width*YPos + XPos/4

                mov     ax, MapMaskPlane1       ;Map mask & plane
select reg
                and     cl, PlaneBits           ;Get plane bits
                shl     ah, cl                  ;Get plane select value
                out_16  SCIndex, ax             ;Select plane

                mov     al, Color               ;Get pixel color
                mov     es:[di+bx], al          ;Draw pixel

                pop     di
                pop     bp
                ret     6
SetPoint        endp

Most of the changes here were purely mechanical: reducing the number of upper case letters in the program, spacing the program out better, adjusting some comments, etc. Nevertheless, these small, subtle, changes have a big impact on how easy the code is to read (at least, to an experienced assembly langage programmer).

1.3 S.COM Example

The following code sequence came from a program labelled "S.COM" that was also found in an archive on the internet.



;Get all file names matching filespec and set up tables
GetFileRecords:
    mov dx, OFFSET DTA          ;Set up DTA
    mov ah, 1Ah
    int 21h
    mov dx, FILESPEC            ;Get first file name
    mov cl, 37h
    mov ah, 4Eh
    int 21h
    jnc FileFound               ;No files.  Try a different filespec.
    mov si, OFFSET NoFilesMsg
    call Error
    jmp NewFilespec
FileFound:
    mov di, OFFSET fileRecords  ;DI -> storage for file names
    mov bx, OFFSET files        ;BX -> array of files
    sub bx, 2
StoreFileName:
    add bx, 2                   ;For all files that will fit,
    cmp bx, (OFFSET files) + NFILES*2
    jb @@L1
    sub bx, 2
    mov [last], bx
    mov si, OFFSET tooManyMsg
    jmp DoError
@@L1:
    mov [bx], di                ;Store pointer to status/filename in
files[]
    mov al, [DTA_ATTRIB]        ;Store status byte
    and al, 3Fh                 ;Top bit is used to indicate file is marked
    stosb
    mov si, OFFSET DTA_NAME     ;Copy file name from DTA to filename
storage
    call CopyString
    inc di
    mov si, OFFSET DTA_TIME     ;Copy time, date and size
    mov cx, 4
    rep movsw
    mov ah, 4Fh                 ;Next filename
    int 21h
    jnc StoreFileName
    mov [last], bx              ;Save pointer to last file entry
    mov al, [keepSorted]        ;If returning from EXEC, need to resort
files?
    or al, al
    jz DisplayFiles
    jmp Sort0

The primary problem with this program is the formatting. The label fields overlap the mnemonic fields (in almost every instance), the operand fields of the various instructions are not aligned, there are very few blank lines to organize the code, the programmer makes excessive use of "local" label names, and, although not prevalent, there are a few items that are all uppercase (remember, upper case characters are harder to read). This program also makes considerable use of "magic numbers," especially with respect to opcodes passed on to DOS.

Another subtle problem with this program is the way it organizes control flow. At a couple of points in the code it checks to see if an error condition exists (file not found and too many files processed). If an error exists, the code above branches around some error handling code that the author places in the middle of the routine. Unfortunately, this interrupts the flow of the program. Most readers will want to see a straight-line version of the program's typical operation without having to worry about details concerning error conditions. Unfortunately, the organization of this code is such that the user must skip over seldomly-executed code in order to follow what is happening with the common case[4].

Here is a slightly improved version of the above program:



;Get all file names matching filespec and set up tables

GetFileRecords  mov     dx, offset DTA          ;Set up DTA
                DOS     SetDTA

; Get the first file that matches the specified filename (that may
; contain wildcard characters).  If no such file exists, then
; we've got an error.

                mov     dx, FileSpec
                mov     cl, 37h
                DOS     FindFirstFile
                jc      FileNotFound

; As long as there are no more files matching our file spec (that contains
; wildcard characters), get the file information and place it in the
; "files" array.  Each time through the
"StoreFileName" loop we've got
; a new file name via a call to DOS' FindNextFile function (FindFirstFile
; for the first iteration).  Store the info concerning the file away and
; move on to the next file.

                mov     di, offset fileRecords  ;DI -> storage for file
names
                mov     bx, offset files        ;BX -> array of
files
                sub     bx, 2                   ;Special case for 1st
iteration
StoreFileName:  add     bx, 2
                cmp     bx, (offset files) + NFILES*2
                jae     TooManyFiles

; Store away the pointer to the status/filename in files[] array.
; Note that the H.O. bit of the status byte indicates that the file is
; is marked.

                mov     [bx], di                ;Store pointer in files[]
                mov     al, [DTAattrib]         ;Store status byte
                and     al, 3Fh                 ;Clear file is marked bit
                stosb

; Copy the filename from the DTA storage area to the space we've set aside.

                mov     si, offset DTAname
                call    CopyString
                inc     di                      ;Skip zero byte (???).

                mov     si, offset DTAtime      ;Copy time, date and size
                mov     cx, 4
        rep     movsw

; Move on to the next file and try again.

                DOS     FindNextFile
                jnc     StoreFileName

; After processing the last file entry, do some clean up.
; (1) Save pointer to last file entry.
; (2) If returning from EXEC, we may need to resort and display the files.

                mov     [last], bx
                mov     al, [keepSorted]
                or      al, al
                jz      DisplayFiles
                jmp     Sort0

; Jump down here if there were no files to process.

FileNotFound:   mov     si, offset NoFilesMsg
                call    Error
                jmp     NewFilespec

; Jump down here if there were too many files to process.

TooManyFiles:   sub     bx, 2
                mov     [last], bx
                mov     si, offset tooManyMsg
                jmp     DoError

This improved version dispenses with the local labels, formats the code better by aligning all the statement fields and inserting blank lines into the code. It also eliminates much of the uppercase characters appearing in the previous version. Another improvment is that this code moves the error handling code out of the main stream of this code segment, allowing the reader to follow the typical execution in a more linear fashion.

1.4 Intended Audience

Of course, an assembly language program is going to be nearly unreadable to someone who doesn't know assembly language. This is true for almost any programming language. In the examples above, it's doubtful that the "improved" versions are really any more readable than the original version if you don't know 80x86 assembly language. Perhaps the improved versions are more aesthetic in a generic sense, but if you don't know 80x86 assembly language it's doubtful you'd make any more sense of the second version than the first. Other than burying a tutorial on 80x86 assembly language in a program's comments, there is no way to address this problem[5].

In view of the above, it makes sense to define an "intended audience" that we intend to have read our assembly language programs. Such a person should:

Be a reasonably competent 80x86 assembly language programmer.
Be reasonably familiar with the problem the assembly language program is attempting to solve.
Fluently read English[6].
Have a good grasp of high level language concepts.
Possess appropriate knowledge for someone working in the field of Computer Science (e.g., understands standard algorithms and data structures, understands basic machine architecture, and understands basic discrete mathmatics).

1.5 Readability Metrics

One has to ask "What is it that makes one program more readable than another?" In other words, how do we measure the "readability" of a program? The usual metric, "I know a well-written program when I see one" is inappropriate; for most people, this translates to "If your programs look like my better programs then they are readable, otherwise they are not." Obviously, such a metric is of little value since it changes with every person.

To develop a metric for measuring the readability of an assembly language program, the first thing we must ask is "Why is readability important?" This question has a simple (though somewhat flippant) answer: Readability is important because programs are read (furthermore, a line of code is typically read ten times more often than it is written). To expand on this, consider the fact that most programs are read and maintained by other programmers (Steve McConnell claims that up to ten generations of maintenance programmers work on a typically real world program before it is rewritten; furthermore, they spend up to 60% of their effort on that code simply figuring out how it works). The more readable your programs are, the less time these other people will have to spend figuring out what your program does. Instead, they can concentrate on adding features or correcting defects in the code.

For the purposes of this document, we will define a "readable" program as one that has the following trait:

A "readable" program is one that a competent programmer (one who is familiar with the problem the program is attempting to solve) can pick up, without ever having seen the program before, and fully comprehend the entire program in a minimal amount of time.

That's a tall order! This definition doesn't sound very difficult to achieve, but few non-trivial programs ever really achieve this status. This definition suggests that an appropriate programmer (i.e., one who is familiar with the problem the program is trying to solve) can pick up a program, read it at their normal reading pace (just once), and fully comprehend the program. Anything less is not a "readable" program.

Of course, in practice, this definition is unusable since very few programs reach this goal. Part of the problem is that programs tend to be quite long and few human beings are capable of managing a large number of details in their head at one time. Furthermore, no matter how well-written a program may be, "a competent programmer" does not suggest that the programmer's IQ is so high they can read a statement a fully comprehend its meaning without expending much thought. Therefore, we must define readabilty, not as a boolean entity, but as a scale. Although truly unreadable programs exist, there are many "readable" programs that are less readable than other programs. Therefore, perhaps the following definition is more realistic:

A readable program is one that consists of one or more modules. A competent program should be able to pick a given module in that program and achieve an 80% comprehension level by expending no more than an average of one minute for each statement in the program.

An 80% comprehension level means that the programmer can correct bugs in the program and add new features to the program without making mistakes due to a misunderstanding of the code at hand.

1.6 How to Achieve Readability

The "I'll know one when I see one" metric for readable programs provides a big hint concerning how one should write programs that are readable. As pointed out early, the "I'll know it when I see it" metric suggests that an individual will consider a program to be readable if it is very similar to (good) programs that this particular person has written. This suggests an important trait that readable programs must possess: consistency. If all programmers were to write programs using a consistent style, they'd find programs written by others to be similar to their own, and, therefore, easier to read. This single goal is the primary purpose of this paper - to suggest a consistent standard that everyone will follow.

Of course, consistency by itself is not good enough. Consistently bad programs are not particularly easy to read. Therefore, one must carefully consider the guidelines to use when defining an all-encompassing standard. The purpose of this paper is to create such a standard. However, don't get the impression that the material appearing in this document appears simply because it sounded good at the time or because of some personal preferences. The material in this paper comes from several software engineering texts on the subject (including Elements of Programming Style, Code Complete, and Writing Solid Code), nearly 20 years of personal assembly language programming experience, and a set of generic programming guidelines developed for Information Management Associates, Inc.

This document assumes consistent usage by its readers. Therefore, it concentrates on a lot of mechanical and psychological issues that affect the readability of a program. For example, uppercase letters are harder to read than lower case letters (this is a well-known result from psychology research). It takes longer for a human being to recognize uppercase characters, therefore, an average human being will take more time to read text written all in upper case. Hence, this document suggests that one should avoid the use of uppercase sequences in a program. Many of the other issues appearing in this document are in a similar vein; they suggest minor changes to the way you might write your programs that make it easier for someone to recognize some pattern in your code, thus aiding in comprehension.

1.7 How This Document is Organized

This document follows a top-down discussion of readability. It starts with the concept of a program. Then it discusses modules. From there it works its way down to procedures. Then it talks about individual statements. Beyond that, it talks about components that make up statements (e.g., instructions, names, and operators). Finally, this paper concludes by discussing some orthogonal issues.

Section Two discusses programs in general. It primarily discusses documentation that must accompany a program and the organization of source files. It also discusses, briefly, configuration management and source code control issues. Keep in mind that figuring out how to build a program (make, assemble, link, test, debug, etc.) is important. If your reader fully understands the "heapsort" algorithm you are using, but cannot build an executable module to run, they still do not fully understand your program.

Section Three discusses how to organize modules in your program in a logical fashion. This makes it easier for others to locate sections of code and organizes related sections of code together so someone can easily find important code and ignore unimportant or unrelated code while attempting to understand what your program does.

Section Four discusses the use of procedures within a program. This is a continuation of the theme in Section Three, although at a lower, more detailed, level.

Section Five discusses the program at the level of the statement. This (large) section provides the meat of this proposal. Most of the rules this paper presents appear in this section.

Section Six discusses those items that make up a statement (labels, names, instructions, operands, operators, etc.) This is another large section that presents a large number of rules one should follow when writing readable programs. This section discusses naming conventions, appropriateness of operators, and so on.

Section Seven discusses data types and other related topics.

Section Eight covers miscellaneous topics that the previous sections did not cover.

1.8 Guidelines, Rules, Enforced Rules, and Exceptions

Not all rules are equally important. For example, a rule that you check the spelling of all the words in your comments is probably less important than suggesting that the comments all be in English[7]. Therefore, this paper uses three designations to keep things straight: Guidelines, Rules, and Enforced Rules.

A Guideline is a suggestion. It is a rule you should follow unless you can verbally defend why you should break the rule. As long as there is a good, defensible, reason, you should feel no apprehension violated a guideline. Guidelines exist in order to encourage consistency in areas where there are no good reasons for choosing one methodology over another. You shouldn't violate a Guideline just because you don't like it -- doing so will make your programs inconsistent with respect to other programs that do follow the Guidline (and, therefore, harder to read -- however, you shouldn't lose any sleep because you violated a Guideline.

Rules are much stronger than Guidelines. You should never break a rule unless there is some external reason for doing so (e.g., making a call to a library routine forces you to use a bad naming convention). Whenever you feel you must violate a rule, you should verify that it is reasonable to do so in a peer review with at least two peers. Furthermore, you should explain in the program's comments why it was necessary to violate the rule. Rules are just that -- rules to be followed. However, there are certain situations where it may be necessary to violate the rule in order to satisfy external requirements or even make the program more readable.

Enforced Rules are the toughest of the lot. You should never violate an enforced rule. If there is ever a true need to do this, then you should consider demoting the Enforced Rule to a simple Rule rather than treating the violation as a reasonable alternative.

An Exception is exactly that, a known example where one would commonly violate a Guideline, Rule, or (very rarely) Enforced Rule. Although exceptions are rare, the old adage "Every rule has its exceptions..." certainly applies to this document. The Exceptions point out some of the common violations one might expect.

Of course, the categorization of Guidelines, Rules, Enforced Rules, and Exceptions herein is one man's opinion. At some organizations, this categorization may require reworking depending on the needs of that organization.

1.9 Source Language Concerns

This document will assume that the entire program is written in 80x86 assembly language. Although this organization is rare in commercial applications, this assumption will, in no way, invalidate these guidelines. Other guidelines exist for various high level languages (including a set written by this paper's author). You should adopt a reasonable set of guidelines for the other languages you use and apply these guidelines to the 80x86 assembly language modules in the program.

2.0 Program Organization

A source program generally consists of one or more source, object, and library files. As a project gets larger and the number of files increases, it becomes difficult to keep track of the files in a project. This is especially true if a number of different projects share a common set of source modules. This section will address these concerns.

2.1 Library Functions

A library, by its very nature, suggests stability. Ignoring the possibility of software defects, one would rarely expect the number or function of routines in a library to vary from project to project. A good example is the "UCR Standard Library for 80x86 Assembly Language Programmers." One would expect "printf" to behave identically in two different programs that use the Standard Library. Contrast this against two programs, each of which implement their own version of printf. One could not reasonably assume both programs have identical implementations[8]. This leads to the following rule:

Rule:: Library functions are those routines intended for common reuse in many different assembly language programs. All assembly language (callable) libraries on a system should exist as ".lib" files and should appear in the "/lib" or "/asmlib" subdirectory.
Guideline:: "/asmlib" is probably a better choice if you're using multiple languages since those other languages may need to put files in a "/lib" directory.
Exception:: It's probably reasonable to leave the UCR Standard Library's "stdlib.lib" file in the "/stdlib/lib" directory since most people expect it there.

The rule above ensures that the library files are all in one location so they are easy to find, modify, and review. By putting all your library modules into a single directory, you avoid configuration management problems such as having outdated versions of a library linking with one program and up-to-date versions linking with other programs.

2.2 Common Object Modules

This document defines a library as a collection of object modules that have wide application in many different programs. The UCR Standard Library is a typical example of a library. Some object modules are not so general purpose, but still find application in two or more different programs. Two major configuration management problems exist in this situation: (1) making sure the ".obj" file is up-to-date when linking it with a program; (2) Knowing which modules use the module so one can verify that changes to the module won't break existing code.

The following rules takes care of case one:

Rule:: If two different program share an object module, then the associated source, object, and makefiles for that module should appear in a subdirectory that is specific to that module (i.e., no other files in the subdirectory). The subdirectory name should be the same as the module name. If possible, you should create a set of link/alias/shortcuts to this subdirectory and place these links in the main directory of each of the projects that utilize the module. If links are not possible, you should place the module's subdirectory in the "/common" subdirectory.
Enforced Rule:: Every subdirectory containing one or more modules should have a make file that will automatically generate the appropriate, up-to-date, ".obj" files. An individual, a batch file, or another make file should be able to automatically generate new object modules (if necessary) by simply executing the make program.
Guideline:: Use Microsoft's nmake program. At the very least, use nmake acceptable syntax in your makefiles.

The other problem, noting which projects use a given module is much more difficult. The obvious solution, commenting the source code associated with the module to tell the reader which programs use the module, is impractical. Maintaining these comments is too error-prone and the comments will quickly get out of phase and be worse than useless -- they would be incorrect. A better solution is to create a dummy file using the module's name with a ".elw" (elsewhere) suffix and placing this file in the main subdirectory of each program that links the module. Now, using one of the venerable "whereis" programs, you can easily locate all projects that use the module.

Guideline:: If a project uses a module that is not local to the project's subdirectory, create a dummy file (using "TOUCH" or a comparable program) that uses the module's main name with a ".elw" suffix. This will allow someone to easily search for all the projects that use a common object module by using a "whereis" program.

2.3 Local Modules

Local modules are those that a single program/project uses. Typically, the source and object code for each module appears in the same directory as the other files associated with the project. This is a reasonable arrangement until the number of files increases to the point that it is difficult to find a file in a directory listing. At that point, most programmers begin reorganizing their directory by creating subdirectories to hold many of these source modules. However, the placement, name, and contents of these new subdirectories can have a big impact on the overall readability of the program. This section will address these issues.

The first issue to consider is the contents of these new subdirectories. Since programmers rummaging through this project in the future will need to easily locate source files in a project, it is important that you organize these new subdirectories so that it is easy to find the source files you are moving into them. The best organization is to put each source module (or a small group of strongly related modules) into its own subdirectory. The subdirectory should bear the name of the source module minus its suffix (or the main module if there is more than one present in the subdirectory). If you place two or more source files in the same directory, ensure this set of source files forms a cohesive set (meaning the source files contain code that solve a single problem). A discussion of cohesiveness appears later in this document.

Rule:: If a project directory contains too many files, try to move some of the modules to subdirectories within the project directory; give the subdirectory the same name as the source file without the suffix. This will nearly reduce the number of files in half. If this reduction is insufficient, try categorizing the source modules (e.g., FileIO, Graphics, Rendering, and Sound) and move these modules to a subdirectory bearing the name of the category.
Enforced Rule:: Each new subdirectory you create should have its own make file that will automatically assemble all source modules within that subdirectory, as appropriate.
Enforced Rule:: Any new subdirectories you create for these source modules should appear within the directory containing the project. The only excepts are those modules that are, or you anticipate, sharing with other projects. See "Common Object Modules" on page 13 for more details.

Stand-alone assembly language programs generally contain a "main" procedure - the first program unit that executes when the operating system loads the program into memory. For any programmer new to a project, this procedure is the anchor where one first begins reading the code and the point where the reader will continually refer. Therefore, the reader should be able to easily locate this source file. The following rule helps ensure this is the case:

Rule:: The source module containing the main program should have the same name as the executable (obviously the suffix will be different). For example, if the "Simulate 886" program's executable name is "Sim886.exe" then you should find the main program in the "Sim886.asm" source file.

Finding the souce file that contains the main program is one thing. Finding the main program itself can be almost as hard. Assembly language lets you give the main program any name you want. However, to make the main procedure easy to find (both in the source code and at the O/S level), you should actually name this program "main". See "Module Organization" on page 15 for more details about the placement of the main program.

Rule:: The name of the main procedure in an assembly language program should be "main".

2.4 Program Make Files

Every project, even if it contains only a single source module, should have an associated make file. If someone want to assemble your program, they should have to worry about what program (e.g., MASM) to use, what command line options to use, what library modules to use, etc. They should be able to type "nmake"[9] and wind up with an executable program. Even if assembling the program consists of nothing more than typing the name of the assembler and the source file, you should still have a make file. Someone else may not realize that's all that is necessary.

Enforced Rule:: The main project directory should contain a make file that will automatically generate an executable (or other expected object module) in response to a simple make/nmake command.
Rule:: If your project uses object modules that are not in the same subdirectory as the main program's module, you should test the ".obj" files for those modules and execute the corresponding make files in their directories if the object code is out of date. You can assume that library files are up to date.
Guideline:: Avoid using fancy "make" features. Most programmers only learn the basics about make and will not be able to understand what your make file is doing if you fully exploit the make language. Especially avoid the use of default rules since this can create havoc if someone arbitrarily adds or removes files from the directory containing the make file.

3.0 Module Organization

A module is a collection of objects that are logically related. Those objects may include constants, data types, variables, and program units (e.g., functions, procedures, etc.). Note that objects in a module need not be physically related. For example, it is quite possible to construct a module using several different source files. Likewise, it is quite possible to have several different modules in the same source file. However, the best modules are physically related as well as logically related; that is, all the objects associated with a module exist in a single source file (or directory if the source file would be too large) and nothing else is present.

Modules contain several different objects including constants, types, variables, and program units (routines). Modules shares many of the attributes with routines (program units); this is not surprising since routines are the major component of a typical module. However, modules have some additional attributes of their own. The following sections describe the attributes of a well-written module.

Note:: Unit and package are both synonyms for the term module.

3.1 Module Attributes

A module is a generic term that describes a set of program related objects (program units as well as data and type objects) that are somehow coupled. Good modules share many of the same attributes as good program units as well as the ability to hide certain details from code outside the module.

3.1.1 Module Cohesion

Modules exhibit the following different kinds of cohesion (listed from good to bad):

Functional or logical cohesion exists if the module accomplishes exactly one (simple) task.
Sequential or pipelined cohesion exists when a module does several sequential operations that must be performed in a certain order with the data from one operation being fed to the next in a "filter-like" fashion.
Global or communicational cohesion exists when a module performs a set of operations that make use of a common set of data, but are otherwise unrelated.
Temporal cohesion exists when a module performs a set of operations that need to be done at the same time (though not necessarily in the same order). A typical initialization module is an example of such code.
Procedural cohesion exists when a module performs a sequence of operations in a specific order, but the only thing that binds them together is the order in which they must be done. Unlike sequential cohesion, the operations do not share data.
State cohesion occurs when several different (unrelated) operations appear in the same module and a state variable (e.g., a parameter) selects the operation to execute. Typically such modules contain a case (switch) or if..elseif..elseif... statement.
No cohesion exists if the operations in a module have no apparent relationship with one another.

The first three forms of cohesion above are generally acceptable in a program. The fourth (temporal) is probably okay, but you should rarely use it. The last three forms should almost never appear in a program. For some reasonable examples of module cohesion, you should consult "Code Complete".

Guideline:: Design good modules! Good modules exhibit strong cohesion. That is, a module should offer a (small) group of services that are logically related. For example, a "printer" module might provide all the services one would expect from a printer. The individual routines within the module would provide the individual services.

3.1.2 Module Coupling

Coupling refers to the way that two modules communicate with one another. There are several criteria that define the level of coupling between two modules:

Cardinality- the number of objects communicated between two modules. The fewer objects the better (i.e., fewer parameters).
Intimacy- how "private" is the communication? Parameter lists are the most private form; private data fields in a class or object are next level; public data fields in a class or object are next, global variables are even less intimate, and passing data in a file or database is the least intimate connection. Well-written modules exhibit a high degree of intimacy.
Visibility- this is somewhat related to intimacy above. This refers to how visible the data is to the entire system that you pass between two modules. For example, passing data in a parameter list is direct and very visible (you always see the data the caller is passing in the call to the routine); passing data in global variables makes the transfer less visible (you could have set up the global variable long before the call to the routine). Another example is passing simple (scalar) variables rather than loading up a bunch of values into a structure/record and passing that structure/record to the callee.
Flexibility- This refers to how easy it is to make the connection between two routines that may not have been originally intended to call one another. For example, suppose you pass a structure containing three fields into a function. If you want to call that function but you only have three data objects, not the structure, you would have to create a dummy structure, copy the three values into the field of that structure, and then call the function. On the other hand, had you simply passed the three values as separate parameters, you could still pass in structures (by specifying each field) as well as call the function with separate values. The module containing this later function is more flexible.

A module is loosely coupled if its functions exhibit low cardinality, high intimacy, high visibility, and high flexibility. Often, these features are in conflict with one another (e.g., increasing the flexibility by breaking out the fields from a structures [a good thing] will also increase the cardinality [a bad thing]). It is the traditional goal of any engineer to choose the appropriate compromises for each individual circumstance; therefore, you will need to carefully balance each of the four attributes above.

A module that uses loose coupling generally contains fewer errors per KLOC (thousands of lines of code). Furthermore, modules that exhibit loose coupling are easier to reuse (both in the current and future projects). For more information on coupling, see the appropriate chapter in "Code Complete".

Guideline:: Design good modules! Good modules exhibit loose coupling. That is, there are only a few, well-defined (visible) interfaces between the module and the outside world. Most data is private, accessible only through accessor functions (see information hiding below). Furthermore, the interface should be flexible.
Guideline:: Design good modules! Good modules exhibit information hiding. Code outside the module should only have access to the module through a small set of public routines. All data should be private to that module. A module should implement an abstract data type. All interface to the module should be through a well-defined set of operations.

3.1.3 Physical Organization of Modules

Many languages provide direct support for modules (e.g., packages in Ada, modules in Modula-2, and units in Delphi/Pascal). Some languages provide only indirect support for modules (e.g., a source file in C/C++). Others, like BASIC, don't really support modules, so you would have to simulate them by physically grouping objects together and exercising some discipline. Assembly language falls into the middle ground. The primary mechanism for hiding names from other modules is to implement a module as an individual source file and publish only those names that are part of the module's interface to the outside world.

Rule:: Each module should completely reside in a single source file. If size considerations prevent this, then all the source files for a given module should reside in a subdirectory specifically designated for that module.

Some people have the crazy idea that modularization means putting each function in a separate source file. Such physical modularization generally impairs the readability of a program more than it helps. Strive instead for logical modularization, that is, defining a module by its actions rather than by source code syntax (e.g., separating out functions).

This document does not address the decomposition of a problem into its modular components. Presumably, you can already handle that part of the task. There are a wide variety of texts on this subject if you feel weak in this area.

3.1.4 Module Interface

In any language system that supports modules, there are two primary components of a module: the interface component that publicizes the module visible names and the implementation component that contains the actual code, data, and private objects. MASM (and most assemblers) uses a scheme that is very similar to the one C/C++ uses. There are directives that let you import and export names. Like C/C++, you could place these directives directly in the related source modules. However, such code is difficult to maintain (since you need to change the directives in every file whenever you modify a public name). The solution, as adopted in the C/C++ programming languages, is to use header files. Header files contain all the public definitions and exports (as well as common data type definitions and constant definitions). The header file provides the interface to the other modules that want to use the code present in the implementation module.

The MASM 6.x externdef directive is perfect for creating interface files. When you use externdef within a source module that defines a symbol, externdef behaves like the public directive, exporting the name to other modules. When you use externdef within a source modules that refers to an external name, externdef behaves like the extern (or extrn ) directive. This lets you place an externdef directive in a single file and include this file into both the modules that import and export the public names.

If you are using an assembler that does not support externdef, you should probably consider switching to MASM 6.x. If switching to a better assembler (that supports externdef) is not feasible, the last thing you want to do is have to maintain the interface information in several separate files. Instead, use the assembler's ifdef conditional assembly directives to assemble a set of public statements in the header file if a symbol with the module's name is defined prior to including the header file. It should assemble a set of extrn statements otherwise. Although you still have to maintain the public and external information in two places (in the ifdef true and false sections), they are in the same file and located near one another.

Rule:: Keep all module interface directives (public, extrn, extern, and externdef) in a single header file for a given module. Place any other common data type definitions and constant definitions in this header file as well.
Guideline:: There should only be a single header file associated with any one module (even if the module has multiple source files associated with it). If, for some reason, you feel it is necessary to have multiple header files associated with a module, you should create a single file that includes all of the other interface files. That way a program that wants to use all the header files need only include the single file.

When designing header files, make sure you can include a file more than once without ill effects (e.g., duplicate symbol errors). The traditional way to do this is to put an IFDEF statement like the following around all the statements in a header file:



; Module: MyHeader.a

                ifndef   MyHeader_A
MyHeader_A      =       0
                .
                .       ;Statements in this header file.
                .
                endif

The first time a source file includes "MyHeader.a" the symbol "MyHeader_A" is undefined. Therefore, the assembler will process all the statements in the header file. In successive include operations (during the same assembly) the symbol "MyHeader_A" is already defined, so the assembler ignores the body of the include file.

My would you ever include a file twice? Easy. Some header files may include other header files. By including the file "YourHeader.a" a module might also be including "MyHeader.a" (assuming "YourHeader.a" contains the appropriate include directive). Your main program, that includes "YourHeader.a" might also need "MyHeader.a" so it explicitly includes this file not realizing "YourHeader.a" has already processed "MyHeader.a" thereby causing symbol redefinitions.

Rule:: Always put an appropriate IFNDEF statement around all the definitions in a header file to allow multiple inclusion of the header file without ill effect.
Guideline:: Use the ".a" suffix for assembly language header/interface files.
Rule:: Include files for library functions on a system should exist as ".a" files and should appear in the "/include" or "/asminc" subdirectory.
Guideline:: "/asminc" is probably a better choice if you're using multiple languages since those other languages may need to put files in a "/include" directory.
Exception:: It's probably reasonable to leave the UCR Standard Library's "stdlib.a" file in the "/stdlib/include" directory since most people expect it there.

4.0 Program Unit Organization

A program unit is any procedure, function, coroutine, iterator, subroutine, subprogram, routine, or other term that describes a section of code that abstracts a set of common operations on the computer. This text will simply use the term procedure or routine to describe these concepts.

Routines are closely related to modules, since they tend to be the major component of a module (along with data, constants, and types). Hence, many of the attributes that apply to a module also apply to routines. The following paragraphs, at the expense of being redundant, repeat the earlier definitions so you don't have to flip back to the previous sections.

4.1 Routine Cohesion

Routines exhibit the following kinds of cohesion (listed from good to bad):

Functional or logical cohesion exists if the routine accomplishes exactly one (simple) task.
Sequential or pipelined cohesion exists when a routine does several sequential operations that must be performed in a certain order with the data from one operation being fed to the next in a "filter-like" fashion.
Global or communicational cohesion exists when a routine performs a set of operations that make use of a common set of data, but are otherwise unrelated.
Temporal cohesion exists when a routine performs a set of operations that need to be done at the same time (though not necessarily in the same order). A typical initialization routine is an example of such code.
Procedural cohesion exists when a routine performs a sequence of operations in a specific order, but the only thing that binds them together is the order in which they must be done. Unlike sequential cohesion, the operations do not share data.
State cohesion occurs when several different (unrelated) operations appear in the same module and a state variable (e.g., a parameter) selects the operation to execute. Typically such routines contain a case (switch) or if..elseif..elseif... statement.
No cohesion exists if the operations in a routine have no apparent relationship with one another.

The first three forms of cohesion above are generally acceptable in a program. The fourth (temporal) is probably okay, but you should rarely use it. The last three forms should almost never appear in a program. For some reasonable examples of routine cohesion, you should consult "Code Complete".

Guideline:: All routines should exhibit good cohesiveness. Functional cohesiveness is best, followed by sequential and global cohesiveness. Temporal cohesiveness is okay on occasion. You should avoid the other forms.

4.1.1 Routine Coupling

Coupling refers to the way that two routines communicate with one another. There are several criteria that define the level of coupling between two routines:

Cardinality- the number of objects communicated between two routines. The fewer objects the better (i.e., fewer parameters).
Intimacy- how "private" is the communication? Parameter lists are the most private form; private data fields in a class or object are next level; public data fields in a class or object are next, global variables are even less intimate, and passing data in a file or database is the least intimate connection. Well-written routines exhibit a high degree of intimacy.
Visibility- this is somewhat related to intimacy above. This refers to how visible the data is to the entire system that you pass between two routines. For example, passing data in a parameter list is direct and very visible (you always see the data the caller is passing in the call to the routine); passing data in global variables makes the transfer less visible (you could have set up the global variable long before the call to the routine). Another example is passing simple (scalar) variables rather than loading up a bunch of values into a structure/record and passing that structure/record to the callee.
Flexibility- This refers to how easy it is to make the connection between two routines that may not have been originally intended to call one another. For example, suppose you pass a structure containing three fields into a function. If you want to call that function but you only have three data objects, not the structure, you would have to create a dummy structure, copy the three values into the field of that structure, and then call the routine. On the other hand, had you simply passed the three values as separate parameters, you could still pass in structures (by specifying each field) as well as call the routine with separate values.

A function is loosely coupled if it exhibits low cardinality, high intimacy, high visibility, and high flexibility. Often, these features are in conflict with one another (e.g., increasing the flexibility by breaking out the fields from a structures [a good thing] will also increase the cardinality [a bad thing]). It is the traditional goal of any engineer to choose the appropriate compromises for each individual circumstance; therefore, you will need to carefully balance each of the four attributes above.

A program that uses loose coupling generally contains fewer errors per KLOC (thousands of lines of code). Furthermore, routines that exhibit loose coupling are easier to reuse (both in the current and future projects). For more information on coupling, see the appropriate chapter in "Code Complete".

Guideline:: Coupling between routines in source code should be loose.

4.1.2 Routine Size

Sometime in the 1960's, someone decided that programmers could only look at one page in a listing at a time, therefore routines should be a maximum of one page long (66 lines, at the time). In the 1970's, when interactive computing became popular, this was adjusted to 24 lines -- the size of a terminal screen. In fact, there is very little empirical evidence to suggest that small routine size is a good attribute. In fact, several studies on code containing artificial constraints on routine size indicate just the opposite -- shorter routines often contain more bugs per KLOC[10].

A routine that exhibits functional cohesiveness is the right size, almost regardless of the number of lines of code it contains. You shouldn't artificially break up a routine into two or more subroutines (e.g., sub_partI and sub_partII) just because you feel a routine is getting to be too long. First, verify that your routine exhibits strong cohesion and loose coupling. If this is the case, the routine is not too long. Do keep in mind, however, that a long routine is probably a good indication that it is performing several actions and, therefore, does not exhibit strong cohesion.

Of course, you can take this too far. Most studies on the subject indicate that routines in excess of 150-200 lines of code tend to contain more bugs and are more costly to fix than shorter routines. Note, by the way, that you do not count blank lines or lines containing only comments when counting the lines of code in a program.

Also note that most studies involving routine size deal with HLLs. A comparable assembly language routine will contain more lines of code than the corresponding HLL routine. Therefore, you can expect your routines in assembly language to be a little longer.

Guideline:: Do not let artificial constraints affect the size of your routines. If a routine exceeds about 200-250 lines of code, make sure the routine exhibits functional or sequential cohesion. Also look to see if there aren't some generic subsequences in your code that you can turn into stand alone routines.
Rule:: Never shorten a routine by dividing it into n parts that you would always call in the appropriate sequence as a way of shortening the original routine.

4.2 Placement of the Main Procedure and Data

As noted earlier, you should name the main procedure main and place it in the source file bearing the same name as the executable file. If this module is rather long, it can still be difficult to locate the main program. A good solution is to always place the main procedure at the same point in the source file. By convention (meaning everyone expects it this way), most programmers make their main program the first or last procedure in an assembly language program. Either position is fine. Putting the main program anywhere else makes it hard to find.

Rule:: Always make the main procedure the first or last procedure in a source file.

MASM, because it is a multiphase assembler, does not require that you define a symbol before you use it. This is necessary because many instructions (like JMP) need to refer to symbols found later in the program. In a similar manner, MASM doesn't really care where you define your data - before or after its use[11]. However, most programmers "grew up" with high level languages that require the definition of a symbol before its first use. As a result, they expect to be able to find a variable declaration by looking backwards in the source file. Since everyone expects this, it is a good tradition to continue in an assembly language program.

Rule:: You should declare all variables, constants, and macros prior to their use in an assembly language program.
Rule:: You should define all static variables (those you declare in a segment) at the beginning of the source module.

5.0 Statement Organization

In an assembly language program, the author must work extra hard to make a program readable. By following a large number of rules, you can produce a program that is readable. However, by breaking a single rule no matter how many other rules you've followed, you can render a program unreadable. Nowhere is this more true than how you organize the statements within your program. Consider the following example taken from "The Art of Assembly Language Programming":



______________________________________________________
                mov     ax, 0
                mov     bx, ax
                add     ax, dx
                mov     cx, ax
 ______________________________________________________
mov             ax,                     0
          mov bx,                      ax
    add               ax, dx
                        mov             cx, ax
______________________________________________________

While this is an extreme example, do note that it only takes a few mistakes to have a large impact on the readability of a program. Consider (a short section from) an example presented earlier:



GetFileRecords:
    mov dx, OFFSET DTA          ;Set up DTA
    mov ah, 1Ah
    int 21h
    mov dx, FILESPEC            ;Get first file name
    mov cl, 37h
    mov ah, 4Eh
    int 21h
    jnc FileFound               ;No files.  Try a different filespec.
    mov si, OFFSET NoFilesMsg
    call Error
    jmp NewFilespec
FileFound:
    mov di, OFFSET fileRecords  ;DI -> storage for file names
    mov bx, OFFSET files        ;BX -> array of files
    sub bx, 2

Improved version:



GetFileRecords: mov     dx, offset DTA          ;Set up DTA
                DOS     SetDTA

                mov     dx, FileSpec
                mov     cl, 37h
                DOS     FindFirstFile
                jc      FileNotFound

                mov     di, offset fileRecords  ;DI -> storage for file
names
                mov     bx, offset files        ;BX -> array of files
                sub     bx, 2                   ;Special case for 1st
iteration

An assembly language statement consists of four possible fields: a label field, a mnemonic field, an operand field, and a comment file. The mnemonic and comment fields are always optional. The label field is generally optional although certain instructions (mnemonics) do not allow labels while others require labels. The operand field's presence is tied to the mnemonic field. For most instructions the actual mnemonic determines whether an operand field must be present.

MASM is a free-form assembler insofar as it does not require these fields to appear in any particular column[12]. However, the freedom to arrange these columns in any manner is one of the primary contributors to hard to read assembly language programs. Although MASM lets you enter your programs in free-form, there is absolutely no reason you cannot adopt a fixed field format, always starting each field in the same column. Doing so generally helps make an assembly language program much easier to read. Here are the rules you should use:

Rule:: If an identifier is present in the label field, always start that identifier in column one of the source line.
Rule:: All mnemonics should start in the same column. Generally, this should be column 17 (the second tab stop) or some other convenient position.
Rule:: All operands should start in the same column. Generally, this should be column 25 (the third tab stop) or some other convenient position.
Exception:: If a mnemonic (typically a macro) is longer than seven characters and requires an operand, you have no choice but to start the operand field beyond column 25 (this is an exception assuming you've chosen columns 17 and 25 for your mnemonic and operand fields, respectively).
Guideline:: Try to always start the comment fields on adjacent source lines in the same column (note that it is impractical to always start the comment field in the same column throughout a program).

Most people learn a high level language prior to learning assembly language. They have been firmly taught that readable (HLL) programs have their control structures properly indented to show the structure of the program. Indentation works great when you have a block structured language. Assembly language, however, is the original unstructured language and indentation rules for a structured programming language simply do not apply. While it is important to be able to denote that a certain sequence of instructions is special (e.g., belong to the "then" portion of an if..then..else..endif statement), indentation is not the proper way to achieve this in an assembly language program.

If you need to set off a sequence of statements from surrounding code, the best thing you can do is use blank lines in your source code. For a small amount of detachment, to separate one computation from another for example, a single blank line is sufficient. To really show that one section of code is special, use two, three, or even four blank lines to separate one block of statements from the surrounding code. To separate two totally unrelated sections of code, you might use several blank lines and a row of dashes or asterisks to separate the statements. E.g.,



                mov     dx, FileSpec
                mov     cl, 37h
                DOS     FindFirstFile
                jc      FileNotFound

;     *********************************************

                mov     di, offset fileRecords  ;DI -> storage for file
names
                mov     bx, offset files        ;BX -> array of files
                sub     bx, 2                   ;Special case for 1st
iteration

Guideline:: Use blank lines to separate special blocks of code from the surrounding code. Use an aesthetic looking row of asterisks or dashes if you need a stronger separation between two blocks of code (do not overdo this, however).

If two sequences of assembly language statements correspond to roughly two HLL statements, it's generally a good idea to put a blank line between the two sequences. This helps clarify the two segments of code in the reader's mind. Of course, it is easy to get carried away and insert too much white space in a program, so use some common sense here.

Guideline:: If two sequences of code in assembly language correspond to two adjacent statements in a HLL, then use a blank line to separate those two assembly sequences (assuming the sequences are real short).

A common problem in any language (not just assembly language) is a line containing a comment that is adjacent to one or two lines containing code. Such a program is very difficult read because it is hard to determine where the code ends and the comment begins (or vice-versa). This is especially true when the comments contain sample code. It is often quite difficult to determine if what you're looking at is code or comments; hence the following enforced rule:

Enforced Rule:: Always put at least one blank line between code and comments (assuming, of course, the comment is sitting only a line by itself; that is, it is not an endline comment[13]).

6.0 Comments

Comments in an assembly language program generally come in two forms: endline comments and standalone comments[14]. As their names suggest, endline lines comments always occur at the end of a source statement and standalone comments sit on a line by themselves[15]. These two types of comments have distinct purposes, this section will explore their use and describe the attributes of a well-commented program.

6.1 What is a Bad Comment?

It is amazing how many programmers claim their code is well-commented. Were you to count characters between (or after) the comment delimiters, they might have a point. Consider, however, the following comment:



                mov   ax, 0     ;Set AX to zero.

Quite frankly, this comment is worse than no comment at all. It doesn't tell the reader anything the instruction itself doesn't tell and it requires the reader to take some of his or her precious time to figure out that the comment is worthless. If someone cannot tell that this instruction is setting AX to zero, they have no business reading an assembly language program. This brings up the first guideline of this section:

Guideline:: Choose an intended audience for your source code and write the comments to that audience. For assembly language source code, you can usually assume the target audience are those who know a reasonable amount of assembly language.

Don't explain the actions of an assembly language instruction in your code unless that instruction is doing something that isn't obvious (and most of the time you should consider changing the code sequence if it isn't obvious what is going on). Instead, explain how that instruction is helping to solve the problem at hand. The following is a much better comment for the instruction above:



                mov   ax, 0     ;AX is the resulting sum.  Initialize it.

Note that the comment does not say "Initialize it to zero." Although there would be nothing intrinsically wrong with saying this, the phrase "Initialize it" remains true no matter what value you assign to AX. This makes maintaining the code (and comment) much easier since you don't have to change the comment whenever you change the constant associated with the instruction.

Guideline:: Write your comments in such a way that minor changes to the instruction do not require that you change the corresponding comment.

Note: Although a trivial comment is bad (indeed, worse than no comment at all), the worst comment a program can have is one that is wrong. Consider the following statement:



                mov   ax, 1     ;Set AX to zero.

It is amazing how long a typical person will look at this code trying to figure out how on earth the program sets AX to zero when it's obvious it does not do this. People will always believe comments over code. If there is some ambiguity between the comments and the code, they will assume that the code is tricky and that the comments are correct. Only after exhausting all possible options is the average person likely to concede that the comment must be incorrect.

Enforced Rule:: Never allow incorrect comments in your program.

This is another reason not to put trivial comments like "Set AX to zero" in your code. As you modify the program, these are the comments most likely to become incorrect as you change the code and fail to keep the comments in sync. However, even some non-trivial comments can become incorrect via changes to the code. Therefore, always follow this rule:

Enforced Rule:: Always update all comments affected by a code change immediately after making the code change.

Undoubtedly you've heard the phrase "make sure you comment your code as though someone else wrote it for you; otherwise in six months you'll wish you had." This statement encompasses two concepts. First, don't ever think that your understanding of the current code will last. While working on a given section of a program you're probably investing considerable thought and study to figure out what's going on. Six months down the road, however, you will have forgotten much of what you figured out and the comments can go a long way to getting you back up to speed quickly. The second point this code makes is the implication that others read and write code too. You will have to read someone else's code, they will have to read yours. If you write the comments the way you would expect others to write it for you, chances are pretty good that your comments will work for them as well.

Rule:: Never use racist, sexist, obscene, or other exceptionally politically incorrect language in your comments. Undoubtedly such language in your comments will come back to embarass you in the future. Furthermore, it's doubtful that such language would help someone better understand the program.

It's much easier to give examples of bad comments than it is to discuss good comments. The following list describes some of the worst possible comments you can put in a program (from worst up to barely tolerable):

The absolute worst comment you can put into a program is an incorrect comment. Consider the following assembly statement:



		mov ax, 10;  { Set AX to 11 }

It is amazing how many programmers will automatically assume the comment is correct and try to figure out how this code manages to set the variable "A" to the value 11 when the code so obviously sets it to 10.
The second worst comment you can place in a program is a comment that explains what a statement is doing. The typical example is something like "mov ax, 10; { Set 'A' to 10 }". Unlike the previous example, this comment is correct. But it is still worse than no comment at all because it is redundant and forces the reader to spend additional time reading the code (reading time is directly proportional to reading difficulty). This also makes it harder to maintain since slight changes to the code (e.g., "mov ax, 9") requires modifications to the comment that would not otherwise be required.
The third worst comment in a program is an irrelevant one. Telling a joke, for example, may seem cute, but it does little to improve the readability of a program; indeed, it offers a distraction that breaks concentration.
The fourth worst comment is no comment at all.
The fifth worst comment is a comment that is obsolete or out of date (though not incorrect). For example, comments at the beginning of the file may describe the current version of a module and who last worked on it. If the last programmer to modify the file did not update the comments, the comments are now out of date.

6.2 What is a Good Comment?

Steve McConnell provides a long list of suggestions for high-quality code. These suggestions include:

Use commenting styles that don't break down or discourage modification. Essentially, he's saying pick a commenting style that isn't so much work people refuse to use it. He gives an example of a block of comments surrounded by asterisks as being hard to maintain. This is a poor example since modern text editors will automatically "outline" the comments for you. Nevertheless, the basic idea is sound.
Comment as you go along. If you put commenting off until the last moment, then it seems like another task in the software development process always comes along and management is likely to discourage the completion of the commenting task in hopes of meeting new deadlines.
Avoid self-indulgent comments. Also, you should avoid sexist, profane, or other insulting remarks in your comments. Always remember, someone else will eventually read your code.
Avoid putting comments on the same physical line as the statement they describe. Such comments are very hard to maintain since there is very little room. McConnell suggests that endline comments are okay for variable declarations. For some this might be true but many variable declarations may require considerable explanation that simply won't fit at the end of a line. One exception to this rule is "maintenance notes." Comments that refer to a defect tracking entry in the defect database are okay (note that the CodeWright text editor provides a much better solution for this -- buttons that can bring up an external file). Of course, endline comments are marginally more useful in assembly language than in the HLLs that McConnell addresses, but the basic idea is sound.
Write comments that describe blocks of statements rather than individual statements. Comments covering single statements tend to discuss the mechanics of that statement rather than discussing what the program is doing.
Focus paragraph comments on the why rather than the how. Code should explain what the program is doing and why the programmer chose to do it that way rather than explain what each individual statement is doing.
Use comments to prepare the reader for what is to follow. Someone reading the comments should be able to have a good idea of what the following code does without actually looking at the code. Note that this rule also suggests that comments should always precede the code to which they apply.
Make every comment count. If the reader wastes time reading a comment of little value, the program is harder to read; period.
Document surprises and tricky code. Of course, the best solution is not to have any tricky code. In practice, you can't always achieve this goal. When you do need to restore to some tricky code, make sure you fully document what you've done.
Avoid abbreviations. While there may be an argument for abbreviating identifiers that appear in a program, no way does this apply to comments.
Keep comments close to the code they describe. The prologue to a program unit should give its name, describe the parameters, and provide a short description of the program. It should not go into details about the operation of the module itself. Internal comments should to that.
Comments should explain the parameters to a function, assertions about these parameters, whether they are input, output, or in/out parameters.
Comments should describe a routine's limitations, assumptions, and any side effects.

Rule:: All comments will be high-quality comments that describe the actions of the surrounding code in a concise manner

6.3 Endline vs. Standalone Comments

Guideline:: Whenever a comment appears on a line by itself, always put the semicolon in column one. You may indent the text if this is appropriate or aesthetic.
Guideline:: Adjacent lines of comments should not have any interspersed blank lines. A blank comment line should, at least, contain a semicolon in column one.

The guidline above suggests that your code should look like this:



; This is a comment with a blank line between it and the next comment.
;
; This is another line with a comment on it.

Rather than like this:



; This is a comment with a blank line between it and the next comment.

; This is another line with a comment on it.

The semicolon appearing between the two statements suggest continuity that is not present when you remove the semicolon. If two blocks of comments are truly separate and whitespace between them is appropriate, you should consider separating them by a large number of blank lines to completely eliminate any possible association between the two.

Standalone comments are great for describing the actions of the code that immediately follows. So what are endline comments useful for? Endline comments can explain how a sequence of instructions are implimenting the algorithm described in a previous set of standalone comments. Consider the following code:



; Compute the transpose of a matrix using the algorithm:
;
;       for i := 0 to 3 do
;               for j := 0 to 3 do
;                       swap( a[i][j], b[j][i] );

                forlp   i, 0, 3

                forlp   j, 0, 3

                mov     bx, i           ;Compute address of a[i][j] using
                shl     bx, 2           ; row major ordering (i*4 + j)*2.
                add     bx, j
                add     bx, bx
                lea     bx, a[bx]
                push    bx              ;Push address of a[i][j] onto
stack.

                mov     bx, j           ;Compute address of b[j][i] using
                shl     bx, 2           ;row major ordering (j*4 + i)*2.
                add     bx, i
                add     bx, bx
                lea     bx, b[bx]
                push    bx              ;Push address of b[j][i] onto
stack.

                call    swap            ;Swap a[i][j] with b[j][i].

                next
                next

Note that the block comments before this sequence explain, in high level terms, what the code is doing. The endline comments explain how the statement sequence implements the general algorithm. Note, however, that the endline comments do not explain what each statement is doing (at least at the machine level). Rather than claiming "add bx, bx" is multiplying the quantity in BX by two, this code assumes the reader can figure that out for themselves (any reasonable assembly programmer would know this). Once again, keep in mind your audience and write your comments for them.

6.4 Unfinished Code

Often it is the case that a programmer will write a section of code that (partially) accomplishes some task but needs further work to complete a feature set, make it more robust, or remove some known defect in the code. It is common for such programmers to place comments into the code like "This needs more work," "Kludge ahead," etc. The problem with these comments is that they are often forgotten. It isn't until the code fails in the field that the section of code associated with these comments is found and their problems corrected.

Ideally, one should never have to put such code into a program. Of course, ideally, programs never have any defects in them, either. Since such code inevitably finds its way into a program, it's best to have a policy in place to deal with it, hence this section.

Unfinished code comes in five general categories: non-functional code, partially functioning code, suspect code, code in need of enhancement, and code documentation. Non-functional code might be a stub or driver that needs to be replaced in the future with actual code or some code that has severe enough defects that it is useless except for some small special cases. This code is really bad, fortunately its severity prevents you from ignoring it. It is unlikely anyone would miss such a poorly constructed piece of code in early testing prior to release.

Partially functioning code is, perhaps, the biggest problem. This code works well enough to pass some simple tests yet contains serious defects that should be corrected. Moreover, these defects are known. Software often contains a large number of unknown defects; it's a shame to let some (prior) known defects ship with the product simply because a programmer forgot about a defect or couldn't find the defect later.

Suspect code is exactly that- code that is suspicious. The programmer may not be aware of a quantifiable problem but may suspect that a problem exists. Such code will need a later review in order to verify whether it is correct.

The fourth category, code in need of enhancement, is the least serious. For example, to expedite a release, a programmer might choose to use a simple algorithm rather than a complex, faster algorithm. S/he could make a comment in the code like "This linear search should be replaced by a hash table lookup in a future version of the software." Although it might not be absolutely necessary to correct such a problem, it would be nice to know about such problems so they can be dealt with in the future.

The fifth category, documentation, refers to changes made to software that will affect the corresponding documentation (user guide, design document, etc.). The documentation department can search for these defects to bring existing documentation in line with the current code.

This standard defines a mechanism for dealing with these five classes of problems. Any occurrence of unfinished code will be preceded by a comment that takes one of the following forms (where "_" denotes a single space):



;_#defect#severe_;
;_#defect#functional_;
;_#defect#suspect_;
;_#defect#enhancement_;
;_#defect#documentation_;

It is important to use all lower case and verify the correct spelling so it is easy to find these comments using a text editor search or a tool like grep. Obviously, a separate comment explaining the situation must follow these comments in the source code.

Examples:




; #defect#suspect ;
; #defect#enhancement ;
; #defect#documentation ;

Notice the use of comment delimiters (the semicolon) on both sides even though assembly language, doesn't require them.

Enforced Rule:: If a module contains some defects that cannot be immediately removed because of time or other constraints, the program will insert a standardized comment before the code so that it is easy to locate such problems in the future. The five standardized comments are ";_#defect#severe_;, ";_#defect#functional_;", ";_#defect#suspect_;", ";_#defect#enhancement_;", and ";_#defect#documentation_;" where "_" denotes a single space. The spelling and spacing should be exact so it is easy to search for these strings in the source tree.

6.5 Cross References in Code to Other Documents

In many instances a section of code might be intrinsically tied to some other document. For example, you might refer the reader to the user document or the design document within your comments in a program. This document proposes a standard way to do this so that it is relatively easy to locate cross references appearing in source code. The technique is similar to that for defect reporting, except the comments take the form:



		;  text #link#location text ;

"Text" is optional and represents arbitrary text (although it is really intended for embedding html commands to provide hyperlinks to the specified document). "Location" describes the document and section where the associated information can be found.



Examples:

; #link#User's Guide Section 3.1 ;
; #link#Program Design Document, Page 5 ;
; #link#Funcs.pas module, "xyz" function ;
; <A HREF="DesignDoc.html#xyzfunc"> #link#xyzfunc
</a> ;

Guideline:: If a module contains some cross references to other documents, there should be a comment that takes the form "; text #link#location text ;" that provides the reference to that other document. In this comment "text" represents some optional text (typically reserved for html tags) and "location" is some descriptive text that describes the document (and a position in that document) related to the current section of code in the program.

7.0 Names, Instructions, Operators, and Operands

Although program features like good comments, proper spacing of statements, and good modularization can help yield programs that are more readable; ultimately, a programmer must read the instructions in a program to understand what it does. Therefore, do not underestimate the importance of making your statements as readable as possible. This section deals with this issue.

7.1 Names

According to studies done at IBM, the use of high-quality identifiers in a program contributes more to the readability of that program than any other single factor, including high-quality comments. The quality of your identifiers can make or break your program; program with high-quality identifiers can be very easy to read, programs with poor quality identifiers will be very difficult to read. There are very few "tricks" to developing high-quality names; most of the rules are nothing more than plain old-fashion common sense. Unfortunately, programmers (especially C/C++ programmers) have developed many arcane naming conventions that ignore common sense. The biggest obstacle most programmers have to learning how to create good names is an unwillingness to abandon existing conventions. Yet their only defense when quizzed on why they adhere to (existing) bad conventions seems to be "because that's the way I've always done it and that's the way everybody else does it."

The aforementioned researchers at IBM developed several programs with the following set of attributes:

Bad comments, bad names
Bad comments, good names
Good comments, bad names
Good comments, good names

As should be obvious, the programs that had bad comments and names were the hardest to read; likewise, those programs with good comments and names were the easiest to read. The surprising results concerned the other two cases. Most people assume good comments are more important than good names in a program. Not only did IBM find this to be false, they found it to be really false.

As it turns out, good names are even more important that good comments in a program. This is not to say that comments are unimportant, they are extremely important; however, it is worth pointing out that if you spend the time to write good comments and then choose poor names for your program's identifiers, you've damaged the readability of your program despite the work you've put into your comments. Quickly read over the following code:



                mov     ax, SignedValue
                cwd
                add     ax, -1
                rcl     dx, 1
                mov     AbsoluteValue, dx

Question: What does this code compute and store in the AbsoluteValue variable?

The sign extension of SignedValue.
The negation of SignedValue.
The absolute value of SignedValue.
A boolean value indicating that the result is positive or negative.
Signum(SignedValue) (-1, 0, +1 if neg, zero, pos).
Ceil(SignedValue)
Floor(SignedValue)

The obvious answer is the absolute value of SignedValue. This is also incorrect. The correct answer is signum:



                mov     ax, SignedValue ;Get value to check.
                cwd                     ;DX = FFFF if neg, 0000 otherwise.
                add     ax, 0ffffh      ;Carry=0 if ax is zero, one
otherwise.
                rcl     dx, 1           ;DX = FFFF if AX is neg, 0 if
ax=0,
                mov     Signum, dx      ; 1 if ax>0.

Granted, this is a tricky piece of code[16]. Nonetheless, even without the comments you can probably figure out what the code sequence does even if you can't figure out how it does it:



                mov     ax, SignedValue
                cwd
                add     ax, 0ffffh
                rcl     dx, 1
                mov     Signum, dx

Based on the names alone you can probably figure out that this code computes the signum function. This is the "understanding 80% of the code" referred to earlier. Note that you don't need misleading names to make this code unphathomable. Consider the following code that doesn't trick you by using misleading names:



                mov     ax, x
                cwd
                add     ax, 0ffffh
                rcl     dx, 1
                mov     y, dx

This is a very simple example. Now imagine a large program that has many names. As the number of names increase in a program, it becomes harder to keep track of them all. If the names themselves do not provide a good clue to the meaning of the name, understanding the program becomes very difficult.

Enforced Rule:: All identifiers appearing in an assembly language program must be descriptive names whose meaning and use are clear.

Since labels (i.e., identifiers) are the target of jump and call instructions, a typical assembly language program will have a large number of identifiers. Therefore, it is tempting to begin using names like "label1, label2, label3, ..." Avoid this temptation! There is always a reason you are jumping to some spot in your code. Try to describe that reason and use that description for your label name.

Rule:: Never use names like "Lbl0, Lbl1, Lbl2, ..." in your program.

7.1.1 Naming Conventions

Naming conventions represent one area in Computer Science where there are far too many divergent views (program layout is the other principle area). The primary purpose of an object's name in a programming language is to describe the use and/or contents of that object. A secondary consideration may be to describe the type of the object. Programmers use different mechanisms to handle these objectives. Unfortunately, there are far too many "conventions" in place, it would be asking too much to expect any one programmer to follow several different standards. Therefore, this standard will apply across all languages as much as possible.

The vast majority of programmers know only one language - English. Some programmers know English as a second language and may not be familiar with a common non-English phrase that is not in their own language (e.g., rendezvous). Since English is the common language of most programmers, all identifiers should use easily recognizable English words and phrases.

Rule:: All identifiers that represent words or phrases must be English words or phrases.

7.1.2 Alphabetic Case Considerations

A case-neutral identifier will work properly whether you compile it with a compiler that has case sensitive identifiers or case insensitive identifiers. In practice, this means that all uses of the identifiers must be spelled exactly the same way (including case) and that no other identifier exists whose only difference is the case of the letters in the identifier. For example, if you declare an identifier "ProfitsThisYear" in Pascal (a case-insensitive language), you could legally refer to this variable as "profitsThisYear" and "PROFITSTHISYEAR". However, this is not a case-neutral usage since a case sensitive language would treat these three identifiers as different names. Conversely, in case-sensitive languages like C/C++, it is possible to create two different identifiers with names like "PROFITS" and "profits" in the program. This is not case-neutral since attempting to use these two identifiers in a case insensitive language (like Pascal) would produce an error since the case-insensitive language would think they were the same name.

Enforced Rule:: All identifiers must be "case-neutral."

Different programmers (especially in different languages) use alphabetic case to denote different objects. For example, a common C/C++ coding convention is to use all upper case to denote a constant, macro, or type definition and to use all lower case to denote variable names or reserved words. Prolog programmers use an initial lower case alphabetic to denote a variable. Other comparable coding conventions exist. Unfortunately, there are so many different conventions that make use of alphabetic case, they are nearly worthless, hence the following rule:

Rule:: You should never use alphabetic case to denote the type, classification, or any other program-related attribute of an identifier (unless the language's syntax specifically requires this).

There are going to be some obvious exceptions to the above rule, this document will cover those exceptions a little later. Alphabetic case does have one very useful purpose in identifiers - it is useful for separating words in a multi-word identifier; more on that subject in a moment.

To produce readable identifiers often requires a multi-word phrase. Natural languages typically use spaces to separate words; we can not, however, use this technique in identifiers. Unfortunatelywritingmultiwordidentifiers makesthemalmostimpossibletoreadifyoudonotdosomethingtodistiguishtheindividualwords (Unfortunately writing multiword identifiers makes them almost impossible to read if you do not do something to distinguish the individual words). There are a couple of good conventions in place to solve this problem. This standard's convention is to capitalize the first alphabetic character of each word in the middle of an identifier.

Rule:: Capitalize the first letter of interior words in all multi-word identifiers.

Note that the rule above does not specify whether the first letter of an identifier is upper or lower case. Subject to the other rules governing case, you can elect to use upper or lower case for the first symbol, although you should be consistent throughout your program.

Lower case characters are easier to read than upper case. Identifiers written completely in upper case take almost twice as long to recognize and, therefore, impair the readability of a program. Yes, all upper case does make an identifier stand out. Such emphasis is rarely necessary in real programs. Yes, common C/C++ coding conventions dictate the use of all upper case identifiers. Forget them. They not only make your programs harder to read, they also violate the first rule above.

Rule:: Avoid using all upper case characters in an identifier.

7.1.3 Abbreviations

The primary purpose of an identifier is to describe the use of, or value associated with, that identifier. The best way to create an identifier for an object is to describe that object in English and then create a variable name from that description. Variable names should be meaningful, concise, and non-ambiguous to an average programmer fluent in the English language. Avoid short names. Some research has shown that programs using identifiers whose average length is 10-20 characters are generally easier to debug than programs with substantially shorter or longer identifiers.

Avoid abbreviations as much as possible. What may seem like a perfectly reasonable abbreviation to you may totally confound someone else. Consider the following variable names that have actually appeared in commercial software:

NoEmployees, NoAccounts, pend

The "NoEmployees" and "NoAccounts" variables seem to be boolean variables indicating the presence or absence of employees and accounts. In fact, this particular programmer was using the (perfectly reasonable in the real world) abbreviation of "number" to indicate the number of employees and the number of accounts. The "pend" name referred to a procedure's end rather than any pending operation.

Programmers often use abbreviations in two situations: they're poor typists and they want to reduce the typing effort, or a good descriptive name for an object is simply too long. The former case is an unacceptable reason for using abbreviations. The second case, especially if care is taken, may warrant the occasional use of an abbreviation.

Guideline:: Avoid all identifier abbreviations in your programs. When necessary, use standardized abbreviations or ask someone to review your abbreviations. Whenever you use abbreviations in your programs, create a "data dictionary" in the comments near the names' definition that provides a full name and description for your abbreviation.

The variable names you create should be pronounceable. "NumFiles" is a much better identifier than "NmFls". The first can be spoken, the second you must generally spell out. Avoid homonyms and long names that are identical except for a few syllables. If you choose good names for your identifiers, you should be able to read a program listing over the telephone to a peer without overly confusing that person.

Rule:: All identifiers should be pronounceable (in English) without having to spell out more than one letter.

7.1.4 The Position of Components Within an Identifier

When scanning through a listing, most programmers only read the first few characters of an identifier. It is important, therefore, to place the most important information (that defines and makes this identifier unique) in the first few characters of the identifier. So, you should avoid creating several identifiers that all begin with the same phrase or sequence of characters since this will force the programmer to mentally process additional characters in the identifier while reading the listing. Since this slows the reader down, it makes the program harder to read.

Guideline:: Try to make most identifiers unique in the first few character positions of the identifier. This makes the program easier to read.
Corollary:: Never use a numeric suffix to differentiate two names.

Many C/C++ Programmers, especially Microsoft Windows programmers, have adopted a formal naming convention known as "Hungarian Notation." To quote Steve McConnell from Code Complete: "The term 'Hungarian' refers both to the fact that names that follow the convention look like words in a foreign language and to the fact that the creator of the convention, Charles Simonyi, is originally from Hungary." One of the first rules given concerning identifiers stated that all identifiers are to be English names. Do we really want to create "artificially foreign" identifiers? Hungarian notation actually violates another rule as well: names using the Hungarian notation generally have very common prefixes, thus making them harder to read.

Hungarian notation does have a few minor advantages, but the disadvantages far outweigh the advantages. The following list from Code Complete and other sources describes what's wrong with Hungarian notation:

Hungarian notation generally defines objects in terms of basic machine types rather than in terms of abstract data types.
Hungarian notation combines meaning with representation. One of the primary purposes of high level language is to abstract representation away. For example, if you declare a variable to be of type integer, you shouldn't have to change the variable's name just because you changed its type to real.
Hungarian notation encourages lazy, uninformative variable names. Indeed, it is common to find variable names in Windows programs that contain only type prefix characters, without an descriptive name attached.
Hungarian notation prefixes the descriptive name with some type information, thus making it harder for the programming to find the descriptive portion of the name.

Guideline:: Avoid using Hungarian notation and any other formal naming convention that attaches low-level type information to the identifier.

Although attaching machine type information to an identifier is generally a bad idea, a well thought-out name can successfully associate some high-level type information with the identifier, especially if the name implies the type or the type information appears as a suffix. For example, names like "PencilCount" and "BytesAvailable" suggest integer values. Likewise, names like "IsReady" and "Busy" indicate boolean values. "KeyCode" and "MiddleInitial" suggest character variables. A name like "StopWatchTime" probably indicates a real value. Likewise, "CustomerName" is probably a string variable. Unfortunately, it isn't always possible to choose a great name that describes both the content and type of an object; this is particularly true when the object is an instance (or definition of) some abstract data type. In such instances, some additional text can improve the identifier. Hungarian notation is a raw attempt at this that, unfortunately, fails for a variety of reasons.

A better solution is to use a suffix phrase to denote the type or class of an identifier. A common UNIX/C convention, for example, is to apply a "_t" suffix to denote a type name (e.g., size_t, key_t, etc.). This convention succeeds over Hungarian notation for several reasons including (1) the "type phrase" is a suffix and doesn't interfere with reading the name, (2) this particular convention specifies the class of the object (const, var, type, function, etc.) rather than a low level type, and (3) It certainly makes sense to change the identifier if it's classification changes.

Guideline:: If you want to differentiate identifiers that are constants, type definitions, and variable names, use the suffixes "_c", "_t", and "_v", respectively.
Rule:: The classification suffix should not be the only component that differentiates two identifiers.

Can we apply this suffix idea to variables and avoid the pitfalls? Sometimes. Consider a high level data type "button" corresponding to a button on a Visual BASIC or Delphi form. A variable name like "CancelButton" makes perfect sense. Likewise, labels appearing on a form could use names like "ETWWLabel" and "EditPageLabel". Note that these suffixes still suffer from the fact that a change in type will require that you change the variable's name. However, changes in high level types are far less common than changes in low-level types, so this shouldn't present a big problem.

7.1.5 Names to Avoid

Avoid using symbols in an identifier that are easily mistaken for other symbols. This includes the sets {"1" (one), "I" (upper case "I"), and "l" (lower case "L")}, {"0" (zero) and "O" (upper case "O")}, {"2" (two) and "Z" (upper case "Z")}, {"5" (five) and "S" (upper case "S")}, and ("6" (six) and "G" (upper case "G")}.

Guideline:: Avoid using symbols in identifiers that are easily mistaken for other symbols (see the list above).

Avoid misleading abbreviations and names. For example, FALSE shouldn't be an identifier that stands for "Failed As a Legitimate Software Engineer." Likewise, you shouldn't compute the amount of free memory available to a program and stuff it into the variable "Profits".

Rule:: Avoid misleading abbreviations and names.

You should avoid names with similar meanings. For example, if you have two variables "InputLine" and "InputLn" that you use for two separate purposes, you will undoubtedly confuse the two when writing or reading the code. If you can swap the names of the two objects and the program still makes sense, you should rename those identifiers. Note that the names do not have to be similar, only their meanings. "InputLine" and "LineBuffer" are obviously different but you can still easily confuse them in a program.

Rule:: Do not use names with similar meanings for different objects in your programs.

In a similar vein, you should avoid using two or more variables that have different meanings but similar names. For example, if you are writing a teacher's grading program you probably wouldn't want to use the name "NumStudents" to indicate the number of students in the class along with the variable "StudentNum" to hold an individual student's ID number. "NumStudents" and "StudentNum" are too similar.

Rule:: Do not use similar names that have different meanings.

Avoid names that sound similar when read aloud, especially out of context. This would include names like "hard" and "heart", "Knew" and "new", etc. Remember the discussion in the section above on abbreviations, you should be able to discuss your problem listing over the telephone with a peer. Names that sound alike make such discussions difficult.

Guideline:: Avoid homonyms in identifiers.

Avoid misspelled words in names and avoid names that are commonly misspelled. Most programmers are notoriously bad spellers (look at some of the comments in our own code!). Spelling words correctly is hard enough, remembering how to spell an identifier incorrectly is even more difficult. Likewise, if a word is often spelled incorrectly, requiring a programer to spell it correctly on each use is probably asking too much.

Guideline:: Avoid misspelled words and names that are often misspelled in identifiers.

If you redefine the name of some library routine in your code, another program will surely confuse your name with the library's version. This is especially true when dealing with standard library routines and APIs.

Enforced Rule:: Do not reuse existing standard library routine names in your program unless you are specifically replacing that routine with one that has similar semantics (i.e., don't reuse the name for a different purpose).

7.2 Instructions, Directives, and Pseudo-Opcodes

Your choice of assembly language sequences, the instructions themselves, and your choice of directives and pseudo-opcodes can have a big impact on the readability of your programs. The following subsections discuss these problems.

7.2.1 Choosing the Best Instruction Sequence

Like any language, you can solve a given problem using a wide variety of solutions involving different instruction sequences. As a continuing example, consider (again) the following code sequence:



                mov     ax, SignedValue ;Get value to check.
                cwd                     ;DX = FFFF if neg, 0000 otherwise.
                add     ax, 0ffffh      ;Carry=0 if ax is zero.
                rcl     dx, 1           ;DX = FFFF if AX is neg, 0 if AX=0,
                mov     Signum, dx      ; 1 if AX>0.

Now consider the following code sequence that also computes the signum function:



                mov     ax, SignedValue ;Get value to check.
                cmp     ax, 0           ;Check the sign.
                je      GotSignum       ;We're done if it's zero.
                mov     ax, 1           ;Assume it was positive.
                jns     GotSignum       ;Branch if it was positive.
                neg     ax              ;Else return -1 for negative
values.
GotSignum:      mov     Signum, ax

Yes, the second version is longer and slower. However, an average person can read the instruction sequence and figure out what it's doing; hence the second version is much easier to read than the first. Which sequence is best? Unless speed or space is an extremely critical factor and you can show that this routine is in the critical execution path, then the second version is obviously better. There is a time and a place for tricky assembly code; however, it's rare that you would need to pull tricks like this throughout your code.

So how does one choose appropriate instruction sequences when there are many possible ways to accomplish the same task? The best way is to ensure that you have a choice. Although there are many different ways to accomplish an operation, few people bother to consider any instruction sequence other than the first one that comes to mind. Unfortunatley, the "best" instruction sequence is rarely the first instruction sequence that comes to most people's minds[17]. In order to make a choice, you have to have a choice to make. That means you should create at least two different code sequences for a given operation if there is ever a question concerning the readability of your code. Once you have at least two versions, you can choose between them based on your needs at hand. While it is impractical to "write your program twice" so that you'll have a choice for every sequence of instructions in the program, you should apply this technique to particularly bothersome code sequences.

Guideline:: For particularly difficult to understand sections of code, try solving the problem several different ways. Then choose the most easily understood solution for actual incorporation into your program.

One problem with the above suggestion is that you're often too close to your own work to make decisions like "this code isn't too hard to understand, I don't have to worry about it." It is often a good idea to have someone else review your code and point out those sections they find hard to understand[18].

Guideline:: Take advantage of reviews to determine those sections of code in your program that may need to be rewritten to make them easier to understand.

7.2.2 Control Structures

Ralph Griswold[19] once said (roughly) the following about C, Pascal, and Icon: "C makes it easy to write hard to read programs[20], Pascal makes it hard to write hard to read programs, and Icon makes it easy to write easy to read programs." Assembly language can be summed up like this: "Assembly language makes it hard to write easy to read programs and easy to write hard to read programs." It takes considerable discipline to write readable assembly language programs; but it can be done. Sadly, most assembly code you find today is extremely poorly written. Indeed, that state of affairs is the whole reason for this document. Once you get past issues like comments and naming conventions, issues like program control flow and data structure design have among the largest impacts on program readability. Since most assembly languages lack structured control flow constructs, this is one area where undisciplined programmers can really show how poorly they can write their code. One need look no farther than the public domain code on the Internet, or at Microsoft's sample code for that matter[21], to see abundant examples of poorly written assembly language code.

Fortunately, with a little discipline it is possible to write readable assembly language programs. How you design your control structures can have a big impact on the readability of your programs. The best way to do this can be summed up in two words: avoid spaghetti.

Spaghetti code is the name given to a program that has a large number of intertwined branches and branch targets within a code sequence. Consider the following example:



                jmp     L1
L1:             mov     ax, 0
                jmp     L2
L3:             mov     ax, 1
                jmp     L2
L4:             mov     ax, -1
                jmp     L2
L0:             mov     ax, x
                cmp     ax, 0
                je      L1
                jns     L3
                jmp     L4
L2:             mov     y, ax

This code sequence, by the way, is our good friend the Signum function. It takes a few moments to figure this out because as you manually trace through the code you find yourself spending more time following jumps around than you do looking at code that computes useful results. Now this is a rather extreme example, but it is also fairly short. A longer code sequence code become just as obfuscated with even fewer branches all over the place.

Spaghetti code is given this name because it resembles a bowl of spaghetti. That is, if we consider a control path in the program a spaghetti noodle, spaghetti code contains lots of intertwined branches into and out of different sections of the program. Needless to say, most spaghetti programs are difficult to understand, generally contain lots of bugs, and are often inefficient (don't forget that branches are among the slowest executing instructions on most modern processors).

So how to we resolve this? Easy by physically adopting structured programming techniques in assembly language code. Of course, 80x86 assembly language doesn't provide if..then..else..endif, while..endwhile, repeat..until, and other such statements[22], but we can certainly simulate them. Consider the following high level language code sequence:



        if(expression) then

                << statements to execute if expression is true
>>

        else

                << statements to execute if expression is false
>>

        endif

Almost any high level language program can figure out what this type of statement will do. Assembly languge programmers should leverage this knowledge by attempting to organize their code so it takes this same form. Specifically, the assembly language version should look something like the following:



                << Assembly code to compute value of expression
>>

                JNxx    ElsePart ;xx is the opposite condition we want to
check.

                << Assembly code corresponding to the then portion
>>

                jmp     AroundElsePart

ElsePart:
                << Assembly code corresponding to the else portion
>>

AroundElsePart:

For an concrete example, consider the following:



        if ( x=y ) then

                write( 'x = y' );

        else

                write( 'x <> y' );

        endif;

; Corresponding Assembly Code:

                mov     ax, x
                cmp     ax, y
                jne     ElsePart

                print   "x=y",nl
                jmp     IfDone

ElsePart:       print   "x<>y",nl
IfDone:

While this may seem like the obvious way to organize an if..then.else..endif statement, it is suprising how many people would naturally assume they've got to place the else part somewhere else in the program as follows:



                mov     ax, x
                cmp     ax, y
                jne     ElsePart

                print   "x=y",nl
IfDone:
                 .
                 .
                 .
ElsePart:       print   "x<>y",nl
                jmp     IfDone

This code organization makes the program more difficult to follow. Most programmers have a HLL background and despite a current assignment, they still work mostly in HLLs. Assembly language programs will be more readable if they mimic the HLL control constructs[23].

For similar reasons, you should attempt to organize your assembly code that simulates while loops, repeat..until loops, for loops, etc., so that the code resembles the HLL code (for example, a while loop should physically test the condition at the beginning of the loop with a jump at the bottom of the loop).

Rule:: Attempt to design your programs using HLL control structures. The organization of the assembly code that you write should physically resemble the organization of some corresponding HLL program.

Assembly language offers you the flexibility to design arbitrary control structures. This flexibility is one of the reasons good assembly language programmers can write better code than that produced by a compiler (that can only work with high level control structures). However, keep in mind that a fast program doesn't have to contain the tightest possible code in every sequence. Execution speed is nearly irrelevant in most parts of the program. Sacrificing readability for speed isn't a big win in most of the program.

Guideline:: Avoid control structures that don't easily map to well-known high level language control structures in your assembly language programs. Deviant control structures should only appear in small sections of code when efficiency demands their use.

7.2.3 Instruction Synonyms

MASM defines several synonyms for common instructions. This is especially true for the conditional jump and "set on condition code" instructions. For example, JA and JNBE are synonyms for one another. Logically, one could use either instruction in the same context. However, the choice of synonym can have an impact on the readability of a code sequence. To see why, consider the following:



                if( x <= y ) then
                    << true statements>>
                else
                    << false statements>>
                endif

; Assembly code:

                mov     ax, x
                cmp     ax, y
                ja      ElsePart
                << true code >>
                jmp     IfDone

ElsePart:       << false code >>
IfDone:

When someone reads this program, the "JA" statement skips over the true portion. Unfortunately, the "JA" instruction gives the illusion we're checking to see if something is greater than something else; in actuality, we're testing to see if some condition is less than or equal, not greater than. As such, this code sequence hides some of the original intent of high level algorithm. One solution is to swap the false and true portions of the code:



                mov     ax, x
                cmp     ax, y
                jbe     ThenPart
                << false code >>
                jmp     IfDone

ThenPart:       << true code >>
IfDone:

This code sequence uses the conditional jump that matches the high level algorithm's test (less than or equal). However, this code is now organized in a non-standard fashion (it's an if..else..then..endif statement). This hurts the readability more than using the proper jump helped it. Now consider the following solution:



                mov     ax, x
                cmp     ax, y
                jnbe    ElsePart
                << true code >>
                jmp     IfDone

ElsePart:       << false code >>
IfDone:

This code is organized in the traditional if..then..else..endif fashion. Instead of using JA to skip over the then portion, it uses JNBE to do so. This helps indicate, in a more readable fashion, that the code falls through on below or equal and branches if it is not below or equal. Since the instruction (JNBE) is easier to relate to the original test (<=) than JA, this makes this section of code a little more readable.

Rule:: When skipping over some code because some condition has failed (e.g., you fall into the code because the condition is successful), always use a conditional jump of the form "JNxx" to skip over the code section. For example, to fall through to a section of code if one value is less than another, use the JNL or JNB instruction to skip over the code. Of course, if you are testing a negative condition (e.g., testing for equality) then use an instruction of the form Jx to skip over the code.

8.0 Data Types

Prior to the arrival of MASM, most assemblers provided very little capability for declaring and allocated complex data types. Generally, you could allocate bytes, words, and other primitive machine structures. You could also set aside a block of bytes. As high level languages improved their ability to declare and use abstract data types, assembly language fell farther and farther behind. Then MASM came along and changed all that[24]. Unfortunately, many long time assembly language programmers haven't bothered to learn the new MASM syntax for things like arrays, structures, and other data types. Likewise, many new assembly language programmers don't bother learning and using these data typing facilities because they're already overwhelmed by assembly language and want to minimize the number of things they've got to learn. This is really a shame because MASM data typing is one of the biggest improvements to assembly language since using mnemonics rather than binary opcodes for machine level programming.

Note that MASM is a "high-level" assembler. It does things assemblers for other chips won't do like checking the types of operands and reporting errors if there are mismatches. Some people, who are used to assemblers on other machines find this annoying. However, it's a great idea in assembly language for the same reason it's a great idea in HLLs[25]. These features have one other beneficial side-effect: they help other understand what you're trying to do in your programs. It should come as no surprise, then, that this style guide will encourage the use of these features in your assembly language programs.

8.1 Defining New Data Types with TYPEDEF

MASM provides a small number of primitive data types. For typical applications, bytes, sbytes, words, swords, dwords, sdwords, and various floating point formats are the most commonly used scalar data types available. You may construct more abstract data types by using these built-in types. For example, if you want a character, you'd normally declare a byte variable. If you wanted a 16-bit integer, you'd typically use the sword (or word) declaration. Of course, when you encounter a variable declaration like "answer byte ?" it's a little difficult to figure out what the real type is. Do we have a character, a boolean, a small integer, or something else here? Ultimately it doesn't matter to the machine; a byte is a byte is a byte. It's interpretation as a character, boolean, or integer value is defined by the machine instructions that operate on it, not by the way you define it. Nevertheless, this distinction is important to someone who is reading your program (perhaps they are verifying that you've supplied the correct instruction sequence for a given data object). MASM's typedef directive can help make this distinction clear.

In its simplest form, the typedef directive behaves like a textequ. It let's you replace one string in your program with another. For example, you can create the following definitions with MASM:



char            typedef byte
integer         typedef sword
boolean         typedef byte
float           typedef real4
IntPtr          typedef far ptr integer

Once you have declared these names, you can define char, integer, boolean, and float variables as follows:



MyChar          char    ?
I               integer ?
Ptr2I           IntPtr  I
IsPresent       boolean ?
ProfitsThisYear float   ?

Rule:: Use the existing MASM data types as type building blocks. For most data types you create in your program, you should declare explicit type names using the typedef directive. There is really no excuse for using the built-in primitive types[26].

8.2 Creating Array Types

MASM provides an interesting facility for reserving blocks of storage - the DUP operator. This operator is unusual (among assembly languages) because its definition is recursive. The basic definition is (using HyGram notation):



	DupOperator = expression ws* 'DUP' ws* '('  ws*  operand ws* ') %%

Note that "expression" expands to a valid numeric value (or numeric expression), "ws*" means "zero or more whitespace characters" and "operand" expands to anything that is legal in the operand field of a MASM word/dw, byte/db, etc., directive[27]. One would typically use this operator to reserve a block of memory locations as follows:



ArrayName       integer 16 dup (?)      ;Declare array of 16 words.

This declaration would set aside 16 contiguous words in memory.

The interesting thing about the DUP operator is that any legal operand field for a directive like byte or word may appear inside the parentheses, including additional DUP expressions. The DUP operator simply says "duplicate this object the specified number of times." For example, "16 dup (1,2)" says "give me 16 copies of the value pair one and two. If this operand appeared in the operand field of a byte directive, it would reserve 32 bytes, containing the alternating values one and two.

So what happens if we apply this technique recursively? Well, "4 dup ( 3 dup (0))" when read recursively says "give me four copies of whatever is inside the (outermost) parentheses. This turns out to be the expression "3 dup (0)" that says "give me three zeros." Since the original operand says to give four copies of three copies of a zero, the end result is that this expression produces 12 zeros. Now consider the following two declarations:



Array1		integer	4 dup ( 3 dup (0))
Array2		integer	12 dup (0)

Both definitions set aside 12 integers in memory (initializing each to zero). To the assembler these are nearly identical; to the 80x86 they are absolutely identical. To the reader, however, they are obviously different. Were you to declare two identical one-dimensional arrays of integers, using two different declarations makes your program inconsistent and, therefore, harder to read.

However, we can exploit this difference to declare multidimensional arrays. The first example above suggests that we have four copies of an array containing three integers each. This corresponds to the popular row-major array access function. The second example above suggests that we have a single dimensional array containing 12 integers.

Guideline:: Take advantage of the recursive nature of the DUP operator to declare multidimensional arrays in your programs.

8.3 Declaring Structures in Assembly Language

MASM provides an excellent facility for declaring and using structures, unions, and records[28]; for some reason, many assembly language programmers ignore them and manually compute offsets to fields within structures in their code. Not only does this produce hard to read code, the result is nearly unmaintainable as well.

Rule:: When a structure data type is appropriate in an assembly language program, declare the corresponding structure in the program and use it. Do not compute the offsets to fields in the structure manually, use the standard structure "dot-notation" to access fields of the structure.

One problem with using structures occurs when you access structure fields indirectly (i.e., through a pointer). Indirect access always occurs through a register (for near pointers) or a segment/register pair (for far pointers). Once you load a pointer value into a register or register pair, the program doesn't readily indicate what pointer you are using. This is especially true if you use the indirect access several times in a section of code without reloading the register(s). One solution is to use a textequ to create a special symbol that expands as appropriate. Consider the following code:




s               struct
a               Integer ?
b               integer ?
s               ends
                 .
                 .
                 .
r               s       {}
ptr2r           dword   r
                 .
                 .
                 .
                les     di, ptr2r
                mov     ax, es:[di].s.a         ;No indication this is
ptr2r!
                 .
                 .
                 .
                mov     es:[di].b, bx           ;Really no indication!

Now consider the following:



s               struct
a               Integer ?
b               integer ?
s               ends

sPtr            typedef far ptr s
                 .
                 .
                 .
q               s       {}
r               sPtr    q
r@              textequ <es:[di].s>
                 .
                 .
                 .
                les     di, ptr2r
                mov     ax, r@.a        ;Now it's clear this is using r
                 .
                 .
                 .
                mov     r@.b, bx        ;Ditto.

Note that the "@" symbol is a legal identifier character to MASM, hence "r@" is just another symbol. As a general rule you should avoid using symbols like "@" in identifiers, but it serves a good purpose here - it indicates we've got an indirect pointer. Of course, you must always make sure to load the pointer into ES:DI when using the textequ above. If you use several different segment/register pairs to access the data that "r" points at, this trick may not make the code anymore readable since you will need several text equates that all mean the same thing.

8.4 Data Types and the UCR Standard Library

The UCR Standard Library for 80x86 Assembly Language Programmers (version 2.0 and later) provide a set of macros that let you declare arrays and pointers using a C-like syntax. The following example demonstrates this capability:



var
	integer	i, j, array[10], array2[10][3], *ptr2Int
	char	*FirstName, LastName[32]
endvar

These declarations emit the following assembly code:




i               integer ?
j               integer 25
array           integer 10 dup (?)
array2          integer 10 dup ( 3 dup (?))
ptr2Int         dword   ?

LastName        char    32 dup (?)
Name            dword   LastName

For those comfortable with C/C++ (and other HLLs) the UCR Standard Library declarations should look very familiar. For that reason, their use is a good idea when writing assembly code that uses the UCR Standard Library.

[1] Someone who uses TASM all the time may think this is fine, but consider those individuals who don't. They're not familiar with TASM's funny syntax so they may find several statements in this program to be confusing.

[2] Simplified segment directives do make it easier to write assembly language programs that interface with HLLs. However, they only complicate matters in stand-alone assembly language programs.

[3] A lot of old-time programmers believe that assembly instructions should appear in upper case. A lot of this has to do with the fact that old IBM mainframes and certain personal computers like the original Apple II only supported upper case characters.

[4] Note, by the way, that I am not suggesting that this error checking/handling code should be absent from the program. I am only suggesting that it not interrupt the normal flow of the program while reading the code.

[5] Doing so (inserting an 80x86 tutorial into your comments) would wind up making the program less readable to those who already know assembly language since, at the very least, they'd have to skip over this material; at the worst they'd have to read it (wasting their time).

[6] Or whatever other natural language is in use at the site(s) where you develop, maintain, and use the software.

[7] You may substitute the local language in your area if it is not English.

[8] In fact, just the opposite is true. One should get concerned if both implementations are identical. This would suggest poor planning on the part of the program's author(s) since the same routine must now be maintained in two different programs.

[9] Or whatever make program you normally use.

[10] This happens because shorter function invariable have stronger coupling, leading to integration errors.

[11] Technically, this is incorrect. In some very special cases MASM will generate better machine code if you define your variables before you use them in a program.

[12] Older assemblers on other machines have required the labels to begin in column one, the mnemonic to appear in a specific column, the operand to appear in a specific column, etc. These were examples of fixed-formant source line translators.

[13] See the next section concerning comments for more information.

[14] This document will simply use the term comments when refering to standalone comments.

[15] Since the label, mnemonic, and operand fields are all optional, it is legal to have a comment on a line by itself.

[16] It could be worse, you should see what the "superoptimizer" outputs for the signum function. It's even shorter and harder to understand than this code.

[17] This is true regardless of what metric you use to determine the "best" code sequence.

[18] Of course, if the program is a class assignment, you may want to check your instructor's cheating policy before showing your work to your classmates!

[19] The designer of the SNOBOL4 and Icon programming languages.

[20] Note that this does not infer that it is hard to write easy to read C programs. Only that if one is sloppy, one can easily write something that is near impossible to understand.

[21] Okay, this is a cheap shot. In fact, most of the assembly code on this planet is poorly written.

[22] Actually, MASM 6.x does, but we'll ignore that fact here.

[23] Sometimes, for performance reasons, the code sequence above is justified since straight-line code executes faster than code with jumps. If the program rarely executes the ELSE portion of an if statement, always having to jump over it could be a waste of time. But if you're optimizing for speed, you will often need to sacrafice readability.

[24] Okay, MASM wasn't the first, but such techniques were not popularized until MASM appeared.

[25] Of course, MASM gives you the ability to override this behavoir when necessary. Therefore, the complaints from "old-hand" assembly language programmers that this is insane are groundless.

[26] Okay, using some assembler that doesn't support typedef would probably be a good excuse!

[27] For brevity, the productions for these objects do not appear here.

[28] MASM records are equivalent to bit fields in C/C++. They are not equivalent to records in Pascal.

1.0 - Introduction
1.1 - ADDHEX.ASM
1.2 - Graphics Example
1.3 - S.COM Example
1.4 - Intended Audience
1.5 - Readability Metrics
1.6 - How to Achieve Readability
1.7 - How This Document is Organized
1.8 - Guidelines, Rules, Enforced Rules, and Exceptions
1.9 - Source Language Concerns
2.0 - Program Organization
2.1 - Library Functions
2.2 - Common Object Modules
2.3 - Local Modules
2.4 - Program Make Files
3.0 - Module Organization
3.1 - Module Attributes
3.1.1 - Module Cohesion
3.1.2 - Module Coupling
3.1.3 - Physical Organization of Modules
3.1.4 - Module Interface
4.0 - Program Unit Organization
4.1 - Routine Cohesion
4.1.1 - Routine Coupling
4.1.2 - Routine Size
4.2 - Placement of the Main Procedure and Data
5.0 - Statement Organization
6.0 - Comments
6.1 - What is a Bad Comment?
6.2 - What is a Good Comment?
6.3 - Endline vs. Standalone Comments
6.4 - Unfinished Code
6.5 - Cross References in Code to Other Documents
7.0 - Names, Instructions, Operators, and Operands
7.1 - Names
7.1.1 - Naming Conventions
7.1.2 - Alphabetic Case Considerations
7.1.3 - Abbreviations
7.1.4 - The Position of Components Within an Identifier
7.1.5 - Names to Avoid
7.2 - Instructions, Directives, and Pseudo-Opcodes
7.2.1 - Choosing the Best Instruction Sequence
7.2.2 - Control Structures
7.2.3 - Instruction Synonyms
8.0 - Data Types
8.1 - Defining New Data Types with TYPEDEF
8.2 - Creating Array Types
8.3 - Declaring Structures in Assembly Language
8.4 - Data Types and the UCR Standard Library

Don't waste time on formatting assembly code by hand any more! Try SourceFormatX 80x86 Asm Code Formatter today!