Assembly Language Style Guidelines
- Style Guidelines for Assembly Language Programmers
- 1.0 - Introduction
- 1.1 - ADDHEX.ASM
- 1.2 - Graphics Example
- 1.3 - S.COM Example
- 1.4 - Intended Audience
- 1.5 - Readability Metrics
- 1.6 - How to Achieve Readability
- 1.7 - How This Document is Organized
- 1.8 - Guidelines, Rules, Enforced Rules, and Exceptions
- 1.9 - Source Language Concerns
- 2.0 - Program Organization
- 2.1 - Library Functions
- 2.2 - Common Object Modules
- 2.3 - Local Modules
- 2.4 - Program Make Files
- 3.0 - Module Organization
- 3.1 - Module Attributes
- 3.1.1 - Module Cohesion
- 3.1.2 - Module Coupling
- 3.1.3 - Physical Organization of Modules
- 3.1.4 - Module Interface
- 4.0 - Program Unit Organization
- 4.1 - Routine Cohesion
- 4.1.1 - Routine Coupling
- 4.1.2 - Routine Size
- 4.2 - Placement of the Main Procedure and Data
- 5.0 - Statement Organization
- 6.0 - Comments
- 6.1 - What is a Bad Comment?
- 6.2 - What is a Good Comment?
- 6.3 - Endline vs. Standalone Comments
- 6.4 - Unfinished Code
- 6.5 - Cross References in Code to Other Documents
- 7.0 - Names, Instructions, Operators, and Operands
- 7.1 - Names
- 7.1.1 - Naming Conventions
- 7.1.2 - Alphabetic Case Considerations
- 7.1.3 - Abbreviations
- 7.1.4 - The Position of Components Within an Identifier
- 7.1.5 - Names to Avoid
- 7.2 - Instructions, Directives, and Pseudo-Opcodes
- 7.2.1 - Choosing the Best Instruction Sequence
- 7.2.2 - Control Structures
- 7.2.3 - Instruction Synonyms
- 8.0 - Data Types
- 8.1 - Defining New Data Types with TYPEDEF
- 8.2 - Creating Array Types
- 8.3 - Declaring Structures in Assembly Language
- 8.4 - Data Types and the UCR Standard Library
Style Guidelines for Assembly Language Programmers
1.0 Introduction
Most people consider assembly language programs difficult to read. While
there are a multitude of reasons why people feel this way, the primary reason is
that assembly language does not make it easy for programmers to write readable
programs. This doesn't mean it's impossible to write readable programs, only
that it takes an extra effort on the part of an assembly language programmer to
produce readable code.
To demonstrate some common problems with assembly language programs, consider
the following programs and program segments. These are actual assembly language
programs taken from the Internet, and each example demonstrates a separate
problem. (By the way, these examples were not chosen to embarrass the original
authors. These programs are typical of the assembly language source code found
on the Internet.)
1.1 ADDHEX.ASM
%TITLE "Sums TWO hex values"
IDEAL
DOSSEG
MODEL small
STACK 256
DATASEG
exitCode db 0
prompt1 db 'Enter value 1: ', 0
prompt2 db 'Enter value 2: ', 0
string db 20 DUP (?)
CODESEG
EXTRN StrLength:proc
EXTRN StrWrite:proc, StrRead:proc, NewLine:proc
EXTRN AscToBin:proc, BinToAscHex:proc
Start:
mov ax,@data
mov ds,ax
mov es,ax
mov di, offset prompt1
call GetValue
push ax
mov di, offset prompt2
call GetValue
pop bx
add ax,bx
mov cx,4
mov di, offset string
call BinToAscHex
call StrWrite
Exit:
mov ah,04Ch
mov al,[exitCode]
int 21h
PROC GetValue
call StrWrite
mov di, offset string
mov cl,4
call StrRead
call NewLine
call StrLength
mov bx,cx
mov [word bx + di], 'h'
call AscToBin
ret
ENDP GetValue
END Start
Well, the biggest problem with this program should be fairly obvious - it has
absolutely no comments other than the title of the program. Another problem is
the fact that strings that prompt the user appear in one part of the program and
the calls that print those strings appear in another. While this is typical
assembly language programming, it still makes the program harder to read.
Another, relatively minor, problem is that it uses TASM's "less-than" IDEAL
syntax[1].
This program also uses the MASM/TASM "simplified" segment directives. How
typical of Microsoft to name a feature that adds complexity to a product
"simplified." It turns out that programs that use the standard segmentation
directives will be easier to read[2].
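To make the contrast concrete, here is a sketch of ADDHEX's data declarations and its GetValue routine rewritten in MASM-mode syntax with the standard segment directives. It is an illustration, not a tested port: the segment names dseg and cseg are our own choice, the Start code is omitted, and the library routines (StrWrite, StrRead, AscToBin, and so on) are assumed to be near procedures that behave exactly as they do in the original.

```asm
; ADDHEX's data and GetValue routine in MASM-mode syntax, using the
; standard segment directives instead of DATASEG/CODESEG.
; The segment names (dseg, cseg) are illustrative choices.

dseg            segment para public 'data'
exitCode        db      0
prompt1         db      'Enter value 1: ', 0
prompt2         db      'Enter value 2: ', 0
string          db      20 dup (?)
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:dseg

; Library routines, assumed near (as they would be under MODEL small).
                extrn   StrLength:near, StrWrite:near, StrRead:near
                extrn   NewLine:near, AscToBin:near, BinToAscHex:near

; GetValue- Prints the prompt that DI points at, reads a string, and
;           returns the converted value in AX. The appended 'h' suffix
;           presumably tells AscToBin to treat the digits as hex.
GetValue        proc
                call    StrWrite
                mov     di, offset string
                mov     cl, 4
                call    StrRead
                call    NewLine
                call    StrLength
                mov     bx, cx
                mov     word ptr [bx+di], 'h'   ;IDEAL: mov [word bx + di], 'h'
                call    AscToBin
                ret
GetValue        endp
cseg            ends
```

The only line whose syntax differs materially is the store of the 'h' suffix: IDEAL mode puts the size override inside the brackets, while MASM mode uses the word ptr operator in front of the memory operand.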
Before moving on, it is worthwhile to point out two good features of this
program (with respect to readability). First, the programmer chose a reasonable
set of names for the procedures and variables this program uses (I'll assume the
author of this code segment is also the author of the library routines it
calls). Another positive aspect to this program is that the mnemonic and operand
fields are nicely aligned.
Okay, after complaining about how hard this code is to read, how about a more
readable version? The following program is, arguably, more readable than the
version above. Arguably, because this version uses the UCR Standard Library v2.0
and it assumes that the reader is familiar with features of that particular
library.
;**************************************************
;
; AddHex-
;
; This simple program reads two integer values from
; the user, computes their sum, and prints the
; result to the display.
;
; This example uses the "UCR Standard Library for
; 80x86 Assembly Language Programmers v2.0"
;
; Randall Hyde
; 12/13/96
title AddHex
.xlist
include ucrlib.a
includelib ucrlib.lib
.list
cseg segment para public 'code'
assume cs:cseg
; GetInt-
;
; This function reads an integer value from the keyboard and
; returns that value in the AX register.
;
; This routine traps illegal values (either too large or
; incorrect digits) and makes the user re-enter the value.
GetInt textequ <call GetInt_p>
GetInt_p proc
push dx ;DX holds the error flag.
GetIntLoop: mov dx, false ;Assume no error.
try ;Trap any errors.
FlushGetc ;Force input from a new line.
geti ;Read the integer.
except $Conversion ;Trap if bad characters.
print "Illegal numeric conversion, please re-enter", nl
mov dx, true
except $Overflow ;Trap if # too large.
print "Value out of range, please re-enter.",nl
mov dx, true
endtry
cmp dx, true
je GetIntLoop
pop dx
ret
GetInt_p endp
Main proc
InitExcept
print 'Enter value 1: '
GetInt
mov bx, ax
print 'Enter value 2: '
GetInt
print cr, lf, 'The sum of the two values is '
add ax, bx
puti
putcr
Quit: CleanUpEx
ExitPgm ;DOS macro to quit program.
Main endp
cseg ends
sseg segment para stack 'stack'
stk db 256 dup (?)
sseg ends
zzzzzzseg segment para public 'zzzzzz'
LastBytes db 16 dup (?)
zzzzzzseg ends
end Main
It is well worth pointing out that this code does quite a bit more than the
original AddHex program. In particular, it validates the user's input, something
the original program did not do. If one were to drop the validation and mimic
the original program more closely, the program could be simplified to the
following:
print nl, 'Enter value 1: '
Geti
mov bx, ax
print nl, 'Enter value 2: '
Geti
add ax, bx
putcr
puti
putcr
In this example, the two sample solutions improved the readability of the
program by adding comments, formatting the program a little bit better, and by
using the high-level features of the UCR Standard Library to simplify the coding
and keep output string literals with the statements that print them.
1.2 Graphics Example
The following program segment comes from a much larger program named
"MODEX.ASM" on the net. It deals with setting up the color graphics display.
;===================================
;SET_POINT (Xpos%, Ypos%, ColorNum%)
;===================================
;
; Plots a single Pixel on the active display page
;
; ENTRY: Xpos = X position to plot pixel at
; Ypos = Y position to plot pixel at
; ColorNum = Color to plot pixel with
;
; EXIT: No meaningful values returned
;
SP_STACK STRUC
DW ?,? ; BP, DI
DD ? ; Caller
SETP_Color DB ?,? ; Color of Point to Plot
SETP_Ypos DW ? ; Y pos of Point to Plot
SETP_Xpos DW ? ; X pos of Point to Plot
SP_STACK ENDS
PUBLIC SET_POINT
SET_POINT PROC FAR
PUSHx BP, DI ; Preserve Registers
MOV BP, SP ; Set up Stack Frame
LES DI, d CURRENT_PAGE ; Point to Active VGA Page
MOV AX, [BP].SETP_Ypos ; Get Line # of Pixel
MUL SCREEN_WIDTH ; Get Offset to Start of Line
MOV BX, [BP].SETP_Xpos ; Get Xpos
MOV CX, BX ; Copy to extract Plane # from
SHR BX, 2 ; X offset (Bytes) = Xpos/4
ADD BX, AX ; Offset = Width*Ypos + Xpos/4
MOV AX, MAP_MASK_PLANE1 ; Map Mask & Plane Select Register
AND CL, PLANE_BITS ; Get Plane Bits
SHL AH, CL ; Get Plane Select Value
OUT_16 SC_Index, AX ; Select Plane
MOV AL,[BP].SETP_Color ; Get Pixel Color
MOV ES:[DI+BX], AL ; Draw Pixel
POPx DI, BP ; Restore Saved Registers
RET 6 ; Exit and Clean up Stack
SET_POINT ENDP
Unlike the previous example, this one has lots of comments. Indeed, the
comments are not bad. However, this particular routine suffers from its own set
of problems. First, most of the instructions, register names, and identifiers
appear in upper case. Upper case characters are much harder to read than lower
case letters. Considering the extra work involved in entering upper case letters
into the computer, it's a real shame to see this type of mistake in a program[3].
Another big problem with this particular code segment is that the author didn't
align the label field, the mnemonic field, and the operand field very well (it's
not horrible, but it's bad enough to affect the readability of the program).
Here is an improved version of the program:
;===================================
;
;SetPoint (Xpos%, Ypos%, ColorNum%)
;
;
; Plots a single Pixel on the active display page
;
; ENTRY: Xpos = X position to plot pixel at
; Ypos = Y position to plot pixel at
; ColorNum = Color to plot pixel with
;
; ES:DI = Screen base address (??? I added this without really
; knowing what is going on here [RLH]).
;
; EXIT: No meaningful values returned
;
dp textequ <dword ptr>
Color textequ <[bp+6]>
YPos textequ <[bp+8]>
XPos textequ <[bp+10]>
public SetPoint
SetPoint proc far
push bp
mov bp, sp
push di
les di, dp CurrentPage ;Point at active VGA Page
mov ax, YPos ;Get line # of Pixel
mul ScreenWidth ;Get offset to start of line
mov bx, XPos ;Get offset into line
mov cx, bx ;Save for plane computations
shr bx, 2 ;X offset (bytes)= XPos/4
add bx, ax ;Offset=Width*YPos + XPos/4
mov ax, MapMaskPlane1 ;Map mask & plane select reg
and cl, PlaneBits ;Get plane bits
shl ah, cl ;Get plane select value
out_16 SCIndex, ax ;Select plane
mov al, Color ;Get pixel color
mov es:[di+bx], al ;Draw pixel
pop di
pop bp
ret 6
SetPoint endp
Most of the changes here were purely mechanical: reducing the number of upper
case letters in the program, spacing the program out better, adjusting some
comments, etc. Nevertheless, these small, subtle changes have a big impact on
how easy the code is to read (at least, to an experienced assembly language
programmer).
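The least mechanical change is the replacement of the SP_STACK structure with the Color, YPos, and XPos text equates. Their offsets follow from the order of the pushes. The comment block below sketches the stack frame, assuming (as the ret 6 and the original structure imply) that the caller pushes XPos, YPos, and Color in that order and reaches SetPoint with a far call. The calling sequence is a hypothetical example, not code from MODEX.ASM.

```asm
; SetPoint's stack frame after "push bp / mov bp, sp / push di":
;
;   [bp+10]  XPos           (pushed first by the caller)
;   [bp+8]   YPos
;   [bp+6]   Color          (byte value in a word slot; only AL is used)
;   [bp+4]   return segment (pushed by the far call)
;   [bp+2]   return offset  (pushed by the far call)
;   [bp+0]   caller's BP    (push bp)
;   [bp-2]   caller's DI    (push di, after BP is set up)
;
; "ret 6" discards the three word parameters on the way out.
;
; A hypothetical caller that plots pixel (100, 50) in color 15.
; Assumes SetPoint is known to be far (e.g., extrn SetPoint:far).

                mov     ax, 100
                push    ax              ;XPos
                mov     ax, 50
                push    ax              ;YPos
                mov     ax, 15
                push    ax              ;Color
                call    SetPoint        ;Far call; SetPoint pops its parameters
```

Because the callee cleans up the stack, the caller does not adjust SP after the call; adding a parameter to SetPoint would require changing both the equates and the ret operand.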
1.3 S.COM Example
The following code sequence came from a program labelled "S.COM" that was
also found in an archive on the internet.
;Get all file names matching filespec and set up tables
GetFileRecords:
mov dx, OFFSET DTA ;Set up DTA
mov ah, 1Ah
int 21h
mov dx, FILESPEC ;Get first file name
mov cl, 37h
mov ah, 4Eh
int 21h
jnc FileFound ;No files. Try a different filespec.
mov si, OFFSET NoFilesMsg
call Error
jmp NewFilespec
FileFound:
mov di, OFFSET fileRecords ;DI -> storage for file names
mov bx, OFFSET files ;BX -> array of files
sub bx, 2
StoreFileName:
add bx, 2 ;For all files that will fit,
cmp bx, (OFFSET files) + NFILES*2
jb @@L1
sub bx, 2
mov [last], bx
mov si, OFFSET tooManyMsg
jmp DoError
@@L1:
mov [bx], di ;Store pointer to status/filename in files[]
mov al, [DTA_ATTRIB] ;Store status byte
and al, 3Fh ;Top bit is used to indicate file is marked
stosb
mov si, OFFSET DTA_NAME ;Copy file name from DTA to filename storage
call CopyString
inc di
mov si, OFFSET DTA_TIME ;Copy time, date and size
mov cx, 4
rep movsw
mov ah, 4Fh ;Next filename
int 21h
jnc StoreFileName
mov [last], bx ;Save pointer to last file entry
mov al, [keepSorted] ;If returning from EXEC, need to resort files?
or al, al
jz DisplayFiles
jmp Sort0
The primary problem with this program is the formatting. The label fields
overlap the mnemonic fields (in almost every instance), the operand fields of
the various instructions are not aligned, there are very few blank lines to
organize the code, the programmer makes excessive use of "local" label names,
and, although not prevalent, there are a few items that are all uppercase
(remember, upper case characters are harder to read). This program also makes
considerable use of "magic numbers," especially with respect to the function
codes passed to DOS.
Another subtle problem with this program is the way it organizes control
flow. At a couple of points the code checks whether an error condition exists
(no matching files, or too many files to process). In the normal case, it
branches around error-handling code that the author placed in the middle of
the routine. Unfortunately, this interrupts the flow of the program. Most
readers will want to see a straight-line version of the program's typical
operation without having to worry about the details of error conditions. As
organized, this code forces the reader to skip over seldom-executed code in
order to follow what happens in the common case[4].
Here is a slightly improved version of the above program:
;Get all file names matching filespec and set up tables
GetFileRecords: mov dx, offset DTA ;Set up DTA
DOS SetDTA
; Get the first file that matches the specified filename (that may
; contain wildcard characters). If no such file exists, then
; we've got an error.
mov dx, FileSpec
mov cl, 37h
DOS FindFirstFile
jc FileNotFound
; As long as there are more files matching our file spec (which may contain
; wildcard characters), get the file information and place it in the
; "files" array. Each time through the "StoreFileName" loop we've got
; a new file name via a call to DOS' FindNextFile function (FindFirstFile
; for the first iteration). Store the info concerning the file away and
; move on to the next file.
mov di, offset fileRecords ;DI -> storage for file names
mov bx, offset files ;BX -> array of files
sub bx, 2 ;Special case for 1st iteration
StoreFileName: add bx, 2
cmp bx, (offset files) + NFILES*2
jae TooManyFiles
; Store away the pointer to the status/filename in the files[] array.
; Note that the H.O. bit of the status byte indicates that the file is
; marked.
mov [bx], di ;Store pointer in files[]
mov al, [DTAattrib] ;Store status byte
and al, 3Fh ;Clear file is marked bit
stosb
; Copy the filename from the DTA storage area to the space we've set aside.
mov si, offset DTAname
call CopyString
inc di ;Skip zero byte (???).
mov si, offset DTAtime ;Copy time, date and size
mov cx, 4
rep movsw
; Move on to the next file and try again.
DOS FindNextFile
jnc StoreFileName
; After processing the last file entry, do some clean up.
; (1) Save pointer to last file entry.
; (2) If returning from EXEC, we may need to resort and display the files.
mov [last], bx
mov al, [keepSorted]
or al, al
jz DisplayFiles
jmp Sort0
; Jump down here if there were no files to process.
FileNotFound: mov si, offset NoFilesMsg
call Error
jmp NewFilespec
; Jump down here if there were too many files to process.
TooManyFiles: sub bx, 2
mov [last], bx
mov si, offset tooManyMsg
jmp DoError
This improved version dispenses with the local labels and formats the code
better by aligning all the statement fields and inserting blank lines into the
code. It also eliminates many of the uppercase characters that appeared in the
previous version. Another improvement is that this code moves the error
handling code out of the main stream of this code segment, allowing the reader
to follow the typical execution in a more linear fashion.
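The improved version relies on a DOS macro and on symbolic names (SetDTA, FindFirstFile, FindNextFile) whose definitions the fragment does not show. One way to supply them, sketched here in MASM syntax, is a set of equates for the INT 21h function numbers plus a two-instruction macro. The attribute equates with the fa prefix are hypothetical names, added only to show how the remaining magic number, 37h, could be spelled out.

```asm
; Hypothetical equates for the DOS (INT 21h) functions S.COM uses.
; DOS selects the function from the value in AH.
SetDTA          equ     1Ah     ;Set disk transfer address to DS:DX
FindFirstFile   equ     4Eh     ;DS:DX -> filespec, CX = attribute mask
FindNextFile    equ     4Fh     ;Continue the search begun by 4Eh

; Hypothetical names for the file attribute bits; 37h is their OR.
faReadOnly      equ     01h
faHidden        equ     02h
faSystem        equ     04h
faDirectory     equ     10h
faArchive       equ     20h
faSearchMask    equ     faReadOnly or faHidden or faSystem or faDirectory or faArchive

; DOS- Load a DOS function number into AH and call the DOS dispatcher.
DOS             macro   FuncNum
                mov     ah, FuncNum
                int     21h
                endm
```

With these definitions, DOS FindFirstFile expands to mov ah, 4Eh followed by int 21h, and mov cl, 37h could be written as mov cl, faSearchMask (01h + 02h + 04h + 10h + 20h = 37h), which documents the intent without an endline comment.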
1.4 Intended Audience
Of course, an assembly language program is going to be nearly unreadable to
someone who doesn't know assembly language. This is true for almost any
programming language. In the examples above, it's doubtful that the "improved"
versions are really any more readable than the original version if you don't
know 80x86 assembly language. Perhaps the improved versions are more aesthetically pleasing
in a generic sense, but if you don't know 80x86 assembly language it's doubtful
you'd make any more sense of the second version than the first. Other than
burying a tutorial on 80x86 assembly language in a program's comments, there is
no way to address this problem[5].
In view of the above, it makes sense to define the "intended audience" for our
assembly language programs. Such a person should:
- Be a reasonably competent 80x86 assembly language programmer.
- Be reasonably familiar with the problem the assembly language program is
attempting to solve.
- Fluently read English[6].
- Have a good grasp of high level language concepts.
- Possess appropriate knowledge for someone working in the field of Computer
Science (e.g., understands standard algorithms and data structures,
understands basic machine architecture, and understands basic discrete
mathematics).
1.5 Readability Metrics
One has to ask "What is it that makes one program more readable than
another?" In other words, how do we measure the "readability" of a program? The
usual metric, "I know a well-written program when I see one," is inappropriate;
for most people, this translates to "If your programs look like my better
programs then they are readable, otherwise they are not." Obviously, such a
metric is of little value since it changes with every person.
To develop a metric for measuring the readability of an assembly language
program, the first thing we must ask is "Why is readability important?" This
question has a simple (though somewhat flippant) answer: Readability is
important because programs are read (furthermore, a line of code is typically
read ten times more often than it is written). To expand on this, consider the
fact that most programs are read and maintained by other programmers (Steve
McConnell claims that up to ten generations of maintenance programmers work on a
typical real-world program before it is rewritten; furthermore, they spend up
to 60% of their effort on that code simply figuring out how it works). The more
readable your programs are, the less time these other people will have to spend
figuring out what your program does. Instead, they can concentrate on adding
features or correcting defects in the code.
For the purposes of this document, we will define a "readable" program as one
that has the following trait:
- A "readable" program is one that a competent programmer (one who is
familiar with the problem the program is attempting to solve) can pick up,
without ever having seen the program before, and fully comprehend the entire
program in a minimal amount of time.
That's a tall order! This definition doesn't sound very difficult to achieve,
but few non-trivial programs ever really achieve this status. This definition
suggests that an appropriate programmer (i.e., one who is familiar with the
problem the program is trying to solve) can pick up a program, read it at their
normal reading pace (just once), and fully comprehend the program. Anything less
is not a "readable" program.
Of course, in practice, this definition is unusable since very few programs
reach this goal. Part of the problem is that programs tend to be quite long and
few human beings are capable of managing a large number of details in their head
at one time. Furthermore, no matter how well-written a program may be, "a
competent programmer" does not suggest that the programmer's IQ is so high they
can read a statement and fully comprehend its meaning without expending much
thought. Therefore, we must define readability, not as a boolean entity, but as a
scale. Although truly unreadable programs exist, there are many "readable"
programs that are less readable than other programs. Therefore, perhaps the
following definition is more realistic:
- A readable program is one that consists of one or more modules. A
competent programmer should be able to pick up a given module in that program
and achieve an 80% comprehension level by expending no more than an average of
one minute for each statement in that module.
An 80% comprehension level means that the programmer can correct bugs in the
program and add new features to the program without making mistakes due to a
misunderstanding of the code at hand.
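Stated as a formula (our notation, not the author's), with N the number of statements in the module being studied and T the time a competent programmer needs to reach 80% comprehension of it, the definition requires

$$T \le N \times 1\ \text{minute}.$$

For a hypothetical 300-statement module, that is at most 300 minutes, or five hours, of study.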
1.6 How to Achieve Readability
The "I'll know one when I see one" metric for readable programs provides a
big hint concerning how one should write programs that are readable. As pointed
out earlier, the "I'll know it when I see it" metric suggests that an individual
will consider a program to be readable if it is very similar to (good) programs
that this particular person has written. This suggests an important trait that
readable programs must possess: consistency. If all programmers were to write
programs using a consistent style, they'd find programs written by others to be
similar to their own, and, therefore, easier to read. This single goal is the
primary purpose of this paper - to suggest a consistent standard that everyone
will follow.
Of course, consistency by itself is not good enough. Consistently bad
programs are not particularly easy to read. Therefore, one must carefully
consider the guidelines to use when defining an all-encompassing standard. The
purpose of this paper is to create such a standard. However, don't get the
impression that the material in this document appears simply because it
sounded good at the time or because of some personal preference. The
material in this paper comes from several software engineering texts on the
subject (including Elements of Programming Style, Code Complete, and Writing
Solid Code), nearly 20 years of personal assembly language programming
experience, and a set of generic programming guidelines developed for
Information Management Associates, Inc.
This document assumes consistent usage by its readers. Therefore, it
concentrates on a lot of mechanical and psychological issues that affect the
readability of a program. For example, uppercase letters are harder to read than
lower case letters (this is a well-known result from psychology research). It
takes longer for a human being to recognize uppercase characters; therefore, an
average human being will take more time to read text written all in upper case.
Hence, this document suggests that one should avoid the use of uppercase
sequences in a program. Many of the other issues appearing in this document are
in a similar vein; they suggest minor changes to the way you might write your
programs that make it easier for someone to recognize some pattern in your code,
thus aiding in comprehension.
1.7 How This Document is Organized
This document follows a top-down discussion of readability. It starts with
the concept of a program. Then it discusses modules. From there it works its way
down to procedures. Then it talks about individual statements. Beyond that, it
talks about components that make up statements (e.g., instructions, names, and
operators). Finally, this paper concludes by discussing some orthogonal
issues.
Section Two discusses programs in general. It primarily discusses
documentation that must accompany a program and the organization of source
files. It also discusses, briefly, configuration management and source code
control issues. Keep in mind that figuring out how to build a program (make,
assemble, link, test, debug, etc.) is important. If your reader fully
understands the "heapsort" algorithm you are using, but cannot build an
executable module to run, they still do not fully understand your program.
Section Three discusses how to organize modules in your program in a logical
fashion. This makes it easier for others to locate sections of code and
organizes related sections of code together so someone can easily find important
code and ignore unimportant or unrelated code while attempting to understand
what your program does.
Section Four discusses the use of procedures within a program. This is a
continuation of the theme in Section Three, although at a lower, more detailed,
level.
Section Five discusses the program at the level of the statement. This
(large) section provides the meat of this proposal. Most of the rules this paper
presents appear in this section.
Section Six discusses those items that make up a statement (labels, names,
instructions, operands, operators, etc.). This is another large section that
presents a large number of rules one should follow when writing readable
programs. This section discusses naming conventions, appropriateness of
operators, and so on.
Section Seven discusses data types and other related topics.
Section Eight covers miscellaneous topics that the previous sections did not
cover.
1.8 Guidelines, Rules, Enforced Rules, and Exceptions
Not all rules are equally important. For example, a rule that you check the
spelling of all the words in your comments is probably less important than
suggesting that the comments all be in English[7].
Therefore, this paper uses three designations to keep things straight:
Guidelines, Rules, and Enforced Rules.
A Guideline is a suggestion. It is a rule you should follow unless you can
verbally defend why you should break the rule. As long as there is a good,
defensible reason, you should feel no apprehension about violating a Guideline.
Guidelines exist in order to encourage consistency in areas where there are no
good reasons for choosing one methodology over another. You shouldn't violate a
Guideline just because you don't like it; doing so will make your programs
inconsistent with respect to other programs that do follow the Guideline (and,
therefore, harder to read). However, you shouldn't lose any sleep because you
violated a Guideline.
Rules are much stronger than Guidelines. You should never break a rule unless
there is some external reason for doing so (e.g., making a call to a library
routine forces you to use a bad naming convention). Whenever you feel you must
violate a rule, you should verify that it is reasonable to do so in a peer
review with at least two peers. Furthermore, you should explain in the program's
comments why it was necessary to violate the rule. Rules are just that -- rules
to be followed. However, there are certain situations where it may be necessary
to violate the rule in order to satisfy external requirements or even make the
program more readable.
Enforced Rules are the toughest of the lot. You should never violate an
enforced rule. If there is ever a true need to do this, then you should consider
demoting the Enforced Rule to a simple Rule rather than treating the violation
as a reasonable alternative.
An Exception is exactly that, a known example where one would commonly
violate a Guideline, Rule, or (very rarely) Enforced Rule. Although exceptions
are rare, the old adage "Every rule has its exceptions..." certainly applies to
this document. The Exceptions point out some of the common violations one might
expect.
Of course, the categorization of Guidelines, Rules, Enforced Rules, and
Exceptions herein is one man's opinion. At some organizations, this
categorization may require reworking depending on the needs of that
organization.
1.9 Source Language Concerns
This document will assume that the entire program is written in 80x86
assembly language. Although this organization is rare in commercial
applications, this assumption will, in no way, invalidate these guidelines.
Other guidelines exist for various high level languages (including a set written
by this paper's author). You should adopt a reasonable set of guidelines for the
other languages you use and apply these guidelines to the 80x86 assembly
language modules in the program.