PHP Coding Guidelines
by Brandon Blackmoor [2007-02-15]
Using these guidelines
If you want to make a local copy of these guidelines and use them as your own
you are perfectly free to do so. If you find any errors please email me the changes
so I can merge them in.
Contents
Introduction
This PHP coding guidelines document contains the standard
conventions that the author follows and recommends that others follow. It covers
filenames, file organization, indentation, comments, declarations, statements,
white space, naming conventions, programming practices, and includes code
examples for most of the guidelines presented. The majority of these guidelines
are based on the Sun Java guidelines and the PEAR guidelines.
Why have code conventions?
Code conventions are important to programmers for a number of reasons:
- 80% of the lifetime cost of a piece of software goes to maintenance.
- Hardly any software is maintained for its whole life by the original
author.
- Code conventions improve the readability of the software, allowing
engineers to understand new code more quickly and thoroughly.
It doesn't matter what your guidelines are, so long as everyone understands
and sticks to them. These PHP coding guidelines are based largely on
Sun's Code Conventions for the Java
Programming Language. Deviations from the Sun code conventions are largely a
result of the interaction of PHP with other web server applications, primarily
databases.
Enforcing the code conventions
If you are running a joint project, you might consider putting your foot down
and basically refuse to accept any code that does not conform to your published
guidelines, regardless of its technical merit (or lack thereof). This may sound
somewhat draconian and off-putting to developers at first, but once everyone
begins to code to the guidelines you'll find it a lot easier to manage the
project and you'll get more work done in the same time. It will require a lot of
effort from some of your developers who don't want to abandon their coding
habits, but at the end of the day different coding styles will cause more
problems than they're worth.
Editor settings
Indentation (tabs vs. spaces)
- Indent using 4 spaces for each level.
- Do not use tabs, use spaces. Most editors can substitute spaces for tabs.
- Indent as much as needed, but no more. There are no arbitrary rules as to
the maximum indenting level. If the indenting level is more than 4 or 5 levels
you may think about factoring out code.
Justification
- When people use different tab settings the code is impossible to read or
print. One space is always one space: a tab can be anything.
- Most PHP applications use 4 spaces.
- Many editors use 4 spaces by default.
- As much as people would like to limit the maximum indentation levels, it
never seems to work in general. We'll trust that programmers will choose
wisely how deeply to nest code.
Examplefunction func()
{
if (something bad)
{
if (another thing bad)
{
while (more input)
{
...
}
}
}
}
Line endings
The three major operating systems (Unix, Windows, and Mac OS) use different
ways to represent the end of a line. Unix systems use the newline character
(\n ), Mac systems use a carriage return (\r ), and
Windows systems are terribly wasteful in that they use a carriage return
followed by a line feed (\r\n ). Ensure that your editor is saving
files in the UNIX format. This means lines are terminated with a linefeed (\n),
not with a carriage return/linefeed (\r\n) as they are on Win32, or a carriage
return (\r) as they are on the Mac. Any decent editor (such as TextPad) is able
to do this, but it might not be the default. If you develop on Windows (and many
people do), either set up your editor to save files in Unix format or run a
utility that converts between the two file formats.
Terminate the last line of the file, unless it's the only line.
Names
Make names fit
Names are the heart of programming. In the past people believed knowing
someone's true name gave them magical power over that person. If you can think
up the true name for something, you give yourself and the people coming after
power over the code.
A name is the result of a long deep thought process about the ecology it
lives in. Only a programmer who understands the system as a whole can create a
name that "fits" with the system. If the name is appropriate everything fits
together naturally, relationships are clear, meaning is derivable, and reasoning
from common human expectations works as expected.
If you find all your names could be Thing and DoIt then you should probably
revisit your design.
Hungarian notation
Do not use any form of Hungarian notation. A lot of textbooks (particulary
those about Visual C++) will try to drum Hungarian notation into your head.
Basically, this means having rules such as pre-pending g_ to global variables, i
to integer data types, and do on. Not only is a lot of this irrelevant to PHP
(being an untyped language), it also produces variable names such as
$g_iPersonAge which, to be honest, are not easy to read at a glance and often
end up looking like a group of random characters strung together without rhyme
or reason.
Justification
- Hungarian notation is directly counter to encapsulation, one of the basic
principles of good programming.
- Hungarian notation is essentially a method of encoding comments into a
variable name. There are two reasons this is a bad idea. First, it is
redundant: the proper place to comment on the type and usage of a variable in
when the variable is declared. Second, it leads to having incorrect
information embedded in the application. What happens when the requirements
change, and $i_postal_code must support not only US zip codes (which are
numbers), but also Canadian postal codes (which have letters)? Now
$i_postal_code is a string. Will the programmer go through every file and
change the name of this variable everywhere it appears? When deadlines are
looming, tasks like this are postponed... and deadlines are always looming.
- Hungarian notation is the Weapon Of Mass Destruction ™ of obfuscation
techniques, and code obfuscation is both counterproductive and immoral.
Abbreviations
Use whole words -- avoid acronyms and abbreviations unless the acronym is
much more widely used than the long form, such as URL and HTML. When confronted
with a situation where you could use an all upper case abbreviation instead use
an initial upper case letter followed by all lower case letters.
- Do use: GetHtmlStatistic.
- Do not use: GetHTMLStatistic.
- Do not use: GetHypertextMarkupLanguageStatistic.
Justification
- People seem to have very different intuitions when making names containing
abbreviations. It's best to settle on one strategy so the names are absolutely
predictable. Take, for example, NetworkABCKey. Notice how the C from
ABC and K from key are confused. Some people don't mind this and others just
hate it so you'll find different policies in different code so you never know
what to call something.
Exampleclass FluidOz // NOT FluidOZ
class GetHtmlStatistic // NOT GetHTMLStatistic
Class names
Class names should be nouns, in mixed case with the first ketter of each
internal word capitalized. Try to keep your class names simple and
descriptive.
- Use upper case letters as word separators, lower case for the rest of a
word
- First character in a name is upper case
- No underscores ('_').
- Name the class after what it is. If you can't think of what it is, that is
a clue you have not thought through the design well enough.
- Compound names of over three words are a clue your design may be confusing
various entities in your system. Revisit your design. Try a CRC card session
to see if your objects have more responsibilities than they should.
- Avoid the temptation of bringing the name of the class a class derives
from into the derived class's name. A class should stand on its own. It
doesn't matter what it derives from.
- Suffixes are sometimes helpful. For example, if your system uses agents
then naming something DownloadAgent conveys real information.
Exampleclass NameOneTwo
class Name
Class library names
Now that name spaces are becoming more widely implemented, name spaces should
be used to prevent class name conflicts among libraries from different vendors
and groups. When not using name spaces, it's common to prevent class name
clashes by prefixing class names with a unique string. Two characters is often
sufficient, but a longer length is fine. For example, the xTS project used "Xts"
(don't let the shift in capitalization bother you: see Class names).
Example
Jo Johanssen's data structure library could use JJ as a prefix, so
classes could be: class JjNameOneTwo
{
}
Method and function names
Methods should be verbs, in mixed case with the first letter lowercase, with
the first letter of each internal word capitalized. Most methods and functions
performs actions, so the name should make clear what it does, as consisely as
possible: checkForErrors() instead of errorCheck(), dumpDataToFile() instead of
dumpDataFileToDiskAfterAParticularlyHorrendousCrash(). This will also make
functions and data objects more distinguishable.
- Suffixes are sometimes useful:
- Max - to mean the maximum value something can have.
- Cnt - the current count of a running count variable.
- Key - key value.
For example: RetryMax to mean the
maximum number of retries, RetryCnt to mean the current retry count.
- Prefixes are sometimes useful:
- Is - to ask a question about something. Whenever someone sees
Is they will know it's a question.
- Get - get a value.
- Set - set a value.
For example: IsHitRetryLimit.
Exampleclass JjNameOneTwo
{
function setSomething()
{
...
}
function handleError()
{
...
}
}
Variable names
Variable names should be all lowercase, with words separated by underscores.
For example, $current_user is correct, but
$currentuser , $currentUser and
$CurrentUser are not. Variable names should be short yet
meaningful. The choice of a variable name should indicate to the casual observer
the intent of its use. One-character variable names should be avoided except for
temporary variables and loop indices. Common names for temporary variables are
i , j , k , m , and
n for integers; c , d , e for
strings.
- use all lower case letters
- use '_' as the word separator
- do not use 'l' (lowercase 'L') as a temporary variable
- do not use '-' as the word separator
Justification
- This allows a variable name to have the same name as a database column
name, which is a very common practice in PHP.
- 'l' (lowercase 'L') is easily confused with 1 (the number 'one')
- if '-' is used as a word separator, it will generate warnings used with
magic quotes.
Examplefunction HandleError($error_number)
{
$error = new OsError;
$time_of_error = $error->getTimeOfError();
$error_processor = $error->getErrorProcessor($error_number);
}
Example$myarr['foo_bar'] = 'Hello';
print "$myarr[foo_bar] world"; // will output: Hello world
$myarr['foo-bar'] = 'Hello';
print "$myarr[foo-bar] world"; // warning message
Method argument names
Since function arguments are just variables used in a specific context, they
should follow the same guidelines as variable names. It
should be possible to tell the purpose of a method just by looking at the first
line, e.g. getUserData($username) . By examination, you can make a
good guess that this function gets the user data of a user with the username
passed in the $username argument. Method arguments should be
separated by spaces, both when the function is defined and when it is called.
However, there should not be any spaces between the arguments and the
opening/closing parentheses.
Exampleclass NameOneTwo
{
function startYourEngines(&$some_engine, &$another_engine)
{
$this->some_engine = $some_engine;
$this->another_engine = $another_engine;
}
var $some_engine;
var $another_engine;
}
Exampleget_user_data( $username, $password ); // NOT correct: spaces next to parentheses
get_user_data($username,$password); // NOT correct: no spaces between arguments
get_user_data($a, $b); // ambiguous: what do variables $a and $b hold?
get_user_data($username, $password); // correct
Array elements
Since array elements are just variables used in a specific context, they
should follow the same guidelines as variable names.
- Access an array's elements with single or double quotes.
- Don't use quotes within magic quotes
Justification
- Some PHP configurations will output warnings if arrays are used without
quotes except when used within magic quotes
Example$myarr['foo_bar'] = 'Hello';
$element_name = 'foo_bar';
print "$myarr[foo_bar] world"; // will output: Hello world
print "$myarr[$element_name] world"; // will output: Hello world
print "$myarr['$element_name'] world"; // parse error
print "$myarr["$element_name"] world"; // parse error
Constant (define )
names
Constants should be all uppercase with words separated by underscores
('_').
- use all lower case letters
- use '_' as the word separator
Justification
It's tradition for global constants to named this way. You must be careful to
not conflict with other predefined globals.
This capitalization method for constant values (particularly for
language-specific constants) provides the greatest amount of flexibility.
Exampledefine("A_CONSTANT", "Hello world!");
Contant values
- Text fragments should have the first letter capitalized, with the rest in
lower case. This allows capitalization of text fragments to be handled with a
style, based on how that sentence fragment is used.
- Complete sentences should be punctuated as sentences. This means using
capitalizing the first word and including the end punctuation. Note: a text
fragment with a colon at the end is not a complete sentence.
- Text fragments should not have punctuation at the end unless it is a
question, in which case the use of a question mark is acceptable (because this
actually makes the text fragment a complete sentence with an implied
subject/predicate/whatever).
- It is very easy to overuse colons and exclamation points, and having them
as part of the constant limits that constant's reusability, so these should
not be included unless it is absolutely necessary. It rarely is.
Exampledefine("TITLE_CONSTANT", "User profile administration");
define("ERROR_CONSTANT", "This is an error message.");
Formatting
PHP code tags
PHP Tags are used for delimit PHP from HTML in a file. There are several ways
to do this. <?php ... ?> , <? ... ?> ,
<script language="php"> ... </script> , <% ...
%> , and <?=$name?> . Some of these may be turned off
in your PHP settings. Always use <?php ... ?> .
Justification
<?php ?> is always avaliable in any system and setup.
Example<?php print "Hello world"; ?> // correct
<? print "Hello world"; ?> // NOT correct
<script language="php"> print "Hello world"; </script> // NOT correct
<% print "Hello world"; %> // NOT correct
<?=$street?> // NOT correct
One statement per line
There should be only one statement per line.
Braces {...}
This is a subject of great controversy, but we will use a policy that can be
summed up simply: a brace appears alone on a line, left-aligned with its
associated keyword. A closing brace is always in the same column as the
corresponding opening brace. Additonally, if braces can be used, they
must be used. Leaving out braces makes code harder to maintain
in the future, and can also cause bugs that are very difficult to track
down.
Exampleif ($condition)
{
while ($condition)
{
...
}
}
// NOT...
if ($condition)
while ($condition) { // NOT correct
...
}
Justification
- This removes any ambiguity from the control structure, and avoids many
needless hours of debugging later. Even if the body of the contruct is only
one line long, always include the braces.
- Additionally, if you use an editor (such as vi) that supports brace
matching, this style is much easier to use. Let's say you have a large block
of code and want to know where the block ends. You move to the first brace hit
a key and the editor finds the matching brace. To move from block to block you
just need to use cursor down and your brace matching key. No need to move to
the end of the line to match a brace then jerk back and forth.
Exampleif ($very_long_condition && $second_very_long_condition)
{
...
}
elseif (...)
{
...
}
There are those who prefer to leave curly braces off of very simple
if statements, and who are satisfied to convey the conditional
nature of the statement through indentation. The problem with this is twofold:
first, indentation can't be trusted to remain in place. Even if the developers
are wise enough to eschew tabs and use spaces for indentation (and sadly, not
all developers are this wise), code changes. A few lines of
code may be copied from one place and pasted into another, and re-indented to
fit with its new home. Or perhaps it might be added to a nested control
structure, or an additional control structure might be added after the current
un-braced if . All of this is simply an error waiting to happen.
Unless you are the only person who works on your code, and you never touch it
a second time after you type it, leaving the curly braces out will result in
errors.
Parentheses (...)
As a general rule, parentheses should always be used where the possibility of
ambiguity exists. Additionally, there should not be any added spacer between a
parenthesis and its contents. Control statements, such as if ,
for , and while , should have one space after the
keyword prior to the opening parenthesis. Methods should follow the rules laid
out already, i.e. no space between the method name and the opening parenthesis,
and no space between the parentheses and the arguments, but one space between
each argument; generally speaking, this is the only time an opening parenthesis
will not be preceded by a space.
- Do not pad the contents of parentheses with spaces. There should be no
whitespace between the parentheses and their contents.
- Do put a space before and after a parenthetical expression, except when
the first parenthesis follows a function name.
- Put a space between the end of the keyword and the first parenthesis.
- Do not put a space between the end of the function name and the first
parenthesis.
- Do not use parentheses in return statements when it's not necessary.
Justification
- Keywords are not functions. By putting parentheses next to keywords,
keywords and function names are made to look alike.
Exampleif ((condition1 || condition2) || (value1 < value2))
{
...
}
while (condition)
{
...
}
strcmp($s, $s1);
return 1;
Alignment of declaration blocks
A block of declarations may be aligned, but it does not have to be.
Justification
- Aligning the assignments aids clarity.
- Similarly, blocks of initialization of variables may be tabulated.
- The '&' token should be adjacent to the type, not the name.
Do not waste a lot of time aligning block of variable assignments: it can
easily become a neverending task.
Examplevar $date
var& $name
$date = 0;
$name = 0;
Operators and tokens
There are three types of operators. Firstly, there is the unary operator
which operates on only one value; for example '!' (the negation operator) or
'++' (the increment operator). The second group are termed binary operators;
this group contains most of the operators that PHP supports, such as '+', '-',
and so on. A list of binary operators follows below in the section on operator
precedence. The third group is the ternary operator: '... ? ... :
... ', which is an abbreviated form of if...
then... else . Surrounding the operands in ternary expressions with
parentheses is a very good idea.
There should not be space between a unary operator and the thing on which is
operates.
There should always be one space on either side of a token or binary
operator. The only exceptions are commas and semicolons, which should have one
space after, but none before (just like in English).
if ... then ... else
Always use braces.
Exampleif (condition)
{
// comment
...
}
else if (condition)
{
// comment
...
}
else
{
// comment
...
}
If you have else if statements then it is usually a good idea to
always have an else block for finding unhandled cases. Maybe put a
log message in the else even if there is no corrective action taken.
switch
- Falling through a case statement into the next case statement shall be
permitted as long as a comment is included.
- The default case should always be present and trigger an error if
it should not be reached, yet is reached.
- Although case blocks do not use braces, indent them as if they did.
Exampleswitch (...)
{
case 1:
...
// FALL THROUGH
case 2:
$v = get_week_number();
...
break;
default:
}
continue , break , and
... ? ... : ...
continue and break
Continue and break are really disguised gotos so they are covered here.
Continue and break like goto should be used sparingly as they are magic in
code. With a simple spell the reader is beamed to gods know where for some
usually undocumented reason.
The two main problems with continue are:
- It may bypass the test condition.
- It may bypass the increment/decrement expression.
Consider the following example where both problems occur: while (TRUE)
{
...
// A lot of code
...
if (/* some condition */)
{
continue;
}
...
// A lot of code
...
if ( $i++ > STOP_VALUE) break; // WRONG: braces are not optional
}
Note: "A lot of code" is necessary in order that the problem cannot be caught
easily by the programmer.
From the above example, a further rule may be given: mixing continue with
break in the same loop is a sure way to disaster.
This is also a good example of why braces are not optional,
and why we use spaces instead if tabs.
... ?... :...
The trouble with ternary operators is that people usually try and stuff too
much code into them. It is better to avoid ternary operators unless all of the
operands are very short and very simple. When in doubt, use if... then... else .
Documentation
Comments on comments
Comments should tell a story
Consider your comments a story describing the system. Expect your comments to
be extracted by a robot and formed into a man page. Class comments are one part
of the story, method signature comments are another part of the story, method
arguments another part, and method implementation yet another part. All these
parts should weave together and inform someone else at another point of time
just exactly what you did and why.
Document decisions
Comments should document decisions. At every point where you had a choice of
what to do place a comment describing which choice you made and why.
Archeologists will find this the most useful information.
Use headers
Use a document extraction system like phpDocumentor.
Comment layout
Each part of the project has a specific comment layout.
Make gotchas explicit
Explicitly comment variables changed out of the normal control flow or other
code likely to break during maintenance. Embedded keywords are used to point out
issues and potential problems. Consider a robot will parse your comments looking
for keywords, stripping them out, and making a report so people can make a
special effort where needed.
Gotcha keywords
- :TODO: topic -- There is more to do here, don't forget.
- :BUG: [bugid] topic -- There is a Known bug here, explain
it and optionally give a bug ID.
- :KLUDGE: -- When you've done something ugly say so and
explain how you would do it differently next time if you had more time.
- :TRICKY: -- The following code is very tricky so don't go
changing it without thinking.
- :WARNING: -- Beware of something.
- :PARSER: -- Sometimes you need to work around a parser
problem. Document it. The problem may go away eventually.
- :ATTRIBUTE: value -- The general form of an attribute
embedded in a comment. You can make up your own attributes and they'll be
extracted.
Gotcha formatting
- Make the gotcha keyword the first symbol in the comment.
- Comments may consist of multiple lines, but the first line should be a
self-containing, meaningful summary.
- The writer's name and the date of the remark should be part of the
comment. This information is in the source repository, but it can take a quite
a while to find out when and by whom it was added. Often gotchas stick around
longer than they should. Embedding date information allows other programmer to
make this decision. Embedding who information lets us know who to ask.
Example// :TODO: tmh 1996-08-10: possible performance problem
// We should really use a hash table here but for now we'll
// use a linear search.
// :KLUDGE: tmh 1996-08-10: possible unsafe type cast
// We need a cast here to recover the derived type. It should
// probably use a virtual method or template.
See also
See Interface and Implementation Documentation for more
details on how documentation should be laid out.
Interface and implementation documentation
There are two main audiences for documentation:
- Class users
- Class implementors
With a little forethought we can extract both types of documentation directly
from source code.
Class users
Class users need class interface information which when structured correctly
can be extracted directly from a header file. When filling out the header
comment blocks for a class, only include information needed by programmers who
use the class. Don't delve into algorithm implementation details unless the
details are needed by a user of the class. Consider comments in a header file a
man page in waiting.
Class implementors
Class implementors require in-depth knowledge of how a class is implemented.
This comment type is found in the source file(s) implementing a class. Don't
worry about interface issues. Header comment blocks in a source file should
cover algorithm issues and other design decisions. Comment blocks within a
method's implementation should explain even more.
Copyright notice format
"Copyright ©" is redundant. The circle-c symbol is intended to be an optional
replacement for the word "copyright", and "(c)", i.e. c-in-parentheses, has
never been given legal force.
The proper format in ASCII is "Copyright <year(s)> <copyright
owner>", e.g. "Copyright 2002 Merciless Hangover, LLC".
Server configuration
This section contains some guidelines for PHP/Apache configuration.
HTTP_*_VARS
HTTP_*_VARS are either enabled or disabled. When enabled all variables must
be accessed through $HTTP_*_VARS[key]. When disabled all variables can be
accessed by the key name.
- use HTTP_*_VARS when accessing variables.
- use enabled HTTP_*_VARS in PHP configuration.
Justification
- HTTP_*_VARS is available in any configuration.
- HTTP_*_VARS will not conflict with exsisting variables.
- Users can't change variables by passing values.
PHP file extensions
There are lots of different extension variants on PHP files (.html, .php,
.php3, .php4, .phtml, .inc, .class, etc.).
- Always use the extension .php.
- Always use the extension .php for your class and function libraries.
Justification
- The use of .php makes it possible to enable caching on other files than
.php.
- The use of .inc or .class can be a security problem. On most servers,
these extensions aren't set to be run by a parser. If these are accessed they
will be displayed in clear text.
Examplefilename.inc // NOT correct
filename.class // NOT correct
filename.php // correct
filename.inc.php // correct
filename.class.php // correct
Miscellaneous
This section contains some miscellaneous do's and don'ts.
Use if (0) to comment out very large
code blocks
Sometimes very large blocks of code need to be commented out for testing. The
easiest way to do this is with an if (0) block: function example()
{
great looking code
if (0)
{
lots of code
}
more code
}
You can't use /**/ style comments because comments can't
contain comments, and surely a large block of your code will contain a comment,
won't it?
Classes
Short methods
Methods should limit themselves to a single page of code.
Justification
- The idea is that the each method represents a technique for achieving a
single objective.
- Most arguments of inefficiency turn out to be false in the long run.
- True function calls are slower than not, but there needs to a thought out
decision (see premature optimization).
Do not do real work in object constructors
Do not do any real work in an object's constructor. Inside a constructor
initialize variables only and/or do only actions that can't fail.
Create an Open() method for an object which completes construction. Open()
should be called after object instantiation.
Justification
- Constructors can't return an error.
Exampleclass Device
{
function Device()
{
/* initialize and other stuff */
}
function Open()
{
return FAIL;
}
}
$dev = new Device;
if (FAIL == $dev->Open())
{
exit(1);
}
Make functions reentrant
Functions should not keep static variables that prevent a function from being
reentrant.
Error return check policy
- Check every system call for an error return, unless you know you wish to
ignore errors.
- Include the system error text for every system error message.
Document null statements
Always document a null body for a for or while statement so that it is clear
that the null body is intentional and not missing code. while ($dest++ = $src++)
{
; // VOID
}
Do not default if test to
non-zero
Do not default the test for non-zero, i.e. if (FAIL != f())
is better than if (f())
even though FAIL may have the value 0 which PHP considers to be false. An
explicit test will help you out later when somebody decides that a failure
return should be -1 instead of 0. Explicit comparison should be used even if the
comparison value will never change; e.g., if (!($bufsize %
strlen($str))) should be written instead as if (0 == ($bufsize
% strlen($str))) to reflect the numeric (not boolean) nature of the
test. A frequent trouble spot is using strcmp to test for string equality, where
the result should never ever be defaulted.
The non-zero test is often defaulted for predicates and other functions or
expressions which meet the following restrictions:
- Returns 0 for false, nothing else.
- Is named so that the meaning of (say) a true return is
absolutely obvious. Call a predicate IsValid(), not CheckValid().
Boolean types
Do not check a boolean value for equality with 1 (TRUE, YES, etc.); instead
test for inequality with 0 (FALSE, NO, etc.). Most functions are guaranteed to
return 0 if false, but only non-zero if true. Thus, if (TRUE == func())
{
...
must be written if (FALSE != func())
{
...
Usually avoid embedded assignments
There is a time and a place for embedded assignment statements. In some
constructs there is no better way to accomplish the results without making the
code bulkier and less readable. while ($a != ($c = getchar()))
{
process the character
}
The ++ and -- operators count as assignment statements. So, for many
purposes, do functions with side effects. Using embedded assignment statements
to improve run-time performance is also possible. However, one should consider
the tradeoff between increased speed and decreased maintainability that results
when embedded assignments are used in artificial places. For example, $a = $b + $c;
$d = $a + $r;
should not be replaced by $d = ($a = $b + $c) + $r;
even though the latter may save one cycle. In the long run the time
difference between the two will decrease as the optimizer gains maturity, while
the difference in ease of maintenance will increase as the human memory of
what's going on in the latter piece of code begins to fade.
Complexity management
Layering
Layering is the primary technique for reducing complexity in a system. A
system should be divided into layers. Layers should communicate between adjacent
layers using well defined interfaces. When a layer uses a non-adjacent layer
then a layering violation has occurred.
A layering violation simply means we have dependency between layers that is
not controlled by a well defined interface. When one of the layers changes, code
could break. We don't want code to break, so we want layers to work only with
other adjacent layers.
Sometimes we need to jump layers for performance reasons. This is fine, but
we should know we are doing it and document it appropriately.
Open / Closed principle
The Open/Closed principle states a class must be open and closed where:
- open means a class has the ability to be extended.
- closed means a class is closed for modifications other than extension. The
idea is once a class has been approved for use having gone through code
reviews, unit tests, and other qualifying procedures, you don't want to change
the class very much, just extend it.
The Open/Closed principle is a pitch for stability. A system is extended by
adding new code not by changing already working code. Programmers often don't
feel comfortable changing old code because it works! This principle just gives
you an academic sounding justification for your fears :-)
In practice the Open/Closed principle simply means making good use of our old
friends abstraction and polymorphism. Abstraction to factor out common processes
and ideas. Inheritance to create an interface that must be adhered to by derived
classes.
Development process
Code reviews
Code reviews can be very useful. Unfortunately, they often degrade into nit
picking sessions and endless arguments about silly things. They also tend to
take a lot of people's time for a questionable payback.
First, code reviews are way too late to do much of anything
useful. What needs reviewing are requirements and design. This is where you will
get more bang for the buck.
Get all of the relevant people in a room. Lock them in. Go over the class
design and requirements until the former is good and the latter is being met.
Having all the relevant people in the room makes this process a deep fruitful
one as questions can be immediately answered and issues immediately explored.
Usually only a couple of such meetings are necessary.
If the above process is done well coding will take care of itself. If you
find problems in the code review the best you can usually do is a rewrite after
someone has sunk a ton of time and effort into making the code "work."
You will still want to do a code review, just do it offline. Have a couple
people you trust read the code in question and simply make comments to the
programmer. Then the programmer and reviewers can discuss issues and work them
out. Email and quick pointed discussions work well. This approach meets the
goals and doesn't take the time of 6 people to do it.
Create a source code control system early and
not often
A common build system and source code control system should be put in place
as early as possible in a project's lifecycle, preferably before anyone starts
coding. Source code control is the structural glue binding a project together.
If programmers can't easily use each other's products then you'll never be able
to make a good reproducible build and people will piss away a lot of time. It's
also hell converting rogue build environments to a standard system. But it seems
the rite of passage for every project to build their own custom environment that
never quite works right.
Some issues to keep in mind:
- Shared source environments like CVS and Subversion (SVN) usually work best
in largish projects.
- If you use CVS or SVN use a reference tree approach. With this
approach a master build tree is kept of various builds. Programmers checkout
source against the build they are working on. They only checkout what they
need because the make system uses the build for anything not found locally.
Using the -I and -L flags makes this system easy to setup. Search
locally for any files and libraries then search in the reference build. This
approach saves on disk space and build time.
- Get a lot of disk space. With disk space as cheap it is there is no reason
not to keep plenty of builds around.
- Make simple things simple. It should be dead simple and well documented on
how to:
- check out modules to build
- how to change files
- how to add new modules into the system
- how to delete modules and files
- how to check in changes
- what are the available libraries and include files
- how to get the build environment including all compilers and other tools
Make a web page or document or whatever. New programmers shouldn't
have to go around begging for build secrets from the old timers.
- On checkins log comments should be useful. These comments should be
collected every night and sent to interested parties.
Sources
I recommend Subversion for a version control system.
Create a bug tracking system early and not
often
The earlier people get used to using a bug tracking system the better. If you
are 3/4 through a project and then install a bug tracking system it won't be
used. You need to install a bug tracking system early so people will use it.
Programmers generally resist bug tracking, yet when used correctly it can
really help a project:
- Problems aren't dropped on the floor.
- Problems are automatically routed to responsible individuals.
- The lifecycle of a problem is tracked so people can argue back and forth
with good information.
- Managers can make the big schedule and staffing decisions based on the
number of and types of bugs in the system.
- Configuration management has a hope of matching patches back to the
problems they fix.
- QA and technical support have a communication medium with developers.
Not sexy things, just good solid project improvements.
Source code control should be linked to the bug tracking system. During the
part of a project where source is frozen before a release only checkins
accompanied by a valid bug ID should be accepted. And when code is changed to
fix a bug the bug ID should be included in the checkin comments.
Resources
I recommend Mantisfor bug tracking. Mantis intregrates with Subversion.
Honor responsibilities
Responsibility for software modules is scoped. Modules are either the
responsibility of a particular person or are common. Honor this division of
responsibility. Don't go changing things that aren't your responsibility to
change. Only mistakes and hard feelings will result.
Face it, if you don't own a piece of code you can't possibly be in a position
to change it. There's too much context. Assumptions seemingly reasonable to you
may be totally wrong. If you need a change simply ask the responsible person to
change it. Or ask them if it is OK to make such-n-such a change. If they say OK
then go ahead, otherwise holster your editor.
Every rule has exceptions. If it's 3 in the morning and you need to make a
change to make a deliverable then you have to do it. If someone is on vacation
and no one has been assigned their module then you have to do it. If you make
changes in other people's code try and use the same style they have adopted.
Programmers need to mark with comments code that is particularly sensitive to
change. If code in one area requires changes to code in an another area then say
so. If changing data formats will cause conflicts with persistent stores or
remote message sending then say so. If you are trying to minimize memory usage
or achieve some other end then say so. Not everyone is as brilliant as you.
The worst sin is to flit through the system changing bits of code to match
your coding style. If someone isn't coding to the standards then ask them or ask
your manager to ask them to code to the standards. Use common courtesy.
Code with common responsibility should be treated with care. Resist making
radical changes as the conflicts will be hard to resolve. Put comments in the
file on how the file should be extended so everyone will follow the same rules.
Try and use a common structure in all common files so people don't have to guess
on where to find things and how to make changes. Checkin changes as soon as
possible so conflicts don't build up.
As an aside, module responsibilities must also be assigned for bug tracking
purposes.
No magic numbers
A magic number is a bare-naked number used in source code. It's magic because
no-one has a clue what it means including the author inside 3 months. For
example: if (22 == $foo)
{
start_thermo_nuclear_war();
}
else if (19 == $foo)
{
refund_lotso_money();
}
else if (16 == $foo)
{
infinite_loop();
}
else
{
cry_cause_im_lost();
}
In the above example what do 22 and 19 mean? If there was a number change or
the numbers were just plain wrong how would you know?
Heavy use of magic numbers marks a programmer as an amateur more than
anything else. Such a programmer has never worked in a team environment or has
had to maintain code or they would never do such a thing.
Instead of magic numbers use a real name that means something. You should use
define(). For example: define("PRESIDENT_WENT_CRAZY", "22");
define("WE_GOOFED", "19");
define("THEY_DIDNT_PAY", "16");
if (PRESIDENT_WENT_CRAZY == $foo)
{
start_thermo_nuclear_war();
}
else if (WE_GOOFED == $foo)
{
refund_lotso_money();
}
else if (THEY_DIDNT_PAY == $foo)
{
infinite_loop();
}
else
{
happy_days_i_know_why_im_here();
}
Now isn't that better?
Thin vs. fat class interfaces
How many methods should an object have? The right answer of course is just
the right amount, we'll call this the Goldilocks level. But what is the
Goldilocks level? It doesn't exist. You need to make the right judgment for your
situation, which is really what programmers are for :-)
The two extremes are thin classes versus
thick classes. Thin classes are minimalist classes. Thin
classes have as few methods as possible. The expectation is users will derive
their own class from the thin class adding any needed methods.
While thin classes may seem "clean" they really aren't. You can't do much
with a thin class. Its main purpose is setting up a type. Since thin classes
have so little functionality many programmers in a project will create derived
classes with everyone adding basically the same methods. This leads to code
duplication and maintenance problems which is part of the reason we use objects
in the first place. The obvious solution is to push methods up to the base
class. Push enough methods up to the base class and you get
thick classes.
Thick classes have a lot of methods. If you can think of it a thick class
will have it. Why is this a problem? It may not be. If the methods are directly
related to the class then there's no real problem with the class containing
them. The problem is people get lazy and start adding methods to a class that
are related to the class in some willow wispy way, but would be better factored
out into another class. Judgment comes into play again.
Thick classes have other problems. As classes get larger they may become
harder to understand. They also become harder to debug as interactions become
less predictable. And when a method is changed that you don't use or care about
your code will still have to be retested, and rereleased.
Reusing your hard work and the hard work of
others
Reuse across projects is almost impossible without a common framework in
place. Objects conform to the services available to them. Different projects
have different service environments making object reuse difficult.
Developing a common framework takes a lot of up front design effort. When
this effort is not made, for whatever reasons, there are several techniques one
can use to encourage reuse:
Don't be afraid of small libraries
One common enemy of reuse is people not making libraries out of their code. A
reusable class may be hiding in a program directory and will never have the
thrill of being shared because the programmer won't factor the class or classes
into a library.
One reason for this is because people don't like making small libraries.
There's something about small libraries that doesn't feel right. Get over it.
The computer doesn't care how many libraries you have.
If you have code that can be reused and can't be placed in an existing
library then make a new library. Libraries don't stay small for long if people
are really thinking about reuse.
If you are afraid of having to update makefiles when libraries are recomposed
or added then don't include libraries in your makefiles, include the idea of
services. Base level makefiles define services that are each
composed of a set of libraries. Higher level makefiles specify the services they
want. When the libraries for a service change only the lower level makefiles
will have to change.
Keep a repository
Most companies have no idea what code they have. And most programmers still
don't communicate what they have done or ask for what currently exists. The
solution is to keep a repository of what's available.
In an ideal world a programmer could go to a web page, browse or search a
list of packaged libraries, taking what they need. If you can set up such a
system where programmers voluntarily maintain such a system, great. If you have
a librarian in charge of detecting reusability, even better.
Another approach is to automatically generate a repository from the source
code. This is done by using common class, method, library, and subsystem headers
that can double as man pages and repository entries.
|