Best Practices
IEEE Software, Vol. 13, No. 2, March 1996
Missing in Action: Information Hiding
REWARD for lost software-engineering concept. Responds to the name
"information hiding." Last seen in Canada during the late 1970s. Sometimes
answers to "encapsulation," "modularity," or "abstraction." If found, please
call 555-HIDE.
Information hiding is one of software engineering's seminal design ideas.
What's happened to it? Most of the structured-design and
object-oriented-design books I checked recently list "information hiding" in
their indexes, but few give it more than a passing acknowledgement. That a
design textbook would not describe information hiding seems akin to the
response that Michael Stipe, leader of the rock group R.E.M., gave when
asked to describe the Beatles' influence on his music. He said he doubted
that he had ever listened to a whole Beatles album. They are irrelevant, he
said. "Elevator music."
As a musician and composer, Michael Stipe has missed something by not
listening to the Beatles. As software designers and implementers, some of us
have missed something by not thoroughly acquainting ourselves with
information hiding.
OUT OF THE DARK. Information hiding first came to public attention
in a paper David Parnas wrote in 1972 called "On the Criteria to Be Used in
Decomposing Systems Into Modules" (Communications of the ACM,
December 1972). Information hiding is characterized by the idea of
"secrets," design and implementation decisions that a software developer
hides in one place from the rest of a program.
Information hiding is part of the foundation of both structured design and
object-oriented design. In structured design, the notion of "black boxes"
comes from information hiding. In object-oriented design, it gives rise to
the concepts of encapsulation and modularity, and it is associated with the
concept of abstraction. It doesn't require or depend on any particular
design methodology, and you can use it with any design approach.
In the 20th Anniversary edition of The Mythical Man-Month, Fred
Brooks concludes that his criticism of information hiding was one of the few
ways in which the first edition of his book was wrong. "Parnas was right,
and I was wrong about information hiding," he proclaims (Brooks 1995). Barry
Boehm reported in 1987 that information hiding was a powerful technique for
eliminating rework, and he pointed out that it was particularly effective
during software evolution ("Improving Software Productivity," IEEE
Computer, September 1987). As incremental, evolutionary development
styles become more popular, the value of information hiding can only
increase.
DESIGN SECRETS. Suppose you have a program in which each object is
supposed to have a unique ID stored in a member variable called ID.
One design approach would be to use integers for the IDs and to store the
highest ID assigned so far in a global variable called MaxID. Wherever
a new object is allocated, perhaps in each object's constructor, you
could simply use the statement ID = ++MaxID. (This is a C-language
statement that increments the value of MaxID by 1 and assigns the
new value to ID.) That would guarantee a unique ID, and it would
add the absolute minimum of code in each place an object is created. What
could go wrong with that?
A lot of things could go wrong. What if you want to reserve ranges of IDs
for special purposes? What if you want to be able to reuse the IDs of
objects that have been destroyed? What if you want to add an assertion that
fires when you allocate more IDs than the maximum number you've anticipated?
If you allocated IDs by spreading ID = ++MaxID statements
throughout your program, you would have to change the code associated with
every one of those statements.
The way that new IDs are created is a design decision that you should hide.
If you use the phrase ++MaxID throughout your program, you expose
the information that the way a new ID is created is simply by incrementing
MaxID. If, instead, you put the statement ID = NewID()
throughout your program, you hide the information about how new IDs are
created. Inside the NewID() function you might still have just one
line of code, return ( ++MaxID ) or its equivalent, but if you
later decide to reserve certain ranges of IDs for special purposes or to
reuse old IDs, you could make those changes within the NewID()
function itself--without touching dozens or hundreds of ID = NewID()
statements. No matter how complicated the revisions inside NewID()
might become, they wouldn't affect any other part of the program.
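
As a minimal sketch of the idea above, here is one way NewID() might look in C once it has grown beyond a single line. The reserved range and the upper limit (FIRST_GENERAL_ID and MAX_ID) are hypothetical values chosen only for illustration; the column does not prescribe them.

```c
#include <assert.h>

/* Hypothetical limits, assumed only for this illustration. */
#define FIRST_GENERAL_ID 1000   /* IDs below this are reserved for special purposes */
#define MAX_ID           9999   /* largest ID we anticipate ever handing out */

/* The secret: how IDs are generated. Declaring MaxID static at file
   scope keeps it invisible to every other source file. */
static int MaxID = FIRST_GENERAL_ID - 1;

/* Returns the next unused ID. Callers never see MaxID. */
int NewID(void)
{
    /* Fires when more IDs are requested than anticipated. */
    assert(MaxID < MAX_ID);
    return ++MaxID;
}
```

Callers still write only ID = NewID(). Reserving a range or adding the assertion changed nothing outside this one function.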
Now suppose you discover you need to change the type of the ID from an
integer to a string. If you've spread variable declarations like int ID
throughout your program, your use of the NewID() function won't
help. You'll still have to go through your program and make dozens or
hundreds of changes.
In this case, the design decision to hide is the ID's type. You could
simply declare your IDs to be of type IDTYPE--a user-defined type that
resolves to int--rather than directly declaring them to be of type
int. Once again, hiding a design decision makes a huge difference in
the amount of code affected by a change.
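
In C, the type-hiding step described above could be sketched with a typedef. The Widget structure and InitWidget() routine are hypothetical stand-ins for whatever objects a real program would create.

```c
#include <assert.h>

/* The secret: what an ID is made of. Today it resolves to int. */
typedef int IDTYPE;

static IDTYPE MaxID = 0;

/* Hands out the next ID; its return type follows IDTYPE automatically. */
IDTYPE NewID(void)
{
    return ++MaxID;
}

/* Hypothetical object type, used only to show a declaration site. */
struct Widget {
    IDTYPE ID;   /* declared as IDTYPE, never directly as int */
};

/* Stands in for a constructor: gives each new object a unique ID. */
void InitWidget(struct Widget *w)
{
    w->ID = NewID();
}
```

If the ID later became a string, the typedef and the body of NewID() would change, but declarations such as the one in Widget would not. Code that compares or prints IDs directly would still need attention, so this sketch limits the damage rather than eliminating it.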
SPARE CHANGES. To use information hiding, begin your design by
listing the design secrets that you want to hide. As the example suggested,
the most common kind of secret is a design decision that you think might
change. Separate each design secret by assigning it to its own class or
subroutine or other design unit. Then isolate--encapsulate--each design
secret so that if it does change, the change doesn't affect the rest of the
program.
Some of the design areas that are most likely to change will be specific to
individual projects, but you'll run into others again and again:
- Hardware dependencies for display screens, printers, plotters,
communications devices, disk drives, tapes, sound, and so on
- Input and output formats, both machine-readable and end-user readable
- Use of non-standard language features and library routines
- Difficult design and implementation areas, especially areas that might
be developed poorly and need to be redesigned or reimplemented later
- Complex data structures, data structures that are used by more than one
class, or data structures you haven't been able to design to your
satisfaction
- Complex logic, which is almost as likely to change as complex data
structures
- Global variables, which are probably never truly needed but which always
benefit from being hidden behind access routines
- Data-size constraints such as array declarations and loop limits
- Business rules such as the laws, regulations, policies, and procedures
that are embedded into a computer system
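
Two items from the list above, global variables and data-size constraints, can be illustrated together. The following is only a sketch under assumed names: OpenFileCount, MAX_OPEN_FILES, and the three access routines are hypothetical and not taken from any particular system.

```c
#include <assert.h>

/* Data-size constraint, named once instead of repeated as a literal. */
#define MAX_OPEN_FILES 16

/* What might have been a global variable is private to this file. */
static int OpenFileCount = 0;

/* Access routine: read the current count. */
int GetOpenFileCount(void)
{
    return OpenFileCount;
}

/* Access routine: record an opened file.
   Returns 1 on success, 0 if the limit has been reached. */
int NoteFileOpened(void)
{
    if (OpenFileCount >= MAX_OPEN_FILES) {
        return 0;
    }
    ++OpenFileCount;
    return 1;
}

/* Access routine: record a closed file. */
void NoteFileClosed(void)
{
    assert(OpenFileCount > 0);   /* closing more files than were opened is a bug */
    --OpenFileCount;
}
```

Because the rest of the program can reach the count only through these routines, the limit check lives in one place, and the representation could later change without touching callers.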
HEURISTIC VALUE. Aside from providing support for structured and
object-oriented design, information hiding has unique heuristic power, a
unique ability to inspire effective design solutions.
Object design provides the heuristic power of modeling the world in
objects, but object thinking wouldn't help you avoid declaring the ID as an
int instead of an IDTYPE in the example. The object designer would ask,
"Should an ID be treated
as an object?" Depending on his project's coding standards, a "Yes" answer
might mean that he has to create interface and implementation source-code
files for the ID class; write a constructor, destructor, copy constructor, and
assignment operator; document it all; have it all reviewed; and place it
under configuration control. Unless the designer is exceptionally motivated,
he will decide, "No, it isn't worth creating a whole class just for an ID.
I'll just use ints."
Note what just happened. A useful design alternative, that of simply hiding
the ID's data type, was not even considered. If, instead, the designer had
asked, "What about the ID should be hidden?" he might well have decided to
hide its type behind a simple type declaration that substitutes IDTYPE
for int. The difference between object design and information
hiding in this example is more subtle than a clash of explicit rules and
regulations. Object design would approve of this design decision as much as
information hiding would. Rather, the difference is one of
heuristics--thinking about information hiding inspires and promotes design
decisions that thinking about objects does not.
WHAT TO HIDE? Information hiding can also be useful in designing a
class's public interface. The gap between theory and practice in class
design is wide, and among many class designers the decision about what to
put into a class's public interface amounts to deciding what interface would
be the easiest to write code to, which usually results in exposing as much
of the class as possible. From what I've seen, most programmers would rather
expose all of a class's private data than write 10 extra lines of code to
keep the class's secrets intact. Asking, "What does this class need to
hide?" cuts to the heart of the interface-design issue. If you can put a
function or data into the class's public interface without compromising its
secrets, do. Otherwise, don't.
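
To make the question "What does this class need to hide?" concrete, here is a hedged sketch in C, which has no classes but can approximate one with an opaque type. The Stack type, its routines, and the file names in the comments are all hypothetical. In a real program the two halves would live in separate files; they are shown together here so the example stands alone.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- Public interface (would live in a hypothetical stack.h) ---- */
typedef struct Stack Stack;            /* opaque: callers cannot see the fields */
Stack *StackCreate(void);              /* returns NULL if memory is unavailable */
void   StackDestroy(Stack *s);
int    StackPush(Stack *s, int value); /* 1 on success, 0 if full */
int    StackPop(Stack *s, int *value); /* 1 on success, 0 if empty */

/* ---- Secrets (would live in a hypothetical stack.c) ---- */
#define STACK_CAPACITY 32              /* secret: fixed-size array representation */

struct Stack {
    int items[STACK_CAPACITY];
    int count;
};

Stack *StackCreate(void)
{
    Stack *s = malloc(sizeof *s);
    if (s != NULL) {
        s->count = 0;
    }
    return s;
}

void StackDestroy(Stack *s)
{
    free(s);
}

int StackPush(Stack *s, int value)
{
    if (s->count >= STACK_CAPACITY) {
        return 0;
    }
    s->items[s->count++] = value;
    return 1;
}

int StackPop(Stack *s, int *value)
{
    if (s->count == 0) {
        return 0;
    }
    *value = s->items[--s->count];
    return 1;
}
```

Nothing in the public half reveals that the stack is an array or that it holds at most 32 items, so either decision could change without affecting callers. Exposing items and count directly would have saved a few lines at the cost of those secrets.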
Asking about what needs to be hidden supports good design decisions at all
levels. It promotes the use of named constants instead of literals at the
implementation level. It helps in creating good subroutine and parameter
names inside classes. It guides decisions about class and subsystem
decompositions and interconnections at the system level. Get into the habit
of asking, "What should I hide?" You'll be surprised at how many difficult
design decisions vanish before your eyes.
Editor: Steve McConnell, Construx Software, 11820 Northup Way #E200,
Bellevue, WA 98005.
E-mail: steve.mcconnell@construx.com
WWW: http://www.construx.com/stevemcc/