Greg Wilson asks how computer scientists should distill their knowledge into a one-week course for physical scientists and engineers. He doesn’t propose teaching the theoretical underpinnings of computer science, and I think that’s wise. But he goes too far the other way in suggesting a week’s curriculum that is overly specific about particular tools and helpful hints. There is a middle ground: professional software developers are beginning to differentiate between the more theoretical study of computer science and the more practical field of software engineering. It’s learning about the latter, the principles of constructing good programs and systems, that will do computational scientists the most good in the long run.
Learn to build good software
A generally accepted body of knowledge has begun to emerge in software engineering. It is not yet as well-defined as the bodies of knowledge that comprise other engineering fields, but it nevertheless contains much that could be useful to a scientist-programmer taking a one-week course in software development.
Software development? Perhaps assuming this broad topic as the sole subject of our course alters Wilson’s premise a little, but for good reason. While nonprogramming skills such as using LaTeX, e-mail, and the Web may be important, people will be able to pick these up on their own or from colleagues. What they’ll be less likely to pick up, and what a computer scientist can help most with, is a solid grasp of how to produce good programs. This includes, but goes beyond, programming itself; thus the software-industry term software development. In my course we’ll focus on that subject.
Though he adds these other subjects that I would omit, Wilson does devote a fair portion of the five class days to software development. From my vantage point as a software engineer, however, even the programming parts of his course focus too heavily on specific programming tools and not enough on underlying software development principles. His article is titled, “What Should Computer Scientists Teach to Physical Scientists and Engineers?” but a more accurate title would be, “How to Teach Physical Scientists and Engineers Everything They Always Wanted to Know about Unix but Were Afraid to Ask.”
It has been my observation that the main reason a Sally Synthesis or a Harold Helmet gets into trouble is not unawareness of Unix commands, Emacs, or Perl, but unawareness of the fundamental principles of software design, programming, quality assurance, and project management. Non-professional programmers–people who do some programming but whose primary training and expertise lie elsewhere–can usually muddle along quite well on small projects. They learn enough about tools along the way to get the job done. What they do not learn along the way is the more abstract, seemingly theoretical knowledge that seems not to produce any immediate payoff, but which is invaluable in the long run.
As Wilson points out, the formal training of most non-professional programmers is limited to two or three terms centered on the use of Fortran. To someone just learning about computers, Fortran itself can seem plenty daunting. But on medium and large software projects, language-use details are the smallest of the potential problems.
People who have written a few small programs in college sometimes think that writing large, professional programs is the same kind of work, only on a larger scale. It is not the same kind of work. I can build a beautiful doghouse in my backyard in a few hours. It might even take first prize at the county fair’s doghouse competition. But that does not imply that I have the expertise to build a skyscraper. The skyscraper project requires an entirely more sophisticated kind of expertise. The difference in complexity between student programs and professional programs can be just as great, and non-professional programmers underestimate the difference in required expertise at their own peril.
Software development for scientists and engineers: The one-week course
A week-long software development course should focus, I think, on how to keep medium to large software projects from spinning out of control, how to keep group projects from becoming chaotic, and how to keep long-lived programs from deteriorating to the point of uselessness. These projects are disasters waiting to happen. The proper goal of such a course should not be to marginally increase the efficiency of the students, but to provide them with the knowledge of how to avoid catastrophe. There’s little to gain from teaching someone how to march toward certain ruin 25 percent faster.
A week-long course can provide more benefit by awakening students to the world of possibilities than by immersing them in the details of a handful of specific tools and methodologies. Let’s revise Greg Wilson’s proposed curriculum accordingly.
Day 1: Programming practices
Because the students have already been programming, the course begins with the familiar subject area of programming details. Topics include:
- Coding for humans. This includes a discussion of variable and function naming, layout, and documentation. It introduces the idea that software development is an exercise in managing complexity.
- Control issues. This segment describes the use of structured control constructs, loop controls, conditionals (if statements), Boolean expressions, and use of the goto statement.
- Integration strategies. This is an exploration of incremental integration, big-bang integration, and evolutionary-development practices.
- Recommended additional sources of information.
Profiling and code-tuning are not discussed. I think it is not just inappropriate but dangerous to focus on code-tuning in a short, one-week course. As W.A. Wulf said, “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason–including blind stupidity.” The time is better spent discussing effective design and implementation practices, which produce efficient programs as by-products.
Though I won’t mention it again, each day will end with recommended additional sources of information.
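As a minimal sketch of what the “coding for humans” topic might look like in class, the following Python fragment writes the same computation twice. The function names, the variable names, and the temperature scenario are hypothetical illustrations and are not drawn from Wilson’s curriculum; the point is only that naming, layout, and a line of documentation reduce the complexity a reader has to manage.

```python
# Hypothetical teaching example: one computation, two styles.


def f(a, n):
    # Terse version: the reader must reverse-engineer what a, n, and s mean.
    s = 0
    for i in range(n):
        s += a[i]
    return s / n


def mean_temperature(readings_celsius):
    """Return the arithmetic mean of a non-empty list of temperature readings."""
    if not readings_celsius:
        raise ValueError("at least one reading is required")
    return sum(readings_celsius) / len(readings_celsius)
```

Both functions return the same value for the same data, but only the second tells a colleague what it is for and what it refuses to accept.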
Day 2: Software design
The goal of Day 2 is to provide a few practical design guidelines and to expose the students to the different schools of thought in software design. Topics this day include:
- Importance of design. This segment explains the costs of not doing design and the critical role that design plays in the success of medium and large projects. It explains the role that good design plays in managing complexity.
- Information hiding
- Overview of structured design
- Overview of data-structure design
- Overview of object design
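Information hiding is probably the easiest of these topics to show in a few lines. The sketch below is one possible classroom illustration, not a prescribed design: the class name `SampleLog` and its methods are invented for this example. Callers use only `record`, `count`, and `maximum`, so the storage behind them could later change from a list to an array or a file without touching any calling code.

```python
class SampleLog:
    """Stores measurements; callers never depend on how they are stored."""

    def __init__(self):
        # Hidden detail: a plain list today, possibly something else later.
        self._values = []

    def record(self, value):
        """Add one measurement to the log."""
        self._values.append(float(value))

    def count(self):
        """Return how many measurements have been recorded."""
        return len(self._values)

    def maximum(self):
        """Return the largest recorded measurement."""
        if not self._values:
            raise ValueError("no samples recorded")
        return max(self._values)
```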
Day 3: Quality assurance
In programming, quality doesn’t just happen by itself, even if you’re being careful. Topics on Day 3 include:
- Unit testing
- Peer reviews
The emphasis of this day is peer reviews–formal or informal reviews of designs, code, and other work products by one’s colleagues. Peer reviews are a critical element of success on any project and are one of the few methods that can be implemented in virtually any organization, any scientific discipline, and any hardware and software environment. The peer-review segment would include some “peer-review role playing” so that each person gets at least 15 minutes’ practice both reviewing and being reviewed.
In addition to their considerable quality-assurance benefit, reviews provide a valuable opportunity for nonprogrammers to exchange information about effective and ineffective programming practices and tools. Teaching students about reviews sets them up to continue learning about software development from their peers long after the one-week course is over.
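For the unit-testing topic, a small worked example may help students see how little machinery is involved. The following is a sketch under stated assumptions: the function `celsius_to_kelvin` and its test cases are hypothetical, and Python’s standard `unittest` module stands in for whatever test framework a given group actually uses.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins, rejecting impossible values."""
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius + 273.15


class CelsiusToKelvinTest(unittest.TestCase):
    """Each test checks one small, independently verifiable behavior."""

    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_absolute_zero(self):
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)

    def test_rejects_impossible_temperature(self):
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)
```

Saved in a file, the tests can be run with `python -m unittest`. A test file like this is also a convenient artifact to bring to a peer review, since it states in executable form what the author believes the code should do.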
Day 4: Software project management
Day 4’s topics deal with how software development complexity is addressed at the project-management level:
- Revision management (including a discussion of make-files and revision-control software)
- Waterfall life-cycle model and major alternatives (spiral model, evolutionary prototyping, etc.)
- Software maintenance
- Coordination of group projects
Day 5: Tools and wrap-up
The last day focuses on tools and pulls together the themes outlined in the earlier lectures. Rather than explaining how to use specific tools, the goal of Day 5 is to identify the kinds of tools that are available. If people can be shown a tool’s value, they will seek it out and learn to use it themselves. Tools have a place on the software engineering menu, but they should be presented as garnishes rather than the main course.
Day 5 will cover the following points:
- Overview and demonstration of programming tools. This can be customized to the specific group and can focus on Unix, Windows, Macintosh, or other environments as appropriate. Good choices would include code editors, debuggers, database management software, and command-line utilities such as grep. I agree with Wilson’s point that the tools selected for discussion should have proved themselves and should not be likely to change in ways that will nullify the students’ learning.
- Overview of scientific tools. This can also be customized to the particular needs of the students and might include statistical software, specialized word processors, Mathematica, Matlab, modular visualization environments (MVEs), and so on.
- Summary of themes running through the whole lecture series.
Immersing physical scientists and engineers for a week in the topic of what it means to build a computer program is where the leverage for lasting improvement lies. Even the best tools come and go, but a body of long-lasting programming principles has begun to emerge, and knowledge of these principles can greatly benefit the non-professional programmer.
It is easy to write a course outline. It is another matter entirely to move the content of that outline from the writer’s pen into the student’s brain. The real test of this outline or any other is how it would work in practice: how it would move from thought experiment to curriculum. Whatever the specifics of the curriculum, Wilson’s article should be applauded for getting us thinking, and, I hope, doing.