The last section discussed Agile processes, which are largely a reaction to and rejection of the traditional project planning approach to software development. This section discusses that traditional approach, called “waterfall.” The essence of waterfall is that the project proceeds in phases: being chartered, gathering requirements, analyzing requirements to determine desired functions, architecting a solution, designing the individual pieces of the solution, developing the pieces of the solution, testing the pieces, integrating the pieces into subassemblies, testing the subassemblies, integrating the subassemblies into the solution, testing the solution, and finally fielding the solution. The watchwords throughout are planning: planning the software, planning the business processes using the software, planning the construction effort, planning the deployment; and monitoring: measuring deviations from the plan and either correcting the deviations, accepting the deviations, or canceling the project if it becomes unsalvageable. As with any other complex endeavor, failure to plan is planning to fail; but recall also that General Eisenhower said, “plans are useless, but planning is indispensable.” No battle plan survives contact with the enemy, but the planning process deeply investigated the terrain and available resources, revealed alternatives, and identified fallbacks for when things go wrong.
Any waterfall process takes a long time, and costs a lot. This is not a defect in the waterfall process: this is because software takes a long time, and costs a lot. But despite this, waterfall is, when performed correctly, in fact the quickest, cheapest way yet discovered to produce a new system for which the desired functions are known. However, both the “performed correctly” and the “for which the desired functions are known” parts are quite substantial hurdles.
New buildings typically start tentatively. An idea for a new building might be floated years before formal construction activity starts: things such as the purpose of the building, possible sites, impact on operational processes, even consultations with architects and construction firms happen before the decision to build is even taken. So much of the preparatory work for a new building is done invisibly, before the project clock starts ticking: the architect presenting drawings, models, and plans is likely one of the earliest exposures most of the organization has to the project, and by that time half the project is done already. All that remains is actual construction. A shed or garage can be constructed with only a verbal description of what is wanted; anything more complex needs an actual design and construction blueprints. Typically in construction there is a go/no go decision on producing the construction blueprints, based on the organization’s needs and a preliminary cost estimate; and another go/no go decision after the construction blueprints have been produced and there is a believable cost estimate.
Software, in contrast, typically starts with a commitment to build a new system, or replace an existing system, and this decision frequently comes with a budget already attached. (Never replace an existing system. The existing system has decades of business process and organizational workarounds embedded in it; the replacement system will not, and will therefore be useless.) The early deliverables — user studies, functional requirements, specifications, designs — are very obviously not software. In waterfall, the early investigation and planning happen publicly and noisily, rather than privately and quietly, and management is always ready to say “just start coding” because code is the ultimate deliverable. But just starting coding has about the same effectiveness as lining up the cement trucks out front as soon as the idea for a new building arises: we’re ready to pour the slab, where do you want it? And you really should have graded the site first, and for that matter you really should have picked a site first. Instead of lengthy analysis before the fact, for some reason software is supposed to spring fully armèd from the head of the programmers. That doesn’t happen with buildings, and that doesn’t happen with software. Like a building, software needs a great deal of thought to make it useful, and some of these thoughts are not obvious: after all, even a warehouse needs bathrooms, even though bathrooms have nothing to do with the warehouse’s business function. There is simply no substitute for thinking: about the business problem, about the users, about the organization’s resources and culture and needs, about available technologies, about effective approaches, about how the system will turn a mission statement into actual work.
But thinking before you act is counter-cultural these days. It should come as no surprise, then, that most software projects fail. Estimates of failure rates range from 50% to 80%. Almost all failures are due to management not permitting the project to plan adequately; the runner-up cause is starving the project of resources and of information about the business environment. You wouldn’t decide to build a new building and start groundbreaking tomorrow despite not being able to describe the building’s purpose; why do you think software is different? Because software is just typing into a computer, and your 10 year old can do that? These days many surgeries are just typing into a computer as well: would you be happy with your 10 year old being your eye surgeon?
Fortunately, the process of building software is solved. We know how to do it, without project failure, with project manageability, with effective progress metrics. It’s called CMMI and PMBOK and Earned Value Analysis. Yes, a high-level CMMI organization is expensive. But project failure — pouring millions of dollars into a project and getting nothing useful out of it — is also expensive. Unfortunately, “penny wise and pound foolish,” also known as “Ready! Fire! Aim!” is, for some reason, the mantra of many MBAs who mistake spreadsheets for reality.
The CMMI processes, and the PMI mechanisms, are how to “perform waterfall correctly.” These consist of painstaking, mind-numbing planning: planning the success cases, envisioning the failure cases, planning the recovery from the failure cases. And painstaking execution: doing the work, checking the work, fixing the work, and measuring the work. And painstaking monitoring of the measurements to detect when you are entering a failure case and recovery is needed. And the function of Earned Value Analysis is to tell you, as early as 10% of the way into the project, just how late and over budget it will be. And no, you can’t change that: the work takes what the work takes; you can’t “negotiate” the work away.
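As a rough illustration of how Earned Value Analysis produces that early forecast, here is a minimal sketch using the standard CPI/SPI/EAC formulas. All the dollar figures and durations below are invented for the example; only the formulas are standard.

```python
# Earned Value Analysis sketch with hypothetical figures: a project
# budgeted at $1,000,000 (BAC) over 20 months, measured 10% of the
# way into the planned schedule.
BAC = 1_000_000          # budget at completion
planned_months = 20

PV = 100_000             # planned value: work scheduled to be done by now
EV = 80_000              # earned value: budgeted cost of work actually done
AC = 120_000             # actual cost: what that work actually cost

CPI = EV / AC            # cost performance index (< 1 means over budget)
SPI = EV / PV            # schedule performance index (< 1 means behind)

EAC = BAC / CPI                       # estimate at completion
est_months = planned_months / SPI     # naive duration forecast

print(f"CPI = {CPI:.2f}, SPI = {SPI:.2f}")
print(f"Forecast cost: ${EAC:,.0f} vs ${BAC:,} budgeted")
print(f"Forecast duration: {est_months:.1f} months vs {planned_months} planned")
```

Ten percent of the way in, this hypothetical project is already forecast at 150% of budget and 125% of schedule, and, as the text says, no amount of negotiation changes the indices.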
Now, a CMMI level 5 organization is useful when there must not be a failure. Manned spacecraft are the archetypical example, but any life-critical processing also requires this level of assurance — it typically doesn’t get it, but it requires it. Depending on the organization’s appetite for risk, a CMMI level 3 organization can be adequate for commercial software development: there will be cost overruns and occasional outright failures, but the organization can typically muddle along with only minor trauma. Unfortunately, most software development organizations are CMMI level 1 (the level where “individual heroics” sometimes pull success out of the jaws of failure). You can tell a CMMI level 1 organization by its managers standing around on the delivery deadline date, wondering aloud whether they will make the 5:00 PM deadline. (I can answer that one: if you don’t know the answer is “yes” at least two weeks before the deadline, the answer is “no.”) Climbing the CMMI ladder is probably the single most effective investment a software development function can make.
Waterfall development aims at hitting a known target. While you are developing the software is the wrong time to have controversy about what you are building and exactly how it is to work. You must surface controversy, bringing forth and resolving all the differing ideas of what the software is to be, before ever firing up a UML editor.
Secret agendas and surprise requirements are deadly in waterfall. And in fact this is the primary weakness of waterfall: secret agendas and surprise requirements are endemic in some organizations — where knowledge is power and fiefdoms jealously hide their internal processes from “outsiders” — and this means waterfall, and the straightforward development effort it grants, are simply out of reach of those organizations. The presence of secret agendas and surprise requirements means you have to use agile processes. (This is actually the real reason for the widespread use of agile processes. It is not that they are good — they are in fact horribly wasteful — it is that they can, after enough money, produce something useful despite profound organizational dysfunction. That, and since there is no schedule, there is no budget to overrun. And even if management thinks there is a schedule, management has participated in the scheduling, and gets a reminder of what actual progress is every two weeks.)
Typically, the customers for a new system are management, who are distinct from those who will be the users of the new system. These customers will naturally think primarily of the portions of the system with which they will interact: they will mistake the dashboard for the system. The rest of the system is largely invisible to them: it is magic that happens without their participation, and frequently without their knowledge. This is about as accurate as thinking of “the airport” as “the ticket counter” because that’s the part you interact with. But: parking garage, fuel depot, fuel trucks, fire department, security, hangars, maintenance personnel, meal catering, weather station, control tower, air traffic controllers, ground traffic controllers, taxiways, runways, navigation aids, bathrooms, restaurants, hotels, VIP lounges, gift shops, news stands, flight crew lounges, pilot shop, cleaning staff, landscaping staff, and a thousand other things go into an actual airport. It helps to ensure the system is referred to by its myriad functions, and to always report status on implementing all those functions, even when management is interested only in “the GUI” that they will play with. Management is rarely aware of the details of the work they manage: they may know “orders get processed,” but what an “order” is typically is more complex and less constrained than what they understand, and what “processing” is typically is simply beyond their comprehension. The line workers — those who actually know what work is involved because they are the ones that actually do it — are frequently not even consulted in the software development effort; and even when consulted, they frequently discuss only the mainstream, unexceptional process where everything works as it is supposed to, and not all the details of what sort of interventions need to occur for typical exceptions where things work as they actually do instead of the way they are supposed to.
The requirements gathering phase of waterfall is absolutely critical to project success. Management can give a business overview, and an overall work flow overview, but to get the actual work flow and the actual exception conditions that arise, the line workers must be interviewed in depth, with creative attention to “when does that not work and what do you do instead?” questions. It helps to remember that the workers, not the managers and certainly not the software developers, are the actual experts here. When the purpose of the system is perceived to be to replace workers (instead of to augment them), you can expect to receive misdirection: “we want to keep our jobs” is a not-so-secret agenda. But people are people, so there’s not a great deal to be done about that.
The waterfall methodology gets its name from its phases: each phase feeds into the next, as each fall in a multi-step waterfall feeds into the next. Such multi-stage waterfalls are also known as “rapids;” there is a hint in that name. The overall phases are:
Requirements gathering. This has been dealt with in some detail in the previous topic. This phase typically involves detailed interviews with management and workers.
Requirements analysis. After the requirements are known — at least as much as is possible this early in the project — they must be analyzed for consistency (lack of contradiction) and completeness (all possibilities accounted for). What happens if the business rule must be violated? What happens if the “intermittently connected” system goes down for an extended period of time? Does the product need to do anything special about backup, or restart, or disk space management, or database failover, or the thousand other exceptional conditions that occur in practice? This is the phase where you realize even a warehouse needs bathrooms.
Functional specification. After the requirements have been analyzed, a functional specification of the system is next. This specification might be somewhat general (“simultaneously debit the source account and credit the target account”) or might, particularly in the user interface, be extremely detailed (the screen will look exactly like this mock-up). There are three pitfalls here: talking in generalities (“the system will support order entry” instead of exactly what information the system will accept and process), having non-measurable goals (“the system will be easy to use” — and how will you determine whether you succeeded?), and handwaving around exceptional conditions. You do not get to “figure that out later if it comes up;” the software is going to do something when (not if) it comes up, so you have to get all your decisions out of the way before the software gets written. If something can happen, it will; and many things that cannot happen will, too. Users are very creative that way. Otherwise, you will be astonished at how many exceptions get coded as the programmers reach the end of prior thought.
Architecture. Computer systems come in many forms. Will this one be microservices? Will this one have a data bus or message queue? Will this one have local databases that queue transactions to the main database? Will this one be monolithic? Will this one be event-driven, or multi-threaded, or a combination? Do we want to require the database system to run on the same host as the application? What parts of the system will be desktop applications, and what parts will run in the datacenter? Represent the system as a block diagram, assign each function to one of the blocks, and assign each block an environment. Where does the data come from? Where does it go? How does it get there? A number of cross-cutting concerns — logging framework, overall application object hierarchy, unit test framework, third party products — are also typically addressed in the architecture. This phase is the equivalent of the architect’s model and proposed structure watercolors.
Design. Each of those blocks in the architecture is itself a computer system. What is its internal structure? Are special implementation techniques needed for it? Have you proven the design to be deadlock free? This is the phase where you do requirements traceability: each feature of the design must involve the satisfaction of at least one requirement, and each requirement must be satisfied by the design. (And that “at least one” is where structural decomposition, rather than functional decomposition, comes into play. You don’t need to write the same code three times just because you need it in three places, as often happens with agile processes.) These designs typically mention which third-party APIs will be used for what, and typically name the individual classes (but not methods). This phase is the equivalent of the construction blueprints.
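The two-way traceability check described above is mechanical enough to sketch in a few lines. The design element names and requirement IDs below are hypothetical; the point is the check itself: every requirement covered by at least one design element, and every design element tracing to at least one requirement.

```python
# Sketch of a requirements-traceability check (hypothetical IDs).
# Each design element lists the requirements it satisfies; the check
# flags uncovered requirements and untraced design elements.
requirements = {"R1", "R2", "R3", "R4"}
design = {
    "OrderValidator": {"R1"},
    "TaxCalculator":  {"R2", "R3"},
    "AuditLogger":    set(),          # traces to nothing: a defect
}

covered = set().union(*design.values())
uncovered = requirements - covered    # requirements the design misses
untraced = [name for name, reqs in design.items() if not reqs]

print("uncovered requirements:", sorted(uncovered))
print("untraced design elements:", untraced)
```

Real traceability matrices live in requirements-management tooling rather than a dictionary, but the invariant being checked is exactly this one.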
Detailed design. Frankly, whether you need this phase depends on the quality of your personnel. If the developers can handle a detailed goal (the design) and an otherwise blank sheet of paper, you don’t really need detailed designs. For junior developers, or if there is something really special for one of the blocks — it needs multi-threading dispatch of events, say — then a detailed design, either narrative or pseudocode, is called for. These designs typically give method signatures and may describe internal state manipulation. This phase is the equivalent of detailed renderings of particularly tricky or decorative construction elements.
Implementation. Once the designs are firm, you can finally start coding. And developing unit tests. And running unit tests. And repairing software as the unit tests fail.
Integration. As code becomes available, it can be integrated into larger assemblies, such as subsystems and systems. These are then tested and repaired as the tests fail and as changes make the existing unit tests fail.
Deployment. Finally, when the software is believed to be complete, it is deployed and the system tests or acceptance tests begun, with rework as needed when these tests fail.
Now, you may notice that testing is a central element of this sequence, once there is software to test. But how do you test a requirements document, or an architecture, or a design?
Defects are a part of production. Anything produced directly by humans, or produced by something humans produced, or produced by something produced by something humans produced, or — you get the idea — will have defects. This should be no surprise: there are hints from modern physics that reality itself has defects, and certainly biology is a lot more of a kludge tower than is physics. The incidence and severity and visibility of defects can be adjusted, to some extent, by skill and process; the existence, not at all. The defects are there.
So it behooves us to understand what happens to defects. Defects are introduced at a pretty constant rate, actually, throughout all the work that is put into a software product: Halstead’s research can be interpreted as indicating an overall human average of about 1 defect injected per 3000 information theoretic “bits” of raw (unreviewed) human creation — about a page and a half of narrative text, or about 100 lines of code. Typically, if a defect is detected and repaired in the phase in which it is introduced, the repair costs are minor: someone rewords a paragraph, or adds in the missing section, or removes the extra verbiage, and life goes on. It is when defects escape to later phases that they get costly. Folklore (but little actual research, alas) holds that repair costs typically go up by a factor of 10 or so for each phase in which the defect survives. This seems plausible: a whole lot of designs need to be updated when the architecture changes because a missed requirement is discovered. And some missed requirements are “discard the completed software and start over” severe; I have actually seen this happen. (“The software doesn’t deal with cycles right.” Tappety tap. “The requirements document does not contain the word ‘cycles.’” “How could you forget cycles? They’re absolutely central to the product’s function!” “You’d have to ask the person who came up with the requirements. Fortunately, you’re right here.”)
Defects get introduced, just by the nature of things: you can’t change that, you can only adapt to it. Defects either get captured and repaired, or they escape. So if a requirements defect is captured in the requirements stage, fixing it costs 1 unit: just some typing, basically. If it escapes into requirements analysis and is captured there, fixing it costs an estimated 10 units: still just some typing, but you have to figure out what places were affected and how to fix them. If it escapes into the functional specification and is captured there, fixing it costs an estimated 100 units: a whole lot of scrutiny of a whole lot of things, a whole lot of typing, probably several diagrams. Escape into designs, 1,000 units: a lot of rework from requirements on down, a lot of schedule impact. Escape into code, 10,000 units. Escape into released product, 100,000 units: corporate embarrassment, a snarky article in The Register, loss of market position.
Whereas a simple code defect (it rounds the excise tax the wrong way) costs 1 unit to fix before release (change, rebuild) and only 10 units after (spend 10 minutes figuring out how to word the release note so it doesn’t make us sound like complete idiots), a design flaw or, worse, a requirements defect making it out through release can easily be a disaster. All software has bugs, and no one, even the taxman, is going to get upset about a $0.01 discrepancy that will be fixed in the next release. But forgetting about excise tax completely is a much bigger deal.
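The arithmetic of that escalation is worth seeing laid out. The table below uses the folklore factor-of-10 figures from the text (they are illustrative conventions, not measurements), plus the rough 1-defect-per-100-lines injection rate cited earlier:

```python
# The 10x-per-phase escalation, as a table. Units are relative repair
# cost; the numbers are the folklore figures from the text, not
# measurements.
phases = ["requirements", "analysis", "functional spec",
          "design", "code", "released product"]

for i, phase in enumerate(phases):
    print(f"captured in {phase:>16}: {10**i:>7} units")

# Rough expected defect load, using the ~1 defect per 100 lines of
# raw (unreviewed) code cited earlier:
loc = 50_000
print(f"expected raw defects in {loc:,} lines: about {loc // 100}")
```

Five hundred raw defects in a modest 50,000-line system, each one a hundred thousand times cheaper to fix at injection than after release: that is the economic case for inspecting the early, non-executable deliverables.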
So testing the initial non-executable deliverables of the development process is critical: defects that survive these early stages have tremendous impacts on the delivered product. It is crucial to capture and repair these defects. So how do you test a requirement, or an architecture, or a design?
Non-executable artifacts produced by the development process are “tested” with a procedure called a Formal Inspection or Fagan Inspection. This is a process where the document is reviewed in detail individually by a team of Inspectors, then in a meeting of the Inspectors is read out loud line by line by a Reader, with defects called out by the Inspectors and recorded by a Recorder. The Author then takes the resulting list of defects and addresses each one. The Inspectors might call for another, post-repair Inspection, or might assume the Author’s repairs will be adequate, and a decision either way is about the context, not the Author: hard things are hard, and the hardest of things are frequently assigned to the best personnel.
In the Fagan Inspection process as defined, all artifacts produced by the development process are Inspected, including code. I personally have found code Inspection (which at least theoretically takes place prior to testing) to be less useful than unit testing, and the code Inspection often devolves into the detestable “code review” where the “defects” are “well, I wouldn’t have done it that way, so this is wrong.”
Defects are categorized as they are recorded, and typical statistical process control analysis can then be performed on the defects to determine the health of the development process. As the Author and the objecting Inspector are both named for each discovered defect, the defect information is sensitive, and therefore should never be available to management. Summary statistics, showing Inspection effectiveness in removing defects, and possibly some overall characterization of the types of defects removed, can and should be shared, as long as the data cannot be de-anonymized. (Management, at least American corporate management, typically is incapable of recognizing the distinction between process properties and personnel properties, and so winds up blaming the workers for inappropriate tooling and ineffective processes. I suspect this is because inappropriate tooling and ineffective processes directly implicate management but personnel “shortcomings” only indirectly implicate management, and it’s emotionally much easier to project responsibility for a problem than to accept it. There is a very vulgar but also accurate rule of thumb that lets you diagnose when you are refusing to accept responsibility: if everywhere you look, all you see is shit, it means your head is stuffed up your ass. Managers who find themselves constantly railing about their workers need to closely examine a type of device called a mirror.)
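One conventional SPC technique that fits defect-count data is a c-chart: flag any inspection whose defect count falls outside the mean plus-or-minus three standard deviations (for counts, sqrt of the mean). A minimal sketch, with invented inspection data:

```python
import math

# c-chart sketch over defect counts per inspection (hypothetical data):
# flag inspections outside c_bar +/- 3*sqrt(c_bar).
defects_per_inspection = [4, 7, 3, 5, 6, 4, 15, 5, 4, 6]

c_bar = sum(defects_per_inspection) / len(defects_per_inspection)
ucl = c_bar + 3 * math.sqrt(c_bar)        # upper control limit
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))  # counts can't go negative

print(f"center {c_bar:.2f}, limits [{lcl:.2f}, {ucl:.2f}]")
for i, c in enumerate(defects_per_inspection, 1):
    if not lcl <= c <= ucl:
        print(f"inspection {i}: {c} defects -- out of control, investigate")
```

Note that the out-of-control signal says "investigate the process around that artifact," not "blame that Author": the whole point of the preceding paragraph is that these are process measurements, not personnel measurements.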
One curious, and at least initially puzzling, defect attribute in the Fagan Inspection process is “visible/invisible,” which in engineering terms is “failure/fault.” A fault is something that is incorrect. A failure is something that causes incorrect behavior. So a misspelled comment (yes, that is a defect) is a fault but not a failure. A misspelled user message, on the other hand, is a failure: there are circumstances in which a user will be misspelled at. It is a rare discovered defect that is an actual failure; almost all are merely faults. One might think that a defect that does not cause incorrect behavior is innocuous; if it doesn’t make any difference, who cares? Experienced developers have noticed, though, that incorrect behavior — a software “bug” — is typically not the result of a fault, but the result of several faults that happen to interact. You can typically identify half a dozen “causes” of a failure, faults that had to align for the failure to propagate out to visibility; and repair should ideally address several of them, to prevent the failure’s recurrence when faults are reintroduced during maintenance. Fault reintroduction is a common feature of maintenance: after all, mistakes that were easy to make once are easy to make again. An Ishikawa diagram can help identify immediate repair candidates in the software, but a root cause analysis is typically more useful in identifying repair candidates in the software development process itself. (If your process is turning out defective product, why, then, isn’t your process itself defective? Why aren’t you fixing the process so it quits spitting out defects?) So yes, fixing non-failure faults is important, because you can never predict what faults will interact to produce failures.
A common theme in this discussion of waterfall processes is that producing software that actually works seems to be toxic to management. This was also a common theme in the discussion of agile processes, although it was perhaps less obvious there. This is largely a result of MBA programs, and so current management practices, focusing on manufacturing organizations as their archetypical “business.” As mentioned several times in this course, per-unit cost is an independent variable in manufacturing regimes. You can set it to whatever you want, by capitalizing the manufacturing process. Software development is not manufacturing, it is engineering: you get to set two of good, fast, and cheap, and the third can be estimated and influenced but not controlled. The third factor, whatever that is, is influenced by capitalization: but management that is happy to capitalize a $15/hr. assembly line worker at $1M in production equipment is for some reason resistant to capitalizing a $65/hr. software developer at more than a $5k computer and a $2k/yr. Visual Studio subscription. Source code static analyzers? Assembly reference analyzers? API trackers? Test code and branch coverage measurement tools? Who needs those? Imagine if your dentist had the same attitude regarding capitalization of a professional activity: no adjustable chair, or high-intensity lamp, or high-speed drill, or X-Ray imaging system, or irrigation and suction ... just a brace and bit, and a pair of slip-joint pliers. With or without capitalization, the accuracy of the estimate of the third factor depends a great deal on how much you are willing to spend on estimating, and an off-the-cuff WAG or SWAG is worth every penny you paid for it. And there is a vast difference between an estimate and a quote, one management ignores at its peril: if nothing else, a quote factors in risk, and an estimate is typically “if all goes well.”
The concept of “you have to pay for what you get” is actually foreign to current management cultures. Management is regarded as successful exactly for not paying for what it gets. Exporting costs through externalization, creating the tragedy of the commons, and other techniques to decouple price paid from costs incurred are all regarded as management success. Software development is typically funded with income and expense statement money, not balance sheet money, and this causes it to be starved of resources. Software development, at least project-based software development, is actually a form of capital accumulation: durable software artifacts capitalize the organization’s operations. So while you don’t want to be wasteful, you also don’t want to withhold.
So management, mistaking software development for manufacturing, resists tenaciously anything that does not directly result in deliverable product. This includes requirements, designs, functional tests, tooling, usage tests, internal documentation, and of course searching for defects to remove. (Seriously. I interviewed at one place where the standard practice was not giving requirements to their developers. Requirements and desired features were settled on after the product was developed, to be “responsive to customer demand.” Oddly, the products never met the requirements or had the desired features, and the business was quite infuriated by that.)
Management has traditionally been very hostile to Fagan Inspections, except in CMMI Level 5 organizations where they understand these things because they make actual measurements and have actual numbers to look at. Inspections look like gross featherbedding: just review one another’s work the normal way, maybe take a little more care, and you don’t need all these six- and eight-person two-hour meetings for each little thing. But appearances are deceiving, and actual studies have shown that Fagan Inspections are in fact the best software development productivity improvement technology known to the industry. Catching defects early, when they are introduced, makes them cheap; and that repays all the investment several times over. Software developers are human beings: they will introduce defects; you need to deal with it. Fagan Inspections deal with introduced defects in the most definitive way known.
On the other hand, most software development management questions even testing — Just don’t make mistakes, then you don’t need to test! Brilliant! Where’s my bonus? — so asking them to take Inspections on faith is probably a little too much. Unfortunately, even presenting the results of small trials is easily ignored — the trials were small, so can be regarded as easy, so the positive results can be regarded as meaningless: evidence can always be interpreted as consistent with a set of preconceptions. (Have you ever received one of those infuriating performance reviews? “The pieces you developed all slipped into place without a problem and worked without a hitch, while everyone else’s pieces had to be hammered in with great difficulty and needed tons of bug fixes. Obviously you got all the easy pieces. You meet needs, everyone else gets a big raise for going above and beyond.”) There is no good answer, and management can easily — and perhaps even inadvertently, although I frequently question that — hoodwink itself by believing its organization is special (and thus atypical) in some way. “Your organization is just as mediocre as everyone else’s” is a hard message to hear, even — or perhaps especially — when its truth is obvious.
Waterfall software development projects have been extensively studied and modeled. Famous results include Boehm’s COCOMO model and Putnam’s SLIM model, although there are many others. Although the details of these models differ — not surprising, as they deal with different development regimes — cost models typically have two features that might be surprising:
Putnam’s work in particular indicates that the “premium” is an inverse 4th power relationship to duration: delivering in half the time costs 16 times as much, or has a run rate 32 times as large, as does the nominal schedule. Boehm’s work in particular indicates that there is a minimum duration schedule, also: that no matter how much effort is applied, there is a schedule that cannot be further compressed. Boehm finds the minimum schedule is 75% of what he calls “nominal,” but Boehm’s “nominal” is DoD weapons system development, where there is already a premium paid for schedule compression; a more reasonable estimate for commercially developed software is 50% of what you and I would call “nominal.” Although extrapolation to where there is little data is dangerous, and very few people try stretching out the schedule, the minimum cost schedule seems to be about 200-400% of what most would call “nominal,” as long as staffing is an integer number of workers. (People cannot actually multitask.)
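The 4th-power trade-off is easy to tabulate. The sketch below encodes effort as proportional to (nominal duration / compressed duration) to the 4th power, as the text reads Putnam; the 24-month, 100 person-month project is purely illustrative:

```python
# Putnam's 4th-power schedule/effort trade-off, illustrative numbers:
# a nominal 24-month, 100 person-month project.
nominal_months = 24.0
nominal_effort = 100.0   # person-months

def effort(months):
    # effort scales as the 4th power of schedule compression
    return nominal_effort * (nominal_months / months) ** 4

for fraction in (1.0, 0.75, 0.5):
    t = nominal_months * fraction
    e = effort(t)
    run_rate = e / t     # staffing burn per month
    print(f"{t:4.1f} months: effort {e:7.1f} pm, run rate {run_rate:6.2f} pm/month")
```

Halving the schedule multiplies total effort by 16 and the monthly burn rate by 32, which is why "just add people and ship in half the time" is not a negotiation but a 16x purchase order.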
Putnam’s work also finds there is a maximum effective staffing level, and that level depends on where in the project one is. This is actually to be expected: some parts of the project, such as coding to a detailed design, are embarrassingly parallel, and others, such as architecture, design, or integration testing, are inherently sequential. Each task in the development process has “fan in” (other tasks that must be complete for the task to start) and “fan out” (other tasks that cannot start until this task is done); because the project both starts and ends with single tasks (“decide to do it” and “turn it on”), the amount of work available to be done at any time obviously rises and then falls. Putnam found this function to be well approximated by a Rayleigh distribution, which has a steep peak and a long tail. Adjusting staffing to the available work is one way a waterfall project can deliver much more quickly than the equivalent agile project.
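The shape of that staffing curve is easy to sketch numerically. The effort total and peak month below are invented for illustration; only the Rayleigh form itself comes from Putnam.

```python
import math

def rayleigh_staffing(total_effort: float, t_peak: float, t: float) -> float:
    """Staffing rate at time t when cumulative effort follows a Rayleigh
    curve peaking at t_peak (the shape Putnam observed)."""
    a = 1.0 / (2.0 * t_peak ** 2)
    return total_effort * 2.0 * a * t * math.exp(-a * t * t)

# 100 person-months of effort peaking in month 4: steep rise, long tail.
profile = [round(rayleigh_staffing(100.0, 4.0, month), 1) for month in range(13)]
print(profile)
```

Note that staffing starts at zero, climbs steeply to the peak, and then decays slowly: a fixed-headcount team is overstaffed at both ends and understaffed in the middle.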
The nature of the work to be done also varies over time. Architects will be idle except as high-powered analysts once architecture is complete; analysts will be idle except as high-powered coders once detailed design is done; coders will be idle once all code has been assigned. Testers will be idle until there is code, and will then be increasingly busy until the system is very near completion, when testing becomes more sequential. There will of course be rework, but (one hopes) rework will be minor compared to mainline production.
Therefore “best” staffing in waterfall projects is quite dynamic. The overall nature of the work changes as the project progresses; the amount of work that is available to do changes, profoundly, as the project progresses; the amount of inter-worker coordination required starts and ends very high, and minimizes during the code production phase; and so forth. Actual organizations typically cannot adjust to even a doubling in staffing, much less the order of magnitude needed to fully exploit the parallelism in waterfall’s intermediate phases, or to the constant churn in specialties called for; this naturally stretches out project duration, as any sensible fixed staffing level will fall substantially short of that needed to achieve minimum duration, and as “jacks of all trades” must be used instead of a parade of highly skilled specialists being introduced, oriented, used for a short period of time, and retired from the project. A very large, very active software development organization — one where there is always another project needing each specialty — might be able to optimally staff waterfall projects; but few others.
For the rest of us, there is project management software. Typical project management software lets you plan a project as a network of tasks, with inter-task sequencing dependencies: this task cannot start until these other tasks are complete, or that external event occurs. Each task has an estimated duration, typically dependent on labor content. The critical-path algorithm at the heart of PERT can then determine the “critical path”: the sequence of tasks that takes the longest, and so drives the overall schedule. Tasks not on the critical path each have “slack,” which lets them be delayed without affecting the overall schedule. The staffing demand will closely approximate Putnam’s Rayleigh distribution; if it does not, you have forgotten tasks and dependencies. Tasks can be assigned to identified labor resources (named personnel) or to categories of labor resources (programmer, tester). A “labor leveling” algorithm can then adjust task start dates to reflect the limits of the labor pool involved. The result will be an optimistic, “if all goes well” schedule — one that reflects the quality of the task labor estimates, which are always themselves optimistic, and the quality of the task list, which is always incomplete. Actually planning more than a few steps into the future is beyond human capabilities, and creative task time estimates are intrinsically WAGs.
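The core critical-path computation is small enough to sketch: a forward pass gives each task its earliest finish, a backward pass gives its latest finish, and the difference is its slack. The task names, durations, and dependencies below are invented for illustration; real plans have hundreds of tasks.

```python
# Minimal forward/backward-pass critical-path calculation over a task
# network, the computation at the heart of PERT-style tools.
from graphlib import TopologicalSorter

duration = {"reqs": 6, "arch": 4, "code": 8, "docs": 2, "test": 5}
preds = {"reqs": [], "arch": ["reqs"], "code": ["arch"],
         "docs": ["arch"], "test": ["code", "docs"]}

order = list(TopologicalSorter(preds).static_order())

# Forward pass: earliest finish = duration + latest-finishing predecessor.
ef = {}
for t in order:
    ef[t] = duration[t] + max((ef[p] for p in preds[t]), default=0)
project_length = max(ef.values())

# Backward pass: latest finish that does not delay the project.
succs = {t: [s for s in preds if t in preds[s]] for t in duration}
lf = {}
for t in reversed(order):
    lf[t] = min((lf[s] - duration[s] for s in succs[t]), default=project_length)
slack = {t: lf[t] - ef[t] for t in order}

print(project_length)                        # 23
print([t for t in order if slack[t] == 0])   # the critical path
print(slack["docs"])                         # "docs" can slip 6 units freely
```

Delaying “docs” by up to its slack changes nothing; delaying any zero-slack task delays delivery one-for-one, which is exactly why the critical path drives the schedule.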
Do not believe the schedule that comes out of the project management software: “in theory, there is no difference between theory and practice.” Project management software does not tell you when the project will finish: it tells you the earliest date for which you cannot prove the project will not yet be done. (Think about that one for a minute or ten. It’s important.)
As mentioned elsewhere in this course, research cannot be scheduled. Most software actually solves new problem variations: if you can solve your problem with off-the-shelf software, you are a fool to build new software yourself. Therefore research into your problem is needed. Research in software development typically involves prototypes and proof of concept code. As Fred Brooks notes, “plan to throw one away; you will anyhow.” The prototype or proof of concept is the throwaway. During analysis and design, there will be half a dozen places where you realize you actually have no idea how to do what needs to be done there. This is where you need to prototype. Start prototypes early, because you need them to be done — and you, by definition, have no idea how long it will take to do them — when it comes time to do it for real. Prototypes and proof of concept code are in addition to product development and are not part of the project schedule, even though they consume project resources. So yes, your most senior personnel will be off figuring out how to do those few things, just when you need them for architecture and designs. No one said it was easy.
When expense is literally no object — as with the Manhattan Project during World War II — the question of “sensible” staffing levels disappears, and waterfall can reach its peak production rate. There are other tricks, taken from military “crash” projects, that can even further pull in delivery. These are:
Overtime: working longer days, harder, but definitely not smarter. This is the least effective technique, and beloved of project managers working with fixed salary personnel because it hides the costs in the future. Humans cannot actually keep focused 40 hours a week, but can typically sequence work to keep the creative parts in the 20 to 30 hours a week they can focus, and use the remaining time for uncreative rote work. Beyond 50 or so hours, however, error rates skyrocket, even in rote work, leading to net negative productivity in the extra labor.
Multiple shifts: there are 24 hours in a day, and shift work can (if handled well, which is itself difficult) make them all productive. Shift changeover in creative work typically takes an hour or two of handoff, as the personnel going off duty bring the fresh personnel up to speed on the balls in the air at the moment. Quality can become an issue: balls will get dropped in the handoff.
Racing implementations: there is an amount of randomness in the time required for creative production, and this randomness can be exploited by assigning the same work to several teams or persons, and taking the one that finishes first. This turns the task duration from a single observation of the task duration distribution into an order statistic of that distribution, which can be a significant benefit if the distribution has a large variance. There are clearly morale implications, though, of workers seeing their work repeatedly discarded just as it is nearing completion; there are some reports that this can be alleviated to some extent by treating the race protocol as an inter-team rivalry.
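The order-statistic effect is easy to see in simulation. The lognormal duration distribution and its parameters below are invented purely to provide the high variance that makes racing worthwhile; nothing here is calibrated to real task data.

```python
# Monte Carlo sketch of "racing" a task: with a high-variance duration,
# the minimum of n independent attempts finishes well before a single
# attempt does, on average.
import random

random.seed(1)

def one_attempt() -> float:
    # Heavy-tailed, high-variance task duration (parameters illustrative).
    return random.lognormvariate(2.0, 0.8)

def mean_finish(teams: int, trials: int = 20_000) -> float:
    # Expected duration when `teams` teams race and the first finisher
    # wins: the minimum order statistic of the duration distribution.
    return sum(min(one_attempt() for _ in range(teams))
               for _ in range(trials)) / trials

solo = mean_finish(1)
raced = mean_finish(3)
print(round(solo, 2), round(raced, 2))  # the race finishes markedly sooner
```

The benefit shrinks as the variance shrinks; racing a rote, predictable task buys almost nothing for triple the labor.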
Fragmentation: tasks were identified on the basis of indivisibility (each task is a single unit of work), but it is sometimes possible to fragment a task into multiple units, which can then be worked on in parallel and integrated when the individual units are complete. There are risk and quality implications to this approach, as conceptual integrity is sacrificed. Some of the time, this approach will simply fail; other times, it may appear to succeed; other times, it may actually succeed. Racing an unfragmented implementation against a fragmented implementation of a given task, plus exhaustive testing, is a wise schedule precaution. Using the fragmented implementation to satisfy immediate downstream dependencies during development, and replacing it with the unfragmented implementation when it becomes available, is typically a wise quality precaution.
Early start and rework: development of one component can be started before work is complete on predecessor components, with concomitant rework when the predecessor components stabilize. For example, code can serve to satisfy downstream dependencies as soon as it is initially developed, and then be corrected as unit tests find defects. Each correction then breaks downstream code whose assumptions about its environment have become incorrect: rework. In commercial software development, this technique reaches its zenith in “concurrent development,” where hardware and the software using the hardware are developed simultaneously. Whether this approach actually increases delivery speed is debated; that it increases frustration is not.
Using any of these techniques dramatically increases risk, worker burn-out rates, technical debt, and of course immediate cost. These should be reserved for meeting truly existential issues, and if being first to market is a truly existential issue you have other, much larger, problems.
Waterfall, as a process, is very vulnerable to bad management; and most management, as with most everything else, is of average quality, with half of it worse than that. The impact of bad management becomes visible at the end of the process, when the software cannot be delivered: typically everything looks (or can be made to look) good until about 90% of the way through the schedule, when the façades break down because final conformance reports cannot be produced because the software cannot run because the software cannot be built because most of the software does not actually exist yet. Suppressing the preliminary design and analysis work merely pushes the effective work off past the date the failure is recognized, into the “can we salvage something from this mess?” phase, where it becomes maintenance rather than development and costs five to ten times as much as it should. Basically, a mismanaged waterfall project fails, and is frequently followed by a much longer agile project trying to salvage something useful.
Agile, in contrast, is very vulnerable to a bad product owner and moderately vulnerable to a bad scrum master, but vulnerable to very little else; and because of the short cycle times, troubling symptoms appear almost immediately in the form of “follow-on user stories” and rejected work. Agile also maintains software into existence intentionally, rather than as a disaster response, and so avoids the massive sticker shock of a failed waterfall project. It is obvious from the start that an agile project produces at a low rate; and because at any point there is running software, it is close to impossible to delude oneself as to the state of the product.
Waterfall development is a project management discipline, and so it deals in projects. Projects have defined deliverables, and projects also have ends. What to do with your development team after they deliver is an actual question that needs an actual answer in waterfall development. Announced plans to, or a history of, penalizing success with layoffs or ended contracts set up a powerful negative incentive to progress; and doing so unexpectedly is something you get to do exactly once. Thereafter it ensures that you, like companies requiring but not sponsoring security clearances, will be stuck forever with workers who cannot get jobs elsewhere.
Financially, waterfall produces capital assets, which in turn capitalize the business processes they support, making them more effective and efficient, increasing the business throughput of the organization. Within this framework, and while acknowledging the potential pitfalls, waterfall is an effective, economic software development strategy. Just don’t believe anything that purports to be a schedule that comes out of a tool: “it’s tough to make predictions, especially about the future.” When your grizzled veteran developer gets a far-away look in his eye and says “we are five quarters from delivery” when the tool says six months, you are in fact five quarters from delivery. Start taking notes on all the things you wind up having to do that you forgot to put in the project plan, so you can put them into the next project plan.