The role of object-oriented metrics
Bertrand Meyer
A shorter variant of this article appeared in Computer (IEEE), as part of the Component and Object Technology department, in the November 1998 issue.
The request I hear most commonly, when talking to project managers using object technology or about to use it, is for more measurement tools. Some of those people would kill for anything that can give them some kind of quantitative grasp on the software development process.
There is in fact an extensive literature on software metrics, including for object-oriented development, but surprisingly few publications are of direct use to actual projects. Those that are often go back quite far in time; an example is Barry Boehm's 1981 Software Engineering Economics (Prentice Hall), with its COCOMO cost prediction model: despite the existence of many more recent works on the subject, it is still among the most practical sources of quantitative information and methodology.
Metrics are not everything. Lord Kelvin's famous observation that «when you cannot measure, when you cannot express [what you are speaking about] in numbers, your knowledge is of a meager and unsatisfactory kind: you have scarcely, in your thoughts, advanced to the stage of a science» is exaggerated. Large parts of mathematics, including most of logic, are not quantitative; but we don't dismiss them as non-scientific. This also puts in perspective some of the comments published recently in this magazine (July 1998) by Walter Tichy and Marvin Zelkowitz on the need for more experimentation, which were largely a plea for more quantitative data. I agree with their central argument - that we need to submit our hypotheses to the test of experience; but when Tichy writes
«Zelkowitz and Wallace also surveyed journals in physics, psychology, and anthropology and again found much smaller percentages of unvalidated papers [i.e. papers not supported by quantitative evaluation] than in computer science», one cannot help thinking: physics, OK - but do we really want to take psychology as the paragon of how «scientific» computer science should be? I don't think so. In an engineering discipline we cannot tolerate the fuzziness that is probably inevitable in social sciences. If we are looking for rigor, the tools of mathematical logic and formal reasoning are crucial, even though they are not quantitative.
Still, we need better quantitative tools. Numbers help us understand and control the engineering process. In this column I will present a classification of software metrics and five basic rules for their application.
Types of metrics
The first rule of quantitative software evaluation is that if we collect or compute numbers, we must have a specific intent related to understanding, controlling or improving software and its production.
This implies that there are two broad kinds of metrics: product metrics, which measure properties of the software products, and process metrics, which measure properties of the process used to obtain these products.
Product metrics fall into two categories: external product metrics, which cover properties visible to the users of a product, and internal product metrics, which cover properties visible only to the development team. External product metrics include:
- Product non-reliability metrics, assessing the number of remaining defects.
- Functionality metrics, assessing how much useful functionality the product provides.
- Performance metrics, assessing a product's use of available resources: computation speed, space occupancy.
- Usability metrics, assessing a product's ease of learning and ease of use.
- Cost metrics, assessing the cost of purchasing and using a product.
Internal product metrics include:
- Size metrics, providing measures of how big a product is internally.
- Complexity metrics (closely related to size), assessing how complex a product is.
- Style metrics, assessing adherence to writing guidelines for product components (programs and documents).
Process metrics include:
- Cost metrics, measuring the cost of a project, or of some project activities (for example original development, maintenance, documentation).
- Effort metrics (a subcategory of cost metrics), estimating the human part of the cost and typically measured in person-days or person-months.
- Advancement metrics, estimating the degree of completion of a product under construction.
- Process non-reliability metrics, assessing the number of defects uncovered so far.
- Reuse metrics, assessing how much of a development benefited from earlier developments.
Internal and external metrics
The second rule is that internal product metrics and process metrics should be designed to mirror relevant external metrics as closely as possible.
Clearly, the only metrics of interest in the long run are external metrics, which assess the result of our work as perceived by our market. Internal product metrics and process metrics help us improve this product and the process of producing it. They should always be designed so as to be eventually relevant to external metrics.
Object technology is particularly useful here because of its seamlessness properties, which reduce the gap between problem structure and program structure (the «Direct Mapping» property). In particular, one may argue that in an object-oriented context the notion of function point, a widely accepted measure of functionality, can be replaced by a much more objective measure: the number of exported features (operations) of relevant classes, which requires no human decision and can be measured trivially by a simple parsing tool.
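To suggest how mechanical such a count can be, here is a minimal sketch, with Python standing in for an object-oriented language: it counts the public methods of each class as the analogue of exported features. The underscore convention for "non-exported" and the function names are illustrative assumptions of this sketch, not part of the measure described above.

    # Illustrative sketch only: count public methods of classes as the
    # analogue of exported features, using the standard ast module as
    # the "simple parsing tool".
    import ast
    import sys

    def exported_feature_count(source: str) -> int:
        """Count public methods (no leading underscore) across all classes."""
        count = 0
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.ClassDef):
                for item in node.body:
                    if (isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                            and not item.name.startswith("_")):
                        count += 1
        return count

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            with open(path) as f:
                print(path, exported_feature_count(f.read()))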
Designing metrics
The third rule is that any metric applied to a product or project should be justified by a clear theory of what property the metric is intended to help estimate.
The set of things we can measure is infinite, and most of them are not interesting. For example I can write a tool to compute the sum of all ASCII character codes in any program, modulo 53, but this is unlikely to yield anything of interest to product developers, product users, or project managers.
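For the record, such a tool really is trivial to write; a few lines of Python suffice, which only underlines the point that ease of measurement says nothing about relevance.

    def ascii_sum_mod_53(program_text: str) -> int:
        # Trivially computable, yet it estimates no property anyone cares about.
        return sum(ord(c) for c in program_text) % 53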
A simple example is a set of measurements that we performed some time ago on the public-domain EiffelBase library of fundamental data structures and algorithms, reported in the book Reusable Software (Prentice Hall). One of the things we counted was the number of arguments to a feature (attribute or routine) over 150 classes and 1850 features; we found an average of 0.4 and a maximum of three, with 97% of the features having two or less. We were not measuring this particular property blindly: it was connected to a very precise hypothesis, that the simplicity of such interfaces is a key component of the ease of use and learning (and hence the potential success) of a reusable component library. These figures show a huge decrease compared to the average number of arguments for typical non-OO subroutine libraries, often 5 or more, sometimes as much as 10. (Note that a C or Fortran subroutine has one more argument than the corresponding OO feature.)
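A minimal sketch of the summary statistics quoted above, assuming the per-feature argument counts have already been extracted (for example by a parsing tool like the one sketched earlier); the names here are illustrative assumptions.

    from statistics import mean

    def argument_summary(arg_counts: list[int]) -> dict:
        # For EiffelBase the text reports roughly: average 0.4, maximum 3,
        # and 97% of features taking two arguments or fewer.
        return {
            "average": mean(arg_counts),
            "maximum": max(arg_counts),
            "share_two_or_less": sum(c <= 2 for c in arg_counts) / len(arg_counts),
        }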
Sometimes people are skeptical of the reuse claims of object technology; after all, their argument goes, the idea of reuse has been around for a long time, so what's so special about objects? Quantitative arguments such as those provided by the EiffelBase measurements give some concrete evidence to back the OO claims.
The third rule requires a theory, and implies that the measurements will only be as good as the theory. Indeed, the correlation between a small number of feature arguments and ease of library use is only a hypothesis. Authors such as Zelkowitz and Tichy might argue that the hypothesis must be validated through experimental correlation with measures of ease of use. They would have a point, but the first step is to have a theory and make it explicit. Experimental validation is seldom easy anyway, given the setup of many experiments, which often use students under the sometimes dubious assumption that their reactions can be used to predict the behavior of professional programmers. In addition, it is very hard to control all the variables. For example I recently found out, by going back to the source, that a nineteen-seventies study often used to support the use of semicolons as terminators rather than separators seemed to rely on an unrealistic assumption, which casts doubt on the results.
Two PhD theses at Monash University, by Jon Avotins and Glenn Maughan under the supervision of Christine Mingins, have applied these ideas further by producing a «Quality Suite for Reusable Software». Starting from several hundred informal methodological rules in the book «Reusable Software» and others, they identified the elements of these rules that could be subject to quantitative evaluation, defined the corresponding metrics, and produced tools that evaluate these metrics on submitted software. Project managers or developers using these tools can assess the values of these metrics on their products.
In particular, you can compare the resulting values to industry-wide standards or to averages measured over your previous projects. This brings us to the fourth rule, which states that measurements are usually most useful in relative terms.
Calibrating metrics
More precisely, the fourth rule is that most measurements are only meaningful after calibration and comparison to earlier results.
This is particularly true of cost and reliability metrics. A sophisticated cost model such as COCOMO will become more and more useful as you apply it to successive projects and use the results to calibrate the model's parameters to your own context. As you move on to new projects, you can use the model with more and more confidence based on comparisons with other projects.
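As a concrete illustration of what such calibration can mean, here is a minimal sketch, not Boehm's full model: it assumes the basic COCOMO form effort = a * size^b (effort in person-months, size in KLOC) and refits the two constants to your own completed projects with a least-squares fit on logarithms.

    import math

    def fit_cocomo(sizes_kloc: list[float], efforts_pm: list[float]) -> tuple[float, float]:
        """Return (a, b) such that effort is approximately a * size**b."""
        xs = [math.log(s) for s in sizes_kloc]
        ys = [math.log(e) for e in efforts_pm]
        n = len(xs)
        x_mean, y_mean = sum(xs) / n, sum(ys) / n
        b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
            sum((x - x_mean) ** 2 for x in xs)
        a = math.exp(y_mean - b * x_mean)
        return a, b

    def estimate_effort(a: float, b: float, size_kloc: float) -> float:
        return a * size_kloc ** b

With a and b refitted from your own history, estimates for the next project inherit whatever systematic biases your organization actually exhibits, which is precisely the point of calibration.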
Similarly, many internal product metrics are particularly useful when taken relatively. Presented with an average argument count of 4 for your newest library, you will not necessarily know what it means: good, bad, irrelevant? Assessed against published measures of goodness, or against measures for previous projects in your team, it will become more meaningful. Particularly significant are outlying points: if the average value for a certain property is 5 with a standard deviation of 2, and you measure 10 for a new development, it's probably worth checking further, assuming of course (rule 3) that there is some theory to support the assumption that the measure is relevant. This is where tools such as the Monash suite can be particularly useful.
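The outlier check just described is easy to automate; the sketch below flags a new measurement that lies more than a chosen number of standard deviations from the mean of earlier measurements. The default threshold of 2 is an illustrative assumption, not a recommendation.

    from statistics import mean, stdev

    def worth_checking(history: list[float], new_value: float, threshold: float = 2.0) -> bool:
        # Flag values far from the historical mean: 10 against a mean of 5 and a
        # standard deviation of 2 is 2.5 deviations away, so it gets flagged.
        mu, sigma = mean(history), stdev(history)
        return abs(new_value - mu) > threshold * sigma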
Metrics and the measuring process
The fifth rule is that the benefits of a metrics program lie in the measuring process as well as in its results.
The software metrics literature often describes complex models purporting to help predict various properties of software products and processes by measuring other properties. It also contains lots of controversy about the value of the models and their predictions. But even if we remain theoretically skeptical of some of the models, we shouldn't throw away the corresponding measurements. The very process of collecting these measurements leads (as long as we confine ourselves to measurements that are meaningful, at least by some informal criteria) to a better organization of the software process and a better understanding of what we are doing. This idea explains the attraction and usefulness of process guidelines such as the Software Engineering Institute's Capability Maturity Model, which encourage organizations to monitor their processes and make them repeatable, in part through measurement. To quote Emmanuel Girard, a software metrics expert, in his advice for software managers: before you take any measures, take measurements.