Software Measurement and Estimation: A Practical Approach
Review by Howard, published 20 February in Kybernetes.

Some organizations use these measures to drive process improvement. For example, the Software Engineering Institute (SEI) software maturity model requires the measurement of system size, project time, level of effort, and software defects.

SEI integrates these measures with the required processes in support of Project Management and Continuous Improvement. We consider the SEI set, along with productivity, to be the minimal set for any organization. To reiterate, we consider the minimal set to be: system size, project duration, effort, defects, and productivity. Different industries may have their own standards for metrics, reliability, and safety. The mechanism for collecting the metrics data must be well understood and agreed to before implementing the program.

Goal Question Metric. Mechanism: This includes identifying who will be responsible for ensuring the collection and reporting of valid data, how frequently the data will be collected, how frequently the data will be reported, and what infrastructure (e.g., tools, databases, and support staff) will be needed.

Data is incomplete or invalid because no one has ensured that it is entered in a timely and correct manner. Data is unavailable when needed. Project budgets are overrun due to the cost of the metrics program infrastructure. Project schedules are overrun due to unplanned staff time for data entry and validation. First, what to measure certainly varies based on the current position in the software development and software product lifecycles. For example, code inspection metrics are collected and monitored during the code development time in the lifecycle.

Reliability of the software may need to be measured in the early stages of product delivery and deployment, while cost to maintain might be the area of interest when a product is near the end of its life.

Second, business needs change over time and the metrics program must change to remain in alignment. For example, if customer surveys show dissatisfaction with product reliability, a system availability metric may need to be created and monitored. If competitors are beating our products to market with similar functionality, we may need to establish development process measures that will allow us to focus on the most time consuming areas in order to drive improvement.

This may necessitate selecting a different metric that supports the goal or changing the way the existing metric is calculated. Gaining agreement from all stakeholders at the start will ensure that the metrics needed to make decisions and assess goal attainment are available when and where they are needed.

Every metrics program should be revisited regularly to ensure continuing alignment with changing business and project needs. All of your friends, including your boss, think they look great. You present them to your director. She is not thrilled. What might have gone wrong? He is responsible for system testing. What decisions might he need to make? What might be some reasonable metrics? You decide to start with your own job and lead by example.

R. Kaplan and D. Norton; V. Basili, G. Caldiera, and H. Rombach; J. McGarry, D. Card, C. Jones, B. Layman, W. Clark, J. Dean, and F. Hall. What rules did you decide to use? As with most everything, it depends on how you count and what the rules are. Did you count comment lines? What does complexity even mean? How do you measure productivity? And what is that productivity? Better yet, if someone else can program the same function in one line of code, in one hour, what is their productivity?

Whose productivity is better? It is also a relatively new discipline: we still are learning. A model is an abstraction, which strips away unnecessary details and views an entity or concept from a particular perspective.

Models allow us to focus on the important parts, ignore those that are irrelevant, and hypothesize and reason about an entity. Models make measurement possible. We must have models of whatever we want to measure. For example, say we want to know how much of the total system development effort is testing. If our model starts with unit test by the programmer, it is a different model and will give different results than one that includes only system test. There are three types of models you can use—text, diagrammatic, and algorithmic—that is, words, pictures, and numbers.

In general, effort is a function of size and results in cost. Features: The requirements of the product to be developed. Size: The magnitude of the product to be developed. In general, size is a function of features. Defects: The incompleteness of the product. In general, defects are a function of size and schedule. Schedule: The total development time; completion times for principal milestones.

In general, schedule is a function of effort and resources. Resources: The number of developers applied to the product development. This text model has advantages and disadvantages. But notice that this text model describes software development in such a way that we can discuss it, measure it, and predict it: if the size changes, the number of defects will change.
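The relationships in this text model (effort as a function of size, schedule as a function of effort) can also be written algorithmically. As an illustrative sketch only, the functions below use the basic COCOMO constants that Boehm published for "organic" projects; any real organization would calibrate its own coefficients:

```python
# Sketch of an algorithmic model of development: effort as a function of
# size, and schedule as a function of effort. The constants are the basic
# COCOMO "organic mode" values (Boehm, 1981), used purely for illustration.

def effort_staff_months(kloc, a=2.4, b=1.05):
    """Predicted effort in staff-months from size in thousands of LOC."""
    return a * kloc ** b

def schedule_months(effort, c=2.5, d=0.38):
    """Predicted calendar duration in months from effort in staff-months."""
    return c * effort ** d

if __name__ == "__main__":
    e = effort_staff_months(32)   # a hypothetical 32 KLOC product
    s = schedule_months(e)
    print(f"effort   ~ {e:.1f} staff-months")
    print(f"schedule ~ {s:.1f} months")
```

Note how the model makes the text's claim concrete: change the size input and the downstream predictions change with it.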

Text models based on metaphors tend to work well, due to the breadth of meaning we associate with metaphors. The downside is that these models, like all models, can limit our creative thinking as they structure it [2].

Some examples of text model metaphors for software development are: the Wild, Wild West; Agile Development (both a metaphor and a name); the Death March; and the Software Factory. Notice how each metaphor evokes a different mental image and response. You probably can envision the environment, the types of people, and the processes from just those few words. Another famous one is the mythical man-month of Brooks [3]. What is it? Diagrammatic models allow you to model the entities, the relationships between them, and their dynamics.

Use one of the formal diagram modeling techniques if you will be doing extensive modeling. Figure 3. Diagrammatic model of software development. What would it look like? Although a picture may be worth a thousand words, sometimes using the right words is best.

In the right situations, algorithmic models can be extremely powerful, as they can clearly describe the relationship between entities. As an example, consider response time: the metric is the average response time (RT) within a typical hour. Diagrammatic: see Figure 3. Response time model. We must have a model of whatever we are measuring. Models document entities within a system and their interactions in a consistent manner. We cannot interpret data without reference to an underlying model, nor can we really reason about a system without an underlying model.
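The algorithmic form of the response time metric is a one-line computation over the hour's samples. The sample values below are hypothetical:

```python
# The response time (RT) metric: the average RT over the samples observed
# in a typical hour. The sample values are hypothetical.
rt_samples_ms = [120, 95, 130, 110, 145, 100]   # one hour of RT samples, in ms

average_rt_ms = sum(rt_samples_ms) / len(rt_samples_ms)
print(f"average RT = {average_rt_ms:.1f} ms")
```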

Models allow us to consider improvement strategies and predict the expected results. The Pantometric Paradigm [6] is a simple method to produce a purely visual and quantitative model of anything within the material world. You can use it to create an initial model that can evolve to meet your needs. The simple process is: (1) strip away all extraneous information; (2) visualize it on a piece of paper or in your head; (3) divide it, in fact or in your imagination, into equal parts; and (4) measure it (e.g., count the parts). You can now manipulate it, reason about it, experiment with it, and evolve it.

One model could be the lectures. Another might be the students taking the class. If it were the lecture, the model could be a list of the lectures by week.

We could then measure the number of lectures and the number of lectures by topic. An example of this methodology, using the same response time example, is shown in Figure 3 (meta-model for metrics; example using the meta-model for response time). What you measure is what you get. Or, more accurately, it is what you get people to do. Measurement typically causes people to focus on whatever it takes to score well on that measurement. You can easily cause unproductive behavior as people try to look the best on the wrong metrics.

If you measure people by the number of lines of code they produce, they will create a larger system. If you measure people by the number of defects they create, the volume of defects will somehow start to decrease. I worked as an accountant in a paper mill where my boss decided that it would improve motivation to split a bonus between the two shifts based on what percentage of the total production each one accomplished.

The workers quickly realized that it was easier to sabotage the next shift than to make more paper. Co-workers put glue in locks, loosened nuts on equipment so it would fall apart, you name it. The bonus scheme was abandoned after about ten days, to avoid all-out civil war. In the classic Hawthorne studies, researchers studied manufacturing and believed that if the environment was changed in certain ways (such as more light), it would improve productivity.

And they were right. Productivity improved. Then they changed another factor and measured the result. Then they changed it back to the original state. What was the conclusion? Whatever management paid attention to and measured improved. The difference was not the changes. It was the attention and measurement. We only report that it occurs. It drives behavior. If you ever question it, remember that when your 5th grade teacher told you that you had to write neatly to get an A, you did your best to write neatly.

Measurement theory is a branch of applied mathematics. It formalizes our intuition about the way the world actually works. In this theory, intuition is the starting point for all measurement [1]. Any data manipulation of the measurements must preserve the relationships that we observe between the real-world entities that we are measuring. Measurement theory allows us to validly analyze and manipulate our measurement data.

As an example, consider a customer satisfaction survey. Your users were asked what they thought of your customer support. The possible answers were: 1—Excellent, 2—Good, 3—Average, 4—Inadequate, 5—Poor. The result was that you ended up with a score of 3. So, you have average support, and it means you just need to improve a bit, right? Well, maybe. Or maybe not. Suppose half of your users answered Excellent and the other half answered Poor: the mean is 3, yet no one thought your customer support was average.

Measurement theory dictates that taking the average of this kind of data is invalid and can give misleading results. Measurement theory allows us to use statistics and probability to understand quantitatively the possible variances, ranges, and types of errors in the data.

Assume that you are responsible for estimating the effort and schedule for a new project. You use a prediction model for development effort, and it predicts that, on average, the effort will be 10 staff years with a duration of 1 year. What would you then tell your boss as your estimate? Maybe you would pad it by one staff year, just to be safe. What if you knew that the standard deviation is 2 staff years? Now what would you estimate?

Or 10? What if, instead, you knew that the standard deviation was 4 months? Then what would you estimate? By the way, if we were the boss, we would prefer an answer that included the range with probabilities. We would want the additional information to better trade-off the risks and the rewards. What is the difference? Which is better?
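A minimal sketch of reporting the estimate as a range with probabilities, assuming the prediction error is roughly normally distributed (an assumption for illustration, not something the chapter asserts), using the example's mean of 10 staff-years and SD of 2:

```python
# Turning a point estimate into a range with probabilities, assuming a
# roughly normal error distribution: mean 10 staff-years, SD 2 staff-years.
from statistics import NormalDist

effort = NormalDist(mu=10, sigma=2)   # staff-years

low, high = effort.mean - effort.stdev, effort.mean + effort.stdev
p_within_one_sd = effort.cdf(high) - effort.cdf(low)   # ~68% of outcomes

p_over_13 = 1 - effort.cdf(13)   # chance the project needs > 13 staff-years

print(f"68% range: {low:.0f}-{high:.0f} staff-years")
print(f"P(effort > 13) = {p_over_13:.2f}")
```

An answer of the form "8 to 12 staff-years, with about a 7% chance of exceeding 13" gives the boss far more to work with than a padded point estimate.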

The balance scale is a relative scale: it compares the weights of objects. The bathroom scale is an absolute scale that gives you an absolute number. Initially, you may think the second scale is better; it does give you an answer. But bathroom scales frequently are inaccurate. And as you think more and more about it, you probably reach the conclusion that the scales are neither better nor worse, just different ways of measuring.

For example, military projects are consistent in that the number of lines of code produced per staff hour tends to be less than for an MIS project. Alternatively, you may wish to count the number of the different types of projects in your data sample. However, with nominal scales, there is no sense of ordering between different types, based on the categories. Using the nominal scale, you do not know how they compare. You need additional relationships and attributes, such as size and complexity, to make comparisons.

An example would be the criticality of trouble reports (TRs), which an ordinal scale might rank as minor, major, and critical. However, you do not know if two major TRs are worse than four minor TRs. The interval scale is an ordinal scale with consistent intervals between points on the scale, such that addition and subtraction make sense, but not multiplication and division.

Examples are the Fahrenheit and Centigrade temperature scales, where it does make sense to say that if it is 80°F in Miami and 10°F in Buffalo, then it is warmer in Miami, but it makes no sense to say that it is 8 times hotter in Miami. Interval scales are rarely used in software measurement. The ratio scale is an ordered, interval scale, where the intervals between the points are constant, and all arithmetic operations are valid. With software measurement, examples abound, such as defects per module or lines of code developed per staff month.

The absolute scale is a ratio scale that is a count of the number of occurrences, such as the number of errors detected. The only valid values are zero and positive integers. What kind of scale is the face scale?

Answer: Ordinal. It is an ordered scale, but the intervals between items are not necessarily consistent. Scales can be subjective or objective. For example: Likert scale: "This program is very reliable. Do you strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree?" Verbal frequency scale: "How often does this program fail? Always, often, sometimes, seldom, or never?" These subjective scales are ordinal scales; there is an implied order and relationship.

Averages or ratios are not valid (although frequently used). Median is the middle occurrence in an ordered set. Mode is the most frequent occurrence. For the nominal scale, only mode makes sense, since there is no ordering. For ordinal scales, mode and median apply, but mean is irrelevant. For example, if you used a Likert scale and half of the respondents strongly agreed while the other half strongly disagreed, the mean would fall at "neither agree nor disagree," a response no one actually gave.

The most meaningful statements you could make would speak to the bimodal distribution. For interval, ratio, and absolute scales, mean, mode, and median are all meaningful and relevant. Not only do we need to describe the sameness, but we also need to describe the differences and variability. The standard and simplest measures of variability are range, deviation, variance, standard deviation, and index of variation.
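The pairing of scale type and valid central tendency can be sketched with Python's statistics module; the data values below are hypothetical:

```python
# Central tendency chosen to match the measurement scale: mode for nominal,
# median (and mode) for ordinal, mean for interval/ratio/absolute data.
from statistics import mean, median, multimode

project_types = ["MIS", "military", "MIS", "web", "MIS"]   # nominal scale
likert = [1, 1, 1, 5, 5, 5, 3]   # ordinal: 1 = strongly agree ... 5 = strongly disagree
defect_counts = [10, 20, 15]     # absolute scale: counts of occurrences

nominal_summary = multimode(project_types)   # mode is the only valid choice
ordinal_summary = median(likert)             # median is valid; mean is not
ratio_summary = mean(defect_counts)          # mean is meaningful here

print(nominal_summary, ordinal_summary, ratio_summary)
```

Note that the Likert data above is bimodal: the median (3) is technically valid, but the most meaningful statement is still that the responses cluster at the two extremes.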

Range: the spread of the values, calculated by subtracting the smallest from the largest. Deviation: the distance of a measurement from the mean. For example, the deviations for our modules A, B, and C are 18, 4, and 22, respectively. Variance: a measurement of spread, which is calculated differently depending on whether it is for a full population or a sample of a population.

To pick the right SD formula, you need to decide if your data is a complete population or a sample of that population. In this example, we assume that the data is for a complete population, that is, there are only three modules to consider. Index of Variation (IV): an index that indicates the reliability of the measurement. This metric normalizes the standard deviation by dividing it by the mean.

An SD of 1 with a mean of 100 is entirely different from an SD of 1 with a mean of 2. The lower the index of variation, the less the variation. With low variance, the values cluster around the mean. With higher variance, they spread farther out. Standard deviation is the classic measurement of variation.
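A worked sketch of these variability measures: the module sizes 10, 24, and 50 are hypothetical values chosen only to be consistent with the deviations of 18, 4, and 22 mentioned above, and are treated as a complete population:

```python
# Range, deviation, population variance, SD, and index of variation for a
# complete population of three modules. The sizes are hypothetical values
# consistent with the chapter's deviations of 18, 4, and 22.
from math import sqrt

sizes = [10, 24, 50]   # a complete population: only three modules exist

mean_size = sum(sizes) / len(sizes)                      # 28.0
deviations = [x - mean_size for x in sizes]              # -18.0, -4.0, +22.0
variance = sum(d * d for d in deviations) / len(sizes)   # population variance
std_dev = sqrt(variance)
index_of_variation = std_dev / mean_size                 # lower IV = more reliable

print(f"mean={mean_size}, variance={variance:.1f}, "
      f"SD={std_dev:.2f}, IV={index_of_variation:.2f}")
```

For a sample rather than a full population, the variance denominator would be n − 1 instead of n.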

As the variance increases, so does the standard deviation. Examples of variance. Which should you choose—a valid metric or a reliable one? And what exactly is the difference? Valid measurements measure what we intend to measure. A reliable measure is one that is consistent. Assume that you have a tool that counts defects, but it only counts the defects discovered during system and acceptance testing.

This is a reliable measure. The tool will consistently count the defects. It is not totally valid as a count of all defects in a project, but it is reliable and useful.

Another example is a watch that consistently runs a few minutes fast: it is a reliable measure of time, but not a valid one. The index of variation is one measure of the reliability of a measurement: the smaller the IV, the more reliable the metric.

Theoretically, there are three types of metric validity [1]: construct, criterion-related, and content. Construct validity refers to whether the metric is constructed correctly.

For example, an invalid construct would be one that uses the mean as a measure of central tendency for a metric that has an ordinal scale.

So far within this chapter, we have been discussing construct validity. Predictive measures consist of both mathematical models and predictive procedures. For example, function points can predict lines of code (LOC) based on language, using a model such as LOC = A + B × FP^C. The predictive procedures are the method for determining A, B, and C. Predictive procedures need to be validated to understand their inherent accuracy.
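A sketch of such a predictive model and its validation. The coefficients are hypothetical placeholders (B = 53 is in the neighborhood of published LOC-per-function-point factors for C, but should not be taken as authoritative), and the "actuals" are made-up data:

```python
# A predictive model of the form LOC = A + B * FP**C, where A, B, and C are
# determined by a predictive procedure (e.g., regression on past projects).
# The coefficients below are hypothetical placeholders.

def predict_loc(function_points, a=0.0, b=53.0, c=1.0):
    """Predict lines of code from function points for a given language."""
    return a + b * function_points ** c

# Validating the procedure: compare predictions against empirical outcomes.
actuals = [(100, 5100), (200, 11000)]   # (FP, observed LOC) -- hypothetical
errors = [abs(predict_loc(fp) - loc) / loc for fp, loc in actuals]
mean_relative_error = sum(errors) / len(errors)
print(f"mean relative error: {mean_relative_error:.1%}")
```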

The validation is done by comparing empirical data with the outcomes predicted by the predictive models. Content validity refers to how completely a metric covers the concept being measured: productivity measures based on functionality produced, rather than lines of code produced, have higher content validity.

The traditional belief is that valid metrics are somehow better than reliable ones, if you have to choose. Both do serve their purposes, and in many cases the invalidity of a reliable measurement (think of the scale that always underweighs by 10 pounds) can be attenuated by always adding 10 pounds to the result.

You might measure the wrong part of a foot and always get the wrong measurement. Or you might make an error in reading the measurement, reading a little too high or a little too low. Foot measuring stick [10]. A systematic error will show up every time, and it affects the validity of the measurement. In software development, if you were recording and calculating defects found during code inspections, a systematic error would occur if no one compared the code to the requirements, so those defects were never found or counted.

Random errors would be errors that the inspector made in recording the number and types of defects found. Random errors increase the variance but do not change the mean. Systematic errors change the mean but not the variance. Distributions of X with and without random error. Distributions of X with and without systematic error. Valid measurements can have random errors. Reliable measurements can have systematic errors.
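A small simulation illustrating the two error types; the true value, error sizes, and sample count are all arbitrary choices for the demonstration:

```python
# Simulating the two error types: random error inflates the variance but
# leaves the mean roughly unchanged; systematic error shifts the mean but
# leaves the variance unchanged.
import random
from statistics import mean, pvariance

random.seed(42)
true_values = [50.0] * 10_000   # the quantity being measured, error-free

with_random = [v + random.gauss(0, 5) for v in true_values]   # random error
with_systematic = [v + 10 for v in true_values]               # reads 10 high

print(f"random:     mean={mean(with_random):.1f} var={pvariance(with_random):.1f}")
print(f"systematic: mean={mean(with_systematic):.1f} var={pvariance(with_systematic):.1f}")
```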

You can reduce and manage measurement errors by: (1) attenuating the measurement for systematic errors; (2) test piloting your measurements, looking for both systematic errors and sources of random errors; and (3) triangulating your metrics.

Use different measurements and measurement processes without the same systematic errors. All effort estimation methods have high variance. Use at least three methods, and look at the mean and standard deviations of the results. Use statistics to determine and adjust for random errors. If reality is the vertical line on the left, then the distance from the actual mean to reality is the bias, or systematic error.
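Triangulating an effort estimate might look like the following; the three method names and values are hypothetical:

```python
# Triangulation: apply at least three estimation methods and report the
# mean and standard deviation of the results. Values are hypothetical.
from statistics import mean, stdev

estimates_staff_months = {
    "wideband Delphi": 95,
    "algorithmic model": 110,
    "analogy to past project": 120,
}

values = list(estimates_staff_months.values())
center = mean(values)
spread = stdev(values)   # sample SD: these methods are a sample, not a population

print(f"estimate: {center:.0f} staff-months (SD {spread:.0f})")
```

Methods with different systematic errors tend to cancel each other out, which is the point of triangulation.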

Measurement error. Precision of a measurement represents the size of differentiation possible with a measurement. In software engineering, we measure our processes and our products, and use those measures to control, predict, monitor, and understand. Many of these measures have inherent variations, making highly precise measurements impossible or irrelevant. Consider counting the size of the code.

Estimation is not precise by its very nature, and software estimation is no exception. This lack of precision means that reporting an estimate down to the individual line of code is misleading. It implies that you believe that estimate is better than a round-number estimate, which is highly doubtful. An entity and the measurement of it are not equivalent. For example, aspects of the quality of software can be measured by using defect data, or defect data and performance data, but these are only certain aspects, not the whole story.
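One way to avoid implying false precision is to round estimates to a couple of significant figures; `round_sig` below is a hypothetical helper, not part of any standard API:

```python
# Hypothetical helper: report an estimate at a supportable precision by
# rounding to a given number of significant figures.
from math import floor, log10

def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, -int(floor(log10(abs(x)))) + (sig - 1))

print(round_sig(24_327))   # a LOC estimate reported as 24,000, not 24,327
```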

For each concept we want to measure, such as quality or complexity, we need to decide which aspects are the most relevant and important to us and use those aspects to create a model of the concept.

There are different techniques to take these abstract concepts and create real-world metrics that measure them. Measurement theory and statistics are tools that allow us to validly analyze and compare the metrics data. This chapter contains an introduction to both, including measurement scales, measures of central tendencies and variation, and measurement error. These concepts are the basic building blocks of measurement and metrics.

What do you think will happen? We get the following data: 10, 20, 15, 18, and …. Calculate the mean, SD, and IV for the number of defects per module. What would be the valid way to report the central tendency of the answers to this survey question? What conclusions can I draw from the following table?

What conclusions can I now draw from the previous data? What is the scale of this index? Is it a reliable measure? Why or why not? You tell him it is an invalid metric. To which type of metric validity are you referring: construct, criterion-related, or content?

Is this metric reliable? How would you validate it? Rather than relying on instinct, the authors of Software Measurement and Estimation offer a new, tested approach that includes the quantitative tools, data, and knowledge needed to make sound estimations. The text begins with the foundations of measurement, identifies the appropriate metrics, and then focuses on techniques and tools for estimating the effort needed to reach a given level of quality and performance for a software project.

All the factors that impact estimations are thoroughly examined, giving you the tools needed to regularly adjust and improve your estimations to complete a project on time, within budget, and at an expected level of quality.

With its classroom-tested features, this is an excellent textbook for advanced undergraduate-level and graduate students in computer science and software engineering.




