2020-02-08; published 2021-12-31
I recently asked some engineering managers I know how they handle story points: Do they think it's best to tie story points to a particular unit of time? And if not, how do they use story points to provide usable metrics? We compared notes, and this article summarizes those conversations as well as my own experience working with a variety of engineering teams.
Story points are used in software engineering as a tool to estimate the amount of effort that a particular task will take to complete, from start to finish. Story points can be very useful to a team, its managers, and the clients:
Story points create a great conversation framework for discussions between customers / business owners, project managers, and developers.
Story points are a way of measuring the value or amount of work that developers have completed.
What is the best measure of development work?
Let's suppose that we have a project with a number of features. Each of these features requires implementing a certain amount of working, tested, peer-reviewed code that passes all tests and can be merged into the main development branch. For each feature, we can make an estimate based on the amount of code, its complexity, and its uncertainty.
For a given feature, when we understand the amount and complexity of the code to be implemented, along with the uncertainty in implementing it, we are in a good position to understand the amount of work that feature will require to implement. The question is, how do we convert our understanding of the feature into numbers that can be used for tracking purposes?
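When we estimate story points for a feature, we need to convert our qualitative understanding of the feature into a measurement, namely a number of points. There are two ways to do this: (1) tie each point to a fixed amount of time, or (2) assign points according to how much effort the feature requires relative to other features.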
On the face of it, option #1 seems a lot more straightforward. So why not do that? To answer that question, let's see how it might work out in practice.
Let's suppose we have three developers on our team: Alice (the rock star) is a senior developer who quickly writes correct, performant code that meets all requirements for the feature she is implementing. Bob (the bungler) is as experienced as Alice, but he's not as smart or careful, so it takes him twice as long to finish the same feature as Alice, and sometimes he has to make significant changes during code review, such as adding missed test cases and getting his feature to pass all the automated checks on the continuous integration (CI) server. Then there's Chris (the junior developer), who usually takes five times as long as Alice to finish implementing the same feature, and twice as long as Bob to pass the testing and review phase.
Why not just assign story points a particular amount of time? It's simpler, isn't it? Not really, and it's unreliable, for three reasons:
In short, developer time estimates are unreliable. If we rely on developers' time estimates, then we won't really understand the project's velocity, we won't have any measurable insight into the relative value of each member of the team, and our project completion estimates will be wrong.
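The Alice, Bob, and Chris scenario can be put into numbers. The sketch below is only an illustration: the 4-hour baseline and the 4-hours-per-point conversion are assumed values, and the multipliers are the "twice as long" and "five times as long" figures from the scenario. It shows that, if a point means a fixed amount of time, the same feature is worth a different number of points depending on who implements it.

```python
# Assumed values for illustration only; they are not measurements.
HOURS_PER_POINT = 4   # hypothetical conversion for a time-based point
ALICE_HOURS = 4       # hypothetical time Alice needs for the feature

# Multipliers taken from the scenario: Bob takes 2x as long as Alice, Chris 5x.
SLOWDOWN = {"Alice": 1, "Bob": 2, "Chris": 5}


def time_based_points(developer: str) -> float:
    """Points the same feature is worth if one point means a fixed time."""
    hours_needed = ALICE_HOURS * SLOWDOWN[developer]
    return hours_needed / HOURS_PER_POINT


for name in SLOWDOWN:
    print(f"{name}: {time_based_points(name):g} points for the same feature")
```

One feature, three different point values: a time-based point measures the developer rather than the task.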
Thankfully, we don't ever need developers to make time estimates for their tasks in order to obtain accurate analytics of important measures like individual and team velocity, estimated project size, and estimated project completion date.
Instead, we just need developers to accurately rank how much effort a particular feature will require relative to other features in the list: Does this feature require more, less, or approximately the same amount of effort as the other features?
Here's how it works: Instead of asking developers to estimate tasks in terms of time, we ask developers to estimate the amount of work required to complete a task by comparing it to other tasks that they have done and are doing. Then, rather than assigning time values directly to tasks (which developers cannot know), we use the following rules:
It is common practice to limit the number of point values available to choose from, since having more than 5 or 6 choices becomes unmanageable in practice. It is most common to use the sequence {1, 2, 3, 5, 8, 13}, which provides fine-grained choices for small tasks and only a few spread-out choices for larger tasks. This limitation might seem to lead to inaccurate estimates, but in practice it keeps our work estimates accurate enough for all useful purposes, and it keeps the development team from getting bogged down in the difference between a 4-point and a 5-point task.
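As a small sketch of how a restricted scale works, the function below maps any rough estimate onto the sequence above. The rounding rule (nearest allowed value, with ties going to the larger value) is an assumption made for this example; a team may prefer a different rule, such as always rounding up.

```python
ALLOWED_POINTS = (1, 2, 3, 5, 8, 13)


def snap_to_scale(raw_estimate: float) -> int:
    """Return the allowed point value closest to a rough estimate.

    Assumed tie-breaking rule: when two allowed values are equally close,
    pick the larger one (so a "4" becomes a 5, not a 3).
    """
    return min(ALLOWED_POINTS, key=lambda p: (abs(p - raw_estimate), -p))


print(snap_to_scale(4))   # a would-be 4-point task lands on an allowed value
print(snap_to_scale(10))
```

With a rule like this there is nothing to argue about between a 4 and a 5: both land on the same allowed value.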
Let's compare these "relative story points" (point value based on comparison between tasks) to time-based story points in terms of reliability and accuracy (each of the following list items corresponds to the time-based list above):
It turns out that developers are much better at estimating and ranking relative work effort than they are at estimating work time. In other words: Points based on estimated work effort are much more reliable than points based on estimated work time.
For relative-value story points to work well, it is important that their value be stable. To achieve stability, we play a game together that we might call "Task Estimation" (often pronounced "project estimating" or "backlog grooming" or "sprint planning"). This game has the following procedure:
By playing the "Task Estimation" game regularly (at least once per iteration), the team will quickly come to consensus on what their story points mean (as a relative measure of work). And so the value of the points will be stable.
Now we have story points for all of our tasks. How can we use these story points to track individual and team velocity and to estimate the project completion date?
Answering this question goes back to the basic usage of points to track work completed.
For example: After a couple of iterations of the development cycle, you see that Alice completes 12 points per iteration, Bob averages 7 points, and Chris is lucky to finish 3. That information is actionable intelligence for managers.
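Individual and team velocity are simple averages of completed points per iteration. Here is a minimal sketch: the per-iteration numbers are hypothetical, chosen only so that the averages match the example above, and the function names are made up for this illustration.

```python
# Hypothetical points completed in each of three iterations.
completed = {
    "Alice": [12, 12, 12],
    "Bob": [6, 8, 7],
    "Chris": [3, 2, 4],
}


def average_velocity(points_per_iteration: list[int]) -> float:
    """Average points one developer completes per iteration."""
    return sum(points_per_iteration) / len(points_per_iteration)


def team_velocity(completed_by_dev: dict[str, list[int]]) -> float:
    """Average points the whole team completes per iteration."""
    # zip(*...) lines up iteration 1 for everyone, then iteration 2, and so on.
    totals = [sum(iteration) for iteration in zip(*completed_by_dev.values())]
    return sum(totals) / len(totals)


for name, points in completed.items():
    print(f"{name}: {average_velocity(points):g} points per iteration")
print(f"Team: {team_velocity(completed):g} points per iteration")
```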
Once our story points have a stable value, those points can be used to accurately measure or estimate:
Each of these measures and estimates can be made available in business and management reports, providing actionable intelligence.
Here are a couple of report charts from a recent project I've been involved in.
The velocity for that three-week iteration was 20 points.
The project team made steady progress on this particular project in October and November, got a lot done in December, but was focusing more on other projects in January while this project was being reviewed by the customer. We can now finish the project by the end of March if we work at least at the average pace of the past 4 months.
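A completion estimate of this kind comes from dividing the remaining points by the average velocity. The sketch below shows the arithmetic under stated assumptions: the remaining points, recent velocities, iteration length, and start date are all made-up inputs, not data from the project described here, and `forecast_completion` is a hypothetical helper name.

```python
import math
from datetime import date, timedelta


def forecast_completion(
    remaining_points: float,
    recent_velocities: list[float],
    iteration_days: int,
    start: date,
) -> date:
    """Estimate a finish date by assuming the team keeps its average pace."""
    average = sum(recent_velocities) / len(recent_velocities)
    if average <= 0:
        raise ValueError("average velocity must be positive to forecast")
    # Round up: a partly used iteration still has to be scheduled in full.
    iterations_needed = math.ceil(remaining_points / average)
    return start + timedelta(days=iterations_needed * iteration_days)


# Made-up inputs: 34 points left, four past iterations, three-week iterations.
finish = forecast_completion(34, [15, 18, 30, 5], 21, date(2020, 2, 10))
print(finish.isoformat())
```

The forecast is only as good as the assumption behind it: it holds if the team works at least at its recent average pace.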
This chart shows clearly how the team's velocity on this project has changed each week — it shows the project kickoff, steady progress in October and November, great progress in December as we put in extra hours to complete a milestone before Christmas, and very little progress in January while the customer was reviewing the project and doing market testing.
As you can see, these reports provide actionable intelligence: They help the customer, project manager, and developers see the project status and make appropriate changes.
To use story points effectively, you need good tools. I've had good success with clubhouse.io: It has excellent and very usable tools for doing this work, and it scales nicely to many complex projects and large teams.
I have also used Jira for team and project tracking. Jira is much more mature (older) than Clubhouse. It includes more reports, and a lot more warts.
There are certainly other tools that can serve these roles — for example, GitHub and GitLab issues. The important thing is to use tools that give you access to the data, so you can develop reports that the tools themselves don't provide.
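As an example of a report built from raw data, here is a sketch of a weekly velocity calculation. It assumes the completed tasks have already been exported from the tracking tool into plain records; the field names `points` and `completed_on` are hypothetical and do not correspond to any particular tool's export format.

```python
from collections import defaultdict
from datetime import date


def weekly_velocity(tasks: list[dict]) -> dict[tuple[int, int], int]:
    """Sum completed points per (ISO year, ISO week)."""
    totals: dict[tuple[int, int], int] = defaultdict(int)
    for task in tasks:
        iso = task["completed_on"].isocalendar()
        totals[(iso[0], iso[1])] += task["points"]
    return dict(sorted(totals.items()))


# Hypothetical exported records.
tasks = [
    {"points": 3, "completed_on": date(2019, 12, 2)},
    {"points": 5, "completed_on": date(2019, 12, 4)},
    {"points": 2, "completed_on": date(2019, 12, 10)},
]

for (year, week), points in weekly_velocity(tasks).items():
    print(f"{year}-W{week:02d}: {points} points")
```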
You can skip this section if you don't care about philosophy.
What do relative-value story points mean? If points aren't tied to a specific amount of time, how do you know that they have any meaning?
Relative-value story points are based on the inherent properties of the software development tasks themselves: The smallest trackable task is 1 point, and every other task is relative to that.
This feels too squishy to a lot of people. So let me draw on an analogy: What is a dollar? What does "one dollar" mean? How is its value determined? What is it based on?
Before 1933, one dollar could be redeemed for about 0.05 oz of gold. But that changed in 1933, and the last formal link between the dollar and gold ended in 1971. Today, "one dollar" does not mean "X amount of gold." There is no longer any fixed relationship between gold and dollars.
So what is a dollar? The best answer I know of is:
A dollar is a unit of value that represents a certain amount of buying power.
A dollar's buying power (and therefore value) is stable, and it is agreed upon by some sort of consensus, such as the market and/or the Federal Reserve. (Stable, but not fixed: In college, I paid $5 for a sandwich that now costs $10, but wages are higher and dollars are smaller, so the result is a similar value. The change happened slowly. Economies only really suffer from this not-fixed arrangement when the value of the currency is unstable and changes too quickly.)
This definition of dollars might seem too squishy, but it's accurate to the world we live in. Also, dollars work fine in practice. We all know what a dollar is, at least in terms of being able to use it to live our lives: It's the amount we pay and are paid for goods and services. We don't trouble ourselves about existential questions like, "What is the meaning of a dollar?" Those questions are unimportant to most of us most of the time. (Economists, central bankers, gold bugs, and other armchair philosophers are the exception.)
What matters is that we can use dollars to conduct transactions, which only requires that dollars should have a stable value that we all agree on, in relation to goods and services.
(We might disagree about the value of a good or service, but that's a different matter. In an economy with a stable currency, no buyer negotiates price by saying, "Actually, dollars are worth more today, haven't you heard? So I should pay you fewer dollars.")
As with dollars, it doesn't matter exactly what story points mean; it only matters that:
If those two elements are in place, then story points can be used to track project development, just as dollars are used to track financial transactions.
To make best use of story points to track software development projects and teams, one should:
Although it might seem like a logical choice to assign a fixed-time value to story points, it is counter-productive to do so. Instead, it works well in practice to teach development teams to assign relative-value story points based on the estimated work required to complete each task. These practices give each team story points with stable values that we all can agree on (per team), which makes it possible to create accurate analytics of individual and team productivity, and accurate estimates of project completion time frames.
There are some excellent non-quantifiable benefits of these practices as well. I have seen in practice how story points and the estimation game help teams mentor each other and grow together to become outstanding engineers. I am convinced that the application of these principles is why some engineering groups have such consistently high-quality engineers.
What are story points? https://www.mountaingoatsoftware.com/blog/what-are-story-points
Why Your IT Project May Be Riskier Than You Think. https://hbr.org/2011/09/why-your-it-project-may-be-riskier-than-you-think
Why Software Development Time Estimation Doesn't Work and Alternative Approaches. https://www.innoarchitech.com/blog/why-software-development-time-estimation-does-not-work-alternative-approaches
Why Asking Software Developers for Time Estimates Is a Terrible Idea and How to Bypass It. https://www.romenrg.com/blog/2015/09/28/why-asking-developers-for-time-estimates-in-software-projects-is-a-terrible-idea-and-how-to-bypass-it-with-scrum/
What is Scrum? https://www.atlassian.com/agile/scrum
The Scrum Guide. https://www.scrumguides.org/scrum-guide.html