This post contains the full list of engineering team health scorecard metrics in each category. Please see our original engineering team health scorecard post for an introduction to the scorecard and more general information about how to use it.
The scorecard items here apply to teams that do software engineering project work, meaning that they primarily plan and implement larger changes, either for internal or external use.
For teams that primarily respond to smaller service requests or do other types of work like design or content production, most of these scorecard items do not apply. However, a few of them may still be pertinent in different work scenarios.
This category covers best practices related to version control for software development.
For each of these items, the portion of time should be calculated based on all development time, excluding meetings and other non-development tasks.
What portion of time is spent on code and configuration stored in a commit-based version control system?
The most popular version control system nowadays is Git, but older systems like Subversion also check this box. You should not be using file-based version control like CVS or a general-purpose system like Dropbox.
While commit-based version control has been popular for code for decades, it only recently became a best practice for things like infrastructure-as-code (e.g., Terraform, AWS CloudFormation), which should be included here.
The recommended goal for this item is 100%.
What portion of time is spent on commits in a non-main branch?
Committing directly to the main/master branch makes it difficult to maintain a working version of the code, bypasses code review (which creates a security risk), and makes it hard to tell which tasks are associated with each change. It is also never necessary, because changes can always be made through a branch and pull request.
In addition to tracking this metric, you can forcibly prevent main branch commits using settings in a system like GitHub.
The recommended goal for this item is 99% since it is possible to do this virtually all the time, but a few straggling commits will not have a major impact.
What portion of time is spent on branches that are merged through a pull/merge request?
Git itself does not have the concept of a pull or merge request, so you will have to use a system like GitHub, GitLab, Bitbucket, or Azure DevOps Git to implement this mechanism. However, it may still be possible for developers to do direct merges. This should be avoided because it bypasses code review and automated branch build processes, which puts quality and security at risk.
Systems like GitHub can also be configured to force people to use pull requests.
The recommended goal for this item is 100% because it is never necessary to bypass the pull request process.
What portion of time is spent on small, feature-specific branches?
You can pass the “Use of Branches” check while still making a large batch of changes in a branch for a big release or project. It is better to do changes in small branches that are specific to individual features (the threshold we recommend is no more than a week’s worth of work), and have an intermediate project/release branch if necessary. This makes it easier to review code, have a working central branch for features/projects, and track which tasks are associated with each branch.
The recommended goal for this item is 95% because every once in a while individual features may spill over the one-week threshold, but it should be rare.
What portion of time is spent on code that is committed and pushed daily?
When working with git, it is up to developers to commit their code locally and push it to a server. If this is not done at least once a day, the risk of data loss increases, and other team members cannot see recent work, which makes collaboration more difficult.
The recommended goal for this item is 99% because developers may occasionally forget to commit their work, but it should happen almost all the time.
This category covers best practices related to using tickets to track work.
For each of these items, the portion of time can be calculated based on time spent writing code that is linked to a ticket. If code is not directly traceable through branch names matching ticket identifiers, minware falls back on a heuristic that links each branch to the most recently started ticket assigned to the same person, though this is less accurate (which is why it’s important to link branches to tickets!).
Is a ticketing system in place to track work? (yes/no)
Tracking everything engineers do (aside from certain exceptions mentioned above) is important because it provides visibility into how people are spending their time and the current status of each task.
Ticket tracking systems should support at least an identifier, ticket type, description, assignee, estimate, parent ticket, label, and status for each task. Jira, ClickUp, and many other tools meet these requirements. General to-do list systems like Trello do not meet this requirement because they don’t have ticket identifiers, which makes it difficult to link them to changes in the version control system.
What portion of time spent on bugs uses the bug ticket type?
It is important to designate work on bugs and defects with a bug ticket type to differentiate it from new feature work or other tasks. This makes it easier to track the frequency and severity of bugs, and for developers to properly prioritize bugs.
It is difficult to measure how many bugs were filed with non-bug ticket types, so we recommend setting a goal that at least 5% of time spent on all tickets is spent on bugs to make sure that bug ticket types are used regularly. However, if you review each ticket for improper classification, we recommend a goal of 95% of time spent on bugs being properly classified as a bug ticket.
What portion of time is spent on tickets that have an “epic” parent ticket?
Jira uses the ticket type “epic” to describe parent issues for larger projects, but any type of parent ticket meets this qualification. When tickets are part of a larger effort and not one-off tasks, it is important to group them together. This enables tracking project progress and seeing how effort rolls up to larger initiatives.
Also, having a higher number here indicates that a sufficient amount of the team’s time is available for planned initiatives rather than handling isolated tickets.
The recommended goal for this item is 75%, though it may be adjusted lower for teams that primarily handle one-off requests rather than working on larger initiatives.
What portion of time spent on tickets without epics has a label indicating the purpose?
It makes sense for some tasks like isolated bug fixes not to have a parent epic. However, it is helpful to know the purpose of those tickets to measure the total effort that goes into different team goals or focus areas. This helps you see, for example, the bug load associated with different components of a product.
The recommended goal for this item is 75%, though it may be higher depending on your reporting needs.
What portion of time spent on code is traceable to a ticket?
By putting the ticket identifier in the branch name, systems like Jira and minware can automatically infer the ticket associated with coding work. This is important because it lets you see the ticket that was associated with each branch and vice versa. Consistent linking is also necessary for further analysis of time spent on tasks and things like estimate accuracy.
The recommended goal for this item is 95% because there will occasionally be changes that are trivially small or associated with things like resolving merge conflicts where it is okay not to have a ticket in the branch name. To disambiguate between accidental and intentional omissions, we recommend adding a token like “NOTICKET” to branches that are intentionally created without a ticket identifier.
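As an illustration, here is a minimal Python sketch of how ticket identifiers might be extracted from branch names, assuming Jira-style keys (e.g., ABC-123) and the “NOTICKET” convention described above. The pattern and function name are hypothetical and would need to match your ticketing system’s identifier format:

```python
import re

# Assumes Jira-style ticket keys like "ABC-123"; adjust the pattern
# to match your ticketing system's identifier format.
TICKET_RE = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")

def ticket_for_branch(branch_name):
    """Return the ticket key embedded in a branch name, "NOTICKET" for
    intentionally unlinked branches, or None for accidental omissions."""
    name = branch_name.upper()
    if "NOTICKET" in name:
        return "NOTICKET"
    match = TICKET_RE.search(name)
    return match.group(1) if match else None
```

Distinguishing a `None` result (accidental) from `"NOTICKET"` (intentional) is what lets you measure this item accurately rather than lumping both cases together.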
What portion of time is spent on tickets that have estimates?
Adding some form of estimates to tickets is important for predictability. Either story points or time estimates meet this requirement. Without an estimate, it is difficult to have any expectations about how long a ticket will take and proactively break down tickets that are too big.
The recommended goal for this item is 95%, because tickets should almost always have estimates even if they are just set to one story point.
What portion of time is spent on tickets with less than a week of development work?
Large tasks are the enemy of productivity: they make it harder to understand the status of work, are difficult to estimate, and pose a greater risk of lacking a clear definition of done.
One way to avoid this is to always try to break down tickets with estimates over a certain size.
If this scorecard item isn’t passing, then improving it should also improve estimate accuracy.
The recommended goal for this item is 95%, because every once in a while individual tickets may spill over the one-week threshold, but it should be rare.
What portion of time is spent on tickets whose development time is within the normal range for their estimate?
Computing this metric is a little more complicated because it involves looking at time spent on code that is linked to each ticket, so it helps to have a system like minware in place. You can then look at the distribution of coding time for tickets with each estimate (e.g., 1 story point, 2 story points), and look for outliers outside of a normal range. Because story points are not meant to be exact, we have found that counting tickets below 0.3x or above 3x the average for tickets with the same story point value provides a reliable indicator of whether estimates are done in a consistent way.
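To make this concrete, here is a simplified Python sketch of the outlier check, assuming each ticket has been reduced to a (story points, coding hours) pair. Weighting the result by coding time (to match the “portion of time” framing) is an assumption about the exact bookkeeping:

```python
from collections import defaultdict

def estimate_consistency(tickets, low=0.3, high=3.0):
    """tickets: list of (story_points, coding_hours) pairs.
    Returns the fraction of coding time spent on tickets whose coding
    time falls within [low, high] times the average for tickets with
    the same story point value."""
    by_points = defaultdict(list)
    for points, hours in tickets:
        by_points[points].append(hours)
    averages = {p: sum(h) / len(h) for p, h in by_points.items()}
    within = sum(hours for points, hours in tickets
                 if low * averages[points] <= hours <= high * averages[points])
    total = sum(hours for _, hours in tickets)
    return within / total
```

In practice you would compute the averages over a trailing window with enough tickets per estimate value to make the averages stable.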
The recommended goal for this item is 75% when using the 0.3-3x range, but we have seen teams with mature code bases and experienced developers hitting over 90%, so you may want to adjust this goal progressively based on each team’s maturity level.
What portion of time spent on estimated tickets uses a consistent estimation process?
The exact process may vary from team to team, but it is important that teams have a consistent process that involves the assignee and someone more knowledgeable about the task if the assignee is inexperienced. Teams can get into trouble when the person doing the estimate doesn’t fully understand the task, which can degrade estimate accuracy.
If this scorecard item isn’t at a passing level, then improving it is a good first step to improving overall estimate accuracy.
The recommended goal for this item is 100%. Because it involves manual feedback about the process, we recommend asking if it was satisfied on a per-sprint (or some fixed time interval if not using sprints) rather than per-ticket basis.
What portion of time spent on bug tickets uses a non-default priority?
Not all bugs are created equal. High-impact bugs are important to monitor closely to make sure that they are fixed right away and that teams work to mitigate their underlying causes, such as inadequate testing. Failing to record the bug severity makes it difficult to do this.
Reviewing the priority of every bug can be cumbersome, so we recommend setting a goal that at least 5% of work on bugs has a non-default priority to ensure that priorities are being used. If you review every bug, we recommend a 95% goal of proper priority classification.
What portion of time spent on bug tickets has reproduction steps?
Filing bugs without reproduction steps makes them a lot harder to fix because the person working on it may not have the same context as the person who filed the bug.
This can be enforced with guidelines about explicitly stating reproduction steps when filing a bug.
The recommended goal for this item is 95%. Computing it may require manually reviewing tickets if they are not consistently filed with the words “reproduction steps” or something similar.
What portion of time spent on non-bug tickets has acceptance criteria?
Explicitly stating acceptance criteria or the “definition of done” ensures that the person who created the ticket and the person working on it are in agreement about its requirements. Not stating acceptance criteria can lead to unnecessary work, unexpected deliverables, and missed time estimates.
The recommended goal for this item is 75%. This is enough to ensure that explicit acceptance criteria are the default, while not being onerous about simple tickets where the acceptance criteria are clearly implied by the description even though they aren’t stated explicitly.
This section describes best practices related to the development workflow. It is meant to cover modern delivery methodology, but not be prescriptive about specific choices like whether to have dedicated QA/testing engineers.
For each of these items, the portion of time can be calculated based on time spent writing code, which doesn’t include things like meetings.
What portion of time spent on merged branches received a code review?
Code reviews are important for sharing knowledge across the team and upholding standards for security, performance, and other coding best practices. Unreviewed code loses all of those benefits.
You may also want to require code reviews prior to merging using a system like GitHub, though this can slow down trivially small changes and cause problems if there’s an emergency and only one person is around.
The recommended goal for this item is 95% to allow for edge cases where not having a review is okay.
What portion of time spent on merged branches received a code review with substantive commentary?
Some pull requests truly don’t need any feedback, but having too many rubber-stamp reviews can indicate a lack of thoroughness, which defeats the purpose of having code reviews.
The recommended goal for this item is 75% to allow for some “perfect” pull requests, but you may want to set it higher or lower depending on the seniority of your team.
Because it’s difficult to judge whether a review was 100% thorough without being nit-picky using automated analysis, you may also want to audit a selection of reviews to ensure that they are up to standard.
What portion of time from first to last commit was spent actively working on each ticket?
This check measures the amount of context switching between tickets. For each developer on each ticket, it is computed by dividing that developer’s work time for that ticket by their work time on all tickets. If there is no interleaving, flow efficiency is 100%.
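A minimal Python sketch of this calculation, assuming a simplified per-developer work log of (day, ticket, hours) entries (a real system would work from commit timestamps and inferred work sessions):

```python
def flow_efficiency(work_log, ticket):
    """work_log: list of (day, ticket_id, hours) entries for one developer.
    Returns that developer's flow efficiency for `ticket`: hours spent on
    the ticket divided by total hours worked (on any ticket) between the
    first and last day they touched it."""
    days = [d for d, t, _ in work_log if t == ticket]
    first, last = min(days), max(days)
    on_ticket = sum(h for _, t, h in work_log if t == ticket)
    total = sum(h for d, _, h in work_log if first <= d <= last)
    return on_ticket / total
```

For example, a developer who works on ticket A on Monday, switches to ticket B on Tuesday, and finishes A on Wednesday has a flow efficiency of 2/3 for A.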
The reason why flow efficiency is important is that context switching between tickets creates overhead. Each switch incurs mental overhead from switching contexts. Tasks that sit idle also risk creating merge conflicts, delaying feedback from customers, and not wrapping up by the end of the sprint.
Low flow efficiency happens when developers have to wait for others to answer questions, provide review, or finish dependent tasks. High flow efficiency indicates a healthy development process.
The recommended goal for this item is 75%. Some context switching is inevitable with complex engineering work, but this target ensures that there is a low amount of parallel work in progress. If your baseline number is a lot lower (< 50%), you may want to set an intermediate goal.
What portion of time is spent on tickets that are independently developed, reviewed, tested, and merged?
Batching multiple tasks together at stages in the development process decreases flow efficiency, which causes the inefficiencies described in the previous item.
Single task flow means that the entire development process happens independently for each ticket. It is measured by whether any steps prior to merging wait for multiple tasks before they begin. Sometimes multiple reviews queue up before a reviewer has time to handle them, which doesn’t count here unless the reviewer deliberately waits for this situation to occur.
If the metrics for single-task flow are not passing, then improving them should increase flow efficiency.
Because this item relates to general team processes, the recommended goal is 100%. Since it involves manual feedback about the process, we recommend asking if it was satisfied on a per-sprint (or some fixed time interval if not using sprints) rather than per-ticket basis.
What portion of time spent on tickets happens while they are marked “in progress”?
If developers work on tickets without marking them as in progress (or worse, after they are marked done), then it is difficult to see what people are doing and when tasks are likely to wrap up.
The recommended goal for this item is 90%, which provides some leeway for people forgetting to start tasks from time to time.
What portion of in-progress tickets that are in sprints and have at least one commit receive commits at least weekly?
Leaving tickets that are in a sprint in an in-progress state when they are inactive also creates confusion about what people are actually doing and may hide the fact that they are blocked. This metric looks at tickets that have no commits for 7 days while they remain in progress in the ticketing system.
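A simple sketch of how this check could be computed, assuming each ticket is summarized by its status and most recent commit date (field names here are illustrative):

```python
from datetime import date, timedelta

def stale_in_progress(tickets, today, max_gap=timedelta(days=7)):
    """tickets: list of (ticket_id, status, last_commit_date) tuples.
    Returns ids of in-progress tickets with no commits for over `max_gap`."""
    return [tid for tid, status, last_commit in tickets
            if status == "in progress" and today - last_commit > max_gap]
```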
The recommended goal for this item is 95% of tickets in sprints, which allows for the occasional oversight but ensures general process consistency.
What portion of time spent on tickets happens while they have an assignee?
It is okay for tickets to be unassigned during the planning stage, but once work begins, they should have an assignee. Otherwise, you can’t tell who is working on them without asking around, which leads to inefficiency.
The recommended goal for this item is 95%, which allows some room for developers occasionally forgetting to assign a task to themselves when they start work.
This category covers best practices related to planning deliverable work in fixed time increments, referred to as sprints. It is meant to be permissive and does not require following a strict “scrum” agile process, but instead covers specific elements of incremental planning that are more universal.
These scorecard items are not compatible with a waterfall process, which is generally considered an inferior way of developing software.
As mentioned earlier, this entire scorecard is meant for teams that do project work, so the items related to sprints do not apply to teams where a process like kanban is more appropriate.
What portion of time is spent on tickets that have clear prioritization?
If you are using sprints, then this item is satisfied. If you are not yet doing sprints, then this measures whether tickets are at least clearly ranked with a process like kanban so that people know what they should work on next.
The recommended goal for this item is 100%, and it may be measured by surveying people over broader time ranges rather than looking at each ticket.
What portion of time is spent on tickets that are in active sprints?
Sprints are important for teams that do planned project work because they create an expectation about which tasks will be done within a certain time frame, and provide a cadence for retrospectives, releases, and customer feedback.
Some very early teams (particularly those on pre-launch products) survive without sprints by discussing expectations in an ad hoc way. However, this quickly breaks down as soon as they become responsible for maintaining existing software and delivering updates predictably to stakeholders and customers.
On the flip side, there is little downside to doing sprints right from the get-go, especially if you shorten them to one week to avoid rework from rapidly changing plans.
The recommended goal for this item is 50%. The intent of this scorecard item is just to serve as a basic check for whether sprints are happening on a regular basis for a majority of the team’s work. The next check looks at a stronger condition of whether all work is flowing through sprints.
What portion of time is spent on tickets that are in active sprints?
(This question is the same as the previous scorecard item, but with a higher goal threshold.)
Once sprints are in use, it is important that developers primarily work on tickets that are in an active sprint rather than working off-the-book tickets. If people don’t work in the sprint, then it is difficult to predictably meet expectations.
This off-sprint anti-pattern may also indicate that work isn’t flowing through the proper channels and developers are working directly for stakeholders rather than having the work go through the sprint planning process.
The recommended goal for this item is 95%, because tickets should almost always just be added to a sprint before work begins.
What portion of time is spent on tickets that have active work in no more than two sprints?
It is normal for some tickets not to finish by the end of a sprint and spill into the next one. Tickets may also get bumped from multiple sprints before work starts. However, once work begins on a ticket, it is a problem if it spans more than two sprints. In addition to the ticket likely being too big or having flow efficiency problems, this means that sprint commitments are unreliable and defeats many of the benefits that sprints provide.
The recommended goal for this item is 95%, because multi-sprint rollovers may happen occasionally, but should be very rare to avoid degrading sprint reliability.
What portion of time is spent on tickets that complete in the current sprint?
Even if there are no multi-sprint rollovers, it is still important for work that is started in a sprint to finish by the end. This allows everyone to depend on sprint commitments and know when they can expect work to wrap up. The ability to quickly and reliably deliver the highest priority tasks (i.e., those that start in a sprint) is critical for a well-functioning organization.
That being said, it’s also important not to discourage developers from starting on new tasks near the end of a sprint just to avoid having incomplete work at the boundary.
The recommended goal for this item is 75%, leaving room for up to a quarter of people’s time to be spent on tasks that don’t wrap up at the end of the sprint, while still ensuring that most work is completed. You may want to increase this goal if your sprints are longer than two weeks.
What portion of time is spent on tickets that were in the sprint at the beginning?
Another issue that can disrupt sprint reliability is when teams are inundated with last-minute requests, urgent bugs, or new work that comes up due to lack of internal planning. This leads to the original sprint work getting pushed back, which erodes the reliability of sprint commitments.
While teams should strive to eliminate avoidable scope changes through better planning and quality control, it is also important to remain flexible when new information arises after the start of the sprint and not blindly follow an outdated plan (which is the problem with waterfall development in the first place).
If teams are having trouble hitting this goal because plans change too frequently, then their sprints may be too long. Shortening them to as little as one week can help in this case.
The recommended goal for this item is 75%, which leaves room for some urgent tasks and scope changes, but ensures most work is defined at the start of the sprint.
How much sprint work was completed relative to the original sprint commitment?
To calculate this metric, divide the total estimate units (e.g., story points) that were completed by the end of the sprint by the total amount that were in the sprint at the start. You can then multiply this completion ratio by the time each person spent on each sprint if you want to aggregate over multiple teams or longer time ranges.
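The calculation above can be sketched as follows, assuming each sprint has been summarized as committed points, completed points, and person-hours spent:

```python
def sprint_completion(sprints):
    """sprints: list of (points_committed, points_completed, person_hours)
    tuples, one per sprint. Returns the time-weighted average completion
    ratio across all sprints."""
    weighted = sum(completed / committed * hours
                   for committed, completed, hours in sprints)
    total_hours = sum(hours for _, _, hours in sprints)
    return weighted / total_hours
```

For example, a sprint that completes 15 of 20 committed points has a ratio of 0.75, and weighting by person-hours lets you aggregate that ratio across teams of different sizes.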
This sprint completion metric tells you whether the total amount of work completed matches the original sprint commitment, regardless of scope changes. It is a useful barometer for whether estimates are accurate in aggregate, and is important for making sprint commitments reliable.
The recommended goal for this item is 75%, because teams often fall somewhat short of sprint goals, but this level ensures sprints are mostly completed.
What portion of time is spent on sprints where work is validated with customers?
One important reason for doing sprints is having a cadence for getting customer feedback on working software. If you “ship” software each sprint but no one uses it, then the sprint iteration lengths are illusory because they don’t represent a real feedback cycle.
If you can’t actually ship software to production each sprint, then it is important to have a real customer or customer representative use a working demo.
The recommended goal for this item is 100%. It can be computed by surveying team members each sprint about whether the team received customer feedback at least once on the sprint work.
What portion of time is spent on sprints that have retrospectives?
A sprint retrospective is a meeting where people discuss problems that arose during the sprint, dig into root causes, and come up with action items for improvement. Essentially, teams should be looking at the types of things covered in this scorecard, as well as discussing general sentiment. Retrospectives are important because it’s very hard to improve if you don’t talk about what went wrong as a team.
One common anti-pattern that can lead teams to stop doing retrospectives is if they talk about the same problems each sprint and nothing changes, which makes the retrospectives seem like a waste of time.
Another issue that can steer teams away from retrospectives is if there are cultural problems that lead to a lack of psychological safety. If the last retrospective was filled with blame and finger-pointing, team members may not be keen to do another.
With retrospectives, it is very important to address issues that prevent people from wanting to do them, not just make retrospectives themselves the goal.
The recommended goal for this item is 100%. It can be computed by looking at whether a retrospective took place at the end of each sprint.
What portion of time is spent on sprints that have retrospectives with recorded action items?
This check measures whether retrospectives are actually leading to tangible action items for improvement. Sometimes teams fall into a pattern of just talking about how they feel in each retro, but not making changes. Looking at whether each retro has recorded action items ensures that the retrospectives are producing a result.
The recommended goal for this item is 75%, which allows some room for retrospectives where no new issues arise, but ensures that most retrospectives are leading to action. You may consider setting a lower threshold if the team is very mature and nearly all the other scorecard items are passing.
What portion of time is spent on sprints that have regularly scheduled stand-up calls?
The main purpose of stand-up calls is to identify tasks that are blocked or off-track. Not having them can lead to people going down a rabbit hole or waiting around for someone else for days.
The exact frequency of stand-ups may depend on the team’s experience level. Senior engineers who know how to unblock themselves may not need to talk every day. However, it is helpful for any team to touch base at least a few times per week.
The recommended goal for this item is 100%. Stand-ups should be recurring calendar events that are only canceled if the team is out-of-office.
This section of the scorecard covers best practices for predictably planning and implementing multi-task projects.
This is one section where later items (5.6-5.9) relating to roadmaps may not be beneficial for all teams. While predictable roadmaps are important for external stakeholders and established products, planning too far into the future can be counterproductive if priorities are changing rapidly during a product’s early development.
What portion of time spent on tickets in epics has a functional specification?
Functional specifications can take different forms. They are typically published in a document, but may also be in a system like Confluence or directly in the description of the epic.
To count as a functional specification, the write-up for each epic should at least address three essential elements: goals, risks, and functional requirements. Without considering and documenting these three things, the chances of engineers wasting time implementing the wrong functionality increase dramatically.
The recommended goal for this item is 95%, which leaves some room for small projects to occasionally not have functional specs.
What portion of time spent on tickets with functional specifications was reviewed by engineers before starting work?
One common anti-pattern is when work flows in a one-way direction from product managers to engineers rather than engineers being actively involved with projects at the functional specification stage. It is important that engineers carefully review each aspect of functional specifications prior to starting work to uncover potential risks, as well as opportunities to achieve goals in a more efficient way.
To make this easy, there should be some sort of process where reviewers state explicitly that they have reviewed each specification.
Teams that don’t follow this practice may see projects take far longer than originally planned with significant scope creep and long implementation times.
The recommended goal for this item is 95%, as measured by functional specifications that have an explicit record of a review by an engineer according to the team’s process.
What portion of time spent on tickets in epics has a technical specification?
A technical specification may live in the same document as a functional specification, but the key difference is that it focuses on technical risks and is written by an engineer.
A good technical specification should serve as a blueprint for implementation, with adequate specification of architecture, interfaces between components, and a breakdown of work into small tasks.
Having a technical specification enables more accurate time estimates, and greatly reduces the risk of overruns caused by unforeseen technical problems in areas like security, performance, and limitations of third-party dependencies.
One common technical planning anti-pattern – which can be encouraged by a misunderstanding of agile principles – is to only plan work for the coming sprint. The problem with this is that it defers risk identification to later in a project, which is a frequent root cause when projects blow up to several times their original size.
The recommended goal for this item is 95%, which leaves some room for small projects to occasionally not have technical specs.
What portion of time spent on tickets with technical specifications was reviewed by another person before starting work?
Reviewing technical specifications is important because a main goal is risk mitigation, and risks are easy to overlook. There typically isn’t one single person on a software team who is an expert in every area of the software, so reviewers are likely to catch things that the author missed.
The recommended goal for this item is 95%, as measured by technical specifications that have an explicit record of a review according to the team’s process.
What portion of time spent on tickets in epics was on tickets that were created before implementation began on the epic?
There is always going to be some amount of scope creep on a complex project, and some of it is good as people identify new opportunities to add value. However, projects that run way over their estimates cause problems. First, they can disrupt plans for anything that depends on their completion date. Furthermore, the justification for pursuing a project (its return on investment) depends on the level of effort required. Projects with large amounts of scope creep may not have been worthwhile compared to other opportunities if the team knew the full scope up front.
The most common way to improve this item is to improve the quality of technical specifications.
The recommended goal for this item is 75%, measured by work on tickets that were created before the first non-planning implementation ticket is marked in progress. This provides a buffer for scope to increase by 33%, which is enough to provide flexibility without being majorly disruptive.
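To make the arithmetic concrete, here is a minimal sketch of how this metric could be computed, assuming ticket records with time spent and a flag for whether the ticket existed before implementation began on the epic. The field names are illustrative, not from any particular ticketing system.

```python
# Sketch of the scope-creep metric. A 75% goal means scope can grow by
# up to a third of the originally planned work (e.g., 60 -> 80 hours).

def pre_planned_portion(tickets):
    """Portion of time spent on tickets created before implementation began."""
    total = sum(t["hours"] for t in tickets)
    pre_planned = sum(t["hours"] for t in tickets if t["created_before_start"])
    return pre_planned / total if total else 0.0

epic = [
    {"hours": 40, "created_before_start": True},
    {"hours": 20, "created_before_start": True},
    {"hours": 20, "created_before_start": False},  # scope creep
]

print(pre_planned_portion(epic))  # 60 of 80 hours -> 0.75, right at the goal
```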
Does the team have a roadmap that specifies planned upcoming work at the initiative level? (yes/no)
This is a basic check on whether a roadmap exists in any form, even if it’s just a slide in a presentation or document. A basic roadmap doesn’t necessarily need to specify individual projects, but should at least state what initiatives the team plans to pursue.
A roadmap is an important communication tool that helps people in the organization know what each team is working on and the results they can expect the team to deliver within some time frame. Not having a roadmap can quickly lead to chaos and inefficiency as organizations grow.
Is the roadmap represented in the ticketing system and linked to epics or tickets? (yes/no)
A basic roadmap may be a separate document or plan in an external system that is not directly linked to the ticketing system. Having it represented in the ticketing system is important so that teams can easily see how work is progressing on the roadmap items and whether they are on track.
What portion of roadmap epics have a rough estimate made by an engineer?
At the roadmap planning stage, deciding how much effort to put into planning each project before scheduling it in the roadmap requires a difficult balance. It’s impossible to commit to something without knowing its size. However, you also don’t want to do the same level of technical planning that you would right at the start of a project. This would take too much effort, and things might change significantly by the time the project actually starts.
This scorecard item looks at whether epics in a roadmap at least have a rough initial estimate (at a granularity like weeks or tens of story points) made by an engineer to provide a ballpark idea of what the estimate will be once technical planning is complete.
One anti-pattern that can occur is when product managers do the estimates at this stage without talking to the engineering team. Product managers usually have a decent sense of how long things take, but are prone to overlooking major technical hurdles, which could significantly blow up the project scope.
The recommended goal for this item is 100%. Rough estimates can be recorded using the estimate field on an epic, or using a custom rough estimate field to make it clear that it carries less certainty since full technical planning has not yet occurred.
Are roadmap dates derived from historical velocity measurements? (yes/no)
An essential feature of roadmaps is that they show what will be done within some time frame. Once roadmap estimates are in place, items can be added to the roadmap with planned dates based on the available capacity.
This scorecard item looks at whether the capacity used for adding items to the roadmap is based on actual velocity measurements for the team, including the amount of time they spend on bug fixes and other unplanned work.
A common mistake is to just approximate the capacity based on the time frame of the roadmap (e.g., 3 months) and rough estimates denominated in the same units (e.g., weeks). The problem is that this fails to account for the actual velocity the team is able to achieve and other non-roadmap work that will get in the way, which can lead to roadmaps running significantly behind schedule.
What portion of epics have an eventual starting scope within the normal range of the initial rough estimate?
In addition to doing rough estimates during roadmap planning, it is important to follow up on those estimates and see how close they come once a project goes through the full planning process and is ready to start.
This metric looks at how many epics have a rough estimate within a normal range (we recommend 0.5-2x) of the post-planning estimate (before implementation starts, since scope creep is measured separately). It is important to count all epics – not just those that start – because epics that are badly underestimated are likely to get canceled or pushed back.
The recommended goal for this item is 75%, which accounts for the fact that rough estimates are not meant to have a lot of effort put into them and sometimes won’t pan out.
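As a rough sketch of this check, assuming each epic records the rough estimate made during roadmap planning and the post-planning estimate taken before implementation starts (the units just need to match):

```python
# Illustrative check of whether rough estimates fall in the 0.5-2x
# "normal range" of the post-planning estimate.

def within_normal_range(rough, post_planning, low=0.5, high=2.0):
    """True if the rough estimate is within low-high x of the post-planning estimate."""
    return low <= rough / post_planning <= high

epics = [(6, 8), (4, 10), (12, 11)]  # (rough, post-planning), e.g., in weeks
hits = sum(within_normal_range(r, p) for r, p in epics)
print(hits / len(epics))  # 2 of 3 epics pass; the 4-week estimate missed badly
```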
This section of the scorecard covers modern best practices for testing software. In particular, it assumes the testing pyramid model where there is a combination of automated unit, integration, and end-to-end tests.
What portion of time is spent in repositories that have automated unit, integration, and end-to-end tests?
This scorecard check looks for at least having the infrastructure in place to create tests of each type in the testing pyramid, which is a requisite starting point for healthy test automation. The exact definition of what constitutes each type of test will depend on the nature of the repository, such as whether it is a library or contains user interface code.
The recommended goal for this item is 95%, which allows for spending some time in non-core or legacy repositories that don’t have full test automation. Repositories that don’t have code (such as documentation or content repositories) may be excluded entirely.
What portion of time spent in repositories with test automation is in repositories with comprehensive test coverage?
The definition of “comprehensive” here is intentionally subjective, and meant to reflect whether engineers would say that automated tests are generally up to the standard set by the organization. More quantitative code coverage metrics are covered later.
The recommended goal for this item is 95%, though if it is lower you may want to set incremental goals.
What portion of time is spent on code whose pull request contains adequate test plan documentation?
It is up to teams to decide the standards for documenting manual test plans (e.g., they may not be needed for small changes that can be fully covered with automated tests). This scorecard check looks at how much time is spent on pull requests that either don’t require a test plan or have a test plan that is up to standard.
When engineers do not document the manual tests they performed, this causes a few problems. At best, the reviewer has to conceive and perform any necessary manual tests themselves, which takes more time. However, this also increases the risk of missing bugs. If the reviewer instead has a test plan to start from, they can focus on gaps in the testing that are more likely to have problems.
The recommended goal for this item is 95%. It should be easy to ensure that test plans are created when necessary almost all the time by reminding people with a section in the pull request template.
What portion of time is spent in repositories where necessary manual regression tests are documented, if applicable?
The practice of continuous deployment has moved a lot of teams away from required manual regression tests. But if they are being used, it is important that they are well documented. Otherwise, you are liable to start having regressions when the person doing the tests changes.
The recommended goal for this item is 100%, though ideally no repositories would rely on manual regression tests.
What portion of time spent in repositories with test automation is in repositories that have code coverage metrics in place?
Having metrics in place to measure how much code is covered by tests is helpful for monitoring the overall trend in testing to ensure that teams are maintaining or improving their test coverage.
The primary way to achieve good test coverage should still be reviewing tests as part of the code review process (since it’s possible to have good code coverage metrics and still have major gaps), but code coverage metrics are a useful backstop to detect problems in the code review process.
The recommended goal for this item is 100%.
This section of the scorecard covers best practices related to maintaining high code quality and minimizing the frequency of bugs.
What portion of time spent on code that merges to the main branch is on code that survives at least 1 day?
Some amount of code is bound to be removed or replaced within a day of merging, particularly if you are using continuous integration with multiple merges per day. However, large removals that revert changes indicate failures in the testing process.
The recommended goal for this item is 95%, because reverts should be rare if proper pre-merge quality controls are in place.
What portion of time spent on code that merges to the main branch is on code that survives at least 14 days?
Even if code doesn’t cause an immediate outage that requires reversion, large changes to code within a short time frame indicate that it has significant quality problems of one form or another. This metric looks at how much code is removed or replaced in the weeks immediately following its introduction to the main branch.
If teams are not doing thorough code reviews or have major gaps in test coverage, it often manifests in poor post-merge churn metrics.
The recommended goal for this item is 85%, because some amount of removal and replacement is healthy, but large amounts may reflect quality problems.
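Both survival metrics can be sketched with the same calculation, assuming per-change records of when code merged and when (if ever) it was removed or replaced. In practice these records would be derived from version control history; the data shapes here are illustrative.

```python
# Sketch of the post-merge survival metrics, weighted by lines changed.
from datetime import date

def survival_portion(changes, min_days):
    """Portion of merged lines still present min_days after merging."""
    total = sum(c["lines"] for c in changes)
    survived = sum(
        c["lines"] for c in changes
        if c["removed"] is None or (c["removed"] - c["merged"]).days >= min_days
    )
    return survived / total if total else 0.0

changes = [
    {"lines": 300, "merged": date(2024, 5, 1), "removed": None},
    {"lines": 100, "merged": date(2024, 5, 1), "removed": date(2024, 5, 8)},
]

print(survival_portion(changes, 1))   # 1.0: nothing reverted within a day
print(survival_portion(changes, 14))  # 0.75: 100 lines churned within 2 weeks
```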
Is the rate of bug creation monitored? (yes/no)
The rate of new bugs being filed is an important indicator of whether the development process has adequate quality controls. If nobody is looking at the rate of new bugs, it is easy to get into a bad situation where teams have such a large backlog of bugs that it is very difficult to dig out, and poor quality can become a serious problem for the organization.
This basic check looks at whether bug creation metrics are in place and someone is looking at them to keep an eye on overall quality.
What portion of filed bugs are fixed within 30 days?
This metric looks at whether teams make a habit of quickly fixing bugs, or whether they are allowed to accumulate in a backlog.
It may also be helpful to split up this metric and set different goals for bugs of different priority levels, such as fixing all high-priority bugs within 30 days.
The recommended goal for this item is 75%, but if you split it up by priority, we recommend setting a higher goal like 95% for higher priority bugs. Joel Spolsky has a good explanation of why this is important in The Joel Test, where he talks about why Microsoft famously adopted a “zero defects methodology.”
How many story points of bugs are filed relative to overall story point velocity?
This metric essentially tells you how much of your team’s capacity would be taken up if they fixed every single bug. It is helpful to keep this low so you avoid getting into danger territory where there are so many bugs that it is difficult to do new work while maintaining quality.
The recommended goal for this item is 25%. Anything much higher poses a serious threat to productivity and indicates a need for better quality assurance during the development process.
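The calculation itself is simple; this sketch assumes the team records story points on bugs as they are filed and measures overall velocity over the same period:

```python
# Sketch of the bug-load metric: the portion of team capacity that
# fixing every filed bug would consume.

def bug_load(bug_points_filed, velocity):
    return bug_points_filed / velocity

# e.g., 12 points of bugs filed in a sprint where the team completes 60 points
print(bug_load(12, 60))  # 0.2, under the recommended 25% ceiling
```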
What portion of newly filed bugs are attributed to the specific change (i.e., not just a larger release) that caused them?
It is helpful to have a record of which changes are causing bugs. This makes it easier to understand the specific types of changes that are leading to bugs so that the team can make better decisions about where to improve quality controls.
The recommended goal for this item is 75%, which leaves room for not attributing long-standing bugs that are difficult to pinpoint, while still providing adequate coverage to support decisions about quality improvements.
Does the team have documented standards for code and testing? (yes/no)
On newer teams, code and testing standards may emerge organically based on the standards applied by senior engineers doing code review.
However, as teams mature, it is important to write down standards in a document so that everyone is clear on what those standards are. Without this, standards may vary depending on who is doing a code review.
Onboarding is also a lot more difficult as new engineers have to learn the implicit standards through trial and error, often having to make significant changes in response to code reviews.
Another common anti-pattern that emerges in the absence of documented standards is that people will just copy whatever is already in the code base, sometimes propagating bad practices and unknowingly increasing technical debt.
What portion of time is spent on code in repositories that have automated linting checks in place?
In addition to documenting coding standards for more complex things like design patterns, many best practices can be enforced automatically with a static linting tool. Linting is an easy way to eliminate whole classes of potential problems and ensure consistent style, which makes code review a lot easier because it doesn’t have to cover as many smaller details.
The recommended goal for this item is 95%, which allows for some work in non-core repositories that don’t follow best practices.
What portion of time is spent on code that at least uses the previous generation of languages and frameworks?
Keeping up with the latest frameworks and programming languages can be a bit of a rat race, and it is rare for software that has been around for a long time to be fully up-to-date.
However, things are significantly worse if the code base is behind by multiple generations, like if you are still using jQuery.
This scorecard check looks at how much time engineers spend working in code that is no more than one generation behind the team’s latest best practices. Here, the exact technologies that constitute one generation are defined by the team, as they depend on which major transitions the team has made over time.
The recommended goal for this item is 95%. It makes sense to cordon off rather than refactor old code if it is rarely modified, but on a healthy team, modifications to code in this state should be pretty rare. Teams may want to set intermediate goals right after introducing a new technology that makes a large batch of code multiple generations old.
What portion of time is spent on code that uses the latest generation of languages and frameworks?
Newer generations of languages and frameworks provide significant advantages to productivity by reducing the mental burden of software development.
This check looks at how much time people spend working on code that uses the latest technologies. Time spent is an important metric because it emphasizes refactoring code that is modified the most frequently. Teams can often realize most of the benefits of new technologies by only refactoring a small portion of code at first.
The recommended goal for this item is 75% when the team is in a steady state. However, it will be zero when the team first switches to a new framework or language, so it makes sense to have intermediate goals during an initial refactoring push.
What portion of time is spent on code in repositories that are free of end-of-life dependencies?
Larger dependencies like operating systems, language runtimes, and frameworks will publish support timelines that dictate when versions become end-of-life and stop receiving updates, often even for major security vulnerabilities.
Having dependencies that are end-of-life is a major risk due to the lack of updates. However, the cost is compounded by the fact that multi-version upgrades (end-of-life versions are generally multiple versions behind the latest long-term-support version) tend to be a lot riskier and more time consuming than regularly advancing single versions.
One reason for this is that more code becomes dependent on the older version than would with frequent updates, which makes fixing it a lot harder. Multi-version jumps are also more difficult because larger changes are generally more challenging to plan, test, and debug.
The recommended goal for this item is 100%.
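A minimal sketch of an end-of-life check follows, assuming a hand-maintained table of support end dates. Real projects might instead pull these from vendor support pages or a community source such as endoflife.date; the dates below are illustrative, not authoritative.

```python
# Sketch of an end-of-life dependency check against a support-date table.
from datetime import date

EOL_DATES = {  # illustrative support end dates
    ("python", "3.8"): date(2024, 10, 7),
    ("python", "3.12"): date(2028, 10, 31),
}

def is_end_of_life(name, version, today):
    """True if the dependency's support window has ended as of today."""
    eol = EOL_DATES.get((name, version))
    return eol is not None and today >= eol

print(is_end_of_life("python", "3.8", date(2025, 6, 1)))   # True
print(is_end_of_life("python", "3.12", date(2025, 6, 1)))  # False
```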
This section relates to best practices for tracking and prioritizing technical debt (tech debt for short), which is a type of work that involves improving the internal quality of the code base and tools without enhancing functionality (though sometimes new feature work and tech debt improvements may be combined).
What portion of time spent on tech debt uses a dedicated tech debt ticket type or field?
Separately designating tickets that address tech debt in the ticketing system is important because it enables tracking the size of the tech debt backlog and looking at how much time the team spends fixing tech debt.
It is difficult to measure how many tech debt issues were filed without proper designation, so we recommend setting a goal that at least 5% of time spent on all tickets is spent on tickets marked tech debt to make sure it is done regularly. However, if you review each ticket for improper classification, we recommend a goal of 95% of time spent on tech debt issues being properly marked.
Does the team maintain a backlog of tech debt tickets? (yes/no)
This check looks at whether tickets classified as tech debt are maintained in a backlog that receives regular prioritization and grooming. It is sufficient if they are in the same backlog as other tickets, but it is not good if nobody looks at tech debt tickets after they are filed.
Are metrics in place that show the amount of effort the team puts into tech debt tickets? (yes/no)
If the team doesn’t know how much time it is spending on fixing tech debt, then it is difficult to ensure that the amount of tech debt work is in line with the team’s goals.
Does the team have guidance about the portion of time they should be spending on tech debt? (yes/no)
This guidance can be in the form of a percentage of velocity, or simply adding specific tech debt work to the roadmap.
If there is no guidance, then that means the product owner will just decide how much tech debt to fix each sprint. Since the product owner tends to focus on functionality, this often leads to the anti-pattern of the product owner scheduling an insufficient amount of tech debt work. On the flip side, teams can also decide to invest way more effort in tech debt than is appropriate given the current state of the organization.
Providing explicit tech debt allocation guidance ensures that everyone is on the same page and surfaces any disagreements about how much tech debt work is appropriate.
What portion of high priority tech debt tickets are resolved within 30 days?
One form of tech debt occurs when teams deliberately cut corners and write code that doesn’t meet current standards to hit a deadline. When this happens, it is important to classify the tech debt as high priority and fix it immediately after the project is complete.
If you don’t immediately fix corner-cutting tech debt, then the cut corners become permanent, which erodes standards. In some cases it may make sense to dial back overly aggressive standards, but this should be an explicit choice where code that violates the previous standards is no longer considered tech debt.
The recommended goal for this item is 95%, because a few things may slip through the cracks or take longer than 30 days to address, but any significant lapse starts to erode standards.
This section relates to best practices for holding meetings and making bigger decisions that involve multiple stakeholders.
What portion of overall work time do developers have available free from meetings to work in 2+ hour stretches?
Development work is complex and often involves significant ramp-up time getting in the mental state to fully understand the code and be productive.
This metric looks at how much time is available in chunks of 2+ hours for development work, which is enough time to ramp up and be productive for most of the session. With shorter periods, such as one hour, a significant portion of the time may be lost due to just orienting oneself to begin working at full productivity.
The recommended goal for this item is 60% for full-time engineers, though it may be less for those with technical or other leadership responsibilities.
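This metric can be sketched by scanning a day's calendar for meeting-free gaps, assuming meetings are given as (start hour, end hour) tuples within a fixed working window; the window and data shape are illustrative.

```python
# Sketch of the focus-time metric: the portion of the working day
# available in meeting-free blocks of at least min_block hours.

def focus_portion(meetings, day_start=9, day_end=17, min_block=2):
    free_blocks, cursor = [], day_start
    for start, end in sorted(meetings):
        if start > cursor:
            free_blocks.append(start - cursor)
        cursor = max(cursor, end)
    if day_end > cursor:
        free_blocks.append(day_end - cursor)
    focus = sum(b for b in free_blocks if b >= min_block)
    return focus / (day_end - day_start)

# A 10-11 sync and a 14-15 review leave gaps of 1, 3, and 2 hours;
# only the 3- and 2-hour blocks count toward focus time.
print(focus_portion([(10, 11), (14, 15)]))  # 5 of 8 hours -> 0.625
```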
What portion of time is spent in meetings that have agendas?
Having an agenda is important to keep meetings on track and make sure they stick to the intended topic. However, they have another less obvious benefit as well: they make it easier for people who don’t need to be in the meeting to opt out and get their time back.
The recommended goal for this item is 95%, though certain meetings like one-on-ones with managers or code pairing sessions may have implicit or recurring agendas and may be excluded from consideration.
What portion of time is spent in meetings that distribute notes afterward?
In addition to meeting notes serving as a helpful reminder of action items for participants, meeting notes – like agendas – enable people who don’t need to participate in the discussion to opt out because they can rely on the fact that they can see the results of any important conversations after the fact.
The recommended goal for this item is 95%, though certain meetings like one-on-ones with managers or working sessions may be excluded.
Are explicit decision-making roles used regularly for decisions with multiple stakeholders? (yes/no)
There are various frameworks for decision-making roles like RACI and DACI. The exact framework you use is up to you, but having some explicit roles is helpful. The full details of these frameworks are beyond the scope of this document, but one of their main benefits is to clarify who should be included in meetings and other communication related to the decision.
Without explicit decision-making roles, a common anti-pattern is that everyone will be invited to every meeting when some people could simply be notified after the fact. Another problem that can occur is important stakeholders are left out, and even more work has to be done to change the decision after people thought it had been made.
What portion of time is spent in meetings that could not have easily been handled via written communication?
The title here is a bit tongue in cheek, but the idea is that teams should use available collaboration tools to avoid meetings whenever doing so makes sense.
In particular, many meetings involve reviewing and providing feedback on some work artifact, such as a plan, prototype, or draft. In this scenario, the organizer should circulate the work artifact prior to the meeting to solicit written feedback.
The meeting may still need to happen if the feedback is extensive, or if the organizer doesn’t hear much and wants to be sure others have carefully thought things through. In many cases, however, participants can just provide minor suggestions and opt out of the meeting.
One note here is that the organizer wanting to save time by talking through their proposal rather than writing it down is not a legitimate reason to call a meeting. Sure, it might be less work for the organizer, but it is more time-consuming for everyone else. Just consider how much more information you can digest reading a news article versus watching a video.
If there are aspects of communication that require visual demonstration or where it is particularly important to hear the speaker’s tone of voice, it’s always possible to share a short video in advance of the meeting. That way, others can still view the content, but do it quickly on their own time and opt out of the more costly meeting.
The recommended goal for this item is 90%, as measured by the time participants spend in meetings where they respond affirmatively that their participation could not have easily been handled offline via written communication.
What portion of time do people spend in meetings where they actively participate in dialogue?
Active participation should be the end result if people are diligently following all of the other meeting best practices. If someone is not an active participant in dialogue during a meeting, then it is often possible to have them skip the meeting entirely by following the other practices outlined in this section.
The recommended goal for this item is 75%, as measured by the time participants spend in meetings where they respond affirmatively that they were actively engaged in dialogue during the meeting (talking back and forth, not just providing information or stating a position).