The Pitfalls of Local Optimization

Why do smart engineers do stupid things?

If you understand how smart people can blow things up through a series of seemingly logical decisions, you’ll recognize when a problem calls for a management solution and whether the process du jour is likely to help.

When software is brand new and managed by a small team, it’s easy to make good decisions:

The code base is small, and you don’t think about the future because you’re just trying to get off the ground. As the team grows and the product matures, it becomes increasingly difficult to see the big picture when making decisions.

Losing sight of the big picture is fine when local goals align with global priorities, such as writing high-quality code.

However, the best local decisions can sometimes be harmful when extrapolated to a broader context over time.

Here we’ll look at a few common ways this problem can manifest.

Underweighting Consistency vs. Flexibility

When faced with a problem, engineers like to find the best solution. However, this often requires complex analysis. The alternative is to apply a general heuristic that works well on average but isn’t always the best choice.

This situation comes up frequently with software-development guidelines. It’s tempting to make exceptions for extenuating circumstances, but every exception erodes consistency. That cost is hard to grasp because its effects are subtle and long-term, while the benefits of flexibility are apparent and immediate.

Test-coverage guidelines are a good example: while flexibility can make better local decisions possible, the inconsistency it creates may take a major toll over time.

First, flexibility is only better if people actually make the right decisions, and less experienced engineers may make more mistakes. For decision flexibility to be valuable, you must specify who can make decisions, train those people, establish a review process, and evaluate decision quality as part of performance reviews.

Second, flexibility adds decision-making overhead in the form of discussions and documentation of each decision.

Ignoring Aggregate Costs and Benefits

One way sensible local decisions can lead to bad global outcomes is when there are significant costs or benefits that only have an impact in aggregate. Because no single decision will tip the scale, people don’t consider aggregate effects when making each decision.

Perhaps the most familiar aggregate effect in software engineering is the fixed cost of using a particular technology or method, a cost that compounds as more of them come into play.

Consider Programming Languages

Each language requires recruiting engineers who know the language, establishing training programs, creating development practices, purchasing tools, managing knowledge, etc. Having multiple languages also makes it harder to allocate developers to projects.

Even a small amount of code in a different language can incur high global fixed costs, so eliminating a language entirely can yield large savings.
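To make the fixed-cost argument concrete, here is a minimal Python sketch of a toy cost model. It assumes each language in the stack carries one fixed yearly cost (hiring, training, tooling, practices) plus upkeep proportional to code size. The constants, the `Language` type, and the `annual_cost` and `migrate` helpers are hypothetical illustrations, not measured figures.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cost units per year; real values would come from your own data.
FIXED_COST_PER_LANGUAGE = 100.0  # hiring, training, tooling, practices
UPKEEP_PER_KLOC = 1.0            # upkeep that scales with code size


@dataclass(frozen=True)
class Language:
    name: str
    kloc: float  # thousands of lines of code written in this language


def annual_cost(stack: list[Language]) -> float:
    """One fixed cost per language in use, plus upkeep proportional to code size."""
    fixed = FIXED_COST_PER_LANGUAGE * len(stack)
    upkeep = UPKEEP_PER_KLOC * sum(lang.kloc for lang in stack)
    return fixed + upkeep


def migrate(stack: list[Language], source: str, target: str) -> list[Language]:
    """Rewrite all `source` code in `target`, assuming the code size stays the same."""
    moved = next(lang.kloc for lang in stack if lang.name == source)
    result = []
    for lang in stack:
        if lang.name == source:
            continue  # the source language leaves the stack entirely
        if lang.name == target:
            result.append(Language(lang.name, lang.kloc + moved))
        else:
            result.append(lang)
    return result


if __name__ == "__main__":
    stack = [Language("Java", 500), Language("Python", 200), Language("Perl", 2)]
    before = annual_cost(stack)
    after = annual_cost(migrate(stack, "Perl", "Python"))
    print(before, after, before - after)  # the saving equals one fixed cost
```

With this hypothetical stack of 500 KLOC of Java, 200 KLOC of Python, and 2 KLOC of Perl, the yearly cost falls from 1002.0 to 902.0 once the Perl code is moved into Python. The whole fixed cost disappears even though the migrated code is well under 1% of the total, because the upkeep simply moves from one language to another.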

The same principle applies to using other technologies and practices, like different libraries or frameworks, operating systems, or even agile planning processes.

Keeping up with best practices is an ongoing battle for engineering teams. This is especially hard with a large codebase, where migrating to a new technology all at once is impractical.

The common way to deal with this is to refactor code progressively, moving things over piece by piece. Because the main benefit of refactoring is lowering maintenance overhead, it makes sense to prioritize modules that require the most maintenance first.
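Here is a minimal sketch of that prioritization. It assumes you track a rough per-module maintenance figure, such as hours spent each quarter on fixes and related testing. With that figure, the refactoring order is a descending sort over the modules not yet migrated. The `Module` type, its fields, and `refactoring_queue` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A code module in a progressive migration (hypothetical model)."""
    name: str
    migrated: bool             # already moved to the new technology?
    maintenance_hours: float   # hypothetical: hours of fixes and related testing per quarter


def refactoring_queue(modules: list[Module]) -> list[str]:
    """Return names of unmigrated modules, costliest to maintain first."""
    pending = [m for m in modules if not m.migrated]
    # sorted() stays stable with reverse=True, so ties keep their input order.
    ranked = sorted(pending, key=lambda m: m.maintenance_hours, reverse=True)
    return [m.name for m in ranked]


if __name__ == "__main__":
    modules = [
        Module("billing", migrated=False, maintenance_hours=40),
        Module("auth", migrated=True, maintenance_hours=55),
        Module("reports", migrated=False, maintenance_hours=5),
        Module("search", migrated=False, maintenance_hours=40),
        Module("export", migrated=False, maintenance_hours=12),
    ]
    print(refactoring_queue(modules))
```

For these hypothetical modules, the queue is `billing`, `search`, `export`, `reports`. The already-migrated `auth` module is skipped. A single cost number is a simplification. Ideally it also captures indirect costs, such as bug fixes and testing triggered by changes to unrelated code.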

Here are some questions to consider to avoid falling into the trap of overlooking aggregate effects:

  • Is there a person explicitly responsible for the long-term maintenance costs of the entire software stack? Does this person have the authority to prioritize improvements that may not be worthwhile for an individual team but are good for the company overall?
  • For code modules and systems you don’t plan to update, how likely is it that updating them will become a priority in the future, versus their being thrown out entirely?
  • How isolated are modules and systems that have fallen behind? Do they really have a low maintenance cost, or do they require frequent bug fixes and testing when making changes to unrelated code?
  • Which old technologies that are still in use will become harder to work with over time due to a lack of developer knowledge and public support? Which would be the most problematic if current employees left and you had to hire new people to work on them?
  • Where are multiple technologies or methods that serve the same purpose bloating overall code complexity?

The Bottom Line

In this article, we focused on local versus global optimization and covered a few ways people make bad decisions.

While these examples will hopefully prevent certain mistakes, they’re by no means comprehensive.

Lasting success requires ongoing vigilance by engineering leaders who have a global perspective.
