Software and its Discontents, Part 2: An Explosion of Complexity
This is part 2 in my “Software and its Discontents” series. The series is the product of my asking a bunch of folks about the current state of software engineering and the sense that it is not going well, that people are disillusioned and frustrated. In part 1 I talked a bit about the macroeconomic trend driving this conversation, namely the end of the decade of cheap money for tech investments, and how it shaped some of the changes to the industry over the last decade.
see: Software and its Discontents, January 2023, Part 1 for more context and background.
In my conversations I found four interdependent trends that have substantially increased the difficulty of building software.
- an explosion in the complexity of software development
- tech talent becoming significantly more expensive
- success becoming more elusive than ever, with startups having “lost that magic feeling”
- conflicts over changing expectations of the work environment
Among the people I talked to, primarily engineering leaders but also CEOs, VCs, ICs, and other practitioners, the most common response to the question “has something substantially changed?” is that software, counterintuitively, has gotten harder to build. This is counterintuitive because the tools are orders of magnitude better, the amount of work you can cheaply outsource is nearly miraculous, computers are so damn fast and cheap these days, the quality of resources, many of them free, is off the charts, and the talent pool has exploded and shows every sign of being smarter and better educated than ever. But software has gotten harder to build in one very particular and important way: it’s gotten more complex.
In both systems thinking and software the term “complex” is a technical one. It refers to the number of distinct parts in a system, and the connections between them. Complex systems are characterized by nonlinearity, randomness, emergence, and surprise. Complexity is why communication and coordination dominate all other costs when it comes to building software. And complexity has exploded. (Thank you to John Allspaw for first introducing me to the concept of complexity as opposed to the merely complicated.)
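To make the “parts and connections” definition concrete, here is a minimal sketch. It assumes the simplest possible model, in which any two parts of a system may need to interact; the function name is my own for illustration, and real systems are messier than this.

```python
def pairwise_connections(parts: int) -> int:
    """Count the distinct pairs among `parts` components: n * (n - 1) / 2."""
    return parts * (parts - 1) // 2


# Each tenfold increase in parts gives roughly a hundredfold increase in possible connections.
for parts in (5, 10, 50, 100):
    print(f"{parts:>3} parts -> {pairwise_connections(parts):>4} possible connections")
```

Under that assumption, going from 10 parts to 100 parts takes you from 45 possible connections to 4,950, which is one way to see why coordination costs grow faster than the system itself.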
Complexity has not only exploded, it’s exploded in multiple distinct ways that have distinct root causes but interact. I’ve tried to break up the explosion in complexity into the following categories:
- The complexity of rising standards
- The complexity of too many choices
- Complexity and technical decision making in the era of abundance
- The complexity of large teams and aging code bases
- Aspirational complexity
Some of this complexity is directly attributable to the decade of cheap money, some is just the natural result of our industry maturing. Some of this complexity will be addressable with better practices, better leadership, and a better understanding of the sources of complexity. Some of the complexity is here to stay, and we’ll need to recalibrate our expectations about how difficult it is to build software.
The complexity of rising standards
We expect more of software than we used to. Some of this is customer preference, some is regulation, and some is professional aesthetics.
Regulatory requirements, e.g. around data privacy and financial controls, are significantly more complex than they used to be: GDPR, AADC, DMA, DSA, HADOPI, FOSTA-SESTA, BITV, etc. FedRAMP, HIPAA, and SOX, not to mention SOC 2 and HITRUST, have also become critical much earlier in a company’s life cycle, whether to access critical customers, critical resources, or both. The regional and geographic variations can be particularly challenging, and they undermine a key productivity win that early online businesses enjoyed. Amazon, for example, didn’t even bother collecting sales tax in its early days, a price win for customers, but also a massive reduction in complexity vs a multi-geography brick and mortar business. In the early days, we on the Web were all playing on regulatory easy mode. That window has largely closed, especially as startups, searching for new problem spaces to deploy their capital and technology, have moved into highly regulated domains like health, finance, and civic infrastructure.
The web, at its inception, was a triumph of simplicity. Its rapid rise to dominance was driven in large part by how it reduced the complexity of delivering software to customers. It was a single unified platform. It was open and non-proprietary. It was simple by design, built around a stateless protocol and a simple declarative UI paradigm. It was available over the internet and didn’t require anything to be bundled or shipped. These radical simplifications allowed effective asymmetric competition with established players developing desktop software and delivering it via physical media. Over the intervening decades we’ve largely compromised all these simplifying properties of the web. Even when all we’re doing is delivering software via the internet (and not, say, scooters out of the back of a fleet of vans) we’re now targeting many different platforms: desktop web, mobile web, and also the two dominant and semi-incompatible mobile walled garden ecosystems. Meanwhile, state management has become so complex that taming it is the primary reason we adopt heavy frontend frameworks like React. This complexity has driven the need for a specialized frontend engineering discipline: someone who can wrangle a TypeScript type system of modular components populated via React Query talking to Apollo GraphQL backed by a gRPC Envoy proxy to a SOA stack. Similarly, machine learning, mobile, infra, and backend have all specialized, each with its own unique complexities. With multiple specializations, we now have more distinct “resources”, each with their own work in progress queues, biases, hiring loops, onboarding, culture, sick days, and needs to coordinate. Explosions of complexity.
Rising standards have benefits as well as costs. Regulatory complexity is often driven by regulators’ concern for customers. More directly, however, the raised expectations of what success looks like mean that customers who were ignored in the early days of tech can no longer be ignored by a team wishing to be successful. Accessibility and internationalization have both become critical for success. In the early days, when broadband and then mobile adoption were rapidly doubling, you could count not just on new customers being regularly minted, but on the vast majority of those new customers matching the demographics of early adopters: young and wealthy, with many of them living in US cities. Even those early web adopters aren’t that young anymore, and a company that is only able to get adoption among some idealized fantasy model of young, perfectly healthy, US consumers isn’t viable in 2023. But both accessibility and internationalization require more coordination of software development across previously unexplored dimensions, with adaptive designs and translation. And, perforce, at least some of this work is work your software team is unlikely to be able to evaluate, complicating your acceptance criteria. Complicated processes are a classic source of complexity.
Similarly, even without regulatory pressure, you need to be designing for safety, security, and anti-abuse from day one. Defending successfully against the global legions of the poor, bored, or both is a high bar, and it is now required at launch.
The complexity of too many choices
In many ways we’re living through a golden age of software development: more tools than ever, more affordable than ever. I’m old enough to remember when IDEs cost hundreds if not thousands of dollars, and there was a real ecosystem of people selling third party libraries and widgets (advertised in the back of Dr. Dobb’s Journal). Today we have more: more tools, more languages, more frameworks, more databases, and more services. Most of these tools represent real progress in terms of increased capabilities, and in letting you outsource non-core parts of your business. However, the range of choice has real impacts on complexity.
Anyone joining a company today is looking at a stack that is at least as bespoke as the worst Not-Invented-Here stacks of the previous era. Rails was Rails, LAMP was LAMP, and while Vercel is better than anything we built for ourselves during that earlier era, it comes with a full manual, and its own quirks. So does Google Pub/Sub versus some shitty solution we built on top of MySQL, and LaunchDarkly can do so much more than anything we might have expressed with a shitty YAML config file. Those home rolled systems of an earlier era lacked both features and documentation, but our current systems are just as unique in their composition. Given the huge number of choices and the configurability of each of these professionally developed and documented components, the odds that you’ve seen this exact combination of technologies, tools, and services configured this particular way before are extremely low. We’re a long way from the era when everyone configured their LAMP app the same way, and a community of practice grew up around it.
Not only is each stack novel to each new team member, this cross product of complexity means we have fewer mavens and experts. At Etsy, when we needed to scale PHP, we could hire Rasmus Lerdorf, the creator of PHP. Very few teams these days can find that kind of expert, and fewer of those experts will have seen the relevant scale on that exact stack.
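A back-of-the-envelope sketch of that cross product follows. The layers and option counts are entirely made up for illustration (they are not survey data), and the sketch ignores how each component is configured, which would only make the number larger.

```python
from math import prod
from typing import Mapping

# Hypothetical menu: both the layer names and the counts are illustrative assumptions.
options_per_layer: Mapping[str, int] = {
    "language": 6,
    "frontend framework": 5,
    "database": 8,
    "message queue": 4,
    "hosting platform": 5,
    "feature flagging": 3,
}


def distinct_stacks(options: Mapping[str, int]) -> int:
    """Multiply the number of choices per layer to count possible stacks."""
    return prod(options.values())


total = distinct_stacks(options_per_layer)
print(f"{total:,} distinct stacks from {len(options_per_layer)} layers")
```

Even with these modest, invented numbers there are 14,400 possible stacks, so the chance that a new hire has worked on your exact combination before is small.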
Complexity and technical decision making in the era of abundance
In the conversations I’ve been having with engineering leaders a huge source of anxiety has been the impact that the explosion of technical choices has had on the quality of technical decision making.
As an engineering leader, raising the quality of technical decision making is arguably your most important job after building the team itself. Eight years after I left Etsy I’m still getting new notes from people telling me that, no matter how frustrated they were with me at the time, in subsequent jobs they’ve come to appreciate and desperately miss how well defined the “Etsy Way” of building software was.
Today any team that has been around for more than a minute has not only chosen a unique combination of technologies, they’ve changed their mind about it a couple of times, often in logically inconsistent ways. With so many great technologies out there, and so many of them backed by well funded marketing teams (see: cheap money and marketing), it’s never been harder to keep your stack simple and logically consistent. Many teams have given up entirely and are leaning into developer empowerment and polyglot infrastructures. We’ve collectively taken on the complexity of targeting multiple stacks, with their idiosyncrasies, their need for training, and their upgrade cycles due to rising standards, while simultaneously splitting our resources for managing that complexity across the training, upgrade cycles, and idiosyncrasies of these polyglot stacks. Not to mention the unique interactions of these technologies with our previous technology choices, which are still lingering in the stack. The real horror stories these days in infrastructure aren’t the load spikes of days of yore (“getting Slashdotted!”) but those complex interactions: how PHP’s gRPC library interacts with Envoy, how Scala’s JSON library tickles Varnish caching issues, how MySQL’s weird implementation of utf8mb4 is incompatible with storing your data literally anywhere else. There is a reason that tech debt has become the favorite bugbear of teams everywhere.
Without standardization in your company, without a small number of well known tools in which you’re developing expertise as a team, the hope that you can grow your team logarithmically but see exponential results is a fantasy. That discipline is harder than ever to enforce.
The complexity of large teams and aging code bases
There is so much to say on the topic of large teams and aging code bases, and so much of it has been covered well elsewhere. I want to focus on just the important changes we’ve seen related to the other trends we’re discussing in this post.
Cheap money and founder-friendly funding in the last decade have led to more founder control and deeper pockets. That control means we’re more likely to see attempts at continuity in companies. That means that two decades into the Internet era of tech startups, and a decade into cheap money, we’re seeing significantly older codebases. Older codebases compound the explosion of technical choices and the sometimes poor technical decision making. Older codebases, with a longer history, mean more choices. More choices, and a lack of clarity around which of those choices are load bearing, mean significantly increased complexity for anyone onboarding to the codebase.
Teams are also getting larger, as we discussed in part 1. As teams get larger, complexity goes up for several reasons. First, as we slice up responsibility for developing our software into thinner slices, there are fewer people who have touched the whole system and have a coherent view of the whole architecture. Coherence is one of the key characteristics we look for in simple architectures, and its absence drives complexity. Additionally, large teams spend more time dealing with coordination and are more likely to reach for architecture and abstractions that they hope will reduce coordination costs, i.e. if I architect this well enough I don’t have to speak to my colleagues. Microservices, event buses, and schema-free databases are all examples of attempts to architect our way around coordination. A decade in, we’ve learned that these patterns raise the cost of reasoning about a system during onboarding, during design, and during incidents and outages. Finally, as teams have grown, and individuals’ scope of responsibility has narrowed, resume and promotion driven design has found increasingly fertile ground. How do you stand out as the 500th person maintaining a system you didn’t build? Build something new! And take on all of the complexity inherent in it. Google, as with so many of the best and most problematic patterns in this era, is well known as the epicenter of this phenomenon, but you see it broadly as teams grow.
Aspirational complexity
As an industry we’ve always been enamored with new technology and shiny objects. For years it was almost definitional, otherwise why did you go into this industry? Interestingly, even as the job has mainstreamed, the infatuation with complexity has remained, and even grown.
First, complexity lies at the heart of our industry’s mythologies. New people joining the industry are taught our myths about Google, Facebook, Amazon, and a sense that these companies’ approaches are what software is “supposed to” look like. And fewer and fewer people are in a position to have a wide enough scope of responsibility to learn pragmatic counter lessons the hard way.
Second, during the era of abundance, when OpEx was easier to deploy than CapEx, cloud and SaaS exploded. These services come backed with significant marketing budgets whose job is to convince you that you need the complexity. Why deploy a database when you could deploy a non-relational data cluster? Why deploy a server when you could deploy a Kubernetes cluster? Why build simple web pages when you could use React? Hacker News in particular has an interesting role in this cycle, being both a community driven by industry mythology, and also the marketing arm of a major source of funding for new developer oriented SaaS offerings. Now your community is reinforcing the message that good software is complex software, and that last year’s technical choices are out of date, and probably why your productivity is suffering.
And it was easier to raise capital if what you were doing sounded high tech and complicated. Really it was a flywheel of people being able to raise money by sounding complicated and smart, and then spending that money on people who made them feel like they could help solve a hard problem in a complicated and smart way, with everyone getting paid and emotionally validated along the way. We’ve developed an aesthetics of complexity: the sense that a good system is a complex one, that you should prefer a SPA over a web page, a distributed system over a simple one, a service over a config file, and the idea that if you aren’t on the latest technology you’re wasting your time, and potentially damaging your career.
The more things change
The race between improved productivity from better tools and the drag of increased complexity, inherent, accidental, and aspirational, isn’t particularly new for our industry. If you talk to people who worked at Sun, SGI, or Oracle at the end of the 90s they’ll quickly point out to you that much of this is cyclical. The era of cheap money certainly juiced some of these trends, but without other conflicts in the workplace around outcomes and expectations we wouldn’t be at this inflection point.