From 20y experience and CS degree, I see software engineering as a constant struggle against accidental complexity. Like quicksand, every movement you make will pull you deeper, even swimming in the right direction. And like entropy, it governs all things (there are no subfields that are free of complexity). It even seems impossible to give a meaningful, useful definition, perhaps by necessity. All is dark.
But now and then, something beautiful happens. Something that used to be dreadful, becomes "solved". Not in the mathematical strict sense, but some abstraction or some tool eliminates an entire class of issues, and once you know it you can barely imagine living without it. That's why I keep coming back to it, I think.
As a species, I think we are in the infancy stages of software engineering, and perhaps CS as well. There's still lots of opportunity to find better abstractions, big & small.
I'm an Engineering Manager, and I think I have a similar role just applied to people processes rather than code. One nuance though - a lot of the time I suspect it's deliberate complexity designed to obfuscate how little people actually do.
Well, maybe. It's projection, because I certainly don't make simple processes myself a lot of the time, but I do try to optimize them afterwards. I have a few decades of seeing people implement processes than I've had to use, and then had to simplify as I moved into more senior roles. I've had people push back quite forcefully when I've pointed out they do things like writing reports that no one reads or gathering data that teams ignore. People often fight for added complexity because their perception is that it's important, and that means they must be important because they're the one in control of it.
There is an element of projection because there is in most things people talk about; I'm speaking about this through my filters and biases after all. But it's grounded in a fair chunk of experience.
Maybe you are saying the same thing, but couldn't that be explained better by those people being afraid to be made obsolete? Or at least, afraid if having to retrain?
This was really well written and I agree with you completely. Though I am not so optimistic as a species we have much runway left to get meaningfully much farther out of that infancy.
As tech progresses and those abstractions become substantially more potent, it only amplifies the ability of small groups to use them to massively shape the world to their vision.
On the more benign side of this is just corporate greed and extraordinary amplification of wealth inequality. On the other side is authoritarian governments and extremist groups.
Perhaps, but generally annoying millions of technology people tends not to end well for firms. Usually the market simply evolves to better match the fiscal conditions.
HTML attributes are all strings, so javascript's type coercion in general was this (doesn't just apply to ==) - a way to avoid having to do explicit conversions and make values act semantically equal without having to think about types.
A strict ban has always felt to me like we're leaving behind useful functionality.
To get perspective(we know what worked), here’s some 50+ years abstractions:
A file is a simple stream of bytes in Unix. (If you think what else it might be then compare to Multics’ segments). Separate processes that may be connected using simple standard I/O streams [pipe] (vs everything is DLL in Multics) — the concept of shell itself (policy vs. mechanism separation http://www.catb.org/esr/writings/taoup/html/ch01s06.html ).
I disagree. I worked at a protocol designer and implementor for years before people settled on the message queue as the universal abstraction. at the bottom end dumping serialized objects into tcp connections gets you most of the way. and at the top end there is so much leverage around locality, addressing, and transport that we are leaving a lot on the table.
message queues arent at all bad, but they come with additional complexity (most of it operational), and come with a set of limiting assumptions. so my frustration is that they are now the default answer for everything, and we're ignoring this lovely design space, one that becomes increasingly important when talking about scale.
Build tools that enforce hermeticity (cannot depend on files not declared as a dependency) and hashes files (as opposed to using timestamps). This eliminates whole classes of complaints against make.
There's a metric I see omitted here which I call Rube Goldberg complexity.
I'm working on a multiplayer game, for which I haven't touched the code in a while. The other day I asked myself, "what happens when I press the shoot button?"
Well, it sends a message to the server, which instantiates a bullet, and broadcasts a bullet spawn message to all clients, which then simulate the bullet locally until they get a bullet died message. (And the simulation on both ends happens indirectly via global state and the bullet update function).
My actual analysis was like 3x longer than that because it focused on functions and variables, i.e. the chain of cause and effect: the Rube Goldberg machine.
I laughed when I realized that name was actually too charitable because at least in a Rube Goldberg machine, all the parts that interact are clearly visible and tend to be arranged in a logical sequence, whereas in my codebase it was kind of all over the place.
So that made me realize, a function is not really a sensible unit of analysis. They're too isolated. You want to try and understand a feature.
Also, I'm experimenting with organizing the code by feature, rather than by "responsibility." i.e. the netcode for the bullet should be in the bullet module, not the netcode module.
I forgot the punchline: Rube Goldberg is the right frame, but it's aspirational. You achieve Rube Goldberg when all the parts of a feature are clearly visible and arranged in a logical sequence...
> So that made me realize, a function is not really a sensible unit of analysis. They're too isolated. You want to try and understand a feature
The central element of code, IMO, is data. Where it comes from, where it goes to, and how it transforms. Whatever your paradigm (imperative, OOP, functional, relational,…), you need to design your data model first and its shapes through the stage of its lifecycle.
Once you isolate some part of the data, what I do next is to essentially create subsystems that will provide behavior. That’s the same idea as the entity component system. It’s also found in Clean Architecture (data is the core entities, uses cases are entities behavior, and everything else are subsystems) and DDD.
In your example, the netcode should be focused on state replication mechanisms, not the state itself. The state being replicated should be managed by the bullet entity, which is a core part of the game.
Such separation of mechanisms (subsytems) and policy (core logic) is a very good way of keeping cost of changes down.
Related problem I've been exploring lately: finding which files are most worth refactoring, with complexity as one of the inputs.
I've built a small, opinionated tool for that [1]. It can rank files by a "Refactor Priority" score based on structural signals - size, callable burden, cyclomatic complexity, nesting - with churn and co-change from local git history layered on top.
It's more of an exploratory tool than a general solution, but it's been practically useful for quickly spotting painful files.
Part of why it was built: keeping coding agents in check. They tend to produce code that gets complex fast, don't feel the complexity building up, and eventually start making changes that break things.
So the tool helps me catch files that are getting out of hand before that happens. It can also generate a refactoring prompt explaining why a given file is problematic - as a conversation starter for the agent.
The article gave me a few more metric ideas to try, thanks.
Claude Code and others often write code that is more complex than it needs to be. It would be nice to measure the code complexity before and after a change made by the agent, and then to tell it: "You increased code complexity by 7%. Can you find a simpler solution?".
Something that I have started doing is to tell it to spawn a reviewer agent after every code change, and in the Claude/rules folder I specify exactly how I want stuff done.
Mostly I use it to write unit tests (just dislike production code that is not exactly as I want). So there is a testing rule for all files in the test folder that lays out how they should be done. The agent writing the tests may miss some due to context bloat but the reviewer has a fresh context window and only looks at those rules. So it does result in some simpler code.
You dont need to measure the code complexity, you can completely hallucinate those numbers if you feel that code is too complex and then watch how LLM will respond.
> A famous example of something developers might be able to explain easily but still cannot easily comprehend in code is the concept of a Monad in Haskell.
Isn't it exactly the other way around? Monads are easy to use, yet seemingly no one has ever been able to explain them...
Complexity directly impacts security. Simple systems are:
Maintainable: Easier to change and manage.
Reliable: Less prone to logic errors.
Testable: Easier to validate and test.
There was a study I read recently that analyzed the different complexity metrics and tested whether they relate to developers ability to understand the code.
Most of them, especially Cyclomatic, did not align very well with the ability to understand, there was only one of the standard ones (can't remember which one) that kind of got close.
I haven't read this yet, but from the title I'm surprised that it doesn't mention Kolmogorov complexity. How does Kolmogorov complexity relate to the concepts in this article?
Wonderful article, thanks for sharing. These complexity definitions and the connection to linguistic complexity are useful. Also enjoyed this line:
> The cognitive complexity of a function can only be determined by the reader, and only caring about the reader can enable the writer to improve the learning experience.
Spiral development with dependency injection is avoidable. A zero-cost solution is enforcing workmanship standards, well documented simple/clean design-patterns, and doxygen discipline.
But now and then, something beautiful happens. Something that used to be dreadful, becomes "solved". Not in the mathematical strict sense, but some abstraction or some tool eliminates an entire class of issues, and once you know it you can barely imagine living without it. That's why I keep coming back to it, I think.
As a species, I think we are in the infancy stages of software engineering, and perhaps CS as well. There's still lots of opportunity to find better abstractions, big & small.
There is an element of projection because there is in most things people talk about; I'm speaking about this through my filters and biases after all. But it's grounded in a fair chunk of experience.
As tech progresses and those abstractions become substantially more potent, it only amplifies the ability of small groups to use them to massively shape the world to their vision.
On the more benign side of this is just corporate greed and extraordinary amplification of wealth inequality. On the other side is authoritarian governments and extremist groups.
https://en.wikipedia.org/wiki/Competitive_exclusion_principl...
The Internet itself will likely further fracture into different ecosystems. =3
Language specific for JavaScript: Strict comparison operator === that disables type coercion, together with banning ==.
== allows "5" equals 5.
A strict ban has always felt to me like we're leaving behind useful functionality.
A file is a simple stream of bytes in Unix. (If you think what else it might be then compare to Multics’ segments). Separate processes that may be connected using simple standard I/O streams [pipe] (vs everything is DLL in Multics) — the concept of shell itself (policy vs. mechanism separation http://www.catb.org/esr/writings/taoup/html/ch01s06.html ).
https://retrocomputing.stackexchange.com/questions/15685/wha...
For comparison, you need a new app on iOS for what might have been a shell pipeline (hierarchical file system is absent at user level).
message queues arent at all bad, but they come with additional complexity (most of it operational), and come with a set of limiting assumptions. so my frustration is that they are now the default answer for everything, and we're ignoring this lovely design space, one that becomes increasingly important when talking about scale.
I'm working on a multiplayer game, for which I haven't touched the code in a while. The other day I asked myself, "what happens when I press the shoot button?"
Well, it sends a message to the server, which instantiates a bullet, and broadcasts a bullet spawn message to all clients, which then simulate the bullet locally until they get a bullet died message. (And the simulation on both ends happens indirectly via global state and the bullet update function).
My actual analysis was like 3x longer than that because it focused on functions and variables, i.e. the chain of cause and effect: the Rube Goldberg machine.
I laughed when I realized that name was actually too charitable because at least in a Rube Goldberg machine, all the parts that interact are clearly visible and tend to be arranged in a logical sequence, whereas in my codebase it was kind of all over the place.
So that made me realize, a function is not really a sensible unit of analysis. They're too isolated. You want to try and understand a feature.
Also, I'm experimenting with organizing the code by feature, rather than by "responsibility." i.e. the netcode for the bullet should be in the bullet module, not the netcode module.
The central element of code, IMO, is data. Where it comes from, where it goes to, and how it transforms. Whatever your paradigm (imperative, OOP, functional, relational,…), you need to design your data model first and its shapes through the stage of its lifecycle.
Once you isolate some part of the data, what I do next is to essentially create subsystems that will provide behavior. That’s the same idea as the entity component system. It’s also found in Clean Architecture (data is the core entities, uses cases are entities behavior, and everything else are subsystems) and DDD.
In your example, the netcode should be focused on state replication mechanisms, not the state itself. The state being replicated should be managed by the bullet entity, which is a core part of the game.
Such separation of mechanisms (subsytems) and policy (core logic) is a very good way of keeping cost of changes down.
I've built a small, opinionated tool for that [1]. It can rank files by a "Refactor Priority" score based on structural signals - size, callable burden, cyclomatic complexity, nesting - with churn and co-change from local git history layered on top.
It's more of an exploratory tool than a general solution, but it's been practically useful for quickly spotting painful files.
Part of why it was built: keeping coding agents in check. They tend to produce code that gets complex fast, don't feel the complexity building up, and eventually start making changes that break things. So the tool helps me catch files that are getting out of hand before that happens. It can also generate a refactoring prompt explaining why a given file is problematic - as a conversation starter for the agent.
The article gave me a few more metric ideas to try, thanks.
[1] https://github.com/etechlead/token-map
Mostly I use it to write unit tests (just dislike production code that is not exactly as I want). So there is a testing rule for all files in the test folder that lays out how they should be done. The agent writing the tests may miss some due to context bloat but the reviewer has a fresh context window and only looks at those rules. So it does result in some simpler code.
Now you're assuming a human is actually trying to understand the code. What a world we live in (sarcasm).
Isn't it exactly the other way around? Monads are easy to use, yet seemingly no one has ever been able to explain them...
https://www.youtube.com/watch?v=SxdOUGdseq4
https://www.sonarsource.com/docs/CognitiveComplexity.pdf
Complexity directly impacts security. Simple systems are: Maintainable: Easier to change and manage. Reliable: Less prone to logic errors. Testable: Easier to validate and test.
Most of them, especially Cyclomatic, did not align very well with the ability to understand, there was only one of the standard ones (can't remember which one) that kind of got close.
> The cognitive complexity of a function can only be determined by the reader, and only caring about the reader can enable the writer to improve the learning experience.
https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...
Maintainable coding practices are a skill like any other. =3