Residuality Theory for Antifragile Software Architecture

Small Talk with Barry O'Reilly (Transcript)

This blog post is the transcription of the chat we had with Barry O'Reilly about his Advanced Software Architecture Workshop on 16 November 2023.

The conversation has been slightly edited to better fit the written format. Enjoy!


Avanscoperta: Your workshop is going to be a three-day course in person and it's about Residuality Theory.
Would you like to introduce us to your workshop and what it is about?

Barry: The workshop is called the Advanced Software Architecture Workshop and it's a workshop for people who are looking for more advanced techniques and new techniques for doing software architecture.
Its roots go back over 10 years, and when I started I was trying to answer questions like: “How do we teach architects to be architects? What is it that they actually do? What do we need to teach them? What's the difference between a developer and an architect? And ultimately - what actually is architecture?”
And this was a project that I took on in 2013 while I was working in the role of the global lead for the Solutions Architecture community at Microsoft.

Obviously, at that time, we were undergoing a massive change where we were moving from these huge on-premise platforms, things like SharePoint, CRM and BizTalk, into the world of Azure and cloud. Suddenly developers and architects were working outside of the boundaries that had been drawn for them for a very long time. They had to piece together solutions from many different PaaS cloud components and start thinking architecturally on a different level, and they needed a much better grasp of distributed systems, decision-making, following things up, and testing and verifying their architectures.

And I started to think about: “Well, what is it that we should teach them? What does an architect need to know in this modern environment?”
I started digging into the materials that were already there and already available to software architects. Obviously, in the course of the last 25 years, a lot of the traditional architectural tools that we've leaned on, like requirements engineering, process mapping, capability mapping and all of these heavy frameworks that we inherited from the 80s and 90s, started to be questioned.
So part of that work I was doing involved asking myself: “Do these things actually work? Are they actually necessary? Is the architect some kind of oracle who sits in the middle of a project and makes all these decisions, or is there a lot more going on?”

And I came to the conclusion that we were using entirely the wrong tools and the wrong underlying philosophies to describe architecture.
I started to dig into the complexity sciences looking for ideas and inspirations about: how do other professions and disciplines deal with working with problems when we don't actually know what's going on, when we don't have models that we can easily refer to, when the ground keeps shifting beneath us.
It turns out there's a fair body of work there that's very relevant to software engineering, but it hasn't really been touched seriously, certainly not in an academic sense.

I started putting these ideas together and I came up with something that originally was called “anti-fragile systems design”, which was a way of building systems that would survive in environments where we don't know what's going to happen.

Because in truth, that's what an architect does; you have to design a system and you're going to put it into an environment but you don't know what's going to happen in that environment. But when you put it into that environment, it will live or die based on the structural decisions you make about the software system.

So that's essentially what we do as software architects - we make decisions about the structure of a system and we don't know what that structure is going to be exposed to.

Over time, I started talking about these ideas, I talked about them at DDD Europe, way back pre-COVID, and started to get a lot of interest in what I was doing. And I decided to take a very different path than a lot of other thought-leaders in the IT industry, saying to myself: “I'm going to do this properly. I'm going to go back to university. I'm going to turn this into a research project and I'm going to scientifically beat up my own ideas. And I'm going to either prove or disprove that this actually holds in reality.”

And right now, we're getting very close to the point of publishing and saying that actually, we've shown that this holds in reality.

And the product of all of this is something which can be called Residuality Theory, which is a set of ideas that combine ideas from philosophy, the complexity sciences, and software engineering to produce a very lightweight set of tools that help us to navigate uncertainty and to turn uncertain situations into coherent software architectures without any hand waving, hocus pocus or magical thinking - in a way that's validated, verified and can work.

The result? A completely different way of thinking about software architecture, a completely different way of relating software to the environment that it has to exist in and to the world that it's inevitably connected to.

Avanscoperta: It’s pretty cool that you’ve invested so much time in validating all of this from a scientific perspective. Why did you decide to do it this way?

Barry: What I found when I was teaching architects is that generally, if you're a junior architect, on a theoretical basis, there's very, very little for you to grasp. There are a bunch of books and stuff on patterns that get recommended. But the honest truth about architecture is that most of us learn how to do architecture by just jumping in headfirst and dealing with it.
And you either sink or swim as an architect, you pick it up or you don't. And no one really knows how that happens. No one really knows what it is that works.

What tends to happen is when you meet architects and senior software engineers in real life, and you’re junior or have not really arrived yet at a sense of confidence in your architectural ability - you’re basically going through this whirlwind of ideas and books and blog posts and memes that you’ve picked up. As a result, you tend to find yourself believing in things a lot of the time.

And so when I'm teaching architects, one of the first things I have to do is get these folks to question their current set of beliefs. “Where has this come from? Why do you believe that this particular process, framework or methodology actually works in reality? What's the evidence that this is real?”

And even if you've got it to work on a bunch of projects, that's not scientific evidence, that could be a coincidence. You don't know that it's the actual method or the framework or the thing that you do or the book that you've read or the language that you use that's given the results. There's no way to separate these things.

And so the first part of training architects, when they're junior, is to push back and say: “These ideas, this catalogue or this toolbox that you bring with you, how do you know that that's what's actually working?”

And a big part of it is getting architects to understand that it's not the toolbox that's delivered successful projects or successful architectures - the key factor is you! It's your ability to think through problems and all of the tools and frameworks are mostly just ceremonies and other people's opinions that you've borrowed.

I found that when I started to take this perspective, I would listen to people talk about things like: “You have to do this, and you have to do that, you have to stand up in your meetings, you have to do all of these things”... And every time I would ask myself the question, not openly: “Where's the scientific verification for what's being said here? Where's the proof that this actually holds?” And there never is any. And that led me to be deeply skeptical about a lot of the work that's out there and a lot of the recommendations that are made.

So I wanted to separate my work from that kind of thinking and make sure I could say what works and what doesn’t - from a scientific point of view.
This has given me a chance to really and deeply think about what for me initially were heuristics and things that I picked up during my own career, habits that I formed, and not backed up by science at all.

Being able to put those through this validation and verification from very grumpy academic people who won't let you use a word without defining it and things like that, makes something much, much richer and much more solid at the end of the day.

Basically, this was my motivation for doing it in this particular way, because I want to be able to point at something and say: “This holds, this is real”. And from there, build something concrete that won't be washed away by the next trend, wave or hyped-up thing that comes along.
And hopefully, being exposed to this thinking and these kinds of ideas also makes architects a little bit more rigorous, and more skeptical of people who are trying to sell them things.

Avanscoperta: Interesting to see that you’re aiming to put together a set of principles and tools that also work in very new things and things you don't know.
How is that exactly working? How do you have a scientific basis for something that's new, uncertain and we don't know about yet?

Barry: The way to do that is to design an experiment. If you go back to the YouTube videos from DDD Europe where I show the way that this approach works, we take an architecture, any architecture, and we just say: “We have this problem, how would we solve it?” And you come up with a very quick sketch and say: “This is my naive architecture”.
And the way that this approach works is that rather than gathering requirements, rather than seeking information from people directly, rather than asking the question: “How should this work?”, we start to ask questions like: “How does this fall apart? How does it break? How does it crumble?”

Because all architectures will crumble and they crumble at different stages. So your picture of the architecture, the structure that you have in your head, your beliefs about how this human and business system is going to work, that's a structure that you create at the start of the project.
And inevitably that structure will crumble and you'll realise that you were wrong about almost everything. Everything you thought at the start of this project was wrong. That's something that the agile movement should have taught us, they should have taught us that that's the way it works.
But there's still people out there who try to define everything on day one or they don't realize that even if we work in an agile way, if you have an idea and you're stuck on that idea, that idea will stay in the project forever. We have to be able to get rid of it.

And so what I do is I take this naive architecture and I start to stress it. And I start to say: “Well, what if we're wrong?” We do this in a very, very different way than people usually do with edge cases, scenario analysis or risk analysis because those are very, very different things which happen at different stages in the project.

Now with this approach, I suggest we're actually testing our structure and we're saying: “What could I have misunderstood?” And so what we do is we make things up and we say: “We have this thing here, and there's an assumption like a customer has a certain amount of money. What happens if they don't have that amount of expendable income anymore? There's an assumption that the market is open. What happens if there are changes on a regulatory basis? What happens if our competitors start to behave differently? What happens if the partners that we need to work with start to behave differently in the market? How does that affect the structure or my understanding of that system? And what would this naive architecture that I came up with, what would that look like in terms of this particular form of stress?”

This changes very much how we describe our system. So instead of me saying: “Let's map some user stories or let's ask our business stakeholders ‘What are your requirements?’”, and instead of just copying someone else's software, we say: “What can go wrong? What keeps you up at night?” And it leads to an entirely different kind of conversation.

It requires a whole other level of architectural skill because I've got to be able to go into a business and talk to them about their business model and say: “What is this? How do you make money? How does that fall apart? What are your competitors doing? Who are your competitors? What's your relationship to your government, to your society, to your markets, to your customers, to social norms, trends and political ideas that are moving and shifting all the time?”

And for a lot of software developers, that's a huge jump in terms of how big the problem is. So we're still doing programming. We're still talking about what is the structure of this program, but we're talking about things like organizational culture, political change, economic variability and volatility, and how that affects the software that we're going to build.

And it moves us very, very far away from this traditional architectural approach where you wanted to map some capability and processes or capture things in terms of language and model that language.

So this is what happens; over time, as you stress the application, and this was the way I worked as default, and I never thought that much about it, I thought I was just negative in that I approached things from this perspective… so when you stress an architecture like this, your original naive architecture starts to fall apart. You have to put it back together every single time in a slightly different way.

After a certain amount of stressing of the architecture, you start to notice that as you introduce new stress, the system doesn't need to be changed. It starts to survive things that it hasn't been designed for, which is exactly the goal that we should be aiming for as architects.
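As an illustration only (this sketch is not from the conversation and is not the formal method of Residuality Theory), the stress loop described above can be mimicked with a toy model in Python. The `Architecture` class, the stressor names and the idea of representing an architecture as a set of things it "copes with" are all assumptions made for the example. The point it shows is the one Barry describes: early stressors force a redesign, later ones are survived without change.

```python
from dataclasses import dataclass, field


@dataclass
class Architecture:
    """Toy stand-in for an architecture: the set of things it can cope with."""
    copes_with: set[str] = field(default_factory=set)

    def survives(self, stressor_needs: set[str]) -> bool:
        # The architecture survives if it already covers everything the stressor demands.
        return stressor_needs <= self.copes_with


def stress(architecture: Architecture, stressors: list[tuple[str, set[str]]]) -> list[bool]:
    """Apply stressors in order; record, per stressor, whether a redesign was needed."""
    redesigns = []
    for _name, needs in stressors:
        if architecture.survives(needs):
            redesigns.append(False)
        else:
            # "Put it back together" in a slightly different way.
            architecture.copes_with |= needs
            redesigns.append(True)
    return redesigns


# Hypothetical stressors, each paired with what surviving it would require.
stressors = [
    ("payment provider goes away", {"swappable payment provider"}),
    ("competitor drops price", {"configurable pricing"}),
    ("regulator bans a payment method", {"swappable payment provider"}),
    ("partner raises prices", {"configurable pricing", "swappable payment provider"}),
]

naive = Architecture()
history = stress(naive, stressors)
print(history)  # [True, True, False, False]
```

The last two stressors were never designed for, yet they no longer trigger a redesign. In a real project the hard part is everything this toy hides: finding the stressors and deciding what surviving them actually requires.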

So as a programmer, you're taught at university that you have a problem and there is an answer, and that answer is correct or not correct. Very binary - it either solves the problem or it doesn't. There is a correct answer.
As an architect, there's no correct answer. There's no correct structure that is the absolute right. There are many possible structures that could solve the problem, and there's no way of knowing which one of them is correct.

This methodology allows us to test and say: “This architecture might not be correct, but is it critical? Does it have the property of criticality?” And the property of criticality means: “Can it survive in conditions that it hasn't been developed for?”
So we're taking this naive architecture and we're stressing it until it falls apart, and then we're using different methods to put it back together.

I gave a talk yesterday at a conference on AI where I was talking about the similarities between this kind of thinking and the use of something called a diffusion model in AI where we add noise to pictures and chart the journey from a picture to noise, and then use that journey to move backwards from noise with new pictures. That's more or less what we're doing here. The mathematics for us isn't as formalized, because there are too many variables and too many things moving.
But it's a very similar process to a diffusion model within AI, so it allows us to produce new architectures out of noise.

These stressors, these things that happen to a system, aren't necessarily things that are going to happen in real life. It doesn't matter whether they happen or not. It's very different from edge cases, very different from risk management. People mix those things up because they feel that they're similar. We're talking about things that are outside of our structural understanding of what's actually happening, and that's what we have to do, that's what we have to test.

One of the things we've lost in traditional architecture is that there's a very big separation between the fractures in architecture, the way that we break architecture down, and the kind of things that can expose our structure to problems. There's a very big difference between organizational and business risk. Those are two completely different things. Also, it's a completely different thing than technical risk, such as: “How can this system break?”

So there are three things: technical risk, business risk, and architectural change. Because they all look similar to each other, we bake them into one process in most projects, and we lose a lot because then we're not doing architecture anymore, we're just doing risk management or edge case management. But this is more about doing architecture than any of those other things.

Avanscoperta: Good hook for the next question. At one point you said that this is something that developers need to do, maybe not all developers are ready for that, we need different levels of expertise. So my next question would be: Who are we aiming for with this workshop and with your theory as a whole?

Barry: To understand even what I'm talking about, you have to have been around for a few years at least. You have to have designed software solutions, and you have to have experienced that the tools that you're using, the way that you map things are actually incredibly limited compared to the complexity of the environment that you're working in.

For most software engineers, if you're working within enterprise environments, it only takes a few years to figure out, or in some cases a few months, that the way we've been taught to think in terms of rationality, logic and math at university doesn't hold up in the real world.
We can't predict or control the way that things are going to move in a business environment.

So this approach is for anyone who's had a couple of years of experience in software engineering and is starting to make design decisions, starting to realize that there's more going on than just code, algorithms and patterns, so it is for anyone who's been exposed at that level.

What I find in reality is that once you teach a few people this kind of approach, it spreads right through even the junior developers who are just starting because they know what's coming further down the pipeline and as they're making decisions, as they're putting perhaps smaller code structures together, they'll start to stress the environment.
It becomes a way of behaving as we're designing software systems.

But in the workshop itself, I've had senior enterprise architects and senior developers, there's quite a few senior developers who take the course and get a lot out of it. I've had people from operations, I've had project managers, I've had CTOs who come on this kind of course. Sometimes I get senior managers who are very interested in the ideas and so it seems to have a very broad appeal.

It's for anyone who is interested in finding new ways of answering the question: “What should the structure of this software application be?” And that's not just architects, that's any software engineer. And it’s for anyone who's feeling the pain of a lack of organization and how you make structured decisions about software.

But at the same time, I know there are a lot of people who are very skeptical about traditional approaches to architecture and enterprise architecture. For those people, this is a brilliant way out of all that formality, control and centralized thinking that traditional architecture had.

This gives a much better way to make solid architectural decisions that you can back up without having to go into that world, which has largely been discredited in the last 10 years.

Avanscoperta: A question from Marco Perone: “How do you prove things in software architecture? Does it happen like you do in math or in some other way?”

Barry: It's not possible with a software architecture to produce mathematical proof in the same way that you can produce a mathematical proof for a particular algorithm or for a mathematical theorem.

Within software architecture, the only thing we can do is to prove that the architecture will survive things that it hasn't been designed for. We can prove that it has some degree of criticality. There's no number that we can put on that, and this is one of the huge problems with software engineering research.

You can't compare a project that's being executed in Milan by a team of five people with a project that's being executed in Berlin with a team of 50 people from a completely different industry.

And so actually proving things in software engineering is very, very difficult and it's easy to fudge things and say that a certain approach works, and so what we get is case studies and then that case study gets presented as proof.
See what happened with Spotify - the world wanted to implement that particular thing all over the place, then it doesn't work anywhere else and everyone gets confused.

The way I do things is to use our own initial structure as the control. So you design an architecture, that's your naive architecture, and then you start to stress it. And at the end of the day, you use a bunch of techniques, such as stress and matrices to compress these architectures, filter them and put them back together, which is kind of a denoising process, or tools like failure mode analysis, to push this into a single coherent architecture.

At the end of that, we run a fairly simple mathematical test against the naive architecture, which is our control, and the new architecture, which is what we've done, and say: “Which one of these is most likely to survive something that it hasn't seen before?” What I've shown is that, statistically, and in a statistically significant way, when we go through this process, we will survive things that we haven't designed the system for in a better way.
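The "fairly simple mathematical test" is not spelled out in this conversation, so the following Python sketch is an assumption rather than the actual test: it compares the naive architecture (the control) with the stressed one by counting how many previously unseen stressors each survives. The capability names and the `survival_rate` helper are hypothetical, and a real comparison would need many more stressors before any claim of statistical significance.

```python
def survival_rate(copes_with: set[str], unseen: list[set[str]]) -> float:
    """Fraction of unseen stressors whose demands the architecture already covers."""
    survived = sum(1 for needs in unseen if needs <= copes_with)
    return survived / len(unseen)


# Hypothetical capability sets; in practice these come out of the design work.
naive_architecture = {"single payment provider"}
stressed_architecture = {
    "swappable payment provider",
    "configurable pricing",
    "queue between services",
}

# Stressors deliberately NOT used while designing either architecture.
unseen_stressors = [
    {"swappable payment provider"},
    {"configurable pricing"},
    {"queue between services"},
    {"multi-region deployment"},
]

naive_score = survival_rate(naive_architecture, unseen_stressors)
stressed_score = survival_rate(stressed_architecture, unseen_stressors)
print(naive_score, stressed_score)  # 0.0 0.75
```

Keeping the unseen stressors separate from the ones used during design is the important choice here: an architecture only shows criticality if it survives things it was not built for.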

That's how we prove what we've actually done, and that the software architecture actually holds up.
And then of course, as a normal developer, you have to prove that it works functionally. And we do that, obviously, with testing in whatever way it feels most appropriate to us.
But that's how we prove that software architecture actually does something useful. And that's how we would compare two software architectures because that's the thing that's important.

For any one particular problem, there are thousands of potential architectures that could work. Which one do we want to have? We want to have the one that's going to survive the surprises that we all know are there.

Avanscoperta: New question by Ashley. “What are the 'patterns' or archetypes of stressors? Ways that you can formalize the concept and not take an ad hoc approach to identifying relevant stressors?”

Barry: The answer to this question is that indeed you have to take an ad hoc approach to identifying relevant stressors.

This is one of the key aspects of Residuality Theory. The things that are going to stress your architecture have to be concretely related to the context that we're working in. And those contexts shift and change from project to project.
So one of the first questions I get when I present these ideas is: “Is there a list of stressors that we can just take and use in every single project and then we won't have to think about any of these things?”

And that tendency has always been there in architecture: “How do we boil this down? How do we make this abstract? How do we make this repeatable so that we can do it over and over again?”

I try to stop architects from thinking along those lines.

So every time you go into a context, you can't assume that things are going to be the way they were before. And one of the things that I was talking about on LinkedIn yesterday, and I got a little bit of heat for it, was that we need to stop doing architecture by checklist.

You can't do architecture for a unique context with a checklist that's been produced in the past because things are shifting and changing and moving all the time.

And so one of the things I do know a little bit better, which I think is an answer to Ashley's question, is that there are certain classes of stressors that if you miss them, will make your architecture much, much weaker.

So one type of stress is technical stress. So I have a bunch of components connected to each other. What happens if one of those components blows up? What happens if there's a change in the structure of one of those components? What happens if there's a third-party component and that third party goes away?

Another kind of stress and one that people usually struggle with is business model stress.
So I have a business model. I have money coming in. I have money going out. What happens if the money going out suddenly increases? How does that impact my architecture?
In that business model, I have partners. What happens if those partners raise their prices and stop delivering in the way that they've delivered? What happens if they start to compete with me? What happens if the value in the product that I'm delivering suddenly is seen as not being particularly valuable or that it's actually a cost?
That allows us to generate stress around the business model. But in order to properly talk about stress in our architecture, I have to be able to connect those business-level stressors, political stressors and economic stressors to the technical stressors inside my architecture.
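One way to picture the connection between business-level stressors and technical components is a stressor-by-component matrix. Matrices are mentioned earlier in the conversation but not described, so the layout below is an assumption, and the component names and impact mapping are invented for illustration. The Python sketch builds the matrix and then counts how often two components are hit by the same stressor.

```python
from itertools import combinations

components = ["checkout", "pricing", "partner-gateway"]

# Hypothetical mapping: which components each stressor would force us to change.
impact = {
    "partner raises prices": {"pricing", "partner-gateway"},
    "partner becomes a competitor": {"partner-gateway", "checkout"},
    "outgoing costs jump": {"pricing"},
    "third-party component disappears": {"partner-gateway"},
}

# Stressor-by-component matrix of 0/1 flags, one row per stressor.
matrix = {s: [int(c in hit) for c in components] for s, hit in impact.items()}

# For every pair of components, count the stressors that hit both.
shared = {
    (a, b): sum(1 for hit in impact.values() if a in hit and b in hit)
    for a, b in combinations(components, 2)
}

for stressor, row in matrix.items():
    print(f"{stressor:35} {row}")
print(shared)
```

Components that keep being hit by the same stressors are coupled through the business environment even when nothing in the code connects them, which is the kind of relationship a purely technical diagram would miss.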

Every single time we do this, that's a journey, because you're constantly learning: “What is this environment? How does it work? What's the relationship between this component breakdown and this particular economic situation and this particular market at this particular time with this particular product?”

Part of the skill that an architect has to have is to be able to answer such questions, so one of the ways that I'll introduce this topic is by saying: “You're building an architecture. What happens in your architecture when a competitor drops their price?”
And this is a test of where you are as an architect: “Are you still down in the weeds dealing with the technical stuff or can you relate this change in a market which seems very far away, very esoteric to a lot of technical people? When a competitor drops their price, where does it hit your architecture?”

And so part of the point of this course is to get people to have the level of architectural maturity to be able to answer these questions because they do impact the architecture.
Your architecture will change based on your business's response. And that's a form of stress.

The way of generating stress has to be left open. And as an architect, you have to realize that this is going to be a new thing every single time - it's not something that can be written as an algorithm or follow a certain set of practices. If we tried to do that, it would just end up being another checklist. Do A, then do B, then do C. And then there would be some sort of magic formula for producing architecture.

One of the skills that I teach, rather than a process or a pattern, is thinking that you're an architect, you're going into an entirely new environment with a new market, new customers, and new technologies, because these things are changing all the time. When you put all these things together, what's going to stress them? And being able to answer that question is what architecture is all about, I think.

Avanscoperta: We never said it was easy, right?

Barry: It's a very common question. And the truth is not as black and white as I've painted it.

As you start using these ideas, you'll have a set of stressors in your back pocket that you bring with you all the time. The important thing is to realise that those aren't the right stressors for every single project. And you'll have to modify them and make them grow.

You'll learn that the richest source of stress is in the business model and in the assumptions that we make about our relationships with other actors in the market.

You’ll also learn that technical stress is related to it, and that creating the relationships and understanding how those things move and affect each other is incredibly important.

Avanscoperta: We already mentioned not only who this is for, but who is a software architect, so I’d like to dig a bit more into this. Of course we might have a fairly agreeable understanding of who is a software developer, but how about software architects? So what do they do?

Barry: This is an enormous question. I've been a member of architecture communities on different levels for many years and I've led large architect communities at Microsoft and it's absolutely fascinating how this question never goes away and the reason the question never goes away is because we never actually answer it.

An easier question to answer for me is: “What is software architecture?” And software architecture is the structure, the structure of a software application.
Meaning: “What hardware is it running on? What are the components? What are their boundaries? How do they talk to each other? Where are they? And more importantly, of course, how do they fail and how do they fall apart?”

And so Grady Booch would say that a software architecture is all the significant decisions that we make in a project and I more or less agree with that. The truth is that software architecture pops up at some point in our work. There is a structure here and the question is: “Where did that come from?”
And the idea that one person with the title “Architect” has made all of those decisions is a ridiculous idea, because there are tens of thousands of decisions in any simple software structure.

Decisions are made by developers as they should be, as they build components, as they relate those components to other components, and as they make calls and interactions. Some of those decisions are made by people who have the title architect. And architects are really just advanced programmers.

So they've started to answer questions above and beyond: “How many functions should that have? How many arguments should it have? How should I put a boundary around them?”
And they start to ask questions like: “How does that boundary relate to the business environment that we're actually working in, to the commercial environment, and to the social environment?”

There’s an article I read last week from the 1980s about how architecture really is just advanced programming.

But there are also project managers, and they are making decisions, and they're making architectural decisions, and they're not always entirely aware that that's what they're doing. And there are people within the business structures, among the managers and domain experts, who are making decisions that affect the architecture, and all of those things together produce the software structure.

So that's what architecture is, and all of those people contribute. And the more of those people who understand that these things are deeply uncertain and have to be stressed in order to give us some viability within a changing environment, the better things will be.

The actual role of the software architect then becomes the role of making sure that there's coherence across these decisions, that people are free to make the decisions they need to make, and that those decisions are coalescing towards something coherent in terms of the structure.

And I see it very much as a role where sometimes we (architects) will take the bigger, heavier decisions that have to be made. Sometimes we'll make people aware of those decisions. Sometimes we're just a funnel for someone deeper in the project, a developer who discovers something that isn't good or isn't thought through and pushes that back up into the business layer.

But software architecture is something that's done in a very distributed fashion. It's not something that's done by one particular person. But you'll find that the more senior someone is, the more time they're spending on architectural decision making.

Everyone is somewhere on the ladder. One of the things I'll teach is that if you're a senior architect, you're capable of working at this level, and you're capable of looking at business stress, it's your job to make sure that the developers and the team also know what you're doing, and they also learn how to do it from you.
Because a huge problem we've had in software architecture is how we train the next generation. And that becomes very difficult whenever architects are seen as these old fuddy-duddies who run around with PowerPoint, lines and boxes, completely disconnected from the real world of code, crashes and incidents.

Avanscoperta: Nice to hear you talking so much about decision-making. A few years back we published a book called Decision Making for Software Development Teams by one of our trainers, Francesco Strazzullo.
One of the key points is that software teams and developers make decisions without even knowing about it. This applies not only at an operational level but also at a higher level. Probably the more senior you get, the more you are aware of that.
Generally speaking, there is a lot going on when it comes to decision-making that probably isn't even considered part of the process.

Barry: Decision-making as a collective, as a group of people, is something that is so complex, varied and implicitly volatile that it's not something that we can capture in lines, boxes, processes and frameworks, and say: “This is the way we make decisions about this project”, it just doesn't happen like that.

Traditional old-school architectural approaches have all assumed that it does actually happen like that, which is why they've more or less been excluded from modern discussions about architecture, because it just cannot work like that.

One of the things I do as a consultant is work a lot with crashed projects. So we'll go into projects that have burned out or they're not working, they haven't delivered, and I'll come in.
Whenever someone assumes there's been an architectural failure, I'll come in and they'll say: “What did we do wrong? What's happening here?”
The first thing I'll do is ask them to show me an architecture, a drawing, and someone will sketch something. And I'll say: “Here you've chosen to do this particular task using two components. Why two? Why not one? Why not 15? How did that decision get made?”
And the whole room at that point will look terrified and say: “We don't know. We don't know why it looks like that. We have no clue”.

And there's actual research, which I talk about in the course, that's been done on a wider scale and shows that, in the broader industry, developers and architects don't know why their solutions look the way they look, which means that decision-making is some sort of weird ad hoc woo-woo thing that no one really understands.

One of the results of taking this workshop will be that you'll be able to point at that and say: “It's two components because of this, because of the way that stress interacts with the system. We see the fracture lines here and we've put them here because this has given us a system that empirically shows a better ability to respond to unknown sources of stress”. This gives us a justification for our component boundaries.
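As a rough illustration of what “empirically shows a better ability to respond to unknown sources of stress” could look like in numbers, here is a minimal sketch. It is not taken from the workshop material: the stressor names, the two candidate designs and the `survival_rate` helper are all invented, and the judgement of which stressors a design survives is assumed to come from the team's own analysis. The idea is simply to score each design against stressors that were deliberately not used while designing it.

```python
def survival_rate(survived, held_out):
    """Fraction of held-out stressors that a design was judged to survive."""
    if not held_out:
        raise ValueError("need at least one held-out stressor")
    return len(survived & held_out) / len(held_out)


# Stressors deliberately NOT used during design (hypothetical examples).
held_out = {
    "key supplier goes bankrupt",
    "regional data-residency law",
    "viral marketing spike",
    "currency devaluation",
}

# The team's judgement of which held-out stressors each design survives
# (hypothetical; in practice this comes from discussion and probes).
one_component = {"viral marketing spike"}
two_components = {
    "viral marketing spike",
    "regional data-residency law",
    "currency devaluation",
}

naive = survival_rate(one_component, held_out)
residual = survival_rate(two_components, held_out)
improvement = residual - naive

if __name__ == "__main__":
    print(f"one component:  {naive:.2f}")
    print(f"two components: {residual:.2f}")
    print(f"improvement:    {improvement:.2f}")
```

A positive `improvement` is the kind of number a team could point to when asked “Why two components?”; how the stressors are sampled and judged is where the real work lies.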

Avanscoperta: A question from Sebastián now: “How does residuality translate in praxis in development? Are experiments like code probes that you regression test in a pre-mortem fashion as a mechanism of stress or is this purely done in modelling?”

Barry: From a business perspective, when I pick up on a business stressor, for example, it's a thought experiment, it's done in modelling, and it has to be because I can't go out into the actual market and force a competitor to drop their price. That's a very difficult experiment to set up and run. So it becomes a thought experiment. As we get closer and closer to technical stress, it begins to become something that we can actually run as probes.

So I say: “If a competitor drops their price, we're going to have to drop our price, which leads to an increase in volume, which means we're going to have to scale up on these servers. Can we actually do that? Does the architecture allow us to scale up?” Then as an architect, you may say: “Yes, it does”. Then you have to ask the technical question: “Does that scaling up actually work the way we think it's going to work?” That has to be done with an actual probe. “How do you count? Can we test this? Can we make this actually work?”

All of those steps are part of the work of software architecture. That's what we're actually doing. It's not only: “How should this thing look? How should the components be broken down?” It's also: “Do they allow us the flexibility to move to a different kind of architecture in a different set of business circumstances? Will it actually work?”

A lot of this work is done on whiteboards before we ever get anywhere close to code so that we have a good understanding of the structures that are going to exist before we get there.
In development, it requires a lot more thought than a lot of teams currently put into the structure. It can be accused of doing too much upfront, but that's absolutely nothing to be ashamed of. I think that's something we're going to see change over the next couple of years, that we will be doing more thought work upfront.

In praxis, there are a bunch of tools we use.
The stressor analysis, which is very human-focused, is very much about capturing uncertainty and is very much about having discussions. What we do is actually replace the process of requirements engineering, and a lot of modelling and risk management. We replace those in the architectural phase.
There's a lot of use of matrices to decide: “What are the actual components in this structure? How do we know where the boundaries are? Where are the fault lines in this architecture?”
Then we move into things like FMEA (Failure Mode and Effects Analysis), which makes a lot more sense to developers, and it’s very much about hammering the technical architecture until we see where the fault line is and making sure that there's a relationship between these things.
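To make the matrix step mentioned above a little more concrete, here is a minimal sketch of one way such a matrix could be used. It is an assumption for illustration, not the workshop's actual procedure: the stressors, the component names and the `threshold` parameter are invented. Each row records which components a stressor would force us to change, and pairs of components that keep getting hit together are flagged as candidates for hidden coupling, which is one reading of “fault lines”.

```python
from itertools import combinations

# Hypothetical stressors mapped to the components each one would affect.
# All names are invented for illustration.
incidence = {
    "competitor drops price": {"pricing", "checkout", "autoscaling"},
    "payment provider outage": {"checkout", "payments"},
    "traffic spike": {"checkout", "autoscaling"},
    "new tax regulation": {"pricing", "payments"},
}


def co_change_counts(incidence):
    """Count, for every pair of components, how many stressors hit both."""
    counts = {}
    for affected in incidence.values():
        for a, b in combinations(sorted(affected), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def coupled_pairs(incidence, threshold=2):
    """Pairs hit together by at least `threshold` stressors."""
    counts = co_change_counts(incidence)
    return sorted(pair for pair, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    for pair in coupled_pairs(incidence):
        print(pair)
```

With these invented rows, only `checkout` and `autoscaling` are affected together by more than one stressor, so that pair is where the team would look first when discussing boundaries.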

A lot of the focus is about making explicit the decisions that you've made, and validating and verifying that those decisions are correct. But before that, we're giving developers and architects a philosophical and scientific basis for making those decisions so that we're not learning by trial and error, which is a very, very expensive way to do architecture.

So one way you can solve these problems is to just go out and say that you’ve used two components because it felt good, put them into production and watch how they fall apart and then rebuild it and then rebuild it again and rebuild it again. Which is exactly what we're trying to avoid here.

Avanscoperta: How does this whole thing work in practice? We are basically talking about a complete switch in mentality and way of working.
A two-fold question: How does it work in a team once you implement this new way? And have you seen any resistance, as it always happens with new things?

Barry: I find that this works in a team incredibly well once people start thinking this way and they start approaching things this way.

Let’s say you have four architects in a room. They've all become architects or senior software developers usually through completely different career paths, different projects, different experiences, different industries, and they have very different ways of doing things.

One of the things we lack as an industry is a coherent way to talk about what we do. So it becomes a lot of gut feeling, and when you have a team with more than one experienced person you’ll find people clashing over approaches, where you might hear them say: “No, I want to do it this way”, or: “I want to do it this way, I read this blog, I know this person and they said this”, and: “I read this book” and so on.
What we tend to find is that it's very difficult to communicate with each other about what's the best approach and there's no scientific way to figure that out.

Once you have this kind of approach, it becomes much much easier for a team to express themselves in a coherent way. Instead of me saying: “I think we should use this language, this pattern, this particular component”, and then fighting with everyone about why it's better than theirs, the rest of the team will say to me: “How did you get there? What kind of stress exists in the environment that we need these particular things for?” Or, if my idea is particularly naive, then the whole team will stress it until it breaks. And then people start to move towards a centre where it becomes easier to solve problems.

So in terms of making a team work together in software engineering, it works much, much better.

In terms of resistance, there is always resistance to new ideas. And that's part of the fun.
When I first started talking about these ideas 10 years ago, some people would get very upset because there were certain things you weren't allowed to say 10 years ago, such as “enterprise architecture is a failed project” or “agile doesn't solve all our problems”. It used to be incredibly difficult and controversial to just make these statements.

I've had people in London who were ready to throw chairs at me from the audience because I was saying that you can't do architecture by checklist. That drawing processes in a room in the basement for each other isn't architecture. That architectures can't emerge, or when they do, they're rubbish.

And people tended to get offended because a lot of what I'm doing is pointing out the flaws in some of these older methods, these older ideas.
For a lot of people, there's a lot of comfort in those older ideas, especially if they've seen them work in a kind of anecdotal fashion in projects that they've worked in.

There are some people who push back and say: “You can't do this because you can't meet every stressor, you can't solve every single problem because it costs too much.”
When people say that, they've misunderstood the approach, because we're not going to solve every single problem.

In fact, the stressors that we use to generate an architecture, we throw them away at the end of the process. They're not even relevant. And they can be entirely fantastical. They don't have to be real. They just have to let us understand where the fault lines in our architecture are.
All they have to do is produce an architecture that moves more easily. It doesn't necessarily have to solve every single problem.

Some people will say: “This is just edge cases, we've already done this”. Mathias Verraes from Domain-Driven Design Europe wrote an excellent blog article where he talks about the “It’s Just Like” fallacy - where people try to understand a new idea in terms of old ideas.
So they say: “This is just risk management”, but it's not risk management because we're not actually managing the risks. We're looking for fault lines in our architecture. We'll manage risk at a later point in the project.

It's not just edge cases because the very word “edge case” means that we're working on the edges of the structures that we think exist. And therefore, we're putting faith into the structure, whereas this is about destroying the structure from within. It's much more radical and fundamental than edge case analysis.

There has been a lot of resistance, but one of the things I've noticed in the last year is that it used to take me four days to teach an architect how to use these ideas. And those are four tough days because there are a lot of new ideas, a lot of thinking. And then it started to work after three. Recently I was in Germany and I did this in one day: I got people up, working and proving that they'd improved their architectures, and coming with numbers at the end of the day.

The resistance is getting less and less, partially because people are getting used to the ideas and the things I say aren't as controversial as they were 10 years ago, partially because they're meeting people who have tried it already and they say that it works. And partially because hopefully I'm getting better at explaining it.

An idea should have resistance. I've seen people in our industry thinking you can get up on a stage, talk about something and get 100% of the people in the room to applaud and agree with you. That would be a sure sign that you’ve said nothing concrete, nothing sensible and nothing useful. There has to be some tension and you should feel some tension when you're introduced to a new idea.

And in the long term, within the next few months, I'm going to publish my PhD thesis which shows that these ideas work in a statistically significant way. So if you want to resist the approach, that's fine. But if you want to argue about it with me, you'll have to produce some peer-reviewed evidence that it doesn't actually work.

Resistance is natural and normal. And if people are honest about why they're resisting something, such as being afraid or scared about something, then that's fine. But a lot of people tend to listen to me talk about this for an hour and then come up with some knee-jerk reaction, which is normally one of five or six different arguments.

What I talk about is a big thing. In terms of the space it needs to fit in your head, this is as big as object-orientation, and we're looking at turning this into undergraduate courses for universities right now. And this is going to be two university modules.

There's a lot going on here. There's a lot that you have to understand. And there will be resistance simply because of that - there's a lot of work that has to be done to really get these ideas to click.
I gave a talk in Stockholm last year and a guy came up to me at the end of the talk and I thought: “I recognize this person”. And he said to me: “This is the fourth time I've seen your talk and now I get it”.

And it seems to be a general rule of thumb that you have to see the talk four times before it clicks, because there's so much going on, because this goes right down to philosophy, how do I look at the world, how do I think about the world and the organizations around me… it takes a lot to get these ideas.

Check out Small Talk on YouTube or on Spotify.

Credits: photo by Belinda Fewings on Unsplash.


Learn with Barry O'Reilly

Barry is the trainer of the Advanced Software Architecture Workshop (Berlin, 4-5-6 December 2024).
