Maybe it’s due to all the Sherlock Holmes I read as a kid but I enjoy the process of debugging and exploring technical outages and failures. The more cryptic and challenging the problem the more fun it is to sink my teeth into and the more rewarding it is when I finally get to the root cause. The real benefit of going through failures is that it is one of the best ways to get familiar with a tool or a technology. When things work we often only get a superficial understanding of the technology since we have no reason to go beyond the surface. When things fail, on the other hand, we’re forced to dig deep until we understand the cause of the failure and can make the necessary modifications to prevent the same issues in the future.
Digging through a problem makes us understand how sensitive many of our systems are and what sorts of edge cases will cause them to break. Rather than thinking about failures we often get tunnel vision and build something to work in a perfect world where everything is going to come in as expected. Outages break us out of this optimism and get us to actually think about and build for the real world.
Anna Karenina starts with the sentence “All happy families are alike; each unhappy family is unhappy in its own way” and the equivalent version for software engineering is “All functional code is alike, each dysfunctional code is dysfunctional in its own way.” Rather than getting frustrated at broken systems we should use them as an opportunity to learn since we’re not going to get that from a perfect system.