By: Anurag Sharma
As a big fan of reliability engineering, I couldn’t miss the biggest, thickest, and most exciting book in the Google SRE series: ‘Building Secure and Reliable Systems’, launched in April 2020.
The second word of the title is ‘Secure’, which suggests that security is now being added to the reliability equation to make it complete. If you have read the earlier SRE books, you may have noticed that security aspects were largely missing.
One of the best things about the book is its format: it is to the point and clearly articulates where principles, practices, and theory matter. This is also why it is relatively long. Keep in mind that it tells the story the Google way, and it is very hard to explain everything they do at Google. Before you read further, note that this review is based purely on my own learning experience.
The following are my key learnings from the book:
- Security is not just about fulfilling compliance. Don’t compromise security and reliability for the sake of velocity. The first few chapters set up a strong foundation for addressing challenges related to security and reliability. The book clearly states that if a system can’t respect a user’s privacy, it can’t be reliable. Remember that you need to earn and keep the customer’s trust: communicate with customers even when everything is all right!
- The initial chapters are fantastic and explain attackers’ behaviour and motives in simple terms. This is to be expected, considering how much fun Google must be having when it comes to security. They also suggest that security attacks can generally be traced back to a motive.
- Part II is where things get more interesting than in the previous part. Here the book focuses on principles and practices for implementing security and reliability requirements in the most cost-effective way. Security by design is given great importance. You have an excellent chance of building security into every IT management process if you formalize infrastructure design and automated controls from the design stage.
- The Safe Proxies case study is really nice. At last someone is talking about proxies. Proxies can help you make systems more reliable and secure. This approach can also be a more cost-effective option for an existing system landscape.
- The concept of least privilege is explained nicely. Google has definitely put significant effort into implementing this model, but I was expecting some diagrams here for more clarity. The book simply advocates this phenomenal paradigm to protect systems and data from malicious or accidental damage.
- I love the proliferation of ‘-ilities’: forget about reliability and availability; here they are talking about ‘understandability’, ‘extensibility’ and ‘readability’. I like understandability most. Understandability is simply the ability to present information in a way that the reader can easily comprehend. By understandability, Google means designing a system to be understandable and keeping it understandable over time. The book states that understandability has concrete benefits:
  - Decreases the likelihood of security vulnerabilities or resilience failures
  - Facilitates effective incident response
  - Increases confidence in assertions about a system’s security posture
Some other ‘-ilities’, taken directly from the book’s description of Tink, Google’s open source cryptography library, which anyone can enjoy:
Usability: The library has a simple and easy-to-use API, so a software engineer can focus on the desired functionality – for example, implementing block and streaming Authenticated Encryption with Associated Data (AEAD) primitives.
Readability and auditability: Functionality is clearly readable in code, and Tink maintains control over employed cryptographic schemes.
Extensibility: It’s easy to add new functionality, schemes, and formats – for example, via the registry for key managers.
Agility: Tink has built-in key rotation and supports deprecation of obsolete/broken schemes.
Interoperability: Tink is available in many languages and on many platforms.
In short, from idea to market to value, you should watch the ‘-ilities’ at every step. This can be a game changer when designing a reliable and secure system.
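To make the ‘agility’ point concrete for myself, I wrote a minimal sketch of key rotation using only the Python standard library. This is not Tink’s API and not code from the book: the `Keyset` class, its method names, and the 4-byte key id prefix are all my own assumptions, chosen only to show the idea that new tags are produced with a primary key while older keys remain usable for verification until they are retired.

```python
import hashlib
import hmac
import secrets


class Keyset:
    """Toy keyset (hypothetical, not Tink): the primary key creates tags,
    and every key still in the set can verify them."""

    def __init__(self):
        self._keys = {}        # key_id -> secret key bytes
        self._primary = None   # key_id used for new tags
        self._next_id = 1
        self.rotate()

    def rotate(self) -> int:
        """Add a fresh key and make it primary; older keys stay for verification."""
        key_id = self._next_id
        self._next_id += 1
        self._keys[key_id] = secrets.token_bytes(32)
        self._primary = key_id
        return key_id

    def disable(self, key_id: int) -> None:
        """Retire an old key; tags made with it stop verifying."""
        if key_id == self._primary:
            raise ValueError("cannot disable the primary key")
        del self._keys[key_id]

    def sign(self, message: bytes) -> bytes:
        """Return an HMAC tag prefixed with the 4-byte id of the signing key."""
        mac = hmac.new(self._keys[self._primary], message, hashlib.sha256).digest()
        return self._primary.to_bytes(4, "big") + mac

    def verify(self, message: bytes, tag: bytes) -> bool:
        """Look up the key named in the tag prefix and check the HMAC."""
        key = self._keys.get(int.from_bytes(tag[:4], "big"))
        if key is None:
            return False
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag[4:])
```

With this sketch, a tag created before `rotate()` still verifies afterwards, and only stops verifying once the old key is passed to `disable()`. That is the property I understand ‘built-in key rotation’ to be about; a real library handles far more than this.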
- The section on designing for a changing landscape clearly articulates an approach to understanding the types of change. This is a massive help in managing users’ expectations. It also advocates design strategies such as frequent rollouts, containerization, and microservices, which keep workloads manageable for your team and the system more reliable.
- Every single chapter of the book tells a real-life story. Chapter 17 is a wonder, providing a good taste of incident management. It’s truly a demonstration of the art of the possible.
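Coming back to the safe proxy and least privilege points above, here is a small sketch of how I picture the two ideas working together. It is my own illustration, not Google’s implementation: the `SafeProxy` class, the role names, and the dictionary standing in for a backend are all hypothetical. The idea is that operators never touch the backend directly; every request goes through the proxy, which checks a per-role policy and records an audit entry whether or not the request is allowed.

```python
from dataclasses import dataclass, field


@dataclass
class SafeProxy:
    """Hypothetical proxy: the only path from operators to the backend."""
    backend: dict                      # stand-in for the real system's state
    policy: dict                       # role -> set of allowed actions
    audit_log: list = field(default_factory=list)

    def handle(self, user: str, role: str, action: str, key: str, value=None):
        # Least privilege: a role can only perform actions it was explicitly granted.
        allowed = action in self.policy.get(role, set())
        # Every attempt is recorded, including the denied ones.
        self.audit_log.append((user, role, action, key, allowed))
        if not allowed:
            raise PermissionError(f"{role} may not {action}")
        if action == "read":
            return self.backend.get(key)
        if action == "write":
            self.backend[key] = value
            return value
        raise ValueError(f"unknown action: {action}")


proxy = SafeProxy(
    backend={"replicas": 3},
    policy={"viewer": {"read"}, "operator": {"read", "write"}},
)
print(proxy.handle("alice", "viewer", "read", "replicas"))
```

A viewer can read `replicas` but gets a `PermissionError` on a write, and both attempts end up in `audit_log`. Because the policy lives in one place, it can be added in front of an existing system without rewriting that system.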
After every page, I struggled with how to put these lessons into practice, but the last part of the book actually answered that.
It says precisely that culture is the most powerful and unique component: the engineering practices highlighted in this book can help organizations build secure and reliable systems, but your efforts will be effective only if your entire organization is invested in a culture of security and reliability. The Chrome security case study in Chapter 19 is wonderful; it talks about promoting a security-centric culture in the team.
I enjoyed this book, but I didn’t think it was as good as the previous SRE book. I wish it had more diagrams, but considering the complexity and depth of detail, it’s an excellent piece of software literature and an awesome set of valuable lessons and explanations. The book does help draw attention to the security and reliability aspects.
I personally recommend reading this book when you have plenty of free time, and coding along with it. You will definitely enjoy it!
Finally, I am very thankful to the entire author team for sharing these valuable lessons and making them available for free to everyone in the industry.