TechLead Bits – Telegram
About software development with common sense.
Thoughts, tips and useful resources on technical leadership, architecture and engineering practices.

Author: @nelia_loginova
External Consistency

Recently, I read some Google Research whitepapers and came across several concepts that are not widely used but are very interesting from a system design point of view. One such concept is external consistency.

We’re all more or less familiar with common consistency levels like sequential, strict, linearizable, causal and eventual. But external consistency is a little different:
To be externally consistent, a transaction must see the effects of all the transactions that complete before it and none of the effects of transactions that complete after it, in the global serial order.

This means that if transaction A commits before transaction B (as observed externally by clients), then timestamp(A) < timestamp(B). So all transactions can be represented as a sequential changelog.

In other words, external consistency guarantees that all clients see changes in the same global order, no matter where they are (same datacenter, different datacenters, different regions).

This consistency level relies on timestamp uniqueness across system components. To avoid timestamp duplication, Google implemented a special clock synchronization service called TrueTime, which makes it possible to generate monotonically increasing timestamps across all servers.
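The ordering guarantee can be sketched with a toy commit-wait protocol (Python; `tt_now`, the 7 ms uncertainty and all other names are my illustrative assumptions, not the real TrueTime API, which exposes an explicit uncertainty interval): a transaction takes its timestamp at the latest bound of the clock uncertainty, then waits until that timestamp is guaranteed to be in the past before the commit becomes visible.

```python
import time

EPSILON = 0.007  # assumed clock uncertainty, ~7 ms

def tt_now():
    """Toy TrueTime: an interval [earliest, latest] containing real time."""
    t = time.monotonic()
    return (t - EPSILON, t + EPSILON)

def commit(apply_txn):
    # Take the commit timestamp at the latest possible current time.
    _, commit_ts = tt_now()
    apply_txn(commit_ts)
    # Commit wait: block until commit_ts is certainly in the past, so any
    # transaction that starts after this commit returns will pick a
    # strictly larger timestamp, and external order matches timestamp order.
    while tt_now()[0] <= commit_ts:
        time.sleep(EPSILON / 10)
    return commit_ts

ts_a = commit(lambda ts: None)  # transaction A commits first
ts_b = commit(lambda ts: None)  # transaction B starts after A completed
assert ts_a < ts_b
```

The cost of the guarantee is the commit wait itself, which is why keeping clock uncertainty small matters so much in such systems.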

For more technical details, you can check Why you should pick strong consistency, whenever possible.

External consistency is actively used in Google Cloud Spanner and Google Zanzibar, and something similar probably exists in AWS Aurora. What I like about this model is that it shifts the complexity to the storage layer, so app developers can rely on consistency guarantees in their business logic.

#systemdesign #patterns
Latency Insurance: Request Hedging

One more interesting concept I came across recently is request hedging. I haven't seen it actively used in enterprise software, but it can be useful in scenarios where tail latency is critical.

Imagine that service A calls service B, and service B has multiple instances. These instances can have different response times: some are fast, some are slow. There are a number of potential reasons for such behavior, but we'll skip them for simplicity.

Request hedging is a technique where the client sends the same request to multiple instances, uses the first successful response, and cancels the others.

Obviously, if you do this for all requests, the system load will increase and the overall performance will degrade.

That's why hedging is usually applied only to a subset of requests.

The following strategies are used to select requests for hedging:
✏️ Token Bucket. Use a token bucket that refills every N operations and send a hedged sub-request only if a token is available (rate limiting).
✏️ Slow Responses Only. Send a hedged request only after the first request has been running longer than a fixed latency threshold.
✏️ Percentile Threshold. Use an observed latency percentile as the threshold. For example, with a 99th percentile threshold, only about 1% of requests will be duplicated.
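The latency-threshold strategy can be sketched with asyncio (a minimal illustration; `fake_request`, the instance names and the 95 ms threshold are all assumptions): fire the primary request, and only if it exceeds the threshold, send a hedge to a second instance and take whichever answers first.

```python
import asyncio

HEDGE_AFTER = 0.095  # assumed latency threshold (e.g. observed p99), 95 ms

async def hedged_call(request_fn, instances):
    """Send to the first instance; hedge to the second one only if the
    first is slower than the threshold, then use whichever answers first."""
    primary = asyncio.ensure_future(request_fn(instances[0]))
    try:
        # Fast path: the primary answers within the threshold, no hedge sent.
        return await asyncio.wait_for(asyncio.shield(primary), HEDGE_AFTER)
    except asyncio.TimeoutError:
        hedge = asyncio.ensure_future(request_fn(instances[1]))
        done, pending = await asyncio.wait(
            {primary, hedge}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()  # cancel the slower request
        return done.pop().result()

# Demo: instance "a" is slow this time, "b" answers quickly.
async def fake_request(instance):
    await asyncio.sleep(0.3 if instance == "a" else 0.01)
    return f"response from {instance}"

result = asyncio.run(hedged_call(fake_request, ["a", "b"]))
print(result)  # the hedge to "b" wins
```

A production version would combine this with a token bucket, so that a systemic slowdown cannot silently double the load.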

Request hedging is an efficient approach to reducing tail latency. It prevents occasional slow operations from dragging down the overall user interaction. But if the latency variance in a system is already small, request hedging will not provide any improvement.

#systemdesign #patterns
Zanzibar: Google's Global Authorization System

Finally I had a chance to dig into the details of Zanzibar, Google's global authorization system. I already mentioned it in the OpenFGA overview, where the authors said they based their solution on Zanzibar's architecture principles.

Let's check how a system that performs millions of authorization checks per second is organized:

✏️ Every authorization rule takes the form of a tuple: `user U has relation R to object O`. For example, user 15 is an owner of doc:readme. This uniform representation supports efficient reads and incremental updates.

✏️ Zanzibar stores ACLs and their metadata in the Google Spanner database. Zanzibar's logic relies heavily on Spanner's external consistency guarantees: each ACL update gets a timestamp that reflects its order, so if update x happens before update y, then x has an earlier timestamp.

✏️ Each ACL entry is identified by shard ID, object ID, relation, user, and commit timestamp. Multiple tuple versions are stored in different rows, which makes it possible to evaluate checks and reads at any timestamp within the garbage collection window (7 days).

✏️ Each Zanzibar client gets a special consistency token called a zookie, which contains the current global timestamp. Clients use zookies to ensure that an authorization check is based on ACL data at least as fresh as a previous change.

✏️ Zookies are also used in read requests to guarantee that clients get a data snapshot no earlier than a previous write.

✏️ Incoming requests are handled by clusters of aclservers. Each server in a cluster can delegate the computation of intermediate results to other servers.

✏️ To provide performance isolation, Zanzibar measures how much CPU each RPC uses in cpu-seconds. Each client has a global CPU usage limit, and if a client exceeds it, its requests may be throttled. Each aclserver also limits the total number of active RPCs to manage memory usage.

✏️ Request hedging with a 99th percentile threshold is used to reduce tail latency.
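To make the tuple model concrete, here is a toy version (Python; the real Zanzibar evaluates namespace-configured userset rewrite rules, handles timestamps and much more, so every name here is purely illustrative):

```python
# Toy Zanzibar-style relation tuples: (object, relation, user).
# A user can also be a userset like ("group:eng", "member") to model
# indirection ("everyone who is a member of group:eng").
TUPLES = {
    ("doc:readme", "owner", "user:15"),
    ("doc:readme", "viewer", ("group:eng", "member")),
    ("group:eng", "member", "user:42"),
}

def check(obj, relation, user, tuples=TUPLES):
    """Is `user` related to `obj` via `relation`?"""
    for o, r, u in tuples:
        if (o, r) != (obj, relation):
            continue
        if u == user:
            return True  # direct tuple match
        if isinstance(u, tuple) and check(u[0], u[1], user, tuples):
            return True  # follow the userset indirection
    return False

assert check("doc:readme", "owner", "user:15")
assert check("doc:readme", "viewer", "user:42")   # via group:eng#member
assert not check("doc:readme", "owner", "user:42")
```

The indirection through ("group:eng", "member") is what lets a single group update change access to many objects at once.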

According to the whitepaper, authorization checks are performed for each object independently. This means that a single search request in a service like Drive or YouTube can trigger tens to hundreds of authorization checks. That's why the overall architecture is heavily focused on keeping authorization request latency as low as possible.

The implementation results are impressive: Zanzibar handles over 2 trillion relation tuples that occupy more than 100 terabytes of storage. The load is spread across more than 10,000 servers in dozens of clusters worldwide. Despite that scale, it keeps the 95th percentile latency at ~9 ms for in-zone requests and ~60 ms for all other requests.

#systemdesign #usecase #architecture
Documentation as Code

I have a really strong opinion that the documentation is part of the application. It should be developed, updated and reviewed using the same processes and tools as the application code.

If the documentation is stored somewhere else, like in a separate wiki, it's basically dead within 5 minutes after it's published.

This means documentation should live in the Git repo. If some system behavior changes during bugfixing or new feature development, the relevant documentation should be updated in the same PR. This approach helps keep documentation up to date.

It's really simple if you use a monorepo: all docs and code live in one place, so it's easy to find what you need. Things get more complicated if you have lots of microrepos. Even if docs are up to date, it's quite hard for users to find them. Usually this is solved by publishing docs to a central portal as part of the CI process, or nowadays with an AI bot to help.

Recently, Pinterest published an article about how they adopted the documentation-as-code approach. Since they use microrepos, the main challenge was to make documentation discoverable for their users across hundreds of repos.

What they did:
🔸 Moved their docs to Git repos using Markdown.
🔸 Used MkDocs in CI to generate HTML versions of the docs.
🔸 Created a central place to host and index docs called PDocs (Pinterest Docs).
🔸 Integrated docs with GenAI — an AI bot connected to the main company communication channels.
🔸 Built a one-click tool to migrate old wiki pages to git.

I don’t know of any standard solution for doc aggregation across multiple repos, so it would be great if Pinterest open-sourced PDocs in the future. I think it could really help a lot of teams improve their documentation processes.

#engineering #documentation
Building Reusable Libraries

For many years I've worked in product teams where we don't just deliver product features but also build shared services and libraries for other development teams. So the question of how to create a good reusable library is really important to me.

That’s why the recent Thoughtworks publication, “Decoupled by design: Building reusable enterprise libraries and services”, caught my attention.

The authors define the following success factors for building shared libraries:
🔸 Build with at least one customer in mind. Be sure that at least one team is ready to use your library. Use them as early adopters to collect real feedback.
🔸 Set a product vision. Your library or service should solve a specific problem. Know exactly what it is and stay focused on it.
🔸 Make it easy to use. Adoption depends on simplicity. Good documentation, self-service access, clear migration paths — all these help teams use your library.
🔸 Design for extensibility. Follow the open-closed principle — open for extension but closed for modification. This ensures teams can extend the library to meet their specific needs.
🔸 Encourage contributions. Use an open-source model: allow internal teams to contribute to common libraries, services and platforms.
🔸 Continuous improvement. Maintain multiple versions, set a versioning strategy (like semver), define clear deprecation policies, and stay aligned with industry tech standards.
🔸 CI/CD. Use functional and cross-functional tests to ensure the stability and quality of the delivered artifacts.
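The "design for extensibility" point is worth a tiny illustration (a hedged Python sketch with invented names): the library exposes a registration hook, so consuming teams extend it without modifying its source.

```python
import json

class ReportLibrary:
    """Library with a stable extension point for output formats."""
    def __init__(self):
        # Built-in default; the class itself stays closed for modification.
        self._exporters = {"json": json.dumps}

    def register_exporter(self, name, fn):
        """Open for extension: consuming teams plug in their own formats."""
        self._exporters[name] = fn

    def export(self, data, fmt="json"):
        return self._exporters[fmt](data)

# A consuming team extends the library without touching its source:
lib = ReportLibrary()
lib.register_exporter("csv", lambda data: ",".join(map(str, data.values())))
print(lib.export({"a": 1, "b": 2}, fmt="csv"))  # prints 1,2
```

The same hook-based shape works for services too, e.g. pluggable validation or transport layers.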


While the article doesn’t reveal anything really new, it contains good principles to keep in mind during implementation. When done right, shared libraries and services can significantly reduce development costs and time to market for new features and capabilities.

#engineering #architecture
Documentation as Code: Tips & Tricks

Last week I shared the Documentation as Code approach that I actively use in my teams. But to be honest, when you first introduce it, you'll face some resistance.

The usual objections sound like:
"The wiki is more convenient", "It has better formatting", "It has native integration with diagrams" (draw.io, Excalidraw or whatever you use).

I've heard them all, and I want to share some tips on how to address them.
Disclaimer: I work mostly with GitLab and GitHub, but I'm sure this covers the majority of use cases.

✏️ Diagrams Integration. I use draw.io diagrams, as they can be easily embedded as pictures in markdown docs:
- PNG Option. You can create diagrams in the draw.io app and export them as PNG with the “Include a copy of my diagram” option. That way, the image is stored in your Git repo, easily embedded in markdown, and still fully editable later — just reopen it in draw.io.
- SVG Option. Another option is to use the SVG format plus the draw.io plugin for your IDE. You can add SVG files directly to your markdown documents and still edit them with draw.io later. I use it with IntelliJ IDEA, and there is an extension for VS Code.

✏️ Complex Formatting. Honestly, in 99% of cases you don't need it, and it's solved by restructuring the document. But when you do need something special, you can use HTML inside markdown.

✏️ Convenience. It's quite a subjective point, but I don't know any developer who can't easily work with markdown in an IDE. By the way, I even write my personal notes in markdown, it's just a habit 😉

✏️ Build docs into your dev process. Documentation must be updated together with the code that brings the changes. To control that, I include a Definition of Done checklist in the MR/PR template, where updating docs is a standard item just like writing tests.

✏️ Linting. When docs live in the repo, you can apply automated quality checks to them just like to any other code. For example, use prettier to keep docs consistently formatted.

✏️ GenAI Integration. Markdown is really good for LLM integration, and it's a great starting point for plugging in a GenAI bot.

✏️ Keep the Wiki. If you still want to keep your wiki, you can autogenerate it from markdown sources. There are a bunch of tools for that: MkDocs, GitHub/GitLab Pages, the Confluence git plugin, etc.

That's usually enough for teams to get started. After that, you can tune the process to fit your needs by adding more tools, linters or integrations.
Start simple, make documentation part of your dev routine, and your documentation stays alive.


#engineering #documentation
Obsidian: Task Management. Part 1.

That's the 3rd part of my Obsidian setup series (see Note-Taking with Obsidian and Obsidian - Knowledge Base Approach). Today I'll show how I tune Obsidian for task management and how I actually use it.

Let's start with some additional configuration:

Tasks Plugin
To make the plugin suit my workflow, I added the following:
🔸 "In Progress" Status. By default, tasks are Todo or Done. In most cases it's not enough. Some tasks require time, so "In Progress" indicates tasks that I've already started but not finished.
🔸 Custom Statuses. I also created a few extra statuses like Critical, Idea, Escalation, Talk. They actually work more like task types to distinguish different activities or workflows (and it's super useful to query data in different statuses).
🔸 Status Transitions. The plugin allows setting transitions between statuses: TODO -> In Progress -> Done, Talk -> Done.
🔸 Status Visualization. Additionally, I assign icons to the statuses to make tasks easier to identify in a list. I use a CSS snippet from the ITS Theme.

Colored Tags Wrangler
Tags are good, but color-coded tags are great. I assigned colors to the most frequently used tags to group them visually in a file.

Task Board Plugin
The plugin is not actively maintained, but still useful. It shows tasks grouped by timeline: Today, Tomorrow, Future, Overdue, etc. For me it looks very similar to a Jira dashboard 😉

With that, the configuration is complete, and you're ready to track your own tasks!

#softskills #productivity
My task statuses configuration

#softskills #productivity
Obsidian: Task Management. Part 2.

When Obsidian plugins are configured, it's time to organize a process to work with tasks.

My approach is simple:

1. Add tasks to related notes. During work, meetings and investigations, I put tasks directly into the related notes with a short description, priority, tags and due date.
For example, I had a meeting where we discussed CI improvements. As a result, I created a note with the meeting minutes and added the tasks that were on my side: talk with the IT team about improving CI cluster stability, add retries to the test collections, etc.

2. Create a TODO. When I don't want to think about where to put a task, I just write it down in the TODO file. TODO is an unsorted list of tasks that I collect during the day.
For example, a colleague asked for help or for specific information, the PM requested a sprint status, etc.

3. Create a Today view. As tasks are spread across different notes, I need to collect them in a single place, so I created a special page called Today. I don't write any tasks there; I use it as a daily dashboard with the following sections:
🔸 Focus. Key global topics to focus on, static.
🔸 Doing: tasks that are already in progress
```tasks
status.type is in_progress
sort by due
short mode
group by tags
```


🔸 Do Now: my backlog, grouped by context like management, tech, education. I also have an “others” group for everything else (otherwise I sometimes lose tasks without tags 🫣):
```tasks
not done
sort by due
sort by priority
short mode
tags include #management
group by tags
```


➡️ Tip: short mode gives a link to the source note with the task, which is helpful for navigating to the full context.

4. Task Board. I use it as a timeline for the tasks: today, tomorrow, overdue, etc.

I don't pretend my system is ideal; it just works for me, and I periodically tune it when I feel something doesn't really work.

Hope it gives you a good starting point to build your own task management system. Start simple, experiment, and make the system work for you.

#softskills #productivity
Secure by Design at Google

"Secure by design" is a well-known software architecture principle. In recent years, as the number of security incidents across the industry has increased, it has gained more and more attention.

But what does it actually mean?

According to Google’s Well-Architected Framework:
Secure by design: emphasizes proactively incorporating security considerations throughout a system's development lifecycle. This approach involves using secure coding practices, conducting security reviews, and embedding security throughout the design process.


Sometimes it is used as a synonym for secure by default, but the terms are actually different:
Secure by default: focuses on ensuring that a system's default settings are set to a secure mode, minimizing the need for users or administrators to take actions to secure the system.


Google shared a paper about how they implemented the Secure by Design approach. What I really liked is the idea that guidelines and education don't work: they cannot prevent human errors in a large codebase. The only way to make secure software is to build a safe development ecosystem.

Instead of relying on developers to “do the right thing,” Google embeds security directly into the tools, frameworks, and libraries they use, for example:
🔸 application frameworks with built-in authentication and authorization
🔸 libraries with built-in protection against XSS and other types of injection
🔸 usage of memory-safe languages

Safe coding practices provide high confidence that if a program compiles and runs, it's free of the relevant vulnerability classes, because code that isn't secure enough won't even compile.
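The "won't even compile" effect comes from type systems in languages like Java and C++; a rough Python approximation (all names here are mine, not Google's actual libraries) still shows the shape: the rendering sink only accepts a SafeHtml value, and the only way to produce one from untrusted input escapes it.

```python
import html

class SafeHtml:
    """Value type whose factory method is the only path to 'safe' markup."""
    def __init__(self, trusted: str, *, _from_factory: bool = False):
        if not _from_factory:
            raise TypeError("use SafeHtml.escape() to wrap untrusted input")
        self._value = trusted

    @classmethod
    def escape(cls, untrusted: str) -> "SafeHtml":
        return cls(html.escape(untrusted), _from_factory=True)

    def __str__(self):
        return self._value

def render(fragment: SafeHtml) -> str:
    # The sink only accepts SafeHtml, so a raw string can't reach the page.
    if not isinstance(fragment, SafeHtml):
        raise TypeError("render() only accepts SafeHtml")
    return f"<div>{fragment}</div>"

page = render(SafeHtml.escape("<script>alert('xss')</script>"))
print(page)  # the payload arrives escaped, not executable
```

In a statically typed language the `isinstance` check disappears: passing a plain string simply doesn't compile.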

"Secure by Design" applies not only to development but to SRE activities as well.

A good example is Zero Touch Prod.
This principle means nobody makes changes directly to production systems. All changes must be done by trusted automation (GitOps), by approved software with relevant validations, or through an audited break-glass mechanism. This significantly reduces the risk of accidental or unauthorized changes.

Secure by design is not just an architectural principle; it's something that should be built into the core of your software and development ecosystem.

#engineering #security
The History of Microservices

Do you know how microservices were "invented"?

Back in 2011, a group of engineers was at a workshop in Castle Brando. They discussed software engineering problems and felt tired of large monoliths, slow releases, and heavyweight tooling.
Then they came up with an idea:
Maybe the problem is the size. Maybe if we built lots of smaller things instead, that might help.


This is the part of the story shared in The Magic of Small Things talk by James Lewis.

In the video, James shares his memories of how the idea was born, how it spread through the community and became an industry trend, and how this new concept introduced new challenges.
The talk gives you a clear look at the real reasons behind this architectural pattern, its drawbacks and its important characteristics, without all the hype around it.

For me it's a good reminder that every trend starts with an attempt to solve a real problem. Sometimes the initial problem gets lost over time, while the trend produces a set of new problems.
So make decisions based on your actual needs.

Good video to check during the weekend 😎

#architecture
Don't Shoot the Dog. Part 1: Overview.

Today I want to share a book that can help you not just at work, but in your everyday life to build better relationships with your family and friends. It's called Don’t Shoot the Dog: The Art of Teaching and Training by Karen Pryor.

Karen Pryor is a scientist specializing in marine mammal biology and behavioral psychology. She spent many years training dolphins and studying their behavior in oceanariums. Interestingly, her findings apply not only to animals but also to humans, with the same rate of success.

In her book, Karen describes the principles of training desired behavior with reinforcement:
Usually we are using them [principles] inappropriately. We threaten, we argue, we coerce, we deprive. We pounce on others when things go wrong and pass up the chance to praise them when things go right. We are harsh and impatient with our children, with each other, with ourselves even.

I see this quote as the main problem the author tries to address.

A bit of theory from the book:
A reinforcer is anything that, occurring in conjunction with an act, tends to increase the probability that the act will occur again.

There are 2 types of reinforcers: positive and negative. A positive reinforcer is something the subject wants to get; a negative one is something the subject wants to avoid.

The key idea of the book is that ❗️negative reinforcement doesn't work. Punishments don't work.

I think that’s really important, because our first instinct is usually to reach for negative reinforcement.
What do we do when a child doesn’t do their homework? Or ignores us when we ask for something? Or when a puppy chews the furniture? Even at school, teachers highlight our mistakes to show we did something wrong.

The book provides a theory of how behavior is formed, how different types of reinforcers affect it, and how behavior can be trained or untrained to achieve the desired results.

The practices for changing undesired behavior are actually one of the most interesting parts of the book. I'll talk about them in the next post.

#booknook #softskills #communications
Don't Shoot the Dog. Part 2: Change the Behavior

Let's continue with `Don’t Shoot the Dog` by Karen Pryor.

The author says there are only 8 methods to change undesired behavior.

To keep the explanation simple, let's use a real-life situation:
Your roommate has an annoying habit of throwing socks around the room.

#1. Shoot the animal 🔫
Get rid of the source of the problem, so they physically can’t do it anymore.
Example: Change the roommate.

#2. Punishment 🤬
It's the most popular and the least effective method. When punishment doesn't work, people try to add more and more serious punishments. But it leads nowhere.
Example: Yell and scold. Threaten to throw the socks away.

#3. Negative Reinforcement 😞
Remove something unpleasant when the desired behavior happens. The idea is that the person will behave a certain way to avoid discomfort.
Example: Ignore the roommate until socks are picked up.

#4. Extinction 😐
Remove any reinforcement, and the unwanted behavior will die out on its own. The method is best suited to verbal behavior: whining, teasing, and bullying.
Example: Just wait and hope your roommate realizes it’s a bad habit.

#5. Train an Incompatible Behavior 🍬
Train a new behavior that can’t happen at the same time as the bad one.
Example: Pick up and wash the socks together to make it a fun activity, then get a reward.

#6. Put the Behavior on Cue 🔕
Train the person to do the behavior only when given a specific signal. Without the cue, the behavior disappears.
Example: Have a laundry fight. See how big a mess you can both make in the room.

#7. Shape the Absence 🍺
Reward any behavior except the problem one.
Example: Buy your roommate a beer when the room is clean.

#8. Change the Motivation ☺️
Make an accurate estimate of what the motivation is, and reward it.
Example: Find a motivational reward for picking up socks — or just hire a housekeeper.

As you can see, there are 4 negative methods (1-4) and 4 positive ones (5-8). Negative methods don't teach anyone anything and produce unpredictable results.
So if you really want to shape someone's behavior, then your choice is positive reinforcement.

I read this book a while ago and started using these ideas in my real life. What can I say: the most difficult part is changing my own instincts and avoiding the negative methods. Each time I need to stop, think and act in a different way. This takes effort and energy, and sometimes I fail, but I see that positive reinforcement really does give better results.

I strongly recommend adding the book to your reading list 📚.

#booknook #softskills #communications
I haven’t drawn anything for a while, but this week I had an inspiration and prepared a sketchnote for you on the 8 methods for changing behavior by Karen Pryor!

#booknook #sketchnote #softskills #communications
Designing Distributed Systems

I think you will agree that the title Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services sounds very promising. The book was published in 2018 and aspired to be a catalog of modern system design patterns, much like the GoF catalog was for software design 20 years earlier.
Spoiler: it isn't.

Actually, the book describes very basic stuff like sidecar, load balancing, sharding, leader election and a few others. The patterns are presented without deep details, with the focus on creating Kubernetes objects.

For example: this is sharding, it helps distribute data across replicas, consistent hashing can be used to pick the appropriate shard, and here are the k8s Service and StatefulSet to do that.
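For reference, the consistent hashing the book leans on fits in a few lines (a simplified Python sketch; the virtual-node count and shard names are arbitrary): shards are hashed onto a ring, and each key is routed to the first shard clockwise from its hash, so adding or removing a shard only remaps a small fraction of keys.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        # Place several virtual nodes per shard for an even key spread.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wraps around).
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user:42"))  # the same key always lands on the same shard
```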

One more thing I don't like is that the book recommends sidecar containers far too often. Maybe 7 years ago it looked like a new trend in distributed systems development (remember the first sidecar-based Istio implementation), but the trend didn't last.

You should clearly understand when and why sidecars are applicable. Additional containers add extra resource consumption, complexity and maintenance overhead. In most cases, it's cheaper to implement the required features inside the main application.

To summarize, the book suits junior and mid-level developers well for getting a basic understanding of cloud architecture patterns. But for senior developers, tech leads and architects it will definitely be boring 🥱.

#booknook #systemdesign #patterns
About Career Choices

A few days ago I wrote an essay about my career path for one educational program. It made me reflect a bit on career choices I've made.

I became a teamlead very quickly. The term techlead was not popular at that time, but I usually combined both roles. And I spent more than 10 years in this position.
Do you think I got stuck?
Actually, I don't think so. It was my decision to stay at this level.

The reason is that I really enjoyed being a teamlead/techlead: researching new technologies, developing products, applying engineering practices to solve operational problems and, of course, building something significant and valuable with the team, something that is not possible to build on your own.

During these years I grew mostly in breadth, extending my technical expertise and team management skills. And the last 5 years were the most amazing and interesting of my career from that perspective.

What I want to say: you don’t always have to chase a new role or position. If you’re not ready to take on more responsibility right now, that's fine. Sometimes it’s enough just to enjoy your work and have fun with it 😎.

This year I finally moved to another level of technical leadership: head of division. I'm now responsible for management, architecture and the roadmap across six teams with around 50+ people. And I really feel I'm ready for it now. But that is another story 😉.

#softskills #career
Lessons Learnt from Big Failures

Apple, Facebook, Google, Netflix, OpenAI - we all know these examples of success stories. The problem is that each success story is a unique combination of many factors that are very difficult to reproduce.

It's much more productive to study failures, as they tend to share the same root causes and show what definitely will not lead you to success.

Here's a collection of IT project failure case studies that cost companies tens of millions of dollars. The cases span roughly the last 15 years, and if you quickly go through them, you'll realize that most problems look very common:

📍 Corporate Culture. It's not so obvious, but it's actually the root cause of many other problems like unpredicted complexity, underestimation, lack of transparency, etc. Why? When a system is being developed, the technical team usually knows about all those problems; moreover, they know whether the system is ready for production or not. The question is whether they are able to explain that to management and whether management is open enough to listen.

📍 Leadership Failures. There is a wide range of problems: unclear responsibilities, poor ownership, ping-pong between teams, lack of trust, communication failures and other issues.

📍 Risk Management. For any big project you should always have a plan B. That’s why transparency and trust are so important: they're the only way to understand what's really going on, adjust the plan in time and avoid a complete disaster.

Software is a socio-technical system, and most failures aren't about technology, they're about people. The good news is that we as technical leaders can improve that and make our projects more successful.

#leadership #management
GenAI for Go Optimizations

Today, code generation with an AI assistant doesn't impress anyone, but GenAI can be helpful beyond that. Uber recently published an interesting article about using LLMs to optimize Go services.

So what they did:
🔸 Collect CPU and memory profiles from production services.
🔸 Identify the top 30 most expensive functions based on CPU usage. If runtime.mallocgc consumes more than 15% of CPU time, additionally collect a memory profile.
🔸 Apply a static filter to exclude open-source dependencies and internal runtime functions. This reduces noise and focuses the analysis on business code only.
🔸 Prepare a catalog of performance antipatterns; most of them were already collected during past optimization work.
🔸 Pass the source code and the antipattern list to an LLM for analysis.
🔸 Validate the results using a separate pipeline: check whether an antipattern is really present and whether the suggested optimization is correct.
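The selection steps above can be approximated like this (Python; the profile data, prefixes and thresholds are invented for illustration, while Uber's real pipeline works on pprof output):

```python
# Hypothetical flat CPU profile: function name -> CPU seconds.
profile = {
    "github.com/myorg/pricing.Quote": 41.0,
    "runtime.mallocgc": 38.5,
    "encoding/json.Marshal": 12.0,
    "github.com/myorg/orders.Match": 8.5,
}

EXCLUDE_PREFIXES = ("runtime.", "encoding/", "net/")  # noise filter
TOP_N = 30
MALLOCGC_CPU_SHARE = 0.15  # above this, also collect a memory profile

total = sum(profile.values())
ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)[:TOP_N]
candidates = [name for name, _ in ranked
              if not name.startswith(EXCLUDE_PREFIXES)]
need_mem_profile = profile.get("runtime.mallocgc", 0) / total > MALLOCGC_CPU_SHARE

print(candidates)        # business functions to send to the LLM
print(need_mem_profile)  # True: mallocgc is over 15% of CPU time here
```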

The article also contains interesting tips on how they tuned prompting, reduced hallucinations and built developers' trust in the tool.

What I like about Uber’s technical articles is that they always calculate the efficiency of the results:
Over four months, the number of antipatterns was reduced from 265 to 176. Projecting this annually, that’s a reduction of 267 antipatterns. Addressing this volume manually would have consumed approximately 3,800 hours of the Go expert team's time.

we reduced the engineering time required to detect and fix an issue from 14.5 hours to almost 1 hour of tool runtime—a 93.10% time savings.


#engineering #usecase #ai
Platform Engineering: Shift It Down

A great video from Google experts about platform engineering.

One of the most popular DevOps concepts of the last decade was "shift left". It showed really good results: improved overall product quality, shorter delivery times and a lower cost of errors. At the same time, it significantly increased the cognitive load on developers, as it placed the full burden of implementation complexity on engineers.

The speakers suggest a new concept to solve this problem:
Don't just shift left, shift it down.

The idea is to move the implementation of quality attributes (reliability, security, performance, testability, etc.) to platform teams. Anything that is not a product feature but architecture should go to the platform teams.

The technical toolbox for doing that consists of 2 items:
1. Abstractions: well-defined parts and components. They provide understandability, accountability, risk management levels and cost control for your system.
2. Coupling: what makes your system greater than the sum of its parts. It provides modifiability of the system, golden paths and efficiency.

To apply this toolbox in practice you need governance, policies, and education. The speakers call it "culture and shared responsibility".

One more interesting concept from the video that I really like is using different levels of flexibility in following the rules, depending on the consequences of an error:
YOLO -> Adhoc -> Guided -> Engineered -> Assured


For example:
An unauthenticated API can be a critical business risk, so developers must use the proper security framework. Its usage can be checked at build time to ensure it’s not missed. A build-time control provides the "assured" level of flexibility, as developers cannot bypass it.
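Such an "assured" control can be as simple as a CI step that fails the build when the rule is violated. A hedged Python sketch (the decorator names and route syntax are invented for illustration):

```python
import re

# Hypothetical build-time control: every HTTP route must be wrapped in the
# (assumed) security framework's @require_auth decorator, or CI fails.
ROUTE = re.compile(r"@app\.route\(")
AUTH = re.compile(r"@require_auth")

def check_source(source: str) -> list:
    """Return the 1-based line numbers of routes not preceded by @require_auth."""
    lines = source.splitlines()
    violations = []
    for i, line in enumerate(lines):
        if ROUTE.search(line):
            prev = lines[i - 1] if i else ""
            if not AUTH.search(prev):
                violations.append(i + 1)
    return violations

handler = '''
@require_auth
@app.route("/billing")
def billing(): ...

@app.route("/admin")
def admin(): ...
'''

print(check_source(handler))  # [6]: the /admin route is unauthenticated
```

In a real pipeline this check would run as a CI job and exit non-zero on any violation, which is exactly what makes the control "assured" rather than "guided".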

I think these levels give platform teams really good guidance on where to invest to get the biggest impact. So I definitely recommend checking out the full video if you're interested in platform engineering.

#engineering