Shift Security Down
Last week the CNCF Kubernetes Policy Working Group released a "Shift Down" security whitepaper. The main idea is to shift the security focus down to the platform layer.
By embedding security directly into the Kubernetes platform, rather than adding it as an afterthought, we empower developers, operators, and security teams, strengthening the software supply chain, simplifying compliance, and building more resilient and secure cloud-native environments,
said Poonam Lamba, co-chair of the CNCF Kubernetes Policy Working Group and a Product Manager at Google Cloud.
While Shift-Left Security emphasizes developer responsibility for security, Shift-Down Security focuses on integrating security directly into the platform, providing an environment that is secured by default.
Key elements of the Shift-Down Strategy:
✏️ Common security concerns are handled at the platform level rather than by business applications
✏️ Security is codified, automated, and managed as code
✏️ Platform security complements the Shift-Left approach and existing processes
The whitepaper provides a shared responsibility model across developers, operations, and security teams; introduces common patterns for managing vulnerabilities and misconfigurations; promotes automation and simplification; and enforces security best practices at the platform layer.
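To make the "security is codified" point from the list above a bit more concrete, here is a minimal sketch of what a codified, platform-level check could look like. It uses the official Kubernetes Python client; the specific policy (flagging containers that don't enforce runAsNonRoot) is my own illustrative assumption, not something mandated by the whitepaper.

```python
# Minimal sketch of a codified platform-level audit check (illustrative only).
# Assumes a reachable cluster and the official `kubernetes` Python client.
from kubernetes import client, config

def find_containers_without_non_root():
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
    v1 = client.CoreV1Api()
    findings = []
    for pod in v1.list_pod_for_all_namespaces().items:
        pod_sc = pod.spec.security_context
        pod_level = bool(pod_sc and pod_sc.run_as_non_root)
        for container in pod.spec.containers:
            c_sc = container.security_context
            if not (pod_level or bool(c_sc and c_sc.run_as_non_root)):
                findings.append(f"{pod.metadata.namespace}/{pod.metadata.name}:{container.name}")
    return findings

if __name__ == "__main__":
    for item in find_containers_without_non_root():
        print("runAsNonRoot is not enforced for", item)
```

In a real platform such a rule would live in an admission controller or policy engine rather than a script, but the point stands: the check is code, versioned and enforced once for every application.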
#engineering #security #news
Source (GitHub): kubernetes/sig-security, sig-security-docs/papers/shift-down/shift-down-security.md
DR: Main Concepts
Over the last few months I've been working a lot on Disaster Recovery topics, so I decided to summarize the key points and patterns.
Disaster recovery (DR) is the ability to restore access to and functionality of IT services after a disaster, whether natural or caused by human action (or error).
DR is usually designed in terms of Availability Zones and Regions:
- Availability Zone (AZ) – the minimal, atomic unit of geo-redundancy. It can be a whole data center (a physical building) or a smaller unit such as an isolated rack, floor, or hypervisor.
- Region - a set of Availability Zones within a single geographic area.
The most popular setups:
✏️ Public clouds. Each AZ is a separate data center, and the data centers are located within ~100 km of each other. The chance that all of them fail at the same time is very low, so distributing a workload across multiple AZs is usually enough. Multiple regions may still make sense, but mostly for load and content distribution.
✏️ On-premise clouds. Here an AZ is usually a different floor or rack in the same building, so it's better to have at least 2 regions to cover DR cases.
A DR approach is measured by:
✏️ Recovery Time Objective (RTO) is the maximum acceptable delay between the interruption of a service and the restoration of service. It's how long your service may be unavailable.
✏️ Recovery Point Objective (RPO) is the maximum acceptable amount of time since the last data recovery point (e.g. a backup). It's how much data you can lose in case of failure.
Disaster Recovery architecture is driven by the RTO and RPO requirements of a particular application; they are the first thing you should define before implementing any solution. In one of the next posts we'll look at DR implementation strategies.
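As a quick worked example (the numbers are illustrative assumptions, not recommendations), here is the arithmetic behind the two metrics for a simple backup-based setup:

```python
# Illustrative arithmetic only: worst-case RPO/RTO for a simple backup/restore setup.
backup_interval_h = 4            # a backup is taken every 4 hours
detection_decision_h = 1         # time to detect the disaster and decide to fail over
restore_time_h = 2               # time to rebuild the infrastructure and restore data

worst_case_rpo_h = backup_interval_h                       # everything written since the last backup is lost
worst_case_rto_h = detection_decision_h + restore_time_h   # downtime until the service is back

print(f"Worst-case RPO: ~{worst_case_rpo_h}h of lost data")
print(f"Worst-case RTO: ~{worst_case_rto_h}h of downtime")
```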
#architecture #systemdesign
RPO and RTO concepts visualization
Source: https://aws.amazon.com/blogs/mt/establishing-rpo-and-rto-targets-for-cloud-applications/
DR Strategies
When RPO and RTO requirements are defined, it's time to select a DR strategy:
✏️ Backup/Restore. The simplest option, with significant downtime (RTO) of hours or even days:
- the application runs in a single region only
- regular backups (full and incremental) are sent to another region
- only the active region has reserved capacity to run the whole infrastructure
- in case of a disaster the whole infrastructure has to be rebuilt in a new region (in some cases it can be the same region); after that the application is reinstalled and data is restored from backups
✏️ Cold Standby. This option reduces downtime, but it can still take hours to fully restore the infrastructure:
- the application runs in a single region only
- a minimal infrastructure is prepared in another region: a copy of the application or data storage may be installed, but it is scaled down or runs with minimum replicas
- regular backups (full and incremental) are sent to another region
- in case of a disaster the application is restored from backups and scaled up appropriately
✏️ Hot Standby. The most complex and expensive option, with a minimal RTO measured in minutes:
- both regions have the same capacity reserved
- all applications are up and running in both regions
- data is replicated between regions in near real time
- in case of a disaster the surviving region continues to operate.
Which option to select usually depends on the availability and business requirements of the services you provide. In any case, a DR plan should be defined and documented so everyone knows what to do in case of a disaster. It's also good practice to regularly test the restore procedure; otherwise you may end up with a backup you cannot restore from or, even worse, with no actual backups at all.
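As a toy illustration of how RTO/RPO targets drive the choice, here is a small helper that maps requirements to the strategies above; the thresholds are my own assumptions, not industry-standard cutoffs:

```python
# Toy decision helper: maps RTO/RPO targets to a DR strategy (thresholds are illustrative).
def pick_dr_strategy(rto_hours: float, rpo_hours: float) -> str:
    if rto_hours <= 0.5 and rpo_hours <= 0.1:
        return "Hot Standby (active-active, near real-time replication)"
    if rto_hours <= 8:
        return "Cold Standby (minimal pre-provisioned infra, restore and scale up)"
    return "Backup/Restore (rebuild from scratch, restore from backups)"

print(pick_dr_strategy(rto_hours=0.25, rpo_hours=0.05))  # Hot Standby
print(pick_dr_strategy(rto_hours=4,    rpo_hours=1))     # Cold Standby
print(pick_dr_strategy(rto_hours=48,   rpo_hours=24))    # Backup/Restore
```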
#architecture #systemdesign
Thinking Like an Architect
What makes a good architect different from other technical roles? If you've ever wondered about that, I recommend checking out Gregor Hohpe's talk "Thinking Like an Architect".
Gregor says that architects are not the smartest people; they make everyone else smarter.
And to achieve this, they use the following tools:
✏️ Connect Levels. Architects talk to management in business language and to developers in technical language, so they can translate business requirements into technical decisions and technical limitations into business impact.
✏️ Use Metaphors. They use well-known metaphors to explain complex ideas in a simple way.
✏️ See More. Architects see more dimensions of the problem and can do more precise trade-off analysis.
✏️ Sell Options. They estimate and prepare options and sometimes defer decisions to the future.
✏️ Make Better Decisions with Models. Models shape our thinking. If the solution is simple, the model is good; if it's not, there is probably something wrong with the model.
✏️ Become Stronger with Resistance. Not everyone is happy with change; architects can identify the beliefs that make people's arguments rational to them. By understanding this, architects can influence how people think and work.
I really like Gregor's talks: they are practical, make you look at familiar things from a different angle, and contain a good dose of humor. If you have time, I recommend watching the full version.
#architecture
YouTube: Thinking Like an Architect - Gregor Hohpe - NDC London 2025
Airbnb: Large-Scale Test Migration with LLMs
Amid all the hype about replacing developers with LLMs, I really like reading practical examples of how LLMs are used to solve engineering tasks. Last week Airbnb published the article Accelerating Large-Scale Test Migration with LLMs, describing how they automated the migration of ~3.5K React test files from Enzyme to React Testing Library (RTL).
Interesting points there:
✏️ The migration was built as a pipeline of multiple steps, where a file moves to the next stage only after validation of the previous step has passed
✏️ If validation fails, the result is sent back to the LLM with a request to fix it (see the sketch after this list)
✏️ For small and mid-size files the most effective strategy was brute force: retry the steps until they pass or a retry limit is reached
✏️ For large, complex files the context was extended with the source code of the component, related tests from the same directory, general migration guidelines, and common solutions. The authors note that the main success driver was not prompt engineering but choosing the right related files
✏️ The overall result: 97% of the tests were migrated successfully; the rest were fixed manually
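A minimal sketch of the validate-and-retry loop described above; llm_migrate, run_checks, and the retry limit are hypothetical stand-ins, not Airbnb's actual tooling:

```python
# Hedged sketch of one pipeline step: ask the LLM to rewrite a test file and
# re-prompt it with validation errors until the checks pass or attempts run out.
MAX_ATTEMPTS = 5  # hypothetical limit

def migrate_file(path: str, llm_migrate, run_checks) -> bool:
    with open(path, encoding="utf-8") as f:
        candidate = f.read()
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        candidate = llm_migrate(candidate, feedback)  # e.g. Enzyme -> RTL rewrite
        errors = run_checks(path, candidate)          # lint, compile, run the test
        if not errors:
            with open(path, "w", encoding="utf-8") as f:
                f.write(candidate)
            return True                               # promote to the next pipeline stage
        feedback = "\n".join(errors)                  # feed the failures back into the prompt
    return False                                      # leave the file for manual fixing
```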
The story shows huge potential for automating routine tasks. Even with a custom pipeline and some tooling around it, the LLM-driven migration was significantly cheaper than doing it manually.
#engineering #ai #usecase
Medium: Accelerating Large-Scale Test Migration with LLMs (how Airbnb migrated nearly 3.5K Enzyme test files to React Testing Library in just 6 weeks using automation and LLMs)
Balancing Coupling
Today we'll talk about the book Balancing Coupling in Software Design by Vlad Khononov. It's a fairly fresh book (2024) that addresses a common architecture problem: how to balance coupling between components so it stays easy to support new features and technologies without turning the solution into a big ball of mud.
The author defines coupling as a relationship between connected entities. If entities are coupled, they can affect each other. As a result, coupled entities should be changed together.
Main reasons for change:
- Shared Lifecycle: build, test, deployment
- Shared Knowledge: model, implementation details, order of execution, etc.
The author defines 4 levels of coupling:
📍Contract coupling. Modules communicate through an integration-specific contract.
📍 Model coupling. The same model of the business domain is used by multiple modules.
📍Functional coupling. Modules share knowledge of the functionality: the sequence of steps to perform, participation in the same transaction, duplicated logic.
📍Intrusive coupling. Integration through component implementation details that were not intended for integration.
Coupling can be described by the following dimensions:
📍Connascence. The level at which the lifecycle is shared: static (compile-time) or dynamic (runtime) dependencies.
📍Integration Strength. The more knowledge components share, the stronger the integration between them.
📍Distance. The physical distance between components: the same class, the same package, the same lib, etc. The greater the distance is, the more effort is needed to introduce a cascading change.
📍Volatility. How frequently the module is changed.
Then the author suggests a model to calculate coupling and other architecture characteristics using values of these dimensions.
For example,
Changes Cost = Volatility AND Distance
It means that if both distance and volatility are high, the actual cost of changes is high.
Coupling balance equation:
Balance = (Strength XOR Distance) OR NOT Volatility
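A tiny sketch of how these formulas can be evaluated if you treat each dimension as a boolean ("high" = True); the encoding is my own illustration, not the author's exact notation:

```python
# Illustrative only: the book's dimensions reduced to booleans ("high" = True).
def changes_cost(volatility: bool, distance: bool) -> bool:
    return volatility and distance                     # Changes Cost = Volatility AND Distance

def balance(strength: bool, distance: bool, volatility: bool) -> bool:
    return (strength != distance) or not volatility    # (Strength XOR Distance) OR NOT Volatility

# A strongly coupled pair of distant modules is balanced only if it rarely changes:
print(balance(strength=True, distance=True, volatility=False))  # True  -> acceptable
print(balance(strength=True, distance=True, volatility=True))   # False -> refactoring candidate
print(changes_cost(volatility=True, distance=True))             # True  -> changes are expensive
```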
Of course, the scale is relative and quite subjective, but it gives you a framework to assess your architectural decisions, predict their consequences, and adjust solution characteristics to find the right balance.
My overall impression of the book is very positive: it has no fluff, and it's clear, structured, and very practical. Definitely recommended.
#booknook #engineering
Goodreads: Balancing Coupling in Software Design
A graphical representation of some concepts from the book
Source: Balancing Coupling in Software Design
#booknook #engineering
Adaptive Integration
Modern solutions typically consist of a mix of services, functions, queues, and databases. To implement an end-to-end (E2E) scenario, developers need to build a chain of calls to get the result, and if some API changes, the whole E2E flow may break.
Of course, we have proto specs, OpenAPI, and autogenerated clients, but any change still brings significant adoption overhead for all dependent consumers.
Marty Pitt in his talk Adaptive Architectures - Building API Layers that Build Themselves presents an attempt to solve the problem and make changes cheap and fully automated.
I like the problem statement part; it really captures the pain of the existing microservice ecosystem: change an API and the integration is broken, change a message format and the integration is broken, change a function... you get the idea, right? So you need to be really careful with any contract change and work with all your consumers to make the migration smooth.
The author then argues that the root of the problem is the lack of business semantics in our API specs, and that if we add them, the system can automatically generate call chains to perform any requested operation.
The idea can be represented as the following steps:
✏️ Add semantics to the entities: for example, instead of a plain int id, declare the field as accountId id across all services in the organization
✏️ Register service specs on a special integration service during startup
✏️ Any service can call the integration service with a DSL query like "Get balance for the account X with a specified email"
✏️ The integration service automatically generates an execution chain based on the registered specs (a toy sketch of this resolution step follows the list), then orchestrates all the queries and returns the result to the caller
✏️ If a service changes its API, it simply uploads a new spec version, and the integration service rebuilds the call chain accordingly.
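To illustrate the resolution step, here is a toy sketch of my own (not Taxi or Orbital code): services register which semantic type each operation needs and returns, and the integration layer searches for a chain from what the caller has to what it wants.

```python
# Toy illustration of chain resolution over semantic types (not the actual Taxi/Orbital implementation).
from collections import deque

# Each registered operation declares the semantic type it consumes and the one it produces.
REGISTRY = [
    {"service": "accounts", "op": "findAccountByEmail", "needs": "Email",     "returns": "AccountId"},
    {"service": "balances", "op": "getBalance",         "needs": "AccountId", "returns": "Balance"},
]

def plan(have: str, want: str):
    """Breadth-first search for a chain of operations leading from `have` to `want`."""
    queue, seen = deque([(have, [])]), {have}
    while queue:
        current, chain = queue.popleft()
        if current == want:
            return chain
        for op in REGISTRY:
            if op["needs"] == current and op["returns"] not in seen:
                seen.add(op["returns"])
                queue.append((op["returns"], chain + [f'{op["service"]}.{op["op"]}']))
    return None

# "Get balance for the account with a specified email" resolves to a two-step chain:
print(plan("Email", "Balance"))  # ['accounts.findAccountByEmail', 'balances.getBalance']
```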
The author and his team have already implemented the approach in https://github.com/taxilang/taxilang and https://github.com/orbitalapi.
From my point of view, a system that decides at runtime which APIs to call to perform a business transaction looks hard to control and difficult to troubleshoot, so I'm not ready to use the approach in real production. But the idea sounds interesting; let's see whether adoption of such tools grows in the future.
#engineering
YouTube: Adaptive Architectures - Building API Layers that Build Themselves • Marty Pitt • YOW! 2024
Kafka 4.0 Official Release
If you're a fan of Kafka like I am, you might know that Kafka 4.0 was officially released last week. Besides being the first release that operates entirely without Apache ZooKeeper, it also contains some other interesting changes:
✏️ The Next Generation of the Consumer Rebalance Protocol (KIP-848). The team promised significant performance improvements and no more "stop-the-world" rebalances (see the consumer config sketch after this list)
✏️ Early access to the Queues feature (I already described it in an earlier post)
✏️ Improved transactional protocol (KIP-890) that should solve the problem with hanging transactions
✏️ Ability to whitelist OIDC providers via the org.apache.kafka.sasl.oauthbearer.allowed.urls property
✏️ Custom processor wrapping for Kafka Streams (KIP-1112), which should simplify reuse of common code across different stream topologies
✏️ Default values for some parameters were changed (KIP-1030). This is effectively a public contract change with potential upgrade issues, so be careful with it
✏️ A lot of housekeeping was done, so this version removes many deprecated features:
- v0 and v1 message formats were dropped (KIP-724)
- Kafka client versions <= 2.1 are no longer supported (KIP-1124)
- APIs and configs deprecated prior to version 3.7 were removed
- the old MirrorMaker (MM1) was removed
- support for old Java versions was removed: clients now require Java 11+, brokers Java 17+
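As a hedged sketch of opting into the new rebalance protocol: on the Java client this is the group.protocol=consumer setting from KIP-848, and recent librdkafka-based clients expose the same property name; whether your client version already supports it is something to verify, so treat the snippet below purely as an illustration.

```python
# Illustrative consumer config for the KIP-848 protocol (assumes a confluent-kafka /
# librdkafka version that recognizes `group.protocol`; older clients will reject it).
from confluent_kafka import Consumer

conf = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-processor",
    "group.protocol": "consumer",    # new KIP-848 protocol; "classic" is the old behavior
    "auto.offset.reset": "earliest",
}

consumer = Consumer(conf)
consumer.subscribe(["orders"])
try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        print(f"{msg.topic()}[{msg.partition()}]@{msg.offset()}: {msg.value()}")
finally:
    consumer.close()
```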
The full list of changes can be found in the release notes and the official upgrade recommendations.
The new release looks like a significant milestone for the community 💪. As always, before upgrading I recommend waiting for the first patch versions (4.0.x), which will probably contain fixes for the most noticeable bugs and issues.
#engineering #news #kafka
Netflixed - The Epic Battle for America's Eyeballs
Recently I visited a bookshop to pick up a pocket book to read during a long flight. I noticed something with the word Netflix on the cover and decided to buy it. It was Netflixed: The Epic Battle for America's Eyeballs by Gina Keating.
Initially I thought it was a book about technology or leadership, but it turned out to be the story of Netflix's path to success. The book was published in 2013, but it's still relevant, as Netflix remains a leader in online streaming today.
The author tells Netflix's history from an online DVD rental service to online movie streaming. A large part of the book focuses on Netflix's competition with Blockbuster, America's biggest DVD and media retailer at that time. It's really interesting to see how their market and optimization strategies went through different stages of technology evolution.
I won't retell the whole book, but there's one moment that really impressed me. Blockbuster was one step away from beating Netflix and becoming the market leader in online movie services. But at that critical time, disagreements among Blockbuster's top management led to the company's collapse.
Most board members failed to see that the DVD era was ending and Internet technologies were the future. They fired the executive who drove the online program and brought in a new CEO with no experience in the domain. The new CEO decided to focus on expanding physical DVD stores and didn't want to hear about new technologies at all. That led to Blockbuster's complete bankruptcy.
What can we learn from this? Some managers cannot accept that they are wrong, and a bad manager can ruin a whole business. Good leaders must listen to their teams, understand industry trends, and be flexible enough to adapt to change. For me, the book read like a drama, even though I already knew how it ends.
#booknook #leadership #business
Goodreads: Netflixed: The Epic Battle for America's Eyeballs
ReBAC: Can It Make Authorization Simpler?
Security in general, and authorization in particular, is one of the most complex parts of big tech software development. At first glance, it's simple: invent some roles, add a check at the API level, and, to make it configurable, put the mapping somewhere outside the service. Profit!
The real complexity starts at scale, when you need to map hundreds of services with thousands of APIs to hundreds of E2E flows and user roles. Things get even more complicated when you add dynamic access conditions like time of day, geographical region, or contextual rules. And you have to present that security matrix to the business, validate it, and test it. In my practice, that's always a nightmare 🤯.
So from time to time I check what the industry offers to simplify authorization management. This time I watched the talk Fine-Grained Authorization for Modern Applications from NDC London 2025.
Interesting points:
✏️ It introduces ReBAC (relationship-based access control), a model that derives and inherits access rules from the relationships between users and objects (see the toy sketch after this list)
✏️ To use this approach, a dedicated authorization model has to be defined. It's a kind of YAML-like configuration that describes entity types and their relationships
✏️ Once you have a model, you can map real entities to it and set allow/deny rules
✏️ The open-source tool OpenFGA already implements ReBAC. It even has a playground to test and experiment with authorization rules
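Here is a toy sketch of the ReBAC idea itself (my own illustration, not OpenFGA's model language or API): access is derived from relationship tuples, and some relations are inherited through parent objects or implied by stronger ones.

```python
# Toy ReBAC check (illustrative): relationship tuples plus inheritance through parents.
TUPLES = {
    ("anne", "viewer", "folder:reports"),
    ("bob",  "owner",  "doc:q1-results"),
    ("doc:q1-results", "parent", "folder:reports"),
}

# "owner" implies everything a "viewer" can do.
IMPLIED = {"viewer": {"viewer", "owner"}, "owner": {"owner"}}

def check(user: str, relation: str, obj: str) -> bool:
    # direct relationship, including implied ones (owner -> viewer)
    if any((user, rel, obj) in TUPLES for rel in IMPLIED.get(relation, {relation})):
        return True
    # inherited relationship: e.g. a viewer of the parent folder can view the document
    parents = [o for (s, rel, o) in TUPLES if s == obj and rel == "parent"]
    return any(check(user, relation, parent) for parent in parents)

print(check("anne", "viewer", "doc:q1-results"))  # True: inherited from folder:reports
print(check("bob",  "viewer", "doc:q1-results"))  # True: owner implies viewer
print(check("carl", "viewer", "doc:q1-results"))  # False: no relationship at all
```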
The overall idea sounds interesting, but the new concept still doesn't solve the fundamental problem of managing security at scale; it's just yet another way to produce thousands of authorization policies.
The author mentioned that the implementation of OpenFGA is inspired by Zanzibar, Google's authorization system. There is a separate whitepaper that describes the main principles of how it works, so I added it to my reading list and will probably publish some details in the future 😉.
#architecture #security
YouTube: Fine-Grained Authorization for Modern Applications - Ashish Jha - NDC London 2025
Technology Radar
At the beginning of April, Thoughtworks published a new version of the Technology Radar with the latest industry trends.
Interesting points:
✏️ AI. There is significant growth in agentic AI technologies and tools, but all of them still work in a supervised fashion, helping developers automate routine work. No surprises there.
✏️ Architecture Advice Process. The architecture decision process is moving to a decentralized approach where anyone can make an architectural decision after getting advice from people with the relevant expertise. The approach is based on Architecture Decision Records (ADRs) and advisory forum practices. I made a short ADR overview in an earlier post.
✏️ OpenTelemetry Adoption. The most popular tools in the observability stack (e.g. Loki, Alloy, Tempo) have added native OpenTelemetry support.
✏️ Observability & ML Integration. Major monitoring platforms embedded machine learning for anomaly detection, alert correlation and root-cause analysis.
✏️ Data Product Thinking. With wider AI adoption, many teams have started treating data as a product with clear ownership, quality standards, and a focus on customer needs. Data catalogs like DataHub, Collibra, Atlan, or Informatica are becoming more popular.
✏️ GitLab CI/CD was moved to Adopt.
Of course, there are many more items in the report, so if you're interested, I recommend checking it out and finding the trends relevant to your tech stack.
Since this post is about trends, I'll share one more helpful tool - StackShare. It shows the tech stacks used by specific companies and how widely a particular technology is adopted across different companies.
#news #engineering
Thoughtworks: Technology Radar | Guide to technology landscape
Measuring Software Development Productivity
The more senior your position, the more you need to think about how to communicate and evaluate the impact of your team's development efforts. The business doesn't think in features and test coverage; it thinks in terms of business benefits, revenue, cost savings, and customer satisfaction.
There was an interesting post on that topic in the AWS Enterprise Strategy Blog called A CTO's Guide to Measuring Software Development Productivity. The author suggests measuring development productivity across four dimensions:
✏️ Business Benefits. Establish a connection between a particular feature and the business value it brings. Targets must be clear and measurable: for example, "Increase checkout completion from 60% to 75% within three months" instead of "improve sales". When measuring cost savings from automation, track process times and error rates before and after the change to show the difference.
✏️ Speed To Market. This is the time from a requirement to feature delivery in production. One tool that can be used here is value stream mapping: you draw your process as a set of steps and then analyze where ideas spend time, whether in active work or waiting for decisions, handoffs, or approvals (a toy calculation follows the list). This insight helps you plan and measure future process improvements.
✏️ Delivery Reliability. This dimension is about quality. It covers reliability, performance, and security. You need to translate technical metrics (e.g. uptime, RPS, response time, number of security vulnerabilities) into business metrics like application availability, customer experience, security compliance, etc.
✏️ Team Health. A burned-out team cannot deliver successful software. Leaders should pay attention to teams juggling too many complex tasks, constantly switching between projects, and working late hours; these problems predict future failures. Focused teams are a business priority.
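As a toy value stream calculation (the numbers are invented for illustration), splitting lead time into active work and waiting makes the improvement targets obvious:

```python
# Illustrative value-stream arithmetic: active work vs. waiting per step, in days.
steps = [
    ("refine requirement", 1.0, 4.0),   # (step, active_days, waiting_days)
    ("implement",          5.0, 2.0),
    ("review & approve",   1.0, 3.0),
    ("deploy to prod",     0.5, 1.5),
]

active = sum(a for _, a, _ in steps)
waiting = sum(w for _, _, w in steps)
lead_time = active + waiting

print(f"Lead time: {lead_time} days "
      f"({active / lead_time:.0%} active work, {waiting / lead_time:.0%} waiting)")
```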
The author's overall recommendation is to start with small steps, dimension by dimension, carefully tracking your results and sharing them with stakeholders at least monthly. Strong numbers shift the conversation from controlling costs to investing in growth.
From my perspective, this is a good framework that can be used to communicate with the business and talk with them using the same language.
#leadership #management #engineering
AWS Enterprise Strategy Blog: A CTO's Guide to Measuring Software Development Productivity