Designing Distributed Systems
I think you will agree that the title Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services sounds very promising. The book was published in 2018 and aspired to become a catalog of modern system design patterns, much like the GoF patterns for software design 20 years earlier.
Spoiler: it did not.
In reality, the book covers very basic material: sidecars, load balancing, sharding, leader election and a few others. The patterns are presented without deep detail, with the focus on creating Kubernetes objects.
For example: this is sharding, it distributes data across replicas, consistent hashing can be used to pick the appropriate shard, and here are the k8s Service and StatefulSet to do that.
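For illustration, here's a minimal Go sketch of the consistent hashing idea (my own example, not code from the book): shard names and keys are hashed onto a ring, and a key belongs to the first shard at or after its hash. Real implementations usually add virtual nodes per shard to even out the distribution.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring: shard names are hashed onto a
// circle, and each key is owned by the first shard at or after its hash.
type Ring struct {
	points []uint32
	shards map[uint32]string
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(shards ...string) *Ring {
	r := &Ring{shards: make(map[uint32]string)}
	for _, s := range shards {
		p := hash32(s)
		r.points = append(r.points, p)
		r.shards[p] = s
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Shard returns the shard responsible for the given key.
func (r *Ring) Shard(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.shards[r.points[i]]
}

func main() {
	ring := NewRing("shard-0", "shard-1", "shard-2")
	fmt.Println(ring.Shard("user-42")) // routing stays stable when unrelated shards change
}
```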
One more thing I don't like is that the book recommends sidecar containers far too often. Maybe 7 years ago that looked like a new trend in distributed systems development (remember the first Istio implementation built on sidecars), but the trend didn't last.
You should clearly understand when and why sidecars are applicable. Additional containers add extra resource consumption, complexity and maintenance overhead. In most cases, it's cheaper to implement the required features inside the main application.
To summarize, the book suits junior and mid-level developers well as a basic introduction to cloud architecture patterns. But for senior developers, techleads and architects it will definitely be boring 🥱.
#booknook #systemdesign #patterns
About Career Choices
A few days ago I wrote an essay about my career path for an educational program. It made me reflect a bit on the career choices I've made.
I became a teamlead very quickly. The term techlead was not popular at that time, but I usually combined both roles. And I spent more than 10 years in this position. Do you think I got stuck?
Actually, I don't think so. It was my decision to stay at this level.
The reason is that I really enjoyed being a teamlead/techlead: researching new technologies, developing products, applying engineering practices to solve operational problems and, of course, building something significant and valuable with the team, something that is impossible to build on your own.
During these years I grew mostly in breadth, extending my technical expertise and team management skills. And from that perspective, the last 5 years were the most amazing and interesting of my career.
What I want to say is: you don't always have to chase a new title or position. If you're not ready to take on more responsibility right now, that's fine. Sometimes it's enough just to enjoy your work and have fun with it 😎.
This year I finally moved to another level of technical leadership - head of division. I'm now responsible for management, architecture and the roadmap across six teams with 50+ people. And I really feel I'm ready for it now. But that's another story 😉.
#softskills #career
Lessons Learnt from Big Failures
Apple, Facebook, Google, Netflix, OpenAI - we all know these examples of success stories. The problem is that each success story is a unique combination of many factors that are very difficult to reproduce.
It's much more productive to study failures, as they have more or less the same root causes and show what definitely will not lead you to success.
Here's a collection of IT project failure case studies that cost companies tens of millions of dollars. The cases span the last ~15 years, and if you quickly go through them you will realize that most problems look very similar:
📍 Corporate Culture. It's not so obvious, but it's actually the root cause of many other problems: unforeseen complexity, underestimation, lack of transparency, etc. Why? When you develop a system, the technical team usually knows about all these problems; moreover, they know whether the system is ready for production or not. The question is whether they are able to explain that to management, and whether management is open enough to listen.
📍 Leadership Failures. There is a wide range of problems here: unclear responsibilities, poor ownership, ping-pong between teams, lack of trust, communication failures and other issues.
📍 Risk Management. For any big project you should always have a plan "B". That's why transparency and trust are so important: they're the only way to understand what's really going on and to have a chance to adjust the plan in time and avoid a complete disaster.
Software is a socio-technical system, and most failures aren't about technology, they are about people. The good news is that we as technical leaders can improve that and make our projects more successful.
#leadership #management
GenAI for Go Optimizations
Today, code generation with an AI assistant doesn't impress anyone, but GenAI can be helpful for more than that. Uber recently published an interesting article about using LLMs to optimize Go services.
So what they did:
🔸 Collect CPU and memory profiles from production services.
🔸 Identify the top 30 most expensive functions based on CPU usage. If runtime.mallocgc consumes more than 15% of CPU time, additionally collect a memory profile.
🔸 Apply a static filter to exclude open-source dependencies and internal runtime functions. This reduces noise and keeps the focus on business code only.
🔸 Prepare a catalog of performance antipatterns, most of which were already collected during past optimization work (a sketch of one such antipattern follows this list).
🔸 Pass source code and antipatterns list to LLM for analysis.
🔸 Validate the results using a separate pipeline: check whether an antipattern is really present and whether the suggested optimization is correct.
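To make the antipattern idea concrete, here's a hedged sketch of one classic allocation antipattern such a catalog might contain (my illustration, not taken from Uber's article): growing a slice inside a loop instead of preallocating, which shows up as runtime.mallocgc time in CPU profiles.

```go
package main

import "fmt"

// buildIDs grows the slice as it goes: append reallocates the backing
// array several times for large n, creating avoidable GC pressure.
func buildIDs(n int) []string {
	var ids []string
	for i := 0; i < n; i++ {
		ids = append(ids, fmt.Sprintf("id-%d", i))
	}
	return ids
}

// buildIDsPrealloc sizes the slice once up front: same result,
// far fewer allocations in the hot path.
func buildIDsPrealloc(n int) []string {
	ids := make([]string, 0, n)
	for i := 0; i < n; i++ {
		ids = append(ids, fmt.Sprintf("id-%d", i))
	}
	return ids
}

func main() {
	fmt.Println(len(buildIDs(1000)), len(buildIDsPrealloc(1000)))
}
```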
The article also contains interesting tips on how they tuned prompting, reduced hallucinations and built developer trust in the tool.
What I like about Uber’s technical articles is that they always calculate the efficiency of the results:
Over four months, the number of antipatterns reduced from 265 to 176. Projecting this annually, that's a reduction of 267 antipatterns. Addressing this volume manually would have consumed approximately 3,800 hours of the Go expert team's time.
we reduced the engineering time required to detect and fix an issue from 14.5 hours to almost 1 hour of tool runtime—a 93.10% time savings.
#engineering #usecase #ai
Platform Engineering: Shift It Down
A great video from Google experts about platform engineering.
One of the most popular DevOps concepts of the last decade was "shift left". It showed really good results: improving overall product quality, reducing delivery time and decreasing the cost of errors. At the same time, it significantly increased the cognitive load on developers, as it placed the full burden of implementation complexity on engineers.
The speakers suggest a new concept to solve this problem:
Don't just shift left, shift it down.
The idea is to move the implementation of quality attributes (reliability, security, performance, testability, etc.) to platform teams. Anything that is architecture rather than a product feature should go to the platform teams.
The technical toolbox for this consists of 2 items:
1. Abstractions: well-defined parts and components. They provide understandability, accountability, risk management and cost control for your system.
2. Coupling: what makes your system greater than the sum of its parts. It provides modifiability, golden paths and efficiency.
To apply this toolbox in practice you need governance, policies, and education. They call it "culture and shared responsibility".
One more interesting concept from the video that I really like is using different levels of flexibility in following the rules, depending on the consequences of an error:
YOLO -> Adhoc -> Guided -> Engineered -> Assured
For example:
An unauthenticated API can be a critical business risk, so developers must use the proper security framework. Its usage can be checked at build time to ensure it's not missed. Build-time control provides the assured level of flexibility, as developers cannot avoid it.
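As a toy illustration of an "assured" build-time control (my own sketch, not from the talk): a unit test over a hypothetical route table that fails CI whenever a route is registered without the auth wrapper, so developers cannot ship an unauthenticated endpoint by accident.

```go
package routes

import "testing"

// Route is a hypothetical route-table entry; Authed is assumed to be set
// only by the team's security framework when it wraps the handler.
type Route struct {
	Path   string
	Authed bool
}

var table = []Route{
	{Path: "/healthz", Authed: true},
	{Path: "/orders", Authed: true},
}

// TestAllRoutesAuthenticated turns the policy into an assured control:
// the build fails if any route bypasses the security framework.
func TestAllRoutesAuthenticated(t *testing.T) {
	for _, r := range table {
		if !r.Authed {
			t.Errorf("route %s is not behind the auth framework", r.Path)
		}
	}
}
```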
I think these levels give platform teams really good principles for deciding where to invest to get the biggest impact. I definitely recommend watching the full video if you're interested in platform engineering.
#engineering
Important illustrations from the video.
Source: Shift down: A practical guide to platform engineering - Leah Rivers & James Brookbank
#engineering
AI-Literacy
The growth of AI keeps bringing new terms. Just look at the hype around Vibe Coding 😎. But today I want to talk about another one - AI-literacy.
By AI-literacy the industry means a set of competencies for working with AI.
It consists of the following elements:
🔸 Know & Understand AI: a general understanding of how it works and the ability to critically evaluate its outputs.
🔸 Use & Apply AI: using AI tools and agents to solve various tasks; prompt engineering.
🔸 Manage AI: setting AI usage guidelines and policies, managing prompt libraries, education.
🔸 Collaborate with AI: working with AI to create innovative solutions and solve real-world problems.
Why is it interesting for us?
The competency exists, but in most companies it's not yet reflected in any policies or skill matrices. Moreover, there are often no AI usage guidelines at all. Yet employees definitely use AI (not always as effectively as they could), sometimes sending confidential data to public models 😱.
AI-literacy is a good concept you can use to start managing AI knowledge within your team: education, guidelines, restrictions, sharing and collecting useful prompts, incorporating AI tools into your daily routine.
#leadership #ai #management
Uber Code Review AI Assistant
Uber continues to share their experience of integrating AI into different parts of the development process. This time it's a GenAI code review assistant (previously they published articles about their GenAI On-Call Copilot and GenAI optimizations for Go).
If you've tried doing code review with a GenAI tool, you may have noticed it's not perfect yet: hallucinations, overengineering, noisy suggestions. It leaves the feeling that it produces more issues and consumes more time than a human review process.
That's why Uber engineers created their own review platform.
So let's check what they implemented:
🔸 Define relevant files for analysis: filter out configuration files, generated code, and experimental directories.
🔸 Include the PR changes, surrounding functions and class definitions in the LLM context.
🔸 Run the analysis by calling a number of different AI assistants:
- Standard: detects bugs, exception-handling issues and logic flaws.
- Best Practices: enforces Uber-specific coding conventions and style guides.
- Security: checks application-level security vulnerabilities.
🔸 Execute another prompt to check the quality of the previous step, assign a confidence score and merge overlapping suggestions.
🔸 Run a classifier for each generated comment and suppress categories with low developer value.
🔸 Publish the resulting comments on the PR.
The authors report that the whole process takes around 4 minutes and is already integrated with all of Uber's monorepos: Go, Java, Android, iOS, TypeScript, and Python.
One more interesting point: code analysis and comment grading use 2 different models, Claude-4-Sonnet and OpenAI o4-mini-high.
As you can see, more and more AI systems work in multiple stages, where one AI checks the results of another. This pattern is becoming popular, and it shows really good results in removing noise and reducing hallucinations.
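Here's a minimal Go sketch of that generate-then-grade pattern (my own illustration; the LLM interface and stubs are hypothetical, not Uber's API): one model drafts review comments, a second grades each one, and low-confidence comments are suppressed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LLM is a stand-in for any model client.
type LLM interface{ Complete(prompt string) string }

// reviewThenGrade drafts comments with one model, grades each comment
// with a second model, and keeps only those above the threshold.
func reviewThenGrade(reviewer, grader LLM, diff string, threshold float64) []string {
	var kept []string
	for _, c := range strings.Split(reviewer.Complete("Review this diff:\n"+diff), "\n") {
		if c == "" {
			continue
		}
		score, _ := strconv.ParseFloat(grader.Complete("Score 0-1 how actionable this is: "+c), 64)
		if score >= threshold {
			kept = append(kept, c)
		}
	}
	return kept
}

// stub lets the sketch run without a real model behind it.
type stub func(string) string

func (s stub) Complete(p string) string { return s(p) }

func main() {
	reviewer := stub(func(string) string { return "possible nil dereference\nnit: rename variable x" })
	grader := stub(func(p string) string {
		if strings.Contains(p, "nil") {
			return "0.9"
		}
		return "0.3" // the noisy nit gets suppressed
	})
	fmt.Println(reviewThenGrade(reviewer, grader, "diff...", 0.5))
}
```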
#engineering #ai #usecase
Write It Down
Have you ever been in meetings where people start yelling at each other? Or don't listen to each other? I've been there, and I can tell you: such situations are very difficult to manage and defuse.
There is one tip I learned at a soft skills training:
"If someone is yelling at you, start writing down what they say. It's almost impossible to yell at someone who's taking notes on each word you said."
And you know what? It works perfectly well 👍.
Now when things start heating up, I open Notepad++, write down all the points, ask clarifying questions, and confirm I got it right. In online meetings, I share my screen so everyone can see my notes.
So next time a discussion in a meeting becomes too emotional, keep calm and just write everything down.
#softskills #tips #leadership
Simple Prompt Techniques
GenAI continues to revolutionize the way we work, and it really simplifies parts of the daily routine. But to use it efficiently, you need good prompts. The rule is simple: the more precisely you specify the request, the better the results you get.
So I’d like to share a few simple prompting methods that I’ve found really helpful.
RTF
It's perfect for simple tasks. According to RTF, you structure your prompts as follows:
🔸 Role: AI role and area of expertise.
🔸 Task: Task or question description.
🔸 Format: Output format or structure: code snippet, text, a specific document, JSON structure, etc.
Example:
Role: You are an experienced Go developer.
Task: Analyze this Go function and suggest improvements to error handling and HTTP client reuse.
Format: Return a code snippet with inline comments explaining improvements.
RISEN
This framework suits more complex tasks:
🔸 Role: AI role and area of expertise.
🔸 Instructions: Task or question description. The more details you specify, the better the output.
🔸 Steps: Steps to perform to complete the task.
🔸 Expectations: Goal of the output, what you aim to achieve. It can include examples, output format and other guidelines.
🔸 Narrowing: Limitations, restrictions, or what to focus on.
Example:
Role: You are an SRE engineer.
Instructions: Prepare outage report data [based on the provided details].
Steps: 1) Summarize timeline, 2) Identify root cause, 3) Suggest prevention.
Expectation: Output an incident report in Markdown with Summary, Impact, Root Cause, Action Items.
Narrowing: Keep it management-friendly but with enough technical detail for engineers.
I hope these prompt techniques will be useful for you as well.
#ai #tips
Measuring System Complexity
I think we can all agree that the less complex our systems are, the easier they are to modify, operate and troubleshoot. But how can we properly measure complexity?
The most popular answers involve cyclomatic complexity or the number of code lines. But have you ever tried to use them in practice? I find them absolutely impractical and not actionable for huge codebases. They will always produce numbers telling you the system is big and complex. Nothing new, actually 🙃
I found more practical alternatives in the Google SRE book:
🔸 Training Time: Time to onboard a new team member.
🔸 Explanation Time: Time needed to explain the high-level architecture of the service.
🔸 Administrative Diversity: Number of ways to configure similar settings in different parts of the system.
🔸 Diversity of Deployed Configurations: Number of configurations deployed in production, including installed services, their versions, feature flags and environment-specific parameters.
🔸 Age of the System: Older systems tend to be more complex and fragile.
Of course, these metrics are not mathematically precise, but they provide high-level indicators of the overall complexity of the existing architecture, not just of individual blocks of code. And most importantly, they show which direction we should take to improve the situation.
#engineering #systemdesign
Pipeline Patterns
Today we cannot imagine our CI/CD processes without pipelines. They're everywhere: building, linting, testing, verifying compliance, deploying, and even handling maintenance tasks.
Have you ever seen the internals of those pipelines? I have: they're often a complete mess.
So it's no big surprise that someone started thinking about how to write pipelines in a resource-efficient and easy-to-support way. That's exactly the topic of a talk from the recent NDC Oslo conference: Pipeline Patterns and Antipatterns by Daniel Raniz Raneland.
It may not be rocket science, but it's a good set of useful recipes:
🔸 Right pipeline for the job: Select only the steps required for the task. For example, in a build pipeline we can execute unit tests on PRs and on main, but we should not execute them in the nightly CI run with integration tests.
🔸 Conditional steps: Define logic to skip unneeded steps. For example, if you change only docs, you don't need to run the build and tests.
🔸 Step result reuse: Use artifacts from one step as the input to other steps.
🔸 Fail fast: Put the steps that fail most frequently at the beginning of the pipeline.
🔸 Parallel run: Execute steps in parallel where possible.
🔸 Isolation: The result of one pipeline run should not affect the results of another.
🔸 Artifacts Housekeeping: Define cleanup policies for the artifacts.
🔸 Reasonable HWE: Carefully define the HWE required to execute pipeline steps.
The key idea of the talk is that we should treat pipelines like any other software, applying the same architecture principles and best practices as for any other application.
#engineering
The Art of Systems Thinking
We live in a world of systems. They are everywhere: businesses, families, teams, software and even ourselves. All of these are complex systems. That's why systems thinking is a key skill: it allows you to see common system patterns, apply changes, predict side effects, and adapt to the results of the changes you make.
I'd like to share one of the books on this topic - The Art of Systems Thinking: Essential Skills for Creativity and Problem Solving by Joseph O'Connor and Ian McDermott.
Some Takeaways:
🔸 A system is more than just the sum of its parts. If you analyze the parts separately, you can't predict the behavior of the whole.
🔸 Stable systems are more resistant to change.
🔸 It's not possible to make an isolated change within the system. It will always create side effects.
🔸 The leverage principle: systems resist change. But if you understand a system well, you can find its weak points, and a small shift there can trigger big changes.
🔸 Connections between system parts create feedback loops. They come in 2 types (see the sketch after this list):
- Reinforcing: changes keep going in the same direction, like a snowball rolling downhill.
- Balancing: changes push the system back toward equilibrium, like a thermostat keeping a specified temperature.
🔸 Changes don't happen immediately. If you don't account for the delay, it can lead to overreaction and oscillations.
🔸 To change a system, you need to destroy the old state and build a new stable one.
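As a toy illustration of a balancing loop (my sketch, not from the book), here is a thermostat in Go: each step corrects a fraction of the gap to the target, so the system settles at equilibrium; an over-aggressive or delayed correction would overshoot and oscillate instead.

```go
package main

import "fmt"

func main() {
	temp, target := 15.0, 21.0
	for step := 1; step <= 8; step++ {
		gap := target - temp
		temp += 0.5 * gap // balancing feedback: correct half the error each step
		fmt.Printf("step %d: %.2f°C\n", step, temp)
	}
	// With a correction factor above 2.0 each step overshoots further than
	// the last: the balancing loop turns into oscillation, as the book warns.
}
```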
Of course, these are just the basics. The book goes deeper into our mental models and cognitive traps, learning principles, how a shared mindset shapes people's behavior (e.g., the tragedy of the commons), how escalations work and what drives the main social and financial patterns.
The book is easy to read: simple language and a lot of real-life examples.
So if the topic sounds interesting, I recommend reading the whole book.
#booknook #softskills #thinking
Kafka 4.1 Release
At the beginning of September, Kafka 4.1 was released. It doesn't contain any big surprises, but it follows the overall industry direction of improving security and operability.
Noticeable changes:
🔸 Preview state for Kafka Queues (detailed overview there). It's still not recommended for production, but it's a good time to check how it works and which scenarios it really covers.
🔸 Early access to the Streams Rebalance protocol, which moves rebalance logic to the broker side. The approach was initially implemented for consumers and is now extended to streams (KIP-1071).
🔸 Ability for plugins and connectors to register their own metrics via the Monitorable interface (KIP-877).
🔸 Metrics naming unification between consumers and producers (KIP-1109). Previously the Kafka consumer replaced periods (.) in topic names with underscores (_) in metrics, while the producer kept topic names unchanged. Now both producers and consumers preserve the original topic name format; the old metrics will be removed in Kafka 5.0.
🔸 OAuth jwt-bearer grant type support in addition to client_credentials (KIP-1139).
🔸 Ability to enforce explicit naming for internal topics (like changelog and repartition topics). A new configuration flag prevents Kafka Streams from starting if any of its internal topics have auto-generated names (KIP-1111).
The full list of changes can be found in the release notes and the official upgrade recommendations.
#news #technologies
GenAI as a Thought Partner
AI is mostly used to get answers, produce summaries, generate text or automate routine tasks. But it can be much more than that if you use it in Thought Partner mode.
What does it mean?
It means that you can ask AI to generate ideas, challenge your solutions, offer alternative options, or even play devil’s advocate.
This mode is really helpful for leaders. For example, I use it to challenge my proposals and find alternative options and arguments. It helps me come better prepared to meetings with management and customers.
Basic template:
🔸 Role: "Act as my Strategic Thought Partner"
🔸 Context: situation or problem description, objectives
🔸 Task: what to do
Example:
Act as my Strategic Thought Partner by engaging me in a structured problem-solving process. Here’s the situation: [provide necessary context].
My goal is to [state objective].
Challenge my current assumptions, ask clarifying questions, and help me think through alternative solutions. I’d like you to surface blind spots and uncover insights I may have overlooked.
More ideas to use:
🔸 Give me 10 unexpected angles to consider for...
🔸 Act as a devil's advocate and challenge my current assumptions about...
🔸 Evaluate the pros and cons for...
🔸 Help me uncover blind spots and overlooked insights related to...
Thought Partner mode is a great tool, but don't take everything it says as the absolute truth. If you omit important details, it can give you totally wrong results. And of course, it can still lie, make mistakes and hallucinate 😵💫. Use it with a critical eye.
#ai #tips
A Few Words About Configurations
The ability to change system configuration is a very important aspect of service operability. But too many configuration options can turn system support into a nightmare.
From my experience, dev teams tend to overcomplicate the configs they provide. They try to expose as many options as possible; the common explanation is "We don't know what will really be needed". Then all the options are carefully documented in a several-thousand-line guide and delivered to the ops team. Of course, the ops team never ever reads it 😁
There is a really good metaphor from the Google SRE Book that illustrates this situation:
A user can ask for "hot green tea" and get roughly what they want. On the opposite end, a user can specify the whole process: water volume, boiling temperature, tea brand and flavor, steeping time, tea cup type, and tea volume in the cup.
Configuration is intended to be used by humans, and it should be designed for humans.
The main principle here is simplicity and reasonable defaults. The less configuration is required, the simpler the system is to operate and maintain.
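A minimal Go sketch of that principle (my illustration, not from the SRE book): expose only the knobs operators actually need and default everything else, so an empty config still produces a working system.

```go
package main

import (
	"fmt"
	"time"
)

// Config exposes only what operators really need to change.
type Config struct {
	ListenAddr     string
	RequestTimeout time.Duration
}

// Load fills in reasonable defaults for anything left unset,
// so the zero-value config is a valid, working one.
func Load(addr string, timeout time.Duration) Config {
	cfg := Config{ListenAddr: ":8080", RequestTimeout: 5 * time.Second}
	if addr != "" {
		cfg.ListenAddr = addr
	}
	if timeout > 0 {
		cfg.RequestTimeout = timeout
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", Load("", 0)) // "hot green tea": a sensible result with no settings
}
```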
One more important aspect of configurability is the testing surface. It is quite expensive to check all possible parameters and their combinations, so too much variety increases the risk of errors and human mistakes.
So next time you think about adding a new configuration parameter, keep in mind that the best configuration is no configuration.
#systemdesign #engineering
Is Open Source Free?
Do you know that the term open source was coined in 1998 to replace the term free software, to highlight "free as in freedom, not free as in beer" 😀?
Dylan Beattie walks through the history of open source and its current trends in the talk Open Source, Open Mind: The Cost of Free Software.
The history itself is very interesting: from pirating computer games and building the first Linux distro, to the evolution of licenses, CLAs and the current set of limitations on using open software. I recommend watching that part when you have some free time, it's really entertaining.
But here I want to highlight the following:
🔸 Open source projects provide us code. Nothing more. If you want continuity, support, availability, or convenience, expect to pay, whether through licenses, managed services, sponsorship or your own investment.
🔸 "People who share the source code do not owe you anything". They don’t even promise the software works properly or it works at all.
🔸 Open source projects can change their license to a commercial one at any time. You should simply be ready for that (remember Redis, Graylog, Vault, ElasticSearch, etc.).
An example:
You can take Postgres for free, it's fully open.
But can you run it in production as-is? Probably not 😯.
First you need to package it, prepare installation and upgrade procedures, implement HA, configure metrics and monitoring dashboards, set up a backup approach, tune security, train the operations team, etc.
You can do all of that on your own, or pay another company to do it for you.
So anyone who says open source is free and costs nothing has clearly never run it in production. Open source software is really "open" but not free.
#technologies
Failure Is Always An Option
One more great video from Dylan Beattie - Failure Is Always An Option. This time it's a talk about software reliability and the risks of system misbehavior.
Key ideas:
🔸 Use Systems Thinking. Reliability is not just about software; it's about a holistic view of the system that includes software, hardware, finance and people.
🔸 Design For Failure. Be ready for failure at all system levels and components.
🔸 Measure Risk by Impact, not Frequency. You might never have had a car accident, but that doesn't mean you don't need airbags and seat belts.
🔸 Focus on Results. Define things "done" by outcomes, not by executed steps or procedures.
🔸 Expect Surprises. Users are really creative; they can use features in unpredictable ways. Don't be arrogant and say "This is wrong". Learn from them to build awesome stuff together.
The talk is full of interesting examples of building complex reliable systems. The most impressive part for me was the story about the Space Shuttle's flight software 🚀.
Just imagine: a shuttle, astronauts, space, and some software... The success of the whole mission and the astronauts' lives depend on software quality and reliability. Sounds like horror, right? 😃
HA & DR for the Shuttle software was implemented using 6 computers:
🔸 4 identical computers to compare results and provide availability (a toy sketch of the voting idea follows this list).
🔸 A 5th computer performing the same logic, but with software written by a different vendor.
🔸 A 6th computer with no software at all; the idea was to install software on it from scratch if the software on all the other computers failed. Later the 6th computer was removed from the shuttles, as it was "never really used".
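As a toy sketch of the voting idea behind the 4 identical computers (my illustration, not the actual flight software): each computer produces a result, and a majority vote masks a single faulty machine.

```go
package main

import "fmt"

// vote returns the majority result across redundant computers,
// and reports whether a majority exists at all.
func vote(results []int) (int, bool) {
	counts := make(map[int]int)
	for _, r := range results {
		counts[r]++
	}
	for v, n := range counts {
		if n > len(results)/2 {
			return v, true
		}
	}
	return 0, false // no majority: fall back to the independently built 5th computer
}

func main() {
	// Three of four computers agree; the faulty one is outvoted.
	fmt.Println(vote([]int{42, 42, 41, 42}))
}
```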
The video has many more great examples from software engineering history; I watched it in one sitting. And I love Dylan's presentation style: energetic, with a good dose of humor, engaging and inspirational. Recommended 👍.
#systemdesign #engineering #reliability
Open Infrastructure is Not Free
There was a piece of news last week that might not be very noticeable, but it's really important for the whole open source community. On Sep 23, open source organizations like Sonatype (Maven Central), the Open Source Security Foundation (OpenSSF), the Python Software Foundation (PyPI) and others published a joint letter - Open Infrastructure is Not Free: A Joint Statement on Sustainable Stewardship.
The problems they highlighted:
🔸 Open source infrastructure is the foundation of any modern digital infrastructure.
🔸 Users expect this infrastructure to be secure, fast, reliable, and global.
🔸 Public registries are often used to distribute proprietary software (it may have an open source license but work only as part of a paid product).
🔸 Commercial organizations heavily use open source infrastructure as a free CDN and distribution system.
🔸 Open source infrastructure is supported by non-profit foundations and enthusiasts. They don't have enough resources to meet growing expectations.
🔸 Load on the infrastructure grows exponentially, donations - linearly.
🔸 This produces an imbalance: billion-dollar ecosystems run on services built on goodwill, unpaid weekends and sponsorships.
The problem is obvious: too many companies make money on open source infrastructure without giving a cent back. They profit, while the real costs are carried by volunteers and foundation sponsors. The claim is fair enough.
Proposed ideas:
🔸 Commercial Partnership: Fund the infrastructure in proportion to usage.
🔸 Tiered Access: Free access for individual contributors; paid options for scale and performance for high-volume consumers.
🔸 Additional Capabilities: Provide extra capabilities that might interest commercial entities (e.g., statistics or analytics).
The authors say this letter is only the beginning: they will start actively working with foundations, governments, and industry partners to improve the situation. It looks like in 2-3 years we'll have a totally different infrastructure, and, most probably, it will not be free.
#news #technologies
Software Quality: What does it mean?
We all want to build high-quality products. But what do we mean by high quality? Is it high test coverage? A low defect rate? Reliability? Compliance?
Actually, developers, the business and users all mean different things by quality.
There is a really good publication from the Google team on this topic - Developer Productivity for Humans, Part 7: Software Quality.
The authors break down software quality into 4 types:
🔸 Process Quality. This usually includes code reviews, organizational consistency, effective planning, testing strategy, test flakiness, and distribution of work. Typically, higher process quality leads to higher code quality.
🔸 Code Quality. This covers code testability, complexity, readability and maintainability. High code quality improves the quality of the system by reducing defects and increasing reliability.
🔸 System Quality. It means high reliability, high performance, security, privacy and low defect rates.
🔸 Product Quality. This is the quality experienced by the customers: utility, usability, and reliability. This level also includes other business parameters: brand reputation, costs and overheads, revenue.
These four types of quality affect each other: process quality affects code quality, which affects system quality, which affects product quality. The end goal is always to improve product quality.
This model also explains why ideas like "we'll raise test coverage to X% and get good quality" rarely work in practice. They might help a little, but the connection to product quality is too distant.
So if a team is concerned about quality, they need to decide which type of quality they want to work on and select the appropriate metrics.
#engineering #quality