TechLead Bits – Telegram

About software development with common sense.
Thoughts, tips and useful resources on technical leadership, architecture and engineering practices.

Author: @nelia_loginova
The Motive

`Please, don’t be a leader, unless you’re doing it for the right reason, and you probably aren’t!` It may sound provocative, but Patrick Lencioni starts his book 'The Motive: Why So Many Leaders Abdicate Their Most Important Responsibilities' with this very idea. Released in 2020, this is one of his latest works where he challenges traditional views on leadership, exploring why individuals should choose to take on leadership roles.

The author defines two primary motives that drive leaders:

1️⃣ Reward: Leaders driven by rewards primarily seek personal gratification and enjoyment in their position. They avoid mundane tasks, focusing solely on activities that interest them. They actually dislike managerial tasks and would likely find greater satisfaction and effectiveness in other areas.

2️⃣ Responsibility: Leaders with a sense of responsibility prioritize the well-being and growth of their team. They take on unpleasant tasks, recognizing their duty to their employees and organization.

While most individuals possess elements of both motives, Lencioni argues that one typically dominates, significantly influencing leadership effectiveness. According to him, responsibility-oriented leaders tend to achieve greater success.

Lencioni identifies five crucial activities that reward-oriented leaders tend to avoid:

1️⃣ Developing the leadership team: Taking personal accountability for the growth and development of team members.
2️⃣ Managing subordinates (and making them manage theirs): Providing guidance, sharing knowledge, and offering mentorship.
3️⃣ Having difficult and uncomfortable conversations: Addressing complex and uncomfortable issues promptly and effectively.
4️⃣ Running great team meetings: Facilitating productive discussions and decision-making.
5️⃣ Communicating constantly and repetitively to employees: Keeping the team informed about organizational vision, goals, decisions, and challenges.

For me, it was a valuable review that brought particular aspects into focus. After looking closely at how I lead, I realized that handling tough conversations (point 3) isn't my strongest skill. So, I marked it as an area to improve upon. But it's nice to know that this challenge isn't unique to me, as the book mentions 😉.

So, revisit those unpleasant tasks, clarify your motives, and be a good leader.

#management #leadership #booknook
Dual-Write Problem

In the distributed world, it's often the case that two external systems need to be synchronized and updated simultaneously to maintain consistency. It’s called the dual-write problem. A classic example of this is when data needs to be stored both in a database and in Kafka.

So what's the best solution for this problem? What are common pitfalls, and how can they be avoided? These questions are addressed in the article 'Solving the Dual-Write Problem' by W. Waldron.

Identified antipatterns:
📍Relying on operation order
📍Wrapping dual writes into a database transaction
📍Retrying failed operations

Possible solutions:
📍Transactional outbox pattern. In this approach, an outbox table is set up in the database. Changes are made to the target tables and the outbox table within the same transaction. A separate process then reads the outbox table rows and sends them to Kafka, retrying if there's a failure.
📍Change Data Capture (CDC), if supported by the database. A variation of the outbox approach: instead of a polling process, changes are read from the database's change stream and forwarded to Kafka.
📍Event sourcing: Every change is recorded in the database as an event. Since each event is written to a single row in a single table, transactions are unnecessary. A separate process can then read these events and send them to Kafka.
📍The listen-to-yourself pattern: Any change is sent directly to Kafka. A separate process listens to these events and uses them to update the database, retrying as needed. The database will be eventually consistent.
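To make the transactional outbox concrete, here is a minimal sketch. It assumes SQLite for brevity, invented table names, and a stand-in callable instead of a real Kafka producer:

```python
import json
import sqlite3

# Sketch of the transactional outbox pattern. SQLite and the table names
# are assumptions for illustration; the producer is any callable.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0)")

def create_order(status: str) -> None:
    # Business write and outbox write share one transaction:
    # either both commit or neither does.
    with db:
        cur = db.execute("INSERT INTO orders (status) VALUES (?)", (status,))
        event = json.dumps({"order_id": cur.lastrowid, "status": status})
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))

def relay_outbox(producer) -> int:
    """Separate relay process: read unsent rows, publish, mark as sent.

    If publishing fails, the rows stay unsent and the next run retries them
    (at-least-once delivery)."""
    rows = db.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for row_id, payload in rows:
        producer(payload)  # e.g. a Kafka producer's send()
        db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    db.commit()
    return len(rows)

create_order("NEW")
published = []
relay_outbox(published.append)
```

Note that the relay is deliberately decoupled from the request path: the business transaction never waits on Kafka.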

The core concept behind these solutions is to divide writes into two separate processes and establish a dependency between them. While this isn't an exhaustive list of all potential solutions, it provides a solid set of practices to begin with.

#architecture #systemdesign #patterns
Flaky Tests Overhaul

If you've got a substantial set of tests, chances are very high that you've encountered situations where some test outcomes fluctuate between runs, even when there are no changes in the code. This inconsistency is called "flakiness".

Teams often find themselves repeatedly retrying pipelines containing flaky tests (timeouts are also considered a form of flakiness), trying to make the pipeline green. However, this process wastes engineering hours and CI resources. As the code base and the number of teams grow, managing flaky tests becomes increasingly challenging, leading to more potential issues and accumulating technical debt.

The Uber engineering team recently published an article detailing their approach to improve CI stability and address the issue of test flakiness.

Key points:
📍A separate service, Testopedia, was introduced to visualize the history of test executions and test performance characteristics.
📍Testopedia is language- and repo-agnostic. It operates on 'test entities'; each test entity is uniquely identified by a "fully qualified name" (FQN) that usually includes the full test address in the repo.
📍Tests can be grouped into realms; each realm is owned by a responsible team.
📍Testopedia analyzes test execution stats (including flakiness, reliability, staleness, and execution time), groups problem tests, and triggers a JIRA ticket with a deadline to fix.
📍GenAI integration is a future step to auto-generate fixes for flaky tests; it's currently under research.

As a result, the authors noted that implementing the Testopedia approach significantly improved the reliability of CI and reduced the number of retries. If this tool were available as an open-source project, I would certainly give it a try, but unfortunately, it's not.

However, in the absence of such a tool, what steps can we take on our own to address this issue? Here are some suggestions:
📍Visualize pipeline health by implementing simple monitoring of CI statistics, including the number of retries, execution time, and other relevant metrics. To improve something, you must first be able to measure it.
📍Treat problematic tests as work items with clear deadlines for resolution.
📍Prioritize CI issues, recognizing them as critical technical debt that will require attention anyway.
📍Implement measures to make retries more difficult or even impossible (validations, webhooks, etc.)
📍Clearly define roles and responsibilities for maintaining CI stability; otherwise there is a risk of collective irresponsibility.
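As a toy illustration of the "measure it first" suggestion, flakiness can be detected from CI history by flagging tests that both passed and failed on the same commit. The record format here is an assumption:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def flaky_tests(runs: List[Tuple[str, str, bool]]) -> List[str]:
    """Flag tests that both passed and failed for the same commit.

    `runs` holds (test_name, commit_sha, passed) records pulled from CI
    history; the exact schema is hypothetical."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for test, sha, passed in runs:
        outcomes[(test, sha)].add(passed)
    # A test is flaky if any single commit saw both a pass and a fail
    return sorted({test for (test, _), seen in outcomes.items() if len(seen) == 2})

history = [
    ("test_login", "abc", True), ("test_login", "abc", False),  # flaky
    ("test_pay",   "abc", True), ("test_pay",   "abc", True),   # stable
]
```

Feeding such a report into a dashboard or ticketing flow gives a lightweight version of what Testopedia automates at scale.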

#engineering #ci
Architectural Principles

Whether you set them or not, you and your team already use architectural principles. These principles may not be obvious, but they influence technical decisions, from writing any piece of code to preparing complex designs. The challenge is that each developer may have their own set of principles based on their previous experiences, best practices, or books they have read. This can lead to inconsistent system behavior across different modules, debates during code reviews, and the need for additional approvals to make decisions.

Let’s define what an architectural principle is. Eoin Woods, author of Continuous Architecture in Practice, offers a definition I like: an architectural principle is
a declarative statement made with the intention of guiding architectural design decisions to achieve one or more qualities of a system.


In essence, architectural principles are guidelines that establish a common framework for the team, enabling them to make informed decisions independently of any central authority. These principles not only enhance team responsibility for their actions but also streamline the achievement of business and technical goals. For architectural principles to be effective, it is crucial that the team participates in their development and accepts them.

Relationship Between Goals, Principles, and Decisions:
Goal -> Requirements -> Principles -> Decision


Recommendations for defining Architectural Principles:
📍Simplicity: Ensure that principles are straightforward and do not require additional context for understanding.
📍SMART Criteria: Define principles that are Specific, Measurable, Achievable, Realistic, and Testable.
📍Practical Guidance: Principles should be practical and usable as implementation guidance.
📍Relevance: Focus on the most significant principles that, if not properly defined, could lead to poor decisions.
📍Conciseness: Keep the list short (5-7 items) to reduce cognitive load.

Some examples of good principles:
✔️Degrade Gracefully: Design the system to continue functioning even when some components fail, providing a lower-quality service instead of a total failure.
✔️Self-Healing: Ensure that failed actions are continuously retried until they succeed.
✔️Design to Be Monitored: Build systems that are self-diagnosing and can be easily monitored.

Architectural principles set boundaries and guidelines for teams, allowing the freedom to make independent decisions. This helps maintain consistency, improve decision-making, boost autonomy, and achieve technical and business goals.

#architecture #documentation
Structure Eats Strategy

Many are familiar with Conway’s Law, which states that the architecture of a system reflects the communication structure of the organization that created it. If architecture mirrors the organizational structure, then having a solid structure is crucial, right? Jan Bosch explores this idea in his article "Structure Eats Strategy". He posits that many organizations hinder their own growth through ineffective organizational structures, communication, and distribution of responsibilities.

Bosch identifies four fundamental building blocks within any organization:
📍Business (B): Defines business strategy, revenue, and innovations.
📍Architecture (A): Establishes the technology stack and system structure.
📍Processes (P): Outlines activities and ways of working.
📍Organization (O): Defines departments, teams, and responsibilities.

Most organizations operate in an OPAB model (organization-first), meaning that the existing organizational structure dictates processes, leading to an accidental architecture shaped by the company's past.

Bosch argues that the BAPO model (business-first) is significantly more effective. This model begins with defining the business strategy, which shapes the architecture. The architecture then forms the processes and tools, and only after these elements are in place is the organizational structure defined.

The article offers a valuable framework for understanding the relationship between organizational structure and architecture. Structure should be a tool to help the business achieve its goals and guide architecture in the right direction for the future. Unfortunately, there are many examples where existing team boundaries lead to flawed architectural decisions. This results in support overheads, lower maintainability, reinvented wheels, and higher costs for the business.

#architecture #management
Pinterest Ad System Redesign

A recent article on the Pinterest Engineering blog, titled "Redesigning Pinterest’s Ad Serving Systems with Zero Downtime," discusses the complete overhaul of Pinterest's recommendation system.
The ad-serving system is crucial to Pinterest's business, serving as the primary revenue channel. The previous system, known as "Mohawk," was implemented in 2014. However, as the company grew, Mohawk accumulated technical debt and complexity, making incident troubleshooting and bug analysis increasingly challenging.

The following architectural issues were identified as motivations for the reimplementation:
📍High coupling between business and infrastructure logic
📍Lack of modularization and ownership: Code from the same packages and files was owned by different teams.
📍No guarantees of data integrity
📍Unsafe multi-threading: Lack of error handling and potential race conditions.

The new design principles include:
📍Extensibility: Enable the addition of new features and the deprecation of old ones.
📍Separation of Concerns: Organize logic into separate modules, each owned by the appropriate teams.
📍Safety: Ensure the safe use of concurrency and enforce data integrity rules.
📍Development Velocity: Provide well-supported development environments and facilitate easy testing and debugging.

The team spent two years implementing the new version of the recommendation system, AdMixer. Rewritten in Java, like other Pinterest services, the new system helps unify the technology stack. It has been in production for three quarters now without significant outages. Additionally, it was designed to run on AWS Graviton instances (ARM architecture) to reduce AWS usage costs (more about that in the Multi-Arch Images post).

What I can say is that a complete system rewrite is a very expensive endeavor. It would be interesting to know the costs of this two-year project, as not all companies can afford such an exercise. An eight-year lifespan is not particularly long for a software system, indicating that serious architectural mistakes and unresolved technical debt prevented the previous system from evolving effectively.

Let's be honest: engineers often prefer to rewrite a system, especially one created by others😀. It’s fun, cool, and easier, and it makes a nice addition to a resume. However, it is far more challenging to build a system with the right architectural principles that support evolutionary rather than revolutionary development (do not confuse it with over-engineering!).

In any case, the Pinterest engineering team did a great job: they implemented a new approach, successfully delivered it to production, accelerated the delivery of business features, and increased developer satisfaction. I hope the new service incorporates the lessons learned and will be extensible for future growth and business needs without requiring a full rewrite.

#architecture #usecase #refactoring
Modern Software Engineering

Finally, I’ve finished reading Modern Software Engineering by D. Farley 📚. It took me longer than expected, as I spent a lot of time on the first two parts of the book, which focus more on general concepts rather than practical advice, making the initial experience a bit tedious for me.

Let's dive into the book to see what it's about. It is divided into four major sections:

Part I: What is Software Engineering?
The author provides a definition of software engineering that I find really great:
Software engineering is the application of an empirical, scientific approach to finding efficient, economic solutions to practical problems in software.

Each word in this definition is significant. Reflecting on them, the author concludes that we must manage the complexity of the systems we create and continuously learn new things and adapt to them.
Engineering is not just coding; it includes all aspects needed to create software: processes, tools, and culture. We should apply engineering practices everywhere to make effective decisions, and we need metrics to measure the effectiveness of those decisions. The author suggests the following metrics:
📍Stability: change failure rate, recovery time
📍Throughput: lead time (from idea to production), deployment frequency

Part II: Optimize for Learning
To effectively learn and adapt to changes, we need to:
📍Work iteratively
📍Employ fast, high-quality feedback
📍Work incrementally
📍Be experimental
📍Be empirical
As you can see, these principles align closely with modern agile practices.

Part III: Optimize to Manage Complexity
The fundamental principles for managing complexity include:
📍Modularity
📍Cohesion
📍Separation of concerns
📍Abstraction
📍Loose coupling
While these concepts are familiar to most programmers, the book provides a valuable review with practical examples and techniques for better understanding and application.

Part IV: Tools to Support Engineering in Software
To apply an engineering approach in practice, the following tools are suggested:
📍Testability
📍Deployability
📍Speed
📍Controlling variables
📍Continuous delivery
The author discusses specific tools and how they relate to fundamental principles. For example, achieving testability requires good abstraction, modularity, and separation of concerns.

For me, the book doesn't introduce new concepts but rather reinforces my current understanding of how we should engineer our systems. To conclude this review, I’d like to share a quote from the book that serves as a good rule for daily work:
If our ‘software engineering’ practices don’t allow us to build better software faster, then they aren't really engineering


#architecture #engineering #booknook
software_engineering.jpg
626.4 KB
And now for the more entertaining part of the book review: a sketchnote🔥 to help simplify and remember what the book is about.

#architecture #engineering #booknook #sketchnote
The Cornell Note Taking System

When I study, I like to take notes and write summaries about what I’ve learned. This helps me rethink what I’ve read and provides a short description of useful ideas that I can revisit from time to time.

Recently, I discovered the Cornell Note-Taking System. It was developed by Walter Pauk, a professor at Cornell University, in the 1950s. The system was initially designed to improve the efficiency of college-level students (I wish someone had shown it to me when I was a student!).

The Cornell Note-Taking System is very simple and consists of three parts: notes, keywords/questions, and summary. Here’s how it works:

📍Divide the paper into two columns: the questions column (1/3 of the width of the page) and the notes column (2/3 of the width of the page). Leave 5-7 lines at the bottom of the page for the summary.
📍During study, make notes in the notes section. It’s recommended to keep one line between different statements and two lines between different chapters. This is how you should work during lectures and readings.
📍Add key terms and relevant questions to the questions column. This helps emphasize important points.
📍Write a summary in your own words to describe what you’ve learned. A good tip is to imagine you’re explaining it to one of your friends.

Various studies show that the Cornell Note-Taking System is more effective than other styles when students need to apply learned knowledge in practice. I’ve tested this approach and can confirm that it works much better than standard note-taking. It requires you to rework the material, which improves retention and understanding. Now it’s my primary method for taking notes.

#softskills #productivity
Here is the template for the Cornell Note-Taking System. I’ve also attached a sample of my notes from the book "Modern Software Engineering." In my sample, I included a summary for the entire book at the end of the notes rather than on each page. Nevertheless, it provides a good sense of how it works🙂.

#softskills #productivity
Practical Architecture

I highly recommend "Practical (a.k.a. Actually Useful) Architecture" video by Stefan Tilkov where the author presents 10 recommendations for pragmatic architectural work:

✏️Choose your perspective(s) consciously. The author identifies 3 main architecture perspectives:
- Business domain: How the system is divided into logical modules (boxes).
- Integration: How different modules communicate with each other.
- Technology: The technology choices within the boxes.
These perspectives are usually maintained by different people and change at different frequencies and for different reasons. It's crucial to focus on the right perspective so that decisions at one level don't negatively impact another. For example, framework choices shouldn't affect integration contracts, to prevent dependencies at the wrong level for the wrong reason.
✏️Explicitly architect your team setup. Define team topology based on the domain modules.
✏️Match your organizational setup to project size:
- Single team. Some tasks are architectural with no explicit architecture roles.
- Small number of collaborating teams. Hold an architecture committee meeting once a week with members from each team.
- Large number of independent teams. Establish explicit architecture roles, possibly forming a separate architecture team.
✏️Don't be afraid to decide things centrally. Some decisions need to be made centrally:
- What is centralized vs. what is left to individual teams.
- Team responsibilities.
- Any aspects relevant to more than one team.
- Global policies and strategic aspects.
✏️Pick your battles wisely. Avoid trying to address everything simultaneously. Focus on the most important aspect first and plan other aspects for later.
✏️Enforce the least viable amount of rules, rigidly. Provide teams with the autonomy they need to get things done, but define clear boundaries to avoid chaos. For example, while the choice of programming language can be a team decision, API standards should be centralized. Define very few critical rules, but enforce them strictly.
✏️Balance prescriptive vs. descriptive architecture. The goal of architectural work is to produce decisions, not just documentation. Good decisions might not satisfy everyone, but decisions that satisfy everyone are often bad. Make firm decisions and encourage others to follow them.
✏️Don't aim for perfection - iterate. Architecture is not about creating a snapshot in time but about managing a series of changes. Build systems so they can evolve over time and continuously re-evaluate past decisions.
✏️Architect for delivery flow as much as for runtime quality. Design both the runtime environment and the development pipeline. The quality of a system's architecture is proportional to the number of bottlenecks limiting its evolution, development, and operations. More bottlenecks mean a worse architecture.
✏️Be boring & do more with less. Prefer simple, straightforward solutions to overly clever and complicated cool approaches.

The video offers excellent advice and practical recommendations, with nearly every idea and statement being truly significant. I'll conclude the review with a quote that I really like:
If you get architecture wrong, everything breaks down - yet even if you get it right, success is not guaranteed


#architecture
Additionally, I want to share examples of how to define rules and responsibilities from the previous video overview. I believe it's a valuable tool to use in practice.

#architecture #tools
Architecture Without an End State

`Architecture Without an End State` is a concept introduced by M. Nygard. It describes architecture as a mix of past decisions, the current state, and future vision. This means transitioning a system from one architecture to another isn't possible; instead, iterative steps must be taken toward current business goals. But business goals and requirements may change over time, and this series of changes is endless.

Nygard suggests 8 rules for designing systems that adapt to change:

✏️Embrace Plurality: Avoid creating a single source of truth. Each system should have its own entity model with its own related data. Different representations are connected through a system of global identifiers (URNs and URIs). Assume an open world that can be extended anytime: new entities will be added, new integrations will appear, and new consumers will start using the system.
✏️Contextualize Downstream: Upstream changes can affect all downstream systems. Minimize the impact by localizing the context (since business rules have different interpretations and requirements across business units) and reducing the number of entities that all systems need to know about.
✏️Beware Grandiosity: Avoid enterprise modeling attempts and global object model creation, as they are often too complex to be practical. Instead, narrow the scope and build a model for a particular problem or area. Start small and make incremental changes based on feedback.
✏️Decentralize: This allows local optimization of resources and budgets.
✏️Isolate Failure Domains: Use modularity to define proper system boundaries and isolate failures. A resilient architecture admits componentwise changes.
✏️Data Outlives Applications: Data lives longer than technology and specific applications. Use a hexagonal architecture (ports & adapters) to separate business logic from data storage.
✏️Applications Outlive Integrations: Integrations with other systems will change over time. Build a hexagonal architecture so that all systems are connected via explicit boundaries, making communication a boundary problem rather than a domain layer problem.
✏️Increase Discoverability of Information: Poor information flow within an organization leads to duplicate system functions and reinventing the wheel between different teams. Improving information discovery reduces costs of development (fewer duplicates). Some ideas include internal blogs, open code repositories, and modern search engines.

The only true end state for a system is when the company shuts down and all services are turned off. The author's main recommendation is to stop chasing the end state; it's not achievable. Continuous adaptation is the most effective way to build a system, and these rules can help guide that process.

#architecture #systemdesign
Creative Thinking

According to the World Economic Forum survey, creative thinking is the most in-demand skill of 2024. But is creativity important in software development?

Definitely yes. Robert E. Franken, in his book “Human Motivation,” defines creativity as
the tendency to generate or recognize ideas, alternatives, or possibilities that may be useful in solving problems, communicating with others, and entertaining ourselves and others

In our work, we frequently generate ideas to meet business requirements and solve technical issues (and occasionally to entertain others too! 😄). The more creative we are, the more ideas we can produce that lead to better and more efficient solutions.

The human brain generates new ideas based on existing knowledge, connecting it in innovative ways. This means the foundation for creativity is so-called `visual experience`: the more different elements we have in our minds, the more ideas we can produce.

In software development, it's essential to continually learn new things, even if they don't directly relate to our current technology stack or tasks. While it may seem pointless to study something unrelated to our immediate work because we might forget it quickly, this isn't entirely true. New knowledge serves as building blocks for generating new ideas. Designers even have a practice of studying the works of others to develop a sense of quality before creating their own style. Similarly, in software development, studying various use cases and technologies broadens our minds and enhances our visual experience.

Like any other skill, creativity can be trained. Many exercises to train creativity can be found on the Internet. Here are a few examples to start with:
✏️Study real use cases and patterns from your area of professional interest to expand your visual experience and sense of quality.
✏️Extend your standard subscriptions with content unrelated to your primary interests, such as science, drawing, or medicine. This helps introduce fresh ideas from other fields.
✏️ Select a simple object like a ruler, pen, or paper clip, and spend 10 minutes brainstorming as many alternative uses for it as possible. Try this exercise daily for a month and track whether the number of generated ideas increases.

To summarize, creativity is a valuable skill that needs to be trained to enhance problem-solving abilities and boost professional efficiency.

#softskills #creativity
UUID vs ULID

UUID (Universally Unique Identifier) and ULID (Universally Unique Lexicographically Sortable Identifier) are unique identifier formats widely used in software development. While both are 128 bits long and include a random component, they differ fundamentally in structure and use cases.

UUID Key Implementation Details:
- Represented as a 32-character hexadecimal string (36 characters including hyphens). Example: 8ba8b814-9dad-11d1-80b4-00c04fd430c8
- Typically generated using the current time and a random number, but may include other input parameters
- Has 8 versions with different generation logic (e.g., v1 uses a MAC address and timestamp, while v4 uses only random or pseudorandom numbers); v4 is the most popular now
- Has an official RFC

ULID Key Implementation Details:
- Represented as a 26-character string. Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV
- Generated based on the current time and a random number
- Common structure: 48 bits timestamp, 80 bits random number
- Lexicographically sortable, meaning they can be sorted in alphabetical order
- No official standard, but there is a spec on GitHub

Key Differences
✏️ Length: ULID is shorter.
✏️ Generation Speed: Some sources claim ULID is generated faster, but this depends on the implementation and technology used.
✏️ URL Safety: ULID is URL-safe without any encoding or escaping.
✏️ Sortability: ULID is sortable, which is useful for quickly sorting and searching large numbers of identifiers, it can be really efficient for data partitioning cases.
✏️ Security: UUID is generally safer from a security perspective as it is less predictable.
✏️ Adoption: UUID is widely adopted and can be generated at various levels (database, application).

Ultimately, the best choice depends on specific project requirements. Consider the importance of uniqueness, sortability, performance, and string length to make a final decision.
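A minimal sketch of generating both kinds of identifier in Python: `uuid4` from the standard library, and a hand-rolled ULID following the layout from the GitHub spec (48-bit timestamp plus 80 random bits, encoded in Crockford's Base32). The ULID function is an illustration, not a production library:

```python
import os
import time
import uuid

# UUID v4: purely random, straight from the standard library
print(uuid.uuid4())

# ULID sketch: 48-bit millisecond timestamp + 80 random bits, encoded as
# 26 characters of Crockford's Base32 (no I, L, O, U)
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid() -> str:
    ts = int(time.time() * 1000)  # milliseconds since the Unix epoch
    value = (ts << 80) | int.from_bytes(os.urandom(10), "big")  # 128 bits
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits; the top 2 bits are zero padding
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))
```

Because the timestamp occupies the most significant bits and the alphabet is in ASCII order, plain string comparison sorts ULIDs by creation time.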

#architecture #systemdesign
Snowflake ID

One more popular algorithm for generating IDs in distributed systems is Snowflake.
It was initially created by Twitter (X) to generate IDs for tweets.

In 2010, Twitter migrated their infrastructure from MySQL to Cassandra. Since Cassandra doesn’t have a built-in ID generator, the team needed an approach to ID generation that met the following requirements:
- Generate tens of thousands of IDs per second in a highly available manner
- IDs need to be roughly sortable by time, with an accuracy of about 1 second (tweets within the same second are not sorted)
- IDs have to be compact and fit into 64 bits

As a solution, the Snowflake service was introduced; it generates IDs with the following structure:
✔️ Sign bit (1 bit): Always 0, to keep the ID a positive integer.
✔️ Timestamp (41 bits): The time when the ID was generated.
✔️ Node ID (10 bits): A unique identifier of the worker node generating the ID.
✔️ Step (12 bits): A counter incremented for each ID generated within the same timestamp.

Snowflake IDs are sortable by time, because they are based on the time they were created. Additionally, the creation time can be extracted from the ID itself. This can be used to get objects that were created before or after a particular date.
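The bit layout above can be sketched in a few lines. This is a simplified illustration, not Twitter's actual implementation; the epoch constant is Twitter's published custom epoch, but any fixed past epoch works:

```python
import threading
import time

EPOCH = 1288834974657  # Twitter's custom epoch (Nov 4, 2010, in milliseconds)

class Snowflake:
    """Minimal sketch of a Snowflake-style 64-bit ID generator."""

    def __init__(self, node_id: int):
        assert 0 <= node_id < 1024  # node ID must fit in 10 bits
        self.node_id = node_id
        self.last_ts = -1
        self.step = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            ts = int(time.time() * 1000) - EPOCH
            if ts == self.last_ts:
                self.step = (self.step + 1) & 0xFFF  # bump the 12-bit counter
                if self.step == 0:
                    # Counter overflow: busy-wait for the next millisecond
                    while ts <= self.last_ts:
                        ts = int(time.time() * 1000) - EPOCH
            else:
                self.step = 0
            self.last_ts = ts
            # 41-bit timestamp | 10-bit node ID | 12-bit step; sign bit stays 0
            return (ts << 22) | (self.node_id << 12) | self.step

def creation_time(snowflake_id: int) -> float:
    """Extract the creation time (Unix seconds) back out of an ID."""
    return ((snowflake_id >> 22) + EPOCH) / 1000
```

Because the timestamp sits in the most significant bits, numerically comparing two IDs also compares their creation times, which is exactly the rough time-sortability the original requirements asked for.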

There is no official standard for the Snowflake approach, but several implementations are available on GitHub. The approach has also been adopted by major companies like Discord, Instagram, and Mastodon.

#architecture #systemdesign
GeoSharding in Tinder

Let’s continue improving our visual experience in system design. Today we’ll look at the Tinder search architecture. I hope everyone knows what Tinder is. One of its main components is search with real-time recommendations. The initial implementation was based on a single Elasticsearch cluster with only 5 shards. Over time, the number of shards grew and more replicas were added, until the system hit its scaling limits. In 2019 this led to a decision to re-architect the component to satisfy new performance requirements.

Main Challenges:
📍 Location-Based Search with a maximum distance of 100 miles. For example, when serving a user in California, there is no need to include the users in London.
📍 Performance: performance degrades roughly linearly as index size grows. Multiple smaller indexes demonstrated better performance.

Decisions Made:
✏️ Split Data: Storing users who are physically near each other in the same geoshard (a Tinder-specific term for their sharding implementation).
✏️ Limit the Number of Geoshards: 40–100 geoshards globally results in a good balance of P50, P90, and P99 performance under average production load.
✏️ Use the Google S2Geometry library to work with geo data:
- The library is based on the Hilbert curve: two points that are close on the Hilbert curve are close in physical space.
- It allows hierarchical decomposition of the sphere into "cells"; each cell can be further decomposed into smaller cells.
- Each smallest cell represents a small area of the earth.
- The library provides built-in functionality for location mapping.
- S2 supports different-sized cells, ranging from square centimeters to miles.
✏️ Balance Geoshards: Not all locations have the same population density, so the proper shard size must be defined and data balanced to avoid a hot-shard issue. S2 cells were scored and combined into geoshards; as a result, each geoshard can contain a different number of equally sized cells.
✏️ Mapping: Create a mapping between geoshards and S2 cells. For queries, the data service gets S2 cells to cover the query circle using S2 library, then map all the S2 cells to geoshards using the shard mapping.
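The final mapping step above can be sketched as a toy function. The integers here stand in for S2 cell IDs; a real implementation would obtain the covering cells from the S2Geometry library, and the cell-to-shard table is a hypothetical example:

```python
from typing import Dict, List, Set

def shards_for_query(covering_cells: List[int],
                     cell_to_shard: Dict[int, int]) -> Set[int]:
    """Map the cells covering a query circle to the geoshards to fan out to.

    Cells with no mapping (e.g. uninhabited areas) are simply skipped."""
    return {cell_to_shard[c] for c in covering_cells if c in cell_to_shard}

# Toy mapping: cells 1-3 live in geoshard 10, cells 4-5 in geoshard 11
mapping = {1: 10, 2: 10, 3: 10, 4: 11, 5: 11}
```

A query whose covering circle touches cells 2 and 4 would then fan out to shards 10 and 11 only, instead of hitting every index.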

It’s reported that the new approach improved performance by 20 times compared to the previous single-index implementation. More importantly, it has the capacity to scale further by increasing the number of geoshards. For me, it was really interesting to read about the location-based search approach, the S2Geometry library, and the concepts behind its implementation.

#architecture #systemdesign #scaling #usecase