Linear Regression
Linear regression is the simplest supervised ML model: it finds a linear relationship between features and labels.
Mathematically it looks like:
y' = b + w1*x1 + w2*x2 + ... + wn*xn
where
- y' - predicted value
- b - bias (calculated during training)
- wn - weight for the n-th feature (calculated during training)
- xn - value of the n-th feature (input to the model)
Loss for this type of model is usually calculated as mean squared error (MSE) or mean absolute error (MAE):
- MSE is sensitive to outliers and adjusts the model toward them.
- MAE minimizes the absolute differences, making it less sensitive to outliers (see the numeric sketch right after this list).
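To make the difference concrete, here's a small numeric sketch (Python/NumPy; the data points are made up for illustration):

```python
import numpy as np

# Hypothetical labels vs. predictions; the last label is an outlier.
y_true = np.array([10.0, 12.0, 11.0, 50.0])
y_pred = np.array([10.5, 11.5, 11.0, 12.0])

mse = np.mean((y_true - y_pred) ** 2)   # the 38.0 gap is squared, dominating the loss
mae = np.mean(np.abs(y_true - y_pred))  # the outlier contributes only linearly

print(f"MSE = {mse:.2f}")  # 361.13
print(f"MAE = {mae:.2f}")  # 9.75
```

A model trained to minimize MSE will bend toward that outlier much more strongly than one trained with MAE.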
Training steps:
1. Calculate the loss with the current weights and bias.
2. Determine the direction to move the weights and bias that reduces the loss.
3. Move the weight and bias values a small amount in that direction.
4. Return to step one and repeat until the model can't reduce the loss any further (the code sketch below walks through these steps).
Example:
The model needs to predict taxi ride prices based on features like distance and ride duration. Past ride prices can be used as labels.
The model formula:
y' = b + w1*distance + w2*ride_duration
The goal is to find values for b, w1, and w2 that minimize the MSE for the given labels. A well-trained model converges after a limited number of iterations, when the loss can no longer be reduced.
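Here is a minimal sketch of the training loop for the taxi example, written with NumPy. The synthetic data, the "true" coefficients (2.5, 1.2, 0.3), the learning rate, and the step count are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ground truth: price = 2.5 + 1.2 * distance + 0.3 * duration + noise
distance = rng.uniform(1, 20, size=200)   # km
duration = rng.uniform(5, 60, size=200)   # minutes
price = 2.5 + 1.2 * distance + 0.3 * duration + rng.normal(0, 0.5, size=200)

X = np.column_stack([distance, duration])
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sigma   # normalize features so one learning rate suits both

w = np.zeros(2)         # weights w1, w2
b = 0.0                 # bias
lr = 0.1                # learning rate

for step in range(1000):
    error = Xn @ w + b - price
    loss = (error ** 2).mean()               # 1. loss with the current weights and bias
    grad_w = 2 * Xn.T @ error / len(price)   # 2. direction that reduces the loss
    grad_b = 2 * error.mean()
    w -= lr * grad_w                         # 3. move a small amount in that direction
    b -= lr * grad_b                         # 4. repeat until the loss stops improving
    if step % 200 == 0:
        print(f"step {step}: MSE = {loss:.4f}")

# Map the learned parameters back to the original feature scale.
w_orig = w / sigma
b_orig = b - (w * mu / sigma).sum()
print(b_orig, w_orig)   # should be close to the assumed 2.5, [1.2, 0.3]
```

Normalizing the features is a detail the post doesn't cover, but it keeps a single learning rate stable for both weights.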
Use Cases:
✏️ Predicting Outcomes. Forecast values based on multiple inputs, e.g., taxi fares, apartment rentals, or flight prices.
✏️ Discovering Relationships. Reveal how variables are related and how changes in one variable affect the whole result.
✏️ Process Optimization. Optimize processes by understanding the relationships between different factors.
Studying linear regression made me realize why I learned linear algebra and statistics at university 😄. I really had some fun with the math and dynamic examples.
References:
- Google ML Crash Course: Linear Regression
- Understanding Multiple Linear Regression in ML
#aibasics
Visualization of how different loss functions can change model training results. As mentioned above, MSE moves the model more toward the outliers, while MAE doesn't.
#aibasics
Minimum Viable Architecture
There is no one-size-fits-all architecture for every scale and every project phase. Architecture should evolve with the product and be adapted to the requirements of each stage of the product lifecycle.
That's the main idea from Randy Shoup's talk - Minimum Viable Architecture. He calls this approach "just enough architecture" - the architecture that's good enough for the product to be released at its current stage.
Product Stages and Their Architecture:
📍 Prototyping.
- Goal: prove the business concept, test the market, and acquire the first customers.
- Rapid iterations, a lot of prototyping.
- Technology doesn't matter, use any tools that get results fast.
- No architecture
- Single team
📍 Starting.
- Goal: solve customer needs as cheaply as possible, acquire more customers.
- Rapid learning and iterations.
- Use simple, familiar tech stack
- Typically a monolithic architecture with a single database
- Rely on cloud infrastructure and open-source tools.
- Focus on competency growth, outsource everything else.
- Number of teams grows.
📍 Scaling.
- Goal: stay ahead of rapidly growing business.
- Time to rearchitect: "Getting to rearchitect a system is a sign of success, not failure."
- Build scalable architecture, focus on latency and performance
- Migrate from the monolith to microservices
- Grow the number of teams
📍 Optimizing.
- Goal: make the system more sustainable, efficient, and effective.
- Focus on small, incremental improvements.
- No major architectural changes.
- Improve operational efficiency.
- Consolidate the teams
I like the idea of matching architecture to business priorities and not overcomplicating the solution in the early stages. The talk also shares tips on when rearchitecting is really needed and how to do it without breaking the existing solution. Some of the ideas and recommendations about architecture look too dogmatic to me, but overall the talk is really good and I recommend watching the full video.
#architecture
YouTube: Minimum Viable Architecture • Randy Shoup • YOW! 2022
Today, I’m starting a topic with a picture first.
That's the D. Caruso Mood Map, a tool widely used in emotional intelligence techniques. The tool maps our emotions onto a grid with two scales:
- One scale is for a level of energy (low to high).
- The other scale is for a level of pleasantness (unpleasant to pleasant).
An explanation of how to use it will be shared in the next post 👇
#softskills #leadership
The Mood Map
Let's look at what's interesting about this tool and how it can be used in our daily life and work.
✏️ Emotions Mapping. That's the ability to recognize emotions in yourself, your colleagues, and your partners. One helpful technique is mirroring - matching the other person's speech rate and tone of voice. On the neurophysiological level it signals "This person is like me!", which makes communication more pleasant, gives a sense of safety, and increases the chances of reaching agreement.
✏️ Task Selection. The Mood Map helps you choose tasks for yourself or your team based on current emotional states. For example, anxiety can sharpen focus, happiness and joy are good for creativity, and contentment improves the chances of reaching consensus. The key idea is to either pick tasks that match your current state or shift your state to suit the task. This applies to teams too: "If you have a brainstorming session and the team seems anxious, that's not a good match. As a leader, you either have to change the tone of the room or change the agenda to match the tone".
✏️ Understanding. What makes you happy might not make someone else happy. It's important to understand and learn what motivates and inspires your team members. At the same time, emotions have universal reasons. If you understand the root of someone's behavior, you can address it effectively.
✏️ Changing Emotions. Agreements are hard to reach if you and the other person are in different quadrants of the Mood Map. Ideally, everyone needs to move to the `green` quadrant to reach a consensus. However, jumping directly from `red` to `green` is almost impossible. Instead, you can guide someone through smaller transitions, like red -> blue -> green. For example, if someone is in the red quadrant, speaking slowly and calmly can help reduce emotional intensity and shift them toward blue.
I used to be skeptical about emotional intelligence techniques, but this tool looks helpful and practical.
One more trick: during difficult conversations, if emotions are escalating, pause and ask yourself, "What am I feeling right now? Why?" Reflection helps shift your brain from the emotional side to the logical side. Once you're back in a logical state, you can better manage the situation and improve your chances of success.
References:
- David Caruso YouTube Channel
- Can emotional intelligence be learned?
- Emotional Intelligence in a Changing World
#softskills #leadership #communications #productivity
Binary Data Classification
In the previous #aibasics post, I briefly explained the basics of machine learning with Linear Regression. Today let's talk about another type of task - binary data classification. A typical example is determining whether an email is spam or not spam.
Key steps for binary classification:
1. Predict Probability. Take a Logistic Regression model that predicts a probability (mathematically, it returns values between 0 and 1) - for example, the probability of an input email being spam. If the model predicts 0.72, there is a 72% chance the email is spam and a 28% chance it is not.
2. Set a Classification Threshold. The classification threshold determines how to assign a binary label (e.g., spam or not spam) based on the predicted probability. For example, say the model predicts that a given email has a 75% chance of being spam. Does that mean the email is spam? Actually, no. If the threshold is set at 0.8, the email will be classified as not spam.
3. Evaluate the Model Using a Confusion Matrix. To measure how good the model is, we summarize the number of correct and incorrect predictions using a confusion matrix:
- True Positive (TP): Correctly predicted positive cases.
- False Negative (FN): Positive cases incorrectly predicted as negative.
- False Positive (FP): Negative cases incorrectly predicted as positive.
- True Negative (TN): Correctly predicted negative cases.
4. Measure Classification Quality. The following metrics define the effectiveness of the resulting model:
- Accuracy. The proportion of all classifications that were correct, whether positive or negative.
- Recall. The proportion of all actual positives that were classified correctly as positives.
- False Positive Rate. The proportion of all actual negatives that were classified incorrectly as positives.
- Precision. The proportion of all the model's positive classifications that are actually positive.
The classification threshold and quality metrics should be adjusted based on the cost of errors in a particular domain. If marking important emails as spam is costly, you may increase the threshold to reduce false positives. Conversely, if missing spam emails is more problematic, you may lower the threshold to prioritize catching them.
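A small sketch of steps 2-4 (the predicted probabilities and labels below are made up; in practice they would come from the logistic regression model in step 1):

```python
import numpy as np

# Hypothetical spam probabilities from the model and the true labels (1 = spam).
probs  = np.array([0.95, 0.80, 0.72, 0.30, 0.10, 0.65, 0.05, 0.85])
labels = np.array([1, 1, 0, 0, 0, 1, 0, 1])

def evaluate(threshold: float) -> None:
    preds = (probs >= threshold).astype(int)   # step 2: apply the threshold
    tp = np.sum((preds == 1) & (labels == 1))  # step 3: confusion-matrix counts
    fn = np.sum((preds == 0) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    tn = np.sum((preds == 0) & (labels == 0))
    accuracy = (tp + tn) / len(labels)         # step 4: quality metrics
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    print(f"threshold={threshold}: TP={tp} FN={fn} FP={fp} TN={tn} "
          f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")

evaluate(0.5)  # catches all spam (recall 1.0) but lets a false positive through
evaluate(0.8)  # no false positives (precision 1.0), but one spam email slips by
```

Raising the threshold trades recall for precision, which is exactly the cost-of-errors tradeoff described above.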
References:
- Google ML Course: Logistic Regression
- Google ML Course: Classification
- Confusion matrix in machine learning
#aibasics
Really nice demonstration of how the threshold affects prediction results (from Google ML Crash Course).
#aibasics
DORA: Measuring Delivery Performance
If you're interested in understanding how to measure the quality of your software delivery processes, you've probably heard about DORA. DORA is the DevOps Research and Assessment project from Google that studies what helps teams improve software delivery and operations performance.
DORA identified four software delivery metrics:
✏️ Deployment Frequency: How often an organization successfully releases to production.
✏️ Lead Time for Changes: The amount of time it takes a commit to get into production.
✏️ Change Failure Percentage: The percentage of deployments that cause failures in production and require hotfixes or rollbacks.
✏️ Failed Deployment Recovery Time: The time it takes to recover from a failed deployment. A lower recovery time indicates a more resilient and responsive system.
DORA's research demonstrated that speed and stability are not tradeoffs: these metrics are correlated for most teams, and top performers do well across all four. The challenge is often collecting fragmented data across different DevOps tools, but there are open-source solutions that can simplify the process.
The main pitfall in using DORA's delivery metrics is setting them as the main goal of the team's work. Instead, think of them as a way to measure progress and guide improvements.
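As a rough sketch of how the four metrics might be computed from deployment records (the record fields and sample data are assumptions; real inputs would come from your CI/CD and incident-tracking tools):

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment log for illustration.
deployments = [
    {"commit_at": datetime(2024, 4, 29), "deployed_at": datetime(2024, 5, 1),
     "failed": False, "recovered_at": None},
    {"commit_at": datetime(2024, 5, 2), "deployed_at": datetime(2024, 5, 3),
     "failed": True, "recovered_at": datetime(2024, 5, 3, 4)},
    {"commit_at": datetime(2024, 5, 5), "deployed_at": datetime(2024, 5, 7),
     "failed": False, "recovered_at": None},
]

days = (max(d["deployed_at"] for d in deployments)
        - min(d["deployed_at"] for d in deployments)).days or 1
frequency = len(deployments) / days                   # deployments per day
lead_time = median(d["deployed_at"] - d["commit_at"] for d in deployments)
failures = [d for d in deployments if d["failed"]]
failure_pct = 100 * len(failures) / len(deployments)  # change failure percentage
recovery = median(d["recovered_at"] - d["deployed_at"] for d in failures)

print(frequency, lead_time, failure_pct, recovery)
```

The hard part in practice is not the arithmetic but reliably collecting these timestamps across tools.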
#engineering #devops #delivery
DORA 2024 Report
Last time I wrote about DORA's key delivery metrics, and today I want to share the key trends from the DORA 2024 State of DevOps Report:
✏️ Artificial Intelligence (AI) Adoption. The report shows increasing AI adoption, especially for the following tasks:
- Writing code
- Summarizing information
- Explaining unfamiliar code
- Optimizing code
- Documenting code
- Writing tests
- Debugging code
- Data analysis
While AI boosts individual productivity, it has also been linked to a 1.5% reduction in delivery throughput and a 7.2% decrease in delivery stability. A likely explanation is that AI helps teams produce larger changelists, which in turn increases the complexity of deployments and the risk of failure.
✏️ Platform Engineering. Platform engineering has become a critical discipline for high-performing teams: teams that leverage internal developer platforms saw a 10% increase in team performance and an 8% boost in individual productivity. At the same time, there is a 14% decrease in change stability, so platform engineering needs to be implemented carefully to avoid increasing overall pipeline complexity and hurting stability.
✏️ Developer Experience. The report shows that focusing on the user increases productivity and job satisfaction while reducing the risk of burnout. When this user focus is combined with high-quality internal documentation, the gains in product performance are amplified.
✏️ Organizational Stability. Prioritizing stability in both technical and operational decisions can lead to higher team productivity and lower burnout.
✏️ Transformational Leadership. Transformational leadership is a model in which leaders inspire and motivate employees to achieve higher performance. A 25% increase in transformational leadership leads to a 9% rise in productivity, reduced burnout, and improved team and product performance. These leaders encourage their teams along the following dimensions:
- Vision. They have a clear vision of where their team and the organization are going.
- Inspirational Communication. They say positive things about the team and make employees proud to be part of their organization.
- Intellectual Stimulation. They challenge team members to think about old problems in new ways and to rethink some of their basic assumptions about their work.
- Supportive Leadership. They consider others’ personal feelings before acting; behave in a manner which is thoughtful of others’ personal needs.
- Personal Recognition. They commend team members when they do an above-average job and acknowledge improvements in the quality of team members' work.
The report highlights that transformation isn’t a one-time achievement but an ongoing process. Companies that are not continuously improving are actually falling behind. Companies that adopt a mindset of continuous improvement see the highest levels of success.
#engineering #news
Write Good Error Messages
"Cannot create entity X", "Connection to service Y failed", "Cannot read file Z".
These are typical error messages seen in many systems. And they are extremely bad.
So what's wrong with these messages? They are not actionable, they give no details about what exactly went wrong (for the connection issue alone I can easily list up to 10 possible reasons), and they don't tell users or support engineers what to do next.
A good error message should:
- Be actionable
- Be detailed and clear
- Deliver the best user experience
- Enable users to help themselves
- Reduce support workload
- Speed up issue resolution
Google has a special chapter in their technical writing course about how to write good error messages. So let's check their recommendations:
✏️ Don't fail silently. Failing to report errors is unacceptable. Assume that humans will make mistakes using your software. Try to minimize ways for people to misuse your software, but assume that you can't completely eliminate that.
✏️ Have a common style guide. Examples: Google API Error Handling, Go Error Handling
✏️ Do not swallow the root cause. Generic messages like "Server error" don’t help users understand or fix the issue.
✏️ Fail fast. Report errors as soon as they occur. Raising them later significantly increases debugging costs.
✏️ Identify the cause. Clearly explain what went wrong. Help users understand requirements and constraints. Be specific. Don't assume that users know the limitations of your system.
✏️ Explain how to fix the problem. Create actionable error messages. After explaining the cause of the problem, explain how to resolve it.
Example:
❌ Invalid input.
✅ Enter the pathname of a Windows executable file. An executable file ordinarily ends with the .exe suffix. For example: C:\Program Files\Custom\AppName.exe
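A hedged sketch of what this advice looks like in code (the config-loading scenario, paths, and size limit are invented for illustration):

```python
from pathlib import Path

MAX_CONFIG_SIZE = 1024 * 1024  # assumed 1 MiB limit, for illustration only

def load_config(path: str) -> str:
    config = Path(path)
    # Bad: raise RuntimeError("Cannot read file")  <- silent about cause and fix
    if not config.is_file():
        raise FileNotFoundError(
            f"Config file not found at '{path}'. Check that the path is correct "
            f"and the file exists; the expected default location is ./config/app.yaml."
        )
    if config.stat().st_size > MAX_CONFIG_SIZE:
        raise ValueError(
            f"Config file '{path}' is {config.stat().st_size} bytes, which exceeds "
            f"the {MAX_CONFIG_SIZE}-byte limit. Split the config into smaller files "
            f"or remove unused sections."
        )
    return config.read_text()
```

Each error names the cause and tells the user what to do next, following the recommendations above.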
Take time to train your team to write good error messages: it improves the user experience, reduces support costs, and speeds up problem resolution.
#engineering #documentation
"Cannot create entity X", "Connection to service Y failed", "Cannot read file Z".
These are typical error messages seen in many systems. And they are extremely bad.
So what's wrong with these messages? They are not actionable, they don't have any details what exactly went wrong (only for connection issue I can easily generate up to 10 reasons), they don't explain users or support engineers what to do next.
Good error message should:
- Be actionable
- Be detailed and clear
- Deliver the best user experience
- Enable users to help themselves
- Reduce support workload
- Speed up issue resolution
Google has a special chapter in their technical writing course about how to write good error messages. So let's check their recommendations:
✏️ Don't fail silently. Failing to report errors is unacceptable. Assume that humans will make mistakes using your software. Try to minimize ways for people to misuse your software, but assume that you can't completely eliminate that.
✏️ Have a common style guide. Examples: Google API Error Handling, Go Error Handling
✏️ Do not swallow the root cause. Generic messages like "Server error" don’t help users understand or fix the issue.
✏️ Fail fast. Report errors as soon as they occur. Raising them later significantly increases debugging costs.
✏️ Identify the cause. Clearly explain what went wrong. Help users understand requirements and constraints. Be specific. Don't assume that users know the limitations of your system.
✏️ Explain how to fix the problem. Create actionable error messages. After explaining the cause of the problem, explain how to resolve it.
Example:
❌ Invalid input.
✅ Enter the pathname of a Windows executable file. An executable file ordinarily ends with the .exe suffix. For example: C:\Program Files\Custom\AppName.exe
Take time to train your team to write good error messages, it improves user experience, reduces support costs and speed up problem resolution.
#engineering #documentation
👍7
Data Vectorization
Let’s talk about one of the most fascinating parts of machine learning - data vectorization.
At first glance, it seems like ML models work directly with the raw data we provide. But actually, they don't. ML algorithms need a numerical representation of data. Specifically, they require data in the form of floating-point values called a feature vector.
However, many features are naturally strings or other non-numerical values. The task is to transform these non-numerical values into numerical ones. That's the main purpose of the feature engineering discipline.
Let me illustrate this with the vectorization of categorical data.
Categorical data consists of specific, predefined values, like car colors, animal species, days of the week, or city street names. It can be low-dimensional (few possible values) or high-dimensional (many possible values).
For low-dimensional data, we can encode it as a vocabulary. Let’s use car colors as an example with 5 categories for simplicity: white, blue, red, black, and others (for any color not in the list).
To create a vector, the following steps should be done:
1. Index each value:
- 0: white
- 1: blue
- 2: red
- 3: black
- 4: others
2. Represent each category as a vector (array) of N elements, where N is the number of categories.
|Feature|White|Blue|Red|Black|Others|
|------------|---------|-------|------|--------|---------|
|White | 1 | 0 | 0 | 0 | 0 |
|Blue | 0 | 1 | 0 | 0 | 0 |
|Red | 0 | 0 | 1 | 0 | 0 |
|Black | 0 | 0 | 0 | 1 | 0 |
|Others | 0 | 0 | 0 | 0 | 1 |
3. Convert to floating-point values: replace 1 with 1.0 and 0 with 0.0. For example, the vector for blue would be (0.0, 1.0, 0.0, 0.0, 0.0).
Of course, that's a very basic example to illustrate the concept. Real-world cases often involve more complex transformations and math models, depending on the data and problem.
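Here's a minimal sketch of these three steps in Python (NumPy):

```python
import numpy as np

# Step 1: index each category.
vocabulary = ["white", "blue", "red", "black", "others"]
index = {color: i for i, color in enumerate(vocabulary)}

def one_hot(color: str) -> np.ndarray:
    # Steps 2-3: an N-element vector of 0.0s with a single 1.0 at the category index.
    vector = np.zeros(len(vocabulary), dtype=np.float64)
    vector[index.get(color, index["others"])] = 1.0  # colors not in the list -> "others"
    return vector

print(one_hot("blue"))   # [0. 1. 0. 0. 0.]
print(one_hot("green"))  # not in the list, maps to "others": [0. 0. 0. 0. 1.]
```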
More details and vectorization strategies:
- Working with numerical data
- Working with categorical data
- Datasets, generalization, and overfitting
#aibasics
Teams with high quality documentation are 2.4 times more likely to improve software delivery performance and meet reliability goals.
We all understand the importance of good documentation. But how often do you feel frustrated trying to understand how a feature works? Trying to find a solution to your problem? Checking the source code instead of the docs? Probably often.
Let's start with the difference between good and bad documentation. According to DORA, high-quality documentation should:
- Help users achieve their goals.
- Be accurate, up-to-date, and complete.
- Be easy to find, well-organized, and clear.
Tips to improve documentation:
✏️ Document Critical Use Cases. Clear use cases help users understand how to use your systems effectively.
✏️ Create Documentation Guidelines. When team members know when and how to make updates or remove inaccurate information, the team can maintain documentation quality over time. Check out Google’s Documentation Guideline.
✏️ Automate Guidelines Verification. Automate formatting and style checks with tools like Prettier.
✏️ Assign Ownership. Define clear responsibility and ownership for the documentation.
✏️ Define the Audience. Understand who will read your documentation, their background, and their goals.
✏️ Integrate into Development Process. Documentation should be part of your development process and stored close to the code. Documentation that is separate from development becomes dead right after publishing.
✏️ Automate testing for code samples and checks for incomplete documentation (see the doctest sketch after this list).
✏️ Train Your Team. Teach your team how to write good documentation and explain why it's important. I strongly recommend checking Google's technical writing course for developers.
✏️ Recognize Documentation Work. Recognize and reward documentation efforts during performance reviews and promotions. Writing and maintaining documentation is a core part of software engineering work, and treating it as such improves its quality.
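For example, Python's built-in doctest module runs the examples embedded in docstrings and fails when their output drifts from what's documented (a minimal sketch; the converter function is made up):

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius.

    The examples below are documentation and tests at the same time:

    >>> fahrenheit_to_celsius(212)
    100.0
    >>> fahrenheit_to_celsius(32)
    0.0
    """
    return (f - 32) * 5 / 9

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails loudly if the documented examples no longer match the code
```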
Writing good documentation isn’t easy. It requires clear processes, training, automation, and the right team culture. But investing in documentation improves your team’s development speed and overall delivery performance.
#documentation
Embeddings
In the Data Vectorization post, we learned that ML algorithms work with feature vectors. For example, we had a vector for car colors, where a blue car was represented as (0.0, 1.0, 0.0, 0.0, 0.0).
Imagine we create feature vectors for meal items in a dataset with 5,000 different elements. Each vector would have 5,000 elements, all set to 0.0 except for one element set to 1.0. This approach would require a high number of weights and a lot of memory and computational resources, making the model inefficient and hard to maintain.
To optimize this, embedding techniques are used. An embedding is a projection of the high-dimensional space of the initial data vectors into a lower-dimensional space.
For example, in the meal dataset, we could introduce a feature like "sandwichness" that evaluates how likely an item is to be a sandwich. A sandwich might have a score of 0.99, shawarma 0.9, and soup 0.0. But one feature isn't enough, so we could add other dimensions, like "dessertness", "veganness", or "liquidness", to better describe each item.
With features like sandwichness, dessertness, and liquidness, the vector for a hotdog might look like (0.95, 0.8, 0.0). Real-world models use many dimensions, but these vectors are much shorter and more efficient than the original 5,000-element vectors containing only 0.0 and 1.0 values.
ML practitioners select dimensions based on the task they want to solve. This means that embeddings for the same items can differ depending on the context, task, and provided data.
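A toy sketch of the projection idea: an embedding matrix maps each of the 5,000 vocabulary items to a short dense vector, so projecting a one-hot vector reduces to a single row lookup (the matrix values below are random stand-ins; real embeddings are learned during training):

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE, EMBEDDING_DIM = 5000, 3   # e.g. (sandwichness, dessertness, liquidness)

# In a real model this matrix is learned; random values stand in here.
embedding_matrix = rng.normal(size=(VOCAB_SIZE, EMBEDDING_DIM))

hotdog_id = 42                        # hypothetical index of "hotdog" in the vocabulary
one_hot = np.zeros(VOCAB_SIZE)
one_hot[hotdog_id] = 1.0

dense = one_hot @ embedding_matrix    # (5000,) -> (3,): projection is a matrix multiply
assert np.allclose(dense, embedding_matrix[hotdog_id])  # ...which is just a row lookup
print(dense)                          # a compact 3-element vector instead of 5,000
```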
References:
- What are Embeddings in ML?
- ML Google Crash Course: Embeddings
#aibasics
Happy New Year!
Another year has come to an end.
For me, it was a hard and challenging year. But without a doubt, I've learned a lot, grown, and I'm ready to move forward.
I wish you health and inner harmony. Take care of yourself, your family, your friends, and maintain a balance between work and personal life.
May the coming year be better than the last! Dream, plan, achieve, learn, experiment, love, move forward and enjoy every moment!
Happy New Year, Friends!🎄🎄🎄
Why Empathy Matters
Conflicts typically happen when our expectations don't match the actions or expectations of others. Here's the key question: if the expectations are ours, why do we blame someone else for not meeting them? All people are different, with different values, priorities, principles, backgrounds, and life situations. And that's fine.
Most conflicts could be avoided if we discussed shared rules upfront that both sides agree to follow.
But what should we do if we already missed that part and are in a conflict situation?
Here are some tips to follow:
✏️ Focus on understanding, not blaming: Explain your position and try to understand the other person’s point of view. Operate with facts, not emotions and personalities. Look for win-win areas where both sides can come to an agreement and benefit from it.
✏️ Listen and show empathy: Carefully listen to what the other person is saying. Try to understand their motivation, and accept their feelings and fears. Remember that people always act in the best possible way for themselves. Show that you're on their side, not against them. Often, people don't hear each other's arguments because there isn't enough trust or interest in each other's opinions.
✏️ Work together to find a solution: Once you understand both sides of the conflict, think about a solution that satisfies the requirements and interests of both parties, and start working together in that direction. Collaboration helps establish trust and achieve better results.
Empathy is a powerful tool. It's more efficient for the business to reach an agreement rather than proving someone is smarter. Remember, conflicts can be very expensive.
#softskills #communications