Forwarded from AWS Notes (Roman Siewko)
🔥 FREE premium exam prep on AWS Skill Builder until Jan 5, 2026!
https://skillbuilder.aws/
🎓 𝗖𝗼𝘃𝗲𝗿𝘀:
🔸AWS Certified Cloud Practitioner (CLF-C02)
🔸AWS AI Practitioner
💡 𝗪𝗵𝗮𝘁 𝘆𝗼𝘂 𝗴𝗲𝘁 (𝗻𝗼𝗿𝗺𝗮𝗹𝗹𝘆 𝗽𝗮𝗶𝗱):
✅ Official practice exams
✅ Hands-on labs (SimuLearn)
✅ AWS Escape Room (learning by playing)
✅ Flashcards & learning plans
Plus, there are always-free resources:
• Official practice questions
• Free AWS training events
• AWS Educate (labs + potential free exam vouchers)
#AWS_certification
🔥3
This post compares Amazon EKS Auto Mode and Azure AKS Automatic, evaluating which platform offers a superior managed Kubernetes solution. While acknowledging AWS's progress, the author ultimately argues that AKS Automatic's more comprehensive, end-to-end automation makes it the clear winner for a truly hands-off experience.
https://pixelrobots.co.uk/2024/12/amazon-eks-auto-mode-vs-azure-aks-automatic-the-better-managed-kubernetes-solution/
This paper delves into disaster recovery architectures that go beyond simple high availability to ensure systems remain operational even when HA fails. Yakaiah Bommishetti outlines various DR strategies, from cold backups to active-active multi-site setups, emphasizing the critical difference between preventing failures and restoring services after a catastrophe.
https://hackernoon.com/beyond-high-availability-disaster-recovery-architectures-that-keep-running-when-ha-fails
Hackernoon
Beyond High Availability: Disaster Recovery Architectures That Keep Running When HA Fails
High Availability is not Disaster Recovery. This in-depth guide explores real-world Disaster Recovery architectures.
❤🔥3❤2
DevOps & SRE notes
Cloudflare, again
Will the "Code Orange" help Cloudflare?
https://blog.cloudflare.com/fail-small-resilience-plan/
The Cloudflare Blog
Code Orange: Fail Small — our resilience plan following recent incidents
We have declared “Code Orange: Fail Small” to focus everyone at Cloudflare on a set of high-priority workstreams with one simple goal: ensure that the cause of our last two global outages never happens again.
🤣4👍2🔥1
A set of modern Grafana dashboards for Kubernetes.
https://github.com/dotdc/grafana-dashboards-kubernetes
GitHub
GitHub - dotdc/grafana-dashboards-kubernetes: A set of modern Grafana dashboards for Kubernetes.
A set of modern Grafana dashboards for Kubernetes. - dotdc/grafana-dashboards-kubernetes
👍7💩1
This case study examines the build-versus-buy decision for Terraform CI/CD orchestration by analyzing a custom-built tool called Terraflow. The author reflects on the trade-offs between creating a bespoke solution that perfectly fits a specific workflow and the opportunity cost of diverting engineering resources from core business features.
https://terrateam.io/blog/build-vs-buy-terraflow-case-study
Terrateam
👍4❤2
This tutorial guides readers through building a unified OpenTelemetry pipeline in Kubernetes to correlate metrics, logs, and traces. Fatih Koç explains how to deploy the OTel Collector as both a DaemonSet and a gateway to centralize enrichment and sampling, ultimately reducing incident resolution time.
https://fatihkoc.net/posts/opentelemetry-kubernetes-pipeline/
Fatih Koç
Building a Unified OpenTelemetry Pipeline in Kubernetes
Deploy OpenTelemetry Collector in Kubernetes to unify metrics, logs, and traces with correlation, smart sampling, and insights for faster incident resolution.
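As a rough illustration of the agent-to-gateway pattern the article describes (the gateway endpoint and namespace names here are assumptions, not taken from the post), a node-level Collector can receive OTLP from local workloads and forward everything to a central gateway that handles enrichment and sampling:

```yaml
# Minimal Collector config for the node-agent side of an agent -> gateway
# pipeline. The gateway Service name is hypothetical.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}            # batch before forwarding to reduce network overhead
exporters:
  otlp:
    endpoint: otel-gateway.observability.svc:4317   # central gateway (assumed)
    tls:
      insecure: true   # in-cluster traffic; use TLS in production
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

The gateway runs the same binary with a different config, typically adding tail sampling and attribute enrichment before exporting to the backend.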
👍5
This documentation demystifies the structure of Kubernetes YAML files by breaking them down into their three core components: metadata, spec, and status. It explains how users define the desired state in the spec, while Kubernetes continuously works to align the actual status with that intent through its reconciliation loop.
https://medium.com/@thisara.weerakoon2001/demystifying-kubernetes-yaml-ef9e92acf3df
Medium
Demystifying Kubernetes YAML
In the world of Kubernetes, YAML files are the bread and butter. They are the declarative way you tell Kubernetes what you want your…
👍3
This engineering publication from DoubleVerify presents a case study on synchronizing database schema updates across multiple projects and environments. The team developed a solution using a shared, standalone schema migrations repository and Kubernetes pre-install hooks to automate and coordinate the process.
https://medium.com/doubleverify-engineering/a-case-study-in-synchronizing-database-schema-updates-between-projects-and-environments-a69a3cc38985
Medium
A Case Study in Synchronizing Database Schema Updates between Projects and Environments
Written By: Chaim Leichman
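For context, the "pre-install hooks" mentioned are Helm hooks; a minimal sketch of a migration Job wired to them (the image name and command are hypothetical, not from the article):

```yaml
# A Helm hook Job that runs schema migrations before the application
# pods are installed or upgraded.
apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: registry.example.com/db-migrations:latest  # assumed image
        command: ["migrate", "up"]                        # assumed command
```

Because the hook runs to completion before the release's regular resources are applied, every environment gets the schema update before the application code that depends on it.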
👍3❤2
eBPF based cloud-native load-balancer for Kubernetes|Edge|Telco|IoT|XaaS.
https://github.com/loxilb-io/loxilb
GitHub
GitHub - loxilb-io/loxilb: eBPF based cloud-native load-balancer for Kubernetes|Edge|Telco|IoT|XaaS.
eBPF based cloud-native load-balancer for Kubernetes|Edge|Telco|IoT|XaaS. - loxilb-io/loxilb
👍2🔥1
Kubernetes v1.35: Timbernetes — Only the Important Parts (Part 1): Deprecations, removals
Removal of cgroup v1 support
Cgroup v2 is now the modern standard, and Kubernetes is ready to retire legacy cgroup v1 support in v1.35. This is an important notice for cluster administrators: if you are still running nodes on older Linux distributions that don't support cgroup v2, your `kubelet` will fail to start. To avoid downtime, you will need to migrate those nodes to systems where cgroup v2 is enabled.
Deprecation of ipvs mode in kube-proxy
The `ipvs` backend has become a maintenance burden, so Kubernetes v1.35 deprecates `ipvs` mode. Although the mode remains available in this release, `kube-proxy` will now emit a warning on startup when configured to use it.
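If you still run `ipvs`, switching backends is a one-line change in the kube-proxy configuration (the `nftables` mode has been GA since v1.33, and `iptables` remains the default):

```yaml
# KubeProxyConfiguration: move off the deprecated ipvs backend.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "nftables"   # or "iptables"; previously "ipvs"
```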
Final call for containerd v1.X
While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases, this is the final version with such support. The SIG Node community has designated v1.35 as the last release to support the containerd v1.X series.
👍5🔥3👏2
Not so long ago, I posted news about moving CDK for Terraform to read-only mode, and now I think this outcome was inevitable.
- Programming languages are not well suited for describing infrastructure because they provide too much flexibility.
- Different companies can use different programming languages to describe essentially the same infrastructure.
- The entry barrier becomes higher: a DevOps engineer now needs to understand code written by someone else. We already have problems with code smells in application development, and this problem will be no better when it comes to infrastructure description.
- HCL is not perfect, but it is more straightforward. Terraform has become a de facto standard, and even its fork is not very popular (why change something if everything works?). The IaC world is generally inert.
- The market is already occupied by Pulumi, so to succeed you would need to be significantly better—but you can’t.
For all these reasons, CDK for Terraform never became popular. The same thing will likely happen to AWS CDK sooner or later.
https://news.1rj.ru/str/devops_sre_notes/2567
👍11💯4🔥2❤1👌1
Kubernetes v1.35: Timbernetes — Only the Important Parts (Part 2): Features Graduating to Stable
Stable: In-place update of Pod resources
This feature allows users to adjust CPU and memory resources without restarting Pods or Containers.
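A sketch of how a Pod can opt into restart-free resizing via per-resource `resizePolicy` (the Pod and container names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resizable
spec:
  containers:
  - name: app
    image: nginx
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # CPU can change without a restart
    - resourceName: memory
      restartPolicy: RestartContainer # memory changes restart this container
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
```

Resizes are applied through the Pod's `resize` subresource, e.g. via `kubectl patch --subresource resize`.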
PreferSameNode traffic distribution
A new option, `PreferSameNode`, lets Services strictly prioritize endpoints on the local node when available.
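Enabling it is a single field on the Service spec (names here are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: local-first
spec:
  selector:
    app: my-app
  ports:
  - port: 80
  trafficDistribution: PreferSameNode  # fall back to other nodes only if
                                       # no local endpoint is available
```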
Job API managed-by mechanism
The Job API now includes a `managedBy` field that allows an external controller to handle Job status synchronization.
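When `managedBy` is set to anything other than the default built-in controller, Kubernetes leaves the Job's status untouched for the named controller to manage. A sketch (the controller name is hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: external-job
spec:
  managedBy: example.com/custom-controller  # built-in controller skips this Job
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo done"]
```

This is the mechanism multi-cluster schedulers use to mirror Jobs without the local controller fighting over status updates.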
Reliable Pod update tracking with .metadata.generation
Every time a Pod's `spec` is updated, the `.metadata.generation` value is incremented.
Configurable NUMA node limit for topology manager
Cluster administrators who enable it can use servers with more than 8 NUMA nodes.
❤6👍4🔥2
Kubernetes v1.35: Timbernetes — Only the Important Parts (Part 3): New Features in Beta
Pod certificates for workload identity and security
Native workload identity with automated certificate rotation.
Expose node topology labels via Downward API
The `kubelet` can now inject standard topology labels, such as `topology.kubernetes.io/zone` and `topology.kubernetes.io/region`, into Pods as environment variables or projected volume files.
Native support for storage version migration
With this release, the built-in controller automatically handles update conflicts and consistency tokens, providing a safe, streamlined, and reliable way to ensure stored data remains current with minimal operational overhead.
Mutable Volume attach limits
CSINode.spec.drivers[*].allocatable.count is now mutable, so a node's available volume attachment capacity can be updated dynamically.
Opportunistic batching
The batching mechanism consists of two operations that can be invoked whenever needed - create and nominate. Create leads to the creation of a new set of batch information from the scheduling results of Pods that have a valid signature. Nominate uses the batching information from create to set the nominated node name from a new Pod whose signature matches the canonical Pod’s signature.
maxUnavailable for StatefulSets
You can use it to define the maximum number of pods that can be unavailable during an update.
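The field lives under the rolling-update strategy, alongside the existing `partition` (names here are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 5
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # up to 2 pods may be down at once during the rollout
```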
Configurable credential plugin policy in kuberc
kuberc gains additional functionality which allows users to configure credential plugin policy.
KYAML
KYAML is a safer and less ambiguous subset of YAML designed specifically for Kubernetes.
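For a flavor of the syntax, an illustrative snippet: KYAML uses flow-style maps with explicit braces, double-quoted strings, and trailing commas, which sidesteps YAML's indentation and type-coercion pitfalls while remaining valid YAML:

```yaml
{
  apiVersion: "v1",
  kind: "ConfigMap",
  metadata: {
    name: "example-config",
  },
  data: {
    enabled: "true",   # always quoted, so never coerced to a boolean
  },
}
```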
Configurable tolerance for HorizontalPodAutoscalers
This enhancement allows users to define a custom tolerance window on a per-resource basis within the HPA `behavior` field.
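A sketch of the field placement per the enhancement proposal (the workload names and values are assumptions): tolerance sits inside `behavior`, so scale-up and scale-down can use different thresholds:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      tolerance: 0.05   # ignore metric deviations under 5% when scaling down
```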
Support for user namespaces in Pods
Kubernetes is adding support for user namespaces, allowing pods to run with isolated user and group ID mappings instead of sharing host IDs.
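Opting in is a single field on the Pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-pod
spec:
  hostUsers: false   # run in a user namespace with remapped UIDs/GIDs
  containers:
  - name: app
    image: busybox
    command: ["id"]  # root inside the pod maps to an unprivileged host UID
```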
VolumeSource: OCI artifact and/or image
Support for the `image` volume type allowing Pods to declaratively pull and unpack OCI container image artifacts into a volume. This lets you package and deliver data-only artifacts such as configs, binaries, or machine learning models using standard OCI registry tools.
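A sketch of mounting an OCI artifact as a read-only volume (the registry reference is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-consumer
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: model
      mountPath: /models        # artifact contents appear here, read-only
  volumes:
  - name: model
    image:
      reference: registry.example.com/models/my-model:v1  # assumed artifact
      pullPolicy: IfNotPresent
```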
Enforced `kubelet` credential verification for cached images
This KEP introduces a mechanism where the `kubelet` enforces credential verification for cached images. Before allowing a Pod to use a locally cached image, the `kubelet` checks if the Pod has the valid credentials to pull it.
Fine-grained Container restart rules
Kubernetes v1.35 adds `restartPolicy` and `restartPolicyRules` to the container API itself. This allows users to define restart strategies for individual regular and init containers that operate independently of the Pod's overall policy.
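A sketch of the intended shape while the feature is in beta (exact field names may still shift): a container restarted only on one specific exit code, regardless of the Pod-level policy:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: retry-on-42
spec:
  restartPolicy: Never          # Pod-level policy
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "exit 42"]
    restartPolicy: Never        # container-level default
    restartPolicyRules:         # rules take precedence over the policies above
    - action: Restart
      exitCodes:
        operator: In
        values: [42]            # restart only on exit code 42
```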
CSI driver opt-in for service account tokens via secrets field
Kubernetes v1.35 introduces an opt-in mechanism for CSI drivers to receive ServiceAccount tokens via the dedicated secrets field in the NodePublishVolume request.
Deployment status: count of terminating replicas
Kubernetes v1.35 promotes the `terminatingReplicas` field within the Deployment status to beta. This field provides a count of Pods that have a deletion timestamp set but have not yet been removed from the system.
👍2🔥2❤1❤🔥1
Upgrading a critical database cluster often involves anxiety, but this practical guide outlines a method to update PostgreSQL without losing data or incurring significant downtime. It covers the essential command-line steps and verification processes needed for a smooth transition.
https://palark.com/blog/postgresql-upgrade-no-data-loss-downtime/
https://palark.com/blog/postgresql-upgrade-no-data-loss-downtime/
Palark
Upgrading PostgreSQL with no data loss and minimal downtime | Tech blog | Palark
A technical story of upgrading a production PostgreSQL cluster from v13 to v16. It focuses on high availability and minimal downtime.
👍5