How to get into cloud/devops within 2-3 years of experience in Infrastructure Administration (Virtualization)
I'm currently working at a service-based company and my project is mainly about virtualization using vSphere and Nutanix. I find cloud computing interesting and I've been trying to self-learn: improving my Bash scripting skills by doing projects and acquiring certifications. But the issue I face is: how can I transition from a Virtualization Engineer role to a cloud computing role without much hands-on experience? Would working on projects on my own count? Every job opening seems to require 4+ years of experience. What are my best options? Switching internally to a cloud-based project and then trying to switch companies?
What would be a good roadmap into cloud? At times I feel like I'm going around in circles without a definitive plan: it feels like I need to master Bash, move on to automating things with Python, then learn Docker, Kubernetes, Terraform, Jenkins, etc. Sometimes it feels overwhelming, but I really want to crack it. I just need some advice.
Could you please help me out?
https://redd.it/1pqf0tm
@r_devops
Where can I host an API for free so a friend can pentest it?
Hey guys, I want to ask something.
I have an API built using Golang, and I want to host it so my friend can test it. He’s a pen tester, and I want to give him access to the API endpoint rather than sharing my API folders and source files right away.
The problem is, I’m not sure where to host it for free, just for testing purposes. This is mainly for security testing, not production.
Do you have any recommendations for free platforms or setups to host a Go API temporarily for testing?
Thanks in advance!
https://redd.it/1pqi9aa
@r_devops
Who's responsible for contract testing on your team?
We are just starting off with contract testing in our organization and would love your inputs on which team typically owns the effort.
https://redd.it/1pqj775
@r_devops
Resistance against implementing "automation tools"
Hi all,
I'm seeing the same pattern in different companies: IT/DevOps teams are mostly doing old-school manual deployment and post-configuration.
This seems to be driven by a few factors: time pressure, inertia, lack of understanding from management, or silos where some teams already use automation tools while others just carry on without them.
Have you seen this?
It's backfiring, as people fall out of touch with the market. Plus, learning is left to their free time and own determination, which doesn't help either.
https://redd.it/1pqk6m6
@r_devops
Content Delivery Network (CDN) - what difference does it really make?
It's a system of distributed servers that delivers content to users/clients based on their geographic location: requests are handled by the closest server. This proximity naturally reduces latency and improves speed/performance by caching content at various locations around the world.
It makes sense in theory but curiosity naturally draws me to ask the question:
>ok, there must be a difference between this approach and serving files from a single server, located in only one area - but what's the difference exactly? Is it worth the trouble?
**What I did**
Deployed a simple frontend application (`static-app`) with a few assets to multiple regions. I used DigitalOcean as the infrastructure provider, but you can obviously use something else. I chose the following regions:
* **fra** \- Frankfurt, Germany
* **lon** \- London, England
* **tor** \- Toronto, Canada
* **syd** \- Sydney, Australia
Then, I've created the following droplets (virtual machines):
* static-fra-droplet
* test-fra-droplet
* static-lon-droplet
* static-tor-droplet
* static-syd-droplet
Then, `static-app` was deployed to each *static* droplet, serving a few static assets using Nginx. On *test-fra-droplet*, `load-test` was running; I used it to make lots of requests to droplets in all regions and compare the results, to see what difference a CDN makes.
Approximate distances between locations, in a straight line:
* Frankfurt - Frankfurt: \~ as close as it gets on the public Internet, the best possible case for CDN
* Frankfurt - London: \~ 637 km
* Frankfurt - Toronto: \~ 6 333 km
* Frankfurt - Sydney: \~ 16 500 km
Of course, distance isn't everything; network connectivity between regions varies too, but we don't control that. Distance is the one thing we can objectively compare.
**Results**
**Frankfurt - Frankfurt**
* Distance: as good as it gets, same location basically
* Min: 0.001 s, Max: 1.168 s, Mean: 0.049 s
* **Percentile 50 (Median): 0.005 s**, Percentile 75: 0.009 s
* **Percentile 90: 0.032 s**, Percentile 95: 0.401 s
* Percentile 99: 0.834 s
**Frankfurt - London**
* Distance: \~ 637 km
* Min: 0.015 s, Max: 1.478 s, Mean: 0.068 s
* **Percentile 50 (Median): 0.020 s**, Percentile 75: 0.023 s
* **Percentile 90: 0.042 s**, Percentile 95: 0.410 s
* Percentile 99: 1.078 s
**Frankfurt - Toronto**
* Distance: \~ 6 333 km
* Min: 0.094 s, Max: 2.306 s, Mean: 0.207 s
* **Percentile 50 (Median): 0.098 s**, Percentile 75: 0.102 s
* **Percentile 90: 0.220 s**, Percentile 95: 1.112 s
* Percentile 99: 1.716 s
**Frankfurt - Sydney**
* Distance: \~ 16 500 km
* Min: 0.274 s, Max: 2.723 s, Mean: 0.406 s
* **Percentile 50 (Median): 0.277 s**, Percentile 75: 0.283 s
* **Percentile 90: 0.777 s**, Percentile 95: 1.403 s
* Percentile 99: 2.293 s
*for all cases, 1000 requests were made at a rate of 50 r/s*
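As a sanity check on the minimums above: a round trip can't beat the speed of light in fiber, roughly 200,000 km/s. A quick sketch (my own back-of-the-envelope, not part of the original test harness) comparing that physical floor to the observed mins:

```python
# Physical floor on round-trip time: light in optical fiber travels at
# roughly 200,000 km/s, so RTT >= 2 * distance / 200,000 (straight-line
# distance, ignoring routing detours and processing delays).
C_FIBER_KM_S = 200_000

def rtt_floor_s(distance_km: float) -> float:
    """Minimum possible round-trip time in seconds over straight-line fiber."""
    return 2 * distance_km / C_FIBER_KM_S

for city, dist_km, observed_min_s in [
    ("London", 637, 0.015),
    ("Toronto", 6_333, 0.094),
    ("Sydney", 16_500, 0.274),
]:
    print(f"{city}: floor ~{rtt_floor_s(dist_km):.3f} s, observed min {observed_min_s} s")
```

The observed minimums sit at roughly 1.5-2x the straight-line floor, plausible given that real cables and routes are far from straight. No amount of server tuning removes that distance penalty, which is exactly what a CDN's edge locations sidestep.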
If you want to reproduce the results and play with it, I have prepared all relevant scripts on my GitHub: [https://github.com/BinaryIgor/code-examples/tree/master/cdn-difference](https://github.com/BinaryIgor/code-examples/tree/master/cdn-difference)
https://redd.it/1pql6h1
@r_devops
Data analytics or full stack ?
I come from a lower-middle-class family, so which field should I go into where I can get a high salary package and, most importantly, where freshers can get a job quickly without experience? If I do full stack, I'll later become an SDE; if I do data analytics, I'll become a data scientist or an AI/ML engineer. Where will freshers find jobs? I can wait up to 10 months to find one.
Where will I get the highest package?
Tell me, guys.
https://redd.it/1pqnsh3
@r_devops
Liftbridge is back: Lightweight message streaming for distributed systems
Tyler Treat's Liftbridge project has been transferred to Basekick Labs for continued maintenance. It's been dormant since 2022, and we're reviving it.
TL;DR: Durable message streaming built on NATS. Think Kafka's log semantics in a Go binary.
Technical Overview:
Liftbridge sits alongside NATS and persists messages to a replicated commit log. Key design decisions:
\- Dual consensus model: Raft for cluster metadata, ISR (Kafka-style) for data replication. Avoids writing messages to both a Raft log and message log (like NATS Streaming did).
\- Commit log structure: Append-only segments with offset and timestamp indexes. Memory-mapped for fast lookups.
\- NATS integration: Can subscribe to NATS subjects and persist transparently (zero client changes), or use gRPC API for explicit control.
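To make the commit-log idea concrete, here's a toy sketch (my illustration, not Liftbridge's actual implementation, which is Go with memory-mapped segment files): an append-only segment plus an offset index for constant-time lookups.

```python
class CommitLogSegment:
    """Toy append-only log segment: length-prefixed records plus an offset index."""

    def __init__(self):
        self._buf = bytearray()  # the append-only byte log
        self._index = []         # offset -> byte position of each record

    def append(self, payload: bytes) -> int:
        """Append a record; returns its offset."""
        offset = len(self._index)
        self._index.append(len(self._buf))
        # Length-prefix each record so reads know where it ends.
        self._buf += len(payload).to_bytes(4, "big") + payload
        return offset

    def read(self, offset: int) -> bytes:
        """O(1) lookup via the offset index."""
        pos = self._index[offset]
        size = int.from_bytes(self._buf[pos:pos + 4], "big")
        return bytes(self._buf[pos + 4:pos + 4 + size])
```

The real thing adds timestamp indexes, segment rollover, and ISR replication, but the core access pattern is the same: sequential appends, cheap offset lookups, replayable history.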
Why this matters:
IBM's $11B Confluent acquisition has teams looking at alternatives. Liftbridge fills a gap: lighter than Kafka, more durable than plain NATS.
Useful for:
\- Edge computing (IoT, retail, industrial)
\- Go ecosystems wanting native tooling
\- Teams needing replay/offset semantics without JVM ops
What's next:
Modernizing the codebase (Go 1.25+, updated deps), security audit, and first release in January.
GitHub: https://github.com/liftbridge-io/liftbridge
Technical details: https://basekick.net/blog/liftbridge-joins-basekick-labs
Happy to answer questions about the architecture.
https://redd.it/1pqpe3z
@r_devops
Help with EKS migration from cloudformation to terraform
Hi all,
I am currently working on a project where I want to set up a new environment in a new account. Before, we used CloudFormation templates, but I've always liked IaC, so I wanted to learn something and decided to use Terraform. My DevOps and cloud engineering knowledge is rather limited, as I am mostly a full-stack dev. Regardless, I decided to first import everything from Env A and then apply it to Env B. That worked quite well, except for the EKS load balancer.
For EKS we used eksctl in CloudShell and just configured it that way. Later we connected to the cluster via a bastion host and added Helm, eks-charts, and the AWS Load Balancer Controller. First I imported just the cluster, nodes, and load balancer, but a target group was not created; then I imported the target group, but it's not connecting to the load balancer and the nodes.
I also tried the EKS module from AWS, but it can't find the subnets of the VPC even though I add them directly as an array (everywhere else that works).
TL;DR: What I now need help with is finding resources. It's holiday season, and while I don't have to work, I want to read some material and finally understand how to set up an EKS cluster in a VPC with a correctly working load balancer and target group, with the nodes linked via IP address. THANK YOU VERY MUCH (and happy holidays).
EDIT: you can also recommend some books for me
https://redd.it/1pqq6jq
@r_devops
What is one piece of complexity in your stack that you would happily remove if you could?
More teams are quietly stepping back from complexity. Not because they cannot handle it, but because they are tired of it.
Distributed systems are powerful. They are also exhausting. I hear more engineers saying they want systems they can reason about at 2am.
This shows up in small ways. Fewer services. Clearer boundaries. More boring tech. And that is meant as a compliment.
It also shows up in how teams think about reliability. Not chasing five nines, but aiming for fast recovery and clear failure modes.
Observability has helped here. When systems tell you what they are doing, you do not need as many layers of abstraction.
https://redd.it/1pqrijg
@r_devops
when high eCPMs trick you into thinking a network performs well
I used to chase the “top” network by looking at eCPM alone. Big mistake. One partner showed crazy eCPM on paper, but the fill was so low that real revenue flatlined.
The wake-up call was a week in India where a “lower” network filled most of the requests and beat the fancy one on ARPU. I removed the high-eCPM one for two days and ARPU jumped. Felt kinda stupid, ngl.
Now I test for at least a week unless stuff breaks. I watch retention, session drops, and uninstall spikes, not only eCPM. I also added extra placements ahead of time and toggle them remotely, which saves time and helps me test quick ideas without rebuilding.
If you're stuck with unstable revenue, I'd look at ARPU, fill, and session length together, not only eCPM.
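The arithmetic behind the "crazy eCPM, flat revenue" trap is simple: what you actually earn per 1000 ad *requests* is eCPM multiplied by fill rate. A tiny illustration with made-up numbers:

```python
def effective_rpm(ecpm: float, fill_rate: float) -> float:
    """Revenue per 1000 ad requests, not per 1000 filled impressions."""
    return ecpm * fill_rate

# "fancy" network: great eCPM on paper, terrible fill (illustrative figures)
fancy = effective_rpm(ecpm=12.0, fill_rate=0.15)   # ~1.80 per 1000 requests
# "lower" network: modest eCPM, strong fill
steady = effective_rpm(ecpm=3.0, fill_rate=0.85)   # ~2.55 per 1000 requests
print(f"fancy: {fancy:.2f}, steady: {steady:.2f}")
```

With numbers like these, the "lower" network pays out more per request despite an eCPM a quarter the size, which is the same effect as the India week above.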
https://redd.it/1pqpf3q
@r_devops
🔥 Avatar: Fire & Ash — Deployed a Playwright cron job that triggers email alerts
Deployed a Playwright script with secrets in env vars (SMTP creds), added cron scheduling, and used logs to confirm runs. The trigger condition is a CTA state change → email alert.
Repo: https://github.com/kannasaikiran/dummy-ticket-bot
Demo reel: https://www.instagram.com/reel/DSdwoD-idkF/?igsh=amgwaXhyZzhyaGIx
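For anyone wanting to build something similar, here's a hedged Python outline of the same shape (not the repo's actual code): SMTP creds from env vars, a pure state-change check, and the browser scrape isolated in one function. The URL, selector, and env var names are placeholders.

```python
import os
import smtplib
from email.message import EmailMessage

def cta_changed(previous: str, current: str) -> bool:
    # Fire only on a real transition, not on every cron run.
    return bool(current) and current != previous

def send_alert(subject: str, body: str) -> None:
    # SMTP credentials come from env vars, never from the source tree.
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = os.environ["SMTP_USER"]
    msg["To"] = os.environ["ALERT_TO"]
    msg.set_content(body)
    with smtplib.SMTP(os.environ["SMTP_HOST"], 587) as s:
        s.starttls()
        s.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        s.send_message(msg)

def read_cta_state() -> str:
    # Imported lazily so the helpers above are testable without a browser.
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com/tickets")  # placeholder URL
        state = page.inner_text("#booking-cta")   # placeholder selector
        browser.close()
    return state

def run_once(previous: str) -> str:
    """One cron tick: scrape, compare, alert; returns current state to persist."""
    current = read_cta_state()
    if cta_changed(previous, current):
        send_alert("CTA changed", f"{previous!r} -> {current!r}")
    return current
```

One detail a real deployment has to solve: the last-seen state must be persisted between cron runs (a state file or small DB), otherwise every run looks like a change.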
https://redd.it/1pr1skf
@r_devops
Virtarix vs OVH vs Hetzner for CI/CD and development
Looking for Hosting
I'm running a small dev team (5 people) and we need a dedicated server for our CI/CD pipelines, Docker registry, GitLab instance, and some dev/staging environments.
Current options I'm considering:
Virtarix - $122/mo - 8 cores, 64GB RAM, 500GB NVMe, unlimited bandwidth. Pros: Good specs, unlimited traffic. Cons: They only started in 2023 so not much track record.
Hetzner AX42 - €46/mo (\~$50) - AMD Ryzen 7 8 cores, 64GB DDR5, 2x512GB NVMe. Pros: Been around since 1997, cheapest option, great specs. Cons: €39 setup fee.
OVH Rise - Around $60-80/mo depending on config. Pros: Established, multiple global locations. Cons: Mixed reviews on support.
Budget's flexible but I'm trying to stay under $100/mo. Need something reliable since downtime means blocked developers. Most of us are in US/Canada.
What would you pick for this use case? Or should I be looking at something else entirely?
https://redd.it/1pr9cf5
@r_devops
The Struggle with Cloud Infrastructure: Is There a Better Way?
Managing cloud infrastructure feels like a never-ending game of whack-a-mole. Every time I fix one issue, another one seems to pop up, and it’s hard to keep track of everything at once.
It’s not just the servers and databases, but also the logs and the security. There is just so much to monitor. And having all this data scattered across different tools can make it difficult to get a real-time view of your infrastructure’s health.
I’ve been thinking that there has to be a more efficient way to integrate all of these things into one platform, especially if you want to catch issues early without spending hours manually piecing everything together.
Anyone found a solution that helps with keeping things simple while still tracking performance and security at scale? Let me know what’s been working for you!
https://redd.it/1pr9qw2
@r_devops
How to pass AWS developer associate exam on first attempt?
I am a final-year student and I recently passed the AWS Cloud Practitioner exam with a score of 837. Now I am preparing for the AWS Developer Associate exam. I have no hands-on experience with AWS. Can anyone help me out so that I pass the exam before December, on my first attempt?
https://redd.it/1pr9nj5
@r_devops
What’s the minimum skill set for an entry level DevOps engineer?
I am currently in my 6th semester, with knowledge of MERN, SQL, Python, and foundational Spring Boot.
I’m aiming to transition toward a DevOps role and want to understand what’s actually required at an entry level.
Would appreciate advice from industry professionals
https://redd.it/1prbstc
@r_devops
Do certs have any value?
I'm trying to get hired (in Europe, Poland if it matters) and I wonder whether any certifications are valued by recruiters enough to really be worth paying for. I want to be a DevOps engineer. I have a year of experience as an IT admin.
Certifications I thought would be good to get are from AWS and Terraform, and maybe a bootcamp with an income share agreement.
https://redd.it/1prcfoq
@r_devops
Resterm: TUI http/graphql/grpc client with websockets, SSE and SSH
Hello,
I've made a terminal HTTP client as an alternative to Postman, Bruno, and so on. I'm not saying it's better, but for those who like terminal-based apps, it could be useful.
Instead of defining each request as a separate entity, you use .http/rest files. There are a couple of "neat" features like automatic SSH tunneling, profiling, tracing, and workflows. Workflows are basically stepped requests, so you can kind of "script" or chain multiple requests as one object. I could probably list all the features here, but it would be long and boring :) The project is still very young; I've been actively working on it for the last 3 months, so I'm sure there are some small bugs or quirks here and there.
You can install it via brew with `brew install resterm`, use the install scripts, download manually from the releases page, or just compile it yourself.
Hope someone would find it useful!
repo: https://github.com/unkn0wn-root/resterm
https://redd.it/1prd1u5
@r_devops
Post-re:Invent: Are we ready to be "Data SREs" for Agentic AI?
Just got back from my first re:Invent, and while the "Agentic AI" hype was everywhere (Nova 2, Bedrock AgentCore), the hallway conversations with other engineers told a different story. The common thread: "The models are ready, but our data pipelines aren't."
I’ve been sketching out a pattern I’m calling a Data Clearinghouse to bridge this gap. As someone who spends most of my time in EKS, Terraform, and Python, I’m starting to think our role as DevOps/SREs is shifting toward becoming "Data SREs."
The logic I’m testing:
• Infrastructure for Trust: Using IAM Identity Center to create a strict "blast radius" for agents so they can't pivot beyond their context.
• Schema Enforcement: Using Python-based validation layers to ensure agent outputs are 100% predictable before they trigger a downstream CI/CD or database action.
• Enrichment vs. Hallucination: A middle layer that cleans raw S3/RDS data before it's injected into a prompt.
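The schema-enforcement idea can be sketched in a few lines of Python, assuming the agent emits JSON. The field names and rules here are hypothetical placeholders, just to show the shape of a strict validation gate:

```python
# Hypothetical schema-enforcement layer: strictly validate an agent's JSON
# output before any downstream CI/CD or database action is allowed to fire.
import json

# Hypothetical schema: every field must be present, no extras allowed.
REQUIRED = {"action": str, "target": str, "dry_run": bool}

def validate_agent_output(raw: str) -> dict:
    """Parse agent output and reject anything outside the declared schema."""
    data = json.loads(raw)
    if set(data) != set(REQUIRED):
        unexpected = sorted(set(data) ^ set(REQUIRED))
        raise ValueError(f"schema mismatch on fields: {unexpected}")
    for field, expected_type in REQUIRED.items():
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    return data
```

Anything that fails validation stops the pipeline before it can touch infrastructure, which is the same "blast radius" thinking applied at the data layer.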
Is anyone else starting to build "Clearinghouse" style patterns, or are you still focused on the core infra like the new Lambda Managed Instances? I’m keeping this "in the lab" for now while I refine the logic, but I'm curious if "Data Readiness" is the new bottleneck for 2026.
https://redd.it/1prfqpv
@r_devops
I built a tiny approval service to stop my cloud servers from burning money
I run a bunch of cloud servers for dev, testing, and experiments. Like everyone else, I’d forget to shut some of them down, burning money.
I wanted automation to handle shutdowns safely, but every option felt heavy:
Slack bots
Workflow engines
Custom approval UIs
Webhooks and state machines
All I really wanted was a simple human approval step before the cron job shuts down the server.
So I built ottr.run, a small service that turns approval into state, not an event.
The pattern is dead simple:
A script creates a one-time approval link
A human clicks approve
That click writes a value to a key/value store
The script, which is already polling, sees the value and resumes
No callbacks, no webhooks, no OAuth, no long-running workers.
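A rough sketch of that approval-as-state pattern in Python — the key/value store here is an in-memory stand-in, not the real ottr.run API:

```python
# Approval-as-state: the script doesn't wait for a callback; it just polls a
# key until a human has flipped it to "approved", then carries on.
import time

class InMemoryKV:
    """Stand-in for the hosted key/value store."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value  # in reality, the approval click does this

def wait_for_approval(store, token, timeout=300.0, interval=5.0):
    """Poll until the token is approved or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if store.get(token) == "approved":
            return True
        time.sleep(interval)
    return False
```

If wait_for_approval returns False, the cron job simply skips the shutdown and tries again on the next run — the safe default.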
This worked great for:
Auto-shutdown of idle servers
Risky infra changes
“Are you sure?” moments in cron jobs
Guardrails around cost-saving automations
Later I realized the same pattern applies to AI agents, but the original use case was pure DevOps: cheap, reliable human checkpoints for automation.
https://redd.it/1prkxvs
@r_devops
I built a small tool to turn incident notes into blameless postmortems — looking for DevOps feedback
Hey r/devops,
I built a small side project after getting tired of postmortems turning into political documents instead of learning tools.
After incidents we usually have:
- Slack threads
- timelines
- partial notes
- context scattered across tools
Turning that into a clean, exec-safe postmortem takes time and careful wording, especially if you’re trying to keep things blameless and system-focused instead of personal.
This tool takes raw incident notes and generates a structured postmortem with:
- Executive summary
- Impact
- Timeline
- Blameless root cause
- Action items
You can regenerate individual sections, edit everything, and export the full doc as Markdown to paste into Confluence / Notion / Docs. It’s meant as a drafting accelerator, not a replacement for review or accountability.
There’s a small free tier, then it’s $29/month if it’s useful. I’m mostly trying to sanity-check whether this solves a real pain for teams that write postmortems regularly.
Link: https://blamelesspostmortem.com
Genuinely interested in feedback from folks who actually run incidents:
- Does this match how you do postmortems?
- Where would this break down in real-world incidents?
- Would you ever trust something like this, even as a first draft?
https://redd.it/1prkfe8
@r_devops
Real-time location systems on AWS: what broke first in production
Hey folks,
Recently, we developed a real-time location-tracking system on AWS designed for ride-sharing and delivery workloads. Instead of providing a traditional architecture diagram, I want to share what actually broke once traffic and mobile networks came into play.
Here's what failed faster than we expected:
- WebSocket reconnect storms caused by mobile network flaps, which increased fan-out pressure and downstream load instead of reducing it.
- DynamoDB hot partitions: partition keys that seemed fine during design reviews collapsed when writes clustered geographically and temporally.
- Polling-based consumers: easy to implement, but costly and sluggish during traffic bursts.
- Ordering guarantees: after retries, partial failures, and reconnects, strict ordering became more of an illusion than a guarantee.
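On the reconnect-storm point, the usual client-side mitigation is exponential backoff with jitter, so flapping mobile clients don't all reconnect in lockstep. A sketch of the "full jitter" variant (parameters are illustrative, not tuned values):

```python
# Exponential backoff with full jitter: each retry waits a random fraction of
# an exponentially growing window, capped so delays never grow unbounded.
import random

def reconnect_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    delays = []
    window = base
    for _ in range(attempts):
        delays.append(rng() * window)  # full jitter: uniform in [0, window)
        window = min(cap, window * 2)
    return delays
```

The rng parameter is injectable here only to make the sketch deterministic for testing; in production you'd just use random.random.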
Over time, we found some strategies that worked better:
- Treat WebSockets as a delivery channel, not a source of truth.
- Partition writes using an entity + time window, rather than just the entity.
- Use event-driven fan-out with bounded retries instead of pushing everywhere.
- Design systems for eventual correctness, not immediate consistency.
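The entity + time window idea can be sketched as a composite partition key — bucket size and key format here are illustrative choices, not a recommendation:

```python
# Composite partition key: bucket a hot entity's writes by time window so a
# burst of updates for one driver spreads across partitions over time.
from datetime import datetime, timezone

def partition_key(entity_id: str, ts: datetime, bucket_minutes: int = 5) -> str:
    bucketed = ts.replace(minute=(ts.minute // bucket_minutes) * bucket_minutes,
                          second=0, microsecond=0)
    return f"{entity_id}#{bucketed.strftime('%Y-%m-%dT%H:%M')}"

key = partition_key("driver-42", datetime(2025, 6, 1, 12, 7, tzinfo=timezone.utc))
# key == "driver-42#2025-06-01T12:05"
```

Every write for the same driver within a 5-minute window lands on one partition, and the next window rolls to a fresh one, which is what breaks up the geographic/temporal clustering described above.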
I’m interested in how others handle similar issues:
- How do you prevent reconnect storms?
- Are there patterns that work well for maintaining order at scale?
- In your experience, which part of real-time systems tends to fail first?
Just sharing our lessons and eager to learn from your experiences.
https://redd.it/1proepg
@r_devops