Reddit DevOps – Telegram
Is there a way to get notified when a CVE in your container image is actually being exploited in the wild?

Getting tired of patching every theoretical CVE that scanners throw at us. Half of them never see real exploits but still create noise and patch fatigue.

Anyone know of tools or feeds that can tell you when a CVE in your container images is actually being exploited in the wild? Not just CVSS scores or theoretical impact, but real threat intel showing active exploitation.

Would love to prioritize patches based on actual risk instead of just severity numbers.
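One feed that's usually mentioned for exactly this is CISA's Known Exploited Vulnerabilities (KEV) catalog, which lists CVEs with confirmed in-the-wild exploitation and is published as JSON, so scanner findings can be cross-checked against it (EPSS scores from FIRST are the other commonly cited exploitation-likelihood signal). Below is a minimal Python sketch, assuming the current public feed URL and field names ("vulnerabilities", "cveID") and a made-up cves.txt input with one CVE ID per line; adapt it to whatever your scanner actually emits.

#!/usr/bin/env python3
"""Cross-check a scanner's CVE findings against CISA's KEV catalog.
Sketch only: the feed URL and JSON field names are assumptions based on the
public catalog and should be verified; cves.txt (one CVE ID per line) is a
made-up input format."""
import json
import sys
import urllib.request

KEV_URL = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"

def load_kev_ids() -> set[str]:
    # Download the catalog and collect the CVE IDs it lists as actively exploited.
    with urllib.request.urlopen(KEV_URL, timeout=30) as resp:
        catalog = json.load(resp)
    return {entry["cveID"] for entry in catalog.get("vulnerabilities", [])}

def main(findings_path: str) -> None:
    with open(findings_path) as f:
        findings = {line.strip() for line in f if line.strip().startswith("CVE-")}
    exploited = sorted(findings & load_kev_ids())
    if exploited:
        print("Known-exploited CVEs in your image -- patch these first:")
        for cve in exploited:
            print(f"  {cve}")
    else:
        print("None of the scanner findings appear in the KEV catalog.")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "cves.txt")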

https://redd.it/1ojimb6
@r_devops
Is 35 too late to move into DevOps?

Been doing QA for the past 5 years and it's taking a toll on me. I feel like I can do more, and I love tinkering with Linux. I don't hate my job, God bless, but it feels like I can do more. I'm more than your average user, but less than a professional DevOps engineer, I suppose. Appreciate your opinions.

https://redd.it/1ojr5vd
@r_devops
How do you write your first post about a new habit-building app?

I’ve recently finished developing my first product, an app that helps users build habits and achieve their goals step by step. Since I don’t have prior marketing experience, I’m planning to start with zero-cost marketing and rely mainly on organic posts. My goal is to share the story behind the app and invite feedback, but I’m unsure how to write that first post without sounding like I’m trying to sell something.

For those who’ve launched a product before, how did you craft your first post to make it feel authentic and engaging? What elements or structure helped you get genuine feedback instead of just promotional noise?

https://redd.it/1ojs797
@r_devops
Stuck between honesty and overselling.

I’ve been working in DevOps for about 12 years now. Covering most aspects over the years: build and release management, infra provisioning and maintenance (cloud and on-prem), SRE work, config management, and a bit of DevSecOps too.



Here’s where my dilemma starts. Like most DevOps engineers in large orgs, I haven’t personally set up every layer of the stack. For instance,

* I know Kubernetes well enough to manage deployments, troubleshoot, and maintain clusters, but I wasn’t the one who built them from scratch.
* Same with Ansible: I write and manage playbooks daily, but I didn't originally architect or configure the controller host.
* Similar story with Terraform, cloud infra setup, and WAF/network administration: I understand the moving parts and can work on them, but I didn't create everything from the ground up.

In interviews, when I explain this honestly, I can almost feel the interviewer's interest drop the moment I say "I haven't personally set up the cluster or administered it" or "I wasn't responsible for the initial infra design."

Yet I see people who exaggerate their contributions land those same roles. People who, frankly, can't even write solid production-ready manifests or pipelines. There are people who write manifests in Notepad++ getting hired into Lead DevOps roles (the same role as mine). It's frustrating working with them.



So, here’s my question:

* Is it time I start “selling” myself more aggressively in interviews?
* Or is there a way to frame my experience truthfully without underselling what I actually know and can do?



I don’t want to lie, but I’m starting to feel that being 100% transparent is working against me. Has anyone else faced this? How do you balance credibility and confidence in technical interviews, especially for senior DevOps/SRE roles?

I don't like the feeling of getting rejected in the final round of interviews. Or am I just overestimating my skills and capabilities and actually far behind market/job expectations? What is it that I'm doing wrong?

https://redd.it/1ojrzhi
@r_devops
Fresher DevOps Engineer (3 months in) — how can I best use my free time to upskill for a better WLB + higher paying role later?

Hey folks 👋

I joined 3 months ago as a Junior DevOps Engineer (fresher). My CTC is 3 LPA and there’s a 2-year bond (₹1L if I break it). The work is super light, so I get a lot of free time in office.

Here’s what I have access to:

Ubuntu VM with sudo access

ChatGPT

2 weekly offs (Sat & Sun)

Right now I know a bit of Linux, Jenkins, GitLab, SVN, and WinSCP.
My goal is to upskill in DevOps + Cloud, build hands-on projects, and later move to a remote or Hyderabad-based role with better pay + WLB.

My goal:
👉 Build solid DevOps + Cloud skills
👉 Create hands-on projects I can show later on GitHub
👉 Prepare for a better-paying role after my bond (ideally remote or Hyderabad-based)
👉 Maintain a good work-life balance

Can you suggest:

What should I focus on learning next (AWS, Docker, Kubernetes, Terraform, etc.)?

Any project ideas I can do on my Ubuntu VM?

Free resources, YouTube channels, or courses worth following?

How to plan a practical roadmap using ChatGPT + self-practice?

https://redd.it/1ojuqtw
@r_devops
Stuck between a great PhD offer and a solid DevOps career. Any advice?

I’m currently working as a DevOps Engineer with a good salary, and I’m 27 years old.
Recently, I received an offer to pursue a PhD at a top 100 university in the world. The topic aligns perfectly with my passion — information security, WebAssembly, Rust, and cloud computing.

The salary is much lower than my current salary, and it will take around 5 years to finish the program, but I see this as a rare opportunity at my age to gain strong research experience and deepen my technical skills.

I’m struggling to decide: is this truly a strong opportunity worth taking, or should I stay in industry and keep building my professional experience?
Has anyone here gone through a similar situation? How did it impact your career afterward, whether you stayed in academia or returned to industry?


After finishing a PhD in information security, what are the opportunities for coming back to industry?

https://redd.it/1ojv4rf
@r_devops
Offloading SQL queries to read-only replica

What's the best strategy? One approach is to redirect all reads to the replica and all writes to the master. That's too crude, so I chose to do things manually; think:

Database.on_replica do
  # code here
end

However, this has hidden footguns. For one thing, the code inside the block must make no writes to the database. This is easy to verify if it's just a few lines of code, but it becomes much more difficult when there are calls to procedures defined in another file, which call other files, which call something in a library. How can a developer even know that the procedure they're modifying is used within a read-only scope somewhere high up in the call chain?
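For what it's worth, one way teams guard against exactly that footgun is to route every query through a single choke point and fail loudly if anything tries to write while a read-only scope is active. The following is a minimal, ORM-agnostic Python sketch, not a drop-in solution; the execute() function and the naive verb check are stand-ins for whatever your data layer actually exposes.

import contextlib
import threading

_state = threading.local()

class WriteInReadOnlyScope(RuntimeError):
    """Raised when a write is attempted inside an on_replica block."""

@contextlib.contextmanager
def on_replica():
    # Mark the current thread as read-only for the duration of the block.
    previous = getattr(_state, "read_only", False)
    _state.read_only = True
    try:
        yield
    finally:
        _state.read_only = previous

WRITE_VERBS = ("insert", "update", "delete", "create", "drop", "alter")

def execute(sql: str) -> None:
    # Single choke point: every query in the app should pass through here.
    read_only = getattr(_state, "read_only", False)
    if read_only and sql.lstrip().lower().startswith(WRITE_VERBS):
        # Fail loudly even if the write is buried deep in the call chain.
        raise WriteInReadOnlyScope(f"write attempted inside on_replica(): {sql[:60]}")
    target = "replica" if read_only else "primary"
    print(f"[{target}] {sql}")  # a real router would dispatch to the right connection

# with on_replica():
#     execute("SELECT * FROM orders WHERE id = 1")   # routed to the replica
#     execute("INSERT INTO audit_log VALUES ('x')")  # raises WriteInReadOnlyScope

Running the test suite with a guard like this enabled can catch the "procedure three files away does an INSERT" case without needing a real replica at all.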

Another problem is "mostly reads", i.e. find_or_create semantics: it does a SELECT most of the time, but for some subset of data it issues an INSERT.

And yet another problem is automated testing. How do you make sure that a given bunch of queries is always executed on the replica? Well, you have to have a replica in the test environment. OK, that's no big deal, I managed to set it up. However, how do you get the data in there? The replica is read-only, so naturally you have to write to the master, which means you have to commit the transaction, otherwise the replica won't see anything. Committing transactions is slow when you have to create and delete thousands of records per test suite run.

There has to be a better way. I want my replica to ease the burden on the master database, because currently the replica sits mostly idle.

https://redd.it/1ojv8gv
@r_devops
Payment processing went down for 2 minutes. Engineering said P3, finance said P1.

We had a payment gateway timeout on Friday that lasted barely 2 minutes. During that time customers couldn't complete checkouts.

Engineering immediately called it P3: it's a known issue with the third-party provider, happens occasionally, self-resolved, no code changes needed.

Finance lost their minds and called it P1. They ran the numbers and we lost significant revenue because it's Black Friday weekend; customers who hit errors abandoned their carts and didn't come back.

Support sided with finance because they got slammed with tickets and customers were threatening chargebacks on social media.

Product sided with engineering because, technically, the system worked as designed: the timeout and retry logic did exactly what it should.

We spent the entire postmortem arguing about severity instead of talking about improvements. Finance wants anything touching payments to be P1 automatically; engineering says that makes severity meaningless.

The problem is both are right. From a technical standpoint it was minor. From a business standpoint we literally lost money during peak shopping weekend.

Calling on fintech and ecommerce people: how do you handle this kind of scenario? Looking for some advice.

https://redd.it/1ojz89l
@r_devops
What do you do when Audit wants tickets and there are none?

For those in large public companies, do you ever work with Audit? What do you do when Audit comes around asking for tickets on work that was done using systems outside of Jira/ADO? Audit is breathing down our necks.

https://redd.it/1ojzvun
@r_devops
The problem I see with AI is that if the person asking AI to do something doesn't understand scale, they could end up with infrastructure issues at the foundation.

How many times have we had to talk our own people off a ledge for considering Kubernetes when we just need ECS, or vice versa? How many times has management come back from a conference with a shiny new thing that then becomes the biggest maintenance headache for everyone involved?

I think we may not see it immediately, but poorly architected systems at middling companies trying (and failing) to execute AI agents will keep us busy for quite some time. The bubble isn't a sudden pop. It's a slow realization that you screwed yourself over two years ago by blindly taking the recommendations of an advanced autocomplete program.

https://redd.it/1ok2g4q
@r_devops
What’s everyone using for application monitoring these days?

Trying to get a feel for what folks are actually using in the wild for application monitoring.

We’ve got a mix of services running across Kubernetes and a few random VMs that never got migrated (you know the ones). I’m mostly trying to figure out how people are tracking performance and errors without drowning in dashboards and alerts that no one reads.

Right now we’re using a couple of open-source tools stitched together, but it feels like I spend more time maintaining the monitoring than the actual app.

What’s been working for you? Do you prefer to piece stuff together or go with one platform that does it all? Curious what the tradeoffs have been.

https://redd.it/1ok21tz
@r_devops
Datadog suddenly increasing charges

Hi there 👋🏻
Just wanted to check if anyone else got this news. Basically, they informed us that they've decided to introduce a new SKU for Fargate APM and that we're now going to be billed 3 times more for this product; that is, a Fargate APM task that currently costs us 1 USD will cost 4 USD after the change.
Has anyone else received this news? I'm even starting to think they want to ditch us and this is their way of doing it.

https://redd.it/1ok48jx
@r_devops
Final interview flipped into a surprise technical test, and I froze

Went through a multi-stage interview process at a cybersecurity company: two technical interviews, one half-technical intro chat, and an HR round. Everything went well, strong vibes, and I genuinely felt aligned with the company culture and team; they seemed to love the vibes as well.

I was told the final call with the VP would be a “casual intro and culture fit conversation.”

Except… it wasn’t.

The VP immediately turned it into a high-pressure technical interview. No warm-up, no small talk, straight into deep technical questions and drilling down to very specific wording. I tried to keep up, but I wasn’t mentally prepared for a surprise test. The pressure hit, I got flustered, and couldn’t articulate things I normally handle well.

After that call, I was told they think I have “knowledge gaps” and it’s not the right fit right now.

And honestly… it stung. Not because I think I deserved anything, but because I felt like I didn’t get judged on the abilities I showed throughout the whole process, but on a single unexpected stress moment.

I know interviews can be unpredictable, but being evaluated on an exam you didn't know you were about to take feels off. Still processing whether I should reach out and ask for reconsideration or just move forward.

Just needed to get it out.

https://redd.it/1ok74pn
@r_devops
Should incident.io be my alert router, or only for critical incidents?


So our observability stack consists of Grafana and Prometheus for monitoring and alerting, and incident.io for incidents and on-call.

Should I send all alerts to incident.io and decide there which channels each alert should go to (Slack, email, etc.)? Or should I make that decision in Grafana and only send critical incidents to incident.io?



https://redd.it/1ok3iuw
@r_devops
Do your teams skip retros on busy weeks?

Hi everyone, I’m looking for a bit of feedback on something.

I’ve been talking with a bunch of teams lately, and a lot of them mentioned they skip retros when things get busy, or have stopped running them altogether.

This makes sense to me, since I've definitely had Fridays with too much to get done and didn't want to take the time for a retro.

But I wanted to check with everyone here - is that true for your teams too?

I wondered if a lighter weight way to run a retro would be of interest, so I put together a small experiment to test that idea (not ready yet, just testing the concept).

The concept is a quick Slackbot that runs a 2-minute async retro to keep a pulse on how the team’s doing: https://retroflow.io/slackbot

Would this be valuable to anyone here?

(Not promoting anything — just exploring the idea and genuinely interested in feedback.)

https://redd.it/1okadjz
@r_devops
How do you get engineering teams to standardize on secure base images without constant pushback?

We're scaling our containerized apps and need to standardize base images for security and compliance, but every team has their own preferences. Policy as code feels heavy, and blocking PRs kills velocity.

What’s worked for you? Thinking about automated scanning that flags non-approved images but doesn't block initially, then gradually tightening. Or maybe image registries with approved-only pulls?
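For the "flag but don't block" phase, a lightweight CI step can be enough to start: scan Dockerfiles for FROM lines, warn when the base image isn't on an approved list, and only later flip the check to failing. A rough Python sketch; the allowlist prefixes and the ENFORCE switch are made up for illustration.

import pathlib
import re
import sys

# Hypothetical allowlist -- replace with your own registry prefixes.
APPROVED_PREFIXES = ("registry.example.com/base/", "cgr.dev/chainguard/")
ENFORCE = False  # flip to True (or drive from an env var) once the grace period ends

FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE | re.MULTILINE,
)

def non_approved_bases(dockerfile: pathlib.Path) -> list[str]:
    """Return base images in one Dockerfile that aren't on the allowlist."""
    aliases, violations = set(), []
    for image, alias in FROM_RE.findall(dockerfile.read_text()):
        if alias:
            aliases.add(alias.lower())
        if image.lower() in aliases or image.lower() == "scratch":
            continue  # multi-stage reference to an earlier stage, or no base at all
        if not image.startswith(APPROVED_PREFIXES):
            violations.append(image)
    return violations

if __name__ == "__main__":
    found = False
    for dockerfile in sorted(pathlib.Path(".").rglob("Dockerfile*")):
        for image in non_approved_bases(dockerfile):
            found = True
            print(f"WARNING {dockerfile}: non-approved base image '{image}'")
    sys.exit(1 if (found and ENFORCE) else 0)

Pairing a warn-only check like this with the registry idea from the post (a mirror that only serves approved images) is the usual longer-term enforcement layer.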

Any tools or workflows that let you roll this out incrementally? Don't want to be the team that breaks everyone's deploys.

https://redd.it/1okcmjx
@r_devops
Have you ever discovered a vulnerability way too late? What happened?

AI coding tools are great at writing code fast, but not so great at keeping it secure. 

Most developers spend nights fixing bugs, chasing down vulnerabilities and doing manual reviews just to make sure nothing risky slips into production.

So I started asking myself, what if AI could actually help you ship safer code, not just more of it?

That’s why I built Gammacode. It’s an AI code intelligence platform that scans your repos for vulnerabilities, bugs and tech debt, then automatically fixes them in secure sandboxes or through GitHub Actions.

You can use it from the web or your terminal to generate, audit and ship production-ready code faster, without trading off security.

I built it for developers, startups and small teams who want to move quickly but still sleep at night knowing their code is clean. 

Unlike most AI coding tools, Gammacode doesn’t store or train on your code, and everything runs locally. You can even plug in whatever model you prefer like Gemini, Claude or DeepSeek.

I am looking for feedback and feature suggestions. What’s the most frustrating or time-consuming part of keeping your code secure these days?

https://redd.it/1ok5z0a
@r_devops
Database design in CS capstone project - Is AWS RDS overkill over something like Supabase? Or will I learn more useful stuff in AWS?

Hello all! If this is the wrong place, or there's a better place to ask it, please let me know.

So I'm working on a Computer Science capstone project. We're building a chess.com competitor application for iOS and Android using React Native as the frontend.

I'm in charge of database design and management, and I'm trying to figure out what architecture and tooling we should use. I'm relatively new to this world, so I'm trying to figure it out, but it's hard to find good info and I'd rather ask specifically.

Right now I'm between AWS RDS and Supabase for managing my Postgres database. Are these both good options for our prototype? Are both relatively simple to integrate with React Native, potentially with an API built in Go? It won't be handling much data, just a small amount for a prototype.

But the reason I may want to go with RDS is specifically to learn more about cloud-based database management, APIs, firewalls, network security, etc. Will I learn more about all of this working with AWS RDS than with Supabase, and is knowing AWS useful for the industry?


Thank you for any help!

https://redd.it/1okfoz3
@r_devops
Understanding Terraform usage (w/Gitlab CI/CD)

I'll preface by saying I work as an SDET and have been learning Terraform for the past couple of days. We are also moving our CI/CD pipeline to GitLab, with AWS as our provider (from Azure/Azure DevOps; don't worry about the "why", it was a business decision whether I agree with it or not, unfortunately).

With that said, when it comes to DevOps/GitLab and AWS I have very little knowledge. I understand the basics and have created gitlab-ci.yml files for automated testing, but DevOps best practices, and AWS especially, I know very little about.

We're going to use Terraform to manage infrastructure. It took me a little while to understand "how" it should be used, but I want to make sure my plan makes sense at a basic level. FWIW, our team used Pulumi before, but we're switching to Terraform to match what everyone else is using.

Here's how I have it set up currently (and my understanding of best practices). FWIW this is for a .NET/Blazor app (a demo for now), but most of the projects we're converting will be .NET-based. For now we're hosting it on Elastic Beanstalk.

Anyway, here's the pipeline as I have it set up (and it works so far):

GitLab CI/CD (build/deploy) handles actually building the app and publishing it (as a deploy-<version>.zip file).
The deploy job copies the .zip to an S3 bucket (via the aws-cli Docker image) as well as updating the Elastic Beanstalk environment.
The terraform plan job runs every time and saves the tfplan as an artifact.
The terraform apply job actually makes the changes based on the tfplan (but it is a manual job).
The terraform.tfstate is stored in S3 (with DynamoDB locking) as the source of truth.

So far this is working at a basic level, but I still have a few questions:

Is there any reason Terraform should handle the app deploy (to Beanstalk) and copying deploy.zip to S3? I know it "can", but it sounds like it shouldn't (sort of a separation-of-concerns problem).
It seems like, once things are set up, terraform apply really shouldn't be running that often, right?
For first-time setup it seems to make more sense to set things up manually in AWS and then import them into the state file. Others suggested writing the .tf resource files first, but that seems like it would be a headache with all the configuration.
It seems like Terraform should really be used mainly to keep resources consistent, without drift.
This is probably irrelevant, but a lot of the team is used to Azure DevOps pipeline .yml files and thinks it'll be easy to copy-paste. I told them that, because of how GitLab works, a lot will need to be rewritten. Is that accurate?

I know other teams use Helm charts, but that's for K8s, right? Not for ECS. It's been said that ECS is faster/cheaper, but Beanstalk is "simpler" for apps that don't need a bunch of quick pod/instance increases, etc.

Anyway, sorry for the wall of text. I'm also open to any advice.

https://redd.it/1okfopx
@r_devops