Reddit DevOps – Telegram
Search compressed files without decompressing - just shipped Crystal Unified

Hey everyone!  Just shipped something I'm pretty excited about - Crystal Unified Compressor.  

The big deal: Search through compressed archives without decompressing. Find a needle in 700MB or 70GB of logs in milliseconds instead of waiting to decompress, grep, then clean up.  

What else it does:
  - Firmware delta patching - Create tiny OTA updates by generating binary diffs between versions. Perfect for IoT/embedded devices, game patches, and other updates
  - Block-level random access - Read specific chunks without touching the rest
  - Log files - 10x+ compression (6-11% of original size) on server logs + search in milliseconds
  - Genomic data - 4:1 compression on DNA sequences
  - Time series / sensor data - Delta encoding that crushes sequential numeric patterns
  - Parallel compression - Throws all your cores at it. Decompression runs at 1GB/s+.
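
For anyone unfamiliar with the delta encoding mentioned in the time series bullet, here is a minimal Python sketch of the general idea. It is not Crystal Unified's implementation; the sample timestamps, the 64-bit packing, and the use of zlib as the back-end compressor are all assumptions made for illustration.

```python
import struct
import zlib


def delta_encode(values):
    """Return the first value followed by successive differences."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def pack(values):
    """Serialize integers as little-endian signed 64-bit and deflate them."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return zlib.compress(raw, 9)


# Hypothetical sensor timestamps: one reading every 1000 ms.
readings = [1_700_000_000_000 + 1000 * i for i in range(10_000)]
plain_size = len(pack(readings))
delta_size = len(pack(delta_encode(readings)))
```

With regularly spaced samples the deltas are almost all the same number, so a general-purpose compressor has far less entropy to deal with and `delta_size` comes out smaller than `plain_size`.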

Check it out: https://github.com/powerhubinc/crystal-unified-public  
We published it under the BSL 1.1 license.

Would love thoughts on where you've seen this kind of thing needed in your portfolios 

https://redd.it/1qdkelh
@r_devops
What I like about being a senior engineer

What I don't like about being a senior engineer:


* I'm no longer in a room full of people smarter than me.
* I don't trust my ego sometimes. That's a me thing.


What I like about being a senior engineer:


* When I speak about things I know something about, people pretty much listen.
* I get to have a meaningful impact on organizational outcomes and work on big projects.
* I really enjoy mentoring junior people who are open to it.

https://redd.it/1qdmhoi
@r_devops
Do you think that justfiles underdeliver everywhere except packing scripts into a single file?

I'm kinda disappointed in justfiles. In the documentation it looks nice, but in practice it creates a whole other set of hassles.

I'm trying to automate and document a few day-to-day tasks + deployment jobs. In my case it is quite a simple env (dev, stage, prod) + target (app1, app2) combination.

I'd want to basically write something like just deploy dev app1, just tunnel dev app1-db.

Initially I tried to use some map-like structure with variables, but justfile doesn't support this. Fine, I wrote all the constants manually by convention, like DEV_SOMETHING, PROD_SOMETHING.

Okay, then I figured I need a way to pick the value conditionally. So as a test I picked this pattern:

[script]
[arg("env", pattern="dev|stage|prod")]
[arg("target", pattern="app1|app2")]
deploy env target:
    {{ if env == "dev" { "instance_id=" + DEV_INSTANCE_ID } else { "" } }}
    {{ if env == "prod" { "instance_id=" + PROD_INSTANCE_ID } else { "" } }}
    ...

Which is already ugly enough, but what are my options?

But then I faced the need to pick values based on a combination of env + target conditions, e.g. for port forwarding, where all the ports should be different. At this point I found out that justfile doesn't support AND or OR in if conditions. Parsing and evaluating AND or OR operations isn't much harder than == and != themselves.
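
For comparison, this is roughly what the env + target lookup looks like in a general-purpose language. It's a minimal Python sketch, not justfile syntax, and the port numbers are made up since the real values aren't given here.

```python
# Hypothetical ports keyed by (env, target); the real values are not given in the post.
PORTS = {
    ("dev", "app1"): 15432,
    ("dev", "app2"): 15433,
    ("stage", "app1"): 25432,
    ("stage", "app2"): 25433,
    ("prod", "app1"): 35432,
    ("prod", "app2"): 35433,
}


def lookup_port(env, target):
    """Return the forwarding port for an env/target pair or fail loudly."""
    try:
        return PORTS[(env, target)]
    except KeyError:
        raise ValueError(f"unsupported combination: {env}/{target}") from None
```

A single table keyed by the pair replaces both the per-env constants and the chained if expressions, which is the map-like structure described above.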

Alright. Then I thought, maybe I'm approaching this completely wrong, maybe I need to generate all the tasks and treat justfile as a rendering engine for scripts and tasks? Maybe I need to use some for loop and basically generate deploy-{{env}}-{{target}}: root-level tasks with fully instantiated script definitions?

But justfile doesn't support that either.

I also thought about implementing some additional functions to simplify it, or something like render-time evaluation, but justfile doesn't support such functions either.

So, at this point I'm quite disappointed in the value proposition of justfile, because honestly packing the scripts into a single file is pretty much the only value it brings. I know, maybe it's me, maybe I expected too much from it, but like what's the point of it then?

I've looked through the GitHub issues. There are things in development, like custom functions and probably loops, but it's been about 3 or 4 years since I first heard about them, and the main limitations are still there. And the only thing I found regarding multiple conditions in if is that, instead of just implementing evaluation of the simplest operators, they are thinking about integrating Python as a scripting language. Like, why? You already have an additional tool to set up, "just" itself. Bringing in another runtime that actually gives you programming features, of which you need only the simplest operators and maps, kind of defeats the whole purpose. At this point it seems like reverting completely to plain bash scripts makes more sense than this.

What's your experience with just? All the threads I've seen about justfiles are already 1-3 years old, so I'd like to hear fresher feedback about it.

https://redd.it/1qdnjhz
@r_devops
Research: how are teams controlling and auditing AI agents in production?

Hey folks,

We are researching how teams running AI agents in production deal with things like cost spikes, access control, and “what did this agent actually do?”

We put together a short anonymous survey (5–7 min) to understand current practices and gaps.

This is not a sales pitch. We are validating whether this is even a real problem worth solving.

Would appreciate honest, even skeptical feedback.

👉 https://forms.gle/yo7xwf6DrAnk2L5x7


https://redd.it/1qdoyc0
@r_devops
How big of a risk is prompt injection for client-facing chatbots or voice agents?

I’m trying to get a realistic read on prompt injection risk, not the “Twitter hot take” version.

When people talk about AI agents running shell commands, the obvious risks are clear. You give an agent too much power and it does something catastrophic like deleting files, messing up git state, or touching things it shouldn’t.

But I’m more curious about client-facing systems. Things like customer support chatbots, internal assistants, or voice agents that don’t look dangerous at first glance. How serious is prompt injection in practice for those systems?

I get that models can be tricked into ignoring system instructions, leaking internal prompts, or behaving in unintended ways. But is this mostly theoretical, or are people actually seeing real incidents from it?

Also wondering about detection. Is there any reliable way to catch prompt injection after the fact, through logs or output analysis? Or does this basically force you to rethink the backend architecture so the model can’t do anything sensitive even if it’s manipulated?

I’m starting to think this is less about “better prompts” and more about isolation and execution boundaries.
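
To make "isolation and execution boundaries" concrete, here is a minimal Python sketch of one such boundary: the model can only request tools from a fixed allowlist, and arguments are validated before anything runs. The tool name, the order-status function, and the validation rule are all hypothetical, chosen only to illustrate the idea.

```python
from typing import Callable, Dict


def lookup_order_status(order_id: str) -> str:
    """Hypothetical read-only tool; a real one would query a backend."""
    return f"order {order_id}: shipped"


# Only tools listed here can ever be executed, whatever the model asks for.
ALLOWED_TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order_status": lookup_order_status,
}


def dispatch(tool_name: str, argument: str) -> str:
    """Run a model-requested tool only if it is allowlisted and the argument is sane."""
    tool = ALLOWED_TOOLS.get(tool_name)
    if tool is None:
        return "refused: tool not permitted"
    # Assumed rule for this sketch: short alphanumeric identifiers only.
    if not argument.isalnum() or len(argument) > 32:
        return "refused: invalid argument"
    return tool(argument)
```

The point of the design is that a manipulated model can at worst ask for something on the list, so the damage is bounded by what the list contains rather than by how good the prompt is.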

Would love to hear how others are handling this in production.

https://redd.it/1qdr4hg
@r_devops
A Friday production deploy failed silently and went unnoticed until Monday

We have automated deployments that run Friday afternoons, and one of them silently failed last week. The pipeline reported green, monitoring did not flag anything unusual, and everyone went home assuming the deploy succeeded.

On Monday morning we discovered the new version never actually went out. A configuration issue prevented the deployment, but health checks still passed because the old version was continuing to run. Customers were still hitting bugs we believed had been fixed days earlier.

What makes this uncomfortable is realizing the failure could have gone unnoticed for much longer. Nothing in the process verified that the running build actually matched what we thought we deployed. The system was fully automated, but no one was explicitly confirming the outcome.

Automation removed friction, but it also removed curiosity. The pipeline succeeded, dashboards looked fine, and nobody thought to validate that the intended version was actually live. That is unsettling, especially since the entire system was designed to prevent exactly this kind of failure.
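
The missing check described above, confirming that the running build matches the intended one, can be sketched in a few lines of Python. This is only an illustration: the function name, the retry parameters, and the idea that the service can report its own version (for example through a version endpoint) are assumptions, not details of the actual pipeline.

```python
import time
from typing import Callable


def verify_deployed_version(
    expected: str,
    fetch_version: Callable[[], str],
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> bool:
    """Return True once the running service reports the expected version.

    fetch_version is any callable that asks the live service which build
    it is running; how it does that is left to the caller.
    """
    for attempt in range(attempts):
        if fetch_version() == expected:
            return True
        # Wait between checks, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A pipeline step could call this after the deploy and fail the job when it returns False, so an old version that keeps passing health checks no longer counts as a green deploy.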

https://redd.it/1qdl5m8
@r_devops
What has been the most painful thing you have faced recently in Site Reliability/DevOps?

I have been working in the SRE/DevOps/Support-related field for almost 6 years.
The most frustrating thing I face is that whenever I try to troubleshoot anything, there are always tracing gaps in the logs. From my gut feeling I know that the issue originates from a certain flow, but I can never prove it with evidence.

Is it just me, or has anyone else faced this in other companies as well? So far, I have worked with 3 different orgs, all Forbes top 10 kinda. Totally big players with no "Hiring or Talent Gap."

I also want to understand the perspective of someone working in a startup: how do logging and SRE roles work there in general? Is it more painful because the product has not matured, or does leadership cut you some slack for the same reason?

https://redd.it/1qdtskm
@r_devops
What's the canonical / industry standard way of collaborating on OpenTofu IaC?

I am a TypeScript/Node backend developer and I am tasked with porting a mono repository to IaC.
- (1) When using OpenTofu for IaC, how do you canonically collaborate on an infrastructure change (when pushing code changes, validating plans, merging, applying)? I've read articles dealing with this topic, but it's not obvious which options are the consensus and which aren't. Workflows like Atlantis seem cool but I'm not sure what caveats and downsides come with using them.
- (2) Why do people seem to need an external backend service? Do we really need to store a central state in a third party, considering OpenTofu can encrypt it? Or could we just track it in CI and devise a way to prevent merges on conflict? (secret vaults make sense though, since Github's secret management isn't suitable for the purpose of juggling the secrets of multiple apps and environments)
---

For more context:

The team I work for has a Github mono-repository for 4 standalone web applications, hosted on Vercel. We also use third party services like a NeonDB database, Digital Ocean storage bucket, OpenSearch, stuff like that.

Our team is still small at 8 developers, and it's not projected to grow significantly in size in the near future.
Vercel itself already offers a simplified CI/CD flow integration, but the reason we are going for IaC is mostly to help with our SOC2 compliance process. The idea is that we would be able to review configurations more easily, and not get bitten by un-auditable manual changes.

From that starting point, my understanding is that the industry standard for IaC is Terraform, and that the currently favored tool is its open source fork OpenTofu.

Then, I understand that in order to enable smooth collaboration and integration into GitHub's PR cycles, teams usually rely on a backend service that will lock/sync state files. Some commercial names that popped up during my research were Scalr, Env0, and Spacelift. These offer a lot of features which quite frankly I don't even understand. I also found tools like Atlantis and OpenTacos/Digger, but it's unclear whether these are niche or widely adopted.

If I had to pick a course of action right now, I would go for an Atlantis-like "GitOps" flow, using some sort of code hashing to detect conflicts on stale states when merging PRs. But I imagine that if it were that simple, this is what people would be doing.
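
To show what I mean by hashing, here is a minimal Python sketch of the stale-state check. It is only my own idea written out, not how Atlantis or OpenTofu work; the function names and the notion of recording a fingerprint next to each plan are assumptions.

```python
import hashlib


def state_fingerprint(state_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the serialized state file."""
    return hashlib.sha256(state_bytes).hexdigest()


def plan_is_stale(fingerprint_at_plan: str, current_state: bytes) -> bool:
    """True if the state changed since the plan's fingerprint was recorded."""
    return state_fingerprint(current_state) != fingerprint_at_plan
```

CI would store the fingerprint when a plan is produced and refuse to apply if `plan_is_stale` returns True. Note that a check like this only detects a change after the fact; it does not stop two applies from starting at the same moment, which is the gap that state locking is meant to cover.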

https://redd.it/1qdsjmp
@r_devops
Resume Review Request 4 YOE, Jr. Security Engineer, US

Hello!

Resume here

Could I kindly request a quick glance over my resume? I transitioned into my position, and it's my first time in IT and cybersecurity. My first rotation threw me into the deep end with Linux engineering, automation, networking, and much more. However, I loved it and continued to pursue it.

Once I graduate from this program, I want to apply to software engineering roles that focus on security, DevOps, cloud, and the like.

I've mostly been chucking stuff that I've documented from my monthly reports into LLMs to try and come up with resume bullets, but would really appreciate human insight. Ideally, I'd like to shorten it back down to one page, and if there's any "fluff" please point it out. Would love constructive criticism.

Thanks in advance.

https://redd.it/1qe1je4
@r_devops
Looking for a "pro" perspective on my DevOps Capstone project

Hello everyone,

I’m currently building my portfolio to transition into Cloud/DevOps. My background is a bit non-traditional: I have a Bachelor's in Math, a Master’s in Theoretical CS, and I just finished a second Master’s in Cybersecurity.

My long-term goal is DevSecOps, but I think the best way to get there is through a DevOps, Cloud, SRE, Platform Engineer, or similar role for a couple of years first.

I’ve just completed a PoC based on Rishab Kumar’s DevOps Capstone Project guidelines. Before I share this on LinkedIn, I was hoping to get some "brutally honest" feedback from this community.

The Tech Stack: Terraform, GitHub Actions, AWS, Docker

 Link: https://github.com/camillonunez1998/DevOps-project 

Specifically, I’m looking for feedback on:

1. Is my documentation clear enough for a recruiter?
2. Are there any "rookie" mistakes?
3. Does this project demonstrate the skills needed for a Junior Platform/DevOps role?

Thanks in advance!



https://redd.it/1qdokq6
@r_devops
What to focus on to get back into devops in 2026?

Some context: I worked in DevOps-related positions for the past decade but suffered some serious skill rot the past 4 years while working for the US government-- everything was out of date and I was kept away from most of the important pieces (No Kube exposure despite asking for experience with it, no major project deployments, mostly just small-time automation work.) However the job was *very* comfy and I allowed myself to settle into it -- a fatal error given that my entire team was laid off back in September during the government "cost saving" cuts.

Not taking the time after work to make sure I stayed current and up to date anyway was partly my own fault and partly severe burnout from the industry. (I have no passion for any work, really, so burnout is unavoidable for me.)

How do I course correct from here? I will likely need to work a much lower position in IT support (I'm completely out of money and lost my apartment already; unemployment is not paying enough to cover the cost of living here) and study in the evenings, because I cannot pass an interview given that the last several I've had went poorly; I simply do not have the necessary knowledge. I intend to re-certify as an AWS Solutions Architect Associate after letting it lapse, and may study for the CKA as well.

I am admittedly pretty against AI and have that going against me right now, so I'm trying to focus on other avenues.

https://redd.it/1qe6z9g
@r_devops