How can I improve my Kubernetes and cloud skills?
Basically, that’s it.
I have little to no experience with Kubernetes or cloud technologies. I wasn’t involved in any meaningful work with either of them in my previous roles. I’m currently unemployed and would love to gain some real, hands-on skills with both Kubernetes and AWS. Could you recommend any projects that would help me gain practical knowledge?
https://redd.it/1on5cjn
@r_devops
Which Azure cert should I begin with, and is it hard for someone with 8 years of experience as a Data Engineer?
I'm looking to get an Azure cert just to have it and make any future jobs that require Azure easier and less stressful, and these certs seem valuable af. My last job was trying to hire like 4 people with 5 years of general experience in data development, but they had to have an Azure cert, and oh man, our higher-ups put anyone who had one on a pedestal. Tbh, when I was training them I could tell they did not have 5 years of data development experience.
That said, I'm pretty knowledgeable in everything data: I can confidently say I've already mastered SSIS, the predecessor of Azure ADF (Data Factory), since working as an ETL dev for most of my career was my bread and butter.
The question is: do I have to do the Azure certs in order, or can I pick the mid-level one and start studying from there? What would you recommend?
Edit: they did not have 5 years of general experience
https://redd.it/1on6n95
@r_devops
Concentric AI - DevOps engineer interview
I have an interview with Concentric AI for the role of DevOps Engineer. My profile shows 4+ years of experience in DevOps, but to be honest, most of my work has been around setting up simple CI/CD pipelines (built from scratch). I don’t have much hands-on experience with cloud technologies.
What should I expect from the interview, and how should I prepare?
Can someone please help?
https://redd.it/1on6frl
@r_devops
Clarity from an experienced cloud architect/DevOps engineer
How secure is path-based routing, and is it industry standard for a three-tier cloud-native application that uses ECS and CodePipeline for CI/CD?
https://redd.it/1on8nuk
@r_devops
From Linux System Engineer to DevOps - Looking for Advice and Experiences
Hi everyone, I’ve wanted to transition into DevOps for a long time, but I only started seriously working toward it in February this year, building up the necessary skills. In the meantime, I received an offer to work as a Linux System Engineer, and I’ve been in that role for about four months now. I accepted it thinking it would help me transition to DevOps because of the skill similarities.
Before that, I completed a three-year System Administrator apprenticeship here in Germany (“Ausbildung zum Fachinformatiker für Systemintegration”), where I mainly worked with Windows servers until the company introduced a deployment pipeline for its software.
Unfortunately, the only overlapping skills in my current role are scripting and Linux. The rest (Ansible, Kubernetes, CI/CD pipelines, etc.) is not part of my job. I recently told my boss that I had expected more hands-on work with tools like Ansible and Terraform, and I asked whether there’s a way for me to transition internally to a DevOps position or possibly take on a new DevOps-focused role.
Has anyone here gone through a similar transition? If so, I’d really appreciate hearing your detailed experience and any good tips you might have.
https://redd.it/1onacpn
@r_devops
We had perfect observability but still struggled during incidents. Here's what fixed it
We built a solid observability stack. OpenTelemetry pipelines, unified metrics, logs, traces. Beautiful Grafana dashboards. Everything instrumented. We could see everything.
But when incidents hit, we still struggled. Alerts fired, but we didn't know: is this severe? What do we do? Who should respond? Everyone had different opinions. "2% error rate is fine" vs "2% is catastrophic." We were improvising every time.
The missing piece wasn't technical. It was organizational. We needed SLOs to define what "working" means (so severity isn't subjective), runbooks to codify remediation steps (so response isn't improvisation), and post-mortems to learn from failures systematically (so we don't repeat mistakes).
Here's what actually worked for us:
SLOs: We use availability SLIs from OpenTelemetry span-metrics in Prometheus. We calculate the percentage of successful requests by comparing successful calls (2xx/3xx) against total calls for each service. This gives us availability. We set 99.5% as our SLO, which creates a 0.5% error budget (roughly 3.6 hours of downtime over a 30-day month). Now we know when something is actually broken, not just "different." When we're burning error budget faster than expected, we slow feature releases.
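For reference, the arithmetic behind that error budget as a minimal Python sketch (the 99.5% target and the 30-day window come from the post; the sample request counts are made up):

```python
# Error-budget arithmetic for an availability SLO over a 30-day window.
SLO_TARGET = 0.995          # 99.5% availability target
WINDOW_HOURS = 30 * 24      # 30-day rolling window

def error_budget_hours(slo: float, window_hours: float) -> float:
    """Hours of allowed downtime within the window."""
    return (1.0 - slo) * window_hours

def availability(successful: int, total: int) -> float:
    """SLI: share of successful (2xx/3xx) calls out of all calls."""
    return successful / total if total else 1.0

print(error_budget_hours(SLO_TARGET, WINDOW_HOURS))   # 3.6 hours per month
print(availability(successful=99_120, total=99_600))  # ~0.9952 -> within the SLO
```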
Runbooks: We connect runbooks directly to alerts via PagerDuty. When an alert fires, the notification includes what's broken (service name, error rate), current vs expected (SLO threshold), where to look (dashboard link, trace query), and what to do (runbook link). The on-call engineer clicks the runbook and follows steps. No guessing, no Slack archaeology trying to remember what worked last time.
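As a rough illustration of attaching that context to the page itself, here is a sketch of a PagerDuty Events API v2 trigger event carrying the dashboard and runbook links alongside the alert; the routing key, URLs, and values are placeholders, and this is not necessarily how the post's setup is wired:

```python
import requests  # assumes the requests package is available

# Sketch: a PagerDuty Events API v2 trigger event that carries the runbook
# and dashboard links with the alert itself (all values are placeholders).
event = {
    "routing_key": "<integration-key>",
    "event_action": "trigger",
    "payload": {
        "summary": "checkout-service availability 97.8% (SLO 99.5%)",
        "source": "prometheus",
        "severity": "critical",
        "custom_details": {"error_rate": "2.2%", "slo_threshold": "0.5%"},
    },
    "links": [
        {"href": "https://grafana.example.com/d/checkout", "text": "Dashboard"},
        {"href": "https://runbooks.example.com/checkout-availability", "text": "Runbook"},
    ],
}
requests.post("https://events.pagerduty.com/v2/enqueue", json=event, timeout=5)
```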
Post-mortems: We use a simple template: Impact (users affected, SLO impact), Timeline, Root Cause, What Went Well/Poorly, Action Items (with owners, priorities P0-P2, and due dates). The key is prioritizing action items in sprint planning. Otherwise post-mortems become theater where everyone nods, writes "we should monitor better" and changes nothing.
After implementing these practices, our MTTR dropped by 60% in three months. Not because we collected more data, but because we knew how to act on it.
I wrote about the framework, templates, and practical steps here: From Signals to Reliability: SLOs, Runbooks and Post-Mortems
What practices have helped your team move from reactive firefighting to proactive reliability?
https://redd.it/1ona979
@r_devops
Fatih Koç
From Signals to Reliability: SLOs, Runbooks and Post-Mortems
Build reliability with SLOs, runbooks and post-mortems. Turn observability into systematic incident response and learning. Practical examples for Kubernetes environments.
How are you enforcing code-quality gates automatically in CI/CD?
Right now our CI just runs unit tests. We keep saying we’ll add coverage and complexity gates, but every time someone tries, the pipeline slows to a crawl or throws false positives. I’d love a way to enforce basic standards - test coverage > 80%, no new critical issues - without babysitting every PR.
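One common low-overhead pattern is a tiny gate script that runs right after the tests and fails the job when coverage drops below the bar. A sketch, assuming a Cobertura-style coverage.xml (the format coverage.py and many other tools can emit); the threshold and file name are placeholders:

```python
#!/usr/bin/env python3
"""Fail the CI job if line coverage in a Cobertura-style coverage.xml is below a threshold."""
import sys
import xml.etree.ElementTree as ET

THRESHOLD = 0.80  # 80% line coverage

def main(path: str = "coverage.xml") -> int:
    root = ET.parse(path).getroot()
    line_rate = float(root.get("line-rate", 0.0))  # Cobertura stores coverage as a 0..1 ratio
    print(f"line coverage: {line_rate:.1%} (required: {THRESHOLD:.0%})")
    return 0 if line_rate >= THRESHOLD else 1

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "coverage.xml"))
```

Run it as the last step of the test job; a non-zero exit fails the pipeline without adding any scanner runtime.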
https://redd.it/1onb20l
@r_devops
Combining code review and SAST results - possible?
Security runs their scans separately, devs review manually, and we’re constantly duplicating effort. Ideally, reviewers should see security warnings inline with the code diff. Has anyone achieved that?
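One way this is commonly done is to convert the scanner's SARIF output into inline review comments on the merge/pull request. A rough sketch against the GitHub pull-request review-comment API (repo, PR number, and token handling are placeholders; other platforms have equivalent APIs):

```python
"""Sketch: surface SAST findings as inline PR review comments.
Assumes a SARIF report from the scanner and a GitHub token in GITHUB_TOKEN."""
import json
import os
import requests

REPO = "org/app"            # placeholder
PR_NUMBER = 123             # placeholder
COMMIT_SHA = "<head-sha>"   # the PR head commit the comments attach to

def sarif_findings(path: str = "sast.sarif"):
    """Yield (file, line, message) tuples from a SARIF 2.1.0 report."""
    with open(path) as fh:
        sarif = json.load(fh)
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                phys = loc["physicalLocation"]
                yield (
                    phys["artifactLocation"]["uri"],
                    phys["region"].get("startLine", 1),
                    result["message"]["text"],
                )

for path, line, message in sarif_findings():
    # One inline review comment per finding, anchored to the changed line.
    requests.post(
        f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/comments",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={"body": f"SAST: {message}", "commit_id": COMMIT_SHA,
              "path": path, "line": line, "side": "RIGHT"},
        timeout=10,
    )
```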
https://redd.it/1ona5yo
@r_devops
Anyone using AI for pull-request reviews yet?
Copilot is fine for writing code, but it doesn’t help during reviews. I’m wondering if anyone has used AI that can actually review a PR - like summarize changes, highlight risky logic, or point out missing edge cases.
https://redd.it/1onfv66
@r_devops
PostMessage Vulnerabilities: When Cross-Window Communication Goes Wrong 📬
https://instatunnel.my/blog/postmessage-vulnerabilities-when-cross-window-communication-goes-wrong
https://redd.it/1onbbod
@r_devops
InstaTunnel
PostMessage Vulnerabilities: Cross-Window Security Risks
Learn how improper postMessage usage enables XSS and token exfiltration, plus strict origin checks and mitigation recipes to secure cross-window communication.
AI is a Corporate Fad where I work
The title says it all. In my workplace (a big company) we have non-technical decision makers asking for integrations of technology that they don't understand with existing technologies that they don't understand. What could go wrong financially?
My only hope is that this fad replaces the existing fad of hiring swaths of inexpensive out of town engineers to provide "top notch" solution design that falls flat at the implementation phase.
What's your experience?
https://redd.it/1onilgi
@r_devops
Gprxy: Go based SSO-first, psql-compatible proxy
https://github.com/sathwick-p/gprxy
Hey all,
I built a PostgreSQL proxy for AWS RDS. The reason I wrote it is that the usual way to access and run queries on RDS is through database users, and in bigger organizations it's impractical to maintain separate DB users for every user or team. Yes, IAM authentication exists in RDS for this same reason, but I personally didn't find it the best option, since it requires a fair amount of configuration and changes on the RDS side.
The idea is that by connecting via this proxy, you just run a login command that does an SSO-based login, authenticating you through an IdP like Azure AD before connecting to the database. It also gives me user-level audit logs.
I had been looking for an open-source solution but couldn't find one, so I rolled my own. It's currently deployed and in use via Kubernetes.
Please check it out and let me know if you find it useful or have feedback, I’d really appreciate hearing from y'all.
Thanks!
https://redd.it/1oni3df
@r_devops
Just got $5K AWS credits approved for my startup
Didn’t expect this to still work in 2025, but I just got **$5,000 in AWS credits** approved for my small startup.
We’re not in YC or any accelerator, just a verified startup with:
* a **website**
* a **business email**
* and an actual product in progress
It took around 2–3 days to get verified, and the credits were added directly to the AWS account.
So if you’re building something and have your own domain, there’s still a valid path to get AWS credits even if you’re not part of Activate.
If anyone’s curious or wants to check if they’re eligible, DM me I can share the steps.
https://redd.it/1onmg20
@r_devops
Migrating from Octopus Deploy to GitLab. What are the pros and cons?
Due to reasons I won't get into, we might need to move from Octopus Deploy to GitLab for CI/CD. I'm trying to come up with some pros and cons so I can convince management to keep Octopus (despite the cost). Here are some of the pros of Octopus that I have listed:
Release management.
If we need to roll back to a previously functioning version of our code, we can simply click on the previous release and then leisurely work on fixing the problem (sometimes issues aren't visible in QA or Staging). GitLab doesn't seem to have this.
Script Console
Octopus lets us send commands (e.g., iisreset) to an entire batch of VMs in one shot, instead of having to write something that loops through a list of VMs or, God forbid, remoting into each VM manually. GitLab doesn't seem to have that either (a rough fan-out sketch follows at the end of this post). This comes in really handy when we need to quickly run a task in the middle of an outage.
Variable Management and Substitution
Scoping variables with different values seems to be handled much better in Octopus compared to GitLab. Also, I could not find anything that says you can do variable substitution in your code for files like .config or .json. No .NET variable substitution in GitLab either.
Pipeline Design
GitLab pipelines seem to be all YAML, which means a lot of the tasks that Octopus does for you, like IIS configuration, Kubernetes deployments, etc., will have to be scripted from scratch. (Correct me if I'm wrong on this.)
These are some of the pros of Octopus I could think of. Are there any more I can use to back up my argument?
Also, is there anyone who went through the same exercise? What is your experience using GitLab after having Octopus for a while?
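On the Script Console point above, the usual stopgap in a runner-only CI system is a small fan-out script. A rough sketch, assuming key-based SSH access to the VMs and a plain-text host list (the inventory file name and default command are placeholders):

```python
"""Rough stand-in for a script console: fan a one-off command out to a list of
VMs over SSH (assumes key-based SSH access and a plain-text host inventory)."""
import subprocess
import sys

def run_everywhere(command: str, inventory: str = "hosts.txt") -> int:
    """Run `command` on every host in the inventory; return the number of failures."""
    with open(inventory) as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    failures = 0
    for host in hosts:
        print(f"--- {host} ---")
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True,
        )
        print(result.stdout or result.stderr)
        failures += result.returncode != 0
    return failures

if __name__ == "__main__":
    sys.exit(run_everywhere(" ".join(sys.argv[1:]) or "uptime"))
```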
https://redd.it/1onlv3s
@r_devops
Curious how folks are feeling about ethics in tech, given the current political climate?
I'm asking the question in this sub on the basis that people have to have a reasonable level of experience to be in this field across disciplines. (I did the helpdesk - Sysad - DevOps route, for context I started that journey in the late 90s).
If not allowed (I understand politics is a sensitive subject) I will completely understand mods removing the post.
I'm between contracts at the moment, and in the last decade at least, whenever I'm not working I get offers to work for "gaming" (gambling) sites that I always turn down... I have a number of friends who wrecked their lives through gambling addiction - I wouldn't feel comfortable taking that paycheck.
(I'm not shitting on people who do, I get it. No judgement... Just a personal thing on my part).
I recently watched a pretty in-depth breakdown of the facial recognition AI stuff being trialled in the US (additional context, I'm not American nor do I live there anymore, but it's fair to assume it's coming everywhere soon), but I have previously worked for a number of US companies, including at least one that I know is involved in stuff that makes me feel pretty uncomfortable about the way the technology is progressing, and importantly, the things it is being used for.
I suppose I'm speaking more to the greybeards and grey hats in this community - but I'm curious to gauge how folks are feeling about developing and supporting this kind of thing?
https://redd.it/1onpv8x
@r_devops
The APM paradox
I've recently been thinking about how to get more developers (especially on smaller teams) to adopt observability practices, and put together some thoughts about how we're approaching it at the monitoring tool I'm building. We're a small team of developers who have been on-call for critical infrastructure for the past 13 years, and have found that while "APM" tools tend to be more developer-focused, we've generally found logging to be more essential for our own systems (which led us to build a structured logging tool that encourages wide events).
I'm curious what y'all think — how can we encourage more developers to learn about observability?
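For readers unfamiliar with the "wide events" idea mentioned above, here is a minimal sketch of emitting one structured, queryable event per unit of work (all field names are illustrative):

```python
"""Minimal 'wide event' logging sketch: emit one structured JSON record per
request with everything you might want to query later (field names illustrative)."""
import json
import logging
import time

logger = logging.getLogger("app")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def handle_request(user_id: str, path: str) -> None:
    started = time.monotonic()
    event = {"event": "http_request", "user_id": user_id, "path": path}
    try:
        # ... actual request handling would go here ...
        event["status"] = 200
    except Exception as exc:            # capture the failure in the same event
        event["status"] = 500
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = round((time.monotonic() - started) * 1000, 2)
        logger.info(json.dumps(event))  # one queryable line per request

handle_request("u_42", "/checkout")
```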
https://www.honeybadger.io/blog/apm-paradox/
https://redd.it/1onmrnj
@r_devops
Honeybadger Developer Blog
The APM paradox: Too much data, too few answers
Most dev teams don't need 47 dashboards or petabytes of logs—they need answers at 2 AM. Explore the evolution from APM to observability.
Why do cron monitors act like a job "running" = "working"?
Most cron monitors are useless if the job executes but doesn't do what it's supposed to. I don't care if the script ran. I care if:
- it returned an error
- it output nothing
- it took 10x longer than usual
- it "succeeded" but wrote an empty file
All I get is "✓ ping received" like everything's fine.
Anything out there that actually checks exit status, runtime anomalies, or output sanity? Or does everyone just build this crap themselves?
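The DIY answer most teams land on is a thin wrapper that only reports success when those conditions actually hold. A rough sketch; the ping URL, thresholds, and output file are placeholders, and some ping-style monitors accept a /fail suffix for failures (adjust to whatever your monitor expects):

```python
#!/usr/bin/env python3
"""Wrapper that treats a cron job as healthy only if it exits 0, produces output,
finishes within a sane time, and writes a non-empty result file.
All thresholds, paths, and the ping URL are placeholders."""
import subprocess
import sys
import time
import urllib.request
from pathlib import Path

PING_URL = "https://hc.example.com/ping/<job-id>"   # placeholder health-check endpoint
MAX_RUNTIME_S = 600
OUTPUT_FILE = Path("/var/data/report.csv")

def main() -> int:
    if len(sys.argv) < 2:
        print("usage: cron_wrapper.py <command> [args...]", file=sys.stderr)
        return 2
    started = time.monotonic()
    proc = subprocess.run(sys.argv[1:], capture_output=True, text=True)
    runtime = time.monotonic() - started

    ok = (
        proc.returncode == 0
        and proc.stdout.strip() != ""
        and runtime <= MAX_RUNTIME_S
        and OUTPUT_FILE.exists() and OUTPUT_FILE.stat().st_size > 0
    )
    # Report failure to the monitor when any check fails, not just non-zero exit.
    urllib.request.urlopen(PING_URL if ok else PING_URL + "/fail", timeout=10)
    return proc.returncode if proc.returncode else (0 if ok else 1)

if __name__ == "__main__":
    sys.exit(main())
```

Invoke it from cron as `cron_wrapper.py /path/to/job.sh args...` instead of calling the job directly.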
https://redd.it/1onrwrl
@r_devops
Custom Podman Container Dashboard?
I have a bunch of Docker containers (well, technically Podman containers) running on a Linux node, and it's getting to the point where it's annoying to keep track of all of them. I have all the necessary identifying information (like requestor, poc, etc.) added as labels to each container.
I'm looking for a way to create something like a dashboard to present this information like Container name, status, label1, label2, label3 in a nice tabular form.
I've already experimented with Portainer and Cockpit but couldn't really create a customized view per my needs. Does anyone have any ideas?
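If a purpose-built dashboard doesn't fit, one low-effort option is to render the table yourself from podman's JSON output. A sketch, assuming `podman ps --format json`; the label keys shown are placeholders for whatever you actually attach:

```python
"""Print a quick table of containers and selected labels from
`podman ps --format json` (label keys like 'requestor' and 'poc' are placeholders)."""
import json
import subprocess

LABEL_KEYS = ["requestor", "poc", "project"]   # whatever labels you attach

out = subprocess.run(
    ["podman", "ps", "--all", "--format", "json"],
    capture_output=True, text=True, check=True,
).stdout

rows = []
for c in json.loads(out):
    labels = c.get("Labels") or {}
    name = (c.get("Names") or ["?"])[0]
    rows.append([name, c.get("State", "?")] + [labels.get(k, "-") for k in LABEL_KEYS])

# Simple fixed-width rendering; feed the same rows into an HTML template if
# you want a browsable page instead of terminal output.
header = ["NAME", "STATE"] + [k.upper() for k in LABEL_KEYS]
widths = [max(len(str(r[i])) for r in rows + [header]) for i in range(len(header))]
for row in [header] + rows:
    print("  ".join(str(v).ljust(w) for v, w in zip(row, widths)))
```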
https://redd.it/1onsszc
@r_devops
How do you size VPS resources for different kinds of websites? Looking for real-world experience and examples.
I’m trying to understand how to estimate VPS resource requirements for different kinds of websites — not just from theory, but based on real-world experience.
Are there any guidelines or rules of thumb you use (or a guide you’d recommend) for deciding how much CPU, RAM, and disk to allocate depending on things like:
* Average daily concurrent visitors
* Site complexity (static site → lightweight web app → high-load dynamic site)
* Whether a database is used and how large it is
* Whether caching or CDN layers are implemented
I know “it depends” — but I’d really like to hear from people who’ve done capacity planning for real sites:
What patterns or lessons did you learn?
* What setups worked well or didn’t?
* Any sample configurations you can share (e.g., “For a small Django app with ~10k daily visitors and caching, we used 2 vCPUs and 4 GB RAM with good performance.”)?
I’m mostly looking for experience-based insights or reference points rather than strict formulas.
Thanks in advance!
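Not a substitute for the experience reports being asked for, but this is the kind of back-of-the-envelope arithmetic such estimates usually start from; every input below is an assumption to replace with real measurements:

```python
# Back-of-the-envelope VPS sizing; every input is an assumption to replace
# with real measurements (load tests, APM data, etc.).
daily_visitors = 10_000
pages_per_visit = 4
peak_factor = 5                 # peak traffic vs. the flat daily average
avg_response_s = 0.3            # measured app response time
worker_rss_mb = 150             # memory per app worker/process

requests_per_day = daily_visitors * pages_per_visit
avg_rps = requests_per_day / 86_400
peak_rps = avg_rps * peak_factor
workers_needed = peak_rps * avg_response_s          # Little's-law-style estimate
app_ram_mb = workers_needed * worker_rss_mb

print(f"avg: {avg_rps:.2f} rps, peak: {peak_rps:.1f} rps")
print(f"~{workers_needed:.1f} busy workers at peak, ~{app_ram_mb:.0f} MB app RAM")
# Leaves out the DB, cache, and OS overhead - add those (plus headroom) on top.
```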
https://redd.it/1onlpxe
@r_devops
Dudes, I'm scared. I know it's a scam, but what if it's not? Have you ever received an email like this before, and how did it go?
Dudes, I'm just a hobbyist (if you look at my LinkedIn, you'll notice I'm not a programmer). I've learned programming, algorithms, and design patterns all by myself. I also publish articles on my Medium blog, documenting the concepts I've learned from my reading and online study.
I recently discovered an email written in Chinese in my Gmail spam folder and have translated its contents using Google Translate.
---
Tonghuashun AIME Program Invitation
Hello,
I am XXXXX, HR from Hexin Tonghuashun (Stock Code: 300033). We noticed your excellent background in development on GitHub, which highly matches the requirements for our AI Engineering Development position. The specific focus areas include, but are not limited to, algorithm application, algorithm engineering, and large AI model development.
Core Advantages of the Position
- Salary benchmarked against top-tier tech companies.
- Listed company stock incentives provided through the AIME Talent Double Hundred Plan.
- Participation in the R&D of AI products with millions of users.
- Tech Stack: Engineering (Java/Web/C++/etc.) and cutting-edge algorithms (Large Models/NLP/AIGC/Robotics/Speech/etc.).
Company Profile
Zhejiang Hexin Tonghuashun Network Information Co., Ltd. (Tonghuashun), established in 1995 and listed on the Shenzhen Stock Exchange in 2009 (Stock Code: 300033), is the first listed company in China's internet financial information services industry. We currently have over 7,000 employees, with our headquarters located in the beautiful and livable city of Hangzhou.
As an internet financial information provider, Tonghuashun's main business is to offer software products and system maintenance services, financial data services, and intelligent promotion services to various institutions, and to provide financial information and investment analysis tools to individual investors. To date, Tonghuashun has nearly 600 million registered users and over ten million daily active users. We have established business cooperation with over 90% of domestic securities companies, with a strong "moat" business ensuring stable cash flow for the company.
Supported Business
Based on comprehensive AI capabilities such as large models, NLP, speech, graphics, image, and vision, we currently cover multiple 2B and 2C application scenarios. Our numerous products include the intelligent investment advisory robot AIME, intelligent service, data intelligence, smart healthcare, AIGC, and digital humans. Targeting various regions including China, Europe and the US, the Middle East, and Southeast Asia, we are progressively realizing the path of technology commercialization and product marketization. The AI team has accumulated over ten years of experience, with hundreds of large model application scenarios implemented, trillions of user financial dialogue data points, and a self-built cluster of thousands of cards for computing power. We are one of the first domestic enterprises to receive cybersecurity administration approval for financial large models.
I look forward to discussing this further with you! You can contact me via:
- WeChat/Phone: XXXX
- Email: XXXX
If you are interested, please feel free to contact me at any time, and I will arrange a detailed conversation for you.
Wishing you the best of business!
HR XXX | Zhejiang Hexin Tonghuashun Network Information Co., Ltd.
Company Website: https://www.10jqka.com.cn/
----
OK, now I don't know what to do. I know this could be spam, but what if it isn't? I mean, the links look real.
Here is my GitHub if you're interested in what they've seen: https://github.com/EDBCREPO
Have you ever received an email like this before?
----
EDIT: f@@k, the link and the job look real: https://campus.10jqka.com.cn/job/list?type=school
https://redd.it/1ony2tl
@r_devops
How are you handling these AWS ECS (Fargate) issues? Planning to build an AI agent around this…
Hey Experts,
I’m exploring the idea of building an AI agent for AWS ECS (Fargate + EC2) that can help with some tricky debugging and reliability gaps — but before going too far, I’d love to hear how the community handles these today.
**Here are a few pain points I keep running into 👇**
* When a process slowly eats memory and crashes — and there’s no way to grab a heap/JVM dump *before* it dies.
* Tasks restart too fast to capture any “pre-mortem” evidence (logs, system state, etc.).
* Fargate tasks fill up ephemeral disk and just get killed, no cleanup or alert.
* Random DNS or network resolution failures that are impossible to trace because you can’t SSH in.
* A new deployment “passes health checks” but breaks runtime after a few minutes.
**I’m curious**
* Are you seeing these kinds of issues in your ECS setups?
* And if so, how are you handling them right now — scripts, sidecars, observability tools, or just postmortems? (A rough example sketch follows below.)
Would love to get insights from others who’ve wrestled with this in production. 🙏
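On the "scripts + postmortems" option, here is a rough boto3 sketch of the minimum many teams script first: pulling stop reasons and container exit codes from recently stopped tasks before ECS stops reporting them (the cluster and service names are placeholders):

```python
"""Sketch: collect 'pre-mortem' evidence from recently stopped ECS tasks.
Cluster/service names are placeholders; assumes boto3 credentials are configured.
Stopped tasks are only retained briefly, so run this soon after a restart or on a schedule."""
import boto3

ecs = boto3.client("ecs")
CLUSTER, SERVICE = "prod-cluster", "checkout-service"   # placeholders

stopped = ecs.list_tasks(cluster=CLUSTER, serviceName=SERVICE,
                         desiredStatus="STOPPED")["taskArns"]
if stopped:
    # describe_tasks accepts at most 100 task ARNs per call.
    for task in ecs.describe_tasks(cluster=CLUSTER, tasks=stopped[:100])["tasks"]:
        print(task["taskArn"])
        print("  stopCode:", task.get("stopCode"), "| reason:", task.get("stoppedReason"))
        for container in task.get("containers", []):
            print(f"  {container['name']}: exit={container.get('exitCode')} "
                  f"reason={container.get('reason')}")
```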
https://redd.it/1onzb40
@r_devops