CatOps – Telegram
DevOps and other issues by Yurii Rochniak (@grem1in) - SRE @ Preply && Maksym Vlasov (@MaxymVlasov) - Engineer @ Star. Opinions on our own.

We do not post ads including event announcements. Please, do not bother us with such requests!
You may have seen this news already, but it's still worth sharing in my opinion.

Terraform adds a new language concept called "actions". It's interesting, because while HCL has evolved over time, its basic concepts such as resources, data sources, variables, and locals have remained the same.

In a nutshell, actions are basically provider-approved local-execs: provider developers can add specific imperative tasks, such as invoking an AWS Lambda, that you can later attach to your resource's lifecycle.

The concept is not new: you were likely already doing this with provisioners, local and remote execs, and other hacks. Now, it's just more universal and straightforward.

The only catch is that actions are up to provider developers to implement, so do not expect many of them to be available from the get-go.

#terraform #hashicorp
There’s a lot of fluff around about how one should be a technical leader, yet the exact understanding of what leadership is remains rather vague.

Here’s a nice Friday read on the topic that resonated with me.

Here’s a great quote:

>>>
The more comfortable you become with not being the expert, the more effective you become as a leader.
When you stop trying to out-expert the experts, you can focus on what expert teams actually need:
- Clear problem definitions
- Context for decision-making
- Translation between different perspectives
- Protection from unnecessary complexity
- Space to do their best work

#culture
​​It’s Monday, fellas.

A friend of mine is raising money for a Starlink and 3 power stations for the 5th Separate Heavy Mechanized Brigade and 33rd Separate Mechanized Brigade.

You can donate via this Monobank jar (Apple Pay works):

https://send.monobank.ua/jar/3VEQNLAcia

Almost half of the goal is already there.

#donations #Monday
I like it when people talk about definitions, especially for common / widely used terms. The more common a term is, the more one is "afraid" to ask about it. And since we cannot glimpse into other people's minds, we can end up talking about completely different things using the same words.

Availability Models talks about the definitions of "high availability" - an incredibly popular term in computer science! It doesn't examine all the availability models, despite its name. Rather, this article raises a question: so, what the heck is "high availability" and how can we define it based on our actual needs?
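
One common way to make "high availability" concrete is to turn a target percentage into a downtime budget. Here is a minimal sketch of that arithmetic. It is not taken from the article; the function name and the simplified 365-day year are my own assumptions.

```python
# Assumption: a 365-day year, leap years ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return how many minutes of downtime a given availability target allows.

    Hypothetical helper for illustration only.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (1 - availability_percent / 100)


# Print the yearly downtime budget for a few typical targets.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Each extra "nine" shrinks the budget tenfold, so it's worth asking which number your use case actually needs before promising one.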

P.S. Also, I didn't know about the PACELC theorem. It always feels so cool to learn something new!

#databases #programming #system_design
Nice weekend read about technical blogging.

https://writethatblog.substack.com/p/thorsten-ball-on-technical-blogging

One of the best ways to learn is to try to explain whatever you have learned to someone else. Blogging is a form of that.

Perhaps we can dedicate a day to sharing the blogs we like here on CatOps. Let me know if you think it’s a good idea.

#culture
​​Since AWS has an outage, some of you have unplanned time off anyway. So, it's a good time to make a donation to a noble cause!

A friend of mine is running a supporting jar for INSCIENCE, which has partnered with the Come Back Alive foundation to raise money to combat enemy UAVs.

https://send.monobank.ua/jar/fKfgmjgw1

Her goal is 20k UAH, so we can easily achieve it!

P.S. Apparently, Monobank is hosted in AWS, since their web page did not refresh once I sent the donation.
P.P.S. Despite N26 also being on AWS, they processed the transaction just fine.

#donations #Ukraine
​​Speaking of AWS:

Oct 20 2:01 AM PDT We have identified a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region. Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. We are working on multiple parallel paths to accelerate recovery. This issue also affects other AWS Services in the US-EAST-1 Region. Global services or features that rely on US-EAST-1 endpoints such as IAM updates and DynamoDB Global tables may also be experiencing issues. During this time, customers may be unable to create or update Support Cases. We recommend customers continue to retry any failed requests. We will continue to provide updates as we have more information to share, or by 2:45 AM.

#aws #outage
Helm graduated into a "boring technology" a long time ago. So, the news about the upcoming v4 major version is probably not so exciting, unless you are a Helm plugin developer.

Anyway, here's a sneak peek of upcoming changes. They promise that:

>>>
*Bottom Line*: Helm v4.0 is a major architectural upgrade focused on better Kubernetes integration, enhanced plugin capabilities, and improved developer experience while maintaining chart compatibility.


So, if you only care about your charts, those should continue to work.

#helm #kubernetes
I don't know if you are involved in the promotion cycles in your company or participate in hiring, but I still want to share this article with you - On Hiring: Promote Stars, Not Strangers.

It is targeted towards managers, but you can easily apply the core idea of this article to any role or position in a company.

#culture
A glimpse into the alerting infrastructure of Hugging Face - a repository for ML models and datasets.

This article has a bit of a "they made me write this for promotion" vibe, but it's still interesting to see what technologies other people use, even if they don't dig deep into any of them.

#observability
Reddit engineers made a long post about their DevEx survey, which they shared... on Reddit.

This is a nice read, if you'd like to learn how different companies evaluate developer productivity and satisfaction.

A few interesting points there:

- They don't use any specialized SaaS tools for this survey - only Typeform and Looker.
- The survey is quite long, but the participation seems to be good.
- They include various topics in the survey, not just pure DX metrics. They also changed the name of the survey to reflect that.
- Teams can use the results on their own to calibrate their decisions.

#dx #reddit
OneUptime has published an update two years after moving off AWS.

It's an interesting read, and the tl;dr is that they do not regret their decision at all. They do admit, though, that they still use the cloud for dev environments, cold storage, etc.

Here are a few points that I find interesting:

- "Our workload is 24/7 steady" / "We still recommend staying put if your usage pattern is spiky or seasonal" - right sizing is one of the major advantages of clouds that is often overlooked.
- "We still recommend staying put if you lean heavily on managed services" - this is another important point. Managed services add a lot of value to the clouds. It does seem a bit dumb to use AWS just like an expensive datacenter. On the other hand, if you want to be able to do multi-cloud, hybrid-cloud, etc., you have to make a deliberate decision to stay as decoupled from cloud offerings as possible. It's a deliberate strategy that trades immediate value for flexibility.
- "Ceph stack in production" - I'm sure Ceph has evolved a lot through the years, but I still remember the words of a colleague of mine from a long time ago: "We didn't lose the data, we just cannot retrieve it". Back then, we decided to keep on-premise installations with ephemeral disks and ship all the data that had to be preserved into AWS (it's not like there was a lot of data to preserve there, though).
- "so we added Anycast ingress via BGP with our transit provider to cut traffic shifting to sub-minute" and "We PXE boot with Tinkerbell" - ask about Anycast in an interview, and the same day you will get an angry post on Reddit about unreasonable questions and lowballing candidates, lol. Or maybe it's just my pessimism speaking.

Anyway, use your best judgment before making any rash moves. BTW, this advice is generally applicable and is not limited to the cloud vs DC discussions.

#aws #bare_metal
​​For today’s Donations Monday, I’d like to share with you a fundraiser for FPV drones from DeepState - a collective behind the close-to-real-time battlefield maps.

https://send.monobank.ua/jar/9AtiB8esqu

#donations #Ukraine
More follow-ups for the AWS outage (Azure outage didn't generate that much press).

Lorin Hochstein analyzes the postmortem from the complexity point of view and comes to quite interesting conclusions that you can absolutely apply to your incidents and postmortems as well.

The tl;dr is that incidents (especially bigger ones) are often unique. So, when reasoning about preventive measures, you need not only to prevent similar incidents, but also to be prepared to handle incidents in general, because the next incident may not be the same as the present one.

#reliability #sre #aws
An article by Charity Majors on why thinking of Observability in pillars is limiting.

I recall a similar article from the past about how Facebook does their observability. It’s somewhere here on the channel.

The core idea is to treat all the signals as universal wide events that would allow one to preserve all the context and not hop between different tools.
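
To illustrate the idea, here is a minimal sketch of what a "wide event" could look like in plain Python. It's not code from the article: the helper names and all the field names (`service`, `route`, `trace_id`, etc.) are hypothetical. The point is that one record per unit of work carries the whole context, and metric-like numbers are derived from those records later.

```python
import json
import time


def build_wide_event(service: str, route: str, status: int,
                     duration_ms: float, **context) -> dict:
    """Build one structured event describing a whole unit of work.

    Hypothetical helper: any extra context (user, region, trace id, ...)
    is stored on the same record instead of in a separate tool.
    """
    event = {
        "timestamp": time.time(),
        "service": service,
        "route": route,
        "status": status,
        "duration_ms": duration_ms,
    }
    event.update(context)
    return event


def error_rate(events: list[dict]) -> float:
    """Derive a metric from stored events instead of emitting it separately."""
    if not events:
        return 0.0
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events)


events = [
    build_wide_event("checkout", "/pay", 200, 41.5,
                     user_id="u1", region="eu-west-1"),
    build_wide_event("checkout", "/pay", 502, 950.0,
                     user_id="u2", region="eu-west-1", trace_id="t-42"),
]

# The failing request keeps all of its context in one place.
print(json.dumps(events[1], indent=2))
print(error_rate(events))
```

In a real setup the events would go to a store that can slice by any field, but the shape of the data is the important part here.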

#observability