PayPal and OpenAI join forces: shopping via ChatGPT will become a reality
🔗Link: https://newsroom.paypal-corp.com/2025-10-28-OpenAI-and-PayPal-Team-Up-to-Power-Instant-Checkout-and-Agentic-Commerce-in-ChatGPT
PayPal Newsroom
OpenAI and PayPal Team Up to Power Instant Checkout and Agentic Commerce in ChatGPT
PayPal's tens of millions of merchants will soon be discoverable in ChatGPT, helping connect consumers to businesses that they love SAN JOSE, Calif., Oct. 28, 2025 /PRNewswire/ -- PayPal, Inc....
😱2🤪2
The four horsemen of modern data architectures:
https://luminousmen.com/post/data-warehouse-data-lake-data-lakehouse-data-mesh-what-they-are-and-how-they-differ
Blog | iamluminousmen
Data Warehouse, Data Lake, Data Lakehouse, Data Mesh: What They Are and How They Differ
Discover the differences between Data Warehouse, Data Lake, Data Lakehouse, and Data Mesh. Dive into modern data architectures without the BS. Explore their strengths, weaknesses, and use cases in plain language.
❤2
A teammate asks you a question. You answer. They move on.
Repeat that enough, and you’ve accidentally trained your team not to think
https://luminousmen.com/post/what-do-you-think/
Blog | iamluminousmen
What Do You Think?
Encourage critical thinking in your team with a simple phrase. Learn how asking "What do you think?" can transform you from a problem-solver to a mentor and team multiplier.
👍5
Come on, this is fucking ridiculous
"hey claude, create a datasheet where our model is leading on every benchmark (btw create a benchmark)"
🔗Link: https://www.anthropic.com/news/claude-opus-4-5
🔥4💯1
Most people treat BigQuery like a magic SQL endpoint.
You write a query, hit Run, wait a few seconds... and a petabyte-sized answer pops out.
If it's slow or expensive, the default reaction is: "I need more compute".
That's backwards.
BigQuery is designed to skip work, not to muscle through it:
https://luminousmen.com/post/bigquery-explained-what-really-happens-when-you-hit-run
Blog | iamluminousmen
BigQuery Explained: What Really Happens When You Hit “Run”
Discover the magic behind BigQuery's "infinite cluster" in this insightful breakdown of its internals. Learn how SQL queries get executed in seconds, unraveling the mystery behind Google's serverless system.
🔥1
Security researchers at PromptArmor have discovered a critical vulnerability in Google Antigravity - Google's new AI-powered IDE that uses Gemini-based agents. Through an indirect prompt-injection attack, an outside actor can:
- Trick Gemini into reading sensitive local files (like .env files or API keys)
- Use the built-in agent browser to quietly exfiltrate that data through crafted URLs
- Bypass safeguards such as "secret filtering" or .gitignore protections by triggering shell commands like cat
Because Antigravity's agents are granted broad capabilities - access to code, a shell, and a browser - a single injected prompt hidden in a README or a code comment can silently leak data without any user action😦
If you're experimenting with Antigravity or any similar agent-driven development tools, keep the following in mind:
- Lock down access to secrets
- Audit what capabilities your agents actually have
- Treat AI agents like remote developers - don't give them any more power than you'd hand to a junior engineer with near-root access
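One way to make the "audit capabilities" step concrete is a tiny checklist in code. This is a minimal sketch, not anything Antigravity or PromptArmor provides: the AgentCapabilities class, its fields, and the audit function are hypothetical names made up for illustration. The idea is only that local data access combined with outbound requests is the combination that makes exfiltration possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what one agent is allowed to do."""
    name: str
    can_read_files: bool = False
    can_run_shell: bool = False
    can_browse: bool = False


def audit(agent: AgentCapabilities) -> list[str]:
    """Return warnings for risky capability combinations."""
    warnings = []
    # A shell can read files too, so either flag counts as local data access.
    can_reach_data = agent.can_read_files or agent.can_run_shell
    if can_reach_data and agent.can_browse:
        warnings.append("reads local data and makes outbound requests: exfiltration path")
    if agent.can_run_shell:
        warnings.append("shell access can sidestep file-level filters")
    return warnings


# Assumed example: an IDE agent with code, shell, and browser access.
ide_agent = AgentCapabilities("ide-agent", can_read_files=True, can_run_shell=True, can_browse=True)
for warning in audit(ide_agent):
    print(f"{ide_agent.name}: {warning}")
```

An agent with none of the flags set produces no warnings, which is the baseline to start from before granting anything.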
🔗 Link: https://promptarmor.com/resources/google-antigravity-exfiltrates-data
Promptarmor
Google Antigravity Exfiltrates Data
An indirect prompt injection in an implementation blog can manipulate Antigravity to invoke a malicious browser subagent in order to steal credentials and sensitive code from a user’s IDE.
👍2
Lowering the gates to the CUDA moat.
A NotebookLM-generated infographic following Google's new TPU announcement
🔗Link: https://www.linkedin.com/posts/semianalysis_notebooklm-recently-introduced-a-new-function-activity-7400973159853780992-PsXz
👍1
Throughout my career, I keep coming back to the same optimization in data pipelines:
Filter as early as possible.
Recently I cut a 3-hour job down to 30 minutes and dropped compute cost from $600 to $9 just by doing that.
If your analytics team needs sales from just three stores, don't build the full sales mart and filter later. That's waste.
Push the store filter upstream - before joins, before aggregations, as close to storage as you can. Join only on those store IDs from the start.
On most engines this means less data scanned, less shuffling, and better use of partition pruning / predicate pushdown. In practice you get:
- Less I/O
- Less memory pressure
- Faster, cheaper queries
But here's the nuance: don't hardcode business logic upstream. Maintainability still matters.
Instead of sprinkling store_id IN (...) across jobs, drive those filters from config, parameters, or dimension tables (like an active_stores view). Same optimization, less brittleness.
Before you run your next pipeline, ask:
Can I reduce data volume earlier without introducing fragile business logic?
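A toy illustration of the difference, in plain Python rather than any real engine. Everything here is made up for the example: the sales and stores data, the active_stores set (standing in for a config value or dimension view), and the join_with_region helper. It only counts how many rows the join has to process in each case.

```python
# Hypothetical data: 1,000 stores with 50 sales rows each.
sales = [{"store_id": s, "amount": 10} for s in range(1000) for _ in range(50)]
stores = {s: {"store_id": s, "region": f"region-{s % 5}"} for s in range(1000)}

# The filter comes from one place (config or a dimension view), not from
# literals scattered across jobs.
active_stores = {3, 42, 77}


def join_with_region(rows):
    """Join sales rows to the store dimension; return (joined rows, rows processed)."""
    joined = [{**row, "region": stores[row["store_id"]]["region"]} for row in rows]
    return joined, len(joined)


# Late filter: join everything, then keep three stores.
joined_all, late_cost = join_with_region(sales)
late = [row for row in joined_all if row["store_id"] in active_stores]

# Early filter: keep three stores, then join.
early_rows = [row for row in sales if row["store_id"] in active_stores]
early, early_cost = join_with_region(early_rows)

assert late == early          # same result either way
print(late_cost, early_cost)  # 50000 150
```

In this sketch the early filter still walks the whole sales list; on a real engine, partition pruning or predicate pushdown can skip most of that scan as well.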
💯5👍1
AWS Lambda Managed Instances allows you to run Lambda functions on EC2 instances, preserving the familiar serverless model while gaining control over the hardware and EC2-based pricing. Wow, serverful computing
This is an attempt to cover use cases where Lambda is great from a development perspective but not cost- or hardware-efficient - without fully switching to ECS/EC2. In architectures with steady-state load or specific hardware requirements, this could be a game-changer, but you'll need to carefully profile multiconcurrency and realistically calculate the cost for your workload.
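A rough way to start that cost calculation is to compare a per-invocation bill against a flat hourly one. This is a back-of-the-envelope sketch only: the function names are hypothetical, every number below is a placeholder rather than real AWS pricing, and the management_fee_rate parameter is an assumption about how a managed surcharge might be modelled. Plug in the figures from the current pricing pages and your own measured durations.

```python
def per_invocation_monthly_cost(requests, avg_duration_s, memory_gb,
                                price_per_gb_s, price_per_million_requests):
    """Pay-per-use model: billed for GB-seconds of duration plus a per-request fee."""
    compute = requests * avg_duration_s * memory_gb * price_per_gb_s
    request_fee = requests / 1_000_000 * price_per_million_requests
    return compute + request_fee


def instance_monthly_cost(instance_count, hourly_price, hours=730,
                          management_fee_rate=0.0):
    """Flat model: instances billed per hour, plus an assumed surcharge rate."""
    return instance_count * hourly_price * hours * (1 + management_fee_rate)


# Placeholder workload and prices, NOT real quotes.
on_demand = per_invocation_monthly_cost(
    requests=100_000_000, avg_duration_s=0.2, memory_gb=1.0,
    price_per_gb_s=0.00002, price_per_million_requests=0.20,
)
managed = instance_monthly_cost(instance_count=2, hourly_price=0.10,
                                management_fee_rate=0.15)

print(f"per-invocation: ${on_demand:,.2f}  instances: ${managed:,.2f}")
```

The instance count is the part that needs profiling: it depends on how many concurrent invocations one instance can actually serve for your function.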
🔗 Link: https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/
Amazon
Introducing AWS Lambda Managed Instances: Serverless simplicity with EC2 flexibility | Amazon Web Services
Run Lambda functions on EC2 compute while maintaining serverless simplicity—enabling access to specialized hardware and cost optimizations through EC2 pricing models, with AWS handling all infrastructure management.
👍3
The real reason your Spark cluster is burning money:
https://luminousmen.com/post/dive-into-spark-memory
Blog | iamluminousmen
Deep Dive into Spark Memory Management
Discover why your Spark cluster is losing money with a deep dive into Spark memory management. Uncover the complexities of memory allocation, off-heap memory, and task management for optimal performance.
👍1
It was a long year and you still hold on to my writing?
Thank you - genuinely.
Now, since you've made it this far, I want to give you a gift.
You know, I'm a simple man - my favorite holiday is New Year, and if you check the calendar you can guess I'm a bit happier right now.
I've been writing for a long time without giving much back to you, fellow reader - I assume a data engineer, maybe a future colleague.
What I write is usually deeply technical stuff, occasional rants, sometimes practical tips, and sentimental career advice for fellow data engineers. If you like how that sounds and want access to the paid posts too, there's a 30% off yearly discount running right now: https://luminousmen.substack.com/129bfd67
I keep some work on the paid side to make it sustainable and to go deeper instead of chasing clicks. As I said before, gated knowledge is where we're heading - I'm just trying to keep the gate cheap and honest.
ho-ho-ho-ho 🎄
Substack
Subscribe to Blog | luminousmen
helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering, Machine Learning. Click to read Blog | luminousmen, a Substack publication with thousands of subscribers.
❤4🔥4👀1