A chance to train on huge piles of retail GPUs, not just on superclusters?
https://huggingface.co/papers/2311.08105
DiLoCo: Distributed Low-Communication Training of Language Models
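The DiLoCo recipe (each worker runs many local optimizer steps, then one communication round applies the averaged parameter delta with outer momentum) can be sketched on a toy quadratic. Assumptions: plain SGD inside and heavy-ball momentum outside stand in for the paper's AdamW/Nesterov pair, and each "worker" here just fits its own target vector.

```python
import numpy as np

def local_steps(theta, target, H=50, lr=0.1):
    # Inner loop: H local gradient steps on this worker's loss
    # ||w - target||^2, with no communication at all.
    w = theta.copy()
    for _ in range(H):
        w -= lr * 2 * (w - target)
    return w

def diloco(theta, worker_targets, outer_rounds=20, outer_lr=0.7, mu=0.5):
    # Outer loop: one communication round per H inner steps. The workers'
    # parameter deltas are averaged and applied with momentum.
    velocity = np.zeros_like(theta)
    for _ in range(outer_rounds):
        deltas = [local_steps(theta, t) - theta for t in worker_targets]
        velocity = mu * velocity + np.mean(deltas, axis=0)
        theta = theta + outer_lr * velocity
    return theta
```

The parameters drift toward the minimizer of the averaged loss while only `outer_rounds` vectors ever cross the network, which is the whole point for retail-GPU swarms.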
System 2 Attention (S2A).
- Soft attention in Transformers is susceptible to irrelevant/biased info
- S2A uses LLM reasoning to generate what to attend to
Improves factuality & objectivity, decreases sycophancy.
https://arxiv.org/abs/2311.11829
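The two-pass scheme is easy to sketch. Here `call_llm` is a placeholder for any chat-completion call (an assumption, not an S2A API), and the prompt wording is illustrative rather than the paper's exact template:

```python
# System 2 Attention (S2A) as a two-pass prompting scheme.
REGENERATE_PROMPT = (
    "Given the following text, extract only the parts that are relevant "
    "and unbiased for answering the question.\n"
    "Text:\n{context}\nQuestion: {question}\nRelevant text:"
)

ANSWER_PROMPT = "Context:\n{context}\nQuestion: {question}\nAnswer:"

def s2a_answer(context: str, question: str, call_llm) -> str:
    # Pass 1: the LLM rewrites the context, dropping irrelevant/biased parts.
    cleaned = call_llm(REGENERATE_PROMPT.format(context=context, question=question))
    # Pass 2: answer using only the regenerated context.
    return call_llm(ANSWER_PROMPT.format(context=cleaned, question=question))
```

Pass 1 spends extra tokens stripping distractors; pass 2 answers from the cleaned context only, which is where the factuality and anti-sycophancy gains come from.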
DARE/MergeLM: Absorbing Abilities from Homologous Models as a Free Lunch
https://github.com/yule-BUAA/MergeLM
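The core DARE trick (Drop And REscale the delta parameters before merging) fits in a few lines of NumPy. A minimal sketch, with the merge simplified to a plain sum of rescaled deltas:

```python
import numpy as np

def dare(delta: np.ndarray, p: float, rng=None) -> np.ndarray:
    """Drop And REscale: randomly zero a fraction p of the delta
    (fine-tuned minus base) parameters and rescale the survivors
    by 1/(1-p), so the expected delta is unchanged."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(delta.shape) >= p   # keep each entry with prob 1-p
    return delta * mask / (1.0 - p)

# Merging two homologous fine-tunes back into one base model:
base = np.zeros(4)
delta_a = np.array([0.2, -0.1, 0.0, 0.3])
delta_b = np.array([-0.1, 0.4, 0.2, 0.0])
merged = base + dare(delta_a, p=0.5) + dare(delta_b, p=0.5)
```

Rescaling by 1/(1-p) keeps the expected delta unchanged, which is why aggressive drop rates can leave the fine-tuned abilities mostly intact.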
In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day.
https://arxiv.org/abs/2304.03442
https://github.com/joonspk-research/generative_agents
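The agents' memory stream is the load-bearing piece of the architecture: memories are retrieved by a blend of recency, importance, and relevance. A minimal sketch with equal weights and a hand-rolled cosine (the exact decay rate and scales here are illustrative, not the paper's tuned values):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieval_score(memory, query_embedding, now, decay=0.995):
    # Recency: exponential decay per hour since last access.
    hours_since_access = (now - memory["last_access"]) / 3600
    recency = decay ** hours_since_access
    importance = memory["importance"] / 10        # model-rated on 1..10
    relevance = cosine(memory["embedding"], query_embedding)
    return recency + importance + relevance       # equal weights for the sketch
```

The top-k memories by this score are fed back into the agent's prompt; reflection periodically distills them into higher-level memories.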
https://github.com/comfyanonymous/ComfyUI
if you are into Stable Diffusion
An insanely detailed LLM inference visualization from Brendan Bycroft:
https://bbycroft.net/llm
https://twitter.com/MistralAI/status/1733150512395038967
beautiful. on friday. even more beautiful.
ChatGPT: sometimes "hallucinates" (guesses details that were not in its training data).
OpenAI: tries to counter that
Google: hold my beer, let’s hallucinate the actual Gemini model presentation!
https://arstechnica.com/information-technology/2023/12/google-admits-it-fudged-a-gemini-ai-demo-video-which-critics-say-misled-viewers/
Ars Technica: "Google's best Gemini AI demo video was fabricated"
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia from Google DeepMind
https://arxiv.org/abs/2312.03664
https://github.com/google-deepmind/concordia
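The underlying pattern is a "game master" model that grounds the agents' free-text actions into a shared world state. A hedged sketch of one simulation step, where `llm` is any text-completion callable and the prompts are illustrative, not Concordia's actual API:

```python
def simulation_step(agents, world_state, llm):
    # One round of generative agent-based modeling with a game master.
    events = []
    for agent in agents:
        # Each agent proposes an action from its memory and the current state.
        action = llm(f"You are {agent['name']}. State: {world_state}. "
                     f"Memories: {agent['memory']}. What do you do?")
        # The game master grounds the action: what actually happens?
        outcome = llm(f"State: {world_state}. {agent['name']} attempts: "
                      f"{action}. Describe the realistic outcome.")
        agent["memory"].append(outcome)
        events.append(outcome)
    # The game master folds all outcomes back into the world state.
    return llm(f"State: {world_state}. Events: {events}. New state:")
```

Grounding lives entirely in the game master, which is what lets the same agents act in physical, social, or digital environments.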
Google has good researchers and not so good product managers, as always.
Loosely related: Terence Tao has been saying "LLMs help me with math" for a while now.
“The FunSearch paper by DeepMind that was used to discover new mathematics is an example of searching through generative patterns and employing evolutionary methods to creatively conjure up new solutions. This is a very general principle that lies at the core of creativity.”
https://www.nature.com/articles/d41586-023-04043-w
Nature: "DeepMind AI outdoes human mathematicians on unsolved problem" (an LLM improves on combinatorics problems inspired by the card game Set)
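The FunSearch loop itself is tiny: an LLM mutates programs, a deterministic evaluator scores them, and the elites seed the next round. A toy sketch, with `llm_mutate` and `evaluate` as caller-supplied placeholders:

```python
def funsearch(seed_program, evaluate, llm_mutate, rounds=10, pop=4):
    # Evolutionary search over programs, with an LLM as mutation operator.
    population = [seed_program]
    for _ in range(rounds):
        # Keep the highest-scoring candidates as parents (elitism).
        parents = sorted(population, key=evaluate, reverse=True)[:pop]
        # The LLM proposes a variant of each parent program.
        children = [llm_mutate(p) for p in parents]
        population = parents + children
    return max(population, key=evaluate)
```

The evaluator is the safeguard: because every candidate is scored by running it, hallucinated programs simply lose the tournament.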
> Unfortunately, too few people understand the distinction between memorization and understanding. It's not some lofty question like "does the system have an internal world model?", it's a very pragmatic behavioral distinction: "is the system capable of broad generalization, or is it limited to local generalization?"
-- a thread from François Chollet
> by popular demand: a starter set of papers you can read on the topic.
"Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks": https://arxiv.org/abs/2311.09247
"Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve": https://arxiv.org/abs/2309.13638
"Faith and Fate: Limits of Transformers on Compositionality": https://arxiv.org/abs/2305.18654
"The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'": https://arxiv.org/abs/2309.12288
"On the Measure of Intelligence": https://arxiv.org/abs/1911.01547 -- not about LLMs, but it provides context and grounding on what it means to be intelligent and on the nature of generalization. It also introduces an intelligence benchmark (ARC) that remains completely out of reach for LLMs. Ironically, the best-performing LLM-based systems on ARC are those trained on tons of generated tasks, hoping to hit some overlap between the test-set tasks and the generated tasks -- LLMs have zero ability to tackle a genuinely new task.
In general there's a new paper documenting the lack of broad generalization capabilities of LLMs every few days.
The "noisy TV problem" is solvable by introducing yet another level of abstraction :)
Curiosity-Driven Exploration via Latent Bayesian Surprise
https://arxiv.org/abs/2104.07495
More on the topic: https://lilianweng.github.io/posts/2020-06-07-exploration-drl/
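A minimal sketch of why the extra abstraction level helps: the intrinsic reward is the KL divergence between the agent's belief over a latent variable after an observation (posterior) and before it (prior). A noisy TV changes pixels but not beliefs, so it earns no bonus. The two-state beliefs below are toy values for illustration:

```python
import math

def kl_divergence(posterior, prior):
    # KL(posterior || prior) over a discrete latent variable.
    return sum(p * math.log(p / q) for p, q in zip(posterior, prior) if p > 0)

prior = [0.5, 0.5]
posterior_informative = [0.9, 0.1]   # observation shifted the belief
posterior_noise = [0.5, 0.5]         # pure noise teaches nothing

reward_informative = kl_divergence(posterior_informative, prior)
reward_noise = kl_divergence(posterior_noise, prior)
```

Surprise in latent space, rather than in pixel space, is what lets the agent ignore unlearnable randomness while still chasing genuinely new structure.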
a nice take on AI/ML from outside the Valley
https://wz.ax/bridgewater-on-ai
Bridgewater: "Assessing the Implications of a Productivity Miracle" (what happens when cognitive tasks can be done at zero marginal cost?)
oh interesting. makes sense, as all addictions are driven by the same neurochemicals
https://twitter.com/tenobrus/status/1738364449122357365
it turns out ozempic is also the cure for doomscrolling and tiktok
> In this study, we show that when aiming for limited precision, existing approximation methods can be outperformed by programs automatically discovered from scratch by a simple evolutionary algorithm.
https://arxiv.org/abs/2312.08472
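The flavor of the result is easy to reproduce at toy scale: evolve a low-degree polynomial that approximates exp(x) on [0, 1], instead of reaching for a textbook Taylor expansion. A random hill-climber over coefficients stands in for the paper's evolutionary search over whole programs:

```python
import math
import random

def max_error(coeffs, f=math.exp, lo=0.0, hi=1.0, samples=64):
    # Worst-case error of the polynomial sum(c_k * x^k) against f on [lo, hi].
    worst = 0.0
    for i in range(samples + 1):
        x = lo + (hi - lo) * i / samples
        approx = sum(c * x**k for k, c in enumerate(coeffs))
        worst = max(worst, abs(approx - f(x)))
    return worst

def evolve(degree=3, steps=2000, seed=0):
    # Hill-climb: mutate all coefficients, keep the mutant if it is better.
    rng = random.Random(seed)
    best = [1.0] + [0.0] * degree          # start from the constant 1
    best_err = max_error(best)
    for _ in range(steps):
        cand = [c + rng.gauss(0, 0.05) for c in best]
        err = max_error(cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err
```

Because fitness is just measured error at a chosen precision, the search is free to find coefficient combinations no hand derivation would suggest.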