OpenAI has come up with a framework of five levels to track progress toward AGI, and thinks it is currently near level 2 ("Reasoners").
At a recent all-hands, leadership also gave a research demo of GPT-4 with improved reasoning.
Graphcore has been acquired by SoftBank. Masayoshi Son laid out his grand vision for artificial superintelligence last month, and it now seems clear Graphcore will be a part of that plan.
EE Times
AI Chip Startup Graphcore Acquired by SoftBank
The British chipmaker will join SoftBank as part of chairman and CEO Masayoshi Son’s grand vision for artificial superintelligence.
Patronus AI announced the release of Lynx, a new open-source hallucination detection model.
- Beats GPT-4o on hallucination tasks
- Open source, open weights, open data
- Excels in real-world domains like medicine and finance.
The startup claims it outperforms existing solutions such as GPT-4, Claude 3 Sonnet, and other closed- and open-source models used as LLM judges.
Hf.
Paper.
www.nomic.ai
Evaluating LLM Hallucination Benchmarks with Embeddings
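For a sense of how an LLM-as-judge hallucination check might look in practice, here is a minimal sketch using the Lynx weights via Hugging Face transformers. The model ID and the prompt wording are assumptions for illustration; the exact template is on the Patronus AI model card.

```python
# Minimal sketch: using a Lynx-style judge model via Hugging Face transformers.
# NOTE: the model ID and the prompt template below are assumptions for illustration;
# check the Patronus AI model card on Hugging Face for the exact format.
from transformers import pipeline

MODEL_ID = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"  # assumed model ID

judge = pipeline("text-generation", model=MODEL_ID, device_map="auto")

prompt = (
    "Given the following QUESTION, DOCUMENT and ANSWER, decide whether the ANSWER is "
    "faithful to the DOCUMENT. Respond with PASS or FAIL and a short explanation.\n\n"
    "QUESTION: What dose of drug X was tested?\n"
    "DOCUMENT: The phase II trial evaluated a 50 mg daily dose of drug X.\n"
    "ANSWER: The trial tested a 200 mg daily dose.\n"
)

result = judge(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])  # expect a FAIL verdict for this fabricated dose
```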
A really nice read on the wider set of mathematical tools and approaches that can help model data with perhaps less standard topological or algebraic properties.
arXiv.org
Beyond Euclid: An Illustrated Guide to Modern Machine Learning...
The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning...
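To make the "beyond Euclid" idea concrete, here is a small illustration (not from the paper) of how the same pair of points is measured under a Euclidean metric, a spherical geodesic, and the hyperbolic Poincaré-ball metric; the latter two are standard examples of the non-Euclidean geometries such methods build on.

```python
# Small illustration (not from the paper): the same pair of points measured with a
# Euclidean metric, a spherical geodesic, and the Poincaré-ball (hyperbolic) metric.
import numpy as np

def euclidean(u, v):
    return np.linalg.norm(u - v)

def sphere_geodesic(u, v):
    # Great-circle distance between the directions of u and v: the angle between them.
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
    return np.arccos(np.clip(u @ v, -1.0, 1.0))

def poincare_distance(u, v):
    # Hyperbolic distance in the open unit ball (points must have norm < 1).
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / denom)

u = np.array([0.1, 0.2])
v = np.array([0.7, 0.5])
print(euclidean(u, v), sphere_geodesic(u, v), poincare_distance(u, v))
```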
Commure announced Strongline Copilot - a wearable AI device for nurses, physicians, and healthcare administrators to interact with their EMR, automatically generate SOAP notes, and save valuable time in-clinic.
It's compatible with the Strongline platform.
Commure
Scribe Co-Pilot | Commure
Discover the leading AI device for medical documentation, revolutionizing healthcare with seamless, accurate, and efficient record-keeping.
New Google DeepMind scalable oversight paper: “How do you supervise and provide feedback to superhuman AI systems?”
Previous work has proposed protocols like debate (where two AIs argue and a judge decides) and consultancy (where one AI tries to convince a judge) as potential solutions.
This study evaluates these protocols more thoroughly across a wider range of tasks and models than previous work, with Gemini 1.5 used as the debater or consultant and smaller models (like Gemma 7B and Gemini Pro 1.0) used as judges.
In short, debate generally outperforms consultancy across all tasks in terms of judge accuracy: in open consultancy, judges tend to agree more with the consultant's choice (which may be incorrect).
In open debate, judges are less likely to be convinced when the protagonist chooses incorrectly. Note that for closed tasks (QA w/o the need for additional source material), the results are mixed, with small or no advantage for debate.
Bottom line: debate tends to work best, but its effectiveness varies with the task type and specific setup.
arXiv.org
On scalable oversight with weak LLMs judging strong LLMs
Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI...
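For readers who want the mechanics, here is a schematic sketch of the two protocols as the paper describes them, not the authors' code: the ask() helper and its roles are hypothetical stand-ins for calls to Gemini 1.5 (debater/consultant) and a weaker judge such as Gemma 7B.

```python
# Schematic sketch of the debate and consultancy protocols (not the authors' code).
# `ask(role, prompt)` is a hypothetical helper standing in for a call to a strong model
# (debater/consultant) or a weaker judge model.

def debate(question, answer_a, answer_b, ask, rounds=3):
    transcript = []
    for _ in range(rounds):
        # Each strong debater defends its assigned answer and rebuts the other side.
        arg_a = ask("debater", f"Argue that '{answer_a}' answers: {question}\n"
                               f"Transcript so far: {transcript}")
        arg_b = ask("debater", f"Argue that '{answer_b}' answers: {question}\n"
                               f"Transcript so far: {transcript}")
        transcript += [("A", arg_a), ("B", arg_b)]
    # The weak judge only sees the question and the transcript, not the source material.
    return ask("judge", f"Question: {question}\nDebate transcript: {transcript}\n"
                        "Which answer is correct, A or B?")

def consultancy(question, proposed_answer, ask):
    # A single consultant tries to convince the judge of one answer (which may be wrong).
    pitch = ask("consultant", f"Convince the judge that '{proposed_answer}' answers: {question}")
    return ask("judge", f"Question: {question}\nConsultant's argument: {pitch}\n"
                        "Do you accept this answer? Answer yes or no.")

if __name__ == "__main__":
    # Stub ask() so the sketch runs standalone; replace with real model calls.
    def ask(role, prompt):
        return f"[{role} response to {len(prompt)} chars of prompt]"
    print(debate("Is the sky blue?", "yes", "no", ask))
```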
Whoa, Qwen2-0.5B trained on 12 trillion tokens… this has to be the most tokens for a model this size.
Alibaba released a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model.
Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning.
huggingface.co
Paper page - Qwen2 Technical Report
Join the discussion on this paper page
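Trying the smallest Qwen2 chat model locally is a few lines of standard transformers usage; a minimal sketch (the model ID follows the published Hugging Face naming, adjust to the size you want):

```python
# Minimal sketch: running the 0.5B Qwen2 instruct model with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Qwen2 release in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```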
Google offered a group of European Union-based cloud firms a package worth about €470 million ($512 million) in a failed attempt to derail their antitrust settlement with Microsoft.
Bloomberg.com
Google Offered €470 Million to Derail Microsoft Antitrust Pact
Google offered a group of European Union-based cloud firms a package worth about €470 million ($512 million) in a failed attempt to derail their antitrust settlement with Microsoft Corp. that freed the US software giant from a potentially costly EU case.
Mistral released their first Mamba Model!
Codestral Mamba 7B is a code LLM based on the Mamba2 architecture. It is released under Apache 2.0 and achieves 75% on HumanEval for Python coding.
HuggingFace.
They also released a math fine-tune based on Mistral 7B that achieves 56.6% on MATH and 63.47% on MMLU.
Model.
mistral.ai
Codestral Mamba | Mistral AI
As a tribute to Cleopatra, whose glorious destiny ended in tragic snake circumstances, we are proud to release Codestral Mamba, a Mamba2 language model specialised in code generation, available under an Apache 2.0 license.
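A minimal sketch of code completion with Codestral Mamba through transformers; the model ID and Mamba2 support in your installed transformers version are assumptions, and Mistral also documents a mistral-inference based setup:

```python
# Minimal sketch of code generation with Codestral Mamba via transformers.
# Assumptions: the Hugging Face model ID below and Mamba2 support in your
# installed transformers version; check the official model page for details.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mamba-Codestral-7B-v0.1"  # assumed ID, check the HF page
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```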
⚡️ OpenAI co-founder Andrej Karpathy launches AI education school Eureka Labs.
He left OpenAI in February.
According to Andrej on X, Eureka Labs is a new type of school built on AI.
How can we get closer to the ideal experience of learning something new?
Given recent progress in generative AI, this learning experience finally feels within reach. The teacher still designs the course materials, but they are supported, leveraged, and scaled by an AI teaching assistant optimized to guide students through them.
This Teacher + AI symbiosis could run an entire curriculum of courses on a common platform.
The first product will be an AI course, LLM101n: an undergraduate-level class that guides students through training their own AI, very similar to a scaled-down version of the AI teaching assistant itself. Course materials will be available online, and there are also plans to run both online and in-person cohorts.
GitHub.
GitHub
EurekaLabsAI
GitHub is where EurekaLabsAI builds software.
First text2protein AI model, compressing billions of years of life.
Researchers discovered 800+ novel, functional, and foldable proteins.
A repo of 800+ curated prompts and results is available.
Medium
MP4: AI Text2Protein Breakthrough Tackles the Molecule Programming Challenge
Written by Kathy Y. Wei, Ph.D. and Koosh on 16 July 2024
Deloitte's latest report, "Platform business models – #PrivateEquity opportunities in the life sciences and health care sector," identifies plays for healthtech and medtech portfolio companies to transition to platform businesses and offers examples of how players are doing this in the market today.
OpenAI trained advanced language models to generate text that weaker models can easily verify, and found that this also makes the text easier for humans to evaluate.
This research could help AI systems be more verifiable and trustworthy in the real world.
OpenAI
Prover-Verifier Games improve legibility of language model outputs
Discover how prover-verifier games improve the legibility of language model outputs, making AI solutions clearer, easier to verify, and more trustworthy for both humans and machines.
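Schematically, the training setup pits a "helpful" and a "sneaky" prover against a small verifier; the sketch below is an illustration of that loop with hypothetical helper functions, not OpenAI's implementation.

```python
# Schematic sketch of one prover-verifier training round (not OpenAI's implementation).
# `helpful_prover`, `sneaky_prover`, `verifier`, and `is_correct` are hypothetical helpers.

def training_round(problems, helpful_prover, sneaky_prover, verifier, is_correct):
    prover_rewards, verifier_examples = [], []
    for p in problems:
        good = helpful_prover(p)   # tries to be correct AND easy to check
        bad = sneaky_prover(p)     # tries to look convincing while being wrong
        # The small verifier scores how convincing each solution is.
        s_good, s_bad = verifier(p, good), verifier(p, bad)
        # Helpful prover is rewarded for correct solutions the verifier accepts;
        # sneaky prover is rewarded for incorrect solutions the verifier accepts.
        prover_rewards.append((s_good if is_correct(p, good) else 0.0,
                               s_bad if not is_correct(p, bad) else 0.0))
        # The verifier is trained to accept the correct solution and reject the sneaky one.
        verifier_examples.append(((p, good, 1), (p, bad, 0)))
    return prover_rewards, verifier_examples
```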
Meta said: “We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment”.
Unless the EU changes course here's what won't be coming to Europe:
- Apple Intelligence
- Agent Memory, all agents
- Llama 4, and beyond
Axios
Scoop: Meta won't offer future multimodal AI models in EU
The social media giant says EU regulators haven't provided clarity.
Everybody is talking about ColPali, a new retrieval model architecture that uses vision language models to directly embed page images, without relying on complex text extraction pipelines.
Combined with a late interaction matching mechanism, ColPali largely outperforms modern document retrieval pipelines while being drastically faster and end-to-end trainable.
arXiv.org
ColPali: Efficient Document Retrieval with Vision Language Models
Documents are visually rich structures that convey information through text, but also figures, page layouts, tables, or even fonts. Since modern retrieval systems mainly rely on the textual...
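The late interaction step is the same MaxSim scoring popularized by ColBERT, applied here to page-patch embeddings; a small sketch with illustrative shapes (not the authors' code):

```python
# Sketch of ColBERT-style late-interaction (MaxSim) scoring, as ColPali applies it to
# match query token embeddings against page-patch embeddings. Shapes are illustrative.
import torch

def maxsim_score(query_emb: torch.Tensor, page_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (n_query_tokens, d); page_emb: (n_page_patches, d); both L2-normalized."""
    sims = query_emb @ page_emb.T         # (n_query_tokens, n_page_patches)
    return sims.max(dim=1).values.sum()   # best-matching patch per query token, summed

# Toy usage: rank two pages for one query.
torch.manual_seed(0)
query = torch.nn.functional.normalize(torch.randn(12, 128), dim=-1)
pages = [torch.nn.functional.normalize(torch.randn(1024, 128), dim=-1) for _ in range(2)]
print([maxsim_score(query, p).item() for p in pages])
```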
Menlo Ventures launched the $100M Anthology Fund, an Anthropic partnership to fund Seed and Series A rounds of the next generation of AI startups around the world.
Startups will get:
— $25,000 in Anthropic credits
— Access to Anthropic's AI models
— Quarterly deep dives with the Anthropic team
— Biannual demo days hosted by Anthropic CPO Mike Krieger and co-founder Daniela Amodei
— Credits from Menlo Ventures' companies
Here's OpenAI's new model: GPT-4o mini.
The company called the new release “the most capable and cost-efficient small model available today,” and it plans to integrate image, video and audio into it later.
The mini AI model is an offshoot of GPT-4o, OpenAI’s fastest and most powerful model yet, which it launched in May during a livestreamed event with executives. The o in GPT-4o stands for omni, and GPT-4o has improved audio, video and text capabilities, with the ability to handle 50 different languages with improved speed and quality, according to the company.
CNBC
OpenAI debuts mini version of its most powerful model yet
OpenAI on Thursday launched a new AI model, "GPT-4o mini," the artificial intelligence startup's latest effort to expand use of its popular chatbot.
Check out OpenAI's new model GPT-4o mini: 82% MMLU at 60 cents per 1M output tokens!
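Calling it is the usual OpenAI Chat Completions pattern; a minimal sketch with the v1.x Python SDK (assumes OPENAI_API_KEY is set in the environment):

```python
# Minimal sketch: calling GPT-4o mini through the OpenAI Python SDK (v1.x style).
# Requires OPENAI_API_KEY in the environment; usage is billed at the mini-model rates.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "In one sentence, what is GPT-4o mini?"}],
    max_tokens=60,
)
print(resp.choices[0].message.content)
```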
OpenAI has talked to Broadcom about developing a new AI chip.
OpenAI has been hiring former members of a Google unit that produces Google’s AI chip, the tensor processing unit, and has sought to develop an AI server chip.
OpenAI has been talking to chip designers including Broadcom about working on the chip.
The team has discussed how the eventual chip could help the new venture Altman has envisioned, which aims to increase the amount of computing power for AI developers such as OpenAI.
The Information
OpenAI Has Talked to Broadcom About Developing New AI Chip
Last year, as the world’s top artificial intelligence developers were racing to speed up their work using ever-bigger clusters of computers, OpenAI CEO Sam Altman was trying to play a longer game. He decided to start a new company that could develop and produce…