Continuous Learning_Startup & Investment

The CEO of Databricks, Ali Ghodsi, explains why he considers the $1.3 billion acquisition of young AI startup MosaicML justified. Databricks is paying 65 times Mosaic's $20 million in annual recurring revenue (ARR), a price Ghodsi deems reasonable given Mosaic's rapid revenue growth and the growing demand for customized AI models within large enterprises.

He believes the merger can significantly boost revenue, since MosaicML's three-person sales team will now be part of Databricks' 3,000-strong sales organization.

Databricks offers a cloud database and other software to facilitate the application of machine learning models to data. The purchase of Mosaic is aimed at providing customers with a simplified way to customize large-language models, a type of machine learning software that powers chatbots, offering a more bespoke solution than the generalized software provided by OpenAI.

Mosaic's AI models, while less advanced than OpenAI’s, are typically more cost-effective and are better tailored to companies’ internal needs, such as sourcing internal information for employees. Mosaic's value has been attested by its clients like Replit, a software development tool provider, and Glean AI, which develops software to monitor company expenses and suggest cost-saving measures.

However, the deal's actual value is considerably lower than the headline figure, because Databricks is paying in stock priced at its last equity financing round in 2021, when its valuation was at its peak.
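As a quick check of the deal math, the sketch below computes the headline revenue multiple and an effective price if the 2021-priced Databricks shares are worth less today. The 50% `ASSUMED_STOCK_VALUE_RATIO` is a hypothetical illustration, not a reported figure; the function names are made up for this example.

```python
def revenue_multiple(price_usd: float, arr_usd: float) -> float:
    """Purchase price divided by annual recurring revenue (ARR)."""
    return price_usd / arr_usd


def effective_price(headline_usd: float, stock_value_ratio: float) -> float:
    """Headline all-stock price scaled by what the shares are worth today
    relative to the price they were valued at in the deal."""
    return headline_usd * stock_value_ratio


HEADLINE_PRICE = 1.3e9           # $1.3B, paid in Databricks stock
MOSAIC_ARR = 20e6                # $20M ARR
ASSUMED_STOCK_VALUE_RATIO = 0.5  # hypothetical: shares worth half their 2021 price

headline_multiple = revenue_multiple(HEADLINE_PRICE, MOSAIC_ARR)
discounted_price = effective_price(HEADLINE_PRICE, ASSUMED_STOCK_VALUE_RATIO)
discounted_multiple = revenue_multiple(discounted_price, MOSAIC_ARR)

print(f"headline multiple:  {headline_multiple:.1f}x")
print(f"effective price:    ${discounted_price / 1e9:.2f}B")
print(f"effective multiple: {discounted_multiple:.1f}x")
```

With these inputs it prints a 65.0x headline multiple and, under the assumed 50% haircut, an effective price of $0.65B (32.5x ARR).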

The acquisition also signals intensifying competition between companies such as OpenAI, Anthropic, and Cohere, which develop large proprietary (closed-source) LLMs, and providers such as Databricks, which are betting that businesses will prefer to train smaller open-source LLMs on their own corporate data for better performance and data security. The deal could also strain the relationship between OpenAI and Microsoft.

https://www.theinformation.com/articles/how-databricks-ceo-justifies-paying-1-3-billion-for-a-young-ai-startup

The Information

How Databricks CEO Justifies Paying $1.3 Billion for a Young AI Startup

When enterprise software firm Databricks revealed on Monday it would pay $1.3 billion for a two-year-old artificial intelligence startup, MosaicML, the deal looked overpriced. Databricks is paying 65 times Mosaic’s $20 million in annual recurring revenue…

32 views · 23:45

Continuous Learning_Startup & Investment

Why did the Databricks CEO acquire MosaicML for $1.3B?
Source: https://www.theinformation.com/articles/how-databricks-ceo-justifies-paying-1-3-billion-for-a-young-ai-startup

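Databricks provides a cloud database and other software that make it easy to apply machine learning models to data. The Mosaic acquisition is meant to give customers a simplified way to customize large language models, a type of machine learning software that powers chatbots, offering a more tailored solution than the general-purpose software OpenAI provides.

Mosaic's AI models are less advanced than OpenAI's, but they are generally more cost-effective and better suited to companies' internal needs, such as sourcing internal information for employees. Customers such as Replit, a software development tool provider, and Glean AI, which develops software that monitors company expenses and suggests cost-saving measures, have already demonstrated Mosaic's value.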

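Is it still worth paying $1.3B?

Databricks paid 65 times Mosaic's $20 million in annual recurring revenue, which Ghodsi judged reasonable given Mosaic's substantial revenue growth and rising demand for customized AI models within large enterprises.

MosaicML's sales team has just three people, yet its ARR (annual recurring revenue) grew from $1M in January of this year to $20M within six months. With market validation of the product complete, Databricks seems to have calculated that selling it through its global sales organization of roughly 3,000 people would generate returns well above the purchase price.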
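To put the growth figure in perspective: assuming ARR compounded at a steady monthly rate (a simplifying assumption; real growth is lumpier), going from $1M to $20M in six months implies the month-over-month rate computed below. The function name is hypothetical.

```python
def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant month-over-month growth rate g such that
    start * (1 + g) ** months == end."""
    if start <= 0 or months <= 0:
        raise ValueError("start and months must be positive")
    return (end / start) ** (1 / months) - 1


g = implied_monthly_growth(start=1e6, end=20e6, months=6)
print(f"implied growth: {g:.1%} per month")
```

This works out to roughly 65% growth per month, sustained for half a year.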

In addition, Databricks is paying with its own shares, priced at its last equity financing round in 2021, when it received its highest valuation ($38B), so the actual deal value is likely considerably lower.

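I think Snowflake's recent acquisition of AI startup Neeva for about $150M (https://www.snowflake.com/blog/snowflake-acquires-neeva-to-accelerate-search-in-the-data-cloud-through-generative-ai/) was also a factor that pushed the Databricks team to move quickly on MosaicML.

Points worth considering about this deal

1. The deal also signals intensifying competition between companies such as OpenAI, Anthropic, and Cohere, which develop large proprietary or closed-source LLMs, and providers such as Databricks, which bet that businesses will prefer to train smaller open-source LLMs on their own corporate data for better performance and data security.

2. Microsoft is OpenAI's largest investor, but it is also a major vendor of Databricks. If Databricks supports models that offer better value for money than OpenAI's ChatGPT, Microsoft, as a seller, holds more cards; some also see the deal as potentially straining the relationship between OpenAI and Microsoft.

3. On AI startup valuations: consider Neeva at $150M, MosaicML at $1.3B (realistically about $650M), and Thomson Reuters' recent all-cash purchase of Casetext, which automates document review for law firms, for $650M. Taken together, these suggest that valuation inflation is about to begin for companies that can build AI use cases from data, that can build models, or that build model-related infrastructure.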

29 views · 23:55

Continuous Learning_Startup & Investment

Wanting to get started with Generative AI and LLMs, but not sure where to start? 🤔 I am super excited to share Amazon Web Services (AWS) and DeepLearning.AI just launched "Generative AI with LLMs" course, designed specifically for individuals and beginners! 🔰🔥

In Generative AI with Large Language Models (LLMs), you’ll learn the fundamentals of how generative AI works and how to use the Hugging Face ecosystem (Transformers, PEFT, TRL) to instruction-tune, RLHF, or deploy open-source LLMs! 🤯

👉 https://lnkd.in/ep68k-Pk

I am incredibly proud to say that I worked behind the scenes with Antje Barth, Chris Fregly, and Mike Chambers to make this course a reality. Huge kudos to everyone who was involved.
🤗 https://lnkd.in/e3a8jXw7

If you've ever been curious about how generative AI works or want to refresh your knowledge, this course is an absolute must-attend! 🔥🤝

www.deeplearning.ai

Generative AI with LLMs - DeepLearning.AI

Learn the fundamentals of how generative AI works, and how to deploy it in real-world applications. Equip yourself with the technical skills and intuition needed to succeed in the growing demand for ML engineers and data scientists.

32 views · 00:09

Continuous Learning_Startup & Investment

https://twitter.com/michael_nielsen/status/1673901166172979200?s=46&t=h5Byg6Wosg8MJb4pbPSDow

Twitter

Is there *reliable* (!!!) data on how much power is being consumed by data centers, over time?

34 views · 01:56

Continuous Learning_Startup & Investment

https://youtu.be/7iU6K7NccXk

YouTube

Creating the future of search and competing vs Google with Perplexity AI’s Aravind Srinivas | E1770

(00:00) Perplexity CEO Aravind Srinivas joins Jason
(1:25) Competing in the generative search / AI chatbot market
(4:49) How Perplexity's AI model formulates answers
(11:18) Crowdbotics - Get a free scoping session for your next big app idea at https:/…

41 views · 01:56

Continuous Learning_Startup & Investment

https://youtu.be/7fj2NB1-AiQ

YouTube

Snap CEO on How AI Is Reinventing Big Tech | WSJ News

Snapchat's chief, Evan Spiegel, discusses the shifting ad market, ways to keep the internet safe for younger users and where the company is deploying AI.

#Snapchat #AI #WSJ

41 views · 01:56

Continuous Learning_Startup & Investment

Forwarded from 요즘AI

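George Hotz, the American genius hacker who was the first to successfully jailbreak iOS, has commented on GPT-4's architecture, which had been kept under wraps.

Below is an easy-to-follow summary of the 'MoE (Mixture of Experts)' model, which he cited as the key architecture behind GPT-4's performance.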

https://news.aikoreacommunity.com/ceonjaehaekeo-jiohasi-gpt-4yi-bimileul-puleonaeda/

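1/ George Hotz claims that OpenAI's GPT-4 is not a single model with 1 trillion parameters but a mixture of eight models with 220 billion parameters each.

In other words, OpenAI trained a model of the same size eight times and then used a model architecture called 'MoE' as a trick to make it look like one big model with a trillion-plus parameters (8 × 220B = 1.76 trillion in total).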

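So what is MoE?

2/ MoE (Mixture of Experts) is a deep learning architecture in which several neural networks are each trained as experts specialized in different areas, and these networks are then combined (mixed) to produce results.

In other words, it is a model designed so that multiple different neural networks (experts) handle different problems or data domains.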

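3/ An MoE model has two main components: the experts and the gate.

As described above, each expert handles the specific part it specializes in. The gate assigns each expert a weight for a given input.

4/ An MoE model can produce its answer in several ways. The expert that receives the largest weight can generate the output, or the experts' answers can each be weighted and then combined into the output.

Either way, because each expert provides answers specialized to its own domain, the model can perform better than a single-network architecture of the same size.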
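The two output modes in 3/ and 4/ can be sketched in a few lines of plain Python. This is a toy illustration assuming scalar-output experts and a linear softmax gate; the experts, gate weights, and function names are hypothetical and say nothing about how GPT-4 or any production MoE is actually implemented.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[Sequence[float]], float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate(x: Sequence[float], gate_weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear gate: one weight vector per expert, softmax over the scores."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    return softmax(scores)


def moe_weighted(x: Sequence[float], experts: Sequence[Expert],
                 gate_weights: Sequence[Sequence[float]]) -> float:
    """Run every expert and combine the outputs, weighted by the gate."""
    probs = gate(x, gate_weights)
    return sum(p * expert(x) for p, expert in zip(probs, experts))


def moe_top1(x: Sequence[float], experts: Sequence[Expert],
             gate_weights: Sequence[Sequence[float]]) -> float:
    """Only the expert with the largest gate weight produces the output."""
    probs = gate(x, gate_weights)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return experts[best](x)


# Toy experts and gate weights: all values are made up for illustration.
experts: List[Expert] = [
    lambda x: 2.0 * x[0],   # specialist for the first feature
    lambda x: -1.0 * x[1],  # specialist for the second feature
    lambda x: x[0] + x[1],  # generalist
]
gate_weights = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]

x = [1.0, 0.0]
print(gate(x, gate_weights))
print(moe_top1(x, experts, gate_weights))
print(moe_weighted(x, experts, gate_weights))
```

For this input the gate puts almost all of its weight on the first expert, so both modes return nearly the same value (2.0 for top-1, about 1.98 for the weighted mix).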

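5/ One drawback is that because several models are used at once, compute and memory costs can be higher than for a single-network architecture.

However, this is being steadily addressed through research on parallel processing, sparse gating, and other techniques. The link below is one of the related papers.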
https://arxiv.org/pdf/2212.05055.pdf
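Sparse gating reduces that cost by evaluating only the top-k experts for each input. A minimal sketch, assuming a toy setup of eight scalar experts and fixed, made-up gate scores; all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(x: float, experts: Sequence[Callable[[float], float]],
               gate_logits: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Top-k gating: keep the k highest-scoring experts, renormalise their
    gate weights with a softmax over just those k, and evaluate only them."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


calls: List[int] = []  # records which experts actually ran


def make_expert(i: int) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls.append(i)
        return (i + 1) * x  # toy expert: scale the input
    return expert


experts = [make_expert(i) for i in range(8)]
gate_logits = [float(i) for i in range(8)]  # hypothetical gate scores

out, chosen = sparse_moe(1.0, experts, gate_logits, k=2)
print(chosen, len(calls), round(out, 3))
```

With k=2 only 2 of the 8 experts run, so per-input compute grows with k rather than with the total number of experts; memory for all expert weights is still required.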

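6/ Also, the fewer parameters a model has, the harder it is for the MoE architecture to outperform a model that runs as a single neural network.

This is because the smaller the dataset, the more limited the specific data each expert can learn from.

But as the training dataset grows, each network has more data to learn from, and the architecture becomes very efficient.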

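7/ In short, the MoE architecture is very well suited to large language models.

GPT-3.5 has 175 billion parameters, and each of the models George Hotz claims was used in GPT-4 has 220 billion.

If his claim is correct, the performance gap between GPT-3.5 and GPT-4 comes not from the difference in parameter count but from whether the MoE architecture is used.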
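To make the parameter arithmetic behind this claim concrete: in an MoE, the parameters stored and the parameters used per token differ. A sketch, assuming each expert is a full 220B model with no shared layers and a hypothetical top-2 routing (the routing depth is not stated in the source):

```python
from typing import Tuple


def moe_param_counts(num_experts: int, params_per_expert: int,
                     experts_per_token: int) -> Tuple[int, int]:
    """Return (total parameters stored, parameters used per token).

    Simplification: treats each expert as a full, separate model and ignores
    any layers shared between experts (e.g. attention or embeddings)."""
    total = num_experts * params_per_expert
    active = experts_per_token * params_per_expert
    return total, active


total, active = moe_param_counts(num_experts=8,
                                 params_per_expert=220_000_000_000,
                                 experts_per_token=2)  # hypothetical routing
print(f"total: {total / 1e12:.2f}T, active per token: {active / 1e9:.0f}B")
```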

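8/ These properties make MoE well suited to building large AI models like GPT-4.

Sam Altman has also spoken about the limits of AI model scale, so the MoE architecture, which can deliver performance beyond those scaling limits, looks very promising.

The full podcast is available here. Thanks for reading. :)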

https://www.latent.space/p/geohot#details

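AI Korea Community Newsletter

Genius hacker geohot unlocks the secrets of GPT-4?!

The mystery is solved

On June 21, the famous hacker 'geohot' gave an interview[1] in which he laid out(!) the GPT-4 architecture he had figured out. What kind of architecture makes GPT-4 so special? Let's find out!

A born genius

First, a quick introduction to George Hotz, better known as 'geohot'.

Geohot is a 34-year-old genius hacker born in 1989[2]. In 2007, at the age of 17, the iPhone's 'jailbreak'…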

30 views · 06:30

Continuous Learning_Startup & Investment

https://twitter.com/chamath/status/1674113059374211072

Twitter

Gosh this is such a good thread. Brian does a good job below of summarizing some big ideas in investing. But more subtly, it presents the big issues we all face as investors. It's obviously easier to read than to implement - but it's important to understand…

40 views · 06:35

Continuous Learning_Startup & Investment

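Throughout the interview you can feel how much the founder loves the product he is building and its users. (Character.ai)

1. It's not our job to tell users how to use Character.ai. Our job is to put out something general and watch people enjoy using it.
2. Many people use personas because they are lonely or troubled and need someone to talk to.
3. Users engage with Character.ai's characters in many different ways, such as role-playing games, text adventures, and watching TV or internet influencers.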

https://youtu.be/GavsSMyK36w
https://youtu.be/emCoG-hA7AE

It’s not our job to tell you what to use it for. Our job is to put out something general and see people enjoy using it.
Many use personas because they are lonely or troubled and need someone to talk to.
Noam Shazeer talks about the concept of a persona, which is a character or a person that users create in order to use their imagination. He explains that people use personas in various ways, such as role-playing games, text adventures, and watching TV or internet influencers.
He also describes the backstory of Character.AI: the team wanted to create a technology that was accessible, flexible, and put the user in control.

YouTube

Character.AI CEO: Generative AI Tech Has a Billion Use Cases

Character.AI founder and CEO Noam Shazeer joins Ed Ludlow to discuss the rise of generative AI and its many potential applications, and why he is skeptical about the federal government regulating it.

42 views · 08:48

Continuous Learning_Startup & Investment

New open-source LLMs! 🔔 Salesforce just released XGen 7B, a new LLM with an 8k context under the Apache 2.0 license. 🔓 XGen uses the same architecture as Meta's LLaMA and is, therefore, a 1-to-1 replacement for commercial use! 🔥 XGen achieves similar performance to LLaMA on MMLU and outperforms it on coding! 🎖

TL;DR; ✨:
🔠 Trained on 1.5T Tokens
🪟 8192 context window
🧮 7B parameter
🔓 Apache 2.0 license
🧠 Trained on TPUs
🧑🏻‍💻 Can write code
🤗 Available on Hugging Face

Model: https://lnkd.in/emHEPZy8
Announcement Blog: https://lnkd.in/e6utBth9

It's exciting to see more LLaMA-style models released with permissive licenses. Hopefully, Salesforce will continue the model family with 13B or 16B versions.🚀

huggingface.co

Salesforce/xgen-7b-8k-base · Hugging Face

42 views · 12:21

Continuous Learning_Startup & Investment

We develop a method to test global opinions represented in language models. We find the opinions represented by the models are most similar to those of the participants in USA, Canada, and some European countries. We also show the responses are steerable in separate experiments.

https://twitter.com/AnthropicAI/status/1674461614056292353?s=20

Twitter

39 views · 23:59

Continuous Learning_Startup & Investment

Inflection AI today announced that the company has raised $1.3 billion in a fresh round of funding led by Microsoft, Reid Hoffman, Bill Gates, Eric Schmidt, and new investor NVIDIA. The new funding brings the total raised by the company to $1.525 billion.

Largest AI cluster in the world
The deployment of 22,000 NVIDIA H100 GPUs in one cluster is truly unprecedented, and will support training and deployment of a new generation of large-scale AI models. Combined, the cluster delivers a staggering 22 exaFLOPS in 16-bit precision mode, and even more if lower precision is used. We estimate that if we entered our cluster in the recent TOP500 list of supercomputers, it would be the 2nd and close to the top entry, despite being optimized for AI, rather than scientific, applications. The rollout of the cluster is actively under way, and we have already been able to confirm its performance in the recent MLPerf benchmark.
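The 22 exaFLOPS figure is consistent with simple peak-throughput arithmetic. A sketch, assuming roughly 989 TFLOPS of dense 16-bit tensor-core throughput per H100 SXM GPU (an approximate peak spec without sparsity, not a measured number; sustained training throughput is lower). The function name is hypothetical.

```python
def cluster_exaflops(num_gpus: int, tflops_per_gpu: float) -> float:
    """Aggregate peak throughput in exaFLOPS (1 EFLOPS = 1e6 TFLOPS)."""
    return num_gpus * tflops_per_gpu / 1e6


# Assumption: ~989 dense FP16/BF16 tensor-core TFLOPS per H100 SXM (peak, no sparsity).
print(f"{cluster_exaflops(22_000, 989.0):.1f} EFLOPS")
```

This gives about 21.8 EFLOPS, which rounds to the quoted 22.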

https://inflection.ai/inflection-ai-announces-1-3-billion-of-funding

Inflection

Inflection AI announces $1.3 billion of funding led by current investors, Microsoft, and NVIDIA

38 views · 00:03

Continuous Learning_Startup & Investment

Consider the future of this decidedly "semantic" AI https://learn.microsoft.com/en-us/semantic-kernel/when-to-use-ai/schillace-laws
The "Schillace Laws" were formulated after working with a variety of Large Language Model (LLM) AI systems to date. Knowing them will accelerate your journey into this exciting space of reimagining the future of software engineering. Welcome!

Don’t write code if the model can do it; the model will get better, but the code won't. The overall goal of the system is to build very high leverage programs using the LLM's capacity to plan and understand intent. It's very easy to slide back into a more imperative mode of thinking and write code for aspects of a program. Resist this temptation – to the degree that you can get the model to do something reliably now, it will be that much better and more robust as the model develops.

Trade leverage for precision; use interaction to mitigate. Related to the above, the right mindset when coding with an LLM is not "let's see what we can get the dancing bear to do," it's to get as much leverage from the system as possible. For example, it's possible to build very general patterns, like "build a report from a database" or "teach a year of a subject" that can be parameterized with plain text prompts to produce enormously valuable and differentiated results easily.

Code is for syntax and process; models are for semantics and intent. There are lots of different ways to say this, but fundamentally, the models are stronger when they are being asked to reason about meaning and goals, and weaker when they are being asked to perform specific calculations and processes. For example, it's easy for advanced models to write code to solve a sudoku generally, but hard for them to solve a sudoku themselves. Each kind of code has different strengths and it's important to use the right kind of code for the right kind of problem. The boundaries between syntax and semantics are the hard parts of these programs.

The system will be as brittle as its most brittle part. This goes for either kind of code. Because we are striving for flexibility and high leverage, it’s important to not hard code anything unnecessarily. Put as much reasoning and flexibility into the prompts and use imperative code minimally to enable the LLM.

Ask Smart to Get Smart. Emerging LLM AI models are incredibly capable and "well educated" but they lack context and initiative. If you ask them a simple or open-ended question, you will get a simple or generic answer back. If you want more detail and refinement, the question has to be more intelligent. This is an echo of "Garbage in, Garbage out" for the AI age.

Uncertainty is an exception throw. Because we are trading precision for leverage, we need to lean on interaction with the user when the model is uncertain about intent. Thus, when we have a nested set of prompts in a program, and one of them is uncertain in its result ("One possible way...") the correct thing to do is the equivalent of an "exception throw" - propagate that uncertainty up the stack until a level that can either clarify or interact with the user.
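A minimal sketch of "uncertainty is an exception throw", assuming the model is any function from prompt text to reply text. `fake_model`, `fake_user`, the hedge-phrase check, and the other names are hypothetical stand-ins; a real system would need a more robust way to detect uncertainty than string matching.

```python
from typing import Callable, List

Model = Callable[[str], str]  # stand-in for an LLM call: prompt in, text out

# Hypothetical phrases treated as a sign the model is unsure of the intent.
HEDGE_MARKERS = ("one possible way", "it depends", "unclear")


class UncertainResult(Exception):
    """Raised by a prompt step whose result is ambiguous."""

    def __init__(self, question: str) -> None:
        super().__init__(question)
        self.question = question


def run_prompt(model: Model, prompt: str) -> str:
    reply = model(prompt)
    if any(marker in reply.lower() for marker in HEDGE_MARKERS):
        # Don't guess: propagate the uncertainty up the stack.
        raise UncertainResult(f"Please clarify the request: {prompt!r}")
    return reply


def build_report(model: Model, request: str) -> str:
    # Nested prompts: neither step can talk to the user.
    outline = run_prompt(model, f"Outline a report for: {request}")
    return run_prompt(model, f"Write the report from this outline: {outline}")


def handle_request(model: Model, request: str, ask_user: Callable[[str], str]) -> str:
    # Top of the stack: the only layer that interacts with the user.
    try:
        return build_report(model, request)
    except UncertainResult as err:
        clarification = ask_user(err.question)
        return build_report(model, f"{request} ({clarification})")


def fake_model(prompt: str) -> str:
    # Hedges on the outline step until the request names a quarter.
    if prompt.startswith("Outline") and "Q3" not in prompt:
        return "One possible way is to cover every quarter..."
    return f"OK: {prompt[:30]}"


questions: List[str] = []


def fake_user(question: str) -> str:
    questions.append(question)
    return "Q3"


print(handle_request(fake_model, "sales report", fake_user))
```

The nested steps only raise; the top level decides to ask the user. This sketch retries once after clarification, so a second ambiguous result would propagate to the caller.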

Text is the universal wire protocol. Since the LLMs are adept at parsing natural language and intent as well as semantics, text is a natural format for passing instructions between prompts, modules and LLM based services. Natural language is less precise for some uses, and it is possible to use structured language like XML sparingly, but generally speaking, passing natural language between prompts works very well, and is less fragile than more structured language for most uses. Over time, as these model-based programs proliferate, this is a natural "future proofing" that will make disparate prompts able to understand each other, the same way humans do.

Docs

Introduction to Semantic Kernel

Learn about Semantic Kernel

35 views · 00:08

Continuous Learning_Startup & Investment

Hard for you is hard for the model. One common pattern when giving the model a challenging task is that it needs to "reason out loud." This is fun to watch and very interesting, but it's problematic when using a prompt as part of a program, where all that is needed is the result of the reasoning. However, using a "meta" prompt that is given the question and the verbose answer and asked to extract just the answer works quite well. This is a cognitive task that would be easier for a person (it's easy to imagine being able to give someone the general task of "read this and pull out whatever the answer is" and have that work across many domains where the user had no expertise, just because natural language is so powerful). So, when writing programs, remember that something that would be hard for a person is likely to be hard for the model, and breaking patterns down into easier steps often gives a more stable result.

Beware "pareidolia of consciousness"; the model can be used against itself. It is very easy to imagine a "mind" inside an LLM. But there are meaningful differences between human thinking and the model. An important one that can be exploited is that the models currently don't remember interactions from one minute to the next. So, while we would never ask a human to look for bugs or malicious code in something they had just personally written, we can do that for the model. It might make the same kind of mistake in both places, but it's not capable of "lying" to us because it doesn't know where the code came from to begin with. This means we can "use the model against itself" in some places: it can be used as a safety monitor for code, a component of the testing strategy, a content filter on generated content, etc.

42 views · 00:08

Continuous Learning_Startup & Investment

Could one Large Language Model handle all programming languages? Or should we tailor a model for each? What's your take? #LLM #ProgrammingLanguages

https://www.linkedin.com/posts/mateizaharia_introducing-english-as-the-new-programming-activity-7080242815120637952-bIY0?utm_source=share&utm_medium=member_ios

Introducing English as the New Programming Language for Apache Spark | Matei Zaharia | 69 comments

One of my favorite announcements today: English SDK for Apache Spark! Just write stuff like df.ai.transform('get 4 week moving average sales by dept') instead… | 69 comments on LinkedIn
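For comparison, here is roughly the logic that an English instruction like 'get 4 week moving average sales by dept' stands in for, written by hand in plain Python rather than Spark. This is a hypothetical sketch assuming rows of `(dept, week, sales)`; it is not the code the English SDK generates.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, float]  # (dept, week, sales)


def moving_average_by_dept(rows: Iterable[Row], window: int = 4) -> Dict[Tuple[str, int], float]:
    """Trailing moving average of sales per department, keyed by (dept, week).

    The window includes the current week; early weeks average over however
    many weeks are available so far."""
    by_dept: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for dept, week, sales in rows:
        by_dept[dept].append((week, sales))

    result: Dict[Tuple[str, int], float] = {}
    for dept, series in by_dept.items():
        recent: deque = deque(maxlen=window)  # drops the oldest week automatically
        for week, sales in sorted(series):
            recent.append(sales)
            result[(dept, week)] = sum(recent) / len(recent)
    return result


sample = [("toys", 1, 10.0), ("toys", 2, 20.0), ("toys", 3, 30.0),
          ("toys", 4, 40.0), ("toys", 5, 50.0), ("books", 1, 7.0)]
averages = moving_average_by_dept(sample)
print(averages[("toys", 4)], averages[("toys", 5)])
```

Even this small task hides decisions (trailing vs. centered window, how to treat the first few weeks) that a one-line English instruction leaves to the model.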

279 views · 02:39

Continuous Learning_Startup & Investment

https://udlbook.github.io/udlbook/?fbclid=IwAR3W94wrWVoWWT1sIe8XHqmMeO5pGtCeW_r0FFVz-lJlvYlI5QXuBe3J9ic_aem_AYVtjXhox-ltymMT3S2gjPjl3HaVkAoik1hPiUz7MN3fnWICXA8fst8aeg8IlzC8-gE&mibextid=Zxz2cZ

47 views · 03:28

Continuous Learning_Startup & Investment

How Copilot works at a high level: https://youtu.be/B2-8wrF9Okc

YouTube

How Microsoft 365 Copilot works

Get an inside look at how large language models (LLMs) work when you connect them to the data in your organization. See what makes this possible and how the process respects your privacy to keep data safe with Microsoft 365 Copilot. The LLM for Copilot for…

47 views · 04:33

Continuous Learning_Startup & Investment

State of GPT talk by Andrej Karpathy: https://www.youtube.com/watch?v=bZQun8Y4L2A&t=373s

Would highly recommend watching the above! A 45-minute lecture covering the state of generative LLMs, how they are trained, what they can and can't do, advanced techniques like CoT, ReAct, Reflection, BabyAGI, and agents in general, and finally some great tips on using LLMs in production. Pretty simple but very informative.

YouTube

State of GPT | BRK216HFS

Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use…

47 views · 06:41

Continuous Learning_Startup & Investment

Here's an http://assembly.ai transcript and chapter summaries:
👂🏼 🤖 📃
https://www.assemblyai.com/playground/trannoscript/64kyzev80o-6ed4-4902-a066-7df25c363193

Andrej Karpathy is a founding member of OpenAI. He talks about how we train GPT assistants. In the second part, he looks at how you can use these assistants effectively in your applications.

TRAINING NEURAL NETWORKS ON THE INTERNET

There are four major stages: pretraining, supervised fine-tuning, reward modeling, and reinforcement learning. Each stage has a dataset that powers it and an algorithm, which for our purposes is an objective for training a neural network.

BASE MODELS AND ASSISTANTS

The GPT-4 model that you might be interacting with over the API is not a base model; it's an assistant model. You can even trick base models into being assistants, but this is not very reliable. Instead, there is a different path to make actual GPT assistants, not just base-model document completers.

REWARD MODELING AND RLHF

In the reward modeling step, we shift our data collection to the form of comparisons. Because we now have a reward model, we can score the quality of any arbitrary completion for any given prompt. At the end, you can deploy an RLHF model.
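The summary gives no code for this step; a common way to train a reward model on comparisons is a pairwise (Bradley-Terry style) loss, as used in InstructGPT, averaged over every pair drawn from a ranked list of completions. A minimal sketch of just the loss, assuming the reward model's scalar scores are already computed; the scores here are made-up numbers.

```python
import math
from itertools import combinations
from typing import Sequence


def softplus(z: float) -> float:
    """log(1 + exp(z)), computed without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)): small when the reward model
    scores the human-preferred completion higher."""
    return softplus(-(r_preferred - r_rejected))


def ranking_loss(rewards_best_first: Sequence[float]) -> float:
    """Average pairwise loss over every (better, worse) pair in one ranking."""
    pairs = list(combinations(rewards_best_first, 2))  # earlier item is the better one
    return sum(pairwise_loss(better, worse) for better, worse in pairs) / len(pairs)


print(pairwise_loss(2.0, 0.0))  # preferred completion scored higher: low loss
print(pairwise_loss(0.0, 2.0))  # ranking violated: high loss
print(ranking_loss([3.0, 2.0, 1.0]), ranking_loss([1.0, 2.0, 3.0]))
```

Minimizing this loss pushes the reward model to agree with the human rankings; the trained model then scores completions during the reinforcement learning stage.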

COGNITIVE PROCESSES AND GPT

How do we best apply a GPT assistant model to your problems? Think about the rich internal monologue and tool use and how much work actually goes computationally in your brain to generate this one final sentence. From GPT's perspective, this is just a sequence of tokens.

TREE OF THOUGHT AND PROMPT ENGINEERING

A lot of people are playing around with prompt engineering to bring back to LLMs some of the abilities we have in our brains. I think this is kind of an equivalent of AlphaGo, but for text. I would not advise people to use it in practical applications.

WHAT ARE THE QUIRKS OF LLMS?

The next thing I find interesting is that LLMs don't want to succeed; they want to imitate. So at test time, you actually have to ask for good performance. Next up, a lot of people are interested in retrieval-augmented generation.

CONSTRAINT PROMPTING IN LLMS

Next, I wanted to briefly talk about constraint prompting: techniques for forcing a certain template in the outputs of LLMs. I think this kind of constrained sampling is also extremely interesting.
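Constraint prompting can be illustrated with a toy version of constrained decoding: at each step, any token that would break the required template is masked out before the best-scoring token is picked. A sketch under simplified assumptions: a made-up scoring function stands in for the model's next-token logits, and the "template" is a small set of allowed JSON strings. All names are hypothetical.

```python
from typing import Callable, Iterable, Set


def constrained_greedy(score: Callable[[str, str], float], vocab: Iterable[str],
                       allowed: Set[str], max_steps: int = 20) -> str:
    """Greedy decoding that may only emit tokens keeping the output a prefix
    of one of the allowed strings (a toy form of constrained sampling)."""
    tokens = list(vocab)
    out = ""
    for _ in range(max_steps):
        if out in allowed:
            return out
        # Mask: keep only tokens that still lead toward some allowed output.
        valid = [t for t in tokens if any(a.startswith(out + t) for a in allowed)]
        if not valid:
            break  # no token can continue toward an allowed output
        out += max(valid, key=lambda t: score(out, t))
    return out


TOY_SCORES = {"maybe": 0.9, "neg": 0.6, "pos": 0.3}  # made-up "model" preferences


def toy_score(prefix: str, token: str) -> float:
    return TOY_SCORES.get(token, 0.0)


VOCAB = ['{"sentiment": "', "pos", "neg", "itive", "ative", '"}', "maybe"]
ALLOWED = {'{"sentiment": "positive"}', '{"sentiment": "negative"}'}

print(constrained_greedy(toy_score, VOCAB, ALLOWED))
```

An unconstrained greedy pick would start with the highest-scoring token `maybe`; the mask forces a well-formed JSON answer instead, while the model's preferences still choose between the valid options.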

FINE-TUNING A LANGUAGE MODEL

You can get really far with prompt engineering, but it's also possible to think about fine-tuning your models. Fine-tuning is a lot more technically involved: it requires human data contractors for datasets and/or synthetic data pipelines. Break up your task into two major parts: first achieve your top possible performance, then optimize costs.

LIMITS TO FULLY AUTONOMOUS LLMS

There are a large number of limitations to LLMs today, so keep them in mind for all your applications. My recommendation right now is to use LLMs in low-stakes applications and always combine them with human oversight. Think copilots instead of completely autonomous agents.
🧑🏼‍✈️ 🚧💻

Assemblyai

AssemblyAI | AI models to transcribe and understand speech

With AssemblyAI's industry-leading Speech AI models, transcribe speech to text and extract insights from your voice data.

55 views · 06:41

Continuous Learning_Startup & Investment

In this post, I try to answer specific questions about the internals of Copilot, while also describing some interesting observations I made as I combed through the code. I will provide pointers to the relevant code for almost everything I talk about, so that interested folks can take a look at the code themselves.

https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals

59 views · 06:54

Continuous Learning_Startup & Investment

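<Less stimulation, more thinking>

I sometimes feel that people today work in a near-ADHD state, because it is so easy to expose ourselves continuously to high-intensity stimulation. In that kind of environment, it is hard to calmly focus on one thing and think deeply.

Let me start with two examples.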

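Case 1)

An acquaintance of mine, Mr. K, worked at a large company and said he received hundreds of work requests from across the company every day. As a result, he was leaving work at 11 p.m. every night.

Then, after hearing me talk about agile, he decided to run an experiment: leaving on time. He proposed it to his team lead: "Starting today I'll leave at 6 p.m. sharp. If you feel my work slips even a little, tell me and I'll go back immediately." From that day on, he left at 6 p.m. At home, he spent two hours, from 7 to 9 p.m., playing with his six-year-old child. Until then, Dad had barely existed for that child: he came home at 11 p.m. on weekdays, left before the child was up in the morning, and was collapsed all weekend. Now the child had a "dad."

But there was one problem: the work left unfinished because he was leaving on time. For security reasons, he couldn't access company computers or files from home. So his alternative was to sit at his desk from 11 p.m. to 1 a.m. with scrap paper spread out and work out strategies for handling the day's work and tomorrow's tasks more wisely. He did this every day.

When he went to work the next day, more than 50% of the requests had often resolved themselves (the requesting departments, tired of waiting, handled them on their own), and the remaining 50% he could finish quickly because the previous night's thinking had given him smarter ways to handle them.

Of course, he said he ended up sleeping less than when he left at 11 p.m., since back then he would collapse into bed as soon as he got home. But he said his body felt far more energetic.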

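Case 2)

Back in the army, after I was assigned to my unit, I was paired with a senior soldier whose job I was to take over. But I could hardly ever see him. A few days later I learned that he was being discharged in a week. His post was administrative clerk for the battalion maintenance section. The job involved so many complicated tasks that it usually took about a year of handover to do it properly. Yet he was leaving in a week, and even that week was slipping by carelessly. Occasionally he would come down to the maintenance section, say "ask me if you have questions," and lie down. The real problem was that no one, officer or enlisted, knew exactly what his post involved.

In the end, he was discharged before I had learned almost anything, and there wasn't a single work manual. I had nothing at all to refer to.

After some thought, I decided to act based on principles and fundamentals. Whenever a problem came up, I worked out and did whatever made logical sense according to the basic principles I held (for example, which action would benefit the Army). You could say I was designing all the rules and regulations myself. That left nothing standing in my way. Whenever I thought something through deeply and acted on it, things worked out.

Surprisingly, this approach worked well. In the end I built the entire system, and I received awards for it several times. When the corps sent down an audit, I even gave an informal lecture to a gathering of civilian staff and officers.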

----
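Sometimes it helps to limit external stimulation and information and concentrate on thinking. As a bonus, your thinking muscles and skills grow.

So I recommend things like the following:
* When you hit a bug, don't immediately paste it into a search box; spend at least 5 to 10 minutes sketching the problem on a blank sheet of paper and reasoning about the cause.
* When you want to get into a completely unfamiliar field, instead of searching the internet, buy three bestselling books with different styles at a bookstore and read them side by side, comparing them (I call this bounded exploration; without it, it's easy to waste time dabbling here and there without properly reading any one source).
* When you have a complex problem to solve, don't look up any additional information; lay out a blank sheet of paper and spend 30 minutes designing a solution using only logic, your own thinking, and your past experience.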

https://www.facebook.com/100000557305988/posts/pfbid02joCFDgeyR58vuv2MyZqQWJ1cf7FwrYZHS6FLq9ox8Bqu2RE9cV3HdgzWdHJvopjkl/?mibextid=jf9HGS

Facebook

2.29K views · edited 09:32
