Guys, AI progress just isn't slowing down
GPT-5 completes tasks that take 52% longer
trust the exponential
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
💯3😱1
Measuring AI Ability to Complete Long Tasks
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
🄳🄾🄾🄼🄿🄾🅂🅃🄸🄽🄶
🔥3💯1
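For anyone who wants to play with the numbers in the summary above: here is a minimal sketch of what a 7-month doubling time implies, assuming the trend continues as a clean exponential. The function names and the 1-hour starting horizon in the example are made up for illustration; they are not from the post or the paper.

```python
import math

# Assumption: doubling time of ~7 months, as quoted in the summary.
DOUBLING_MONTHS = 7.0


def horizon_multiplier(months_ahead: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Factor by which the task-length horizon grows after `months_ahead` months."""
    return 2.0 ** (months_ahead / doubling_months)


def months_until(target_multiplier: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the horizon to grow by `target_multiplier`."""
    return doubling_months * math.log2(target_multiplier)


if __name__ == "__main__":
    # Hypothetical starting point: an agent that handles 1-hour tasks today.
    start_hours = 1.0
    work_week_hours = 40.0  # illustrative target, not a figure from the paper
    needed = months_until(work_week_hours / start_hours)
    print(f"1 hour -> 40 hours takes about {needed:.1f} months at this rate")
    print(f"Growth factor over 10 years: {horizon_multiplier(120):,.0f}x")
```

Under those assumptions, going from 1-hour tasks to week-long (40-hour) tasks takes about 37 months, which is how "days or weeks" in "under a decade" falls out of the trend. It only holds if the exponential keeps going, which is the whole bet.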
NEW: Trump to host leaders of sworn enemies Armenia and Azerbaijan today for historic ‘Peace Signing’
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
😁3🙏1🕊1
GLORIOUS LEADER ANNOUNCES HISTORIC VICTORY: EUROPEAN WINE & CHEESE BECOME RARE TREASURES, PEASANTS SALUTE THE RISE OF AMERICAN FLAVOR!
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
😁4🏆3🫡1
Typical experience with GPT-5 so far:
+ Ask GPT-5 for ideas on how to solve some very non-trivial problem, that I carefully specify
+ GPT-5 casually suggests a solution approach that would be nice, but would be wildly difficult to code
+ I say, Oh really? Prove it bro
+ GPT-5 does it no problem, and shows that its solution works
So, GPT-5 kinda totally nailing it so far
ALTHOUGH: my tests so far are still largely in the “truth/factual” domain, rather than the “values” domain, i.e. synthesizing a solution that factually fits my detailed specification, rather than having the sense of values to know which specifications are worth coming up with in the first place
Will have to run more tests on my collection of hard problems to see how this holds up
Will say I still don’t trust any AI code at all, unless very thoroughly verified, and AI code still has a huge list of problems, probably
We’ll see
(Image is just a random piece of code, too lazy to pull together the best examples rn)
🄳🄾🄾🄼🄿🄾🅂🅃🄸🄽🄶
👀5🔥1
NEW - U.S. removes online versions of past National Climate Assessments, saying they are being reviewed and updated
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
⚡3😁1
BREAKING: 350k British citizens have signed a petition calling for an immediate General Election in just 48 hours
The people have had enough.
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
❤🔥8🔥5