GLORIOUS LEADER ANNOUNCES HISTORIC VICTORY: EUROPEAN WINE & CHEESE BECOME RARE TREASURES, PEASANTS SALUTE THE RISE OF AMERICAN FLAVOR!
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
😁4🏆3🫡1
DoomPosting
Measuring AI Ability to Complete Long Tasks Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling…
Typical experience with GPT-5 so far:
+ Ask GPT-5 for ideas on how to solve some very non-trivial problem, that I carefully specify
+ GPT-5 casually suggests a solution approach that would be nice, but would be wildly difficult to code
+ I say, Oh really? Prove it bro
+ GPT-5 does it no problem, and shows that its solution works
So, GPT-5 kinda totally nailing it so far
ALTHOUGH: my tests so far are still largely in the “truth/factual” domain rather than the “values” domain, i.e. synthesizing a solution that factually fits my detailed specification, rather than having the sense of values to know which specifications are worth coming up with in the first place
Will have to run more tests on my collection of hard problems to see how this holds up
Will say I still don’t trust any AI code at all, unless very thoroughly verified, and AI code still has a huge list of problems, probably
We’ll see
(Image is just a random piece of code, too lazy to pull together the best examples rn)
🄳🄾🄾🄼🄿🄾🅂🅃🄸🄽🄶
👀5🔥1
NEW - U.S. removes online versions of past National Climate Assessments, saying they are being reviewed and updated
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
⚡3😁1
BREAKING: In just 48 hours, 350k British citizens have signed a petition calling for an immediate General Election
The people have had enough.
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
❤🔥8🔥5
Europeans have absolutely had enough with migrant crime and anti-social behavior.
Just a few years ago, Europeans would join forces in public to take a stand. That is changing. This example is from the subway in London where a migrant took his pants off in front of children.
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
👏17🔥3🌚3
Polish president Karol Nawrocki opens a window at the presidential palace in Warsaw to inform his supporters gathered in front of it that he will come out and meet them in 30 minutes
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
⚡3
Sophie Cunningham’s mom warned Fever star to ‘watch out for flying dildos’ in wild WNBA saga
🄳🄾🄾🄼🄿🤖🅂🅃🄸🄽🄶
😁3🤯1🕊1