
  • Are you surprised at the advances

  • that have come in the last several years?

  • Oh, yes, definitely. I didn’t imagine

  • it would become this impressive.

  • What’s strange to me

  • is that we create these models,

  • but we don’t really understand

  • how the knowledge is encoded.

  • To see what’s in there,

  • it’s almost like a black box,

  • although we see the innards,

  • and so when it comes to understanding why it does so well,

  • or so poorly, we’re still pretty naive.

  • One thing I’m really excited about

  • is our lack of understanding

  • of both types of intelligence,

  • artificial and human intelligence.

  • It really opens new intellectual problems.

  • There’s something odd

  • about how these large language models,

  • that we often call LLMs,

  • acquire knowledge in such an opaque way.

  • They can perform some tests extremely well,

  • while surprising us

  • with silly mistakes somewhere else.

  • It’s been interesting that,

  • even when it makes mistakes,

  • sometimes if you just

  • change the prompt a little bit,

  • then all of a sudden it works,

  • so even that boundary is somewhat fuzzy,

  • as people play around.

  • Totally.

  • Quote-unquote "prompt engineering"

  • became a bit of a black art

  • where some people say that you have to

  • really motivate the transformers

  • in the way that you motivate humans.

  • One custom instruction that I found online

  • was supposed to be about

  • how you first tell LLMs,

  • “You are brilliant at reasoning,

  • you really think carefully,”

  • then somehow the performance is better,

  • which is quite fascinating.

  • But I find two very divisive reactions

  • to the different results that you can get

  • from prompt engineering.

  • On one side, there are people

  • who tend to focus primarily

  • on the success case.

  • So long as there is one answer

  • that is correct, it means

  • the transformers, or LLMs,

  • do know the correct answer;

  • it’s your fault that

  • you didn’t ask nicely enough.

  • Whereas there is the other side,

  • the people who tend to focus

  • a lot more on the failure cases,

  • and therefore conclude that nothing works.

  • Both are some sort of extremes.

  • The answer may be

  • somewhere in between,

  • but this does reveal

  • surprising aspects of this thing. Why?

  • Why does it make

  • these kinds of mistakes at all?

  • We saw a dramatic improvement

  • from the models the size of GPT-3

  • going up to the size of ChatGPT-4.

  • I thought of GPT-3 as kind of a funny toy,

  • almost like a random sentence generator

  • that I wrote 30 years ago.

  • It was better than that,

  • but I didn’t see it as that useful.

  • I was shocked that ChatGPT-4,

  • used in the right way,

  • can be pretty powerful.

  • If we go up in scale,

  • say another factor of 10 or 20 above GPT-4,

  • will that be a dramatic improvement,

  • or a very modest improvement?

  • I guess it’s pretty unclear.

  • Good question, Bill.

  • I honestly don’t know

  • what to think about it.

  • There’s uncertainty,

  • is what I’m trying to say.

  • I feel there’s a high chance

  • that we’ll be surprised again,

  • by an increase in capabilities.

  • And then we will also be really surprised

  • by some strange failure modes.

  • More and more, I suspect that

  • the evaluation will become harder,

  • because people tend to have a bias

  • towards believing the success case.

  • We do have cognitive biases in the way that

  • we interact with these machines.

  • They are more likely to be adapted

  • to those familiar cases,

  • but then when you really start trusting them,

  • they might betray you

  • with unexpected failures.

  • Interesting times, really.

  • One domain where it’s not as good,

  • which is almost counterintuitive, is mathematics.

  • You almost have to laugh that

  • something like a simple Sudoku puzzle

  • is one of the things that it can’t figure out,

  • whereas even humans can do that.

  • Yes, it’s reasoning in general,

  • something humans are capable of,

  • that models like ChatGPT

  • are not as reliable at right now.

  • The reaction to that

  • in the current scientific community

  • is a bit divisive.

  • On one hand, people might believe

  • that with more scale,

  • the problems will all go away.

  • Then there’s the other camp

  • who tend to believe that, wait a minute,

  • there’s a fundamental limit to it,

  • and there should be better, different ways

  • of doing it that are much more efficient.

  • I tend to believe the latter.

  • Anything that requires symbolic reasoning

  • can be a little bit brittle.

  • Anything that requires

  • factual knowledge can be brittle.

  • It’s not a surprise when you actually look at

  • the simple equation that we optimize

  • for training these large language models

  • because, really, there’s no reason why

  • suddenly such capability should pop out.

  • I wonder if the future architecture may have

  • more of a self-understanding

  • of reusing knowledge in a much richer way

  • than just this forward-chaining

  • set of multiplications.

  • Yes, right now the transformers, like GPT-4,

  • can look at such a large amount of context.

  • They’re able to remember so many words

  • that were spoken just now.

  • Whereas humans, you and I,

  • we both have a very small working memory.

  • The moment we hear

  • new sentences from each other,

  • we kind of forget exactly

  • what you said earlier,

  • but we remember the abstract of it.

  • We have this amazing capability

  • of abstracting away instantaneously

  • and have such a small working memory,

  • whereas right now GPT-4

  • has enormous working memory,

  • so much bigger than us.

  • But I think that’s actually the bottleneck,

  • in some sense,

  • hurting the way that it’s learning,

  • because it’s just relying on the patterns,

  • the surface overlay of patterns,

  • as opposed to trying to abstract away

  • the true concepts underneath any text.

  • Subscribe to “Unconfuse Me” wherever you listen to podcasts.
