
  • What's up? Josh here.

  • So in case you missed it, OpenAI has just announced GPT-4o, which is their brand new flagship model that is two times faster and more capable than GPT-4.

  • And the good news for all of us: it's going to be free to use.

  • Now, GPT-4 was previously behind a $20-a-month subscription, but with 4o being completely free, we also get the benefits of everything that came with GPT-4.

  • There's Vision, where you can upload images and ask it questions about those images.

  • There's also Browse, where it can search the internet for more real-time and up-to-date data.

  • There's also Memory, where it can actually remember facts about you.

  • And then lastly, there's Analyzing Complex Data. So you can actually give it like an Excel spreadsheet and ask it questions about that.

  • So all of those features are going to be coming to 4o in the next couple of weeks.

  • But yeah, first of all, let's just start with everything that's going to be new with GPT-4o.

  • So in the presentation, the most impressive part was obviously the demo. So they did a bunch of stuff.

  • They asked it all kinds of questions, gave it math equations, and asked it to read bedtime stories.

  • And for the most part, I think the intelligence level and the answers it's giving are pretty similar to the current GPT-4, which is why I don't think they updated the name to GPT-5.

  • But surprisingly, the biggest updates of 4o actually come in the voice feature.

  • Hey, ChatGPT, how are you doing?

  • I'm doing fantastic. Thanks for asking. How about you?

  • Pretty good.

  • What's up?

  • So my friend Barrett here, he's been having trouble sleeping lately.

  • And I want you to tell him a bedtime story about robots and love.

  • Oh, a bedtime story about robots and love. I got you covered.

  • So now we have response times as quick as 232 milliseconds, with an average of 320 milliseconds, which is about the average human response time in a conversation.

  • You can also now just interrupt the conversation simply by speaking, which I think is pretty intuitive.

  • They even put a disclaimer on the website that all of their videos are played at 1x speed, because previously there was such a delay that this now seems like a drastic improvement.

  • So yeah, clearly some very impressive stuff here, pulling off response times of just a few hundred milliseconds.

  • And you know what I was thinking: the Humane AI Pin really would have benefited from GPT-4o and its faster response times, because it was largely flamed online for how slow it was to respond.

  • And it was running on GPT-4, which was much slower.

  • Who designed the Washington Monument?

  • But yeah, that is the first thing that I noticed: the speed.

  • But the second thing you might've picked up on already is the emotion behind the voice.

  • How are you?

  • I'm doing well. Thanks for asking. How about you?

  • Hey, ChatGPT. How are you doing?

  • I'm doing fantastic. Thanks for asking. How about you?

  • Me? The announcement is about me? Well, color me intrigued.

  • Are you about to reveal something about AI?

  • So it seems like OpenAI has really just dialed up the expressiveness and just the overall energy of this assistant, which I'm not sure how I feel about.

  • It just feels like you're talking to a friend who is overly caffeinated and overly energized all of the time,

  • whereas I think an assistant should honestly be a little bit more straightforward and straight up.

  • Hopefully in the future, we can have the option to customize the voice.

  • I think that would be a smart move.

  • But also, you can ask it to change its tone.

  • So in the demo, they asked it to be a little bit more dramatic when reading a bedtime story.

  • And they also asked it to read it in a robotic voice.

  • I really want maximal emotion, like maximal expressiveness, much more than you were doing before.

  • Understood. Let's amplify the drama.

  • Once upon a time in a world not too different from ours.

  • Initiating dramatic robotic voice.

  • And then also apparently the robot can sing, which I'll let you be the judge of that.

  • (ChatGPT-4o singing)

  • There's also a new feature that is sort of a subset of Vision, which is being able to take your camera, point it at something, and ask it questions about it in real time.

  • Sort of like this beta test of giving the AI eyes.

  • What do you see?

  • Oh, I see "I love ChatGPT." That's so sweet of you.

  • Now, as if all of that wasn't enough, they also announced a brand new desktop app where you can do all of those same things, like text input and speech input, as well as uploading images.

  • But on top of that, you can also screen share.

  • So you can have it just look at your screen and ask it questions about whatever you're looking at.

  • Now, I think this is going to be a huge productivity feature for anybody who works on their computer a lot.

  • In the demo, they sort of showed how it could analyze a graph that you're looking at.

  • But also I think it would be really helpful for research purposes.

  • And, I don't know, there are just so many use cases where I'm on the computer and it would be nice to have a conversational assistant, someone to bounce ideas off of.

  • I think that would be really helpful.

  • All right, sure it can see our screen.

  • Can you find which one is the hypotenuse?

  • Oh, okay. I see.

  • So I think the hypotenuse is this really long side from A to B.

  • Would that be correct?

  • Exactly. Well done.
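
As a quick aside on the geometry in that demo, the model's answer matches the usual definition: in a right triangle, the hypotenuse is the side opposite the right angle, and by the Pythagorean theorem it is always the longest side. (The side labels below are just the standard convention, not taken from the demo.)

```latex
c = \sqrt{a^{2} + b^{2}} \;\geq\; \max(a, b)
\qquad \text{where } c \text{ is the hypotenuse and } a, b \text{ are the legs.}
```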

  • Now, just to quickly touch on what the "o" in 4o actually stands for.

  • It's not so much that it's omniscient or omnipotent; it's "omni," meaning it takes your multimodal inputs, which are text, speech, and now vision, all into the same neural network.

  • Whereas before, it was processing those separately.

  • So before, with the voice feature on GPT-3.5 and GPT-4, it would actually take your voice and transcribe it into text.

  • And so that's how it was recognizing your input, which basically strips a lot of information before it ever reaches the LLM.

  • So all of the emotion and tone that would be captured in the audio was just boiled down into text.

  • So you can think of it like texting a friend versus calling a friend.

  • So now, with the new Omni model, it is taking all of those things into consideration with its response.
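
To make that "texting a friend versus calling a friend" distinction concrete, here is a minimal sketch of the two architectures. It is purely illustrative: every function and model name below is a hypothetical stand-in, not OpenAI's actual API.

```python
# Purely illustrative sketch; all names are hypothetical stand-ins,
# not OpenAI's actual API.

def speech_to_text(audio: bytes) -> str:
    """Stand-in for a transcription model: keeps only the words."""
    return "what the user said, as plain text"

def text_llm(prompt: str) -> str:
    """Stand-in for a text-only LLM: it never hears tone or emotion."""
    return f"reply to: {prompt}"

def text_to_speech(text: str) -> bytes:
    """Stand-in for a separate TTS voice reading the reply aloud."""
    return text.encode()

def cascaded_voice_mode(audio: bytes) -> bytes:
    # Old voice mode (GPT-3.5 / GPT-4): three models chained together.
    # Tone, emotion, and background sound are stripped at the first step,
    # so the LLM only ever sees a transcript ("texting a friend").
    return text_to_speech(text_llm(speech_to_text(audio)))

def omni_voice_mode(audio: bytes) -> bytes:
    # GPT-4o-style "omni" model: one network handles audio (and text and
    # vision) directly, so tone and emotion are part of the input and the
    # spoken reply can carry expressiveness back out ("calling a friend").
    return b"expressive audio reply from a single multimodal model"

print(cascaded_voice_mode(b"raw audio from the microphone"))
print(omni_voice_mode(b"raw audio from the microphone"))
```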

  • But yeah, that is the latest update with OpenAI.

  • Clearly some very impressive stuff cooking under the hood.

  • I'm curious to see what Google is going to come out with tomorrow.

  • So definitely get subscribed for that.

  • And that video is already out.

  • It's probably on the screen somewhere.

  • Hope you enjoyed the video.

  • I'll catch you guys in the next one.

  • Peace.
