
Introducing GPT-5

Sam Altman, Greg Brockman, Sebastien Bubeck, Mark Chen, Yann Dubois, and others introduced and demonstrated GPT-5. The team showcased the new model's capabilities and improvements over previous versions, highlighted its potential applications, and emphasized its significance for artificial intelligence and machine learning.

0:23|Sam Altman:
Good morning. 32 months ago, we launched ChatGPT. And since then, it has become the default way that people use AI. In that first week, a million people tried it out. And we thought that was pretty incredible. But now, about 700 million people use ChatGPT every week, and increasingly rely on it to work, to learn, for advice, to create, and much more. Today, finally, we're launching GPT-5. GPT-5 is a major upgrade over GPT-4 and a significant step along our path to AGI. Now, today we're going to show you some incredible demos, we'll talk about some performance metrics, but the important point is this.
1:04|Sam Altman:
We think you will love using GPT-5 much more than any previous AI. It is useful, it is smart, it is fast, and it's intuitive. GPT-3 was sort of like talking to a high school student. There were flashes of brilliance, lots of annoyance, but people started to use it and get some value out of it. With GPT-4, maybe it was like talking to a college student. Real intelligence, real utility. But with GPT-5, now it's like talking to an expert, a legitimate PhD level expert in anything, any area you need on demand that can help you with whatever your goals are.
1:39|Sam Altman:
And we are very excited that you'll get to try this. But it's not only asking now. GPT-5 can also do stuff for you. It can write an entire computer program from scratch to help you with whatever you'd like. And we think this idea of software on demand is going to be one of the defining characteristics of the GPT-5 era. It can help you plan a party, send invitations, order supplies. It can help you understand your healthcare and make decisions on your journey. It can provide you information to learn about any topic you'd like and much more.
2:09|Sam Altman:
This is an incredible superpower on demand that would have been unimaginable at any previous time in history. You get access to an entire team of PhD level experts in your pocket helping you with whatever you want to do. And pretty soon, anyone will be able to do more than anyone in history could. So today, we're going to talk about GPT-5, we'll show you some upgrades to ChatGPT, and we'll talk about the API. GPT-5 is great for a lot of things, but we think it's going to be an especially important moment for businesses and developers, and we're very excited to see what they're going to build with this new technology.
2:47|Sam Altman:
So we can't wait for you all to start building with this. We hope you enjoy it as much as we enjoyed building it for you. And to start, I'm going to hand it over to my colleague Mark, our Chief Research Officer, to tell you about GPT-5. Thank you.
3:05|Mark:
Hi, I'm Mark, and I'm joined by Max, who leads the post-training team, and Reni from our engineering team. Over the past few years, OpenAI has spearheaded the reasoning paradigm. These are models which pause to think before delivering more intelligent responses. Now, reasoning is at the heart of our AGI program, and it underlies the technology that we use to ship stuff like ChatGPT Agent and deep research. GPT-5 aims to bring this breakthrough to everyone. Until now, our users have had to pick between the fast responses of standard GPTs or the slow, more thoughtful responses from our reasoning models.
3:44|Mark:
But GPT-5, it eliminates this choice. It aims to think just the perfect amount to give you the perfect answer. Now, something like this takes a lot of hard work. We've had to do a lot of research to make GPT-5 the most powerful, the smartest, the fastest, the most reliable, and the most robust reasoning model that we've shipped to date. Today, we're going to show a series of demos in coding, in writing, in learning, and in health. But GPT-5 isn't limited to these domains. It's very useful in all cases where you require deep reasoning or expert-level knowledge in things like math, in physics, even in things like law.
4:24|Mark:
And the exciting thing is we're excited to make this available to everyone, even on our free tier. After we show our demos, we're going to be talking about how GPT-5 supercharges our ChatGPT app and our API. We believe that GPT-5 is the best coding model on the market today. To start, let's have Max talk a little bit about the benchmarks and how the models stack up.
4:46|Max:
Yeah. Thanks, Mark. So as Mark said, we think GPT-5 is by far our smartest model ever. So let's start by talking through some evals. Now, evals aren't everything, and they don't tell you everything about a model, but they can highlight its intelligence. And GPT-5 performs exceptionally well on a range of academic evals across subjects. It outperforms both our previous models and other models on the market. So picking up first on the theme of coding, GPT-5 sets a new high on SWE-bench, which is an academic eval that tracks performance on real software engineering tasks. Now, this again is an eval, but we think it will reflect the model's performance in the real world.
5:26|Max:
GPT-5 also performs very well on Aider Polyglot, which measures its ability to implement complex functionality in a variety of different programming languages. Now, beyond coding, GPT-5 performs exceptionally well at multimodal reasoning, setting a new high on MMMU, actually outperforming both our previous models and most human experts on this task. This is basically a visual reasoning domain where you are asked to, from an image, figure out what's going on. GPT-5 is also excellent at mathematical reasoning, as shown by its performance on AIME 2025. Now, this is an exam that American high school students take to qualify for the International Mathematical Olympiad, and GPT-5 performs exceptionally well.
6:07|Max:
Again, beating our previous models and other models that are out there. Now, moving beyond academic evals and more towards some real-world use cases, we put a lot of work into making GPT-5 the most reliable and accurate model in the world. Language models historically have been plagued by hallucinations, factual errors that make it hard to rely on their outputs for actually important tasks. For GPT-5, we made improving factuality, especially on open-ended or complex questions, a priority. We also built a set of new evals to track this, and we're very happy to report that GPT-5 is by far our most reliable, most factual model ever.
6:46|Max:
GPT-5 also performs exceptionally well on health-related questions. Now, health is a big part of how people get value from GPT in the real world. We'll talk about this later on in the livestream. But again, we're very happy to report that GPT-5 is by far our most reliable model for health yet. So all of this together adds up to a model that is faster, more reliable, and more accurate for everyone who uses ChatGPT. So now Reni will talk to you about how to actually use GPT-5.
7:16|Reni:
Thanks, Max. The best part is that we're bringing this frontier intelligence to all users. GPT-5 is rolling out today for Free, Plus, Pro, and Team users. And next week we'll roll it out to Enterprise and EDU. For the first time, our most advanced model will be available to the free tier. Free users will start with GPT-5, and when they hit their limit, they'll transition to GPT-5 Mini, a smaller but still highly capable model. It actually outperforms o3 on many dimensions. Plus users will still have significantly higher usage than free users, and our Pro subscribers will get unlimited GPT-5 along with GPT-5 Pro.
8:00|Reni:
Extended thinking for even more detailed and reliable responses when you just need that extra depth. Team, Enterprise, and EDU customers can also use GPT-5 reliably as their default model for everyday work, with generous rate limits that enable entire organizations to use GPT-5. And all the tools you already know, search, file and image upload, data analysis with Python, Canvas, image generation, memory, custom instructions, they'll all just work with GPT-5.
8:34|Mark:
Amazing. Thank you so much, Max. Thank you so much, Reni. We've just seen a lot about how the model stacks up in terms of benchmarks, but there's nothing quite like seeing it live. We're going to see a couple of live demos now, presented by Christina, by Elaine, and by Yann. Thank you so much. Elaine, can you show us how smart the model is?
9:01
Sure. Thanks so much, Mark. I'm Elaine. Reasoning, ChatGPT's ability to think deeply through complex problems, is now built into GPT-5. It will automatically think whenever needed, delivering a more comprehensive, accurate, and detailed answer to you. Just as Sam said, it's like having a team of PhDs in your pocket. So let's see that in action. Suppose your kid is in middle school physics and they want to learn about the Bernoulli effect. They need your help with their homework, and you might be like, wait, I might need some help with that too. So you could ask GPT-5, give me a quick refresher on the Bernoulli effect and why airplanes are the shape they are.
9:46
Since this is a pretty straightforward prompt, GPT-5 actually doesn't need extra time to think about it and answers right away. But it still gives me a high quality answer and explains the concept clearly. So here it says the Bernoulli effect means faster moving fluid has lower pressure and slower moving fluid has higher pressure. So to make this even more helpful, I'm going to ask GPT-5 to create a moving demo to illustrate this. So I could ask, explain this in detail and create a moving SVG in the Canvas tool to show me. This is a pretty complex task because now GPT-5 actually needs to build the visual.
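For reference, the relation being quoted here is Bernoulli's equation for steady, incompressible flow along a streamline (standard physics, not shown in the demo itself):

```latex
p + \tfrac{1}{2}\rho v^2 + \rho g h = \text{constant}
```

Since the sum is constant along the streamline, wherever the flow speed \(v\) is higher, the pressure \(p\) must be lower, which is the pressure difference the wing visualization animates.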
10:36
Therefore, GPT-5 takes a moment to think through the answer so it can come back with something more comprehensive and accurate. What's really nice is that you don't need to remember to turn on thinking each time. GPT-5 will do it for you automatically whenever the task benefits from deeper reasoning. If you really want to make sure that GPT-5 uses thinking, you can either say something like, think hard about this, in the prompt to guide the model, or if you're a paid user, you can choose the GPT-5 Thinking model from the model picker. Now you can see that the model is actually writing the front-end code to build the demo I asked for.
11:17
So Christina, have you ever done some front-end coding before?
11:21
Yeah, actually the last time I touched any front-end coding was about three years ago for the first demo of ChatGPT. Wow. So, it's the first ChatGPT. That's where it all began.
11:31
Tell us more about it.
11:33
It wasn't even called ChatGPT then. I think it was called Chat with GPT. That's a really good name. Always good at naming. But I'm not a front-end expert and I really hadn't touched front-end in quite a while. So, it took me quite a bit of time to get the React app up.
11:50
I see, that's a lot of work. So how long did it take you to build something like that? Honestly, maybe embarrassing to admit, like a week. Well, but your weeks of hard work actually paid off well. See how successful ChatGPT is today after your first demo. So you know what, I'm also building a demo right now, but luckily I have GPT-5 with me right now, and let's see how long it will take this time.
12:16|Mark:
Maybe you should call it 5 with GPT.
12:19
Yeah, exactly. So you see that GPT-5 has already written more than 200 lines of code. And while the model is thinking, you can also tap here to expand the chain of thought to actually see what's going on under the hood. For example, GPT-5 was thinking, the user wants a moving SVG visualization in Canvas, I actually need to create HTML code to do that. It also thinks about what kind of front-end tools it needs to use, for example, React and Tailwind. It also thinks, I need to ensure the physics is accurate, I need to check what the Bernoulli principle is.
13:03
So, Christina, since you're here from the first day of ChatGPT, can you tell us what it was like at that time and what motivated ChatGPT?
13:14
Yeah, I think at the time we weren't really sure about how people would actually use it and what use cases were important. We were even going back and forth about maybe we should be releasing something that's more specific to a certain use case. It's really cool now that we have a much better understanding of how people actually want to work with ChatGPT, and we can actually optimize the model for those use cases, like coding.
13:35
Yeah, exactly. Do you still remember how it felt like when you first talked to ChatGPT, the first version of the model?
13:43
Yes. I don't know if people remember when the first version of ChatGPT would always start with, as an AI model, I can't do something, something. It's so great to see how far we've come from that personality. Yeah.
13:55
It's much more human-like right now. Okay. So, it's already done. So, it looks like ChatGPT just finished nearly 300 or 400 lines of code in two minutes. So, let's see if the code can actually run. Okay.
14:12
Nice.
14:12
Yeah. So with just a simple prompt, GPT-5 created this interactive and engaging demo that I can actually play with. So I can actually change the airspeed here to see how the lift and the pressure change accordingly. I can also tweak the angle of attack to see if my plane will actually fly or crash. So, GPT-5 can just bring any hardcore concept to life in moments. Imagine you can use this for anything that you're interested in, whether it's math, physics, chemistry, or biology. GPT-5 just makes learning so much more approachable and enjoyable.
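The lift-versus-airspeed behavior the demo's sliders expose follows the standard lift equation. Here is a minimal sketch in Python; the density, wing area, and lift coefficient values are illustrative assumptions, not taken from what GPT-5 generated:

```python
def dynamic_pressure(air_density, airspeed):
    """Dynamic pressure q = 0.5 * rho * v^2, in pascals."""
    return 0.5 * air_density * airspeed ** 2

def lift(air_density, airspeed, wing_area, lift_coefficient):
    """Lift L = q * S * C_L, in newtons. C_L grows with angle of attack
    until the wing stalls, which is what the demo's crash case shows."""
    return dynamic_pressure(air_density, airspeed) * wing_area * lift_coefficient

# Illustrative numbers: sea-level air and a small trainer aircraft.
rho = 1.225   # air density, kg/m^3
v = 50.0      # airspeed, m/s
S = 16.0      # wing area, m^2
C_L = 1.2     # lift coefficient (dimensionless)

print(round(lift(rho, v, S, C_L)), "N")
```

Because lift scales with the square of airspeed, doubling the airspeed slider quadruples the lift, which is why small speed changes in the demo move the plane so visibly.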
15:00
Thanks, Elaine. I've been a part of ChatGPT since day one, and it's really cool to see all the progress we've made since then, especially with capabilities like writing. Writing is one of the most common use cases people have been using ChatGPT for, and we're excited to say with GPT-5, we've improved the writing quality significantly. It's a much more effective partner. It can help you elevate anything from drafts to emails and even stories. Let's see this in action. So with GPT-5, we'll actually be deprecating all of our previous models. I think they've done a pretty good job, so let's make sure we can give them a proper goodbye.
15:35
So we're going to ask both 4o and GPT-5 to write a eulogy to our previous ChatGPT models. We want it to be heartfelt and heartwarming, but also hopeful. So let's ask GPT-5 for it. And as it's thinking, we're actually going to go ahead and read a preloaded 4o response. So 4o decides to start with, today as we prepare to welcome GPT-5 into the world, we gather to bid a heartfelt farewell to the models that came before. A decent start. Now, let's skim through and find another line. Your words reached across the globe building connections where there had been none.
16:14
I personally don't really like this line because it's rather generic, and really, without the previous context, it just feels like it could be about anything and feels more like a templated response. Now, let's go back to GPT-5 to see what it's given us. It starts with, friends, colleagues, curious strangers who became regulars. Even with just this first line here, we can see that GPT-5 has a lot more rhythm and beat to its prose than 4o did. Now let's find some other lines here. I actually like this. These models help millions write first lines, last lines, bridge language gaps, pass tests, argue better, soften emails, and say things they couldn't quite say alone.
17:02
I think I really like this line because it shows that it's not just a templated response, it's actually quite personal, and it gets the nuance of the situation right. And I think that's the kind of stuff GPT-5 does much better than 4o did before. It actually makes things a lot more genuine and emotionally resonant with people. With GPT-5, the responses feel less like AI and more like you're chatting with your high IQ and EQ friend.
17:28
Thanks, Christina. My name is Yann, and I'll be telling you about some of the progress that we made on coding. GPT-5 is clearly our best coding model yet. It will help everyone, even those who do not know how to write code, to bring their ideas to life.
17:45
It just helped me.
17:46
Indeed. And it will help me right now. So I will try to show you that. I will actually try to build something that I would find useful, which is building a web app for my partner to learn how to speak French so that she can better communicate with my family. Here I have a prompt, I will execute it. It asks exactly what I just said. Please build a web app for my partner to learn French. One thing to note is that GPT-5, just like many of our other models, has a lot of diversity in its answers.
18:17
So what I like doing, especially when you do this type of vibe coding, is to take this message and ask GPT-5 multiple times, and then you can decide which one you prefer. So I'm going to open a few tabs. Just going to paste there. Great. So while it's working on it, let's read through exactly the prompt I wrote. Create a beautiful and highly interactive web app for my partner, an English speaker, to learn French. Then I give a little bit more details, track her daily progress, use a highly engaging theme. It's already working.
18:52
I'm going to put it on the side for now. Use a highly engaging theme, include a variety of activities like flashcards and quizzes that she can interact with. Then to make it even more fun for her, I actually asked GPT-5 to embed an educational game, which is based on the old snake game. But I asked it to add this French touch to it, which is to replace the snake with a mouse, and the apples with cheese. And to make sure that it's educational, every time, I know it's complicated, please bear with me, every time the mouse eats a piece of cheese, I ask GPT-5 to voice over a new French word so that my partner can practice her pronunciation.
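The game mechanic described here, the mouse moves on a grid, and eating cheese triggers a French voice-over plus a score bump, could be sketched like this. The function and names are hypothetical illustrations, not taken from the code GPT-5 generated:

```python
def step(mouse, direction, cheese, words, score):
    """One tick of a minimal mouse-and-cheese game on a grid.

    mouse, cheese: (x, y) positions; direction: (dx, dy) per-tick movement.
    words: French vocabulary queued for voice-over.
    Returns (new_mouse, word_to_speak_or_None, new_score).
    """
    new_mouse = (mouse[0] + direction[0], mouse[1] + direction[1])
    if new_mouse == cheese:
        # Eating the cheese triggers a voice-over of the next French word.
        word = words.pop(0) if words else None
        return new_mouse, word, score + 1
    return new_mouse, None, score

vocab = ["le chat", "le fromage", "la souris"]
mouse, spoken, score = step((0, 0), (1, 0), (1, 0), vocab, 0)
print(spoken, score)  # the word to voice over and the updated score
```

In the real app this logic lives in the browser, with the voice-over wired to a text-to-speech call; the sketch only shows the state transition that ties movement, scoring, and vocabulary together.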
19:37
I can see how much you want her to learn. Great.
19:45
GPT-5 is still working on it. It already wrote 240 lines of code, which honestly is much more than what I would have written in that time. And... Yeah, front-end code is super hard.
19:58|Mark:
You know, you miss a couple things and it just doesn't work.
20:00
Exactly. But the good part is that you don't need to understand any of that right now. So we'll just let it run. Maybe we can check the other tabs. So I can simply press run code. So I'll do that and cross my fingers.
20:19
Whoa. Nice. Voila.
20:22
So we have a nice website. Name is Midnight in Paris.
20:28|Mark:
Oh, I love that. Learn together. So romantic.
20:30
We also see a few tabs, flashcards, quiz, and mouse and cheese, exactly like I asked for. I will play that. So this says Le Chat, which means the cat, sorry.
20:42
Le Chat.
20:43
Well, that's pretty good pronunciation.
20:44
What does that mean?
20:45
The cat. So I can reveal and check if GPT-5 is correct. It is. So if I press next, I don't know if you saw, I think it actually updated the progress bar, which is exactly what I had asked for. Let's check the quiz. Here is the word non, which means no. So if I press on it. Bien joué, which means congrats. It updated the progress bar again. Let's check the mouse and cheese tab. Okay. That seems like a mouse. Here's the cheese. I'm going to try to play it. I can't promise I'm going to be good at it.
21:21
Okay. Seems to be working. Indeed, just when I eat the cheese, it gives me a new French word. It's actually super complicated and I already lost. I'm sorry. But let's just check a few other tabs just to see what is the type of diversity that GPT-5 can give you. So I can run the code here. That's not my favorite, but it seems that I can maybe switch. Look at that.
21:56
That's better. I like this mouse game better.
21:58
Yeah, I don't know. That doesn't look like a mouse, but let's check maybe the third one. Sometimes it's not great. The good thing with GPT-5 is that if you have something that you don't like, you can just ask it to change it, and it will do it for you. Let's check this one. Oh, that's nice. Something else to note is that GPT-5 really likes purple, so you will see a lot of that. It's fine. Purple is my favorite color. Great. You will love GPT-5 then. So as we just saw, in a few minutes, GPT-5 built a few demos for us and for my partner to learn French.
22:36
GPT-5 really opens up a whole new world of vibe coding. And as we saw, there will be some small rough edges. But a good thing is that you can ask GPT-5 to fix them. GPT-5 really brings the power of beautiful and effective code to everyone. I can't wait to see what people will build with it. But until then, back to you, Mark.
22:59|Mark:
Thank you so much, Christina. Thank you so much, Elaine. Thank you so much, Yann. We've come a long way from the days when only five to ten lines of code were working. And now it's amazing that you can produce these kinds of apps on demand. We've made ChatGPT with GPT-5 much smarter, much more powerful, and much faster. But we've also worked on enhancing some of the existing features. Here to talk about some of these features are Ruochen and Christina. Ruochen comes from our multimodal research team, and he's going to talk about a feature, namely voice.
23:35
Thank you, Mark. So we've been steadily improving voice over the past year to make it more useful for everyone. First, it sounds incredibly natural, just like you're talking to a real person. Second, we've added video so that it sees what you see while chatting with you. Third, it also translates between languages consistently and smoothly across turns. But today, we're doing something very special, where we are bringing our best voice experience to everyone. Free users can now chat for hours, while paid subscribers can have nearly unlimited access. And voice is also available in custom GPTs. Plus subscribers can now custom tailor the voice experience exactly to their needs.
24:16
It will follow your instructions closely. Let's try something fun. Excited. So I can ask the voice model to give me a comprehensive answer, a concise one, or even just a single word. Hey, Chat, could you only answer me in one word, please, from now on?
24:38
Absolutely.
24:41
Describe the plot of the book, Pride and Prejudice.
24:46
Relationships.
24:47
That's true. Yeah. Give me a piece of wisdom. Patience. So that's the word of wisdom from our voice model. And I think our model is trying to say thank you for your patience waiting for GPT-5. Now, let's move on to something more practical. Learning is a top use case for our voice model. People love to use it to dive deep into topics, get quizzed before exams, or learn new languages. For me, I've been learning Korean recently, and I've found that speaking it and understanding it is incredibly challenging. But let's see how voice can help. To do that, I will enable this study and learn mode.
25:30
It's a new feature we launched in ChatGPT that guides you step by step towards deeper understanding of a subject. Now we can enter voice. Hey Chat, I'm learning Korean. Could you help me practice it? Let's pretend I'm ordering at a cafe. Now, what should I say in Korean?
25:50
Absolutely, I'd be happy to help you practice. So if you're at a cafe and you want to keep it simple, you could start with something like, 안녕하세요, 아메리카노 한 잔 주세요. Which means, hello, I'd like one Americano please. And of course, you can adjust it based on what you want to order. Let me know if you want to try out more phrases.
26:08
So when it comes to Korean, could you speak that part really slowly? Because I'm a beginner and I want to hear every word clearly.
26:17
Now, let's level things up. Could you speak that part really, really fast? Faster than a Korean would do. Sure, I'll speed it up as much as I can, so it would sound like this. Basically, just zipping right to it. Let me know if you need any more speed adjustments.
26:53
Thank you, chat. So now this voice.
26:56
Thank you, thank you.
27:01
So that's the new voice, simpler, smarter, and more powerful than ever. We can't wait for you to experience it.
27:08|Mark:
Yeah, it sounds so much more natural than the voice we demoed just a year ago in our 4o demo. Yes. Great. So today, we would also like to announce a set of new features that make ChatGPT more personalized, so that it's more like your AI. First, a very simple and fun one. We're now allowing you to customize the colors of your chats, with a couple of options exclusive to our paid subscribers. We're also launching a research preview of personalities. You can now change the personality of ChatGPT such that it's more supportive, or it's more professional and concise, or maybe even a little bit sarcastic.
27:45|Mark:
And this lets you interact with ChatGPT in a way that's consistent with your own communication style. But the way that ChatGPT sounds and the way that it looks is just one part of making ChatGPT yours. One of my favorite features that we've launched over the last year has been memory. And we've made a lot of enhancements to memory in the time since. This allows ChatGPT to learn about you. And here to talk a little bit more about the memory feature is Christina.
28:13
It's been amazing to see your reaction and response to memory and ChatGPT getting to know you more and more over time. And this is our aspiration for ChatGPT, to understand what's meaningful to you so it can help you achieve your goals in life. ChatGPT has already been so helpful for me. I'm training for a marathon right now, and ChatGPT is helping me pull together a personalized running schedule. But ChatGPT still has many limitations. It doesn't understand my actual schedule. Next week, starting with Pro users, followed by Plus, Team, and Enterprise users, this is changing, and we're giving ChatGPT access to Gmail and Google Calendar.
28:50
Let me show you how I've been using it. So I'll just ask something simple, like help me plan my schedule tomorrow. It's been a pretty busy week for us, so I've been using this every day this week to help get my life together. I've already given ChatGPT access to my Gmail and Google Calendar, so it just works, and it's easy here. But if you hadn't, ChatGPT would be asking you to connect right now. Let's see what ChatGPT is doing. OK. Pretty quick. Okay, so ChatGPT has pulled in my schedule tomorrow. And, oh, without even asking, ChatGPT found time for my run.
29:28|Mark:
I don't think I was invited to the launch celebration.
29:32
We'll get you on there. We'll get you on there. ChatGPT has found an email that I didn't respond to two days ago. I will get on that right after this. And it even pulled together a packing list for my red-eye tomorrow night based on what it knows I like to have with me. It's been amazing to see that as GPT-5 is getting more capable, ChatGPT is getting more useful and more personal. We're really excited for you to try this out next week.
29:58|Mark:
Well, thank you so much, Ruochen. Thank you. Great. So we've seen a little bit about features that we've enhanced. Here to talk a little bit about the research that went into ChatGPT and the safety work that made it more deployable, we have Sachi and Seb.
30:13
Thanks, Mark. Hi, my name is Sachi, and I lead the safety training team at OpenAI. So in addition to mitigating hallucinations, we've also spent a significant amount of time mitigating deception. These are instances where the model might misrepresent its actions to the user or lie about task success. This can especially happen if the task is underspecified, impossible, or lacking key tools. And we've found that GPT-5 is significantly less deceptive than o3 and o4-mini. We've also completely overhauled how we do safety training. With our old models, the model would look at the user prompt and then decide to either outright refuse or fully comply.
30:54
And this works well in most settings, but you might have a cleverly worded prompt that would sneak through, or you might have a sensitive but legitimate question that would end up with an outright refusal. So as an example, let's take a look at this prompt. So this prompt is about a user who's asking for technical details on how to light Pyrogen, which is a material commonly used in fireworks. And this prompt is pretty dual use. This user might just be trying to set up their July 4th display, or they could be trying to cause harm with this kind of information.
31:27
And so for this kind of prompt, o3 over-rotates on intent. As you can see, this particular prompt is stated in a way that's relatively neutral and has a lot of technical details. So we can see that o3 fully complies with this prompt. However, if we take that exact same question and we frame it in a more explicit way, so it's clear what the user is trying to do, o3 will outright refuse, even though we're asking for the exact same information. For GPT-5, we've changed this approach entirely, and we're introducing something that we're calling safe completions.
32:00
The point of safe completions is, rather than judging the user's prompt, it instead tries to maximize helpfulness within safety constraints. That might mean partially answering a question or just answering at a high level. If we have to refuse, we'll tell you why we refused, as well as provide helpful alternatives that can steer the conversation in a safer direction. So let's look at that same technical prompt that o3 complied with before. GPT-5 instead explains to the user why we can't directly help with lighting Pyrogen. It then guides the user towards safety guidelines and what parts of the manufacturer's manual the user should really be checking if they're trying to do this safely.
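The policy shift Sachi describes can be caricatured in a few lines: the old approach makes one binary decision from the prompt, while safe completions grades how much help can safely be given in the answer. This is purely an illustrative sketch of the policy shape; the risk scores and thresholds are invented and bear no relation to how the models are actually trained:

```python
def old_policy(prompt_risk):
    """Pre-GPT-5 caricature: judge the prompt, then refuse or fully comply."""
    return "refuse" if prompt_risk > 0.5 else "full_answer"

def safe_completion(prompt_risk):
    """Safe-completions caricature: grade the answer, not the prompt,
    and give as much help as the safety constraints allow."""
    if prompt_risk < 0.3:
        return "full_answer"
    if prompt_risk < 0.7:
        # Dual-use territory: answer at a high level, skip operational detail.
        return "high_level_answer"
    # Refuse, but explain why and point to safer alternatives.
    return "refusal_with_explanation_and_alternatives"

# A neutrally worded dual-use question lands in the middle of the range:
print(old_policy(0.5), "->", safe_completion(0.5))
```

The point of the sketch is the middle branch: a dual-use question no longer has to fall to one of the two extremes, which is exactly the fireworks example above.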
32:43
Overall, GPT-5 allows for better handling of tricky dual-use scenarios, and users will experience fewer responses like, I'm sorry, I can't assist with that. And it creates a more robust safety system. This is one big step towards a more safe, reliable, and helpful AI. Sebastien?
33:03
Thank you, Sachi. With GPT-5, we are experimenting with a set of new training techniques that maximally leverage our previous generation of models. Today, frontier models do not just consume data, they help create it. We used OpenAI's o3 to craft a high-quality synthetic curriculum to teach GPT-5 complex topics in a way that the raw web simply never could. Recently in the industry, synthetic data has been talked about a lot. It's often viewed as a cheap way to just get more data. However, our breakthrough was not just to create more data, but rather to create the right kind of data, shaped in a way to teach rather than just to fill space.
33:46
This interaction between generations of models foreshadows a recursive self-improvement loop, where the previous generation of models increasingly helps to improve the data and generate the training for the next generation of models. Here at OpenAI, we've cracked pre-training, then reasoning, and now we're seeing their interaction significantly deepen. In the future, AI systems will move far beyond the current pre-training and post-training pipelines that we have been used to, and we're seeing the first steps toward this right now, right here. We could not be more excited to see what scaling up this new set of techniques will yield in the near future.
34:26|Mark:
Thank you so much, and really impressive work to both of you. Thank you. There's one last feature that we'd love to highlight, which is in health. Here to share this feature, we have Sam.
34:38|Sam Altman:
Thanks, Mark. One of the top use cases of ChatGPT is health. People use it a lot. You've all seen examples of people getting day-to-day care advice, or sometimes even a lifesaving diagnosis. GPT-5 is the best model ever for health, and it empowers you to be more in control of your healthcare journey. We really prioritized improving this for GPT-5, and it scores higher than any previous model on HealthBench, an evaluation that we created with 250 physicians on real-world tasks. To talk about this, I'd like to invite my colleague Felipe and his wife Carolina to share their healthcare journey.
35:16|Sam Altman:
Thank you so much for joining us.
35:17
Thank you for having us.
35:21|Sam Altman:
So to start off with, could you tell us about the journey, the healthcare journey that you've been on?
35:26
Yeah. So last October, our lives were turned completely upside down when I was diagnosed with three different cancers, including an aggressive form of breast cancer at the age of 39 all within one week. And there's just absolutely nothing that prepares you to receive news like this. I found out about the first diagnosis when I got an email notification that my biopsy results were ready. I decided to open it. And when I opened it, I saw the only two words that I could understand from the report, which was invasive carcinoma. And I knew that wasn't good.
36:04
But everything else was just a blur of medical jargon. So I completely panicked. And in that moment, did the first thing that I thought of, which was to take a screenshot of the report and put it into ChatGPT to see if it could just help me understand what this meant. And within seconds, it translated this complex report into plain language that I could understand. And in this moment of overwhelm and panic, I had a little bit of clarity about what was going on. And that moment was really important because by the time I got a hold of my doctor and we got on the phone, which was three hours after I had seen the report, I had a baseline understanding of what I was facing and we were able to jump into a conversation about what to do next.
36:53|Sam Altman:
And how have you been using ChatGPT throughout?
36:56
I've used it in so many different aspects of my journey, but one of the ways that I found it most powerful is in helping me make critical decisions and in helping me advocate for myself. So to share an example, when I was facing a decision about whether or not to do radiation as part of my treatment, the doctors themselves didn't agree. My case was nuanced, and there wasn't a medical consensus on the right path. And so the experts turned the decision back to me as the patient. And for me, bearing the weight of this decision that could have lifelong impact felt really heavy, and I didn't feel equipped to make the call.
37:38
So I turned to ChatGPT to gain knowledge and understand the nuances of my case. And again, within minutes, it gave me a breakdown that not only matched what the doctors had already shared with us, but was much more thorough than anything that could fit into a 30-minute consultation. And it went further. It helped me weigh the pros and cons. It helped me understand the risks and the benefits. And ultimately, it helped me make a decision that I felt was informed, that I felt I could stand behind when the stakes were so high for me and my family.
38:14
I mean, for me, what was really inspirational was watching her regain her sense of agency by using ChatGPT. In a moment like this, it'd be so easy to feel helpless. And there's such a big knowledge gap between what the doctors know and what we know. However, no one cares more about Carolina's health than she does. And so what I loved was seeing her really empower herself and gain knowledge and become an active participant in her own care journey.
38:41
And I think that's a really important point to emphasize. I think that the promise of AI in healthcare isn't in just breakthrough discoveries or better diagnostics. I think it's in creating smarter and more empowered patients that can fully participate and advocate for themselves in their care.
39:02|Sam Altman:
Speaking of that, you've been testing GPT-5. What do you think?
39:05
I've been so mind-blown about GPT-5 and its capabilities. One of the first things that jumps out at me is just how fast it is. Almost a little alarmingly at first.
39:16|Sam Altman:
I felt that too. It's like, are you sure you thought about that enough?
39:18
Did you think long enough? But it is very thorough. And more importantly, it feels more like a thought partner that connects the dots. So rather than just translating information or giving you an answer, it helps you actually navigate the problem.
39:36
Yeah. A great example is we actually went back and took our initial biopsy prompts and put them into GPT-5. And GPT-4o had done a great job. It had translated, explained what these words meant, and helped in a way that we could understand. But GPT-5 seemed to understand more of the context and the question behind the question. Like, why would we be asking about biopsy results? And so it said, well, here's actually what's not on here yet. Here's what results are still pending that you're going to have to ask about. Here are questions you might want to go ask your doctor and things to think about when you start talking to them.
40:04
And so it really started to pull together a complete personalized picture. That's what really inspires us. I mean, you can see all the amazing improvements in the benchmarks. But what is so helpful is that this tool is available today. And the reason Carolina and I are here and the reason we feel so passionate about sharing our story is for that individual that's going to get a diagnosis like this today. Those families going through a cancer diagnosis, similar medical diagnosis, are going to face some of the most challenging decisions of their lives. And what really inspires me is that they're going to have access to better tools and support than we had even just eight months ago.
40:40|Sam Altman:
We're incredibly excited for that, too. Thank you so much for coming to share your story. We're pleased that ChatGPT has been able to be helpful to you, and we hope that the new version will really be able to help a lot of people. We wish you the very best.
40:51
Thank you.
40:52|Sam Altman:
Thank you. And I'd like to hand it over to our president, Greg Brockman.
41:09
Software engineering is already fundamentally changing, and GPT-5 will turbocharge that revolution. We released our first coding-optimized model back in 2021 and demonstrated, in a livestream much like this one, what we would call vibe coding today for the very first time. You'd talk to the model and ask it for a little application, like a little game, or a little feature in a game, and it would actually do it. I remember seeing the model being capable of doing this, and it was so mind-blowing. You just realize, we have to see where this goes. This is the promise of what computers can be, that you can talk to them and they actually do what you want.
41:45
They can really amplify what you're able to accomplish and what you're able to deliver, not just for your own benefit, but really for the world. This year, we've released great coding models like GPT-4.1 and o3, but GPT-5 sets a whole new standard. It is the best model at agentic coding tasks. You can ask it to go and accomplish something very complicated, and it'll go off and work on it. It'll call many tools. It'll work for many minutes at a time, sometimes even longer, to accomplish your goal, your instruction, your task, whatever it is that you're trying to build.
42:19
It's incredible at front end. It makes very beautiful visualizations and interactive games, and you've seen some of this in the livestream so far, and you'll see more coming up. It's just really amazing to see whatever you imagine come to life. It's extremely good at instruction following: when you have something vaguely specified, it infers your intent, and when you have something specified in great detail, it actually follows it. And it's very fast at accomplishing these tasks. Again, it thinks for the right amount of time to accomplish whatever it is you have in front of you.
42:59
So we're making it available not just for developers to use to write their own code, but to build novel applications. We're putting it into the API, and to talk about that is Michelle.
43:15
Thanks, Greg. Hi, I'm Michelle, and I lead a research team on post-training focused on improving our models for power users, which includes use cases like instruction following and coding. Today, I'm so excited to tell you that we're shipping three state-of-the-art reasoning models in the API: GPT-5, GPT-5 mini, and GPT-5 nano. All three slot right in on the cost-latency curve, so you can pick the right one for your application. We're also, for the first time, releasing a new option for the reasoning-effort parameter called minimal, so that you can use these reasoning models, with minimal reasoning, in the very fastest and most latency-sensitive applications.
43:58
So now you don't actually have to choose between a bunch of models; you can use GPT-5 for all of your use cases and just dial in the reasoning effort. We also have a few new features coming to the API. The first is called custom tools. In the past, all of our function calling had the model wrap its outputs in JSON, and this works super well when the model needs to output a few parameters. But sometimes developers push our models to their limits with extremely long arguments for tool calls, and it can be challenging for the models to correctly escape control characters across 100 lines of code in JSON.
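The escaping problem Michelle describes is easy to demonstrate: wrapping multi-line code in a JSON string forces every newline, quote, and backslash to be escaped, which is exactly where long tool-call arguments go wrong. A minimal Python illustration (not OpenAI's implementation):

```python
import json

# A short code snippet a model might pass as a tool-call argument.
snippet = 'print("hello\\world")\nprint(\'done\')\n'

wrapped = json.dumps({"code": snippet})
# Inside the JSON string, the payload is littered with escape
# sequences the model must emit perfectly: \n for newlines,
# \" for quotes, \\ for backslashes. Plain-text custom tools
# sidestep all of this.
print(wrapped)
```

Over a few parameters this is harmless; over 100 lines of code, a single mis-escaped character invalidates the whole tool call, which is the motivation for free-form plain-text custom tools.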
44:36
And that's why custom tools are just free-form plain text. What's super cool is that we're releasing an extension to structured outputs where you can supply a regular expression or even a context-free grammar and constrain the model's outputs to it. This will be super useful if you want to supply a custom DSL, like your own SQL fork, and specify that the model always follow that format. We're also shipping tool-call preambles. This is the model's ability to output an explanation of what it's about to do before it calls tools. This is not super new, but o3 didn't have this capability.
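The constrained decoding itself happens on the server, but the contract is easy to picture: the developer supplies a pattern, and every model output is guaranteed to match it. A rough local analogy (my own sketch with a made-up mini-DSL, not the API):

```python
import re

# Hypothetical DSL: a tiny SQL-like filter the model must emit.
PATTERN = re.compile(r"SELECT [a-z_]+ FROM [a-z_]+( WHERE [a-z_]+ = \d+)?")

def accept(output: str) -> bool:
    """Accept an output only if it fully matches the supplied pattern,
    mirroring the guarantee that regex/CFG-constrained structured
    outputs provide."""
    return PATTERN.fullmatch(output) is not None
```

With server-side constrained outputs, the model literally cannot produce a string that `accept` would reject, rather than being validated after the fact.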
45:14
And in GPT-5, it's supercharged with extreme steerability. The model is able to follow instructions about these preambles very effectively. You can ask the model to give a preamble before every tool call, or only when something notable is going to happen, or not at all. Next, we're shipping a verbosity parameter. We've actually wanted this in the API for a long time, and now you can set verbosity to low, medium, or high to control how terse or expansive the model is with its outputs. GPT-5 is a state-of-the-art coding model. On SWE-bench, a measure of Python coding ability, GPT-5 sets a new high of 74.9%, versus 69.1% for o3.
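Putting the new knobs together, a request might look roughly like this. The parameter names follow the talk (reasoning effort and verbosity), but the exact payload shape and field names are my assumptions; check the API reference before relying on them.

```python
# Sketch of a GPT-5 API request payload using the parameters
# described in the talk. Field names are assumptions, not the
# documented SDK surface.
payload = {
    "model": "gpt-5-mini",               # gpt-5 / gpt-5-mini / gpt-5-nano
    "reasoning": {"effort": "minimal"},  # new 'minimal' option for speed
    "text": {"verbosity": "low"},        # low | medium | high
    "input": "Summarize this changelog in two bullets.",
}
```

The point of the design is that model choice and reasoning effort are independent dials: the same GPT-5 family serves both a latency-sensitive autocomplete path and a long-running agentic task.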
46:00
On Aider polyglot, a benchmark that covers all sorts of programming languages and not just Python, GPT-5 scores 88%, a stark improvement over o3. You've also seen that it's incredible at front-end web development. So we asked human trainers to look at outputs from GPT-5 and o3 and pick which they prefer, and they preferred GPT-5 70% of the time for its improved aesthetic abilities, but also better capabilities overall. But GPT-5 is not just for coding. It's incredible at agentic tool calling. It's the leading state-of-the-art model for tool calling, and we see this on the new τ²-bench.
46:41
This benchmark, released just two months ago, tests the model's ability to call tools and work in concert with the user to solve a challenging problem, in this case in the telecom domain: getting service working again for a user. Just two months ago, no model in the field scored more than 49%, and today, GPT-5 scores 97%. GPT-5 is also state-of-the-art on general-purpose instruction following. It scores 99% on COLLIE, which more or less saturates this benchmark for us. It also scores 70% on Scale's MultiChallenge benchmark, up 10 points from o3.
47:25
And this is a measure of multi-turn instruction following. Finally, the instruction-following eval I actually prefer the most is one we've built in-house. It's based on real API use cases, and for that reason, it's a really good measure of how GPT-5 will perform in your application. On the hard subset, GPT-5 scores 64%, up from 47% for o3, a pretty meaningful improvement. So we think it will perform quite well in your applications. We're also bringing GPT-5 a longer context window in the API: it now has 400K tokens of total context, up from 200K for o3.
48:07
But it's not enough to just release a longer context window; we wanted to make it more effective and usable. GPT-5 is state-of-the-art on the 128K-to-256K range of OpenAI MRCR, a benchmark we open-sourced two months ago for long-context retrieval. It's also state-of-the-art on OpenAI's Graphwalks BFS benchmark, which measures the model's ability to reason over long-context inputs. It's a great merger of this model's reasoning capabilities and longer context. We're also open-sourcing a new long-context eval called BrowseComp Long Context to measure the model's ability to answer challenging questions over long contexts.
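Graphwalks-style tasks embed a large edge list in the prompt and ask the model for something like the set of nodes within k BFS hops of a start node; the computation being probed is conceptually just the following. This is my sketch of the underlying task, not the benchmark's harness (and I'm assuming undirected edges for illustration):

```python
from collections import deque

def bfs_within(edges: list[tuple[str, str]], start: str, k: int) -> set[str]:
    """Nodes reachable from `start` in at most k hops over an
    undirected edge list -- the kind of long-context reasoning a
    Graphwalks-style benchmark asks a model to perform in its head."""
    adj: dict[str, set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # don't expand past k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen
```

What makes this hard for a model is not the algorithm but the context: the edge list can be hundreds of thousands of tokens long, so the benchmark measures faithful retrieval and reasoning across the whole window.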
48:51
We're excited to spur on more work in this field. We think GPT-5 is the best model for developers. It was trained with a focus on real world utility and less so on benchmarks, but we happened to pick up a few of those along the way. We focused a lot on the intersection of engineering and research, and we think you'll really love working with this model.
49:25
Thank you, Michelle. As Michelle was saying, the benchmarks, they're exciting numbers, but we're starting to saturate them. When you're moving between 98% and 99% in some benchmark, it means you need something else to really capture how great the model is. And one thing we've done very differently with this model is really focus on not just these numbers, but really on real world application and being really useful to you in your daily workflow. So hearing about it is much less exciting than seeing it. So to show you this model in action, I'd like to welcome Addy and Brian to the stage.
50:00
Thanks, Greg. I'm Brian, a solutions architect on the startups team. I'm Addy, a researcher on the post-training team. To recreate the ideal pair programmer, you need a model that understands best software engineering practices, but has a personality that just feels right to work with. For GPT-5, we worked really hard to make the model pair perfectly with you by default, out of the box. Let me pull up a demo of GPT-5 inside of Cursor to show you this behavior that we taught it. So last month I was on a different live stream and towards the end I ran into a bug that I covered up.
50:43
And afterwards, I tried to have o3 fix it for me, and it couldn't. So while we were testing GPT-5 before this, I had it see if it could fix that bug for me. And to taunt the demo gods, I'm going to see if it can do it on stage.
51:00
All right, let's hope for better luck than with o3.
51:02
This is less about that fix and more about the behavior of the model during this process. So right away, you're going to see that it's going to tell you its plan up front. It's going to tell you how it's going to look for the bug, maybe how it's going to fix it. This kind of communication builds trust during a coding session and helps you redirect if you need to, but you don't need to.
51:23
I like how it's giving you updates. Like it said, it's going to search, and now it's continuing.
51:28
Yeah, it searches faster than me. It's using the same best practices that I would while I was hunting this down, but it is much more powerful than I am as a developer.
51:37
Now, did you try to fix the bug yourself?
51:39
I did. I couldn't do it. I mean, I was busy, so. Okay, so continuing on, it's starting to figure out where it's going. So while this is going, let me tell you a little bit about how we trained GPT-5 to behave this way. We started by talking to users and customers about how our models perform in the most popular coding tools, like Cursor. We identified frustrations and rough edges, and we boiled it all down into four personality traits: autonomy, collaboration and communication, context management, and testing. We turned those into a rubric that we used to shape the model's behavior, and then we tuned it until it felt like a collaborative teammate while we were using it.
52:31
It's been really amazing to see the team really doing the grind of going and seeing how this model behaves in practice, figuring out what people really want, and putting that back into model training. That's something that I think has been like a real focus for this model.
52:44
It's been pretty great. So while this is fixing, the other thing that we did during testing, which was really surprising: we were pressed for time, and we had it refactor one of our test harnesses to run in parallel on Docker. We set it off, came back like 45 minutes later, and it had just finished. We tested it out and it ran the first time. It was pretty surprising. That's incredible.
53:09
That is magical.
53:11
OK, so it made the edits. It looks like, yeah, it found the right problem. And right now, it's actually, OK, it's running lints. But these lints are actually not related to this bug. So it's going to ignore them. And then it's going to run a build. It'll run tests if there are any. It's going to make sure that this code is shippable before it's done.
53:32
It's actually really smart that it finds lints and realizes that these aren't relevant to the specific bug we're fixing. It's not making unnecessary edits.
53:41
Totally. So this is just one example, but it really shows the power of the autonomy and the collaborative communication, and how it stays reliable on difficult coding tasks without getting stuck in death loops. And the best part: GPT-5 is totally tunable. You can steer it with system prompts or Cursor rules. You can change its verbosity levels or reasoning levels to match your tasks. And if you get stuck, ask it: GPT-5 is actually really good at modifying its own prompts via meta-prompting. So after using this for the past few weeks, it really feels like we've achieved state-of-the-art zero-shot performance and reliability across the most complex coding tasks.
54:28
For me, it's the first time I trust a model to do my most important work. This is beyond vibe coding. It's an incredibly powerful tool and I'm really excited for people to try it.
54:43
Thanks, Brian. It's super exciting to see how far GPT-5 has come when it comes to coding personality and steerability. I'm really excited to show how great GPT-5 is at front-end coding, where design and aesthetics really matter. So I've got two demos for you today, one for work and one for fun. Let's start with the work example. So imagine you're the CFO of a startup. I have some data that I'd like to visualize about the company. And I'm going to ask the model to make me a dashboard. So you'll see here that I'm being specific about the audience.
55:19
So the target audience is the CFO. It says, create a finance dashboard for my startup. And I've asked it to be beautiful, tastefully designed with some interactivity, and to have a clear hierarchy for easy focus on what matters. I've also specified what frameworks it should use, and you can see that it's actually started. It's following my instructions and using create-next-app to make a Next.js project. Totally from scratch. Yeah, exactly.
55:47
Now, how long do you think this kind of task would take you?
55:49
Yeah, easily at least a couple of days. I'm not a front-end expert. Just to understand the latest frameworks and piece everything together would easily take me a few days. We'll see how long it takes with the model. Yeah. And it's really cool to see that the model has thought for a bit and is explaining how it's going to structure the project. So it's talking about how it's going to scaffold a new Next.js app. It's going to use Tailwind CSS. It's running a couple of commands to install dependencies, which is cool. And now it's proceeding to implement the rest of the project.
56:25
So while this runs, I'm going to talk a little bit about how we trained GPT-5 to be a great front-end coding model. We tried to follow the principle of giving it good aesthetics by default, but also making it steerable. So if I give the model a concise prompt, it should be able to infer my intent and make something that looks great by default. On the other hand, if I'm specific about a layout or frameworks that I want the model to use, it should follow my instructions precisely. And this makes it the best of both worlds for developers.
56:56
We also train GPT-5 to be much more agentic than previous models. So if you give it a task like this, it will run long chains of reasoning and tool calls and just go to work to build code that is both ambitious and coherent.
57:11
I like how you said ambitious because it means it goes above and beyond without going off track or off what you specified.
57:17
Yeah, exactly. So what we want is that the model should adhere to my prompt, but also be ambitious and go above and beyond when it thinks it can. And so checking in here, it looks like the model is making progress. It's creating a README file. Yeah, and I think it's thinking about how to make the code modular. It's created a bar chart component. Looks like it's continuing here.
57:52
I love that it doesn't just write the code, it also really thinks about proper abstractions and documentation and really the whole life cycle of what it is to write software.
58:00
Yeah, exactly. It's not just writing the code like in SWE-bench, but it's also communicating about the code and explaining what it's doing. Let's check in to see what's going on. So while this runs: GPT-5 understands details much better than previous models. When we trained the model, we taught it to understand details like typography, color, and spacing in a way that just eclipses any previous model we've shipped. I remember with old models, you would have to write really specific prompts to get them to do what you want, but GPT-5 just gives you great results by default.
58:37
During testing, we were looking at A/B comparisons of different versions of the model to see if it was doing better at UI. And at some point, we stopped being able to tell and actually had to pull in designers to tell us which was better.
58:49
Yeah, it was really fascinating to see the model's aesthetic preferences evolve during training. And we woke up one day and it was just making these great UIs.
58:58
How do the model's aesthetic preferences compare to your own?
59:02
Yeah, I think in general, I feel like the model has better aesthetics than me. Like, usually I defer to its judgment. And I find that, like, really helpful when I'm trying to make an app. Like, I'm not exactly sure how I want it to look, but the model's defaults are just great. Yeah, and checking in here, so you can see that the model has actually structured the code into these different components. So it's made a sample data TypeScript file, KPI card, component, revenue chart. And like I said, it's super modular, and it's thinking about how to not just write code, but write high quality code that can actually be merged.
59:47
It feels like it's close. Yeah, I think it's pretty close. It did say ambitious. Yeah. Okay, cool. So this is awesome. So you can see here that it's actually building the project and streaming errors back to itself. For me, this was just a profound moment to see that the model could write code, but also run builds, stream the errors back, and iterate on the code. So it's able to improve its own code in this sort of self-improvement loop, which is fascinating.
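The loop described here (write code, run the build, stream the errors back, revise, repeat) can be sketched generically. The `compile_cmd` and `model_revise` callables below are hypothetical stand-ins for a real build step and a real model call:

```python
def build_fix_loop(source: str, compile_cmd, model_revise,
                   max_iters: int = 3) -> str:
    """Iteratively build the code and feed compiler errors back to the
    model until the build succeeds, mirroring the agentic
    write-build-fix loop shown in the demo.

    compile_cmd(source) -> (ok: bool, errors: str)   # hypothetical
    model_revise(source, errors) -> new source        # hypothetical
    """
    for _ in range(max_iters):
        ok, errors = compile_cmd(source)
        if ok:
            return source          # build is clean; stop iterating
        source = model_revise(source, errors)
    return source                  # best effort after max_iters
```

In the demo, the "compiler" is the real Next.js build and the "reviser" is GPT-5 reading the streamed error output; the structure of the loop is the same.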
60:16
And it's definitely a good taste of what the future holds as well, right? When you really think about where these models can go and how much they can accelerate developers and kind of all aspects of what we all collectively do.
60:27
Yeah, exactly.
60:32
Nice. It actually just fixed a bug that it found in that previous build. OK, cool.
60:35
Nice. Yeah, looks like it's done. Let's check it out. Since I don't really know front end, I'm going to follow the instructions. So it's saying cd into the directory and then run npm run dev. So let me do that. And it looks like it's being served on port 3001. So let me just open that port. Wow, it's alive. Nice. Yeah, so you can see here. Let's check it out. So the model has made me a dashboard. It's telling me my ARR, cash. Looks like this company is doing pretty well.
61:13
You can see that revenue is growing. And the model's added like some interactivity here. So if I hover over a graph, it actually tells me the exact value for a particular day. It would take me like five hours to do that in D3. Yeah, imagine like manually doing this in D3.
61:28
Now, just because it's so easy to take this for granted, could you remind the audience what the actual prompt was? Like how much creativity and sort of understanding your intent was required to accomplish this?
61:38
Yeah, it's crazy that this, you know, this prompt is so concise and it's able to just give me something that looks beautiful in just five minutes. That's amazing. Yeah, it's also implemented another graph here showing our customers. It's also implemented a date picker so I can filter by different dates and visualize data accordingly. Yeah, it's even segmented it by customer segment, which is cool. So this is just one example that highlights the power of GPT-5.
62:14
There will no longer be an excuse for ugly internal applications.
62:16
Exactly. Let's go to the fun demo.
62:21
I mean, this was pretty fun, but even more.
62:24
Even more, yeah. So, I have a younger cousin and I want to make a game for her. So I want to make a 3D game that incorporates a castle, and you can see my prompt. I'll just kick this off. It's always the non-AI parts. Yeah, exactly. Yeah, OK. So you can see my prompt: create a beautiful castle. I've included some details, like we want people patrolling the walls, some movement, horses. And I want a mini-game where I can pop balloons by clicking on them. And this should make a sound effect. So let me run this in Cursor.
63:08
I'll just paste it in, and I'm going to show an example that I've already generated, just to save some time. So here is the beautiful castle that the model made. It's just wild how, from a concise prompt, the model has this great sense of aesthetics: it's made this floating rock and a 3D castle. And if you zoom in, you can see tons of detail: these guards that are walking around, cannons firing. Do you want to fire the cannons? You can click this button.
63:39
Oh, yes. Of course. Who wouldn't want to?
63:42
There we go. So, you can fire the cannons. You can even chat with the characters. So, we'll say hi to Captain Rowan. We have names. Names. Say hello to the merchant. Merchants selling some stuff. What's your favorite song? A ballad of banners and daunts. Nice. Give me some wisdom. Curiosity is volatile. Yeah, that makes sense. Minigame. Yeah. Do you guys want to try the minigame? Absolutely.
64:15
Let's play the minigame.
64:16
So if you hit this button, you want to try it, Greg? All right. So you can fire at these balloons.
64:23
Oh, wow. All right. Oh, no, I'm not good at it. Hold on. Maybe I can ask GPT-5 for some help with it. You hit one, yes. I got one. Oh, there we go. We got a sound effect.
64:35
These are historically accurate balloons. Yes.
64:39
Did I get a second one yet? This game is harder than it looks. Hold on, we got a balloon coming. There we go. All right. Nice. I think I should quit while I'm ahead.
64:53
Cool. So working with GPT-5 has been really fun and profound for me because for me, this is the first model I've worked with that actually has a sense of creativity. And we're really excited to see how GPT-5 unlocks your creativity.
65:08
All right. Thank you both. This is absolutely amazing. Now, we believe that GPT-5 is the best coding model in the world, but don't just hear it from us. To talk more about this model and how to make it really useful for developers, I'd like to welcome Michael Truell, co-founder and CEO of Cursor.
65:31
Thank you. Good to be here. Great to have you.
65:34
Yes. So, what was your very first experience with GPT-5?
65:40
So when we got access to GPT-5, we just set about using it on our actual work. To start with, as a test, we asked it to tell us something non-obvious about our codebase. And within a couple of minutes, it burrowed into the codebase. It identified a particular system that we use for remote code execution, and it identified a non-obvious architecture decision we had made. And then it also understood why we made that architecture decision: it was to harden our security. And those were architecture decisions and trade-offs that took humans weeks to think through.
66:11
So it was kind of amazing to see its code-based understanding abilities from the get-go.
66:15
That's really great. Not just the code writing, but actually the code reading and understanding. Yes. Turns out there's so much more to software than just the emitting of the code. Yes, yes.
66:24
No, the understanding is an important prerequisite. And what has most stood out to you about GPT-5? It's incredibly smart. It's a very smart model. And even though it is smart, it does not compromise on its ease of use for real pair programming. And that means it's incredibly fast. That also means that it's quite interactive. And so it's good about talking about what it's about to do, breaking problems down into sub-problems that a human can then see, and leaving a reasoning trace that you can then intervene on and react to. It's also great not just that you give it one initial query and then it goes and does that, but working with you over a long session where you're asking it to backtrack on something that it's gone down or asking it to make additional changes to the code base.
67:10
Should we show it in action? Let's do it. Yes. So we're going to go and try to solve a bug. This is the OpenAI Python SDK. There are a bunch of issues in the OpenAI Python SDK. There are also a lot of closed issues. OK, good. And it seems like there's a problem with uploading PDFs through the SDK.
67:30
This has been open for three weeks. So it's not a trivial problem.
67:34
And so let's see if we can go tackle this issue. So we're going to take the issue, paste it into the editor, paste it into Cursor, and GPT-5 is going to set off and try to solve the problem. And this is actually an example of the robustness of the model in the API, where, to solve this problem in Cursor, it's working with a set of custom prompts that it hasn't seen before, and a set of custom tools that it hasn't seen before, to do things like pull down text from the web and search throughout the codebase. And it's incredibly robust and adept at using those tools, and they boost the overall results.
68:08
Yeah, I love seeing just the full explanation of all the things that it's running and doing. And I guess, yeah, how does this seem to compare to how you would solve this problem?
68:15
Well, it's very fast. You can see it's made a high-level plan, searched throughout the code base, it started to read some files, and continued searching, and now it's kind of thinking through what it'd like to do next. And now it's started to actually solve the issue, and started to think through some code changes.
68:36
Now, any advice for people on how to get the most out of GPT-5 in Cursor?
68:42
I would suggest using it for your real work. So GPT-5 is a step forward towards a real pair programmer. And so I would start using it as a helper on, you know, as a daily driver model for you. And so if you haven't used AI to code much before, you know, I would take some of your more scoped down problems and try handing them off to the bot and working with it synchronously.
69:03
Yeah, I think the fact that GPT-5 is so great for the real world, like big code bases, like being your daily driver, not just this demo of a cool one-off application. As cool as that is, right? The real value comes from really operating in a larger code base and sort of these long-lived applications.
69:19
And its code base understanding is very impressive. Also, its ability to be steered is impressive. And so, yeah, if you specify a long, complicated task with lots of subtleties in the initial instructions, it's very good at picking up on those subtleties. It's also very good, if it's gone down a wrong path and then actually goes and executes the code or hears back from you that it was incorrect, at backtracking, too. Now, what can't GPT-5 do? Well, we're really excited about computer-using capabilities, about those getting better. It would be great if, for instance, that worked for the dashboard we just showed.
69:55
If it could run the code, see the output, actually kind of QA every little bit itself, and then react to it. And yeah, so looking forward to computer using capabilities. How would you like GPT-5 to be better?
70:07
Oh, well, I think that is a great one. It's just expanding the dimensions. I think it's in all directions. There's so much of doing DevOps and other work that is external to software code writing as we think of it today. But also, you look at these demos. We run them for five minutes, 10 minutes, a couple hours. But I think extending that lifecycle to really be able to go for days and weeks and eventually even months, I think that is ultimately where we expect things to go.
70:37
So we can see that it has dug into the code base and discovered that there's an issue with the MIME type being sent up for PDFs and the plumbing through the SDK. It has identified that and started making some code changes. It has created some new methods and gone and edited some existing code. And this looks roughly correct. Looks pretty good. And we'd love to merge the PR too.
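The actual SDK fix isn't shown in detail on stage; as a hedged illustration of the general class of bug described (a wrong or missing MIME type sent for PDF uploads), a client can infer the content type from the filename before building the upload request. `content_type_for` is a hypothetical helper for illustration, not the real SDK code.

```python
# Hedged sketch: guessing a file's MIME type from its name, with a
# safe binary fallback, before attaching it to an upload request.
import mimetypes

def content_type_for(filename: str) -> str:
    """Guess a file's MIME type, defaulting to a generic binary type."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

print(content_type_for("report.pdf"))  # application/pdf
```

Sending `application/pdf` here, rather than a hard-coded or empty content type, is the kind of plumbing fix the model appears to be making.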
71:02
I would love to do that as well. Let's do that after the show. Yes, that sounds great. All right, cool. Well, thank you so much. We're so excited to have GPT-5 in Cursor. And starting today... Excited to partner with you guys.
71:14
And so yeah, starting today, GPT-5 is default for new users in Cursor and we're releasing it to all Cursor users, free to try for the next few days so people get a sense of the model. And it is the smartest coding model we've ever tried. Awesome. Thank you so much, Michael. Thanks.
71:38
It's great for the enterprise. We think of it like a subject matter expert that is in your pocket, that is an expert across every domain: legal, finance, whatever application you have in mind. To talk about how GPT-5 can be applied to the enterprise, I'd like to welcome Olivier to the stage. Thank you. Thank you, Greg.
72:02
Hi, everyone. I'm Olivier. I lead the platform at OpenAI. At this point, I think you got the message: we care a ton about developers and coding. But that's not all. Enabling businesses and governments is critical to OpenAI's mission. Put shortly, we want to enable the key industries to transform themselves, such as healthcare, education, energy, or finance. Since we launched ChatGPT and the API, five million businesses have been using our technology. I'm still mind-blown: five million businesses. And those businesses are not just playing, they are not just experimenting, they are pushing new products into production in the real world.
72:44
And I believe GPT-5 is going to be a step function in that regard. As Sam mentioned earlier, the possibility of having a subject matter expert in your pocket is going to enable every employee to do more. But let me give you a few examples. First, I want to talk about life sciences. Amgen is a company in the US that designs new drugs, new medicines, to fight some of the toughest human diseases. Amgen was one of the first testers of GPT-5, and they used it in the context of drug design. And what Amgen scientists found is that GPT-5 is particularly good at deep reasoning with complex data.
73:21
Think analyzing scientific literature or clinical data. Next, I want to talk about finance. BBVA is a multinational bank headquartered in Madrid, in Spain. BBVA has been using GPT-5 for financial analysis. And the takeaway was pretty clear: GPT-5 beats every single other model out there in terms of accuracy and speed. What used to take a financial analyst three weeks, GPT-5 can do in a couple of hours. Next, I want to talk about healthcare. Oscar is an insurance company based in New York. And they've been using GPT-5, and what they found is that GPT-5 is the single best model for clinical reasoning.
74:03
Think mapping complex medical policy to patient conditions. It's not all about businesses. It's also about governments. We are super excited by the announcement that we made yesterday that 2 million U.S. federal employees will be able to use GPT-5 in ChatGPT. And I cannot wait to see how that enables them to deliver better, faster services to the American people. And frankly, that's all very cool, but I think that's just the tip of the iceberg. If history is a teacher, and we've seen it with GPT-4, we are going to see many, many new use cases emerge over the coming weeks and months that all of us could not even imagine.
74:41
And so I cannot wait for us to invent that future together. Let's talk quickly about pricing and availability. GPT-5 is going to be available in the API starting today. Three models: GPT-5, GPT-5 Mini, GPT-5 Nano. GPT-5 is going to be priced at $1.25 per million input tokens and $10 per million output tokens. Mini and Nano are even faster and more affordable. Nano, don't sleep on it: it's 25 times more affordable than GPT-5. It's pretty cool. I cannot wait to see what you will build. And next, our chief scientist, Jakub, is going to close us out.
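The quoted prices make per-request costs easy to estimate. A minimal sketch, using only the GPT-5 figures stated on stage ($1.25 per million input tokens, $10 per million output tokens); `estimate_cost` is a hypothetical helper and the token counts are made-up illustration values:

```python
# Hedged sketch: estimating the USD cost of one API request from the
# per-million-token prices quoted for GPT-5 at the event.
PRICES_PER_MILLION = {
    "gpt-5": {"input": 1.25, "output": 10.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    p = PRICES_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 20,000 input tokens and 2,000 output tokens.
# 20k * $1.25/1M + 2k * $10/1M = $0.025 + $0.02 = $0.045
print(f"${estimate_cost('gpt-5', 20_000, 2_000):.4f}")
```

Input tokens dominate most requests by count, but at these prices the output tokens often dominate the bill, which is worth keeping in mind when budgeting.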
75:26
Thanks, Olivier. At OpenAI, at the core, we are about understanding this miraculous technology called deep learning and what its consequences are. Our research aims to understand what deep learning is capable of and how to steer it to make it safe and useful for all of us. This is a work of passion, and it's a mission. And I want to recognize and just deeply thank the team at OpenAI. It is a great privilege for me to work alongside this incredible group of brilliant people driven by this shared goal. What adds up to a model like GPT-5 are years of investigations aimed not only at producing a great release, but at building understanding of the underlying technology itself.
76:46
And so a lot of what you'll see in this model are really just early glimpses of new ideas that we believe will go much further. There is a lot we still have to understand, and we look towards a future where AI can uncover new knowledge about the world and meaningfully transform our lives for the better. We hope you'll enjoy what we've built. And we'll get back to scaling. Thank you.