
Fei-Fei Li: Spatial Intelligence is the Next Frontier in AI

Fireside with Dr. Fei-Fei Li on June 16, 2025 at AI Startup School in San Francisco. Dr. Fei-Fei Li shares insights on the creation of ImageNet, breakthroughs in computer vision, and the challenges of modeling spatial intelligence for AGI.

0:00|Dr. Fei-Fei Li:
My entire career is going after problems that are just so hard, bordering on delusional. To me, AGI will not be complete without spatial intelligence. And I want to solve that problem. I just love being an entrepreneur. Forget about what you have done in the past. Forget about what others think of you. Just hunker down and build. That is my comfort zone.
0:29|Diana Hu:
So I'm super excited to have Dr. Fei-Fei Li here. She has had such a long career in AI. I'm sure a lot of you know her, right? Raise your hand. I know, you too. She's been named the godmother of AI. One of the first projects that you created was ImageNet in 2009, 16 years ago. Oh my god. Don't remind me of that. That has over 80,000 citations. And it really kicked off one of the legs of the stool for AI, which is the data problem. Tell us about how that project came about. It was pretty pioneering work back then.
1:23|Dr. Fei-Fei Li:
Yeah, well, first of all, Diana and Gary and everybody, thanks for inviting me here. I'm so excited to be here because I feel like I'm just one of you. I'm also an entrepreneur right now. I just started a small company, so I'm very excited to be here. ImageNet... yeah, you're right, we actually conceived that almost 18 years ago. Time really flies. I was a first-year assistant professor at Princeton. Oh, wow. Hi. Hi, Tigers. Yeah, and the world of AI and machine learning was so different at that time. There was very little data. Algorithms, at least in computer vision, did not work.
2:11|Dr. Fei-Fei Li:
There was no industry. As far as the public was concerned, the word AI didn't exist. A group of us, starting from the founding fathers of AI, right, John McCarthy, and then going through people like Geoff Hinton, I think we just had an AI dream. We really, really wanted to make machines think and work. And with that dream, my own personal dream was to make machines see, because seeing is such a cornerstone of intelligence. Visual intelligence is not just perceiving. It's really understanding the world and doing things in the world. So I was obsessed with the problem of making machines see.
2:57|Dr. Fei-Fei Li:
And as I was obsessively developing machine learning algorithms, at that time we did try neural networks, but they didn't work. We pivoted to Bayes nets, to support vector machines, whatever it was. But one problem always haunted me, and it was the problem of generalization. If you're working in machine learning, you have to respect that generalization is the core mathematical foundation, or goal, of machine learning. And in order to generalize, these algorithms need data, yet no one had data at that time in computer vision. And I was in the first generation of grad students who were starting to dabble in data, because I was in the first generation of graduate students who saw the internet, the big internet of things.
3:46|Dr. Fei-Fei Li:
So fast forward to around 2007-ish: my student and I decided that we had to take a bold bet. We had to bet that there needed to be a paradigm shift in machine learning, and that paradigm shift had to be led by data-driven methods. And there was no data, so we were like, okay, let's go to the internet, download a billion images, that's the highest number we can get on the internet, and then just create the entire world's visual taxonomy. And we would use that to train and benchmark machine learning algorithms. And that was how ImageNet was conceived and came to life.
4:32|Diana Hu:
And it took a while until there were algorithms that were promising. It wasn't until 2012 that AlexNet came out. And that was the second part of the equation for getting to AI: getting the compute and the algorithms and throwing enough at it. Tell us about that moment where you started to see it. You seeded it with data, and then the community started to figure more things out for AI.
5:00|Dr. Fei-Fei Li:
Right, so between 2009, when we published this tiny little CVPR poster, and 2012, with AlexNet, there were three years where we really believed that data would drive AI, but we had very little signal as to whether that was working. So we did a couple of things. One is we open-sourced. We believed from the get-go that we had to open-source this to the entire research community, for everybody to work on this. The other thing we did is we created a challenge, because we wanted the whole world's smartest students and researchers to work on this problem.
5:44|Dr. Fei-Fei Li:
So that was what we call the ImageNet challenge. So every year, we release a test set. Well, the whole of ImageNet is there for training, but we release a test set. And then we invite everybody openly to participate. And the first couple of years were really about setting the baseline. The performance was in the 30% error-rate range. It wasn't zero. I mean, it wasn't completely random, but it wasn't that great. But the third year, 2012, and I wrote about this in a book that I published, but I still remember it, it was around the end of summer that we were taking all the results of the ImageNet challenge and running them on our servers.
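A note on the metric mentioned above: the "30% error rate" refers to the challenge's classification metric, commonly reported as top-5 error on the held-out test set. Below is a minimal sketch of how such a number could be computed, assuming NumPy arrays of model scores and ground-truth labels; it is an illustration, not the official ILSVRC evaluation code.

```python
import numpy as np

def top_k_error(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """scores: (num_images, num_classes) prediction scores.
    labels: (num_images,) ground-truth class indices.
    Returns the fraction of images whose true class is NOT among the top-k scores."""
    # Indices of the k highest-scoring classes for each image.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    # A hit means the true label appears somewhere in the top-k predictions.
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy usage with random scores over 1000 classes (numbers here are made up).
rng = np.random.default_rng(0)
scores = rng.standard_normal((100, 1000))
labels = rng.integers(0, 1000, size=100)
print(f"top-5 error: {top_k_error(scores, labels):.2%}")
```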
6:40|Dr. Fei-Fei Li:
And I remember it was late at night when I got a ping from my graduate student. I was home, and he said, we got a result that really, really stands out, and you should take a look. And we looked into it. It was a convolutional neural network. It wasn't called AlexNet at that time. That team, Geoff Hinton's team, was called SuperVision. It was a very clever play on the word super as well as supervised learning. So, SuperVision. And we looked at what SuperVision did. It was an old algorithm. Convolutional neural networks were published in the 1980s.
7:22|Dr. Fei-Fei Li:
There were a couple of tweaks in terms of the algorithm. But it was pretty surprising at the beginning for us to see that there was such a step change. And of course, the rest is history, as you all know. We presented this at the ImageNet challenge workshop at that year's ICCV in Florence, Italy, and Alex Krizhevsky came, and many people came. I remember Yann LeCun also came, and now the world knows this moment as the ImageNet challenge AlexNet moment. I do want to say that it's not just the convolutional neural network.
8:11|Dr. Fei-Fei Li:
It was also the first time that two GPUs were put together by Alex and his team and used for the computation of deep learning. So it was really the first moment of data, GPUs, and neural networks coming together.
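A note on the multi-GPU point above: AlexNet's original implementation split the network itself across the two GPUs (model parallelism). The sketch below instead shows the simpler, now more common data-parallel pattern in PyTorch, where each GPU processes a slice of every batch; the tiny network and the random tensors are stand-ins, and it assumes PyTorch with two CUDA devices available (it falls back to CPU otherwise).

```python
import torch
import torch.nn as nn

# A small stand-in convolutional network (not AlexNet itself).
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 1000),
)

device = "cuda" if torch.cuda.device_count() >= 2 else "cpu"
if device == "cuda":
    # Each GPU processes a slice of every batch; outputs are gathered on GPU 0.
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# One toy training step on random tensors standing in for ImageNet batches.
images = torch.randn(32, 3, 224, 224, device=device)
labels = torch.randint(0, 1000, (32,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```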
8:30|Diana Hu:
Now, following this arc of intelligence for computer vision: ImageNet was really the seed to solve the concept of object recognition. Then, right after that, AI got to the point where it could also solve scenes, right? Because you had a lot of the work with your students, like Andrej Karpathy, being able to describe scenes. Tell us about that transition from objects to scenes.
8:55|Dr. Fei-Fei Li:
Yeah, so ImageNet was solving the problem of: you're presented with an image and then you call out objects. There's a cat, there's a chair, and all that. That's a fundamental problem in visual recognition. But ever since I was a graduate student entering the field of AI, I had a dream. I thought it was a 100-year dream, which is storytelling of the world: when humans open their eyes, you know, imagine you just open your eyes in this room, you don't just see person, person, person, chair, chair, chair. You actually see a conference room, with a screen, with a stage, with people, with the crowd, the cameras. You actually can describe the entire scene.
9:45|Dr. Fei-Fei Li:
And that's a human ability that is at the foundation of visual intelligence. And it's so critical for us in our everyday life. So I really thought that problem would take my entire life. Literally, when I graduated, I told myself that if, by my deathbed, I could create an algorithm that can tell the story of a scene, I'd have succeeded. That was how I thought my career would be. The ImageNet AlexNet moment came, deep learning took off, and then, when Andrej and later Justin Johnson entered my lab, we started to see signals of natural language and vision starting to collide.
10:34|Dr. Fei-Fei Li:
And then Andrej and I proposed this problem of captioning images, or storytelling. And long story short, around 2015, Andrej and I published a series of papers that were among the first, along with a couple of concurrent papers, to make, literally, a computer that captioned an image. I almost felt like, what am I going to do with my life? That was my lifelong goal. It was such an incredible moment for both of us. Last year I gave a TED Talk, and I actually used something Andrej tweeted a couple of years ago, around the time he finished the image captioning work that was pretty much his dissertation. I actually joked with him.
11:35|Dr. Fei-Fei Li:
I said, hey, Andrej, why don't we do the reverse? Take a sentence and generate an image. And of course, he knew I was joking, and he said, haha, I'm out of here. The world was just not ready. But fast forward to now, and we all know generative AI. Now we can take a sentence and generate beautiful pictures. So the moral of the story is that AI has seen incredible growth. And personally, I feel I'm the luckiest person in the world, because my entire career started right at the end of the AI winter, the beginning of AI starting to take off, and so much of my own work, my own career, is part of this change, or helped with this change.
12:30|Dr. Fei-Fei Li:
So I feel so fortunate and lucky and in a way proud.
12:36|Diana Hu:
And I think the wildest thing is, even having achieved your lifelong dream of describing scenes and even generating them with diffusion models, you're actually dreaming bigger, because the whole arc of computer vision went from objects to scenes and now to this concept of worlds. And you actually decided to move from academia, being a professor, to now being founder and CEO of World Labs. Tell us about what a world is. It's even harder than scenes and objects.
13:07|Dr. Fei-Fei Li:
Yeah, it is. It is kind of wild. So, of course, you all know the past; it's really hard to summarize the past five or six years. For me, we're living in such a civilizational moment of this technology's progress, right? As a computer vision scientist, we're seeing this incredible growth from ImageNet to image captioning to image generation using some of the diffusion techniques. While this is happening in a very exciting way, we also have another extremely exciting thread, which is language, which is LLMs: really, in November 2022, ChatGPT blasted open the door of truly working generative models that can basically pass the Turing test and all that.
14:07|Dr. Fei-Fei Li:
So this becomes very inspirational, even for someone as old as me, to really think audaciously about what's next. And I have a habit: as a computer vision scientist, a lot of my inspiration actually comes from evolution as well as brain science. I find myself in many moments of my career looking for the next North Star problem to solve, and I ask myself what evolution has done, or what brain development has done. And there's something that's really important to notice, or to appreciate. The development of human language in evolution took, if you're super generous, let's just say about 300,000 to 500,000 years, less than a million years.
15:07|Dr. Fei-Fei Li:
That's the length of evolution it took to develop human language. And pretty much humans are the only animals that have sophisticated language. We can argue about animal language, but really, language in its totality, in terms of being a tool of communication, reasoning, and abstraction, it's really humans. So that took less than even half a million years. But think about vision. Think about the capability of understanding the 3D world, figuring out what to do in this 3D world, navigating the 3D world, interacting with the 3D world, comprehending the 3D world, communicating about the 3D world. That journey took evolution 540 million years.
15:58|Dr. Fei-Fei Li:
The first trilobites developed a sense of vision underwater 540 million years ago, and since then, vision was really what set off this evolutionary arms race. Before vision, animals were simple; for the half billion years before vision, there were just simple animals. But over the next half billion years, those 540 million years, because of the capability of seeing the world, understanding the world, an evolutionary arms race began, and animal intelligence just started to race. So for me, solving the problem of spatial intelligence, to understand the 3D world, to generate the 3D world, to reason about the 3D world, to do things in the 3D world, is a fundamental problem of AI.
16:55|Dr. Fei-Fei Li:
To me, AGI will not be complete without spatial intelligence. And I want to solve that problem. And that involves creating world models, world models that go beyond flat pixels, world models that go beyond language, world models that truly capture the 3D structure and the spatial intelligence of the world. And the luckiest thing in my life is that no matter how old I am, I always get to work with the best young people. So I founded a company with three incredible young but world-class technologists, Justin Johnson, Ben Mildenhall, and Christoph Lassner. And we are just going to try to solve what is, in my opinion, the hardest problem in AI right now.
17:49|Diana Hu:
Which is incredible talent. I mean, Chris was the creator of Pulsar, which did a lot of differentiable rendering and was an initial seed before Gaussian splatting. There's Justin Johnson, your former student, who really has this super systems-engineering mind and got real-time neural style transfer working. Then you've got Ben, who was the author of the NeRF paper. So this is a super crack team. And you need such a crack team, because we were chatting a bit about how vision is actually harder than LLMs to some extent. Maybe this is a controversial thing to say, because LLMs are basically 1D, right?
18:32|Diana Hu:
But you're talking about understanding a lot of 3D structure. Why is this so hard? And why is it still behind language research?
18:41|Dr. Fei-Fei Li:
Yeah, no, I really appreciate, Diana, that you empathize with how hard our problem is. So, language is fundamentally 1D, right? Syllables come in sequence. I mean, this is why sequence-to-sequence, sequence modeling, is so classic. There's something else about language that people don't appreciate. Language is purely generative. There's no language in nature. You don't touch language. You don't see language. Language literally comes out of everybody's head, and that's a purely generative signal. Of course, you can put it on a piece of paper and it's there, but the generation, the construction, the utility of language is very, very generative.
19:29|Dr. Fei-Fei Li:
The world is far more complex than that. First of all, the real world is 3D. And if you add time, it's 4D. But let's just confine ourselves to space: it's fundamentally 3D. So that by itself is a combinatorially much harder problem. Second, the sensing, the reception of the visual world, is a projection. Whether it's your eye, your retina, or a camera, it's always collapsing 3D to 2D. And you have to appreciate how hard that is. It's mathematically ill-posed. This is why humans and animals have multiple sensors. And then you have to solve that problem.
20:18|Dr. Fei-Fei Li:
And third, the world is not purely generated. Yes, we could generate virtual 3D worlds, and they still have to obey physics and all that. But there is also a real world out there. You are now suddenly dialing between generation and reconstruction in a very fluid way. And the user behavior, the utility, the use cases are very different. If you dial all the way to generation, we can talk about gaming and the metaverse and all that. If you dial all the way to the real world, we're talking about robotics and all that. But all of this is on the continuum of world modeling and spatial intelligence.
21:05|Dr. Fei-Fei Li:
And of course, the elephant in the room is that there's a lot of data on the internet for language, whereas the data for spatial intelligence, you know, it's all in our heads, of course, but it's not as easily accessible as language. So these are the reasons it's so hard. But frankly, it excites me, because if it were easy, somebody else would have solved it. And my entire career is going after problems that are just so hard, bordering on delusional. And I think this is the delusional problem. Thank you for supporting that.
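A note on the second point above, that sensing is a projection: a pinhole camera maps a 3D point to 2D by dividing by depth, so every point along the same viewing ray lands on the same pixel, and depth cannot be recovered from a single image without extra assumptions or sensors. A minimal sketch, with a made-up focal length and points chosen purely for illustration:

```python
import numpy as np

def project(point_3d: np.ndarray, focal_length: float = 1.0) -> np.ndarray:
    """Perspective projection of a 3D point (x, y, z), z > 0, onto the image plane."""
    x, y, z = point_3d
    return np.array([focal_length * x / z, focal_length * y / z])

p_near = np.array([1.0, 2.0, 4.0])
p_far = p_near * 2.5  # a different 3D point lying on the same viewing ray

print(project(p_near))  # both points project to the same 2D pixel,
print(project(p_far))   # so depth information is lost in the projection
```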
21:56|Diana Hu:
And even thinking about this from first principles, the human brain devotes a lot more of the cortex, a lot more neurons, to processing visual data than to language. How does that translate into the model architectures? They're very different from LLMs, from what you're finding out, right? Yeah.
22:16|Dr. Fei-Fei Li:
That's actually a really good question. I mean, there are still different schools of thought out there, right? There is the LLM school: a lot of what we see in LLMs is really riding the scaling law all the way to a happy ending, and you can almost just brute-force self-supervision all the way. Constructing world models might be a little more nuanced. The world is more structured. There might be signals that we need to use to guide it. You can call it priors in some shape, you can call it supervision in your data, whatever it is.
22:56|Dr. Fei-Fei Li:
I think these are some of the open questions that we have to solve, but you're right. And also, if you think about humans, first of all, we don't have all the answers even for human perception, right? How 3D works in human vision is not a solved problem. We know mechanically that the two eyes have to triangulate information, but even after that, where is the mathematical model? And we're not that great. Humans are not that great as 3D animals. So there is a lot that is still to be answered. So at World Labs,
23:37|Dr. Fei-Fei Li:
I'm just counting on, really counting on, one thing: that we have the smartest people in the pixel world to solve this.
23:48|Diana Hu:
Is it fair to say that what you're building at World Labs is a whole new kind of foundation model whose output is 3D worlds? And what are some of the applications that you're envisioning? Because I think you listed everything from perception to generation, and there is always this tension between generative models and discriminative models. So what would these 3D worlds do?
24:17|Dr. Fei-Fei Li:
Yeah, so I'm not going to be able to talk too much about the details of World Labs per se, but in terms of spatial intelligence, that's what also excites me: just like language, the use cases are so huge, from creation, where you can think about designers, architects, industrial designers, as well as artists, 3D artists, game developers, all the way to robotics and robot learning. The utility of spatial intelligence models, or world models, is really, really big. And then there are many related industries, from marketing to entertainment to even the metaverse. I'm actually really, really excited by the metaverse.
25:12|Dr. Fei-Fei Li:
I know so many people are kind of still like, it's still not working. I know it's still not working. That's why I'm excited because I think the convergence of hardware and software will be coming. So that's also another great use case down the road.
25:29|Diana Hu:
I'm personally very excited that you're solving Metaverse. I gave it a try in my previous company, so I'm so excited that you're doing that now.
25:36|Dr. Fei-Fei Li:
I do think hardware is part of the hurdle, but you need content creation. Metaverse content creation needs world models.
25:50|Diana Hu:
Let's switch gears a little bit. To some of the audience, your transition from academia to now being a founder and CEO might seem sudden, but you've actually had a remarkable journey through your whole life. This is not the first time you've gone zero to one. You were telling me about how you immigrated to the U.S., didn't speak any English in your teens, and even ran a laundromat for a good number of years. Tell us about how all those experiences shaped who you are now.
26:25|Dr. Fei-Fei Li:
Right, I'm sure you guys are here trying to learn how to start a laundromat. That was when you were 19, right? Yeah, I was 19, and that was out of desperation. I had no means of supporting my family, my parents, and I needed to go to college to be a physics major at Princeton. So I started a dry-cleaning shop. And in Silicon Valley language, I fundraised. I was the founder-CEO. I was also the cashier and all the other things. And I exited after seven years. All right, you guys are very kind.
27:11|Dr. Fei-Fei Li:
I've never gotten claps for my laundromat, but thank you. But anyway, to Diana's point, and especially to all of you: I look at you and I'm so excited for you, because you're literally half my age, or even, you know, maybe 30% of my age, and you're so talented. Just do it, don't be afraid. Throughout my entire career, of course I did the laundromat, but even as a professor, a couple of times I chose to go to departments where I was the first computer vision professor. And that was against a lot of advice.
27:51|Dr. Fei-Fei Li:
As a young professor, you should go to a place where there's a community and senior mentors. Of course, I would love to have senior mentors, but if they're not there, I still have to trailblaze my way. So I wasn't afraid of that. And then I did go to Google to learn a lot about business in Google Cloud and B2B and all those. And then I started a startup within Stanford because around 2018, AI was not only taking over the industry, AI became a human problem. Humanity will always advance our technology, but we cannot lose our humanity.
28:31|Dr. Fei-Fei Li:
And I really care about creating a beacon of light in the progress of AI, and trying to imagine how AI can be human-centered, how we can create AI to help humanity. So I went back to Stanford and created the Human-Centered AI Institute and ran that as a startup for five years. Probably some people were not too happy that I ran it as a startup for five years inside a university, but I was very proud of that. So in a way, I think I just love being an entrepreneur. I love the feeling of ground zero, like standing on ground zero.
29:13|Dr. Fei-Fei Li:
Forget about what you have done in the past. Forget about what others think of you. Just hunker down and build. That is my comfort zone and I just love that.
29:25|Diana Hu:
The other really cool thing about you, on top of all the awesome things you've done: you advised a lot of legendary researchers, like Andrej Karpathy, Jim Fan, who's at NVIDIA, and Jia Deng, who's your co-author on ImageNet. They all went on to have these incredible careers. What really stood out about them when they were students? Any advice for the audience on how you could tell, ah, this person's going to change the field of AI?
29:54|Dr. Fei-Fei Li:
So first of all, I'm the lucky one. I think I owe more to my students than the other way around. They really make me a better person, better teacher, better researcher. And having worked with so many, like you said, legendary students is really the honor of my life. They're very, very different. Some of them are pure scientists trying to hunker down and solve a scientific problem. Some of them are industry leaders. Some of them are the greatest disseminators of AI knowledge. But I think there is one thing that unifies them, and I would encourage every single one of you to think about this.
30:50|Dr. Fei-Fei Li:
For those founders who are hiring, this is also my hiring criterion: I look for intellectual fearlessness. It doesn't matter where you come from, it doesn't matter what problem you're trying to solve; that courage, that fearlessness of embracing something hard, going after it, being all in, and trying to solve it in whatever way you want is really a core characteristic of people who succeed. I learned this from them, and I really look for young people who have that. As CEO at World Labs, in my hiring, I look for that quality.
31:39|Diana Hu:
So you're hiring a lot for World Labs too. So you're looking for that same trait, right?
31:44|Dr. Fei-Fei Li:
Yes. I got permission from Diana to say that we're hiring. So yes, we are hiring a lot. We're hiring engineering talent. We're hiring product talent. We're hiring 3D talent. We're hiring generative-model talent. So if you feel you're fearless and you're passionate about solving spatial intelligence, talk to me or come to our website.
32:12|Diana Hu:
Cool. We're going to open it up for questions for the next 10 minutes.
32:18
Hi, Fei-Fei. Thank you for your talk. I'm a big, big, big fan. And yeah, so my question is, more than two decades ago, you worked on visual recognition. I want to start my PhD. What should I work on so I become a legend like you are?
32:34|Dr. Fei-Fei Li:
I want to give you a thoughtful answer, because I can always say do whatever excites you. So first of all, I think AI research has changed, because academia, and if you're starting a PhD, you're in academia, no longer has most of the AI resources. It's very different from my time, right? The chips, the compute, and the data are really scarce in academia. And there are problems that industry can run a lot faster on. So as a PhD student, I would recommend you look for those north stars that are not on a collision course with problems that industry can solve better using better compute, better data, and team science.
33:27|Dr. Fei-Fei Li:
But there are some really fundamental problems that we can still identify in academia where it doesn't matter how many chips you have; you can make a lot of progress. First of all, interdisciplinary AI, to me, is a really, really exciting area in academia, especially for scientific discovery. There are just so many disciplines that can cross with AI. I think that's a big area one could go into. On the theoretical side, I find it fascinating that AI's capability has 100% outrun theory. We don't have explainability, we don't know how to figure out causality. There's just so much in these models that we don't understand that one could push forward.
34:20|Dr. Fei-Fei Li:
And the list can go on. In computer vision, there are still representational problems that we haven't solved. And also small data; that's another really interesting domain. So, yeah, these are the possibilities.
34:40
Thank you so much, Fei-Fei.
34:42
Thank you, Professor Li. And congratulations again on your honorary doctorate from Yale; I was honored to witness that moment one month ago. My question is, in your perspective, will AGI more likely emerge as a single unified model or as a multi-agent system?
35:03|Dr. Fei-Fei Li:
The way you ask this question already implies two kinds of definitions. One definition is more theoretical, which is to define AGI as if there is an IQ test that one passes that defines AGI. The other half of your question is much more utilitarian: is it functional if it's agent-based? What tasks can it do? I struggle with this definition of AGI, to be honest. Here's why. The founding fathers of AI who came together in 1956 at Dartmouth, the John McCarthys and Marvin Minskys among them, wanted to solve the problem of machines that can think.
35:49|Dr. Fei-Fei Li:
And that's a problem that Alan Turing also put forward a few years earlier, ten years or whatever earlier than them. And that statement is not narrow; it's not narrow AI. It's a statement of intelligence. So I don't really know how to differentiate that founding question of AI from this new word, AGI. To me, they're the same thing, but I get that the industry today likes to call it AGI, as if that's beyond AI. And I struggle with that, because I don't know exactly how AGI is different from AI. If we say today's AGI-ish systems perform better than the narrower AI systems of the 70s, 80s, and 90s, or whatever, I think that's right.
36:47|Dr. Fei-Fei Li:
That's just the progression of the field. But fundamentally, I think the science of AI, the science of intelligence, is to create machines that can think and do things as intelligently as, or even more intelligently than, humans. So I don't know how to define AGI, and without defining it, I don't know if it's monolithic. If you look at the brain, it's one thing; you can call it monolithic, but it does have different functionalities. There's Broca's area for language, there's the visual cortex, there's the motor cortex. So I don't really know how to answer that question.
37:29
Hi, my name is Yashna and I just want to say thank you. I think it's really inspiring to see a woman playing a leading role in this field. And as a researcher, educator and entrepreneur, I wanted to ask what type of person do you think should pursue graduate school in this rapid rise of AI?
37:54|Dr. Fei-Fei Li:
That's a great question, and it's a question even parents ask me. I really think graduate school is the four or five years where you have burning curiosity, where you're led by curiosity. And that curiosity is so strong that there's no better place to pursue it. It's different from a startup; you have to be a little careful, a startup cannot be led just by curiosity, or your investors will be mad at you. A startup has a more focused commercial goal, and some part of it is curiosity, but it's not just curiosity. Whereas for grad school, that curiosity to solve problems or to ask the right questions is so important that I think those going in with that intense curiosity will really enjoy the four or five years.
38:58|Dr. Fei-Fei Li:
Even if the outside world is passing by at the speed of light, you'll still be happy because you're there following that curiosity.
39:08
First, I want to say thank you for your time and for coming out to speak to us. You mentioned that open sourcing was a big part of the growth from ImageNet. And now, with the recent release and growth of large language models, we've seen organizations taking different approaches to open source: some staying fully closed source, some fully releasing their entire research stack, and some being somewhere in the middle, open-sourcing weights or having restrictive licenses and things of that nature. So I wanted to ask, what do you think of these different approaches to open source, and what do you believe is the right way for an AI company to go about open source?
39:44|Dr. Fei-Fei Li:
I think the ecosystem is healthy when there are different approaches. I'm not religious in terms of you must open source or you must close source. It depends on the company's business strategy. For example, it's clear why Facebook, Meta, wants to open source, right? Right now, their business model is not selling the model yet. They're using it to grow the ecosystem so that people come to their platform. So open source makes a lot of sense, whereas for another company that is really monetizing the model, you can think about an open-source tier and a closed-source tier.
40:28|Dr. Fei-Fei Li:
So I'm pretty open to that. At a meta level, I think open source should be protected. Open-source efforts, both in the public sector, like academia, and in the private sector, are so important for the entrepreneurial ecosystem, so important for the public sector, that I think they should be protected. They shouldn't be penalized.
41:00
Hi, my name is Karl. I flew in from Estonia. I have a question about data. You called the shift in machine learning towards data-driven methods very well with ImageNet. Now you're working on world models, and you mentioned that we don't have this spatial data on the internet; it exists only in our heads. How are you solving this problem? What are you betting on? Are you collecting this data from the real world? Are you doing synthetic data? Do you believe in that, or do you believe in good old priors? Thanks.
41:36|Dr. Fei-Fei Li:
You should join World Labs, and I'll tell you. Look, as a company, I'm not going to be able to share a lot, but I think it's important to acknowledge that we're taking a hybrid approach. It is really important to have a lot of data, but also to have a lot of quality data. At the end of the day, it is still garbage in, garbage out if you're not careful with the quality of the data.
42:10
We'll do one last question. Hi, Dr. Li. My name is Annie, and thank you very much for speaking with us. In your book, The World I See, you talk about the challenges you faced as an immigrant girl and a woman in STEM. I'm curious to know if there was a time you felt like a minority in the workplace, and if so, how did you manage to overcome it or persuade others?
42:35|Dr. Fei-Fei Li:
Thank you for that question. I want to be very, very careful, or thoughtful, in answering you, because we all come from different backgrounds and how each of us feels is very unique. It almost doesn't even matter what the big categories are. All of us have moments where we feel we're the minority or the only person in the room. So, of course, I've felt that way. Sometimes it's based on who I am. Sometimes it's based on my idea. Sometimes it's just based on, I don't know, the color of my shirt, whatever it is. I have, but this is where I do want to encourage everybody.
43:19|Dr. Fei-Fei Li:
Maybe it is because, having come to this country when I was young, I kind of learned that it is what it is. I am an immigrant woman. I almost developed a capability to not over-index on that. I'm here just like every one of you. I'm here to learn, to do things, to create things. Thank you. That was a great answer. And really, all of you, you're about to embark on something, or are in the middle of embarking on something, and you're going to have moments of weakness or strangeness. I feel this every day, especially in startup life.
44:04|Dr. Fei-Fei Li:
Sometimes I'm like, oh my God, I don't know what I'm doing. Just focus on doing it. Gradient descend yourself to the optimized solution. Yeah.
44:16|Diana Hu:
All right, that's a great way to end it. Thank you, Dr. Li.