MIT 6.S191: Towards AI for 3D Content Creation

Machine generated transcript…

Great, thanks for the nice introduction. I'm going to talk to you about 3D content creation, and in particular about deep learning techniques to facilitate 3D content creation. Most of the work I'm going to talk about is work I've been doing with my group at NVIDIA and our collaborators, but there will be a little bit of my work at the University of Toronto as well.

This is a deep learning class, so you've heard all about how AI has made so much progress in the last decade or so. But computer graphics was actually revolutionized as well, with many new and faster rendering techniques, but also by working together with AI. This is the latest video that Jensen introduced a couple of months ago.

All of the rendering that you're seeing is done in real time; it's basically rendered in front of your eyes. Compared to a traditional game, where you may be used to real-time rendering, here there are no baked lights. Everything is computed online: physics, real-time ray tracing, lighting. Everything is done online. What you're seeing here is rendered in something called Omniverse, a visualization and collaboration software that NVIDIA has just recently released, so you should check it out, it's really awesome. All right, oops, these slides always get stuck.

So when I joined NVIDIA, which was two and a half years ago, the org that I'm in was actually creating the software called Omniverse, the one I just showed. I got so excited about it, and I wanted to somehow contribute in this space, to somehow introduce AI into this content creation and graphics pipeline. And 3D content is really everywhere.

Graphics is really in a lot of domains. In architecture, designers create office spaces, apartments, whatever, and everything is done in some modeling software with computer graphics, so that you can judge whether you like a space before you go out and build it. All modern games are

heavily 3D. In film there's a lot of computer graphics, in fact, because directors want so much out of characters or humans that you need to have them done with computer graphics and animated in realistic ways. Now that we are all at home, VR is super popular: everyone wants a tiger in the room, or a 3D avatar version of yourself, and so on. There's also robotics and healthcare.

In healthcare and robotics there's actually also a lot of computer graphics, and these are the areas that I'm particularly excited about. Why is that? It's actually for simulation. Before you can deploy any kind of robotic system in the real world, you need to test it in a simulated environment. You need to test it against all sorts of challenging scenarios, whether in healthcare for robotic surgery, or for self-driving cars, warehouse robots, and so on. I'm going to show you this

simulator called DRIVE Sim that NVIDIA has been developing. This video is a couple of years old now, and it's a lot better than this today, but basically simulation is kind of like a game; it's really a game engine for robots, where you expose a lot more out of the game engine. You want to give the creator, the roboticist, some control over the

environment: you want to decide how many cars you're going to put in there, what the weather is going to be, night or day, and so on. This gives you some control over the scenarios you're going to test against. But the nice thing about having this computer graphics pipeline is that everything is already labeled in 3D. You have already created a 3D model of a car, so you know it's a car and you know what its parts are; you

know something is a lane, and so on. Instead of just rendering the picture, you can also render ground truth for AI to both train on and be tested against, so you can get ground-truth lanes, ground-truth weather, ground-truth segmentation, all that stuff that's super hard to collect in the real world. My goal, if we want to think about all

these applications, and robotics in particular, is: can we simulate the world in some way? Can we just load up a model like this, which maybe looks good from afar, but where we want to create really good content at street level, both assets as well as behaviors, and just make these virtual cities alive so that we can test our robots inside them? Well, it turns out that doing this by hand is actually super slow. Let me play this; it requires

significant human effort. Here we see a person creating a scene aligned with a given real-world image: the artist places scene elements and edits their poses and textures, as well as scene or global properties such as weather, lighting, and camera position. This process ended up taking four hours for this particular scene. And here the artist already had the assets, bought online or whatever, and the only goal was to recreate the scene

shown above, and it already took four hours. So this is really, really slow. I don't know whether you're familiar with games like Grand Theft Auto; that was an effort by a thousand people working for three years, basically recreating Los Angeles, going around the city and taking tons of photographs,

250,000 photographs, many hours of footage, anything that would give them an idea of what they needed to replicate from the real world. So this is where AI can help. We know computer vision, we know deep learning: can we actually just take some footage and recreate these cities, both in terms of the reconstruction, the assets, as well as the behavior, so that

we can simulate all of this live content? So this is my idea of what we need to create, and I really hope that some of you are going to be equally excited about these topics and are going to work on this, because I believe that we need AI in this particular area. We need to

be able to synthesize worlds, which means scene layouts (where am I placing these different objects, maybe the map of the world), assets (we need some way of creating assets like cars, people, and so on, in a scalable way, so we don't need artists to create this content very slowly), as well as the dynamic

parts of the world, the scenarios, which means I need really good behavior for everyone (how am I going to drive?), as well as animation, which means that the human or any articulated object we animate needs to look realistic. A lot of this stuff is already being done: for any game, the artists and engineers need to do all of that. What I'm saying is, can we have AI do this much better and much faster? So what I'm going to

talk about today is kind of our humble beginning. This is the main topic of my Toronto NVIDIA lab, and I'm going to tell you a little bit about all these different topics that we have been slowly addressing, but there's just so much more to do. So the first thing we want to tackle is: can we synthesize worlds by just looking at real footage that we can collect, let's say from a self-driving platform? Can we take those videos and train some sort of generative model that is

going to generate scenes that look like the real city that we want to drive in? If I'm in Toronto I might need brick walls; if I'm in LA I just need many more streets. I need to somehow personalize this content based on the part of the world that I'm going to be in. If you have any questions, just write them up; I like it when the lecture is interactive. All right, so how can we compose scenes? Our thinking was really to look at how games

are built. In games, people need to create very diverse levels, so they need to create very large worlds in a scalable way, and one way to do that is to use procedural models, or a probabilistic grammar, which basically gives you rules about how the scene is created such that it looks like a valid scene. So in this particular case I would sample a road with some

number of lanes, and then on each lane sample some number of cars, and maybe there's a sidewalk next to a lane with people walking on it, and there are trees, or something like that. These probabilistic models can be fairly complicated, you can quickly imagine how this becomes complicated, but at the same time it's not so hard to actually write this: anyone would be able to write a bunch of rules about how to create this content.
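To make this concrete, here is a minimal sketch of what sampling from such a hand-written probabilistic grammar could look like. The rules and the particular distributions are illustrative assumptions, not the actual system.

```python
# A minimal sketch of sampling a scene from a hand-written probabilistic grammar:
# a road with a random number of lanes, each lane with a random number of cars.
# All rules and distributions here are illustrative assumptions.
import random

def sample_scene(rng=random.Random(0)):
    scene = {"type": "road", "lanes": []}
    for _ in range(rng.randint(1, 4)):          # number of lanes
        lane = {"type": "lane", "cars": []}
        for _ in range(rng.randint(0, 3)):      # cars per lane
            lane["cars"].append({
                "type": "car",
                "offset_m": rng.uniform(0.0, 100.0),   # position along the lane
                "heading_deg": rng.gauss(0.0, 5.0),    # roughly lane-aligned
            })
        scene["lanes"].append(lane)
    return scene

print(sample_scene())
```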

So writing the rules is not too tough, but the tough part is setting all these distributions such that the rendered scenes are really going to look like your target content, meaning that if I'm in Toronto maybe I want to have more cars, and if I'm in a small village somewhere I'm

going to have fewer cars. For all of that I need to go and personalize these models, set the distributions correctly. This is just one example of sampling from a probabilistic model: here the probabilities for the orientations of the cars have been randomly set, but the scene already looks kind of okay, because it already incorporates all the rules that we know about the world, and this is the model we're going to be training. So you can think

of this as some sort of scene graph, where each node defines the type of asset we want to place, and then we also have attributes: location, height, pose, anything that is necessary to actually place this car in the scene and render it.
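As an illustration (my own, not the paper's code), a scene-graph node might look like the following: the node type comes from the grammar, and the attributes are exactly what a learned model would later modify. The field names are assumptions.

```python
# A sketch of a scene-graph node: type from the grammar, attributes to be
# adjusted later by a learned model. Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    asset_type: str                      # e.g. "car", "tree", "pedestrian"
    location: tuple = (0.0, 0.0, 0.0)    # x, y, z in the scene
    rotation_deg: float = 0.0            # pose / heading
    scale: float = 1.0
    children: list = field(default_factory=list)

# A road with one lane containing one (initially badly rotated) car.
road = SceneNode("road", children=[
    SceneNode("lane", children=[
        SceneNode("car", location=(3.0, 0.0, 12.0), rotation_deg=47.0)])
])
```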

These attributes are typically set by an artist: they look at the real data and decide how many pickup trucks I'm going to have in the city, and so on. Basically they set this distribution by hand. What we're saying is, can we actually learn this distribution by just looking at data? We had this paper called Meta-Sim a couple of years ago, where the idea was: let's assume that the structure of the scenes that I'm sampling, in this particular case how many lanes I have and how many cars I have, comes from some distribution that the

artist has already designed, so the graphs are going to be correct, but the attributes should be modified. If I sample an original scene graph from that, I can render it like the example you saw before, where the cars were kind of randomly rotated, and so on. The idea is: can a neural network now modify the attributes of these nodes, modify the rotations, the colors, maybe even the type of object, such that when I render those

scene graphs, I get images that look, in distribution, like the real images that I have recorded? We don't want to go after an exact replica of each scene; we want to train a generative model that's going to synthesize images that look like the images we have recorded. That's the target. So basically we have some sort of graph neural network that operates on scene graphs and tries to re-predict the attributes for each node (I don't know whether you've talked

about graph neural nets in this class). The loss comes through this renderer here, and we're using something called maximum mean discrepancy (MMD). I'm not going to go into details, but basically the idea is that you need to compare two different distributions; you could compare them by comparing the means of the two distributions, or maybe higher-order moments, and MMD was designed to compare higher-order moments. Now, this loss can be backpropagated through this non-differentiable renderer back to the

graph neural net; we just use numerical gradients for this step. The cool part about this is that we haven't needed any sort of annotation on the images: we're comparing images directly, because we're assuming that the synthesized images already look pretty good. So we don't actually need labeled data, we just need to drive around and record these things.
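To give a flavor of the distribution-matching loss, here is a minimal MMD sketch with an RBF kernel: it compares a batch of features from rendered scenes against features from real images, with no per-image labels. This is my own simplified illustration, not the Meta-Sim implementation, and the feature tensors are stand-ins.

```python
# Minimal MMD^2 sketch with an RBF kernel between two feature sets.
import torch

def mmd(x, y, sigma=1.0):
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

real_feats = torch.randn(64, 128)   # stand-in: features of real images (e.g. from a CNN)
fake_feats = torch.randn(64, 128)   # stand-in: features of rendered, synthesized scenes
loss = mmd(fake_feats, real_feats)  # drives the attribute-predicting network
print(loss.item())
```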

You can do something even cooler: you can actually try to personalize this data to the task you're trying to solve later, which means that you can train this network to generate data such that, if you train some other neural net on top of this data, say an object detector, it's going to do really well on whatever task you have in the end, evaluated on real-world data. That might not mean that the

objects need to look really good in the scene; it just means that you need to generate scenes that are going to be useful for some network that you want to train on that data. Again you backpropagate through this, and you can do it with reinforcement learning.
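Since the rendering and downstream training are non-differentiable, one standard way to do this is a score-function (REINFORCE-style) gradient, where the reward is the downstream model's performance on real validation data. The sketch below is a generic illustration of that idea, with a placeholder reward, not the actual system.

```python
# REINFORCE-style sketch: tune scene-distribution parameters so that sampled
# scenes maximize a downstream reward. The reward here is a toy placeholder.
import torch

theta = torch.zeros(5, requires_grad=True)           # parameters of the scene distribution
opt = torch.optim.Adam([theta], lr=1e-2)

def sample_scene_params(theta):
    dist = torch.distributions.Normal(theta, 1.0)
    s = dist.sample()                                 # non-differentiable sample
    return s, dist.log_prob(s).sum()

def downstream_reward(scene_params):
    # Placeholder: in reality, render scenes, train a detector on them,
    # and return its score on a held-out real validation set.
    return -((scene_params - 2.0) ** 2).mean()

for step in range(200):
    params, logp = sample_scene_params(theta)
    r = downstream_reward(params)
    loss = -(r.detach() * logp)                       # score-function estimator
    opt.zero_grad(); loss.backward(); opt.step()
```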

So this was training the distribution over the attributes, which was kind of the easy part, and we were sidestepping the issue of the structure of these graphs, meaning that if I had always generated five or eight or ten cars in a scene, but now I'm in a village, I will just not train anything very useful. So the idea is: can we learn the structure as well, the number of lanes, the number of cars, and so on? It turns out that you actually can. Here we had a probabilistic context-free grammar, which basically means you have a root node, you have some symbols, which can be non-terminal or terminal

symbols, and rules that expand non-terminal symbols into new symbols. An example would be: you have a road, which generates lanes; a lane can go into a lane or more lanes; and so on. These are the rules. What we want to do is train a network that's going to learn to sample from this probabilistic context-free grammar. We're going to have some sort of latent vector

here, and we know where we are in the tree or graph we've already generated. Imagine we have sampled some lane or whatever; we now know the corresponding symbols that we can actually sample from here, and we can use that to mask out the probabilities for everything else. Our network is basically going to learn how to produce the correct

probabilities for the next symbol we should be sampling. So at each step I'm going to sample a new rule until I hit all terminal symbols. That gives me something like this: these are the sampled rules in this case, which can be converted to a graph, and then, using the previous method, we can augment this graph with attributes and render the scene.
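Here is a toy sketch of the masked sampling step: a network outputs a distribution over all rules, and rules whose left-hand side does not match the symbol being expanded are masked out. The grammar, the network, and the sizes are assumptions for illustration; in the actual work the network is trained (e.g. with reinforcement learning), whereas here it is just randomly initialized.

```python
# Toy sketch: masked rule sampling from a probabilistic context-free grammar.
import torch, torch.nn as nn

RULES = [("road", ["lane"]), ("road", ["lane", "road"]),
         ("lane", ["car"]),  ("lane", ["car", "lane"]), ("lane", [])]

class RulePicker(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                 nn.Linear(64, len(RULES)))
    def forward(self, state, symbol):
        logits = self.net(state)
        mask = torch.tensor([0.0 if lhs == symbol else -1e9
                             for lhs, _ in RULES])       # keep only valid expansions
        return torch.distributions.Categorical(logits=logits + mask)

picker, state = RulePicker(), torch.zeros(32)
stack, derivation = ["road"], []
while stack and len(derivation) < 50:                    # cap expansion for the sketch
    sym = stack.pop()
    lhs, rhs = RULES[picker(state, sym).sample().item()]
    derivation.append((lhs, rhs))
    stack.extend(s for s in rhs if s != "car")           # push non-terminals back
print(derivation)
```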

So now we are also learning how to generate the actual structure of the scene graph as well as the attributes. This is super hard to train, so there are a lot of bells and whistles to make it work; essentially, because these are all non-differentiable steps, you need something like reinforcement learning, and there are a lot of tricks to

actually get this to work, but I was super surprised how well this can actually turn out. On the right side you see samples from the real dataset (KITTI, a real driving dataset); on the left side are samples from the probabilistic grammar, where we've set the initial probabilities manually and purposely made them really bad, which means that when you sample from this

grammar you get really few cars and almost no buildings; you can see these are almost unpopulated scenes. After training, the generative model learned how to sample these kinds of scenes, because they were much closer to the real target data; these were the final trained samples. Now, how can you actually evaluate that we have done something reasonable? You can look at, for example, the distribution of cars in the real dataset, KITTI over here, as a histogram of how

many cars you have in each scene. You have the orange curve here, which is the prior, meaning the badly initialized probabilistic grammar, where most of the time we were sampling very few cars, and then the learned model, which is the green line here. You can see that the generated scenes really closely follow the distribution of the real data, without a single annotation at hand. Now, you could argue that it's super easy to write these distributions by hand and be done with it. I think

this just shows that this can work, and the next step would be to make this really large-scale, really huge probabilistic models where it's hard to tune all these parameters by hand. The cool part is that, because everything can now be trained automatically from real data, any end user can just take this and it's going to train on their end; they don't need to go and

set all this stuff by hand. Now, the next question is: how can I evaluate whether my model is actually doing something reasonable? One way to do that is by sampling from this model, synthesizing images along with ground truth, then training some end model, like a detector, on top of this data and testing it on real data,

and just seeing whether the performance has improved compared to, let's say, the badly initialized probabilistic grammar. It turns out that that's the case. Now, this was the example shown on driving, but, oh sorry, here I'm just showing what's happening during training. Let me go quickly: the first snapshot is the first sample from the model, and then what you're seeing is how this

model is actually training, how it is modifying the scene during training. Let me show you one more time: you can see the first frame had really badly placed cars, and then it slowly figures out where to place them correctly. And of course this is a generative model, so you can sample tons of scenes, and everything comes labeled. Cool, right? This model here was shown on driving, but you can also apply it

everywhere else, in other domains. Medical imaging and healthcare are very important, in particular these days when everyone is stuck at home. So can you use something like this to also synthesize medical data? What do I mean by that? Doctors need to take CT or MRI volumes and go and label every single slice with, let's say, a segmentation mask, so that they can then train a

cancer segmentation, heart segmentation, lung segmentation, or COVID detection model, whatever. First of all, data is very hard to come by, because for some diseases you just don't have a lot of it; the second part is that labeling is super time-consuming and you need experts to do it. So in the medical domain it's really important if we can somehow learn how to synthesize this labeled data, so that we can augment the real datasets with it. And the model here is going to be

very simple again: we have some generative model that goes from a latent code to the parameters of a mesh, in this case our asset, with a material map, and then we synthesize this with a physically based CT simulator, which looks a little bit blurry, and then we train an enhancement model with

something like a GAN, and then you get simulated data out. Obviously there are a lot of bells and whistles again, but you can get really nice-looking synthesized volumes. Here the users can actually play with the shape of the heart, then click to synthesize data, and you get labeled volumes out, where the label is basically the stuff on the left and this is the simulated sensor output. All right, so we talked about

using procedural models to generate worlds, and of course the question is: do we need to write all those rules, or can we just learn how to recover them? Here was our first take on this, where we wanted to learn how to generate city road layouts, which means we want to be able to generate

something like this, where the lines over here represent roads. This is the base of any city, and we again want to have some control over these worlds; we're going to have something like interactive generation: I want this part to look like Cambridge, this part to look like New York, this part to look like Toronto, whatever, and we want to be able to synthesize everything else according to these styles. You can interpret a road layout as a graph. What does that mean? I

have some control points, and two control points being connected means I have a road line segment between them. So really the problem we're trying to solve here is: can we have a neural net generate graphs, graphs with attributes, where each attribute might be the x-y location of a control point? And it's a giant graph, because this is an entire city we want to generate. We actually had a very simple model where you iteratively generate

this graph. Imagine that we have already generated some part of the graph; what we're going to do is take a node from what we call the unfinished set, encode every path that we have already synthesized that leads to this node, which basically means we want to encode what this node already looks like and what roads it's connecting to, and then we want to generate the remaining neighboring nodes, basically how these roads continue from this node.
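A rough sketch of that step might look like the following: an RNN encodes the incoming paths of control points, and a small decoder proposes the positions of the new neighbors. The dimensions, the pooling, and the two-neighbor decoder are my own simplifying assumptions, not the paper's exact architecture.

```python
# Sketch of one road-layout generation step: encode incoming paths, decode neighbors.
import torch, torch.nn as nn

class RoadLayoutStep(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.path_enc = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.decoder = nn.GRUCell(2, hidden)
        self.to_xy = nn.Linear(hidden, 2)

    def forward(self, incoming_paths, n_new=2):
        # incoming_paths: (num_paths, path_len, 2) sequences of x,y control points
        _, h = self.path_enc(incoming_paths)
        h = h[-1].mean(dim=0)                 # pool the per-path encodings
        prev, out = torch.zeros(2), []
        for _ in range(n_new):                # decode neighbors one by one
            h = self.decoder(prev.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
            prev = self.to_xy(h)
            out.append(prev)
        return torch.stack(out)               # (n_new, 2) proposed control points

model = RoadLayoutStep()
paths = torch.randn(3, 10, 2)                 # three roads already reaching this node
print(model(paths))
```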

This was super simple: you just have RNNs encoding each of these paths and one RNN that decodes the neighbors, and you stop when you hit some predefined size of the city. Let me show you some results. Here you can condition on the style of the city, so you can generate Barcelona or Berkeley, you have this control, or you can condition on part of the city being in a

certain style. You can use the same generative model to also parse real maps or real aerial images and create variations of those maps for something like simulation, because for simulation we need to be robust to the actual layouts. Now you can turn that graph into an actual small city, where you can maybe procedurally generate the rest of the content like we

were discussing before: where the houses are, where the traffic signs are, and so on. Cool, right? So now we can generate the map of the city, we can place some objects somewhere in the city, so we're kind of close to our goal of synthesizing worlds, but we're still missing the objects themselves. Objects are still a pain that the artists need

to create. All this content needs to be manually designed, and that just takes a lot of time to do. Maybe it's already available; you could argue that for cars you can just go online and pay for this stuff, but first of all it's expensive, and second of all it's not really so widely available for certain classes. If I want a raccoon, because I'm in Toronto and there are just tons of them, there are only a couple of

models out there, and they don't really look like real raccoons. So the question is: can we actually do all these tasks by taking just pictures, and synthesizing this content from pictures? Ideally we would have something like an image, and we want to produce a 3D textured model that I can then insert into my virtual scenes, and ideally we want to do this with

just images that are widely available on the web. I think the new iPhones all have lidar, so maybe this will change because everyone is going to be taking 3D pictures with some 3D sensor, but right now the majority of pictures of objects that are available, on Flickr let's say, are all single images, people just snapshotting a scene or a particular object. So the question is: how can we learn from all this data and go from an image on the left to

a 3D model? In our case we're going to want to produce, as an output from the image, a mesh, which basically has the locations of vertices (x, y, z), some color and material properties on each vertex, and faces, which say which vertices are connected; that's basically what defines the 3D object. Now we're going to turn to

graphics to help us with our goal of doing this without supervision, learning from the web. In graphics we know that images are formed by geometry interacting with light; that's just the principle of rendering. We know that if you have a mesh, some light source or sources, a texture, and also materials and so on (which I'm not writing out here), and some graphics renderer,

and there are many renderers to choose from, you get out a rendered image. Now, if we make this part differentiable, if you make the graphics renderer differentiable, then maybe there is hope of going the other way. You can think of computer vision as inverse graphics: graphics goes from 3D to images, computer vision wants to go from images to 3D, and if this model is differentiable, maybe there's hope of doing that. There's been quite a lot of work lately on

basically this kind of pipeline, with different modifications, but this summarizes the ongoing work: you have an image, you have some sort of neural net that you want to train, and it makes these predictions here, mesh, light, texture, maybe material. Now, instead of having the loss over here, because you don't have the ground-truth mesh for this car (otherwise you'd need to annotate it), what we're going to do instead is send these predictions

over to this renderer, which is going to render an image, and we're going to have the loss defined on the rendered image and the input image: we're basically going to try to make these images match. Of course there are a lot of other losses that people use here, like multi-view losses (you assume that in training you have multiple pictures, multiple views, of the same object), masks, and so on, so there are a lot of bells and whistles to really make this pipeline work, but in principle it's a very clean idea.
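At a high level, one training step of this analysis-by-synthesis loop might look like the sketch below. The `predictor` and `diff_render` callables are placeholders standing in for the 3D-prediction network and the differentiable renderer; the single L1 image loss is a simplification of the losses described above.

```python
# High-level sketch of one training step of the render-and-compare pipeline.
import torch

def training_step(image, predictor, diff_render, optimizer):
    mesh, texture, light, camera = predictor(image)       # neural net predictions
    rendered = diff_render(mesh, texture, light, camera)  # differentiable renderer
    loss = torch.nn.functional.l1_loss(rendered, image)   # image reconstruction loss
    # (real systems add silhouette/mask losses, multi-view consistency, etc.)
    optimizer.zero_grad()
    loss.backward()            # gradients flow through the renderer into the net
    optimizer.step()
    return loss.item()
```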

We want to predict these properties, we have this graphics renderer, and now I'm just comparing input and output, and because this renderer is differentiable I can propagate these losses back to all my neural net weights, so I can learn to predict these properties. We in particular had a very simple, OpenGL-type rasterizer which we made differentiable; there are also versions where you can make ray tracing differentiable and so on, but the idea we employed was super simple.

A mesh is basically projected onto the image and you get out triangles, and each pixel is basically just a barycentric interpolation of the vertices of its projected triangle. Now, if you have any properties defined on those vertices, like color or texture and so on, then you can compute the value at this pixel through your renderer, which assumes some lighting and so on,

in a differentiable manner using these barycentric coordinates. This is a differentiable function, so you can backpropagate through whatever lighting or shader model you're using. Very simple, and there are much richer differentiable renderers available these days.
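The core computation is just barycentric interpolation of vertex attributes, which is differentiable with respect to both the vertex positions and the attributes. Here is a minimal standalone sketch of that idea (my own illustration, not the actual renderer):

```python
# Barycentric interpolation of per-vertex attributes at a pixel, differentiably.
import torch

def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p w.r.t. triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return torch.stack([1.0 - v - w, v, w])

# Interpolate per-vertex colors at one pixel location.
tri = torch.tensor([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]], requires_grad=True)
colors = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
weights = barycentric(torch.tensor([1.0, 1.0]), tri[0], tri[1], tri[2])
pixel = weights @ colors       # differentiable w.r.t. both `tri` and `colors`
pixel.sum().backward()         # gradients flow back to the vertex positions
print(pixel, tri.grad)
```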

Here we also tried to be a little bit clever with respect to data, because most of the related work was using synthetic data to train their models. Why? Because most of the work needed multi-view data during training, which means I have to have multiple pictures from multiple different views of the same object, and that is hard to get from just web data. So people were basically taking synthetic cars from synthetic datasets, rendering them in different views, and

then training the model, which maybe makes the problem not so interesting, because now we are actually relying on synthetic data to solve it. The question is: how can we get data? We tried to be a little bit clever here, and we turned to generative models of images. I don't know whether you covered image GANs in class, but if you take something like StyleGAN, which is a generative adversarial network designed to produce high-quality images by sampling from some

prior, you get really amazing pictures out; all of these images have been synthesized, none of this is real. What these models basically do is take some latent code, and then there's a nice progressive architecture that slowly transforms that latent code into an actual image. What happens is that if you start analyzing this latent code, if you take certain dimensions of that code and you freeze them,

and you just manipulate the rest of the code, it turns out that you can find really interesting controls inside this latent code: basically the GAN has learned about the 3D world, and it's just hidden in that latent code. What do I mean by that? You can find some latent dimensions that basically control the viewpoint, while the rest of the code is kind of

controlling the content, meaning the type of car, and the viewpoint code controls the viewpoint of that car. If you look here, we basically varied the viewpoint code and kept the content code, the rest of the code, frozen, and this is all just synthesized; the cool part is that it actually looks like multiple views of the same object.

It's not perfect, for example the third object in the top row doesn't look exactly matched, but most of them look like the same car in different views. And the other direction also holds: if I keep the viewpoint code fixed in each of these columns but vary the content code, meaning different rows here, I can actually get different cars in each viewpoint.
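In code, the data-generation trick amounts to something like the sketch below: freeze the "content" part of the latent code and resample only the "viewpoint" part to get pseudo multi-view images of one car. The `generator`, the 512-dimensional latent, and which dimensions control viewpoint are all assumptions for illustration.

```python
# Sketch: pseudo multi-view data from a pretrained GAN by resampling viewpoint dims.
import torch

latent_dim, view_dims = 512, slice(0, 8)        # assume the first 8 dims ~ viewpoint

def multiview_batch(generator, n_views=6):
    z = torch.randn(latent_dim)                 # one "content" code = one car
    views = []
    for _ in range(n_views):
        z_v = z.clone()
        z_v[view_dims] = torch.randn(8)         # resample only the viewpoint dims
        views.append(generator(z_v.unsqueeze(0)))
    return torch.cat(views)                     # n_views images of (roughly) one car

# usage (hypothetical): imgs = multiview_batch(stylegan_generator)
# these images then supervise the 3D prediction pipeline above
```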

So this is again all synthesized, and that's precisely the data we need. We didn't do anything super special in our technique; the only thing we were smart about was how we got the data. Now you can use this data to train our differentiable rendering pipeline, and you get predictions like this: you have an input image and a bunch of 3D predictions. So now we can do cars: the input image is on the left, and then

the 3D prediction rendered in that same viewpoint is in this column, and then that prediction rendered in multiple different viewpoints, just to showcase the 3D nature of the predictions. Now we basically have a tool that can take any image and produce a 3D asset, so we can get tons and tons of cars by just taking pictures. Here is a little demo in that Omniverse tool where the user can take a picture

of a car and get out the 3D model. Notice that we also estimate materials: you can see the windshields are a little bit transparent and the car body looks shiny, like metal. We're also predicting 3D parts, and it's not perfect, but they're pretty good. And just a month ago we got a new version that can also animate this prediction, so you can take an image, predict this car, and we can put proper tires

in place of the predicted tires, estimate physics, and drive these cars around, so they actually become useful assets. This is only on cars right now, but of course the system is general, so we're in the process of applying it to all sorts of different content. Cool. I don't know how much more time I have, so maybe I'm just going to skip ahead; I

always have too many slides. I have all this material on behaviors and so on, but I wanted to show you just the last project that we did, because I think you only gave me 40 minutes. We have also done some work on animation and behavior using reinforcement learning, which I'm skipping here, but we are basically building modular deep learning blocks for all the different aspects, and the question is: can we even sidestep all of that? Can we just learn how to simulate everything with one neural net?

We're going to call it neural simulation: can we have one AI model that can just look at our interaction with the world and then be able to simulate that? In computer games, we know that the game accepts some user action, left, right, keyboard control, or whatever, and then the game engine basically synthesizes the next frame, which is going to tell us

how the world has changed according to your action. What we're attempting here is to replace the game engine with a neural net, which means that we still want to have the interactive part of the game, where the user is inputting actions and playing, but the frames are going to be synthesized by a neural net. That basically means this neural net needs to learn how the world works: if I crash into a car, it needs to produce a frame that's going to

look like that. In the beginning, our first project was: can we just learn how to emulate a game engine? Can we take Pac-Man and try to mimic it, to see if a neural net can learn how to mimic Pac-Man? But of course the interesting part starts when we don't have access to the game engine, as with the real world. You can think of

the world as being the Matrix, where we don't have access to the Matrix, but we still want to learn how to simulate and emulate it. That's really exciting future work, but for now we're just trying to mimic what a game engine does: you input some action and maybe the previous frame,

and then you have something called the dynamics engine, which is basically just an LSTM that's trying to learn what the dynamics of the world look like, how frames change; we have a rendering engine that takes that latent code and actually produces a nice-looking image; and we also have some memory, an additional block that lets us store any information we need to produce consistent gameplay.
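A toy version of that loop is sketched below: an action plus the memory state goes through an LSTM-based dynamics module, a rendering module decodes the next frame, and a memory vector carries long-term state. The module sizes and the simplistic memory update are illustrative assumptions, not the GameGAN implementation.

```python
# Toy sketch of a neural-simulation step: dynamics (LSTM) -> memory -> rendered frame.
import torch, torch.nn as nn

class NeuralSimulator(nn.Module):
    def __init__(self, action_dim=4, hidden=256, frame_ch=3, frame_hw=64):
        super().__init__()
        self.dynamics = nn.LSTMCell(action_dim + hidden, hidden)   # dynamics engine
        self.memory = nn.Linear(hidden, hidden)                    # simplistic memory update
        self.render = nn.Sequential(                               # rendering engine
            nn.Linear(hidden, frame_ch * frame_hw * frame_hw),
            nn.Unflatten(1, (frame_ch, frame_hw, frame_hw)),
            nn.Sigmoid())

    def step(self, action, h, c, mem):
        h, c = self.dynamics(torch.cat([action, mem], dim=1), (h, c))
        mem = torch.tanh(self.memory(h))         # carry consistent world state
        return self.render(h), h, c, mem         # next frame + updated state

sim = NeuralSimulator()
h = c = mem = torch.zeros(1, 256)
action = torch.tensor([[0.0, 1.0, 0.0, 0.0]])    # e.g. "turn left"
frame, h, c, mem = sim.step(action, h, c, mem)
print(frame.shape)                               # torch.Size([1, 3, 64, 64])
```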

Here was our first result, on Pac-Man; we released this on the 40th birthday of Pac-Man. What you see over here is all synthesized, and to me, even though it's such a simple game, it's actually not so easy, because the neural net needs to learn that if Pac-Man eats the food, the food needs to disappear, that the ghosts can become blue, and then if you eat a

blue ghost you survive, otherwise you die. So there are already a lot of different rules that you need to recover, along with just synthesizing images. Of course our next step is: can we scale this up, can we go to 3D games, and can we eventually go to the real world? So again, here the control is going to be the steering control, speed and the steering wheel; this is done by the user, by a human, and what you see on the right side are the frames painted

by GameGAN, by this model. Here we're driving this car around, and you can see that what the model is painting is in fact a pretty consistent world, and there's no 3D, there's nothing; we're basically just synthesizing frames. And here is a little more complicated version, where we try to synthesize other cars as well; this is on the CARLA simulator, which

was the game engine we were trying to emulate. It's not perfect, you can see that the cars actually change color, but it's quite amazing that it's able to do this at all. Right now we have a version training on real driving videos, like a thousand hours of real driving, and it's actually already doing an amazing job, so I think this could be a really good alternative to the rest of the pipeline. One thing to realize

when you're doing something that's so broad, such a big problem, is that you're never going to solve it alone. So one mission that I have is also to provide tools to the community, so that you can take them, build your own ideas, and build your own 3D content generation methods. We just recently released this: 3D deep learning is an exciting new

frontier, but it hasn't been easy adapting neural networks to this domain. Kaolin is a suite of tools for 3D deep learning, including a PyTorch library and an Omniverse application. Kaolin's GPU-optimized operations and interactive capabilities bring much-needed tools to help accelerate research in this field. For example, you can visualize your model's predictions as it's training; in addition to textured meshes, you can view predicted point clouds and voxel grids with only two lines of

code. You can also sample and inspect your favorite dataset, easily convert between meshes, point clouds, and voxel grids, render 3D datasets with ground-truth labels to train your models, and build powerful new applications that bridge the gap between images and 3D using a flexible and modular differentiable renderer. And there's more to come, including the ability to visualize remote training

checkpoints in a web browser. Don't miss these exciting advancements in 3D deep learning research and how Kaolin will soon expand to even more applications. So a lot of the stuff I talked about, all the basic tooling, is available; please take it and do something amazing with it, I'm really excited about that. Just to conclude: my goal is really to democratize 3D content creation. I want my mom to be able to create really good 3D models, and she has no idea even how to use Microsoft Word or

whatever, so it needs to be super simple. I also want AI tools that can assist more advanced users like artists and game developers, just reduce the load of the boring stuff and let their creativity come into play much faster than it can right now. And all of that also connects to learning to simulate for robotics: simulation is just a fancy game engine that needs to be real, as opposed to being

from fantasy, and it can be really, really useful for robotics applications. What we have here is really just two and a half years of our lab, but there's so much more to do, and I'm really hoping that you are going to do this. I just wanted to finish with one slide, because you are students. My advice for research: just learn, learn, learn; this deep learning course is one step, don't stop here, continue.

One very important aspect is to just be passionate about your work and never lose that passion, because that's when you're really going to be productive and really going to do good work. If you're not excited about the research you're doing, choose something else. Don't rush for papers: focus on getting really good papers as opposed to a high number of papers; that's not a good metric. Hunting citations is maybe also not the best metric;

some not-so-good papers have a lot of citations, and some good papers don't have a lot of citations. You're going to be known for the good work that you do. Find collaborators; that's particularly important in my style of research. I want to solve real problems, which means that how to solve them is not clear, and sometimes we need to go to physics,

sometimes we need to go to graphics, sometimes we need to go to NLP, whatever, and I have no idea about some of those domains, so you just want to learn from experts; it's really good to find collaborators. And the last point, which I have always used as guidance: it's very easy to get frustrated, because most of the time things won't work, but just remember to have fun, because research is really fun. That's all from me; let's see whether you have some questions.

