The Future Of AI - Digital Humans Enter Their Primetime

Previously I wrote about the Cyberworld that’s coming – the world of extended reality, or XR, enabled by a confluence of maturing foundational technologies, with AI a central part – computer vision and graphics, 3D reconstruction, natural language processing and more. It goes without saying that this seamless overlay of digital and real worlds will be populated by digital humans and avatars, both realistic and stylized, driven by real humans and/or artificial intelligence. Here I am going to dive deeper into one of these foundational technologies: the creation and animation of digital humans (mainly faces).

The good news first – it won’t take us another decade to get there. 

When I started the Disney Research Laboratory back in 2008, I launched a long-term research vision to find the Holy Grail of special effects in film; that is, to create and animate digital human faces indistinguishable from reality. We all remember some of the early digitally animated feature films, such as The Polar Express or Final Fantasy, and the heroic efforts to bring digital faces to life. However, the results looked artificial: shark-like, dead faces with no expressiveness or emotion.

The reason why human faces are so notoriously difficult to model lies in the Uncanny Valley effect, which stems from our brain’s ability to perceive, read and differentiate human facial (e)motion. Evolution has built this ability into us, since reading facial expressions from a distance and distinguishing a friendly from a hostile face was (and still is) essential to our survival. As such, even the tiniest imperfection in a digital facial model or animation triggers an alert function in our brain and induces a feeling of alienation. This is exactly what we all experience when looking at digital characters in early films or computer games. For stylized and non-human faces, this effect is much less pronounced – a phenomenon used heavily in the computer-animated film industry; for instance, in Toy Story.

The design and development of characters that convey emotional depth and bond with us are at the heart of visual storytelling in both film and interpersonal communication, and believable digital human faces are the centrepiece. This is why we made this topic the major focus of our research at Disney and – thanks to the ingenuity of my co-workers and students, such as Thabo Beeler and Derek Bradley, and the many other brilliant, talented individuals in our laboratory, and to the commitment of Disney as a company with Ed Catmull, our research father and longtime supporter – we have made it through.

In fact, our research at ETH on digital faces goes back to the early 90s. With our initial focus on facial surgery, we gained significant experience with facial geometry and proportion, expression, aesthetics and the notion of beauty. This laid the foundation for our current research into the digital human face and has allowed us to stay years ahead of other research labs working in the same field.

The creation of a digital human face indistinguishable from reality entails a number of individually hard research challenges. First, we need a highly accurate three-dimensional capture of the facial micro-geometry and motion, in particular around the mouth. Second, we need a realistic model of the human eye and the effect of its motion on facial performance. Third, we need to model human hair, teeth, tongue, neck and, most importantly, the appearance and texture of human skin. The interplay of light with skin and teeth is fairly complex, as light penetrates skin and creates an effect of translucency and silkiness. If these effects are missing, the face will appear plastic and toy-like. Finally, a tremendous amount of manual artistic work is needed to make a digital face look like Thanos or Hulk.
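To give a flavour of why skin appearance is so hard, here is a minimal sketch of the kind of "wrap lighting" trick real-time renderers often use to fake subsurface scattering. The parameters and the reddish tint are purely illustrative assumptions, not measured skin values, and this is not the production model used for film work.

```python
# Minimal sketch: plain diffuse shading versus a wrap-lighting approximation
# of subsurface scattering. Illustrative only, not a production skin model.
import numpy as np

def lambert(normal, light_dir):
    """Plain diffuse term: light stops abruptly at the shadow terminator,
    which is part of what makes digital skin look plastic and toy-like."""
    return max(0.0, float(np.dot(normal, light_dir)))

def skin_wrap(normal, light_dir, wrap=0.5, scatter_tint=(1.0, 0.3, 0.2)):
    """Wrap-lighting approximation: light 'bleeds' past the terminator and
    the bled portion picks up a reddish flesh tint, mimicking light that
    scatters inside the skin. `wrap` and `scatter_tint` are illustrative."""
    n_dot_l = float(np.dot(normal, light_dir))
    direct = max(0.0, n_dot_l)
    wrapped = max(0.0, (n_dot_l + wrap) / (1.0 + wrap))
    bleed = wrapped - direct  # light that exists only because of scattering
    return direct * np.ones(3) + bleed * np.array(scatter_tint)

# A grazing light direction just past the terminator:
normal = np.array([0.0, 0.0, 1.0])
light = np.array([0.954, 0.0, -0.3])                # roughly unit length
print("plain diffuse :", lambert(normal, light))    # 0.0 -> hard, plastic cutoff
print("wrapped 'skin':", skin_wrap(normal, light))  # soft, slightly reddish falloff
```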

But after 10 years of focused research, plus an Academy Award in 2019 for the performance capture system named Medusa, we can safely say that we have made it. Starting with the film Maleficent and up to the most recent Hollywood films, including The Irishman, Avengers: Endgame and The Rise of Skywalker, we have clearly demonstrated that the Uncanny Valley has been crossed.

For a variety of reasons, the human facial technologies developed by us for cinematic special effects cannot be transferred straight into the realm of the Cyberworld. Let me sketch the three main reasons:

  • Model Creation: In our capture setups, we have a highly controlled stage with multiple, high-resolution cameras and controlled lighting. This is in stark contrast to most XR applications, where the digital facial model with all its features has to be created quickly with a mobile device or laptop at best.
  • Motion Capture: Once the digital face is created, we have to bring it to life. For a human avatar, the model will be ‘driven’ in realtime by the facial motion of its real counterpart (the sketch after this list illustrates the basic idea). How do we capture such motion in sufficient detail with a simple pair of glasses on our nose?
  • Realtime Graphics: The digital imagery and realistic graphics have to be created in realtime, possibly on the device itself. For film, we render our digital content offline on large compute farms with virtually unlimited resources. 
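
To make the ‘driven in realtime’ idea above concrete, here is a minimal, generic blendshape sketch in Python: a tracker supplies a small vector of expression weights per frame, and the avatar mesh is rebuilt as the neutral face plus weighted blendshape offsets. The class, shapes and toy numbers are illustrative assumptions, not the actual system described in this article.

```python
# Generic blendshape avatar: per-frame expression weights from a tracker are
# turned into a posed face mesh. Illustrative sketch, not a specific system.
import numpy as np

class BlendshapeFace:
    def __init__(self, neutral, deltas):
        # neutral: (V, 3) vertex positions of the rest face
        # deltas:  (K, V, 3) per-blendshape vertex offsets (e.g. "smile", "jaw open")
        self.neutral = neutral
        self.deltas = deltas

    def pose(self, weights):
        """Evaluate the face for one frame of tracked expression weights (K,)."""
        weights = np.clip(weights, 0.0, 1.0)  # keep expressions in a sane range
        return self.neutral + np.tensordot(weights, self.deltas, axes=1)

# Toy example: 4 vertices, 2 blendshapes ("smile", "jaw open").
neutral = np.zeros((4, 3))
deltas = np.stack([
    np.array([[0, 1, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=float),   # smile
    np.array([[0, 0, 0], [0, 0, 0], [0, -1, 0], [0, -1, 0]], dtype=float)  # jaw open
])
face = BlendshapeFace(neutral, deltas)

# Each frame, the tracker would hand us fresh weights; here we fake two frames.
for weights in (np.array([0.8, 0.0]), np.array([0.2, 0.6])):
    vertices = face.pose(weights)  # cheap enough to evaluate per frame on-device
    print(vertices[:, 1])          # show just the y-displacements
```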

Well, as noted, the great news is that it won’t take us another 10 years, and the secret is data combined with machine (deep) learning. Imagine that we build a very large collection of highly realistic, special effects-quality facial models and animations of a sufficiently large variety of people of different ethnicities, age groups, body shapes, etc. Deep learning has proven its ability to extract the most important features from such data and to learn how to recreate a highly accurate three-dimensional model of a face from a few still photos. Similarly, these frameworks can learn and understand what makes a facial performance look believable versus uncanny. And finally, the plethora of deepfakes has demonstrated that deep learning is able to recreate a realistic image of the human face without needing a complex three-dimensional model of the face and its appearance.
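As an illustration of the reconstruction idea, here is a PyTorch sketch of a common approach from the literature: a network regresses morphable-model coefficients from a single photo, and a fixed linear face basis decodes them into 3D vertex positions. The network, the basis and all dimensions are hypothetical stand-ins, not any specific published system.

```python
# Illustrative sketch: regress morphable-model coefficients from one photo,
# then decode them with a linear face basis. All sizes are hypothetical.
import torch
import torch.nn as nn

NUM_COEFFS = 80      # identity + expression coefficients (illustrative)
NUM_VERTICES = 5000  # vertices in the face mesh (illustrative)

class CoeffRegressor(nn.Module):
    """CNN that maps a face photo to morphable-model coefficients."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, NUM_COEFFS)

    def forward(self, image):
        x = self.features(image).flatten(1)
        return self.head(x)

# A linear "3D morphable model": mean face plus a statistical basis.
mean_face = torch.zeros(NUM_VERTICES * 3)
basis = torch.randn(NUM_COEFFS, NUM_VERTICES * 3) * 0.01  # stand-in for a real basis

def coeffs_to_mesh(coeffs):
    """Decode coefficients into (batch, V, 3) vertex positions."""
    verts = mean_face + coeffs @ basis
    return verts.view(-1, NUM_VERTICES, 3)

# One forward pass on a dummy photo; training would compare renderings or
# landmarks of this mesh against the input image and backpropagate.
model = CoeffRegressor()
photo = torch.rand(1, 3, 128, 128)
mesh = coeffs_to_mesh(model(photo))
print(mesh.shape)  # torch.Size([1, 5000, 3])
```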

Let me give you a few examples:

· Startup company Soul Machines focuses on AI-driven, realistic digital human avatars.

· Our former student and USC professor Hao Li’s startup company Pinscreen enables the rapid creation of 3D digital avatars of oneself.

· Samsung’s NEON project aims at artificial humans for interactive applications.

· Doug Roble, a well-known special effects expert, has given a very good TED presentation on the topic.

Much more can be discovered by searching online for “digital humans”. Although I have not yet talked here about the rest of the human body, some brilliant research aims at 3D full-body capture; for instance, from Michael Black and Christian Theobalt, both at Max Planck Institutes.

All this makes me very confident that realtime, believable digital human technology is close to its primetime and will have matured by the time the Cyberworld kicks off.

I just can’t wait to build my own avatar.


https://www.youtube.com/watch?v=S9hgvgDlilU&t=10s
