VCAI: VCAI-ASSETS

UMA: Ultra-detailed Human Avatars via Multi-level Surface Alignment

UMA introduces a new dataset featuring multi-view 6K video recordings, capturing subjects wearing clothing with challenging texture patterns and rich dynamics. The fidelity of the reconstructed avatars makes them particularly suitable for virtual and mixed reality, where users can closely observe fine-grained appearance details.

Dataset Details

Relightable Holoported Characters: Capturing and Relighting Dynamic Human Performance from Sparse Views

We present Relightable Holoported Characters (RHC), a novel person-specific method for free-view rendering and relighting of full-body and highly dynamic humans solely observed from sparse-view RGB videos at inference...

Dataset Details

OLATverse: A Large-scale Real-world Object Dataset with Precise Lighting Control

We introduce OLATverse, a large-scale real-world dataset comprising over 9M images of 765 objects, captured from multiple viewpoints under a diverse set of precisely controlled lighting conditions. While recent advances in object-centric inverse rendering, novel view synthesis and relighting have demonstrated promising results, most...

Dataset Details

EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars

We first present a character model that is animatable, i.e., can be solely driven by skeletal motion, while being capable of modeling geometry and appearance. Then, we introduce a personalized egocentric motion capture component, which recovers full-body motion from an egocentric video...

Dataset Details

Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures

We propose Double Unprojected Textures (DUT), a new method to synthesize photoreal 4K novel-view renderings in real-time. Our method consistently beats baseline approaches in terms of rendering quality and inference speed. Moreover, it generalizes to both in-distribution (IND) motions, e.g., dancing, and out-of-distribution (OOD) motions, e.g., standing long jump...

Dataset Details

TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis

Creating controllable, photorealistic, and geometrically detailed digital doubles of real humans solely from video data is a key challenge in Computer Graphics and Vision, especially when real-time performance is required. Recent methods attach a neural radiance field (NeRF) to an articulated structure, e.g., a body model...

Dataset Details

3DPR: Single Image 3D Portrait Relighting with Generative Priors

Rendering novel, relit views of a human head, given a monocular portrait image as input, is an inherently underconstrained problem. The traditional graphics solution is to explicitly decompose the input image into geometry, material and lighting via differentiable rendering...

Dataset Details

HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis

Simultaneous relighting and novel-view rendering of digital human representations is an important yet challenging task with numerous applications. We introduce the HumanOLAT dataset, the first publicly accessible large-scale dataset providing multi-view One-Light-at-A-Time (OLAT) captures of full-body humans...

Dataset Details

Relightable Neural Actor with Intrinsic Decomposition and Pose Control

Creating a controllable and relightable digital avatar from multi-view video with fixed illumination is a very challenging problem since humans are highly articulated, creating pose-dependent appearance effects...

Dataset Details

MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering

Faithful human performance capture and free-view rendering from sparse RGB observations is a long-standing problem in Vision and Graphics. The main challenges are the lack of observations and the inherent ambiguities of the setting...

Dataset Details

ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars...

Software Details

Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras

We present the first approach to render highly realistic free-viewpoint videos of a human actor in general apparel, from sparse multi-view recording to display, in real-time at an unprecedented 4K resolution...

Software Details

Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control

We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation...

Software Details

NeRF-OSR: Neural Radiance Fields for Outdoor Scene Relighting

Photorealistic editing of outdoor scenes from photographs requires a profound understanding of the image formation process and an accurate estimation of the scene geometry, reflectance and illumination...

Software Details

EventHands: Real-Time Neural 3D Hand Reconstruction from an Event Stream

3D hand pose estimation from monocular videos is a long-standing and challenging problem, which is now seeing a strong upturn. In this work, we address it for the first time using a single event camera...

Software Details

PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time

Marker-less 3D human motion capture from a single colour camera has seen significant progress. However, captured 3D poses are often physically incorrect and biomechanically implausible...

Software Details

Phi-SfT: Shape-from-Template with a Physics-Based Deformation Model

Shape-from-Template (SfT) methods estimate 3D surface deformations from a single monocular RGB camera while assuming a 3D state known in advance (a template)...

Software Details

DeepCap: Monocular Human Performance Capture Using Weak Supervision

Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups...

Dataset Details

Real-time Deep Dynamic Characters

We propose a deep videorealistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance learned in a new weakly supervised way from multi-view imagery...

Dataset Details

i3DMM: Deep Implicit 3D Morphable Model of Human Heads

We present the first deep implicit 3D morphable model (i3DMM) of full heads. Unlike earlier morphable face models it not only captures identity-specific geometry, texture, and expressions of the frontal face...

Software Details

LiveCap: Real-time Human Performance Capture from Monocular Video

We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video...

Dataset Details

NRSfM: Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints

We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks...

Software Details

HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization

3D hand reconstruction from images is a widely-studied problem in computer vision and graphics, and has a particularly high relevance for virtual and augmented reality...

Software Details

XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes which may contain occlusions by objects and by other people...

Software Details

IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction

The majority of the existing methods for non-rigid 3D surface regression from monocular 2D images require an object template or point tracks over multiple frames as an input...

Software Details

DispVoxNets: Non-Rigid Point Set Alignment with Supervised Learning Proxies

A new kind of supervised-learning framework for non-rigid point set alignment, Displacements on Voxels Networks (DispVoxNets), which abstracts away from the point set representation...

Software Details

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

The first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network...

Software Details

GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB

A real-time 3D hand tracking method based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data...

Software Details