Pseudo-Generalized Dynamic View Synthesis from a Video

(Originally titled "Is Generalized Dynamic Novel View Synthesis from Monocular Videos Possible Today?")

ICLR 2024


Xiaoming Zhao1,2,    Alex Colburn1,    Fangchang Ma1,    Miguel Ángel Bautista1,    Joshua M. Susskind1,    Alexander G. Schwing1


1Apple   
2University of Illinois Urbana-Champaign



Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes, the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep-net forward pass on a test scene. In contrast, for dynamic scenes, scene-specific optimization techniques exist, but, to the best of our knowledge, there is currently no generalized method for dynamic novel view synthesis from a given monocular video. This raises a natural question:

"Is generalized dynamic novel view synthesis from monocular videos possible today?"

To answer this question, we establish an analysis framework based on existing techniques and work toward the generalized approach. We find

"A pseudo-generalized approach, i.e., no scene-specific appearance optimization, is possible, but geometrically and temporally consistent depth estimates are needed."

To clarify:

  1. We use the word pseudo because of the required scene-specific consistent depth optimization, which is already utilized in many scene-specific approaches and can be replaced with depth from physical sensors, e.g., an iPhone LiDAR;
  2. We call it generalized because no costly scene-specific appearance fitting is needed.

Despite no scene-specific appearance optimization, the pseudo-generalized approach improves upon some scene-specific methods.



[NVIDIA Dynamic Scenes] Videos from Frames for Quantitative Evaluations

In the following, we present the frames used for quantitative evaluation.
Please click each frame to see the corresponding videos.


[DyCheck iPhone] Videos from Frames for Quantitative Evaluations

In the following, we present the frames used for quantitative evaluation.
Please click each frame to see the corresponding videos.


[NVIDIA Dynamic Scenes] Spatio-temporal Interpolation


(ZERO scene-specific appearance optimization on these scenes)

On the left, we visualize the rendering camera trajectories: (1) green cameras mark the source views; (2) grey cameras trace the rendering camera trajectory; (3) the red camera is the current camera corresponding to the rendering on the right.

Although these trajectories are reasonably far from the input source views, our pseudo-generalized approach still performs decently.


The following videos present the spatio-temporal interpolation renderings for other scenes.


[DAVIS] Spatio-temporal Interpolation


(ZERO scene-specific appearance optimization on these scenes)

The following videos present the spatio-temporal interpolation renderings for DAVIS.



Bibtex

						
			@inproceedings{zhao2024pgdvs,
				title = {{Pseudo-Generalized Dynamic View Synthesis from a Video}},
				author = {Xiaoming Zhao
					and Alex Colburn
					and Fangchang Ma
					and Miguel Ángel Bautista
					and Joshua M. Susskind
					and Alexander G. Schwing},
				booktitle = {ICLR},
				year = {2024},
			}
						
					

Acknowledgements

Work done as part of Xiaoming Zhao's internship at Apple.
We thank Zhengqi Li for fruitful discussions and providing rendered images from DVS and NSFF.