VOID – Video Object and Interaction Deletion

🌐 Project Page | 💻 GitHub

Upload a video and its quadmask, enter a prompt describing the scene after removal, and VOID will erase the object along with its physical interactions.

VOID is built on CogVideoX-Fun-V1.5-5B, fine-tuned for interaction-aware video inpainting.

Quadmask format

The quadmask is a grayscale video where each pixel value encodes what role that region plays:

Pixel value        Meaning
0 (black)          Primary object to remove
63 (dark grey)     Overlap of primary object and affected zone
127 (mid grey)     Affected region: shadows, reflections, new and old trajectories
255 (white)        Background: keep as-is
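
To make the encoding concrete, here is a minimal sketch of how one quadmask frame could be composed from two binary masks using NumPy. It is not the VLM-Mask-Reasoner pipeline and is not taken from the VOID repo: the function name `compose_quadmask_frame` and the inputs `object_mask` and `affected_mask` are hypothetical, and only the four pixel values come from the table above.

```python
import numpy as np

# Pixel values from the quadmask table.
PRIMARY, OVERLAP, AFFECTED, BACKGROUND = 0, 63, 127, 255


def compose_quadmask_frame(object_mask, affected_mask):
    """Build one grayscale quadmask frame from two boolean masks.

    Hypothetical helper for illustration only:
    object_mask   -- True where the primary object to remove is.
    affected_mask -- True where shadows, reflections, or trajectories are.
    """
    object_mask = np.asarray(object_mask, dtype=bool)
    affected_mask = np.asarray(affected_mask, dtype=bool)
    if object_mask.shape != affected_mask.shape:
        raise ValueError("object_mask and affected_mask must have the same shape")

    # Start from background, then layer the three remaining roles on top.
    frame = np.full(object_mask.shape, BACKGROUND, dtype=np.uint8)
    frame[affected_mask] = AFFECTED
    frame[object_mask] = PRIMARY
    frame[object_mask & affected_mask] = OVERLAP
    return frame


# Tiny 1x4 example: object only, overlap, affected only, background.
obj = np.array([[True, True, False, False]])
aff = np.array([[False, True, True, False]])
print(compose_quadmask_frame(obj, aff).tolist())  # [[0, 63, 127, 255]]
```

A full quadmask would repeat this per frame and encode the frames as a grayscale video; that step is left out here because the expected container and codec are not specified in this section.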

Use the VLM-Mask-Reasoner pipeline included in the repo to generate quadmasks automatically.
