Abstract: We investigate the problem of weakly-supervised video grounding, where only video-level sentences are provided. This is a challenging task, and previous Multi-Instance Learning (MIL) based image ...