End-To-End Generative Pretraining For Multimodal Video Captioning

The pretraining objective transfers effectively to multimodal video captioning and outperforms the state of the art by a margin.
