
New Algorithm Follows Human Intuition to Make Visual Captioning More Grounded

Annotating and labeling datasets for machine learning is an expensive, time-consuming process for computer vision and natural language scientists. A new deep learning approach addresses this by decoding, localizing, and reconstructing image and video captions in seconds, making machine-generated captions more reliable and trustworthy.

To solve this problem, researchers at the Machine Learning Center at Georgia Tech (ML@GT) and Facebook have created the first cyclical algorithm that can be applied to visual captioning models. During training, the algorithm runs a three-step process (decode, localize, reconstruct) that makes a captioning model more visually grounded. It requires no human annotations and adds no computation when the model is deployed, saving researchers time and money on their datasets.

