The goal of computer vision is to interpret complex visual scenes by recognizing objects and understanding their spatial arrangement within the scene. Achieving this involves learning categories from annotated training images. In the current paradigm, each category is learned from scratch, without any previous knowledge. This contrasts with how humans learn: they accumulate knowledge about visual concepts and reuse it to help learn new ones.
The goal of this project is to develop a new paradigm where computers learn visual concepts on top of what they already know, as opposed to learning every concept from scratch. We propose to progressively learn a vast body of visual knowledge, coined Visual Culture, from a variety of available datasets. We will acquire models of the appearance and shape of categories in general, models of specific categories, and models of their spatial organization into scenes. We will start learning from datasets with a high degree of supervision and then gradually move to datasets with lower degrees of supervision. At each stage we will employ the current body of knowledge to support learning with less supervision. After acquiring Visual Culture from existing datasets, the machine will be ready to learn further with little or no supervision, for example from the Internet. Visual Culture is related to ideas in other fields, but no similar endeavor has yet been undertaken in computer vision.
This project will make an important step toward mastering the complexity of the visual world by advancing the state of the art in both the number of categories that can be localized and the variability covered by each model. Moreover, Visual Culture is more than a mere collection of isolated categories: it is a web of object, background, and scene models connected by spatial relations and sharing visual properties. This will bring us closer to image understanding, the automatic interpretation of complex novel images.