Last update: 2011/08/21
Searching for information out of an image
Users' growing dependence on their devices, mobile devices in particular, has created a strong drive, supported by the devices' increasing processing capability, to pack ever more functionality into them. In some cases the functionality is confined entirely within the device; in others, the ability to interact with other devices or services depends on an agreed interface.
Users can already send the audio signature of a song captured over the air to a service and receive back all sorts of information about the song. This can be implemented in a standard fashion using the MPEG-7 Audio Descriptors.
Another important example is “Visual Search”, which is best illustrated by the following use case. A user who wants to know more about an object takes a picture of it and sends it to a service with an appropriate knowledge base. The service interprets the object and may respond with additional information elements from a variety of viewpoints.
Here, too, the MPEG-7 standard already provides a number of descriptors for video, audio and multimedia that are useful for this task. However, more is needed to solve the challenging problem of letting users obtain the desired information from either mobile or non-mobile devices: the device must take a picture, extract sophisticated descriptors from it without depleting the battery, and send the data to the service provider of choice without clogging the network.
MPEG has recently issued a Call for Proposals (CfP) on “Compact Descriptors for Visual Search”. The list below provides a set of requirements for the standard: