Riding the Media Bits

Last update: 2011/08/21

Machines understand the world for us

 

Searching for information out of an image


Users' growing dependence on their devices, mobile devices in particular, has started an unstoppable drive, supported by the devices' increasing processing capability, to pack more functionality into them. In some cases a function is entirely confined within the device, but in other cases it requires interaction with other devices or services, and that interaction depends on an agreed interface.

Users can already send the audio signature of a song captured from the air to a service and receive back all sorts of information about the song. This can already be implemented in a standard fashion using the MPEG-7 Audio Descriptors.

Another important example is “Visual Search”, which is best illustrated by the following use case. A user who wants to know more about an object takes a picture of it and sends it to a service that has an appropriate knowledge base. The service interprets the object and possibly responds with a number of additional information elements from a variety of viewpoints.

Here, too, the MPEG-7 standard already provides a number of descriptors for video, audio and multimedia that are useful for this task. However, more is needed to solve the challenging problem at hand: users, on mobile and non-mobile devices alike, should be able to take a picture, extract sophisticated descriptors from it without depleting the battery, and send the data to the service provider of their choice without clogging the network.
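
To make the idea of sending a descriptor instead of the picture concrete, here is a minimal sketch in Python. It does not use any MPEG-7 descriptor: as a stand-in it computes a simple "average hash" (one bit per cell of a coarse grid), and the function name `average_hash`, the 8x8 grid and the synthetic test image are all assumptions made for illustration only.

```python
def average_hash(pixels, grid=8):
    """Return a grid*grid-bit signature of a grayscale image as bytes.

    `pixels` is a list of rows of 0-255 intensities. In this simplified
    sketch the height and width must both be multiples of `grid`.
    """
    height, width = len(pixels), len(pixels[0])
    cell_h, cell_w = height // grid, width // grid

    # Downsample: average the intensities inside each grid cell.
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            total = 0
            for y in range(gy * cell_h, (gy + 1) * cell_h):
                for x in range(gx * cell_w, (gx + 1) * cell_w):
                    total += pixels[y][x]
            cells.append(total / (cell_h * cell_w))

    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits.to_bytes((grid * grid + 7) // 8, "big")


# Synthetic 64x64 test image (an assumption): a horizontal brightness ramp.
image = [[x * 4 for x in range(64)] for _ in range(64)]
signature = average_hash(image)
raw_size = len(image) * len(image[0])  # one byte per pixel, uncompressed
print(len(signature), "bytes sent instead of", raw_size)
```

A real descriptor would have to be far more discriminative than this toy signature, but the trade-off is the same: a little computation on the device in exchange for a much smaller upload.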

MPEG has recently issued a Call for Proposals (CfP) on “Compact Descriptors for Visual Search”. The list below provides a set of requirements for the standard:

  1. Self-contained (no other data necessary for matching)
  2. Independent of the image format
  3. High matching accuracy at least for special types of image (textured rigid objects, landmarks, and printed documents), and robustness to changes (vantage point, camera parameters, lighting conditions and partial occlusions)
  4. Minimal length/size
  5. Adaptation of descriptor lengths for the target performance level and database size
  6. Ability to support web-scale visual search applications and databases
  7. Extraction/matching with low memory and computation complexity
  8. Visual search algorithms that identify and localise matching regions of the query image and the database image, and provide an estimate of a geometric transformation between matching regions of the query image and the database image.
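
The last requirement can be illustrated with a small Python sketch. It is not the method of any proposal submitted in response to the call: it assumes hypothetical 16-bit binary descriptors attached to image positions, matches them by Hamming distance, and estimates the simplest possible geometric transformation, a pure translation, as the median displacement of the matched points. The names `match_features` and `estimate_translation`, the threshold `max_distance` and all the sample data are invented for the example.

```python
from statistics import median


def hamming(a, b):
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def match_features(query, reference, max_distance=3):
    """Pair each query feature with its nearest reference feature.

    A feature is ((x, y), descriptor_bits). A pair is kept only when the
    Hamming distance is at most `max_distance` (a hypothetical threshold).
    """
    pairs = []
    for q_pos, q_desc in query:
        r_pos, r_desc = min(reference, key=lambda f: hamming(q_desc, f[1]))
        if hamming(q_desc, r_desc) <= max_distance:
            pairs.append((q_pos, r_pos))
    return pairs


def estimate_translation(pairs):
    """Median displacement from query positions to reference positions."""
    dx = median(r[0] - q[0] for q, r in pairs)
    dy = median(r[1] - q[1] for q, r in pairs)
    return dx, dy


# Database image: four features with made-up 16-bit descriptors.
reference = [
    ((10, 10), 0b1010101010101010),
    ((40, 12), 0b1111000011110000),
    ((25, 30), 0b0000111111110000),
    ((50, 45), 0b1100110000110011),
]

# Query image: three of the features shifted by (+5, +3), two of them with
# one corrupted bit, plus one clutter feature that has no counterpart.
query = [
    ((15, 13), 0b1010101010101011),
    ((45, 15), 0b1111000011110000),
    ((30, 33), 0b0000111111010000),
    ((5, 60), 0b0101010101010101),
]

pairs = match_features(query, reference)
print(len(pairs), "matched regions, translation", estimate_translation(pairs))
```

The matched positions localise the common region in both images, and the clutter feature is rejected because no database descriptor is close enough to it. A practical system would estimate a richer transformation, covering for instance rotation, scale and perspective, and would need a more robust way of rejecting wrong matches.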