Introducing a unified framework for content object description
Xavier Le Bourdon
JCP-Consult,
9, mail de Bourgchevreuil,
35510 Cesson-Sévigné, France
E-mail: [email protected]
Vincenzo Croce
Engineering Ingegneria Informatica S.p.a.,
Viale Regione Siciliana Nord Ovest, 7275,
90146, Palermo, Italy
E-mail: [email protected]
Thomas Steiner
Google Germany GmbH,
ABC-Str. 19,
20354 Hamburg, Germany
E-mail: [email protected]
Sabine Spiller
EasternGraphics GmbH,
Albert-Einstein-Str. 1,
98693 Ilmenau, Germany
E-mail: [email protected]
Steeve Morin received his Master's degree from EPITECH, European Institute
for Technology, in 2007. He worked as an R&D Software Engineer at Exalead,
developing the main multimedia components of the Exalead platform (Voxalead).
He then joined Milestonelab to develop a multimedia platform. He is now
the founder of Mooncoral, a company specialised in online video processing
and in the challenge of managing large volumes of data.
Amar-Djalil Mezaour received his PhD from the University of Paris Sud
(Paris XI), in 2005 and is the Research and Development Project Manager
at Exalead. He was previously a Research Assistant at LRI in the IASI
(Artificial Intelligence) team and a member of the INRIA GEMO project.
Within the research and innovation group of Exalead, he worked on integrating
advanced semantic technologies for information extraction in Exalead software.
1 Introduction
Multimedia content available over the internet is increasing at a faster pace
than the corresponding increase in computational power. In 2006, digital content produced by
either professional or amateur users reached 161 exabytes, and it was expected
to reach 998 exabytes by 2010, a roughly six-fold
increase (IDC Consultancy Report, 2007). The amount of digital information produced
in 2011 was forecast to be almost 10 times that produced in 2006 (IDC Consultancy
Report, 2007). Due to the widespread availability of digital recording devices, improved
modelling tools, advanced scanning mechanisms as well as display and rendering
devices, even over mobile environments, users are increasingly empowered to live an
immersive and unforgettable experience with latest-generation digital media. This growth
in the popularity of media is not accompanied by a similarly rapid development of media search
technologies. The most popular media services on the web are typically limited to textual
search. However, over the last years, significant efforts have been devoted, mainly by the
European research community, to achieving content-based search of images, videos and
3D models.
In order for this ever growing media content to be easily searched and retrieved by
next generation content-based search engines, a framework for describing media in a
standard format to ensure interoperability is required. Towards this direction, the
MPEG-7 standard (Martinez et al., 2002) offers a comprehensive set of multimedia
description tools, which can be used by applications that enable quality access to content.
MPEG-7 gives a generic framework that can support various applications, facilitating
exchange and reuse of multimedia content across different application domains.
Similarly, JPSearch (Dufaux et al., 2007) aims to provide a standard for interoperability
for image (JPEG, JPEG2000) search and retrieval systems. More specifically, the goal of
JPSearch is to define the interfaces and protocols for data exchange between devices and
systems.
Currently, media content is often the result of an off-line, cumbersome and lengthy
creation process. This media is delivered to end-users for consumption as a finalised
complete media presentation in the form of bit streams, followed by a play-out at the
end-user’s device. In the context of the future internet (FI), on the other hand, the concept
of content objects (COs) (Zahariadis et al., 2010) has been introduced to describe rich
media experiences created as just-in-time composition of content that is easily located,
synchronised, reused and composed. The availability of the constituent COs and their
spatial and temporal relationships, rather than an opaque stream of pixels and audio
samples, opens up new opportunities for content creation and consumption. In this new
era, where the volume and quality of transferred content rise sharply and more users
evolve from mere consumers to active creators, new approaches for describing, searching
and retrieving this rich media content are required.
With the objective of addressing the increasing demands of the FI, the EU-funded project
I-SEARCH (Axenopoulos et al., 2010) aims to provide a novel unified framework for
multimodal content indexing, search and retrieval. The I-SEARCH framework will be
able to handle specific types of multimedia and multimodal content (text, 2D image,
sketch, video, 3D objects and audio) along with real-world and user-related information,
any of which can be used as a query to retrieve relevant available content of any of the
aforementioned types. The search engine that I-SEARCH proposes will be highly
user-centric in the sense that only the content of interest will be delivered to the
end-users, satisfying their information needs and preferences. Being able to deal with all
the aforementioned types of media, dynamic content, real-world and user-related
information, this novel framework fits naturally with the concept of COs as described in
Zahariadis et al. (2010).
Multimodal search and retrieval has already been addressed by numerous commercial
applications. Several ‘closed’ industry standards are available today, mainly for mobile
devices like Apple’s iPhone or smart phones based on Google’s Android operating
system. Due to the small hard- or software keyboards on mobile devices, alternative input
types are desirable. In Android devices, a ‘search by voice recognition’ functionality is
available (sound is transformed into a textual query). The same functionality is available
in several applications for iPhone, e.g., Bing’s, Yahoo!’s, and Google’s own search
applications. Google Goggles goes one step further by allowing for images and GPS
location to serve as search queries. This can be used to search for, e.g., text contained in
images, landmarks or attractions in images, or products. The returned search results
are of multimodal form (map results, image results, video results, and obviously text
results). While none of these standards is publicly available outside the scope of the
particular applications, I-SEARCH proposes an open solution that
goes beyond what is commercially available today.
In this paper, the novel framework for description of rich media content that is
introduced by I-SEARCH is described in detail. The rich unified content description
(RUCoD) consists of a multi-layered structure, which will integrate intrinsic properties of
the content, dynamic properties, non-verbal expressive, emotional and real-world
descriptors. RUCoD will serve as a formal representation of COs, which will be also
clearly defined in the paper. The relations of RUCoD with other well-known standards
for multimedia description, such as MPEG-7, will also be presented. Special focus will be placed
on the application of RUCoD to the three use cases of the I-SEARCH project,
namely: search for music content, furniture model retrieval and 3D object/avatar retrieval
for games.
2 Content object
The concept of COs has been introduced in Zahariadis et al. (2010). More specifically,
the following definition was given:
“A Content Object is a polymorphic/holistic container, which may consist of
media, rules, behaviour, relations and characteristics or any combination of
the above.”
The definition given above is rather generic. In order to obtain a more concrete idea of a
CO, the following definition has been developed in the context of the I-SEARCH project
and is presented for the first time in this paper. This definition was inspired by the
I-SEARCH requirements, in order to address a broad range of applications related to
multimodal search and retrieval as well as multimodal interaction.
“A Content Object is the representation of a specific instance of either a
physical object or a physical entity (an entity that has physical existence, e.g.
an earthquake) or an abstraction (a general concept formed by extracting
common features from specific examples), an event or a concept, which might
have multiple views (many images, videos, audio files, text, real-world and
user-related information).”
According to the definition above, it can be inferred that an integral part of the CO is
multimedia. A CO cannot exist without media items inside it. A CO can
span from very simple media items (e.g., a single image or an audio file) to highly
complex multimedia collections (e.g., a 3D object together with multiple 2D images and
audio files) along with accompanying information. When a user refers to a CO, s/he
directly refers to all of its constituting parts.
From a FI perspective, the adoption of COs is expected to revolutionise access to
digital content, since instead of sharing, searching and retrieving single media items,
novel FI architectures can be appropriately designed to support exchange of COs. A
significant step towards this goal has been made through multimodal search and retrieval,
which is addressed by the I-SEARCH project.
In Yang et al. (2009), the concept of the multimedia document (MMD) is introduced, i.e., a
set of multimedia objects of different modalities that carry the same semantics. If two
multimedia objects are in the same MMD, they can be regarded as context of each other.
Another approach that addresses the problem of multimodal search is presented in
Zhang and Weng (2006), where both intra- and inter-media correlations are learnt among
multi-modality feature spaces in order to construct a semantic subspace containing
multimedia objects of different modalities. Here, the concept of Multimedia Bag is
introduced, which defines a container including text instances, image instances and audio
instances that share the same semantic concepts.
Both of the abovementioned methods define a novel rich multimedia representation
(either MMD or Multimedia Bag), a container that integrates multiple modalities of the
same semantics into one single object that can be searched and retrieved as a whole.
Using the terminology of this paper, both MMD and Multimedia Bag provide good
approximations of the CO, since they represent rich collections of multimedia with the
same semantics. Building on existing state-of-the-art methods in multimodal search, the
I-SEARCH project aims to provide a more formal definition of the CO as well as a novel
framework to support efficient search and retrieval of COs.
Based on the former definition, a CO may consist of media, rules, behaviour, relations,
characteristics or any combination of the above. In the sequel, we identify how each of
these features is addressed by the CO definition introduced in this paper:
• Media: It is the digital representation of anything that a human can
perceive/experience with his/her senses and can be captured (through a specific
device, such as camera, microphone, etc.) or created (using an authoring tool). As
previously mentioned, multimedia is an integral part of the CO. What should be
highlighted here is the multimodal nature of COs. To be more specific, COs need not
consist of a single media item; they can be highly complex multimedia collections
along with accompanying information. This collection will be searched and
retrieved as a whole, irrespective of the media type that is used as query, even in
cases where the query modality is absent from the CO.
• Rules: Can refer to the way an object is treated and manipulated by other objects or
the environment (discovered, retrieved, cast, adapted, delivered, transformed and
presented). In this paper, a novel framework for unified CO description is introduced
(to be analysed in the following sections). This description scheme provides the rules
on how these COs will be searched, retrieved, adapted, delivered and presented,
using the search, retrieval and adaptive presentation framework provided by
I-SEARCH.
• Behaviour: Can refer to the way the object affects other objects or the environment.
Currently, behaviour is not explicitly supported in the RUCoD format. However, it is
implicitly present in functions such as relevance feedback, where the COs selected as
positive or negative examples can affect the ranking position of other COs. In this case,
user behaviour (which can be modelled as a RUCoD query with U-Descriptors) can
affect the ranking of retrieved COs with respect to their user-related parts.
• Relations: Refer to relations between a CO and other COs. In I-SEARCH relations
are addressed in two ways. Firstly, relations between the different media items,
which are constituting parts of the same CO, are directly established, since they are
included within the same CO description file. Secondly, each CO may contain links
to other COs that are somehow related to it.
Figure 1 A conceptual diagram of an authoring tool for COs (see online version
for colours)
The Authoring Tool will take as input all different types of media items,
real-world information (location, weather, time, etc.) and user-related information
(emotional/expressive characteristics) and will produce a rich media representation, a
CO. The formal description of a CO is its RUCoD file, which is an XML-based
document specifying descriptors for all the above input types.
Through an appropriate user interface of the Authoring Tool, the user will be able to
manually add all related information to create the CO. A special functionality of the tool
is that it will assist users in easily adding links to other relevant COs. Links serve as the
representation of relations with other COs: they capture the intention of the user to relate
other COs to the one being defined, and they express the relations among the underlying
concepts. With an appropriate search interface, the user will be able to search for similar
COs and create relations among them.
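For illustration only, such a link might be recorded in the CO description along the following lines. The Links part is introduced in the next section; the child element name, the attribute and the identifier are hypothetical placeholders rather than the normative RUCoD syntax:

  <Links>
    <!-- IDs of COs related to the one being authored -->
    <LinkedCO id="CO-0042"/>  <!-- a related CO selected by the user through the search interface -->
  </Links>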
3 RUCoD specification
In this section, the specification of RUCoD is introduced. RUCoD will serve as a generic
multimedia content descriptor, enhanced with real-world information, expressive and
emotional descriptions, in order to facilitate the retrieval of different types of media
irrespective of the query format.
The goal is to enable the development of very heterogeneous applications, ranging
from pure search and retrieval to personalised, user interaction specific and/or
context-aware search. Therefore, this unified approach is proposed, where the actual
metadata and the real-world and user-related parts reside in the same format.
The general form of the RUCoD structure is given in Figure 2:
Figure 2 The RUCoD general format (see online version for colours)
Figure 3 The overall structure of a RUCoD file (see online version for colours)
In the example of Figure 3, the RUCoD description corresponds to the CO entitled ‘My
Barking Bulldog’. RUCoD consists of the following main parts (an illustrative sketch of
the overall structure is given after the list):
• Header: Includes general information about the CO, such as the type, name, ID and
creation information. Moreover, the RUCoD header encloses some general
information about the different media (3D, images, sounds, videos, text) and
accompanying information (real world data, user-related cues) that constitute the
CO.
• Description: It is the core part of the RUCoD including detailed information about
the corresponding media and contextual information (real world, user-related). It
consists of:
a the L-Descriptors part, where the low-level descriptors, extracted from each
separate media (3D, images, sounds, videos, text), are presented
b the R-Descriptors part, which maintains descriptors extracted from real-world
sensors, representing time, weather, location, etc.
Figure 4 Conceptual model presenting CO and the relations with its constituting elements
(see online version for colours)
c MediaLocator: The URI of the location where the specific media is stored,
together with the optional location of the media item to be used as preview.
d MediaCreationInformation: Information about the creator of the media (if it is
different from the CO creator).
• RealWorldInfo:
a RWContextSliceName: A unique name within the RUCoD representing the
specific ContextSlice.
b RWContextTypes: A list of R-Descriptors for a particular ContextSlice that are
used to describe the context of the media in this RUCoD record.
c RWDescriptorsFormat: A list of formats that is used to describe each
Real-World sensor descriptor for a particular ContextSlice.
• UserInfo: The emotional state associated with the content object, as specified by the
author of the Content Object, stored as a fragment of EmotionML.
• Links: The IDs of COs that are linked to the current CO (e.g., one linked CO could
be the one representing the house of the ‘bulldog’ CO).
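Since Figures 2 and 3 are not reproduced here, the following well-formed sketch indicates how the parts listed above could fit together for the ‘My Barking Bulldog’ example. Only the part and field names that appear in the text (Header, MultimediaContent, MediaName, MediaLocator, RealWorldInfo, UserInfo, Links and the L-, R- and U-Descriptor parts of the Description) are taken from the specification; the exact nesting, the attributes and all values are illustrative assumptions, not the normative RUCoD schema:

  <RUCoD>
    <Header>
      <!-- general information about the CO: type, name, ID and creation information -->
      <ContentObjectName>My Barking Bulldog</ContentObjectName>   <!-- element name assumed -->
      <ContentObjectID>CO-0001</ContentObjectID>                  <!-- placeholder identifier -->
      <MultimediaContent type="Object3D">                         <!-- one entry per media item -->
        <MediaName>bulldog_3d</MediaName>
        <MediaLocator>http://example.org/media/bulldog.obj</MediaLocator>  <!-- placeholder URI -->
      </MultimediaContent>
      <RealWorldInfo/>   <!-- ContextSlice declarations, see Section 3.3 -->
      <UserInfo/>        <!-- EmotionML fragment, see Section 3.4 -->
      <Links/>           <!-- IDs of related COs -->
    </Header>
    <Description>
      <L_Descriptor/>    <!-- low-level descriptors per media item -->
      <R_Descriptor/>    <!-- real-world sensor descriptors grouped in ContextSlices -->
      <U_Descriptor/>    <!-- user-related, emotional and expressive descriptors -->
    </Description>
  </RUCoD>

The empty elements of the Description part are expanded in the corresponding subsections below.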
3.2 L-Descriptor
A snapshot of a part of the RUCoD L-Descriptor is given in Figure 6. In the example, the
low-level descriptors of one of the 3D objects of the ‘My Barking Bulldog’ CO are
specified (an illustrative fragment is also sketched after the figure). L_Descriptor may
include the following fields:
• MediaName: This field should be identical to the MediaName field of the same
media object defined in the RUCoD header. It is essential in order to map the
original media file with the corresponding low-level descriptors.
• Shape3DDescription: It is a container, enclosing descriptors of a specific 3D object.
Here, the type of the low-level descriptor as well as the matching method are
defined.
• GlobalShape: Defines a set of parameters of a 3D shape descriptor, such as the
dimension of the descriptor vector, the type of the descriptor (text, numerical,
integer) and the size in bytes of each descriptor. This information is required for
parsing the descriptor file (which may be given in binary format).
• ImageDescription: It is a container, enclosing descriptors of a specific image. These
may include Edge Orientation Histogram (Eoh_32), Probability weighted histogram
(Probrgb_6_2), Laplacian weighted histogram (Laplrgb_6), HSV standard histogram
(Histohsv_std) or SIFT local descriptors (sift_desc).
• VideoDescription: Consists of a set of visual objects present in several key-frames of
the video and, for each object, a list of information on the key-frame images
containing this visual object (time code of the image, position of the visual object
inside the image).
• AudioDescription: Defines a set of parameters for the audio signal, such as its
fingerprint (for computing audio similarity), its rhythmic pattern and melodic profile.
Examples of such low-level descriptors are the MPEG-7 audio descriptors.
Figure 6 The L-Descriptor (3D descriptor) Part of RUCoD (see online version for colours)
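In place of the figure, a fragment along the following lines illustrates how the fields above could be combined for the 3D object of the example CO. The field names follow the list above; the nesting, the descriptor values and the file name are placeholders rather than the normative syntax:

  <L_Descriptor>
    <MediaName>bulldog_3d</MediaName>               <!-- matches the MediaName declared in the header -->
    <Shape3DDescription>
      <DescriptorType>GlobalShape</DescriptorType>  <!-- type of the low-level 3D descriptor -->
      <MatchingMethod>L2</MatchingMethod>           <!-- illustrative matching method -->
      <GlobalShape>
        <Dimension>256</Dimension>                  <!-- length of the descriptor vector -->
        <ValueType>float</ValueType>                <!-- text, numerical or integer -->
        <ValueSize>4</ValueSize>                    <!-- size in bytes of each descriptor value -->
        <DescriptorFile>bulldog_3d.desc</DescriptorFile>  <!-- descriptor file, possibly in binary format -->
      </GlobalShape>
    </Shape3DDescription>
  </L_Descriptor>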
3.3 R-Descriptor
The real-world descriptor is based on the context as defined by Dey and Abowd (1999),
“information that can be used to characterise the situation of an entity”, where an entity
is a CO. However, the different dimensions of the context are not always orthogonal. In
other words, semantic links can appear between two dimensions of the context. For
example, in a video, the position of the camera may change (semantic link between the
location and the time), and the weather may change for different locations (semantic link
between the weather and the location). To be able to define such a complex context, we
introduce the concept of Context Slices.
RUCoD R-Descriptors are grouped in ContextSlices where each descriptor provides
information on a particular aspect of the Context (i.e., time, position, temperature, etc.).
Each RUCoD record may contain multiple ContextSlices. Each ContextSlice is
composed of a non-empty set of R-Descriptors. A ContextSlice may refer to multiple
media and multiple ContextSlices may be used to set the context for a piece of media.
Figure 7 describes the potential relationships between ContextSlices, media
objects and R-Descriptors within a RUCoD record.
In Figure 7, ContextSlice1 describes the context for both the video and the 3D media items, while
for the video item, ContextSlice2 completes the relevant Real-World information. The
importance of ContextSlice2 is higher and it overrides any similar information that may
be provided in ContextSlice1.
For each ContextSlice, the following information shall be provided to effectively
match it with the relevant media objects and set their context (an illustrative fragment
combining these fields is given after the lists below).
• MediaNamesList: This field includes all the media items to which the specific
R-Descriptors are mapped. The names used in MediaNamesList must be the same as
the names defined in the MediaName fields of the corresponding media items,
located in the RUCoD header. It is essential in order to match the original media file
to the corresponding real-world descriptors.
• Importance: This field should have a value that is unique among the ContextSlices of
the RUCoD record, allowing conflicts between different ContextSlices referring to the
same media object to be resolved effectively (the ContextSlice with the higher
importance prevails).
Each ContextSlice should also contain one or more of the following context descriptors:
• DateTime: Defines the start date of the CO, and its length if relevant.
• SubjectPosition: Defines the location of the main subject of the CO. The format used
in this descriptor is based on the GML standard, thus it may be a simple point or a
complex shape.
• ViewerPosition: Defines the location and the view (tilt, roll…) of the viewer. The
format is based on the GML standard, thus it may also be a simple point or a
complex shape.
• Weather: Defines the weather (temperature, wind, condition…) relevant for the
media.
For each sensor (aspect of the Context), the following information is required to
effectively interpret the Real World data.
• SensorFormat: The format of the sensor information (e.g., string, integer, XML).
• SensorAccuracy: The accuracy of the specific sensor. It may be undefined if not
applicable.
• SampleUnits: The units of the measured sensor samples if applicable.
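Pulling the fields above together, two ContextSlices attached to the media of the example CO might look roughly as follows. Only MediaNamesList, Importance, DateTime, Weather and SubjectPosition are taken from the lists above; the element layout, the attributes, the GML profile and all values are illustrative assumptions. The point of the sketch is how MediaNamesList ties R-Descriptors to media items and how Importance resolves conflicts, as in the Figure 7 discussion:

  <R_Descriptor>
    <ContextSlice>
      <MediaNamesList>bulldog_video bulldog_3d</MediaNamesList>  <!-- names declared in the header -->
      <Importance>1</Importance>
      <DateTime start="2010-06-12T10:30:00Z" duration="PT2M"/>   <!-- start date and, if relevant, length -->
      <Weather temperature="21" temperatureUnit="Celsius" condition="sunny"/>
    </ContextSlice>
    <ContextSlice>
      <MediaNamesList>bulldog_video</MediaNamesList>
      <Importance>2</Importance>  <!-- higher importance: overrides the slice above for the video item -->
      <SubjectPosition>
        <gml:Point xmlns:gml="http://www.opengis.net/gml">   <!-- a simple GML point; could also be a complex shape -->
          <gml:pos>45.4642 9.1900</gml:pos>
        </gml:Point>
      </SubjectPosition>
    </ContextSlice>
  </R_Descriptor>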
3.4 U-Descriptor
U_Descriptor may include the following fields (Figure 8):
• MediaName: This field must be identical to the MediaName field of the same
media object defined in the RUCoD header. It is essential in order to map the
original media file to the corresponding user-related descriptors.
• UserDescription: Defines a set of parameters that encapsulate the emotional
information extracted from the audio or video signal. It includes descriptors that
characterise the user's emotional and expressive state (an illustrative fragment is
sketched after Figure 8).
Figure 8 The U-Descriptor part of RUCoD (see online version for colours)
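As Figure 8 is not reproduced, a U_Descriptor fragment could look roughly as follows. MediaName and UserDescription come from the list above and the EmotionML namespace is that of the W3C specification, but the wrapping layout, the chosen emotion vocabulary and the values are illustrative assumptions:

  <U_Descriptor>
    <MediaName>bulldog_audio</MediaName>   <!-- matches a MediaName declared in the header -->
    <UserDescription>
      <!-- emotional information carried as an EmotionML fragment -->
      <emotion xmlns="http://www.w3.org/2009/10/emotionml"
               dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions">
        <dimension name="valence" value="0.8"/>   <!-- position in the Valence-Arousal space -->
        <dimension name="arousal" value="0.6"/>
      </emotion>
    </UserDescription>
  </U_Descriptor>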
This section provides an overview of the three different scenarios supported by the
I-SEARCH project. This is done in order to identify the specific needs of each use case,
in terms of multimedia content description, and the subsequent modifications to their
RUCoD descriptions.
• Header: Almost all of the Header parts described in the previous section are present
in this specific use case.
a MultimediaContent will be used to point to the location of the SoundType object.
b Links will be used to point to other COs, e.g., other material (recordings,
pictures, videos) gathered by the same researcher in the same place/region.
• L_Descriptor: Three AudioDescription elements will be used to store the low-level
information needed to perform the search-by-example queries (in a first iteration
StatisticalSoundDescription, InterOnsetIntervals, FundamentalFrequency).
• R-Descriptor: Here, the following RUCoD parts are identified: DateTime and
SubjectPosition.
• U_Descriptor: In this part of the RUCoD, the platform will store the UserDescription
elements to support locating audio files sharing the same affective features expressed
by the user performing the query (for instance, the location in the Valence-Arousal
space). A hedged sketch combining these parts for a single recording is given below.
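A hedged sketch of how these parts could combine for a single field recording in this use case follows. The AudioDescription names (StatisticalSoundDescription, InterOnsetIntervals, FundamentalFrequency) and the RUCoD parts are taken from the list above; the nesting, the attributes, the identifiers and the URI are illustrative placeholders:

  <RUCoD>
    <Header>
      <ContentObjectName>Field recording 17</ContentObjectName>   <!-- placeholder name -->
      <MultimediaContent type="Sound">
        <MediaName>recording_17</MediaName>
        <MediaLocator>http://example.org/sounds/recording_17.wav</MediaLocator>  <!-- placeholder URI -->
      </MultimediaContent>
      <Links>
        <LinkedCO id="CO-0102"/>  <!-- other material gathered by the same researcher in the same region -->
      </Links>
    </Header>
    <Description>
      <L_Descriptor>
        <MediaName>recording_17</MediaName>
        <AudioDescription type="StatisticalSoundDescription"/>
        <AudioDescription type="InterOnsetIntervals"/>
        <AudioDescription type="FundamentalFrequency"/>
      </L_Descriptor>
      <R_Descriptor>
        <ContextSlice>
          <MediaNamesList>recording_17</MediaNamesList>
          <Importance>1</Importance>
          <DateTime start="2011-05-03T16:00:00Z"/>
          <SubjectPosition/>   <!-- GML point of the recording location -->
        </ContextSlice>
      </R_Descriptor>
      <U_Descriptor>
        <MediaName>recording_17</MediaName>
        <UserDescription/>     <!-- e.g., the position in the Valence-Arousal space, as EmotionML -->
      </U_Descriptor>
    </Description>
  </RUCoD>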
Figure 9 Snapshot of the <TextDescription> part of RUCoD for the furniture retrieval use case
(see online version for colours)
and video content is also quite similar to the case described in Section 3, so it will result
in a similar RUCoD, as well.
content scenario from creators and distributors to consumers. The standard covers the
entire content production and delivery ‘food-chain’ with interoperability and automation
in mind. In this vision, the two main concepts in MPEG-21 are Digital Items (DIs) and
Users. Within the framework, the latter continuously interact with, and manipulate, the
former.
A DI is defined as the basic entity within the framework: it is a combination of
resources, metadata and structure. Resources are the assets within the item (i.e., the actual
content, possibly remote); metadata provides information about the DI per se or the single
resources within it; structure holds data about the relationships between the components
of the DI.
Users are actors who interact with the DIs, possibly in relationship with other Users.
The framework is agnostic about Users’ roles: this means that anyone who uses DIs (be it
a content owner, provider, consumer, etc.) is a User. Interaction and manipulation by
Users of the content are regulated within the framework by Rights Management
mechanisms regarding what Users can and cannot do with the DIs and especially the
resources they hold.
Technically the MPEG-21 DI is defined by the digital item declaration (DID). This is
represented in the Digital Item Declaration Language (DIDL) which is an XML Schema
defined by the standard. Describing all the components of the DID is out of the scope of
the current work: in this context it is important to highlight the concepts of container and
item within the DI. The former is a structure which allows the grouping of both items
and/or other containers. The latter is a grouping of sub-items and/or components, where a
component binds a resource to a set of descriptors (i.e., metadata). Essentially, this implies that
the MPEG-21 model enables the possibility of nested items within items (defined as
compilations) thus hierarchical content structures. A Resource (which could possibly be a
physical object) is a uniquely identifiable asset within the item.
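For comparison, a minimal DIDL fragment showing the container/item/component nesting described above might read roughly as follows. The element names follow the DIDL schema, while the namespace is quoted from memory and the resource and descriptor payload are placeholders; the standard remains the normative reference:

  <DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS">  <!-- DIDL namespace, as published in the second edition -->
    <Container>
      <Item>                                      <!-- a compilation: items can nest within items -->
        <Descriptor>
          <Statement mimeType="text/plain">Barking bulldog media</Statement>  <!-- item-level metadata -->
        </Descriptor>
        <Item>
          <Component>
            <Resource mimeType="audio/x-wav" ref="http://example.org/media/bark.wav"/>  <!-- placeholder asset -->
          </Component>
        </Item>
      </Item>
    </Container>
  </DIDL>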
An interesting feature of MPEG-21 is the presence within the DID of selections,
assertions and predicates (all of which can be in a true, false or undecided state), which
enable choices to be made on the objects. MPEG-21 is also concerned with Rights and
Intellectual Property management issues which are not particularly relevant to our
context.
MPEG-21 Digital Item Adaptation tries to tackle the issue of accessing media “any
time and anywhere” (Timmerer et al., 2008). This implies the adaptability of Digital
Items and their resources to different devices, networks, etc. Essentially, the DI goes
through an Adaptation Engine, which transforms it according to descriptions of the
target environment; using the standard's terminology, these cover terminal capabilities,
network characteristics, user characteristics and natural environment characteristics.
DI Adaptation also takes care of metadata adaptation, i.e., updating the metadata to
reflect changes to the content, as well as metadata scaling and filtering.
The CO presented above is similar to the MPEG-21 DI. In fact both share a
multimodal approach to media which can be of any type. Both MPEG-21 and RUCoD
allow for rich metadata to be created and attached to objects. The L-Descriptors and
R-Descriptors are a specific feature of the RUCoD. This doesn’t mean that similar
metadata couldn’t be added to MPEG-21 DI, yet the RUCoD is particularly targeted at
indexing, sharing, search and retrieval. In this direction the model doesn’t directly
address or enforce adaptation, although adaptation can be easily implemented thanks to
the provision of elements such as the FileFormat (in the example above we have wave
file for audio which could be converted to a lighter format for mobile appliances or
6 Conclusions
In this paper, an attempt to define the concept of COs was made. COs are rich media
presentations, which enclose at the same time different types of media items related to the
same physical entity, event or concept. Apart from media items, real-world and
user-related information can also be identified within a CO. This definition was
introduced to address the requirements of the EU-funded project I-SEARCH with respect
to multimodal search and retrieval and shares common features with the CO concept
given by the User Centric Media Cluster.
Furthermore, a unified framework for a formal representation of COs has been
introduced in this paper. The RUCoD provides a uniform descriptor for all types of COs
irrespective of the underlying media and accompanying information. A specification of
the RUCoD’s constituting parts, using an example CO for illustration, was also provided.
Special focus was placed on the identification of the potential modifications of RUCoD
needed to address the three different usage scenarios of I-SEARCH, since each use case
has different requirements regarding media search and retrieval. As described in
the relevant section, the generic nature of RUCoD is adequate to address all scenarios
with minor modifications.
Finally, the relations of RUCoD with other well-known standards for multimedia
description, such as MPEG-7, JPSearch and MPEG-21, were presented. RUCoD shares
several common features with these standards but also introduces numerous innovative
features, which are in line with the emerging demands of multimodal search in the FI.
Acknowledgements
References
Axenopoulos, A., Daras, P. and Tzovaras, D. (2010) ‘Towards the creation of a unified framework
for multimodal search and retrieval’, 2nd International ICST Conference on User Centric
Media – UCMedia 2010, 1–3 September, Palma de Mallorca.
Burnett, I., Van de Walle, R., Hill, K., Bormans, J. and Pereira, F. (2003) ‘MPEG-21: goals and
achievements’, IEEE Multimedia, October–December, Vol. 10, No. 4, pp.60–70.
Dey, A.K. and Abowd, G.D. (1999) ‘Towards a better understanding of context and
context-awareness’, HUC ‘99: Proceedings of the 1st International Symposium on Handheld
and Ubiquitous Computing.
DMOZ Open Directory Project, available at http://www.dmoz.org/ (accessed on 12 December
2011).
Dufaux, F., Ansorge, M. and Ebrahimi, T. (2007) ‘Overview of JPSearch: a standard for image
search and retrieval’, Content-Based Multimedia Indexing, CBMI’07, June, London.
Google Goggles, available at http://www.google.com/mobile/goggles/ (accessed on 12 December
2011).
IDC Consultancy Report (2007) The Expanding Digital Universe, March.
Martinez, J.M., Koenen, R. and Pereira, F. (2002) ‘MPEG-7: the generic multimedia content
description standard, part 1’, IEEE Multimedia, Vol. 9, No. 2, pp.78–87.
Timmerer, C., Vetro, A. and Hellwagner, H. (2008) MPEG-21 Digital Item Adaptation,
pp.457–463, Springer, New York, October.
Wordnet, A Lexical Database for English, available at http://www.wordnet.princeton.edu/
(accessed on 12 December 2011).
Yang, Y., Xu, D., Nie, F., Luo, J. and Zhuang, Y. (2009) ‘Ranking with local regression and global
alignment for cross-media retrieval’, Proceedings of the Seventeenth ACM International
Conference on Multimedia, Beijing, China.
Zahariadis, T., Daras, P., Bouwen, J., Niebert, N., Griffin, D., Alvarez, F. and Camarillo, G. (2010)
‘Towards a content-centric internet’, in G. Tselentis et al. (Eds.): Towards the Future Internet
– A European Research Perspective, pp.227–236, IOS Press.
Zhang, H. and Weng, J. (2006) ‘Measuring multi-modality similarities via subspace learning for
cross-media retrieval’, in Lecture Notes in Computer Science, Springer, PCM 2006, Vol. 4261,
pp.979–988.