Apple TV Search Video Guidelines
Guidelines for Search – Video
Table of Contents
Please ensure that you are using the latest version of the guidelines from BaseLine, which are
found in the upper right corner of every task.
Guidelines for Search – Video
Search Video Evaluation Guidelines
(1) Introduction
(1.1) The importance of your work as a Rater
(2.2.2) TV Navigational
(2.3.1) Movie Navigational
(2.3.4) Person
(2.3.6) Channel/Studio
Guidelines update 2/7/23 to sections 2.1.4 and 2.3.7 to provide additional clarity to
live event guidelines
Search Video Evaluation Guidelines
(1) Introduction
In this document, we explain relevance rating guidelines for video search on Apple TV.
If you are not familiar with the Apple TV app, please refer to https://www.apple.com/apple-tv-app/ for
an overview and basic information about this app.
(1.1) The importance of your work as a Rater
Each of the judgements you complete will be used to build and improve artificial intelligence systems
such as search algorithms and machine learned rankers that power the user experience for Apple
TV users. Your attention to detail, research and language skills as well as your cultural knowledge of
the market are all critical to the success of our projects.
Your judgements should represent those of an Apple TV user who is using the Search feature. Ask
yourself if you would be content with the results returned for a particular search query. Is there a
significant relationship between the query and content returned? Would you be content if you see
this content appear as a search result? Stay curious and complete thorough research.
Our ultimate goal is to surprise and delight our customers by improving search quality and
enhancing customer satisfaction, and you play an important role in this.
Please keep in mind that your tasks will be spot-checked for quality, and measured against those of
your peers.
Mandatory Comment
Each rating must be explained in the comment box, even if “optional” is indicated:
rating comments are always mandatory.
The comment should be concise and must only explain why the rating chosen is the correct one in
application of the guidelines. See example below:
Comment Example
A secondary intent is less likely, or would be a less popular intent compared to a primary one. A
secondary intent could be:
• Content relevant to a smaller group of users than for the primary intent. For queries like
[shows] and [movies] the primary intent is usually media content for grown-ups. Content
for children would be considered secondary intent, except if the intent is obviously kids-
related (such as cartoons, animated films, etc.).
• Complementary content such as trailers, reviews, cast members, or interviews with the cast
on how the movie was made.
• Lower quality/lesser known content that is relevant to the query but is not the primary
intent, such as content that is dated or less popular.
On the left-hand side in BaseLine, inside the input metadata section, you will find the classification of
the query.
The classification is already set (you do not have to classify the query yourself).
You should rate the content relevancy based on (1) the classification of the query and (2) how
relevant the content is in relation to the query and the query type.
The ‘Query type’ aims to help you determine the primary intent of the query. Please also consider
potential secondary intents when doing your ratings.
Should you disagree with the ‘Query type’ classification, please explain in the comment section why
it should be different. You can then apply the appropriate rating scale to complete your judgement.
Example Comment: “I believe that James Bond shall be classified as character rather than Movie
navigational. I used this query type for the rating above.”
Examples:
• Query: "Bi" → Classification: Ambiguous - Unclear intent (the intent is not clear).
• Query: “big little” → Classification: TV Navigational.
The intent is clear: the user is searching for the specific TV show “Big Little Lies”.
• Query: “Come” → Classification: Genre
The most likely primary intent is for comedy content. However, a movie or TV show that
starts with this token can also achieve good ratings (e.g. Come True (2020)).
Metadata Example
Note that we have updated our classification guidelines; however, you may still see tasks that are
classified using older classifications.
Reference the table below to see what “new” classification you should use when rating these tasks.
Online Research
• IMDB
o Popularity Ranking
o Rating Counts
o Storyline, Taglines
o Genre
o Release Date
o For topic, decade, etc: (IMDB Sort By feature)
• Box Office Mojo
• Wikipedia
o Age Rating → for determination of kids content
Market Knowledge
Use the knowledge of your market to identify shows which are trending or historically popular. Be
aware of the shows offered on Apple TV and other streaming platforms in your country. Leverage
your understanding of the language to identify misspellings, translations, or aliases related to the
query. When completing online research, domestic sources should typically be prioritized over global
sources. For example, take into account the significant bias of IMDB for Hollywood productions and
English-language content.
topic of the content. Whenever there is a query relevance, please use overall relevance
rating.
1. Example:
1. [modern] → “modern family”
2. [danie] → “No time to Die” (Daniel Craig stars as James Bond)
3. [spencer] → “Talentos Ocultos” (Character in the returned movie)
2. Similarity: There is no relevance between the query and the output, the intent of the
query is clear, and the query is classified as TV Navigational, Movie Navigational or
Character. In this case, the relationship between the intended content and returned
content shall be determined in these aspects: target audience, factual aspects and/or
theme. The highest rating to be used is Good.
1. Example: ‘modern family’ → ‘the goldbergs’
Query Relevance
Relevance describes the relationship between the query and returned content. This does not only
include the text match with the title.
Examples:
• Relevant:
o [John] → “The Problem with Jon Stewart” (*misspelled)
o [Back] → “Back to the Future”
o [th] → “The Big Conn” / “Thor: Ragnarok”
o [part] → “Anthony Bourdain: Parts Unknown”
o [acti] → “Top Gun: Maverick”
o [disn] → “Frozen II”
o [Show] → “The Morning Show”
• [rhe] → consider that the user might have meant to write “the”
• [4], [IV], [Four] → consider synonyms
• In case you are rating on an alternative spelling, please do not use similarity
o Why? Too many steps from query → result to trace in auditing
(2) Rating Process
(2.1) Aspects to Consider
(2.1.1) Kids Content
As kids content satisfies the requests of a subset of users, it shall not get the same ratings for
generic queries as content for all age groups. Therefore, kids content shall be demoted by one
rating level unless the query is regarding kids content either directly (navigational, character) or
indirectly (“kids” token in query, animation). Please make use of the “Kids&Family” genre, Common
Sense Media and your own knowledge to determine whether content is targeting kids specifically. Do
not demote relevant content from Acceptable to Off-Topic, as relevant content should always be
rated as at least Acceptable.
This only applies to queries that are classified as “topic”, “genre”, “functional” and “year/decade”.
Examples:
• [Adventure] → “Yakari”: The returned show is in the intended genre and very popular but
dated, so it could be rated as Good. However, since it is kids content, it is demoted to
Acceptable.
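The demotion rule above can be sketched in code. This is only an illustration, not part of the rating tool: the `Rating` enum, the function name `apply_kids_demotion`, and its boolean parameters are hypothetical, and deciding whether content targets kids still requires the research described above.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Hypothetical ordering of the rating levels, lowest to highest."""
    OFF_TOPIC = 0
    ACCEPTABLE = 1
    GOOD = 2
    EXCELLENT = 3
    PERFECT = 4


# Query classifications to which the kids-content demotion applies.
DEMOTION_QUERY_TYPES = {"topic", "genre", "functional", "year/decade"}


def apply_kids_demotion(rating, query_type, is_kids_content, query_targets_kids):
    """Demote kids content by one level for generic queries.

    `query_targets_kids` is assumed to be True when the query asks for kids
    content directly or indirectly (e.g. a "kids" token in the query).
    """
    if not is_kids_content or query_targets_kids:
        return rating
    if query_type.lower() not in DEMOTION_QUERY_TYPES:
        return rating
    # Relevant content is never demoted from Acceptable to Off-Topic.
    if rating <= Rating.ACCEPTABLE:
        return rating
    return Rating(rating - 1)


# The [Adventure] → "Yakari" example: Good before demotion, Acceptable after.
yakari = apply_kids_demotion(Rating.GOOD, "genre", True, False)
print(yakari.name)
```

The floor at Acceptable mirrors the instruction that relevant content should always be rated at least Acceptable.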
(2.1.2) Franchise Ratings
General Guideline:
• Perfect → The first and last item of core franchise.
Please make an honest determination of the order in which users would typically watch
the collection or franchise of content and use this to complete the rating. Use comment to
justify decision.
• Excellent → Other content in core franchise
• content in the franchise, but not core
o excellent → popular and recent
o good → popular or recent
this case can be rated as off-topic
(2.1.4) Sports Ratings
You may come across sporting events in your rating tasks that are not classified as “Live Event”.
Please determine if the query is directly relevant to the sporting event (clear Primary Intent) or if the
intent of the query can reasonably be multiple things, including non-sports events (Partial Intent or
Secondary Intent).
The following guidance can help determine the proper rating; however, please rely primarily on your
own market knowledge and query relevance to make your judgement.
budget, famous cast and crew)
• Original content from ATV+, Netflix, Hulu, or other popular production company - generally
assumed to be popular.
Example:
• [christmas] → “Spirited”: Rated Excellent. This movie is an Apple TV+ original and it stars Will
Ferrell and Ryan Reynolds, so it can be assumed to become popular. Rating
completed on 11/9/22.
Movie bundles shall be rated for the content included that is related to the query. Once the rating for
the content is determined, demote the rating by one level.
• If the movie bundle contains the primary intended content, it can receive an Excellent rating
o Example: No time to Die → James Bond 10 Film collection → Excellent
• If the movie bundle contains a piece of content that could potentially be a primary intent of the
query, determine the rating for the individual movie first and demote by one level
o Example: Mav → Top Gun 2 Movie Collection → Good
• If the movie bundle contains a piece of content that individually would be rated as
“Acceptable”, the bundle receives the “Unacceptable: Off-Topic” rating
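As a rough sketch, the bundle rule amounts to a one-level demotion of the rating that the related individual movie would receive. The list `RATING_SCALE` and the function `rate_movie_bundle` are hypothetical names used only for illustration, and the sketch assumes the individual rating has already been determined by the rater.

```python
# Rating levels ordered from lowest to highest (labels as used in this document).
RATING_SCALE = [
    "Unacceptable: Off-Topic",
    "Acceptable",
    "Good",
    "Excellent",
    "Perfect",
]


def rate_movie_bundle(individual_rating: str) -> str:
    """Return the bundle rating: the individual movie's rating demoted by one level.

    Off-Topic is the lowest level, so it cannot be demoted any further.
    """
    index = RATING_SCALE.index(individual_rating)
    return RATING_SCALE[max(index - 1, 0)]


# A bundle containing the primary intended content (Perfect on its own).
print(rate_movie_bundle("Perfect"))
# A bundle whose related movie would only be Acceptable on its own.
print(rate_movie_bundle("Acceptable"))
```

Under this reading, the “Mav” example corresponds to an individual rating of Excellent for the related movie, which the bundle then receives as Good.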
(2.1.11) Spell Correction
Some queries are misspelled. Determine intent and complete the rating based on the correct
spelling of the query.
• [grays anatomie] → [grey’s anatomy]
(2.1.12) Explicit Content
Rate all pornographic content as Unacceptable: Off-Topic. Exception: If the query intent is
specifically for erotic content, or porn → Please use above guidelines to find appropriate rating.
• [drama] → any pornographic content is Unacceptable: Off-Topic
• [erotica] / [porn] / [navigational] → erotic content can receive any rating
• [18+] / [movies for adults] → Intent here is not specifically for pornographic or erotic content
(2.1.13) Non-Content Queries
You may come across queries such as [cancel subscription], [settings], or [log out] which are not
relevant to any video content. For such queries, please use Problem: Other. Please use with
caution and explain your judgement in a comment.
when the indicated release date is 2 or more years apart from the researched correct release date of
the content.
Metadata Reporting
(2.1.15) Seasonal Results
In some locales there are seasonal results which have additional relevance and popularity at specific
times of year, for example holiday movies are very popular in the US at the end of the year. Please
consider the seasonal popularity when rating results and consider increasing the rating due to
seasonal popularity.
excellent rating. However, if this result surfaces in the November / December timeframe it
should be rated as excellent as it is a holiday classic and an excellent result at that time
• [bo] → “A Boyfriend for Christmas”: If rated in the summer, this would be an “Acceptable”
result: it is relevant but not recent or popular. During the holiday season it would be rated
as “Good” because, as a Christmas romantic comedy, it has additional popularity during that
period.
(2.2) Similarity Rating
Evaluate Similarity between the intended content and the returned content based on the following
aspects:
• Target Audience
o Genre
o Age Rating (Kids (PG) vs Everyone (PG-13, TV-14) vs Adults (R))
• Factual Aspects
o Cast & Crew: Actors, Producers, Studio, etc.
o Setting: Location AND Time Period in which the content plays
• Theme
o What is the content about?
General Guideline:
• Good → Similarity in Target Audience, Factual Aspect and Theme
• Acceptable → Similarity in two of the three categories (Target Audience, Factual Aspect and
Theme)
• Off-Topic → Similarity in fewer than two aspects
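The guideline above is effectively a count of matching aspects. A minimal sketch, assuming the rater has already judged each aspect as similar or not (the function name `similarity_rating` and its parameters are hypothetical):

```python
def similarity_rating(target_audience: bool, factual_aspects: bool, theme: bool) -> str:
    """Map the number of similar aspects to a similarity rating."""
    matches = sum([target_audience, factual_aspects, theme])
    if matches == 3:
        return "Good"
    if matches == 2:
        return "Acceptable"
    # Fewer than two similar aspects.
    return "Off-Topic"


# Same audience and theme, but different cast, crew and setting.
print(similarity_rating(target_audience=True, factual_aspects=False, theme=True))
```

The highest value returned is Good, in line with the cap on similarity ratings.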
(2.2.2) TV Navigational
(2.2.3) Character
General Guideline:
• Perfect → If navigational content and result is certainly primary intent of user, rate it perfect,
regardless of popularity and recency. For Franchise related queries, please follow the
Franchise specific guidelines as outlined above.
• Excellent → If the returned content is a sequel or prequel of the intended content, or the
content is part of a movie bundle with the intended content, rate the result as excellent.
• Good → If the returned content is relevant to the query, and either recent or popular, or can
be considered a secondary intent, rate it as good. One example would be a show with a matching
title returned for a movie query.
• Acceptable → If the returned content is relevant to the query, but neither recent nor popular
Movie Navigational Rating Overview
Movie Navigational Rating
(2.3.2) TV Navigational
General Guideline:
• Perfect:
o The most popular and recent content featuring this character in a major role.
(Apply “Franchise Query” Rule).
o If only one content with the character is produced, this content can be rated
perfect
• Excellent:
o Sequels/prequels for the show in which the character is best known
o Other high-quality content featuring the character
o Person page for well known actor/actress who plays the character
• Good:
• Acceptable:
appeared
Character Rating Overview
Character Rating
(2.3.4) Person
Not currently a classifier type. Queries with this intent are currently classified as “Actor/Actress”,
“Producer”, “Director”, “Writer” or “Topic”. Please rate tasks that are classified as one of the latter
according to the person guidelines.
General Guideline:
• Excellent:
o Content with the intended person as lead in cast & crew
o Popular and recent content with intended person in cast & crew
o Most popular documentary about the person (more than 1 possible if equal in
popularity/quality)
o Recent and popular live event with person
o Content where person is a significant guest star. Hosted by reputable content
creator.
o Set of most popular content inspired by the person
• Good:
o Person page
o Documentary about the person that is popular, but not most popular or recent
o Popular live event with person
o Popular content inspired by the person
o Content with the intended person as cast & crew (not as lead) that is popular or
recent
• Acceptable:
o Unpopular content about/with the person
o Content with the intended person as cast & crew (not as lead) that is neither popular nor
recent
These queries have intents that fit in multiple classifications. Please use the rating scale which
matches the output content.
Ambiguous Multiple Classifications Rating
Ambiguous - Intent Unclear
If the intent is unclear or the query is relevant to a very large set of content, the maximum rating for
the query is Excellent and only relevance ratings can be applied (no similarity ratings).
General Guideline:
• Excellent → The returned content is relevant to the query (not only by title), popular and
recent
• Good → The returned content is relevant to the query, and either popular or recent.
• Acceptable → The returned content is relevant or somewhat relevant to the query and
neither popular nor recent.
• Off-Topic → Regardless of popularity and recency, if it is unlikely that the user would use the
query to search for the returned content or there is no relationship between the query and
the content.
• Special case: If the intent is unclear and the query incomplete, ratings cannot be higher than
good when person pages are returned, if the person has recent content and is popular.
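The general guideline above follows a pattern based on relevance, popularity and recency. The sketch below illustrates that pattern only. The function name `rate_unclear_intent` is hypothetical, “relevant” and “somewhat relevant” are collapsed into a single flag as a simplifying assumption, and the person-page special case is not modelled.

```python
def rate_unclear_intent(relevant: bool, popular: bool, recent: bool) -> str:
    """Relevance rating for an 'Ambiguous - Intent Unclear' query.

    Excellent is the maximum rating for this query type.
    """
    if not relevant:
        # Popularity and recency do not matter without a relationship to the query.
        return "Off-Topic"
    if popular and recent:
        return "Excellent"
    if popular or recent:
        return "Good"
    return "Acceptable"


# Relevant and popular, but not recent.
print(rate_unclear_intent(relevant=True, popular=True, recent=False))
```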
General Guideline:
• Excellent → popular and recent content from either the intended channel or studio
• Good → popular or recent content from the intended channel or studio
• Acceptable → neither popular nor recent content that is available on the intended channel or
produced by the intended studio
• Off-Topic → content that is not related to the intended channel or studio
Channel-Studio Rating Overview
Channel-Studio Rating
(2.3.7) Live Event
Note that the examples below may have already taken place in the past. Please assume that live
assets are either ongoing or upcoming.
When considering a sporting event's popularity, please consider multiple aspects that may affect
popularity. Here are some examples of factors:
• Event: Some sports are very popular in the Olympics and not so much otherwise
• League: A British user is likely more interested in the Premier League or Champions
League than Ligue 1 in France
• Level: The highest level professional sports are often more popular than lower leagues or
college sports
• Competition: Preseason is generally less popular than regular season events and playoffs /
championship events are often the most popular
• Sport: Some queries may point to multiple sports ("NCAA", "Sports", "Olympics"). Consider the
relative popularity of the events / sports that fit the query
Note that this popularity is scaled based on the possible intents left. If the query points directly to a
niche sport or event, then we should not demote results because the specific sport/event is not
popular overall. However, in broader live event queries and queries where live events are a potential
secondary intent, lower popularity live events should receive correspondingly lower ratings.
an “Excellent” rating.
“Perfect” rating is not available for this query type.
Please consider the kids content guidelines as explained in the “Aspects to Consider” section above.
General Guideline:
• Excellent → The topic is prominent and important for the plot of the returned content. It is
also popular and recently released
• Good → Either the topic is important for the plot of the returned content and the
content is either popular or recent, or the content is somewhat related
to the topic, but popular and recent
• Acceptable → The content is related to the topic, but neither popular nor recent. Content
that is somewhat relevant to the topic and recent can also be considered acceptable.
General Guideline:
• Excellent → the returned content is in the intended genre, popular and recent
• Good → the returned content is in the intended genre and either popular or recent
• Acceptable → the returned content is in the intended genre, but neither popular nor recent
• Off-Topic → the returned content is not in the intended genre and not matching the query in
any other way
• if the content type is defined, this must be matched for the returned content to be eligible for
a rating of acceptable or better. Content types are movies and shows.
General Guideline:
• Excellent → content satisfies all of the functional requirements of the intent and is popular
and recent. Content type must match if specified (movie/tv show)
• Good → content satisfies all of the functional requirements of the intent and is popular or
recent. Content type must match if specified (movie/tv show)
• Acceptable → content satisfies all of the functional requirements of the intent but is neither
popular nor recent
• Off-Topic → content does not match content type, or neither of the other requirements are
satisfied
• Queries that include the string “free”
recent or popular
• Acceptable → Movies and TV shows that are originally produced in the intended language and
are neither recent nor popular. Also content that is not originally produced in the intended
language, but is available with audio in the intended language and is recent and popular
Language Rating Overview
Ratings for (A) Content with Setting or Theme related to the Year/Decade
Please defer to guidelines for “Topic” queries.
• Good:
o Ultra popular show has >=50% of seasons/episodes in decade
Year-Decade Rating Overview
Year-Decade Rating
(2.3.13) Awards
Users searching for an award show are generally interested in (1) watching nominated
movies/shows before the award event, or (2) watching movies/shows which won the most recent
award event. Recent winners should receive higher ratings, unless the query specifies a particular
edition of the award event.
Definitions:
• If award event upcoming: Nominations have already been announced for the next upcoming
award event
• If no award event upcoming: Nominations for next upcoming award event are not announced
yet.
General Guideline:
• Perfect:
year