Report Intern
Declaration
Certificate
Abstract
Acknowledgment
List of Abbreviations
List of Figures
Chapter 1 Introduction
1.1 Problem Statement
1.2 Motivation
1.3 Objective
1.4 Learning Outcomes
1.5 Gemini Model Integration
1.6 Objective of Internship
References
CHAPTER - 1
Introduction
The problem addressed in this project is the lack of personalised, engaging, and
AI-powered travel planning tools that incorporate immersive storytelling.
Existing tools often provide basic suggestions but fail to create dynamic,
customised experiences that cater to individual preferences. This project aims to
fill that gap by developing an AI-driven travel itinerary generator that crafts
fictional stories around Points of Interest (POIs), offering an engaging and
creative perspective for travellers.
1.2 Motivation
The motivation for this project comes from the intersection of my personal
interests in AI, machine learning, and storytelling, with my desire to create
innovative user experiences. Traveling and exploring new places is something
that everyone aspires to, but the process of planning can often be tedious and
impersonal.
1.3 Objective
◦ Develop a machine learning model to generate fictional travel
itineraries based on user input. The model will create stories that
revolve around different POIs, offering users a rich narrative
experience. Each itinerary will consist of several sub-POIs that the
user will "visit" in a narrative sequence.
2. Integration with Liquid Galaxy:
◦ Connect the mobile app with the Liquid Galaxy rig to display the
generated itineraries as immersive visual experiences. Users will
be able to see the stories and POIs on the screens of the Liquid
Galaxy rig as it orbits around the locations. This feature aims to
transform the travel planning process into an immersive and
engaging experience.
3. Voice Interaction and Accessibility:
◦ Use Bark AI or other similar tools to integrate text-to-speech
functionality. This will allow users to hear the generated story,
making the app more interactive and accessible, especially for
users with disabilities.
4. Multi-language Support:
◦ Provide support for multiple languages so that users from various
linguistic backgrounds can access and interact with the app,
expanding the reach of the tool.
5. Customizable Itineraries:
◦ Allow users to modify itineraries based on personal preferences.
This includes reshuffling POIs or removing certain sub-POIs from
the itinerary to suit individual interests.
6. Mobile Application Development:
1.4 Learning Outcomes of the Project:
Through the course of this project, I have gained valuable technical and non-
technical learning outcomes.
Technical Skills:
• Machine Learning:
• Project Management:
◦ Unlike traditional rule-based systems that require predefined
responses, Gemini uses deep learning to derive context and
meaning from a broader range of data, offering responses that feel
more intuitive and human-like.
2. Contextual Generation:
historical anecdotes, fictional characters, or imaginative elements
that make the itinerary feel like a story.
◦ By leveraging Gemini’s creativity, the travel application does not
just suggest destinations; it transforms travel planning into a
storytelling experience, engaging the user with rich, context-aware
narratives that are personalized to their preferences.
5. Fine-Tuning and Personalization:
◦ One of the key advantages of Gemini is its ability to fine-tune its
outputs for specific tasks and domains. For this travel itinerary
application, Gemini can be trained or configured to focus on
specific types of travel-related queries and responses.
◦ For example, it can generate itineraries for particular types of
travelers, such as solo travelers, families, luxury vacationers, or
eco-tourism enthusiasts. With appropriate fine-tuning, Gemini can
adjust its tone, style, and content to reflect the user's desires,
providing a more personalized experience.
◦ It’s also possible to integrate real-time data from external APIs
(like weather, events, or POI databases), which allows Gemini to
generate itineraries that are both imaginative and relevant to
current trends or conditions.
6. Multilingual Support:
◦ Gemini supports multiple languages, enabling the application to
cater to a global audience. In this project, this means users from
different linguistic backgrounds can request travel itineraries in
their native languages, ensuring inclusivity and accessibility.
Whether the user is based in Japan, Brazil, or France, Gemini can
generate content in a way that resonates with their cultural context
and linguistic preferences.
Gemini in Travel Application
◦ When a user interacts with the app, they provide input about their
travel preferences. This could include preferences like destination
type (e.g., beach, mountain, historical city), budget, duration of
stay, or personal interests (e.g., hiking, food, art).
◦ Gemini takes this input and processes it to understand the user's
goals and interests, ensuring that it can generate an itinerary that
matches their specifications.
2. Personalized Itinerary Generation:
◦ Once the itinerary is generated, the user can interact with it—
perhaps by adjusting the destinations or activities, or requesting
additional information on certain places. Gemini is capable of
iterating on the itinerary based on user feedback, tweaking the
travel story, adjusting recommendations, or suggesting alternate
routes that fit better with the user's needs.
◦ For example, if a user prefers to avoid large crowds, Gemini could
refine the itinerary to include off-the-beaten-path locations and
quieter, less-visited attractions.
4. Integration with Liquid Galaxy for Immersive Visualization:
video clips of each destination while hearing a voiceover (via text-
to-speech) narrating the story.
5. Voice Interaction:
1.6 Objective of Internship
CHAPTER - 2
Introduction to Organisation
Liquid Galaxy played a central role in the development of the Fictional Travel
Itinerary Generator project. Its innovative multi-display hardware was
combined with the power of AI to create a fully immersive, AI-driven
experience. The project aimed to provide users with a personalized, interactive
journey through various Points of Interest (POIs), where they could view and
explore different sub-POIs with the help of a dynamically generated narrative.
Hardware Integration
For the Fictional Travel Itinerary Generator, the Liquid Galaxy rig was used
to display AI-generated travel itineraries and stories. The system can take users
through a series of sub-POIs while displaying relevant imagery, textual
descriptions, and even a dynamic map that updates in real time. This immersive
setup allowed for a highly engaging experience where users could explore
locations while interacting with AI-generated narratives.
around each of these places. The AI story narrated the user’s journey,
dynamically adjusting based on user preferences, allowing them to experience
different sub-POIs in a seamless, interconnected sequence.
Generator. Using Gemini’s capabilities, the project generated personalized
travel itineraries, providing each user with a unique journey based on their
preferences and input. Gemini not only helped craft the narrative for each
destination but also personalized it based on the traveler’s interests, making
each experience truly unique.
Hardware-Software Integration
For example, if the user chose Paris as their starting point, Gemini would
generate sub-POIs such as Eiffel Tower, Arc de Triomphe, and Montmartre,
each of which would be accompanied by a paragraph of narrative text. As the
user "traveled" through each sub-POI, Liquid Galaxy’s software updated the
visuals and synchronized them with the AI-generated content, creating an
immersive and personalized tour experience.
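The pairing of sub-POIs with story paragraphs described above can be pictured as a small data structure. The following is a minimal sketch only: the class names SubPOI and Itinerary are hypothetical and not taken from the project code, and the paragraphs are placeholders standing in for text that Gemini would generate.

```python
from dataclasses import dataclass, field


@dataclass
class SubPOI:
    """One stop on the tour, paired with its paragraph of the story."""
    name: str
    paragraph: str


@dataclass
class Itinerary:
    """A main POI and the ordered sub-POIs generated for it."""
    poi: str
    stops: list[SubPOI] = field(default_factory=list)

    def tour(self):
        """Yield (position, stop) pairs in narrative order, starting at 1."""
        for position, stop in enumerate(self.stops, start=1):
            yield position, stop


# Placeholder paragraphs; in the app the text would come from Gemini.
paris = Itinerary(
    poi="Paris",
    stops=[
        SubPOI("Eiffel Tower", "Placeholder paragraph for the Eiffel Tower."),
        SubPOI("Arc de Triomphe", "Placeholder paragraph for the Arc de Triomphe."),
        SubPOI("Montmartre", "Placeholder paragraph for Montmartre."),
    ],
)

for position, stop in paris.tour():
    print(f"{position}. {stop.name}: {stop.paragraph}")
```

Keeping each paragraph attached to its sub-POI means the display and the narration can both be driven from the same ordered list.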
CHAPTER - 3
Tools and Technologies Used
The Liquid Galaxy project is a key element of the system, as it provides the
immersive, multi-screen visualization platform on which the user experiences
the itinerary.
Multi-Screen Setup
Liquid Galaxy utilizes a multi-screen setup that allows for the display of
panoramic data across several monitors or even a 360-degree array of screens.
Typically, the rig consists of several displays, often between 3 and 7, arranged in
a semi-circular or panoramic configuration. This setup can create an immersive
experience that simulates the user being "inside" the visualized environment.
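One way to picture the panoramic arrangement is that every display renders the same camera position with a different horizontal rotation (yaw). The sketch below computes such offsets under the assumption that all screens have the same width and sit edge to edge. The function name and the 36-degree field of view are illustrative values, not settings taken from the Liquid Galaxy configuration.

```python
def yaw_offsets(num_screens: int, fov_per_screen_deg: float) -> list[float]:
    """Return the yaw offset in degrees for each screen, left to right.

    The centre of the array gets offset 0, so an odd number of screens
    has one screen looking straight ahead.
    """
    if num_screens < 1:
        raise ValueError("num_screens must be at least 1")
    centre = (num_screens - 1) / 2
    return [(index - centre) * fov_per_screen_deg for index in range(num_screens)]


# Assumed example: 5 screens, each covering 36 degrees horizontally.
offsets = yaw_offsets(5, 36.0)
print(offsets)  # [-72.0, -36.0, 0.0, 36.0, 72.0]
```

With these assumed values the five screens together cover 180 degrees, which corresponds to the semi-circular layout mentioned above.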
For the travel itinerary generator, Liquid Galaxy uses its multi-screen
configuration to show users an expansive, high-resolution view of the
geographic areas included in their travel itinerary. As the user moves through
different Points of Interest (POIs), Liquid Galaxy adjusts the visual display in
real time, offering a fluid transition from one location to the next.
Interactive Interface
One of the standout features of the Liquid Galaxy system is its interactive
nature. It allows users to not only observe but also engage with the displayed
content. Using mouse or gesture-based controls, users can zoom in on specific
locations, rotate the map, and click on particular landmarks to view additional
information.
When a user interacts with the system to select a Point of Interest (POI),
Liquid Galaxy adjusts the visuals to reflect the new location. The content is
shown in real time, meaning that users can explore and interact with a dynamic
world. For example, selecting the Eiffel Tower might bring up a detailed 3D
model, while additional POIs like nearby restaurants or shops might pop up on
the visual interface.
Liquid Galaxy also integrates with geospatial platforms like Google Earth.
This allows the system to pull in high-quality satellite imagery, terrain maps,
and 3D models of locations around the world. This data serves as the foundation
for the visual representation of the user’s journey through the itinerary.
At the heart of the travel itinerary system lies Google Earth, which provides the
geographic data that powers the map-based visualizations seen in the Liquid
Galaxy system. Google Earth offers access to rich satellite imagery, terrain
data, street views, and 3D models of locations worldwide, making it an essential
component for any geographically immersive project.
Google Earth provides high-resolution images of most cities, towns, and natural
landmarks around the world, enabling users to view realistic, detailed depictions
of their chosen destinations. This allows the Fictional Travel Itinerary
Generator to pull imagery for any Point of Interest (POI) selected by
the user, ensuring a detailed and accurate visualization.
In addition to satellite images, Google Earth also supports KML (Keyhole
Markup Language) files, which allow users to overlay geospatial data, such as
custom markers for places of interest or specific routes. These KML files were
utilized by Liquid Galaxy to create dynamic and responsive maps, enabling the
seamless transition between sub-POIs as users move through their journey.
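To make the role of KML concrete, the following sketch builds a small KML document with one Placemark per sub-POI using only the Python standard library. It illustrates the file format and is not the project's actual generator, which runs inside the Flutter app. The function name build_kml and the dictionary keys are hypothetical, and the coordinates are approximate.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialize KML elements without a namespace prefix.
ET.register_namespace("", KML_NS)


def tag(name: str) -> str:
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def build_kml(doc_name: str, sub_pois: list[dict]) -> str:
    """Build a KML document with one Placemark per sub-POI.

    Each sub-POI dict needs 'name', 'lat', and 'lon'; 'description' is optional.
    """
    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    ET.SubElement(document, tag("name")).text = doc_name

    for poi in sub_pois:
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = poi["name"]
        ET.SubElement(placemark, tag("description")).text = poi.get("description", "")
        point = ET.SubElement(placemark, tag("Point"))
        # KML orders coordinates as longitude,latitude,altitude.
        ET.SubElement(point, tag("coordinates")).text = f"{poi['lon']},{poi['lat']},0"

    body = ET.tostring(kml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


kml_text = build_kml(
    "Paris",
    [
        {"name": "Eiffel Tower", "lat": 48.8584, "lon": 2.2945},
        {"name": "Louvre Museum", "lat": 48.8606, "lon": 2.3376},
    ],
)
print(kml_text)
```

Note that KML lists longitude before latitude, which is a common source of misplaced markers.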
One of the key features that enhances the immersive experience is Google
Earth’s Street View and 3D Model capabilities. For example, if the user is
exploring Paris, they can virtually "walk" around the Eiffel Tower using Street
View, offering a richer, more interactive experience.
For example, if a user selects the Louvre Museum as part of their travel
itinerary, Gemini may generate a description that is educational, highlighting
famous art pieces like the Mona Lisa, or it could provide a more cultural
narrative based on the user’s interests, such as historical facts or trivia. Gemini
uses contextual cues, like the user's location or preferences, to adjust the tone
and content of the story.
Contextual Awareness
Gemini is also capable of adapting its responses based on the user’s previous
interactions. If the user previously showed interest in ancient history, Gemini
may tailor the narrative to highlight historical landmarks like Pompeii or the
Pyramids of Giza, offering a more in-depth exploration of those locations. This
context-aware storytelling makes the itinerary feel much more like a
personalized guidebook rather than a one-size-fits-all solution.
The magic of the Fictional Travel Itinerary Generator lies in the seamless
integration of these three core technologies: Liquid Galaxy, Google Earth, and
Gemini. Together, they form a unified system that offers both immersive
visualization and dynamic content generation.
travel experience. The integration between Liquid Galaxy's visualization
system, Google Earth's geospatial data, and Gemini’s AI-driven narratives
brings together the best of hardware, software, and artificial intelligence to
create a truly cutting-edge travel tool.
CHAPTER - 4
Introduction to Project
In this section, we describe the architecture and the flow of the AI model used in
the project, which integrates with the Gemini API. Gemini is a family of large
language models (LLMs) developed by Google DeepMind, and its API is used to
generate the fictional travel itineraries in the app.
1. User Input:
◦ The user interacts with the Flutter mobile application, providing a
Point of Interest (POI) they are interested in.
◦ The input can be a city, landmark, or tourist destination, such as
"Paris" or "Great Wall of China".
2. Data Preparation:
◦ The app sends the POI and other contextual information (like travel
preferences) to the Google Gemini API.
◦ The Gemini API processes this input and generates a detailed
response, which includes:
▪ Sub-POIs: Smaller, more specific locations within the main
POI (e.g., Eiffel Tower, Louvre Museum in Paris).
▪ Story Elements: A narrative structure is created, divided
into multiple paragraphs, each corresponding to a different
sub-POI. The story is designed to be immersive, with
engaging details about each location.
3. Story and Sub-POIs Generation:
◦ Once the story and sub-POIs are generated, they are formatted into
KML (Keyhole Markup Language) files, which are then sent to
the Liquid Galaxy system for display.
◦ The KML file contains geographical coordinates for the sub-POIs,
and the Liquid Galaxy system uses these coordinates to create
immersive tours of each location, showing the user a "fly-over" or
3D view of the location.
5. Text-to-Speech (TTS) Integration:
◦ The generated story is then converted into audio using the Bark
AI model (integrated with the app). This TTS conversion allows
the user to listen to the story as they follow the visual tour on the
Liquid Galaxy.
◦ The Bark AI model is also capable of adjusting voice parameters
(such as pitch, speed, and accent) based on user preferences.
6. User Interaction:
◦ The user can interact with the app via speech commands or by
using touch gestures on the tablet (or mobile device). Speech-to-
text functionality powered by Flutter’s speech_to_text library
allows the app to understand voice commands to navigate between
sub-POIs, change the narrative, or ask for additional information.
◦ The story continues as each sub-POI is “toured” in sequence on the
Liquid Galaxy, with the corresponding paragraph of the story
displayed alongside it.
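The request and response handling in the steps above can be sketched as follows. This is an illustration under stated assumptions: the prompt wording and the JSON layout (a "sub_pois" list with name, lat, lon, and paragraph fields) are hypothetical and are not the format used by the project, and no call to the Gemini API is made. A hand-written reply stands in for the model output.

```python
import json

REQUIRED_KEYS = {"name", "lat", "lon", "paragraph"}


def build_prompt(poi: str, preferences: list[str], num_stops: int = 3) -> str:
    """Compose a prompt asking for sub-POIs and story paragraphs as JSON."""
    themes = ", ".join(preferences) if preferences else "no particular theme"
    return (
        f"Create a fictional travel story about {poi} with {num_stops} stops. "
        f"Preferred themes: {themes}. "
        "Reply with JSON only, in the form "
        '{"sub_pois": [{"name": str, "lat": float, "lon": float, "paragraph": str}]}.'
    )


def parse_response(raw: str) -> list[dict]:
    """Validate the model's JSON reply and return the list of sub-POIs."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    sub_pois = data.get("sub_pois")
    if not isinstance(sub_pois, list) or not sub_pois:
        raise ValueError("response has no sub_pois list")
    for entry in sub_pois:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"sub-POI is missing fields: {sorted(missing)}")
    return sub_pois


# Hand-written stand-in for a model reply; no network call is made here.
sample_reply = json.dumps({
    "sub_pois": [
        {"name": "Eiffel Tower", "lat": 48.8584, "lon": 2.2945,
         "paragraph": "Placeholder paragraph for the Eiffel Tower."},
        {"name": "Louvre Museum", "lat": 48.8606, "lon": 2.3376,
         "paragraph": "Placeholder paragraph for the Louvre Museum."},
    ]
})

print(build_prompt("Paris", ["history", "art"], num_stops=2))
for stop in parse_response(sample_reply):
    print(stop["name"], stop["lat"], stop["lon"])
```

Validating the reply before building the KML file helps, because a language model may occasionally return incomplete or malformed output.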
This section provides a detailed working flow of the entire system, from user
input to the final output displayed on Liquid Galaxy.
◦ The user opens the Flutter mobile app, where they can either search
for a POI or select from predefined recommendations.
◦ The home page of the app allows the user to interact via voice
commands (using the speech_to_text library) or touch gestures.
2. Interaction with Gemini API:
◦ Upon selecting a POI, the app sends the POI name and user
preferences (e.g., preferred themes for the itinerary, such as history,
adventure, or nature) to the Gemini API.
◦ Gemini processes the input and generates:
▪ Sub-POIs: Smaller locations within the POI (e.g., specific
attractions).
▪ Story Narrative: A well-structured narrative where each
paragraph focuses on a different sub-POI.
3. KML File Generation:
◦ After the Gemini API generates the sub-POIs and story, the Flutter
app creates a KML file, which includes:
▪ Coordinates for each sub-POI.
▪ Links to images, videos, or other multimedia resources for
immersive storytelling.
4. Sending Data to Liquid Galaxy:
◦ The generated story is then converted into speech using the Bark
AI model.
◦ The speech output is synchronized with the Liquid Galaxy display,
allowing users to listen to the narrative as they virtually travel
through the locations.
6. User Feedback & Customization:
◦ Users can give feedback using speech commands or gestures (e.g.,
"Tell me more about the Eiffel Tower" or "Show me a different
itinerary").
◦ The app adjusts the story, switching between different POIs or
altering the narrative to match the user’s preferences.
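Once speech has been converted to text, the feedback step needs some way to map an utterance to an action. The sketch below shows one simple keyword-based approach using the example commands quoted above. The intent names and the function parse_command are hypothetical, and the project may handle commands differently, for example by passing the transcript to Gemini.

```python
import re
from typing import Optional


def parse_command(transcript: str, known_stops: list[str]) -> tuple[str, Optional[str]]:
    """Map a recognized utterance to an (intent, argument) pair.

    Intents in this sketch: 'more_info', 'new_itinerary', 'next', 'unknown'.
    """
    text = transcript.lower().strip()

    match = re.search(r"tell me more about (?:the )?(.+)", text)
    if match:
        wanted = match.group(1).rstrip(".!? ")
        for stop in known_stops:
            if stop.lower() == wanted:
                return "more_info", stop
        # The user asked about a place that is not on the itinerary.
        return "unknown", None

    if "different itinerary" in text:
        return "new_itinerary", None
    if text in {"next", "next stop", "continue"}:
        return "next", None
    return "unknown", None


stops = ["Eiffel Tower", "Arc de Triomphe", "Montmartre"]
print(parse_command("Tell me more about the Eiffel Tower", stops))
print(parse_command("Show me a different itinerary", stops))
```

A fixed keyword table like this is easy to test but brittle with free-form speech, so it is best seen as a starting point.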
Working Flow Diagram
This flow illustrates how the different components work together to provide the
user with an immersive, personalized travel itinerary.
A use case diagram visually represents how different actors interact with the
system. In this case, the actors include the User, the Gemini API, Liquid
Galaxy, and the Bark AI model.
• User: Interacts with the app, selecting POIs and providing preferences
(either via speech or touch).
• Gemini API: Generates travel itineraries, sub-POIs, and the narrative.
• Liquid Galaxy: Displays the immersive 3D tour using KML files.
• Bark AI: Converts the story into speech for an enhanced user experience.
Use Case Diagram
This diagram summarizes the interactions between the user and the system,
where the user selects the POI, receives recommendations, views the tour, and
listens to the story.
1. The user provides input to the Flutter app (either via text or voice).
2. The Flutter app sends the input to the Gemini API, which processes the
POI and generates sub-POIs and the corresponding story.
3. The app creates a KML file containing the POI data, which is then sent to
Liquid Galaxy.
4. The app uses Bark AI to generate speech for the story, which is
synchronized with the Liquid Galaxy display.
5. The user can interact with the system using speech-to-text, which
influences the content displayed on the Liquid Galaxy or the story.
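Synchronizing the narration with the display requires knowing roughly when each paragraph starts and ends. The sketch below estimates a timeline from word counts. The speaking rate of 150 words per minute and the function name are assumptions. In the app itself, the measured length of each generated audio clip would be a more reliable source of timing.

```python
def narration_schedule(
    paragraphs: list[str], words_per_minute: float = 150.0
) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each paragraph, back to back."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    schedule = []
    clock = 0.0
    for paragraph in paragraphs:
        # Estimated speaking time for this paragraph, in seconds.
        duration = len(paragraph.split()) / words_per_minute * 60.0
        schedule.append((clock, clock + duration))
        clock += duration
    return schedule


# Two dummy paragraphs of 75 and 150 words.
story = ["word " * 75, "word " * 150]
for index, (start, end) in enumerate(narration_schedule(story), start=1):
    print(f"sub-POI {index}: show from {start:.0f}s to {end:.0f}s")
```

Each start time marks the moment the display would move to the next sub-POI.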
This section discusses how this GSoC project contributes to both my personal
learning and the goals of the Liquid Galaxy team.
• Personal Learning:
The project provided valuable experience in integrating cutting-edge AI
models (like Gemini), working with Liquid Galaxy for immersive
experiences, and developing a Flutter mobile app.
• Broader Utility:
The project showcases how AI and immersive technology can
revolutionize areas like travel planning, education, and virtual tourism,
providing valuable insights into AI-powered user interfaces and 3D
visualization.
CHAPTER - 5
Conclusion and Future Scope
5.1 Conclusion
Throughout the development process, the following key objectives were met:
3. Voice and Accessibility Features: By integrating the Bark AI text-to-
speech model, the app converted the generated travel stories into realistic,
natural-sounding voiceovers. This, combined with Flutter’s speech-to-
text capabilities, offered seamless voice interactions, enhancing
accessibility for users with disabilities and providing a more interactive
experience.
Overall, the project achieved its goals of blending AI, immersive travel
experiences, and interactive storytelling into a cohesive and engaging
platform, offering users a creative and personalized way to explore new
destinations.
While the Fictional Travel Itinerary Generator has met its core objectives,
there are several areas where the project can be expanded or improved in the
future:
• More Advanced AI Models: The Gemini API was a powerful tool for
generating stories; however, further advancements in natural language
processing (NLP) and machine learning could allow for more
sophisticated and nuanced story generation. Integrating additional
machine learning models could enhance the ability of the system to
understand deeper user preferences and generate even more personalized
itineraries.
• User Behavior Learning: The app could use feedback and data from
previous user interactions to learn about their preferences over time. This
would enable more accurate story generation based on their past choices,
such as preferred travel themes, destinations, and sub-POIs.
2. Augmented Reality (AR) Integration
• Live Travel Data: Future versions of the app could incorporate real-time
data, such as current weather conditions, local events, and updated points
of interest, to provide users with dynamic itineraries. For instance, users
could receive travel suggestions based on the current weather or season,
or they could be alerted about special events occurring at a given POI.
• AI-Powered Dynamic Itineraries: Incorporating real-time AI
algorithms that dynamically adjust the travel itinerary based on user
preferences, availability of sub-POIs, and external factors (such as
weather or traffic) could improve the experience and make the app more
practical for real-world use.
6. Expanding Liquid Galaxy Integration
References
• DeepMind. (2023). Gemini API. Google Cloud. Retrieved from: https://cloud.google.com/ai
• This API provided the natural language processing and story generation capabilities
for the app, generating personalized travel stories based on the input from the users.
[4] Bark AI
• OGC. (2008). KML 2.2 - Keyhole Markup Language Specification. Open
Geospatial Consortium. Retrieved from: https://www.opengeospatial.org/standards/kml
• KML was used to structure the location data that was transferred to the Liquid
Galaxy system for displaying interactive, 3D representations of sub-POIs.
• Smith, R., & Johnson, T. (2022). AI in Tourism: How AI is Shaping the Future of
Travel Recommendations. Tourism Technology Review, 15(4), 25-32.
• This paper explores the use of AI and machine learning algorithms to generate
personalized travel itineraries and recommendations based on user preferences, which
inspired the recommendation system in this project.
• Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot
Learners. Proceedings of NeurIPS 2020. Retrieved from: https://arxiv.org/abs/2005.14165
• This paper discusses the underlying technology of large language models (LLMs) like
Google DeepMind’s Gemini used for generating coherent and engaging stories based
on a given prompt.