Project Proposal - Three Archives
1. PROJECT DESCRIPTION
The Centre for Curating the Archive (CCA) of the University of Cape Town is
responsible for the collection, curation and digitisation of various collections. The CCA
makes these collections accessible to artists, scholars, students and other community
members by providing Web access, producing publications and hosting events and
exhibitions to showcase the materials conserved 1. Three of its collections comprise
artefacts and multimedia centred around distinct historical events in Cape Town:
the Sequins, Self and Struggle archive, a collection of multimedia objects from the
Miss Gay Western Cape and Spring Queen beauty pageants 2; the Harfield Village
collection, an aggregation of artefacts about the forced removals of the Claremont
residents; and Movie Snaps, a collection of photographs taken in and around central
Cape Town before and after apartheid 3.
The CCA has successfully digitised the items contained in these archives but
has stored the artefacts on local hard drives, so the information is inaccessible
to scholars, artists, researchers and community members outside of the CCA. The
Miss Gay Western Cape and Spring Queen archive has an existing online platform;
however, the information is not presented in a manner deemed usable by the CCA.
The importance of the solution lies in the need to digitally preserve the cultural
heritage presented in these archives, to encourage users to add information and
thereby grow the collections, and to increase the accessibility of the information
contained. Issues currently being experienced are the inaccessibility of the
information in the archives, the manual management of the archived material and the
lack of exposure of the archives.
The problem to be solved is the need for an online representation of the multimedia
files pertaining to the aforementioned archives. The solution is the development
and implementation of a digital cultural heritage archive to allow for the storage,
management and access of information representing the cultural heritage of minority
groups in Cape Town.
2. PROBLEM STATEMENT
The aim of the project is to provide the CCA with a digital cultural heritage archive
solution that will provide access to information that has been centrally stored and is
currently inaccessible to society members outside of the CCA.
1 http://www.cca.uct.ac.za/
2 http://sequins-self-and-struggle.com/
3 http://www.cca.uct.ac.za/projects/movie-snaps-capetown-remembers-differently/
The solution should also allow for the creation and management of additional digital
heritage archives by the CCA.
2.1. Requirements
The Three Archives project has a client, the CCA. Potential users of the system
are researchers, artists, scholars, historians and interested members of the general
public. Below is an outline of the most important requirements to be investigated and
implemented in the software solution to satisfy the needs of the CCA and the users of
the digital archive.
The solution is to allow for the access and exploration of archived heritage arte-
facts via search and browse functionality. The solution will also allow community
members the ability to contribute to the archive resulting in an archive rich in content.
The contribution will be through commenting on and captioning items, as well as
uploading multimedia content to the archive.
The system will allow the client to manage the archives by providing function-
ality to upload data to the archive, edit metadata accompanying the artefacts, and
approve any submissions made by users. Together with the management functionality,
the solution will provide information to the client about where in the world the digital
heritage archive is being accessed from in order to obtain knowledge about the global
reach of the archives.
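The approval of user submissions described above can be thought of as a small state machine. The following is a minimal sketch in Python for illustration only; it is not the project's implementation, and the names `Submission`, `Status`, `review` and `visible_items` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    """A user contribution awaiting review (hypothetical model)."""
    contributor: str
    kind: str  # e.g. "comment", "caption" or "upload"
    content: str
    status: Status = Status.PENDING


def review(submission: Submission, accept: bool) -> Submission:
    """Move a pending submission to APPROVED or REJECTED exactly once."""
    if submission.status is not Status.PENDING:
        raise ValueError("submission has already been reviewed")
    submission.status = Status.APPROVED if accept else Status.REJECTED
    return submission


def visible_items(submissions: list[Submission]) -> list[Submission]:
    """Only approved contributions are shown to visitors of the archive."""
    return [s for s in submissions if s.status is Status.APPROVED]
```

Keeping the status on the submission itself means the public interface only ever needs to filter on `APPROVED`, while the client's management view can list everything that is still `PENDING`.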
Over and above the requirements stated, functionality to be provided to the client
includes the ability to create new archives, post implementation, without the need to
implement an entirely new solution.
2.2. Project Scope
The scope of the project does not include the digitisation and preservation of digital
objects. The purpose of preservation is to protect digital objects for access by present
and future generations. The long term preservation of digital objects involves making
sustainable technological decisions for the implementation of the system. Preservation
will not be considered, as the core requirement of the project is to create services that
allow users to interact with the digital objects.
Users
Curator
Student/Researcher
General public interested in Cape Town's heritage
4.1.1. Interface Layer. The interface layer is responsible for linking users to the services
offered by the archive. Given that the three archives are distinct in the context of the
content they represent, three different interfaces will be implemented to allow for user
interaction with the system.
4.1.2. Service Layer. The service layer connects the front-end interface to the data
objects located in the back-end. The service layer contains the functionality that is
available to the user to retrieve the information from the repositories.
The table below lists and describes the services to be implemented in the service layer.
4.1.3. Repository layer (Back-end). This layer involves the storage and organisation of
the digital objects. The implementation of the repository layer will involve the use
of an archival tool. This is further discussed in the development platform section to
follow. The artefacts will be separately stored in a database. Throughout the duration
of the project the system will be designed in a manner that will allow for the creation
of new archives.
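The separation between the service layer and the repository layer, and the idea that a new archive should not require a new system, can be illustrated with a short sketch. It is written in Python purely for illustration and does not use Fedora; the class names `Artefact`, `Repository` and `ArchiveService` are assumptions, and an in-memory dictionary stands in for the archival tool and database.

```python
from dataclasses import dataclass, field


@dataclass
class Artefact:
    identifier: str
    title: str
    description: str = ""


@dataclass
class Repository:
    """Repository layer: stores artefacts grouped by archive name."""
    archives: dict[str, list[Artefact]] = field(default_factory=dict)

    def create_archive(self, name: str) -> None:
        # Adding an archive is a data change, not a new deployment.
        if name in self.archives:
            raise ValueError(f"archive {name!r} already exists")
        self.archives[name] = []

    def add(self, archive: str, artefact: Artefact) -> None:
        self.archives[archive].append(artefact)


class ArchiveService:
    """Service layer: the only route from an interface to the repository."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository

    def browse(self, archive: str) -> list[Artefact]:
        return list(self._repository.archives[archive])

    def search(self, archive: str, term: str) -> list[Artefact]:
        needle = term.lower()
        return [
            a for a in self._repository.archives[archive]
            if needle in a.title.lower() or needle in a.description.lower()
        ]
```

Under this arrangement each of the three interfaces would call the same `ArchiveService` with a different archive name, so a fourth archive only needs a `create_archive` call and its own interface.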
The repository layer will be implemented using the open source Fedora repository
tool. DSpace and Omeka are other digital object management tools which offer
additional services that Fedora does not. These tools will be investigated
to understand how they implement the services that are to be made available in the
Three Archives project. Fedora accommodates complex digital objects and allows for
customisation and flexibility. It provides basic search and browse services, which
are exposed as web services, and allows for the introduction of new services. Fedora
is implemented in the Java programming language and any additions will be made in
Java. The database used in Fedora is PostgreSQL 5, which will also be used for the
Three Archives project.
The implementation of the Three Archives’ services will make use of existing
third party software where applicable. Table II is a description of the implementation
strategies to be adopted for each service.
4 http://getbootstrap.com/2.3.2/
5 http://www.postgresql.org/about/
Roles and Responsibilities. Each team member will have a specific role with
specific duties pertaining to the project. Many of the roles will be shared by
all the team members, as indicated in Table III.
Given that the Three Archives project team is small and given the time constraints,
a specific Agile project management methodology will not be followed; rather, a
combination of Agile concepts will be applied throughout the implementation of
the Three Archives project. The principles to be adhered to are iterative development;
testing at each iteration throughout the project, the forms of which are discussed in
the evaluation section of this document; and constant client interaction to ensure
satisfaction with the product. Additionally, at the beginning of each iteration the
tasks will be well defined, and a difficulty and time estimate will be assigned
to each task and tracked during the development process. The team
will have daily stand-ups before beginning their tasks for the day to communicate
the progress of their tasks and what they intend to achieve that day.
Sprint reviews will be held at the end of every sprint to discuss the challenges faced
during the sprint and how things can be improved for the next. Features will be
developed in priority order. All required software engineering deliverables, such as
time sheets and meeting minutes, will be produced during the project and handed in
with the final deliverable.
Agile also lends itself to the development and specification of new requirements
throughout the development process. This suits the Three Archives project, as
the general idea of which services are required is present but the detail is still
unknown, and adopting Agile will allow for sufficient exploration. Each iteration will
follow the Agile development cycle of analysis, development, testing and evaluation.
This reduces risk, increases value by delivering some benefits early, and results in
more flexibility and better time management.
4.3.2. User Centered Design. The client and users will be closely involved through-
out the design process. The Three Archives project involves the design of three
distinct interfaces for the three collections being represented. The users’ involve-
ment throughout the process is necessary to ensure that the system is usable and
understandable and that all the services that will be implemented are well understood.
The user centered design cycle is an iterative process and involves an initial
evaluation phase, followed by design and then a prototyping phase. The evaluation phase
involves the understanding of the different users of the system and the tasks they
intend on completing using the system. This is coupled with an evaluation of how the
users are currently completing these tasks in order to understand how to improve
their experience. The evaluation process is followed by a design process where we will
design the system taking into consideration knowledge obtained from the evaluation
session. The design phase will be followed by prototyping, where a prototype of the
design will be implemented and presented to the users for evaluation. Different
levels of prototyping will be adopted throughout the design process dependent on
which prototype fidelity provides the best feedback from the users. Feedback from
the evaluation of the prototypes will be taken into consideration during the next
iteration of the design cycle where improvements will be discussed, designed and then
prototyped.
This iterative process will take place until the team is satisfied with the users'
interaction with the system and the users navigate and interact with the system
intuitively and effortlessly. The CCA will be contacted to find users.
4.3.3. Expected Challenges. We have yet to receive any data from our client and are
unaware of where this data is kept. We expect it to be challenging to acquire this data
from our client.
The client is very enthusiastic and often forgets that this is an honours project
for Computer Science. So a challenge we may have already run into is meeting our
client's expectations.
There are a number of services that we aim to provide that the archival tools do
not provide. We expect it to be challenging to offer the specific services the way we
intend. Using a variety of external tools raises the challenge of proper integration.
The archive includes sound and video media files in addition to images. We expect
it to be challenging to deal with different types of media files, and note that
some services may work for some media types and not others, and that we may need to
take extra precautions to allow for extra multimedia capabilities.
One of the biggest challenges we believe this project will face is providing a system
that will give users the ability to be creative, and designing the interface, features
and functionality in such a way that the user is aware of the capabilities.
4.4. Evaluation
We will use software engineering metrics to evaluate our system. The outcome will be
measured on both an application level and a project level.
On a project level, we will calculate the cost of the project as well as the time spent
on each task, whether it went overtime, whether some functionality had to be reduced
due to time constraints and whether there was over-allocation of time to a specific task.
We will then calculate the overall time and cost of the project.
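The project-level measures above amount to comparing estimated and actual time per task. A minimal sketch in Python follows, assuming time is recorded in hours; the `Task` structure and the function names are hypothetical, and cost is left out because the proposal does not state how it would be calculated.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    estimated_hours: float
    actual_hours: float


def overrun(task: Task) -> float:
    """Positive when the task ran over, negative when time was over-allocated."""
    return task.actual_hours - task.estimated_hours


def summarise(tasks: list[Task]) -> dict[str, object]:
    """Collect the project-level totals and the tasks that missed their estimate."""
    return {
        "total_estimated": sum(t.estimated_hours for t in tasks),
        "total_actual": sum(t.actual_hours for t in tasks),
        "overtime": [t.name for t in tasks if overrun(t) > 0],
        "over_allocated": [t.name for t in tasks if overrun(t) < 0],
    }
```

The time sheets kept during the project would provide the `actual_hours` values, and the estimates assigned at the start of each iteration would provide `estimated_hours`.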
We will be using user oriented, performance based and requirement based evaluation.
Using usability metrics such as learnability, recovery from errors and ease of use,
we will evaluate the system in terms of the speed and accuracy of results. We also want
to measure whether the end product met the client's requirements, keeping in mind that
this is still an honours project.
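Speed and accuracy in user tests are commonly summarised as completion rate, time on task and error count. The sketch below, in Python, shows one way such figures could be derived from observed test sessions; the `Trial` record and the choice of these three measures are assumptions made for illustration, not the evaluation protocol of the project.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    participant: str
    completed: bool
    seconds: float
    errors: int


def completion_rate(trials: list[Trial]) -> float:
    """Fraction of trials in which the participant finished the task."""
    return sum(t.completed for t in trials) / len(trials)


def mean_time_on_task(trials: list[Trial]) -> float:
    """Average time in seconds, counting only completed trials."""
    return mean(t.seconds for t in trials if t.completed)


def mean_errors(trials: list[Trial]) -> float:
    """Average number of errors per trial, completed or not."""
    return mean(t.errors for t in trials)
```

These functions expect at least one trial (and at least one completed trial for `mean_time_on_task`); an empty input raises an error rather than returning a misleading zero.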
Several tests will be done with typical users of the system. Each test will be
categorised by the user behaviour towards the system being tested, and the results will
be evaluated against our expected outcomes for specific inputs.
Other tests that will be conducted include unit tests, usability tests, acceptance
tests and integration tests.
5.1. Testing
Ethical clearance is to be obtained as user testing will be conducted. It is necessary
to obtain this timeously and before testing occurs in order to avoid any delays and
issues that may arise. The ethical clearance will be accompanied by a form for the
users to sign that ensures their confidentiality and anonymity in participation in the
tests. Users representative of the client from the CCA and community members will
be sourced for the testing of the system. Users will consent to the observation of their
actions throughout the testing session and will be asked to provide feedback after the
testing session.
5.2. Software
Software to be used in the development of the solution is all open source. The open
source software will be utilised according to the terms specified. These are the tools
that have been discussed in the Development Platform section above. Additionally, the
solution will use services provided by third party software such as Google Analytics
and will abide by the terms stipulated in the agreement of use of this software.
5.3. Data
The multimedia data to be presented on the digital cultural heritage archive solution
will be sourced from the CCA. The digital archive will stipulate via the terms of use
clause, to what capacity and in what manner the content of the archive can be used.
This information will be obtained from the CCA and will be presented to the users.
Users are under no obligation to use the material as stipulated; however, full
disclosure of the CCA's terms is the measure to be taken by the Three Archives project.
The developed tool will be the intellectual property of Nicole Petersen, Noosrat
Hossain, Noxolo Mthimkulu and the University of Cape Town.
6. RELATED WORK
This section outlines example digital archives that are related to the Three Archives
project. The architectural implementation decisions, services offered and content are
elements which make these archives related. Below is a brief discussion of what was
notable in each implementation and which factors about the collections will be consid-
ered during the development of the Three Archives solution.
6.1. Zamani Data Archive Project
The Zamani Data Archive Project 7 is a project that was completed in 2014 by
students from the University of Cape Town. The project involved the implementation
of a digital data archive for the Zamani Project8. The relevance of this project to the
Three Archives project is the back-end implementation. The project was implemented
using the Fedora9 framework as the repository layer. Fedora, as discussed above, is
an extensible digital content repository service providing services for the storage,
management and distribution of digital objects [Lagoze et al. 2006]. The Fedora
repository architecture focuses on the object model, which provides templates for data
objects and links to tools and services for managing these data objects [Staples et al.
2003].
The Zamani Data Archive project used Fedora to store their dataset. The dataset was
represented using the Fedora Object Extensible Mark-up Language (FOXML) which
is an expression of the Fedora Digital Object Model. This would be relevant to the way
in which the Three Archives project can store the data objects. Using Fedora digital
objects will ensure that the objects use the Dublin Core metadata standard which will
allow efficient object management. Zamani made use of the isMemberOf relationship
functionality provided by Fedora to assist in the grouping and association of the data
objects, as well as the SOLR platform to assist in the indexing, searching and browsing
functionality.
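To make the Dublin Core metadata and the isMemberOf relationship concrete, the following Python sketch builds the two XML fragments with the standard library. It is an illustration only: it does not produce a complete FOXML document, and the identifiers `demo:1` and `demo:collection` and the field values are invented placeholders rather than items from the archives.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for Dublin Core, RDF and Fedora's external relations.
DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL = "info:fedora/fedora-system:def/relations-external#"

for prefix, uri in (("dc", DC), ("oai_dc", OAI_DC), ("rdf", RDF), ("rel", REL)):
    ET.register_namespace(prefix, uri)


def dublin_core(fields: dict[str, str]) -> ET.Element:
    """Build a Dublin Core record with one element per field."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


def member_of(object_id: str, collection_id: str) -> ET.Element:
    """Build an RDF statement saying the object is a member of the collection."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description",
        {f"{{{RDF}}}about": f"info:fedora/{object_id}"},
    )
    ET.SubElement(
        description, f"{{{REL}}}isMemberOf",
        {f"{{{RDF}}}resource": f"info:fedora/{collection_id}"},
    )
    return root


if __name__ == "__main__":
    record = dublin_core({"title": "Example photograph", "type": "Image"})
    print(ET.tostring(record, encoding="unicode"))
    print(ET.tostring(member_of("demo:1", "demo:collection"), encoding="unicode"))
```

Grouping by relationship rather than by folder means an object could belong to an archive and to a curated exhibition at the same time by carrying two isMemberOf statements.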
7 http://pubs.cs.uct.ac.za/honsproj/cgi-bin/view/2014/benson ferguson.zip/
8 http://www.zamaniproject.org/
9 http://fedorarepository.org/
10 http://archive.nelsonmandela.org/home
user to conduct a side-by-side comparison of two items in the archive as per Figure 2.
The exhibition functionality provides a compilation of related items as decided by cu-
rators of the archive. The exhibitions are well captioned with descriptions in order to
guide the user when viewing the exhibition. An example of this can be seen in Figure
3. This image represents an exhibition of all items grouped under the Nelson Mandela
presidential years theme.
Fig. 2. Artefact comparison page from Nelson Mandela Cultural Heritage Archive
Fig. 3. 1994-1999 Nelson Mandela Presidential Years exhibition from Nelson Mandela Digital Cultural
Heritage Archive website. Here we see a compilation of images of interactions and diary entries relevant to
the exhibition title. 12
6.3. Europeana
Europeana13 is a digital library that provides access to multimedia material located in
digital libraries, museums and archives across Europe. The digital library, in addition
to allowing for search and browse functionality to explore the archive, provides the
users with virtual exhibitions that are themed dependent on what is selected by the
user. Along with this they provide an aggregation of the latest contributions of all the
different museums, libraries and archives that Europeana represents.
The archive houses text, images, video, sound and virtual 3D representations.
Europeana personalises the digital archive experience by allowing a user to save the
search that they have conducted, to add a tag to items and to store
the items for later viewing.
During the exploration of collections, users have the option to include content
which was contributed by other users. Europeana thus also provides a contribution
mechanism, which can be observed in its Europeana 1914-1918 collection14, as seen
in Figure 4. A contribution is made by signing up to the site, adding information
about the contribution, attaching a digital version of the object and submitting the
contribution. Europeana then reviews the story before it is accepted and published15.
Fig. 4. Europeana 1914-1918 project - collection of stories, films and historical material about the First
World War. Allows contribution
The elements discussed about the Nelson Mandela Cultural Heritage Archive, the
Europeana digital library and the Zamani data archive will be considered and
incorporated in the implementation of the Three Archives project.
13 http://www.europeana.eu/portal/
14 http://www.europeana1914-1918.eu/en/contributor
15 http://www.europeana1914-1918.eu/en/contributor
7. ANTICIPATED OUTCOMES
The identification of anticipated outcomes of the project will allow us to evaluate the
progress of the system. The outcomes include the implemented system functionality,
implementation challenges and success factors.
7.2. Challenges
Design challenges that may arise originate from the data that is currently available
and the tool selected for the foundation of the system. Missing metadata fields may
result in inconsistent data which will affect the results represented to the user. It is
therefore required that the data be curated. Another design challenge involving data
is the size of elements that need to be uploaded. The database must be designed to
handle large files while maintaining their quality.
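Curating the data for missing metadata can start with a simple completeness check before items are loaded. A minimal sketch in Python follows; the list of required fields is an assumption made for illustration and has not been agreed with the CCA, and the function names are hypothetical.

```python
# Assumed set of required fields; the real list would come from the CCA.
REQUIRED_FIELDS = ("title", "creator", "date", "description")


def missing_fields(record: dict[str, str], required=REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent or blank in a record."""
    return [name for name in required if not (record.get(name) or "").strip()]


def curation_report(records: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map the identifier of each incomplete record to its missing fields."""
    report = {}
    for identifier, record in records.items():
        gaps = missing_fields(record)
        if gaps:
            report[identifier] = gaps
    return report
```

A report of this kind could be handed to the curators so that gaps are filled at the source rather than papered over in the interface.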
The restrictive nature of repository tools available may result in the inability to
customise services based on the requirements of the Three Archives project.
Additional difficulties may arise from the integration of the tools to be used,
as the implementation of the project relies on various existing tools. Beyond
integration, difficulties are expected during the implementation of the actual
services of the archive. These include the presentation of an archive to a creative
audience, the integration of sensible workflows that allow users to use services such
as exhibitions and annotations, the interpretation of the statistics and history of
the archives in order to provide a personalised user experience, and the
implementation of a fluid browsing interface.
8. PROJECT PLAN
The following section describes the life cycle of the project. The description involves an
identification of potential risks, resources required, deliverables, milestones and work
allocation.
8.1. Risks Matrix
The scope of the project changes last minute | Team members will not have time to implement the changes | Regular meetings will be conducted with the project supervisor and client. | Monitor the project scope | Change the schedule if the scope is changed

Development
Difficulties in the development of a major task. | The development of other tasks which depend on that task will be delayed | Ensure tasks are assigned effectively and take into account task dependencies when making this decision | Ensure that the tasks are being carried out on schedule by having weekly progress meetings | Adjust the project scope or adjust the project schedule.

Stakeholder
Stakeholders such as the project supervisor or client become unavailable | Schedule delays may arise if the project cannot continue without consulting the respective stakeholder. | Agree on days when the stakeholders will be available. | No more than 2 consecutive meetings should be missed with stakeholders due to unavailability | Discuss the issue with the project client/supervisor to either adjust time or services offered

Communication
Lack of communication between team members | Project tasks might overlap while others might be overlooked. This may lead to confusion and conflict between team members resulting in the failure to meet the needs of the project. | Team members will meet regularly. Make use of Google Drive to ensure all resources are accessible to team members | Team members must contact and update each other at least once every 36 hours. | A meeting must be held to discuss the current communication methods and make suggestions on adopting new ones
Lack of communication with stakeholders | The needs of the client may not be met. | Ensure that there are communication links between stakeholders and project team at the early stage of the project. Agree on the preferred method of communication | Be able to communicate with stakeholders at least once a week. | Discuss the issue with the project client to either adjust time or services offered

Resources
Unable to obtain users that reflect actual real world users of the system for testing. | The results of the user tests will be unreliable | Ensure users are available to test the system when the testing phase is scheduled | Ensure users will be available for testing a week before the testing session is scheduled | Find alternative users to test the system with
Delays in receiving data from the client | The solution will not be able to be tested due to the database having insufficient/missing information. | Design database in such a way that it only requires information that is available and accessible. | | Create dummy data to populate the database.
The tool selected to develop the system is too restrictive | The system will not meet the needs of the user or the client. | Ensure that the phase of exploring the various tools is done effectively. | Have progress meetings with the team to discuss any difficulties arising from the tool. | Adjust the services to be offered or select another tool if there is sufficient time
Team members dropping the course or being sick | Scheduled time will be lost resulting in the project being late. The work of other team members could also be delayed if they depend on the missing team member. | Ensure work is allocated in such a way that it reduces task dependencies between team members. | Have regular team meetings to check whether a team member is delaying another team member's work. | Tasks can be reassigned in the team if this were to happen.
Table IV: Risks Matrix
The content required to populate the database is information from the three
archives previously mentioned.
The users that are required to test the system include researchers/scholars,
curators and the general public interested in Cape Town's cultural heritage.
Three tools will be considered to provide the foundation of the system. Possible
tools include Fedora, DSpace and Omeka.
8.4. Deliverables
— Initial Feasibility Demonstration
— First Implementation/Experiment/Performance Test + Writeup
— Final Prototype/Experiment/Performance Test + Writeup
— Chapters on Implementation and Testing
— Final Complete Draft of Report
— Project Report Final Submission
— Project Poster
— Poster
— Website
— Reflection Paper
8.5. Milestones
8.5.1. Project milestones
During the first iteration all team members are expected to explore different
repository tools that will form the foundation of the digital archive. Thereafter, each
team member will implement their respective services. During the third iteration no
implementation of new services is expected to take place, as this iteration improves
the functionality implemented during the first two iterations.
3 Refine services
REFERENCES
Carl Lagoze, Sandy Payette, Edwin Shin, and Chris Wilper. 2006. Fedora: an architecture for complex objects
and their relationships. International Journal on Digital Libraries 6, 2 (2006), 124–138.
Thornton Staples, Ross Wayland, and Sandra Payette. 2003. The Fedora Project. D-Lib Magazine 9, 4 (2003),
1082–9873.
APPENDIX A