
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056

Volume: 03 Issue: 04 | Apr-2016 www.irjet.net p-ISSN: 2395-0072

System and Process for Data Transformation and Migration from Libsys to Koha

Mukesh Pund¹, Parul Jain²

¹Principal Scientist, IT Division, CSIR-NISCAIR, 14, Satsang Vihar Marg, New Delhi - 110067, INDIA
²Senior Project Fellow, IT Division, CSIR-NISCAIR, 14, Satsang Vihar Marg, New Delhi - 110067, INDIA

Abstract- This paper explains the transformation and migration process from Libsys to Koha, an open source library management software. Open source is a development methodology that offers practical access to a product's source code. Being open source, Koha is cost effective (freely available) and customizable to one's requirements, unlike Libsys. Koha is thus an economical alternative to reliance on the commercially supplied Libsys. To migrate from Libsys to Koha, the source data is transformed into the target format. The paper discusses the steps taken to accomplish this task and the benefits of Koha over Libsys.

Keywords- open source, library management, Linux, MarcEdit, MySQL, transformation, migration, Z39.50 protocol, MARC 21, Libsys, Koha

1. INTRODUCTION

Data migration is an emerging field because, as technology advances, the need grows to exploit newer technologies instead of older ones. Newer systems offer advanced features compared to existing systems, so migration from an existing system to a new one is the need of the hour. Data migration is the process of transferring data from one system to another and is divided into two processes: (a) extracting data from an existing system into an extract file and (b) loading data from the extract file into the new application. The new application usually requires data in a different format, so the data must be transformed for the migration to succeed. Data transformation is the process of converting data from one format to another and is a mandatory step in data migration, as the architecture of the target system may differ from that of the source system [1]. In this paper, we discuss the transformation and migration process from Libsys to Koha. Libsys is a proprietary software product that aims to provide a convenient and pleasing library experience through its value-added features [12]. Koha, on the other hand, is an open source library management software. The use of open source software (OSS) is becoming very popular in digital libraries across the globe. According to a survey, satisfaction ratings of Koha's performance on some aspects were "good", and it was found to be value for money. The use of OSS has greatly lowered the initial cost of setting up libraries and improves flexibility in the delivery of services. This is why a number of researchers and librarians are interested in, and continuously working on, the implementation of OSS in digital libraries [2].

2. WHAT IS KOHA?

Koha is the world's first free and open source library management software and is being implemented in digital libraries. By open source software we mean that the source code is freely available and can be modified, customized or redistributed according to a person's requirements. As technology advances, the need arises for a compliant replacement of the existing library system that gives users the ability to receive free software and to customize and redistribute it for the benefit of the whole community. The library system should also be advanced enough to meet present-day needs. So, in 1999, Katipo Communications proposed a new system, Koha (the Maori word for "gift" or "donation"), which was the first open source Integrated Library Automation Package (ILAP) built with open-source tools to be released under the General Public License (GPL). It was installed at the Horowhenua Library Trust (HLT) in New Zealand in 2000.

2.1 Technical Features:

• The current version is Koha 3.22. It runs on different platforms, including Linux, Mac OS X, FreeBSD, Solaris, and Windows.[3]
• Developed on the Linux OS, Koha is written in Perl, uses the Apache web server, and supports multiple RDBMSs such as MySQL and PostgreSQL.[3]

© 2016, IRJET ISO 9001:2008 Certified Journal Page 690



• The Online Public Access Catalog (OPAC) interface is in CSS with XHTML. It supports all major library standards such as MARC record import/export (MARC 21), Z39.50 and SRU/W.
• Records are stored internally in an SGML-like format and can be retrieved in MARCXML, Dublin Core, OAI-DC, and EndNote; the OPAC can be used by citation tools such as Zotero.[3]

2.2 Key Features:

• Full-featured ILS: Koha is a true enterprise-class ILS with comprehensive functionality, including basic and advanced features for customizing the software to a person's requirements. Koha works for consortia of all sizes and for multi-branch and single-branch libraries.
• Multilingual and translatable: Koha is available in a large number of languages, with ongoing enhancement and translation.
• Full text searching: Koha supports powerful searching and an enhanced catalogue display that can fetch data from Amazon, Google, etc. It uses the Zebra search engine, with a Z39.50 server and client, to enhance searching and data interchange and to import data from the Library of Congress.
• Web-based Interfaces: Koha's OPAC is based on worldwide technologies (XHTML, CSS, JavaScript, etc.), making it a platform-independent solution.
• Attach Files to Records: Koha's new feature for attaching files to records provides the functionality to upload documents in text, PDF or image format along with metadata.
• No Vendor Lock-in: This is an important aspect of Koha. Libraries can freely install it themselves if they have the in-house expertise, purchase support or development services from the best available sources, or change support company at any time if the service is found unsatisfactory.
• New Templates: Koha's spine labels, barcode labels, and staff and patron interfaces are developed with a template system that is easy to theme. Default templates composed of 100% valid XHTML and CSS are also provided and can be customized.
• Item Types: As the name suggests, Koha supports various types of items and provides the functionality to create them so as to present an attractive front end to users. It can also be used to manage inventory such as cameras, computers, etc.
• User Management: Koha manages users by integrating with systems such as Lightweight Directory Access Protocol (LDAP), RADIUS and Central Authentication Service (CAS) to allow single sign-on.

2.3 Koha Modules:

Koha includes various modules that support its users and extend its functionality. They include:

• ACQUISITION: Koha's acquisition module holds suggestions, budgets, invoices, funds and currencies.
• ADMINISTRATION: This module lets users change global system preferences and other parameters to provide better customizability.
• CIRCULATION: Koha includes a fully featured circulation module with circulation rules that can be customized to meet the needs of the user. It covers checking books in and out and also offers offline circulation.
• CATALOGING: Koha provides cataloguing features that enable users to search migrated data for both books and serials, amend existing records, add a new record in any framework (default or user-created) and fetch records from external sources if required.

[Figure: SEARCH CATALOG. If the record is FOUND (YES): EDIT RECORD, EDIT ITEM, COPY RECORD, DELETE RECORD. If NO (RECORD NOT EXIST): SEARCH FROM EXTERNAL SOURCES; if YES, IMPORT FROM LIBRARY OF CONGRESS OR OTHER LIBRARIES; if NO, ORIGINAL CATALOGUING.]

Fig -1: Cataloguing Flowchart

• PATRONS: This module is used to create patrons, who use the circulation module.


• REPORTS: This module lets users query the data stored in the database and generate various reports accordingly.
• SERIALS: The Serials module in Koha is used for keeping track of journals, newspapers and other items that come on a regular schedule.[11]
• TOOLS: Tools in Koha perform actions such as notices, slips, patron cards, batch item management, record import and management, calendar, task scheduler, etc.

3. SYSTEM ARCHITECTURE WE FOLLOWED:

[Figure labels: Web Browser (Library Staff), Web Browser (Internet Patrons), Web Browser and Z39.50 Server (Other Libraries), MARC Record Repository, Network, Koha Machine with HTTP (Apache), Z39.50 Client tool and MySQL Database, MarcEdit Tool, Koha Equipped Library]

Fig -2: Koha System Architecture

4. WHY KOHA?

S.NO. | CHARACTERISTICS | LIBSYS | KOHA
1. | Nature of developing organization | Commercial | Open source, i.e. free of cost
2. | Ownership | Libsys | Katipo Communications
3. | License | Commercial | Under GPL (General Public License)
4. | Price | In lakhs of rupees | Freely available, with free support
5. | Customization | Libsys charges users to provide customized solutions [12] | Source code is freely available for innovation, so new features can be provided at the user's end. New versions are added freely.
6. | Training manual | No system manual is provided to users except the user manual obtained with the AMC [4] | Yes, the manual includes everything for user convenience [4]
7. | Database | Can be used with SQL Server, Oracle or MySQL as a backend RDBMS with ODBC compatibility [4] | MySQL; dual database design (text based and RDBMS). Scalable enough to meet the transaction load of a library. [4]
8. | Support | Costly, on the basis of an AMC (annual maintenance contract), usually 10 to 20% of total costs [4] | Online support and discussion forums free of cost. No dedicated support staff for this purpose. Open and constant dialogue with developers. [4]
9. | Vendor lock-in | Restrictions: support can be requested only from a particular vendor | No restrictions and no fixed-term contracts on changing support
10. | Addition of new features/new version | Extra cost is charged to upgrade to a new version or add new features [4] | New versions come out very frequently and are added for free [4]
11. | Web server | Only Apache and IIS | Apache, IIS and others [4]


5. DATA TRANSFORMATION

Data transformation is a necessary step in data migration because the target format may have a system architecture that differs from the previous one. It includes data collection, combination, filtration, reformatting and so on. An efficient and effective method is needed so as to improve the quality of the data. One of the solutions we used for transformation of data is as follows:

5.1 Data In Source Format

The data we have in Libsys is in the form of text files. We have multiple files, with accession number as a mandatory field along with other fields:

File 1:
a. Accession number (barcode) – used to uniquely identify a book
b. Title of the book
c. Publisher name and place of publication, separated by a delimiter for identification

File 2:
a. Accession number
b. Volume of book and year of publication, separated by another delimiter for identification
c. Author of book

File 3:
a. Accession number
b. Classification number
c. Pages and edition of book, separated by yet another delimiter, and so on.

The data also contains various blank lines and unwanted content after every record; one record may be split across different lines, and one record may be repeated twice. So it is required to remove all such flaws, like duplication, and consolidate the different data into the desired format. The following snapshot gives a clearer view of the source data:

Fig -3: Source Data Format

Here we have multiple files in the same format as above. The first target is to bring the data into a format that is legible and easy to understand. The data in the various files must also be accumulated into a single file for migration.

So the question that arises is HOW TO TRANSFORM? One of the solutions we worked out to accomplish the task:

5.2 Transformation

The transformation process should be simple and effective. Each received file is sorted separately and the files are merged afterwards. The steps followed to sort out the data:

1. Bring the source data into MS-Excel for further processing. Here we have chosen Microsoft Visual Basic (VBA in MS-Excel) to process the data.

a. Go to the received file and open it with Microsoft Office Excel. Now we have different fields in different columns, like accession number in the first column, title in another, and so on.


We may also have multiple fields in the same column, so we can use MS-Excel functionality as well as VBA code to process our data.

b. At first we use the “Text to Columns” functionality in the Data tab of Excel. We have two options here: fixed width and delimited.
Fixed width: It is used when we have two fields in one column separated by a fixed width. We select that column, use the fixed width option, separate the two fields at a certain width and then click “OK”. This separates the two fields into two columns as desired.
Delimited: It is used when we have two or more fields in one column separated by some delimiter, say a comma or semicolon. Here too we select that column, use the delimited option and specify the delimiter. Excel gives a preview of the fields in different columns; clicking “OK” gives the required output.

c. But there are also some flaws. Suppose the source data contains the fields title of book and publisher name, separated by a delimiter, say a comma. The title may itself contain a comma. If we apply the delimited method here, it splits at every position where it encounters a comma, so part of the title is also separated into multiple columns along with the publisher name. In that case, delimited is not an effective way to separate the fields. Here, programming in VBA in MS-Excel helps to get the desired output.

d. Press Alt + F11 to open the window where the programming is done. The programs created for solving such problems are called macros.

The following figure explains the procedure:

[Figure boxes: Received text file; Remove unwanted stuff and blank lines, bring records spread over multiple lines into one line; Open received file in Microsoft Excel; Process source data either by fixed width or delimited or by applying VBA code; Final processed file]

Fig -4: Data Transformation Flowchart

2. The data in the snapshot (Fig. 3) is now used to explain the procedure for carrying out the task:

a. Open the files with Microsoft Office Excel.

b. Remove the unwanted content with the following algorithm:

Algorithm for removing errors

Step 1: Start
Step 2: Declare variables iRow, LastRow.
Step 3: Initialize variables
    iRow = 1
    LastRow = ActiveSheet.UsedRange.Rows.Count
Step 4: Repeat the steps while iRow <= LastRow
    4.1 If the cell in row iRow, column 1 contains the text “NISCAIR” or “Date :” or “Accn” or “---”, then
        delete row iRow and set LastRow ← LastRow - 1
    4.2 Otherwise iRow ← iRow + 1
Step 5: Stop

c. Now all errors are removed. Then select the first column and, using the fixed width option of the Text to Columns functionality, separate the accession number and title into different fields.
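Steps b and c can also be sketched outside Excel. The following Python sketch is a minimal illustration, not the authors' VBA macro: it drops the rows that contain the marker texts listed in the algorithm and then splits each remaining line at a fixed width. The sample lines and the width of 8 characters are hypothetical.

```python
# Marker texts taken from the algorithm above; rows containing them are noise.
NOISE_MARKERS = ("NISCAIR", "Date :", "Accn", "---")


def remove_noise_rows(lines):
    """Return only the lines that contain none of the noise markers."""
    return [ln for ln in lines if not any(m in ln for m in NOISE_MARKERS)]


def split_fixed_width(line, width):
    """Split one line into (accession number, title) at a fixed column width."""
    return line[:width].strip(), line[width:].strip()


# Hypothetical sample resembling a text export with report headers.
sample = [
    "NISCAIR Library",
    "Date : 01/01/2016",
    "Accn    Title",
    "--------------------",
    "10001   Data structures, algorithms and applications",
    "10002   Library automation",
]

rows = [split_fixed_width(ln, 8) for ln in remove_noise_rows(sample)]
for accession, title in rows:
    print(accession, "|", title)
```

Building a new list instead of deleting rows in place avoids the usual pitfall of skipping the row that moves up after a deletion.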


It is also required to remove all the blank lines, so we created another macro for this task:

Algorithm for removing blanks

Step 1, Step 2, Step 3 and Step 5 are the same as in the above algorithm
Step 4: Repeat the steps while iRow <= LastRow
    4.1 If the cells in row iRow, columns 1 to 5 are all blank, then
        delete row iRow and set LastRow ← LastRow - 1
    4.2 Else if the cell in row iRow, column 1 is not blank but columns 2 to 5 are blank, then
        delete row iRow and set LastRow ← LastRow - 1
    4.3 Otherwise iRow ← iRow + 1

d. All blank lines are now removed. Some titles are divided across multiple lines, so they must be brought into a single line. For this we created another macro.

Algorithm for merging multi row records

Step 1, Step 2, Step 3 and Step 5 are the same as in the above algorithm
Step 4: Repeat the steps while iRow < LastRow
    4.1 If column 1 of row (iRow + 1) is blank, then
        append the data in row (iRow + 1), column 2 to the data in row iRow, column 2, and likewise for column 3;
        delete row (iRow + 1) and set LastRow ← LastRow - 1
    4.2 Otherwise iRow ← iRow + 1

e. Now we have all the titles in one line. The data also contains repeated records: if a title is repeated in the next row, then instead of writing the title again, ” is written in the next row to signify that the title repeats. To resolve this, we created another macro:

Algorithm for same records

Step 1, Step 2, Step 3 and Step 5 are the same as in the above algorithm
Step 4: Repeat the steps while iRow <= LastRow
    4.1 If the cell in row iRow, column 2 contains a double quote (”) as the symbol of repetition, then
        copy the data in row (iRow - 1), column 2 into row iRow, column 2
    4.2 iRow ← iRow + 1

f. Some of the data is sorted now, but a column with multiple fields separated by delimiters is not yet sorted. In this data, 'year' is separated by a comma (,), 'publisher' by a colon (:) and 'place' by a double hyphen (--). So a macro is required to separate them.

Algorithm for using delimiter to separate using macro

Step 1: Start
Step 2: Declare variables iRow, LastRow, pos, str, le.
Step 3: Initialize variables
    iRow = 1
    LastRow = ActiveSheet.UsedRange.Rows.Count
Step 4: Repeat the steps while iRow <= LastRow
    4.1 str = data in the cell in row iRow, column 2
        le = length of str
        pos = position of the first comma in str, searching from right to left (0 if there is none)
    4.2 If pos = 0, then
            the cell in row iRow, column 3 is left blank
            the cell in row iRow, column 4 is set to str
        Else
            the cell in row iRow, column 3 is set to the part of str to the right of the comma
            the cell in row iRow, column 4 is set to the part of str to the left of the comma
    4.3 iRow ← iRow + 1
Step 5: Stop

g. Similarly, publisher and place are also separated by changing the delimiter sign and the column number where the separated field needs to be placed.
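Steps c to g amount to a small cleaning pipeline. As a rough illustration only (the authors used VBA macros; the function names, the three-column row layout and the sample rows below are assumptions), the same logic can be sketched in Python:

```python
# Straight and curly double quotes, used in the source data as "same as above".
DITTO_MARKS = {'"', "\u201d"}


def drop_blank_rows(rows):
    """Remove rows that carry no data beyond the first column."""
    return [r for r in rows if any(cell.strip() for cell in r[1:])]


def merge_continuation_rows(rows):
    """Fold a row whose first column is blank into the record above it."""
    merged = []
    for row in rows:
        if merged and not row[0].strip():
            previous = merged[-1]
            for col in range(1, len(row)):
                previous[col] = (previous[col] + " " + row[col]).strip()
        else:
            merged.append(list(row))
    return merged


def expand_ditto_marks(rows, col=1):
    """Replace a ditto mark in column `col` with the value from the previous row."""
    for i in range(1, len(rows)):
        if rows[i][col].strip() in DITTO_MARKS:
            rows[i][col] = rows[i - 1][col]
    return rows


def split_from_right(text, delimiter=","):
    """Split at the LAST delimiter so that delimiters inside the title survive."""
    left, found, right = text.rpartition(delimiter)
    if not found:
        return text.strip(), ""
    return left.strip(), right.strip()


# Hypothetical rows: accession number, title, year.
rows = [
    ["10001", "Data structures, algorithms", "2013"],
    ["", "", ""],
    ["", "and applications", ""],
    ["10002", '"', "2014"],
    ["10003", "", ""],
]

cleaned = expand_ditto_marks(merge_continuation_rows(drop_blank_rows(rows)))
print(cleaned)
print(split_from_right("Data structures, algorithms and applications, 2013"))
```

Splitting at the last delimiter is what lets a title that itself contains commas stay intact; for publisher and place, the same helper could be called with ":" or "--" as the delimiter.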


h. We are using the MARC 21 library standard, which includes various MARC tags to which the fields are mapped in order to migrate the data into Koha. One of these tags is tag 008. This tag is used to indicate in Koha whether the title is a handbook, dictionary, encyclopedia and so on. To create this tag, a macro is written as follows:

Algorithm for marc tag 008

Step 1: Start
Step 2: Declare variables iRow, LastRow, pos, str, le.
Step 3: Initialize variables
    iRow = 1
    LastRow = ActiveSheet.UsedRange.Rows.Count
Step 4: Repeat the steps while iRow <= LastRow
    4.1 str = data in the cell in row iRow, column 2
        Set the cell in row iRow, column 3 to
        "131209s2013\\\\xx\\\\\\\\\\\\000\0\eng\d"
    4.2 If str contains the text “handbook”
        change character position 24 to “f” in the above string in column 3
    4.3 If str contains the text “encyclopedia” or “encyclopaedia”
        change character position 25 to “e” in the above string in column 3
    4.4 If str contains the text “BS”
        change character position 26 to “e” in the above string in column 3
    4.5 If str contains the text “Proceedings”
        change character position 29 to “1” in the above string in column 3
    4.6 iRow ← iRow + 1
Step 5: Stop

5.3 Merging Files

All files are processed separately by following the above procedure. They must then be merged into a single file. To carry out this task, the VLOOKUP formula is applied in the Excel sheet.

FORMULA:
VLOOKUP(lookup_value, table_array, col_index_num, range_lookup)

Lookup_value: The value to search for in the first column of the table array. Lookup_value can be a value or a reference. If lookup_value is smaller than the smallest value in the first column of table_array, VLOOKUP returns the #N/A error value.
Table_array: Two or more columns of data. Use a reference to a range or a range name. The values in the first column of table_array are the values searched by lookup_value. These values can be text, numbers, or logical values.
Col_index_num: The column number in table_array from which the matching value must be returned. A col_index_num of 1 returns the value in the first column in table_array; a col_index_num of 2 returns the value in the second column in table_array, and so on. If col_index_num is:
• Less than 1, VLOOKUP returns the #VALUE! error value.
• Greater than the number of columns in table_array, VLOOKUP returns the #REF! error value.
Range_lookup: A logical value that specifies whether you want VLOOKUP to find an exact match or an approximate match:
• If TRUE or omitted, an exact or approximate match is returned. If an exact match is not found, the next largest value that is less than lookup_value is returned. Values in table_array must be sorted.[13]
• If FALSE, VLOOKUP will only find an exact match. In this case, the values in the first column of table_array do not need to be sorted. If an exact match is not found, the error value #N/A is returned.
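For readers who prefer to script the merge, VLOOKUP with range_lookup set to FALSE behaves like an exact-match lookup keyed on the first column. The sketch below is an illustration under stated assumptions, not the authors' Excel workbook: the three tables, their column layout and the sample values are hypothetical, and a missing key yields an empty string where Excel would show #N/A.

```python
def vlookup_exact(lookup_value, table, col_index):
    """Mimic VLOOKUP(lookup_value, table, col_index, FALSE) on a list of rows.

    col_index is 1-based, as in Excel; returns None when no row matches.
    """
    if col_index < 1:
        raise ValueError("col_index must be at least 1")
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None


# Hypothetical processed files, each keyed on accession number in column 1.
file1 = [["10001", "Library automation", "Publisher A"],
         ["10002", "Open source ILS", "Publisher B"]]
file2 = [["10002", "2", "2014"], ["10001", "1", "2013"]]  # volume, year
file3 = [["10001", "025.04"]]                              # classification number

merged = []
for accession, title, publisher in file1:
    merged.append({
        "accession": accession,
        "title": title,
        "publisher": publisher,
        "year": vlookup_exact(accession, file2, 3) or "",
        "class_no": vlookup_exact(accession, file3, 2) or "",
    })

for record in merged:
    print(record)
```

Because the match is exact, the rows of the lookup tables do not need to be sorted, which mirrors the FALSE case described above.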


The following procedure we have used:

a. Open the processed Excel sheets and move them one by one to a single sheet: right click on sheet name → select move or copy → select the sheet where you want to move.
b. Select table_array in each sheet.
c. Apply VLOOKUP in the single sheet where all the data needs to be merged.

Fig -5: Move or copy one sheet to another

Fig -6: VLookUp Formula

Now we have the complete data, sorted into multiple fields and merged in a single file. The following snapshot shows it:

Fig -7: Target Data Format

6. DATA MAPPING

The fields in the final Excel sheet are mapped to MARC tags. Before moving ahead, let us explain WHAT MARC is and WHY it is required.

Machine-Readable Cataloguing (MARC) was conceived in 1966 as a method of converting the data on Library of Congress cards to machine readable form in order to print bibliographic products. At the turn of the new millennium it became an international standard communication format, and the newest version has appropriately been renamed MARC 21. [5]

Now the question arises: WHY is there a need for MARC 21?

There is a tendency to move towards MARC 21 because of the need to exchange bibliographic data within the framework of the world library network, which is based on the MARC 21 format. The reasons are:
• Standardization: Standardization in the exchange formats and structure of a database is essential to facilitate efficient and effective exchange of data between libraries. The adoption of different standards creates incompatibility in exchanging data, which acts as a major barrier to the use of bibliographic and related information. Format compatibility is necessary for computerized cataloguing data, and these formats are being standardized by the ISO. The MARC 21 format is one of the popular standard exchange formats; it adheres to the ISO 2709 standard and is used by the majority of countries in the world for exchanging data in machine readable form. [6]
• Other standards under development: Other standards for encoding digital information in machine readable form, such as Dublin Core and Extensible Markup Language (XML), are still under development.[5]
• Carries information: It carries a lot of information in a standard, easy-to-process, clearly designated sequence of bytes.[5]

Now the fields in the final sheet are mapped to MARC 21 tags. Each tag range is listed below with what it represents. Examples include:

0XX Control information, numbers, codes
1XX Main entry
2XX Titles, edition, imprint
3XX Physical description etc.
4XX Series statements
5XX Notes
6XX Subject added entries
7XX Added entries other than subjects
8XX Series added entries
9XX Items table information like barcode, etc.[9]

In MARC 21 tags, the notation XX is often used to refer to a group of related tags. For example, 1XX refers to all the tags in the 100s: 100, 110, 130 and so on.

We have mapped the fields to the corresponding MARC 21 tags. To carry out this task, we used the MarcEdit tool, a metadata processing tool that provides a simple way to convert Excel sheets to MARC files: MARC text files (.mrk) and machine readable cataloguing files (.mrc), which are required to migrate data into Koha.

Fig -8: MarcEdit Tool
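To make the mapping concrete, the sketch below renders one spreadsheet row as MarcEdit mnemonic (.mrk) text lines. It is illustrative only: the choice of tags (100 for author, 245 for title, 260 for imprint, 952$p for the item barcode), the indicator values and the sample record are assumptions rather than the authors' actual mapping, the leader line is omitted for brevity, and the keyword match is made case-insensitive. The 008 template is the one given in the algorithm for tag 008, where a backslash stands for a blank.

```python
# 008 template from the tag 008 algorithm; "\" represents a blank in .mrk text.
TEMPLATE_008 = "131209s2013" + "\\" * 4 + "xx" + "\\" * 12 + "000" + "\\0\\eng\\d"

# (keyword, 0-based character position in 008, code), as listed in the algorithm.
NATURE_RULES = [
    ("handbook", 24, "f"),
    ("encyclopedia", 25, "e"),
    ("encyclopaedia", 25, "e"),
    ("proceedings", 29, "1"),
]


def build_008(title):
    """Set the content codes in the 008 template according to the title text."""
    chars = list(TEMPLATE_008)
    lowered = title.lower()
    for keyword, position, code in NATURE_RULES:
        if keyword in lowered:
            chars[position] = code
    return "".join(chars)


def to_mrk(record):
    """Render one record (a dict of spreadsheet fields) as .mrk text lines."""
    lines = [
        "=008  " + build_008(record["title"]),
        "=100  1\\$a" + record["author"],
        "=245  10$a" + record["title"],
        "=260  \\\\$a" + record["place"] + "$b" + record["publisher"]
        + "$c" + record["year"],
        "=952  \\\\$p" + record["accession"],  # assumed item-level barcode field
    ]
    return "\n".join(lines)


# Hypothetical record taken from the merged sheet.
record = {
    "accession": "10001",
    "title": "Handbook of library automation",
    "author": "Author, A.",
    "place": "New Delhi",
    "publisher": "Publisher A",
    "year": "2013",
}
print(to_mrk(record))
```

In practice MarcEdit's delimited text translator performs this step through its mapping dialog; the sketch only shows what kind of text a mapped row turns into.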

6.1 Excel → .mrk

a. Use delimited and click on “NEXT” to accept the Excel sheet as the input file and .mrk as the output file. The following snapshot shows this:

Fig -9: Convert Excel Sheet into .mrk file

b. Map the fields to MARC 21 tags, join similar items, and click on Finish.


Fig -10: Mapping with marc tags

Fig -11: Format of .mrk File

6.2 .mrk → .mrc

a. Select MARC Tools (Fig. 12)
b. Make the .mrk file the input and .mrc the output, use MarcMaker and execute (Fig. 13)
c. Format of .mrc file (Fig. 14)

Fig -12: Select Marc Tools

Fig -13: Use Marcmaker

Fig -14: Format of .mrc file

7. DATA MIGRATION

Data migration is the process of transferring data from one system to another. It is an important step and a critical process that directly influences the quality of data management. Data migration affects the quality of the data, including accuracy, data elements, data accessibility and overall data performance, so it is important that migration does not hamper data quality. The steps we followed to accomplish the data migration process include:

7.1 Upload .mrc File

Upload the .mrc file created by the MarcEdit tool:

• Go to Koha Home → Tools → Stage MARC Records for Import
• Browse and upload the .mrc file created

7.2 Import Batch Into Catalog

• Go to Koha Home → Tools → Staged MARC record management
• Manage Staged Records
• Select framework
• Import batch into catalog
7.3 Rebuild Zebra

Zebra is one of the frequently used search engines. It is used for indexing structured documents (such as e-mail, XML and MARC records) and for retrieving documents using the Z39.50 protocol and SRW/U.[7] Records can also be imported from the Library of Congress through Z39.50. The following Linux command rebuilds the Zebra index so that all records stored in the MySQL database are indexed and searchable:

[koha@localhost]# perl -I /usr/share/koha/lib/ /usr/share/koha/bin/migration_tools/rebuild_zebra.pl -r -b -v -a
{-b indexes bibliographic records, -a indexes authority records, -r resets the index before rebuilding, -v gives verbose output}

The following Data Flow Diagram explains the complete procedure from Data Transformation to Data Migration:

Fig -15: Data Flow Diagram

8. DATA VALIDATION

Data validation is the process of ensuring data quality. Data migration is a critical process that directly influences the quality of data management. Accuracy is the fundamental dimension of data quality: if the data are wrong, the other dimensions matter little [8].
The following figure shows the common data quality problems faced during data migration:

[Figure: DATA QUALITY PROBLEMS. Database (schema) level: lack of integrity constraints, poor schema design, uniqueness, referential integrity. Data entry level: misspellings, redundancy/duplicates, contradictory values.]

Fig -16: Data Validation

At the database level, MySQL is used as the database, and the following commands are run to ensure data quality:


[root@localhost]# mysql -u root -p koha
{koha is the name of the database}

mysql> SELECT * FROM biblio;
{This displays all biblionumbers and their related information in the biblio table}

mysql> SELECT * FROM biblioitems;
{This table includes the MARC information produced in the data mapping step}

mysql> SELECT * FROM items;
{This table holds all information about the items migrated to Koha}

Referential integrity is maintained as follows: the Koha database uses various tables that are connected to each other via primary key and foreign key relationships, thereby fulfilling referential integrity. The following figure shows referential integrity among three tables: biblio, biblioitems and items [10].

Fig -17: Referential Integrity

At the data entry level, the problems of misspellings, redundancy and contradictory values are resolved in the data transformation process itself (refer to Fig. 3 and Fig. 7).

Hence the correctness and effectiveness of the transformation and migration process has been validated, and data quality is thereby ensured in Koha.

9. CONCLUSION

With the advent of new technology and the growth of information technology, it becomes necessary to migrate data from a legacy system to a new one. Migration cannot be treated as a simple step. It is a complex process with various phases, which makes it prone to failures. A thorough understanding of the purpose of migration, proper migration design and prediction of the migration output can drastically reduce the chances of failure. Awareness of modern software and technology and of current issues in data migration therefore makes the execution of steps easier and can prove critical to successfully accomplishing a migration project.

10. ACKNOWLEDGEMENT

Special thanks and appreciation go to Sanjay Burde, Senior Principal Scientist, Charu Verma, Principal Scientist, and Salim Ansari, Senior Technical Officer, for their tremendous support.

11. REFERENCES
[1] Cheong Youn and Cyril S. Ku, Bell Communications Research, “Data Migration”, Piscataway, NJ 08855-1379, p. 1255, 1992.
[2] Dr. Sanjay Kataria, Mohit Sharma and Anshul Pachouri, “Integrating Open Source Knowledge Management Tools into Library Management for Automation: A case study of Jaypee Institute of Information Technology University”, Noida, India, p. 317, 2010.
[3] K.T. Anuradha, R. Sivakaminathan and P. Arun Kumar, “Open-source tools for enhancing full-text searching of OPACs - Use of Koha, Greenstone and Fedora”, Bangalore, India, p. 233, 2011.
[4] Shivpal Singh Kushwah, J. N. Gautam and Ritu Singh, “Library Automation and Open Source Solutions Major Shifts & Practices: A Comparative Case Study of Library Automation Systems in India”, India, p. 148, 2008.
[5] Zahiruddin Khurshid, “From MARC to MARC 21 and beyond: some reflections on MARC and the Arabic language”, Dhahran, Saudi Arabia, p. 370, 2002.
[6] Dhrubajit Das, “MARC 21: The Standard Exchange Format for the 21st Century”, Ahmedabad, India, p. 154, 2004.
[7] Branko Milosavljevic, Danijela Boberić and Dušan Surla, “Retrieval of bibliographic records using Apache Lucene”, Novi Sad, Serbia, p. 526, 2009.
[8] Ikhlas Fuad Zamzami, Hanan Abdullah A. Fatani and Nuha Abdullah H. Zammarah, “Data Migration Challenges: The Impact of Data Quality”, Kuala Lumpur, Malaysia, p. 1.
[9] http://www.loc.gov/marc/bibliographic/
[10] http://schema.koha-community.org
[11] http://manual.koha-community.org/
[12] http://www.libsys.co.in/
[13] https://support.office.com/


12. BIOGRAPHIES

Principal Scientist & Principal Investigator, CSIR Knowledge Gateway Project at CSIR-National Institute of Science Communication and Information Resources, New Delhi
E-mail: [email protected]

Senior Project Fellow, CSIR Knowledge Gateway Project at CSIR-National Institute of Science Communication & Information Resources, New Delhi
E-mail: [email protected]
