Data Mining
Name of the student
Course
Tutor
Date
Data Mining
Introduction:
Organizations use data mining to transform raw data into meaningful
information. Data mining is the practice of examining vast amounts of data to identify trends
and patterns. Data mining tools identify relationships in the data based on the
variables that users request or supply. Businesses can learn more about their
customers by using software to search for patterns in large amounts of data. This
allows them to design more successful marketing campaigns, improve sales, and reduce
costs. Data mining requires effective data collection, storage, and computational
capabilities. Organizations can use data mining to learn what their
customers are interested in or would like to purchase, as well as to detect fraud and
scan for malware (Sumathi & Sivanandam, 2006).
Data mining is about exploring and evaluating large collections of information to uncover
relevant relationships and correlations. It may be used for marketing
strategy, credit risk management, fraud detection, spam email filtering,
and even determining user sentiment or opinion. To generate revenue, digital media
companies use data mining tools to group their users into similar segments (Sumathi &
Sivanandam, 2006). This
application of data mining has recently been heavily criticised because consumers are frequently
unaware that their private details are being analyzed, particularly when the analysis is used
to influence opinions.
The data mining process breaks down into five steps. First, organizations collect data
and load it into their data warehouses. Next, they store and manage the data, either on
in-house servers or in the cloud (Sumathi & Sivanandam, 2006). Third, business analysts,
management teams, and
information technology professionals access the data and determine how they want to
organize it. Fourth, application software sorts the data based on the user's results, and finally,
the end user presents the data in an easy-to-share format, such as a graph or table.
Literature review
Data mining, also known as knowledge discovery in databases, can be defined as the
process of analyzing large information repositories and of discovering implicit, but
potentially useful information (Han, Kamber, & Pei, 2011). Data mining has the capability to
uncover hidden relationships and to reveal unknown patterns and trends by digging into large
amounts of data (Sumathi & Sivanandam, 2006). The functions, or models, of data mining
can be categorized according to the task performed: association, classification, clustering, and
regression (Hui & Jha, 2000; Kao, Chang, & Lin, 2003; Nicholson, 2006b). Data mining
analysis is normally based on three techniques: classical statistics, artificial intelligence, and
machine learning (Girija & Srivatsa, 2006).
Classical statistics is mainly used for studying data and data relationships, as well as for
dealing with numeric data in large databases (Hand, 1998). Examples of classical
statistics include regression analysis, cluster analysis, and discriminant analysis. Artificial
intelligence (AI) applies “human-thought-like” processing to statistical problems (Girija &
Srivatsa, 2006). AI uses several techniques such as genetic algorithms, fuzzy logic, and
neural computing. Finally, machine learning is the combination of advanced statistical
methods and AI heuristics, used for data analysis and knowledge discovery (Kononenko &
Kukar, 2007). Machine learning uses several classes of techniques: neural networks,
symbolic learning, genetic algorithms, and swarm optimization.
Data mining benefits from these technologies but differs in the objective pursued:
extracting patterns, describing trends, and predicting behavior. In a typical data mining
process, the raw data are first cleansed in order to
remove noise, and duplicated and inconsistent data (Han et al., 2011). These cleansed data are
then transformed into appropriate formats that can be understood by other data mining tools,
and filtration and aggregation techniques are applied to the data in order to extract
summarized data. Next, interesting knowledge is extracted from the transformed data. This
information is analyzed in order to identify the truly interesting patterns. Finally, the
knowledge is presented visually to the user. More detailed information regarding the data mining
process can be found in Han et al. (2011).
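The cleansing, transformation, and aggregation steps described above can be illustrated with a minimal Python sketch. The records, field names, and rules below are hypothetical and chosen only for illustration; they are not taken from Han et al. (2011) or from any real library system.

```python
from collections import defaultdict

# Hypothetical raw loan records; every field name here is an assumption.
raw_records = [
    {"user": "u1", "item": "book-a", "days": "14"},
    {"user": "u1", "item": "book-a", "days": "14"},  # exact duplicate
    {"user": "u2", "item": "book-b", "days": "-3"},  # inconsistent: negative duration
    {"user": "u2", "item": "book-c", "days": "7"},
    {"user": "", "item": "book-d", "days": "21"},    # noise: missing user
    {"user": "u3", "item": "book-a", "days": "21"},
]


def cleanse(records):
    """Drop duplicates, records with missing fields, and negative durations."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["user"], rec["item"], rec["days"])
        if key in seen:
            continue
        seen.add(key)
        if not rec["user"] or not rec["item"]:
            continue
        if int(rec["days"]) < 0:
            continue
        cleaned.append(rec)
    return cleaned


def transform(records):
    """Convert the text duration into an integer so that it can be aggregated."""
    return [{**rec, "days": int(rec["days"])} for rec in records]


def aggregate(records):
    """Summarize the data as total loan days per item."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["item"]] += rec["days"]
    return dict(totals)


cleaned = cleanse(raw_records)
summary = aggregate(transform(cleaned))
print(summary)
```

In practice each step would be far more elaborate, but the order is the same: cleanse first, then transform, then summarize.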
Data mining techniques are applied in a wide range of domains where large amounts
of data are available for the identification of unknown or hidden information. In this sense, N.
Girija and S.K. Srivatsa (2006) indicate that data mining techniques applied to the World Wide
Web are called web mining, those applied to text are called text mining, and those applied in
libraries are called bibliomining.
The term bibliomining, or data mining for libraries, was first used by Scott Nicholson and
Jeffrey Stanton (2003) to describe the combination of data warehousing, data mining, and
bibliometrics. The term refers to tracking patterns, behavior changes, and trends in library
system transactions. Although the concept is not new, the term bibliomining was created to
make it easier to search for the terms “library” and “data mining” in the context of libraries
rather than software libraries.
Interesting patterns are analyzed and visualized through reports. The mining process
is iterated until the resulting information is verified and validated by key users such as
librarians and library managers (Shieh, 2010). The application of bibliomining tools is an
emerging trend that can be used to understand patterns of behavior among library users and
staff, and patterns of information resource use throughout the library (Nicholson & Stanton,
2006). Bibliomining is highly recommended for providing useful and necessary information for
library management, focusing on professional librarianship issues, although it
depends heavily on database technology (Shieh, 2010). Bibliomining can also be used to provide
a comprehensive overview of the library workflow in order to monitor staff performance,
determine areas of deficiency, and predict future user requirements (Prakash, Chand, &
Gohel, 2004).
The resulting information makes it possible to perform scenario analysis of the
library system, in which the different situations that need to be taken into account during a
decision-making process are evaluated (Nicholson, 2006a). An additional application is to standardize
structures and reports in order to share data warehouses among groups of libraries, allowing
libraries to benchmark their information (Nicholson, 2006a). The aim of this study is to
investigate how far academic libraries are pragmatically using data mining tools, and in
which library aspects librarians are implementing them.
Impact of Data Mining on the Field of Nursing:
In health care, data mining is becoming increasingly popular, if not increasingly
essential. Health care organizations generate heterogeneous medical data every day,
including payer and provider records, pharmaceutical information,
prescription information, doctors' notes, and clinical records. These
data can be used for clinical text mining, predictive modeling, survival
analysis, patient similarity analysis, and clustering, to improve treatment and reduce
waste. Association analysis, clustering, and outlier analysis can also be
applied in health care. Treatment record data can be mined to explore ways to cut costs and
deliver better medicine (Koh & Tan, 2005).
Data mining can also be used to identify and understand high-cost patients, and it can be
applied to the mass of data generated by millions of prescriptions, operations, and treatment
courses to identify unusual patterns and uncover fraud. Data mining can also help improve
treatments: by continuously comparing symptoms, causes, and medicines, analysts
can identify which treatments are effective. Data mining is also used to study the
treatment of specific diseases and the side effects associated with treatments. Data
mining applications are used to find abnormal patterns such as unusual laboratory or physician's
results, inappropriate prescriptions, and fraudulent medical claims (Koh & Tan, 2005).
Implementations of Data Mining
Mobile service providers use data mining to design their marketing campaigns and to
keep customers from moving to other vendors. From a large amount of data such as billing
information, email, text messages, web data transmissions, and customer service records, data
mining tools can predict “churn”, that is, identify the customers who are looking to change
vendors. Based on these results, each customer is given a probability score. The mobile service
providers are then able to offer incentives to customers who are at higher risk of churning. This
kind of mining is often used by major service providers such as broadband, phone, and gas
providers (Matillion, 2020).
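As a rough illustration of how a churn probability score might be computed, the sketch below applies a logistic function to a few customer attributes. The feature names, weights, bias, and the 0.5 threshold are all hypothetical; a real provider would estimate such parameters from historical data, and nothing here is taken from Matillion (2020).

```python
import math

# Hypothetical weights; a real model would learn these from historical data.
WEIGHTS = {"support_calls": 0.8, "late_bills": 0.6, "months_as_customer": -0.05}
BIAS = -2.0


def churn_probability(customer):
    """Return a score between 0 and 1 using a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up customers for illustration.
customers = {
    "alice": {"support_calls": 0, "late_bills": 0, "months_as_customer": 36},
    "bob": {"support_calls": 5, "late_bills": 2, "months_as_customer": 3},
}

scores = {name: churn_probability(c) for name, c in customers.items()}

# Customers at or above the assumed threshold would be offered incentives.
at_risk = [name for name, p in scores.items() if p >= 0.5]

for name, p in scores.items():
    print(f"{name}: {p:.2f}")
print("at risk:", at_risk)
```

With these assumed weights, the long-standing customer with no complaints receives a low score, while the new customer with several support calls and late bills is flagged as at risk.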
The IT team gains richer data mining skills, and the return on investment can be measured.
Researchers leverage association analysis and clustering to gain insight into which
product combinations were purchased; this insight is used to encourage customers to purchase
related products that they may have missed or overlooked. Users' behaviors are monitored and
analyzed to find similarities and patterns in Web surfing behavior so that the Web can be more
successful in meeting user needs (Matillion, 2020).
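The idea behind association analysis of product combinations can be sketched by counting how often pairs of items appear in the same basket. The transactions below are invented for illustration, and the two measures shown, support and confidence, are the standard ones for association rules; this is a simplified sketch rather than a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Fraction of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)


def confidence(antecedent, consequent):
    """Fraction of baskets with the antecedent that also contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(support(("bread", "butter")))
print(confidence("bread", "butter"))
print(confidence("jam", "bread"))
```

A rule such as "customers who buy jam also buy bread" with high confidence is the kind of result a retailer could use to suggest related products.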
Data mining detects outliers across vast amounts of data. Criminal data include
the details of the crimes that have occurred. Data mining studies the patterns and trends in
these data and predicts future events with better accuracy. Agencies can find out which areas
are more prone to crime, how many police personnel should be deployed, which age groups should be
targeted, and which vehicle numbers should be scrutinized (Matillion, 2020).
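One simple way to detect an outlier is to measure how many standard deviations a value lies from the mean (a z-score). The sketch below assumes a made-up list of weekly incident counts and an assumed threshold of 2.0; real systems typically use more robust methods, so this is an illustration of the idea only.

```python
from statistics import mean, pstdev

# Hypothetical weekly incident counts for one district.
weekly_incidents = [12, 15, 11, 14, 13, 12, 48, 14]


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


print(find_outliers(weekly_incidents))
```

Here only the week with 48 incidents is flagged, since the other weeks stay close to the average.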
Advantages of Data Mining
Data mining benefits include:
It helps companies gather reliable information.
It’s an efficient, cost-effective solution compared to other data applications.
It helps businesses make profitable production and operational adjustments.
Data mining uses both new and legacy systems (Simplilearn, 2021).
It helps businesses make informed decisions.
It helps detect credit risks and fraud.
It helps data scientists analyze enormous amounts of data quickly and easily.
Data scientists can use the information to detect fraud, build risk models, and
improve product safety.
It helps data scientists quickly initiate automated predictions of behaviors and
trends and discover hidden patterns (Simplilearn 2021).
Disadvantages of Data Mining
These are the major issues in data mining:
Many data analytics tools are complex and challenging to use. Data scientists need
the right training to use the tools effectively. Data mining requires large databases,
making the process hard to manage.
Different tools work with different types of data mining,
depending on the algorithms they employ. Thus, data analysts must be sure to
choose the correct tools (Mishal, 2021).
Data mining techniques are not infallible, so there’s always the risk that the
information isn’t entirely accurate. This obstacle is especially relevant if there’s a
lack of diversity in the dataset.
Companies can potentially sell the customer data they have gleaned to other
businesses and organizations, raising privacy concerns (Mishal 2021).
Possible Future Directions:
Some of the key data mining trends for the future include:
1. Multimedia Data Mining
This is one of the latest methods, and it is catching on because of the growing
ability to capture useful data accurately. It involves the extraction of data
from different kinds of multimedia sources such as audio, text, hypertext,
video, and images, and the data are converted into a numerical representation
in different formats. This method can be used in clustering and classification, in
performing similarity checks, and in identifying associations (Data Entry Services, 2021).
2. Distributed Data Mining
It involves the mining of huge amounts of information stored in different
company locations or at different organizations. Highly sophisticated
algorithms are used to extract data from the different locations and provide
proper insights and reports based on them.
3. Spatial and Geographic Data Mining
It includes extracting information from environmental, astronomical, and
geographical data, including images taken from outer space. This
type of data mining can reveal aspects such as distance and topology,
which are mainly used in geographic information systems and other navigation
applications.
4. Time Series and Sequence Data Mining
The primary application of this type of data mining is the study of cyclical and
seasonal trends. It is also helpful in analyzing random events that occur
outside the normal series of events. This method is mainly used by retail
companies to assess customers' buying patterns and behaviors (Data
Entry Services, 2021).
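To make the idea of a seasonal trend concrete, the following sketch computes a simple seasonal index for each quarter: the average value for that quarter divided by the overall average. The sales figures are invented, and the function name and the four-quarter period are assumptions made for this example.

```python
from statistics import mean

# Hypothetical quarterly sales for three consecutive years (Q1..Q4 each year).
quarterly_sales = [100, 120, 90, 170,
                   110, 130, 95, 185,
                   120, 140, 100, 200]


def seasonal_indices(series, period):
    """Average of each position in the cycle, relative to the overall average."""
    overall = mean(series)
    return [mean(series[pos::period]) / overall for pos in range(period)]


indices = seasonal_indices(quarterly_sales, period=4)
peak_quarter = indices.index(max(indices)) + 1  # quarters are numbered from 1

for quarter, index in enumerate(indices, start=1):
    print(f"Q{quarter}: {index:.2f}")
print("peak quarter:", peak_quarter)
```

An index above 1 means the quarter is typically stronger than average; in this made-up data the fourth quarter is the seasonal peak, which is the kind of pattern a retailer would look for.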
Conclusion:
Data mining is used in diverse applications such as banking, marketing, healthcare,
telecom industries, and many other areas. Data mining techniques help companies
gain useful knowledge and increase their profitability by making adjustments in processes
and operations. It is a fast process that helps businesses in decision making through analysis
of hidden patterns and trends. In health care, the insights mined from such data can prove
invaluable in improving care delivery, early diagnosis, disease identification, and hospital
staffing. There
is nearly limitless potential for leveraging data across the spectrums of patient care and
safety, as well as operational decision-making and academia.
References
Siguenza-Guzman, L., Saquicela, V., Avila-Ordoñez, E., Vandewalle, J., & Cattrysse, D.
(2015). Literature review of data mining applications in
academic libraries. The Journal of Academic Librarianship, 41, 499-510.
https://doi.org/10.1016/j.acalib.2015.06.007
https://www.researchgate.net/publication/280101455_Literature_Review_of_Data_Mining_Applications_in_Academic_Libraries
Koh, H. C., & Tan, G. (2005). Data mining applications in healthcare. Journal of healthcare
information management : JHIM, 19(2), 64–72.
https://pubmed.ncbi.nlm.nih.gov/15869215/
Han, J., Kamber, M., & Pei, J. (2011). Data mining: Concepts and techniques. Elsevier.
https://www.sciencedirect.com/book/9780123814791/data-mining-concepts-and-techniques
Nicholson, J. K. (2006). Global systems biology, personalized medicine and molecular
epidemiology. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1682018/
Hand, D. J. (1998). Data mining: statistics and more?. The American Statistician, 52(2), 112-
118. https://www.tandfonline.com/doi/abs/10.1080/00031305.1998.10480549
Prakash, K., Chand, P., & Gohel, U. (2004). Application of data mining in library and
information services.
https://www.researchgate.net/publication/265496914_Application_of_Data_Mining_in_Library_and_Information_Services
Kononenko, I., & Kukar, M. (2007). Machine learning and data mining. Horwood
Publishing. https://www.sciencedirect.com/book/9781904275213/machine-learning-and-data-mining
Yeh, J. R., Shieh, J. S., & Huang, N. E. (2010). Complementary ensemble empirical mode
decomposition: A novel noise enhanced data analysis method. Advances in adaptive
data analysis, 2(02), 135-156.
https://www.worldscientific.com/doi/abs/10.1142/S1793536910000422
Girija, N., & Srivatsa, S. K. (2006). A research study: Using data mining in knowledge base
business strategies. Information Technology Journal, 5(3), 590-600.
https://www.semanticscholar.org/paper/A-Research-Study%3A-Using-Data-Mining-in-Knowledge-Girija-S.K.Srivatsa/9678aa65cb7d01c14b19d95ac4fae3b2d2433953
Sumathi, S., & Sivanandam, S. N. (2006). Introduction to data mining and its
applications (Vol. 29). Springer.
https://ir.inflibnet.ac.in:8443/ir/bitstream/1944/435/1/04Planner_22.pdf
Data Entry Services (2021). 5 important Future Trends in Data Mining. Retrieved from
https://www.flatworldsolutions.com/data-management/articles/data-mining-future-trends.php
Simplilearn (June 2021). What is Data Mining: Definition, Benefits, Applications, Top
Techniques, and more. Retrieved from https://www.simplilearn.com/what-is-data-ming
Matillion (June 2020). 5 real life applications of Data Mining and Business Intelligence.
Retrieved from https://www.matillion.com/resources/blog/5-real-life-applications-of-data-mining-and-business-intelligence
Mishal Roomi (April 2021). 7 Advantages and Disadvantages of Data Mining | Limitations
& Benefits of Data Mining. Retrieved from https://www.hitechwhizz.com/2021/04/7-advantages-and-disadvantages-limitations-benefits-of-data-mining.html?m=1