Lps Week 16 Iatb

Data mining, also known as Knowledge Discovery in Database (KDD), is a process used to extract valuable information from large datasets through various steps including data cleaning, integration, and pattern evaluation. It has applications across multiple sectors such as healthcare, finance, and marketing, enabling organizations to make data-driven decisions and identify trends. However, challenges such as data privacy, complexity, and the need for advanced tools can complicate its implementation.

Uploaded by

mphaolyn

LPS Week 16 – IATB

Data Mining

Data mining is one of the most useful techniques for helping entrepreneurs, researchers, and individuals extract
valuable information from huge sets of data. Data mining is also called Knowledge Discovery in Databases (KDD),
although strictly speaking it is one step of that process. The knowledge discovery process includes data cleaning,
data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.

Data mining is the process of extracting information from huge sets of data to identify patterns, trends, and useful
data that allow a business to make data-driven decisions.

In other words, data mining is the process of investigating hidden patterns of information from various perspectives
and categorizing them into useful data. This data is collected and assembled in particular areas such as data
warehouses, where efficient analysis and data mining algorithms support decision making and other data requirements,
ultimately cutting costs and generating revenue.

Data mining is the act of automatically searching large stores of information to find trends and patterns that go
beyond simple analysis procedures. It uses complex mathematical algorithms to segment data and evaluate the
probability of future events. Organizations use data mining to extract specific data from huge databases to solve
business problems. It primarily turns raw data into useful information.

Steps in Data Mining

The data mining process involves several steps:
1. Defining the problem.
2. Building the database.
3. Examining the data.
4. Preparing a model to be used to probe the data.
5. Testing the model.
6. Using the model.
7. Putting the results into action.

Types of Data Mining

Data mining can be performed on the following types of data:
Relational Database:
A relational database is a collection of multiple data sets formally organized into tables, records, and columns, from
which data can be accessed in various ways without having to reorganize the database tables. Tables convey and
share information, which facilitates data searchability, reporting, and organization.

Data Warehouses:
A data warehouse is a technology that collects data from various sources within an organization to provide
meaningful business insights. The huge amount of data comes from multiple places, such as Marketing and Finance.
The extracted data is used for analytical purposes and supports decision making in a business organization. A data
warehouse is designed for the analysis of data rather than for transaction processing.

Data Repositories:
A data repository generally refers to a destination for data storage. However, many IT professionals use the term
more specifically to refer to a particular kind of setup within an IT structure, for example a group of databases
in which an organization keeps various kinds of information.

Object-Relational Database:
A combination of an object-oriented database model and a relational database model is called an object-relational
model. It supports classes, objects, inheritance, and so on.

Transactional Database:
A transactional database refers to a database management system (DBMS) that can undo (roll back) a database
transaction if it is not completed properly.
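
The rollback behavior of a transactional database can be sketched with Python's built-in sqlite3 module. This is
only an illustration: the accounts table, the names in it, and the transfer helper are assumptions made for the
example and do not come from any particular system.

```python
import sqlite3

# In-memory database with a hypothetical accounts table; balances may not go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo both updates if either one fails."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit already applied is rolled back too.
        conn.rollback()
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "bob", "alice", 500)  # would overdraw bob, so it is undone
print(ok, failed, balances(conn))
```

The second transfer credits one account before the debit fails, and the rollback removes that partial change, so
the balances stay as they were after the first transfer.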

Advantages of Data Mining

The data mining technique enables organizations to obtain knowledge-based data.
Data mining helps the decision-making process of an organization.
It facilitates the automated discovery of hidden patterns as well as the prediction of trends and behaviors.
It can be introduced into new systems as well as existing platforms.
It is a quick process that makes it easy for new users to analyze enormous amounts of data in a short time.

Disadvantages of Data Mining

Organizations may sell their customers' data to other organizations for money.
Many data mining analytics packages are difficult to operate and require advanced training.
Different data mining tools operate in distinct ways due to the different algorithms used in their design.
Therefore, the selection of the right data mining tools is a very challenging task.
Data mining techniques are not perfectly accurate, so acting on their results may lead to severe consequences in
certain conditions.

Data Mining Applications

Data mining is primarily used by organizations with intense consumer demands, such as retail, communication,
financial, and marketing companies, to determine prices, consumer preferences, product positioning, and the impact on
sales, customer satisfaction, and corporate profits. Data mining enables a retailer to use point-of-sale records of
customer purchases to develop products and promotions that attract customers. Application areas include:

Data Mining in Healthcare:

Data mining in healthcare has excellent potential to improve the health system. It uses data and analytics for better
insights and to identify best practices that will enhance health care services and reduce costs.

Data Mining in Market Basket Analysis:

Market basket analysis is a modeling method based on the hypothesis that if you buy a specific group of products,
then you are more likely to buy another group of products.
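
One common way to test this hypothesis is to measure the support of a group of products and the confidence of a
rule such as "bread implies butter". The sketch below uses a small made-up list of baskets; the products and the
function names are assumptions for illustration only.

```python
# Each basket is the set of products bought in one hypothetical transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


bread_support = support({"bread"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support(bread) = {bread_support:.2f}")
print(f"confidence(bread -> butter) = {rule_confidence:.2f}")
```

In this toy data, four of the five baskets contain bread and three of those four also contain butter, so the rule
holds in about three quarters of the baskets where it applies.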

Data Mining in Education:

Educational data mining (EDM) is a newly emerging field concerned with developing techniques that extract knowledge
from the data generated in educational environments. EDM objectives include predicting students' future learning
behavior, studying the impact of educational support, and promoting learning science.

Data Mining in Manufacturing Engineering:
Knowledge is the best asset possessed by a manufacturing company. Data mining tools can help find patterns in a
complex manufacturing process. Data mining can be used in system-level design to obtain the relationships between
product architecture, product portfolio, and the data needs of customers. It can also be used to forecast the
product development period, cost, and expectations, among other tasks.

Data Mining in CRM (Customer Relationship Management):

Customer Relationship Management (CRM) is about acquiring and retaining customers, enhancing customer loyalty, and
implementing customer-oriented strategies. To build a good relationship with its customers, a business organization
needs to collect and analyze data. With data mining technologies, the collected data can be used for analytics.

Data Mining in Fraud detection:

Billions of dollars are lost to fraud. Traditional methods of fraud detection are time-consuming and complex. Data
mining provides meaningful patterns and turns data into information.

Data Mining in Lie Detection:

Apprehending a criminal is comparatively easy, but bringing out the truth from him is a very challenging task. Law
enforcement may use data mining techniques to investigate offenses, monitor suspected terrorist communications,
and so on.

Data Mining in Financial Banking:

The digitalization of the banking system generates an enormous amount of data with every new transaction. Data
mining can help bankers solve business-related problems in banking and finance by identifying trends, causalities,
and correlations in business information and market costs that are not instantly evident to managers or executives
because the data volume is too large or the data is produced too rapidly for experts to screen.

Challenges of Implementation in Data Mining

Incomplete and noisy data:

Data mining is the process of extracting useful data from large volumes of data, but data in the real world is
heterogeneous, incomplete, and noisy. Data in huge quantities will often be inaccurate or unreliable.
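
A minimal sketch of one way to handle incomplete and noisy values is shown below. It assumes a hypothetical column
of customer ages in which None marks a missing entry and an implausible value marks noise. Replacing such values
with the median of the valid ones is only one of several possible strategies.

```python
from statistics import median

# Hypothetical raw column: None is a missing value, 430 is most likely a typing error.
raw_ages = [34, None, 29, 41, 430, 38, None, 27]


def clean(values, low, high):
    """Replace missing or out-of-range values with the median of the valid ones."""
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = median(valid)
    return [v if (v is not None and low <= v <= high) else fill for v in values]


cleaned_ages = clean(raw_ages, low=0, high=120)
print(cleaned_ages)
```

The accepted range of 0 to 120 is an assumption about what counts as a plausible age. In practice the limits depend
on the field being cleaned.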

Data Distribution:
Real-world data is usually stored on various platforms in a distributed computing environment. It might be in a
database, on individual systems, or even on the internet. In practice, it is quite difficult to bring all the data
into a centralized data repository, mainly due to organizational and technical concerns.

Complex Data:
Real-world data is heterogeneous, and it could be multimedia data, including audio and video, images, complex data,
spatial data, time series, and so on.

Performance:
The data mining system's performance relies primarily on the efficiency of algorithms and techniques used. If the
designed algorithm and techniques are not up to the mark, then the efficiency of the data mining process will be
affected adversely.

Data Privacy and Security:

Data mining can raise serious issues in terms of data security, governance, and privacy. For example, if a
retailer analyzes the details of purchased items, it reveals data about the buying habits and preferences of
its customers without their permission.

Data Visualization:
In data mining, data visualization is a very important process because it is the primary method that shows the output to
the user in a presentable way. The extracted data should convey the exact meaning of what it intends to express.

Techniques For Data Mining

Cluster Analysis.
Cluster analysis is a data reduction technique that groups together either variables or cases based on similar data
characteristics. This technique is useful for finding customer segments based on characteristics such as demographic
and financial information or purchase behavior. For example, a bank might use it to find segments of customers
based on the types of accounts they open.
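
As an illustration of grouping cases by similar characteristics, the sketch below runs a plain k-means procedure on
a handful of made-up customers described by age and account balance (in thousands). The data, the choice of k-means
as the clustering method, and the starting centroids are all assumptions for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (age, balance in thousands).
customers = [(25, 2), (27, 3), (24, 1), (55, 40), (60, 45), (58, 42)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

With this data the procedure separates the younger, low-balance customers from the older, high-balance ones. Real
segmentations usually need scaled features and a considered choice of the number of clusters.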

Linear Regression.
Linear regression is a method that fits a straight line through data. If the line is upward sloping, it means that an
independent variable such as the size of a sales force has a positive effect on a dependent variable such as revenue. If
the line is downward sloping, there is a negative effect. The steeper the slope, the greater the effect of the
independent variable on the dependent variable.
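
The fitted line can be computed with ordinary least squares. The sketch below uses invented figures for sales force
size and revenue that lie exactly on a line, so the slope and intercept are easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: revenue happens to equal 10 * sales_force + 5.
sales_force = [2, 4, 6, 8, 10]
revenue = [25, 45, 65, 85, 105]

slope, intercept = fit_line(sales_force, revenue)
print(f"revenue = {slope:.1f} * sales_force + {intercept:.1f}")
```

A positive slope, as here, corresponds to the upward-sloping line described above. Real data would scatter around
the line instead of falling exactly on it.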

Correlation.
Correlation is a measure of the relationship between two variables. For example, a high correlation between purchases
of certain products such as cheese and crackers indicates that these products are likely to be purchased together.
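
The usual measure is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it
for two invented purchase indicators, where 1 means the product was in the basket and 0 means it was not.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical baskets: one entry per transaction.
cheese = [1, 0, 1, 1, 0, 0, 1, 0]
crackers = [1, 0, 1, 0, 0, 0, 1, 1]

r = pearson(cheese, crackers)
print(f"correlation = {r:.2f}")
```

A value close to 1 would indicate that the two products are almost always bought together. The moderate positive
value in this toy data indicates a weaker tendency.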

Factor Analysis.
Factor analysis is a data reduction technique. This technique detects underlying factors, also called "latent variables,"
and provides models for these factors based on variables in the data. For example, a market research survey might
ask about the importance of nine product attributes, and factor analysis could reduce the answers to a smaller
number of underlying factors.

Decision Trees.
Decision trees separate data into sets of rules that are likely to have different effects on a target variable. For example,
we might want to find the characteristics of a person likely to respond to a direct mail piece. These characteristics can
be translated into a set of rules.
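
A full decision tree is built by repeatedly choosing the split that best separates the target variable. The sketch
below shows only a single split, chosen by weighted Gini impurity, on invented direct mail data. The field names
age and responded are assumptions for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, feature):
    """Return (threshold, score) minimising the weighted Gini impurity on feature."""
    best = (None, float("inf"))
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r["responded"] for r in rows if r[feature] <= threshold]
        right = [r["responded"] for r in rows if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical mailing results: did the person respond to the direct mail piece?
ages = [22, 25, 31, 38, 45, 52, 60, 67]
responses = [0, 0, 0, 0, 1, 1, 1, 1]
people = [{"age": a, "responded": r} for a, r in zip(ages, responses)]

threshold, score = best_split(people, "age")
print(f"rule: respond if age > {threshold} (impurity {score})")
```

The resulting split reads directly as a rule, which is the sense in which the characteristics can be translated
into a set of rules. A real tree would apply the same search recursively to each side of the split.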

Neural Networks.
Neural networks mimic the human brain and can "learn" from examples to find patterns in data or to classify data. The
advantage is that it is not necessary to have any specific model in mind when running the analysis. Also, neural
networks can find interaction effects (such as effects from the combination of age and gender) which must be explicitly
specified in regression.
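
The idea of learning from examples can be shown with a single artificial neuron (a perceptron), which is far
simpler than the networks described above. The sketch below trains it on the logical AND function. The training
data and parameter values are chosen only for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs exceeds zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Adjust the weights after every wrongly classified example."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training examples for logical AND: the output is 1 only when both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(weights, bias, predictions)
```

A single neuron like this can only learn patterns that a straight line can separate. Capturing interaction effects
requires networks with hidden layers, which are trained with more elaborate procedures.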

Association Models.
Association models examine the extent to which values of one field depend on, or are predicted by, values of another
field. Association discovery finds rules about items that appear together in an event such as a purchase transaction.
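
A first step in association discovery is to count which items appear together often enough to be worth turning into
rules. The sketch below counts item pairs across a few invented purchase transactions and keeps those that reach a
minimum support. The items and the threshold are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs found in at least min_support of transactions."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair one canonical order, e.g. ("cheese", "crackers").
        counts.update(combinations(sorted(basket), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical purchase transactions.
purchases = [
    {"cheese", "crackers", "wine"},
    {"cheese", "crackers"},
    {"bread", "cheese"},
    {"crackers", "wine"},
    {"bread", "cheese", "crackers"},
]

pairs = frequent_pairs(purchases, min_support=0.4)
for pair, share in sorted(pairs.items()):
    print(pair, share)
```

Counting every pair is workable only for small catalogs. Established algorithms such as Apriori prune the search so
that larger item sets can be examined.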

Data Mining Tools/Software

1. Orange Data Mining:

Orange is a machine learning and data mining software suite. It supports visualization and is
component-based software written in the Python programming language, developed at the
Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of
Ljubljana, Slovenia.

The tool has machine learning components, add-ons for bioinformatics and text mining, and it is
packed with features for data analytics. It can also be used as a Python library.

Learners can also be diversified by altering their parameter sets. In Orange, ensembles are simply
wrappers around learners and act like any other learner. Based on the data, they return models
that can predict the results of any data instance.

2. SAS Data Mining:

SAS stands for Statistical Analysis System. It is a product of the SAS Institute created for analytics
and data management. SAS can mine data, change it, manage information from various sources,
and analyze statistics. It offers a graphical UI for non-technical users.

The SAS data miner allows users to analyze big data and provides accurate insight for timely
decision-making. SAS has a distributed memory processing architecture that is highly scalable. It
is suitable for data mining, optimization, and text mining.

3. DataMelt Data Mining:

DataMelt is a computation and visualization environment which offers an interactive structure for
data analysis and visualization. It is primarily designed for students, engineers, and scientists. It is
also known as DMelt.

DMelt is a multi-platform utility written in Java. It can run on any operating system that is
compatible with the JVM (Java Virtual Machine). It includes scientific and mathematical libraries.

DMelt can be used for the analysis of large volumes of data, data mining, and statistical
analysis. It is extensively used in natural sciences, financial markets, and engineering.

4. Rattle:
Rattle is a GUI-based data mining tool. It uses the R statistical programming language. Rattle
exposes the statistical power of R by offering significant data mining features. Rattle has a
comprehensive and well-developed user interface, and it also has an integrated log tab that
reproduces the code for any GUI operation.

The data set produced by Rattle can be viewed and edited. Rattle also makes it possible to review
the code, use it for many purposes, and extend it without any restriction.

5. Rapid Miner:
Rapid Miner is one of the most popular predictive analysis systems, created by the company of the
same name. It is written in the Java programming language. It offers an integrated environment
for text mining, deep learning, machine learning, and predictive analysis.

Rapid Miner provides the server on-site as well as in public or private cloud infrastructure. It has a
client/server model as its base. Rapid Miner comes with template-based frameworks that enable
fast delivery with few errors (which are commonly expected when code is written manually).

The Future Of Data Mining

One of the key issues raised by data mining technology is not a business or technological one, but a social one: the
concern about individual privacy. Data mining makes it possible to analyze routine business transactions and glean a
significant amount of information about individuals' buying habits and preferences.

Another issue is that of data integrity. Clearly, data analysis can only be as good as the data that is being
analyzed. A key implementation challenge is integrating conflicting or redundant data from different sources. For
example, a bank may maintain credit card accounts on several different databases. The address (or even the name) of a
single cardholder may be different in each. Software must translate data from one system to another and select the
address most recently entered.
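
Selecting the most recently entered address can be sketched as follows. The record layout, the source names, and
the presence of a shared cardholder identifier are assumptions. In practice, matching the same cardholder across
systems is itself a difficult step.

```python
from datetime import date

# Hypothetical records for the same cardholders pulled from three separate databases.
records = [
    {"source": "cards_db_a", "cardholder_id": 17, "address": "12 Elm St",
     "updated": date(2021, 3, 4)},
    {"source": "cards_db_b", "cardholder_id": 17, "address": "90 Oak Ave",
     "updated": date(2023, 8, 19)},
    {"source": "cards_db_c", "cardholder_id": 42, "address": "5 Pine Rd",
     "updated": date(2022, 1, 2)},
]


def latest_addresses(records):
    """Keep, for each cardholder, the address from the most recently updated record."""
    latest = {}
    for record in records:
        current = latest.get(record["cardholder_id"])
        if current is None or record["updated"] > current["updated"]:
            latest[record["cardholder_id"]] = record
    return {cid: record["address"] for cid, record in latest.items()}


addresses = latest_addresses(records)
print(addresses)
```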

Finally, there is the issue of cost. While system hardware costs have dropped dramatically within the past five years,
data mining and data warehousing tend to be self-reinforcing. The more powerful the data mining queries, the greater
the usefulness of the information being gleaned from the data, and the greater the pressure to increase the amount of
data being collected and maintained. The result is increased pressure for faster, more powerful data mining queries.
These more efficient data mining systems often cost more than their predecessors.
