MIS Notes Unit-3
UNIT-III
What is the need of data management?
What is a Database?
A database is a collection of related data which represents some aspect of the real world. A
database system is designed to be built and populated with data for a certain task.
What Is Data Management
Data management involves collecting, storing, organizing, protecting, verifying, and
processing essential data and making it available to an organization.
What is the need for data management?
We know that a data management system includes:
Creation of a database.
Retrieval of information from the database.
Updating the database.
Managing a database.
So, to facilitate the above tasks we need a database management system.
The factors to be considered while collecting data include:
1. Accuracy: It is important that the data is collected after clearly defining the
purpose for which it is to be used. Furthermore, the data should be collected at
the time of its intended use to ensure effective results on analysis.
2. Validity: Certain guidelines always have to be determined in order to make the
data as appropriate for use as possible.
3. Reliability: The data has to be collected from sources that can be trusted to be
credible.
4. Timeliness: It is important to collect data as close to the time of analysis as
possible because the recorded answers can change over time, rendering the
results ineffective.
5. Relevance: A study to improve HR practices, for example, should focus mostly on
collecting data from people related to that field.
6. Completeness: The data should by no means be incomplete in terms of the
earlier specified guidelines.
All these factors have to be considered and met while collecting data, and though it might
seem easy, in the field it is extremely challenging.
2. Data protection
Data is an extremely valuable asset that is collected after careful consideration and
allocation of resources. It holds sensitive information whose exposure can be disastrous in
many ways to the organization itself as well as the respondents. It is, therefore, the foremost
responsibility of a company to protect the data from hackers, misuse, and others with
malicious interests, and to meet compliance requirements.
3. Analysis
Data in its raw form holds no meaning to anyone. Although the advent of technology has
made life extremely easy for corporations in this regard, there are still certain challenges that
crop up during data analysis. These challenges can range from an inability to run the tools
appropriately to inaccurate input of information that leads to misleading results. Clients
need tools that can extract data easily and systematically, and that provide a range of options
for manipulating that data.
4. Data automation
Collecting, storing and sorting this data is a huge task that is impossible to do manually. This
is where data automation steps in. The process streamlines the entire flow from collection to
analysis without much human intervention. However, the relevant expertise is still key;
without it, the whole exercise can go to waste.
5. Integration
This is the part where many people and systems make blunders. It’s one thing to collect data;
it’s quite another to make that data available to the organization and to any number of
downstream systems. You need to provide tools that make the data available, but it all still
starts with the data: a small error made during data collection can result in the failure of the
entire strategy based on the results obtained from it.
What is Data independence:
Data independence is a type of data transparency. Data independence is a database
management system (DBMS) characteristic that lets programmers modify information
definitions and organization without affecting the programs or applications that use it. Such
property allows various users to access and process the same data for different purposes,
regardless of changes made to it.
A database containing patient information, for example, could serve various purposes. A
hospital’s billing department can use the data to obtain patients’ charges, discounts, and
insurance details. On the other hand, the food services department would need the same data
to see the patients’ nutritional requirements. How each department uses the data should not
affect the stored information regardless of the changes it undergoes, such as where the patient
details are stored or how they are labelled.
What is Data redundancy:
Data redundancy occurs when the same piece of data is stored in two or more separate places.
Suppose you create a database to store sales records, and in the records for each sale, you
enter the customer address. Yet, you have multiple sales to the same customer so the same
address is entered multiple times. The address that is repeatedly entered is redundant data.
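The sales example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (customers, sales), the column names, and the sample values are hypothetical and do not come from these notes. The idea shown is that the address is stored once and each sale refers to it through a key, so it is not repeated for every sale.

```python
import sqlite3

# In-memory database; all table names and sample values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)

# The address is stored once, in the customers table.
conn.execute("INSERT INTO customers VALUES (1, 'Asha', '12 Park Street')")

# Each sale refers to the customer by key instead of repeating the address.
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 99.5), (1, 40.0)],
)

# The address can still be shown with every sale by joining the two tables.
rows = conn.execute(
    "SELECT s.id, c.address, s.amount "
    "FROM sales s JOIN customers c ON c.id = s.customer_id "
    "ORDER BY s.id"
).fetchall()

address_copies = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rows)
print("Stored copies of the address:", address_copies)
```

If the customer moves, only the single row in the customers table has to be updated, which also avoids inconsistent copies of the address.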
What is Data consistency
In a single-user database, the user can modify data in the database without concern for other
users modifying the same data at the same time. However, in a multiuser database, the
statements within multiple simultaneous transactions can update the same data. Transactions
executing at the same time need to produce meaningful and consistent results.
Data concurrency means that many users can access data at the same time.
Data consistency means that each user sees a consistent view of the data, including
visible changes made by the user's own transactions and transactions of other users.
Data administration
Data administration is the process by which data is monitored, maintained and managed by a
data administrator and/or an organization. Data administration allows an organization to
control its data assets, as well as their processing and interactions with different applications
and business processes. Data administration ensures that the entire life cycle of data use and
processing is on par with the enterprise’s objective.
Data administration is a high-level function that is responsible for the overall
management of data resources in an organization, including:
Data definition, policies, procedures and standards
Data conflict resolution
Database planning, analysis, design, implementation, and maintenance
Data protection
Data performance assurance
User training, education, and consulting support
Database administration, by contrast, is a hands-on, physical involvement with the
management of a database, including:
Physical database design
Technical issues of database security, performance, backup and recovery
Disadvantages of DBMS
o Cost of Hardware and Software: A high-speed processor and a large memory size are
required to run DBMS software.
o Size: It occupies a large amount of disk space and needs a large memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a high impact because in most organizations
all the data is stored in a single database, and if the database is damaged
due to an electrical failure or database corruption, the data may be lost forever.
Types of Database Management Systems
There are several types of database management systems. Here is a list of eight common
database management systems:
1. Hierarchical databases
2. Network databases
3. Relational databases
4. Object-oriented databases
5. Graph databases
6. ER model databases
7. Document databases
8. NoSQL databases
Hierarchical Databases
In a hierarchical database management system (hierarchical DBMS) model, data is stored in
nodes that have parent-child relationships. In a hierarchical database, besides actual data,
records also contain information about their groups of parent/child relationships.
In a hierarchical database model, data is organized into a tree-like structure. The data is
stored in the form of a collection of fields where each field contains only one value. The
records are linked to each other via links into a parent-children relationship. In a hierarchical
database model, each child record has only one parent. A parent can have multiple children.
To retrieve a field’s data, we need to traverse through each tree until the record is found.
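The tree structure and the traversal described above can be sketched in Python as follows. This is a minimal illustration, not how IMS or any real hierarchical DBMS is implemented; the Record class, the find function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """One record in the tree: a name, a value, and its child records."""
    name: str
    value: str
    children: List["Record"] = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        # A child is attached to exactly one parent; a parent may have many children.
        self.children.append(child)
        return child


def find(root: Record, name: str) -> Optional[Record]:
    """Depth-first traversal from the root until the named record is found."""
    if root.name == name:
        return root
    for child in root.children:
        found = find(child, name)
        if found is not None:
            return found
    return None


# Hypothetical sample data: a company with two departments and two employees.
company = Record("company", "Acme")
sales = company.add_child(Record("sales", "Sales Department"))
hr = company.add_child(Record("hr", "HR Department"))
sales.add_child(Record("emp1", "Ravi"))
hr.add_child(Record("emp2", "Meena"))

print(find(company, "emp2").value)
print(find(company, "emp9"))
```

The search has to walk down from the root, which reflects the point made above that a record is reached by traversing the tree.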
Hierarchical databases are widely used to build high-performance, high-availability
applications, usually in the banking and telecommunications industries.
The IBM Information Management System (IMS) and Windows Registry are two popular
examples of hierarchical databases.
Advantage
A hierarchical database can be accessed and updated rapidly. As shown in the figure above,
its model structure is like a tree and the relationships between records are defined in advance.
This feature is a double-edged sword.
Disadvantage
The disadvantage of this type of database structure is that each child in the tree may have
only one parent. Relationships or linkages between children are not permitted, even if they
make sense from a logical standpoint. Hierarchical databases are so rigid in their design that
adding a new field or record requires that the entire database be redefined.
Network Databases
Network database management systems (Network DBMSs) use a network structure to create
a relationship between entities. Network databases are mainly used on large digital
computers. Network databases are similar to hierarchical databases, but unlike hierarchical
databases, where one node can have a single parent only, a network node can have a
relationship with multiple entities. A network database looks more like a cobweb or
interconnected network of records.
In network databases, children are called members and parents are called owners. The
difference from the hierarchical model is that each child or member can have more than one
parent.
The network data model is otherwise similar to the hierarchical data model, except that data
in a network database is organized in many-to-many relationships.
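The many-to-many idea can be sketched in a few lines of Python. This is only an illustration of the structure, not of any real network DBMS; the function name link and the sample records are hypothetical. Each member (child) record may be linked to several parent (owner) records, and each parent to several members.

```python
from collections import defaultdict

# Hypothetical sample data: each member record may have several parents (owners).
owners_of = defaultdict(set)   # member -> set of its parents
members_of = defaultdict(set)  # parent -> set of its members


def link(owner: str, member: str) -> None:
    """Record one parent/member relationship in both directions."""
    owners_of[member].add(owner)
    members_of[owner].add(member)


link("Project A", "Ravi")
link("Project B", "Ravi")    # Ravi now has two parents
link("Project B", "Meena")

print(sorted(owners_of["Ravi"]))
print(sorted(members_of["Project B"]))
```

In a hierarchical database the second link for "Ravi" would not be allowed, because a child may have only one parent.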
Relational Databases
In a relational database management system (RDBMS), the relationship between data is
relational and data is stored in tabular form of columns and rows. Each column of a table
represents an attribute and each row in a table represents a record. Each field in a table
represents a data value.
Structured Query Language (SQL) is the language used to query RDBMS, including
inserting, updating, deleting, and searching records. In a relational database, each table
has a key field that uniquely identifies each row. These key fields can be used to connect
one table of data to another.
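The SQL operations and the use of key fields described above can be tried out with Python's built-in sqlite3 module. This is a small sketch; the tables (departments, employees), their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables; each has a key field (id) that uniquely identifies a row.
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT, "
    "dept_id INTEGER REFERENCES departments(id))"
)

# Inserting records.
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)", [(1, "Sales"), (2, "HR")]
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(10, "Ravi", 1), (11, "Meena", 2), (12, "John", 1)],
)

# Updating a record: move employee 12 to the HR department.
conn.execute("UPDATE employees SET dept_id = 2 WHERE id = 12")

# Deleting a record.
conn.execute("DELETE FROM employees WHERE id = 10")

# Searching: the key fields connect one table to the other.
result = conn.execute(
    "SELECT e.name, d.name "
    "FROM employees e JOIN departments d ON d.id = e.dept_id "
    "ORDER BY e.id"
).fetchall()
print(result)
```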
Relational databases are the most popular and widely used databases. Some of the popular
RDBMS are Oracle, SQL Server, MySQL, SQLite, and IBM DB2.
The relational database has two major advantages:
Object-Oriented Model
The object-oriented model builds on the functionality of object-oriented programming. It
involves more than the storage of programming language objects. Object DBMSs extend the
semantics of C++ and Java to provide full-featured database programming capabilities while
retaining native language compatibility. They add database functionality to object
programming languages. This approach unifies application and database
development into a consistent data model and language environment. Applications require less
code, use more natural data modeling, and code bases are easier to maintain. Object
developers can write complete database applications with a modest amount of additional
effort.
The object-oriented database is derived from the integration of object-oriented programming
language systems and persistent systems. The power of object-oriented databases comes
from the seamless treatment of both persistent data, as found in databases, and transient data,
as found in executing programs.
Object-oriented databases use small, reusable chunks of software called objects. The
objects themselves are stored in the object-oriented database.
Each object contains two elements:
What is table
In a database management system, we use tables to store data. A table is a collection of
related data entries and contains rows and columns to store data. A table is the simplest
example of data storage in an RDBMS. Let's see the example of a student table.
ID   Name     Age   Course
1    Ajeet    24    B.Tech
2    Aryan    20    C.A
3    Mahesh   21    BCA
4    Ratan    22    MCA
5    Vimal    26    BSC
A field is a smaller entity of the table which contains specific information about every record
in the table. In the above example, the fields in the student table are id, name, age, and course.
A column is a vertical entity in the table which contains all information associated with a
specific field in a table.
Name
Ajeet
Aryan
Mahesh
Ratan
Vimal
A row of a table is also called a record. It contains the specific information of each individual
entry in the table. It is a horizontal entity in the table. For example, the above table contains
5 records, the first of which is:
1    Ajeet    24    B.Tech
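The terms table, field, column, and row can be illustrated with the student table in plain Python. This is a sketch for illustration only; the variable names are hypothetical, and a real RDBMS stores tables very differently.

```python
# The student table from the example above, held as a list of rows.
fields = ("id", "name", "age", "course")
rows = [
    (1, "Ajeet", 24, "B.Tech"),
    (2, "Aryan", 20, "C.A"),
    (3, "Mahesh", 21, "BCA"),
    (4, "Ratan", 22, "MCA"),
    (5, "Vimal", 26, "BSC"),
]

# A record (row) pairs each field name with that row's value.
records = [dict(zip(fields, row)) for row in rows]

# A column is every value stored under one field.
name_column = [record["name"] for record in records]

print(len(records))   # number of records in the table
print(name_column)    # the "name" column
print(records[0])     # the first record
```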
What is a Data Warehouse?
A data warehouse is a technique for collecting and managing data from varied sources to
provide meaningful business insights. It is a blend of technologies and components which
allows the strategic use of data.
A data warehouse is electronic storage of a large amount of information by a business,
designed for query and analysis instead of transaction processing. Data warehousing is a
process of transforming data into information and making it available to users for analysis.
What Is Data Mining?
Data mining is looking for hidden, valid, and potentially useful patterns in huge data sets.
Data Mining is all about discovering unsuspected/ previously unknown relationships amongst
the data. It is a multi-disciplinary skill that uses machine learning, statistics, AI and database
technology.
Characteristics of Data Warehouse
A data warehouse has the following characteristics:
1. Subject-Oriented
A data warehouse focuses on the modelling and analysis of data for decision-makers.
Therefore, data warehouses typically provide a concise and straightforward view around a
particular subject, such as customer, product, or sales, instead of the global organization's
ongoing operations. This is done by excluding data that are not useful concerning the subject
and including all data needed by the users to understand the subject.
2. Integrated
A data warehouse integrates various heterogeneous data sources like RDBMS, flat files, and
online transaction records. It requires performing data cleaning and integration during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among
different data sources.
3. Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve files from 3
months, 6 months, 12 months, or even older data from a data warehouse. This contrasts
with a transaction system, where often only the most current file is kept.
4. Non-Volatile
The data warehouse is a physically separate data storage, which is transformed from the
source operational RDBMS. The operational updates of data do not occur in the data
warehouse, i.e., update, insert, and delete operations are not performed. It usually requires
only two procedures in data accessing: Initial loading of data and access to data. Therefore,
the DW does not require transaction processing, recovery, and concurrency capabilities,
which allows for substantial speedup of data retrieval. Non-volatile means that, once entered
into the warehouse, data should not change.
2) Store historical data: A data warehouse is required to store time-variant data from the
past. This data can then be used for various purposes.
3) Make strategic decisions: Some strategies may depend on the data in the data
warehouse. So, the data warehouse contributes to making strategic decisions.
4) For data consistency and quality: By bringing data from different sources to a
common place, the user can effectively bring uniformity and consistency to the
data.
5) High response time: A data warehouse has to be ready for somewhat unexpected loads and
types of queries, which demands a significant degree of flexibility and quick response time.
Data Mining Techniques
Data mining includes the utilization of refined data analysis tools to find previously
unknown, valid patterns and relationships in huge data sets. These tools can incorporate
statistical models, machine learning techniques, and mathematical algorithms, such as neural
networks or decision trees. Thus, data mining incorporates analysis and prediction.
1. Classification:
This technique is used to obtain important and relevant information about data and metadata.
This data mining technique helps to classify data in different classes.
Data mining techniques can be classified by different criteria, as follows:
i. Classification of Data mining frameworks as per the type of data sources mined:
This classification is as per the type of data handled. For example, multimedia, spatial
data, text data, time-series data, World Wide Web, and so on.
ii. Classification of data mining frameworks as per the database involved:
This classification is based on the data model involved. For example, object-oriented
database, transactional database, relational database, and so on.
iii. Classification of data mining frameworks as per the kind of knowledge
discovered:
This classification depends on the types of knowledge discovered or data mining
functionalities. For example, discrimination, classification, clustering,
characterization, etc. Some frameworks tend to be extensive, offering several
data mining functionalities together.
iv. Classification of data mining frameworks according to data mining techniques
used:
This classification is as per the data analysis approach utilized, such as neural
networks, machine learning, genetic algorithms, visualization, statistics, data
warehouse-oriented or database-oriented, etc. The classification can also take into
account the level of user interaction involved in the data mining procedure, such as
query-driven systems, autonomous systems, or interactive exploratory systems.
2. Clustering:
Clustering is a division of information into groups of connected objects. Describing the data
by a few clusters inevitably loses certain fine details, but achieves simplification. It
models data by its clusters. Data modelling puts clustering in a historical perspective
rooted in statistics, mathematics, and numerical analysis. From a machine learning point of
view, clusters relate to hidden patterns, the search for clusters is unsupervised learning, and
the resulting framework represents a data concept. From a practical point of view,
clustering plays an outstanding role in data mining applications such as scientific data
exploration, text mining, information retrieval, spatial database applications, CRM, Web
analysis, computational biology, medical diagnostics, and much more.
In other words, we can say that clustering analysis is a data mining technique to identify
similar data. This technique helps to recognize the differences and similarities between the
data. Clustering is very similar to classification, but it involves grouping chunks of data
together based on their similarities.
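Grouping data by similarity can be sketched with a very small version of k-means, one common clustering method that these notes do not name. The function k_means_1d, the starting centres, and the sample ages are all assumptions made for this illustration; real tools handle many dimensions and usually choose the starting centres automatically.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data.

    `centers` holds the starting cluster centres (an assumption of this
    sketch; they are supplied by hand here).
    """
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # nothing moved, so the grouping is stable
        centers = new_centers
    return centers, clusters


# Hypothetical customer ages that form two obvious groups.
ages = [21, 22, 23, 24, 58, 60, 61, 63]
centers, clusters = k_means_1d(ages, centers=[20, 30])
print(centers)
print(clusters)
```

No labels are given to the function in advance, which is what makes clustering an unsupervised technique, unlike classification.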
3. Regression:
Regression analysis is the data mining process used to identify and analyse the relationship
between variables in the presence of other factors. It is used to estimate the
probability of a specific variable. Regression is primarily a form of planning and modelling.
For example, we might use it to project certain costs, depending on other factors such as
availability, consumer demand, and competition. Primarily, it gives the relationship
between two or more variables in the given data set.
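Projecting a cost from another factor can be sketched with a straight-line fit by ordinary least squares, one common regression method that these notes do not name. The function fit_line and the demand and cost figures are hypothetical; the sample data was constructed to lie exactly on a line so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without dividing by n).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: cost observed at different levels of consumer demand.
demand = [10, 20, 30, 40, 50]
cost = [25, 45, 65, 85, 105]   # constructed so that cost = 2 * demand + 5

slope, intercept = fit_line(demand, cost)
projected_cost = slope * 60 + intercept   # projection for a demand of 60
print(slope, intercept, projected_cost)
```

With real data the points do not fall exactly on a line, and the fitted line is then only an estimate of the relationship.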
4. Association Rules:
This data mining technique helps to discover a link between two or more items. It finds a
hidden pattern in the data set. Association rules are if-then statements that help to show
the probability of interactions between data items within large data sets in different types of
databases. Association rule mining has several applications and is commonly used to help
find sales correlations in transactional data or in medical data sets. The algorithm works on
a collection of data, for example, a list of grocery items that you have been buying for the
last six months, and calculates the percentage of items being purchased together.
There are three major measurement techniques:
o Lift: This measurement technique measures the accuracy of the confidence against how
often item B is purchased.
(Confidence) / ((Item B) / (Entire dataset))
o Support: This measurement technique measures how often multiple items are
purchased together and compares it to the overall dataset.
(Item A + Item B) / (Entire dataset)
o Confidence: This measurement technique measures how often item B is purchased
when item A is purchased as well.
(Item A + Item B) / (Item A)
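The three measures can be computed directly on a small list of purchases. The sketch below is only an illustration: the transactions are made-up sample data, and the function names are hypothetical. Lift is computed as the confidence divided by the share of all transactions that contain item B.

```python
# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]


def support(items):
    """Share of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(a, b):
    """How often item b is purchased when item a is purchased."""
    return support({a, b}) / support({a})


def lift(a, b):
    """Confidence of a -> b relative to how often b is purchased overall."""
    return confidence(a, b) / support({b})


print(support({"bread", "butter"}))
print(confidence("bread", "butter"))
print(lift("bread", "butter"))
```

A lift above 1 suggests that the two items are bought together more often than would be expected if they were unrelated; a lift below 1 suggests the opposite.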
5. Outlier detection:
This type of data mining technique relates to the observation of data items in the data set
which do not match an expected pattern or expected behaviour. This technique may be used
in various domains like intrusion detection, fraud detection, etc. It is also known as Outlier
Analysis or Outlier mining. An outlier is a data point that diverges too much from the rest of
the dataset. The majority of real-world datasets have outliers. Outlier detection plays a
significant role in the data mining field. Outlier detection is valuable in numerous fields like
network intrusion identification, credit or debit card fraud detection, detecting outliers in
wireless sensor network data, etc.
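One simple way to flag a data point that diverges too much from the rest is to measure how many standard deviations it lies from the mean. This rule, the threshold of 2, the function name find_outliers, and the sample amounts are assumptions made for this sketch; the notes do not prescribe a particular method.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts with one suspicious value.
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 500]
print(find_outliers(amounts))
```

A single very large value also inflates the mean and the standard deviation, which is one reason practical systems often use more robust measures.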
6. Sequential Patterns:
Sequential pattern mining is a data mining technique specialized for evaluating sequential
data to discover sequential patterns. It consists of finding interesting subsequences in a set
of sequences, where the value of a sequence can be measured in terms of different criteria like
length, occurrence frequency, etc.
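Measuring subsequences by occurrence frequency can be sketched as follows. This is a deliberately small illustration, limited to subsequences of length two, and not a full sequential pattern mining algorithm; the function frequent_pairs, the min_count parameter, and the page-visit data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sequences, min_count=2):
    """Count ordered pairs (a, b), with a occurring before b, across sequences.

    A pair is counted at most once per sequence, so the count is the number
    of sequences that contain the pair as a subsequence.
    """
    counts = Counter()
    for seq in sequences:
        # combinations() keeps the original order of the sequence.
        counts.update(set(combinations(seq, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical page visits of three website users, in order.
visits = [
    ["home", "search", "cart", "pay"],
    ["home", "cart", "pay"],
    ["search", "home", "cart"],
]

patterns = frequent_pairs(visits)
for pair, n in sorted(patterns.items()):
    print(pair, n)
```

Note that order matters: "home" followed by "search" and "search" followed by "home" are counted as different patterns.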
7. Prediction:
Prediction uses a combination of other data mining techniques such as trends, clustering,
classification, etc. It analyses past events or instances in the right sequence to predict a future
event.
What is Business Intelligence
BI (Business Intelligence) is a set of processes, architectures, and technologies that
convert raw data into meaningful information that drives profitable business actions. It is
a suite of software and services to transform data into actionable intelligence and
knowledge. BI has a direct impact on an organization's strategic, tactical and operational
business decisions. BI supports fact-based decision making using historical data rather
than assumptions and gut feeling.
OR
The term Business Intelligence (BI) refers to technologies, applications and practices for
the collection, integration, analysis, and presentation of business information. The
purpose of Business Intelligence is to support better business decision making.
Essentially, Business Intelligence systems are data-driven Decision Support Systems
(DSS).