UNIT 5
Brief introduction to different database types
RDBMS:
As mentioned previously, SQL-based RDBMS software from various vendors has
traditionally been the most common data storage technology. All RDBMS
systems consist of tables that can be joined to each other via common values in certain
columns (foreign keys).
All of the systems that belong to this category use SQL as the language that retrieves
and manipulates the data and defines the structure of the databases themselves.
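The three roles of SQL described above (defining structure, manipulating data, and retrieving it through a join) can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the customers and orders tables and their contents are assumptions made up for the example, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# SQL defines the structure of the database ...
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL
    )
""")

# ... manipulates the data ...
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "laptop"), (1, "mouse"), (2, "desk")],
)

# ... and retrieves it, here by joining the two tables on the foreign key.
rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM orders
    JOIN customers ON customers.id = orders.customer_id
    ORDER BY orders.id
""").fetchall()
print(rows)
```

The same statements would need only minor dialect adjustments to run on other RDBMS products.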
Although different RDBMS systems use distinct dialects of SQL (e.g.
Oracle’s PL/SQL vs Microsoft’s T-SQL), the core of the syntax is largely the same.
Most of the database design best practices are also applicable to all makes of
RDBMS. The best-known examples of RDBMS are Microsoft SQL Server, Oracle
Database, MySQL and PostgreSQL.
NoSQL:
NoSQL, on the other hand, covers several largely unrelated technologies,
each with its own data manipulation language, capabilities and best practices.
They can broadly be split into 4 categories. However, this categorization is
very loose: unlike with RDBMS, two NoSQL systems from the same category are rarely
similar enough that knowing one of them makes it easy to become proficient in the
other. The categories are as follows:
1. Key-Value store: The data is stored in a hash table where each unique key
corresponds to a particular data object. Examples include DynamoDB, InfinityDB
and Redis.
2. Document Store: Data is stored in the form of an object written in a structured
data format, such as JSON or XML. Examples include MongoDB, CouchDB
and DocumentDB.
3. Column-oriented DBMS: Data is stored in tables, just like in traditional
RDBMS, but it is stored column by column rather than row by row. Examples
include HBase, MariaDB ColumnStore and Metakit.
4. Graph database: Data is expressed in the form of a network of nodes and the
relationships between them, which can be visualized. Examples include Neo4j,
InfiniteGraph and ArangoDB.
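To make the four categories more concrete, the following sketch models each of them with plain Python data structures. It only illustrates the shape of the data. It does not use the API of any of the products named above, and all keys, names and values are invented for the example.

```python
import json

# 1. Key-value: one opaque value per unique key.
key_value = {"session:42": "user=asha;cart=3"}

# 2. Document: a nested, self-describing object (here serialized as JSON).
document = {
    "_id": 1,
    "name": "Asha",
    "addresses": [{"city": "Pune"}, {"city": "Delhi"}],
}
document_json = json.dumps(document)

# 3. Column-oriented: the same table stored column by column instead of row by row.
rows = [(1, "Asha", 30), (2, "Ben", 25)]
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}
average_age = sum(columns["age"]) / len(columns["age"])  # reads only one column

# 4. Graph: nodes plus the relationships (edges) between them.
follows = {"asha": ["ben", "chen"], "ben": ["chen"], "chen": []}
followers_of_chen = sorted(
    user for user, targets in follows.items() if "chen" in targets
)
```

Note how the column layout lets an aggregate such as the average age read a single list, while the graph layout makes relationship questions (who follows whom) the natural query.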
Selecting the right database, RDBMS or NoSQL,
based on the following parameters:
1. Type of application:
When you need a small-scale system or a website
If you are building something small that is intended to remain relatively small, a classic RDBMS is
usually the best solution. This is because the technology is very mature and there is a large
amount of support material available on the web.
Many NoSQL databases evolve rapidly, so the information about a particular software package
that you may find online may be out of date already. However, with RDBMS, you can be sure
that the tutorial you will read today will not be obsolete tomorrow.
Also, the performance of RDBMS software only noticeably degrades when you start having
tables several gigabytes in size. Therefore, on a small system, you are unlikely to ever hit the
limitations of RDBMS.
Many technologies that are intended to act as a framework for small-scale systems already
use RDBMS. For example, SQLite is used as a data storage mechanism for mobile apps and
embedded systems.
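As a rough illustration of why SQLite suits small systems, the sketch below stores a whole database in a single ordinary file, with no server process involved. The file name notes.db and the notes table are assumptions made up for the example.

```python
import os
import sqlite3
import tempfile

# The whole database is one ordinary file; no server process is needed.
db_path = os.path.join(tempfile.mkdtemp(), "notes.db")  # hypothetical file name

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("buy milk",))
conn.commit()
conn.close()

# Reopening the file later (for example on the next app start) finds the data again.
conn = sqlite3.connect(db_path)
notes = conn.execute("SELECT id, body FROM notes").fetchall()
conn.close()
print(notes)
```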
2. Frequency of Accessing Data:
When you need a system that is frequently used and is
expected to grow significantly
The major limitations of RDBMS systems are degrading performance as the tables
grow and difficulty in scaling systems up.
Data is retrieved slowly from large tables when selection queries have to do full table
scans: as the data grows, larger chunks of disk space need to be scanned. This problem
can be mitigated by adding indexes, so that a much smaller collection of data is scanned
before the right data is retrieved.
However, indexes have their own problem. Every time new data is inserted, each index
on the table needs to be updated. This makes insertion operations slower, which can be
a problem in interactive multi-user scenarios.
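The effect of an index on how a query is executed can be observed with SQLite's EXPLAIN QUERY PLAN statement. The sketch below is a small illustration under assumed names (the users table and the idx_users_email index are invented for the example), and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the readable detail

query = "SELECT name FROM users WHERE email = 'user500@example.com'"

plan_without_index = plan(query)  # reports a scan of the whole table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = plan(query)     # reports a search that uses the new index

print(plan_without_index)
print(plan_with_index)
```

The index turns the full scan into a targeted search, at the price described above: every later INSERT into users must also update idx_users_email.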
3. Type of Data:
When you would expect your data structure to change
fairly often
This is a scenario where RDBMS is not the best choice. Relational databases come with a rigid
pre-defined schema, and operations that significantly change the data structure can be
computationally expensive, especially when your system already contains large quantities of
data.
Most NoSQL databases are schema-less; therefore, data can be stored without its structure
being defined in advance.
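The difference between a fixed schema and a schema-less store can be sketched as follows. The relational side uses SQLite, and the document side is only simulated with a Python list of dictionaries, which is an assumption standing in for a real document database. The products table and its fields are invented for the example. The sketch shows that a relational table needs an explicit schema change before a new field can be stored. It does not measure the cost of that change, which depends on the engine and the data volume.

```python
import sqlite3

# Relational: the schema is fixed up front, so a new field means a schema change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (name) VALUES ('kettle')")

try:
    conn.execute("INSERT INTO products (name, colour) VALUES ('mug', 'red')")
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True  # the table has no "colour" column yet

conn.execute("ALTER TABLE products ADD COLUMN colour TEXT")  # explicit migration
conn.execute("INSERT INTO products (name, colour) VALUES ('mug', 'red')")
relational_rows = conn.execute(
    "SELECT name, colour FROM products ORDER BY id"
).fetchall()

# Schema-less store, simulated with a list of dicts:
# each document carries its own fields, so no migration step is needed.
collection = []
collection.append({"name": "kettle"})
collection.append({"name": "mug", "colour": "red"})
colours = [doc.get("colour") for doc in collection]
```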
4. Data Size:
As a database grows in size or the number of users multiplies, many
RDBMS-based sites suffer serious performance issues.
The first thing to consider when selecting a database is the
characteristics of the data you are looking to leverage. If the data has a simple
tabular structure, like an accounting spreadsheet, then the relational model
could be adequate.
Data such as geospatial data, engineering parts, or molecular models, on the
other hand, tends to be very complex. It may have multiple levels of nesting
and the complete data model can be complicated. Such data has, in the past,
been modeled into relational tables, but it does not fit naturally into that
two-dimensional row-column structure.
In similar cases today, one should consider NoSQL databases as an option.
Multi-level nesting and hierarchies are very easily represented in the
JavaScript Object Notation (JSON) format used by some NoSQL products.
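The contrast between a nested JSON representation and a flat row-column one can be sketched as below. The part numbers and names are invented for the example, and the flatten helper is a hypothetical function written for this illustration only.

```python
import json

# A part with two levels of nesting, as a document store might hold it.
part = {
    "part_no": "A-100",
    "name": "gearbox",
    "components": [
        {"part_no": "A-110", "name": "housing", "components": []},
        {"part_no": "A-120", "name": "gear set", "components": [
            {"part_no": "A-121", "name": "drive gear", "components": []},
        ]},
    ],
}

def flatten(node, parent_no=None):
    """Turn the nested part into flat (part_no, name, parent_no) rows."""
    rows = [(node["part_no"], node["name"], parent_no)]
    for child in node["components"]:
        rows.extend(flatten(child, node["part_no"]))
    return rows

as_json = json.dumps(part)  # the hierarchy is stored as-is
as_rows = flatten(part)     # the relational form needs a parent reference per row

for row in as_rows:
    print(row)
```

In the relational form the hierarchy survives only through the parent_no column, so rebuilding the full tree requires self-joins or recursive queries, whereas the JSON form keeps it in one piece.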
5. Based on Performance:
Sticking with an RDBMS remains an attractive option, especially since there are numerous
tools in the RDBMS ecosystem to help manage the various data types used in enterprises today.
If, for example, you’re working with a combination of product data, social media data, and sales
data in the same application, you can use object-relational mapping (ORM) tools to simplify the
process of modeling complex data into RDBMS tables. That tends to work well, so there doesn’t
appear to be much of a trade-off. However, in many situations, the complex SQL generated by
your ORM tools can create a significant performance drain. While some performance issues
might not show up in smaller environments, with sources like social media (with data that
continually grows), you will soon put a heavy load on your RDBMS, and your users will suffer a
performance hit. Also, it’s important to recognize early that your production loads will be quite
different from those of your smaller test environment, so satisfactory performance during a
testing stage might not extrapolate to a production environment.
Certainly, the performance hit could be overcome with more powerful hardware, but this ongoing
hardware upgrade cycle is exactly the issue that NoSQL databases are designed to address. Instead of
requiring you to purchase new servers with more memory, NoSQL databases support a scale-out model
where you can simply add new servers to the cluster. This allows you to grow
incrementally without having to replace your existing hardware investment. This benefit alone
makes it worthwhile to consider NoSQL for many of your modern use cases.
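The scale-out idea can be sketched as spreading keys over a list of servers by hashing each key. This is a deliberately simplified illustration: the node names are invented, and plain modulo placement is an assumption chosen for brevity. Real NoSQL products typically use more refined schemes (such as consistent hashing or range partitioning) so that adding a server moves as little existing data as possible.

```python
import hashlib

def node_for(key, nodes):
    """Pick a node for a key using a stable hash (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def distribute(keys, nodes):
    """Return a mapping of node name -> list of keys stored on it."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement

keys = [f"user:{i}" for i in range(1000)]

three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
# One more server is added to the cluster; the existing ones are kept.
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

for name, stored in four_nodes.items():
    print(name, len(stored))
```

With the fourth node in place, each server holds a smaller share of the same 1000 keys, which is the incremental growth the paragraph above describes.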
6. Based on Business Needs:
The best way to determine which database is right for your business is to analyze what
you need it to do.
SQL is a good choice for:
Any organization that will benefit from a predefined structure and set schemas,
particularly if it requires multi-row transactions.
Situations where all data must be consistent without leaving room for error, such
as accounting systems.
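A multi-row transaction of the kind an accounting system relies on can be sketched with SQLite as follows. The accounts table, the account names and the transfer helper are assumptions invented for this example. The point is that both balance updates are applied together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between two rows; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

valid_result = transfer(conn, "alice", "bob", 30)       # allowed
overdraft_result = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(valid_result, overdraft_result, balances)
```

In the rejected transfer, the credit to bob had already been applied when the debit failed. The rollback undoes it, so the data never ends up in an inconsistent state.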
NoSQL is a good choice for companies experiencing rapid growth with no clear
schema definitions. NoSQL offers much more flexibility than a relational database
and is a solid option for companies that must analyze large quantities of data or
whose data structures are variable.