Relational and Non-Relational Databases

Part of the document: Balusamy B., Big Data: Concepts, Technology, and Architecture, 2021 (pages 57-61)

Relational databases organize data into tables of rows and columns. The rows are called records, and the columns are called attributes or fields. A database with only one table is called a flat database, while a database with two or more tables that are related is called a relational database. Table 2.1 shows a simple table that stores the details of the students registering for the courses offered by an institution.

Table 2.1 holds the details of the students and the CourseId of the courses for which they have registered. It meets the basic need of keeping track of the courses for which each student has registered, but it has serious flaws in terms of efficiency and space utilization. For example, when a student registers for more than one course, the details of the student have to be entered again for every course he or she registers for. This redundancy can be overcome by dividing the data across multiple related tables. Figure 2.12 shows the data of Table 2.1 divided among multiple related tables linked by primary and foreign keys.

Table 2.1 Student course registration database. The columns are the attributes (fields), and the rows are the tuples.

| StudentName | Phone | DOB | CourseId | Faculty |
| --- | --- | --- | --- | --- |
| James | 541 754 3010 | 03/05/1985 | 1 | Dr.Jeffrey |
| John | 415 555 2671 | 05/01/1992 | 2 | Dr.Lewis |
| Richard | 415 570 2453 | 09/12/1999 | 2 | Dr.Philips |
| Michael | 555 555 1234 | 12/12/1995 | 3 | Dr.Edwards |
| Richard | 415 555 2671 | 02/05/1989 | 4 | Dr.Anthony |

StudentTable

| StudentId | StudentName | Phone | DOB |
| --- | --- | --- | --- |
| 1615 | James | 541 754 3010 | 03/05/1985 |
| 1418 | John | 415 555 2671 | 05/01/1992 |
| 1718 | Richard | 415 570 2453 | 09/12/1999 |
| 1313 | Michael | 555 555 1234 | 12/12/1995 |
| 1819 | Richard | 415 555 2671 | 02/05/1989 |

RegisteredCourse

| ID | CourseId | Faculty |
| --- | --- | --- |
| 1615 | 1 | Dr.Jeffrey |
| 1418 | 2 | Dr.Lewis |
| 1718 | 2 | Dr.Philips |
| 1313 | 3 | Dr.Edwards |
| 1819 | 4 | Dr.Anthony |

CoursesOffered

| CourseId | CourseName |
| --- | --- |
| 1 | Databases |
| 2 | Hadoop |
| 3 | R Programming |
| 4 | Data Mining |

Figure 2.12 Data divided across multiple related tables.

Relational tables have attributes that uniquely identify each row. The attribute, or set of attributes, that uniquely identifies the tuples is called the primary key. StudentId is the primary key of StudentTable, and hence its value must be unique. An attribute in one table that references the primary key in another table is called a foreign key. CourseId in RegisteredCourse is a foreign key, which references CourseId in the CoursesOffered table.
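The primary key and foreign key relationships described here can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the book's implementation: the column types, the assumption that ID in RegisteredCourse refers to StudentId, the composite primary key on RegisteredCourse, and James's second registration are all assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE StudentTable (
    StudentId   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL,
    Phone       TEXT,
    DOB         TEXT
);
CREATE TABLE CoursesOffered (
    CourseId   INTEGER PRIMARY KEY,
    CourseName TEXT NOT NULL
);
-- Assumption: ID holds the StudentId of the registering student.
CREATE TABLE RegisteredCourse (
    ID       INTEGER NOT NULL REFERENCES StudentTable(StudentId),
    CourseId INTEGER NOT NULL REFERENCES CoursesOffered(CourseId),
    Faculty  TEXT,
    PRIMARY KEY (ID, CourseId)
);
""")

with conn:  # one transaction for the sample data
    conn.executemany(
        "INSERT INTO StudentTable VALUES (?, ?, ?, ?)",
        [(1615, "James", "541 754 3010", "03/05/1985"),
         (1418, "John", "415 555 2671", "05/01/1992")],
    )
    conn.executemany(
        "INSERT INTO CoursesOffered VALUES (?, ?)",
        [(1, "Databases"), (2, "Hadoop")],
    )
    conn.executemany(
        "INSERT INTO RegisteredCourse VALUES (?, ?, ?)",
        [(1615, 1, "Dr.Jeffrey"),
         (1418, 2, "Dr.Lewis"),
         (1615, 2, "Dr.Lewis")],  # hypothetical second course for James
    )


def registrations():
    """Rebuild the flat view of Table 2.1 by joining the related tables."""
    return conn.execute("""
        SELECT s.StudentName, c.CourseName, r.Faculty
        FROM RegisteredCourse AS r
        JOIN StudentTable   AS s ON s.StudentId = r.ID
        JOIN CoursesOffered AS c ON c.CourseId  = r.CourseId
        ORDER BY s.StudentName, c.CourseId
    """).fetchall()


def try_register(student_id, course_id, faculty):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO RegisteredCourse VALUES (?, ?, ?)",
                (student_id, course_id, faculty),
            )
        return True
    except sqlite3.IntegrityError:
        return False


for row in registrations():
    print(row)
```

James's details are stored once in StudentTable even though he appears in two registrations, and a call such as `try_register(1615, 99, "Dr.Nobody")` is rejected because no course 99 exists in CoursesOffered.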

Relational databases become unsuitable when organizations collect vast amounts of customer data, transaction records, and other data that may not be structured to fit into relational tables. This has led to the evolution of non-relational databases, which are schema-less. NoSQL databases are non-relational, and a few frequently used ones are Neo4j, Redis, Cassandra, and MongoDB. Let us have a quick look at the properties of RDBMS and NoSQL databases.

2.4.1 RDBMS Databases

An RDBMS is vertically scalable, exhibits ACID (atomicity, consistency, isolation, durability) properties, and supports data that adheres to a specific schema. The schema check is made at the time of inserting or updating data, and hence an RDBMS is not ideal for capturing and storing data arriving at high velocity. These architectural limitations make an RDBMS unsuitable as the primary storage for big data solutions.
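The schema check on write and the atomicity part of ACID can be illustrated with a small sqlite3 sketch. The `account` table, its columns, and the helper functions are hypothetical names chosen for this example only; SQLite stands in for a generic RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        owner   TEXT    NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
with conn:
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )


def balances():
    """Current balance per owner."""
    return dict(conn.execute("SELECT owner, balance FROM account ORDER BY id"))


def insert_rejected(row):
    """Return True if the schema check refuses the row at insert time."""
    try:
        with conn:
            conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


def transfer(src, dst, amount):
    """Move amount from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if any statement fails
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this violates CHECK (balance >= 0), the credit above is undone too.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_rejected((3, None, 10)))  # owner is NOT NULL, so the row is refused
print(transfer(1, 2, 30), balances())
print(transfer(1, 2, 500), balances())
```

Every write pays for the constraint checks, which is the cost the paragraph above refers to; in exchange, a failed transfer leaves no partial update behind.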

For decades, relational database management systems running in corporate data centers have stored the bulk of the world’s data. But as data has grown, RDBMS can no longer keep pace with the volume, velocity, and variety of data being generated and consumed.

Big data, which is typically a collection of data with massive volume and variety arriving at a high velocity, cannot be effectively managed with traditional data management tools. While conventional databases still exist and are used in a large number of applications, one of the key advancements in resolving the problems with big data is the emergence of modern alternative database technologies that do not require a fixed schema to store data; instead, the data is distributed across the storage nodes. The main alternative databases are NoSQL and NewSQL databases.

2.4.2 NoSQL Databases

The term NoSQL (Not Only SQL) covers all non-relational databases. Unlike an RDBMS, which exhibits ACID properties, NoSQL databases are designed around the trade-offs described by the CAP theorem (consistency, availability, partition tolerance) and typically exhibit the BASE (basically available, soft state, eventually consistent) model, in which the data store does not guarantee immediate consistency but provides eventual consistency. Hence, these databases are generally not appropriate for implementing large transactions.
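Eventual consistency can be made concrete with a toy model. The sketch below is a single-process simulation, not how any particular NoSQL product replicates data: the class names, the explicit `sync` step, and the last-write-wins rule are all assumptions chosen to keep the example short.

```python
class Replica:
    """One copy of the data: key -> (version, value)."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # queued (replica_index, key, version, value) updates
        self.version = 0

    def write(self, key, value):
        """Apply the write to replica 0 immediately and queue it for the others."""
        self.version += 1
        self.replicas[0].data[key] = (self.version, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.version, value))

    def read(self, key, replica):
        """Read from one replica; the answer may be stale until sync() runs."""
        entry = self.replicas[replica].data.get(key)
        return entry[1] if entry else None

    def sync(self):
        """Deliver queued updates; the higher version wins (last-write-wins)."""
        for i, key, version, value in self.pending:
            current = self.replicas[i].data.get(key)
            if current is None or current[0] < version:
                self.replicas[i].data[key] = (version, value)
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("course:2", "Hadoop")
print(store.read("course:2", replica=0))  # the replica that took the write
print(store.read("course:2", replica=2))  # not yet replicated here
store.sync()
print(store.read("course:2", replica=2))  # replicas have converged
```

Between `write` and `sync`, two readers can see different values for the same key. That window is what "soft state" and "eventually consistent" refer to, and it is why multi-step transactions are hard to implement on such stores.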

The various types of NoSQL databases, namely key-value databases, document databases, column-oriented databases, and graph databases, were discussed in detail in Section 2.3. Table 2.2 shows examples of each type.
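As a quick reminder of how the four models differ, the sketch below writes the same student registration in each shape using plain Python structures. It is illustrative only: the key formats, field names, and the `courses_of` helper are hypothetical and do not correspond to the API of any product listed in Table 2.2.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"student:1615": "James|541 754 3010|03/05/1985"}

# Document: a self-describing, nested record; fields may differ per document.
document = {
    "_id": 1615,
    "name": "James",
    "phone": "541 754 3010",
    "courses": [{"courseId": 1, "faculty": "Dr.Jeffrey"}],
}

# Column-oriented: values grouped into column families under a row key.
column_family = {
    1615: {
        "profile": {"name": "James", "phone": "541 754 3010"},
        "courses": {"1": "Dr.Jeffrey"},
    },
}

# Graph: nodes plus labeled edges between them.
nodes = {
    "s1615": {"label": "Student", "name": "James"},
    "c1": {"label": "Course", "name": "Databases"},
}
edges = [("s1615", "REGISTERED_FOR", "c1")]


def courses_of(student_node):
    """Follow REGISTERED_FOR edges from a student node to course names."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == student_node and rel == "REGISTERED_FOR"
    ]


print(key_value["student:1615"])
print(document["courses"][0]["faculty"])
print(column_family[1615]["profile"]["name"])
print(courses_of("s1615"))
```

None of these shapes requires a schema declared up front, which is the property that distinguishes them from the relational tables shown earlier.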

2.4.3 NewSQL Databases

NewSQL databases provide scalable performance similar to that of NoSQL systems while retaining the ACID properties of a traditional database management system. VoltDB, NuoDB, Clustrix, MemSQL, and TokuDB are some examples of NewSQL databases.

NewSQL databases are distributed in nature, horizontally scalable, and fault tolerant, and they support the relational data model with three layers: the administrative layer, the transactional layer, and the storage layer. A NewSQL database is highly scalable and operates in a shared-nothing architecture. NewSQL has SQL-compliant syntax and uses the relational data model for storage. Because it supports SQL-compliant syntax, the transition from an RDBMS to the highly scalable system is made easy.
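Horizontal scaling in a shared-nothing architecture can be sketched as hash partitioning: each node owns a disjoint slice of the rows and keeps them in its own private storage. The code below is a simplified single-process model under that assumption. The `Node` and `ShardedTable` names are hypothetical, and real NewSQL systems add replication, distributed transactions, and rebalancing on top of this idea.

```python
import hashlib


class Node:
    """A node with private storage; nothing is shared with other nodes."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ShardedTable:
    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _owner(self, key):
        """Pick the node that owns a key from a stable hash of the key."""
        digest = hashlib.sha256(str(key).encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def insert(self, key, row):
        self._owner(key).rows[key] = row

    def get(self, key):
        return self._owner(key).rows.get(key)

    def count(self):
        """A scatter-gather query: ask every node and combine the answers."""
        return sum(len(node.rows) for node in self.nodes)


table = ShardedTable(["node-a", "node-b", "node-c"])
for student_id in range(1000, 1100):
    table.insert(student_id, {"StudentId": student_id})

print(table.get(1042))
print({node.name: len(node.rows) for node in table.nodes})
print(table.count())
```

A lookup by key touches exactly one node, so adding nodes adds capacity for such queries. Note that this modulo scheme moves most keys when the node count changes, which is one reason production systems tend to use more elaborate partitioning.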

The applications targeting these NewSQL systems are those that execute the same queries repeatedly with different inputs and have a large number of transactions. Some of the commercial NewSQL products are briefly described below.

2.4.3.1 Clustrix

Clustrix is a high performance, fault tolerant, distributed database. Clustrix is used in applications with massive, high transactional volume.

2.4.3.2 NuoDB

NuoDB is a cloud based, scale-out, fault tolerant, distributed database. It supports both batch and real-time SQL queries.

2.4.3.3 VoltDB

VoltDB is a scale-out, in-memory, high performance, fault tolerant, distributed database. It is used to make real-time decisions to maximize business value.

2.4.3.4 MemSQL

MemSQL is a high performance, in-memory, fault tolerant, distributed database. It is known for its fast performance and is used for real-time analytics.

Table 2.2 Popular NoSQL databases.

| Key-value databases | Document databases | Column databases | Graph databases |
| --- | --- | --- | --- |
| Redis | MongoDB | DynamoDB | Neo4j |
| Riak | CouchDB | Cassandra | OrientDB |
| SimpleDB | RethinkDB | Accumulo | ArangoDB |
| BerkeleyDB Oracle | MarkLogic | Big Table | FlockDB |
