Database by LinuxAcedemy

Fuzzy search when someone types wrong spelling accidentally; database has to return the desired result
Elastic Search and Solar are Search Engines and not databases

Search Engines: potentially data could be lost. It provides availability and redundancy
Database: it gives you a guarantee that data wouldn’t be lost
Takeaways:

Never keep search engines as your primary source of data
primary source data store should be somewhere else; you could load the data using Search Engines to provide searching capabilities

Store metric data/ Time series - Open TSDB

Application metrics tracking system: apps that are pushing metrics related to their throughput, CPU utilization, and latencies.

Now you want to build a system to support the above capabilities - use Time-series Database

The time-series database is an extension of RDB with certain functionalities

Regular database: you can update a lot of records, giving you the ability to query random records
Metric monitoring system: you will never do random updates

You’ll do a sequential update in an append-only mode where first entry T1 and second entry T2 will be greater than T1
Read query that you do is bulk with the time range; ex: query for last hour, day, week data.

Data warehouse for analytics - Use Hadoop

A large database in that you dump all the data; provide various querying capabilities on top of the data to serve a lot of reports.

Not suited for a transactional system; it’s used for offline reporting

Analytics for all the business transactions, total orders, geographic location

Choosing the Database

If you have structured data use RDB

Where you can store data in terms of tables = rows + columns
ex: web app where you store user ID, email, DOB, City, PhoneNo

When to use RDB

When you need ACID = atomicity and transactional guarantee

ex: building a payment system in bank where one person is debiting about into another person’s account
Query has to work fully and about either has to be debited to not called transactional query

Provider of RDB which provides ACID guarantee

My Sql
Oracle
SQL Server
Postgres

Edge case

When you have relational data but need ACID guarantee // you can choose either relational or non-relational database

Because normally you could easily map a structured data into no-sql model

Architecting Database

Must haves

Multi Az for disaster capabilities

How it works?

Primary database in one AZ and Secondary database(exact copy of your primary database) in another AZ
If your primary DB shuts down, server then use standby database endpoint by filing over to standby database

It’s not used when your primary database is still running

How technically it works?

You get a endpoint of both of your database
EC2 server hooked up with your primary database
AWS automatically updated the DNS endpoint to point to another IP address of your standby DB when primary DB shuts down

Useful for - Failover capabilities

This is just for disaster recovery and not for improving performance
Can’t connect to standby database when the primary database is active.
Can’t run your queries on your second database with multi AZ
Can’t send your web server traffic to secondary database
Your second database simply there for failure

Improve performance using read replicas

Automated backups
Improve RDS performance by read replicas

How it works?

You have exact replica of your primary database and you can use it for read querying
Can be in cross AZ or cross region

How tech works

With read replica you get your own unique DNS endpoint. So 1 for primary and 1 for RR
RR can promoted and make it independent database, and this will break the replication
You do it when doing online processing where you can queuing to this database without affecting your primary

Useful for

Scaling read performance
Requires automatic backups in order to deploy read replica
Multiple read replica supported (5 per each DB instance)

RDS

When to use

Used for online transaction processing workload- OLTP

OLTP

Process the data from Transactions in real time
Ex: when you order something from amazon it has to store order details in table

OLAP

Process complex queries with large amount of data to analyze historic data
ex: net profit for past 3 years

Download and Installation

Wget: download the software

Sudo yum installation: install the software

Database

It’s a collection of data
It’s organized data, and data is organized in a way that we can go and retrieve it

If the data isn’t organized and can’t retrieve it, we need a database management system that makes it easier to work with data

MongoDB
Couchbase
PostgreSQL
MSSQL
MySql

Type of database

Flat files

Regular file, could have text, comma separator, XML, YMAL

Relational

It’s a schema-based database

Has rows and columns; spreadsheets
Always have column numbers and data type of the entires

Schemaless // NoSql

Key-value pair (json)

Databases

MongoDB

For NoSql

Steps to installation:

Installation MongoDb server
Working in a server to create, read, update, and delete database

TODO

Create a database

Use employees // will create employees database

Create a collection(table)

Is a logical grouping of document(json pages). It’s like a table

db.createCollection(“staff_info”) // create a collection/table
db.staff_info.insert({name:”John”,phone:”1234567890”,status:”okay” }) // insert value to the table

Calling a method on collection
Insert an array of values

Show collections // to see list of collections
db.staff_info.find() // see all the entries, select all

db.staff_info.find({name:”John”}) // gives entry named John

Update database

Update a record in database

db.staff_info.updateOne({name:”John”}, {$set:{status:”Not oKAY”}})

Find the name John
Update the status with that name from Okay to Not oKAY

Delete a record

db.staff_info.remove({name: “John”}) // remove record named with John
db.staff_info.drop() // drop the collection/table
db.dropDatabase() // drop database

Will delete data in the database

Bulk import data

Create a database and import data from csv file

mongoimport -d cityinfo -c cities --ype CSV --file ./cities.csv --headerline

Couchbase

It’s N1QL(non-first normal form query language) // Nickel
Stores document in JSON
Looks like sql

Installation steps:

Install couchbase server
Start and setup(for login to the server) server
Connect to a server

TODO

create a bucket

It holds a document // S3 bucket
The document is object, descriptors for different items

Create a document

Query: document name/id & JSON Data

Index a bucket to query data (query like SQL) // to read bucket

To perform actions to look up things

Update the value in bucket
Clean up bucket

PostgreSQL

RDBMS Database
Is a Sql standard compliant and requires a schema

Installaton

Install Postgre
Intialize database server
Enable and start the server/service
Connect to PostgreSQL

TODO

Create/Read database

Create database music; // to create music database
Create table friends() // to create table with schema
Insert into friends VALUES (‘John’, ‘123456’, ‘jessy’’) // insert values in the table

Update Table

Update friend set status=‘mine’ where name=’John’;

Delete table

Delete from friends where name=’John’; // to delete data

Drop table

Drop table friends; // to delete table

TASK

Copy csv data into the table

Postgres vs MySql

Postgres: is an object-relational database management system

Includes features like table inheritance and function overloading // Key difference in comparison with MySql

MySql:

Postgres	MySql
RDMS	RDMS
object-relational database management system features like table inheritance and function overloading	No features included such as table inheritance and function overloading
Offers Advanced features that are difficult to set up but boost database for specific use cases	Requires less expertise to manage and boost higher performance
Support for a programming language like .net, js, python
Use case: more complex websites and apps that need highly customizable database solution	Use case: web base application Used when Speed is a big concern With No additional features
	Lightweight and high-performance
Support: open-source Fewer resources compare to MySql You may have to troubleshoot a issue on your own when the problem is complex	Support: open-source More popular and that’s why it has more resources than any other tool