Database by LinuxAcedemy

Database by LinuxAcedemy        1

Links        2

Database design architecting concepts        2

Database design depends on 3 categories        2

Database Concepts        2

Choosing the Database        3

When to use RDB        3

Provider of RDB which provides ACID guarantee        4

Edge case        4

Architecting Database        4

Must haves        4

RDS        5

When to use        5

OLTP        5

OLAP        6

Download and Installation        6

Database        6

Type of database        7

MongoDB        8

TODO        8

Create a database        8

Update database        8

Delete a record        8

Bulk import data        9

Couchbase        9

PostgreSQL        12

Create/Read database        12

Update Table        12

Delete table        12

Drop table        12

Postgres vs MySql        13


Database design architecting concepts

Database design depends on 3 categories
  1. How organized your data is (structured vs non-structured)
  2. Query pattern you have
  3. Amount of scale that you need to handle

Database Concepts

  1. Caching - use Redis
  2. Storage - Blob Type - Use S3 + CDN
  3. Text searching capabilities - Text search engine - Elastic Search + Solar = Apache Lucene
  1. Fuzzy search when someone types wrong spelling accidentally; database has to return the desired result
  2. Elastic Search and Solar are Search Engines and not databases
  1. Search Engines: potentially data could be lost. It provides availability and redundancy  
  2. Database: it gives you a guarantee that data wouldn’t be lost
  3. Takeaways:
  1. Never keep search engines as your primary source of data
  2. primary source data store should be somewhere else; you could load the data using Search Engines to provide searching capabilities
  1. Store metric data/ Time series - Open TSDB
  1. Application metrics tracking system: apps that are pushing metrics related to their throughput, CPU utilization, and latencies.
  1. Now you want to build a system to support the above capabilities - use Time-series Database  
  1. The time-series database is an extension of RDB with certain functionalities
  1. Regular database: you can update a lot of records, giving you the ability to query random records
  2. Metric monitoring system: you will never do random updates
  1. You’ll do a sequential update in an append-only mode where first entry T1 and second entry T2 will be greater than T1
  2. Read query that you do is bulk with the time range; ex: query for last hour, day, week data.  
  1. Data warehouse for analytics - Use Hadoop
  1. A large database in that you dump all the data; provide various querying capabilities on top of the data to serve a lot of reports.
  1. Not suited for a transactional system; it’s used for offline reporting
  1. Analytics for all the business transactions, total orders, geographic location

Choosing the Database

If you have structured data use RDB

When to use RDB
Provider of RDB which provides ACID guarantee
Edge case
  1. When you have relational data but need ACID guarantee // you can choose either relational or non-relational database
  1. Because normally you could easily map a structured data into no-sql model

Architecting Database

Must haves
  1. Multi Az for disaster capabilities
  1. How it works?
  1. Primary database in one AZ and Secondary database(exact copy of your primary database) in another AZ
  2. If your primary DB shuts down, server then use standby database endpoint by filing over to standby database
  1. It’s not used when your primary database is still running
  1. How technically it works?
  1. You get a endpoint of both of your database
  2. EC2 server hooked up with your primary database
  3. AWS automatically updated the DNS endpoint to point to another IP address of your standby DB when primary DB shuts down
  1. Useful for  - Failover capabilities
  1. This is just for disaster recovery and not for improving performance
  2. Can’t connect to standby database when the primary database is active.
  3. Can’t run your queries on your second database with multi AZ
  4. Can’t send your web server traffic to secondary database
  5. Your second database simply there for failure
  1. Improve performance using read replicas
  1. Automated backups
  2. Improve RDS performance by read replicas
  1. How it works?
  1. You have exact replica of your primary database and you can use it for read querying
  2. Can be in cross AZ or cross region
  1. How tech works
  1. With read replica you get your own unique DNS endpoint. So 1 for primary and 1 for RR
  2. RR can promoted and make it independent database, and this will break the replication  
  3. You do it when doing online processing  where you can queuing to this database without affecting your primary
  1. Useful for
  1. Scaling read performance
  2. Requires automatic backups in order to deploy read replica
  3. Multiple read replica supported (5 per each DB instance)


When to use

Download and Installation

Wget: download the software

Sudo yum installation: install the software  


Type of database

  1. Flat files
  1. Regular file, could have text, comma separator, XML, YMAL
  1. Relational
  1. It’s a schema-based database
  1. Has rows and columns; spreadsheets
  2. Always have column numbers and data type of the entires
  1. Schemaless // NoSql
  1. Key-value pair (json)



Steps to installation:

  1. Installation MongoDb server
  2. Working in a server to create, read, update, and delete database
Create a database  

Create a collection(table)

Update database
Delete a record
Bulk import data

Installation steps:

  1. Install couchbase server
  2. Start and setup(for login to the server) server
  3. Connect to a server


  1. create a bucket
  1. It holds a document // S3 bucket
  2. The document is object, descriptors for different items
  1. Create a document
  1. Query: document name/id & JSON Data


  1. Index a bucket to query data (query like SQL) // to read bucket
  1. To perform actions to look up things
  1. Update the value in bucket
  2. Clean up bucket




Postgres vs MySql

Postgres: is an object-relational database management system






object-relational database management system

  • features like table inheritance and function overloading

No features included such as table inheritance and function overloading

Offers Advanced features that are difficult to set up but boost database for specific use cases

Requires less expertise to manage and boost higher performance

Support for a programming language like .net, js, python

Use case: more complex websites and apps  that need highly customizable database solution

Use case:

  • web base application
  • Used when Speed is a big concern
  • With No additional features  

Lightweight and high-performance

Support: open-source  

  • Fewer resources compare to MySql
  • You may have to troubleshoot a issue on your own when the problem is complex

Support: open-source

  • More popular and that’s why it has more resources than any other tool