[System Design] Part#2: My notes on - How will you choose any component for System Design
This is kind of an addition to earlier “System design notes” medium post and I am creating a new post here , just dedicated to “how and why we choose any database/component for a system design purpose?”. I will try to add all my notes on why choice1 over choice2 or vice versa? I will continue to UPDATE this post as I find new things to add !!!
Disclaimer: There is no “one-size-fits-all” solution when it comes to optimizing the performance of a system whether its backend or frontend components. Trade-offs are inevitable in many situations , but our effort should be focused on getting best out of the situation as much as possible. I have made few notes as I read/listened to hundreds of articles/YouTube videos and I am just sharing them through these medium posts :). If you find any better resource on any of these topics, feel free to share in comments. My intention is to learn and share.
Key database concepts which can help us choosing the RIGHT database for our system design -
(1) CAP theorem
(2) Transactions / Need for ACID properties
(3) Schema structure : fixed/not frequently changing, dynamic
(4) Scalability need
Table of contents:
CAP theorem in selections:
C+P (consistency + partition tolerance) :
- HBase(part of HDFS and runs on top of the Hadoop Cluster)
A+P(availability+ partition tolerance):
Apache Cassandra [ NoSQL]:
- No ad-hoc query support: You need to model the data based on “how you query” !
- Leaderless replication relies on quorum to ensure consensus; write to any of the replica and replica is responsible for broadcasting it to other nodes
- Aggregation is on the partition level. If you want to run aggregate queries on Cassandra then you need to specify the partition key since the aggregation will happen on the partition level. So better use some other db like Apache Spark streaming for “aggregation”
- No joins or Can not sort by a partition key column
- Cassandra’s data distribution is based on consistent hashing. It works like this: every node has a token defining the range of this node’s hash values. During the write, Cassandra transforms the data’s partition key into a hash value and checks the tokens to identify the needed node. When Cassandra finds the needed node, it stores the data on it and replicates it to a number of other nodes. This particular number depends on the tunable replication factor, but usually, it’s 3. This means that your data is stored on 3 separate nodes, and if one or even two of them fail, your data will still be available.
- Scanning performance issue:
Cassandra’s read is very quick and efficient as long as you know the primary key of the data you need. If you don’t, to find the required data, you may need to resort to scanning. And Cassandra doesn’t like scans: if it takes longer than a particular time, it returns an error and your data will probably not be found. However, if you integrate Cassandra with Apache Spark, performant scans become more available.
- Cassandra is good for IoT, recommendation and personalization engines, fraud detection, messaging systems, etc. Cassandra’s quick write and read operations coupled with extremely low latency and linear scalability make it a nice fit for these applications
- “Not good-fit for” :
- When you want a lot of different types of queries or you can’t predict your data usage
- When you want strong ACID compliance
- When you want many-to-many mappings or join tables
- When you don’t want a rigid schema
- It can be used as a Distributed File System or as a Message Queue(It guarantees ordering)
- Although Zookeeper provides similar functionality as Paxos algorithm(Consensus protocol based), the core consensus algorithm of Zookeeper is not Paxos; its called ZAB, short for ZooKeeper Atomic Broadcast. Like Paxos, it relies on a quorum for durability.
- What’s “having a quorum”? — It means that more than half of the number of nodes are up and running. If your client is connecting with a Zookeeper server which does not participate in a quorum, then it will not be able to answer any queries. This is the only way Zookeeper is capable of protecting itself against split brains in case of a network partition.
Remember, zookeeper is CP system w.r.t CAP theorem; it sacrifices Availability to achieve Consistency.
- Possibly “Not good fit for” =>
* application logging; choose something with less consistency requirements
* storing binaries; Instead store binaries in S3 and store the URLs in zookeeper
* metrics; scaling is problem
- Cassandra + Spark is suitable for “Recommendation Engine + Real-time analytics”
While Cassandra is all about the storage and distribution of data, Spark is about computation. Cassandra and Spark fit very well together in big data architecture because Cassandra is well-designed to store and dispatch any amount of data in milliseconds and Spark handles complicated lab queries and data analytics. In other words, Cassandra stores the data; Spark worker nodes are co-located with Cassandra and do the data processing.
- It’s an open-source data analytics cluster computing framework; It is built on top of Hadoop Distributed File System.
Data analytics can be applied to:
(1) Making recommendations
(2) Fraud detection
(3) Social networks and weblink analysis
(4) Marketing and advertising decisions
(5) Sales and stock market analytics
- Spark has capability to process the iterative and interactive machine learning algorithms faster than traditional MapReduce model.
(1) Spark performs better than Hadoop in case certain specific applications. Spark is not restricted by two stage MapReduce paradigm
(2) The data loaded into main memory can be used repeatedly by subsequent database accesses and it speeds up the entire response time
Amazon Dynamo DB:
- Key-value and document NoSQL database
- A table which has collections of items and each item is collection of attributes
- Uses primary key to shard the data and sort key to sort within the partition
- Eventually and strongly consistent reads
- Conditional writes help complex transactions
- Global Tables automatically replicate all the tables across the regions