Distributed Systems Concepts
CORE CONCEPTS
Distributed System: A collection of interconnected computers that communicate and coordinate to achieve a common goal
Scalability: Handling increased workload
Horizontal: Add more machines (nodes)
Vertical: Upgrade existing machines (e.g. CPU, RAM)
Reliability: Functioning correctly despite failures
Availability: Proportion of uptime
Consistency: Agreement on data values across all nodes at a given time
Partition Tolerance: Operating despite network splits
CAP Theorem: (Pick two) Consistency, Availability, Partition Tolerance
State Management: Data storage/synchronization across nodes (stateless/stateful)
Data Partitioning: Dividing data for scalability (horizontal/vertical, hash/range-based)
Replication Strategies: Copying data for fault tolerance
(synchronous/asynchronous/quorum-based)
Consistency Models: Trade-offs between data agreement and performance (strong/eventual/causal)
Time and Order: Event ordering (Lamport timestamps/ vector clocks)
Fault Tolerance: Handling failures gracefully (failover/redundancy)
Concurrency Control: Managing simultaneous access (optimistic/pessimistic)
ARCHITECTURES
Client-Server: Clients request services from servers
Peer-to-Peer (P2P): Nodes act as both clients and servers
Master-Slave: One master node coordinates, multiple slaves execute Microservices: System decomposed into small, independent services
Event-Driven Architecture: Components communicate by reacting to events
Service-Oriented Architecture (SOA): A collection of services that communicate with each other
Lambda Architecture: A hybrid architecture for batch and real-time processing
KEY TECHNOLOGIES
Container Orchestration: Kubernetes, Docker Swarm Service Discovery: Consul, etc
API Gateways: Kong, Tyk, Apigee, AWS API Gateway Load Balancers: HAProxy, NGINX, Traefik, Amazon ELB
Monitoring & Observability: Prometheus, Grafana
Distributed Tracing: Zipkin, Jaeger Service Mesh: Istio, Linkerd
Infrastructure as Code (laC): Terraform, Pulumi, CloudFormation Configuration Management: Ansible, Chef, Puppet Distributed Caches: Redis, Memcached
Stream Processing: Apache Flink, Apache Spark Streaming Workflow Orchestration: Apache Airflow, Argo Workflows Job Schedulers: Quartz, Apache Oozie, Kubernetes CronJobs Cloud Providers: AWS, Azure, Google Cloud Platform Serverless Platforms: AWS Lambda, Azure Functions, Google Cloud Functions
FALLACIES OF DISTRIBUTED COMPUTING
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn’t change
There is one administrator
Transport cost is zero
The network is homogeneous
COMMUNICATION
Remote Procedure Call (RPC): Call a function on a remote machine Message Passing: Asynchronous communication between nodes
Message Queues (Point-to-Point): RabbitMQ, Amazon SQS
Message Brokers (Publish-Subscribe): Kafka, Apache Pulsar
REST: Query language for APls with fine-grained data fetching GraphQL: TLS/SSL, HTTPS, SSH
gRPC: High-performance RPC framework using Protocol Buffers Webhooks: Automatic notifications sent when certain events occur
COORDINATION & CONSENSUS
Consensus Algorithms: Paxos, Raft
Distributed Locks: ZooKeeper, etcd
Distributed Transactions: Two-phase commit (2PC)
Leader Election: Choosing a coordinator node
Gossip Protocol: Decentralized information dissemination
DESIGN PRINCIPLES
Loose Coupling: Minimize component dependencies
High Cohesion: Group related functionality together
Abstraction: Hide implementation details
Single Responsibility: Each component does one thing well Separation of Concerns: Divide system into distinct modules
Statelessness: (Where possible) Avoid storing session data
Idempotence: Operations can be repeated safely
Eventual Consistency: Accept temporary inconsistency for availability and performance
Fail-Fast: Detect and report errors quickly
Circuit Breaker: Isolate failing components
Retry Mechanisms: Automatically retry failed operations
Asynchronicity: Design for non-blocking communication Timeouts/Deadlines: Set limits to prevent indefinite waits
DATA MANAGEMENT
Distributed Databases - NoSQL: MongoDB, Cassandra, DynamoDB (flexible schema, scalable) - NewSQL: CockroachDB, YugabyteDB (scalable with SQL-like features) - Distributed File Systems
HDFS: For big data (Hadoop)
Ceph: Unified, distributed storage system
Caching: Redis, Memcached (in-memory for speed)
ACID: Atomicity, Consistency, Isolation, Durability (relational database guarantees)
BASE: Basically Available, Soft state, Eventual consistency (NoSQL database characteristics)
Data Partitioning: Distributing data across nodes (sharding, hashing, range partitioning)
Replication Strategies: Copying data for fault tolerance