Distributed Systems Concepts

CORE CONCEPTS

  • Distributed System: A collection of interconnected computers that communicate and coordinate to achieve a common goal

  • Scalability: Handling increased workload

    • Horizontal: Add more machines (nodes)

    • Vertical: Upgrade existing machines (e.g. CPU, RAM)

  • Reliability: Functioning correctly despite failures

  • Availability: Proportion of uptime

  • Consistency: Agreement on data values across all nodes at a given time

  • Partition Tolerance: Operating despite network splits

  • CAP Theorem: (Pick two) Consistency, Availability, Partition Tolerance

  • State Management: Data storage/synchronization across nodes (stateless/stateful)

  • Data Partitioning: Dividing data for scalability (horizontal/vertical, hash/range-based)

  • Replication Strategies: Copying data for fault tolerance

  • (synchronous/asynchronous/quorum-based)

  • Consistency Models: Trade-offs between data agreement and performance (strong/eventual/causal)

  • Time and Order: Event ordering (Lamport timestamps/ vector clocks)

  • Fault Tolerance: Handling failures gracefully (failover/redundancy)

  • Concurrency Control: Managing simultaneous access (optimistic/pessimistic)

ARCHITECTURES

  • Client-Server: Clients request services from servers

  • Peer-to-Peer (P2P): Nodes act as both clients and servers

  • Master-Slave: One master node coordinates, multiple slaves execute Microservices: System decomposed into small, independent services

  • Event-Driven Architecture: Components communicate by reacting to events

  • Service-Oriented Architecture (SOA): A collection of services that communicate with each other

  • Lambda Architecture: A hybrid architecture for batch and real-time processing

KEY TECHNOLOGIES

  • Container Orchestration: Kubernetes, Docker Swarm Service Discovery: Consul, etc

  • API Gateways: Kong, Tyk, Apigee, AWS API Gateway Load Balancers: HAProxy, NGINX, Traefik, Amazon ELB

  • Monitoring & Observability: Prometheus, Grafana

  • Distributed Tracing: Zipkin, Jaeger Service Mesh: Istio, Linkerd

  • Infrastructure as Code (laC): Terraform, Pulumi, CloudFormation Configuration Management: Ansible, Chef, Puppet Distributed Caches: Redis, Memcached

  • Stream Processing: Apache Flink, Apache Spark Streaming Workflow Orchestration: Apache Airflow, Argo Workflows Job Schedulers: Quartz, Apache Oozie, Kubernetes CronJobs Cloud Providers: AWS, Azure, Google Cloud Platform Serverless Platforms: AWS Lambda, Azure Functions, Google Cloud Functions

FALLACIES OF DISTRIBUTED COMPUTING

  • The network is reliable

  • Latency is zero

  • Bandwidth is infinite

  • The network is secure

  • Topology doesn’t change

  • There is one administrator

  • Transport cost is zero

  • The network is homogeneous

COMMUNICATION

  • Remote Procedure Call (RPC): Call a function on a remote machine Message Passing: Asynchronous communication between nodes

  • Message Queues (Point-to-Point): RabbitMQ, Amazon SQS

  • Message Brokers (Publish-Subscribe): Kafka, Apache Pulsar

  • REST: Query language for APls with fine-grained data fetching GraphQL: TLS/SSL, HTTPS, SSH

  • gRPC: High-performance RPC framework using Protocol Buffers Webhooks: Automatic notifications sent when certain events occur

COORDINATION & CONSENSUS

  • Consensus Algorithms: Paxos, Raft

  • Distributed Locks: ZooKeeper, etcd

  • Distributed Transactions: Two-phase commit (2PC)

  • Leader Election: Choosing a coordinator node

  • Gossip Protocol: Decentralized information dissemination

DESIGN PRINCIPLES

  • Loose Coupling: Minimize component dependencies

  • High Cohesion: Group related functionality together

  • Abstraction: Hide implementation details

  • Single Responsibility: Each component does one thing well Separation of Concerns: Divide system into distinct modules

  • Statelessness: (Where possible) Avoid storing session data

  • Idempotence: Operations can be repeated safely

  • Eventual Consistency: Accept temporary inconsistency for availability and performance

  • Fail-Fast: Detect and report errors quickly

  • Circuit Breaker: Isolate failing components

  • Retry Mechanisms: Automatically retry failed operations

  • Asynchronicity: Design for non-blocking communication Timeouts/Deadlines: Set limits to prevent indefinite waits

DATA MANAGEMENT

Distributed Databases - NoSQL: MongoDB, Cassandra, DynamoDB (flexible schema, scalable) - NewSQL: CockroachDB, YugabyteDB (scalable with SQL-like features) - Distributed File Systems

  • HDFS: For big data (Hadoop)

  • Ceph: Unified, distributed storage system

  • Caching: Redis, Memcached (in-memory for speed)

  • ACID: Atomicity, Consistency, Isolation, Durability (relational database guarantees)

  • BASE: Basically Available, Soft state, Eventual consistency (NoSQL database characteristics)

  • Data Partitioning: Distributing data across nodes (sharding, hashing, range partitioning)

  • Replication Strategies: Copying data for fault tolerance