- What is horizontal vs vertical scaling?
- Vertical scaling (scale-up) means adding more resources to a single machine — more CPU, RAM, or faster disk. Simple but has a physical ceiling and creates a single point of failure. Horizontal scaling (scale-out) means adding more machines and distributing load across them using a load balancer. Preferred for large-scale systems because it's theoretically unlimited, provides redundancy, and allows rolling deployments. The trade-off is that horizontal scaling requires stateless services or an external session store.
- What is the CAP theorem?
- CAP states that a distributed system can only guarantee 2 of the 3 properties simultaneously: Consistency (every read returns the most recent write or an error), Availability (every request receives a non-error response, though it may not be the latest data), Partition Tolerance (the system continues operating even if messages between nodes are dropped). Since network partitions are inevitable in any distributed system, you're always choosing between CP (consistent but may be unavailable) and AP (always available but potentially stale). Example: Cassandra is AP; HBase is CP.
- What is database sharding?
- Sharding splits a database table horizontally across multiple instances (shards) so that each shard holds a subset of rows. A shard key (e.g., userId) determines which shard stores a given row. Benefits: distributes write load, allows dataset to exceed single-machine capacity. Challenges: cross-shard queries and joins become complex, transactions spanning shards require distributed transaction protocols (2PC), and rebalancing when adding shards is operationally difficult. Consistent hashing minimises rebalancing cost.
- When should you use SQL vs NoSQL?
- Use SQL (Postgres, MySQL) when: data has a well-defined, stable schema, you need ACID transactions (banking, orders), relationships between entities are complex, and your query patterns are varied (ad-hoc SQL). Use NoSQL when: you need horizontal write scalability beyond what a single SQL node can handle, the data model is document-like (MongoDB), you need key-value lookups at very high throughput (DynamoDB, Redis), or you're storing time-series or graph data. Many modern systems use both — SQL for transactional data, NoSQL for high-throughput or flexible-schema data.
- What is a message queue and why is it used?
- A message queue (Kafka, RabbitMQ, SQS) is middleware that decouples producers from consumers using asynchronous messaging. Producers publish messages without waiting for consumers to process them. Benefits: absorbs traffic spikes (queue acts as a buffer), allows independent scaling of producers and consumers, provides fault tolerance (messages are persisted and retried), and enables event-driven architectures. Common patterns: task queues (background jobs), event streaming (audit logs, analytics), and fan-out (one event triggers multiple downstream services).
- What is eventual consistency?
- In a distributed system with replicas, writes go to one node and are propagated asynchronously to others. During this propagation window, different nodes may return different values for the same key — they are temporarily inconsistent. Eventually, all nodes converge to the same state (assuming no new writes). This model is used by DynamoDB, Cassandra, and DNS. It trades consistency for availability and lower latency. It's acceptable for: social media feeds, product recommendations, counters. It's NOT acceptable for: bank balances, inventory counts, or any data requiring linearizability.
- What is a CDN and how does it improve performance?
- A Content Delivery Network (CDN) is a geographically distributed network of edge servers that cache and serve static content (images, CSS, JS, videos) from a location physically close to the end user. Instead of all requests hitting your origin server, the CDN serves cached responses from the nearest edge node, reducing latency from hundreds of milliseconds to single-digit milliseconds. CDNs also absorb DDoS traffic and reduce origin server load. Popular CDNs: Cloudflare, AWS CloudFront, Fastly. Cache invalidation is the main operational challenge.
- How do you design for high availability?
- High availability (HA) means the system remains operational despite failures. Key techniques: Eliminate single points of failure — every component must have a replica or failover. Use load balancers to route around unhealthy instances. Replicate databases with automatic failover (primary-replica with promoted replica). Use health checks and circuit breakers to detect and isolate failing services. Deploy across multiple availability zones or regions. Design for graceful degradation — serve a cached or reduced response when a dependency fails. Target availability as SLAs: 99.9% = 8.7 hrs downtime/year; 99.99% = 52 min/year.