# Performance Optimization: Building Systems That Scale
A 100ms delay in page load can cut conversions by 1%; a 1-second delay, by 7%.
Performance isn't a feature; it's a business metric.
Yet most engineers optimize after the fact: the system gets slow, then they scramble to fix it. The better approach is to build fast from the start.
## Performance Optimization Hierarchy

### 1. Measure First

You can't optimize blind.
**Metrics to track:**
- Endpoint latency (API) or page load time (web)
- Database query duration
- Cache hit rate
- CPU, memory, disk usage
- Error rate
**Tools:**
- Application Performance Monitoring (APM): New Relic, Datadog, Elastic APM
- Browser performance: Web Vitals, Lighthouse
- Profilers: py-spy (Python), pprof (Go), Chrome DevTools (JavaScript)
**Real example:**
```
Profile application: identify hotspots

Top 3 functions by CPU time:
  1. database_query()   - 60% of CPU time
  2. json_serialize()   - 20% of CPU time
  3. regex_validation() - 15% of CPU time

Action: optimize database_query first (biggest impact)
```
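A report like the one above can be produced with Python's built-in cProfile. This is a minimal sketch; the function bodies are stand-ins for real work:

```python
import cProfile
import io
import pstats

def database_query():
    # Stand-in for slow database work
    return sum(i * i for i in range(200_000))

def json_serialize():
    # Stand-in for serialization cost
    return str(list(range(50_000)))

def handle_request():
    database_query()
    json_serialize()

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Collect the top functions sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In practice a sampling profiler like py-spy can attach to a running process without code changes, which is usually preferable in production.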
### 2. Database Optimization
Most performance issues are database-related.
**Techniques:**
**Indexing:**
```sql
-- Slow query (full table scan)
SELECT * FROM orders WHERE user_id = 123;
-- 5 seconds (scanned 1M rows)

-- Add an index
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Same query now: 5ms (index lookup)
```
**Query optimization:**
```sql
-- Bad: N+1 problem (1 query + N follow-up queries)
SELECT * FROM orders;                         -- 1000 orders
-- Then, for each order:
--   SELECT * FROM items WHERE order_id = ?;  -- 1000 queries
-- Total: 1001 queries; 10 seconds

-- Good: single query with a join
SELECT orders.*, items.*
FROM orders
JOIN items ON items.order_id = orders.id;
-- Total: 1 query; 100ms
```
**Connection pooling:**
```
Each database connection costs resources. Reuse connections.

Without pooling:
  Request 1: create connection (10ms) + query (5ms) + close (2ms) = 17ms
  Request 2: create connection (10ms) + query (5ms) + close (2ms) = 17ms

With pooling (10 connections):
  Request 1: get from pool (1ms) + query (5ms) + return to pool (1ms) = 7ms
  Request 2: get from pool (1ms) + query (5ms) + return to pool (1ms) = 7ms
```
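The reuse pattern above can be sketched with Python's standard library; an in-memory SQLite connection stands in for a real database, and the timings in the example are illustrative:

```python
import queue
import sqlite3

class ConnectionPool:
    """Tiny fixed-size pool: create connections once, reuse them."""

    def __init__(self, size=10):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # In-memory SQLite stands in for a real database server
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self):
        return self._pool.get()   # get an existing connection (fast)

    def release(self, conn):
        self._pool.put(conn)      # return it to the pool; don't close it

pool = ConnectionPool(size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1 + 1").fetchone()[0]
pool.release(conn)
```

A production pool (usually built into the database driver or framework) also handles health checks, timeouts, and reconnection; this sketch shows only the reuse pattern.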
### 3. Caching
Avoid recomputation; store results.
**Multi-level cache:**
**L1: Request cache (seconds):**
- Same request → same response
- Cache key: hash(request params)
- TTL: 10 seconds
- Hit rate: 50-80%
**L2: User cache (minutes):**
- User-specific data
- Cache key: user_id
- TTL: 5-60 minutes
- Hit rate: 80%+
**L3: CDN cache (hours):**
- Static content, API responses
- Distributed globally
- Reduces latency to 1-10ms (vs. 100ms+ to reach the origin server)
**Example:**
```
Request: GET /user/123/recommendations
Cache miss → compute (100ms) → return result
Result saved in Redis (key=user:123:recs, TTL=1 hour)

Next request for the same user → cache hit (1ms) → return result
8/10 requests cached; average latency ~20ms (vs. 100ms)
```
### 4. Architecture Optimization
Design for performance from the start.
**Async processing:**
```
Synchronous (blocking):
  POST /order → validate → payment → email → return (2 seconds)
  User waits 2 seconds for a response

Asynchronous (non-blocking):
  POST /order → validate → queue(payment) → return immediately (100ms)
  Payment is processed in the background (user doesn't wait)
```
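The non-blocking flow can be sketched with a queue and a background worker; the queue name and order payload are illustrative, and a real system would use a durable broker rather than an in-process queue:

```python
import queue
import threading

payment_queue = queue.Queue()
processed = []

def payment_worker():
    """Background worker: drains the queue so the request path never waits."""
    while True:
        order = payment_queue.get()
        if order is None:            # sentinel: shut down
            break
        processed.append(order)      # stand-in for charging the card
        payment_queue.task_done()

worker = threading.Thread(target=payment_worker, daemon=True)
worker.start()

def post_order(order_id):
    # Validate, enqueue the payment, return immediately (the ~100ms path)
    payment_queue.put({"order_id": order_id})
    return {"status": "accepted", "order_id": order_id}

response = post_order(42)
payment_queue.join()                 # for the demo only: wait for the worker
```

With a broker such as RabbitMQ or SQS the queue survives process crashes, which is what makes the "return before the work is done" contract safe.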
**Read replicas:**
```
A single database can't handle 10K reads/sec.

Solution: master-replica replication
  Master: handles writes
  Read replicas (10x): handle reads

Load: 1K writes/sec + 9K reads/sec
  Master: 1K writes/sec
  Replicas: 9K reads/sec (distributed)
```
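A sketch of the routing side of this setup, assuming statements starting with SELECT are reads; both "replicas" reuse the master's in-memory SQLite connection so the example stays self-contained (real replicas would be separate servers fed by replication):

```python
import itertools
import sqlite3

class RoutingDatabase:
    """Send writes to the master, round-robin reads across replicas."""

    def __init__(self, replica_count=3):
        self.master = sqlite3.connect(":memory:")
        # Real replicas receive changes via replication; here we reuse
        # the master connection so the sketch is runnable as-is.
        self.replicas = [self.master] * replica_count
        self._rr = itertools.cycle(range(replica_count))

    def execute(self, sql, params=()):
        is_read = sql.lstrip().upper().startswith("SELECT")
        conn = self.replicas[next(self._rr)] if is_read else self.master
        cursor = conn.execute(sql, params)
        conn.commit()
        return cursor

db = RoutingDatabase()
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.execute("INSERT INTO users VALUES (?, ?)", (1, "ada"))
row = db.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()
```

One caveat the sketch hides: replication lag means a replica may briefly serve stale reads, so read-your-own-writes paths often pin to the master.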
**Horizontal scaling:**
```
1 server:    1,000 requests/sec
2 servers:   2,000 requests/sec
10 servers:  10,000 requests/sec
100 servers: 100,000 requests/sec

Scale out beyond single-machine limits.
```
### 5. Frontend Optimization

Page load time accumulates across every component; each one matters.
**Techniques:**
- Code splitting: Load only what's needed
- Compression: Gzip/Brotli reduces size 50%+
- Lazy loading: Load images on scroll
- Image optimization: AVIF/WebP > JPEG
- Tree shaking: Remove unused code
- Minification: Remove whitespace, rename variables
**Real impact:**
```
Before:
  5MB JavaScript, 2MB images, 1MB CSS
  Load time: 10 seconds (on 4G)

After optimization:
  Code splitting: 500KB initial + 4.5MB lazy-loaded (200ms first view)
  Image compression: 500KB (5x reduction)
  Minification + tree shaking: 600KB JavaScript

Load time: 2 seconds (5x improvement)
```
## Performance Optimization Workflow

1. **Measure:** Baseline performance with metrics
2. **Profile:** Identify bottlenecks (database, CPU, I/O)
3. **Optimize:** Fix the top bottleneck
4. **Test:** Verify the improvement
5. **Repeat:** Continue until performance is acceptable
**Example:**
```
Week 1: baseline latency 500ms p99
  Action: profile → database N+1 problem → optimize queries
Week 2: latency 250ms p99 (2x improvement)
  Action: profile → uncached API calls → add caching
Week 3: latency 100ms p99 (2.5x improvement)
Week 4: profile → inefficient CSS parsing → defer non-critical CSS
Week 5: latency 80ms p99 (1.25x improvement)

Total improvement: 500ms → 80ms (6.25x from baseline)
```
## Real-World Performance Scenarios

### Scenario 1: E-commerce Site Slow
Traffic: 10K requests/day; site feels slow (2+ second load)
Investigation:
- Profile: database query for "related products" taking 500ms
- Root cause: no index on product_id
- Fix: create the index
- Result: query time 500ms → 5ms (100x improvement)
- New load time: 2 seconds → 1.5 seconds
### Scenario 2: Mobile App Crashes Under Load

Beta launch: 100K users. The app crashes and requests time out.
Investigation:
- Trace requests: database connection pool exhausted (10 connections)
- Reason: each request holds a connection for its entire 2-second processing
- Fix: release the connection immediately; store the result in a cache
- Result: each connection is held for 50ms instead of 2,000ms, so the same pool handles 40x more requests
### Scenario 3: CDN Cache Misconception

The company "optimized" with a CDN. Performance was still slow.
Investigation:
- Most requests bypass the cache (personalized content)
- 90% of latency comes from custom rendering
- Cache miss rate: 95%
- Fix: pre-compute common personalized content; cache it at the edge
- Result: cache hit rate of 95%; average latency ~10ms
## Performance Budget
Allocate latency budget per component.
**Example (100ms budget):**
```
API call:              10ms
Database query:        20ms
Cache (miss):          15ms
Processing:            30ms
JSON serialization:    10ms
Network/JSON parsing:  15ms
-----------------------------
Total:                100ms (p99)
```
If the database degrades to 40ms, you're over budget: alert and investigate.
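That alerting rule can be expressed as a small budget check; the component names and numbers mirror the hypothetical budget above:

```python
# Per-component latency budget (ms), taken from the example budget
BUDGET_MS = {
    "api_call": 10,
    "database_query": 20,
    "cache_miss": 15,
    "processing": 30,
    "json_serialization": 10,
    "network_parsing": 15,
}

def check_budget(measured_ms):
    """Return {component: (measured, budget)} for every component over budget."""
    return {
        component: (measured, BUDGET_MS[component])
        for component, measured in measured_ms.items()
        if component in BUDGET_MS and measured > BUDGET_MS[component]
    }

# Database degraded from 20ms to 40ms: it gets flagged, the API call doesn't
violations = check_budget({"database_query": 40, "api_call": 8})
```

In practice you'd run this against p99 measurements from your APM and page on any non-empty result, so regressions surface per component instead of as a vague "the site is slow".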
## The Bottom Line
Performance is a feature. Users notice latency; it affects business.
Measure, profile, optimize, repeat. Track metrics obsessively.
A 10x improvement in latency can drive a 5-10% increase in conversions. That's revenue.
Build fast from the start; don't bolt optimization on later.
Senthil Kumar
Founder & CEO
Founder & CEO of Sentos Technologies. Passionate about AI-powered IT solutions and helping mid-market enterprises advance.