

13 May 2026 · 14 min read · Senthil Kumar

# Performance Optimization: Building Systems That Scale

A 100ms delay in page load = 1% reduction in conversions. A 1-second delay = 7% reduction in conversions.

Performance isn't a feature; it's a business metric.

Yet most engineers optimize after the fact: the system is slow, so they scramble to fix it. The better approach: build fast from the start.

## Performance Optimization Hierarchy

### 1. Measure First

You can't optimize blind.

**Metrics to track:**

- Endpoint latency (API) or page load time (web)
- Database query duration
- Cache hit rate
- CPU, memory, and disk usage
- Error rate

**Tools:**

- Application Performance Monitoring (APM): New Relic, Datadog, Elastic APM
- Browser performance: Web Vitals, Lighthouse
- Profilers: py-spy (Python), pprof (Go), Chrome DevTools (JavaScript)

**Real example:**

```
Profile application: identify hotspots

Top 3 functions by CPU time:
  1. database_query()   - 60% of CPU time
  2. json_serialize()   - 20% of CPU time
  3. regex_validation() - 15% of CPU time

Action: optimize database_query first (biggest impact)
```
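In Python, a hotspot list like the one above can be produced with the standard-library profiler. The workload functions below are hypothetical stand-ins for the slow paths; a real service would profile its actual handlers (or attach py-spy to a running process).

```python
# A minimal "measure first" sketch using Python's built-in cProfile.
# database_query/json_serialize are illustrative stand-ins for hotspots.
import cProfile
import io
import pstats
import time

def database_query():
    time.sleep(0.03)  # simulate a slow query

def json_serialize():
    time.sleep(0.01)  # simulate serialization work

def handle_request():
    database_query()
    json_serialize()

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Print the top functions by cumulative time -- the same view as the
# hotspot list above, so you know what to optimize first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The cumulative-time sort surfaces the callers that dominate wall time, which matches the "optimize the biggest hotspot first" rule.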

### 2. Database Optimization

Most performance issues are database-related.

**Techniques:**

**Indexing:**

```sql
-- Slow query (full table scan)
SELECT * FROM orders WHERE user_id = 123;
-- 5 seconds (scanned 1M rows)

-- Add an index
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Same query now: 5ms (index lookup)
```

**Query optimization:**

```sql
-- Bad: N+1 problem (1 query + N subqueries)
SELECT * FROM orders;                                  -- 1000 orders
-- For each order:
--   SELECT * FROM items WHERE order_id = :order_id;   -- 1000 queries
-- Total: 1001 queries; 10 seconds

-- Good: single query with a join
SELECT orders.*, items.*
FROM orders
JOIN items ON items.order_id = orders.id;
-- Total: 1 query; 100ms
```
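The same contrast shows up in application code. This sketch uses an in-memory SQLite database with illustrative `orders`/`items` tables to demonstrate that the loop-of-queries and the single JOIN return the same data, with very different round-trip counts.

```python
# N+1 queries vs. a single JOIN, demonstrated on in-memory SQLite.
# Table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders (id) VALUES (1), (2), (3);
    INSERT INTO items (id, order_id) VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# Bad: 1 query for orders + N queries for items (N+1 round trips)
orders = conn.execute("SELECT id FROM orders").fetchall()
n_plus_one = []
for (order_id,) in orders:
    items = conn.execute(
        "SELECT id FROM items WHERE order_id = ?", (order_id,)
    ).fetchall()
    n_plus_one.extend((order_id, item_id) for (item_id,) in items)

# Good: one JOIN fetches the same rows in a single round trip
joined = conn.execute("""
    SELECT orders.id, items.id
    FROM orders JOIN items ON items.order_id = orders.id
    ORDER BY orders.id, items.id
""").fetchall()

assert sorted(n_plus_one) == joined  # same data, 1 query instead of N+1
```

With 3 orders this is 4 queries vs. 1; at 1000 orders the gap becomes the 1001-vs-1 difference shown above.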

**Connection pooling:**

```
Each database connection costs resources. Reuse connections.

Without pooling:
  Request 1: create connection (10ms) + query (5ms) + close (2ms) = 17ms
  Request 2: create connection (10ms) + query (5ms) + close (2ms) = 17ms

With pooling (10 connections):
  Request 1: get from pool (1ms) + query (5ms) + return to pool (1ms) = 7ms
  Request 2: get from pool (1ms) + query (5ms) + return to pool (1ms) = 7ms
```
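A minimal pool can be sketched as a thread-safe queue of pre-created connections. Real services would use a library pool (for example SQLAlchemy's), so this is only an illustration of the reuse pattern; in-memory SQLite stands in for the expensive network connection.

```python
# Connection-pool sketch: connections are created once and reused,
# so requests skip the per-connection setup cost.
import queue
import sqlite3

class ConnectionPool:
    def __init__(self, size: int):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # In production this would be a network connection
            # (the expensive ~10ms step); here it's in-memory SQLite.
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self) -> sqlite3.Connection:
        return self._pool.get()    # ~1ms: reuse, don't reconnect

    def release(self, conn: sqlite3.Connection) -> None:
        self._pool.put(conn)       # return to the pool, don't close

pool = ConnectionPool(size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1 + 1").fetchone()[0]
pool.release(conn)
```

`Queue.get()` also blocks when the pool is exhausted, which naturally applies backpressure instead of opening unbounded connections.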

### 3. Caching

Avoid recomputation; store results.

**Multi-level cache:**

**L1: Request cache (seconds):**

- Same request → same response
- Cache key: hash(request params)
- TTL: 10 seconds
- Hit rate: 50-80%

**L2: User cache (minutes):**

- User-specific data
- Cache key: user_id
- TTL: 5-60 minutes
- Hit rate: 80%+

**L3: CDN cache (hours):**

- Static content, API responses
- Distributed globally
- Reduces latency to 1-10ms (vs. 100ms+ from the origin server)

**Example:**

```
Request: GET /user/123/recommendations
Cache miss → compute (100ms) → return result
Result saved in Redis (key=user:123:recs, TTL=1 hour)

Next request for same user → cache hit (1ms) → return result
8/10 requests cached; average latency: 20ms (vs. 100ms)
```
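The flow above is the cache-aside pattern: check the cache, compute on a miss, store with a TTL. In this sketch a plain dict stands in for Redis; the key scheme (`user:<id>:recs`) mirrors the example, and `compute_recommendations` is a hypothetical stand-in for the 100ms computation.

```python
# Cache-aside sketch: dict stands in for Redis, TTL mirrors the example.
import time

CACHE: dict = {}  # key -> (expires_at, value)
TTL_SECONDS = 3600

def compute_recommendations(user_id: int) -> list:
    # Stand-in for the expensive ~100ms computation
    return [f"product-{user_id}-{i}" for i in range(3)]

def get_recommendations(user_id: int) -> list:
    key = f"user:{user_id}:recs"
    entry = CACHE.get(key)
    if entry is not None and entry[0] > time.time():
        return entry[1]                       # cache hit (~1ms)
    value = compute_recommendations(user_id)  # cache miss (~100ms)
    CACHE[key] = (time.time() + TTL_SECONDS, value)
    return value

first = get_recommendations(123)   # miss: computes and stores
second = get_recommendations(123)  # hit: served from cache
```

With an 80% hit rate the expected latency is 0.8 × 1ms + 0.2 × 100ms ≈ 21ms, which is where the ~20ms average above comes from.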

### 4. Architecture Optimization

Design for performance from the start.

**Async processing:**

```
Synchronous (blocking):
  POST /order → validate → payment → email → return (2 seconds)
  User waits 2 seconds for the response.

Asynchronous (non-blocking):
  POST /order → validate → queue(payment) → return immediately (100ms)
  Payment is processed in the background (user doesn't wait).
```
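The asynchronous flow can be sketched with a worker thread draining a queue: the handler validates, enqueues the slow payment step, and returns immediately. Names (`post_order`, `payment_queue`) are illustrative; a production system would use a durable broker such as a task queue rather than an in-process thread.

```python
# Async-processing sketch: the handler enqueues slow work and returns;
# a background worker drains the queue.
import queue
import threading
import time

payment_queue: queue.Queue = queue.Queue()
processed = []

def worker():
    while True:
        order = payment_queue.get()
        if order is None:
            break
        time.sleep(0.01)      # stand-in for slow payment processing
        processed.append(order)
        payment_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def post_order(order_id: int) -> dict:
    # Validate (fast), then queue the slow work instead of blocking
    payment_queue.put(order_id)
    return {"status": "accepted", "order_id": order_id}  # returns in ~ms

response = post_order(42)
payment_queue.join()  # only so this demo can observe completion;
                      # in a real service the caller never waits
```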

**Read replicas:**

```
A single database can't handle 10K reads/sec.

Solution: master-replica replication
  Master: handles writes
  Read replicas (10x): handle reads

Load: 1K writes/sec + 9K reads/sec
  Master:   1K writes
  Replicas: 9K reads (distributed)
```
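The read/write split can be sketched as a router that sends writes to the master and round-robins reads across replicas. The connection objects here are placeholder strings, since the routing decision is the point; a real router would also handle replication lag for read-your-writes cases.

```python
# Read/write routing sketch: writes -> master, reads -> replicas
# (round-robin). Connection objects are placeholder strings.
import itertools

class Router:
    def __init__(self, master: str, replicas: list):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        # Writes (INSERT/UPDATE/DELETE) must hit the master;
        # reads can be served by any replica.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.master

router = Router("master", ["replica-1", "replica-2", "replica-3"])
write_target = router.route("INSERT INTO orders VALUES (1)")
read_targets = [router.route("SELECT * FROM orders") for _ in range(3)]
```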

**Horizontal scaling:**

```
1 server:    1,000 requests/sec
2 servers:   2,000 requests/sec
10 servers:  10K requests/sec
100 servers: 100K requests/sec

Scaling beyond single-machine limits (assuming near-linear scaling)
```

### 5. Frontend Optimization

Page load time is the sum of many parts; every component matters.

**Techniques:**

- Code splitting: load only what's needed
- Compression: Gzip/Brotli reduces size by 50%+
- Lazy loading: load images on scroll
- Image optimization: AVIF/WebP over JPEG
- Tree shaking: remove unused code
- Minification: remove whitespace, rename variables

**Real impact:**

```
Before:
  5MB JavaScript, 2MB images, 1MB CSS
  Load time: 10 seconds (on 4G)

After optimization:
  Code splitting: 500KB initial + 4.5MB lazy (200ms to first view)
  Image compression: 500KB (5x reduction)
  Minification + tree shaking: 600KB JavaScript

Load time: 2 seconds (5x improvement)
```

## Performance Optimization Workflow

1. **Measure:** Baseline performance with metrics
2. **Profile:** Identify bottlenecks (database, CPU, I/O)
3. **Optimize:** Fix the top bottleneck
4. **Test:** Verify the improvement
5. **Repeat:** Continue until performance is acceptable

**Example:**

```
Week 1: baseline latency: 500ms p99
  Action: profile → database N+1 problem → optimize queries
Week 2: latency: 250ms p99 (2x improvement)
  Action: profile → uncached API calls → add caching
Week 3: latency: 100ms p99 (2.5x from previous)
Week 4: action: profile → inefficient CSS parsing → defer non-critical CSS
Week 5: latency: 80ms p99 (1.25x from previous)

Total improvement: 500ms → 80ms (6.25x from baseline)
```

## Real-World Performance Scenarios

### Scenario 1: E-commerce Site Slow

Traffic: 10K requests/day; site feels slow (2+ second load)

Investigation:

- Profile: database query for "related products" taking 500ms
- Root cause: no index on product_id
- Fix: create the index
- Result: 500ms → 5ms query (100x improvement)
- New load time: 2 seconds → 1.5 seconds

### Scenario 2: Mobile App Crashes Under Load

Beta launch: 100K users. App crashes; timeouts.

Investigation:

- Trace requests: database connection pool exhausted (10 connections)
- Reason: each request holds a connection for its entire 2-second processing
- Fix: release the connection immediately; store the result in a cache
- Result: connections are held for 50ms instead of 2000ms; the pool handles 40x more requests

### Scenario 3: CDN Cache Misconception

Company "optimized" with CDN. Performance still slow.

Investigation:

- Most requests bypass the cache (personalized content)
- 90% of latency comes from custom rendering
- Cache miss rate: 95%
- Fix: pre-compute common personalized content; cache at the edge
- Result: cache hit rate rises from 5% to 95%; average latency 10ms

## Performance Budget

Allocate a latency budget per component.

**Example (100ms budget):**

```
API call:              10ms
Database query:        20ms
Cache (miss):          15ms
Processing:            30ms
JSON serialization:    10ms
Network/JSON parsing:  15ms
---------------------------
Total:                100ms (p99)
```
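A budget like this can be enforced in code. The sketch below checks measured p99 latencies against the allocations above; the component names and the `over_budget` helper are illustrative, and a real deployment would feed this from its metrics pipeline.

```python
# Latency-budget check: each component has an allocation (mirroring the
# 100ms example budget); measured p99s are compared against it.
BUDGET_MS = {
    "api_call": 10,
    "database_query": 20,
    "cache_miss": 15,
    "processing": 30,
    "json_serialization": 10,
    "network_parsing": 15,
}

def over_budget(measured_p99_ms: dict) -> list:
    """Return the components whose measured p99 exceeds their budget."""
    return [name for name, budget in BUDGET_MS.items()
            if measured_p99_ms.get(name, 0) > budget]

# The database degraded from 20ms to 40ms -> it should be flagged
measured = {"api_call": 9, "database_query": 40, "cache_miss": 12,
            "processing": 28, "json_serialization": 8, "network_parsing": 14}
violations = over_budget(measured)
```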

If the database degrades to 40ms, you're over budget. Alert and investigate.

## The Bottom Line

Performance is a business metric. Users notice latency, and it shows up in conversions.

Measure, profile, optimize, repeat. Track metrics obsessively.

A 10x improvement in latency can drive a 5-10% increase in conversions. That's revenue.

Build fast from the start. Don't wait to optimize later.

**Senthil Kumar**

Founder & CEO of Sentos Technologies. Passionate about AI-powered IT solutions and helping mid-market enterprises advance beyond.
