Workflow 03: Debug Performance Issues

Purpose: Diagnose and resolve performance issues in DSS dashboard and API

When to Use:

  • Dashboard loads slowly
  • API requests taking too long
  • Browser becomes unresponsive
  • High memory usage warnings
  • Long task warnings in logs

Estimated Time: 15-45 minutes


Prerequisites

  • Browser logger active (window.__DSS_BROWSER_LOGS available)
  • Access to server logs and metrics
  • Basic understanding of performance metrics
  • DevTools Performance panel knowledge

Step-by-Step Procedure

Step 1: Gather Performance Baseline

Browser Performance Metrics:

// Get diagnostic with performance data
const diag = window.__DSS_BROWSER_LOGS.diagnostic();
console.table({
  'Uptime (ms)': diag.uptime,
  'Total Logs': diag.totalLogs,
  'Network Requests': diag.networkRequests,
  'Memory Used (MB)': (diag.memory.usedJSHeapSize / 1024 / 1024).toFixed(2),
  'Memory Limit (MB)': (diag.memory.jsHeapSizeLimit / 1024 / 1024).toFixed(2),
  'Memory Usage %': diag.memory.usagePercent
});

// Get performance entries
const perfMetrics = window.__DSS_BROWSER_LOGS.getLogs({ category: 'performance' });
console.table(perfMetrics);

Expected Baseline:

  • Page load: <2000ms
  • DOM content loaded: <500ms
  • API requests: <200ms each
  • Memory usage: <50%
  • No long tasks >100ms

Performance Issues Indicators:

  • Page load >5000ms → Slow initial load (Step 2)
  • API requests >1000ms → Slow API (Step 3)
  • Memory usage >80% → Memory leak (Step 4)
  • Multiple long tasks >100ms → CPU bottleneck (Step 5)

Step 2: Diagnose Slow Page Load

Get Page Load Metrics:

const perfData = performance.getEntriesByType('navigation')[0];
console.table({
  'DNS Lookup (ms)': perfData.domainLookupEnd - perfData.domainLookupStart,
  'TCP Connection (ms)': perfData.connectEnd - perfData.connectStart,
  'Request (ms)': perfData.responseStart - perfData.requestStart,
  'Response (ms)': perfData.responseEnd - perfData.responseStart,
  // domLoading is not part of PerformanceNavigationTiming;
  // responseEnd → domInteractive approximates DOM processing time
  'DOM Processing (ms)': perfData.domInteractive - perfData.responseEnd,
  'DOM Content Loaded (ms)': perfData.domContentLoadedEventEnd - perfData.domContentLoadedEventStart,
  'Total Load (ms)': perfData.loadEventEnd - perfData.fetchStart
});

Diagnosis Matrix:

  • DNS Lookup >100ms → DNS issues → check DNS settings, try a different resolver
  • TCP Connection >200ms → network latency → check the connection, use a CDN
  • Response >1000ms → large HTML payload → minify HTML, lazy load components
  • DOM Processing >2000ms → heavy JavaScript → code splitting, lazy imports
  • DOM Content Loaded >500ms → blocking scripts → async/defer scripts, move to bottom

Common Fixes:

Issue 1: Large Initial Bundle

// Check resource sizes
performance.getEntriesByType('resource').forEach(r => {
  if (r.transferSize > 100000) {  // >100KB
    console.log(`Large file: ${r.name} (${(r.transferSize / 1024).toFixed(2)} KB)`);
  }
});

Solution:

  • Split large JavaScript files
  • Use code splitting with dynamic imports
  • Compress assets (gzip/brotli)
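Code splitting with dynamic imports can be sketched as below. The module path is hypothetical, and the `lazy` wrapper simply caches the import promise so the module is fetched once, on first use, instead of shipping in the initial bundle.

```javascript
// Cache a loader's promise so the underlying import() runs at most once.
function lazy(loader) {
  let promise = null;
  return () => (promise ??= loader());
}

// Hypothetical heavy module, excluded from the initial bundle:
const loadCharts = lazy(() => import('/admin-ui/js/charts.js'));

// Later, on first user interaction:
// const { renderCharts } = await loadCharts();
```

Repeated calls return the same promise, so concurrent callers share one network fetch.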

Issue 2: Blocking Scripts

<!-- Bad: blocking — parsing stops until the script downloads and runs -->
<script src="/admin-ui/js/app.js"></script>

<!-- Good: deferred — downloads in parallel, runs after parsing -->
<script src="/admin-ui/js/app.js" defer></script>

<!-- Also good: module scripts are deferred by default -->
<script type="module" src="/admin-ui/js/app.js"></script>

Step 3: Diagnose Slow API Requests

Get Network Performance:

const network = window.__DSS_BROWSER_LOGS.network();
const slowRequests = network.filter(r => r.data.duration > 500);

console.group('Slow Requests (>500ms)');
console.table(slowRequests.map(r => ({
  URL: r.data.url,
  Method: r.data.method,
  Status: r.data.status,
  Duration: r.data.duration + 'ms'
})));
console.groupEnd();

Server-Side Check:

# Check API response times in server logs
journalctl -u dss-api -n 200 | grep -E "INFO.*(GET|POST)"

# Check database query times (if logged)
journalctl -u dss-api -n 200 | grep "query took"

Common Slow API Issues:

Issue 1: Database Query Slow (N+1 Problem)

# Bad: N+1 queries
for project in projects:
    components = get_components(project.id)  # Separate query each time

# Good: Single query with JOIN
components = get_all_components_with_projects()

Diagnosis:

# Enable SQLite query logging
sqlite3 .dss/dss.db
.log stdout
.timer on
SELECT * FROM Projects;

Solution:

  • Use JOINs instead of multiple queries
  • Add indexes on frequently queried columns
  • Cache repeated queries
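The JOIN/caching idea can be illustrated in data terms with an in-memory sketch (hypothetical record shapes: projects with `id`, components with `projectId`). One pass builds a lookup map, so attaching components never costs one query per project.

```javascript
// One pass over components builds a projectId → components map,
// replacing the per-project lookup of the N+1 pattern.
function attachComponents(projects, components) {
  const byProject = new Map();
  for (const c of components) {
    if (!byProject.has(c.projectId)) byProject.set(c.projectId, []);
    byProject.get(c.projectId).push(c);
  }
  return projects.map(p => ({ ...p, components: byProject.get(p.id) ?? [] }));
}
```

The map plays the role of the JOIN: total work is O(projects + components) instead of one scan per project.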

Issue 2: Large Response Payload

// Check response sizes
network.forEach(r => {
  if (r.data.headers && r.data.headers['content-length']) {
    const sizeKB = parseInt(r.data.headers['content-length']) / 1024;
    if (sizeKB > 100) {
      console.log(`Large response: ${r.data.url} (${sizeKB.toFixed(2)} KB)`);
    }
  }
});

Solution:

  • Implement pagination (limit results to 50-100 items)
  • Use field selection (only return needed fields)
  • Compress responses (gzip)
  • Add API caching
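A minimal pagination sketch (hypothetical helper, not the DSS API): it slices a result set and returns the page plus the metadata a client needs to render paging controls.

```javascript
// Slice one page out of a result set; out-of-range page numbers are clamped.
function paginate(items, page = 1, pageSize = 50) {
  const total = items.length;
  const pages = Math.max(1, Math.ceil(total / pageSize));
  const current = Math.min(Math.max(1, page), pages);
  const start = (current - 1) * pageSize;
  return { items: items.slice(start, start + pageSize), page: current, pageSize, total, pages };
}
```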

Issue 3: Synchronous Processing

# Bad: Synchronous heavy processing
def get_analysis():
    data = fetch_all_data()
    analysis = process_data(data)  # Blocking, takes 5 seconds
    return analysis

# Good: Async or background job
async def get_analysis():
    data = await fetch_all_data()
    # Trigger background job, return immediately
    job_id = queue_analysis(data)
    return {"status": "processing", "job_id": job_id}

Step 4: Diagnose Memory Leaks

Check Memory Usage:

// Get current memory (performance.memory is non-standard, Chromium-only)
const mem = performance.memory;
console.table({
  'Used (MB)': (mem.usedJSHeapSize / 1024 / 1024).toFixed(2),
  'Total (MB)': (mem.totalJSHeapSize / 1024 / 1024).toFixed(2),
  'Limit (MB)': (mem.jsHeapSizeLimit / 1024 / 1024).toFixed(2),
  'Usage %': ((mem.usedJSHeapSize / mem.jsHeapSizeLimit) * 100).toFixed(2)
});

// Monitor over time
let memorySnapshots = [];
setInterval(() => {
  const m = performance.memory;
  memorySnapshots.push({
    time: Date.now(),
    used: m.usedJSHeapSize
  });
  if (memorySnapshots.length > 20) memorySnapshots.shift();

  // Check if memory is growing
  const first = memorySnapshots[0].used;
  const last = memorySnapshots[memorySnapshots.length - 1].used;
  const growth = ((last - first) / first * 100).toFixed(2);
  console.log(`Memory growth over ${memorySnapshots.length} checks: ${growth}%`);
}, 5000);

Memory Leak Indicators:

  • Memory usage steadily increasing (>10% per minute)
  • Memory warnings in browser logs
  • Browser becoming slow/unresponsive over time

Common Memory Leak Causes:

Cause 1: Event Listeners Not Removed

// Bad: Creates new listener on each render, never removes
function render() {
  window.addEventListener('resize', handleResize);
}

// Good: Remove old listener
let resizeHandler = null;
function render() {
  if (resizeHandler) {
    window.removeEventListener('resize', resizeHandler);
  }
  resizeHandler = handleResize;
  window.addEventListener('resize', resizeHandler);
}
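An alternative cleanup pattern is AbortController: every listener registered with the same signal is removed by a single abort(), which is convenient on component teardown. A self-contained sketch (a plain EventTarget stands in for window):

```javascript
// All listeners registered with controller.signal die together on abort().
const controller = new AbortController();
const target = new EventTarget();  // stand-in for window
let resizes = 0;
target.addEventListener('resize', () => { resizes++; }, { signal: controller.signal });

target.dispatchEvent(new Event('resize'));  // handled
controller.abort();                         // removes every listener tied to the signal
target.dispatchEvent(new Event('resize'));  // ignored
```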

Cause 2: Detached DOM Nodes

// Bad: References keep DOM nodes in memory
let cachedNodes = [];
function cacheNode(node) {
  cachedNodes.push(node);  // Node stays in memory even if removed from DOM
}

// Good: Use WeakMap for node cache
const nodeData = new WeakMap();
function cacheNode(node, data) {
  nodeData.set(node, data);  // Entry becomes collectable once the node is unreachable
}

Cause 3: Timers Not Cleared

// Bad: Timer keeps running even after component unmounted
setInterval(() => {
  updateData();
}, 1000);

// Good: Clear timer on unmount
let timerId = null;
function startTimer() {
  timerId = setInterval(updateData, 1000);
}
function stopTimer() {
  if (timerId) clearInterval(timerId);
}

Diagnosis Tools:

  1. Chrome DevTools → Memory → Take heap snapshot
  2. Compare snapshots over time
  3. Look for "Detached DOM tree" entries
  4. Find objects growing in number

Step 5: Diagnose CPU Bottlenecks

Get Long Tasks:

const longTasks = window.__DSS_BROWSER_LOGS.getLogs({
  category: 'longTask',
  limit: 50
});

console.group('Long Tasks (>50ms)');
console.table(longTasks.map(t => ({
  Name: t.data.name,
  Duration: t.data.duration.toFixed(2) + 'ms',
  Time: new Date(t.timestamp).toLocaleTimeString()
})));
console.groupEnd();

Performance Profiling:

  1. Open DevTools → Performance
  2. Click Record
  3. Perform slow action
  4. Stop recording
  5. Analyze flame graph for long tasks

Common CPU Bottlenecks:

Issue 1: Synchronous Loop Over Large Array

// Bad: Blocks UI for large arrays
function processItems(items) {
  items.forEach(item => {
    expensiveOperation(item);  // If items.length = 10000, UI freezes
  });
}

// Good: Batch processing with breaks
async function processItems(items) {
  const batchSize = 100;
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    batch.forEach(item => expensiveOperation(item));
    await new Promise(resolve => setTimeout(resolve, 0));  // Give UI a break
  }
}

Issue 2: Frequent DOM Manipulation

// Bad: Multiple reflows
for (let i = 0; i < 1000; i++) {
  const div = document.createElement('div');
  div.textContent = i;
  container.appendChild(div);  // Reflow on each append
}

// Good: Single reflow with fragment
const fragment = document.createDocumentFragment();
for (let i = 0; i < 1000; i++) {
  const div = document.createElement('div');
  div.textContent = i;
  fragment.appendChild(div);
}
container.appendChild(fragment);  // Single reflow

Issue 3: Inefficient Rendering

// Bad: Re-render entire list on every change
function renderList(items) {
  container.innerHTML = '';  // Destroy all
  items.forEach(item => {
    container.appendChild(createItem(item));  // Recreate all
  });
}

// Good: Update only changed items (use virtual DOM or diff)
function renderList(items, previousItems) {
  const changes = diff(items, previousItems);
  changes.forEach(change => {
    if (change.type === 'add') {
      container.appendChild(createItem(change.item));
    } else if (change.type === 'remove') {
      change.element.remove();
    } else if (change.type === 'update') {
      updateItem(change.element, change.item);
    }
  });
}
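The `diff` function above is left undefined; a minimal keyed version could look like this. It assumes items carry stable `id` fields and, unlike the snippet above, returns items rather than live DOM elements — mapping ids back to elements is left to the renderer.

```javascript
// Keyed diff: compares two item lists by id and reference.
// Unchanged references produce no change entry; reordering is not detected.
function diffById(items, previousItems) {
  const prev = new Map(previousItems.map(i => [i.id, i]));
  const next = new Map(items.map(i => [i.id, i]));
  const changes = [];
  for (const [id, item] of next) {
    if (!prev.has(id)) changes.push({ type: 'add', item });
    else if (prev.get(id) !== item) changes.push({ type: 'update', item });
  }
  for (const [id, item] of prev) {
    if (!next.has(id)) changes.push({ type: 'remove', item });
  }
  return changes;
}
```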

Step 6: Server-Side Performance Check

Check Server Resource Usage:

# CPU usage
top -b -n 1 | grep "uvicorn\|python"

# Memory usage
ps aux --sort=-%mem | grep "uvicorn\|python" | head -5

# Disk I/O
iostat -x 1 5

# Network
iftop -t -s 10

Check Database Performance:

# Database size
ls -lh .dss/dss.db

# Row count per table (one COUNT(*) query per table; the first query lists the tables)
for t in $(sqlite3 .dss/dss.db "SELECT name FROM sqlite_master WHERE type='table';"); do
  echo "$t: $(sqlite3 .dss/dss.db "SELECT COUNT(*) FROM \"$t\";") rows"
done

# Check for missing indexes
sqlite3 .dss/dss.db << EOF
SELECT name, sql FROM sqlite_master
WHERE type='index' AND sql IS NOT NULL;
EOF

Database Optimization:

# Vacuum to reclaim space and reorganize
sqlite3 .dss/dss.db "VACUUM;"

# Analyze to update statistics
sqlite3 .dss/dss.db "ANALYZE;"

# Check index usage (run slow query with EXPLAIN QUERY PLAN)
sqlite3 .dss/dss.db << EOF
EXPLAIN QUERY PLAN
SELECT * FROM Projects WHERE name LIKE '%test%';
EOF

Performance Optimization Checklist

Browser Optimizations

  • Code splitting implemented
  • Lazy loading for routes/components
  • Images optimized and lazy-loaded
  • Scripts deferred or async
  • CSS minified and critical CSS inlined
  • Service worker for caching
  • Event listeners properly cleaned up
  • No memory leaks detected

API Optimizations

  • Database queries optimized (indexes, JOINs)
  • Response pagination implemented
  • API caching enabled
  • Compression enabled (gzip/brotli)
  • Connection pooling configured
  • Async processing for heavy tasks
  • Rate limiting to prevent abuse

System Optimizations

  • Database vacuumed and analyzed
  • Log rotation configured
  • Disk space sufficient (>20% free)
  • Memory sufficient (>30% free)
  • Supervisord restart policies configured

Success Criteria

  • Page load <2000ms
  • API requests <200ms
  • Memory usage <50%
  • No long tasks >100ms
  • No memory growth over time
  • Smooth scrolling and interactions

Performance Metrics to Track

Browser:

  • First Contentful Paint (FCP): <1000ms
  • Largest Contentful Paint (LCP): <2500ms
  • Time to Interactive (TTI): <3000ms
  • Total Blocking Time (TBT): <200ms
  • Cumulative Layout Shift (CLS): <0.1

API:

  • Response time p50: <100ms
  • Response time p95: <500ms
  • Response time p99: <1000ms
  • Throughput: >100 req/sec
  • Error rate: <1%

Database:

  • Query time p50: <10ms
  • Query time p95: <50ms
  • Query time p99: <100ms
  • Connection pool usage: <80%

Next Steps

  • If performance acceptable: Document baseline for monitoring
  • If still slow: Use Chrome Performance Profiler for deeper analysis
  • If database slow: Consider adding indexes or caching layer
  • If memory leaks: Use Chrome Memory Profiler to find retaining paths
  • Schedule regular performance audits (monthly)

Related Resources

  • .dss/MCP_DEBUG_TOOLS_ARCHITECTURE.md - Performance monitoring in MCP
  • admin-ui/js/core/browser-logger.js - Performance capture implementation
  • Web Vitals: https://web.dev/vitals/

MCP Tool Access

From Claude Code:

Use tool: dss_get_browser_diagnostic (includes memory metrics)
Use tool: dss_get_server_diagnostic (includes performance metrics)