Migrated from design-system-swarm with fresh git history.
Old project history preserved in /home/overbits/apps/design-system-swarm
Core components:
- MCP Server (Python FastAPI with mcp 1.23.1)
- Claude Plugin (agents, commands, skills, strategies, hooks, core)
- DSS Backend (dss-mvp1 - token translation, Figma sync)
- Admin UI (Node.js/React)
- Server (Node.js/Express)
- Storybook integration (dss-mvp1/.storybook)
Self-contained configuration:
- All paths relative or use DSS_BASE_PATH=/home/overbits/dss
- PYTHONPATH configured for dss-mvp1 and dss-claude-plugin
- .env file with all configuration
- Claude plugin uses ${CLAUDE_PLUGIN_ROOT} for portability
Workflow 03: Debug Performance Issues
Purpose: Diagnose and resolve performance issues in the DSS dashboard and API
When to Use:
- Dashboard loads slowly
- API requests take too long
- Browser becomes unresponsive
- High memory usage warnings
- Long task warnings in logs
Estimated Time: 15-45 minutes
Prerequisites
- Browser logger active (window.__DSS_BROWSER_LOGS available)
- Access to server logs and metrics
- Basic understanding of performance metrics
- DevTools Performance panel knowledge
Step-by-Step Procedure
Step 1: Gather Performance Baseline
Browser Performance Metrics:
// Get diagnostic with performance data
const diag = window.__DSS_BROWSER_LOGS.diagnostic();
console.table({
'Uptime (ms)': diag.uptime,
'Total Logs': diag.totalLogs,
'Network Requests': diag.networkRequests,
'Memory Used (MB)': (diag.memory.usedJSHeapSize / 1024 / 1024).toFixed(2),
'Memory Limit (MB)': (diag.memory.jsHeapSizeLimit / 1024 / 1024).toFixed(2),
'Memory Usage %': diag.memory.usagePercent
});
// Get performance entries
const perfMetrics = window.__DSS_BROWSER_LOGS.getLogs({ category: 'performance' });
console.table(perfMetrics);
Expected Baseline:
- Page load: <2000ms
- DOM content loaded: <500ms
- API requests: <200ms each
- Memory usage: <50%
- No long tasks >100ms
Performance Issues Indicators:
- Page load >5000ms → Slow initial load (Step 2)
- API requests >1000ms → Slow API (Step 3)
- Memory usage >80% → Memory leak (Step 4)
- Multiple long tasks >100ms → CPU bottleneck (Step 5)
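The indicator thresholds above can be encoded as a small triage helper. This is a sketch: the `metrics` field names are illustrative, not part of the DSS logger API.

```javascript
// Map observed metrics to the next diagnostic step, using the
// thresholds listed above. Field names are illustrative.
function triage(metrics) {
  const steps = [];
  if (metrics.pageLoadMs > 5000) steps.push('Step 2: slow initial load');
  if (metrics.slowestApiMs > 1000) steps.push('Step 3: slow API');
  if (metrics.memoryUsagePercent > 80) steps.push('Step 4: memory leak');
  if (metrics.longTasksOver100ms > 1) steps.push('Step 5: CPU bottleneck');
  return steps.length ? steps : ['Within baseline: document and monitor'];
}
```

Feed it values read from the diagnostic snippet above to decide which step to jump to.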
Step 2: Diagnose Slow Page Load
Get Page Load Metrics:
const perfData = performance.getEntriesByType('navigation')[0];
console.table({
'DNS Lookup (ms)': perfData.domainLookupEnd - perfData.domainLookupStart,
'TCP Connection (ms)': perfData.connectEnd - perfData.connectStart,
'Request (ms)': perfData.responseStart - perfData.requestStart,
'Response (ms)': perfData.responseEnd - perfData.responseStart,
'DOM Processing (ms)': perfData.domInteractive - perfData.responseEnd,
'DOM Content Loaded (ms)': perfData.domContentLoadedEventEnd - perfData.fetchStart,
'Total Load (ms)': perfData.loadEventEnd - perfData.fetchStart
});
Diagnosis Matrix:
| Slow Phase | Cause | Solution |
|---|---|---|
| DNS Lookup >100ms | DNS issues | Check DNS settings, use different DNS |
| TCP Connection >200ms | Network latency | Check connection, use CDN |
| Response >1000ms | Large HTML file | Minify HTML, lazy load components |
| DOM Processing >2000ms | Heavy JavaScript | Code splitting, lazy imports |
| DOM Content Loaded >500ms | Blocking scripts | Async/defer scripts, move to bottom |
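The matrix can be applied programmatically to a navigation timing entry. A sketch, written against a plain object with the standard `PerformanceNavigationTiming` fields so it also runs outside the browser:

```javascript
// Break a navigation timing entry into the phases used in the matrix above.
function phaseBreakdown(nav) {
  return {
    dnsLookupMs: nav.domainLookupEnd - nav.domainLookupStart,
    tcpConnectMs: nav.connectEnd - nav.connectStart,
    requestMs: nav.responseStart - nav.requestStart,
    responseMs: nav.responseEnd - nav.responseStart,
    domProcessingMs: nav.domInteractive - nav.responseEnd,
    totalLoadMs: nav.loadEventEnd - nav.fetchStart,
  };
}

// Flag the phases that exceed the matrix thresholds.
function slowPhases(breakdown) {
  const limits = { dnsLookupMs: 100, tcpConnectMs: 200, responseMs: 1000, domProcessingMs: 2000 };
  return Object.keys(limits).filter((k) => breakdown[k] > limits[k]);
}
```

In the browser, pass `performance.getEntriesByType('navigation')[0]` directly.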
Common Fixes:
Issue 1: Large Initial Bundle
// Check resource sizes
performance.getEntriesByType('resource').forEach(r => {
if (r.transferSize > 100000) { // >100KB
console.log(`Large file: ${r.name} (${(r.transferSize / 1024).toFixed(2)} KB)`);
}
});
Solution:
- Split large JavaScript files
- Use code splitting with dynamic imports
- Compress assets (gzip/brotli)
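Code splitting with dynamic imports pairs well with a once-only loader, so each chunk is fetched a single time even under concurrent calls. A minimal sketch; the loader callback stands in for a real `import('/path/to/chunk.js')`:

```javascript
// Cache dynamic-import results so each chunk loads exactly once.
const moduleCache = new Map();

async function loadOnce(name, loader) {
  if (!moduleCache.has(name)) {
    // Store the promise (not the result) so concurrent calls share one load.
    moduleCache.set(name, loader());
  }
  return moduleCache.get(name);
}
```

Usage (hypothetical path): `loadOnce('charts', () => import('/admin-ui/js/charts.js'))`.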
Issue 2: Blocking Scripts
<!-- Bad: Blocking -->
<script src="/admin-ui/js/app.js"></script>
<!-- Good: Deferred (executes after parsing, in document order) -->
<script src="/admin-ui/js/app.js" defer></script>
<!-- Also good: module scripts are deferred by default -->
<script type="module" src="/admin-ui/js/app.js"></script>
Step 3: Diagnose Slow API Requests
Get Network Performance:
const network = window.__DSS_BROWSER_LOGS.network();
const slowRequests = network.filter(r => r.data.duration > 500);
console.group('Slow Requests (>500ms)');
console.table(slowRequests.map(r => ({
URL: r.data.url,
Method: r.data.method,
Status: r.data.status,
Duration: r.data.duration + 'ms'
})));
console.groupEnd();
Server-Side Check:
# Check API response times in server logs
journalctl -u dss-api -n 200 | grep -E "INFO.*(GET|POST)"
# Check database query times (if logged)
journalctl -u dss-api -n 200 | grep "query took"
Common Slow API Issues:
Issue 1: Database Query Slow (N+1 Problem)
# Bad: N+1 queries
for project in projects:
components = get_components(project.id) # Separate query each time
# Good: Single query with JOIN
components = get_all_components_with_projects()
Diagnosis:
# Enable SQLite query logging
sqlite3 .dss/dss.db
.log stdout
.timer on
SELECT * FROM Projects;
Solution:
- Use JOINs instead of multiple queries
- Add indexes on frequently queried columns
- Cache repeated queries
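Caching repeated queries, the last bullet, can start as a TTL memoizer in front of the query function. A sketch, not DSS's actual caching layer:

```javascript
// Cache query results for ttlMs milliseconds, keyed by a string.
function createQueryCache(ttlMs) {
  const entries = new Map(); // key -> { value, expiresAt }
  return async function cached(key, runQuery) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value;
    const value = await runQuery();
    entries.set(key, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}
```

Keep the TTL short for data that changes often; stale reads are the trade-off.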
Issue 2: Large Response Payload
// Check response sizes
network.forEach(r => {
if (r.data.headers && r.data.headers['content-length']) {
const sizeKB = parseInt(r.data.headers['content-length']) / 1024;
if (sizeKB > 100) {
console.log(`Large response: ${r.data.url} (${sizeKB.toFixed(2)} KB)`);
}
}
});
Solution:
- Implement pagination (limit results to 50-100 items)
- Use field selection (only return needed fields)
- Compress responses (gzip)
- Add API caching
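The pagination bullet above amounts to slicing the result set and returning the metadata clients need to ask for the next page. A sketch:

```javascript
// Return one page of results plus paging metadata.
function paginate(items, page = 1, pageSize = 50) {
  const total = items.length;
  const start = (page - 1) * pageSize;
  return {
    items: items.slice(start, start + pageSize),
    page,
    pageSize,
    total,
    totalPages: Math.ceil(total / pageSize) || 1,
  };
}
```

In a real API the slice happens in SQL (`LIMIT`/`OFFSET` or keyset pagination), not in memory.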
Issue 3: Synchronous Processing
# Bad: Synchronous heavy processing
def get_analysis():
data = fetch_all_data()
analysis = process_data(data) # Blocking, takes 5 seconds
return analysis
# Good: Async or background job
async def get_analysis():
data = await fetch_all_data()
# Trigger background job, return immediately
job_id = queue_analysis(data)
return {"status": "processing", "job_id": job_id}
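The same pattern on the JavaScript side: enqueue returns a ticket immediately, and the heavy work finishes later. A minimal in-memory sketch, not DSS's real job queue:

```javascript
// Minimal in-memory job store: enqueue returns at once with an id.
const jobs = new Map();
let nextId = 1;

function queueJob(work) {
  const id = String(nextId++);
  jobs.set(id, { status: 'processing', result: null });
  // Run the work without blocking the caller.
  Promise.resolve()
    .then(work)
    .then((result) => jobs.set(id, { status: 'done', result }));
  return { status: 'processing', job_id: id };
}

function jobStatus(id) {
  return jobs.get(id) || { status: 'unknown' };
}
```

Clients then poll `jobStatus(job_id)` (or subscribe) instead of holding the request open.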
Step 4: Diagnose Memory Leaks
Check Memory Usage:
// Get current memory (performance.memory is Chrome-only and non-standard)
const mem = performance.memory;
console.table({
'Used (MB)': (mem.usedJSHeapSize / 1024 / 1024).toFixed(2),
'Total (MB)': (mem.totalJSHeapSize / 1024 / 1024).toFixed(2),
'Limit (MB)': (mem.jsHeapSizeLimit / 1024 / 1024).toFixed(2),
'Usage %': ((mem.usedJSHeapSize / mem.jsHeapSizeLimit) * 100).toFixed(2)
});
// Monitor over time
let memorySnapshots = [];
setInterval(() => {
const m = performance.memory;
memorySnapshots.push({
time: Date.now(),
used: m.usedJSHeapSize
});
if (memorySnapshots.length > 20) memorySnapshots.shift();
// Check if memory is growing
const first = memorySnapshots[0].used;
const last = memorySnapshots[memorySnapshots.length - 1].used;
const growth = ((last - first) / first * 100).toFixed(2);
console.log(`Memory growth over ${memorySnapshots.length} checks: ${growth}%`);
}, 5000);
Memory Leak Indicators:
- Memory usage steadily increasing (>10% per minute)
- Memory warnings in browser logs
- Browser becoming slow/unresponsive over time
Common Memory Leak Causes:
Cause 1: Event Listeners Not Removed
// Bad: A new function identity on each render, so it can never be removed
// (note: re-adding the *same* function reference is a no-op, not a leak)
function render() {
  window.addEventListener('resize', () => handleResize());
}
// Good: Remove old listener
let resizeHandler = null;
function render() {
if (resizeHandler) {
window.removeEventListener('resize', resizeHandler);
}
resizeHandler = handleResize;
window.addEventListener('resize', resizeHandler);
}
Cause 2: Detached DOM Nodes
// Bad: References keep DOM nodes in memory
let cachedNodes = [];
function cacheNode(node) {
cachedNodes.push(node); // Node stays in memory even if removed from DOM
}
// Good: Use WeakMap for node cache
let cachedNodes = new WeakMap();
function cacheNode(node, data) {
cachedNodes.set(node, data); // Auto-removed when node is GC'd
}
Cause 3: Timers Not Cleared
// Bad: Timer keeps running even after component unmounted
setInterval(() => {
updateData();
}, 1000);
// Good: Clear timer on unmount
let timerId = null;
function startTimer() {
timerId = setInterval(updateData, 1000);
}
function stopTimer() {
if (timerId) clearInterval(timerId);
}
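The listener and timer fixes above generalize to collecting every cleanup callback in one place, so teardown cannot forget one. A common pattern, sketched here:

```javascript
// Collect cleanup callbacks; one dispose() runs them all.
function createDisposer() {
  const cleanups = [];
  return {
    add(cleanup) { cleanups.push(cleanup); },
    dispose() {
      while (cleanups.length) cleanups.pop()(); // LIFO, unwinding the setup order
    },
  };
}
```

On mount: `disposer.add(() => clearInterval(timerId))`, `disposer.add(() => window.removeEventListener('resize', handler))`; on unmount, a single `disposer.dispose()`.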
Diagnosis Tools:
- Chrome DevTools → Memory → Take heap snapshot
- Compare snapshots over time
- Look for "Detached DOM tree" entries
- Find objects growing in number
Step 5: Diagnose CPU Bottlenecks
Get Long Tasks:
const longTasks = window.__DSS_BROWSER_LOGS.getLogs({
category: 'longTask',
limit: 50
});
console.group('Long Tasks (>50ms)');
console.table(longTasks.map(t => ({
Name: t.data.name,
Duration: t.data.duration.toFixed(2) + 'ms',
Time: new Date(t.timestamp).toLocaleTimeString()
})));
console.groupEnd();
Performance Profiling:
- Open DevTools → Performance
- Click Record
- Perform slow action
- Stop recording
- Analyze flame graph for long tasks
Common CPU Bottlenecks:
Issue 1: Synchronous Loop Over Large Array
// Bad: Blocks UI for large arrays
function processItems(items) {
items.forEach(item => {
expensiveOperation(item); // If items.length = 10000, UI freezes
});
}
// Good: Batch processing with breaks
async function processItems(items) {
const batchSize = 100;
for (let i = 0; i < items.length; i += batchSize) {
const batch = items.slice(i, i + batchSize);
batch.forEach(item => expensiveOperation(item));
await new Promise(resolve => setTimeout(resolve, 0)); // Give UI a break
}
}
Issue 2: Frequent DOM Manipulation
// Bad: Multiple reflows
for (let i = 0; i < 1000; i++) {
const div = document.createElement('div');
div.textContent = i;
container.appendChild(div); // Reflow on each append
}
// Good: Single reflow with fragment
const fragment = document.createDocumentFragment();
for (let i = 0; i < 1000; i++) {
const div = document.createElement('div');
div.textContent = i;
fragment.appendChild(div);
}
container.appendChild(fragment); // Single reflow
Issue 3: Inefficient Rendering
// Bad: Re-render entire list on every change
function renderList(items) {
container.innerHTML = ''; // Destroy all
items.forEach(item => {
container.appendChild(createItem(item)); // Recreate all
});
}
// Good: Update only changed items (use virtual DOM or diff)
function renderList(items, previousItems) {
const changes = diff(items, previousItems);
changes.forEach(change => {
if (change.type === 'add') {
container.appendChild(createItem(change.item));
} else if (change.type === 'remove') {
change.element.remove();
} else if (change.type === 'update') {
updateItem(change.element, change.item);
}
});
}
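The `diff` helper assumed above can be a simple keyed comparison. A minimal sketch, assuming each item carries a unique `id` (update detection would additionally compare fields of kept items):

```javascript
// Keyed diff: classify items as added, removed, or kept by id.
function diffById(next, previous) {
  const prevById = new Map(previous.map((item) => [item.id, item]));
  const nextIds = new Set(next.map((item) => item.id));
  return {
    added: next.filter((item) => !prevById.has(item.id)),
    removed: previous.filter((item) => !nextIds.has(item.id)),
    kept: next.filter((item) => prevById.has(item.id)),
  };
}
```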
Step 6: Server-Side Performance Check
Check Server Resource Usage:
# CPU usage
top -b -n 1 | grep "uvicorn\|python"
# Memory usage
ps aux --sort=-%mem | grep "uvicorn\|python" | head -5
# Disk I/O
iostat -x 1 5
# Network
iftop -t -s 10
Check Database Performance:
# Database size
ls -lh .dss/dss.db
# Table sizes
# Row counts per table (COUNT(*) must run per table in SQLite;
# joining sqlite_master to pragma_table_info would count columns, not rows)
sqlite3 .dss/dss.db "SELECT name FROM sqlite_master WHERE type='table';" |
while read -r table; do
  printf '%s: %s rows\n' "$table" "$(sqlite3 .dss/dss.db "SELECT COUNT(*) FROM \"$table\";")"
done
# Check for missing indexes
sqlite3 .dss/dss.db << EOF
SELECT name, sql FROM sqlite_master
WHERE type='index' AND sql IS NOT NULL;
EOF
Database Optimization:
# Vacuum to reclaim space and reorganize
sqlite3 .dss/dss.db "VACUUM;"
# Analyze to update statistics
sqlite3 .dss/dss.db "ANALYZE;"
# Check index usage (run slow query with EXPLAIN QUERY PLAN)
sqlite3 .dss/dss.db << EOF
EXPLAIN QUERY PLAN
SELECT * FROM Projects WHERE name LIKE '%test%';
EOF
Performance Optimization Checklist
Browser Optimizations
- Code splitting implemented
- Lazy loading for routes/components
- Images optimized and lazy-loaded
- Scripts deferred or async
- CSS minified and critical CSS inlined
- Service worker for caching
- Event listeners properly cleaned up
- No memory leaks detected
API Optimizations
- Database queries optimized (indexes, JOINs)
- Response pagination implemented
- API caching enabled
- Compression enabled (gzip/brotli)
- Connection pooling configured
- Async processing for heavy tasks
- Rate limiting to prevent abuse
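The rate-limiting item is often implemented as a token bucket: allow bursts up to a capacity, refill steadily. A sketch with an injectable clock (not DSS's actual middleware):

```javascript
// Token-bucket limiter: `capacity` burst size, `refillPerSec` sustained rate.
function createRateLimiter(capacity, refillPerSec, now = Date.now) {
  let tokens = capacity;
  let last = now();
  return function allow() {
    const t = now();
    tokens = Math.min(capacity, tokens + ((t - last) / 1000) * refillPerSec);
    last = t;
    if (tokens >= 1) {
      tokens -= 1;
      return true; // request admitted
    }
    return false; // reject with HTTP 429
  };
}
```

Per-client limiting keeps one limiter instance per API key or IP.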
System Optimizations
- Database vacuumed and analyzed
- Log rotation configured
- Disk space sufficient (>20% free)
- Memory sufficient (>30% free)
- Supervisord restart policies configured
Success Criteria
- ✅ Page load <2000ms
- ✅ API requests <200ms
- ✅ Memory usage <50%
- ✅ No long tasks >100ms
- ✅ No memory growth over time
- ✅ Smooth scrolling and interactions
Performance Metrics to Track
Browser:
- First Contentful Paint (FCP): <1000ms
- Largest Contentful Paint (LCP): <2500ms
- Time to Interactive (TTI): <3000ms
- Total Blocking Time (TBT): <200ms
- Cumulative Layout Shift (CLS): <0.1
API:
- Response time p50: <100ms
- Response time p95: <500ms
- Response time p99: <1000ms
- Throughput: >100 req/sec
- Error rate: <1%
Database:
- Query time p50: <10ms
- Query time p95: <50ms
- Query time p99: <100ms
- Connection pool usage: <80%
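Percentiles like p50/p95/p99 can be computed from collected samples with the nearest-rank method:

```javascript
// Nearest-rank percentile over observed durations (ms).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

Record request durations into an array (or a bounded ring buffer) and report `percentile(durations, 95)` against the targets above.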
Next Steps
- If performance acceptable: Document baseline for monitoring
- If still slow: Use Chrome Performance Profiler for deeper analysis
- If database slow: Consider adding indexes or caching layer
- If memory leaks: Use Chrome Memory Profiler to find retaining paths
- Schedule regular performance audits (monthly)
Related Documentation
- .dss/MCP_DEBUG_TOOLS_ARCHITECTURE.md - Performance monitoring in MCP
- admin-ui/js/core/browser-logger.js - Performance capture implementation
- Web Vitals: https://web.dev/vitals/
MCP Tool Access
From Claude Code:
Use tool: dss_get_browser_diagnostic (includes memory metrics)
Use tool: dss_get_server_diagnostic (includes performance metrics)