Workflow 02: Diagnose Errors

Purpose: Systematically diagnose and resolve errors in the DSS dashboard and API

When to Use:

  • Dashboard displays error messages
  • API requests failing
  • JavaScript exceptions in browser
  • Server health check shows degraded status
  • User reports functionality not working

Estimated Time: 10-30 minutes


Prerequisites

  • Browser logs captured (use Workflow 01 if needed)
  • Access to server logs (journalctl or log files)
  • API server running
  • Database accessible

Step-by-Step Procedure

Step 1: Identify Error Scope

Action: Determine whether the error is browser-side, API-side, or database-side

Browser-Side Check:

// In browser console
const errors = window.__DSS_BROWSER_LOGS.errors();
console.table(errors);

API-Side Check:

# Check API health
curl http://localhost:3456/health

Expected Results (each field lists its possible values, separated by |):

{
  "status": "healthy|degraded|error",
  "database": "ok|error",
  "mcp": "ok|error",
  "figma": "ok|not_configured|error"
}

Database Check:

sqlite3 /home/overbits/dss/.dss/dss.db "SELECT name FROM sqlite_master WHERE type='table';"

Decision Matrix:

  • Browser errors only → Browser-side issue (Step 2)
  • API health degraded → Server-side issue (Step 3)
  • Database errors → Database issue (Step 4)
  • Multiple components failing → System-wide issue (Step 5)
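
The decision matrix can also be expressed as a small function. This is a minimal sketch, not part of DSS: the function name `classify_scope` and the rule that a database error takes precedence over a degraded status are assumptions made for illustration. It takes the parsed /health response and the number of browser errors as input.

```python
from typing import Mapping


def classify_scope(health: Mapping[str, str], browser_error_count: int) -> str:
    """Map Step 1 observations onto the decision matrix (hypothetical helper)."""
    failing = []
    if browser_error_count > 0:
        failing.append("browser")
    # Assumption: a database error explains a degraded status, so it is
    # counted once as "database" rather than as both "database" and "api".
    if health.get("database") == "error":
        failing.append("database")
    elif health.get("status") in ("degraded", "error"):
        failing.append("api")

    if len(failing) > 1:
        return "Step 5: system-wide"
    if failing == ["browser"]:
        return "Step 2: browser-side"
    if failing == ["api"]:
        return "Step 3: API-side"
    if failing == ["database"]:
        return "Step 4: database"
    return "no error detected"
```

For example, `classify_scope({"status": "degraded", "database": "ok"}, 0)` points to Step 3.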

Step 2: Diagnose Browser-Side Errors

Get Error Details:

const errors = window.__DSS_BROWSER_LOGS.errors();
errors.forEach(e => {
  console.group(`ERROR: ${e.message}`);
  console.log('Timestamp:', new Date(e.timestamp).toLocaleString());
  console.log('Category:', e.category);
  console.log('Data:', e.data);
  // Optional chaining avoids a second TypeError when an entry has no data
  if (e.data?.stack) {
    console.log('Stack:', e.data.stack);
  }
  console.groupEnd();
});
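
When there are many entries, counting them by category shows which error type to look at first. The sketch below assumes the errors were copied out of the browser as JSON (for example with `JSON.stringify(...)` on the array above) and uses only the `category` field shown in this workflow. `summarize_errors` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_errors(raw_json: str) -> Counter:
    """Count exported browser log entries by their category field."""
    entries = json.loads(raw_json)
    # Entries without a category are grouped under "unknown".
    return Counter(entry.get("category", "unknown") for entry in entries)


# Sample input shaped like the entries documented in this step.
sample = json.dumps([
    {"message": "Cannot read property 'x' of undefined", "category": "uncaughtError"},
    {"message": "GET /api/config", "category": "fetchError"},
    {"message": "Failed to load module script", "category": "uncaughtError"},
])

for category, count in summarize_errors(sample).most_common():
    print(f"{category}: {count}")
```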

Common Browser Errors:

Error Type 1: Uncaught TypeError

{
  message: "Cannot read property 'x' of undefined",
  category: "uncaughtError",
  data: {
    filename: "/admin-ui/js/core/app.js",
    lineno: 42,
    colno: 15
  }
}

Diagnosis:

  • Variable not initialized
  • API response structure unexpected
  • Async timing issue

Solution:

  1. Check line 42 in app.js
  2. Add null checks before property access
  3. Verify API response format

Error Type 2: Network/Fetch Error

{
  message: "GET /api/config",
  category: "fetchError",
  data: {
    error: "Failed to fetch",
    url: "/api/config"
  }
}

Diagnosis:

  • API endpoint not responding
  • CORS issue
  • Network timeout

Solution:

  1. Check whether the API server is running: pgrep -af "uvicorn.*server:app"
  2. Test endpoint directly: curl http://localhost:3456/api/config
  3. Check server logs: journalctl -u dss-api -n 50
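
The three diagnoses can be told apart from the server side with a short probe. This is a sketch using only the Python standard library. The function name `probe` and its return strings are illustrative. A CORS problem cannot be detected this way: if the probe succeeds while the browser still reports "Failed to fetch", CORS is a likely remaining candidate.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Classify the outcome of a GET request to url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"ok: HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return f"http-error: HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Connection-level failure: refused, DNS failure, or connect timeout.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # A timeout while reading the response is raised unwrapped.
        return "timeout"
```

Calling `probe("http://localhost:3456/api/config")` returns a string starting with `unreachable` when nothing is listening on the port.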

Error Type 3: Module Loading Error

{
  message: "Failed to load module script",
  category: "uncaughtError",
  data: {
    filename: "/admin-ui/js/core/missing-module.js"
  }
}

Diagnosis:

  • File not found (404)
  • Import path incorrect
  • Module syntax error

Solution:

  1. Check file exists: ls -la admin-ui/js/core/missing-module.js
  2. Check import paths in HTML and JS files
  3. Check browser Network tab for 404 errors

Step 3: Diagnose API-Side Errors

Check API Health:

curl http://localhost:3456/health

If degraded, check server logs:

# Last 100 lines
journalctl -u dss-api -n 100

# Follow live logs
journalctl -u dss-api -f

# Filter for errors
journalctl -u dss-api | grep -i error
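
A plain grep for "error" returns every matching line. To see which exception types dominate, the log lines can be tallied instead. The sketch below assumes the log contains Python exception names such as `NameError` or `sqlite3.OperationalError`, as in the examples in this step. The pattern and the function name `count_exceptions` are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Matches names ending in "Error" or "Exception", with optional module prefix.
EXCEPTION_PATTERN = re.compile(r"\b((?:\w+\.)*\w*(?:Error|Exception))\b")


def count_exceptions(lines: Iterable[str]) -> Counter:
    """Count the first exception name found on each log line."""
    counts: Counter = Counter()
    for line in lines:
        match = EXCEPTION_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


log_lines = [
    "[HEALTH] Database error: NameError: name 'get_connection' is not defined",
    "sqlite3.OperationalError: unable to open database file",
    "sqlite3.OperationalError: database is locked",
]
for name, count in count_exceptions(log_lines).most_common():
    print(f"{count:4d}  {name}")
```

To run it against live logs, pass `sys.stdin` as `lines` and pipe in the output of `journalctl -u dss-api --no-pager`.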

Common API Errors:

Error Type 1: Missing Import (NameError, as in a previously fixed bug)

[HEALTH] Database error: NameError: name 'get_connection' is not defined

Diagnosis: Missing import in server.py

Solution:

  1. Find the function being called
  2. Check imports at top of server.py
  3. Add missing import
  4. Restart API server: systemctl restart dss-api

Error Type 2: Database Connection Error

sqlite3.OperationalError: unable to open database file

Diagnosis:

  • Database file missing
  • Permission denied
  • File corrupted

Solution:

  1. Check file exists: ls -la .dss/dss.db
  2. Fix file permissions if needed: chmod 644 .dss/dss.db
  3. Fix directory permissions if needed: chmod 755 .dss
  4. Verify database integrity: sqlite3 .dss/dss.db "PRAGMA integrity_check;"
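
The same four checks can be run in one pass. This is a minimal sketch using the standard `sqlite3` module. `check_database` is a hypothetical helper, and the wording of the problem messages is illustrative. The database is opened read-only so the check cannot modify the file.

```python
import os
import pathlib
import sqlite3
from typing import List


def check_database(path: str) -> List[str]:
    """Return the problems found; an empty list means every check passed."""
    if not os.path.isfile(path):
        return [f"missing: {path}"]
    problems = []
    if not os.access(path, os.R_OK | os.W_OK):
        problems.append("file is not readable and writable by this user")
    parent = os.path.dirname(os.path.abspath(path))
    if not os.access(parent, os.W_OK):
        # SQLite creates its journal files next to the database.
        problems.append("directory is not writable by this user")
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check;").fetchall()
        finally:
            conn.close()
        if rows != [("ok",)]:
            problems.extend(f"integrity: {row[0]}" for row in rows)
    except sqlite3.DatabaseError as exc:
        problems.append(f"sqlite error: {exc}")
    return problems
```

For example, `check_database(".dss/dss.db")` returns `[]` for a healthy database.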

Error Type 3: Port Already in Use

ERROR: [Errno 98] Address already in use

Diagnosis: Another process is using port 3456

Solution:

# Find process using port
lsof -i :3456

# Stop the process (escalate to kill -9 <PID> only if it ignores SIGTERM)
kill <PID>

# Restart API server
systemctl restart dss-api
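
Before restarting, it can help to confirm that the port was actually released. The sketch below uses a plain TCP connection attempt. `port_in_use` is a hypothetical helper, and it only reports whether something is listening, not which process owns the port (use lsof for that).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

`port_in_use(3456)` should return False after the old process has exited and before the service is restarted.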

Step 4: Diagnose Database Errors

Check Database Health:

sqlite3 /home/overbits/dss/.dss/dss.db << EOF
PRAGMA integrity_check;
SELECT COUNT(*) as table_count FROM sqlite_master WHERE type='table';
.tables
EOF

Expected Result:

ok
22
ActivityLog         Components          ESREDefinitions   ...

Common Database Errors:

Error Type 1: Locked Database

sqlite3.OperationalError: database is locked

Diagnosis: Another process has the database open

Solution:

# Find processes with the database open (run from /home/overbits/dss)
lsof .dss/dss.db

# If safe, close the connection
# Or restart API server: systemctl restart dss-api
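
To check whether the lock is still held without touching any data, a write lock can be requested and released immediately. This is a sketch using the standard `sqlite3` module. `is_locked` is a hypothetical helper, and it assumes the database file already exists (connecting to a missing path would create an empty file).

```python
import sqlite3


def is_locked(path: str, wait_seconds: float = 1.0) -> bool:
    """Return True if a write lock cannot be obtained within wait_seconds."""
    # isolation_level=None disables implicit transactions, so the
    # BEGIN/ROLLBACK statements below are sent exactly as written.
    conn = sqlite3.connect(path, timeout=wait_seconds, isolation_level=None)
    try:
        # BEGIN IMMEDIATE requests the write lock without changing data.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise
    finally:
        conn.close()
```

If `is_locked(".dss/dss.db")` keeps returning True after the API server restarts, some other process is holding the lock.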

Error Type 2: Corrupted Database

PRAGMA integrity_check;
*** in database main ***
Page 123: btree page has out-of-order cells

Diagnosis: Database file corrupted

Solution:

# Backup first
cp .dss/dss.db .dss/dss.db.backup

# Try to recover (the .recover command requires sqlite3 3.29.0 or later)
sqlite3 .dss/dss.db ".recover" | sqlite3 .dss/dss_recovered.db

# If successful, replace
mv .dss/dss.db .dss/dss.db.corrupted
mv .dss/dss_recovered.db .dss/dss.db

# Restart API
systemctl restart dss-api

Error Type 3: Missing Tables

sqlite3.OperationalError: no such table: Projects

Diagnosis: Database not initialized or schema changed

Solution:

# Check if database has any tables
sqlite3 .dss/dss.db ".tables"

# If empty, reinitialize
cd /home/overbits/dss
python3 -c "from storage.database import init_database; init_database()"

# Restart API
systemctl restart dss-api
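
A partially initialized database has some tables but not all, so an empty `.tables` listing is not the only failure mode. The sketch below compares the tables present against a list of expected names. `missing_tables` is a hypothetical helper, and the expected names used here are only the ones mentioned in this workflow, not the full DSS schema.

```python
import sqlite3
from typing import Iterable, Set


def missing_tables(path: str, expected: Iterable[str]) -> Set[str]:
    """Return the expected table names that are absent from the database."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table';"
        ).fetchall()
    finally:
        conn.close()
    present = {name for (name,) in rows}
    return set(expected) - present


# Table names taken from the examples in this workflow (not exhaustive).
EXPECTED = ["Projects", "Components", "ActivityLog"]
```

An empty result from `missing_tables(".dss/dss.db", EXPECTED)` means all listed tables exist.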

Step 5: Diagnose System-Wide Issues

Check all components:

# API server
systemctl status dss-api

# MCP server
systemctl status dss-mcp

# Database
ls -lh .dss/dss.db

# Disk space
df -h .

# Memory
free -h

Common System Issues:

Issue 1: Out of Disk Space

No space left on device

Solution:

# Find large files
du -h . | sort -h | tail -20

# Clean up logs
journalctl --vacuum-time=7d

# Clean npm cache
npm cache clean --force
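
Disk usage can also be checked programmatically, for example from a scheduled job. This is a sketch using `shutil.disk_usage` from the standard library. The helper names and the 90% threshold are assumptions, and the percentage can differ slightly from the Use% column of df, which excludes reserved blocks.

```python
import shutil


def disk_usage_percent(path: str = ".") -> float:
    """Return the percentage of the filesystem holding path that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_disk_low(path: str = ".", threshold_percent: float = 90.0) -> bool:
    """Return True when usage is at or above threshold_percent."""
    return disk_usage_percent(path) >= threshold_percent


print(f"Disk usage: {disk_usage_percent('.'):.1f}%")
```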

Issue 2: Out of Memory

MemoryError: Unable to allocate...

Solution:

# Check memory usage
free -h

# Find memory-hungry processes
ps aux --sort=-%mem | head -10

# Restart services
systemctl restart dss-api dss-mcp
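
On Linux, the figures reported by free come from /proc/meminfo, which can be parsed directly. The sketch below is an illustration with hypothetical helper names. It relies on the `MemTotal` and `MemAvailable` fields, which current Linux kernels provide.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a mapping of field to value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # Most fields are reported in kB; the unit suffix is dropped.
            values[key.strip()] = int(fields[0])
    return values


def available_percent(meminfo: Dict[str, int]) -> float:
    """Return available memory as a percentage of total memory."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
```

On the server itself, `available_percent(parse_meminfo(open("/proc/meminfo").read()))` gives the current value.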

Issue 3: Service Not Running

systemctl status dss-api
● dss-api.service
   Loaded: loaded
   Active: failed (Result: exit-code)

Solution:

# Check why it failed
journalctl -u dss-api -n 50

# Try to start manually
cd /home/overbits/dss
uvicorn tools.api.server:app --host 0.0.0.0 --port 3456

# Check for errors in output

# If successful, restart service
systemctl restart dss-api

Error Resolution Checklist

  • Captured error message and stack trace
  • Identified error scope (browser/API/database/system)
  • Checked relevant logs
  • Identified root cause
  • Applied fix
  • Restarted affected services
  • Verified fix with health check
  • Tested functionality in browser
  • Documented issue and solution

Verification Steps

After applying fix:

  1. Health Check:
curl http://localhost:3456/health

Expected: {"status": "healthy", ...}

  2. Browser Check:
window.__DSS_BROWSER_LOGS.diagnostic()

Expected: errorCount: 0 (or reduced)

  3. Functionality Check: Test the specific feature that was failing

  4. Monitor: Watch for 5-10 minutes to ensure the error doesn't recur
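
The monitoring step can be automated with a simple polling loop. This is a sketch under stated assumptions: `monitor` is a hypothetical helper, and `check` stands for any callable that returns True while the system is healthy, for example one that fetches /health and compares the status field to "healthy". The `sleep` parameter exists so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List


def monitor(
    check: Callable[[], bool],
    duration_s: float = 300.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Call check() once per interval across duration_s; return all results."""
    # One check at the start plus one per full interval that fits.
    total_checks = int(duration_s // interval_s) + 1
    results = []
    for index in range(total_checks):
        results.append(check())
        if index < total_checks - 1:
            sleep(interval_s)
    return results
```

With the defaults this performs 11 checks over 5 minutes. The fix can be considered stable when `all(results)` is True.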


Success Criteria

  • Root cause identified
  • Fix applied and tested
  • Health check returns "healthy"
  • No new errors in browser logs
  • Functionality restored
  • Issue documented

Next Steps

  • If performance issues remain: Use Workflow 03 (Debug Performance)
  • If multiple errors persist: Consider full system restart
  • If complex issue: Create detailed diagnostic report
  • Document solution in .dss/KNOWN_ISSUES.md

Related Documentation

  • .dss/DSS_DIAGNOSTIC_REPORT_20251206.md - Example diagnostic report
  • .dss/DEBUG_SESSION_SUMMARY.md - Previous debugging session
  • .dss/MCP_DEBUG_TOOLS_ARCHITECTURE.md - Debug tool architecture

MCP Tool Access

From Claude Code:

Use tool: dss_get_browser_errors
Use tool: dss_get_server_diagnostic

These retrieve error information automatically via MCP.