dss/.dss/WORKFLOWS/02-diagnose-errors.md
Digital Production Factory 276ed71f31 Initial commit: Clean DSS implementation
Migrated from design-system-swarm with fresh git history.
Old project history preserved in /home/overbits/apps/design-system-swarm

Core components:
- MCP Server (Python FastAPI with mcp 1.23.1)
- Claude Plugin (agents, commands, skills, strategies, hooks, core)
- DSS Backend (dss-mvp1 - token translation, Figma sync)
- Admin UI (Node.js/React)
- Server (Node.js/Express)
- Storybook integration (dss-mvp1/.storybook)

Self-contained configuration:
- All paths relative or use DSS_BASE_PATH=/home/overbits/dss
- PYTHONPATH configured for dss-mvp1 and dss-claude-plugin
- .env file with all configuration
- Claude plugin uses ${CLAUDE_PLUGIN_ROOT} for portability

Migration completed: $(date)
🤖 Clean migration with full functionality preserved
2025-12-09 18:45:48 -03:00


# Workflow 02: Diagnose Errors
**Purpose**: Systematically diagnose and resolve errors in the DSS dashboard and API
**When to Use**:
- Dashboard displays error messages
- API requests failing
- JavaScript exceptions in browser
- Server health check shows degraded status
- User reports functionality not working
**Estimated Time**: 10-30 minutes
---
## Prerequisites
- Browser logs captured (use Workflow 01 if needed)
- Access to server logs (`journalctl` or log files)
- API server running
- Database accessible
---
## Step-by-Step Procedure
### Step 1: Identify Error Scope
**Action**: Determine whether the error is browser-side, API-side, or database-side
**Browser-Side Check**:
```javascript
// In browser console
const errors = window.__DSS_BROWSER_LOGS.errors();
console.table(errors);
```
**API-Side Check**:
```bash
# Check API health
curl http://localhost:3456/health
```
**Expected Response** (each field holds exactly one of the `|`-separated values):
```json
{
  "status": "healthy|degraded|error",
  "database": "ok|error",
  "mcp": "ok|error",
  "figma": "ok|not_configured|error"
}
```
**Database Check**:
```bash
sqlite3 /home/overbits/dss/.dss/dss.db "SELECT name FROM sqlite_master WHERE type='table';"
```
**Decision Matrix**:
- Browser errors only → Browser-side issue (Step 2)
- API health degraded → Server-side issue (Step 3)
- Database errors → Database issue (Step 4)
- Multiple components failing → System-wide issue (Step 5)
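The decision matrix can be sketched as a small function. This is only an illustration, assuming the `/health` payload has the fields shown above; the function name `next_step`, the `browser_error_count` parameter, and the precedence order (system-wide, then database, then API, then browser) are assumptions, not part of DSS.

```python
def next_step(health: dict, browser_error_count: int) -> str:
    """Map the Step 1 findings to the step to run next."""
    # Components reported individually by /health (field names from the example payload).
    components = ("database", "mcp", "figma")
    failing = [name for name in components if health.get(name) == "error"]

    if len(failing) > 1:
        return "Step 5: system-wide"
    if "database" in failing:
        return "Step 4: database"
    # An empty dict (API unreachable) also lands here, because status is missing.
    if health.get("status") != "healthy":
        return "Step 3: API"
    if browser_error_count > 0:
        return "Step 2: browser"
    return "No errors detected"
```

Passing an empty dict for `health` when the `curl` call fails routes the investigation to Step 3, which matches the case where the API does not answer at all.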
---
### Step 2: Diagnose Browser-Side Errors
**Get Error Details**:
```javascript
const errors = window.__DSS_BROWSER_LOGS.errors();
errors.forEach(e => {
  console.group(`ERROR: ${e.message}`);
  console.log('Timestamp:', new Date(e.timestamp).toLocaleString());
  console.log('Category:', e.category);
  console.log('Data:', e.data);
  // Guard the access: an entry without a data object would otherwise throw here
  if (e.data?.stack) {
    console.log('Stack:', e.data.stack);
  }
  console.groupEnd();
});
```
**Common Browser Errors**:
#### Error Type 1: Uncaught TypeError
```javascript
{
  message: "Cannot read property 'x' of undefined",
  category: "uncaughtError",
  data: {
    filename: "/admin-ui/js/core/app.js",
    lineno: 42,
    colno: 15
  }
}
```
**Diagnosis**:
- Variable not initialized
- API response structure unexpected
- Async timing issue
**Solution**:
1. Check line 42 in app.js
2. Add null checks before property access
3. Verify API response format
---
#### Error Type 2: Network/Fetch Error
```javascript
{
  message: "GET /api/config",
  category: "fetchError",
  data: {
    error: "Failed to fetch",
    url: "/api/config"
  }
}
```
**Diagnosis**:
- API endpoint not responding
- CORS issue
- Network timeout
**Solution**:
1. Check whether the API server is running: `pgrep -af "uvicorn.*server:app"` (unlike `ps aux | grep`, this never matches the search command itself)
2. Test endpoint directly: `curl http://localhost:3456/api/config`
3. Check server logs: `journalctl -u dss-api -n 50`
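To tell the three diagnoses apart from outside the browser, the endpoint can be probed directly. The following is a minimal sketch using only the standard library; `describe_failure` and `probe` are hypothetical helper names. A CORS problem will not show up here, because CORS is enforced by the browser: if the probe succeeds while the dashboard still reports "Failed to fetch", CORS becomes the likely cause.

```python
import socket
import urllib.error
import urllib.request


def describe_failure(exc: Exception) -> str:
    """Translate a urllib exception into one of the diagnoses listed above."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"endpoint responded with HTTP {exc.code}"
    # Read timeouts are raised directly rather than wrapped in URLError.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "network timeout"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "network timeout"
        if isinstance(exc.reason, ConnectionRefusedError):
            return "API endpoint not responding (connection refused)"
        return f"network error: {exc.reason}"
    return f"unexpected error: {exc!r}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Request the URL once and return a one-line diagnosis."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"OK (HTTP {response.status})"
    except Exception as exc:
        return describe_failure(exc)
```

For example, `probe("http://localhost:3456/api/config")` checks the endpoint from the error above.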
---
#### Error Type 3: Module Loading Error
```javascript
{
  message: "Failed to load module script",
  category: "uncaughtError",
  data: {
    filename: "/admin-ui/js/core/missing-module.js"
  }
}
```
**Diagnosis**:
- File not found (404)
- Import path incorrect
- Module syntax error
**Solution**:
1. Check file exists: `ls -la admin-ui/js/core/missing-module.js`
2. Check import paths in HTML and JS files
3. Check browser Network tab for 404 errors
---
### Step 3: Diagnose API-Side Errors
**Check API Health**:
```bash
curl http://localhost:3456/health
```
**If degraded, check server logs**:
```bash
# Last 100 lines
journalctl -u dss-api -n 100
# Follow live logs
journalctl -u dss-api -f
# Filter for errors
journalctl -u dss-api | grep -i error
```
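When the log is long, grouping lines by exception class shows which error dominates. This is a rough sketch, assuming the server logs Python exception names such as `NameError` or `OperationalError` verbatim, as in the samples in this workflow; `summarize_exceptions` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches Python-style exception class names, e.g. NameError or OperationalError.
EXCEPTION_NAME = re.compile(r"\b([A-Z][A-Za-z]*(?:Error|Exception))\b")


def summarize_exceptions(lines) -> Counter:
    """Count log lines by the first exception class name found on each line."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_NAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The function accepts any iterable of lines, so it can be fed `sys.stdin` when piping `journalctl -u dss-api --no-pager` into a script.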
**Common API Errors**:
#### Error Type 1: Missing Import (seen in a previous bug)
```
[HEALTH] Database error: NameError: name 'get_connection' is not defined
```
**Diagnosis**: Missing import in server.py
**Solution**:
1. Find the function being called
2. Check imports at top of server.py
3. Add missing import
4. Restart API server: `systemctl restart dss-api`
---
#### Error Type 2: Database Connection Error
```
sqlite3.OperationalError: unable to open database file
```
**Diagnosis**:
- Database file missing
- Permission denied
- File corrupted
**Solution**:
1. Check file exists: `ls -la .dss/dss.db`
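2. Inspect the permissions shown by `ls -la`; if the service user cannot read and write the file, fix them: `chmod 644 .dss/dss.db`
3. Fix directory permissions if needed (SQLite creates its journal files next to the database): `chmod 755 .dss`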
4. Verify database integrity: `sqlite3 .dss/dss.db "PRAGMA integrity_check;"`
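The file and permission checks above can also be run programmatically. A minimal sketch follows, with `database_file_problems` as a hypothetical helper; it should be run as the same user as the API service, since the result depends on who asks.

```python
import os


def database_file_problems(db_path: str) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    directory = os.path.dirname(os.path.abspath(db_path))

    if not os.path.isfile(db_path):
        problems.append("database file missing")
    elif not os.access(db_path, os.R_OK | os.W_OK):
        problems.append("database file not readable and writable by this user")

    # SQLite creates journal files beside the database, so the directory
    # itself has to be writable and searchable as well.
    if not os.access(directory, os.W_OK | os.X_OK):
        problems.append("database directory not writable by this user")
    return problems
```

Corruption is not covered by this check; `PRAGMA integrity_check` remains the tool for that.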
---
#### Error Type 3: Port Already in Use
```
ERROR: [Errno 98] Address already in use
```
**Diagnosis**: Another process using port 3456
**Solution**:
```bash
# Find process using port
lsof -i :3456
# Stop the process (try a plain kill first; use kill -9 only if it does not exit)
kill <PID>
# Restart API server
systemctl restart dss-api
```
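Before restarting the service, it can help to confirm that the port is actually free. The sketch below assumes the API listens on TCP over IPv4; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

`port_in_use(3456)` returning `False` after the kill suggests the restart will not hit `Address already in use` again.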
---
### Step 4: Diagnose Database Errors
**Check Database Health**:
```bash
sqlite3 /home/overbits/dss/.dss/dss.db << EOF
PRAGMA integrity_check;
SELECT COUNT(*) as table_count FROM sqlite_master WHERE type='table';
.tables
EOF
```
**Expected Result**:
```
ok
22
ActivityLog Components ESREDefinitions ...
```
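The same health check can be scripted with Python's built-in `sqlite3` module. This is a sketch under the assumption that a read-only connection is acceptable for diagnostics; `database_health` is a hypothetical helper, not part of DSS.

```python
import sqlite3
from pathlib import Path


def database_health(db_path: str) -> dict:
    """Run the integrity check and list tables without modifying the file."""
    # mode=ro opens read-only, so a missing file raises instead of being created.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
    finally:
        conn.close()
    return {"integrity": integrity, "table_count": len(tables), "tables": tables}
```

A healthy database reports `["ok"]` under `integrity`; any other content lists the problems found.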
**Common Database Errors**:
#### Error Type 1: Locked Database
```
sqlite3.OperationalError: database is locked
```
**Diagnosis**: Another process has database open
**Solution**:
```bash
# Find processes with database open
lsof /home/overbits/dss/.dss/dss.db
# If safe, close the connection
# Or restart API server: systemctl restart dss-api
```
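Whether a write lock is currently held can be tested without waiting for the next failing request. The following sketch uses a short busy timeout and `BEGIN IMMEDIATE`, which requests the write lock up front; `is_write_locked` is a hypothetical helper. Note that `sqlite3.connect` creates an empty file if the path does not exist, so the path should be checked first.

```python
import sqlite3


def is_write_locked(db_path: str, timeout: float = 0.2) -> bool:
    """Return True if another connection currently holds the write lock."""
    # isolation_level=None disables implicit transactions so BEGIN can be issued manually.
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # fails if the write lock is taken
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise
    finally:
        conn.close()
```

If the lock clears within a second or two on repeated calls, the cause is more likely a long-running write than a stuck process.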
---
#### Error Type 2: Corrupted Database
```
PRAGMA integrity_check;
*** in database main ***
Page 123: btree page has out-of-order cells
```
**Diagnosis**: Database file corrupted
**Solution**:
```bash
# Backup first
cp .dss/dss.db .dss/dss.db.backup
# Try to recover (.recover requires the sqlite3 shell 3.29.0 or newer)
sqlite3 .dss/dss.db ".recover" | sqlite3 .dss/dss_recovered.db
# If successful, replace
mv .dss/dss.db .dss/dss.db.corrupted
mv .dss/dss_recovered.db .dss/dss.db
# Restart API
systemctl restart dss-api
```
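A plain `cp` of the main file can miss recent writes if the database runs in WAL mode, because those live in a sidecar `-wal` file until checkpointed. The sketch below copies the main file together with any sidecar files into a timestamped directory; `backup_database_files` and the directory layout are assumptions for illustration. The API service is best stopped first so the files do not change mid-copy.

```python
import shutil
import time
from pathlib import Path


def backup_database_files(db_path: str, backup_dir: str) -> list:
    """Copy the database and any SQLite sidecar files; return the copied paths."""
    source = Path(db_path)
    target_dir = Path(backup_dir) / time.strftime("%Y%m%d-%H%M%S")
    target_dir.mkdir(parents=True, exist_ok=False)

    copied = []
    # -wal and -shm exist only in WAL mode; -journal only while a
    # rollback-journal transaction is open or was interrupted.
    for suffix in ("", "-wal", "-shm", "-journal"):
        candidate = source.with_name(source.name + suffix)
        if candidate.exists():
            destination = target_dir / candidate.name
            shutil.copy2(candidate, destination)
            copied.append(destination)
    return copied
```

Keeping the corrupted original and its sidecars together leaves room for a second recovery attempt if the first one loses data.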
---
#### Error Type 3: Missing Tables
```
sqlite3.OperationalError: no such table: Projects
```
**Diagnosis**: Database not initialized or schema changed
**Solution**:
```bash
# Check if database has any tables
sqlite3 .dss/dss.db ".tables"
# If empty, reinitialize
cd /home/overbits/dss
python3 -c "from storage.database import init_database; init_database()"
# Restart API
systemctl restart dss-api
```
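To see exactly which tables are absent rather than only whether the database is empty, the table list can be compared against an expected set. This is a minimal sketch; `missing_tables` is a hypothetical helper, and the expected names must come from the actual DSS schema (only `Projects`, `Components`, and `ActivityLog` are named in this workflow).

```python
import sqlite3


def missing_tables(db_path: str, expected: set) -> set:
    """Return the expected table names that are not present in the database."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
        present = {row[0] for row in rows}
    finally:
        conn.close()
    return set(expected) - present
```

A result containing every expected table points to an uninitialized database, while a short list points to a schema change.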
---
### Step 5: Diagnose System-Wide Issues
**Check all components**:
```bash
# API server
systemctl status dss-api
# MCP server
systemctl status dss-mcp
# Database
ls -lh .dss/dss.db
# Disk space
df -h .
# Memory
free -h
```
**Common System Issues**:
#### Issue 1: Out of Disk Space
```
No space left on device
```
**Solution**:
```bash
# Find large files
du -h . | sort -h | tail -20
# Remove journal entries older than 7 days (requires root)
journalctl --vacuum-time=7d
# Clean npm cache
npm cache clean --force
```
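Free space can also be checked before it runs out. The sketch below uses `shutil.disk_usage` from the standard library; the helper name `disk_status` and the 10% threshold are arbitrary assumptions to adjust per deployment.

```python
import shutil


def disk_status(path: str = ".", min_free_fraction: float = 0.10) -> dict:
    """Report free space on the filesystem holding `path` and flag low space."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo-filesystems that report a total of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }
```

Calling it with the DSS base directory checks the filesystem that holds the database and logs.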
---
#### Issue 2: Out of Memory
```
MemoryError: Unable to allocate...
```
**Solution**:
```bash
# Check memory usage
free -h
# Find memory-hungry processes
ps aux --sort=-%mem | head -10
# Restart services
systemctl restart dss-api dss-mcp
```
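On Linux, the figures behind `free -h` come from `/proc/meminfo`, which can be parsed directly for scripted checks. This is a sketch under that Linux-only assumption; `parse_meminfo` and `available_fraction` are hypothetical helpers, and `MemAvailable` is only present on reasonably recent kernels.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field: integer value} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo: dict) -> float:
    """Fraction of total memory the kernel estimates is available to new work."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]
```

In practice the text would come from reading `/proc/meminfo`; a persistently low fraction after restarting the services suggests a leak rather than a one-off spike.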
---
#### Issue 3: Service Not Running
```
systemctl status dss-api
● dss-api.service
Loaded: loaded
Active: failed (Result: exit-code)
```
**Solution**:
```bash
# Check why it failed
journalctl -u dss-api -n 50
# Try to start manually
cd /home/overbits/dss
uvicorn tools.api.server:app --host 0.0.0.0 --port 3456
# Check for errors in output
# If successful, restart service
systemctl restart dss-api
```
---
## Error Resolution Checklist
- [ ] Captured error message and stack trace
- [ ] Identified error scope (browser/API/database/system)
- [ ] Checked relevant logs
- [ ] Identified root cause
- [ ] Applied fix
- [ ] Restarted affected services
- [ ] Verified fix with health check
- [ ] Tested functionality in browser
- [ ] Documented issue and solution
---
## Verification Steps
After applying fix:
1. **Health Check**:
```bash
curl http://localhost:3456/health
```
Expected: `{"status": "healthy", ...}`
2. **Browser Check**:
```javascript
window.__DSS_BROWSER_LOGS.diagnostic()
```
Expected: `errorCount: 0` (or reduced)
3. **Functionality Check**: Test the specific feature that was failing
4. **Monitor**: Watch for 5-10 minutes to ensure error doesn't recur
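The monitoring step can be automated with a polling loop. The sketch below assumes a caller-supplied `check` callable that returns `True` while the system is healthy (for example, one that requests `/health` and compares the status); the name `monitor` and the default intervals are illustrative choices.

```python
import time
from typing import Callable


def monitor(check: Callable[[], bool], duration_s: float = 300.0,
            interval_s: float = 15.0) -> bool:
    """Poll `check` until `duration_s` elapses; return False on the first failure."""
    deadline = time.monotonic() + duration_s
    while True:
        if not check():
            return False
        if time.monotonic() >= deadline:
            return True
        time.sleep(interval_s)
```

Returning on the first failure keeps the timestamp of the recurrence close to the log lines that explain it.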
---
## Success Criteria
- ✅ Root cause identified
- ✅ Fix applied and tested
- ✅ Health check returns "healthy"
- ✅ No new errors in browser logs
- ✅ Functionality restored
- ✅ Issue documented
---
## Next Steps
- If performance issues remain: Use Workflow 03 (Debug Performance)
- If multiple errors persist: Consider full system restart
- If complex issue: Create detailed diagnostic report
- Document solution in `.dss/KNOWN_ISSUES.md`
---
## Related Documentation
- `.dss/DSS_DIAGNOSTIC_REPORT_20251206.md` - Example diagnostic report
- `.dss/DEBUG_SESSION_SUMMARY.md` - Previous debugging session
- `.dss/MCP_DEBUG_TOOLS_ARCHITECTURE.md` - Debug tool architecture
---
## MCP Tool Access
**From Claude Code**:
```
Use tool: dss_get_browser_errors
Use tool: dss_get_server_diagnostic
```
These retrieve error information automatically via MCP.