Troubleshooting

Voltis is designed for reliability, but edge environments can introduce challenges like network instability or resource constraints. This guide covers common errors, diagnostic steps, and resolutions based on source code behaviors and typical failure modes.

General Debugging Tips

  • Enable Verbose Logging: The daemon logs through slog at debug level by default, so no extra flag is needed. Redirect the output to a file:

    voltis daemon 2>&1 | tee daemon.log

    Look for prefixes like “system::daemon”, “api::server”.

  • CLI Verbose: A -v flag is planned but not yet available; the CLI currently logs to stderr.

  • Database Inspection: SQLite tools for debugging:

    sqlite3 /var/lib/voltis/voltis.sqlite.db
    -- List tables: workloads, services, etc.
    .tables
    -- Check active workloads and digests
    SELECT * FROM workloads;

  • System Logs: journalctl -u voltis -f --all

  • Service Logs: journalctl -u docker -f.

  • Reconciliation: The loop runs every 3s; to force an immediate pass, restart the daemon.

  • Test Connectivity: voltis ping and curl http://<node>:4650/ping.

  • Test Health: voltis health and curl http://<node>:4650/health.
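
The two connectivity checks above can be scripted together. The sketch below assumes the /ping and /health endpoints answer with a 2xx status when the daemon is up; check_node is an illustrative name, not a Voltis command.

```shell
# check_node is a hypothetical helper, not part of the Voltis CLI.
# Usage: check_node <host> [port]
check_node() {
  local host="$1" port="${2:-4650}" path code rc=0
  for path in ping health; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status,
    # --max-time: give up after 5 seconds. curl prints 000 if it cannot connect.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      "http://${host}:${port}/${path}") || code="000"
    case "$code" in
      2??) echo "OK   /${path} (HTTP ${code})" ;;
      *)   echo "FAIL /${path} (HTTP ${code})"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

For example, check_node 192.168.1.100 prints one line per endpoint and returns non-zero if either check fails.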

Common Issues

1. Daemon Startup Failures

Symptoms: “voltis daemon” exits immediately; no API on port 4650.

Causes and Fixes:

  • DB Permissions: Error like “unable to open database file”.
    • Fix: sudo mkdir -p /var/lib/voltis && sudo chown $USER /var/lib/voltis.
    • Use --store ~/voltis.db for testing.
  • Port Conflict: “bind: address already in use”.
    • Fix: Find the conflicting process with lsof -i :4650 and stop it, or start the daemon on another port with --listen-address :4651.
  • Missing Schema: “no such table: workloads” on first run.
    • Fix: The schema is applied automatically at startup; if that fails, delete the DB file and restart the daemon.
  • SIGTERM Handling: Daemon ignores signals if not foreground.
    • Fix: Run with nohup or systemd service.
  • Other Issues: As a last resort, remove the deb package entirely.
    • Fix: sudo dpkg --purge voltis
    • Then clear any leftover state: sudo rm -rf /var/lib/voltis (this also deletes the database).

Diagnostic:

voltis daemon --store test.db --listen-address :4651 2>&1 | grep ERROR
netstat -tlnp | grep 4651  # Check listening
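
The error strings listed above can be matched mechanically against a captured log. This is a minimal sketch: the function names are illustrative, and the patterns are only the three messages quoted in this section, so anything else is reported as unknown.

```shell
# classify_startup_error and scan_startup_log are hypothetical helpers.
# The patterns are the error strings quoted in this section.
classify_startup_error() {
  case "$1" in
    *"unable to open database file"*)
      echo "db-permissions: check ownership of /var/lib/voltis or pass --store" ;;
    *"address already in use"*)
      echo "port-conflict: free port 4650 or pass --listen-address" ;;
    *"no such table"*)
      echo "schema-missing: delete the DB file and restart the daemon" ;;
    *)
      echo "unknown: inspect the full daemon log" ;;
  esac
}

# Read log lines from stdin and print one hint per distinct known cause.
scan_startup_log() {
  while IFS= read -r line; do
    classify_startup_error "$line"
  done | grep -v '^unknown' | sort -u
}
```

Typical use is scan_startup_log < daemon.log after a failed start.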

2. CLI Connectivity Errors

Symptoms: “dial tcp: connection refused” or “parse api url: invalid”.

Causes and Fixes:

  • Wrong Address: The CLI defaults to localhost, so a remote node is unreachable until an address is set.
    • Fix: export VOLTIS_API_ADDRESS=http://192.168.1.100:4650 or --address.
  • Firewall: Port 4650 blocked.
    • Fix: sudo ufw allow 4650 or cloud security groups.
  • HTTPS Mismatch: Daemon is HTTP-only.
    • Fix: Use proxy for TLS; client doesn’t support HTTPS yet.
  • Invalid URL: Malformed env var.
    • Fix: Validate with curl $VOLTIS_API_ADDRESS/health.

Diagnostic:

echo $VOLTIS_API_ADDRESS  # Verify
telnet <host> 4650        # Test port
voltis health             # Exit 0 if OK
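
A malformed address can be caught before any network call. The sketch below assumes the expected shape is http://host[:port] with no path, which matches the example address in this section; validate_api_address is an illustrative name, and IPv6 literals are not handled.

```shell
# validate_api_address is a hypothetical helper, not part of the Voltis CLI.
# Assumption: the address has the form http://host[:port] with no path.
validate_api_address() {
  local addr="$1" rest host port
  case "$addr" in
    "")        echo "empty: VOLTIS_API_ADDRESS is not set"; return 1 ;;
    https://*) echo "https not supported: the daemon is HTTP-only"; return 1 ;;
    http://*)  rest="${addr#http://}" ;;
    *)         echo "missing scheme: expected http://host:port"; return 1 ;;
  esac
  rest="${rest%/}"          # tolerate a single trailing slash
  host="${rest%%:*}"        # everything before the first colon
  case "$rest" in
    *:)  echo "invalid port: empty"; return 1 ;;
    *:*) port="${rest##*:}" ;;
    *)   port="" ;;
  esac
  if [ -z "$host" ]; then echo "missing host"; return 1; fi
  case "$host" in */*) echo "unexpected path"; return 1 ;; esac
  case "$port" in
    "")       ;;            # no port given
    *[!0-9]*) echo "invalid port: $port"; return 1 ;;
  esac
  echo "ok"
}
```

Running validate_api_address "$VOLTIS_API_ADDRESS" prints ok or a short reason and returns non-zero on failure.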

3. Workload Push/Install Failures

Symptoms: “oldString not found” (possibly an edit error), “tar: invalid”, or task execution failures.

Causes and Fixes:

  • Missing voltis.toml: Build warns; push fails validation.
    • Fix: Ensure the workload root contains a valid voltis.toml; lint it with a TOML linter (for example, taplo lint voltis.toml).
  • Tarball Corruption: Gzip/tar issues during build/transfer.
    • Fix: Rebuild with voltis workload buildfile . --output fresh.tar.gz, then verify the archive with tar -tzvf fresh.tar.gz.
  • Task Execution Errors: Shell cmds fail (e.g., “apt: command not found”).
    • Fix: Run tasks manually on the node: task -t service1.voltis.taskfile.yml action.install (in Task, -t selects the taskfile; -f means --force).
    • Check idempotency: Status cmds must return 0 if up-to-date.
  • Digest Mismatch: The push overwrites the workload, but the DB reports a conflict.
    • Fix: Use unique names; delete old: curl -X DELETE $VOLTIS_API_ADDRESS/workload/old-name.
  • Size Limits: Large tarballs (>100MB) timeout.
    • Fix: Compress better or split workloads.

Diagnostic:

voltis workload push test.tar.gz --name debug --status inactive  # Push without activating (closest to a dry run)
# On node: tail -f daemon.log during push
sqlite3 voltis.db "SELECT name, message FROM workloads;"
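
The tarball checks above can be run before every push. This sketch assumes voltis.toml sits at the archive root, as the "Missing voltis.toml" item suggests; verify_workload_tarball is an illustrative name.

```shell
# verify_workload_tarball is a hypothetical helper.
# Assumption: voltis.toml lives at the root of the archive.
verify_workload_tarball() {
  local tarball="$1" listing
  # gzip -t checks the compressed stream without extracting anything.
  if ! gzip -t "$tarball" 2>/dev/null; then
    echo "corrupt gzip stream"; return 1
  fi
  # tar -tzf lists the entries; a failure here means the tar layer is damaged.
  if ! listing=$(tar -tzf "$tarball" 2>/dev/null); then
    echo "unreadable tar archive"; return 1
  fi
  # Entries appear as "voltis.toml" or "./voltis.toml" depending on how
  # the archive was built.
  if ! printf '%s\n' "$listing" | grep -Eq '^(\./)?voltis\.toml$'; then
    echo "voltis.toml missing from archive root"; return 1
  fi
  echo "ok"
}
```

For example, verify_workload_tarball fresh.tar.gz && voltis workload push fresh.tar.gz --name debug --status inactive pushes only when the archive passes.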

4. Service State Mismatches

Symptoms: voltis service list shows “current=failed” or drifts after reboot.

Causes and Fixes:

  • Systemd Issues: Unit not found or misconfigured.
    • Fix: Ensure extras/*.service copied correctly; run systemctl daemon-reload in taskfile.
    • Verify: systemctl status <unit>; check journal: journalctl -u <unit> -e.
  • Preconditions Fail: Task skips due to unmet checks.
    • Fix: Make the preconditions more robust, for example by adding fallback checks.
  • Reconciliation Loop Stuck: Ticker not firing.
    • Fix: Restart the daemon, then check the logs to confirm the reconcile loop is running again.
  • Resource Limits: OOM on edge device.
    • Fix: Monitor with free -h; optimize tasks (e.g., no parallel apt).

Diagnostic:

voltis service list  # Spot mismatches
# Force reconcile: POST /service with state update
journalctl -u voltis -f & voltis workload active --name my-workload  # Watch logs
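
For the systemd checks above, the unit state and recent journal entries can be gathered in one step. This is a sketch built only on standard systemctl and journalctl calls; unit_report is an illustrative name, not something Voltis provides.

```shell
# unit_report is a hypothetical helper built on standard systemd commands.
# Usage: unit_report <unit>
unit_report() {
  local unit="$1" active enabled
  # is-active and is-enabled print the state and exit non-zero for anything
  # other than active / enabled, so the exit status is ignored here.
  active=$(systemctl is-active "$unit" 2>/dev/null) || true
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
  echo "unit=${unit} active=${active:-unknown} enabled=${enabled:-unknown}"
  if [ "$active" != "active" ]; then
    # Show the last journal entries only when the unit is not running.
    journalctl -u "$unit" -n 20 --no-pager
    return 1
  fi
}
```

unit_report <unit> returns 0 for a running unit and 1 otherwise, so it can gate later steps in a script.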

5. Package/Job Problems

Symptoms: Packages not installing; jobs not completing.

Causes and Fixes:

  • No Taskfile: Component listed but missing *.taskfile.yml.
    • Fix: Add taskfile or remove from voltis.toml.
  • Continuous Jobs Hanging: Infinite loops in tasks.
    • Fix: Add timeouts in cmds (e.g., timeout 300s my-script); set continuous: false if one-shot.
  • Version Conflicts: Apt holds or pinned versions.
    • Fix: Use apt-mark hold in uninstall; specify exact versions.

Diagnostic:

voltis package list  # Check installed/message
# For jobs: GET /job/{name}/logs (if implemented)
ps aux | grep task  # Running tasks
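
The timeout advice for hanging jobs can be wrapped so that a timeout is distinguishable from an ordinary failure. This sketch relies on coreutils timeout, which exits with status 124 when the limit is reached; run_with_timeout is an illustrative name.

```shell
# run_with_timeout is a hypothetical wrapper around coreutils `timeout`.
# Usage: run_with_timeout <seconds> <command> [args...]
run_with_timeout() {
  local limit="$1" status=0
  shift
  timeout "${limit}s" "$@" || status=$?
  # Report on stderr so the wrapped command's stdout stays clean.
  case "$status" in
    0)   echo "completed" >&2 ;;
    124) echo "timed out after ${limit}s" >&2 ;;   # timeout's exit code when the limit is hit
    *)   echo "failed with exit code ${status}" >&2 ;;
  esac
  return "$status"
}
```

For example, run_with_timeout 300 my-script mirrors the timeout 300s my-script suggestion and preserves the command's exit status.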

6. API and Network Errors

Symptoms: 5xx responses; timeouts.

Causes and Fixes:

  • DB Locks: Concurrent access (rare, single-threaded).
    • Fix: Retry operations; avoid manual DB edits.
  • Streaming Failures: Logs endpoint hangs.
    • Fix: Use ?follow=false for snapshots.
  • CORS/Proxy Issues: If using frontend.
    • Fix: Configure proxy headers.

Diagnostic:

curl -v http://localhost:4650/workload  # Verbose HTTP
# Check daemon: netstat -tlnp | grep 4650
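
The "retry operations" advice for transient DB locks can be expressed as a small backoff loop. This is a generic sketch, not behavior built into the Voltis CLI; retry_with_backoff is an illustrative name and the delays are whole seconds.

```shell
# retry_with_backoff is a hypothetical helper: it re-runs a command until it
# succeeds, doubling the delay after each failure.
# Usage: retry_with_backoff <max_attempts> <initial_delay_s> <command> [args...]
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff 4 1 curl -sf -o /dev/null http://localhost:4650/workload retries up to four times; curl -f makes HTTP 4xx and 5xx responses count as failures.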

Advanced Diagnostics

  • Profile Daemon: Add pprof (future; extend server.go).
  • Trace Reconciliation: Add logs in Reconcile(): slog.Debug("Checking service", "name", svc.Name).
  • Simulate Failures: Stop systemd; observe loop recovery.
  • DB Vacuum: Run sqlite3 voltis.db "VACUUM;" to reclaim space from a bloated DB.

When to Seek Help

  • Check GitHub issues: Search for error messages.
  • Community: Voltis Discord/Slack (future).
  • Source: Dive into pkg/system/workload_controller.go for custom fixes.
  • Logs: Always include full daemon.log and command output in reports.

If unresolved, file an issue with: OS/version, steps to repro, logs, DB schema dump.
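
The items requested for an issue report can be gathered into one directory. This is a minimal sketch: collect_report is an illustrative name, all paths are supplied by the caller, and the schema dump is skipped when sqlite3 or the DB file is unavailable.

```shell
# collect_report is a hypothetical helper; it is not shipped with Voltis.
# Usage: collect_report <out_dir> <daemon_log> <db_file>
collect_report() {
  local out="$1" log="$2" db="$3"
  mkdir -p "$out" || return 1
  # OS and kernel version.
  {
    uname -a
    if [ -r /etc/os-release ]; then cat /etc/os-release; fi
  } > "$out/system.txt"
  # Full daemon log, if present.
  if [ -r "$log" ]; then
    cp "$log" "$out/daemon.log"
  fi
  # Schema only (no row data), when sqlite3 and the DB are available.
  if command -v sqlite3 >/dev/null 2>&1 && [ -r "$db" ]; then
    sqlite3 "$db" .schema > "$out/schema.sql"
  fi
  echo "report written to $out"
}
```

For example, collect_report ./voltis-report daemon.log /var/lib/voltis/voltis.sqlite.db produces system.txt, daemon.log, and schema.sql ready to attach.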
