Troubleshooting
Voltis is designed for reliability, but edge environments can introduce challenges like network instability or resource constraints. This guide covers common errors, diagnostic steps, and resolutions based on source code behaviors and typical failure modes.
General Debugging Tips
- Enable Verbose Logging: The daemon uses slog at debug level by default. Redirect its output to a file:
    voltis daemon 2>&1 | tee daemon.log
  Look for prefixes like “system::daemon” and “api::server”.
- CLI Verbose: A -v flag is planned but not yet available; the CLI currently logs to stderr.
- Database Inspection: Use the SQLite tools for debugging:
    sqlite3 /var/lib/voltis/voltis.sqlite.db ".tables"                    # List: workloads, services, etc.
    sqlite3 /var/lib/voltis/voltis.sqlite.db "SELECT * FROM workloads;"   # Check active/digests
- System Logs:
    journalctl -u voltis -f --all
- Service Logs:
    journalctl -u docker -f
- Reconciliation: Force a pass by restarting the daemon; the loop runs every 3s.
- Test Connectivity: voltis ping and curl http://<node>:4650/ping
- Test Health: voltis health and curl http://<node>:4650/health
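
Once daemon.log grows, it can help to pull out the lines for a single subsystem. The helper below is a minimal sketch: filter_log is a hypothetical name, and it assumes the prefix (for example “api::server”) appears verbatim somewhere in each log line, which has not been checked against the actual slog output format.

```shell
#!/bin/sh
# filter_log FILE PREFIX
# Print the lines of FILE that contain PREFIX as a literal string.
# Assumption: the subsystem prefix appears verbatim in each log line.
filter_log() {
    # -F treats PREFIX as a fixed string, so "::" and "." need no escaping.
    grep -F -- "$2" "$1"
}

# Example: filter_log daemon.log "system::daemon"
```

Piping the result through grep ERROR narrows it further to failures from that subsystem.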
Common Issues
1. Daemon Startup Failures
Symptoms: “voltis daemon” exits immediately; no API on port 4650.
Causes and Fixes:
- DB Permissions: Error like “unable to open database file”.
  - Fix: sudo mkdir -p /var/lib/voltis && sudo chown $USER /var/lib/voltis
  - Use --store ~/voltis.db for testing.
- Port Conflict: “bind: address already in use”.
  - Fix: Run lsof -i :4650 to find the conflicting process and stop it, or start the daemon with --listen-address :4651.
- Missing Dependencies: “no such table: workloads” on first run.
  - Fix: The schema is created automatically; if that fails, delete the DB and restart.
- SIGTERM Handling: Daemon ignores signals if not foreground.
  - Fix: Run with nohup or as a systemd service.
- Other Issues: Remove the entire deb package.
  - Fix: sudo dpkg --purge voltis
  - Ensure /var/lib/voltis is empty: sudo rm -rf /var/lib/voltis
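
The DB permissions case can be checked before starting the daemon. The following is a sketch only: check_store_dir is a hypothetical helper, not part of Voltis. It tests the directory rather than the database file because SQLite also needs to create journal files next to the database.

```shell
#!/bin/sh
# check_store_dir DIR
# Report whether DIR exists and is writable by the current user.
# SQLite needs write access to the directory itself (for journal files),
# not only to the database file inside it.
check_store_dir() {
    if [ ! -d "$1" ]; then
        echo "missing: $1"
        return 1
    fi
    if [ ! -w "$1" ]; then
        echo "not writable: $1"
        return 1
    fi
    echo "ok: $1"
}

# Example: check_store_dir /var/lib/voltis
```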
Diagnostic:
    voltis daemon --store test.db --listen-address :4651 2>&1 | grep ERROR
    netstat -tlnp | grep 4651 # Check listening
2. CLI Connectivity Errors
Symptoms: “dial tcp: connection refused” or “parse api url: invalid”.
Causes and Fixes:
- Wrong Address: Defaults to localhost; remote node unreachable.
  - Fix: export VOLTIS_API_ADDRESS=http://192.168.1.100:4650 or pass --address.
- Firewall: Port 4650 blocked.
  - Fix: sudo ufw allow 4650, or open the port in cloud security groups.
- HTTPS Mismatch: Daemon is HTTP-only.
  - Fix: Use a proxy for TLS; the client doesn’t support HTTPS yet.
- Invalid URL: Malformed env var.
  - Fix: Validate with curl $VOLTIS_API_ADDRESS/health
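
Several of these causes can be caught without touching the network by inspecting the address string itself. The sketch below is illustrative: check_api_address is a hypothetical helper, and its rules (require an http:// scheme, reject https://, require a non-empty host) are inferred from the list above rather than taken from the CLI's own URL parser.

```shell
#!/bin/sh
# check_api_address URL
# Rough sanity check for a value intended for VOLTIS_API_ADDRESS.
# Prints a one-word verdict and returns non-zero on problems.
check_api_address() {
    local hostport
    case "$1" in
        "")        echo "empty";     return 1 ;;
        https://*) echo "https";     return 1 ;;  # the daemon is HTTP-only
        http://*)  ;;                             # scheme is fine, keep checking
        *)         echo "no-scheme"; return 1 ;;
    esac
    # Strip the scheme and anything after the first slash to isolate host[:port].
    hostport=${1#http://}
    hostport=${hostport%%/*}
    case "$hostport" in
        ""|*" "*)  echo "bad-host";  return 1 ;;
    esac
    echo "ok"
}

# Example: check_api_address "$VOLTIS_API_ADDRESS"
```

A verdict of ok only means the string is well-formed; reachability still has to be tested with voltis health or curl.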
Diagnostic:
    echo $VOLTIS_API_ADDRESS # Verify
    telnet <host> 4650 # Test port
    voltis health # Exit 0 if OK
3. Workload Push/Install Failures
Symptoms: “oldString not found” (edit error?); “tar: invalid” or task execution fails.
Causes and Fixes:
- Missing voltis.toml: Build warns; push fails validation.
  - Fix: Ensure the workload root has a valid voltis.toml; lint with toml lint voltis.toml.
- Tarball Corruption: Gzip/tar issues during build/transfer.
  - Fix: Rebuild with voltis workload buildfile . --output fresh.tar.gz, then verify with gunzip -c fresh.tar.gz | tar tvf -
- Task Execution Errors: Shell cmds fail (e.g., “apt: command not found”).
  - Fix: Run tasks manually on the node: task --taskfile service1.voltis.taskfile.yml action.install
  - Check idempotency: Status cmds must return 0 if up-to-date.
- Digest Mismatch: Push overwrites the workload, but the DB reports a conflict.
  - Fix: Use unique names; delete the old workload: curl -X DELETE $VOLTIS_API_ADDRESS/workload/old-name
- Size Limits: Large tarballs (>100MB) time out.
  - Fix: Compress better or split workloads.
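
The first two checks can be combined into one pre-push test. This is a sketch under stated assumptions: verify_workload_tarball is a hypothetical helper, and it assumes the archive root is the workload root, so voltis.toml should appear as a top-level entry. It does not validate the TOML content.

```shell
#!/bin/sh
# verify_workload_tarball FILE
# Check that FILE is a readable gzip-compressed tar archive
# with voltis.toml at its root (assumed layout).
verify_workload_tarball() {
    local listing
    if ! gzip -t "$1" 2>/dev/null; then
        echo "corrupt gzip stream"
        return 1
    fi
    if ! listing=$(tar -tzf "$1" 2>/dev/null); then
        echo "corrupt tar archive"
        return 1
    fi
    # tar lists root entries as "voltis.toml" or "./voltis.toml",
    # depending on how the archive was created.
    if ! printf '%s\n' "$listing" | grep -Eq '^(\./)?voltis\.toml$'; then
        echo "voltis.toml missing from archive root"
        return 1
    fi
    echo "ok"
}

# Example: verify_workload_tarball fresh.tar.gz
```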
Diagnostic:
    voltis workload push test.tar.gz --name debug --status inactive # Dry-run inactive
    # On node: tail -f daemon.log during push
    sqlite3 voltis.db "SELECT name, message FROM workloads;"
4. Service State Mismatches
Symptoms: voltis service list shows “current=failed” or drifts after reboot.
Causes and Fixes:
- Systemd Issues: Unit not found or misconfigured.
  - Fix: Ensure extras/*.service is copied correctly; run systemctl daemon-reload in the taskfile.
  - Verify: systemctl status <unit>; check the journal with journalctl -u <unit> -e
- Preconditions Fail: Task skips due to unmet checks.
  - Fix: Adjust preconditions (e.g., add fallbacks); make robust.
- Reconciliation Loop Stuck: Ticker not firing.
  - Fix: Restart daemon; check goroutines in logs.
- Resource Limits: OOM on edge device.
  - Fix: Monitor with free -h; optimize tasks (e.g., no parallel apt).
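
For the resource-limit case, a scripted check is easier to run from a task than reading free -h by eye. The sketch below reads the MemAvailable field from /proc/meminfo (Linux only); the helper names and the idea of a threshold are illustrative assumptions, not Voltis features.

```shell
#!/bin/sh
# mem_available_mb [MEMINFO_FILE]
# Print available memory in MiB from a /proc/meminfo-style file.
# /proc/meminfo reports MemAvailable in kB.
mem_available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# low_memory THRESHOLD_MB [MEMINFO_FILE]
# Return 0 (true) when available memory is below THRESHOLD_MB.
low_memory() {
    local avail
    avail=$(mem_available_mb "$2")
    [ -n "$avail" ] && [ "$avail" -lt "$1" ]
}

# Example: low_memory 256 && echo "less than 256 MiB available"
```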
Diagnostic:
    voltis service list # Spot mismatches
    # Force reconcile: POST /service with state update
    journalctl -u voltis -f & voltis workload active --name my-workload # Watch logs
5. Package/Job Problems
Symptoms: Packages not installing; jobs not completing.
Causes and Fixes:
- No Taskfile: Component listed but missing *.taskfile.yml.
  - Fix: Add taskfile or remove from voltis.toml.
- Continuous Jobs Hanging: Infinite loops in tasks.
  - Fix: Add timeouts in cmds (e.g., timeout 300s my-script); set continuous: false if one-shot.
- Version Conflicts: Apt holds or pinned versions.
  - Fix: Use apt-mark hold in uninstall; specify exact versions.
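
The timeout advice can be wrapped so that a cut-off job is reported distinctly from an ordinary failure. This is a minimal sketch assuming GNU coreutils timeout, which exits with status 124 when the limit is reached; run_with_timeout is a hypothetical name.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Run COMMAND under coreutils `timeout` and say so on stderr if it was cut off.
# Assumption: GNU timeout, which exits with status 124 on expiry.
run_with_timeout() {
    local limit status
    limit="$1"
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example: run_with_timeout 300 my-script
```

Any other non-zero status is passed through unchanged, so the caller can still tell a failing script from a hung one.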
Diagnostic:
    voltis package list # Check installed/message
    # For jobs: GET /job/{name}/logs (if implemented)
    ps aux | grep task # Running tasks
6. API and Network Errors
Symptoms: 5xx responses; timeouts.
Causes and Fixes:
- DB Locks: Concurrent access (rare, single-threaded).
  - Fix: Retry operations; avoid manual DB edits.
- Streaming Failures: Logs endpoint hangs.
  - Fix: Use ?follow=false for snapshots.
- CORS/Proxy Issues: If using frontend.
  - Fix: Configure proxy headers.
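
The retry advice for transient DB locks can be scripted with a small loop. This is a sketch, not something Voltis provides: retry is a hypothetical helper that uses a fixed delay between attempts and treats any non-zero exit status as a failure.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or ATTEMPTS is exhausted,
# sleeping DELAY_SECONDS between tries.
retry() {
    local attempts delay n
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example: retry 5 2 curl -fsS http://localhost:4650/workload
```

The -f flag makes curl return a non-zero status on HTTP errors, so 5xx responses trigger another attempt.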
Diagnostic:
    curl -v http://localhost:4650/workload # Verbose HTTP
    # Check daemon: netstat -tlnp | grep 4650
Advanced Diagnostics
- Profile Daemon: Add pprof (future; extend server.go).
- Trace Reconciliation: Add logs in Reconcile(): slog.Debug("Checking service", "name", svc.Name)
- Simulate Failures: Stop systemd; observe loop recovery.
- DB Vacuum: sqlite3 voltis.db "VACUUM;" for bloat.
When to Seek Help
- Check GitHub issues: Search for error messages.
- Community: Voltis Discord/Slack (future).
- Source: Dive into pkg/system/workload_controller.go for custom fixes.
- Logs: Always include full daemon.log and command output in reports.
If unresolved, file an issue with: OS/version, steps to repro, logs, DB schema dump.