When a node is misbehaving — events not flowing, peers not syncing, tasks stalling — the diagnostics API and kernel console give you a structured view into what the network bridge is actually doing. Rather than grepping raw logs, you can query the running node directly and get a snapshot of gossip topics, backfill cursors, transport events, and peer state all in one place.
GET /api/diagnostics
The primary diagnostics endpoint reads the node’s diagnostics/wattswarm_node.jsonl log and combines it with a live bridge observability snapshot.
```bash
curl http://127.0.0.1:7788/api/diagnostics | jq .
```
You can filter the event list with query parameters:
| Parameter | Description |
|---|---|
| limit | Maximum number of diagnostic events to return |
| level | Filter by log level (e.g. error, warn, info) |
| component | Filter by component name (e.g. gossip, backfill) |
| category | Filter by event category |
| mode | Filter by the mode field |
| phase | Filter by lifecycle phase |
| event_id | Filter by a specific SEL event ID |
| object_id | Filter by task, run, or object ID |
| source_node_id | Filter to events from a specific remote node |
| search | Free-text search across event payloads |
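If you query diagnostics from a script rather than the shell, let a URL encoder handle escaping, especially for free-text search values. The following is a minimal sketch using only the Python standard library. It assumes the endpoint returns a JSON object (as the curl | jq example implies) and that the filter names are passed through exactly as listed in the table; the helper names build_diagnostics_url and fetch_diagnostics are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Filter names taken from the parameter table above.
ALLOWED_FILTERS = {
    "limit", "level", "component", "category", "mode", "phase",
    "event_id", "object_id", "source_node_id", "search",
}


def build_diagnostics_url(base_url: str, **filters) -> str:
    """Build a /api/diagnostics URL; filters set to None are dropped."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    params = {k: v for k, v in filters.items() if v is not None}
    query = urllib.parse.urlencode(params)  # escapes spaces, '&', etc.
    url = base_url.rstrip("/") + "/api/diagnostics"
    return f"{url}?{query}" if query else url


def fetch_diagnostics(base_url: str = "http://127.0.0.1:7788", **filters) -> dict:
    """GET the diagnostics endpoint and decode the JSON body."""
    url = build_diagnostics_url(base_url, **filters)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, fetch_diagnostics(component="backfill", level="error", limit=50)["diagnostics"] mirrors the curl queries used later on this page. Rejecting unknown filter names up front catches typos that the server might otherwise silently ignore.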
What the response includes
The response object contains the following fields:
- network_service_started: true if the P2P bridge started successfully during this node session.
- network_service_status: the current bridge status string (e.g. running, stopped).
- snapshot: the latest bridge observability snapshot, which includes:
  - p2p_foundation: the active transport stack; "iroh" on the standard state-dir startup path.
  - local_iroh_endpoint_id: the Iroh NodeId / EndpointId of this node’s active endpoint.
  - Iroh gossip topic IDs joined: the set of deterministic gossip topic IDs derived from network_id + scope + gossip_kind.
  - Known Iroh contact count: the number of peers for which persisted Iroh contact material is available.
  - legacy_transport_active: false on normal state-dir startup. When p2p_foundation is "iroh", any legacy transport fields in the snapshot are compatibility placeholders only.
- diagnostics: an array of structured transport, gossip, backfill, and agent-callback event entries read from the JSONL log.
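A quick scripted sanity check of the bridge fields can save a trip through the full JSON. This is a sketch under assumptions: it treats "running" as the healthy status (the only example value given), assumes legacy_transport_active sits inside snapshot, and skips the gossip-topic and contact-count fields because their JSON key names are not documented here. The function name bridge_warnings is hypothetical.

```python
def bridge_warnings(resp: dict) -> list:
    """Return human-readable warnings for a /api/diagnostics response dict."""
    warnings = []
    if not resp.get("network_service_started", False):
        warnings.append("P2P bridge did not start during this session")
    status = resp.get("network_service_status")
    if status != "running":  # assumption: "running" is the healthy value
        warnings.append(f"bridge status is {status!r}, expected 'running'")
    snapshot = resp.get("snapshot") or {}
    foundation = snapshot.get("p2p_foundation")
    if foundation != "iroh":
        warnings.append(f"p2p_foundation is {foundation!r}, not 'iroh'")
    # Assumption: legacy_transport_active is a key inside the snapshot object.
    if snapshot.get("legacy_transport_active"):
        warnings.append("legacy transport is active (not the normal state-dir path)")
    if not snapshot.get("local_iroh_endpoint_id"):
        warnings.append("snapshot has no local Iroh endpoint ID")
    return warnings
```

An empty list means none of these basic checks tripped; it does not prove that gossip or backfill is healthy.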
Accessing diagnostics via the UI
The kernel console includes a formatted diagnostics view. Open http://127.0.0.1:7788/diagnostics in your browser to see the same data rendered with filtering controls. Use this when you want to scan events visually rather than parsing JSON.
Open the /swarm dashboard at http://127.0.0.1:7788/swarm to watch real-time task state transitions driven by live kernel calls. The dashboard calls /api/swarm/state and /api/swarm/tick against actual executor runtimes, so the panels reflect true SEL and projection state — not a simulation.
CLI log commands
The wattswarm log subcommands give you direct access to the append-only Structured Event Log (SEL) stored in PostgreSQL.
```bash
# Show the latest events in the SEL (most recent head sequence number and entries)
wattswarm log head --pg-url postgres://postgres:postgres@127.0.0.1:55432/wattswarm
# Replay all events and rebuild projections from scratch
wattswarm log replay --pg-url postgres://postgres:postgres@127.0.0.1:55432/wattswarm
# Verify log integrity: checks sequence continuity and the hash chain
wattswarm log verify --pg-url postgres://postgres:postgres@127.0.0.1:55432/wattswarm
```
Use log replay after a crash or manual schema change to rebuild the node’s projection tables from the raw event history. Use log verify to confirm the log has not been truncated or tampered with.
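To make "sequence continuity and hash chain" concrete, here is a toy verifier over an in-memory list of events. It is illustrative only: the event fields (seq, prev_hash, hash, payload), the starting sequence number 1, the all-zero genesis hash, and the SHA-256-over-previous-hash-plus-payload scheme are all assumptions, not wattswarm’s actual SEL format. Use wattswarm log verify against a real log.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed "previous hash" for the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous event's hash plus the canonical JSON payload."""
    body = prev_hash + json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(events: list, payload: dict) -> None:
    """Append a correctly chained event (used here to build a sample log)."""
    prev_hash = events[-1]["hash"] if events else GENESIS_HASH
    seq = events[-1]["seq"] + 1 if events else 1
    events.append({"seq": seq, "prev_hash": prev_hash, "payload": payload,
                   "hash": event_hash(prev_hash, payload)})


def verify_chain(events: list) -> list:
    """Return a list of problems; an empty list means continuity and hashes check out."""
    problems = []
    expected_seq, expected_prev = 1, GENESIS_HASH
    for ev in events:
        # Continuity: each seq must follow the previous one with no gaps.
        if ev["seq"] != expected_seq:
            problems.append(f"sequence gap: expected {expected_seq}, got {ev['seq']}")
        # Linkage: each event must point at the hash of the event before it.
        if ev["prev_hash"] != expected_prev:
            problems.append(f"seq {ev['seq']}: prev_hash does not link to the previous event")
        # Integrity: the stored hash must match the event's own contents.
        if ev["hash"] != event_hash(ev["prev_hash"], ev["payload"]):
            problems.append(f"seq {ev['seq']}: stored hash does not match event contents")
        expected_seq, expected_prev = ev["seq"] + 1, ev["hash"]
    return problems
```

Dropping an event from the middle trips both the continuity and the linkage checks, while editing a payload in place trips the integrity check on that event. This is why a hash chain detects both truncation and tampering.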
Node status
To check the node’s current identity, network membership, and P2P info:
```bash
curl http://127.0.0.1:7788/api/node/status | jq .
```
The response includes node_id, running, mode, local_protocol_version, and peer_protocol_distribution — a map of protocol version strings to the count of peers seen at each version.
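peer_protocol_distribution is most useful for spotting a mixed-version network. The following sketch, using only the standard library, returns the peers that are not on this node’s protocol version. It compares versions as strings, because the distribution keys are version strings and the JSON type of local_protocol_version is not stated. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_node_status(base_url: str = "http://127.0.0.1:7788") -> dict:
    """GET /api/node/status and decode the JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/node/status", timeout=10) as resp:
        return json.load(resp)


def mismatched_peers(status: dict) -> dict:
    """Return {version: peer_count} for peers on a protocol version other than ours."""
    local = str(status["local_protocol_version"])
    distribution = status.get("peer_protocol_distribution") or {}
    # Compare as strings so that 3 and "3" count as the same version.
    return {ver: n for ver, n in distribution.items() if str(ver) != local and n > 0}
```

A non-empty result from mismatched_peers(fetch_node_status()) tells you some peers are running a different protocol version, which is worth ruling out before digging into sync problems.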
Peers list
To see all discovered peers and how they were found:
```bash
curl http://127.0.0.1:7788/api/peers/list | jq .
# or via CLI
wattswarm peers list --pg-url postgres://postgres:postgres@127.0.0.1:55432/wattswarm
```
Each peer entry includes a source_kind that tells you how the node learned about it:
| source_kind | Meaning |
|---|---|
| udp | Discovered via UDP multicast or broadcast announce |
| bootstrap | Loaded from startup bootstrap_contacts |
| connected | Established an active Iroh session |
| identify | Learned via P2P handshake/identify |
| bootstrap_index | Found via bootstrap index lookup |
| local_discovery | Discovered via mDNS (legacy compatibility path) |
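With many peers, a tally by source_kind is easier to read than the raw list. In the following sketch, the response shape is an assumption: it accepts either a bare JSON array of peer entries or an object with a "peers" array. The function names are hypothetical, and the known kinds come from the table above.

```python
from collections import Counter

# source_kind values documented in the table above.
KNOWN_SOURCE_KINDS = {
    "udp", "bootstrap", "connected", "identify", "bootstrap_index", "local_discovery",
}


def peer_entries(resp) -> list:
    """Assumption: the endpoint returns a bare array or an object with a "peers" array."""
    if isinstance(resp, list):
        return resp
    return resp.get("peers", [])


def source_kind_counts(resp) -> Counter:
    """Count peers by how the node learned about them."""
    return Counter(p.get("source_kind", "unknown") for p in peer_entries(resp))


def unexpected_kinds(resp) -> set:
    """source_kind values not listed in the documented table."""
    return set(source_kind_counts(resp)) - KNOWN_SOURCE_KINDS
```

Because local_discovery marks the legacy mDNS compatibility path, a non-zero count there on a node you expect to run purely on Iroh is worth a second look.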
Checking executor health from the kernel
To trigger an executor health check from within the running kernel (rather than from the CLI), send a POST request directly:
```bash
curl -X POST http://127.0.0.1:7788/api/executors/check \
  -H "Content-Type: application/json" \
  -d '{"name": "rt"}'
```
The kernel calls the executor’s /health and /capabilities endpoints and returns a structured result. This is the same check the worker performs before dispatching a step.
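The same POST can be issued from Python, for example in a readiness script, with only the standard library. This is a sketch: it sends exactly the body and header used in the curl example, and the shape of the returned result is not documented here, so it is returned as a plain dict. The function names are hypothetical.

```python
import json
import urllib.request


def build_executor_check_request(name: str, base_url: str = "http://127.0.0.1:7788") -> urllib.request.Request:
    """Build the POST /api/executors/check request for the named executor."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/executors/check",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_executor(name: str, base_url: str = "http://127.0.0.1:7788") -> dict:
    """Send the check and decode the JSON result (raises HTTPError on non-2xx)."""
    with urllib.request.urlopen(build_executor_check_request(name, base_url), timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect or log exactly what is sent before it reaches the kernel.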
Common investigation patterns
Missing events between nodes
If you expect events from a remote node but they are not appearing in the local SEL, check the backfill cursor state in the diagnostics log:
```bash
curl 'http://127.0.0.1:7788/api/diagnostics?component=backfill&limit=50' | jq '.diagnostics'
```
Look for backfill events whose phase is request but that have no corresponding response, or that carry error payloads. A cursor gap that persists means the peer is either unreachable or returning empty backfill responses. Confirm that the remote node is up and that its head_seq is advancing by running wattswarm log head on the remote node.
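On a busy node, spotting unanswered requests by eye is tedious, so a per-peer tally helps. The following sketch counts backfill requests, responses, and error-level events per remote node and keeps only peers with more requests than responses or with any errors. Its assumptions: each diagnostic entry carries component, phase, level, and source_node_id keys (matching the filter names above), source_node_id identifies the remote peer, and "response" is the phase value for replies. The function name is hypothetical.

```python
from collections import defaultdict


def backfill_gaps(entries: list) -> dict:
    """Per remote node, count backfill requests, responses, and error-level events."""
    stats = defaultdict(lambda: {"request": 0, "response": 0, "error": 0})
    for e in entries:
        if e.get("component") != "backfill":
            continue  # harmless if the query already filtered by component
        node = e.get("source_node_id", "unknown")
        phase = e.get("phase")
        if phase in ("request", "response"):  # assumed phase values
            stats[node][phase] += 1
        if e.get("level") == "error":
            stats[node]["error"] += 1
    # Keep only peers that look suspicious.
    return {node: s for node, s in stats.items()
            if s["request"] > s["response"] or s["error"] > 0}
```

Pass it the decoded .diagnostics array from the query above. Peers it returns are the ones to check with wattswarm log head.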
Slow finalization
If tasks are taking longer than expected to finalize, check the resolution path breakdown in the run result:
```bash
curl 'http://127.0.0.1:7788/api/run/result?run_id=<run-id>' | jq '.result.aggregation.resolution_paths'
```
The resolution_paths field shows which aggregation steps were taken (e.g. TIE, REEXPLORE, STOCHASTIC). Cross-reference it with .result.aggregation.null_resolution to see whether null-resolution paths were triggered. For latency profiling, use the runtime metrics endpoint, which reports p95 latency computed from the actual collected samples.
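When comparing that p95 figure with your own measurements, it helps to know what "p95 from samples" means. The sketch below uses the nearest-rank definition, which is one common choice; the metrics endpoint may use a different one (for example, linear interpolation), so small differences on small sample sets are expected. The function name is hypothetical.

```python
import math


def percentile_nearest_rank(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; computing p * n first keeps integer inputs exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

With 20 samples, p95 is the 19th smallest value, so one or two slow outliers barely move it. Scan the raw samples if you suspect a handful of pathological runs.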
Sync issues after reconnect
If a node reconnects after a partition and events are still missing, look for gossip backfill events with error payloads in the diagnostics log:
```bash
curl 'http://127.0.0.1:7788/api/diagnostics?component=gossip&level=error' | jq '.diagnostics'
```
Backfill is lane-aware and persists cursor state across reconnects, so a successfully reconnected peer should resume from where it left off. Error payloads here indicate the remote peer rejected or could not serve the backfill range — check whether the remote node’s event log covers the requested sequence range.
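When several peers are involved, tallying error events by source node shows which remote node’s event log to check first. This sketch assumes each diagnostic entry carries level and source_node_id keys, matching the filter names documented above. The function name is hypothetical.

```python
from collections import Counter


def error_sources(entries: list) -> list:
    """Rank remote nodes by how many error-level events they appear in."""
    counts = Counter(
        e.get("source_node_id", "unknown")
        for e in entries
        if e.get("level") == "error"
    )
    # most_common() sorts by count; ties keep first-seen order.
    return counts.most_common()
```

Feed it the decoded .diagnostics array from the gossip error query, then compare the top node’s wattswarm log head output against the sequence range being requested.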