Resource Monitoring
Trailer.dev collects resource-usage metrics on each agent host and reports them to the server, where they are stored as time series and rendered as charts in the web UI. Three families of metrics are collected: host, per-workspace, and GPU.
What is collected
Host metrics
Gathered on the agent host itself:
- Number of host CPUs.
- Host CPU usage percentage.
- Host memory: total, free, and used (in bytes).
- Disk IO: bytes read and written.
- Network: bytes received and transmitted.
- Process count.
Workspace metrics
Gathered per running container, one entry per workspace:
- Workspace identity (id and name).
- Container CPU usage percentage.
- Container memory used.
- Disk IO: bytes read and written.
- Network: bytes received and transmitted.
- Process count.
GPU metrics
Gathered per GPU (see GPU and NVIDIA acceleration):
- GPU identity (uuid and name).
- GPU usage percentage.
- GPU memory: total, free, and used.
- Temperature.
- Power draw.
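The three families listed above can be pictured as a small set of record types. This is only an illustrative sketch: every class and field name is hypothetical, and the units chosen for GPU memory (bytes), temperature (degrees Celsius), and power (watts) are assumptions, since this page does not state them.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class HostMetrics:
    cpu_count: int
    cpu_percent: float
    memory_total_bytes: int
    memory_free_bytes: int
    memory_used_bytes: int
    disk_read_bytes: int
    disk_write_bytes: int
    net_rx_bytes: int
    net_tx_bytes: int
    process_count: int


@dataclass
class WorkspaceMetrics:
    workspace_id: str
    workspace_name: str
    cpu_percent: float
    memory_used_bytes: int
    disk_read_bytes: int
    disk_write_bytes: int
    net_rx_bytes: int
    net_tx_bytes: int
    process_count: int


@dataclass
class GpuMetrics:
    uuid: str
    name: str
    usage_percent: float
    memory_total_bytes: int   # unit is an assumption
    memory_free_bytes: int    # unit is an assumption
    memory_used_bytes: int    # unit is an assumption
    temperature_c: float      # unit is an assumption
    power_draw_w: float       # unit is an assumption


@dataclass
class MetricsPayload:
    """One sample of all three families; any family may be absent."""
    host: Optional[HostMetrics] = None
    workspaces: List[WorkspaceMetrics] = field(default_factory=list)
    gpus: List[GpuMetrics] = field(default_factory=list)
```

A payload with no GPUs simply carries an empty `gpus` list, and `asdict(payload)` turns the whole structure into plain dictionaries for serialization.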
How metrics flow
Metrics ride along with the agent's regular heartbeat. There is no separate metrics channel.
flowchart LR
    A[Agent host] -->|collect host, workspace, GPU metrics| B[Heartbeat]
    B --> C[Server]
    C -->|store time series| D[(Metrics store)]
    C -->|latest snapshot| E[(Host record)]
    F[Web UI] -->|query + realtime| C
- On each heartbeat, the agent attaches host, workspace, and GPU metrics to the request when collection is enabled and possible.
- The server writes the latest snapshot onto the host record and, when the matching per-host collection toggle is set, stores the data points as time series.
- The heartbeat response returns the host's current settings back to the agent, including the heartbeat interval and the three collection toggles. Toggling collection in the UI takes effect on the next heartbeat.
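The round trip described above can be sketched roughly as follows. This is a simplified in-memory model, not the actual server or agent code: `Server`, `HostRecord`, `build_heartbeat`, the field names, and the 10-second default interval are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

FAMILIES = ("host", "workspace", "gpu")


@dataclass
class HostRecord:
    heartbeat_interval_s: int = 10  # hypothetical default
    collect: Dict[str, bool] = field(
        default_factory=lambda: {f: True for f in FAMILIES})
    latest: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Server:
    hosts: Dict[str, HostRecord] = field(default_factory=dict)
    # Each stored point is (host_id, family, timestamp, data).
    series: List[Tuple[str, str, float, Any]] = field(default_factory=list)

    def handle_heartbeat(self, host_id: str, metrics: Dict[str, Any],
                         now: float) -> Dict[str, Any]:
        record = self.hosts.setdefault(host_id, HostRecord())
        for family in FAMILIES:
            data = metrics.get(family)
            if data is None:
                continue
            # The latest snapshot always lands on the host record.
            record.latest[family] = data
            # Time series are stored only when the family's toggle is set.
            if record.collect[family]:
                self.series.append((host_id, family, now, data))
        # The response echoes the host's current settings to the agent.
        return {"heartbeat_interval_s": record.heartbeat_interval_s,
                "collect": dict(record.collect)}


def build_heartbeat(collect: Dict[str, bool],
                    collectors: Dict[str, Callable[[], Optional[Any]]]
                    ) -> Dict[str, Any]:
    """Agent side: attach a family only if enabled and collectable."""
    metrics: Dict[str, Any] = {}
    for family in FAMILIES:
        if not collect.get(family, False):
            continue
        sample = collectors[family]()  # returns None if not possible
        if sample is not None:
            metrics[family] = sample
    return metrics
```

Because the agent rebuilds its request from the settings returned by the previous response, a toggle changed in the UI is picked up one heartbeat later, which matches the behavior described above.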
Collection toggles and retention
Collection is controlled per host by three toggles, one each for host, workspace, and GPU metrics. They are exposed in the Metrics Collection section of the host details page.
Each metric family has its own retention window in days, configurable per host. A background task periodically deletes data points older than the per-host retention cutoff.
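As a rough illustration of the cleanup step, the sketch below drops points older than a per-host, per-family cutoff. The function name, the tuple layout, and the `retention_days` mapping are hypothetical; the real task presumably runs as a delete against the metrics store rather than filtering a list in memory.

```python
from typing import Dict, List, Tuple

SECONDS_PER_DAY = 86_400

# A stored point, reduced to what pruning needs: (host_id, family, timestamp).
Point = Tuple[str, str, float]


def prune(points: List[Point],
          retention_days: Dict[str, Dict[str, int]],
          now: float) -> List[Point]:
    """Keep only points newer than their host's cutoff for that family."""
    kept: List[Point] = []
    for host_id, family, ts in points:
        days = retention_days.get(host_id, {}).get(family)
        # Points with no configured window are kept in this sketch.
        if days is not None and ts < now - days * SECONDS_PER_DAY:
            continue
        kept.append((host_id, family, ts))
    return kept
```

With a 7-day window for host metrics and a 1-day window for GPU metrics on the same host, a two-day-old host point survives while a two-day-old GPU point is removed.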
Where to view metrics
Host monitor page
The host monitor page is the main view. It renders:
- Host charts: CPU, memory, IO, and network.
- GPU charts: utilization, memory, temperature, and power. Shown only when the host has GPUs.
- Workspace charts: per-workspace usage across the host.
A header summary shows the host's live CPU percentage and a used/total memory bar. Each chart group has a time-range selector with presets (5m, 15m, 1h, 6h, 24h, 48h, 7d) and a custom range. Longer ranges are downsampled to a target number of points; the field used for downsampling (for example CPU vs memory) is selectable in the chart toolbar.
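This page does not say which downsampling algorithm is used. As one plausible illustration of why a field has to be chosen, the sketch below splits the series into `target` buckets and keeps, from each bucket, the point with the highest value of the selected field, so peaks in that field survive the reduction. The function name and the dictionary-based point layout are hypothetical.

```python
from typing import Dict, List


def downsample(points: List[Dict[str, float]], target: int,
               field: str) -> List[Dict[str, float]]:
    """Reduce time-ordered points to at most `target` points.

    Keeps one whole point per bucket: the one where `field` peaks.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    n = len(points)
    if n <= target:
        return list(points)
    result = []
    for i in range(target):
        # Integer bucket bounds; every bucket is non-empty when n > target.
        start = i * n // target
        end = (i + 1) * n // target
        bucket = points[start:end]
        result.append(max(bucket, key=lambda p: p[field]))
    return result
```

Selecting a different field can keep different points from the same buckets, which is why the choice affects how the other lines in the chart look after reduction.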
Charts refetch on an interval equal to the host's heartbeat interval, so new data appears roughly as fast as the agent reports it. The page also subscribes to realtime updates on the host record, which keeps the header summary current.
When the host is offline, the page shows a banner and dims the charts.
Per-workspace usage
On the workspaces list, the resources cell exposes a popover with sparkline charts for that single workspace (CPU, memory, and GPU when applicable) over a recent window. It refreshes on the host heartbeat interval.
GPU snapshot
The host details page has a GPU section showing a live snapshot from the NVIDIA driver: driver version, CUDA version, and per-GPU details. This is the current state rather than the historical time series shown on the monitor page.