Resource Monitoring

[Figure: the host monitor page, showing host CPU, memory, IO, and network charts with the time-range selector, and the per-workspace charts below.]

Trailer.dev collects resource-usage metrics on each agent host and reports them to the server, where they are stored as time series and rendered as charts in the web UI. Three families of metrics are collected: host, per-workspace, and GPU.

Gathered on the agent host itself:

  • Number of host CPUs.
  • Host CPU usage percentage.
  • Host memory: total, free, and used (in bytes).
  • Disk IO: bytes read and written.
  • Network: bytes received and transmitted.
  • Process count.

Gathered per running container, one entry per workspace:

  • Workspace identity (id and name).
  • Container CPU usage percentage.
  • Container memory used.
  • Disk IO: bytes read and written.
  • Network: bytes received and transmitted.
  • Process count.

Gathered per GPU (see GPU and NVIDIA acceleration):

  • GPU identity (uuid and name).
  • GPU usage percentage.
  • GPU memory: total, free, and used.
  • Temperature.
  • Power draw.
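The three families can be pictured as plain records. The sketch below is illustrative only: every class and field name is hypothetical, not Trailer.dev's actual schema, and the units for container memory, GPU memory, temperature, and power are left unspecified because the lists above do not state them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HostMetrics:
    """Gathered on the agent host itself (hypothetical field names)."""
    cpu_count: int
    cpu_percent: float
    mem_total_bytes: int
    mem_free_bytes: int
    mem_used_bytes: int
    disk_read_bytes: int
    disk_written_bytes: int
    net_rx_bytes: int
    net_tx_bytes: int
    process_count: int


@dataclass
class WorkspaceMetrics:
    """One entry per running container, keyed by workspace identity."""
    workspace_id: str
    workspace_name: str
    cpu_percent: float
    mem_used: int  # unit not stated in the text above
    disk_read_bytes: int
    disk_written_bytes: int
    net_rx_bytes: int
    net_tx_bytes: int
    process_count: int


@dataclass
class GpuMetrics:
    """One entry per GPU, keyed by GPU identity."""
    uuid: str
    name: str
    usage_percent: float
    mem_total: int  # unit not stated in the text above
    mem_free: int
    mem_used: int
    temperature: float  # unit not stated in the text above
    power_draw: float  # unit not stated in the text above


@dataclass
class HeartbeatMetrics:
    """Container for whichever families were collected (assumed shape)."""
    host: Optional[HostMetrics] = None
    workspaces: List[WorkspaceMetrics] = field(default_factory=list)
    gpus: List[GpuMetrics] = field(default_factory=list)
```

Making each family optional in the container mirrors the fact that a host may have no GPUs or may have a family's collection switched off.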

Metrics ride along with the agent’s regular heartbeat. There is no separate metrics channel.

flowchart LR
  A[Agent host] -->|collect host, workspace, GPU metrics| B[Heartbeat]
  B --> C[Server]
  C -->|store time series| D[(Metrics store)]
  C -->|latest snapshot| E[(Host record)]
  F[Web UI] -->|query + realtime| C

  1. On each heartbeat, the agent attaches host, workspace, and GPU metrics to the request, each only when its collection is enabled and the metrics can actually be gathered on that host.
  2. The server writes the latest snapshot onto the host record and, when the matching per-host collection toggle is set, stores the data points as time series.
  3. The heartbeat response returns the host’s current settings to the agent, including the heartbeat interval and the three collection toggles. A toggle changed in the UI therefore takes effect on the next heartbeat.
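The three steps above can be sketched as a small in-memory model. This is not the actual agent or server code: `build_heartbeat`, `handle_heartbeat`, `HostRecord`, `HostSettings`, the dictionary payload shape, and the 30-second default interval are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

FAMILIES = ("host", "workspace", "gpu")


@dataclass
class HostSettings:
    heartbeat_interval_s: int = 30  # hypothetical default
    collect: Dict[str, bool] = field(
        default_factory=lambda: {family: True for family in FAMILIES}
    )


@dataclass
class HostRecord:
    settings: HostSettings = field(default_factory=HostSettings)
    latest: Dict[str, Any] = field(default_factory=dict)


def build_heartbeat(collect: Dict[str, bool],
                    collectors: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Agent side: attach a family only when enabled and collectable."""
    payload: Dict[str, Any] = {}
    for family in FAMILIES:
        if not collect.get(family, False):
            continue
        data = collectors[family]()  # None models "collection not possible"
        if data is not None:
            payload[family] = data
    return payload


def handle_heartbeat(record: HostRecord,
                     store: List[Tuple[int, str, Any]],
                     timestamp: int,
                     metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Server side: update the snapshot, store series, return settings."""
    for family in FAMILIES:
        if family not in metrics:
            continue
        # The latest snapshot always lands on the host record.
        record.latest[family] = metrics[family]
        # Time series are stored only when the matching toggle is set.
        if record.settings.collect[family]:
            store.append((timestamp, family, metrics[family]))
    # The response carries the current settings back to the agent.
    return {
        "heartbeat_interval_s": record.settings.heartbeat_interval_s,
        "collect": dict(record.settings.collect),
    }
```

In this model, flipping `record.settings.collect["workspace"]` to `False` between two heartbeats means the next response tells the agent to stop attaching workspace metrics, which matches the "takes effect on the next heartbeat" behavior described above.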

Collection is controlled per host by three toggles, one each for host, workspace, and GPU metrics. They are exposed in the Metrics Collection section of the host details page.

Each metric family has its own retention window in days, configurable per host. A background task periodically deletes data points older than the per-host retention cutoff.
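A minimal sketch of what such a cleanup pass might do for one host, assuming each stored point carries a `family` label and a `ts` timestamp. The function name, the point shape, and the example retention values are hypothetical; the real background task and its storage layer are not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def prune_points(points: List[dict],
                 retention_days: Dict[str, int],
                 now: datetime) -> List[dict]:
    """Keep only points at or after their family's retention cutoff."""
    kept = []
    for point in points:
        cutoff = now - timedelta(days=retention_days[point["family"]])
        if point["ts"] >= cutoff:
            kept.append(point)
    return kept


# Example: host metrics kept for 7 days, GPU metrics for 1 day.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
points = [
    {"family": "host", "ts": now - timedelta(days=8)},   # dropped
    {"family": "host", "ts": now - timedelta(days=6)},   # kept
    {"family": "gpu", "ts": now - timedelta(days=2)},    # dropped
    {"family": "gpu", "ts": now - timedelta(hours=12)},  # kept
]
kept = prune_points(points, {"host": 7, "workspace": 7, "gpu": 1}, now)
```

Because the cutoff is computed per family, a short GPU window and a long host window can coexist on the same host.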

The host monitor page is the main view. It renders:

  • Host charts: CPU, memory, IO, and network.
  • GPU charts: utilization, memory, temperature, and power. Shown only when the host has GPUs.
  • Workspace charts: per-workspace usage across the host.

A header summary shows the host’s live CPU percentage and a used/total memory bar. Each chart group has a time-range selector with presets (5m, 15m, 1h, 6h, 24h, 48h, 7d) and a custom range. Longer ranges are downsampled to a target number of points; the field used for downsampling (for example CPU vs memory) is selectable in the chart toolbar.
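The downsampling algorithm itself is not specified above. One common approach that fits the description is to split the series into as many buckets as the target point count and keep, from each bucket, the point with the largest value of the selected field. The sketch below assumes that approach; `downsample`, `PRESET_SECONDS`, and the point shape are hypothetical names.

```python
from typing import Dict, List

# Preset ranges from the selector, expressed in seconds.
PRESET_SECONDS: Dict[str, int] = {
    "5m": 5 * 60,
    "15m": 15 * 60,
    "1h": 60 * 60,
    "6h": 6 * 60 * 60,
    "24h": 24 * 60 * 60,
    "48h": 48 * 60 * 60,
    "7d": 7 * 24 * 60 * 60,
}


def downsample(points: List[dict], target: int, field: str) -> List[dict]:
    """Reduce points to at most `target` entries, keeping each bucket's
    maximum of `field` (an assumed strategy, not the documented one)."""
    if target <= 0:
        raise ValueError("target must be positive")
    n = len(points)
    if n <= target:
        return list(points)
    result = []
    for i in range(target):
        # Integer bucket boundaries; every bucket is non-empty when n > target.
        start = i * n // target
        end = (i + 1) * n // target
        bucket = points[start:end]
        result.append(max(bucket, key=lambda p: p[field]))
    return result


series = [{"t": i, "cpu": i % 5, "mem": 10 - i} for i in range(10)]
by_cpu = downsample(series, 2, "cpu")
by_mem = downsample(series, 2, "mem")
```

Here `by_cpu` keeps the points at `t=4` and `t=9`, while `by_mem` keeps `t=0` and `t=5`, which shows why the choice of field changes which spikes survive in the chart.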

Charts refetch on an interval equal to the host’s heartbeat interval, so new data appears roughly as fast as the agent reports it. The page also subscribes to realtime updates on the host record, which keeps the header summary current.

When the host is offline, the page shows a banner and dims the charts.

On the workspaces list, the resources cell exposes a popover with sparkline charts for that single workspace (CPU, memory, and GPU when applicable) over a recent window. It refreshes on the host heartbeat interval.

The host details page has a GPU section showing a live snapshot from the NVIDIA driver: driver version, CUDA version, and per-GPU details. This is the current state rather than the historical time series shown on the monitor page.