Releases

Changelog and version history for the Tropic platform.

v1.37.7 2026-05-30
  • Follow-up to v1.37.6 telemetry fix: skip the OpenClaw-CLI blocks in `applyFirstStartConfig` for NanoClaw containers. NanoClaw runs a different runtime and doesn't ship the `openclaw` CLI; previously the telemetry-heal and memory-swap blocks would emit `openclaw: command not found` warnings on NanoClaw containers during resync. The typed-hook gate only applies to standard OpenClaw containers anyway.
Fixed: Gate the telemetry-heal and memory-swap blocks in `ContaboInstanceService.applyFirstStartConfig` on `inst.runtime !== 'nanoclaw'`. Standard OpenClaw containers continue to run both blocks as before.
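A minimal sketch of the gate described above. Only the `runtime` field and the `'nanoclaw'` value come from the note; the `InstanceLike` shape, the function name, and the `'openclaw'` runtime value used in the usage line are assumptions for illustration.

```typescript
// Hypothetical minimal shape; the real instance row carries many more fields.
interface InstanceLike {
  runtime: string;
}

// True when the OpenClaw-CLI blocks (telemetry-heal, memory-swap) should run.
// NanoClaw containers do not ship the `openclaw` CLI, so they are skipped.
function shouldRunOpenclawCliBlocks(inst: InstanceLike): boolean {
  return inst.runtime !== "nanoclaw";
}

// Usage: 'openclaw' is an assumed value standing in for the standard runtime.
const runsOnStandard = shouldRunOpenclawCliBlocks({ runtime: "openclaw" });
const runsOnNano = shouldRunOpenclawCliBlocks({ runtime: "nanoclaw" });
```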
v1.37.6 2026-05-30
  • Telemetry pipeline self-heal. `oc.llm.usage` and `oc.model.cost` events have been silently dropped on every Community VM (and any Cloud VM provisioned before 2026-05-17) since 2026-05-16, when OpenClaw 2026.4.23 rolled out and started enforcing typed-hook gating. The metrics dashboard read from those exact events, which is why Community instances looked empty. Fixed the image bake and added a runtime self-heal so existing containers come back into telemetry on the next restart.
Fixed: Image bake (`api/packer/scripts/lib/install-tropic-plugins.sh`) now sets `plugins.entries.tropic-telemetry.hooks.allowConversationAccess = true` (and the same for sondera and policy-check, for consistency). OpenClaw ≥ 2026.4.23 silently drops typed hooks like `llm_output`, `agent_end`, and `before_agent_reply` for non-bundled plugins without this flag. That is why `tropic-telemetry`'s `llm_output` handler stopped firing on 2026-05-16, taking `oc.llm.usage` / `oc.model.cost` with it.
Added: Runtime self-heal in `ContaboInstanceService.applyFirstStartConfig`. Runs three `openclaw config set …allowConversationAccess true` commands (tropic-telemetry, sondera, policy-check) before `syncAllToInstance`, so the gateway restart that step triggers also picks up the config write. Containers built before the image fix come back into telemetry on the next `restart-with-latest` or `resync-config`. Failure is logged but not fatal.
v1.37.5 2026-05-30
  • Policy-check classifier no longer waits 10s per call when the internal Ollama proxy is down. Added a cached health probe + dropped the system Anthropic Haiku fallback in favour of routing through whichever classifier-capable key the org has configured (Gemini Flash, their own Anthropic key, or OpenRouter Haiku). Diagnosed off Community VM 54 which was spending ~33 seconds per "hello" turn on classifier waits.
Changed: `OLLAMA_TIMEOUT_MS` dropped from 10s to 2s. With the health-check cache (next item) we usually skip the call entirely while Ollama is unhealthy, so this only fires when the probe thinks Ollama is up but `/api/chat` itself hangs.
Added: Ollama health probe at `/api/tags` with a 1.5s timeout, cached for 60s at module level. A failed real classify call also marks Ollama unhealthy in the cache, so a single live failure trips the breaker without waiting for the next probe.
Removed: System `CLAUDE_API_KEY` fallback. Per product decision the classifier should not run on Tropic's billing account when the org has no applicable key; better to fail open and surface the gap.
Added: Org-key fallback ladder: Gemini Flash → org's own Anthropic Haiku → OpenRouter `anthropic/claude-haiku-4.5`. Picked from `OrgSecret` rows via `CredentialsService.decrypt`. If the org has none of these, classify fails open with a clear warn log naming the org id.
Changed: `classify` and `classifyIntent` now take an `organizationId` parameter. `PolicyCheckController` looks it up from the `ManagedInstance` row resolved by the telemetry token and passes it through.
Changed: `PolicyCheckModule` now imports `CredentialsModule`. `PolicyCheckService` constructor takes `PrismaService` + `CredentialsService` so it can decrypt org secrets at classify time.
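The health cache and the fallback ladder in this release can be sketched roughly as follows. The 60s cache lifetime and the ladder order come from the notes above; the class, the function, and the provider identifiers are hypothetical names, and the actual HTTP probe against `/api/tags` is left out so the sketch stays synchronous.

```typescript
// Hypothetical provider identifiers; only the ladder order comes from the notes.
type Provider = "gemini-flash" | "anthropic-haiku" | "openrouter-haiku";

const HEALTH_TTL_MS = 60_000; // probe result is cached for 60s

class OllamaHealthCache {
  private healthy: boolean | null = null;
  private checkedAt = 0;

  // Cached verdict, or null when the cache is empty or stale and a new probe is needed.
  get(now: number): boolean | null {
    if (this.healthy === null || now - this.checkedAt >= HEALTH_TTL_MS) return null;
    return this.healthy;
  }

  // Store a probe result. A failed live classify call would also call record(false, now),
  // which trips the breaker without waiting for the next probe.
  record(healthy: boolean, now: number): void {
    this.healthy = healthy;
    this.checkedAt = now;
  }
}

// Walk the ladder in order and return the first provider the org has a key for.
// A null result means the caller fails open and logs a warning naming the org id.
function pickFallback(orgProviders: Provider[]): Provider | null {
  const ladder: Provider[] = ["gemini-flash", "anthropic-haiku", "openrouter-haiku"];
  for (const candidate of ladder) {
    if (orgProviders.indexOf(candidate) !== -1) return candidate;
  }
  return null;
}
```

Passing the clock in as `now` is a design choice for the sketch only: it keeps the expiry logic easy to exercise without real timers.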
v1.37.4 2026-05-29
  • Community gets the same Tropic RAG memory toggle Cloud has — but as an opt-OUT, since the Contabo image bakes tropic-memory in as the memory slot owner by default. Unchecking now actually swaps the backend to memory-core via the openclaw CLI inside the container during applyFirstStartConfig.
Added: Community launch form: real `Tropic RAG memory` checkbox (`communityUseTropicMemory` state, defaults to true). Replaces the static info card from v1.37.3 that just told you it was always on.
Added: `CreateContaboInstanceDto.useTropicMemory` (optional boolean, `class-validator`-validated). `ContaboInstanceService.provisionForUser` persists it to `ManagedInstance.useTropicMemory` (column already existed for the Cloud path).
Added: `applyFirstStartConfig` swap path: when `inst.useTropicMemory === false`, runs `openclaw plugins disable tropic-memory || true; openclaw plugins enable memory-core; openclaw config set plugins.slots.memory memory-core` inside the container via the existing `instanceExec.runShell` plumbing. Runs BEFORE `syncAllToInstance` so the gateway restart that step triggers covers the memory swap too (no extra restart).
Changed: CLI-only swap, per the project rule against jq-ing openclaw.json at runtime. Failure is logged but not fatal; provisioning continues.
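As an illustration of the swap path, the sketch below builds the shell command quoted above only when the toggle is explicitly `false`. The function name and the `InstanceLike` shape are assumptions; the command text is taken verbatim from the note.

```typescript
// Hypothetical minimal shape; only `useTropicMemory` comes from the note.
interface InstanceLike {
  useTropicMemory: boolean | null;
}

// Returns the command to run inside the container, or null when no swap is needed.
// Only an explicit `false` triggers the swap; true and null keep tropic-memory.
function memorySwapCommand(inst: InstanceLike): string | null {
  if (inst.useTropicMemory !== false) return null;
  return [
    "openclaw plugins disable tropic-memory || true", // tolerate an already-disabled plugin
    "openclaw plugins enable memory-core",
    "openclaw config set plugins.slots.memory memory-core",
  ].join("; ");
}
```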
v1.37.3 2026-05-29
  • Launch modal polish: every dropdown is now the project-standard `DropdownSelect` (no more macOS-native menu jumping out of a Tailwind modal), Community and Cloud share the same Runtime-then-Region order, and the OpenClaw Version selector is gone from Cloud (auto-update picks the rolled-out tag anyway).
Changed: The AI Model picker (introduced in v1.37.2) and the AWS Region picker on the Cloud form both switched from native `<select>` to `DropdownSelect`. Native selects render the OS menu, which sits above the modal in a different style and looks broken next to the Tailwind controls around it.
Changed: Field order in Community now matches Cloud: AI Model → Runtime → Region. Previously Community had Region first, which made the two tiers feel inconsistent when you switched between cards.
Removed: OpenClaw Version dropdown from the Cloud launch form. With the auto-update path (Phase 1-2 of the OpenClaw auto-update workstream) the rolled-out tag is selected centrally; letting each launch pick a version was off-trend with where provisioning is heading and added a field most users never touched.
v1.37.2 2026-05-29
  • Launch modal: replaced the per-provider card stack on the Cloud form with a single AI Model dropdown listing every model the user has actually configured (provider API key in Secrets, or ChatGPT OAuth). Same dropdown now also appears in the Community launch form so both tiers feel identical.
Changed: Cloud launch form: dropped the 3-card `CloudModelPicker` (one tile per provider with an inline dropdown when the provider had multiple models). Replaced with `ConfiguredModelDropdown`: one `<select>` whose entries are `Provider Name — Model Name` for every enabled model the user has the env var key for, plus the OAuth (`openai-codex/*`) entries when `OPENAI_OAUTH_CONNECTED` is set. Empty state (no keys, no OAuth) renders the existing `Open Secrets →` link.
Added: Community launch form: new AI Model dropdown (`ConfiguredModelDropdown`) before Region + Runtime. Previously the form had no model selector at all; Community instances silently picked from `orgSetting.openclawModel` or `pickBestModel()`. Now the user gets the same explicit picker the Cloud form has.
Changed: `POST /contabo/instances` accepts an optional `model` field (validated as a string in `CreateContaboInstanceDto`); `ContaboInstanceService.provisionForUser` resolves selectedModel in this order: explicit `model` arg → `orgSetting.openclawModel` → `pickBestModel(orgSecrets)`. Without the explicit `model` arg, the launch modal's choice was being silently discarded.
Removed: Per-provider tile UI (`CloudModelPicker` function, ~80 LOC), no longer referenced.
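The resolution order for the selected model can be sketched as a simple fallback chain. The function name and parameter names are hypothetical, the `pickBest` callback stands in for `pickBestModel(orgSecrets)`, and treating an empty string as "not set" is an assumption.

```typescript
// Resolve the model in the documented order:
// explicit request value, then the org setting, then the best available model.
function resolveSelectedModel(
  explicitModel: string | undefined,
  orgSettingModel: string | null | undefined,
  pickBest: () => string,
): string {
  // `||` skips undefined, null, and empty strings (the last one is an assumption).
  return explicitModel || orgSettingModel || pickBest();
}

// Usage with placeholder model ids (not real model names):
const fromModal = resolveSelectedModel("provider-a/model-x", "provider-b/model-y", () => "provider-c/model-z");
const fromOrg = resolveSelectedModel(undefined, "provider-b/model-y", () => "provider-c/model-z");
const fromBest = resolveSelectedModel(undefined, null, () => "provider-c/model-z");
```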
v1.37.1 2026-05-29
  • Three fixes against Phase 2 of the OpenClaw auto-update path, all verified end-to-end on a real provision in ap-southeast-1: SSH access to Docker-EC2 boxes now works, container nginx no longer races a redundant host nginx for port 80, and the CI notify step succeeds for every release tag instead of dying on an IAM permission gap.
Fixed: SSH key plumbing on Docker EC2. The bootstrap deletes the stock `ubuntu` user (to give uid 1000 to `tropic`), which threw away the `/home/ubuntu/.ssh/authorized_keys` file AWS populates from the launch KeyName. The box ended up unreachable over SSH. Fix: read `authorized_keys` into a shell variable before `userdel`, write it back to `/home/tropic/.ssh/authorized_keys` after `useradd`, with `chmod 700`/`chmod 600` and `chown tropic:tropic`. Confirmed working: `ssh tropic@<ec2-ip>` now succeeds with `~/tropic-prod.pem`.
Fixed: Host vs container nginx racing for port 80. The container runs `--network host`, which means its internal nginx (serving the entrypoint's `/files` routing) wants the host's `0.0.0.0:80`. The previous bootstrap also installed nginx on the host with a `gateway` site enabled, and whichever started second silently lost; `[entrypoint] nginx failed to start; /files routing unavailable, gateway still up on :18789` was the smoking gun. Fix: drop nginx from the host package install + drop the host `/etc/nginx/sites-available/gateway` write + drop the `systemctl restart nginx` from the bootstrap. The container's nginx now owns port 80 cleanly. Confirmed: `ss -tlnp` shows only the container's nginx PID holding `:80`.
Fixed: CI workflow notify step. `aws ecr describe-images` was called to resolve the just-pushed image digest, but the CI's `tropic-ecr-push` IAM user is push-only and lacks `ecr:DescribeImages`. Every notify call exited non-zero, so no `OpenclawImage` row was inserted after v1.37.0 shipped. Fix: give the `docker/build-push-action@v5` step an `id` and read `${{ steps.<id>.outputs.digest }}` directly. The build action already knows the digest it pushed; no AWS call needed, no IAM change needed. Also: the `TROPIC_API_URL` GitHub secret was set without the `/api` global prefix that the NestJS app uses (`POST tropic-api.fly.dev/admin/openclaw-images` returns 404). Updated the secret to `https://tropic-api.fly.dev/api`. Verified end-to-end: a manual `gh workflow run` against the patched workflow inserted the expected row with the correct digest.
v1.37.0 2026-05-29
  • Phase 2 of OpenClaw auto-update: a new flag-gated EC2 provisioning path that boots stock Ubuntu 24.04 Noble, installs Docker via UserData, pulls the latest soaked `tropic-openclaw` image from ECR, and runs OpenClaw inside a container. The legacy custom-AMI path is unchanged and still the default; `USE_DOCKER_EC2=true` flips a new EC2 onto the Docker path. Plugin parity with the AMI path (sondera, policy-check, tropic-telemetry, tropic-memory) is a follow-up before flipping the flag globally.
Added: `OpenclawImagesService.getLatestCompletedTag()`: returns the newest image with `status=complete`, or null. Used by the Docker provisioning path to pick which tag a fresh EC2 should pull. Picking "latest complete" means new boxes converge to where the rest of the fleet already is, rather than racing onto a still-soaking release.
Added: `api/src/vm/docker-userdata.ts`: pure builder for the bash UserData script. Installs Docker + AWS CLI v2, creates the `tropic` user, writes `/etc/tropic/environment` + the openclaw.json into a persistent volume at `/var/lib/tropic/openclaw`, authenticates Docker to ECR via the instance role, pulls the explicit tag (never `:latest`), and writes a systemd unit wrapping `docker run --network host --rm`. 10 unit tests covering quoting, env var inclusion, and image reference shape.
Added: `ensureSSMRole()`: idempotently attaches `AmazonEC2ContainerRegistryReadOnly` to the existing `TropicVMSSMRole` so EC2 instances can `docker pull` from ECR using their instance role. `AttachRolePolicy` on an already-attached policy returns 200, so this is safe to run every provision; legacy AMI instances simply ignore the extra grant.
Added: `resolveUbuntuNobleAmi()`: picks the latest Canonical-published Noble 24.04 AMI for the region + architecture. Phase-2+ Docker EC2s use this instead of the Tropic custom AMI. (Bootstrap script's docker apt source references `noble`, so we can't fall back to Jammy.)
Changed: `provisionVmAsync()` branches on `USE_DOCKER_EC2` env var. When `true`, uses Noble AMI + Docker UserData; otherwise the legacy AMI path. Flag is off by default; production stays on AMI until we flip it after Phase 2 soak.
Changed: `AdminModule` exports `OpenclawImagesService`; `VmModule` forwardRef-imports `AdminModule` so `VmProvisioningService` can consume it without a circular dependency. DI graph verified at runtime.
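The "latest complete" selection can be sketched over an in-memory list. The `ImageRow` shape and the `builtAt` field used to decide what "newest" means are assumptions; the real service reads its rows from the database. The status values are the ones these notes list for the admin page.

```typescript
// Hypothetical row shape; `builtAt` (epoch ms) is an assumed ordering field.
interface ImageRow {
  tag: string;
  status: "soaking" | "rolling_out" | "complete" | "halted";
  builtAt: number;
}

// Return the tag of the newest image whose status is "complete", or null if none exists.
function latestCompletedTag(rows: ImageRow[]): string | null {
  let best: ImageRow | null = null;
  for (const row of rows) {
    if (row.status !== "complete") continue; // skip soaking, rolling out, and halted images
    if (best === null || row.builtAt > best.builtAt) best = row;
  }
  return best === null ? null : best.tag;
}

// Usage with placeholder tags: the newest row is still soaking, so it is not chosen.
const chosenTag = latestCompletedTag([
  { tag: "tag-a", status: "complete", builtAt: 100 },
  { tag: "tag-b", status: "complete", builtAt: 200 },
  { tag: "tag-c", status: "soaking", builtAt: 300 },
]);
```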
v1.36.0 2026-05-29
  • Phase 1 of OpenClaw auto-update: every CI image publish now records a row in the new `openclaw_images` table via `POST /admin/openclaw-images`. A read-only admin page at `/admin/openclaw-updates` lists published images with their soak countdown. No instances are touched yet — Phase 2 (EC2 Docker-on-Ubuntu provisioning), Phase 3 (unified update entrypoint), and Phase 4 (scheduler + staggered rollout worker) land in subsequent releases.
Added: Prisma tables `openclaw_images` and `instance_update_attempts`, plus observability columns `current_image_tag` and `last_update_attempt_id` on `managed_instances`. Foundation for the auto-update rollout/retry machinery described in `docs/superpowers/specs/2026-05-28-openclaw-auto-update-design.md`.
Added: NestJS module `api/src/admin/openclaw-images.{service,controller}.ts` exposing idempotent `POST /admin/openclaw-images` (CI publish hook), `GET /admin/openclaw-images` (list), and `GET /admin/openclaw-images/:tag` (detail). All routes guarded by `@SuperAdmin()`; idempotent POST avoids 409 on CI retry.
Added: CI workflow `build-openclaw-image.yml` gains a "Notify Tropic API" step after the ECR push that resolves the digest via `aws ecr describe-images` and calls the new endpoint. Safe no-op when `TROPIC_API_URL` / `TROPIC_ADMIN_API_KEY` GitHub secrets are not yet configured.
Added: Admin page `/admin/openclaw-updates`: read-only Section A listing tag, OpenClaw npm version, built-relative, scheduled-rollout-relative, and status badge (soaking / rolling_out / complete / halted). Halt / rollback controls and the failing-instance Section B follow in Phase 4.
v1.35.7 2026-05-27
  • Fix Contabo OAuth completion: the gateway restart that re-reads `auth-profiles.json` now actually restarts the container. The previous path ran `openclaw gateway restart` inside the container, which depends on systemd (absent in our Contabo image) and silently no-ops, leaving the gateway serving the pre-OAuth in-memory state until something else recycled the container. New `ContaboInstanceService.restartContainer()` does a host-side `docker restart` over SSH instead — the actual supervisor we have.
Added: `ContaboInstanceService.restartContainer(instanceId)`: SSHes to the host as root and runs `docker restart tropic-<id>`. Volume-preserving, no image pull, no tear-down/bring-up dance: the right primitive for "the running gateway needs to re-read its on-disk config".
Fixed: OAuth completion path (`VmService.completeOpenAIOAuthInner`) now branches on `instance.type === 'contabo'` and uses `ContaboInstanceService.restartContainer` instead of the systemd-dependent shell command. EC2 and Nanoclaw paths unchanged.
v1.35.6 2026-05-27
  • getConsoleConfig now returns the canonical `instance.status` instead of the EC2-only `instance.vmStatus`. The MCP drawer gates on `status !== "running"`, and for Contabo Console instances `vmStatus` is never updated (defaults to "stopped"), so the chat surface refused to connect after the EC2 → Contabo cutover. Per schema.prisma's own comment: "status is canonical, use THIS for all queries and UI; vmStatus is the raw EC2 state — do NOT query on vmStatus".
Fixed: `VmService.getConsoleConfig`: return `instance.status` (canonical lifecycle field) instead of `instance.vmStatus` (raw AWS state, undefined for Contabo). Was a latent bug that only surfaced when the shared Console moved off EC2.
v1.35.5 2026-05-27
  • Fix the mcporter config the openclaw entrypoint writes for shared-Console containers. mcporter (0.7.3) reads `~/.mcporter/mcporter.json` with the `mcpServers` + `baseUrl` schema; the earlier release wrote `~/.config/mcporter/config.json` with `servers` + `url`, which mcporter silently ignored. `mcporter list` now finds the proxy server on a freshly provisioned Console container.
Fixed: `openclaw-entrypoint.sh`: write mcporter config to `/home/tropic/.mcporter/mcporter.json` with the schema mcporter 0.7.3 actually consumes (`{"mcpServers":{"api":{"baseUrl":...,"headers":...}},"imports":[]}`).
v1.35.4 2026-05-27
  • Pin `mcporter` to 0.7.3 in the openclaw container image to match EC2. mcporter 0.9.x changed the config path / schema and made the existing `~/.config/mcporter/config.json` invisible, so a freshly built Contabo image had mcporter installed but `mcporter list` returned "No MCP servers configured". Pinning to 0.7.3 (what the live EC2 Console already uses) is the smallest possible change to keep the Tropic Console agent functional on Contabo while the 0.9.x config-add migration is handled separately.
Changed: `npm install -g mcporter` → `npm install -g mcporter@0.7.3` in `api/docker/openclaw.Dockerfile`. Inline note flags that the future cleanup is migrating to 0.9.x via `mcporter config add` in entrypoint + agent template prompt.
v1.35.3 2026-05-27
  • Bake `mcporter` into the openclaw container image so Contabo containers can run MCP tool calls without a hand-deploy. On EC2 this was installed manually; the new image installs it globally via npm alongside openclaw / clawhub / clawscan.
Added: `npm install -g mcporter` in `api/docker/openclaw.Dockerfile`. Closes a gap discovered during Phase 1 of the Tropic Console EC2 → Contabo migration: a fresh Contabo image had the mcp-auth-proxy listening on `127.0.0.1:3847` but no `mcporter` binary to drive it from agent shells.
v1.35.2 2026-05-27
  • Phase 0 of the Tropic Console EC2 → Contabo migration. The mcp-auth-proxy is now baked into the openclaw container image and starts conditionally when the new TROPIC_CONSOLE_API_KEY env var is set. Regular Community containers are unaffected — the proxy stays dormant, no ports open, no mcporter config written. The shared-Console activation is keyed on a new ManagedInstance.sharedRole column so only the one Console instance gets the env var on bring-up.
  • restart-with-latest now preserves TROPIC_CONSOLE_API_KEY across image bumps. Unlike browser proxy URLs, which intentionally skip backfill, the Console API key is tied to a stable per-instance property (sharedRole) and must survive every restart.
Added: `api/src/vm/mcp-auth-proxy/` source is now COPYed into the openclaw container image at `/opt/tropic/mcp-auth-proxy`, root-owned. The proxy itself reads `TROPIC_CONSOLE_API_KEY` at startup to pre-populate `activeApiKey` so it is useful immediately without anyone calling `/set-key` first (matches the single-tenant containment the EC2 Console has today).
Added: `openclaw-entrypoint.sh` gated block: when `TROPIC_CONSOLE_API_KEY` is present, writes `/home/tropic/.config/mcporter/config.json` pointing at `http://127.0.0.1:3847/mcp` (server name `api`) and background-starts the proxy under a small restart loop. When absent (every regular Community container), the block is skipped entirely.
Added: `ManagedInstance.sharedRole` nullable TEXT column. Only the one shared Console instance gets `sharedRole='tropic-console'`; provisioning never sets this automatically. `ContaboInstanceService` reads it and injects the Fly secret `TROPIC_CONSOLE_API_KEY` into the docker env only for that one row.
Added: `bring-up-container.sh` now accepts arg 18 `TROPIC_CONSOLE_API_KEY`. Same pattern as `TROPIC_BROWSER_URL`: omitted from the docker `-e` list when empty, so nothing leaks into regular containers.
Changed: restart-with-latest call site in `ContaboInstanceService` now passes positional placeholders for args 14-17 and the sharedRole-derived value for arg 18, so the Console keeps its env var across every image bump.
Changed: Migration plan for moving the Tropic Console off EC2 lives in `docs/migrations/tropic-console-ec2-to-contabo.md`. The EC2 VM stays alive and primary throughout Phases 1-4; only Phase 5 (post-soak) terminates it.
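The "omitted from the docker `-e` list when empty" rule can be illustrated in isolation. The real logic lives in the `bring-up-container.sh` bash script with positional arguments; the TypeScript function below is a hypothetical restatement of the rule only, not a port of that script.

```typescript
// Build `-e NAME=value` pairs, skipping variables whose value is empty or missing,
// so optional variables never appear in containers that do not need them.
function dockerEnvFlags(env: Record<string, string | undefined>): string[] {
  const flags: string[] = [];
  for (const name of Object.keys(env)) {
    const value = env[name];
    if (!value) continue; // empty string or undefined: omit the flag entirely
    flags.push("-e", `${name}=${value}`);
  }
  return flags;
}

// Usage: a regular container (no Console key) versus the shared Console instance.
const regularFlags = dockerEnvFlags({ TROPIC_BROWSER_URL: "", TROPIC_CONSOLE_API_KEY: "" });
const consoleFlags = dockerEnvFlags({ TROPIC_BROWSER_URL: "", TROPIC_CONSOLE_API_KEY: "example-key" });
```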
v1.35.1 2026-05-27
  • CMEK onboarding is now zero-friction: Tropic auto-generates a default recovery key per org on first visit to Settings → Backup Encryption (or on first Community provision). Owners + admins can download the private key any time as a recovery copy. Users who want stronger guarantees add their own recipient and optionally retire the Tropic-managed one — multi-recipient already supports both modes side-by-side.
  • Admin endpoint to trigger a Contabo host's backup orchestrator on demand instead of waiting for the 03:00 UTC systemd timer. Returns the script's journal output so you can verify which containers backed up successfully.
Added: `OrgBackupRecipient.kind` (`tropic` / `user`) and `encryptedPrivateKey` (TEXT, AES-256-GCM at rest via CredentialsService). Tropic-managed rows hold both halves of the keypair; user-provided rows have NULL `encryptedPrivateKey` because the org pasted only the public key.
Added: `OrgBackupRecipientsService.ensureTropicManagedRecipient(orgId)`: idempotent server-side `age` keypair generation + persistence. Called lazily from `GET /orgs/current/backup-recipients` (first visit) and from `ContaboInstanceService.provisionForUser` (first provision attempt).
Added: `POST /orgs/current/backup-recipients/:id/download-private-key` (admin/owner). Decrypts the stored private key and returns plaintext + recipient + fingerprint + label for the dashboard to assemble a downloadable text file. Throws for `kind='user'` recipients (we never had their private side).
Added: `POST /contabo/instances/hosts/:id/trigger-backup` (superadmin). SSHes to the host and runs `systemctl start --wait tropic-contabo-backup.service`, then dumps the last 200 lines from `journalctl`. Used to verify end-to-end backup without waiting for the daily timer.
Added: Settings → Backup Encryption surfaces a `Tropic-managed` badge on auto-generated rows and a `Download key` button (admin/owner only) that emits a labeled text file containing the `AGE-SECRET-KEY-1...` private key. Wording explains the tradeoff: convenience now, swap to user-managed any time.
Changed: Provision gate no longer rejects empty-recipient orgs. Instead it calls `ensureTropicManagedRecipient` to lazily provision one, so the gate becomes invisible to legitimate users while still preventing backups from running against a misconfigured org.
Changed: Recipient list response now includes `kind` and `hasPrivateKey` flags so the dashboard can render the badge + download button.
v1.35.0 2026-05-27
  • Customer-managed encryption keys (CMEK) for Community-tier backups. Daily encrypted tarballs of each Community container land in S3, wrapped with `age` to one or more public keys you control. Tropic never sees the private keys. Multi-recipient by design: register your key + a co-owner's key + a cold-storage paper key, and any of the three can restore.
  • Backups are admin/owner only for decrypt + restore + download. Members can see backups exist (timestamps, recipient fingerprints) but cannot decrypt them. Members are explicitly excluded from sensitive operations.
  • Provisioning Community VMs now requires at least one active recipient. Set up your first key at Settings → Backup Encryption before provisioning.
  • Restore is a first-class browser-mediated flow: paste your private key, the dashboard unwraps a single data key locally, and only that 32-byte DEK reaches Tropic to drive the host-side untar. Private keys never cross the wire.
  • Offline restore is the secondary safety net: any admin/owner can download the encrypted blob + the wrapped data keys, then decrypt with the `age` CLI + `openssl` without any Tropic infrastructure.
Added: New `OrgBackupRecipient` table (`age` recipient + label + fingerprint + active flag + optional `OrgMembership` link). Soft cap of 10 active recipients per org.
Added: New `OpenclawBackup.wrappedDeks` JSONB column. Array of `{fingerprint, wrappedDekBase64}`, one entry per active recipient at backup time. Null on legacy EC2 rows (which keep using the Tropic-managed key).
Added: New `ContaboHost.hostToken` column (encrypted at rest). Per-host bearer secret the backup orchestrator uses to authenticate against `GET /contabo/instances/hosts/backup-context`.
Added: `api/src/organizations/age-recipient.helper.ts`: server-side `age1...` syntactic validation + deterministic 16-char SHA-256 fingerprint.
Added: `api/src/organizations/org-backup-recipients.service.ts` + controller at `/orgs/current/backup-recipients`. List (any member), add + retire (admin or owner). Refuses to retire the only active recipient.
Added: `OpenclawBackupService.mintDownloadUrl` + endpoint `GET /instances/:id/openclaw-backups/:backupId/download-url` (admin/owner only). 60s presigned S3 URL.
Added: `OpenclawBackupService.triggerRestore` + endpoint `POST /instances/:id/openclaw-backups/:backupId/restore` (admin/owner only). SSHes to the host with the unwrapped DEK as an arg and runs `tropic-restore-one-container`. DEK never persisted.
Added: Endpoint `GET /instances/:id/openclaw-backups/:backupId/wrapped-deks` (admin/owner only). Returns the per-recipient wrapped DEKs for one backup so the dashboard can unwrap one locally with the user's private key.
Added: Endpoint `GET /contabo/instances/hosts/backup-context` (host-token auth). Returns the running containers on this host + their org's active recipients + each instance's runner token.
Added: `api/src/contabo/scripts/tropic-contabo-backup` (systemd timer at 03:00 UTC) and `tropic-backup-one-container` (per-container tar + AES-256-GCM encrypt + `age`-wrap-multi-recipient + S3 upload + register).
Added: `api/src/contabo/scripts/tropic-restore-one-container`. Downloads encrypted blob, verifies SHA-256, decrypts with the API-supplied DEK, stops the container, replaces the volume contents, restarts. DEK lives only in args + tmpfs file, shredded on exit.
Added: `host-bring-up.sh` installs `age`, `jq`, `awscli`, writes `/etc/tropic/host-meta` + `/etc/tropic/host-token`, installs both backup scripts + the restore script under `/usr/local/sbin/`, and enables the `tropic-contabo-backup.timer` systemd unit.
Added: `ui/app/(dashboard)/settings/backup-encryption/page.tsx`: setup wizard for first recipient (generate keypair in browser via `age-encryption` JS lib, save-confirmation step, verify-by-paste step), ongoing management list, test-restore button, retire flow.
Added: `ui/components/community-backups-section.tsx`: per-Community-VM backup list on the Agents page with Restore (paste private key, unwrap one DEK locally, POST it to the restore endpoint) and Download for offline restore (downloads blob + wrappedDeks JSON).
Changed: `GET /instances/:id/openclaw-backups` response now strips `wrappedDeks` and adds `recipientCount` to the list shape. Members only need the count; admins fetch the full array via the dedicated single-backup endpoint.
Changed: `ContaboInstanceService.provisionForUser` refuses to provision when the org has zero active `OrgBackupRecipient`s. Existing instances are unaffected; only new provisions are gated.
Changed: `ContaboHostService.bootstrapHost` now also generates the host bearer token (256 bits, encrypted at rest), passes S3 IAM creds + `TROPIC_HOST_ID` + `TROPIC_API_URL` to the script, and writes `/etc/tropic/host-token` on the host. Re-bootstrapping reuses the existing token so the on-host file doesn't need to be redistributed.
Fixed: `GET /instances/:id/openclaw-backups` now scopes to org membership of `instance.organizationId` (already shipped as v1.34.1; restated here for the unified release notes).
Fixed: Backup scripts never store the DEK on disk except as a 0600 tmpfile in /run (tmpfs) when available. Restore script shreds the DEK file before exit.
Fixed: Per-container backup script verifies the Docker volume name matches `tropic-<peer-id>-data` AND is mounted to exactly that container AND no other container is mounting it. Refuses to back up otherwise. Cross-tenant volume reads fail loudly instead of silently mis-recording.
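Two of the rules in this release lend themselves to a short sketch: the list response that strips `wrappedDeks` in favour of `recipientCount`, and the refusal to retire the only active recipient. The type and function names are hypothetical, and reporting a count of 0 for legacy rows with null `wrappedDeks` is an assumption.

```typescript
interface WrappedDek {
  fingerprint: string;
  wrappedDekBase64: string;
}

// Hypothetical row and response shapes; only `wrappedDeks` and `recipientCount` come from the notes.
interface BackupRow {
  id: string;
  createdAt: string;
  wrappedDeks: WrappedDek[] | null; // null on legacy EC2 rows
}

interface BackupListItem {
  id: string;
  createdAt: string;
  recipientCount: number;
}

// Project a stored row into the list shape: the wrapped keys are dropped, only their count remains.
function toListItem(row: BackupRow): BackupListItem {
  const recipientCount = row.wrappedDeks === null ? 0 : row.wrappedDeks.length;
  return { id: row.id, createdAt: row.createdAt, recipientCount };
}

// Retiring is refused when it would leave the org with no active recipient.
function canRetireRecipient(activeRecipientCount: number): boolean {
  return activeRecipientCount > 1;
}
```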
v1.34.2 2026-05-26
  • Schema groundwork for Community-tier customer-managed encryption keys (CMEK) backups. No functional change yet — the producer (host-side backup script), API endpoints, UI, and provision-gate land in v1.35.0. See docs/plans/2026-05-26-contabo-backups-and-cmek.md for the design.
Added: New `OrgBackupRecipient` table. Multi-recipient by design: an org can register N `age` recipient public keys (e.g. two owners + one cold-storage paper key), and each new Community backup will wrap its data key against all of them. Any one of the matching private keys can later decrypt. Retiring a recipient leaves the row in place so old backups' fingerprints are still resolvable in the UI, but new backups stop being wrapped to it.
Added: `OpenclawBackup.wrappedDeks` JSONB column. Holds a per-backup array of `{ fingerprint, wrappedDekBase64 }` entries, one per active recipient at backup time. EC2 rows keep `wrappedDeks = NULL` and continue using the legacy `encryption_key_version` Tropic-managed key.
Added: Migration `20260526200000_org_backup_recipients_cmek`. Creates the new table + the new column. Both are additive and nullable, so there is no impact on existing EC2 or Community rows.
v1.34.1 2026-05-26
  • Security: `GET /instances/:id/openclaw-backups` is now properly org-scoped. Pre-fix it had no membership check, so any authenticated user could list any instance's backup metadata by guessing the id. Affected Cloud as well as Community.
  • Follow-up fixes to the v1.34.0 multi-host work: bootstrapping a Community host that had a stale `wg-quick@wg0` running from a partial earlier attempt would silently keep the old in-memory keypair (so DB/Fly thought the host had key Y, but the host responded with key X — handshake auth failed). The host-bring-up script now does an explicit `systemctl restart` after writing the conf, so the new private key always takes effect.
Fixed: `api/src/vm/openclaw-backup.controller.ts:51`: list endpoint now checks `OrgMembership` against `instance.organizationId` before returning rows, matching the `VmService.getManagedInstance` pattern. Legacy instances (no `organizationId`) fall back to direct owner-check. Returns metadata only; when CMEK ships, the wrappedDeks payload will be on a separate owner-only endpoint.
Fixed: `api/src/contabo/scripts/host-bring-up.sh`: replaced `systemctl enable --now wg-quick@wg0` with `systemctl enable wg-quick@wg0` followed by an explicit `systemctl restart wg-quick@wg0`. `--now` is a no-op on an already-running unit; we need a real restart so the freshly-written `/etc/wireguard/wg0.conf` (with the new private key) is what the running interface uses. This was the latent root cause of why host 2's bootstrap appeared to succeed but Fly's WG handshake to it silently failed.
Added: `POST /contabo/instances/hosts/:id/bootstrap-wg` (superadmin): runs `bootstrapHost` for a host whose `metadata.wgPublicKey` is null. Refuses by default if the host already has a wgPublicKey set; accepts `{ force: true }` in the body to overwrite (regenerates the WG keypair, which is what you want when the host's live state has diverged from DB).
Added: `POST /contabo/instances/hosts/:id/diagnose-wg` (superadmin): SSHes to the host and dumps `wg show wg0`, `systemctl is-active wg-quick@wg0`, `ip route show dev wg0`, `ufw status`, `docker ps`, and `iptables -t nat -L PREROUTING` for diagnostics. Read-only.
Added: `ContaboHostService.bootstrapWgOnly(hostId, { force })`: idempotent re-bootstrap path that clears stored `metadata.wgPublicKey` and `wgPort` when forced so `bootstrapHost` takes the fresh-bootstrap code path.
v1.34.0 2026-05-26
  • Community VMs can now run across multiple Contabo hosts. Each host gets its own /24 slice of 10.99.0.0/16, and Fly's WireGuard tunnel routes each container's traffic to the host that owns it. Before this, every host was registered with the same /16 allowed-ips on Fly, so only the first host received traffic and VMs placed on any later host were unreachable.
  • See docs/plans/2026-05-26-contabo-multi-host-wireguard.md for the design.
Added`ContaboHost.wgSubnet` column (migration `20260526154000_contabo_host_wg_subnet`). Stores the host's slice of 10.99.0.0/16 in CIDR form (e.g. `10.99.0.0/24`). Set at row-create time by `createOnDemandHost` / `reserveExistingHost` via the new `ContaboHostService.pickNextSubnet()` which finds the lowest unused /24 octet.
Added`api/scripts/backfill-contabo-host-subnets.ts`: orders non-terminated hosts by createdAt, assigns 10.99.0.0/24 to host 1, 10.99.1.0/24 to host 2, etc. Re-bootstraps hosts that lack `metadata.wgPublicKey` (so they're actually registered as Fly WG peers); for already-bootstrapped hosts, just rewires the Fly peer with the narrower /24 allowed-ips. Dry-run by default; `--apply` to execute.
Changed`WireguardService.addContaboHostPeer(publicKey, endpoint, wgSubnet)`: signature now requires a per-host `wgSubnet`. WG's allowed-ips is exclusive per peer; sharing 10.99.0.0/16 across hosts is broken-by-design (last-write-wins overwrites earlier hosts' bindings).
Changed`WireguardService.loadExistingPeers`: now reads `wgSubnet` from each ContaboHost row and passes it to `addContaboHostPeer`. Hosts with null `wgSubnet` are skipped with a warning instead of registered with a fallback /16. Also best-effort removes the legacy `10.99.0.0/16` kernel route at boot so the residual route doesn't shadow the per-host /24 routes.
Changed`ContaboInstanceService.allocateWgIp(tx, hostSubnet)`: now host-aware. Parses the host's `10.99.X.0/24` and picks the lowest unused IP in .10–.254 (reserves .0–.9 for host infra). Old global `count + 17` allocator removed.
Changed`ContaboInstanceService.runProvisionSetup`: now rejects placement on a host whose `wgSubnet` is null with `BadRequestException` (means the host predates the multi-host fix and needs the backfill script). Backfill is run as part of this release before the change is exercised, so this should never fire in steady state.
Changed`ContaboHostService.bootstrapHost`: writes the host's `wgSubnet` alongside `metadata.wgPublicKey` in the same DB update, and passes the subnet through to `addContaboHostPeer`. Re-bootstrapping a host reuses its existing `wgSubnet` so the host's existing container IPs stay inside the host's range.
Changed`ContaboHostService.wireFlyPeer`: now requires `wgSubnet` to be set on the host row (throws otherwise). Idempotent re-registration uses the host's `wgSubnet` rather than the legacy /16.
FixedCommunity VM 57 (and any future VM placed on a Contabo host other than the first) was unreachable from Fly because `addContaboHostPeer` used `10.99.0.0/16` for every host. WireGuard's allowed-ips is exclusive per peer, so adding host 2 with the same /16 silently overwrote host 1's binding (last-write-wins). With per-host /24s, each host's peer entry has a distinct allowed-ips range and routing works correctly across the pool.
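A minimal sketch of the two allocation rules in this release ("lowest unused /24 octet" and "lowest unused IP in .10–.254"), assuming the already-used values are passed in as plain arrays. The function names mirror the changelog, but the bodies and signatures here are illustrative; the real methods read their inputs from the database.

```typescript
// Hypothetical in-memory versions of the allocators described above.

/** Returns the lowest free 10.99.X.0/24 slice given the subnets already assigned. */
function pickNextSubnet(usedSubnets: string[]): string {
  const usedOctets = new Set(
    usedSubnets.map((cidr) => Number(cidr.split(".")[2])),
  );
  for (let octet = 0; octet <= 255; octet++) {
    if (!usedOctets.has(octet)) return `10.99.${octet}.0/24`;
  }
  throw new Error("10.99.0.0/16 is exhausted: no free /24 left");
}

/** Returns the lowest free container IP in .10–.254 of the host's /24. */
function allocateWgIp(hostSubnet: string, usedIps: string[]): string {
  const match = /^10\.99\.(\d{1,3})\.0\/24$/.exec(hostSubnet);
  if (!match) throw new Error(`Unexpected host subnet: ${hostSubnet}`);
  const prefix = `10.99.${match[1]}.`;
  const used = new Set(usedIps);
  // .0–.9 are reserved for host infrastructure, so start at .10.
  for (let last = 10; last <= 254; last++) {
    const candidate = prefix + last;
    if (!used.has(candidate)) return candidate;
  }
  throw new Error(`No free IP left in ${hostSubnet}`);
}
```

Because each host owns a distinct /24, every container IP falls inside exactly one peer's allowed-ips range.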
v1.33.342026-05-26
  • Agents page no longer shows a misleading "Starting machine ~120s" progress bar for VMs whose gateway is already running. The bar is now held back until the per-instance gateway-status fetch has landed.
  • Cloud cards now render the row immediately with the name, type, region, and status badge while the live AWS DescribeInstances roundtrip is in flight, instead of returning a full-card gray skeleton. Only the right-side actions area shows a spinner.
  • Community VM gateway startup times are now recorded into the gateway_ready audit log, so the progress bar median estimate adapts to real Community-tier durations instead of falling back to the 120s default. Matches behaviour already in place for Cloud instances.
Added`api/src/vm/gateway-startup-tracker.service.ts`: lifts the in-memory "this process triggered a gateway start" marker out of `VmService` and into a shared `GatewayStartupTrackerService`. Both `VmService` (Cloud path) and `ContaboInstanceService` (Community path) inject it so they can mark + consume the same entries without a circular module import.
Added`api/src/contabo/contabo-instance.service.ts`: anchor `gatewayStartedAt` at provision time on `runProvisionSetup`, and at click time on `startInstance` and `restartWithLatest`. Each also calls `gatewayStartupTracker.mark(instanceId, "full")` so the next health check writes a `gateway_ready` audit row feeding `getMedianGatewayStartTime("contabo-vps-s")`.
Changed`api/src/vm/vm.service.ts`: in-class `gatewayStartTriggered` Map removed; all five set/get/delete sites now call through `GatewayStartupTrackerService.mark/peek/consume/clear`.
Fixed`ui/app/(dashboard)/agents/page.tsx`: gate `GatewayProgressBar` on `gatewayStatus` being loaded. Previously the `gatewayStatus?.gatewayStatus !== 'running'` check was true for the undefined initial state, so the bar rendered immediately on page load even for VMs whose gateway was actually healthy. Now shows a small "Checking gateway..." spinner during the first roundtrip and only switches to the progress bar after the API responds with a non-running state.
Fixed`ui/app/(dashboard)/agents/page.tsx`: removed the `!vmStatusReady` early-return that replaced the entire Cloud card with a gray pulse skeleton during the initial DescribeInstances roundtrip. Cards now render with DB-known info (name, type, region, status badge from `instance.status`) immediately, with a single small spinner in the actions column until `useVmStatus` lands.
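The progress-bar bug comes down to how optional chaining treats an unloaded value. The sketch below isolates that decision as a pure function; `gatewayViewFor` and the `GatewayView` type are hypothetical names for illustration, not the component's actual code.

```typescript
// Hypothetical shape of the per-instance gateway-status response.
interface GatewayStatus {
  gatewayStatus: string;
}

type GatewayView = "checking" | "progress-bar" | "none";

// Old check: `undefined?.gatewayStatus` is undefined, and
// `undefined !== "running"` is true, so the bar rendered before any data arrived.
function oldShouldShowBar(status: GatewayStatus | undefined): boolean {
  return status?.gatewayStatus !== "running";
}

// New behaviour: treat "not loaded yet" as its own state.
function gatewayViewFor(status: GatewayStatus | undefined): GatewayView {
  if (status === undefined) return "checking"; // small spinner
  return status.gatewayStatus !== "running" ? "progress-bar" : "none";
}
```

Modelling the loading state explicitly keeps the "Checking gateway..." spinner and the progress bar from sharing one boolean.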
v1.33.332026-05-26
  • Host bootstrap was failing on the gpg --dearmor step when /etc/apt/keyrings/docker.gpg already existed from a prior partial run. gpg would prompt to overwrite, fail to open /dev/tty (ssh2 exec has no tty), and the script halted under `set -e`. Added `--batch --yes` so gpg doesn't prompt.
Fixed`host-bring-up.sh`: docker keyring fetch now uses `gpg --batch --yes --dearmor` instead of bare `gpg --dearmor`. Makes the step actually idempotent on re-runs.
v1.33.322026-05-26
  • host-bring-up.sh placeholder substitution corrupted the assembled script. JS `String.prototype.replace(pattern, replacement)` treats `$` in the replacement string as the start of a special sequence (`$&`, `$'`, `` $` ``, `$1`-`$9`, `$$`). bring-up-container.sh contains the regex `'^[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+$' \\`; its `$'` triggered "portion after match" injection, which truncated the heredoc mid-script and embedded a duplicated copy of host-bring-up.sh's tail. The assembled script then had two unrelated `HELPER_EOF` terminators and unescaped `__TEAR_DOWN_CONTAINER_PLACEHOLDER__` markers; bash bailed with `syntax error near unexpected token '|'`.
Fixed`ContaboHostService.bootstrapHost` placeholder substitution switched from `.replace(placeholder, content)` to `.split(placeholder).join(content)`. split+join is plain string substitution with no `$`-interpretation. Pre-fix the assembled script was 449 lines with 4 `HELPER_EOF`s and 2 "host-bring-up complete"s; post-fix it's 432 lines with 2 `HELPER_EOF`s and 1 "host-bring-up complete". This is what was causing every on-demand-host bootstrap to silently fail with a bash syntax error mid-script — no on-demand host has ever fully bootstrapped, only the reserved host (which avoided this codepath since the operator pre-bootstrapped it manually). With this fix the entire auto-overflow chain works end-to-end.
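The difference between the two substitution styles can be reproduced in a few lines. This is a reduced illustration with a made-up template and placeholder, not the actual bootstrap script.

```typescript
// Made-up template and placeholder, for illustration only.
const template = "before\n__PLACEHOLDER__\nafter";

// Shell content containing `$'` (a `$` anchor followed by a closing quote).
const content = "grep -E '^[0-9]+$' file";

// String replacement patterns are interpreted: `$'` expands to the text
// that follows the match, so "\nafter" is injected into the middle.
const viaReplace = template.replace("__PLACEHOLDER__", content);

// split + join never inspects the inserted text.
const viaSplitJoin = template.split("__PLACEHOLDER__").join(content);

console.log(JSON.stringify(viaReplace));
console.log(JSON.stringify(viaSplitJoin));
```

A replacer function, `template.replace("__PLACEHOLDER__", () => content)`, also avoids the `$` interpretation, since only string replacements are scanned for special patterns.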
v1.33.312026-05-26
  • ssh2 (used by every Contabo SSH/SFTP call) rejects PKCS#8 ed25519 PEMs with "Cannot parse privateKey: Unsupported key format". `createOnDemandHost` generates PKCS#8 PEM by default, so on-demand hosts could never be SSHed into. Reserved hosts dodged this because their key comes from the admin onboard form (typically OpenSSH PEM). `runOverSsh` + `uploadFileOverSftp` now convert PKCS#8 → OpenSSH at the connect point.
FixedNew `ContaboHostService.normalisePrivateKey` helper detects `-----BEGIN PRIVATE KEY-----` PEMs and runs them through `sshpk.parsePrivateKey(pem, "pkcs8").toString("ssh")` before handing to ssh2. Same conversion pattern the SSH-key download path already uses. Applied at both ssh2 `.connect` call sites in `contabo-host.service.ts`. No-op for OpenSSH PEMs (reserved hosts).
v1.33.302026-05-26
  • Contabo response IP parsing was broken. The `ContaboInstance.ipv4` field this codebase referenced is not what Contabo actually returns — the IP lives at `ipConfig.v4.ip`. Every `pollUntilReady` call after auto-overflow would have hung at `if (inst.status === "running" && inst.ipv4)` because `ipv4` was always `undefined`. Caught by trying to call `resurrectHost` (v1.33.29) and getting "has no ipv4 yet" despite Contabo showing the VM running with an IP.
Fixed`ContaboApiClient`: added a `normaliseInstance` helper that maps `ipConfig.v4.ip → ipv4` so callers can continue reading the flat field. Applied to both `createInstance` and `getInstance`. `ContaboInstance` interface extended with the optional `ipConfig` shape Contabo actually returns. This unblocks `pollUntilReady` (auto-overflow happy path) and `resurrectHost` (orphan onboarding).
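A sketch of the normalisation step, assuming a trimmed-down `ContaboInstance` type. Only `ipv4`, `status`, and the `ipConfig.v4.ip` path come from the note above; the other field and the exact function body are assumptions.

```typescript
// Trimmed-down, assumed shape; the real interface has more fields.
interface ContaboInstance {
  instanceId: number;
  status: string;
  ipv4?: string;
  ipConfig?: { v4?: { ip?: string } };
}

/** Copies the nested IP onto the flat `ipv4` field that callers already read. */
function normaliseInstance(raw: ContaboInstance): ContaboInstance {
  return { ...raw, ipv4: raw.ipv4 ?? raw.ipConfig?.v4?.ip };
}

// The readiness check from pollUntilReady then works unchanged.
function isReady(inst: ContaboInstance): boolean {
  return inst.status === "running" && Boolean(inst.ipv4);
}
```

Normalising at the API client keeps every downstream caller reading a single flat field.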
v1.33.292026-05-26
  • New `POST /contabo/instances/hosts/:id/resurrect` admin endpoint finishes onboarding an on-demand host that never reached `ready`. Lets us drain the prepaid Contabo billing cycle on orphans that were cancelled but are still alive until their cancelDate — instead of leaving paid capacity idle.
Added`ContaboHostService.resurrectHost(hostId)`: fetches the live publicIp + status from Contabo (refuses if VM is not running), resets the DB row (status="provisioning", terminatedAt=null, publicIp), then runs `bootstrapHost` + `markHostReady`. Only supports on-demand rows — reserved hosts have their own admin onboard flow. bootstrapHost regenerates the WG keypair on every call, so re-running on an already-bootstrapped host would clobber its WG identity; the method is documented as "for orphans that never finished onboarding" only.
AddedThree new unit tests cover the happy path, the reserved-host guard, and the "Contabo VM no longer running" guard.
v1.33.282026-05-26
  • Reserved Hosts admin panel no longer shows cancelled/terminated rows. They lingered as ghost entries (one with status="terminated", one stuck in "provisioning") even after Tropic considered them gone, because the listing fetched all rows regardless of status. `listAllHosts` now filters them out.
Fixed`ContaboHostService.listAllHosts` adds `where: { status: { not: "terminated" } }`. Terminated rows stay in the DB for audit (a cancelled Contabo VPS still runs ~30 days until cancelDate; the row is marked terminated immediately so it stops being a placement candidate), but the admin panel no longer surfaces them. Unit test asserts the filter is present in the Prisma query.
v1.33.272026-05-26
  • Two latent bugs surfaced by yesterday's P2028 incident. (1) `ContaboDrainCron` was calling `DELETE /v1/compute/instances/{id}` which Contabo doesn't implement (404). Switched to `POST .../cancel` which is the working endpoint. (2) When auto-overflow created an on-demand host but the container bring-up failed downstream, the freshly-created Contabo VM was orphaned with no instance attached. Now the failure path cancels it automatically (on-demand only — reserved hosts are never auto-cancelled).
Fixed`ContaboApiClient.deleteInstance` renamed to `cancelInstance` and changed endpoint from `DELETE /v1/compute/instances/{id}` (returns 404 — not implemented by Contabo) to `POST /v1/compute/instances/{id}/cancel` (the working "stop billing renewal" path). VMs keep running until `cancelDate` (~30 days, prepaid cycle) then auto-delete. Return shape includes the cancelDate so callers can log it.
Added`ContaboHostService.cancelOnDemandHost(hostId)` helper: best-effort Contabo cancel + always-mark-DB-terminated. Used by `ContaboDrainCron` (replaces the broken inline DELETE) and by the new orphan-cleanup path. If the Contabo API fails, the DB row still gets marked terminated so the host stops being a placement candidate; the cron picks it up on the next pass.
Added`bringUpContainerAsync` failure path now checks if the host is on-demand AND has no other active instances attached; if so, calls `cancelOnDemandHost`. Reserved hosts are never auto-cancelled — they're operator-paid. Logged so the admin email + log can correlate.
Changed`ContaboDrainCron` constructor swapped `ContaboApiClient` for `ContaboHostService` (uses the new helper). Tests rewired accordingly. 5 new unit tests cover the helper + the orphan-cleanup branch (including the "do not cancel a reserved host" and "do not cancel an on-demand host that still has other tenants" guards).
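The two guards on the orphan-cleanup path can be written as a single predicate. This is a sketch only: the helper name and the `"reserved"` / `"on-demand"` string values are assumptions, not the service's actual code.

```typescript
// Assumed ownership values; the real column may use different literals.
type HostOwnership = "reserved" | "on-demand";

interface FailedBringUpContext {
  ownership: HostOwnership;
  /** Active instances on the host other than the one that just failed. */
  otherActiveInstances: number;
}

/** True only for an on-demand host with no remaining tenants. */
function shouldCancelHostAfterFailure(ctx: FailedBringUpContext): boolean {
  if (ctx.ownership !== "on-demand") return false; // reserved hosts are operator-paid
  return ctx.otherActiveInstances === 0;
}
```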
v1.33.262026-05-26
  • Auto-overflow no longer blows the Prisma 5s transaction timeout. createOnDemandHost makes two synchronous Contabo HTTP calls (~2-3s each); doing that inside an interactive transaction tripped P2028 (`Transaction already closed`). Moved host placement + on-demand creation OUTSIDE the transaction; only the DB writes (WireguardPeer + ManagedInstance) stay inside.
Fixed`ContaboInstanceService.runProvisionSetup` restructured: user lookup, host candidate enumeration, slot counting, and `createOnDemandHost` (incl. its Contabo HTTP calls) all happen pre-transaction via `this.prisma`. The transaction now wraps only the writes that need atomicity together — keypair gen, `allocateWgIp`, `wireguardPeer.create`, `managedInstance.create`. Race tradeoff documented inline: under concurrent provisions both processes may briefly see "all hosts full" and each create an on-demand host; the idle one drains after 24h via the existing on-demand drain cron. All 23 contabo-instance tests still pass.
v1.33.252026-05-26
  • Auto-overflow was sending Tropic's friendly region labels ("Asia" / "EU" / "US-Central") straight to Contabo, which returned `Entry Region not found by region = Asia`. Contabo expects its own region codes (SIN / EU / US-central). Added a translation layer at the API boundary.
Fixed`createOnDemandHost` now translates Tropic regions to Contabo region codes via `CONTABO_REGION_CODE` (`EU → EU`, `US-Central → US-central`, `Asia → SIN`) before calling `api.createInstance`. SIN verified live against the existing reserved host (instanceId=203286398) which already reports `region: "SIN"`. Throws clearly if the Tropic region has no Contabo mapping (catches "added a new region to the DTO but not to the translation map" bugs).
Added`CONTABO_REGION_CODE` exported map in `contabo-host.service.ts`. Two new unit tests: one asserts Asia → SIN gets passed to `createInstance`; one asserts an unmapped region throws with a clear message.
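A sketch of the translation layer, using the three mappings listed above. The `toContaboRegion` helper and the use of a `Map` are illustrative choices; the actual export may be a plain object.

```typescript
// Tropic region label -> Contabo region code (values taken from this entry).
const CONTABO_REGION_CODE = new Map<string, string>([
  ["EU", "EU"],
  ["US-Central", "US-central"],
  ["Asia", "SIN"],
]);

/** Hypothetical helper: translate, or fail loudly for an unmapped region. */
function toContaboRegion(tropicRegion: string): string {
  const code = CONTABO_REGION_CODE.get(tropicRegion);
  if (code === undefined) {
    throw new Error(
      `No Contabo region code mapped for Tropic region "${tropicRegion}"`,
    );
  }
  return code;
}
```

Throwing here surfaces a missing mapping at the call site instead of as an opaque Contabo error.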
v1.33.242026-05-26
  • Asia auto-overflow now uses V91 (Cloud VPS 10 NVMe) — the productId Contabo actually stocks for this account in Singapore. V76 (4C) and V77 (6C) both return "not available" there. Verified by listing the existing reserved host on the account (instanceId 203286398): it runs productId V91.
Fixed`DEFAULT_PRODUCT_BY_REGION.Asia = "V91"` (was V77). V91 = 4 vCPU / 8 GB / 200 GB NVMe → slotCapacity 2 under the 2v/4g floor — same effective capacity as the existing reserved host. Override still possible via `CONTABO_PRODUCT_ID_ASIA` env var if the account-level availability shifts. Test for the Asia default updated to assert 4/8/2.
Added`PRODUCT_SPECS.V91 = { vcpu: 4, memoryGb: 8 }`. Comment cites the verified existing instance so the source-of-truth for the spec is traceable, not folklore.
v1.33.232026-05-26
  • Auto-overflow now picks a region-appropriate Contabo product. The hardcoded V76 (Cloud VPS 4C) is "not available" for this account in Singapore, which silently broke every Asia auto-overflow attempt. Per-region defaults plus per-region env-var overrides. Community provisioning also now defaults to Asia (Singapore) instead of EU.
Changed`ContaboHostService.productIdForRegion(region)` is now region-aware. Lookup order: `CONTABO_PRODUCT_ID_<REGION>` env (e.g. `CONTABO_PRODUCT_ID_ASIA`) → `CONTABO_PRODUCT_ID` (global, backwards-compat) → `DEFAULT_PRODUCT_BY_REGION` baked-in map. Defaults: EU/US-Central → V76 (4 vCPU / 8 GB), Asia → V77 (Cloud VPS 6C, 6 vCPU / 16 GB → slotCapacity 3). Throws clearly if the region has no default.
Added`PRODUCT_SPECS.V77` = `{ vcpu: 6, memoryGb: 16 }`. Used by `createOnDemandHost` to seed the host row's vcpu/memory/slotCapacity columns. New unit tests cover the region-aware lookup and the env-var override path.
Changed`launch-modal.tsx` and `app/(dashboard)/community/page.tsx` now default the community region to `Asia` instead of `EU`. The reserved host is in Singapore and most users are closer to it; EU was a leftover default.
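The lookup order for the product ID can be sketched as below, with the defaults as they stood in this release. Passing `env` explicitly, treating empty strings as unset, and deriving the env-var suffix by upper-casing the region are assumptions; only `CONTABO_PRODUCT_ID_ASIA` is confirmed as a variable name.

```typescript
// Defaults as of this release (Asia was changed again in v1.33.24).
const DEFAULT_PRODUCT_BY_REGION = new Map<string, string>([
  ["EU", "V76"],
  ["US-Central", "V76"],
  ["Asia", "V77"],
]);

type Env = Record<string, string | undefined>;

function productIdForRegion(region: string, env: Env): string {
  // Assumed naming rule: "Asia" -> CONTABO_PRODUCT_ID_ASIA.
  const suffix = region.toUpperCase().replace(/[^A-Z0-9]/g, "_");
  const productId =
    env[`CONTABO_PRODUCT_ID_${suffix}`] || // 1. per-region override
    env["CONTABO_PRODUCT_ID"] || //           2. global override
    DEFAULT_PRODUCT_BY_REGION.get(region); // 3. baked-in default
  if (!productId) {
    throw new Error(`No Contabo product configured for region "${region}"`);
  }
  return productId;
}
```

In a Node service the second argument would typically be `process.env`.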
v1.33.222026-05-26
  • Fixes two issues from v1.33.21: (1) auto-overflow to a new Contabo host failed with a 400 from Contabo (`each value in sshKeys must be a number`) because `uploadSshKey` returned the raw envelope so `secretId` was undefined — the call site sent `[undefined]`; (2) the launch modal toasted "provisioning started" before the API responded, so the error toast that followed was hidden behind the optimistic success.
Fixed`ContaboApiClient.uploadSshKey` now unwraps `data[0].secretId` and coerces to `Number`, matching the `data: [obj]` envelope that `getInstance` and `createInstance` already unwrap. Throws a clear error if the response has no parseable secretId instead of letting `undefined` propagate into the next API call. Two new unit tests cover the unwrap + the throw path.
Fixed`launch-modal.tsx:handleProvisionCommunity` no longer toasts "Community instance provisioning started" before the mutation resolves. The success toast moved inside the try block (fires after the API returns the pending row) and the error toast got a 10s duration so it actually stays visible long enough to read. The provision flow can block for a few seconds when the region is at capacity (synchronous Contabo on-demand host creation), so the user now sees the real outcome rather than an optimistic message.
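A sketch of the envelope unwrap for the SSH-key upload response, assuming the `data: [obj]` shape described above. The helper name `unwrapSecretId` is hypothetical, and accepting numeric strings is an assumption based on the "coerces to `Number`" wording.

```typescript
/** Pulls a numeric secretId out of a `{ data: [{ secretId }] }` envelope. */
function unwrapSecretId(response: unknown): number {
  const envelope = response as
    | { data?: Array<{ secretId?: unknown }> }
    | null
    | undefined;
  const raw: unknown = envelope?.data?.[0]?.secretId;

  // Accept numbers and non-empty numeric strings; reject everything else.
  const secretId: unknown =
    typeof raw === "string" && raw.trim() !== "" ? Number(raw) : raw;

  if (typeof secretId !== "number" || !Number.isInteger(secretId)) {
    throw new Error("uploadSshKey: response has no parseable secretId");
  }
  return secretId;
}
```

Failing here means the next call never sends `[undefined]` in `sshKeys`.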
v1.33.212026-05-26
  • Community VM placement now respects the host's real slot capacity. Slot count is derived from the host's vCPU + RAM (2 vCPU + 4 GB per slot floor), so a 4/8 host fits 2 containers and an 8/16 host fits 4 — instead of every host defaulting to 2 and being silently over-allocated. When every ready host in the region is full, Tropic now auto-creates a fresh on-demand host and brings up the container on it (with admin email on failure) instead of returning an opaque "no capacity" error.
Fixed`ContaboInstanceService.runProvisionSetup`: the placement query no longer picks a host whose slots are full. It now `findMany`-s ready hosts in the region ordered `ownership: desc, createdAt: asc` (reserved before on-demand — paid capacity first), walks them counting active `pending|running|stopped` instances per host, and picks the first with a free slot. Slot index = active count, port = `22001 + slotIndex`. The previous code used `findFirst` ordered the same way but with no capacity guard, so once the first host hit slotCapacity it kept being picked and the next container landed on a slot index the host bring-up never exposed.
Added`ContaboHost.vcpuCount` + `ContaboHost.memoryGb` columns (migration 20260526120000). `slotCapacity` is now derived at insert time as `min(floor(vcpu/2), floor(ram/4))` (constants `MIN_VCPU_PER_SLOT=2`, `MIN_GB_PER_SLOT=4` exported from `contabo-host.service.ts`). A host with computed slotCapacity < 1 is rejected. Migration backfills existing rows with 4 vCPU / 8 GB (the reserved host in Asia and the default product spec).
Added`PRODUCT_SPECS` map in `contabo-host.service.ts` maps Contabo productId → vCPU/RAM. `createOnDemandHost` reads from it and throws for unknown productIds. Currently `V76 → { vcpu: 4, memoryGb: 8 }`. Adding a new product tier is a one-line addition.
Added`ReserveContaboHostDto` requires `vcpuCount: int @Min(2)` and `memoryGb: int @Min(4)`. The Onboard reserved host form in `reserved-hosts-panel.tsx` collects both as number inputs (default 4 / 8).
Added`runProvisionSetup` falls through to `hostService.createOnDemandHost(region)` when no ready host has a free slot. The new host row is inserted in `status="provisioning"`; the existing `bringUpContainerAsync` fire-and-forget step now preflights any host with `status="provisioning"` by awaiting `pollUntilReady → bootstrapHost → markHostReady` before the container bring-up. Worst-case latency goes from ~2 min to ~15 min, but the dashboard already polls `pending → running`.
Added`ContaboInstanceService` constructor wires up a Postmark client (lazy, only if `POSTMARK_API_KEY` is set). On bring-up failure, after the existing terminate / audit / release-capacity path, an email is sent to `TROPIC_ADMIN_ALERT_EMAIL` (defaults to `michael.hart@denvelop.com`) with subject `[Tropic] Contabo host bring-up failed` and the instance / host / region / error / stack in the body. Send is wrapped in try/catch so a Postmark outage cannot block the rollback.
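The slot-capacity formula and the "first host with a free slot" walk can be sketched as pure functions. The two constants and the formula come from this entry; `HostCandidate`, `pickHostWithFreeSlot`, and the pre-counted `activeInstances` field are illustrative assumptions.

```typescript
const MIN_VCPU_PER_SLOT = 2;
const MIN_GB_PER_SLOT = 4;

/** min(floor(vcpu / 2), floor(ram / 4)); hosts below one slot are rejected. */
function computeSlotCapacity(vcpuCount: number, memoryGb: number): number {
  const capacity = Math.min(
    Math.floor(vcpuCount / MIN_VCPU_PER_SLOT),
    Math.floor(memoryGb / MIN_GB_PER_SLOT),
  );
  if (capacity < 1) {
    throw new Error(`Host too small: ${vcpuCount} vCPU / ${memoryGb} GB`);
  }
  return capacity;
}

// Hypothetical candidate row, already ordered reserved-first, oldest-first.
interface HostCandidate {
  id: string;
  slotCapacity: number;
  activeInstances: number; // pending | running | stopped
}

/** Returns the first host with a free slot, or null to trigger auto-overflow. */
function pickHostWithFreeSlot(
  hosts: HostCandidate[],
): { hostId: string; slotIndex: number; port: number } | null {
  for (const host of hosts) {
    if (host.activeInstances < host.slotCapacity) {
      const slotIndex = host.activeInstances;
      return { hostId: host.id, slotIndex, port: 22001 + slotIndex };
    }
  }
  return null;
}
```

A `null` result corresponds to the case where `runProvisionSetup` falls through to `createOnDemandHost`.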
v1.33.202026-05-25
  • Calendar day-of-week column dividers now line up with the body grid dividers. The header and the scrollable grid used to be two stacked grids, and the body's vertical scrollbar ate ~15px from its inner width — so each `1fr` day column ended up narrower in the body than in the header.
Fixed`CalendarGrid` (ui/components/scheduled/calendar-grid.tsx): collapsed the header grid and the body grid into a single scrolling wrapper with `overflow-y-auto + maxHeight:70vh`, made the header `sticky top-0 z-10`, and dropped the body's own `overflow-y-auto`. Both grids now share the same post-scrollbar width so `1fr` resolves to the same pixel count, and the day dividers align by construction. The initial scrollTop calculation is unchanged because the sticky header doesn't change document positions of the hour rows beneath it.
v1.33.192026-05-25
  • Scheduled events with no channel picked no longer get reported as "Last run failed" when the agent actually ran fine. Two fixes: (1) Tropic now passes `--no-deliver` to openclaw so isolated/current agentTurns without a channel stop trying to auto-deliver to nonexistent ones; (2) the popover uses openclaw's `state.lastDeliveryStatus` to distinguish delivery failure (amber "Reply not delivered") from agent failure (red "Last run failed").
Fixed`ScheduledEventsService.create`: appends `--no-deliver` to the openclaw `cron add` invocation when no channel was picked and the session is isolated or current. Root cause: openclaw cron-add.ts:224-231 defaults `deliveryMode = "announce"` (channel="last") for every isolated agentTurn. With no messaging channels configured on the VM, delivery resolution errors and openclaw flips the whole job to `state.lastRunStatus = "error"`. Tropic only wants delivery when the user explicitly opted in. Verified in tests: --no-deliver appears when channels=[], does NOT appear when a channel is picked.
Added`ScheduledJobCache.lastDeliveryStatus` column (migration 20260524180000). Parsed from openclaw `state.lastDeliveryStatus` — orthogonal to `lastRunStatus` because openclaw collapses agent-failed and delivery-failed into the same "error". Threaded through `parseRawJob`, `assembleEvents`, both runtime upserts, the CronJob type, and the frontend `ScheduledEvent`. nanoclaw branch sets it to null (no equivalent sidecar field).
ChangedEventPopover label now reads "Reply not delivered" (amber) when `lastRunStatus === "error"` AND `lastDeliveryStatus` is anything but "delivered" or "skipped" — these are the cases where the agent almost certainly ran fine and the cron just couldn't deliver. Body text leads with "Agent ran fine but the reply was not delivered to any channel." before the raw error string. Pure agent failures (lastDeliveryStatus="delivered" but status="error") keep the red "Last run failed". Click "Show agent reply" to see the actual reply text via cron runs --id.
ChangedSurfaced by a user pointing out that "Last run failed" was misleading when the agent had clearly replied. The reply text was already being fetched correctly by v1.33.17's on-demand runs endpoint; only the headline label misreported what had failed.
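The label rule above can be summarised as one small function. This is a sketch: `runFailureLabel` is a hypothetical name, and treating a null `lastDeliveryStatus` as "anything but delivered or skipped" is an assumption drawn from the wording of this entry.

```typescript
type RunLabel = "Reply not delivered" | "Last run failed" | null;

function runFailureLabel(
  lastRunStatus: string | null,
  lastDeliveryStatus: string | null,
): RunLabel {
  if (lastRunStatus !== "error") return null; // nothing to flag

  const deliveryFine =
    lastDeliveryStatus === "delivered" || lastDeliveryStatus === "skipped";

  // Delivery was fine but the run still errored: the agent itself failed (red).
  // Otherwise the agent most likely ran and only delivery failed (amber).
  return deliveryFine ? "Last run failed" : "Reply not delivered";
}
```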
v1.33.182026-05-25
  • Tropic no longer lets you schedule events in the past. The calendar grays out past slots (with a partial overlay covering the elapsed portion of the current hour) and rejects clicks on them. The modal blocks past dates and, for today, past times — plus a client-side submit check so a stale modal can't slip a past-instant past openclaw, which used to bounce the create with a wall-of-red toast.
Fixed`CalendarGrid` (ui/components/scheduled/calendar-grid.tsx): each hour cell whose END is `<= now` gets `bg-gray-100`. Today's column adds an absolutely-positioned gray overlay covering exactly `(now.hour * HOUR_HEIGHT + now.minute/60 * HOUR_HEIGHT)` so the boundary between past and future is visible to the pixel. Fully-past days get an `absolute inset-0` overlay. `nowInTz(orgTimezone)` is computed every render — cheap, and the past/future boundary slides as the user keeps the calendar open. New `isSlotPast(dayStr, hour, minute, now)` helper compares (date, hour, minute) tuples in the org's timezone to avoid DST/offset drift.
FixedSlot click and hover are now gated on `isSlotPast`: clicking a gray cell returns early, hovering over one suppresses the `+ HH:MM` indicator so the UI doesn't suggest a drop target where none exists.
Fixed`EventModal` adds native `min` attributes — `min=today` on the date input, `min=current time` on the time input when the picked date is today. Skipped when editing an existing event so users can still tweak fields on a past one-shot without being forced to bump its date. New `parseLocalDateTimeInTz` helper interprets the (date, time) pair as a wall-clock instant in the org TZ via an Intl.DateTimeFormat round-trip, then `handleSubmit` rejects one-shot picks where the instant is `<= Date.now()` with a friendly toast. Recurring (cron) modes are unaffected: the date is just a UI default to seed the time-of-day.
ChangedSurfaced by a live error: scheduling a one-shot for 01:43 SGT today after that time had passed went straight to the openclaw gateway, which returned `GatewayClientRequestError: schedule.at is in the past`. Tropic should never have let the click happen.
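A sketch of the tuple comparison behind `isSlotPast`, assuming `now` is a wall-clock value already resolved in the org's timezone. The `WallClock` shape and the choice to count the current minute as past are assumptions; only the function name and parameter list come from the entry above.

```typescript
// Assumed shape of the value returned by nowInTz(orgTimezone).
interface WallClock {
  dateStr: string; // "YYYY-MM-DD" in the org timezone
  hour: number; // 0-23
  minute: number; // 0-59
}

/** Compares (date, hour, minute) tuples; no Date arithmetic, so no offset drift. */
function isSlotPast(
  dayStr: string,
  hour: number,
  minute: number,
  now: WallClock,
): boolean {
  // ISO dates sort correctly as strings.
  if (dayStr !== now.dateStr) return dayStr < now.dateStr;
  if (hour !== now.hour) return hour < now.hour;
  return minute <= now.minute;
}
```

Comparing wall-clock tuples instead of timestamps means the check never converts through the browser's local timezone.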
v1.33.172026-05-25
  • Calendar popover now shows the agent's actual reply text from past scheduled runs. The cron session itself lives at a `cron:<jobId>` discriminator that openclaw's Control UI session picker hides by design (only `main` and `explicit:*` show there), so Tropic now fetches and renders the run summary directly.
Added`GET /orgs/:orgId/scheduled-events/:id/runs?agentId=&limit=` proxies to `openclaw cron runs --id <jobId>` on the agent's VM and returns a flattened `CronRunRecord[]` (status, agent reply summary, error, duration, model, sessionId). On-demand only — popover gates the query on a user click via React Query `enabled`, no extra load on the 5-min background sync. nanoclaw returns empty (no equivalent sidecar endpoint).
Added`EventPopover` shows a "Show agent reply" expander beneath the existing run-outcome callout for past events. Expanding fetches `useScheduledEventRuns(orgId, id, agentId)` and renders the latest run's `summary` text in a scrollable inline block with the model id. Replaces the previous "SSH into the VM" hint that was sitting under error states.
ChangedSurfaced after the user pointed out: their cron session `agent:main:cron:998a4f03-4283-41f0-b703-12c26f9e74eb` exists on VM 54 disk (sessionId dbb1a8d6) but isn't in openclaw's Control UI session dropdown (which only renders `main` + `explicit:*`, per `agents/command/session.ts:57`). We can't modify openclaw, so we render the reply inside Tropic instead.
FixedTwo new tests in `scheduled-events.service.spec.ts`: openclaw `cron runs` parsing (status / summary / error / durationMs / model / sessionId all flatten correctly, ts/runAtMs ISO-converted) and nanoclaw short-circuit (returns `{runs:[], truncated:false}` without shelling). 22/22 in the schedule suite, 457/457 across the API.
v1.33.162026-05-25
  • Calendar event popover now shows the last-run outcome (ok / error, duration, and the cron-layer error or diagnostic summary). For events that ended in error, the popover also tells you which `openclaw cron runs --id <jobId>` command to run on the VM if you want the agent's full reply.
Added`ScheduledJobCache` gets four new optional columns: `last_run_status`, `last_run_duration_ms`, `last_run_error`, `last_diagnostic_summary`. Populated from openclaw cron `state.last*` on every sync (free — already in the existing `cron list --all --json` response). Threaded through `parseRawJob` → upsert → `assembleEvents` → API response → frontend `ScheduledEvent`. Migration 20260524160000_add_scheduled_job_run_state.
ChangedEventPopover renders a colored callout (red for error, green for ok) showing run status + duration + the cron error/diagnostic summary. Verified mapping against live VM 54 outcome for jobId 998a4f03: status=error, duration=103570 ms, error="Channel is required (no configured channels detected)." For error rows the popover hints at the SSH command `openclaw cron runs --id <jobId>` so users can pull the full agent reply.
ChangedScope note: this surfaces the cron-layer outcome only. The agent's actual reply text lives in `openclaw cron runs --id <jobId>` (or the per-run isolated session jsonl), which requires a separate per-job fetch and is intentionally NOT on the sync hot path. Deferred to a follow-up that fetches on-demand when the popover opens.
FixedNew test in `scheduled-events.service.spec.ts` asserts the four state.last* values land on both the create and update halves of the cache upsert. nanoclaw branch sets all four to null (its sidecar doesn't expose equivalent run-state fields).
v1.33.152026-05-24
  • Past scheduled events ACTUALLY stay visible now. v1.33.13's `--keep-after-run` stopped openclaw from deleting fired jobs, but openclaw was also flipping `enabled: false` on the job, and `openclaw cron list` hides disabled jobs by default. Tropic's sync was reading an empty list and pruning the cache row anyway. Sync now uses `cron list --all --json`.
Fixed`ScheduledEventsService.syncAgent` now calls `openclaw cron list --all --json` instead of `cron list --json`. Verified live on Community VM 54: jobId 998a4f03-4283-41f0-b703-12c26f9e74eb fired at 15:45 UTC, ran a real agentTurn for 103,570 ms (gpt-5.5 via openai-codex), but openclaw immediately set `enabled: false` on the now-completed one-shot. `cron list --json` returned `{jobs:[], total:0}` even though the job and its full `state.lastRunStatus` were still on disk in jobs.json. `cron list --all --json` returned the disabled job with all run metadata intact. So Tropic's sync was reading [] and the prune pass was deleting the cache row, making the event disappear from the dashboard the moment it ran — defeating the `--keep-after-run` fix from v1.33.13.
FixedNew test asserts the openclaw `cron list` invocation contains `--all`. Captures the `runShell` mock's call args and greps the command string.
ChangedConfirmed end-to-end: v1.33.14's session=isolated default is working as intended — agent.modelProvider/openai-codex actually ran the user's message ("chck inbox and summarise") for 103 seconds and produced a summary. The agent's reported failure ("no configured message/email channels detected, so I can't read an inbox") is a *separate* issue: it requires email/messaging tools to be configured on the VM, not a Tropic schedule bug.
v1.33.142026-05-24
  • Scheduled events on openclaw now default to session=isolated so the agent actually runs your message as a task. The old default (session=main) was heartbeat-only — openclaw discarded the message text and woke the agent for a 33 ms HEARTBEAT_OK check.
Changed`ui/components/scheduled/event-modal.tsx`: default session for new openclaw events is now `isolated` instead of `main`. Verified live on Community VM 54: a job scheduled at 22:25 SGT with sessionTarget="main" + payload.kind="systemEvent" landed in openclaw's cron run log (`runs/<id>.jsonl`) with `action:"finished" status:"ok" durationMs:33` but wrote nothing into the agent's main session jsonl. Root cause: openclaw routes `sessionTarget="main"` jobs to `runHeartbeatOnce`, which makes the agent read `HEARTBEAT.md` (mostly empty) and respond `HEARTBEAT_OK`. The cron's payload.text ("Check my inbox and summarise") was metadata only, never the agent's task. Confirmed in openclaw `cron/service.main-job-passes-heartbeat-target-last.test.ts` and `cron/service/jobs.ts:267` ("main cron jobs require payload.kind=systemEvent"). `isolated` sends `--message` instead, which becomes an `agentTurn` payload — the message text IS the task. Channel delivery (`--announce --channel X`) still works the same way; picking a channel already force-promoted the job to isolated in `ScheduledEventsService.create`, so this default change does not affect channel-delivery flows.
ChangedModal dropdown reordered (isolated → current → main) with clarified labels. The old "main — append to agent's main chat" label was actively misleading: it implied the typed message would land in main chat history, but openclaw never writes the cron text into the session log. New label: "main — heartbeat poke; ignores your message text".
ChangedEditing an existing event still preserves the original session selection (`setSession(event.session || 'isolated')` only falls back when the field is missing). No migration of existing scheduled jobs.
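The session default and its effect on the openclaw payload can be sketched as follows. This is a minimal illustration, not the modal's or the service's actual code: `initialSession`, `payloadFlag`, and the `ScheduledEventLike` shape are hypothetical names, and only the `main` and `isolated` mappings described in these notes are modelled.

```typescript
// Hypothetical sketch of the v1.33.14 default; names are illustrative only.
type Session = "isolated" | "current" | "main";

interface ScheduledEventLike {
  session?: Session | null;
}

// New events start as "isolated"; an existing event keeps its stored session
// and only falls back when the field is missing.
function initialSession(event?: ScheduledEventLike): Session {
  return event?.session || "isolated";
}

// Per the notes: "main" becomes a system event (heartbeat only),
// "isolated" becomes a message that openclaw runs as an agent turn.
function payloadFlag(session: "main" | "isolated"): string {
  return session === "main" ? "--system-event" : "--message";
}
```

Under these assumptions, an event saved earlier with `session: "main"` keeps its heartbeat behaviour until the user edits it, which matches the no-migration note above.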
v1.33.132026-05-24
  • Past scheduled events now stay visible in the calendar instead of vanishing the moment they run. openclaw auto-deletes one-shot `at:` jobs by default; Tropic now passes `--keep-after-run` on every cron add so the job, and therefore the cache row, persists.
Fixed`ScheduledEventsService.create` now appends `--keep-after-run` to every `openclaw cron add` invocation. Root cause: openclaw's `normalize.ts:596-602` auto-sets `deleteAfterRun=true` on any `schedule.kind === "at"` job that doesn't explicitly opt out. Once the job fires, openclaw removes it from its own store, then the next scheduler sync sees an empty list, runs the prune pass, and deletes the ScheduledJobCache row — the past event disappears from the dashboard. Verified live against a real fired job on Community VM 54 (jobId 97c0fad3-1f60-410b-8c77-65e40e050ee3): `runs/<id>.jsonl` recorded `action:"finished" status:"ok"` but the parent jobs.json was already empty because of the auto-delete. Recurring `--cron` jobs are unaffected by openclaw's auto-delete; the flag is a no-op for them.
FixedNew test in `scheduled-events.service.spec.ts` asserts the openclaw `cron add` invocation contains `--keep-after-run`. Mocks `instanceExec.runShell` to capture the command string and walks `runShell.mock.calls` for the `cron add` call.
ChangedKnown follow-up (NOT fixed in this commit): the schedule modal defaults `session='main'`, which Tropic translates to `--system-event` on openclaw. That writes a system event into the agent's main session log; it does NOT spawn an agent turn. End-to-end evidence: jobId 97c0fad3 ran with `durationMs: 33` — that is "wrote the system event line and done", not "agent did the work the user asked for." For "Check my inbox and summarise" at 22:25 SGT to actually trigger an agent, the job needed `session='isolated'` (which the modal supports but does not default to) so Tropic sends `--message` and openclaw spawns an `agentTurn`. Surfaced this to the user; default session change kept out of this commit per scope.
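Why the cache row vanished can be shown with a toy model of the prune pass. This is a sketch under stated assumptions: `pruneCache` is a hypothetical stand-in for the scheduler sync, and job IDs are plain strings.

```typescript
// Illustrative model of the prune pass; not Tropic's actual implementation.
// Keeps only the cached job IDs that openclaw still reports as live.
function pruneCache(cachedJobIds: string[], liveJobIds: string[]): string[] {
  return cachedJobIds.filter((id) => liveJobIds.indexOf(id) !== -1);
}

// Before the fix: the one-shot job was auto-deleted after firing,
// so the live list is empty and the cached row is pruned.
const withoutFlag = pruneCache(["97c0fad3"], []);

// After the fix: --keep-after-run leaves the job in openclaw's store,
// so the cached row survives the sync.
const withFlag = pruneCache(["97c0fad3"], ["97c0fad3"]);
```

Here `withoutFlag` is empty and `withFlag` still holds the job ID, which is the difference between the past event disappearing and staying on the calendar.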
v1.33.122026-05-24
  • Scheduled events on agents with no slug (user-created agents, Books Extractor) no longer vanish from the Schedule dashboard after the next 5-minute sync. The push and the sync filter were normalizing the slug differently, so jobs round-tripped through openclaw as agentId="main" and then got rejected on read, deleting the cache row.
Fixed`ScheduledEventsService` (api/src/scheduled-events/scheduled-events.service.ts): create() pushed `--agent ${agent.slug ?? 'main'}` to openclaw, but openclaw's `sanitizeAgentId` also collapses `''` to the DEFAULT_AGENT_ID `"main"`, so jobs created on a Tropic agent with slug=null OR slug='' got stored on the VM with agentId="main". The sync filter then compared the raw value (`p.agentSlug === agent.slug` → `'main' === null` → false), excluded the job, and the prune pass deleted the existing cache row. Net effect: the event was alive on the VM but disappeared from the dashboard within 5 minutes, and the "All Agents" view was missing every event from any null-slug agent. Extracted a single `effectiveAgentSlug(agent)` helper (`agent.slug?.trim() || 'main'`) and used it in BOTH the `cron add --agent ...` push AND the sync filter so both ends normalize identically. A custom-slug Tropic agent on a shared VM still does not pick up jobs attributed to main (the v1.30.9 dedup invariant holds).
FixedNew tests in `scheduled-events.service.spec.ts` lock in the round-trip: slug=null, slug="", slug="main" all cache an openclaw job stored with agentId="main"; slug="custom" does NOT (no regression on the v1.30.9 fix); slug="custom" does cache a job stored with agentId="custom".
ChangedDiagnosis trail: queried Production DB and confirmed Michael had ZERO scheduled_job_cache rows across 30 agents despite "Last synced 2m ago", and that Books Extractor (slug="") was one of the affected agents. Read openclaw's `sanitizeAgentId` (src/routing/session-key.ts) to confirm DEFAULT_AGENT_ID is "main" and empty string normalizes to it, closing the loop on why jobs created with `--agent main` come back with agentId="main" on `cron list --json`.
ChangedScope note: bug #2 from the user report ("event on VM 54 did not execute at the scheduled time") is NOT fixed here. The agent picked was VM 54 Main Workspace (slug="main"), which is not affected by the null-slug bug — the round-trip works for that case. Investigating why the run did not fire on VM 54 would have required SSH-into-container diagnostics that we deferred per the user's scope choice.
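The round-trip fix in this release comes down to normalizing the slug once and using that value on both sides. The helper body below is the expression quoted in the notes; the `AgentLike` type and the `jobBelongsToAgent` filter are hypothetical scaffolding added for illustration.

```typescript
// AgentLike is an assumed minimal shape, not the real Prisma model.
interface AgentLike {
  slug: string | null;
}

// Same expression as quoted above: null, "" and whitespace all become "main".
function effectiveAgentSlug(agent: AgentLike): string {
  return agent.slug?.trim() || "main";
}

// Hypothetical sync-side filter: compare against the normalized slug,
// not the raw column value.
function jobBelongsToAgent(jobAgentId: string, agent: AgentLike): boolean {
  return jobAgentId === effectiveAgentSlug(agent);
}
```

With this in place a job stored as `agentId="main"` matches a null-slug agent, while an agent with slug `custom` still does not pick it up.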
v1.33.112026-05-23
  • NanoClaw Community VMs now say "Chat starting" instead of "Gateway starting" while booting. There is no openclaw gateway in a NanoClaw container, just the chat UI server -- the old label read as a bug.
Changed`ui/app/(dashboard)/agents/page.tsx:GatewayProgressBar`: accepts an optional `runtime` prop and switches the running-phase label to `Chat starting` when `runtime === "nanoclaw"`. The `Starting machine` label for the pre-boot phase is unchanged (covers both). Both call sites (desktop and mobile action area) now pass `runtime={instance.runtime}`. Paired with v1.33.10 the steady state for a healthy nanoclaw VM has no bar at all; the bar only shows briefly during boot/restart, with the right copy.
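A small sketch of the label switch, assuming the running-phase copy is chosen by a plain function. `runningPhaseLabel` is a hypothetical name; the real component is `GatewayProgressBar` and may structure this differently.

```typescript
// Hypothetical helper; only the two label strings come from the notes above.
type Runtime = "standard" | "nanoclaw";

// The runtime prop is optional, so callers that omit it keep the old label.
function runningPhaseLabel(runtime?: Runtime): string {
  return runtime === "nanoclaw" ? "Chat starting" : "Gateway starting";
}
```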
v1.33.102026-05-23
  • NanoClaw Community VMs no longer get stuck on "Gateway starting ~120s remaining" forever. The gateway health probe was checking port 18789, but NanoClaw listens on 3737.
Fixed`VmService.getGatewayStatus` hardcoded `checkGatewayViaHttp(wireguardIp, ..., 18789)` and `/health` for every contabo instance, ignoring `instance.runtime`. NanoClaw's gateway is on `:3737/healthz` (mirrors `bring-up-container.sh`'s HEALTH_PORT/HEALTH_PATH branch). Result: every nanoclaw Community VM read as "Gateway not responding" forever, lit up the "Gateway starting" progress bar in the agents-page card permanently, and tripped the auto-restart codepath every poll (which then silently failed because contabo has no EC2 `instanceId` — see follow-up note). Branched the probe on `instance.runtime === 'nanoclaw'` and pass port + path through. `checkGatewayViaHttp` now accepts an optional `path` param (defaults to `/health` to preserve every existing caller). Verified end-to-end: from the Fly box, `curl http://10.99.0.20:3737/healthz` returns `{"ok":true}` — same path the probe now hits.
ChangedDiagnosis trail: SSHed to the host with the contabo admin key, found PID 1 was `node /opt/nanoclaw-chat-ui/dist/server.js` (not the openclaw gateway), and confirmed `:3737/healthz` was healthy. The DB showed `runtime: "nanoclaw"` on the affected instance and `runtime: "standard"` on the working one (same host), which surfaced the runtime gap immediately.
ChangedKnown follow-up (NOT fixed here): `checkGatewayViaHttp`'s auto-restart path calls `controlGateway(userId, instanceId, "restart")`, which throws `BadRequestException("No VM instance found")` for contabo because it expects an EC2 `instanceId`. For genuinely-down contabo gateways this means we log `Auto-restart gateway failed: No VM instance found` every probe cycle and never actually recover. Same path also writes a misleading `reason: "dashboard 'restart' button"` audit row even when the system (not the user) triggered the restart. Untouched in this commit to keep scope tight — to be revisited after this fix lands.
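The probe branching described in this release can be sketched as a pure function. The ports and paths are the ones stated above; `gatewayProbeTarget` and `probeUrl` are hypothetical names, not the real `VmService` methods.

```typescript
// Hypothetical sketch of the runtime-aware health probe target.
type Runtime = "standard" | "nanoclaw";

interface ProbeTarget {
  port: number;
  path: string;
}

// NanoClaw serves its health check on :3737/healthz; everything else
// keeps the openclaw gateway default of :18789/health.
function gatewayProbeTarget(runtime: Runtime): ProbeTarget {
  return runtime === "nanoclaw"
    ? { port: 3737, path: "/healthz" }
    : { port: 18789, path: "/health" };
}

function probeUrl(wireguardIp: string, runtime: Runtime): string {
  const target = gatewayProbeTarget(runtime);
  return `http://${wireguardIp}:${target.port}${target.path}`;
}
```

For example, `probeUrl("10.99.0.20", "nanoclaw")` yields the same URL that was curled from the Fly box during verification.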
v1.33.92026-05-23
  • Community VM SSH host keys now persist across `Restart with latest` and across container recreations, so `ssh` no longer trips REMOTE HOST IDENTIFICATION HAS CHANGED after every image bump.
Changed`api/docker/sshd_config`: added explicit `HostKey /home/tropic/.tropic-host-keys/ssh_host_{rsa,ecdsa,ed25519}_key` directives. With no HostKey lines sshd falls back to `/etc/ssh/ssh_host_*`, which is regenerated by the openssh-server postinst on every image build, so a new image = a new keypair = ssh client refuses to connect to the same `<host>:<port>` endpoint. The new path lives on the persistent /home/tropic volume, so the keypair survives stop/start, restart-with-latest, and image bumps. Volume is destroyed on terminate (intentional: a new container in the same slot should not impersonate the old one to clients).
Changed`api/docker/openclaw-entrypoint.sh`: before `sshd` starts, `mkdir -p /home/tropic/.tropic-host-keys && chown root:root && chmod 700`, then `for type in rsa ecdsa ed25519; ssh-keygen -q -t "$type" -f .../ssh_host_${type}_key -N ""` if missing. Mode 700 on the directory hides the keys from the tropic shell user even though they live inside that user's home volume. Verified end-to-end in a fresh ubuntu:24.04 container: keys land root-owned mode 600, and `sshd -t` accepts the new config.
FixedNet effect for end users: after this image lands and the user hits `Restart with latest` once (which still rotates the keypair one final time, exactly as before this change), `ssh` re-connects cleanly on every subsequent restart. Previously every image bump silently invalidated their `~/.ssh/known_hosts` entry.
v1.33.82026-05-23
  • Downloaded Community VM `.pem` SSH key now actually loads in OpenSSH (`ssh -i tropic-community-XXXX.pem ...` was failing with `Load key "...": invalid format`).
Fixed`ContaboInstanceService.getOneTimeSshKey` was decrypting the stored ed25519 private key and returning it verbatim in PKCS#8 PEM (`-----BEGIN PRIVATE KEY-----`). OpenSSH does not accept PKCS#8 for ed25519, so `ssh -i tropic-community-XXXX.pem` rejected the file with `invalid format` before even attempting auth. `terminal.service.ts:resolveHostAndUser` already re-serialised to OpenSSH PEM via `sshpk.parsePrivateKey(pem, 'pkcs8').toString('ssh')` for the in-browser terminal; the download path was just missed. Applied the same conversion. Works for already-provisioned instances too because the conversion happens at read time, not at provisioning. Re-download the key from the instance card to get the fixed format.
v1.33.72026-05-23
  • Community VM card no longer shows the WhatsApp / Telegram / Slack / terminal buttons while the gateway is still starting. The "Gateway starting ~120s remaining" progress bar now stands on its own instead of sharing the row with non-functional action buttons.
Fixed`ui/app/(dashboard)/agents/page.tsx`: `gatewayReady` was `!isEc2 || isNemoClaw || gatewayStatus?.gatewayStatus === 'running'`. `isContabo` short-circuited the `!isEc2` clause to true, so the channel buttons rendered the instant the contabo container row hit `status==='running'`, even while the in-container gateway was still booting. Widened to `!(isEc2 || isContabo) || isNemoClaw || gatewayStatus?.gatewayStatus === 'running'` and switched both the desktop (~L922) and mobile (~L1108) action-area render guards to use the shared `gatewayReady` const instead of re-inlining the old wrong check. Buttons now wait for the gateway probe just like EC2 does.
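The before and after of the `gatewayReady` expression can be compared side by side. The two boolean expressions are taken from the entry above; the `CardState` shape is an assumption that flattens `gatewayStatus?.gatewayStatus` into a single optional string.

```typescript
// CardState is a simplified, assumed shape for illustration.
interface CardState {
  isEc2: boolean;
  isContabo: boolean;
  isNemoClaw: boolean;
  gatewayStatus?: string;
}

// Old check: any non-EC2 instance (including contabo) was treated as ready.
function gatewayReadyOld(s: CardState): boolean {
  return !s.isEc2 || s.isNemoClaw || s.gatewayStatus === "running";
}

// New check: contabo now waits for the gateway probe just like EC2.
function gatewayReadyNew(s: CardState): boolean {
  return !(s.isEc2 || s.isContabo) || s.isNemoClaw || s.gatewayStatus === "running";
}

// A contabo container whose gateway is still booting.
const bootingContabo: CardState = { isEc2: false, isContabo: true, isNemoClaw: false };
```

For `bootingContabo` the old check returns true (buttons shown too early) and the new check returns false until the probe reports `running`.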
v1.33.62026-05-23
  • Community VM "SSH Endpoint" on the agents page now shows the real host IP instead of the literal string "pending", so the copy-pasted ssh command actually connects.
Fixed`AgentsService.getPageData` (the query backing `/agents/page-data`, which feeds the dashboard agents list) did `prisma.managedInstance.findMany({ where, orderBy })` with no relation includes. The UI reads `instance.contaboHost?.publicIp` to render the SSH endpoint, so it always fell through to the `'pending'` fallback and copies looked like `ssh -i ... tropic@pending -p 22002`. `InstancesService.findAllForUser` had the right include all along; only `getPageData` was missing it. Added `include: { contaboHost: { select: { publicIp: true } } }`. Host's publicIp is non-nullable in the schema, so no DNS workaround needed: the container is reachable at `<host-ip>:<containerSshHostPort>` via DNAT on the host VM.
v1.33.52026-05-23
  • Workspace Files panel now works on contabo VMs (was 400-ing with "Instance is not running" regardless of actual state), and restart-with-latest on contabo now shows a progress bar like EC2 does so users can tell when the container is mid-recycle vs done.
Fixed`InstancesService.getInstanceForFiles` had a two-branch isRunning check (EC2: vmStatus===running, else: status===online). Contabo fell into the else branch, but contabo's running state is status===running (not online — that's local-instance convention). Every /files / /files/download / /files/zip / /files/upload / DELETE /files request on contabo returned BadRequestException("Instance is not running"), which the UI surfaced as "File browser not available on this instance". Added the contabo branch. File browser now actually lists workspace contents on community VMs.
Added`ui/app/(dashboard)/agents/page.tsx`: `GatewayProgressBar` now renders for contabo too, not just EC2. `wantGatewayStatus` extended to `(isEc2 || isContabo)` so the gateway-status poll fires on community VMs; `isBooting` extended to recognize contabo's `pending` state for fresh provisions. During restart-with-latest the contabo status stays `running` the whole time, but `gatewayStatus !== running` now lights up the progress bar so the user can see "Gateway starting…" instead of guessing whether the restart actually fired. Backend `VmService.getGatewayStatus` was already contabo-aware (hits wireguardIp:18789); only the UI gating needed widening.
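The three-branch running check for the file browser might look like the sketch below. The status values per branch come from the entry above; the `kind` discriminator and the `InstanceState` shape are assumptions, since the real service may identify contabo instances differently.

```typescript
// Assumed shape; the real model distinguishes instance types in its own way.
type InstanceKind = "ec2" | "contabo" | "local";

interface InstanceState {
  kind: InstanceKind;
  vmStatus?: string;
  status?: string;
}

function isRunning(instance: InstanceState): boolean {
  switch (instance.kind) {
    case "ec2":
      return instance.vmStatus === "running";
    case "contabo":
      // The missing branch: contabo reports "running", not "online".
      return instance.status === "running";
    default:
      return instance.status === "online";
  }
}
```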
v1.33.42026-05-23
  • Hotfix for v1.33.3: nginx in the contabo image was being launched via `runuser -u tropic -- nginx`, which resets the inherited capability set. The container had `CAP_NET_BIND_SERVICE` (granted by bring-up-container.sh), but `runuser` stripped it on the user-switch, so nginx couldn't bind port 80, exited non-zero, and the entrypoint's `set -e` killed the whole container — crash loop. Community VM 54 was unreachable for ~30 minutes after Restart-with-latest until this hotfix shipped.
FixedEntrypoint now starts nginx as root (master process) without runuser. nginx.conf adds `user tropic;` so worker processes still drop privileges before handling requests. Standard nginx pattern, matches Ubuntu's default config. Verified locally: master is root + bound to :80, worker is tropic, `/files` proxies correctly.
FixedBoth the file-server background invocation and the nginx invocation in the entrypoint are now soft-fail (`|| true` for chown, `|| echo "..."` for nginx). A transient permission glitch or missing config no longer crashes the container — gateway still comes up on port 18789 directly, only /files routing is lost. Defense against any future addition silently tipping the entrypoint into a restart loop.
v1.33.32026-05-23
  • Made the Tropic Browser skill actually usable end-to-end on community VMs, fixed three production blockers along the way, and ported the workspace file browser to contabo so the dashboard's Workspace Files panel stops hanging. All discovered live by sending the real agent on Community VM 54 a "screenshot google.com" prompt and watching it succeed on the third iteration. Final agent reply: "Done — saved screenshot to /tmp/agent-shot-final.png. ls -la reports size: 28,853 bytes." Two tool calls, zero failures.
FixedSkill description was too technical for the agent's pattern matcher. Rewrote with trigger phrases the user actually types ("visit a site", "open URL", "browse to a page", "screenshot a webpage", "save as PDF", "log into a website", "fill out a form"). Verified in the Sondera audit log that the agent now matches and engages on the first turn rather than going straight to the broken built-in browser.
FixedCloudflare WAF was 403-ing default Python urllib UA with error 1010. Skill MD now mandates `-A "tropic-agent/1.0"` on every curl example so the agent's shell-out passes Cloudflare's bot detection. Bot-Fight-Mode bypass via Custom Rules requires Pro plan; the `-A` flag is the load-bearing workaround.
FixedSondera policy28 blocks bash strict-mode wrappers (`set -euo pipefail`). Skill MD now has an explicit "Strict rules" section telling the agent: no wrappers, no if-guards, no python urllib fallback (it'll hit the CF issue). Use the single-line curl as shown.
AddedSkill install runs `openclaw plugins disable browser` via `post_install_commands` in the seed frontmatter. The built-in browser plugin expects `/usr/bin/google-chrome` (not installed on Tropic VMs) so every browser request was wasting a turn trying the built-in first. Now Tropic Browser is the only browser tool the agent sees after install — first attempt goes straight to the right place. Uses the openclaw CLI per the never-jq-openclaw.json-directly rule.
FixedReconnect OpenAI button above the Model selector on the agent card was gated on `(!isExternal && isEc2) || isAndroid` — contabo VMs never showed it, so when OAuth tokens went stale (container restart, expiry) users had no in-dashboard path back. Widened to `!isExternal`; renamed to "Reconnect OpenAI" since it's almost always used to refresh existing auth.
AddedWorkspace file-server now bundled in the contabo image. EC2 has shipped `tropic-file-server.service` (a Node server on :3848 fronted by nginx on :80) since the AMI was built; contabo had no equivalent so the dashboard's Workspace Files panel timed out then fell through to "File browser not available on this instance". Added nginx + the same Node server to `api/docker/openclaw.Dockerfile`, routed `/files` → `:3848` + `/` → `gateway:18789` matching EC2's nginx config. ~25 MB image size increase. Existing community VMs need a Restart-with-latest from the dashboard to pick up the new image.
v1.33.22026-05-23
  • Tropic-API browser proxy now appends Browserless's TOKEN to the upstream URL, so end-to-end browser calls actually return content instead of 502. Confirmed live by curling `http://10.100.3.1:3000/screenshot?token=<TOKEN>` from Fly and getting back a real 800x600 google.com PNG (28.8 KB). Without the token, Browserless rejects the request at the connection level and our nginx proxy surfaces that as a 502 to the caller — that's what every `/api/browser/*` request was hitting before this change. Token lives as the `BROWSERLESS_TOKEN` Fly secret on tropic-api, never reaches the agent VM and never appears in any logged URL on the tenant side; it's only added at the API-to-cluster hop.
Added`BrowserService.proxyPost` now reads `BROWSERLESS_TOKEN` from `ConfigService` and appends `?token=<urlencoded>` to the upstream URL. Logs a loud error (rather than returning a 401) when the env var is missing, since the request will still go out and surface as a 502 from browserless rather than a Tropic-side auth failure.
Changed`BrowserService` constructor takes `ConfigService` as a third arg. `BrowserModule` already imported `ConfigModule` transitively; no module change needed.
Added`browser.service.spec` covers the token-appended URL (`?token=test-token`) and the BROWSERLESS_TOKEN-unset path (no `?token=` appended). 447 tests pass.
FixedDocumented the deployment requirement: the `BROWSERLESS_TOKEN` Fly secret (`flyctl secrets set BROWSERLESS_TOKEN=<value> --app tropic-api`) must hold the same value as the Browserless container's TOKEN env. Mismatch = 502.
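The token handling in this release can be sketched as a small URL builder. `buildUpstreamUrl` is a hypothetical name and the base URL is passed in as a plain string; the real `proxyPost` reads the token from `ConfigService` and resolves the host over WireGuard.

```typescript
// Hypothetical sketch of the token-append step.
// When the token is unset, the URL is left untouched (no "?token=" suffix).
function buildUpstreamUrl(baseUrl: string, path: string, token?: string): string {
  const url = `${baseUrl}${path}`;
  return token ? `${url}?token=${encodeURIComponent(token)}` : url;
}

const withToken = buildUpstreamUrl("http://10.100.3.1:3000", "/screenshot", "test-token");
const withoutToken = buildUpstreamUrl("http://10.100.3.1:3000", "/screenshot");
```

URL-encoding the token keeps characters such as `&` or spaces from corrupting the query string, which is presumably why the notes call out `<urlencoded>`.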
v1.33.12026-05-22
  • v1.33.0 set the /browser/* proxy to authenticate via `OPENCLAW_API_TOKEN`, which is only injected on EC2 — Contabo containers receive `GATEWAY_TOKEN`, `TROPIC_TELEMETRY_TOKEN`, `TROPIC_API_URL`, etc. via `docker -e`, but never the per-user vmApiToken. Result: contabo agents (the larger fleet) would have been locked out of the browser skill. Switched to the per-instance `TROPIC_TELEMETRY_TOKEN`, which is universally present on every Tropic-provisioned VM. Auth now looks up `ManagedInstance.telemetryToken` (same pattern as `TelemetryService.bulkIngest`), so the API also gets the instance id alongside the user id for free — useful for the per-instance audit + rate-limit we want to add next.
Changed`BrowserService.authenticateAgent` now looks up `prisma.managedInstance.findFirst({ where: { telemetryToken: token } })` and returns `{ instanceId, userId }`. Dropped the `MessagingService` dep, added `PrismaService`. `BrowserModule` imports `PrismaModule` instead of `MessagingModule`.
Changed`tropic-browser` marketplace seed: all curl examples and the failure-modes section now reference `$TROPIC_TELEMETRY_TOKEN` instead of `$OPENCLAW_API_TOKEN`. Lifetime stays per-instance (the token rotates if the instance is recreated, not on user actions).
FixedBrowser specs updated to match the new auth path. 446 tests pass.
v1.33.02026-05-22
  • Tropic Browser skill end-to-end actually works from inside tenant VMs. Verified via live install on Community VM 54 (10.99.0.21) that the skill markdown lands at /home/tropic/.openclaw/skills/tropic-browser/SKILL.md and the gateway restart fires. But the curl from inside the container to http://10.100.3.1:3000/* timed out: tenant contabo containers only route 10.99.0.0/16 through their WireGuard tunnel, so the admin subnet (10.100.3.0/24, where browserless-proxy and ollama-proxy live) is unreachable from agents. Rather than widen tenant→admin WG routing across the whole fleet, the agent now goes through the Tropic API: `$TROPIC_API_URL/browser/{content,screenshot,pdf,scrape}` with `Authorization: Bearer $OPENCLAW_API_TOKEN`. API authenticates the bearer against UserCredential.vmApiTokenEncrypted (same scheme as the messaging callback), forwards over WG to browserless-proxy, and pipes the response back. WG topology unchanged; per-user auth, audit, and rate-limit live at the API edge for free.
Added`api/src/browser/{browser.module,browser.controller,browser.service}.ts` — four `@Public()` POST endpoints `/browser/{content,screenshot,pdf,scrape}` that authenticate via `MessagingService.verifyVmToken` (reusing the existing vmApiToken scheme — no new credential to provision), resolve browserless-proxy's WG IP via `WireguardService.getServiceUrl`, and forward the JSON body. Response status, content-type, and binary body all flow back unchanged so screenshot/pdf binary streams work alongside the JSON content/scrape endpoints.
Added`api/src/browser/browser.service.spec.ts` — 7 unit tests covering bearer parsing (missing/malformed/invalid → 401, valid → user id), upstream URL construction, body/headers/status round-trip, and translation of fetch-level failures to HTTP 502 (not 500). 446 tests pass.
Changed`tropic-browser` marketplace seed updated: all curl examples now hit `$TROPIC_API_URL/browser/*` with `Authorization: Bearer $OPENCLAW_API_TOKEN`. Both env vars are always injected on Tropic-provisioned VMs, so the skill no longer depends on `TROPIC_BROWSER_URL` reaching the agent. Live-viewer hand-off URL builder unchanged.
FixedVerified install pathway on Community VM 54 in production: marketplaceSkill row `4d241c08...` seeded on v1656 deploy, InstalledSkill row `393b21ab...` created at 13:05:22 UTC, SKILL.md (5274 bytes) lands at `/home/tropic/.openclaw/skills/tropic-browser/SKILL.md`, gateway restart audit fires. The earlier-noted blocker (tenant→admin subnet unreachable) is resolved by this proxy.
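The bearer-parsing cases covered by the unit tests (missing, malformed, valid) can be illustrated with a standalone parser. This is a sketch, not the code in `browser.service.ts`: `parseBearer` and `authStatus` are hypothetical names, and token verification against stored credentials is left out.

```typescript
// Hypothetical sketch of Authorization header parsing.
// Returns the token, or null when the header is missing or malformed.
function parseBearer(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Maps the parse result to the HTTP status the caller would see
// before any token lookup happens.
function authStatus(header: string | undefined): number {
  return parseBearer(header) === null ? 401 : 200;
}
```

A token that parses but fails the credential lookup would also yield 401; that step depends on the database and is not modelled here.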
v1.32.42026-05-22
  • Tropic API Fly auto-deploy had been silently broken since the nanoclaw chat-ui sidecar was added under `api/docker/nanoclaw-chat-ui/`. `api/tsconfig.build.json` did not exclude `docker/`, so `nest build` walked into the sidecar TS files and tried to typecheck `import Database from 'better-sqlite3'` and `import { CronExpressionParser } from 'cron-parser'`. Those modules live only in the sidecar's own `package.json` (private nested node_modules), so the parent build saw five `TS2307: Cannot find module` errors and exited 1, which made Fly's Docker build fail and silently keep the running machine on a stale image (last deploy: v1655 on 2026-05-21 08:07 UTC). Every push in this window (v1.31.1 through v1.32.3) shipped UI changes via Vercel but never made it to the API. Added `"docker"` to the `tsconfig.build.json` exclude list. Verified by reproducing the original error locally, applying the fix, and re-running both `nest build` and a full `docker build` against `api/Dockerfile`.
Fixed`api/tsconfig.build.json` exclude list now includes `docker`. nest build sourceRoot was already `src` per nest-cli.json, but tsc still traversed into `docker/nanoclaw-chat-ui/*.ts` because tsconfig.build did not exclude it. Five TS2307 errors gone; Docker image build now reaches `exporting layers` instead of failing at `npm run build`.
v1.32.32026-05-22
  • Followup to v1.32.2: the type / region / OAuth badges each lived in their own fixed-width grid column, which left visible dead space between them and floated the EC2 region `<select>` above the static badges (browser-default select chrome adds ~4-6px of vertical padding). Collapsed all three into a single flex cluster with gap-2 so they sit tight side-by-side, and stripped the select chrome with `appearance-none` so it visually matches the surrounding spans.
Changed`ui/app/(dashboard)/agents/page.tsx` desktop header grid template `12px / 1fr / 110px / 120px / 130px / 290px` → `12px / 1fr / auto / 290px`. Cluster column auto-sizes to its tight contents. Actions still anchored at the right edge of the card.
FixedEC2 region `<select>` now uses `appearance-none` and matches the static badge classes (bg-gray-100, text-xs, px-2 py-0.5, no border). Native dropdown chrome added enough vertical padding to visibly misalign the select from the surrounding `Cloud` / `Asia Pacific` spans on the same row.
v1.32.22026-05-22
  • Followup to the v1.31.1 column alignment: the 100px status text column was eating space that should belong to the instance name, and names were truncating to two or three characters. Removed the status text column entirely. The colored dot on the left already encodes status (yellow=transitioning, green=running/online, red=failed, gray=stopped/offline), and now has a `title` attribute so hovering surfaces the actual label. Net effect on a typical 1024px viewport: the name column gets ~110px back, which is the difference between a fully visible name and "Co...".
Changed`ui/app/(dashboard)/agents/page.tsx` desktop header grid template `12px / 1fr / 100px / 110px / 120px / 130px / 290px` → `12px / 1fr / 110px / 120px / 130px / 290px`. One fewer column.
RemovedVisible status text per card (`Running` / `Pending` / `Offline` etc.) on the desktop header. Mobile (`lg:hidden`) layout still shows the status label since its rows stack vertically and don't have the horizontal pressure.
AddedStatus dot now carries `title={cfg.label}` so hover shows the textual status. The EC2 cold-boot "Starting up..." note is folded into the same title (` — starting up...`) instead of needing its own visible element.
v1.32.12026-05-22
  • NanoClaw skill install was a stub: the marketplace drawer would dispatch the install through `NanoClawAdapter.installSkill`, which created an empty directory `${NC_ROOT}/groups/default/skills/<slug>/` and restarted the host. The skill's markdown body was never written. Drawer reported success; the agent had nothing to read. Fixed: the adapter now takes the full skill payload, writes `SKILL.md` from the skillMd content via a quoted heredoc (matching the openclaw-contabo SSH pattern), and rejects installs that have only a `clawhubSlug` because NanoClaw has no registry to pull from on its own runtime.
Breaking`RuntimeAdapter.installSkill` / `deploySkill` signature changed from `(instance, slug)` to `(instance, { slug, skillMd? })`. Internal interface; the only external call site is `MarketplaceService.installSkillOnContaboInstance` and was updated in the same change.
Fixed`NanoClawAdapter.installSkill` now writes `${NC_ROOT}/groups/default/skills/<slug>/SKILL.md` from the inline skillMd, then restarts the host. Throws a clear error when called without skillMd so the drawer surfaces a real failure instead of silent no-op success.
Changed`MarketplaceService.installSkillOnContaboInstance` nanoclaw branch now strips Tropic-internal frontmatter (`stripTropicFrontmatter`) before handing off to the adapter, mirroring the openclaw branch. Adapters stay runtime-specific and never see Tropic metadata.
Added`api/src/marketplace/marketplace.service.spec.ts` covers (a) nanoclaw dispatch passes stripped skillMd to the adapter, (b) nanoclaw install fails fast when only `clawhubSlug` is set, (c) standard-contabo install for tropic-browser writes SKILL.md via SSH heredoc with the curl-friendly body intact. `nanoclaw.adapter.spec.ts` covers the SKILL.md heredoc path and the no-skillMd reject. 439 tests pass.
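The new adapter contract can be sketched as a planning step that either produces the file to write or fails loudly. The payload shape `{ slug, skillMd? }` and the target path come from the entries above; `planNanoClawInstall` and `SkillWrite` are hypothetical, and the SSH heredoc write and host restart are not modelled.

```typescript
// Payload shape as described in the Breaking entry above.
interface SkillPayload {
  slug: string;
  skillMd?: string;
}

// Hypothetical result type: which file to write and with what content.
interface SkillWrite {
  path: string;
  content: string;
}

// Fails fast without skillMd so the caller sees a real error
// instead of a silent no-op success.
function planNanoClawInstall(ncRoot: string, skill: SkillPayload): SkillWrite {
  if (!skill.skillMd) {
    throw new Error(`Skill "${skill.slug}" has no inline skillMd to install`);
  }
  return {
    path: `${ncRoot}/groups/default/skills/${skill.slug}/SKILL.md`,
    content: skill.skillMd,
  };
}
```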
v1.32.02026-05-22
  • Live human-in-the-loop viewer for the cluster Browserless. Sessions the agent opens against `$TROPIC_BROWSER_URL` can now be handed back to the user mid-flow (login, CAPTCHA, checkout) and the user can drive Chrome directly from their own browser. The path is `user browser → Tropic API (Fly) → WireGuard → browserless-proxy`: a new Express middleware on the Tropic API claims `/browser-live/*`, authenticates via Clerk (JWT in query, bearer header, or `__session` cookie), resolves the proxy's WG IP from the `wireguardPeer` table, and forwards both HTTP and WebSocket traffic with CSP/X-Frame-Options stripped so the dashboard can iframe the viewer. A companion dashboard route at `/agents/browser-live` mints the Clerk JWT client-side and iframes the proxy so the agent only has to surface one stable URL.
Added`api/src/browser-live/browser-live.middleware.ts` reverse-proxies `/browser-live/*` to `http://<browserless-proxy-wg-ip>:3000`. HTTP path uses `http-proxy-middleware`; WS upgrades use `http-proxy` directly (matching the gateway proxy pattern in `proxy.service.ts`). 30s heartbeat ping frames on the upstream socket keep Fly's 60s idle timeout from closing live sessions while the user reads.
Added`BrowserLiveMiddleware.handles(path)` static check + dispatch in `api/src/main.ts` upgrade handler so `/browser-live/*` WebSocket upgrades route to this middleware before falling through to the gateway proxy. Mirrors the `/terminal/ws` early-route pattern.
Added`ui/app/(dashboard)/agents/browser-live/page.tsx` — minimal Clerk-protected page that takes `?session=<id>`, fetches the user's Clerk JWT via `useAuth().getToken()`, and iframes `https://api.tropic.bot/browser-live/devtools/inspector.html?session=<id>&token=<jwt>`. Single stable URL shape the agent can construct from `$TROPIC_API_URL` alone.
Changed`api/src/marketplace/seeds/tropic-browser.md` documents the live-viewer flow: open session via `POST /sessions`, drive to handoff point, surface `<dashboard>/agents/browser-live?session=<id>` to the user. Explicitly warns against pasting the raw `liveUrl` Browserless returns (which points at the pod's internal address).
v1.31.22026-05-22
  • Agents on new VMs can now reach the cluster-hosted Browserless via a new opt-in marketplace skill, `tropic-browser`. A second admin WireGuard service (`browserless-proxy`, parallel to `ollama-proxy`) is registered in the same EKS cluster; at provision time the API resolves its allocated WG IP from the `wireguardPeer` table and writes `TROPIC_BROWSER_URL=http://<wg-ip>:3000` into the VM's systemd environment (`/etc/openclaw/environment` for EC2 / `-e TROPIC_BROWSER_URL` for Contabo). The skill is a markdown recipe that documents the four REST endpoints — `/content`, `/screenshot`, `/pdf`, `/scrape` — and tells the agent when to reach for each via `curl $TROPIC_BROWSER_URL`. Auth is WireGuard-mesh-only, matching the ollama-proxy pattern; no per-request token. Install per-instance from the Skills drawer. Existing VMs are intentionally not backfilled — new provisions only. Human-in-the-loop live viewer ships next.
Added`WireguardService.getServiceUrl(name, port)` — generic lookup against the `wireguardPeer` table, returns `http://<ip>:<port>` or `null` when the named peer has not registered yet. Two unit tests cover the present and absent cases.
Added`api/src/marketplace/seeds/tropic-browser.md` — opt-in skill documenting the four Browserless REST endpoints with `curl` examples using `$TROPIC_BROWSER_URL`. Auto-seeded into the marketplace DB on API boot via `seedDefaultSkills()`. Includes guidance on when NOT to use the skill (plain `curl` suffices, multi-step interactive flow needs the live session variant).
Changed`api/src/vm/vm-provisioning.service.ts` (EC2 cold-boot + post-provision sync paths): resolves `browserless-proxy` URL and writes `TROPIC_BROWSER_URL` into `/etc/openclaw/environment` alongside `TROPIC_API_URL` / `TROPIC_MEMORY_URL`. Skipped when the proxy is not registered (warns; skill no-ops on the VM).
Changed`api/src/contabo/scripts/bring-up-container.sh`: accepts a new optional arg 17 `TROPIC_BROWSER_URL` and conditionally adds `-e "TROPIC_BROWSER_URL=..."` to `docker run` only when non-empty. `ContaboInstanceService.bringUpContainerAsync` resolves the URL at fresh-provision time and passes it; the restart-with-latest call site does not, matching the new-provisions-only backfill policy.
Changed`writeTelemetryEnvVars` and `deployTelemetryPlugin` (sync/update sites that run against existing VMs) intentionally do not write `TROPIC_BROWSER_URL`. Avoids surprise backfill into already-running VMs.
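The lookup contract described for `WireguardService.getServiceUrl(name, port)` can be sketched as below. This is a minimal illustration, not the service itself: the `PeerTable` type is a hypothetical in-memory stand-in for the `wireguardPeer` table, and `browserEnv` is an invented helper showing the "skip when not registered" behaviour.

```typescript
// Hypothetical stand-in for the `wireguardPeer` table: peer name -> allocated WG IP.
type PeerTable = Record<string, string | undefined>;

// Returns `http://<ip>:<port>` for a registered peer, or null when the peer is absent.
function getServiceUrl(peers: PeerTable, name: string, port: number): string | null {
  const ip = peers[name];
  return ip ? `http://${ip}:${port}` : null;
}

// Hypothetical helper: emit the env entry only when the proxy has registered.
function browserEnv(peers: PeerTable): string[] {
  const url = getServiceUrl(peers, "browserless-proxy", 3000);
  return url ? [`TROPIC_BROWSER_URL=${url}`] : [];
}
```

Returning `null` rather than throwing lets the provisioning path warn and continue when the proxy is missing, which matches the behaviour described for the EC2 path.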
v1.31.1 2026-05-22
  • The /agents page now lays out instance cards as a table. Previously the card header was a flex row with `gap-4`, so the status text, type badge, region, and OAuth slot sat at whatever x-position the previous element's width happened to end at — meaning two cards stacked vertically had nothing visually lining up. Converted the desktop header to a CSS grid with explicit column widths (`12px / 1fr / 100px / 110px / 120px / 130px / 290px`). Every card shares the same template, so dot, name, status, type, region, OAuth, and the right-side actions all align in vertical columns down the page. The OAuth column reserves its space even when empty, so cards needing the "Connect OpenAI" button don't displace the actions column on cards that don't.
Changed`ui/app/(dashboard)/agents/page.tsx` desktop card header: `lg:flex … justify-between` → `lg:grid` with fixed column template. Name column flexes (`minmax(0,1fr)`, truncates if long); all other columns reserve fixed width so cross-card alignment holds whether or not their conditional content (NemoClaw badge, "Starting up…" subnote, Connect OpenAI button, gateway progress bar, chat buttons) is present. Mobile (`lg:hidden`) layout untouched.
v1.31.0 2026-05-22
  • The Schedule page now works against NanoClaw workspaces. Previously the page was inert on every NanoClaw instance because the API only knew how to shell out to OpenClaw's native cron. NanoClaw stores tasks in per-session SQLite (`messages_in` with `kind='task'`) instead, and exposes them through a runtime adapter that talks to a Tropic-owned sidecar inside the container — no NanoClaw source modifications. Create, edit, cancel, run-now, plus net-new Pause/Resume actions all round-trip end to end. The create-task modal grows a session picker for NanoClaw agents (defaults to most recently active); OpenClaw modal is unchanged.
AddedSchedule page supports NanoClaw runtime end to end. Create / edit / cancel / run-now / pause / resume.
Added`NanoClawAdapter.{listCron,addCron,editCron,removeCron,runCron,pauseCron,resumeCron,listSessions}` — adapter shells `curl` to the chat-ui sidecar at `localhost:3737`. Translation layer maps NanoClaw `messages_in` rows ↔ Tropic `CronJob` shape. `_tropic.name` is stashed inside `content` JSON (namespaced) so the JSON-merge `updateTask` path preserves the friendly name.
AddedSidecar (`api/docker/nanoclaw-chat-ui`) gains `/tasks` and `/sessions` REST endpoints over `better-sqlite3`. Bearer-token auth (`TROPIC_GATEWAY_TOKEN`), `NC_DATA_DIR` env-driven path discovery, `busy_timeout=5000` to coexist with the host sweep. 53 sidecar tests + 12 service tests cover the runtime branch.
Added`ScheduledEventsService` branches on `instance.runtime` per-agent. OpenClaw shell-out path untouched — the runtime branch is a single check at the top of each public method. Cache writes still go through the same `ScheduledJobCache` table because translation happens at the adapter boundary.
AddedSchema-assertion smoke check at image build time: asserts every `messages_in` column the sidecar reads is present. Build fails loudly on NanoClaw schema drift, not in front of a user.
AddedModal session selector for NanoClaw rows — populated from `GET /scheduled-events/sessions?agentId=...`, defaults to the agent_group's most recently active session. Popover gains Pause / Resume buttons for NanoClaw rows. Empty-state copy generalized to not name a CLI.
v1.30.22 2026-05-19
  • Sondera's credit-card secret-redaction regexes were too permissive: `\b3[47]\d{2}\d{6}\d{5}\b` for Amex (and the 16-digit cousin for Visa/MC/Discover) only required `\b` on either side. A floating-point number like a `memory_search` similarity score of `0.370896023485467` has 15 digits starting with `37`, and `.` is non-word, so the word-boundary check passed and the score got redacted with `[REDACTED BY SONDERA POLICY]` mid-result. Caught while debugging why tropic-memory results looked corrupted on Community VM 37. Tightened both card regexes with `(?<![.\d])` and `(?!\d)` so they refuse to match inside floats or longer digit sequences.
Fixed`api/src/vm/sondera-plugin/index.js:639-640` credit-card and credit-card-amex regexes now use `(?<![.\d])PATTERN(?!\d)`. Real card numbers preceded by space, comma, equals, or string-quote still match (verified against 7 positive/negative cases). False-positives on floating-point scores, fixed-width IDs starting with Visa/Amex prefixes, and 17+ digit numbers no longer redact.
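The false positive and the fix can be reproduced in isolation. The sketch below uses the Amex pattern quoted above; `amexNew` assumes that `PATTERN` in the fixed regex is the same digit pattern with the `\b` anchors removed, and the sample strings other than the similarity score are illustrative.

```typescript
// Pre-fix Amex pattern as quoted above: only word boundaries on either side.
const amexOld = /\b3[47]\d{2}\d{6}\d{5}\b/;
// Tightened form: reject a match preceded by "." or a digit, or followed by a digit.
const amexNew = /(?<![.\d])3[47]\d{2}\d{6}\d{5}(?!\d)/;

const score = "similarity: 0.370896023485467"; // float with 15 digits after the dot
const card = "card=378282246310005";           // 15-digit Amex-shaped number after "="
const longId = "id 93782822463100051";         // 17 digits with a card-shaped run inside

// True when the given pattern would cause the text to be redacted.
function wouldRedact(pattern: RegExp, text: string): boolean {
  return pattern.test(text);
}
```

With these inputs, `wouldRedact(amexOld, score)` is `true` (the bug), while `amexNew` rejects both `score` and `longId` and still matches `card`.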
v1.30.21 2026-05-17
  • tropic-memory now activates by default on Community (Contabo) containers. The plugin was being installed and its env vars were reaching the container, but `openclaw.json` never claimed the memory slot for it, so `memory_search`/`memory_get`/`memory_save` silently dispatched to the bundled `memory-core` (local-file SQLite, doesn't survive container churn) and the `agent_end` ingest hook was dropped by the typed-hook gate in OpenClaw ≥ 2026.4.23. Telemetry and Sondera were unaffected because they ride `diagnostics.enabled` events, not the memory slot or `agent_end`. The fix is now baked into the image build, with build-time assertions so this can't regress unnoticed.
Fixed`api/packer/scripts/lib/install-tropic-plugins.sh` jq pipeline now sets `plugins.slots.memory = "tropic-memory"`, `plugins.entries["tropic-memory"].hooks.allowConversationAccess = true`, and drops `memory-core` from `plugins.allow`. These were already set by `vm-provisioning.service.ts` for EC2 when `useTropicMemory=true` at cloud-init time, but the Contabo path had no equivalent — the openclaw-entrypoint.sh on-start jq render only touches gateway keys.
Added`api/packer/scripts/lib/configure-openclaw-json.sh` adds three build-time assertions: `plugins.slots.memory == "tropic-memory"`, `tropic-memory.hooks.allowConversationAccess == true`, and `memory-core` not in `plugins.allow`. Both AMI and Docker builds fail loudly if any regress.
v1.30.20 2026-05-17
  • Community VMs now get a per-user sequential label (`Community VM N`) at provision time, matching Cloud's `Cloud VM N` convention. The Gateway tab dropdown previously listed Community instances by raw UUID, which made two of a user's own instances visually indistinguishable. That triggered a reasonable but incorrect "the new VM still has old sessions" reaction in practice — the user had two running Community VMs (the older one full of sessions, the newer one empty) and the dropdown gave no way to tell them apart, so opening the older one looked like volume reuse. Existing unnamed rows are backfilled on next API boot. Also tightened tear-down so `docker volume rm` failures fail loud instead of silently leaving the volume on the host.
Added`ContaboInstanceService.runProvisionSetup` now sets `name = "Community VM N"` on every new contabo `ManagedInstance` row, where N is the per-user count of that user's prior contabo provisions (including terminated rows, so numbering is monotonic).
Added`ContaboInstanceService.backfillContaboNames` (runs in `onModuleInit`) assigns names to pre-v1.30.20 contabo rows whose `name` is still the default empty string. Per-user numbered in `createdAt` order so the older instance gets a lower number.
Changed`gateway/page.tsx instanceLabel`: when `name` is empty, fall back to `subdomain` instead of `id`. The subdomain (always per-instance unique) gives users at least a recognizable handle for any rows the backfill missed.
Fixed`api/src/contabo/scripts/tear-down-container.sh`: when `DESTROY_VOLUME=yes`, `docker volume rm` is now retried up to 3× and the script fails (exit 1) if the volume still exists after retries. Previously the rm was `... || true` which silently leaked the volume to the host on any transient failure. Container-to-container data isolation was never affected (each provision uses a UUID-keyed volume name), but a stricter cleanup makes the leak path impossible by construction.
v1.30.19 2026-05-17
  • Contabo bring-up + restart-with-latest now self-refresh the on-host scripts (`bring-up-container.sh`, `tear-down-container.sh`) before each run. Previously a repo-side change to those scripts only landed on the host when a superadmin remembered to call `POST /admin/contabo/hosts/:id/refresh-scripts`. That gap is exactly how Tropic plugin telemetry stayed dark on Community VMs after v1.30.15 — the renamed `TROPIC_TELEMETRY_TOKEN` docker env was in the repo but still `TELEMETRY_TOKEN` on the host, so every fresh container kept reading an empty token and short-circuiting the dashboard telemetry POST. Now `.sh` changes propagate the same way TS changes do: push to master, deploy, terminate + reprovision, done.
Changed`ContaboInstanceService.bringUpContainerAsync` calls `refreshHostScripts(hostId)` immediately before invoking `/usr/local/sbin/bring-up-container.sh` over SSH. Wrapped in a try/catch with a warn log so a sftp upload glitch doesn't block provisioning — the bring-up itself can still run against whatever scripts the host already has.
Changed`ContaboInstanceService.restartWithLatest` calls `refreshHostScripts(hostId)` between the tear-down and the bring-up so restart-with-latest also picks up script changes, not just fresh provisioning.
v1.30.18 2026-05-17
  • v1.30.17 fixed the wrong half of the dropdown-inside-Dialog interaction. The data-dropdown-panel + onPointerDownOutside guard stopped Radix from auto-closing the modal, but options still couldn't even be hovered: Radix Dialog with `modal={true}` sets `pointer-events: none` on the body for the duration of the open dialog, and portaled descendants inherit hit-test denial unless they explicitly opt back in. Forced `pointer-events: auto` on the DropdownSelect panel.
Fixed`DropdownSelect` panel className now includes `pointer-events-auto`. With v1.30.17's data-dropdown-panel + onPointerDownOutside guards already in place, the dropdown is now fully usable inside a Radix Dialog: hover highlights, clicks register on options, and the modal doesn't dismiss on selection.
v1.30.17 2026-05-17
  • Scheduled-event agent dropdown was un-clickable after v1.30.14 — labels rendered correctly but every option click dismissed the modal before `onChange` could run. Root cause: `DropdownSelect` portals its menu panel to `document.body` (so the menu can escape the dialog's `overflow` clipping), and Radix Dialog treats any pointer-down outside `DialogContent` as "click outside → close", including clicks on portaled descendants. The user perceived the dropdown as broken.
Fixed`DropdownSelect` panel now carries `data-dropdown-panel="true"`. `EventModal`'s `DialogContent` registers `onPointerDownOutside` + `onInteractOutside` handlers that call `preventDefault()` when the event target is inside a tagged dropdown panel. True outside clicks (anywhere else on the page) still dismiss the modal as expected.
v1.30.16 2026-05-16
  • Follow-up to v1.30.14: the scheduled-event agent picker was falling through to "instance de609d66" (short id) instead of the human-readable subdomain because `AgentsService.findAllForUser` and `getPageData` didn't include `subdomain` in the `instance` Prisma `select`. The frontend's `instanceLabel(name → subdomain → instanceId → short id)` fallback chain only works when the field is actually present, so both queries now select it.
Fixed`AgentsService.findAllForUser` + `getPageData` + ensure-main-agents refresh path: added `subdomain: true` to every `instance: { select: {...} }` block so the agents endpoint returns it. The frontend's `instanceLabel()` helper now actually has the subdomain to fall back to when `instance.name` is empty (which is the default — most users don't name their VMs).
v1.30.15 2026-05-16
  • Tropic plugin telemetry from Community VMs now flows into the dashboard's Security/Activity feed. v1.30.11 fixed Sondera's hook registration so blocks actually fire (and the per-container `~/.openclaw/.sondera/audit.log` confirms REFUSAL + DENY entries), but `oc.tool.blocked` / `oc.policy.refusal` events were not reaching the Tropic API because two env-var bugs in `bring-up-container.sh` short-circuited the telemetry emitters in sondera, policy-check, tropic-telemetry, and tropic-memory.
Fixed`api/src/contabo/scripts/bring-up-container.sh`: renamed the docker `-e TELEMETRY_TOKEN=...` env to `-e TROPIC_TELEMETRY_TOKEN=...` to match the name every Tropic plugin reads (`process.env.TROPIC_TELEMETRY_TOKEN`). The old name was a leftover from before the plugins were renamed; the EC2/systemd path always used the prefixed name. Also added `-e TROPIC_INSTANCE_ID=<containerId>` so `tropic-telemetry` can tag emitted events with the originating instance.
Fixed`ContaboInstanceService.bringUpContainerAsync` and `restartWithLatestImage`: default `TROPIC_API_URL` to `https://api.tropic.bot/api` when the Fly secret is unset, instead of passing `String(undefined)` through to the container. Without the default, every plugin saw `process.env.TROPIC_API_URL === "undefined"` (literal string) and silently skipped the telemetry POST. Telemetry now flows on freshly-provisioned Community VMs.
ChangedExisting Contabo hosts need their script re-uploaded for this fix: `POST /admin/contabo/hosts/:id/refresh-scripts`. Newly bootstrapped hosts pick up the change automatically.
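The `String(undefined)` pitfall behind the `TROPIC_API_URL` fix is easy to reproduce. The sketch below is illustrative only: the function names are hypothetical, the environment is passed in as a plain object, and using `??` (which treats only a missing value as unset) is an assumption about the real code.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_TROPIC_API_URL = "https://api.tropic.bot/api";

// Pre-fix behaviour: an unset secret is stringified to the literal "undefined".
function apiUrlBefore(env: Env): string {
  return String(env["TROPIC_API_URL"]);
}

// Post-fix behaviour: fall back to the documented default when the secret is unset.
function apiUrlAfter(env: Env): string {
  return env["TROPIC_API_URL"] ?? DEFAULT_TROPIC_API_URL;
}
```

`apiUrlBefore({})` yields the seven-character string `"undefined"`, which is truthy and therefore passes any naive "is the URL set?" check before failing later as a URL.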
v1.30.14 2026-05-16
  • Scheduled-event modal: agent picker now uses the proper React DropdownSelect (was a native browser select), always shows the instance label so "Main Workspace" on five different VMs is no longer ambiguous, and stops nuking your inputs when the agents query refetches mid-form. Three independent bugs in one screenshot.
Fixed`EventModal`: replaced the native `<select>` agent picker with `<DropdownSelect>`. Matches the rest of the dashboard, supports keyboard nav + search threshold, renders as a portal so it doesn't clip inside the modal.
Fixed`EventModal`: the form-init `useEffect` previously depended on `readyAgents`. The parent's `agents={aliveAgents.map(...)}` produces a fresh array on every render, so any background refetch (or even just a sibling state change) re-ran the effect and wiped name/message/date/time mid-typing. Split into two effects — one keyed on `[open, event?.id]` for form init, a second keyed on `[open, event, agentId, defaultAgentId, readyAgents]` purely for default-agent selection. Switching agent in the dropdown no longer touches any other field.
Fixed`ScheduledPage`: `aliveAgents` now drops agents whose `instance` is null (orphans from deleted VMs showed up as bare "Main Workspace" / "main" rows the user couldn't actually schedule against). Added an `instanceLabel(inst)` helper that resolves `name → subdomain → instanceId → short id` so the modal subtitle is never blank. Both the top-bar agent filter dropdown and the EventModal agent picker route through the same helper.
v1.30.13 2026-05-16
  • Books "Connect Tropic-Hosted" button now sticks. The POST to `/me/extraction/providers/TROPIC_HOSTED_ACKNOWLEDGED` was succeeding and writing the flag to `OrgSecret`, but the immediate-next GET `/me/extraction/providers` came back with `connected: false` and the UI stayed on "Needs connection" forever. Root cause: `UserSecretsService.listAll` filters out keys in `HIDDEN_FROM_UI_KEYS` (which includes `TROPIC_HOSTED_ACKNOWLEDGED` — it has no Saved Keys tile and would otherwise show up as a misleading "saved key"). That filter is right for the dashboard's Saved Keys page, but the same method is reached internally by books-api via `/internal/user-secrets` and books-api was relying on seeing the key in the response to know the user had acked. Hide-from-UI is now opt-out per call: dashboard endpoint keeps the filter; the internal endpoint passes `includeHidden: true`.
Fixed`UserSecretsService.listAll(orgId, opts?)`: optional second argument `{ includeHidden?: boolean }`. When true, drops the `key: { notIn: HIDDEN_FROM_UI_KEYS }` filter so internal flags (TROPIC_HOSTED_ACKNOWLEDGED today, future markers as needed) round-trip through the read path.
Fixed`InternalProvidersController.listSecrets` (GET /internal/user-secrets) now calls `listAll(orgId, { includeHidden: true })`. The dashboard's `UserSecretsController.listAll` continues to call the default, so the public Saved Keys page is unchanged.
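The per-call opt-out can be sketched as follows. This is a simplified model: the real `listAll(orgId, opts?)` queries the database, whereas this version filters an in-memory array, and the `Secret` shape is an assumption made for illustration.

```typescript
// Assumed minimal shape of a stored secret row.
interface Secret {
  key: string;
  value: string;
}

// Keys with no Saved Keys tile; the real list may contain more entries.
const HIDDEN_FROM_UI_KEYS = ["TROPIC_HOSTED_ACKNOWLEDGED"];

// Hidden keys are filtered out unless the caller explicitly asks for them.
function listAll(secrets: Secret[], opts?: { includeHidden?: boolean }): Secret[] {
  if (opts?.includeHidden) return secrets;
  return secrets.filter((s) => !HIDDEN_FROM_UI_KEYS.includes(s.key));
}
```

Keeping the filter as the default means any new caller gets the UI-safe view, and only internal callers that need the flags have to opt in.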
v1.30.12 2026-05-16
  • Editing an existing scheduled event in the dashboard now works end-to-end. Two bugs surfaced after v1.30.10 made the create path reliable on Community VMs: (1) the edit toast threw `openclaw cron edit failed: error: unknown option --json` because v1.30.4 copied `--json` from the add path without checking — `cron add` accepts it, `cron edit` does not (it suggests `--cron` instead). (2) The edit modal opened with an empty Message textarea for main-session events because `parseRawJob` only read `payload.message`, but openclaw's `CronPayload` discriminates by kind: `{kind: "systemEvent", text}` for main-session jobs and `{kind: "agentTurn", message}` for isolated/current jobs (src/cron/types.ts:148). Main-session jobs are the dashboard default, so every cached row had `message: ""` — confirmed on the Contabo VM by inspecting cron list output against the scheduledJobCache row.
Fixed`ScheduledEventsService.update`: dropped `--json` from the `openclaw cron edit <jobId>` invocation. We don't parse edit's output (only the exit code matters; `syncAgent` re-reads canonical state from `cron list --json` right after). Without this fix every edit unconditionally failed with the unknown-option error.
Fixed`ScheduledEventsService.parseRawJob`: read `payload.text ?? payload.message` (in that order) so main-session systemEvent jobs round-trip their text correctly into `scheduledJobCache.message` and back through to the edit modal. Older cached rows with empty `message` self-heal on the next 5-min `SchedulerSyncService` tick.
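A minimal sketch of the `payload.text ?? payload.message` read, assuming a loosely typed payload. The `RawPayload` interface and the `payloadMessage` name are hypothetical, and the empty-string fallback is an assumption chosen to mirror the `message: ""` rows mentioned above.

```typescript
// Loose view of the two payload kinds described above.
interface RawPayload {
  kind?: string;
  text?: string;    // present on { kind: "systemEvent" } (main-session jobs)
  message?: string; // present on { kind: "agentTurn" } (isolated/current jobs)
}

// Read `text` first, then `message`; fall back to an empty string if neither exists.
function payloadMessage(payload: RawPayload): string {
  return payload.text ?? payload.message ?? "";
}
```

Reading only `payload.message`, as before the fix, would return `""` for every `systemEvent` payload.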
v1.30.11 2026-05-16
  • Sondera (and policy-check, tropic-telemetry, tropic-memory) now actually have their hooks wired into the gateway runtime on new Community VMs. v1.30.9 fixed the install-record split, but the security plugin was still inert: asked to "print env vars", the agent ran `env | sort`, and the gateway happily streamed `GEMINI_API_KEY`, `OPENROUTER_API_KEY`, etc. back into the chat. Root cause: OpenClaw 2026.5.x only registers `api.on(...)` hooks for plugins whose manifest declares `activation.onStartup: true`. Without that flag, the gateway loads the plugin's code and prints the `[sondera] Activated` line, but `api.on` is wired to a noop fallback and every hook silently drops. All four Tropic plugin manifests were missing the flag (stock plugins like active-memory have always had it).
Fixed`api/src/vm/sondera-plugin/openclaw.plugin.json`, `policy-check-plugin/openclaw.plugin.json`, `telemetry-plugin/openclaw.plugin.json`, `tropic-memory-plugin/openclaw.plugin.json`: each now declares `"activation": { "onStartup": true }` so the gateway loads them at startup with the full plugin API (including a real `on` handler that routes to `registerTypedHook`). Verified locally by running the rebuilt image's gateway: it now logs `http server listening (5 plugins: browser, policy-check, sondera, tropic-memory, tropic-telemetry; 1.4s)` and `[hooks] loaded 5 internal hook handlers`, where before the same boot only initialized the plugin code without registering hooks.
v1.30.10 2026-05-16
  • Scheduling against an existing agent's main chat finally works on Community VMs. Diagnosed live by SSHing into a working Contabo container (openclaw 2026.5.6) — the openclaw CLI's paired device gets `operator.read` only on first connect, but cron.add lives in the ADMIN_SCOPE bucket (gateway/method-scopes.ts:172), so every cron-write triggers a scope-upgrade pending. openclaw hardcodes silent=false for `reason === "scope-upgrade"` and the CLI's own `devices approve` can't self-heal either: every gateway-rpc attempt from a device-without-pairing-scope creates a fresh pending that REPLACES the previous one via reconcilePendingPairingRequests, so the local fallback at devices-cli.ts:220 always passes a stale requestId to approveDevicePairing and gets null back. v1.30.5 and v1.30.7's "use openclaw devices approve" approaches were both blocked by that race. v1.30.10 bypasses the CLI race by merging the pending scopes directly into paired.json on the VM via a node one-liner — same model as the existing systemd auto-pair watcher that already sets silent=true on pending.json, just extended to cover scope-upgrade which the gateway refuses to silent-approve.
Added`ScheduledEventsService.buildPairingHealScript` + `healPendingPairings`: heal-on-cron-failure that ships a small Node script (base64'd through `bash -lc` to keep shell quoting trivial) to the openclaw user on the VM. The script reads ~/.openclaw/devices/pending.json + paired.json, and for every pending request whose deviceId is already paired, unions the requested scopes into the paired device's approvedScopes + tokens.operator.scopes, then deletes the pending entry. Unknown deviceIds are skipped (those go through the not-paired flow which IS silent-auto-approved by the gateway for local connections).
Changed`ScheduledEventsService.runOpenclawCron` now routes scope-upgrade / pairing-required errors through `healPendingPairings` instead of `drainPendingPairings`. Removed the prior approach (`openclaw devices list --json` + `openclaw devices approve <id> --json`) which couldn't work — diagnosed in detail in the heal-method docstring with the relevant openclaw source line numbers.
Added`api/scripts/ssh-contabo.ts`: standalone debug helper that decrypts a Contabo VM's container SSH key (using the same path as TerminalService) and runs a one-shot remote command. Used to live-diagnose the pairing race that v1.30.5/v1.30.7 couldn't self-heal; left in tree so the next debug session doesn't need to rewrite it.
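The scope-merge step of the heal script can be sketched as a pure function. The file shapes below are assumptions made for illustration (the actual layout of `pending.json` and `paired.json` is not shown here), `healPending` is a hypothetical name, and the sketch works on in-memory objects rather than reading or writing files.

```typescript
// Assumed shapes; the real pending.json / paired.json layout may differ.
interface PendingRequest {
  deviceId: string;
  scopes: string[];
}
interface PairedDevice {
  approvedScopes: string[];
  tokens: { operator: { scopes: string[] } };
}
type PendingFile = Record<string, PendingRequest>; // keyed by requestId (assumed)
type PairedFile = Record<string, PairedDevice>;    // keyed by deviceId (assumed)

// Order-preserving union of two scope lists.
const union = (a: string[], b: string[]): string[] => Array.from(new Set([...a, ...b]));

// Merges requested scopes for already-paired devices in place and removes the
// handled pending entries. Returns the request ids that were healed.
function healPending(pending: PendingFile, paired: PairedFile): string[] {
  const healed: string[] = [];
  for (const [requestId, req] of Object.entries(pending)) {
    const device = paired[req.deviceId];
    if (!device) continue; // unknown device: left for the not-paired flow
    device.approvedScopes = union(device.approvedScopes, req.scopes);
    device.tokens.operator.scopes = union(device.tokens.operator.scopes, req.scopes);
    delete pending[requestId];
    healed.push(requestId);
  }
  return healed;
}
```

Because the merge never goes through a gateway RPC, it cannot create a fresh pending entry, which is the property that sidesteps the race described above.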
v1.30.9 2026-05-16
  • Hot-fix that finally lets the v1.30.6 Tropic-plugins-on-the-new-install-index rewrite actually ship an image. Both v1.30.6 (the CLI-install rewrite itself) and v1.30.7 (a scheduler change that happened to ride the same broken Dockerfile path) failed the `Build OpenClaw Image` workflow at the install-tropic-plugins step and never pushed anything to ECR. Two layered bugs in the build assertions: (1) my own new assertion in install-tropic-plugins.sh checked `.records[$id]` but the install index uses `.installRecords[$id]`, (2) the pre-existing assertion in configure-openclaw-json.sh still validated the deprecated `.plugins.installs["tropic-memory"]` field in openclaw.json, which has been null since the index split. Caught both locally with `docker build` before tagging this time.
Fixed`api/packer/scripts/lib/install-tropic-plugins.sh`: replaced the manual `cp -r` plugin-staging loop with `openclaw plugins install <path> --dangerously-force-unsafe-install` pointed at the Packer/Docker source under `/tmp/tropic-plugins/<plugin>-plugin/`. The CLI handles the copy to `~/.openclaw/extensions/<manifest-id>/` and writes the install record to the canonical index. Post-install `npm install` in the CLI-managed destination materializes Sondera's cedar-wasm dependency. Final assertion now uses `.installRecords[$id]` against `~/.openclaw/plugins/installs.json`.
Fixed`api/packer/scripts/lib/configure-openclaw-json.sh`: moved the `tropic-memory` install-record assertion from the deprecated `.plugins.installs["tropic-memory"]` field in openclaw.json to `.installRecords["tropic-memory"]` in `~/.openclaw/plugins/installs.json`, matching where 2026.5.x actually stores the record.
v1.30.8 2026-05-16
  • v1.30.3 made Slack workspace activation transferable between instances, which surfaced a second-order bug on the Contabo community image: the newer OpenClaw gateway there exposes explicit `channels.start` / `channels.stop` methods and does NOT auto-open the Socket Mode connection when `config.patch` flips `channels.slack.enabled` to true. Older EC2 gateways auto-started on config change, so the bug never showed up there. Symptom: activate succeeds in the UI, DB metadata moves to the new instance, but the @Tropic Slack bot goes silent because no gateway is actually connected to Slack. Fix: call `channels.start { channel: "slack" }` after the activate `config.patch`, and `channels.stop` after the deactivate / prior-binding-release `config.patch`. Both calls are swallowed silently on older gateways that don't expose those methods.
Fixed`VmService.activateSlack` now follows the `config.patch` with an explicit `channels.start { channel: "slack" }` RPC (8s timeout). Wrapped in try/catch with a warn log — older gateways without this method (which auto-start on config change anyway) error out but don't fail the activation.
Fixed`VmService.releaseSlackBinding` and `VmService.deactivateSlack` now follow the `slack.enabled = false` patch with a matching `channels.stop` call so the newer gateways actually close their Socket Mode connection. Without this, a transferred-away instance could keep its Slack socket open until the gateway restarted, defeating the whole point of the exclusive-binding design.
v1.30.7 2026-05-16
  • v1.30.5's "approve only the requestId in the error" scheduler heal turned out to be too narrow in practice — the user kept hitting the same `scope upgrade pending approval` toast. The gateway accumulates a fresh pending entry per failed reconnect and the named requestId may already be superseded by the time Tropic's approve runs, so approving just that one id is a no-op. v1.30.7 drains every pending pairing request on the VM (`openclaw devices list --json | approve <each>`) before retrying the cron command, and logs each approve outcome so the next failure is actually diagnosable in Fly logs.
Changed`ScheduledEventsService.runOpenclawCron`: on any pairing-shaped error (`scope upgrade pending approval` / `pairing required` / `scope-upgrade` / `device is asking for more scopes`), Tropic now runs `openclaw devices list --json`, parses `pending[].requestId`, calls `openclaw devices approve <id> --json` for each, then retries `cron <args>` once. The per-approve stderr/stdout is logged so a failed approve is no longer silent.
ChangedReplaced the previous `extractPairingRequestId` regex helper with a broader `looksLikePairingError` predicate plus a new `drainPendingPairings` helper. The drain skips entries with no requestId and surfaces JSON parse failures from `devices list` rather than swallowing them.
AddedDiagnostic logging: the full pairing-error stderr/stdout (first 400 chars) is logged when we enter the heal path; the retry failure (first 300 chars) is logged when even the drained retry fails. Earlier the only signal was the user-facing toast.
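A rough sketch of the predicate and the id-collection half of the drain. The phrase list comes from the entry above; the `{ pending: [{ requestId }] }` shape is inferred from the `pending[].requestId` wording, `pendingRequestIds` is a hypothetical helper name, and case-sensitive substring matching is an assumption.

```typescript
// Phrases that mark an error as pairing-shaped, as listed in the entry above.
const PAIRING_PHRASES = [
  "scope upgrade pending approval",
  "pairing required",
  "scope-upgrade",
  "device is asking for more scopes",
];

function looksLikePairingError(output: string): boolean {
  return PAIRING_PHRASES.some((phrase) => output.includes(phrase));
}

// Collects request ids from `devices list --json` output. Entries without a
// requestId are skipped; invalid JSON throws so the failure is not swallowed.
function pendingRequestIds(devicesListJson: string): string[] {
  const parsed = JSON.parse(devicesListJson) as { pending?: Array<{ requestId?: string }> };
  return (parsed.pending ?? [])
    .map((entry) => entry.requestId)
    .filter((id): id is string => typeof id === "string" && id.length > 0);
}
```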
v1.30.6 2026-05-16
  • Tropic security plugins (Sondera, policy-check, telemetry, memory) now actually load on freshly-built openclaw container images. OpenClaw 2026.5.x moved plugin install records out of `openclaw.json` into a separate canonical index at `~/.openclaw/plugins/installs.json`, but the AMI/container build script was still writing the legacy `.plugins.installs.<id>` field via jq, leaving every Tropic plugin's install record `null` at runtime. The gateway silently skipped loading them and the agent happily dumped environment variables on request because Sondera was dark.
Fixed`api/packer/scripts/lib/install-tropic-plugins.sh` now registers each plugin via `openclaw plugins install <path> --link --dangerously-force-unsafe-install` (the supported CLI for the new install index) instead of jq-into-openclaw.json. `--dangerously-force-unsafe-install` is required for Sondera and tropic-telemetry — both legitimately combine `process.env` access with `fetch` for telemetry, which trips 2026.5.x's code-pattern scanner. Build now asserts all four entries exist in `~/.openclaw/plugins/installs.json` and fails immediately if any are missing.
Fixed`api/src/vm/policy-check-plugin/package.json` now includes `openclaw.extensions: ["./index.js"]`. Without this manifest field, `openclaw plugins install` rejects the plugin with "package.json missing openclaw.extensions" and the install record never lands.
v1.30.5 2026-05-16
  • Scheduling against an existing agent's main chat now self-heals the gateway scope-upgrade pairing dance. openclaw hardcodes `silent: false` for `reason === "scope-upgrade"` pending requests (gateway/server/ws-connection/message-handler.ts:1008), so the AMI's auto-pair systemd watcher (which sets silent=true on `pending.json`) can never auto-approve a cron CLI scope upgrade — the gateway has already rejected the connection by the time the watcher fires, and `silent` is not re-read retroactively. When Tropic's scheduler call sees `pairing required: device is asking for more scopes than currently approved (requestId: …)`, it now extracts the request id, runs `openclaw devices approve <id>` (whose CLI fallback writes through to the local pairing state with admin scope), and retries the cron command once.
Added`ScheduledEventsService.runOpenclawShell` private helper that runs any `openclaw <subcommand>` via `InstanceExecService.runShell`. `runOpenclawCron` delegates to it for both the original `cron <args>` call and the new approve-then-retry path.
Added`ScheduledEventsService.extractPairingRequestId` regex-matches `requestId: <uuid>` from openclaw's pairing rejection text, but only when one of the gating phrases is also present (`scope upgrade pending approval`, `pairing required`, `scope-upgrade`). Other failures pass through untouched.
FixedWhen `openclaw cron <verb>` fails with a scope-upgrade pending pairing, `runOpenclawCron` now invokes `openclaw devices approve <requestId> --json` and replays the original cron invocation once. Approve failures fall back to surfacing the original cron error so the dashboard toast stays diagnostic.
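The gated extraction can be sketched as below. The UUID pattern and the constant names are assumptions made for illustration; only the `requestId: <uuid>` format and the three gating phrases come from the entry above.

```typescript
// Gating phrases from the entry above: the id is only extracted when one is present.
const GATING_PHRASES = ["scope upgrade pending approval", "pairing required", "scope-upgrade"];

// Assumed UUID shape for `requestId: <uuid>`.
const REQUEST_ID = /requestId:\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i;

// Returns the request id for pairing rejections, or null for any other failure.
function extractPairingRequestId(output: string): string | null {
  if (!GATING_PHRASES.some((phrase) => output.includes(phrase))) return null;
  return REQUEST_ID.exec(output)?.[1] ?? null;
}
```

Gating on the phrase first keeps unrelated errors that happen to contain a `requestId:` token from triggering an approve call.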
v1.30.4 2026-05-16
  • Scheduling an event from the dashboard against an agent's main chat now actually works. The "Save event" button was failing with `openclaw cron add failed: Main jobs require --system-event (systemEvent)` because Tropic was always sending the cron payload as `--message`, but openclaw rejects that flag for main-session jobs (it only accepts `--system-event` there, reserving `--message` for isolated / agentTurn jobs).
Fixed`ScheduledEventsService.create` picks the payload flag from the resolved session: `--system-event` when `effectiveSession === "main"`, `--message` otherwise. Channel delivery (which already forces `effectiveSession = "isolated"`) keeps `--message`, so that path is unchanged.
Fixed`ScheduledEventsService.update` resolves each underlying job's effective session from `dto.session ?? scheduledJobCache.session` and applies the same flag-swap on `cron edit`. Editing a main job's message without touching the session field now correctly sends `--system-event` instead of failing the same way as add.
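The flag selection reduces to a single comparison. In the sketch below, the function names are hypothetical and sessions are modelled as plain strings; the `dto.session ?? scheduledJobCache.session` resolution follows the entry above.

```typescript
type PayloadFlag = "--system-event" | "--message";

// Main-session jobs take --system-event; every other session takes --message.
function payloadFlag(effectiveSession: string): PayloadFlag {
  return effectiveSession === "main" ? "--system-event" : "--message";
}

// For edits, prefer the session sent in the DTO and fall back to the cached row.
function editPayloadFlag(dtoSession: string | undefined, cachedSession: string): PayloadFlag {
  return payloadFlag(dtoSession ?? cachedSession);
}
```

Editing only the message of a cached main job therefore resolves to `editPayloadFlag(undefined, "main")`, which gives `--system-event`.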
v1.30.32026-05-16
  • Slack workspaces are now exclusively bound to a single agent instance, matching the underlying Socket Mode constraint (a workspace bot token can't be cleanly shared across two gateways without duplicate replies). Activating a workspace on a new instance automatically transfers the binding from the prior one, even if the prior gateway is unreachable. This also unblocks the "activate freezes" symptom that surfaced when trying to move a workspace from a healthy EC2 instance to a Community VM whose gateway was slow to respond — the prior cleanup now uses a short 5s timeout and falls through to a metadata-only clear on RPC failure.
Changed`VmService.activateSlack` now queries `ManagedInstance.metadata.slackConnectionId` across the org for any prior holder of the workspace, then calls a new `releaseSlackBinding` helper which (a) best-effort RPCs `channels.slack.enabled = false` + `plugins.entries.slack.enabled = false` on the prior gateway with a 5s timeout and (b) clears `slackConnected` / `slackTeamName` / `slackConnectionId` from the prior instance's metadata regardless of RPC outcome.
ChangedReordered the activation flow so the target gateway's `config.get` health-check runs BEFORE any prior-binding cleanup. If the target gateway is unreachable, the prior instance keeps its working Slack — a broken target can't strand the org with no active Slack instance.
FixedPre-existing stale test in `terminal/terminal.service.spec.ts:270` — `listRunningInstancesForUser` already filters on `{ type: 'contabo', status: 'running' }` in addition to ec2/local, but the spec's expected `where` clause was missing the contabo row, so the test failed on master before this branch even touched the file.
v1.30.22026-05-16
  • Two scheduler-cache bugs surfaced once real agent-written jobs landed: (1) `openclaw cron list` is per-VM, not per-agent. When N Tropic agents shared an EC2 VM, every agent's 5-min sync was writing the same set of jobs into N different cache rows — so a single cron job appeared under every agent on that VM in the dashboard dropdown. (2) `schedule.kind === "every"` (interval-based jobs, e.g. "every 10 minutes") was not handled by `parseRawJob`, so the cache row had `cron: null` AND `at: null` and the grid fell back to rendering only `nextRun`, putting the event at an apparently random time slot.
Fixed`ScheduledEventsService.parseRawJob` now extracts the openclaw `agentId` field (the agent slug recorded at `cron add --agent <slug>` time) onto the parsed shape. `syncAgent` then filters the parsed jobs to only those whose `agentSlug === agent.slug` (jobs created without `--agent` are attributed to whichever Tropic agent has `slug === "main"` — that's openclaw's default-agent target). Cache rows for non-matching agents on the same VM clear on their own next sync via the existing notIn-prune; or click Refresh to flush immediately.
Fixed`ScheduledEventsService.parseRawJob` handles `schedule.kind === "every"` by synthesizing a 5-field cron expression for clean intervals (e.g. `everyMs: 600000` → `*/10 * * * *`, `everyMs: 3_600_000` → `0 * * * *`, `everyMs: 7_200_000` → `0 */2 * * *`). Intervals that do not map cleanly onto a cron step (e.g. every 90 min) leave `cron` null and the grid falls back to `nextRun`; an acceptable trade-off for the rarer case.
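One way the interval-to-cron synthesis could work, reproducing the three examples above. The divisibility rules (minutes must divide 60, hours must divide 24) are an assumption about what counts as a "clean" interval, and `everyMsToCron` is a hypothetical name.

```typescript
// Returns a 5-field cron expression for clean intervals, or null when none fits.
function everyMsToCron(everyMs: number): string | null {
  const MINUTE_MS = 60_000;
  if (!Number.isInteger(everyMs) || everyMs <= 0 || everyMs % MINUTE_MS !== 0) {
    return null; // sub-minute or fractional intervals have no 5-field equivalent
  }
  const minutes = everyMs / MINUTE_MS;
  if (minutes < 60) {
    // Only steps that divide the hour evenly repeat at a constant spacing.
    return 60 % minutes === 0 ? `*/${minutes} * * * *` : null;
  }
  if (minutes % 60 !== 0) return null; // e.g. every 90 minutes
  const hours = minutes / 60;
  if (hours === 1) return "0 * * * *";
  // Only steps that divide the day evenly repeat at a constant spacing.
  if (hours < 24 && 24 % hours === 0) return `0 */${hours} * * *`;
  return null;
}
```

The synthesized expression fires on clock boundaries rather than relative to when the job was created, so it is best treated as a rendering aid for the grid.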
v1.30.12026-05-16
  • Scheduler modal channel picker simplified to a single-select dropdown (Main chat / WhatsApp / Telegram / Slack / Discord) instead of the multi-select chip-picker. Matches the underlying `openclaw cron` model (one `--channel` per job) and makes the common case obvious. Edit mode keeps the channel field disabled — to change the channel, cancel and recreate.
Changed`ui/components/scheduled/event-modal.tsx`: replaced `ChannelPicker` chip-multi-select with `ChannelSelect` single-select dropdown. State is still stored as `channels: ChannelEntry[]` of length 0 or 1 so the API client and backend (which already handle 0/1/N) don't need to change.
RemovedIn-edit "channels-changed" comparison + toast. The picker is now `disabled` in edit mode (channel field can't be changed in place — openclaw job's channel is immutable), so the defensive equality check is no longer needed.
ChangedRead-path grouping in `ScheduledEventsService.assembleEvents` is kept — agents can still write multi-channel batches via the skill (N jobs sharing `--description tropic-group:<uuid>`), and the dashboard groups them back into a single event row. Only the dashboard write-path is single-channel now.
v1.30.02026-05-16
  • Scheduler events can now deliver to multiple channels (WhatsApp / Telegram / Slack / Discord), and the Tropic Scheduler skill now teaches agents to ASK which channel before scheduling. Confirmed against the OpenClaw source that `--channel` is singular, so multi-channel events become N openclaw cron jobs sharing a `tropic-group:<uuid>` description — Tropic groups them back into a single dashboard row on read. Dashboard modal gains a channel chip-picker with a per-channel recipient input; the calendar event blocks render the channel list inline next to the name; the popover shows the delivery destinations.
Added`api/src/marketplace/seeds/tropic-scheduler.md` SCHEDULING block: agents must ASK the user which channel to deliver to (not assume `--channel last`). Documents the flag combo `--session isolated --announce --channel X --to Y`, includes WhatsApp + Telegram examples, and instructs agents to pass `--description "tropic-group:<uuid>"` (same uuid per channel) for multi-channel batches so Tropic groups them back into one event in the dashboard.
AddedPrisma `ScheduledJobCache`: new `description`, `group_id` (parsed from `tropic-group:<uuid>` prefix), `channel`, `channel_to`, `channel_account` columns + `group_id` index. Migration `20260516160000_cache_channels_group` is additive (no data loss). Applied to prod DB manually since `fly.toml` `release_command` is disabled.
Added`api/src/scheduled-events/dto/create-cron-job.dto.ts`: new `ChannelDeliveryDto` (channel/to/account) + `channels?: ChannelDeliveryDto[]` array on `CreateCronJobDto` (max 8 entries).
Changed`ScheduledEventsService.create` — when `channels[]` is set, forces `--session isolated --announce`, generates a shared `tropic-group:<uuid>` description for >1 channels, and runs `openclaw cron add` once per channel. Partial failures across the batch leave the partials in place rather than rolling back; user sees them after sync and can cancel from the dashboard. After all writes succeed, syncs the agent once and re-reads the assembled multi-job event from cache.
Changed`ScheduledEventsService.cancel/update/runNow` — now operate on a logical event id. Resolved to a job-set via the new `resolveEventJobs` helper: if the id matches a `group_id` in the cache, fan out to every job in the group; otherwise treat it as a single job id. Cancel returns `{ removed: N }`; runNow returns `{ fired: N }`. Update does NOT support channel set changes (different number of underlying jobs) — caller must cancel + recreate.
Added`ScheduledEventsService.parseRawJob` + `assembleEvents` — split the old `normaliseJob` into (a) per-job row extraction (handles real openclaw JSON shape: `schedule.kind ∈ at/cron/every`, `sessionTarget`, `payload.message`, `delivery.{channel,to,accountId}`, `state.nextRunAtMs/lastRunAtMs`) and (b) grouping by `groupId` into logical events with a `channels[]` aggregate.
ChangedAPI response shape: `ScheduledEvent` now carries `id` (logical event id = groupId for multi-channel, jobId otherwise), `jobIds: string[]`, `groupId: string | null`, and `channels: { channel, to, account }[]`. Old code paths that used `event.id` as the openclaw job id still work for single-channel events (id stays the jobId).
Added`ui/components/scheduled/event-modal.tsx`: new "Deliver result to" section with chip-picker for WhatsApp / Telegram / Slack / Discord; each selected channel gets a recipient input with channel-specific placeholder (E.164 for WhatsApp, chat id for Telegram, etc). Session dropdown only shown when no channels are selected (channels force `--session isolated` server-side). Edit mode disables the channel picker — channel changes require cancel + recreate.
Changed`ui/components/scheduled/calendar-grid.tsx`: event blocks show the channel slugs inline (`→ whatsapp, telegram`) so the user can tell at a glance where a scheduled event will deliver.
Changed`ui/components/scheduled/event-popover.tsx`: new Delivery row listing each `{channel, to}` for the event.
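The read-side grouping in this release can be sketched as follows. The `tropic-group:<uuid>` prefix and the `id` / `jobIds` / `groupId` / `channels` response fields come from the entries above; the `ParsedJob` shape and the function bodies are simplified assumptions, not the real `parseRawJob` / `assembleEvents`.

```typescript
interface ParsedJob {
  jobId: string;
  description: string | null;
  channel: string | null;
  to: string | null;
  account: string | null;
}

interface LogicalEvent {
  id: string; // groupId for multi-channel events, jobId otherwise
  jobIds: string[];
  groupId: string | null;
  channels: { channel: string; to: string | null; account: string | null }[];
}

const GROUP_PREFIX = "tropic-group:";

// Extracts the group id from a job description, or null when the job is ungrouped.
function parseGroupId(description: string | null): string | null {
  if (!description || !description.startsWith(GROUP_PREFIX)) return null;
  const id = description.slice(GROUP_PREFIX.length).trim();
  return id.length > 0 ? id : null;
}

// Folds per-job rows into logical events, one per group (or one per ungrouped job).
function assembleEvents(jobs: ParsedJob[]): LogicalEvent[] {
  const byId = new Map<string, LogicalEvent>();
  for (const job of jobs) {
    const groupId = parseGroupId(job.description);
    const id = groupId ?? job.jobId;
    let evt = byId.get(id);
    if (!evt) {
      evt = { id, jobIds: [], groupId, channels: [] };
      byId.set(id, evt);
    }
    evt.jobIds.push(job.jobId);
    if (job.channel) {
      evt.channels.push({ channel: job.channel, to: job.to, account: job.account });
    }
  }
  return Array.from(byId.values());
}
```

Because a single-channel event keeps its job id as `id`, callers that pass `event.id` straight to openclaw continue to work for that case.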
v1.29.02026-05-16
  • Scheduler reads now come from a Postgres cache instead of fanning out N SSM/SSH calls per dashboard poll. v1.28 had the dashboard run `openclaw cron list --json` against every alive agent on every list request — at 30s poll cadence and N=5 agents that is 600 SSM calls/hr per opened dashboard, even when no schedules changed. New shape: `SchedulerSyncService` (`@Cron(EVERY_5_MINUTES)`, pg_try_advisory_lock-gated so two Fly machines do not double-tick, concurrency-capped at 8 in-flight VMs) polls each alive agent on the 5-min boundary and upserts the rows into a new `scheduled_job_cache` table; dashboard list is now a single Postgres query that joins per-agent sync state. Mutations stay write-through — create/update/cancel run `openclaw cron …` on the VM and then call `syncAgent` on that one agent before returning, so dashboard changes appear immediately without waiting for the 5-min tick. A "Last synced N ago" label + a Refresh button (on-demand sync over the caller's alive agents) live above the calendar. Events from agents with stale or never-synced caches render with a dashed border + reduced opacity, so the user can tell at a glance "we have not heard from this VM in a while".
AddedPrisma: new `ScheduledJobCache` model (`agent_id`, `job_id`, `cron`, `at`, `tz`, `message`, `session`, `status`, `next_run`, `last_run`, `raw` JSONB; unique on `(agent_id, job_id)`, indexed on `organization_id`). New `Agent.scheduleSyncedAt` + `Agent.scheduleSyncError` track per-agent sync watermark. Migration `20260516120000_scheduled_job_cache` adds the table + columns; the v1.28-era `20260516000000_drop_scheduled_events` was applied alongside it to clear the orphan table.
Added`api/src/scheduled-events/scheduler-sync.service.ts` — background `@Cron(EVERY_5_MINUTES)` worker. `pg_try_advisory_lock(6710924512)` gates against multi-instance double-firing (different key from the v1.26 SchedulerTickService lock, so a stale lock from that era cannot block us). Iterates every alive agent across every org with a fixed-size worker pool (CONCURRENCY=8) so we do not fire 50 SSM calls simultaneously. Per-agent failure stamps `scheduleSyncError` and continues.
Added`ScheduledEventsService.syncAgent(agent)` — single-VM cache refresh. Runs `openclaw cron list --json`, defensively parses, upserts every job row, prunes cache rows for jobs the VM no longer reports, and stamps the agent's `scheduleSyncedAt` (clears `scheduleSyncError` on success).
Added`ScheduledEventsService.refreshForOrg(clerkId, orgId)` — fans out `syncAgent` across every alive agent the caller can see. Returns `{ok, failed, total}` so the UI Refresh button can toast a count.
Added`POST /api/orgs/:orgId/scheduled-events/refresh` controller endpoint. Declared BEFORE `@Post(":id/run")` so NestJS pattern-matching does not interpret "refresh" as a job id.
Changed`ScheduledEventsService.listForOrg` — no more VM fan-out. Reads cached rows from `scheduled_job_cache`, joins `Agent.scheduleSyncedAt`/`Agent.scheduleSyncError`, returns `{events, agents: [{agentId, agentName, lastSyncedAt, syncError, stale}]}`. The `stale` flag fires for sync >15min ago or never-synced agents.
Changed`ScheduledEventsService.create/update/cancel` — still call `openclaw cron …` on the VM (write-through), then invoke `syncAgent` on that one agent before returning so the cache (and therefore the next dashboard poll) reflects the change without waiting for the 5-min background tick.
Changed`ui/lib/api/scheduled-events.ts` — `ListResponse.unreachable` (v1.28) replaced by `ListResponse.agents` (per-agent sync state + stale flag). New `refreshScheduledEvents` fetcher + `useRefreshScheduledEvents` mutation hook.
Changed`ui/app/(dashboard)/scheduled/page.tsx` — header now shows "Last synced N ago · X agent(s) never synced" pulled from `data.agents[].lastSyncedAt`; new Refresh button next to prev/next/Today (spinning icon while in flight) calls the refresh endpoint and toasts the result. Dropped the v1.28 amber "X agent(s) unreachable" banner in favour of the staleness label + per-event dashed border.
Changed`ui/components/scheduled/calendar-grid.tsx` — new `agentStale: (agentId) => boolean` prop. Event blocks for stale agents render with `border: dashed` and ~0.55 opacity (or 0.4 if also past). Tooltip on hover adds " — VM data stale" so the user knows why it is dimmed.
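The staleness rule from `listForOrg` (sync older than 15 minutes, or never synced) and the dimming applied by the grid can be sketched as below. The threshold and opacity values come from the entries above; the function names and the treatment of non-stale blocks are assumptions.

```typescript
const STALE_AFTER_MS = 15 * 60 * 1000;

// An agent is stale when it has never synced or its last sync is more than 15 minutes old.
function isStale(lastSyncedAt: Date | null, now: Date): boolean {
  if (lastSyncedAt === null) return true;
  return now.getTime() - lastSyncedAt.getTime() > STALE_AFTER_MS;
}

// Style overrides for a stale event block; non-stale blocks get no override in this sketch.
function staleBlockStyle(
  stale: boolean,
  isPast: boolean,
): { borderStyle: "dashed"; opacity: number } | null {
  if (!stale) return null;
  return { borderStyle: "dashed", opacity: isPast ? 0.4 : 0.55 };
}
```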
v1.28.22026-05-16
  • Rename: the v1.28.1 marketplace skill was called "Openclaw Cron" which conflates Tropic's opinionated wrapper with the OOTB OpenClaw daemon. Renamed to "Tropic Scheduler" to make ownership clear — the underlying mechanism is still `openclaw cron`, but the skill is Tropic-shipped guidance about how to use it (session defaults, dashboard integration, confirm-back pattern).
Changed`api/src/marketplace/seeds/openclaw-cron.md` → `api/src/marketplace/seeds/tropic-scheduler.md`. Frontmatter `name` is now "Tropic Scheduler"; slug derives to `tropic-scheduler`. Body reframed: opening paragraph distinguishes Tropic's opinionated skill from the OOTB OpenClaw daemon; the dashboard-integration section is clearer that the dashboard reads `openclaw cron list --json` directly from each VM (no separate "register with Tropic" step).
ChangedThe `openclaw-cron` marketplace row from v1.28.1 will become orphan in the DB (seeds upsert, they do not delete). Harmless — no one has installed it yet. To clean up: `DELETE FROM marketplace_skills WHERE slug = 'openclaw-cron';` whenever convenient.
v1.28.12026-05-16
  • Ships an `Openclaw Cron` marketplace skill so existing agents (especially the auto-created Main Workspace, which has no template and therefore never receives the v1.28.0 AGENTS.md additions) get the in-context instructions on how to use `openclaw cron`. The skill is plain markdown — it documents the CLI, when to reach for it, the difference between `--session main/isolated/current`, and a cron-expression cheat sheet, and it confirms back to the user explicitly that the Tropic dashboard reads the same store. It shows up in the Skills marketplace drawer on the Agents page; the user clicks Install on a per-instance basis.
Added`api/src/marketplace/seeds/openclaw-cron.md` — new marketplace skill seed. No clawhub_slug (Tropic-shipped, not on ClawHub), no post_install_commands (the CLI is built into the openclaw binary). `MarketplaceService.installSkillOnVm` / `installSkillOnContaboInstance` already handle this shape: they write `SKILL.md` to `~/.openclaw/skills/openclaw-cron/SKILL.md` and restart the gateway.
v1.28.02026-05-16
  • Scheduler rewrite — Tropic stops scheduling, OpenClaw runs it. Previous architecture had a Tropic-side Postgres table (`ScheduledEvent`) + a once-per-minute Fly cron that pushed prompts over the gateway WebSocket at fire time. v1.26-v1.27 smoke-testing exposed it as the wrong model: OpenClaw already has its own cron daemon (`openclaw cron …`), agents that want to schedule something have to use it anyway, and the Tropic-side store was a parallel universe with separate failure modes (WS routing, gateway origin, etc). New shape: dashboard CRUD is a thin RPC over `openclaw cron <verb> --json` shelled to each VM via the existing `InstanceExecService` (handles EC2/SSM, contabo/SSH, Android/HTTP). Listing fans out to every alive agent in the org in parallel; stopped/unreachable agents are reported in the response and rendered as an amber banner above the calendar. No more tick worker, no more Tropic-side scheduling table, no more dead AgentTask rows in Mission Control. The dashboard composer was rewritten from RRULE to 5-field cron (the format OpenClaw natively speaks); structured Daily/Weekly/Monthly compose to cron, with raw cron as the Custom escape hatch.
  • Every deployed agent's AGENTS.md now carries a `# SCHEDULING — MANDATORY` block instructing the model to use `openclaw cron` for any future/recurring work rather than rolling its own loop. Anything the agent schedules surfaces in the dashboard the same way user-created jobs do — single store, two interfaces.
Removed`api/src/scheduled-events/scheduler-tick.service.{ts,spec.ts}` — the @Cron(EVERY_MINUTE) worker that pushed prompts via MessagingService. OpenClaw cron handles firing on the VM directly.
Removed`api/src/scheduled-events/lib/rrule-helpers.{ts,spec.ts}` — RFC 5545 RRULE expansion. Replaced by 5-field cron in `ui/lib/cron.ts`.
Removed`api/src/scheduled-events/{scheduled-events-owner.controller.ts,scheduled-events.controller.ts}` — the dual-controller architecture (owner Clerk vs skill InstanceToken). Replaced by a single dashboard controller; agents never call back to Tropic for scheduling anymore.
Removed`api/src/scheduled-events/auth/instance-token.guard.{ts,spec.ts}` — only the deleted skill-side controller used it.
Removed`api/src/marketplace/seeds/scheduler.md` + `api/docker/scheduler-cli.py` + `api/packer/scripts/install.sh:268` skill-install line + `api/docker/openclaw.Dockerfile:135` COPY — the v1.26 Python CLI that POSTed to `/api/scheduled-events`. Agents now use the in-binary `openclaw cron`.
Removed`ScheduledEvent` Prisma model + the `organization.scheduledEvents` and `agent.scheduledEvents` back-relations. Migration `20260516000000_drop_scheduled_events/migration.sql` drops the table when migrations are re-enabled; the orphan table is harmless until then because nothing reads or writes it.
Removed`api/scripts/{install-scheduler-on-one,backfill-scheduler-skill}.ts` — one-shot scripts that pushed the v1.26 skill to running VMs. No longer relevant.
Added`api/src/scheduled-events/scheduled-events.service.ts` — replaced. Now wraps `openclaw cron list/add/edit/rm/run --json` via `InstanceExecService.runShell` with the standard `OcShellHelper.userOpsPreamble` (gives the EC2 vs contabo `$AS_OC_USER` shape). `listForOrg` fans out across alive agents in parallel (`Promise.all`); per-agent failures collect into an `unreachable` array rather than failing the whole call. `normaliseJob` pulls fields defensively from the raw OpenClaw JSON (`id|jobId|job_id`, `cron|schedule`, etc) so we tolerate minor shape drift.
Added`api/src/scheduled-events/scheduled-events.controller.ts` — replaced. `GET /orgs/:orgId/scheduled-events[?agentId=…]`, `POST/PATCH/DELETE /orgs/:orgId/scheduled-events/[:id]`, plus `POST /orgs/:orgId/scheduled-events/:id/run` for manual fire. PATCH/DELETE/run take `agentId` as a query param because the job id alone doesn't reveal which VM holds it.
Added`api/src/scheduled-events/dto/{create,update}-cron-job.dto.ts` — new DTO shape: `agentId`, `name`, exactly one of `cron` (5-field) or `at` (ISO), `tz`, `message`, `session: main|isolated|current`.
Changed`api/src/scheduled-events/scheduled-events.module.ts` — now imports `InstanceExecModule` + `CommonModule` instead of the deleted `MessagingModule`. No tick worker provider.
Added`api/src/agents/agents.service.ts` (deploy path, ~line 408) — appends a `# SCHEDULING — MANDATORY` block to every deployed agent's `AGENTS.md` directing them to use `openclaw cron add/list/rm`. Includes a quick reference with the four most common invocations.
Added`ui/lib/cron.ts` + 13 unit tests in `ui/lib/cron.test.ts` (runnable via `npx tsx --test`). Pure compose/parse/summarize for the structured Daily/Weekly/Monthly modes; Custom passes raw through; anything we can't parse round-trips as Custom.
Changed`ui/components/scheduled/cron-composer.tsx` (replaces `rrule-composer.tsx`) — five modes: One-off (uses date+time picker above), Daily, Weekly+BYDAY, Monthly day-of-month, Custom raw cron. Each structured mode has its own time picker so recurring jobs don't implicitly inherit "now".
Changed`ui/components/scheduled/event-modal.tsx` — field shape: `cron` / `at` / `message` / `session` instead of `rrule` / `at` / `prompt` / `catchup`. Modal sends one of `cron` or `at` based on composer state.
Changed`ui/components/scheduled/{calendar-grid,event-popover}.tsx` — event rendering now keys off the `cron` field. Grid expands the structured patterns client-side for the visible week (Daily/Weekly/Monthly) and falls back to `nextRun` for Custom cron and one-shots. Popover shows the human-readable schedule + `nextRun` + `lastRun` + status + `Run now` action.
Added`ui/app/(dashboard)/scheduled/page.tsx` — amber notice "N agent(s) unreachable" surfaces the `unreachable` array from the API so the user knows if a stopped VM is hiding jobs. Calendar still renders for the rest.
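For reference, composing the structured modes into 5-field cron might look like the sketch below. It is not the contents of `ui/lib/cron.ts`; the `CronSchedule` type, the 0 = Sunday weekday numbering, and `composeCron` are assumptions for illustration.

```typescript
type CronSchedule =
  | { mode: "daily"; hour: number; minute: number }
  | { mode: "weekly"; hour: number; minute: number; days: number[] } // 0 = Sunday .. 6 = Saturday
  | { mode: "monthly"; hour: number; minute: number; dayOfMonth: number }
  | { mode: "custom"; raw: string };

// Field order is: minute hour day-of-month month day-of-week.
function composeCron(s: CronSchedule): string {
  switch (s.mode) {
    case "daily":
      return `${s.minute} ${s.hour} * * *`;
    case "weekly": {
      if (s.days.length === 0) throw new Error("weekly schedule needs at least one day");
      const days = s.days.slice().sort((a, b) => a - b).join(",");
      return `${s.minute} ${s.hour} * * ${days}`;
    }
    case "monthly":
      return `${s.minute} ${s.hour} ${s.dayOfMonth} * *`;
    case "custom":
      return s.raw.trim(); // raw cron passes through unchanged
  }
}
```

One-off events are not covered here because they are sent as `at` (ISO) rather than `cron`.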
v1.27.32026-05-15
  • Scheduler fires now reach OpenClaw on community/contabo instances. Smoke-tested a one-off event against a contabo agent at 23:00 SGT — the tick worker fired correctly (Message row written, lastFiredAt advanced) but the row immediately flipped to status=failed and OpenClaw never saw it. Root cause: `connectToGateway` in `gateway-ws-client.ts` only had three branches (`wireguardIp && !publicIp` → port 18789, `tunnelUrl` → Cloudflare tunnel, `publicIp` → port 80 nginx) and contabo rows carry the community HOST publicIp, so the picker always fell through to `ws://<host>:80/ws`. Contabo containers serve the gateway directly on port 18789 over the per-host WireGuard subnet (10.99.0.0/16) with no nginx, so the connect failed and the outer catch marked the message failed. Same gap that v1.26.3/v1.26.4 fixed for `VmService.callGatewayRpc`; this brings `connectToGateway` to the same routing logic.
Fixed`api/src/messaging/gateway-ws-client.ts:connectToGateway` — routing now mirrors `VmService.callGatewayRpc`: `wireguardIp` starting with `10.100.` → port 18789 + Origin api.tropic.bot (Fly WireGuard for local Tropic instances); `10.99.` → port 18789 + Origin `<subdomain>.vm.tropic.bot` (contabo containers — api.tropic.bot is not on their allowedOrigins list); other `wireguardIp` with no publicIp → defensive fallback to :18789; `tunnelUrl` → Cloudflare tunnel; `publicIp` → EC2 :80. WireGuard is now preferred over a sibling publicIp because contabo rows carry the host's publicIp which is not the container's gateway port.
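The routing precedence described above can be sketched as a pure picker. The prefixes, ports, and origin hosts are taken from the entry; the `InstanceRow` and `GatewayTarget` shapes and the function name are hypothetical, and the sketch returns host, port, and origin rather than a full URL because the exact WebSocket path on port 18789 is not stated here.

```typescript
interface InstanceRow {
  wireguardIp?: string | null;
  publicIp?: string | null;
  tunnelUrl?: string | null;
  subdomain?: string | null;
}

type GatewayTarget =
  | { kind: "wireguard"; host: string; port: number; originHost?: string }
  | { kind: "tunnel"; url: string }
  | { kind: "public"; host: string; port: number };

// WireGuard is checked first so a sibling publicIp (the contabo host's) cannot win.
function pickGatewayTarget(inst: InstanceRow): GatewayTarget | null {
  const wg = inst.wireguardIp ?? "";
  if (wg.startsWith("10.100.")) {
    // Fly WireGuard for local Tropic instances.
    return { kind: "wireguard", host: wg, port: 18789, originHost: "api.tropic.bot" };
  }
  if (wg.startsWith("10.99.")) {
    // Contabo containers: api.tropic.bot is not on their allowedOrigins list.
    const originHost = inst.subdomain ? `${inst.subdomain}.vm.tropic.bot` : undefined;
    return { kind: "wireguard", host: wg, port: 18789, originHost };
  }
  if (wg && !inst.publicIp) {
    return { kind: "wireguard", host: wg, port: 18789 }; // defensive fallback
  }
  if (inst.tunnelUrl) return { kind: "tunnel", url: inst.tunnelUrl };
  if (inst.publicIp) return { kind: "public", host: inst.publicIp, port: 80 }; // EC2 nginx
  return null;
}
```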
v1.27.22026-05-15
  • Scheduler agent-filter dropdown follow-up. The native `<select>` was rendering every Agent row in the org, including the long tail of historical `Main Workspace` agents whose bound instances had already been terminated — producing a vertical wall of duplicate names. Replaced with the existing searchable `DropdownSelect`, fed by an `aliveAgents` memo that excludes agents whose own status is `terminated`/`failed`/`error` AND agents bound to instances in those terminal states. Lookup-by-id (used to resolve the popover's agent name on a historical scheduled event) still spans every agent so old events keep rendering their original agent name.
Changed`ui/app/(dashboard)/scheduled/page.tsx`: agent filter now uses `<DropdownSelect>` from `ui/components/ui/dropdown-select.tsx`. Items come from a new `aliveAgents` memo filtering out `DEAD_AGENT_STATUSES = {terminated, failed, error}` and `DEAD_INSTANCE_STATUSES = {terminated, failed}`. Order is preserved from `/agents` (already returned `orderBy: createdAt desc` server-side), so newest agents appear at the top — same convention as the Agents page. `__all` sentinel value maps to the existing `agentFilter === ''` semantics. The create/edit modal's agent picker now consumes `aliveAgents` too.
Added`DropdownItem` subtitle is populated with the agent's bound instance name (or `provisioning` if the agent is still spinning up) so duplicates with the same agent name are distinguishable in the dropdown without expanding the column.
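A minimal sketch of the `aliveAgents` filter and the dropdown subtitle. The two status sets are the ones named above; the `AgentRow` shape and the helper names are assumptions, not the page's actual types.

```typescript
const DEAD_AGENT_STATUSES = new Set(["terminated", "failed", "error"]);
const DEAD_INSTANCE_STATUSES = new Set(["terminated", "failed"]);

interface AgentRow {
  id: string;
  name: string;
  status: string;
  instance?: { name: string; status: string } | null;
}

// Drops agents that are dead themselves or bound to a dead instance; input order is preserved.
function filterAliveAgents(agents: AgentRow[]): AgentRow[] {
  return agents.filter((agent) => {
    if (DEAD_AGENT_STATUSES.has(agent.status)) return false;
    if (agent.instance && DEAD_INSTANCE_STATUSES.has(agent.instance.status)) return false;
    return true;
  });
}

// Subtitle shown under the agent name so same-named agents can be told apart.
function agentSubtitle(agent: AgentRow): string {
  return agent.instance ? agent.instance.name : "provisioning";
}
```

Lookup by id for historical events would keep using the unfiltered list, so old events still resolve their agent name.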
v1.27.12026-05-15
  • Scheduler calendar UX follow-ups after v1.27.0 smoke. Grid now covers the full 24 hours (was 6 AM–10 PM) with the body scrolling internally inside the page and an initial scroll jump to 8 AM so the first view is the morning. The `+ Add` button is gone; hovering an empty slot now shows a dashed ghost block with the snapped HH:MM, and clicking opens the create modal pre-filled with that day + time. Snap granularity is 30 minutes. Empty state only renders when the caller cannot create (member role) — owners/admins always see the grid, so they can click their way to a new event even when nothing is scheduled yet.
Changed`ui/components/scheduled/calendar-grid.tsx`: `HOUR_START=0`, `HOUR_END=24`; body wrapped in a `max-h-[70vh] overflow-y-auto` div with an initial scrollTop of `(8 - HOUR_START) * HOUR_HEIGHT` set via ref + useEffect.
Added`ui/components/scheduled/calendar-grid.tsx`: new `onSlotClick(slot: { date, time })` prop. Day columns get `onMouseMove` / `onMouseLeave` / `onClick` handlers that snap `clientY - rect.top` to 30-min increments via a `snapSlotAt` helper, render an indigo dashed ghost block at the snap position (showing the HH:MM), and emit `localDateString(dayStarts[i], tz) + snapped time` on click. Event-block onClicks now `stopPropagation` so clicking an existing event doesn't also trigger slot-create.
Added`ui/components/scheduled/event-modal.tsx`: new `defaultAt?: { date, time }` prop. When opening in create mode, prefers `defaultAt` over the current time so a slot click lands exactly on the user's intended cell.
Changed`ui/app/(dashboard)/scheduled/page.tsx`: dropped the `+ Add` button; wires `onSlotClick` (gated on `canCreate`) into the grid, holds the picked slot in `slotDefault` state, threads it through to the modal's `defaultAt`, and clears on modal close. EmptyState now only shows when there are no events AND the caller can't create — owners/admins see the empty 24-hr grid so the click-to-add affordance is always available.
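The slot snapping can be illustrated with a small pure helper. `HOUR_START`, `HOUR_END`, and the 30-minute granularity come from the entries above; the `HOUR_HEIGHT` value, the choice to floor rather than round, and the body of `snapSlotAt` are assumptions.

```typescript
const HOUR_START = 0;
const HOUR_END = 24;
const HOUR_HEIGHT = 48; // assumed pixel height of one hour row
const SNAP_MINUTES = 30;

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Maps a vertical offset inside a day column (clientY - rect.top) to a snapped "HH:MM".
function snapSlotAt(offsetY: number): string {
  const gridMinutes = (HOUR_END - HOUR_START) * 60;
  const rawMinutes = (offsetY / HOUR_HEIGHT) * 60;
  // Clamp so the last selectable slot starts 30 minutes before the end of the grid.
  const clamped = Math.min(Math.max(rawMinutes, 0), gridMinutes - SNAP_MINUTES);
  // Floor so the ghost block covers the half-hour cell under the pointer.
  const snapped = Math.floor(clamped / SNAP_MINUTES) * SNAP_MINUTES + HOUR_START * 60;
  return `${pad2(Math.floor(snapped / 60))}:${pad2(snapped % 60)}`;
}

// Initial scroll position that puts 8 AM at the top of the scrollable body.
const INITIAL_SCROLL_TOP = (8 - HOUR_START) * HOUR_HEIGHT;
```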
v1.27.02026-05-15
  • Scheduler v2 — dashboard calendar + actual firing. Two real changes shipped together. (1) New `/scheduled` route under the Control sidebar: week/day grid, agent dropdown, click-to-edit popover, +Add modal with a structured RRULE composer (Never / Daily / Weekly+BYDAY / Monthly+BYMONTHDAY / Custom raw RRULE). Backed by a new owner-auth controller `/api/orgs/:orgId/scheduled-events/*` with full CRUD; agent is identified by UUID and scoped to the URL org (cross-org agent ids are rejected). Member role is read-only on agents they own; owners/admins get full CRUD. (2) Scheduler tick worker no longer dead-ends: v1 wrote `AgentTask` rows that nothing read, so fires never reached OpenClaw. The tick now calls `MessagingService.sendMessageToInstance` fire-and-forget — same path as chat. Writes a `Message` row (audit trail in chat history), auto-starts a stopped EC2 VM, opens the gateway WebSocket, pushes the prompt. The dead AgentTask write is removed.
Added`api/src/scheduled-events/scheduled-events-owner.controller.ts` — dashboard-side CRUD at `/api/orgs/:orgId/scheduled-events/*`. Auth via global Clerk guard; URL `:orgId` checked against `req.auth.orgId` to reject cross-org access before any service call.
Added`api/src/scheduled-events/scheduled-events.service.ts` — parallel `createForOwner / listForOrg / getForOwner / updateForOwner / cancelForOwner` family. Resolves caller membership for the URL org, role-gates (member = read-only over agents they own; owners/admins = full CRUD), resolves agent by UUID with `organizationId` filter (cross-org leak guard). Populates `ScheduledEvent.createdByUserId` on dashboard-created events. Skill-path methods unchanged.
Added`api/src/scheduled-events/dto/{create,update}-for-owner.dto.ts` — `agentId` (UUID) replaces `agentSlug`. Same field constraints as the skill DTOs otherwise.
AddedOwner-path service tests in `scheduled-events.service.spec.ts`: createForOwner happy/createdByUserId, non-member rejected, member-create rejected, cross-org agent rejected, listForOrg agent-filter scoping, member-default-scope (only own agents), member-rejected-on-other-agent, cancelForOwner owner-any vs member-own.
Changed`api/src/scheduled-events/scheduler-tick.service.ts` — fire path now calls `MessagingService.sendMessageToInstance(agent.userId, agent.instanceId, evt.prompt, { agentSlug })` fire-and-forget instead of writing an unread `AgentTask` row. `lastFiredAt` is advanced before the push so a slow/failing send can't double-fire on the next tick. Agents without `instanceId` or `slug` skip without advancing the watermark so the next tick retries. `ScheduledEventsModule` now imports `MessagingModule`. Tick spec rewritten: 5 tests including positive "no AgentTask write" assertion and gateway-failure non-rollback.
Changed`api/src/organizations/organizations.service.ts:getCurrent` now returns `timezone` so the dashboard can render scheduled events in the org's local time. `CurrentOrg` interface in `ui/lib/api/organizations.ts` matched.
Added`ui/app/(dashboard)/scheduled/page.tsx` — main page: agent dropdown (incl. "All agents"), Week/Day toggle, prev/next/Today nav, +Add button (owner/admin only), polls every 30 s. Empty state shows the `scheduler add` CLI invocation.
Added`ui/components/scheduled/{calendar-grid,event-modal,event-popover,rrule-composer}.tsx` + `ui/lib/{api/scheduled-events,rrule}.ts`. Calendar grid is built from scratch (~150 LOC) — hour rows × day columns, absolute-positioned event blocks, 50% opacity for past occurrences, deterministic agent → HSL hue. Grid expands DAILY / WEEKLY+BYDAY / MONTHLY+BYMONTHDAY occurrences in the visible window; custom RRULEs render only dtstart.
Added`ui/lib/rrule.ts` — pure compose/parse/summarize helpers for the structured Repeat radio group. 14 unit tests in `ui/lib/rrule.test.ts` (runnable via `npx tsx --test`).
Changed`ui/components/sidebar.tsx` — new "Schedule" entry under the Control submenu (between Policies and Secrets). `ui/proxy.ts` adds `/scheduled(.*)` to `isProtectedRoute`.
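The "deterministic agent → HSL hue" mapping mentioned for the calendar grid could be done with any stable string hash. The sketch below uses a simple multiply-by-31 rolling hash; the hash choice, the saturation and lightness values, and the function names are assumptions, not the grid's actual code.

```typescript
// Maps an agent id to a hue in [0, 360) that is stable across renders and sessions.
function agentHue(agentId: string): number {
  let hash = 0;
  for (let i = 0; i < agentId.length; i++) {
    // >>> 0 keeps the running value in unsigned 32-bit range.
    hash = (hash * 31 + agentId.charCodeAt(i)) >>> 0;
  }
  return hash % 360;
}

// Assumed fixed saturation/lightness; only the hue varies per agent.
function agentColor(agentId: string): string {
  return `hsl(${agentHue(agentId)}, 65%, 50%)`;
}
```

Deriving the color from the id, rather than from list position, keeps an agent's color stable when other agents are added or filtered out.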
v1.26.112026-05-15
  • Brand cleanup: every user-facing surface now refers to community-tier VMs as "Community" rather than naming the underlying provider. The terminal sidebar badge was rendering the raw `i.type` field, which displayed correctly for ec2/local but leaked the backend name for community VMs; it is now mapped to `COMMUNITY` at display time. The agent card "Type" field was `Community (Provider Container)` and is now just `Community`. Past v1.26.x release notes have also been retagged to drop the provider name in prose and replace verbose internal-symbol references where they appeared in user-visible copy. Admin surfaces (Reserved Hosts panel) are unchanged — they still show provider details since only the operator sees them.
Fixed`ui/components/terminal/terminal-sidebar.tsx`: badge for community VMs now displays `COMMUNITY` instead of the raw `type` field.
Fixed`ui/app/(dashboard)/agents/page.tsx`: Advanced → Type for community VMs is now `Community` (was `Community (… Container)`).
Changed`ui/app/(marketing)/docs/releases/page.tsx`: every prose mention of the provider name in v1.26.x entries replaced with `Community` / `community`.
v1.26.102026-05-15
  • Two community terminal follow-ups from v1.26.9. (1) `Failed to open shell: Cannot parse privateKey: Unsupported key format` — community provisions an ed25519 keypair in PKCS#8 PEM (`crypto.generateKeyPairSync("ed25519", { privateKeyEncoding: { format: "pem", type: "pkcs8" } })`), which ssh2's parser rejects for ed25519. The decrypted PEM is now re-serialised to OpenSSH PEM (`sshpk.parsePrivateKey(pem, "pkcs8").toString("ssh")`) before being handed to the SSH client. (2) The community provision flow leaves `ManagedInstance.name` at the schema default `""`, so the sidebar entry rendered with an empty label. The terminal listing now applies the same `"Community VM" / "Cloud VM" / "Local Machine"` fallback that the agents page uses, so the sidebar matches the rest of the UI.
Fixed`api/src/terminal/terminal.service.ts:resolveHostAndUser` (community branch) now runs `sshpk.parsePrivateKey(pem, "pkcs8").toString("ssh")` on the decrypted container key. ssh2 accepts the resulting OpenSSH PEM block; PKCS#8 PEM was failing parse for ed25519.
Fixed`api/src/terminal/terminal.service.ts:listRunningInstancesForUser` mapper falls back to `"Community VM"` (community), `"Cloud VM"` (ec2), `"Local Machine"` (local) when `instance.name` is empty, mirroring the existing agents-page fallback.
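The label fallback described above can be sketched as a small pure function. This is a minimal illustration, not the actual mapper: `ListedInstance`, `FALLBACK_LABELS`, and `displayLabel` are hypothetical names, and only the three labels themselves come from the release note.

```typescript
// Hypothetical row shape; only the fields the fallback needs.
type ListedKind = "ec2" | "local" | "community";

interface ListedInstance {
  type: ListedKind;
  name: string; // the schema default is "" when provisioning never sets it
}

// Labels quoted from the release note.
const FALLBACK_LABELS: Record<ListedKind, string> = {
  community: "Community VM",
  ec2: "Cloud VM",
  local: "Local Machine",
};

// Return the stored name, or the per-type fallback when the name is empty.
function displayLabel(instance: ListedInstance): string {
  return instance.name !== "" ? instance.name : FALLBACK_LABELS[instance.type];
}
```

Keeping the labels in one lookup table is one way to let the sidebar and the agents page share a single definition.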
v1.26.92026-05-15
  • Terminal page now lists community instances and opens an interactive shell into them. The list query in `TerminalService.listRunningInstancesForUser` had `{ type: "ec2", status: "running" } | { type: "local", status: "online" }` and silently dropped community. Now includes `{ type: "community", status: "running" }`. `verifyOwnershipAndRunning` accepts community, and `resolveHostAndUser` returns `host = communityHost.publicIp`, `username = "tropic"`, `port = containerSshHostPort` (the per-container forwarded port; the container's SSH server listens on 22 internally), and `privateKey =` the decrypted contents of `instance.metadata.containerSshPrivateKeyEnc`. The terminal gateway now consumes those per-instance fields when present (port + privateKey) instead of hardcoding `port: 22` and the org-level SSH key.
  • tmux is added to the community image. Existing containers do not have tmux until they are restarted with the latest image, so the shell command for community is now `command -v tmux >/dev/null 2>&1 && exec tmux new-session -A -s tropic-main || exec bash -l` — terminal works immediately on old images (login shell) and upgrades to tmux on the next restart-with-latest.
Added`api/docker/openclaw.Dockerfile`: `tmux` added to the base apt-get install line so the community image ships with the same persistent-session shell as the EC2 AMI.
Added`api/src/terminal/terminal.service.ts:ResolvedHost` now exposes optional `port` and `privateKey` fields. `resolveHostAndUser` returns them for community (`port = containerSshHostPort`, `privateKey = credentials.decrypt(metadata.containerSshPrivateKeyEnc)`); ec2 / local leave both undefined and inherit the previous defaults (22, org key).
Added`api/src/terminal/terminal.service.ts:buildShellCommand(platformType, instanceType?)` now takes the instance type so community can run a tmux-or-bash fallback that works on images both with and without tmux.
Fixed`api/src/terminal/terminal.service.ts:listRunningInstancesForUser`: community added to the `OR` filter; the row mapper defaults `platformType` to `"Linux"` for community (no setup-callback to populate it) and marks `terminalReady: true`. Sidebar response shape now permits `type: "community"`.
Fixed`api/src/terminal/terminal.service.ts:verifyOwnershipAndRunning` accepts `(community, status="running")` and `include`s `communityHost.publicIp` so resolution does not need a second query.
Fixed`api/src/terminal/terminal.gateway.ts:onWsOpenSSH` reads `resolved.port` (defaulting to 22) and uses `resolved.privateKey` when set, skipping the org-key lookup. The host-key pin and idle/heartbeat timers are unchanged.
Added`ui/components/terminal/terminal-sidebar.tsx` + `ui/lib/api/terminal.ts`: `TerminalInstance.type` extended to `"ec2" | "local" | "community"`; sidebar renders the lucide `Container` icon in violet for community entries.
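A rough sketch of how the optional `port` and `privateKey` fields could be consumed with the defaults described above (22 and the org key). `ResolvedHostSketch`, `SshConnectOptions`, `toSshOptions`, and the `loadOrgKey` callback are illustrative names, not the real gateway code.

```typescript
// Mirrors the fields the release note lists for ResolvedHost.
interface ResolvedHostSketch {
  host: string;
  username: string;
  port?: number;       // set for community (the forwarded container port)
  privateKey?: string; // set for community (the decrypted per-container key)
}

interface SshConnectOptions {
  host: string;
  port: number;
  username: string;
  privateKey: string;
}

// loadOrgKey is a hypothetical callback standing in for the org-key lookup.
function toSshOptions(
  resolved: ResolvedHostSketch,
  loadOrgKey: () => string,
): SshConnectOptions {
  return {
    host: resolved.host,
    port: resolved.port ?? 22,
    username: resolved.username,
    // `??` short-circuits, so the org-key lookup is skipped when a
    // per-instance key is present.
    privateKey: resolved.privateKey ?? loadOrgKey(),
  };
}
```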
v1.26.82026-05-15
  • Gateway URL now appears on community agent cards under Advanced, matching the EC2 layout. The block already keyed off `instance.subdomain` (which community populates at provision time), but the outer condition was hardcoded to `isEc2 && isRunning`. Extended to `(isEc2 || isCommunity) && isRunning`. The SSH Command sub-block stays EC2-only because it is gated separately on `vmStatus?.publicIp`, which community does not populate (community SSH is already shown above in its own block as `ssh -i ...pem tropic@<host>:<port>`).
Fixed`ui/app/(dashboard)/agents/page.tsx` Advanced section: outer `{isEc2 && isRunning && (` becomes `{(isEc2 || isCommunity) && isRunning && (`; inner SSH Command guard tightened to `{isEc2 && vmStatus?.publicIp && (` so it does not render an empty section for community.
v1.26.72026-05-15
  • Community instances stuck on "Pending" now auto-recover on poll. `bringUpContainerAsync` is fire-and-forget, so an API restart between SSH-success and the `status="running"` write leaves the row at `pending` indefinitely (and the admin Resync button only re-runs `applyFirstStartConfig`, never touches status). `findAllForUser` now scans for community rows in `pending` with a non-null `wireguardIp`, hits `http://<wgIp>:18789/health` over WireGuard with a 1.5s timeout, and on a 2xx response flips the row to `running` (and backfills `startedAt` if it was unset). Adds at most 1.5s to the polling endpoint while there is a stuck container; zero overhead once everything is reconciled.
Fixed`api/src/instances/instances.service.ts:findAllForUser` — added a per-poll reconciliation pass for community instances stuck on `pending`. `Promise.allSettled` over `fetch(...:18789/health, AbortSignal.timeout(1500))`; on `res.ok` the row is updated to `status="running"` and the in-memory record is patched so the same response reflects the new state without a second DB round-trip.
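The reconciliation pass can be sketched roughly as follows. The row shape and function names are assumptions, and the health probe is injected so the logic stays testable; per the note above, the production probe would be along the lines of `fetch` against `http://<wgIp>:18789/health` with `AbortSignal.timeout(1500)`, returning `res.ok`.

```typescript
// Hypothetical subset of the instance row used by the poll endpoint.
interface PolledRow {
  id: string;
  type: "ec2" | "local" | "community";
  status: string;
  wireguardIp: string | null;
  startedAt: Date | null;
}

// Resolves to true on a 2xx health response; rejects or resolves false otherwise.
type HealthProbe = (wgIp: string) => Promise<boolean>;

// Only community rows stuck on "pending" with a WireGuard IP are candidates.
function stuckCommunityRows(rows: PolledRow[]): PolledRow[] {
  return rows.filter(
    (row) => row.type === "community" && row.status === "pending" && row.wireguardIp !== null,
  );
}

// Patch the in-memory rows so the same response reflects the new state.
// Returns the ids that would also need a DB update.
function applyProbeResults(
  stuck: PolledRow[],
  settled: PromiseSettledResult<boolean>[],
  now: Date,
): string[] {
  const recovered: string[] = [];
  stuck.forEach((row, index) => {
    const outcome = settled[index];
    if (outcome?.status === "fulfilled" && outcome.value) {
      row.status = "running";
      row.startedAt = row.startedAt ?? now; // backfill only when unset
      recovered.push(row.id);
    }
  });
  return recovered;
}

async function reconcilePending(rows: PolledRow[], probe: HealthProbe): Promise<string[]> {
  const stuck = stuckCommunityRows(rows);
  // allSettled: one unreachable container must not fail the whole poll.
  const settled = await Promise.allSettled(stuck.map((row) => probe(row.wireguardIp ?? "")));
  return applyProbeResults(stuck, settled, new Date());
}
```

Because the probes run concurrently, the added latency is bounded by the slowest single probe rather than the number of stuck rows.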
v1.26.62026-05-15
  • WhatsApp on community, take four. Three WhatsApp methods (`assertGatewayReachable`, `waitForWhatsAppLogin`, `getWhatsAppStatus`) used the two-way ternary `instance.type === "ec2" ? vmStatus === "running" : status === "online"` to gate "is this instance up". That collapses local and community into the same branch, but only local sets `status === "online"`; community sets `status === "running"` (mirroring ec2's `vmStatus`). So every community call was being told "Instance must be running to connect WhatsApp" even when the container was up. Replaced with an `isInstanceRunning(instance)` helper that branches three ways: ec2/`vmStatus`, local/`online`, community/`running`.
Added`api/src/vm/vm.service.ts:isInstanceRunning(instance)` — single source of truth for "is this instance up", correctly branching by `instance.type` instead of collapsing local + community.
Fixed`api/src/vm/vm.service.ts` `assertGatewayReachable`, `waitForWhatsAppLogin`, `getWhatsAppStatus`: replaced the inline `instance.type === "ec2" ? vmStatus === "running" : status === "online"` with `this.isInstanceRunning(instance)`.
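A minimal sketch of the three-way check, next to the old two-way ternary for contrast. The `RunnableInstance` shape and the `Sketch`-suffixed names are illustrative; only the per-type status values are taken from the note above.

```typescript
// Hypothetical instance shape: each type carries the field it reports status in.
type RunnableInstance =
  | { type: "ec2"; vmStatus: string | null }
  | { type: "local"; status: string }
  | { type: "community"; status: string };

// Old behaviour: anything that is not ec2 is treated like local.
function legacyIsUp(instance: RunnableInstance): boolean {
  return instance.type === "ec2"
    ? instance.vmStatus === "running"
    : instance.status === "online";
}

// New behaviour: one branch per instance type.
function isInstanceRunningSketch(instance: RunnableInstance): boolean {
  switch (instance.type) {
    case "ec2":
      return instance.vmStatus === "running";
    case "local":
      return instance.status === "online";
    case "community":
      return instance.status === "running";
  }
}
```

With a discriminated union, the compiler flags the `switch` if a fourth instance type is added without a matching branch.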
v1.26.52026-05-15
  • WhatsApp on community, take three. After the v1.26.4 routing fix, QR generation failed with "No instance found" because community containers do not populate `ManagedInstance.instanceId` (that column is the SSM/EC2 handle; community uses `containerName` + `wireguardIp`). Both `assertGatewayReachable` and the `waitForWhatsAppLogin` early-return guard hardcoded a `!instance.instanceId` check that fired on every community call. The check now only applies when `instance.type !== "community"`. Reachability for community is fully determined by `status === "online"` plus a non-null `getGatewayIp(instance)` and `gatewayToken`, which is the right invariant for that instance type.
Fixed`api/src/vm/vm.service.ts` `assertGatewayReachable`: `if (!instance.instanceId)` is now gated on `instance.type !== "community"`. EC2 / local still reject when the SSM ID is missing; community is allowed through to the IP / token check that follows.
Fixed`api/src/vm/vm.service.ts` `waitForWhatsAppLogin`: same `instanceId` requirement now skipped for community via a `needsInstanceId` flag, so the polling endpoint stops returning `{connected:false, message:"Instance not running"}` against running community containers.
v1.26.42026-05-15
  • WhatsApp on community, take two. v1.26.3 wired the right method names but `callGatewayRpc` itself only knew about Fly's WireGuard subnet (10.100.0.0/16) and EC2-via-nginx (port 80). Community containers live on a different per-host WireGuard subnet (10.99.0.0/16) and serve the gateway directly on port 18789 with no nginx fronting it, so every RPC was hitting `ws://10.99.0.X:80/ws` and getting `ECONNREFUSED`. Both `callGatewayRpc` and `startWebLoginAndWaitForQr` now branch on 10.99.* as a third route: port 18789 with `<sub>.vm.tropic.bot` Origin (which the community entrypoint already bakes into `gateway.controlUi.allowedOrigins`).
Fixed`api/src/vm/vm.service.ts` `callGatewayRpc`: `isLocal` (formerly only `vmIp.startsWith("10.100.")`) is now split into `isFlyWireguard` (10.100.*, port 18789, Origin api.tropic.bot, used for local Tropic instances behind Fly's WireGuard) and `isCommunityWireguard` (10.99.*, port 18789, Origin `<sub>.vm.tropic.bot`). The else branch (port 80 + subdomain Origin) remains for ec2.
Fixed`api/src/vm/vm.service.ts` `startWebLoginAndWaitForQr`: same three-way split applied to the WhatsApp pairing WebSocket so QR pairing on community opens against `ws://10.99.0.X:18789/ws` instead of port 80.
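The three-way routing above could look roughly like this. The function and type names are hypothetical, and the `https://` scheme on the Origin values is an assumption (the note gives only the hostnames); the subnets, ports, and `/ws` path are from the entries above.

```typescript
interface GatewayRoute {
  via: "fly-wireguard" | "community-wireguard" | "ec2-nginx";
  port: number;
  originHeader: string;
}

function selectGatewayRoute(vmIp: string, subdomain: string): GatewayRoute {
  if (vmIp.startsWith("10.100.")) {
    // Local Tropic instances behind Fly's WireGuard subnet.
    return { via: "fly-wireguard", port: 18789, originHeader: "https://api.tropic.bot" };
  }
  if (vmIp.startsWith("10.99.")) {
    // Community containers: gateway served directly, no nginx in front.
    return {
      via: "community-wireguard",
      port: 18789,
      originHeader: `https://${subdomain}.vm.tropic.bot`,
    };
  }
  // Everything else is treated as ec2 behind nginx on port 80.
  return { via: "ec2-nginx", port: 80, originHeader: `https://${subdomain}.vm.tropic.bot` };
}

function gatewayWsUrl(vmIp: string, route: GatewayRoute): string {
  return `ws://${vmIp}:${route.port}/ws`;
}
```

Matching on the trailing dot (`"10.99."`) keeps an address such as `10.990.x.x` or `10.9.x.x` from being misrouted.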
v1.26.32026-05-15
  • WhatsApp on community (community tier). The drawer was throwing "instance not registered with SSM" because every WhatsApp surface fell through to AWS SSM, which community containers cannot use. Allowlist read/write and channel-binding writes now branch on `instance.type === "community"` and patch openclaw.json via the gateway RPC (`config.get` / `config.patch`) over WireGuard instead of shelling in. Pairing, status, relink, disconnect were also unreachable for community because the gateway-IP resolution hardcoded `tunnelUrl || publicIp`, neither of which is set on community instance rows; switched to the unified `getGatewayIp()` helper which prefers `wireguardIp`. EC2 behaviour is unchanged.
Fixed`api/src/vm/vm.service.ts` `assertGatewayReachable`: now uses `getGatewayIp(instance)` instead of inline `tunnelUrl || publicIp`. For ec2 the result is identical (no `wireguardIp`, falls through). For community it returns the WireGuard IP, which is the only address we have for the container.
Fixed`api/src/vm/vm.service.ts` `getWhatsAppStatus` (non-local branch): same `tunnelUrl || publicIp` to `getGatewayIp` swap, so community status checks no longer short-circuit to "disconnected".
Fixed`api/src/vm/vm.service.ts` `waitForWhatsAppLogin` (non-local branch): community now resolves to `wireguardIp`. Kept ec2 on `publicIp` only (the `web.login.wait` RPC holds the WS for ~130s and Cloudflare tunnel idle timeout would kill it earlier).
Fixed`api/src/vm/vm.service.ts` `getInstanceWhatsappAllowlist`: added community branch that does gateway RPC `config.get` and reads `channels.whatsapp.allowFrom` from the returned config tree. EC2 keeps the existing `sudo -u ubuntu openclaw config get ... | SSM` path.
Fixed`api/src/vm/vm.service.ts` `setInstanceWhatsappAllowlist`: added community branch that does `config.get` (for `baseHash`) then `config.patch` with the same five canonical keys (`channels.whatsapp.{allowFrom,dmPolicy}` plus `accounts.default.{allowFrom,dmPolicy,selfChatMode}`). Pure channel-config patch with no plugin-entries delta, so the gateway picks up the new allowlist on the next channel-init cycle without restarting an in-flight pairing session.
Fixed`api/src/vm/vm.service.ts` `writeChannelBinding`: added community branch that reads existing bindings via `config.get`, replaces the matching entry, and writes the full array back via `config.patch`. Same dedup/skip-if-unchanged behaviour as the SSM path.
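As a rough illustration of the unified address lookup these fixes rely on: the note says `getGatewayIp()` prefers `wireguardIp` and otherwise falls through to the previous `tunnelUrl || publicIp` behaviour. The field set and the `null` return for "no address" below are assumptions.

```typescript
// Hypothetical subset of the instance row.
interface GatewayAddressFields {
  wireguardIp?: string | null;
  tunnelUrl?: string | null;
  publicIp?: string | null;
}

// Prefer the WireGuard IP; otherwise keep the old `tunnelUrl || publicIp` order.
function getGatewayIpSketch(instance: GatewayAddressFields): string | null {
  return instance.wireguardIp || instance.tunnelUrl || instance.publicIp || null;
}
```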
v1.26.22026-05-15
  • Collapsed sidebar icons now match the expanded sidebar 1:1. Previously the collapsed nav flattened the Control / Observability / Workspace groups into all of their child icons (Policies, Secrets, Terminal, Logs, Metrics, Evaluations, Team, Settings, Backups), none of which appear at the top level when expanded, so the icons you saw while collapsed had no relationship to what you were looking at a moment earlier. Now the collapsed view shows the three group icons; clicking one opens the relevant group and expands the sidebar.
Fixed`ui/components/sidebar.tsx`: `CollapsedNav` no longer flattens groups. It renders each group as a single button using the group icon, with its active state derived from `groupContainsActive`. Clicking a group icon writes `tropic:sidebar:group:{id}:open=1` to localStorage and calls `setCollapsed(false)`, so when `ExpandedNav` mounts the corresponding `SidebarGroup` reads the fresh value and renders open.
v1.26.12026-05-15
  • Scheduler skill follow-ups after v1.26.0 smoke testing. Three real fixes: (1) the shipped CLI POSTed to `/scheduled-events` but the Tropic API has `setGlobalPrefix('api')`, so every call 404'd — caught when testing against the deployed API. CLI now hits `/api/scheduled-events`. (2) Agent resolution was instance-blind: an org with two VMs whose agents share a slug (e.g. both `main`) saw `findFirst({slug, org})` arbitrarily pick one. Now scopes to `{slug, org, instance}` using the `req.instance.id` already attached by `InstanceTokenGuard`. (3) `marketplace.installSkillOnVm` was SSM-only, so the scheduler (and any other skill) couldn't be pushed to community containers. New `installSkillOnCommunityInstance` branch routes through `InstanceExecService` (SSH-via-host); the install loop in `installSkill` and the backfill script now both pick up community. The community image also bakes the scheduler binary at build time since the runtime user has no sudo to write `/usr/local/bin/`.
Fixed`api/src/marketplace/seeds/scheduler.md`: all 5 endpoint paths in the embedded Python CLI now use the `/api/` prefix.
Fixed`api/src/scheduled-events/scheduled-events.service.ts`: `resolveAgent` now takes `instanceId` and filters `{slug, organizationId, instanceId}`. Threaded through `create/listForAgent/getOne/update/cancel`; the controller passes `req.instance.id`. Spec test added covering the cross-instance refusal.
Added`api/src/marketplace/marketplace.service.ts:installSkillOnCommunityInstance`: mirrors `installSkillOnVm` but uses `tropic`-user paths under `/home/tropic`, runs without `sudo`, and restarts the gateway via the in-process CLI (`openclaw gateway restart`). Routed through `InstanceExecService.runShell`. `installSkill`'s candidate query and dispatch loop now include community. `InstanceExecModule` is now wired into `MarketplaceModule`.
Added`api/docker/scheduler-cli.py` — standalone scheduler CLI source. The Dockerfile COPYs it to `/usr/local/bin/scheduler` (chmod 0755) before the sudo-purge step so new community containers ship with the binary. EC2 path still uses the marketplace seed's heredoc.
Changed`api/scripts/backfill-scheduler-skill.ts`: queries also pick up running community instances and dispatch to `installSkillOnCommunityInstance`.
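The instance-scoped agent lookup from fix (2) can be illustrated over an in-memory list. This is a sketch only: the real code filters through Prisma, and `AgentRow`, `resolveAgentUnscoped`, and `resolveAgentScoped` are hypothetical names.

```typescript
interface AgentRow {
  id: string;
  slug: string;
  organizationId: string;
  instanceId: string;
}

// Old lookup: first match on {slug, org}, regardless of which VM owns it.
function resolveAgentUnscoped(agents: AgentRow[], slug: string, organizationId: string): AgentRow | undefined {
  return agents.find((a) => a.slug === slug && a.organizationId === organizationId);
}

// New lookup: the calling instance's id is part of the filter.
function resolveAgentScoped(
  agents: AgentRow[],
  slug: string,
  organizationId: string,
  instanceId: string,
): AgentRow | undefined {
  return agents.find(
    (a) => a.slug === slug && a.organizationId === organizationId && a.instanceId === instanceId,
  );
}
```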
v1.26.02026-05-14
  • New Scheduler skill — the only sanctioned way for OpenClaw agents to schedule future agent runs. Calendar is org-central in Tropic, fired by a Tropic-side @Cron tick worker that creates AgentTasks with `triggerSource=scheduled`. On-VM Python CLI (`scheduler add/list/show/cancel/update`) authenticates via `TROPIC_TELEMETRY_TOKEN` and is auto-installed on every new VM. RFC 5545 RRULE recurrence including DST-aware weekly/daily patterns. Catch-up policy `once|skip` for VMs that were stopped through an occurrence. Organization timezone is now a first-class setting (default Asia/Singapore; new orgs auto-capture the creator's browser timezone). Sondera now also denies runtime writes to systemd-user units and cron drop-dirs as defence-in-depth.
Added`api/src/scheduled-events/` — full NestJS module: ScheduledEventsService (CRUD with org+agent scoping), SchedulerTickService (`@Cron(EVERY_MINUTE)`, `pg_try_advisory_lock` against multi-instance double-fire), REST controller (5 endpoints), InstanceTokenGuard (Bearer = `ManagedInstance.telemetryToken`), DTOs with class-validator, `rrule-helpers.ts` with floating-rrule DST workaround for rrule v2.x TZID quirks. 20 unit tests.
Added`ScheduledEvent` Prisma model + migration; `Organization.timezone` (default Asia/Singapore, backfilled for existing orgs); `AgentTask.triggerSource` (`"user" | "scheduled"`).
Added`api/src/marketplace/seeds/scheduler.md` — marketplace seed bundling the on-VM Python CLI. Pre-installed at AMI build (`api/packer/scripts/install.sh` now uses `openclaw skills install`, replacing the `clawhub install` path).
Added`api/src/vm/sondera-plugin/policy-sondera-base.cedar`: new rules `sondera-deny-systemd-user-write` and `sondera-deny-cron-dir-write` blocking writes to `~/.config/systemd/`, `$HOME/.config/systemd/`, `/home/*/.config/systemd/`, `/etc/cron.{d,daily,hourly,weekly,monthly}/`, and `/etc/crontab` itself.
Added`api/src/settings/`: `GET/POST /settings/org/timezone` (owner-only). UI: Organization Timezone editor in the dashboard settings page.
Added`api/src/auth/auth.service.ts:ensurePersonalOrg` now threads the browser-captured IANA timezone from `auth-token-provider.tsx` into new Organization rows, and backfills pre-existing orgs still on the default.
Added`api/scripts/backfill-scheduler-skill.ts` — one-off backfill that pushes the scheduler skill to every running EC2 / online local instance via SSM.
Changed`MarketplaceService.installSkillOnVm` made public so the backfill script can call it.
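One plausible reading of the `once|skip` catch-up policy, sketched under explicit assumptions: the release note does not define the semantics, so this assumes `once` collapses all occurrences missed while a VM was stopped into a single run (keyed to the latest one) and `skip` drops them. `CatchUpPolicy` and `runsToFire` are hypothetical names.

```typescript
type CatchUpPolicy = "once" | "skip";

// Given the occurrences missed while the VM was stopped, decide what to fire now.
// ASSUMPTION: "once" means a single catch-up run, "skip" means none.
function runsToFire(missedOccurrences: Date[], policy: CatchUpPolicy): Date[] {
  if (missedOccurrences.length === 0 || policy === "skip") {
    return [];
  }
  // Collapse every missed occurrence into one run for the most recent slot.
  const latest = missedOccurrences.reduce((a, b) => (a.getTime() >= b.getTime() ? a : b));
  return [latest];
}
```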
v1.25.502026-05-14
  • OpenAI OAuth on community: tokens were being pushed to `~/.openclaw/workspace/auth-profiles.json` but NOT to `~/.openclaw/agents/main/agent/auth-profiles.json` (which is where the gateway actually reads them for the default agent). The OAuth-complete merge walks `agents/*/agent` and only writes to existing dirs — on a fresh community container, the gateway hadn't created `agents/main/agent` yet at merge time (lazy init on first agent invocation), so it was skipped. Chat then failed with `No API key found for provider "openai-codex"`. Entrypoint now pre-creates the dir AND mirrors any existing workspace profiles into it on every container start — fixes both new provisions and the user's currently-stuck one.
Fixed`api/docker/openclaw-entrypoint.sh`: `install -d -o tropic -g tropic ~/.openclaw/agents/main/agent` before the gateway starts. Future OAuth-complete merges will see the dir and write to it.
Fixed`api/docker/openclaw-entrypoint.sh`: if `workspace/auth-profiles.json` exists and `agents/main/agent/auth-profiles.json` does not, copy the former to the latter. Recovers existing instances stuck with tokens-in-workspace-only without re-doing the OAuth dance.
v1.25.492026-05-14
  • `unauthorized: gateway token mismatch` on community: the Tropic API generates a fresh per-instance `gatewayToken` and persists it to `ManagedInstance.gatewayToken`, but the container entrypoint was never writing that value into `openclaw.json` — the gateway booted with the static token baked into the Dockerfile at image build time (same for every container), so every proxied request failed auth. Fix is the same container-env pattern used for `allowedOrigins`: render `gateway.auth.token` from `$GATEWAY_TOKEN` (already passed via `docker run -e`) before the gateway starts.
Fixed`api/docker/openclaw-entrypoint.sh`: jq-renders `.gateway.auth.token = env.GATEWAY_TOKEN` (with fallback to existing value if unset) alongside the bind + allowedOrigins setup. Gateway boots with the same token the API will proxy with.
Fixed`api/src/community/scripts/tear-down-container.sh`: `wg set wg0 peer X remove` now tolerates an already-removed peer (`|| true`). Under `set -e`, the bare command was aborting the script BEFORE the `docker rm` ran when the peer was already gone — leaving an orphan container squatting on the SSH host port (22001/22002), so the next provision hit "port already allocated". Refresh scripts on the host to pick this up.
v1.25.482026-05-14
  • Per-instance `allowedOrigins` is now rendered by the container entrypoint from `$TROPIC_SUBDOMAIN` (passed via `docker run -e`) instead of being patched into a running container over SSH. Containers now boot deterministically, with no mid-flight `openclaw config set` (which was silently failing for community anyway: the JSON arg with embedded `"`s broke the outer `bash -lc "…"` quoting, so the `<subdomain>.vm.tropic.bot` origin never landed in openclaw.json). The previous SSH-and-mutate resync is gone for allowedOrigins.
Fixed`api/docker/openclaw-entrypoint.sh`: when `TROPIC_SUBDOMAIN` is set, jq-renders `gateway.controlUi.allowedOrigins = ["https://tropic.bot", "https://www.tropic.bot", "https://<sub>.vm.tropic.bot"]` into the volume's openclaw.json BEFORE exec'ing the gateway. Idempotent on every container start.
Changed`api/src/community/scripts/bring-up-container.sh`: takes a new positional arg (subdomain) and passes it as `docker run -e TROPIC_SUBDOMAIN`. `restartWithLatest` re-passes it too, so restarts always reconverge.
Changed`CommunityInstanceService.applyFirstStartConfig`: dropped the allowedOrigins branch entirely. Only model + secret sync remain — that's what the SSH path is actually good at.
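The entrypoint does this rendering with jq in shell; the same idea is sketched here in TypeScript for illustration only. `OriginsConfig` and `renderAllowedOrigins` are hypothetical names, and leaving the config untouched when the subdomain is unset is an assumption drawn from the "when `TROPIC_SUBDOMAIN` is set" wording.

```typescript
// Loose config shape: only the keys this sketch touches are typed.
interface OriginsConfig {
  gateway?: {
    controlUi?: { allowedOrigins?: string[]; [key: string]: unknown };
    [key: string]: unknown;
  };
  [key: string]: unknown;
}

// Overwrite (not append) the origin list, so repeated starts converge on the same file.
function renderAllowedOrigins(config: OriginsConfig, subdomain: string | undefined): OriginsConfig {
  if (subdomain === undefined || subdomain === "") {
    return config; // nothing to render without a subdomain
  }
  const allowedOrigins = [
    "https://tropic.bot",
    "https://www.tropic.bot",
    `https://${subdomain}.vm.tropic.bot`,
  ];
  return {
    ...config,
    gateway: {
      ...config.gateway,
      controlUi: { ...config.gateway?.controlUi, allowedOrigins },
    },
  };
}
```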
v1.25.472026-05-14
  • Actually fixes the gateway crash loop. v1.25.46 was wrong: I assumed the `bind=lan` setting belonged under `gateway.controlUi.bind` (which is what v1.25.42 added) and "fixed" it by dropping the json key entirely, leaving only `--bind lan` on the CLI. But openclaw schema-validates the json at startup BEFORE applying CLI flags, so any existing volume with `gateway.controlUi.bind` in its openclaw.json still crashes the gateway with "Unrecognized key: bind" before the CLI flag is even read. Checked openclaw's own source: the canonical key is top-level `gateway.bind` (same as EC2's `api/packer/scripts/lib/install-tropic-plugins.sh:78`). Dockerfile now writes the right key, and the entrypoint runs a one-shot jq migration on every start to strip the wrong key + assert the right one — fixes both fresh-pulled and existing-volume containers.
Fixed`api/docker/openclaw.Dockerfile`: change the jq mutation from `.gateway.controlUi.bind = "lan"` to `.gateway.bind = "lan"`. Top-level, NOT under `controlUi` — matches what `openclaw` actually accepts.
Fixed`api/docker/openclaw-entrypoint.sh`: run `jq 'del(.gateway.controlUi.bind) | .gateway.bind = "lan"'` on every start. Idempotent migration that repairs existing volumes left in the broken state by v1.25.42-v1.25.46 — without this, `restartWithLatest` pulls the new image but the volume's bad openclaw.json keeps crashing the gateway.
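For readers less familiar with jq, the migration `del(.gateway.controlUi.bind) | .gateway.bind = "lan"` behaves roughly like the TypeScript sketch below. This illustrates the transformation only and is not code from the entrypoint; `BindConfig` and `migrateBindKey` are hypothetical names.

```typescript
interface BindConfig {
  gateway?: {
    bind?: string;
    controlUi?: { bind?: string; [key: string]: unknown };
    [key: string]: unknown;
  };
  [key: string]: unknown;
}

function migrateBindKey(config: BindConfig): BindConfig {
  // .gateway.bind = "lan" (creates `gateway` if it was missing, as jq does)
  const gateway: NonNullable<BindConfig["gateway"]> = { ...config.gateway, bind: "lan" };
  // del(.gateway.controlUi.bind): only touches controlUi when it already exists
  if (gateway.controlUi !== undefined) {
    const controlUi = { ...gateway.controlUi };
    delete controlUi.bind;
    gateway.controlUi = controlUi;
  }
  return { ...config, gateway };
}
```

Running it on an already-migrated config yields the same config, which is what makes it safe to execute on every container start.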
v1.25.462026-05-14
  • Fresh-pulled Community containers were stuck in a crash loop (`Restarting (1)`) on the new ECR image. The openclaw gateway rejected `gateway.controlUi.bind = "lan"` in `openclaw.json` ("Unrecognized key: bind") and exited every ~2s, so port 22001 was always ECONNREFUSED, applyFirstStartConfig couldn't reach the container, and Connect OpenAI / Set Model / dashboard polling all timed out. EC2's systemd unit uses `--bind lan` as a CLI flag on `openclaw gateway run`, not a JSON key — the container was using the wrong knob for the same setting. Removed the JSON mutation and moved the flag onto the entrypoint, matching EC2.
Fixed`api/docker/openclaw-entrypoint.sh`: pass `--bind lan` to `openclaw gateway run`. The gateway now binds to 0.0.0.0 (all interfaces, including the docker bridge IP that is reachable via host-side DNAT from wg0) as intended, instead of crashing at startup.
Changed`api/docker/openclaw.Dockerfile`: drop the `.gateway.controlUi.bind = "lan"` jq mutation. That key is rejected by openclaw 2026.5.6 with `startup_failed`; the `--bind` CLI flag is the supported API.
Changed`runDiagnostics` admin endpoint: dump `docker logs --tail 60` for every `tropic-*` container before the network probes, and probe the actual `wg0` allowed-ips instead of hardcoded `10.99.0.19/20`. Cuts the "restarting container" debug loop from blind to one click.
v1.25.452026-05-14
  • OpenClaw container image migrated from GHCR to AWS ECR (private). v1.25.44's `docker logout ghcr.io` fix was wrong — the package is private (intentionally, it bakes in admin pubkey + plugin source), so anon pulls 401. GHCR's authenticated-pull egress quota is also too tight for our rebuild cadence even at normal volume, so any "just rotate the token" patch would re-break under load. Moved to ECR in `ap-southeast-1` (same region as the Asia Community host). Scoped IAM users for pull (Community hosts) and push (GitHub Actions); the `amazon-ecr-credential-helper` on each host fetches 12-hour ECR tokens transparently.
FixedProvisioning now succeeds again. Reverts v1.25.44's `docker logout ghcr.io` (which forced anon pulls and 401'd on the private package) — `bring-up-container.sh` is back to a plain `docker pull "$IMAGE_REF"`, but the image is now pulled from `721731392376.dkr.ecr.ap-southeast-1.amazonaws.com/tropic-openclaw:latest`.
Changed`.github/workflows/build-openclaw-image.yml`: replaced GHCR login + push with `aws-actions/configure-aws-credentials@v4` + `aws-actions/amazon-ecr-login@v2`. Uses the scoped `tropic-ecr-push` IAM user (push-only access to `tropic-openclaw`). GHCR_WRITE_TOKEN no longer used.
Changed`host-bring-up.sh`: drop `docker login ghcr.io`. Install `amazon-ecr-credential-helper`, write `/root/.docker/config.json` with `credHelpers` for the ECR registry, and `/root/.aws/credentials` + `/root/.aws/config` with the scoped pull-IAM creds. Helper fetches a fresh 12-hour ECR auth token on each `docker pull` automatically — no persistent `docker login`, no manual rotation.
Changed`community-host.service.ts:bootstrapHost`: exports `AWS_ECR_PULL_ACCESS_KEY_ID`, `AWS_ECR_PULL_SECRET_ACCESS_KEY`, `AWS_ECR_REGION`, `AWS_ECR_REGISTRY` instead of `GHCR_PULL_USER`/`GHCR_PULL_TOKEN`. Spec updated.
ChangedImage-ref env var: prefer `OPENCLAW_IMAGE_REF` (registry-neutral); fall back to legacy `GHCR_OPENCLAW_IMAGE` for the cutover. Image-ref resolution centralized in `resolveImageRef()` / `resolveOpenclawImageRef()`.
AddedAdmin endpoint `POST /community/instances/hosts/:id/setup-ecr` — installs the ECR credential helper and writes AWS creds on an already-bootstrapped host (without re-running full bootstrap, which would clobber WireGuard state). Idempotent. Smoke-tests by invoking `docker-credential-ecr-login get` on the registry URL. Used to migrate the existing Asia host onto ECR.
AddedAWS resources: ECR repo `tropic-openclaw` in `ap-southeast-1` with `scanOnPush=true` and a lifecycle policy keeping the latest 30 images. Two scoped IAM users: `tropic-ecr-pull` (read-only on `tropic-openclaw`, used by Community hosts), `tropic-ecr-push` (push-only on `tropic-openclaw`, used by GitHub Actions).
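The image-ref precedence during the cutover can be sketched as below. The env var names come from the entry above; the function name, the env-as-parameter shape, and treating an empty string as unset are assumptions.

```typescript
// Prefer the registry-neutral variable; fall back to the legacy GHCR one.
function resolveImageRefSketch(env: Record<string, string | undefined>): string | undefined {
  return env["OPENCLAW_IMAGE_REF"] || env["GHCR_OPENCLAW_IMAGE"] || undefined;
}
```

Passing the environment in as an argument keeps the helper easy to exercise without mutating `process.env`.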
v1.25.442026-05-14
  • New provisions failed with `503 Egress is over the account limit` from GHCR. GitHub Container Registry counts *authenticated* pulls against the GitHub account's package egress quota — even for public packages. The host has been logged into ghcr.io since bootstrap (via `GHCR_PULL_TOKEN`, set up for the case of a future private image), so every `docker pull` in `bring-up-container.sh` was authenticated and burned the Priminus account's budget. With the rapid release cadence the past few days, we burned through the quota.
Fixed`api/src/community/scripts/bring-up-container.sh`: insert `docker logout ghcr.io 2>/dev/null || true` before `docker pull "$IMAGE_REF"`. Forces an anonymous pull — public-package quota isn't charged against any account. Idempotent (the logout is `|| true` so it works whether or not the host is logged in).
v1.25.432026-05-14
  • "Upgrade OpenClaw" now works for community instances too. The existing button on the agent card (which had been EC2/local-only and called `openclaw update` in place via SSM) now dispatches by `instance.type` at the controller — community routes to `restartWithLatest`, which pulls `ghcr.io/priminus/openclaw:latest`, recreates the container, and reattaches the same persistent volume. State on `/home/tropic` (openclaw.json, agents, authorized_keys, etc.) survives untouched. `applyFirstStartConfig` runs after so the new image's defaults get the per-instance subdomain re-applied. About 30s gateway downtime — no SSH key rotation, no data loss.
Changed`api/src/vm/vm.controller.ts:upgradeOpenClaw`: looks up `instance.type`, dispatches community → `CommunityInstanceService.restartWithLatest`, others → existing `VmService.upgradeOpenClaw` (SSM `openclaw update --no-onboard`). VmController now injects `CommunityInstanceService` and `PrismaService`.
ChangedConfirmed `openclaw update` exists as the in-place upgrade path for EC2/local (handles git+npm install, doctor, gateway restart, has `--dry-run`, `--channel`, `--tag` options). For community the in-place CLI would be wiped on the next container restart (immutable image), so image-pull is the correct upgrade primitive.
v1.25.422026-05-14
  • New community containers now ship with the right `gateway.controlUi.allowedOrigins` — no more "origin not allowed" when the proxied iframe loads. Fixed at two levels so new containers don't have this problem regardless of which path they boot through: the static `https://tropic.bot` + `https://www.tropic.bot` origins are baked into the image, and `applyFirstStartConfig` appends the per-instance `<subdomain>.vm.tropic.bot` on first start (also runs from `restartWithLatest` so config drift on an existing container is fixed by a restart).
Changed`api/docker/openclaw.Dockerfile`: the base `jq` that sets up `openclaw.json` now also seeds `.gateway.controlUi.allowedOrigins = ["https://tropic.bot", "https://www.tropic.bot"]` and forces `.gateway.controlUi.bind = "lan"` so the gateway listens on the container's docker bridge IP (reachable via the host's WG DNAT) instead of 127.0.0.1.
Changed`api/src/community/community-instance.service.ts:applyFirstStartConfig`: new step that runs `openclaw config set gateway.controlUi.allowedOrigins '[...]' --strict-json` with both the static dashboard origins and the per-instance `https://<subdomain>.vm.tropic.bot`. Runs before the model-set + secret-sync steps so the gateway restart at the end picks up everything in one cycle. Mirrors the EC2 path (`vm.service.ts:5063`).
Changed`api/src/community/community-instance.service.ts:restartWithLatest`: also calls `applyFirstStartConfig` after the new container is up so a `Restart with latest` re-applies the config (lets existing instances pick up image/policy changes without losing the volume).
v1.25.412026-05-14
  • `/gateway` polls `/openclaw/gateway/health` (a *different* endpoint from `/gateway-status` I added the community branch to in v1.25.34). `checkGatewayHealth` only had `local` and `ec2` branches; for community it fell through to the EC2 `!instance.instanceId` check (community has no SSM ID) and returned `{ ready: false, reason: 'VM not running' }` immediately — never even probed the gateway. That's why the page polled forever with the spinner. Conntrack confirmed the underlying WG path works (`TIME_WAIT [ASSURED]` entry from `src=10.100.0.1 dst=10.99.0.20 dport=18789` proved a full HTTP cycle had completed end-to-end).
Fixed`api/src/vm/vm.service.ts:checkGatewayHealth`: new `instance.type === 'community'` branch that probes `http://${wireguardIp}:18789/` with a 5s timeout, matching the existing `local` branch.
v1.25.402026-05-14
  • Diagnose v3: `DOCKER-USER -i wg0` counter incremented (9 pkts) confirming Fly→host traffic arrives at the host's wg0 interface. But Fly's curl still times out. Either the request reaches the container but the reply is dropped, or conntrack/MASQUERADE is rewriting the reply's source so Fly drops it as unsolicited. New diagnose dumps `conntrack -L` filtered to the container/Fly/WG IPs, `rp_filter` settings, and tcpdumps both `wg0` and `docker0` for 20s on port 18789 so we can see the SYN going in and (hopefully) the SYN-ACK coming back — and check whether the response source matches what Fly expects.
Changed`api/src/community/community-host.service.ts:runDiagnostics`: install `conntrack` if missing, dump entries touching 172.17.0.2 / 10.99.0.x / 10.100.0.x. Dump `net.ipv4.conf.{wg0,all,docker0}.rp_filter`. Capture 20s of port-18789 traffic on both wg0 and docker0 into separate pcaps and dump both.
v1.25.392026-05-14
  • Diagnose now runs `tcpdump -i wg0` for 8s as the final step. The v1.25.38 output confirmed the `DOCKER-USER -i wg0 -j ACCEPT` rule is in place but counter=0 — no wg0 traffic has hit it. Either Fly isn't sending the packet, or Fly is sending it but the encrypted UDP is dropped between Fly and the Community host. tcpdump on wg0 will show definitively whether anything arrives — Fly's gateway-status poller fires every ~2s while `/gateway` is open, so 8s captures multiple attempts if any are flowing.
Changed`api/src/community/community-host.service.ts:runDiagnostics`: installs `tcpdump` if missing, captures 8s of wg0 traffic to a pcap, parses and prints up to 30 packets at the end of the diagnose output. Need `/gateway` open with Community VM selected (or any other source of traffic to `10.99.0.x`) to see captured frames.
v1.25.382026-05-14
  • Diagnose now dumps the `DOCKER-USER` + `DOCKER-FORWARD` chain contents and probes the gateway from the host on both the bridge IP AND the WG IP (`10.99.0.x:18789`). The first two confirm whether the v1.25.37 `DOCKER-USER -i wg0 -j ACCEPT` rule is actually in place; the WG-IP probe tests whether PREROUTING DNAT works from the host's own perspective (different code path than from Fly across the WG tunnel).
Changed`api/src/community/community-host.service.ts:runDiagnostics`: added `iptables -L DOCKER-USER -n -v`, `iptables -L DOCKER-FORWARD -n -v`, and a curl probe to `http://10.99.0.19:18789/health` / `http://10.99.0.20:18789/health` from the host itself.
v1.25.372026-05-14
  • Docker 26+'s `DOCKER-FORWARD` chain was silently dropping WG→container packets. The full chain looked right — WG handshake, kernel routes, `PREROUTING -d 10.99.0.20/32 -j DNAT --to-destination 172.17.0.2`, `net.ipv4.ip_forward=1`, the per-container `FORWARD … -d 172.17.0.2 -j ACCEPT` rules in place — but the FORWARD chain hit `DOCKER-FORWARD` first. Docker 26 added hardening rules there that drop traffic from non-bridge interfaces to *unpublished* container ports. Port 18789 is intentionally not published (`-p 22001:22` only publishes SSH), so packets arriving via `wg0` after DNAT got dropped before our ACCEPTs ever saw them. Insert an explicit `DOCKER-USER -i wg0 -j ACCEPT` (which runs *before* `DOCKER-FORWARD`) to bypass that filtering for the WG path.
Fixed`api/src/community/scripts/host-bring-up.sh`: after WG peering, idempotently insert `iptables -I DOCKER-USER -i wg0 -j ACCEPT` and `iptables-save > /etc/iptables/rules.v4`. New hosts will get this from the start.
Fixed`api/src/community/community-host.service.ts:wireFlyPeer`: same iptables rule added to the host-side commands, so clicking `Wire Fly peer` on the admin panel patches existing hosts without needing a re-bootstrap.
ChangedDiagnose output from v1.25.36 confirmed: gateway was reachable at `curl http://172.17.0.2:18789/health` (200 in 12ms) directly from the host, but 530 forwarded packets accumulated on DOCKER-FORWARD with 0 pkts on the per-container ACCEPT rules. Pinpointed the layer that needed bypassing.
v1.25.362026-05-14
  • Diagnose button on the Reserved Hosts admin panel. Fly side of the WG plumbing is now confirmed working (peer + handshake + kernel route all present, traffic counters incrementing) but the container at 10.99.0.20:18789 still times out — somewhere on the host side the chain is broken (UFW FORWARD policy, iptables PREROUTING, container not listening on 0.0.0.0:18789, missing route, etc). Click Diagnose to dump WG state + ip route + iptables nat-rules + docker container/bridge + a localhost gateway health probe from inside the host so we can see exactly which hop is dropping packets without round-tripping a Fly SSH session.
Added`api/src/community/community-host.service.ts:runDiagnostics`: runs a fixed shell script over the host's SSH and returns the captured output. Reports `wg show wg0`, `ip route`, `iptables -t nat -S`, `iptables -L FORWARD -n -v`, `sysctl net.ipv4.ip_forward`, `docker ps`, each container's docker bridge IP, and `curl -m 3 http://<bridge>:18789/health` from the host.
Added`GET /community/instances/hosts/:id/diagnostics` (superadmin) + `useHostDiagnostics` hook + Diagnose button in the Reserved Hosts panel that renders the output in a `<pre>` block.
v1.25.352026-05-14
  • WG tunnel was up but no traffic flowed. `wg show wg0` on Fly showed `peer 217.216.74.174:51820, latest handshake 1m ago, transfer: 98 KB received, 32 KB sent` — fully handshaked. But `curl 10.99.0.20:18789/health` from Fly timed out. Cause: `wg set ... allowed-ips=10.99.0.0/16` only tells WireGuard *which peer to encrypt for*; it does NOT add a kernel route. Only `wg-quick up` does, and Fly + the host both use bare `wg set`. Without `ip route add 10.99.0.0/16 dev wg0` on Fly (and `10.100.0.0/16 dev wg0` on the host) the kernel sends packets out the default gateway and they get dropped.
Fixed`api/src/wireguard/wireguard.service.ts:addCommunityHostPeer`: after `wg set peer …`, also `ip route add 10.99.0.0/16 dev wg0`. Idempotent (treats "File exists" as success). Runs at every app boot via `loadExistingPeers`, so a redeploy of the API will backfill the route on Fly without needing a separate admin action.
Fixed`api/src/community/community-host.service.ts:wireFlyPeer` and `api/src/community/scripts/host-bring-up.sh`: after `wg set` on the host, also `ip route add 10.100.0.0/16 dev wg0`. Without this, the return path from container → host → Fly was missing: even if forward traffic worked, the response would never reach Fly because the host had no route for 10.100.0.x.
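The "treats `File exists` as success" behaviour for `ip route add` can be sketched as a small result classifier. This is a minimal illustration, not the code in `wireguard.service.ts`: `isRouteAddOk` and `routeAddCommand` are hypothetical names, and the sketch assumes the exec helper surfaces the command's exit code and stderr.

```typescript
// Hypothetical helper: decide whether `ip route add <cidr> dev wg0` succeeded.
// iproute2 reports an already-present route as "RTNETLINK answers: File exists"
// with a non-zero exit code; for an idempotent backfill that counts as success.
function isRouteAddOk(exitCode: number, stderr: string): boolean {
  if (exitCode === 0) return true;
  return /File exists/.test(stderr);
}

// Assumed command shape, built from the subnets named in this entry.
function routeAddCommand(cidr: string, dev: string = "wg0"): string {
  return `ip route add ${cidr} dev ${dev}`;
}
```

Any other non-zero exit (for example a permissions error) still counts as a failure, so a genuinely missing route is not hidden by the idempotence check.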
v1.25.342026-05-14
  • `getGatewayStatus` always returned `stopped` for community. Two reasons: line 4330 short-circuits to `stopped` when `!instance.instanceId` (community has no SSM ID), and the HTTP health-check hits `${publicIp}:80/health` which is empty for community (the gateway is reached via `wireguardIp:18789` through the WG tunnel now that v1.25.32 wired peering). `/gateway` page polls this forever and shows "Waiting for gateway to be ready..." indefinitely.
Added`api/src/vm/vm.service.ts:getGatewayStatus`: new branch for `instance.type === 'community'` that checks `instance.status === 'running' && instance.wireguardIp` then calls `checkGatewayViaHttp(wireguardIp, ..., 18789)`. Verified locally: `curl http://localhost:18789/health` returns 200 against the community image.
Changed`api/src/vm/vm.service.ts:checkGatewayViaHttp`: added optional `port` parameter (default 80) so the community branch can target the gateway directly on its native port — EC2 path still hits :80 via nginx as before.
v1.25.332026-05-13
  • Proxy `isAvailable` / `isOnline` checks didn't know about community. Both spots fell through to `instance.vmStatus === 'running'` which is null for community rows, so `/gateway` for a Community VM bailed with `Instance is null, not available` and the browser surfaced `Pre-authentication failed. Please refresh the page`.
Fixed`api/src/proxy/proxy.service.ts:preAuthenticate` and `proxyDirectly`: per-type status check — `ec2 → vmStatus === "running"`, `local → status === "online"`, `community → status === "running"`. Community branch was missing in both spots.
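The per-type status check described here could look roughly like the following. It is a sketch under assumptions: `InstanceRow` and `isInstanceAvailable` are hypothetical names, and only the three type/status pairs come from this entry.

```typescript
type InstanceType = "ec2" | "local" | "community";

// Hypothetical row shape; only the fields the check needs.
interface InstanceRow {
  type: InstanceType;
  status: string | null;   // used by local and community rows
  vmStatus: string | null; // null for community rows, per this entry
}

// One status rule per instance type, instead of falling through to vmStatus.
function isInstanceAvailable(inst: InstanceRow): boolean {
  switch (inst.type) {
    case "ec2":
      return inst.vmStatus === "running";
    case "local":
      return inst.status === "online";
    case "community":
      return inst.status === "running";
    default:
      return false; // unknown types are never treated as available
  }
}
```

Keeping the rule in a single function means `preAuthenticate` and `proxyDirectly` cannot drift apart the next time an instance type is added.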
v1.25.322026-05-13
  • Wired the Fly↔Community-host WireGuard peering that was missing from the original architecture. Both sides ran WireGuard (`wg0` on Fly at 10.100.0.1/16, `wg0` on each Community host at 10.99.255.1/24 listening on :51820) but no `[Peer]` block ever connected them — packets from Fly to 10.99.0.0/16 had nowhere to go. With this in place: `/gateway` now lists community VMs, the gateway proxy routes Fly → host (WG) → iptables DNAT → container:18789, and `setInstanceModel` / OAuth-push that go via `instanceExec.runShell` get the proper WG route too (no longer needing the v1.25.25 public-IP workaround).
Added`api/src/wireguard/wireguard.service.ts:addCommunityHostPeer(pubkey, endpoint)`: registers a Community host on Fly's wg0 with `allowed-ips=10.99.0.0/16`, `endpoint=host:port`, `persistent-keepalive=25`. Distinct from `addPeer` which is for single-host /32 peers (EC2/local).
Changed`api/src/wireguard/wireguard.service.ts:loadExistingPeers`: at app boot, also loads Community hosts from `communityHost` table as /16 peers. Skips community `managedInstance` and `wireguard_peers` rows (container WG pubkeys were being incorrectly registered as black-hole /32 peers — the container doesn't run WireGuard itself; routing goes through the host).
Changed`api/src/community/community-host.service.ts:bootstrapHost`: now passes `FLY_WG_PUBLIC_KEY` (from `WIREGUARD_SERVER_PUBLIC_KEY`) to the host script and calls `wireguard.addCommunityHostPeer(...)` after bootstrap to register the new host on Fly's wg0 immediately.
Changed`api/src/community/scripts/host-bring-up.sh`: accepts `FLY_WG_PUBLIC_KEY` env var and after `wg-quick up wg0` does `wg set wg0 peer "$FLY_WG_PUBLIC_KEY" allowed-ips 10.100.0.0/16 persistent-keepalive 25 && wg-quick save wg0`. Skipped silently when empty (legacy bootstraps).
Added`api/src/community/community-host.service.ts:wireFlyPeer(hostId)` + `POST /community/instances/hosts/:id/wire-fly` (superadmin) + `Wire Fly peer` button on the Reserved Hosts panel. One-shot to backfill the peering on hosts bootstrapped before v1.25.32.
Changed`api/src/proxy/proxy.service.ts`: new `instance.type === 'community'` branch that uses `wireguardIp:18789`. Same path as local now that WG actually routes there.
Changed`ui/app/(dashboard)/gateway/page.tsx:isConnectable` and `instanceLabel` now accept community (status=running, tag = "Community"). `/gateway` dropdown shows Community VMs alongside Cloud and Local.
v1.25.312026-05-13
  • The amber "No API key configured" banner on the agent card was firing for community instances with copy that mentioned "Local instances require your own Anthropic API key", which was wrong on two counts. Hid the banner for community and tightened the self-hosted copy.
Fixed`ui/app/(dashboard)/agents/page.tsx:1390`: banner gate is now `!isEc2 && !isCommunity && !isExternal && !hasClaudeKey`. Community auto-pushes whichever org-level secrets exist via `applyFirstStartConfig` (v1.25.24) and the active model might not even be Anthropic; naming Anthropic specifically there was misleading. Self-hosted copy reworded to "Self-hosted instances" instead of "Local instances".
v1.25.302026-05-13
  • "Connect OpenAI" on a community instance failed with `Failed to start OAuth`. v1.25.24 relaxed `completeOpenAIOAuthInner` to accept community but `startOpenAIOAuth` still had the original `if (!instance.instanceId)` guard — so the *start* of the flow 400'd before any PKCE state was ever generated. Same fix pattern: accept community when `wireguardIp + containerName` are set.
Fixed`api/src/vm/vm.service.ts:startOpenAIOAuth`: replaced the SSM-only `!instance.instanceId` check with the full transport guard (EC2 instanceId OR Android runner OR community wireguardIp+containerName). Mirrors the v1.25.23 `setInstanceModel` and v1.25.24 `completeOpenAIOAuthInner` fixes — the trio of OAuth-touching endpoints now consistently accept community.
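A minimal sketch of the "full transport guard" shared by the three OAuth-touching endpoints, assuming a row shape like the one below. `TransportFields` and `hasTransport` are hypothetical names; the field names are the ones mentioned in the v1.25.23 and v1.25.24 entries.

```typescript
// Hypothetical subset of the instance row used by the guard.
interface TransportFields {
  type: string;
  instanceId?: string | null;    // EC2 (SSM)
  runnerToken?: string | null;   // Android runner
  tunnelUrl?: string | null;     // Android runner
  wireguardIp?: string | null;   // community
  containerName?: string | null; // community
}

// True when at least one channel exists to push a change to the instance.
function hasTransport(i: TransportFields): boolean {
  const viaSsm = Boolean(i.instanceId);
  const viaRunner = Boolean(i.runnerToken && i.tunnelUrl);
  const viaCommunity =
    i.type === "community" && Boolean(i.wireguardIp && i.containerName);
  return viaSsm || viaRunner || viaCommunity;
}
```

Sharing one predicate like this is one way to avoid the pattern seen across v1.25.23, v1.25.24 and this release, where each endpoint had to be relaxed separately.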
v1.25.292026-05-13
  • New `Clean orphans` admin button on the Reserved Hosts panel. The old pre-v1.25.28 terminate flow marked instance rows as `terminated` but never ran `docker rm` on the container or `docker volume rm` on its data volume — those orphans kept holding host ports 22001/22002 so subsequent provisions failed with `Bind for 0.0.0.0:22001 failed: port is already allocated`. The button SSHes to the host, lists every `tropic-*` container and `tropic-*-data` volume, cross-references with DB rows in pending/running/stopped, and removes anything that isn't backed by an active row.
Added`api/src/community/community-host.service.ts:cleanOrphanContainers`: lists `docker ps -a` and `docker volume ls` for the `tropic-*` namespace over SSH, intersects with `ManagedInstance` rows where `communityHostId=hostId AND status IN (pending,running,stopped)`, and `docker rm -f` / `docker volume rm` the leftovers. Returns `{ removed: string[], kept: string[] }`.
Added`POST /community/instances/hosts/:id/clean-orphans` (superadmin) → `useCleanOrphans` hook → `Clean orphans` button in `reserved-hosts-panel.tsx` next to `Refresh scripts`. Toasts `Removed N orphan(s) (M active kept)`.
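The cross-reference step amounts to a set difference between what exists on the host and what the database still considers active. The sketch below assumes that active rows are keyed by container name and that a data volume is named `<container>-data`; `planOrphanCleanup` is a hypothetical name, and the real method also runs the `docker rm -f` / `docker volume rm` commands.

```typescript
interface CleanResult {
  removed: string[];
  kept: string[];
}

// onHost: container and volume names listed on the host.
// activeNames: container names backed by pending/running/stopped rows (assumed key).
function planOrphanCleanup(onHost: string[], activeNames: Iterable<string>): CleanResult {
  const active = new Set(activeNames);
  const removed: string[] = [];
  const kept: string[] = [];
  for (const name of onHost) {
    if (!name.startsWith("tropic-")) continue; // never touch anything outside tropic-*
    // Assumption: volume `tropic-x-data` belongs to container `tropic-x`.
    const owner = name.endsWith("-data") ? name.slice(0, -"-data".length) : name;
    (active.has(owner) ? kept : removed).push(name);
  }
  return { removed, kept };
}
```

Computing the plan as a pure function first makes it easy to log or dry-run the removal list before anything destructive runs over SSH.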
v1.25.282026-05-13
  • Remove-instance on a community card was showing the self-hosted SSM/cloudflared cleanup commands — meaningless for a community container the user doesn't own. Worse, the backend `DELETE /instances/:userId/:id` only did SSM-deregister + Cloudflare-tunnel cleanup (neither relevant) and left the actual community container running, the slot occupied, and the Stripe quantity bumped. Now the user gets a plain "Terminate community instance?" confirmation, and the backend delegates to `CommunityInstanceService.terminateInstance` which actually stops the container, destroys the volume, frees the slot, releases the seat, and triggers the drain check.
Fixed`api/src/instances/instances.service.ts:remove`: detects `instance.type === 'community'` and delegates to `CommunityInstanceService.terminateInstance(id)` early. No SSM/Cloudflare/WG cleanup attempts on a community instance (none apply). `InstancesModule` now imports `CommunityModule` (with `forwardRef`) and `InstancesService` injects `CommunityInstanceService`.
Changed`ui/app/(dashboard)/agents/page.tsx:handleRemoveInstance`: for community, skip the rich SSM-cleanup dialog and open the simple `confirmAction` dialog with title "Terminate community instance?" and a clear "Cannot be undone" description. The cleanup-commands dialog is still used for self-hosted (`type === 'local'`) where the user does own the device.
Added`confirmAction` state now supports `title` and `confirmLabel` overrides so the dialog can match per-flow copy without each type needing its own dialog component.
v1.25.272026-05-13
  • "Restart with latest" is now admin-only. The label was opaque to end users, and the button was throwing 401 anyway (it used bare `fetch` with `credentials: "include"` — same SameSite cross-origin issue as the admin-community hooks before v1.25.12). Removed it from the user-facing instance card and gated the endpoint with `@SuperAdmin()`.
Removed`ui/components/community/restart-with-latest-button.tsx` and its render site at `agents/page.tsx:2113`.
Changed`api/src/community/community.controller.ts:restartWithLatest` now requires `@SuperAdmin()`. It's an internal operation for landing bring-up-script changes (e.g. new `docker run` flags) on an existing instance — caps are fixed at create time so this is the only way to apply a new cap set without losing the volume.
v1.25.262026-05-13
  • sshd inside the community container was killing every connection right after the banner with no visible client-side reason ("Connection closed by … port …"). Repro'd locally with the same docker run flags: server-side stderr showed `chroot("/run/sshd"): Operation not permitted [preauth]`. `--cap-drop=ALL` strips `CAP_SYS_CHROOT`, and sshd's preauth privilege-separation child *requires* it to chroot into `/run/sshd`. Adding `--cap-add=SYS_CHROOT` to the bring-up's `docker run` makes every SSH-driven operation against the container actually work — `setInstanceModel`, `syncAllToInstance`, OAuth token push, end-user SSH, all of them.
Fixed`api/src/community/scripts/bring-up-container.sh`: added `--cap-add=SYS_CHROOT` to the cap-add list. Verified locally: `docker run --cap-drop=ALL --cap-add=… --cap-add=SYS_CHROOT … openclaw:latest`, then `ssh -i admin.pem -p 22099 tropic@127.0.0.1 whoami` → `tropic`. Without the cap the same command got `Connection closed by 127.0.0.1 port 22099` immediately after banner exchange.
ChangedAfter `Refresh scripts` on the host, the existing running container is still broken — capabilities are fixed at `docker create` time. Click `Restart with latest` on the instance card (or terminate + reprovision) to recreate the container with the new cap set.
v1.25.252026-05-13
  • Community model-change actually finishes now. `CommunityExecService.exec` was SSH-ing to the *container's* WireGuard IP (e.g. `10.99.0.19:22`), but Fly's API process has no WireGuard client on that subnet — connections hung indefinitely. `setInstanceModel` logged `start` and never logged `runShell returned`. Switched to reach the container via the host's public IP + the container's published SSH port mapping (`docker run -p <port>:22`), which is exposed to the public internet and already works for end-user SSH. Auth still uses the build-time admin SSH key (its matching pubkey is image-baked at `/etc/tropic/admin_keys`).
Fixed`api/src/community/community-exec.service.ts:exec`: targets `<communityHost.publicIp>:<containerSshHostPort>` instead of `<wireguardIp>:22`. Looks up the host's public IP via the existing `communityHostId` FK so the caller doesn't need to know about it.
Fixed`api/src/community/community-exec.service.ts:runRaw`: ported over the v1.25.21 SSH hardening (8-min exec timeout, 10s TCP keepalive, settled guard). Without these, a half-open connection from Fly to the Community host can still leave the model-change Promise hung.
Changed`api/src/instances/instance-exec.service.ts:ExecTarget`: added `containerSshHostPort` and `communityHostId` so the community branch can pass them through. The community missing-transport guard now requires all three (`containerName`, `containerSshHostPort`, `communityHostId`).
Changed`api/src/user-secrets/user-secrets.service.ts` (`syncAllToInstance` and `registerModelsOnInstance`) and `api/src/secrets/secrets.service.ts:syncSecretsToVm`: added `containerSshHostPort` + `communityHostId` to the Prisma select / signature so the new ExecTarget fields are populated when calling `InstanceExec.runShell`.
v1.25.242026-05-13
  • BYOK API key sync and OpenAI OAuth token push now work for community containers. Four methods across three services (`UserSecretsService.syncAllToInstance`, `UserSecretsService.registerModelsOnInstance`, `SecretsService.syncSecretsToVm`, `VmService.completeOpenAIOAuth`) had transport guards that only let through EC2 (`instanceId`) or Android (`runnerToken + tunnelUrl`); community got silently skipped. In addition, `CommunityInstanceService.bringUpContainerAsync` now runs a first-start config step that pushes org secrets and applies the selected model so a freshly-provisioned container is usable end-to-end without any manual sync.
Fixed`api/src/secrets/secrets.service.ts:syncSecretsToVm`: extended `isRunning` to accept `type === "community" && status === "running" && wireguardIp && containerName`. InstanceExec already routes community via the WG IP; only the guard was blocking.
Fixed`api/src/user-secrets/user-secrets.service.ts` (both `syncAllToInstance` and `registerModelsOnInstance`): added `wireguardIp` + `containerName` to the Prisma select and relaxed the transport guard to accept community.
Fixed`api/src/vm/vm.service.ts:completeOpenAIOAuthInner`: relaxed the OAuth-push guard to accept community. The `mergeScript` is portable (reads `$OC_HOME` from the preamble), so the OAuth tokens land in `/home/tropic/.openclaw/agents/*/auth-profiles.json` correctly.
Added`api/src/community/community-instance.service.ts:applyFirstStartConfig`: new method called from `bringUpContainerAsync`'s success path. Runs `openclaw models set <selectedModel>` (no restart yet), then `userSecrets.syncAllToInstance(orgId, instanceId)` which pushes KEY=value lines to `/home/tropic/.openclaw/.env` and restarts the gateway. Best-effort — failure logs a warning but doesn't fail the bring-up.
Changed`api/src/community/community.module.ts`: imports `UserSecretsModule` + `InstanceExecModule`, provides `OcShellHelper`. `CommunityInstanceService` now injects `UserSecretsService`, `InstanceExecService`, `OcShellHelper`.
v1.25.232026-05-13
  • Community instances now pick the same default model EC2 instances do (whatever the org has configured in settings / secrets) instead of always falling through to the Prisma column default `anthropic/claude-sonnet-4-6`. Switching models from the dashboard works for community too; the previous "Failed to change model" toast came from `setInstanceModel` rejecting any instance without an EC2 `instanceId` or Android runner channel.
Fixed`api/src/community/community-instance.service.ts:runProvisionSetup`: before the `managedInstance.create`, look up `orgSetting.openclawModel` and fall back to `pickBestModel(orgSecrets)` — same path EC2 uses at `instances.service.ts:119-130`. Sets `selectedModel` on the row and pre-flips `metadata.oauthConnected=false` when picking an `openai-codex/*` model, so the dashboard's "Connect OpenAI" banner shows up just like it does for Cloud.
Fixed`api/src/vm/vm.service.ts:setInstanceModel`: relaxed the transport guard to accept community (`isCommunity && wireguardIp && containerName` — `InstanceExecService.runShell` already routes community through the container's WG IP), and extended the `isRunning` check to include `instance.type === 'community' && status === 'running'`. Without these, every model-switch attempt threw 400 with `Instance has no transport to push the model change` / `Instance must be running to change model`.
v1.25.222026-05-13
  • Container was crash-looping under `--cap-drop=ALL --no-new-privileges` because root inside the container couldn't enter `/home/tropic/.openclaw` (mode 0700, owned by tropic) without `DAC_OVERRIDE`. The entrypoint's `[ -f .../openclaw.json ]` test failed (looked like the file didn't exist), the volume-init `cp -an /opt/tropic-defaults/. /home/tropic/` branch fired, then died with `cp: cannot stat '/home/tropic/.': Permission denied`, set -e killed entrypoint, --restart=unless-stopped → infinite loop. Bring-up's `docker cp` then reported `Container is restarting, wait until the container is running`. Image was fine all along — `--cap-drop=ALL` was over-restrictive.
Fixed`api/src/community/scripts/bring-up-container.sh`: keep `--cap-drop=ALL` but add back the minimum the entrypoint + sshd need: DAC_OVERRIDE (read/write tropic-owned paths), CHOWN/FOWNER (`chown -R tropic:tropic`), SETUID/SETGID (sshd privsep + `runuser`), NET_BIND_SERVICE (sshd port 22), AUDIT_WRITE (sshd PAM). Verified the new cap set boots cleanly: `docker run --cap-drop=ALL --cap-add=… ghcr.io/priminus/openclaw:latest` → `[gateway] ready` in 3-4s.
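For reference, the drop-all-then-add-back cap set can be expressed as a flag list. The real change lives in the shell script; this is only an illustrative sketch, and `ADDED_CAPS` and `capFlags` are hypothetical names.

```typescript
// Capabilities added back in this release, in the order listed above.
const ADDED_CAPS: readonly string[] = [
  "DAC_OVERRIDE",     // read/write tropic-owned paths as root
  "CHOWN",            // chown -R tropic:tropic
  "FOWNER",
  "SETUID",           // sshd privsep + runuser
  "SETGID",
  "NET_BIND_SERVICE", // sshd on port 22
  "AUDIT_WRITE",      // sshd PAM
];

// Build the `docker run` capability flags: drop everything, then add back the minimum.
function capFlags(extra: readonly string[] = []): string[] {
  const adds = [...ADDED_CAPS, ...extra].map((cap) => `--cap-add=${cap}`);
  return ["--cap-drop=ALL", ...adds];
}
```

The `extra` parameter is where a later addition such as v1.25.26's `SYS_CHROOT` would slot in, for example `capFlags(["SYS_CHROOT"])`.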
v1.25.212026-05-13
  • Hardened `runOverSsh` against silent SSH hangs. Last provision attempt sat at `status=pending` for 17+ minutes with no output, no failure reason, no audit log — the SSH exec channel had silently stalled and there was no exec timeout, no TCP keepalive, so the Promise never resolved. Added an 8-min exec timeout and 10s TCP keepalive (3 missed = dead).
Fixed`api/src/community/community-host.service.ts:runOverSsh` — wrap exec with a `setTimeout(execTimeoutMs)` (default 8 min). On fire, the timer ends the ssh client and rejects the Promise with `ssh exec timeout after Nms (last output: …)` so the failure path in `bringUpContainerAsync` marks the instance terminated instead of leaving it pending forever.
Fixed`api/src/community/community-host.service.ts:runOverSsh` — added `keepaliveInterval: 10_000, keepaliveCountMax: 3` to the `ssh2 client.connect()` call so half-open connections (the suspected cause of the 17-min hang) are detected within ~30s instead of OS-default hours.
FixedBoth `resolve` and `reject` paths now go through a `settled` guard that clears the timer, so we don't race the timeout against a real completion.
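The timeout plus `settled` guard pattern can be sketched as follows. This is not the code in `runOverSsh`: `settleOnce` and `execWithTimeout` are hypothetical names, and the `run` callback stands in for the ssh2 exec call.

```typescript
interface Settle<T> {
  resolve(value: T): boolean;
  reject(err: Error): boolean;
}

// Wrap resolve/reject so only the first call wins; cleanup (clearing the timer) runs once.
function settleOnce<T>(
  onResolve: (value: T) => void,
  onReject: (err: Error) => void,
  cleanup: () => void,
): Settle<T> {
  let settled = false;
  const finish = (fn: () => void): boolean => {
    if (settled) return false; // a late timeout or late completion is ignored
    settled = true;
    cleanup();
    fn();
    return true;
  };
  return {
    resolve: (value) => finish(() => onResolve(value)),
    reject: (err) => finish(() => onReject(err)),
  };
}

// Run an exec-style operation with an overall timeout (default 8 minutes).
function execWithTimeout<T>(
  run: (settle: Settle<T>) => void,
  execTimeoutMs: number = 8 * 60 * 1000,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const settle = settleOnce<T>(resolve, reject, () => {
      if (timer !== undefined) clearTimeout(timer);
    });
    timer = setTimeout(() => {
      settle.reject(new Error(`ssh exec timeout after ${execTimeoutMs}ms`));
    }, execTimeoutMs);
    run(settle);
  });
}
```

Because both the timer and the real completion go through the same guard, whichever fires second is a no-op, which is the race the entry above describes.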
v1.25.202026-05-12
  • OpenClaw entrypoint pointed at `/usr/local/bin/openclaw` but `npm install -g openclaw` symlinks the binary to `/usr/bin/openclaw`. The container has been crash-looping on every start with `runuser: failed to execute /usr/local/bin/openclaw: No such file or directory` — earlier provisioning failures masked this because the bring-up died before the crash-loop became the visible failure. v1.25.19's `.ssh` fix finally exposed it: `docker cp` reported `Container ... is restarting, wait until the container is running`.
Fixed`api/docker/openclaw-entrypoint.sh`: `exec runuser -u tropic -- /usr/bin/openclaw gateway run --port 18789`. Verified by `docker run --entrypoint /bin/bash ghcr.io/priminus/openclaw:v1.25.19 -c "which openclaw"` → `/usr/bin/openclaw -> ../lib/node_modules/openclaw/openclaw.mjs`.
Fixed`api/docker/openclaw-entrypoint.sh`: defaults-copy check is now `[ ! -f /home/tropic/.openclaw/openclaw.json ]` — the previous `config.json` was a typo that meant the `cp -an` branch always ran (harmless since `-n` skips existing files but wasteful).
ChangedVerified `/home/tropic/.ssh/` with `mode 0700 owner tropic:tropic` is in `ghcr.io/priminus/openclaw:v1.25.19` via local image inspection. v1.25.19 Dockerfile change was correct; the bring-up failures past v1.25.19 were the entrypoint typo, not the .ssh fix.
v1.25.192026-05-12
  • Bake `/home/tropic/.ssh` directly into the OpenClaw image instead of patching the host bring-up. The v1.25.18 workaround (`docker exec install -d /home/tropic/.ssh` before the cp) belongs in the image, not the provisioning script — it ships with every container regardless of host. Reverted the script edit and pushed the directory into the Dockerfile so it lands in the volume the moment Docker initializes it.
Changed`api/docker/openclaw.Dockerfile`: new RUN that does `install -d -m 0700 -o tropic -g tropic /home/tropic/.ssh` and stages an empty `authorized_keys` with mode 0600. On `docker run -v vol:/home/tropic`, Docker initializes the empty named volume from the image's `/home/tropic`, so this directory ends up in the volume on first start — no race with the entrypoint's defaults-copy.
Changed`api/src/community/scripts/bring-up-container.sh`: reverted the v1.25.18 `docker exec install -d /home/tropic/.ssh` workaround. The directory is guaranteed to exist now.
ChangedTagging this release triggers `.github/workflows/build-openclaw-image.yml` which rebuilds `ghcr.io/priminus/openclaw:v1.25.19` and `:latest`. `CommunityInstanceService.bringUpContainerAsync` pulls `:latest` so the next provision picks the new image up.
v1.25.182026-05-12
  • Bring-up made it past iptables-save this time and died at `docker cp` with `Could not find the file /home/tropic/.ssh in container`. The OpenClaw image ships `/home/tropic` but not `/home/tropic/.ssh`. Create the directory inside the container with the correct mode/ownership before copying the authorized_keys.
Fixed`api/src/community/scripts/bring-up-container.sh`: insert `docker exec "$CONTAINER_NAME" install -d -m 0700 -o tropic -g tropic /home/tropic/.ssh` before `docker cp /tmp/userkey "${CONTAINER_NAME}:/home/tropic/.ssh/authorized_keys"`.
v1.25.172026-05-12
  • Bring-up made it through DNAT/SNAT/FORWARD this time (`BRIDGE_IP=172.17.0.2`, real IPv4) and finally failed at `iptables-save > /etc/iptables/rules.v4` — the directory doesn't exist on this Community base image (no `iptables-persistent` installed). One-line fix: `mkdir -p /etc/iptables` before the save.
Fixed`api/src/community/scripts/bring-up-container.sh`: `mkdir -p /etc/iptables` before `iptables-save > /etc/iptables/rules.v4`. The save was failing with `No such file or directory` on hosts that don't have iptables-persistent installed, killing the bring-up after the NAT/FORWARD rules were already applied.
v1.25.162026-05-12
  • Bring-up script was self-killing under `set -euo pipefail`. The v1.25.15 BRIDGE_IP loop did `docker inspect | grep -E 'IPv4' | head -n1`, and on the first iteration (container still warming up) `grep` legitimately returned 1 — `pipefail` propagated that to the whole command substitution and `errexit` killed the script before the retry loop could fire. Audit log went silent right after the docker container ID with no echo, no timeout diagnostic, just exit 1. Append `|| true` so the loop can keep polling.
Fixed`api/src/community/scripts/bring-up-container.sh:BRIDGE_IP loop`: trailing `|| true` on the substitution so a no-match grep doesn't trip errexit. Also initialized `BRIDGE_IP=""` before the loop so the guard's `[[ -n "$BRIDGE_IP" ]]` is well-defined under `nounset`.
v1.25.152026-05-12
  • Bring-up no longer fails at iptables. With v1.25.14's SFTP fix confirmed working (sha256 verified), the new bring-up script's diagnostic echo finally surfaced the real failure: `docker inspect -f '{{(index .NetworkSettings.Networks "bridge").IPAddress}}'` returns the literal string `"invalid IP"` on this Community host — either the bridge network is renamed or the IPAddress field carries a sentinel string for the default bridge. Rewrote the lookup to be robust to both.
Fixed`api/src/community/scripts/bring-up-container.sh`: switched to `{{range .NetworkSettings.Networks}}{{println .IPAddress}}{{end}}` piped through `grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' | head -n1`. Iterates *all* attached networks, prints each IPAddress on its own line, filters for a real IPv4. Robust to bridge-network renames AND to IPAddress carrying garbage.
ChangedOn bridge-IP-timeout the failure path now also dumps `docker network ls`, not just `docker inspect`, so a future host-config drift is diagnosable from the audit-log reason field without SSH.
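The filtering step of the new lookup, expressed outside the shell for illustration: given the line-per-network output of the `docker inspect` template, pick the first line that is a dotted-quad IPv4. `firstIpv4` is a hypothetical name; the pattern is the one from the `grep -E` above, and trimming each line is an extra not present in the shell version.

```typescript
// Same pattern as the grep in the bring-up script.
const IPV4_LINE = /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/;

// Returns the first IPv4-looking line, or "" when none is present yet
// (for example while the container is still warming up).
function firstIpv4(inspectOutput: string): string {
  for (const raw of inspectOutput.split("\n")) {
    const line = raw.trim();
    if (IPV4_LINE.test(line)) return line;
  }
  return "";
}
```

Returning an empty string rather than throwing mirrors the retry loop's expectation that "no match yet" is a normal outcome, not an error.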
v1.25.142026-05-12
  • Refresh scripts now actually writes the file. The previous heredoc-over-`ssh2.exec()` returned exit 0 and printed "scripts refreshed" but the on-host script content was *not* updated — confirmed by the bring-up failing with the same iptables error after a "successful" refresh, with no trace of the v1.25.9 echo line in the audit log output. Switched to SFTP and added a local-vs-remote sha256 verification so a silent upload failure is now impossible to miss.
Changed`api/src/community/community-host.service.ts:refreshHostScripts` replaced the `install -m 0755 /dev/stdin … <<'EOF'` pattern with `ssh2.sftp().createWriteStream` (mode 0o755). Returns `{ remoteSha, localBringUpSha, localTearDownSha }`.
Changed`api/src/community/community.controller.ts:refreshHostScripts` computes `{ ok, bringUpOk, tearDownOk }` by comparing local and remote sha256 lines so a mismatch is reported back to the UI.
Changed`ui/components/admin/reserved-hosts-panel.tsx` toast distinguishes "Helper scripts refreshed (sha256 verified)" from "Refresh ran but sha256 mismatch — bringUpOk=… tearDownOk=…".
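A sketch of how the `{ ok, bringUpOk, tearDownOk }` verdict could be derived from the returned hashes. It assumes `remoteSha` is `sha256sum`-style output with one `<hex>  <path>` line per script, which this entry does not spell out; `verifyRefresh` is a hypothetical name.

```typescript
interface RefreshVerdict {
  ok: boolean;
  bringUpOk: boolean;
  tearDownOk: boolean;
}

function verifyRefresh(
  remoteSha: string,
  localBringUpSha: string,
  localTearDownSha: string,
): RefreshVerdict {
  // Map script file name -> remote sha256 (assumed `sha256sum` line format).
  const remote = new Map<string, string>();
  for (const line of remoteSha.split("\n")) {
    const m = /^([0-9a-f]{64})\s+\*?(.+)$/.exec(line.trim());
    if (!m) continue;
    const [, hex = "", path = ""] = m;
    remote.set(path.split("/").pop() ?? path, hex);
  }
  const bringUpOk = remote.get("bring-up-container.sh") === localBringUpSha;
  const tearDownOk = remote.get("tear-down-container.sh") === localTearDownSha;
  return { ok: bringUpOk && tearDownOk, bringUpOk, tearDownOk };
}
```

A script that is missing from the remote output compares as a mismatch, so a silently skipped upload shows up as `ok: false` rather than passing by default.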
v1.25.132026-05-11
  • Refresh scripts button toasted `Unexpected token "s"… is not valid JSON` even though the refresh succeeded. The controller was returning the raw SSH stdout (`scripts refreshed`) as the response body; the UI hook then called `res.json()` which choked on plain text.
Fixed`api/src/community/community.controller.ts:refreshHostScripts` now wraps the SSH output as `{ ok: true, output: "..." }` so the response is valid JSON. The actual script-refresh on the host was already working from v1.25.9 + the v1.25.11 path fix — this was just the response-shape mismatch hiding the success.
v1.25.122026-05-11
  • Admin Reserved Hosts panel actually loads now. The hooks in `admin-community.ts` were calling bare `fetch` with `credentials: "include"`, but the cross-origin `__session` cookie is blocked under modern SameSite defaults — so the backend saw no auth and returned 401. Every other hook in the codebase goes through `apiFetch`, which attaches the Clerk Bearer token explicitly; switched the three admin-community hooks over.
Fixed`ui/lib/api/admin-community.ts`: `useCommunityHosts`, `useReserveHost`, and `useRefreshHostScripts` now use `apiFetch` (Bearer token via `getAuthToken`) instead of bare `fetch`. Without this, all three returned `{"statusCode":401,"message":"Authentication required"}` cross-origin and the admin table fell through to "No reserved hosts yet." — even for a superadmin with a valid Clerk session.
v1.25.112026-05-11
  • Admin Reserved Hosts panel was rendering empty even with hosts in the DB. Path mismatch: `CommunityController` is mounted at `@Controller("community/instances")` so the host endpoints are exposed at `/community/instances/hosts/...`, but `useCommunityHosts`, `useReserveHost`, and the new `useRefreshHostScripts` were calling `/community/hosts/...`. The fetches 404'd silently and the table fell through to "No reserved hosts yet."
Fixed`ui/lib/api/admin-community.ts`: corrected all three URLs to the actual mounted paths under `/community/instances/hosts/...`. The `Onboard reserved host` and `Refresh scripts` buttons were both broken before this — listing was just the visible symptom.
v1.25.102026-05-10
  • Community provisioning now updates the dashboard list immediately. Cloud and Local mutations have always invalidated both `["instances", userId]` AND `["agents-page-data"]`; my Community mutation only invalidated the first, so the new pending row only showed up after a manual page refresh. Plus a `Refresh scripts` button on the admin Reserved Hosts panel so the v1.25.9 bring-up-script fix can actually be deployed to live hosts.
Fixed`ui/lib/api/community.ts:useProvisionCommunityInstance.onSuccess`: also invalidate `["agents-page-data"]`. The dashboard agents page subscribes to that key, not the prefix-only `instances` query.
Added`ui/components/admin/reserved-hosts-panel.tsx`: added an Actions column with a `Refresh scripts` button per host. Calls the v1.25.9 `POST /community/instances/hosts/:id/refresh-scripts` endpoint and toasts success/failure. Lets us push helper-script fixes to a live host without waiting on a host re-bootstrap.
Added`ui/lib/api/admin-community.ts`: `useRefreshHostScripts` mutation hook for the new button.
v1.25.9 2026-05-10
  • Community bring-up was failing at iptables with `Bad IP address "invalid IP"` because the docker-inspect template iterated *all* networks and concatenated their IPAddress strings — fine for a fresh container on the default `bridge`, but the host had picked up an extra network attachment so the result was a mashed-together IP that iptables (correctly) rejected. Plus a new admin endpoint to refresh the on-host scripts so this kind of fix doesn't require re-bootstrapping the whole VPS.
Fixed`api/src/community/scripts/bring-up-container.sh`: replace `{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}` with `{{(index .NetworkSettings.Networks "bridge").IPAddress}}` to pull from the default bridge network specifically. Also dump full `docker inspect` on bridge-IP timeout and echo `WG_IP`/`BRIDGE_IP` before iptables runs so future failures are diagnosable from the audit-log reason field.
Added`POST /community/instances/hosts/:id/refresh-scripts` (`community.controller.ts` + `CommunityHostService.refreshHostScripts`, superadmin-only) re-uploads the latest `bring-up-container.sh` and `tear-down-container.sh` to `/usr/local/sbin/` on a live host via SSH. Avoids re-running the full `bootstrapHost` (which would touch WireGuard config) when the only thing that changed is the helper scripts.
FixedCleaned 2 orphan `wireguard_peers` rows left behind by the v1.25.6 / v1.25.7 / v1.25.8 failed bring-ups (`community-dc872ab5`, `community-e0735419`). The bring-up failure path was setting `ManagedInstance.status=terminated` + `communityHostId=null` but not deleting the matching peer row, so `allocateWgIp` was incrementing past addresses that nothing actually owns. Follow-up: have `bringUpContainerAsync` delete the peer too on failure.
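To see why the old template produced an address iptables rejected, here is a small TypeScript model of the two `docker inspect` format strings. The network names and addresses are made up for illustration; only the `bridge` key and the template shapes come from the entry above.

```typescript
// Simplified model of `.NetworkSettings.Networks` from `docker inspect`.
type Networks = Record<string, { IPAddress: string }>;

// Mirrors {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}:
// every attached network's address is emitted back to back, no separator.
function rangeAllNetworks(networks: Networks): string {
  return Object.values(networks)
    .map((n) => n.IPAddress)
    .join("");
}

// Mirrors {{(index .NetworkSettings.Networks "bridge").IPAddress}}:
// only the default bridge network is consulted.
function bridgeOnly(networks: Networks): string {
  return networks["bridge"]?.IPAddress ?? "";
}

// Hypothetical container with one extra network attachment.
const attached: Networks = {
  bridge: { IPAddress: "172.17.0.2" },
  extra: { IPAddress: "172.18.0.3" },
};

console.log(rangeAllNetworks(attached)); // "172.17.0.2172.18.0.3"
console.log(bridgeOnly(attached)); // "172.17.0.2"
```

With a single network both forms agree, which is why the bug only surfaced once a second attachment appeared.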
v1.25.8 2026-05-10
  • Community SSH keys are now downloadable on demand from the instance card, matching how Cloud (`Download SSH key` at `agents/page.tsx:2133`) has always worked. The forced one-time popup that fired right after provisioning is gone.
Changed`api/src/community/community-instance.service.ts:runProvisionSetup` now encrypts the per-container SSH private key with `CredentialsService.encrypt` (AES-256-GCM, same as `CommunityHost.rootSshPrivateKey`) and stores it in `ManagedInstance.metadata.containerSshPrivateKeyEnc`. The key is no longer ephemeral — losing the download is recoverable.
Added`GET /community/instances/:id/ssh-key` (`community.controller.ts` + new `CommunityInstanceService.getOneTimeSshKey`) returns `{ privateKey, host, port }`. Authorization: caller must either own the instance directly (`User.clerkId` match) or have the active org id matching the instance's `organizationId`.
Changed`ui/components/launch-modal.tsx`: dropped the `keyResult` state, the `SshKeyDialog` import + render, and the `{open && (...)}` wrapper. Modal closes immediately and toasts "Community instance provisioning started"; no auto-popup.
Added`ui/lib/api/community.ts`: `fetchCommunitySshKey(instanceId)` and `downloadCommunitySshKey(instanceId)` (creates the blob, triggers the `.pem` download, names it `tropic-community-<id-prefix>.pem`).
Added`ui/app/(dashboard)/agents/page.tsx`: the Community instance card's SSH Endpoint section now shows a `Download SSH key (.pem)` link below the `ssh -i …` command. The pre-filled SSH command points at the matching pem filename so copy → paste works after a download.
FixedHand-cleaned an orphaned `ManagedInstance` row (`2b51b36a…`) that was left at status=pending when the v1.25.7 redeploy killed the in-flight `bringUpContainerAsync`. Marked terminated with `metadata.failureReason="orphaned by API redeploy mid-flight"`. Durable reconciliation of pending rows on app boot is a follow-up.
v1.25.7 2026-05-10
  • Community provisioning bring-up was reading the wrong env var for the Tropic memory URL, so every container failed at `bring-up-container.sh:12` with "tropic memory url required" the moment it tried to start. The instance row was created at status=pending, the bring-up rejected, and the new failure path immediately flipped it to status=terminated — so the dashboard list (which excludes terminated rows) showed nothing after the modal closed.
Fixed`api/src/community/community-instance.service.ts` (both `bringUpContainerAsync` and `restartWithLatest`): read `MEMORY_SERVICE_URL` instead of `TROPIC_MEMORY_URL`, with the same `https://tropic-memory.fly.dev` public fallback the EC2 path uses at `vm-provisioning.service.ts:2604`. Fly stores the secret as `MEMORY_SERVICE_URL`, so the previous lookup always returned undefined and the empty positional arg tripped the script's strict `${9:?…}` check.
v1.25.6 2026-05-10
  • Community provisioning now matches the Cloud UX. Modal closes immediately on click, instance appears in the dashboard list at status pending, and transitions to running once bring-up finishes — instead of blocking the modal on a synchronous SSH bring-up that could take 20+ seconds.
Changed`api/src/community/community-instance.service.ts:provisionForUser` split into a synchronous `runProvisionSetup` (user lookup, host placement, slot alloc, keypair gen, `WireguardPeer` + `ManagedInstance` create at status=pending) and a fire-and-forget `bringUpContainerAsync` (SSH bring-up, status flip to running, audit log). Mirrors the existing `vm.service.ts:provisionVmAsync` pattern for EC2.
Changed`bringUpContainerAsync` failure path: marks the row `status=terminated`, detaches `communityHostId`, writes `metadata.failureReason`, audits `community_provision_failed`, and releases the Stripe quantity. Synchronous-setup failures still roll back the Stripe bump via the existing path.
Changed`ui/components/launch-modal.tsx:handleProvisionCommunity` closes the modal immediately and toasts "Community instance provisioning started" before awaiting the API. The SSH-key dialog is hoisted out of the `{open}` block so it survives the modal closing — the one-time private key arrives a tick later and the dialog renders independently.
Added`api/src/community/community-instance.service.spec.ts`: split the placement test to verify the synchronous return shape (status=pending, oneTimeSshPrivateKey present) separately from the post-bring-up side effects (status=running, audit log). New test "marks instance terminated, audits the failure, and releases capacity when async bring-up fails" covers the failure path end-to-end.
v1.25.5 2026-05-10
  • Community provisioning no longer over-bills on failure. `CommunityInstanceService.provisionForUser` bumps the Stripe quantity *before* the transaction so a declined card aborts before any host-side work, but until now a failure inside the transaction (DB write, SSH bring-up, anything) left the quantity bumped with no instance to back it.
Fixed`api/src/community/community-instance.service.ts:provisionForUser`: wrap the `$transaction` in a try/catch; on any throw, call `billingQuantity.releaseCapacity(orgId, 'community')` to roll the bump back, then rethrow the original error. The release call is itself best-effort and its own errors are caught + logged so a billing-side hiccup never masks the underlying provisioning failure.
Added`api/src/community/community-instance.service.spec.ts`: new test "rolls back the Stripe quantity bump when provisioning fails after ensureCapacity" — forces `host.runOverSsh` to reject and asserts `releaseCapacity` is called with the community SKU before the error propagates.
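The rollback shape described in this entry can be sketched as below. The `Billing` interface and `provisionWithRollback` are hypothetical stand-ins, and the calls are synchronous for brevity; the real service presumably awaits Stripe and the database.

```typescript
// Hypothetical stand-in for the billing-quantity service.
interface Billing {
  ensureCapacity(orgId: string, sku: string): void;
  releaseCapacity(orgId: string, sku: string): void;
}

function provisionWithRollback<T>(
  billing: Billing,
  orgId: string,
  work: () => T,
  logWarn: (msg: string) => void = console.warn,
): T {
  // Bump first so a declined card aborts before any host-side work.
  billing.ensureCapacity(orgId, "community");
  try {
    return work();
  } catch (err) {
    try {
      // Best-effort rollback of the quantity bump.
      billing.releaseCapacity(orgId, "community");
    } catch (releaseErr) {
      // A billing hiccup must not mask the provisioning failure.
      logWarn(`releaseCapacity failed: ${String(releaseErr)}`);
    }
    throw err; // always surface the original error
  }
}
```

The nested try/catch is the important part: the caller always sees the provisioning error, even when the release call itself fails.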
v1.25.4 2026-05-10
  • Community tier provisioning unblocked. `POST /community/instances` was 500ing on every call with `P2003` on `managed_instances_user_id_fkey` because the controller passed `req.auth.userId` (the Clerk subject `user_xxx`) straight into `ManagedInstance.userId`, which FKs to `User.id` (internal UUID), not `clerkId`.
Fixed`api/src/community/community-instance.service.ts:provisionForUser`: at the top of the `$transaction`, look up the `User` row by `clerkId` and use its internal `id` for both the `managedInstance.create` write and the `auditLog.create` write (the audit log row FKs to `User.id` too). Mirrors the pattern used by `instances.service.ts:resolveUser` for EC2 / local provisioning.
Fixed`api/src/community/community-instance.service.spec.ts`: stubbed `tx.user.findUnique` and tightened the placement test so the resolved internal id (not the Clerk subject) is what flows into `managedInstance.create`.
v1.25.3 2026-05-09
  • Test suite fully green again. 29 tests across `instances.service.spec.ts` and `internal-providers.controller.spec.ts` had been silently failing at NestJS module-compile because their TestingModule provider lists were stale relative to the actual service constructors.
Fixed`api/src/instances/instances.service.spec.ts` (both `describe` blocks): added `UserSecretsService` and `OpenclawBackupService` mocks. The `InstancesService` constructor takes 8 deps but the spec was providing only 6, so Nest failed at `compile()` with its unresolved-dependency error ("argument UserSecretsService at index [6] is available"). The 23 tests in this file never reached their assertions.
Fixed`api/src/internal-providers/internal-providers.controller.spec.ts`: added `AgentsService` and `InstancesService` mocks. The `ensure-books-extractor` work added those deps to the controller without updating this spec, blocking 6 tests at module compile.
ChangedMocks use `useValue: {}` because none of the failing test bodies actually invoke methods on the missing services — they only need them to satisfy DI graph compilation.
v1.25.2 2026-05-09
  • New VMs default to t3.small. The previous t4g.medium (Graviton/arm64) default is marked legacy and disabled in the Settings picker. ARM ran OpenClaw too slowly in practice, and the arm64 AMI build was lagging behind x86 (the latest arm64 image was 2026.4.26 while x86 was already on 2026.5.6, so "Latest" silently meant "older" for any t4g VM).
  • Deployed agent templates no longer pin a model. The agent entry written into the VM's `openclaw.json` omits `model`, so OpenClaw inherits whatever `models.default` the instance has configured. Changing the instance model now affects all agents on the box without redeploying.
  • Settings → Instance Type uses the standard React `DropdownSelect` (was a native `<select>`).
Changed`api/prisma/schema.prisma`: `Organization.ec2InstanceType` and `ManagedInstance.ec2InstanceType` defaults flipped from `t4g.medium` to `t3.small`. Migration `20260509140000_default_instance_type_t3_small` runs `ALTER COLUMN ... SET DEFAULT 't3.small'` on both tables. Existing rows are not rewritten; orgs already pinned to t4g.* keep their setting until they re-save it.
ChangedCode-level `t4g.medium` fallbacks flipped to `t3.small` in `vm-provisioning.service.ts:provisionVm` (function default), `backup.service.ts:restoreBackupToNewVm` (function default), and `agents.service.ts:getPageData` (`org.ec2InstanceType ||` fallback). Server validator (`settings.service.ts:ALLOWED_INSTANCE_TYPES`) still accepts t4g.* writes so existing pins can re-save without 400ing.
Changed`ui/app/(dashboard)/settings/page.tsx`: replaced the native `<select>` with `DropdownSelect`, defaulted state to `t3.small`, and marked both t4g.medium and t4g.small as `disabled` with reason "ARM Graviton runs OpenClaw too slowly. Use t3.small."
Changed`ui/app/(dashboard)/backups/page.tsx`: `INSTANCE_TYPES` reordered so `t3.small` is recommended, t4g.* labeled "legacy". Restore-modal `defaultType` fallback flipped to `t3.small`. Radio list still allows selecting t4g.* so existing t4g backups can be restored to the same family if you want.
ChangedAll six built-in agent templates in `agent-templates.service.ts` (web-research, tropic-console, bookkeeping, corpsec, cancer-support, books-extractor) now ship `bundleConfig.agent.model = ''` instead of `'anthropic/claude-sonnet-4-6'`. Reels Maker already used the empty-string convention.
Changed`agents.service.ts:pushAgentToVm` short-circuits the org/secrets model resolver when the template's model is empty, and emits the orchestrator + new-agent jq JSON without a `model` field. The redeploy filter (`map(select(.id != $entry.id))`) drops any prior pinned entry, so a redeploy of an inheriting agent cleanly clears a stale model.
Changed`ui/app/(dashboard)/agents/page.tsx`: the amber "no API key" warning's glm/nemotron exclusion now reads `agent.config?.agent?.model || currentModel`, so it doesn't misfire when an inheriting agent runs against a glm/nemotron model set at the instance level.
FixedThe grey model badge next to a deployed agent's name auto-hides for inheriting templates. Existing books-extractor agents still display their stale snapshotted model until a redeploy rewrites `agent.config`; the actual on-VM `openclaw.json` for those rows also still has the legacy pin until something redeploys them.
Changed`agent-templates.service.spec.ts` assertion for the books-extractor seed now expects `model: ''`.
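The inherit-by-omission behaviour can be sketched in TypeScript. The real code emits jq over SSM; `AgentEntry` and `upsertAgent` are hypothetical names, and only the empty-string convention and the drop-then-append redeploy filter come from the entries above.

```typescript
// Hypothetical shape of an agent entry in openclaw.json.
interface AgentEntry {
  id: string;
  name: string;
  model?: string;
}

// Mirrors `map(select(.id != $entry.id))` followed by appending the new
// entry. An empty template model means "inherit", so the field is omitted.
function upsertAgent(
  existing: AgentEntry[],
  entry: { id: string; name: string; model: string },
): AgentEntry[] {
  const { model, ...rest } = entry;
  const next: AgentEntry = model ? { ...rest, model } : rest;
  return [...existing.filter((a) => a.id !== entry.id), next];
}

// A redeploy of an inheriting template clears a previously pinned model.
const before: AgentEntry[] = [
  { id: "books-extractor", name: "Books", model: "anthropic/claude-sonnet-4-6" },
];
const after = upsertAgent(before, { id: "books-extractor", name: "Books", model: "" });
console.log(JSON.stringify(after)); // [{"id":"books-extractor","name":"Books"}]
```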
v1.25.1 2026-05-09
  • Cross-session memory recall now works on new VMs. Storage was never broken (ingest fires from the gateway's `agent_end` event), but the agent had no way to query it — `memory_search`, `memory_get`, and `memory_save` were all silently rejected at plugin load, so the agent answered "I don't have that recorded" no matter what was in the central memory store.
FixedTropic memory plugin manifest (`tropic-memory/openclaw.plugin.json`) now declares `contracts.tools` listing `memory_search`, `memory_get`, `memory_save`. OpenClaw enforces this as the manifest ownership contract for tool registration; without it the gateway logs `plugin must declare contracts.tools before registering agent tools` and drops every `registerTool()` call from the plugin. Existing VMs need an AMI rebuild (or an SSM patch of the manifest + gateway restart) to pick up the fix.
v1.25.0 2026-05-08
  • New Community plan at $12/mo, hosted on Community VPS with shared-host Docker isolation.
  • SSH directly into your container; agents, skills, OAuth, and backups all work.
AddedCommunity tier provisioning with region picker (EU / US-Central / Asia).
AddedRestart with latest image button (preserves your data via Docker volume).
AddedAdmin support for reserved (annual) Community hosts in the bin-packing pool.
v1.24.5 2026-05-08
  • EBS backups page no longer shows every row stuck on `Pending`. The reconciliation in `BackupService.getBackups` and `getAllBackups` was sending one batched `DescribeSnapshots` per region, and any single deleted snapshot in that region (e.g. from a torn-down VM) was returning `InvalidSnapshot.NotFound` for the whole batch. The catch handler logged a warning and bailed, so every other valid pending row in that region stayed pending forever. After the fix, completed snapshots flip to `completed` and rows whose AWS snapshot is gone get marked `failed` (instead of dangling).
Fixed`BackupService` extracts a private `reconcilePendingStatuses` helper used by both `getBackups` and `getAllBackups`. On `InvalidSnapshot.NotFound` from the batched `DescribeSnapshots`, it falls back to a per-snapshot loop so surviving rows still reconcile, and rows whose AWS snapshot is gone get marked failed.
Added`api/scripts/reconcile-instance-backups.ts` one-off sweep for InstanceBackup rows. Run with `npx ts-node scripts/reconcile-instance-backups.ts`. Used today to clear 158 lingering `pending` rows (114 flipped to completed, 44 marked failed, 0 still pending).
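A minimal sketch of the batch-then-per-item fallback, assuming a hypothetical `describe` callback in place of the real AWS `DescribeSnapshots` call. The error is matched by its `name`; how the actual helper detects `InvalidSnapshot.NotFound` is not shown in this changelog.

```typescript
type SnapshotState = "pending" | "completed";
type RowStatus = SnapshotState | "failed";
// Hypothetical stand-in for a DescribeSnapshots call over a list of ids.
type Describe = (ids: string[]) => Map<string, SnapshotState>;

function isNotFound(err: unknown): boolean {
  return err instanceof Error && err.name === "InvalidSnapshot.NotFound";
}

// Returns the reconciled status for every snapshot id.
function reconcile(ids: string[], describe: Describe): Map<string, RowStatus> {
  const result = new Map<string, RowStatus>();
  try {
    // Fast path: one batched call for the whole region.
    describe(ids).forEach((state, id) => result.set(id, state));
  } catch (err) {
    if (!isNotFound(err)) throw err;
    // Fallback: query one id at a time so a single deleted snapshot
    // cannot block reconciliation of the rest of the batch.
    for (const id of ids) {
      try {
        result.set(id, describe([id]).get(id) ?? "pending");
      } catch (inner) {
        if (!isNotFound(inner)) throw inner;
        result.set(id, "failed"); // snapshot no longer exists in AWS
      }
    }
  }
  return result;
}
```

The per-item loop costs more API calls, but it only runs when the batched call has already failed, so the common path stays at one request per region.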
v1.24.4 2026-05-07
  • OpenClaw backup `POST /instances/:id/openclaw-backups` was returning HTTP 500 even though the row was inserted. NestJS could not JSON-serialize the `BigInt` `sizeBytes` field on the response. The `register` endpoint now stringifies `sizeBytes` before returning, matching what the `list` endpoint already does.
Fixed`OpenclawBackupController.register` now returns `{ ...row, sizeBytes: row.sizeBytes.toString() }` so the response serializes cleanly. Spec test updated to assert the stringified field.
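The underlying behaviour is easy to reproduce: `JSON.stringify` throws a `TypeError` when it meets a `bigint`. The sketch below uses a hypothetical two-field row; only the `sizeBytes` handling mirrors the fix.

```typescript
// Hypothetical row shape; only `sizeBytes` being a bigint matters here.
interface BackupRow {
  id: string;
  sizeBytes: bigint;
}

// Convert the bigint to a string before the value reaches the serializer.
function toResponse(row: BackupRow): { id: string; sizeBytes: string } {
  return { ...row, sizeBytes: row.sizeBytes.toString() };
}

const row: BackupRow = { id: "b1", sizeBytes: BigInt("5368709120") };
console.log(JSON.stringify(toResponse(row))); // {"id":"b1","sizeBytes":"5368709120"}
```

Returning a string rather than a number avoids precision loss for sizes above `Number.MAX_SAFE_INTEGER`, at the cost of clients having to parse it.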
v1.24.3 2026-05-07
  • OpenClaw backup encryption now works. The first backup attempt failed with `enc: AEAD ciphers not supported` because `openssl enc` does not implement AEAD modes; replaced with a small Node one-liner using `crypto.createCipheriv("aes-256-gcm", ...)`. Output format unchanged from the spec: 12-byte IV, 16-byte auth tag, ciphertext.
Fixed`api/packer/files/tropic-backup-openclaw` swaps the `openssl enc -aes-256-gcm` call for a Node script that reads the tarball, encrypts with `crypto.createCipheriv("aes-256-gcm", key, iv)`, and writes `iv || tag || ciphertext`. Node 22 is already on every VM (OpenClaw runtime).
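For reference, the `iv || tag || ciphertext` layout can be produced with Node's built-in `crypto` module roughly as follows. This is a sketch of the format, not the script shipped on the VM; the function names are hypothetical and key handling is reduced to a raw 32-byte buffer.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts to the layout described above: 12-byte IV, 16-byte tag, ciphertext.
function encryptBackup(plaintext: Buffer, key: Buffer): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default for GCM
  return Buffer.concat([iv, tag, ciphertext]);
}

// Inverse operation, included only to show that the layout round-trips.
function decryptBackup(blob: Buffer, key: Buffer): Buffer {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ciphertext = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match the ciphertext.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

Because the tag is only available after `cipher.final()`, the whole ciphertext has to be produced before the header can be written, which is why this sketch buffers the tarball in memory.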
v1.24.2 2026-05-06
  • First inbound WhatsApp message after pairing no longer gets dropped. Tropic was firing a gratuitous `controlGateway("restart")` after the Baileys 515 pairing handshake completed, which killed the agent process mid-load while it was handling the user's first message.
Fixed`vm.service.ts` 515-retry success branch (around line 2718) no longer calls `controlGateway(..., "restart")` when the polling already saw `wa.connected === true`. Baileys handles the 515 protocol restart on its own socket; the OpenClaw gateway journal shows the WhatsApp provider transitions to "Listening for personal WhatsApp inbound messages" within ~2s of the link event without any process restart, so the additional Tropic-driven `systemctl restart openclaw-gateway` was pure damage. Diagnosed on a real failure: message arrived 06:03:30, agent plugins started loading 06:03:34, restart command landed 06:03:48 (after the rebuild-`.env` + patch-`allowedOrigins` + `systemctl restart` SSM round-trip in `controlGateway`), agent killed at 06:04:03 by SIGKILL when stop-sigterm timed out.
ChangedKept the 515-retry fallback restart (around line 2747) untouched — that branch only fires when 10×1500ms polling never saw `wa.connected`, which is genuinely a stuck-channel signal and may benefit from a restart. Will revisit if the fallback also proves unnecessary in practice.
v1.24.1 2026-05-06
  • WhatsApp QR generation no longer restarts the gateway mid-pairing. The agent-binding write now goes through `openclaw config set bindings ... --strict-json` over SSM instead of the gateway `config.patch` RPC, which the gateway was treating as a plugin-config change and reacting to with a full restart that killed the in-flight QR scan.
Changed`vm.service.ts:writeChannelBinding` rewritten to take the instance directly and use SSM with `OcShellHelper.userOpsPreamble` (so it works on EC2 + local Linux/macOS). It now reads existing bindings via `openclaw config get bindings`, short-circuits when the binding is already correct, and only writes when something actually changed. All four call sites (WhatsApp pairing on local + EC2, Telegram bot connect, Telegram legacy connect) updated to the new signature.
FixedRemoved the gateway `config.patch` round-trip from the binding write path. The CLI write produces only a `meta.lastTouchedAt` reload signal on the gateway and does not trigger the plugin-entry restart that was killing pairing sessions on every QR generation.
v1.24.0 2026-05-06
  • Sidebar reorganized into Primary / Control plane / Workspace sections with collapsible Control, Observability, and Workspace groups. Group expand/collapse state persists per-browser, and a group auto-expands when you are on a route inside it.
  • Renames: My Agents → Agents, Members → Team, Benchmarks → Evaluations. Top-level Agent Tasks and Terminal links removed from the sidebar (routes still reachable). New Billing entry under Workspace; the credits balance pill at the bottom now points to it as well.
  • Agents page shows a small "Agent Tasks · Under construction" preview banner at the top, linking to /agent-tasks while that page is still WIP.
Changed`ui/components/sidebar.tsx` rewritten around a typed Section / Group / Leaf model. Collapsed (icon-only) view renders all leaves as a flat icon list with thin dividers between sections; expanded view renders the grouped hierarchy with section headers and a persistent expand/collapse state stored under `tropic:sidebar:group:<id>:open` in `localStorage`.
Added`ui/app/(dashboard)/billing/page.tsx` redirects to `/credits` so the new Billing entry has a stable route. `/billing(.*)` added to `isProtectedRoute` in `ui/proxy.ts`.
Added"Agent Tasks · Under construction" banner-link inserted at the top of `ui/app/(dashboard)/agents/page.tsx` (main render path), pointing at `/agent-tasks`.
RemovedTop-level Agent Tasks, Terminal, and Credits links dropped from the sidebar. The bottom credits-balance pill now links to `/billing`.
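The persisted expand/collapse state could look roughly like this. Only the `tropic:sidebar:group:<id>:open` key format comes from the entry above; the stored `"true"`/`"false"` values, the prefix-based route match, and the function names are assumptions made for the sketch.

```typescript
// Minimal subset of the Web Storage API so the sketch runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const groupKey = (id: string): string => `tropic:sidebar:group:${id}:open`;

function setGroupOpen(store: KeyValueStore, id: string, open: boolean): void {
  store.setItem(groupKey(id), String(open));
}

// A group is shown open if the current route lives inside it (auto-expand),
// otherwise the persisted per-browser preference decides.
function isGroupOpen(
  store: KeyValueStore,
  id: string,
  activeRoute: string,
  groupRoutes: string[],
): boolean {
  if (groupRoutes.some((route) => activeRoute.startsWith(route))) {
    return true;
  }
  return store.getItem(groupKey(id)) === "true";
}
```

In the browser, `window.localStorage` satisfies `KeyValueStore` directly.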
v1.23.2 2026-05-06
  • API deploy unstuck. v1.23.0 introduced an OpenclawBackup controller whose `list` endpoint declared `@UseGuards(ClerkAuthGuard)` directly. Because `VmModule` does not import `AuthModule`, NestJS could not resolve `AuthService` for the per-controller guard instance and the app crashed on bootstrap, looping the Fly machine through restart and leaving every release after v1.23.0 in a degraded state.
Fixed`vm/openclaw-backup.controller.ts` no longer redeclares `@UseGuards(ClerkAuthGuard)` on the `list` method. `ClerkAuthGuard` is already registered as `APP_GUARD` in `app.module.ts`, so the explicit decorator was redundant in addition to being broken; removing it restores boot and changes no auth behavior.
v1.23.1 2026-05-06
  • WhatsApp QR generation works again when the pairing drawer has an agent selected. The "Internal server error" was caused by the gateway tightening its `config.patch` contract to require an optimistic-lock base hash, which the channel-binding write was not sending.
Fixed`vm.service.ts:writeChannelBinding` now reads `config.hash` from the preceding `config.get` and forwards it as `baseHash` on the `config.patch` call, matching what `ensureWhatsAppPlugin` and the other 11 `config.patch` call sites in this file already do. Without it, the gateway returned `INVALID_REQUEST: config base hash required` and the unhandled error surfaced to the UI as a generic 500.
v1.23.0 2026-05-06
  • Encrypted daily backups of your full OpenClaw state (config, agents, sessions, workspace) ship to S3 alongside the existing EBS snapshot track. Per-instance AES-256-GCM encryption, Glacier Deep Archive lifecycle, and an admin hotfix endpoint to enable on the running fleet.
AddedNew `OpenclawBackup` Prisma model + `backup_encryption_key` column on `managed_instances`. Per-instance keys are encrypted at rest with the platform `ENCRYPTION_KEY`, same pattern as user API keys.
AddedNew `OpenclawBackupService` with `ensureBackupKey`, `recordBackup`, and `listBackupsForInstance`. Provisioning auto-generates the key for every new instance; a backfill script (`api/scripts/backfill-backup-keys.ts`) covers the existing fleet.
AddedNew endpoints under `/instances/:id/openclaw-backups`: VMs `POST` to register a completed upload (runner-token bearer auth), the dashboard `GET`s to list. New superadmin endpoint `POST /admin/hotfix/openclaw-backup` runs the SSM rollout per instance or fleet-wide.
AddedVM-side `tropic-backup-openclaw` bash script + systemd unit and timer (daily 02:30 UTC). Tarballs `~/.openclaw` + `/etc/tropic`, encrypts with the per-instance key via `openssl enc -aes-256-gcm`, uploads to `s3://tropic-sessions/backups/{instanceId}/{ISO}.tar.gz.enc`, then registers metadata with the API.
AddedPacker install block bakes the script + units into the AMI. `OpenclawBackupHotfixService` (modelled on `SessionArchiveHotfixService`) self-installs from `s3://tropic-deploy/` for VMs running older AMIs.
AddedSmoke test at `api/scripts/test-openclaw-backup.sh` triggers one backup on a real instance and verifies the round-trip (SSM status, DB row, S3 object).
v1.22.1 2026-05-06
  • Logs page rows now show the originating instance name on the right, and clicking it opens the gateway pre-selected to that instance.
  • Chat button on the agents page now passes the instance ID, so the gateway dropdown lands on the right VM instead of the most recently created one.
  • Fix for 429s on `/instances/:userId/:id/verify`: the auto-poll runs every 10s per pending/offline local instance and was tripping a non-default throttler bucket.
Fixed`instances.controller.ts:verify` swapped bare `@SkipThrottle()` for the explicit form (`default`, `tasks`, all `messaging-*`). In `@nestjs/throttler` v6 a bare `@SkipThrottle()` only opts out of the bucket named `default`; the `tasks` bucket (10/min) was still applying and caused 429s when multiple local instances polled verify in parallel.
AddedLogs page (`ui/app/(dashboard)/logs/page.tsx`) renders the instance name in each collapsed row, just before the timestamp. Resolved via a `Map<instanceId, name>` built from `useInstances()`. Clicking the name `router.push`es to `/gateway?instance=<id>`, which the gateway page already supports.
FixedAgents page chat button now navigates to `/gateway?agent=<id>&instance=<id>` instead of `/gateway?agent=<id>`. Previously the gateway dropdown defaulted to the first connectable EC2 (typically the most recently created VM), regardless of which VM the agent actually lived on.
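The navigation targets in this release share one URL shape. A small sketch, with `gatewayUrl` as a hypothetical helper name:

```typescript
// Build a gateway link; both query parameters are optional.
function gatewayUrl(opts: { agentId?: string; instanceId?: string }): string {
  const params = new URLSearchParams();
  if (opts.agentId) params.set("agent", opts.agentId);
  if (opts.instanceId) params.set("instance", opts.instanceId);
  const query = params.toString();
  return query ? `/gateway?${query}` : "/gateway";
}

console.log(gatewayUrl({ agentId: "a1", instanceId: "i1" })); // /gateway?agent=a1&instance=i1
console.log(gatewayUrl({ instanceId: "i1" })); // /gateway?instance=i1
```

The first form matches the chat button, the second the instance-name link on the Logs page.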
v1.22.0 2026-05-06
  • New pricing: Self-Hosted ($9/mo) and Cloud ($27/mo), both with 7-day free trial. The legacy credit packs (1,000 credits = $25) and per-hour `cr/hr` rates are no longer the marketing story; the underlying credit system in the app is still present and unchanged for now.
  • New `/pricing` marketing page with tier cards, seat add-on, "what you get on every plan" grid, and FAQ. Pricing link added to marketing nav and homepage nav.
  • Plan-aware sign-up: `/sign-up?plan=self-hosted|cloud` now persists the chosen plan via cookie through Clerk's multi-step flow, then auto-redirects to Stripe Checkout post-sign-up. Stripe Checkout for the new plans uses 7-day trial with card required.
AddedNew `/pricing` page (`ui/app/(marketing)/pricing/page.tsx`) with three tiers (Self-Hosted, Cloud, Enterprise), seat add-on, FAQ, and CTAs that route to `/sign-up?plan=...`.
AddedPricing link in `ui/app/(marketing)/layout.tsx` and `ui/app/page.tsx` nav. `/pricing` added to `ui/app/sitemap.ts`.
AddedSign-up captures `?plan=` and persists to cookie alongside the existing `?event=` cookie. Dashboard reads the cookie post-sign-up, calls `/credits/subscribe`, and redirects to Stripe Checkout. Code paths gated to `plan === "self-hosted" || plan === "cloud"`.
Added`credits.service.ts:basePriceId`, `seatPriceId`, `isBasePrice`, `isSeatPrice`, and `detectPlanFromPriceId` extended with `self-hosted` and `cloud` plan keys backed by new env vars `STRIPE_PRICE_SELF_HOSTED_MONTHLY`, `STRIPE_PRICE_CLOUD_MONTHLY`, `STRIPE_PRICE_SELF_HOSTED_SEAT`, `STRIPE_PRICE_CLOUD_SEAT`. Stripe products and prices created in live mode and Fly secrets set on `tropic-api`.
AddedPro checkout sessions get `subscription_data: { trial_period_days: 7 }` and `payment_method_collection: "always"` (card required during trial). Pro success URL goes to `/dashboard?subscribed=true`, cancel URL goes to `/pricing?canceled=true`. Legacy `day` and `standard` plans are unchanged and still work for existing customers.
ChangedHomepage (`ui/app/page.tsx`) tier section reworked from Individual / SME / Enterprise persona-based copy to Self-Hosted / Cloud / Enterprise SKU names with prices and CTA buttons that route into the new sign-up flow. Hero CTA tagline updated from "100 free credits on signup. No card required." to "7-day free trial. No charge until trial ends."
ChangedMarketing pricing copy in `ui/app/(marketing)/docs/page.tsx` rewritten from credit-based (cr/hr table, 1000 credits = $25) to subscription-based, with the daily-credit-cap card replaced by a max-session-runtime card.
Changed`ui/app/(marketing)/terms/page.tsx` section 5 rewritten from "prepaid credit system" to subscription billing with 7-day trial, monthly auto-renewal, and pricing-page reference. Privacy policy lines about "credit consumption" and "credit purchases" updated to subscription/feature usage and subscription billing.
RemovedPublic `/credits/*` API endpoints removed from `ui/app/(marketing)/docs/api/page.tsx`. Backend endpoints still exist for legacy customers.
v1.21.0 2026-05-05
  • Android instances can now change models and connect OpenAI ChatGPT OAuth from the dashboard, just like EC2. Both paths previously required SSM and were blocked in the UI; now they go through the runner over the Cloudflare tunnel.
Changed`vm.service.ts:setInstanceModel` rewritten to use `instanceExec.runShell` + `OcShellHelper.userOpsPreamble` instead of raw SSM and hardcoded `/home/ubuntu/.openclaw/.env`. Replaced grep+mv env edits with `sed -i.bak`. Drops the `if (!instance.instanceId)` check in favour of an SSM-or-runner availability check.
Changed`vm.service.ts:completeOpenAIOAuthInner` and `restartGatewayAsync` rewritten the same way. Merge script that writes auth-profiles.json now reads `OC_HOME` from env (set by userOpsPreamble) instead of hardcoding `/home/ubuntu/.openclaw`. Temp file path moved from `/tmp/tropic-oauth-merge.js` to `$OC_DIR/tropic-oauth-merge.js` so it works on Termux (no /tmp).
ChangedAgents page: model selector dropdown is enabled for Android (was previously gated `disabled={isExternal}`). The "API keys and model configuration are managed directly..." banner now only shows for non-Android external instances (e.g. Agent37).
ChangedAgents page: Connect OpenAI button shows on Android. Both inline (next to Set API Key) and the larger banner (when an OAuth-required model is selected and not yet connected).
v1.20.9 2026-05-04
  • `channels.whatsapp.accounts.<name>.selfChatMode = true` is now set on every WhatsApp account Tropic provisions or modifies. Required for OpenClaw self-chat behavior.
FixedThree call sites that touch `channels.whatsapp.accounts.default.*` now also set `selfChatMode: true`: the local-instance setup-script jq block (`api/src/instances/instances.service.ts`), the WhatsApp allowlist update path (`api/src/vm/vm.service.ts`), and the EC2 AMI build (`api/packer/scripts/install.sh`).
v1.20.7 2026-05-04
  • Remove Instance dialog now shows a copy-able OS-specific cleanup command for the device side, with a toggle to also uninstall OpenClaw itself. Default scope is Tropic-only — leaves OpenClaw on the machine in case the user installed it before connecting Tropic. Command varies based on `metadata.platformType` (Android/Termux, macOS, Linux, Windows) which the setup callback already stores.
Added`ui/lib/uninstall-commands.ts` exports `buildUninstallCommand(platformType, removeOpenclaw)` and `platformLabel(platformType)`. Generates a copy-pasteable shell (or PowerShell) script that stops Tropic processes, drops the runner + plugins, reverts the openclaw.json edits Tropic made at setup, and (optionally) uninstalls OpenClaw + cloudflared. Falls back to a sudo+systemd Linux command when `platformType` is unset (e.g. instance never completed setup callback).
Changed`Remove Instance` button on the agents page now opens a richer dialog: instance-cleanup command in a copy-able code block, "Also uninstall OpenClaw" checkbox (off by default), and a clear note that the Tropic-side delete (tunnel + DB row + SSM deregister) is independent of the device-side cleanup.
v1.20.6 2026-05-04
  • Org-level provider keys now sync to Android instances exactly the same way they sync to EC2. The user-secrets sync path (used on every key add/edit/delete) was hardcoded to SSM + systemd + sudo + /home/ubuntu — silently no-op on Android. Now goes through `InstanceExecService` (SSM for EC2, runner over Cloudflare Tunnel for Android), uses `OcShellHelper.userOpsPreamble` for portable paths, and skips systemd writes on Android.
  • Android setup callback now triggers a one-shot org-secrets push the moment the runner registers, so a fresh device picks up everything the org has configured (Gemini, OpenRouter, etc.) without a separate user action.
  • `TROPIC_HOSTED_ACKNOWLEDGED` and `OPENAI_OAUTH_CONNECTED` no longer appear in the Saved Keys UI. They are internal flags, not real provider credentials, and showing them with a key icon was misleading.
Changed`user-secrets.service.ts`: removed direct `SSMClient` usage. `getRunningInstances` selects `type`, `ocHome`, `runnerToken`, `tunnelUrl`, `metadata` and includes Android instances (no `instanceId` but with runnerToken+tunnelUrl). All command builders take an instance argument; `buildPreamble` uses `OcShellHelper.userOpsPreamble`; systemd writes early-return on Android; gateway restart switched from `sudo systemctl restart openclaw-gateway` to `$AS_OC_USER bash -lc "openclaw gateway restart"` (cross-platform).
Changed`user-secrets.service.ts:listAll`: filters out internal flag keys (`TROPIC_HOSTED_ACKNOWLEDGED`, `OPENAI_OAUTH_CONNECTED`) via Prisma `where: { key: { notIn: [...] } }`. The `INTERNAL_FLAG_KEYS` set is also used by `syncAllToInstance` to skip those keys when writing the device .env, replacing the previous one-off `if (secret.key === "OPENAI_OAUTH_CONNECTED") continue` check.
AddedAndroid setup callback (`handleSetupCallback`) calls `userSecretsService.syncAllToInstance(orgId, instanceId)` immediately after marking the instance online. Fire-and-forget; logs warnings on failure but does not block the callback response.
FixedReplaced grep+mv-inside-bash-c pattern with `sed -i.bak` for in-place env-file edits (same fix that landed in v1.20.4 for `secrets.service.ts` — `user-secrets.service.ts` had the same bug).
v1.20.52026-05-04
  • Local-instance creation (Connect Local Machine / Android) now picks a model that matches what the org actually has configured, instead of hardcoding `anthropic/claude-sonnet-4-6`. Previously, an Android setup on an org with OpenAI OAuth or Gemini configured would still default to Anthropic — and since no Anthropic key was on file, the agent had nothing to call. Now mirrors the EC2-provisioning logic: prefer the org's `openclawModel` setting, else `pickBestModel(orgSecrets)`.
Changed`createLocalInstance` looks up `OrgSetting.openclawModel`, then falls back to `pickBestModel(new Set(orgSecrets.keys))`. The MODEL_BY_KEY priority order (Anthropic > OpenAI key > OpenAI OAuth > Gemini > OpenRouter > xAI > Groq > Mistral > Together > Moonshot > Z.ai) determines which provider wins when multiple are configured. Sets `metadata.oauthConnected: false` when the chosen model is `openai-codex/*`, matching the EC2 path.
ChangedAndroid setup script now bakes `instance.selectedModel` into the `openclaw models set …` line instead of always writing `anthropic/claude-sonnet-4-6`.
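The model-selection fallback described in this release can be sketched roughly as follows. This is a minimal illustration, not the actual `createLocalInstance` code: the table is truncated, the `openai/placeholder-model` and `google/placeholder-model` ids and the `GEMINI_API_KEY` name are placeholders, and `chooseLocalInstanceModel` is a hypothetical helper name.

```typescript
// Priority-ordered [secret key, model id] pairs, highest priority first.
// Only the Anthropic and Codex model ids appear in this changelog; the other
// ids and the GEMINI_API_KEY name are placeholders for illustration.
const MODEL_BY_KEY: ReadonlyArray<readonly [string, string]> = [
  ["ANTHROPIC_API_KEY", "anthropic/claude-sonnet-4-6"],
  ["OPENAI_API_KEY", "openai/placeholder-model"],
  ["OPENAI_OAUTH_CONNECTED", "openai-codex/gpt-5.5"],
  ["GEMINI_API_KEY", "google/placeholder-model"],
];

// Returns the model for the highest-priority key the org has configured.
function pickBestModel(configuredKeys: ReadonlySet<string>): string | undefined {
  for (const [key, model] of MODEL_BY_KEY) {
    if (configuredKeys.has(key)) return model;
  }
  return undefined;
}

interface ModelChoice {
  selectedModel: string | undefined;
  metadata: { oauthConnected?: boolean };
}

// Hypothetical helper: prefer the org's explicit setting, else fall back to
// the priority table. Codex models start with oauthConnected = false.
function chooseLocalInstanceModel(
  openclawModel: string | undefined,
  configuredKeys: ReadonlySet<string>,
): ModelChoice {
  const selectedModel = openclawModel ?? pickBestModel(configuredKeys);
  const metadata: { oauthConnected?: boolean } = {};
  if (selectedModel !== undefined && selectedModel.startsWith("openai-codex/")) {
    metadata.oauthConnected = false;
  }
  return { selectedModel, metadata };
}
```

An ordered array rather than an object keeps the provider priority explicit and independent of property enumeration order.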
v1.20.42026-05-04
  • Hotfix for the secret-sync regression introduced in v1.20.1. Per-instance secrets (the BYOK path used to push ANTHROPIC_API_KEY etc. to a VM) silently no-op'd on every platform after the v1.20.1 refactor — the new commands wrapped a `grep ... > tmp && mv tmp ...` block inside `bash -c '...'` (single-quoted), which prevented `$ENV_FILE` from expanding inside the inner shell. Replaced the temp-file dance with `sed -i.bak` for in-place edits, which doesn't need `bash -c` at all.
Fixed`secrets.service.ts:syncSecretsToVm`: per-key removal switched from `bash -c 'grep -v ... > $ENV_FILE.tmp && mv ...'` (broken — outer `$ENV_FILE` not expanded into the single-quoted inner shell, and `sudo -u <user>` strips env so it isn't there either) to `$AS_OC_USER sed -i.bak '/^KEY=/d' "$ENV_FILE"`. The outer shell expands `$AS_OC_USER` and `$ENV_FILE` correctly; sed runs as the right user with the right path. Affects EC2, local Linux, macOS, and Android.
Fixed`secrets.service.ts`: clean up `$ENV_FILE.bak` files generated by sed's in-place edit so they don't accumulate.
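As a rough sketch of the fix, the per-key removal command can be built as a plain string in which `$AS_OC_USER` and `$ENV_FILE` are left for the outer shell to expand. `buildRemoveKeyCommand` is a hypothetical helper name, the key validation is an added safeguard, and the exact `rm -f` cleanup form is an assumption; only the `sed -i.bak` line is quoted from the entry above.

```typescript
// Env var names only: this keeps the key safe to embed in a sed address.
const ENV_KEY = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical helper: returns one shell line that deletes KEY=... from the
// env file in place and then removes the backup that `sed -i.bak` leaves.
function buildRemoveKeyCommand(key: string): string {
  if (!ENV_KEY.test(key)) throw new Error(`invalid env key: ${key}`);
  // $AS_OC_USER and $ENV_FILE are intentionally NOT expanded here; the outer
  // shell that ran the preamble expands them. No `bash -c '...'` wrapper.
  const remove = `$AS_OC_USER sed -i.bak '/^${key}=/d' "$ENV_FILE"`;
  // Assumed cleanup form for the .bak file created by sed.
  const cleanup = `$AS_OC_USER rm -f "$ENV_FILE.bak"`;
  return `${remove} && ${cleanup}`;
}
```

Because nothing is wrapped in a single-quoted inner shell, the variables are expanded exactly once, by the shell that defined them.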
v1.20.32026-05-04
  • Android instances now correctly transition from Pending to Online after setup. Previously the dashboard kept showing the setup-curl card even after a successful run, because the pending→online transition went through SSM `DescribeInstanceInformation`, which Android instances are never registered with.
Fixed`handleSetupCallback`: when an Android setup callback arrives with both a gateway token and a runner token, mark the instance `online` and stamp `lastActivityAt`. The successful POST is itself proof the device installed everything, started both daemons, and has working outbound connectivity — no separate liveness probe needed.
Fixed`verify` (the dashboard's Retry button): added an Android branch that hits `<tunnelUrl>/_tropic/health` with the runner bearer token instead of calling SSM. SSM-only verify silently no-op'd on Android, so Retry never moved an Android instance off Pending.
v1.20.22026-05-03
  • Agents page mobile/tablet layout fix: the cramped single-row card header (status badges crashing into the WhatsApp/Telegram/Slack/Terminal action circles) was triggering at 768px+. Switched the breakpoint to 1024px so tablet widths get the multi-row layout (status row, type/region badges, centered action buttons) and only true desktop widths use the dense single-row header.
FixedInstance card layout breakpoint in `ui/app/(dashboard)/agents/page.tsx` switched from `md:` (768px) to `lg:` (1024px) for both `hidden md:flex` desktop header and `md:hidden` mobile header. Eliminates button-vs-badge overlap on iPad-class widths.
v1.20.12026-05-02
  • Follow-up to v1.20.0 Android support: BYOK secrets (OPENROUTER_API_KEY, OPENAI_API_KEY, etc.) now sync correctly to Android instances. Previously they wrote via SSM, which silently no-op'd on Termux. Also makes the dashboard Android-aware: instance card shows "Android" instead of "Local", the SSH-backed terminal button is hidden, and the offline-reconnect dialog tells Android users to re-register rather than restart a non-existent SSM agent.
Changed`secrets.service.ts`: 3 SSM `SendCommandCommand` call sites (`syncSecretsToVm`, `getAwsProviderStatus`, `configureAwsProvider`, plus the AWS-secret CRUD ref edits) routed through `InstanceExecService`. Hardcoded `~/.openclaw/.env` path replaced with `OcShellHelper.userOpsPreamble()` so writes land at `$OC_DIR/.env` regardless of platform. Gateway restart switched to `openclaw gateway restart` (cross-platform).
ChangedInstance card platform badge: shows the actual `metadata.platformType` (Android / macOS / Windows / Linux) instead of the generic "Local" label. Two locations updated in `ui/app/(dashboard)/agents/page.tsx`.
ChangedOffline-reconnect dialog: hides the "Restart SSM agent" step on Android instances (no SSM to restart) and reframes the second step as "Re-register your device", since on Termux the only real reconnect path is rerunning the setup curl command.
FixedTerminal button (browser-based shell) is hidden on Android instance cards. The terminal route is SSH-backed and Termux's sshd is not part of the standard setup; clicking it would fail with a connection error.
v1.20.02026-05-02
  • Android (Termux) is now a supported local-instance platform. The setup script detects Termux, skips SSM and WireGuard, installs cloudflared via the Termux user repo, and stands up a small bearer-auth `tropic-runner` HTTP daemon on `localhost:18790` that replaces SSM as the command-execution channel for Android instances. The Cloudflare Tunnel ingress now routes `/_tropic/*` to the runner so the API can drive Android devices over the existing tunnel — no cloudflared client needed in the Fly image.
  • Internal: agent push, settings sync, and credential push paths now route through a unified `InstanceExecService` that picks SSM (existing instances) or HTTP runner (Android) based on `metadata.platformType`. Around 14 SSM `SendCommandCommand` call sites collapsed onto two helpers (`runShell`, `runShellAsync`).
AddedAndroid branch in `api/src/instances/instances.service.ts` (`getSetupScript`): detects Termux via `uname -o` / `$ANDROID_ROOT` / `$TERMUX_VERSION`, skips sudo/SSM/WireGuard/metrics-exporters, runs `pkg install` for nodejs, jq, curl, tar, wget, openssl-tool, lsof, plus tur-repo + cloudflared. Reports `platformType: "Android"`, `environment: "android"` in the setup callback.
Added`tropic-runner` daemon, written by the setup script as `~/.tropic-runner/runner.js` and started via `nohup`. Bearer-token auth on `127.0.0.1:18790`, exposes `POST /_tropic/exec` (returns `{stdout, stderr, code, signal}`) and `GET /_tropic/health`. Token is generated at setup time and posted back to the API in the callback alongside `gatewayToken`.
AddedCloudflare Tunnel ingress rule `^/_tropic/` → `localhost:18790` in `cloudflare-tunnel.service.ts`. Existing tunnels reconcile on next API boot via `reconcileAllTunnelsIngress`.
Added`ManagedInstance.runnerToken` column (migration `20260502000000_add_runner_token`) plus wiring in the setup callback DTO/handler.
Added`InstanceExecService` (`api/src/instances/instance-exec.service.ts`) with `runShell()` (waits, returns `{stdout, stderr, status, exitCode}`) and `runShellAsync()` (fire-and-forget). Branches on `metadata.platformType === "Android"`: HTTPS POST through the runner via `<tunnelUrl>/_tropic/exec` for Android, otherwise existing SSM `SendCommandCommand` + `GetCommandInvocationCommand` poll loop. Lives in its own `InstanceExecModule` to break the cycle that would form between `InstancesModule` and `CredentialsModule`.
Added`OcShellHelper.userOpsPreamble(instanceType, ocHome)` extends `pathPreamble` with a sudo-detection line that defines `$AS_OC_USER` — resolves to `sudo -u $OC_USER` on EC2 (root → ubuntu) or empty on Android (already running as the user).
AddedPlugin tarball delivery for local instances: `GET /instances/plugins/:token` streams the four Tropic plugins (sondera, tropic-telemetry, policy-check, tropic-memory) so local-instance setup can install them the same way the EC2 AMI build does. Plugin sources copied into the API runtime image via `Dockerfile`.
Changed`agents.service.ts`: `pushAgentToVm` and `pushPolicyToVm` now take an `instance` object (replacing loose `region`, `instanceId`, `instanceType`, `ocHome` args) and route through `instanceExec.runShell` / `runShellAsync`. Five call sites updated across `agents.service.ts` and `vm.service.ts`.
Changed`credentials.service.ts`: API key sync, deletion, and SSH pubkey push routed through `InstanceExecService`. Hardcoded `/home/ubuntu/.openclaw/.env` and `sudo -u ubuntu` removed in favour of `OcShellHelper.userOpsPreamble()` which sets `$OC_DIR` and `$AS_OC_USER`. Gateway restart switched from `systemctl restart openclaw-gateway` to `openclaw gateway restart` so it works cross-platform.
Changed`settings.service.ts`: 8 SSM call sites collapsed onto `instanceExec.runShell`/`runShellAsync`. `syncOpenclawModelToVm` and `syncSkillsToVm` now use `$OC_DIR` paths so they work on Termux. The chmod-based restriction helpers (`applyOsRestrictionsToVm`, `applyBrowserRestrictions`, `applyCodeExecutionRestrictions`) early-return on Android since they target `/usr/bin/*` system binaries that don't exist on Termux. Removed the now-dead `waitForCommandComplete` helper, `ssmClients` map, and `@aws-sdk/client-ssm` imports.
FixedOn Termux, `eval echo "~$REAL_USER"` does not expand because Android UIDs (e.g. `u0_a123`) have no `/etc/passwd` entry. The literal `~u0_a123` then propagated through `OC_HOME`, `OC_CONFIG`, the `[ -f "$OC_CONFIG" ]` gate, and silently skipped the entire jq + gateway-restart + runner-install + callback block. Setup script now uses `$HOME` directly when `SUDO_USER` is unset.
FixedTermux `npm install -g` copies binaries instead of symlinking. The copied `openclaw` bin at `$PREFIX/bin/openclaw` resolves `./dist/entry.js` (via `import.meta.url`) against `/usr/bin/`, throwing `missing dist/entry.(m)js`. Setup script now replaces the copy with a symlink to `$PREFIX/lib/node_modules/openclaw/openclaw.mjs`.
FixedTermux has no `/usr/bin/env`, so npm-installed shebangs `#!/usr/bin/env node` were rejected with "bad interpreter". Setup script now runs `termux-fix-shebang` on the openclaw bin (idempotent).
FixedTermux `/tmp` is not writable. Three temp-file writes (`/tmp/oc-setup.tmp`, `/tmp/oc-plugins.tmp`, `/tmp/tropic-plugins-*.tar.gz`) failed silently. Setup script now uses `${TMPDIR:-/tmp}` everywhere on the Android path.
FixedTermux has no `which` command, so `OC_BIN=$(... which openclaw ...)` returned empty and skipped the gateway restart. Replaced `which` with the POSIX builtin `command -v`.
FixedGateway port-bind poll after `nohup openclaw gateway run` was too short (2s) — plugin init can push cold start past 5s, leading to a false-negative "Warning: Gateway may need manual restart". Extended to 12 × 1s polls.
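The `InstanceExecService` routing introduced in this release can be pictured with a small sketch that only decides which channel to use and what to send. This is an illustration under assumptions, not the service itself: `planExec` and `ExecPlan` are hypothetical names, the `{ script }` request body is an assumed shape (the changelog documents only the response fields), and the `Authorization: Bearer` header is the conventional form of bearer-token auth.

```typescript
interface ManagedInstance {
  instanceId?: string;   // present on SSM-managed instances
  tunnelUrl?: string;    // Cloudflare Tunnel base URL (Android)
  runnerToken?: string;  // bearer token for tropic-runner (Android)
  metadata: { platformType?: string };
}

type ExecPlan =
  | { channel: "ssm"; instanceId: string; script: string }
  | { channel: "runner"; url: string; headers: Record<string, string>; body: string };

// Hypothetical helper: Android goes over the tunnel to the runner,
// everything else stays on the existing SSM path.
function planExec(inst: ManagedInstance, script: string): ExecPlan {
  if (inst.metadata.platformType === "Android") {
    if (!inst.tunnelUrl || !inst.runnerToken) {
      throw new Error("Android instance is missing tunnelUrl or runnerToken");
    }
    return {
      channel: "runner",
      // Strip trailing slashes so the path is joined exactly once.
      url: `${inst.tunnelUrl.replace(/\/+$/, "")}/_tropic/exec`,
      headers: {
        Authorization: `Bearer ${inst.runnerToken}`,
        "Content-Type": "application/json",
      },
      // Assumed request shape; not specified in the changelog.
      body: JSON.stringify({ script }),
    };
  }
  if (!inst.instanceId) throw new Error("SSM instance is missing instanceId");
  return { channel: "ssm", instanceId: inst.instanceId, script };
}
```

Keeping the decision in one pure function is what lets the many former SSM call sites collapse onto a pair of helpers: callers pass an instance and a script and never look at the platform themselves.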
v1.19.02026-05-01
  • GPT-5.5 is now the default model for users who connect via OpenAI Codex OAuth (ChatGPT Plus/Pro). The model entry, gateway model registry, and launch-modal dropdown all shipped in v1.16.0 — but the auto-selected default was still GPT-5.4, so most OAuth users never saw 5.5 unless they opened the dropdown. Defaults now point at 5.5 in `pickBestModel` and in the OAuth-token-push fallback.
Changed`api/src/settings/model-by-key.ts`: `OPENAI_OAUTH_CONNECTED` now maps to `openai-codex/gpt-5.5` (was `openai-codex/gpt-5.4`). This is the model `pickBestModel()` returns for OAuth-only users, and the default the launch modal pre-selects in `ui/app/(dashboard)/agents/page.tsx`.
Changed`api/src/vm/vm.service.ts` (`completeOpenAIOAuth`): when pushing OAuth tokens to a VM, the SSM `openclaw models set` call now defaults to `openai-codex/gpt-5.5` if the user did not explicitly pick a codex model at launch. User-selected codex models are still honoured.
v1.18.02026-05-01
  • New instance type: t4g.small (ARM Graviton, 2 vCPU / 2 GB) — roughly 20% cheaper on AWS than t3.small. Available from Settings → VM Instance Type and the backup-restore picker.
  • AMI build now parameterized by CPU architecture. The same Packer config produces both x86_64 (default) and arm64 AMIs via `PKR_VAR_arch=arm64 packer build openclaw-ubuntu.pkr.hcl`. install.sh detects the architecture and branches all arch-specific downloads (AWS CLI, SSM agent, CloudWatch agent, Prometheus exporters) automatically.
  • On arm64 the AMI ships Brave instead of Chrome (Google does not publish Chrome for Linux arm64). Brave is Chromium-based, installed from the official Brave apt repo as a real signed .deb (no snap), and symlinked as /usr/bin/google-chrome so OpenClaw browser plugins find it. Chromium-based browsers are interchangeable for OpenClaw's CDP-based browser tool.
AddedNew instance type t4g.small added to `ALLOWED_INSTANCE_TYPES` in `api/src/settings/settings.service.ts`, the Settings dropdown in `ui/app/(dashboard)/settings/page.tsx`, the backup-restore picker in `ui/app/(dashboard)/backups/page.tsx`, and the per-instance compute-rate map in `ui/app/(dashboard)/agents/page.tsx` (10 cr/hr informational rate).
AddedArchitecture-aware AMI selection in `api/src/vm/vm-provisioning.service.ts`: instance types matching `^(t4g|c7g|c8g|m7g|m8g|r7g|r8g)\.` now look for the `openclaw-ubuntu-desktop-arm64` AMI tag (built separately via `PKR_VAR_arch=arm64`). Same arch branch in the base-Ubuntu fallback path. Spot-price table extended with t4g.small at $0.0067/hr.
AddedCross-architecture backup restore is now blocked at the API layer. `BackupService.restoreBackupToNewVm` rejects attempts to restore an x86_64 backup snapshot onto an arm64 instance type (or vice versa) — the EBS snapshot contains the source OS and won't boot on the wrong architecture. Error message names both architectures and the source instance type so the user can pick a compatible target.
ChangedPacker config `api/packer/openclaw-ubuntu.pkr.hcl` now takes an `arch` variable (default `amd64`) and uses locals to derive the build-host instance type (t3.medium → t4g.medium), source AMI filter (`amd64-server-*` → `arm64-server-*`), output AMI name suffix, and Name tag. amd64 builds are byte-identical to before.
Changed`api/packer/scripts/install.sh` detects CPU architecture once via `dpkg --print-architecture` and branches all arch-specific downloads. On arm64: Brave instead of Chrome (chromium-browser apt package is a snap stub on Noble that hangs the build for ~50 minutes contacting an unreachable snap store; Brave is a real Chromium-based .deb from the official Brave apt repo). AWS CLI uses `aarch64`, SSM agent and CloudWatch agent use `arm64`, Prometheus exporters use `arm64` tarballs.
FixedAudit rules in `api/packer/files/tropic-audit.rules` no longer reference syscalls absent from the arm64 syscall table. Removed `open` from the access_denied rule and `unlink`/`rename` from the file_modify rule — arm64 only ships the `*at` variants. Modern glibc routes the legacy syscall names through `openat`/`unlinkat`/`renameat` on amd64 too, so coverage is identical on both architectures. Without this fix, `augenrules --load` aborted with "Syscall name unknown" and the AMI build failed.
Changed`heapMb` ladder in `vm-provisioning.service.ts` treats t4g.small the same as t3.small (1536 MB heap budget), since both have 2 GB RAM.
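The architecture detection and the cross-architecture restore guard from this release can be sketched as below. The regular expression is the one quoted in the AMI-selection entry; `archForInstanceType` and `assertRestoreCompatible` are hypothetical names, and the error wording is illustrative rather than the message `BackupService.restoreBackupToNewVm` actually returns.

```typescript
type Arch = "x86_64" | "arm64";

// Graviton instance families, as quoted in the AMI-selection entry.
const ARM64_FAMILIES = /^(t4g|c7g|c8g|m7g|m8g|r7g|r8g)\./;

function archForInstanceType(instanceType: string): Arch {
  return ARM64_FAMILIES.test(instanceType) ? "arm64" : "x86_64";
}

// Hypothetical guard: an EBS snapshot carries the source OS, so it can only
// boot on an instance type of the same architecture.
function assertRestoreCompatible(sourceType: string, targetType: string): void {
  const source = archForInstanceType(sourceType);
  const target = archForInstanceType(targetType);
  if (source !== target) {
    throw new Error(
      `Cannot restore a ${source} backup (taken on ${sourceType}) onto ${target} instance type ${targetType}`,
    );
  }
}
```

For example, `assertRestoreCompatible("t3.small", "t4g.small")` throws, while restoring a `t3.small` backup onto `t3.medium` passes.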
v1.17.22026-04-30
  • Final session-archive bug from the live-fleet rollout: events were landing in the wrong S3 bucket. The hotfix now inserts at the right point in fluentd.conf instead of appending. End-to-end pipeline confirmed working — first object verified in tropic-sessions.
FixedThe SSM hotfix shell payload in `session-archive-hotfix.service.ts` used `cat >> /etc/fluent/fluentd.conf` to add the openclaw.session source + match. Because fluentd evaluates match blocks top-down and the existing audit-logs match (`<match openclaw.** audit.**>`) catches `openclaw.session` events as well, our new match was unreachable — every session event was getting shipped to the audit-logs bucket under that bucket's key layout. Replaced the append with an `awk`-based insert that places the session block immediately before the audit-logs match block. Idempotent guard kept (`! grep -q "tag openclaw.session"`).
FixedHotfix now also `rm -f /var/log/fluent/openclaw-sessions.pos` before restarting fluentd so historical session content gets re-tailed from the start under the corrected match. Without this, the pos file kept fluentd at end-of-file so only new writes would land correctly.
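The ordering fix is easier to see in code. The hotfix itself uses `awk` inside an SSM shell payload; the sketch below re-expresses the same idea in TypeScript for illustration only, with `insertSessionBlock` as a hypothetical name. It assumes the session block contains the `tag openclaw.session` line that the idempotency guard looks for.

```typescript
const AUDIT_MATCH = "<match openclaw.** audit.**>";
const GUARD = "tag openclaw.session";

// Inserts the session source + match immediately before the audit-logs match
// block, because fluentd evaluates match blocks top-down and the audit match
// would otherwise swallow openclaw.session events.
function insertSessionBlock(conf: string, sessionBlock: string): string {
  if (conf.includes(GUARD)) return conf; // idempotent: already patched
  const lines = conf.split("\n");
  const at = lines.findIndex((line) => line.trim() === AUDIT_MATCH);
  if (at === -1) throw new Error("audit-logs match block not found");
  const blockLines = sessionBlock.replace(/\n+$/, "").split("\n");
  lines.splice(at, 0, ...blockLines);
  return lines.join("\n");
}
```

Appending would leave the new match after the catch-all, which is exactly the unreachable position the original hotfix produced.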
v1.17.12026-04-30
  • Fixes two bugs in v1.17.0 surfaced when the SSM hotfix ran across the live fleet. The session-archive design itself is unchanged; this is just the dialect of bash + systemd needed to actually land it on running VMs.
FixedThe openclaw.json jq mutation in `session-archive-hotfix.service.ts` ran the redirection (`> /tmp/oc-maint.tmp`) under the SSM-injected root shell, then tried `sudo -u ubuntu mv /tmp/oc-maint.tmp openclaw.json`. The temp file was root-owned and `/tmp` has the sticky bit, so ubuntu silently failed to move it — `session.maintenance` ended up null on every VM the hotfix touched. Pipeline now runs entirely inside one `sudo -u ubuntu bash -c "..."` so the temp file is ubuntu-owned end to end. Verified by reading `.session.maintenance` on all six prod VMs after re-running the hotfix.
Fixedfluentd refused to start under the new sandbox with `Read-only file system @ dir_s_mkdir - /tmp/fluentd-lock-...`. `ProtectSystem=strict` made `/tmp` read-only and the fluentd supervisor creates lock files there during boot. Added `PrivateTmp=true` (gives fluentd its own ephemeral /tmp) plus `BindReadOnlyPaths=/tmp/openclaw` so the existing audit-logs gateway-log tail still works inside the private namespace.
FixedEven with /tmp fixed, fluentd cycled in `activating (start)` because the upstream unit is `Type=forking` with `--daemon`. With `User=ubuntu` overridden, systemd kept reading the parent's PID from the PID file and immediately declaring the main process a zombie (the parent had already forked-and-exited). Sandbox drop-in now overrides to `Type=simple`, clears `PIDFile=`, replaces `ExecStart=` with a foreground `/opt/fluent/bin/fluentd --log /var/log/fluent/fluentd.log` invocation. systemd tracks the long-lived process directly. Verified on five of six prod VMs (the sixth predates fluentd in the AMI; `session.maintenance` still applied there, archiving picks up on next reprovision).
v1.17.02026-04-30
  • OpenClaw's automatic session pruning is now disabled on every Tropic VM. Previously, OpenClaw 2026.4.23+ silently deleted any session entry older than 30 days and capped each agent's store at 500 sessions on every gateway write — production data loss with no warning.
  • Session transcripts (both `*.jsonl` and `*.trajectory.jsonl`) now stream off-VM to a new `tropic-sessions` S3 bucket via Fluentd. Daily snapshots of `sessions.json` (the index) are also archived. Lifecycle: Standard → IA at 30 days → Glacier IR at 90 days → Deep Archive at 180 days. No expiration.
  • Existing fleet hotfix endpoint: `POST /admin/hotfix/session-archive` (superadmin) brings every running VM into compliance idempotently — patches openclaw.json, drops the fluentd source + sandbox, installs the daily snapshot timer.
FixedOpenClaw 2026.4.23 ships `session.maintenance.mode = "enforce"` as its in-code default (despite the type comment claiming "warn") and Tropic's AMI was never overriding it. Result: every gateway write triggered `pruneStaleEntries` (entries older than 30 days deleted from `sessions.json`) and `capEntryCount` (store capped to 500 most-recently-updated entries). Per-VM jsonl transcripts then went into a 30-day grace window as `*.jsonl.deleted.<ts>` archives before being hard-deleted by the same maintenance pass. The AMI build (`api/packer/scripts/install.sh`) now runs `jq` post-SecureClaw to set `session.maintenance.mode = "warn"` and `resetArchiveRetention = false`. The warn branch in OpenClaw's `src/config/sessions/store.ts` skips both prune and cap, so transcripts on disk are untouched by the gateway. Tropic now owns retention via the off-box S3 archive (below).
AddedNew S3 bucket `tropic-sessions` in `ap-southeast-1`, versioning enabled, SSE-S3 encryption, public access blocked, lifecycle policy transitioning Standard → IA → Glacier IR → Deep Archive at 30/90/180 days with no expiration. Created via `api/scripts/setup-sessions-bucket.sh` (idempotent) using lifecycle config in `api/scripts/sessions-lifecycle.json`. The same role used for audit-logs is reused (TropicVMSSMProfile / TropicHybridSSMRole) — operators must attach an `s3:PutObject` policy on the new bucket separately.
AddedNew Fluentd tail source + S3 match in `api/packer/files/fluent.conf` ships `*.jsonl` from `~/.openclaw/agents/*/sessions/` to `s3://tropic-sessions/<env>/<instanceId>/sessions/<YYYY>/<MM>/<DD>/<HH>-<uuid>.jsonl.gz`. Hourly chunks, gzipped, `single_value` format preserves each JSONL line as-is. Excludes `*.deleted.*`, `*.reset.*`, `*.bak.*` archive artifacts so we don't double-ship.
AddedNew systemd drop-in `api/packer/files/tropic-fluentd-sandbox.conf` runs fluentd as `User=ubuntu`. OpenClaw's queued-file-writer opens `*.trajectory.jsonl` files with mode 0o600 and explicitly chmods them on every write (queued-file-writer.ts:100,107), so a fluentd running as `_fluentd` could only ever ship the small `.jsonl` metadata files, never the actual conversation traces. OpenClaw source cannot be modified (hard rule), so the sandbox runs fluentd as the file owner but with `ProtectHome=tmpfs` + `BindReadOnlyPaths=/home/ubuntu/.openclaw/agents`, which makes the rest of ubuntu's home (auth-profiles.json, .ssh, .aws, .secureclaw audit reports) invisible to the fluentd process. The drop-in also applies the standard `NoNewPrivileges`, `ProtectSystem=strict`, and kernel/namespace restrictions.
AddedNew systemd timer `tropic-session-index-snapshot.timer` (`OnCalendar=*-*-* 02:00:00 UTC`, `Persistent=true`) uploads each agent's `sessions.json` index file to `s3://tropic-sessions/<env>/<instanceId>/<agent>/index/<YYYY-MM-DD>.sessions.json` daily. Runs as ubuntu via `api/packer/files/tropic-session-index-snapshot.service`. The script is at `api/packer/files/tropic-session-index-snapshot.sh` and reads `/etc/default/tropic-session-index-snapshot` for env vars (TROPIC_ENV, TROPIC_INSTANCE_ID, S3_TROPIC_SESSIONS_BUCKET, S3_TROPIC_SESSIONS_REGION). The index file is rewritten in full on every gateway write — unsuitable for tail-based shipping — hence the daily snapshot.
AddedNew `POST /admin/hotfix/session-archive` superadmin endpoint in `api/src/admin/admin.controller.ts` + `api/src/admin/session-archive-hotfix.service.ts`. Iterates running ManagedInstances (status `running` or `online`), fans out an idempotent SSM `AWS-RunShellScript` per instance that applies all of the above to live VMs without reprovisioning: openclaw.json jq mutation, fluentd `tropic.conf` env drop-in, `tropic-sandbox.conf` drop-in, fluent.conf source/match append (only if missing), buffer dirs + chown, index-snapshot script + units + env file, daemon-reload, fluentd restart, timer enable. Returns a per-instance status summary with SSM command ids. Safe to re-run.
Changed`api/src/vm/vm-provisioning.service.ts` extends the per-VM fluentd drop-in (3 sites) so freshly provisioned VMs get TROPIC_ENV, S3_TROPIC_SESSIONS_BUCKET, S3_TROPIC_SESSIONS_REGION env vars in addition to the existing TROPIC_INSTANCE_ID + audit-bucket vars. Also writes `/etc/default/tropic-session-index-snapshot` and enables the snapshot timer.
Changed`api/packer/scripts/install-fluentd.sh` installs the sandbox drop-in, copies the snapshot script + units, creates `/var/log/fluent/buffer/sessions`, and chowns `/var/log/fluent` to ubuntu (was _fluentd — the sandbox switches the runtime user). New AMIs ship the timer enabled at boot.
AddedDesign spec at `docs/superpowers/specs/2026-04-29-session-archive-design.md` covers the full architecture, failure modes, IAM scoping decision (per-instance prefix scoping considered but skipped because it doesn't work for the hybrid local role), and rollout order.
v1.16.02026-05-01
  • WhatsApp allowlist is now per-VM. Saving a number on one VM no longer SSM-restarts every other gateway in the org
  • Each VM has its own WhatsApp drawer (matches the Telegram pattern) — manage allowed numbers and pair the QR in one place
  • GPT-5.5 (Codex OAuth) added alongside GPT-5.4 — pick either at launch when ChatGPT OAuth is connected
FixedThe org-scoped `POST /settings/:userId/phone-numbers` endpoint used to fan out an SSM `RunShellScript` against every running EC2 + local instance in the org and `sudo systemctl restart openclaw-gateway` on each one — so saving a single WhatsApp number on the QR modal would kill every active agent session on every VM the user owned. The endpoint and its `updatePhoneNumberAllowlist` + `syncPhoneNumbersToInstance` service methods are gone. The dashboard top-level WhatsApp QR section (which secretly operated on "primary EC2 instance" via that org-scoped API) is also gone — users are pointed to the Agents page where each VM has its own pairing UI.
AddedNew per-VM endpoints `GET /vm/:userId/:instanceId/whatsapp/allowlist` and `PUT /vm/:userId/:instanceId/whatsapp/allowlist` in `api/src/vm/vm.controller.ts`. Reads use `openclaw config get channels.whatsapp.allowFrom`. Writes use `openclaw config set` with `--strict-json` for the four canonical keys (`channels.whatsapp.allowFrom`, `channels.whatsapp.dmPolicy`, and the `channels.whatsapp.accounts.default.*` mirrors required for OpenClaw enforcement). No `systemctl restart` — the running gateway picks up channel config on the next channel-init.
AddedNew `WhatsAppManagementDrawer` (Sheet, side-right) in `ui/components/settings/whatsapp-management-drawer.tsx`. Replaces the old `WhatsAppSetupModal` Dialog. Shows the allowed-numbers list with add/remove, the QR pairing flow, agent routing select, relink, and disconnect. The org-scoped Phone Number Allowlist Card on `/settings` is also gone — replaced with a one-line redirect to `/agents`.
AddedNew OAuth model entry `openai-codex/gpt-5.5` registered in the gateway provider config (`api/src/vm/vm-provisioning.service.ts`), the UI MODELS list (`ui/lib/model-providers.ts`), and the books-api model-registry. The launch-modal ChatGPT OAuth card now renders a model dropdown when 2+ Codex models exist, matching the existing one-card-per-provider pattern.
FixedIn `vm.service.ts:completeOpenAIOAuth`, the post-OAuth SSM step used to hardcode `openclaw models set openai-codex/gpt-5.4` and overwrite `selectedModel` to gpt-5.4. So if a user picked gpt-5.5 at launch and then completed OAuth, they'd silently get reverted to gpt-5.4. The handler now respects an existing `openai-codex/*` selection and only falls back to gpt-5.4 when the selected model is from a different provider (e.g. they connected OAuth after picking Anthropic).
v1.15.02026-04-30
  • Cross-session memory now actually ingests on Tropic-memory VMs running OpenClaw 2026.4.23 or newer — previously the agent_end hook was being silently dropped, so nothing ever reached central pgvector
  • New `memory_save` tool lets the agent save explicit user-stated facts immediately, instead of waiting for the 4-message auto-ingest threshold
FixedOpenClaw 2026.4.23 introduced an authorization gate: non-bundled plugins that register typed conversation hooks (`agent_end`, `llm_output`, `before_agent_finalize`, `llm_input`) must opt in via `plugins.entries.<id>.hooks.allowConversationAccess=true` in `openclaw.json`, otherwise the gateway loads the plugin but silently strips the hook. The Tropic provisioning code in `api/src/vm/vm-provisioning.service.ts` was building the `tropic-memory` and `tropic-telemetry` entries without that field, so on every VM provisioned from an AMI shipping ≥ 4.23 the agent_end ingest never fired (memories failed to save) and per-LLM token usage telemetry stopped flowing. Older VMs (e.g. still on 4.12) were unaffected only because the enforcement didn't exist yet. Both entries now write `hooks: { allowConversationAccess: true }`, in all three places they're produced (the cloud-init `write_files` payload for both t3.micro and default plugin blocks, and the SSM-jq fallback used to push telemetry config to existing VMs). The schema acceptance for the field landed in OpenClaw 2026.4.24, so the AMI must install ≥ 4.24 — VMs stuck on exactly 4.23 will reject the new field with "Unrecognized key" and need an OpenClaw upgrade to pick up the fix.
AddedNew `memory_save` tool registered by `api/src/vm/tropic-memory-plugin/index.js`. Agents call it with `{ content: string }` whenever the user explicitly asks to remember/save/note something — it POSTs a synthetic two-turn conversation framed as a "remember this" instruction to `MEMORY_URL/ingest`, bypassing both the `MIN_NEW_MESSAGES = 4` floor and the 30-second debounce that normally gate auto-ingest. The server-side Claude extractor still produces the actual fact rows, so wording may be lightly rephrased but the substance is preserved. The plugin's promptBuilder was updated to instruct the agent to call `memory_save` in the same turn as a save request rather than waiting, and the activation banner now lists the new tool.
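The opt-in flag from the fix above lives at `plugins.entries.<id>.hooks.allowConversationAccess`. A minimal sketch of adding it to a parsed `openclaw.json` object follows; the `OpenclawConfig` and `PluginEntry` types are simplified assumptions about the file's shape, and `allowConversationAccess` (the function) is a hypothetical helper, not the provisioning code.

```typescript
// Simplified, assumed shapes; a real openclaw.json has many more fields.
interface PluginEntry {
  hooks?: { allowConversationAccess?: boolean };
  [key: string]: unknown;
}

interface OpenclawConfig {
  plugins?: { entries?: Record<string, PluginEntry> };
}

// Returns a copy of the config in which every listed plugin opts in to typed
// conversation hooks. Existing entry fields and hook settings are preserved.
function allowConversationAccess(config: OpenclawConfig, pluginIds: string[]): OpenclawConfig {
  const entries: Record<string, PluginEntry> = { ...(config.plugins?.entries ?? {}) };
  for (const id of pluginIds) {
    const entry: PluginEntry = entries[id] ?? {};
    entries[id] = { ...entry, hooks: { ...entry.hooks, allowConversationAccess: true } };
  }
  return { ...config, plugins: { ...config.plugins, entries } };
}
```

Calling it with `["tropic-memory", "tropic-telemetry"]` covers the two entries named in the fix. As noted above, OpenClaw 2026.4.23 itself rejects the field, so the flag only helps on 2026.4.24 or newer.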
v1.14.142026-04-29
  • WhatsApp disconnect now shows a clear in-progress state and a success view, instead of silently doing nothing while the gateway restarts in the background
FixedIn `ui/components/settings/whatsapp-setup-modal.tsx`, `handleDisconnect` used to fire the `channels.logout` RPC, optimistically flip `connected` to false, and return — but `channels.logout` deletes the WhatsApp auth dir while the socket is still active, the gateway briefly stops responding to `/health`, and Tropic's own auto-restart logic in `vm.service.ts:checkGatewayViaHttp` then restarts the gateway. The modal had already moved on, the auto-pairing effect immediately kicked in and started generating a new QR, so from the user's side it looked like nothing happened — and clicking Disconnect a second time after the restart was the only way to actually finish the logout. The handler now keeps the button in a "Disconnecting…" spinner state and polls `/whatsapp/status` every 2s, ignoring failures while the gateway is restarting, until the status returns a non-connected state. At that point the modal swaps to a dedicated "WhatsApp Disconnected" success view with a Close button. Capped at 3 minutes; on timeout the user sees an inline retry message instead of a stuck modal.
v1.14.132026-04-29
  • EBS backups: tiered retention (7 daily / 4 weekly / 3 monthly), idempotent + parallel daily cron, errored snapshots auto-cleaned
  • New weekly cron deregisters stale AMIs and clears orphan "Final snapshot" snapshots across all 4 regions — measured ~$8/mo of dead AMI storage at the time of release
Changed`runDailyBackups` in `api/src/vm/backup.service.ts` now snapshots 5 instances in parallel via a worker-pool helper instead of one-at-a-time, and skips any instance whose newest Tropic-tagged AWS snapshot is less than 20 hours old. Errors are isolated per instance — one bad VM no longer aborts the rest of the run.
ChangedReplaced count-based `pruneAwsSnapshots` (keep 7 newest) with `applyTieredRetention`. Per instance: keep last 7 daily snapshots, plus 1 snapshot per ISO week for the last 4 weeks, plus 1 snapshot per calendar month for the last 3 months. Kept snapshots get a `BackupTier=daily|weekly|monthly` tag and the matching DB `expiresAt` is updated to the tier window so the UI banner stays accurate as snapshots graduate. Anything outside the tiers (or in `error` state) is deleted in the same pass.
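The tier rules above can be sketched as a pure selection function. This is a minimal illustration, not the code in `applyTieredRetention`: the `Snapshot` shape and the `planRetention` name are hypothetical, and it assumes that "last 4 weeks" and "last 3 months" mean the most recent ISO weeks and calendar months that contain a healthy snapshot, with the newest snapshot in each bucket being the one kept.

```typescript
type Tier = "daily" | "weekly" | "monthly";

// Hypothetical snapshot shape for illustration only.
interface Snapshot {
  id: string;
  createdAt: Date;
  state: "completed" | "error";
}

// ISO-8601 week key such as "2026-W18" (weeks start on Monday, computed in UTC).
function isoWeekKey(d: Date): string {
  const day = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dow = day.getUTCDay() || 7; // treat Sunday as day 7
  day.setUTCDate(day.getUTCDate() + 4 - dow); // the Thursday of the week decides the ISO year
  const yearStart = Date.UTC(day.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((day.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${day.getUTCFullYear()}-W${week < 10 ? "0" : ""}${week}`;
}

// Calendar month key such as "2026-04".
function monthKey(d: Date): string {
  return d.toISOString().slice(0, 7);
}

function planRetention(snapshots: Snapshot[]): { keep: Map<string, Tier>; remove: string[] } {
  // Errored snapshots are never kept; the rest are ordered newest first.
  const healthy = snapshots
    .filter((s) => s.state !== "error")
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime());

  const keep = new Map<string, Tier>();

  // Daily tier: the 7 newest healthy snapshots.
  for (const s of healthy.slice(0, 7)) keep.set(s.id, "daily");

  // Weekly and monthly tiers: newest snapshot in each of the N most recent buckets.
  const keepNewestPerBucket = (keyOf: (d: Date) => string, buckets: number, tier: Tier): void => {
    const seen = new Set<string>();
    for (const s of healthy) {
      const key = keyOf(s.createdAt);
      if (seen.has(key)) continue;
      if (seen.size === buckets) break;
      seen.add(key);
      if (!keep.has(s.id)) keep.set(s.id, tier); // an existing daily tag takes precedence
    }
  };
  keepNewestPerBucket(isoWeekKey, 4, "weekly");
  keepNewestPerBucket(monthKey, 3, "monthly");

  // Everything outside the tiers, including errored snapshots, is slated for deletion.
  const remove = snapshots.filter((s) => !keep.has(s.id)).map((s) => s.id);
  return { keep, remove };
}
```

The returned tier per snapshot corresponds to the `BackupTier` tag value; computing the matching `expiresAt` window is left out of the sketch.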
Changed`pruneExpiredBackups` (04:00 UTC) is now a tier-retention catch-up over active instances first, then hard-expires by `expiresAt` only for backups whose instance is terminated/gone. Active-instance backups are governed solely by tier retention, so a stale `expiresAt` from a missed cron no longer causes accidental deletion.
AddedNew `BackupDate=YYYY-MM-DD` tag on every snapshot for traceability.
AddedNew `weeklyAwsCleanup` @Cron('0 5 * * 0') in the same service. Per region (ap-southeast-1, us-east-1, eu-west-1, eu-west-2): groups self-owned AMIs into `nemoclaw` vs `openclaw` families, keeps the 3 newest per family plus any AMI matching the pinned build timestamp (currently `1771920663`, the 2026-02-24 build), deregisters the rest and deletes their backing EBS snapshots. Also deletes any snapshot whose Description begins with "Final snapshot of orphaned volume". Set `AWS_CLEANUP_ENABLED=false` on Fly to disable without redeploying.
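A rough sketch of the keep/deregister decision described above, for a single region. The `Ami` shape and the function name are hypothetical, and the sketch assumes that both the family and the pinned build timestamp can be read from the AMI name; the real cron may identify them differently (for example, via tags).

```typescript
// Hypothetical AMI shape for illustration only.
interface Ami {
  id: string;
  name: string;      // assumed to embed the family name and the build timestamp
  createdAt: number; // epoch seconds
}

function selectAmisToDeregister(amis: Ami[], pinnedTimestamp: string, keepPerFamily = 3): string[] {
  // Group self-owned AMIs into the two families.
  const families = new Map<string, Ami[]>();
  for (const ami of amis) {
    const family = ami.name.includes("nemoclaw") ? "nemoclaw" : "openclaw";
    const group = families.get(family) ?? [];
    group.push(ami);
    families.set(family, group);
  }

  const doomed: string[] = [];
  families.forEach((group) => {
    group.sort((a, b) => b.createdAt - a.createdAt); // newest first
    group.forEach((ami, index) => {
      const pinned = ami.name.includes(pinnedTimestamp);
      // Keep the newest N per family and anything matching the pinned build.
      if (index >= keepPerFamily && !pinned) doomed.push(ami.id);
    });
  });
  return doomed;
}
```

Deleting the backing EBS snapshots of each deregistered AMI would follow as a separate step and is not shown.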
Changed`scripts/cleanup-aws-snapshots.sh` keeps its dry-run-by-default behavior as a manual escape hatch, with a header note pointing at the new cron.
v1.14.122026-04-26
  • Tropic-Hosted launch card now shows a model dropdown (Gemma 4 26B default, Gemma 3 27B)
  • Removed the legacy `chrome-headed.service` and Xvfb from the AMI; the OpenClaw browser tool now launches Chrome on-demand in headless mode
AddedAdded `gemma3:27b@tropic` to `ui/lib/model-providers.ts` alongside `gemma4:26b@tropic`. The launch modal's per-provider card already renders a select when the provider has 2+ models, so the Tropic-Hosted card now exposes both Gemma options without any further UI change.
RemovedDeleted `api/packer/files/chrome-headed.service` and dropped its install + enable steps from `api/packer/openclaw-ubuntu.pkr.hcl`. The unit was launching `/usr/bin/google-chrome` against an Xvfb display at boot with no `--remote-debugging-port`, so the OpenClaw browser plugin couldn't even talk to it — it was orphaned legacy from before OpenClaw's browser tool matured. On a 2GB t3.small it consumed ~600MB resident memory and competed with the gateway for disk I/O during cold start. Also removed `xvfb` from the apt install list since nothing else depended on it.
ChangedSet `browser.headless = true` in both writers of `openclaw.json` (`api/src/vm/vm-provisioning.service.ts` for runtime cold-provision and `api/packer/scripts/install.sh` for the AMI bake) so the on-demand Chrome the OpenClaw browser plugin launches doesn't require a display server. Added a final-stage AMI assertion that fails the build if `.browser.headless != true`.
v1.14.112026-04-25
  • Superadmin accounts no longer hit the 3-cloud-VM cap
ChangedThe `MAX_EC2_INSTANCES = 3` gate in `instances.service.ts` (cloud VM provision) and `backup.service.ts` (restore-to-new-VM) now skips the limit check when the requesting user has `isSuperAdmin = true`. The agents page `Add Machine → New Cloud VM` button and "Max 3 VMs reached" label are also gated on `useIsSuperAdmin()` so the UI matches the API.
v1.14.102026-04-25
  • New VMs no longer auto-archive their chat session at 04:00 UTC. OpenClaw's default daily reset would otherwise wipe visible chat history once per day, which is especially disruptive for users in timezones where 04:00 UTC falls during peak working hours.
FixedOpenClaw's default `session.reset` policy is `daily` at `atHour: 4` UTC. When a session is touched after that boundary, the gateway runs `evaluateSessionFreshness`, sees the session was last updated before the reset hour, and renames the transcript file with a `.reset.<timestamp>` suffix. The dropdown filters out `.reset.` files, so users lose visible history mid-conversation. The agent retains some context via the `session-memory` hook, but the visible chat appears to vanish. Now the runtime config in `api/src/vm/vm-provisioning.service.ts` sets `session.reset.mode = "idle"` (with no `idleMinutes`, so the staleness check is skipped entirely). The local-instance setup script (`api/src/instances/instances.service.ts`) sets the same via `jq`, and the Packer AMI build (`api/packer/scripts/install.sh`) bakes it in with a final-stage assertion so future builds fail loudly if it regresses.
AddedSettings page now has a Session Continuity card that surfaces the disabled status and explains what was happening before.
v1.14.92026-04-25
  • My Agents page now drops a card the moment a VM terminates, even when the termination came from the chat drawer instead of the dashboard buttons
Fixed`useAgentsPageData` only auto-refetched while an agent was `provisioning`, so out-of-band terminations (e.g. the chat drawer's `terminate_vm` tool) left the parent list stale: the per-VM `useVmStatus` poll picked up the new status and recoloured the badge to "Terminated", but the master list still had `status: 'running'`, so `visibleInstances` kept the card on screen until the user manually refreshed. Added a small effect inside the agent card that invalidates `agents-page-data` whenever the live `vmStatus.status` diverges from the cached `instance.status` and either side is `terminated`. Cheap (only triggers on a real divergence) and avoids broad polling.
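The divergence check can be expressed as a small predicate. This is a sketch under assumptions: the function name is hypothetical, and the real effect lives inside a React component and calls the query-cache invalidation rather than returning a boolean.

```typescript
// Returns true when the list cache should be invalidated: the live status and the
// cached status disagree, and at least one of them is "terminated".
function shouldInvalidateAgentsList(liveStatus: string | undefined, cachedStatus: string): boolean {
  if (liveStatus === undefined) return false; // the per-VM poll has not reported yet
  if (liveStatus === cachedStatus) return false; // no divergence, nothing to do
  return liveStatus === "terminated" || cachedStatus === "terminated";
}
```

Because the predicate is false whenever the two statuses agree, the invalidation fires once per real divergence and then goes quiet after the refetch.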
v1.14.82026-04-25
  • OpenClaw chat in the dashboard drawer no longer drops trailing tokens on long responses
Fixed`onChatFinal` in `ui/lib/mcp-drawer-context.tsx` had the precedence backwards: `updated[idx].content || text` kept the accumulated `delta` content and ignored the `final` event's text. The gateway can batch the last tokens of a response into the `final` event without sending a matching last `delta`, so when that happens the trailing content was being silently dropped. Switched to `text || updated[idx].content` so the canonical complete message wins, with the delta accumulator only used as a fallback for empty-payload finals.
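The precedence change is easiest to see side by side. The two helper functions below are hypothetical stand-ins for the expression inside `onChatFinal`; only the operand order is taken from the entry above.

```typescript
// Old behaviour: the accumulated delta content wins, so any tokens that only
// arrived in the final event are lost.
function resolveContentOld(accumulated: string, finalText: string): string {
  return accumulated || finalText;
}

// New behaviour: the final event's text wins; the accumulator is only a fallback
// for finals that carry an empty payload.
function resolveContentNew(accumulated: string, finalText: string): string {
  return finalText || accumulated;
}

// The gateway batched the last tokens into `final` without a matching delta.
const accumulatedSoFar = "The answer is 4";
const finalPayload = "The answer is 42.";
const before = resolveContentOld(accumulatedSoFar, finalPayload); // truncated text
const after = resolveContentNew(accumulatedSoFar, finalPayload);  // complete text
```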
v1.14.72026-04-25
  • Cold-boot post-provisioning now runs over SSH instead of SSM, cutting the post-RunInstances wait from ~75s (SSM agent registration) to ~15-25s (sshd accepting connections)
ChangedNew `api/src/vm/ssh.helper.ts` (uses `ssh2`, already in package.json) provides `waitForSshReady` and `runOverSsh`. The cold-boot path in `vm-provisioning.service.ts` now waits for port 22 to accept connections (typically 15-25s after EC2 reaches the running state) and runs the entire post-provision command block via a single `sudo bash -lc` over SSH using the per-org ed25519 key already stored in `credentialsService.getSSHPrivateKey`. SSM is no longer in the cold-start critical path. Other SSM call sites (skill installs, settings sync on running VMs, etc.) are unchanged — only the cold-boot leg switched.
ChangedFly's `sin` (Singapore) egress IP `209.71.95.222/32` was already auto-added to the SSH (port 22) security-group ingress (see `vm-provisioning.service.ts:1627-1628`), so no SG changes were needed for the API to reach VMs over SSH.
v1.14.62026-04-25
  • Fixed a long-standing bug in /messages/callback that silently 404'd every VM-to-API reply — newly relevant because the public messaging API's ?wait=true and SSE stream both depend on this path
Fixed`MessagingService.handleCallback` looked up the pending message with `where: { userId: sessionKey }`, but `sessionKey` is a Clerk ID (e.g. `user_2abc...`) while `messages.user_id` stores the internal Tropic UUID. The two never matched, so the find returned null and the function threw `NotFoundException('Message not found')` on every callback. The dashboard chat masked this because the VM also delivers replies over its own WebSocket directly to the browser; v1.14.0's public API has no such fallback, so SSE / `?wait=true` clients never received assistant replies. Now we resolve `sessionKey` → user UUID first, then look up by `userId: user.id`. Same one-liner pattern as `getMessages`.
v1.14.52026-04-24
  • New VMs start ~25-45s faster. Post-provisioning no longer restarts openclaw-gateway when env vars were already written correctly by cloud-init, and SSM ready-detection polls every 2s instead of 5s.
FixedPost-provisioning env-sync in `vm-provisioning.service.ts` used to run `systemctl restart openclaw-gateway` unconditionally after writing TROPIC_* vars. On fresh VMs cloud-init had already written the identical values, so the restart was a no-op for the config but still SIGTERMed the in-progress first gateway start, forcing a second cold-start (+~25-45s of wasted Node.js module resolution and plugin init). Now the block tracks whether any value actually changed (via `grep -qxF` exact-match skip) and only restarts on a real change.
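The real check runs in shell via `grep -qxF`; the same "restart only on a real change" logic can be sketched in TypeScript as follows. The function name and the sample values in the usage note are hypothetical.

```typescript
// Mirrors an exact whole-line match (grep -qxF): a key counts as unchanged only if
// the file already contains the identical KEY=VALUE line.
function syncEnvFile(
  current: string,
  desired: Record<string, string>,
): { content: string; changed: boolean } {
  const lines = current.split("\n").filter((line) => line.length > 0);
  let changed = false;

  for (const key of Object.keys(desired)) {
    const wanted = `${key}=${desired[key]}`;
    if (lines.indexOf(wanted) !== -1) continue; // already present verbatim: skip
    const existing = lines.findIndex((line) => line.startsWith(`${key}=`));
    if (existing === -1) lines.push(wanted);
    else lines[existing] = wanted;
    changed = true;
  }

  // When nothing changed, the original content is returned untouched.
  return { content: changed ? lines.join("\n") + "\n" : current, changed };
}
```

A caller would restart `openclaw-gateway` only when `changed` is true, which is the case the fix is meant to isolate: on a fresh VM where cloud-init already wrote identical values, `changed` stays false and the first gateway start is left alone.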
Changed`waitForSsmReady` in `vm.service.ts` polls `DescribeInstanceInformation` every 2s instead of every 5s. The AWS call itself is ~100-200ms; the 5s cadence was adding up to 5s of pure detection lag between the agent reporting Online and our code noticing.
v1.14.42026-04-24
  • Three compounding AMI bugs fixed: (1) the gateway health check now hits /health (it was hitting /, which returned 503 on 2026.4.22 because the npm tarball ships without Control UI assets, and that sent Tropic's auto-restart into a kill loop); (2) TROPIC_* env vars now land in the systemd-visible environment file, so sondera / policy-check / telemetry / tropic-memory actually activate; (3) openclaw is symlinked into plugin-runtime-deps, so the whatsapp extension stops failing with "Cannot find package 'openclaw'"
FixedGateway health check in `api/src/vm/vm.service.ts:checkGatewayViaHttp` now hits `http://<vm>:80/health` instead of `http://<vm>:80/`. Gateway on 2026.4.22 returns 503 on `/` because the published npm tarball doesn't include Control UI assets (log line: "Missing Control UI assets at /usr/lib/node_modules/openclaw/dist/control-ui/index.html"). `validateStatus: status < 500` treated that as an error, our auto-restart branch at line ~4258 then issued SIGTERM to the gateway every poll cycle, and the VM sat in a kill loop with the UI stuck on "Gateway starting" forever. `/health` is wired up independently of Control UI.
FixedProvisioning now writes `TROPIC_API_URL`, `TROPIC_TELEMETRY_TOKEN`, `TROPIC_MEMORY_URL`, `TROPIC_INSTANCE_ID`, and `S3_AUDIT_LOGS_BUCKET` into `/etc/openclaw/environment` (both in cloud-init write_files and in the post-provision sed loop), not just `~/.openclaw/.env`. The openclaw-gateway systemd unit only has `EnvironmentFile=-/etc/openclaw/environment`; anything written to `~/.openclaw/.env` is invisible to the gateway process. Previously all four Tropic plugins (sondera, policy-check, tropic-telemetry, tropic-memory) logged "Missing TROPIC_..." and went inactive on every new VM. Also now restarts openclaw-gateway at the end of the post-provision commands so the new env takes effect immediately.
FixedAdded an `ExecStartPre` to `openclaw-gateway.service` that symlinks `/usr/lib/node_modules/openclaw` into every `/home/ubuntu/.openclaw/plugin-runtime-deps/*/node_modules/openclaw`. The npm 2026.4.22 package bundles extensions (whatsapp, telegram, etc.) that `import 'openclaw'` from a runtime-deps node_modules tree whose dependency list doesn't actually include the openclaw package itself — so Node's ESM resolver walks the tree, finds nothing, and the extension fails to start. Global install is at /usr/lib/node_modules/openclaw; linking it in lets the resolver find it. Runs on every service start so recreated runtime-deps dirs get relinked.
v1.14.32026-04-24
  • Provisioning now emits a `meta` block in openclaw.json so OpenClaw 2026.4.22's config-recovery heuristic stops auto-restoring our full config to the minimal AMI default on first boot
FixedOpenClaw 2026.4.22 fingerprints the config on every load. If the baseline "last-known-good" (the AMI's post-wizard backup) has a `meta` block and the current config doesn't, it flags `missing-meta-vs-last-good`, renames our file to `.clobbered.<timestamp>`, and restores the backup. That's defined in `src/config/io.observe-recovery.ts:422-432` of the openclaw repo. Our userdata-generated config never emitted `meta`, so every new VM lost its full Tropic config (plugins, controlUi allowedOrigins, sondera config, etc.) on first gateway start, leaving the gateway bound to loopback with only the `anthropic` plugin enabled and the UI stuck on "Gateway starting". Fix: include `meta.lastTouchedVersion` + `meta.lastTouchedAt` in the config we write at provisioning time so openclaw treats us as a version-aware writer rather than external tampering.
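As a sketch of the fix, the config writer can stamp a `meta` block onto whatever it is about to serialise. The field names `lastTouchedVersion` and `lastTouchedAt` come from the entry above; the helper name, the generic signature, and the ISO-8601 timestamp format are assumptions.

```typescript
interface ConfigMeta {
  lastTouchedVersion: string;
  lastTouchedAt: string; // assumed to be an ISO-8601 timestamp
}

// Returns a copy of the config with a `meta` block added, leaving the input untouched.
function withMeta<T extends object>(config: T, version: string, now: Date): T & { meta: ConfigMeta } {
  return {
    ...config,
    meta: { lastTouchedVersion: version, lastTouchedAt: now.toISOString() },
  };
}

// Illustrative config fragment; the real openclaw.json carries far more keys.
const provisioned = withMeta(
  { plugins: { allow: ["sondera", "policy-check"] } },
  "2026.4.22",
  new Date("2026-04-24T00:00:00Z"),
);
```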
v1.14.22026-04-24
  • AMI and local-instance installers now pin OpenClaw to a real release (2026.4.22). Prior behaviour silently pulled npm `latest`, which is a main-branch build that rewrote Tropic-written configs on first boot.
FixedWhen `OPENCLAW_VERSION` was unset, `api/packer/scripts/install.sh` fell through to `npm install -g openclaw`, which resolves to whatever OpenClaw publishes on the `latest` dist-tag. That tag has been carrying a main-branch development build (it reports its version as the *next* unreleased number, e.g. "2026.4.24"). The main-branch binary clobbers the full `openclaw.json` written by Tropic's userdata and replaces it with a minimal onboarding config, dropping all four Tropic plugins (sondera, policy-check, tropic-memory, tropic-telemetry). The gateway then comes up with only the built-in plugin set and the UI sits on "Gateway starting" forever. Fix: `openclaw-ubuntu.pkr.hcl` now defaults `openclaw_version` to `2026.4.22` if the env var is empty, and `install.sh` hard-fails instead of falling through to an unpinned install.
FixedLocal-instance setup script in `api/src/instances/instances.service.ts` had the same unpinned `npm install -g openclaw` call on both npm and brew branches. Both now pin to `2026.4.22`.
v1.14.12026-04-24
  • AMI selection now compares versions, not tag strings — new VMs pick the latest AMI instead of getting stuck on an older one
FixedPacker's `OpenClawVersion` tag format drifted ("OpenClaw 2026.4.21 (f788c88)" → "openclaw 2026.4.22"). The AMI sort in `vm-provisioning.service.ts` was splitting the raw string on "." and comparing prefixes, so `"OpenClaw".localeCompare("openclaw") = 1` pushed the old-cased AMI to the top regardless of its version number. New VMs were therefore launching on the Apr 22 AMI and hitting the `tropic-memory-v2` plugin-dir name that predates the v1.12.0 rename. Extracted the semver portion with a regex before comparing, so the comparator is format-agnostic.
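A format-agnostic comparator along these lines might look like the sketch below. The function names and the exact regex are assumptions; the two tag strings are the ones quoted in the entry above.

```typescript
// Pull the first "N.N.N" run out of the tag, ignoring casing, prefixes, and suffixes.
// Tags with no recognisable version sort last as 0.0.0.
function extractVersion(tag: string): number[] {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(tag);
  return match ? [Number(match[1]), Number(match[2]), Number(match[3])] : [0, 0, 0];
}

// Sort comparator that puts the highest version first, comparing parts numerically.
function compareNewestFirst(a: string, b: string): number {
  const va = extractVersion(a);
  const vb = extractVersion(b);
  for (let i = 0; i < 3; i++) {
    if (va[i] !== vb[i]) return vb[i] - va[i];
  }
  return 0;
}

const tags = ["OpenClaw 2026.4.21 (f788c88)", "openclaw 2026.4.22"];
const newestTag = [...tags].sort(compareNewestFirst)[0];
```

Comparing the parts as numbers also avoids the string-ordering trap where "2026.4.9" would sort above "2026.4.10".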
v1.14.02026-04-24
  • Public messaging API: send and receive agent messages using a tropic_live_* API key
  • New Server-Sent Events stream for real-time replies without WebSockets
  • Top-nav "API" link now exposes the documentation page that was otherwise unreachable
AddedPOST /instances/:userId/:id/messages — send a message to whichever agent is active on the instance. Default is async (returns { messageId, status: "processing" }); pass ?wait=true to block up to 120s for the reply in the same response. Rate limited to 30/min per key.
AddedGET /instances/:userId/:id/messages — list messages in chronological order. Supports ?after=<messageId> for polling clients and ?limit=N (default 100, max 500). Rate limited to 120/min per key.
AddedGET /instances/:userId/:id/messages/stream — Server-Sent Events stream of new messages on the instance, sourced from API sends, dashboard sends, or VM callbacks. 25s heartbeat keeps Fly/Cloudflare from timing out idle connections. Max 5 concurrent streams per user.
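For polling clients of the list endpoint above, the request path can be assembled as in this sketch. The helper name is hypothetical, and clamping `limit` on the client is an assumption made here to stay within the documented default of 100 and maximum of 500; how the `tropic_live_*` key is attached to the request is not covered.

```typescript
interface ListOptions {
  after?: string; // messageId of the last message the client has already seen
  limit?: number; // documented default 100, max 500
}

function buildListMessagesPath(userId: string, instanceId: string, opts: ListOptions = {}): string {
  // Clamp into the documented range rather than relying on the server to reject it.
  const limit = Math.min(Math.max(opts.limit ?? 100, 1), 500);
  const query = [`limit=${limit}`];
  if (opts.after) query.push(`after=${encodeURIComponent(opts.after)}`);
  return `/instances/${encodeURIComponent(userId)}/${encodeURIComponent(instanceId)}/messages?${query.join("&")}`;
}
```

A polling loop would pass the last seen `messageId` as `after` on each call so that only newer messages come back.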
AddedMessage.instanceId column + (user_id, instance_id, created_at) index in messages table. New migration 20260424000000_message_instance_id. Existing rows stay instance_id = NULL — harmless, dashboard send path still works unchanged.
AddedTop nav on the marketing site now includes an "API" link between Marketplace and Releases, pointing at /docs/api. The documentation page existed but was previously only reachable from inline links inside /docs.
ChangedMessagingService now emits on a new in-process MessageEventBus after every user/assistant message write, so the SSE endpoint can fan out without polling the DB. Emissions happen in both the public send path and the existing /messages/callback path the VM uses to deliver assistant replies.
v1.13.12026-04-24
  • Fixed gateway crash-loop on new VM provisioning caused by `disabledRules` in the sondera plugin config being rejected by the deployed schema
Fixed`SonderaConfig.disabledRules` was in the TypeScript defaults and written to `.plugins.entries.sondera.config` at provisioning time, but the plugin manifest (`api/src/vm/sondera-plugin/openclaw.plugin.json`) had `additionalProperties: false` with no `disabledRules` entry. OpenClaw 2026.4.21 rejected the config with "must NOT have additional properties" and the gateway systemd unit restart-looped indefinitely. Fix: added `disabledRules` to the plugin's `configSchema` and introduced `stripUnknownSonderaKeys()` as a belt-and-braces workaround that drops any keys not in the current AMI's whitelist before the config is written — so new VMs from the existing AMI no longer crash, even before the AMI is rebuilt with the corrected schema.
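The stripping step amounts to a whitelist filter over the config object. The sketch below is an illustration only: the function name differs from the real `stripUnknownSonderaKeys()`, and the `enabledPacks` key and the allowed-key list are made up for the example.

```typescript
// Drop every key that the deployed plugin schema does not declare, so a manifest with
// `additionalProperties: false` accepts the result.
function stripUnknownKeys(
  config: Record<string, unknown>,
  allowedKeys: readonly string[],
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(config)) {
    if (allowedKeys.indexOf(key) !== -1) result[key] = config[key];
  }
  return result;
}

// Hypothetical whitelist for an AMI whose schema predates `disabledRules`.
const oldAmiKeys = ["enabledPacks"];
const safeConfig = stripUnknownKeys({ enabledPacks: ["base"], disabledRules: ["rule-1"] }, oldAmiKeys);
```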
FixedCall sites that write sondera config to `openclaw.json` now go through `stripUnknownSonderaKeys()`: `vm-provisioning.service.ts` (at provision) and `sondera-config.service.ts` (when the user edits their org config). Once the AMI is rebuilt and deployed, the strip can be removed.
v1.13.02026-04-24
  • New Automatic tab for Connect OpenAI: run `npx tropic-oauth <token>` and the browser flow hands itself back — no more copying the 404 URL
  • Manual tab (still available for users without Node) now aggressively warns that the 404 page is expected, before users ever see it
AddedPublished `tropic-oauth` on npm (v1.0.0). Zero-deps, ~80 lines of Node — binds `127.0.0.1:1455`, opens the OpenAI sign-in page, captures the OAuth callback, posts the code back to the Tropic API, and exits. Users never see the redirect URL.
AddedNew public API endpoints `POST /vm/openai-oauth/cli-init` and `POST /vm/openai-oauth/cli-complete` drive the CLI flow. Authorization is the opaque one-time `cliToken` returned by `/start` (base64url of userId + instanceId + state) — no Clerk session needed in the terminal. Plus `GET /vm/:userId/:instanceId/openai-oauth/status` so the modal can poll for completion.
Added`metadata.openaiOAuthStatus` on ManagedInstance records pending/completed/failed (with email or error). The modal polls every 2s and renders the result without the user doing anything.
ChangedConnect OpenAI modal rewritten with an Automatic / Manual tab split. Automatic is the default and recommended path. Each tab links across to the other if the user gets stuck.
ChangedManual tab now shows a prominent amber callout before the user even starts: "You WILL see a broken page. This is expected." Plus a second callout once the flow is in progress reminding them to copy the entire URL even though the page looks dead.
v1.12.42026-04-24
  • Stripped the last remaining BYOK token prices from dashboard + settings — Tropic charges compute, never tokens
RemovedDashboard "Set Up Your LLM API Key" provider dropdown no longer shows `$X in / $Y out per 1M tokens` subtitles. Each provider row now shows only its env var name (e.g. `OPENROUTER_API_KEY`). Rationale: BYOK means the user pays their provider directly — rendering that price next to our credit system implies Tropic charges for tokens. It doesn't.
RemovedSettings → Model dropdown no longer shows per-model token pricing subtitles or the pricing summary below the dropdown. Same reason. `formatModelPricing` is no longer imported in `settings/page.tsx` or `dashboard/page.tsx`.
v1.12.32026-04-24
  • Launch modal: one card per provider with a dropdown inside when there are multiple models
  • Dropped BYOK token prices from launch cards — those were the user's own provider bill, not Tropic's
ChangedVM launch modal model picker now groups by provider. Each configured provider renders as a single card (e.g. "OpenRouter", "Anthropic"). If the provider has 2+ models in `MODELS`, a compact `<select>` inside the card lets the user pick the specific model. Single-model providers show no dropdown. Fixes the long scrolling list that previously exceeded the viewport when a user had OpenRouter configured (5 routes) or Anthropic (3 tiers).
RemovedToken pricing lines like "$3 in / $15 out per 1M tokens" no longer render on launch modal cards. Tropic is BYOK — those prices are the user's own provider bill, never what Tropic charges. Only the Tropic compute rate (`cr/hr`) is shown. `formatModelPricing` remains available for places where BYOK pricing is contextually appropriate (e.g. Secrets).
ChangedReverted the v1.12.2 `MODELS` pruning. The 5 OpenRouter routes (Auto, Claude Sonnet 4.5, Gemini Pro 1.5, DeepSeek Chat, Kimi K2.5) are back. The right fix was UI grouping, not data removal — users still need to be able to pick between those routes via the in-card dropdown.
v1.12.22026-04-24
  • OpenRouter collapses to a single entry in the launch modal
ChangedRemoved redundant per-route OpenRouter entries (`openrouter/anthropic/claude-sonnet-4.5`, `openrouter/google/gemini-pro-1.5`, `openrouter/deepseek/deepseek-chat`, `openrouter/moonshotai/kimi-k2.5`) from `ui/lib/model-providers.ts`. OpenRouter is BYOK — enumerating its routing destinations implied Tropic had separate integrations with Anthropic/Google/DeepSeek/Moonshot, which it doesn't. Kept a single `openrouter/openrouter/auto` entry labelled "OpenRouter (auto)" so users get one option per provider key.
v1.12.12026-04-24
  • VM launch modal now only offers models the user has actually configured
FixedThe "Choose a model" modal on the Agents page was always showing GPT-5.4 (ChatGPT OAuth) and GLM-4.7 (Ollama) regardless of user setup. Now: OAuth appears only if the user has connected ChatGPT OAuth (`OPENAI_OAUTH_CONNECTED` secret present), the hardcoded GLM-4.7 entry is gone (users couldn't configure it anyway), and an empty state links to Secrets when no LLM is set up.
FixedDefault model selection no longer falls back to `openai-codex/gpt-5.4` when the user has no configured keys — it leaves the selection empty so the modal's empty state renders instead.
v1.12.02026-04-23
  • Per-launch toggle for Tropic RAG memory — opt in on the VM launch modal, or keep memory-core for parity with existing instances
  • Plugin renamed from tropic-memory-v2 to just tropic-memory
AddedNew "Tropic RAG memory (experimental)" checkbox on the VM launch modal. Unchecked by default — memory-core stays as the memory provider, matching pre-v1.11 behaviour. Checked → provisioning flips `plugins.slots.memory` to `tropic-memory`, removes `memory-core` from `plugins.allow`, and installs the slot-owner plugin. Choice is stored per-instance in `managed_instances.use_tropic_memory`.
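The config effect of the checkbox can be sketched as a pure transformation. The `plugins.slots.memory` and `plugins.allow` paths come from the entry above; the helper name, the `PluginsConfig` shape, and the sample values are assumptions, and installing the slot-owner plugin is not shown.

```typescript
interface PluginsConfig {
  allow: string[];
  slots: Record<string, string>;
}

// Unchecked: the config is returned as-is. Checked: the memory slot is pointed at
// tropic-memory and memory-core is dropped from the allow list.
function applyMemoryChoice(plugins: PluginsConfig, useTropicMemory: boolean): PluginsConfig {
  if (!useTropicMemory) return plugins;
  return {
    allow: plugins.allow.filter((name) => name !== "memory-core"),
    slots: { ...plugins.slots, memory: "tropic-memory" },
  };
}

// Illustrative starting config.
const basePlugins: PluginsConfig = { allow: ["sondera", "memory-core"], slots: { memory: "memory-core" } };
const optedIn = applyMemoryChoice(basePlugins, true);
const optedOut = applyMemoryChoice(basePlugins, false);
```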
ChangedPlugin renamed `tropic-memory-v2` → `tropic-memory` across plugin manifest, packer install.sh, provisioning service, and the plugin file itself. Checked for conflicts: no openclaw bundled plugin uses that name (memory-core, memory-lancedb, memory-wiki, active-memory are the occupants of the `memory` namespace). Legacy `rm -rf .../tropic-memory-v2` cleanup stays in place for defensive removal on t3.micro.
Changedv1.11.0 made Tropic memory the hard default at provisioning; v1.12.0 reverts that default to memory-core and makes Tropic memory opt-in, since it's still experimental.
v1.11.02026-04-23
  • Agents now retrieve cross-session memory via memory_search — the Tropic memory service replaces memory-core's local SQLite as the authoritative memory backend
Changedtropic-memory-v2 is now the openclaw memory slot owner (`plugins.slots.memory = "tropic-memory-v2"`), displacing memory-core entirely. The local-file + SQLite design didn't survive EC2 churn — memory vanished every time a VM was rebuilt, and copying state between instances via EBS was unreliable. All memory now lives in the central pgvector-backed Tropic memory service, shared across VMs and sessions.
AddedPlugin registers `memory_search` and `memory_get` tools backed by the memory service's `/recall` endpoint. Agents are prompted to use memory_search before answering anything about prior work, preferences, names, decisions, or ongoing projects. Verified end-to-end on a live VM: the agent correctly recalled user-specific email-filtering rules from facts ingested in a prior session.
Removedmemory-core removed from the default `plugins.allow` list during VM provisioning. New VMs come up with the Tropic memory slot as the only memory provider. Existing running VMs need an AMI rebuild + replacement to pick up the new slot wiring.
v1.10.02026-04-23
  • Books v2: entity-scoped reconciliation loop end-to-end — bank statement upload creates BankStatement + BankEntry rows, auto-match finds candidates, confirm/unmatch flow persists
  • Books landing demo on books.tropic.bot — click the invoice and statement tiles to watch live AI extraction fly in, then auto-reconcile highlights the two partial payments that add up to the invoice total
AddedBooks reconciliation engine wired to real bank statement data. `extraction.processor` now persists `BankStatement` + `BankEntry` rows when a statement PDF is uploaded, and `EntityReconciliationService` aggregates matches, stats, pending count, and bank-entry/transaction rows into a single entity-scoped DTO consumed by the new `/books/reconcile` view.
AddedLanding demo at `books.tropic.bot/` — two sample document tiles (invoice + bank statement) trigger live Anthropic extraction on click, cached per sample after the first run. A `POST /demo/reconcile` endpoint runs exact + fuzzy matching and a small subset-sum search for partial payments, so the demo resolves to "Invoice $284 = 2 partial payments" automatically.
AddedSynthetic sample PDFs (`invoice.pdf`, `statement.pdf`) generated by `scripts/generate-demo-samples.ts`, committed to both `books-api/assets/demo/` and `books-ui/public/demo/` so the landing page can preview them and the API can feed them to Claude.
FixedBooks Dockerfile now copies the `assets/` directory into the runtime image. Prior builds shipped without the demo PDFs, so `/demo/extract` failed at runtime with ENOENT on `/app/assets/demo/invoice.pdf`.
FixedDemo reconcile normalises signed bank entry amounts before matching — the AI extracts debits as negative numbers but the invoice amount is positive, so the subset-sum was silently returning no matches on the landing page.
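A minimal sketch of the normalise-then-search idea: amounts are converted to absolute integer cents before a bounded subset-sum search looks for entries that add up to the invoice total. The function name, the `maxParts` bound, and the sample amounts are assumptions; only the $284 invoice total comes from the demo description.

```typescript
// Returns the indices of bank entries whose absolute amounts sum to the invoice total,
// or null when no combination of at most `maxParts` entries matches.
function findPartialPayments(invoiceTotal: number, entryAmounts: number[], maxParts = 3): number[] | null {
  // Normalise: debits arrive as negative numbers, and floats are compared as integer cents.
  const cents = entryAmounts.map((amount) => Math.round(Math.abs(amount) * 100));
  const target = Math.round(Math.abs(invoiceTotal) * 100);

  const search = (start: number, remaining: number, picked: number[]): number[] | null => {
    if (remaining === 0 && picked.length > 0) return picked;
    if (picked.length === maxParts) return null;
    for (let i = start; i < cents.length; i++) {
      if (cents[i] > remaining) continue; // this entry alone would overshoot
      const found = search(i + 1, remaining - cents[i], [...picked, i]);
      if (found) return found;
    }
    return null;
  };

  return search(0, target, []);
}

// Made-up statement entries: two debits add up to the $284 invoice.
const matchedIndices = findPartialPayments(284, [-120.5, -50, -163.5, 20]);
```

Without the `Math.abs` step, the negative debits could never sum to a positive invoice total, which is the silent no-match described above.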
Fixed`EntityReconciliationService.getForEntity` no longer drops legitimate `aiConfidence = 0` scores as `null`.
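The entry above does not say how the zero was being lost. One common cause, assumed here purely for illustration, is a falsy check (`||`) where a nullish check (`??`) is needed; both helper names are hypothetical.

```typescript
// Falsy check: 0 is falsy, so a legitimate zero confidence collapses to null.
function confidenceWithFalsyCheck(value: number | null | undefined): number | null {
  return value || null;
}

// Nullish check: only null and undefined become null; 0 survives.
function confidenceWithNullishCheck(value: number | null | undefined): number | null {
  return value ?? null;
}
```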
v1.9.12026-04-22
  • Internal cleanup: shared OpenClaw shell helpers + safer jq interpolation. No user-visible changes.
ChangedExtracted `OcShellHelper` so `agents`, `vm`, `secrets`, and `sondera-config` services share one implementation of the OC_USER/OC_DIR/OC_HOME shell preamble and the platform-aware gateway-restart command. Three near-identical private copies removed; 16 call sites consolidated.
FixedHardened every dynamic `jq --argjson` shell-interpolation site with a new `jqArgEscape` helper. A JSON value containing a literal single quote could previously close the bash quoting and inject as shell tokens. Not exploitable today (only trusted internal callers), but the foot-gun is closed.
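The implementation of `jqArgEscape` is not shown in these notes. A standard way to make a value safe inside bash single quotes, sketched here under a hypothetical name, is to close the quote, emit an escaped quote, and reopen it.

```typescript
// Wraps a JSON string in single quotes for use as a shell argument.
// Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote),
// so the value can never terminate the quoting early.
function quoteForJqArg(json: string): string {
  return "'" + json.replace(/'/g, "'\\''") + "'";
}

// A value containing a literal single quote, which previously could close the quoting.
const quoted = quoteForJqArg(JSON.stringify({ note: "it's fine" }));
```

The result is intended to be interpolated after `--argjson name` in the generated command line, where bash reassembles it into the original JSON text.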
AddedPlaywright smoke test for the `/policies` route — confirms the prod page redirects to sign-in cleanly for anon visitors and the org-scoped `/sondera-config` API endpoint returns 401 unauthenticated.
v1.9.02026-04-22
  • Sondera security packs are now configured at the workspace level — toggles on the Policies page apply to every new VM you provision and push to all running VMs
  • Policies page renders for brand-new users with no instances yet (previously showed two empty cards)
AddedNew `sondera_config` column on `organizations` plus `GET /sondera-config` and `PATCH /sondera-config` endpoints. PATCH is admin-gated; read is open to all members. Toggling a pack persists to the org and fire-and-forgets an SSM push to every running VM in the workspace so live instances stay in sync without a manual restart.
AddedVM provisioning reads the workspace's stored Sondera config and bakes it into `openclaw.json` at create time, so new VMs come up with the right packs already enabled. Workspaces that have not configured anything still get the prior defaults (Base on, System/OWASP off).
ChangedThe /policies page now renders the security packs unconditionally — the "no running instances" empty state is gone. The agent policies section shows a "No agents deployed. Go to Agents to deploy one." banner when the workspace has no agents yet.
RemovedPer-instance Sondera config endpoints `PATCH /agents/sondera/config` and `GET /agents/sondera/config/:instanceId`. Replaced by the workspace-scoped endpoints above.
v1.8.52026-04-21
  • Outlook Mail skill stops returning "Id is malformed" when operating on messages, folders, or rules whose IDs contain `/` or `+`
FixedThe `outlook-mail` wrapper was interpolating Microsoft Graph message, folder, and rule IDs directly into URL paths. Graph IDs are URL-safe-ish base64 that routinely contain `/`, `+`, and `=`; an unencoded `/` became a new path segment and Graph rejected the request with `ErrorInvalidIdMalformed`. The wrapper now URL-encodes every ID before building the request URL, so `+read`, `+move`, `+delete`, `+mark-read`, `+mark-unread`, `+rule-delete`, and `+list --folder <id>` all work regardless of which characters the ID happens to contain.
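The encoding fix can be illustrated with `encodeURIComponent`, which percent-encodes `/`, `+`, and `=`. The helper name and the sample ID are made up; the `/me/messages/{id}` path is the standard Microsoft Graph v1.0 message resource, though the wrapper's actual URL construction may differ.

```typescript
const GRAPH_BASE = "https://graph.microsoft.com/v1.0";

// Encode the ID so that "/" stays inside a single path segment instead of
// being read as a segment separator.
function messageUrl(messageId: string): string {
  return `${GRAPH_BASE}/me/messages/${encodeURIComponent(messageId)}`;
}

// Made-up ID containing all three problematic characters.
const sampleUrl = messageUrl("AAMk/abc+def=");
```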
v1.8.42026-04-21
  • Outlook Mail skill gains move, delete, mark read/unread, and inbox-rule commands via Microsoft Graph
  • Skills install drawer now lets you connect another Outlook or Google account when one is already linked
Added`outlook-mail` CLI commands: `+move <messageId> --to <folderId>`, `+delete`, `+mark-read`, `+mark-unread`, `+rules`, `+rule-create` (with `--name`, `--from`, `--subject-contains`, `--move-to`, `--mark-read`, `--delete`, `--sequence`), and `+rule-delete`.
ChangedOutlook OAuth now requests `Mail.ReadWrite` and `MailboxSettings.ReadWrite` in addition to `Mail.Read` and `Mail.Send`.
AddedSkills install drawer dropdowns for Outlook and Google accounts now include a "+ Connect another account" option, which re-runs the OAuth consent flow. Previously the dropdown only showed existing accounts, with no way to add another.
v1.8.32026-04-20
  • WhatsApp and Telegram pairing work again for VMs provisioned in teams or workspaces whose slug differs from the owner's user ID
FixedWhen the org-scoping rollout on April 17 moved VM subdomain generation from the owner's Clerk ID to the organization slug, the Tropic API's gateway RPC client kept building its `Origin` header from the Clerk ID. The gateway's `allowedOrigins` list is built from the instance's actual subdomain, so any VM whose subdomain differed from `user_<clerkId>` rejected every connect frame with `CONTROL_UI_ORIGIN_NOT_ALLOWED` (close code 1008). Symptom: WhatsApp QR pairing, Telegram status polls, Slack activation, and config read/patch calls all failed with "origin not allowed" on newly provisioned VMs. The API now derives its `Origin` header from `instance.subdomain`, matching what the VM's allowlist was built from. No VM-side change is required — existing broken VMs start working as soon as the API deploys.
v1.8.22026-04-20
  • Cross-session memory actually records again — chats from the UI and CLI now land in the RAG memory service
FixedThe memory-v2 OpenClaw plugin depended on the `message_received` hook to buffer user turns, but that hook only fires for channel-delivered messages (Slack/WhatsApp/etc.) — not for gateway or CLI chats, which only fire `before_prompt_build`, `llm_output`, and `agent_end`. On top of that, `agent_end` and `llm_output` were resolving different session keys (long `agent:<slug>:<channel>:<id>` vs short `<id>`), so even the assistant messages that did get buffered were looked up under a key that never matched. Combined effect: in ~5 days of production chats on a running VM, not a single /ingest call was attempted. Plugin rewritten to source messages from `agent_end` `ev.messages` (the full `AgentMessage[]` for the turn) keyed off `ctx.sessionId`. Works uniformly for gateway, CLI, and channel-delivered turns.
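The session-key mismatch is easy to reproduce in isolation. The sketch below is illustrative only: the buffer, the `HookContext` shape, and the example keys are simplified assumptions, not the plugin's actual code.

```typescript
// Sketch only: hook payloads and key formats are simplified from the entry above.
const turnBuffer = new Map<string, string[]>();

function bufferMessage(sessionKey: string, text: string): void {
  const existing = turnBuffer.get(sessionKey) ?? [];
  turnBuffer.set(sessionKey, existing.concat([text]));
}

// Before: one hook wrote under the long key while the other looked up the short one.
bufferMessage("agent:support:slack:abc123", "assistant reply");
console.log(turnBuffer.get("abc123")); // undefined, so the lookup never matched

// After: every hook derives its key from the same field (ctx.sessionId in the plugin).
interface HookContext {
  sessionId: string;
}

function keyFor(ctx: HookContext): string {
  return ctx.sessionId;
}

const sharedCtx: HookContext = { sessionId: "abc123" };
bufferMessage(keyFor(sharedCtx), "user turn");
bufferMessage(keyFor(sharedCtx), "assistant reply");
console.log(turnBuffer.get(keyFor(sharedCtx))); // [ 'user turn', 'assistant reply' ]
```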
v1.8.12026-04-20
  • Agent-driven clawhub skill calls against the Tropic registry stop failing with "skill.stats: invalid value"
FixedThe Tropic marketplace `/v1/skills` and `/v1/skills/:slug` responses were missing a `stats` field that newer clawhub CLI builds validate as a required key. ArkType rejected every response with `skill.stats: invalid value`, so any agent that invoked `clawhub search`/`clawhub install` on a VM surfaced the error back to chat. Both endpoints now return `stats: {}`, which satisfies the schema.
v1.8.02026-04-19
  • Outlook Mail skill with OAuth — connect a Microsoft 365 / personal Outlook account and give agents read/send access to email via Microsoft Graph
AddedNew marketplace skill "Outlook Mail" (Tropic-authored, free). Installs an `outlook-mail` CLI on the VM with commands for +list, +read, +send, +folders, +me, plus raw graph access. Token refresh is automatic — no access tokens are ever stored on disk.
Added`/outlook-connections` API (authorize, callback, list, delete) using Microsoft identity platform `/common` endpoint. Scopes: User.Read, Mail.Read, Mail.Send, offline_access. Refresh tokens are encrypted at rest with the same AES-256-GCM scheme used for Google connections.
AddedSkills install drawer now prompts to "Connect Outlook Account" when a skill declares `MICROSOFT_REFRESH_TOKEN` in its env_vars, mirroring the Google Sheets OAuth UX.
v1.7.72026-04-18
  • Agent card no longer falsely prompts "No model configured" for Ollama or other custom providers
FixedThe "No model configured — Set an API key" banner used to trigger for any provider not in the hardcoded PROVIDERS list (e.g. ollama/devstral:24b, or any other custom-typed provider). The logic now trusts unknown providers and hides the banner for them; it shows only for known providers whose required env var is missing.
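One way to express the new banner rule, as a rough sketch. The provider table, the env var names, and the function name are illustrative assumptions; the real PROVIDERS list is not reproduced in this changelog.

```typescript
// Hypothetical table of known providers and the env var each one requires.
const KNOWN_PROVIDER_ENV = new Map<string, string>([
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
]);

function shouldShowNoModelBanner(
  model: string,
  env: Record<string, string | undefined>,
): boolean {
  // The provider is the first segment of "provider/model-id".
  const provider = model.split("/")[0] ?? "";
  const requiredVar = KNOWN_PROVIDER_ENV.get(provider);
  if (requiredVar === undefined) {
    return false; // unknown or custom provider: trust it and hide the banner
  }
  return !env[requiredVar]; // known provider: show only when its key is missing
}

console.log(shouldShowNoModelBanner("ollama/devstral:24b", {}));   // false
console.log(shouldShowNoModelBanner("anthropic/some-model", {}));  // true
```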
v1.7.62026-04-18
  • Save Model now actually confirms the VM applied the new default — no more silent fire-and-forget
FixedThe /settings/:userId/openclaw/model endpoint now polls SSM until the command terminates and then reads `.agents.defaults.model.primary` out of openclaw.json on the VM to verify the write took. If the running VM(s) fail to apply the model, the API returns 500 (or `partialFailures` when some succeed) instead of returning success while the VM silently kept the old model.
FixedThe openclaw CLI is now invoked as the openclaw user (ubuntu on EC2, resolved from ocHome on local) so state lands in the correct `.openclaw` directory, not root's home.
ChangedModel string is shell-escaped before interpolation. The v1.7.5 regex loosening allowed dots, colons, and extra slashes, which would otherwise have been a command-injection foot-gun inside the SSM shell command.
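A minimal sketch of POSIX-style shell quoting for the model string. The `shellQuote` helper and the exact command line are assumptions for illustration; the changelog does not show the command the API actually sends over SSM.

```typescript
// Wraps a value in single quotes and escapes any embedded single quote as '\''.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical command assembly; the config key path is taken from the v1.7.6 entry above.
function buildSetModelCommand(model: string): string {
  return `openclaw config set agents.defaults.model.primary ${shellQuote(model)}`;
}

console.log(buildSetModelCommand("openrouter/arcee-ai/trinity-large-preview:free"));
console.log(shellQuote("x; rm -rf /")); // 'x; rm -rf /'
```

Inside single quotes the shell treats `;`, `$`, and backticks as literal characters, so a hostile model string cannot start a second command.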
v1.7.52026-04-18
  • Custom model input in Settings now actually applies — any provider/model-id is accepted
FixedThe "Custom..." option in the model dropdown visually reflects the typed value after Apply. Previously the state updated but the trigger kept showing the placeholder, making it look like nothing was selected.
FixedBackend /settings/:userId/openclaw/model no longer rejects multi-segment model ids. Strings like openrouter/arcee-ai/trinity-large-preview:free and vercel-ai-gateway/anthropic/claude-opus-4.6 now save. The validation is aligned with the per-instance model endpoint.
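For illustration, a validation pattern that accepts the multi-segment IDs mentioned above might look like the following. The regular expression is an assumption, not the backend's actual rule: it allows letters, digits, dots, colons, underscores, and hyphens in each segment, and requires at least a provider and a model segment.

```typescript
// Assumed pattern: two or more "/"-separated segments of safe characters.
const MODEL_ID_PATTERN = /^[A-Za-z0-9._:-]+(?:\/[A-Za-z0-9._:-]+)+$/;

function isValidModelId(modelId: string): boolean {
  return MODEL_ID_PATTERN.test(modelId);
}

console.log(isValidModelId("openrouter/arcee-ai/trinity-large-preview:free"));  // true
console.log(isValidModelId("vercel-ai-gateway/anthropic/claude-opus-4.6"));     // true
console.log(isValidModelId("provider/$(whoami)"));                              // false
```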
v1.7.42026-04-18
  • BYOK provider keys (OpenRouter, OpenAI, Gemini, XAI, Groq, Mistral, Together, Moonshot, Z.ai) now actually reach the running gateway
FixedProvider API keys saved in Settings are now written to the agent's auth-profiles.json (workspace + each agents/<slug>/agent), not just ~/.openclaw/.env. The gateway's systemd unit only loads /etc/openclaw/environment — a provision-time snapshot — so keys added later via ~/.openclaw/.env never reached the running process and OpenClaw's model calls failed with "No API key found for provider". Applies on secret save, secret delete, and fresh-instance provisioning.
FixedDeploying a new agent now inherits every provider profile from workspace/auth-profiles.json, not only openai-codex OAuth. Previously a newly deployed agent could end up with no credentials for a user's OpenRouter / OpenAI / other BYOK key until re-saved.
v1.7.32026-04-17
  • Workspace members can now see VM statuses for VMs launched by other members of the workspace
FixedVM status, gateway status, and all per-instance endpoints now authorise against the workspace that owns the instance instead of the calling user. Members viewing a workspace VM launched by an Owner no longer see "Instance not found" from the 40+ VM service methods (status, start, stop, workspace files, agents, skills, etc.).
FixedAfter starting a machine, the gateway status badge now updates immediately instead of showing "Gateway starting" for several seconds while the SSM round-trip completes.
v1.7.22026-04-17
  • Removed the orphaned High Availability tier and fixed test regressions from the org-migration cleanup
RemovedDropped the High Availability ($35/mo) subscription tier from the credits page. There was no matching live Stripe product behind it, so the button would have failed on click. Day and Standard are the active subscription tiers.
RemovedBackend: removed always_on plan key from CreditsService price maps, credit allowance table, and plan detection. If you want a High Availability tier back, create the Stripe product and add the env vars.
FixedUpdated service tests (metrics, vm, auth, agent-tasks, cost-tracking) with the post-migration mocks. Tests that referenced the dropped User.creditBalance / User.useSpotInstances columns now use OrgMembership + Organization. The one obsolete test that asserted credit deduction in the VM-metrics cron is gone, replaced with a test that verifies the cron only stops VMs at zero balance.
v1.7.12026-04-17
  • Cleanup pass to finish the multi-user migration — legacy User billing columns dropped, all reads/writes unified on Organization
  • Metrics cron no longer double-writes credit balances
RemovedDropped from users table: plan, planInterval, stripeCustomerId, stripeSubscriptionId, planStartedAt, creditBalance, region, ec2InstanceType, useSpotInstances, openclawVersion. All of these live on Organization now. planTimezone stays on User (it's a personal preference).
RemovedRemoved the credit-deduction logic from the VM metrics cron. That 5-min job was silently writing to User.creditBalance in parallel with the org-scoped CreditsService, causing drift. The cron now only enforces the "zero balance stops VMs" safeguard.
Changedauth.service: user sign-up now auto-creates a personal Organization + owner membership and grants the signup bonus to the org credit pool (not the user).
Changedadmin.service.getAllInstances: lists EC2 instances grouped by Organization (not User) so the shared credit balance shows correctly in the admin view.
Changedactivity-monitor day-plan sleep enforcement: queries Organization.plan = "day" with the owner's personal timezone from User.planTimezone.
Changedvm-provisioning: the OpenClaw version preference now reads from Organization.openclawVersion (was User.openclawVersion).
v1.7.02026-04-17
  • Per-seat pricing — each additional teammate is 75% of the base subscription price and adds 75% of the credit allowance to the shared pool
  • Promo codes work at checkout for both subscriptions and one-off credit packs
ChangedSubscription checkout uses two Stripe line items: one base seat at full price plus extra seats at 75% of base. Total monthly cost on each plan card now reflects your seat count (e.g. Standard at 3 seats = $18 + 2 × $13.50 = $45/mo).
ChangedCredit allowance scales with seats. Standard at 3 seats = 1,460 + 2 × 1,095 = 3,650 credits/month. Numbers shown live on the credits page.
AddedStripe Promotion Codes enabled on both the subscription checkout and one-time credit pack checkout (allow_promotion_codes: true). Codes are managed in the Stripe dashboard.
AddedsyncSeatCount: when an invitation is accepted or a member is removed, we push the new seat count to Stripe and grant prorated credits for the remainder of the current billing period.
ChangedCreditsService fully migrated to org-scoped: balance, transactions, subscription create, webhooks (checkout.session.completed, invoice.paid, customer.subscription.updated/deleted), grant, and the annual drip cron all read and write Organization.creditBalance instead of User.creditBalance.
AddedWebhook subscription updates derive seat count from line item quantities so seat changes made in Stripe (manual edits, dunning, etc.) flow back into the org.
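The per-seat arithmetic from the checkout and credit-allowance entries above can be checked with a short sketch. The function and constant names are illustrative; the 75% factor and the Standard figures ($18 base, 1,460 credits) come from those entries.

```typescript
// Each extra seat costs (and grants) 75% of the base seat.
const EXTRA_SEAT_FACTOR = 0.75;

// Works for both the monthly price and the credit allowance; seats must be >= 1.
function scaleBySeats(baseAmount: number, seats: number): number {
  if (!Number.isInteger(seats) || seats < 1) {
    throw new RangeError("seats must be a positive integer");
  }
  return baseAmount + (seats - 1) * baseAmount * EXTRA_SEAT_FACTOR;
}

console.log(scaleBySeats(18, 3));   // 45   -> $18 + 2 x $13.50
console.log(scaleBySeats(1460, 3)); // 3650 -> 1,460 + 2 x 1,095
```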
v1.6.02026-04-17
  • Multi-user organizations are live — invite teammates by email, give them admin or member access
  • Roles enforced everywhere: owner controls billing, admins manage infra, members can use what is provisioned
AddedNew /settings/members page with invite-by-email, role management, and remove flows. Pending invitations table shows what is outstanding.
AddedTokenized email invitations via Postmark. The invite link goes to /accept-invite?token=..., which prompts sign-in if needed and joins the workspace on confirm. 7-day expiry, single-use, scoped to the email it was sent to.
AddedOrg switcher in the sidebar (only appears when you belong to more than one workspace). Switching reloads all queries scoped to the new org.
AddedCapability-aware UI: useOrg hook exposes can.manageBilling, can.manageMembers, can.launchInstances, can.installPaidSkills based on your role in the active org. Members visiting /credits or /settings/members now see a clear "ask your owner/admin" message instead of a 403.
AddedSidebar: new Members link.
ChangedMembers can install free skills (installCost = 0 AND hourlyCost = 0) but get a clear 403 with explanation on paid ones.
ChangedOwner-only endpoints: subscribe, top-up credits. Admin-or-Owner endpoints: launch/delete instance, delete agent, create/revoke API key, save/delete Claude key, rotate SSH key.
ChangedAPI client now sends X-Tropic-Active-Org header on every request, set from a localStorage value persisted by the org switcher. Backend ClerkAuthGuard reads it and verifies the user is a member of that org. Falls back to the user's personal org when unset, so solo users see no behavior change.
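The three invitation constraints listed above (7-day expiry, single-use, scoped to the invited email) can be sketched as a single check. The `Invitation` shape and the case-insensitive email comparison are assumptions; the real model fields are not shown in this changelog.

```typescript
// Hypothetical invitation record; field names are illustrative.
interface Invitation {
  email: string;               // address the invite was sent to
  sentAtMs: number;            // send time, epoch milliseconds
  acceptedAtMs: number | null; // null until the invite is used
}

const INVITE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day expiry

function canAcceptInvite(invite: Invitation, signedInEmail: string, nowMs: number): boolean {
  const notExpired = nowMs - invite.sentAtMs <= INVITE_TTL_MS;
  const unused = invite.acceptedAtMs === null; // single-use
  // Assumption: emails are compared case-insensitively.
  const sameEmail = invite.email.toLowerCase() === signedInEmail.toLowerCase();
  return notExpired && unused && sameEmail;
}

const sampleInvite: Invitation = { email: "dev@example.com", sentAtMs: 0, acceptedAtMs: null };
const oneDayMs = 24 * 60 * 60 * 1000;
console.log(canAcceptInvite(sampleInvite, "Dev@Example.com", 3 * oneDayMs)); // true
console.log(canAcceptInvite(sampleInvite, "dev@example.com", 8 * oneDayMs)); // false (expired)
```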
v1.5.12026-04-17
  • Backend services are now organization-scoped — every query reads and writes by organizationId instead of userId
  • No user-facing behavior change for solo accounts; this is the second-to-last piece before multi-user goes live
ChangedCredentialsService: all methods resolve clerkId → personal orgId; reads/writes OrgCredential; sync-to-VM filters managed_instances by organizationId so admin credential changes propagate to every org VM.
ChangedSettingsService: all settings (allowed-commands, phone allowlist, spending limits, OpenClaw model/skills, gateway config, OS restrictions, workspace files) are now read/written to OrgSetting. Instance-type, spot preference, and openclaw-version moved from User → Organization. Timezone stays personal.
ChangedUserSecretsService: API keys for LLM providers are now stored in OrgSecret and shared across org members. Also fixed a pre-existing bug where instance sync queried by clerkId against managedInstance.userId (which references User.id, not clerkId) and always returned no instances.
ChangedPoliciesService: policies are now org-owned. PolicyProfile.organizationId is the ownership scope; userId stays as the creator audit field.
ChangedApiKeysService: API keys are now org-owned with createdByUserId audit. All org members share the same keys. Revoke checks organizationId, not userId.
ChangedInstancesService: instance listing, creation, and authorization all scope by organizationId. Subdomains now derive from org slug. createEc2Instance reads defaults (region, ec2InstanceType) from Organization.
ChangedAgentsService: every query (findAllForUser, getPageData, deploy, telemetry/sondera status, instance env) scopes by organizationId. resolveInstance verifies org ownership, so org members can operate on shared VMs.
AddedOrgInvitation model (not yet used in the UI; it will power the email-based invite flow in the next release).
v1.5.02026-04-17
  • Foundation work for multi-user organizations and per-seat pricing — schema, RBAC plumbing, and personal-mode fallback
  • No user-facing changes yet; existing solo-account behavior is preserved
AddedOrganization, OrgMembership, and OrgInvitation models in the schema, plus organizationId columns (nullable for now) on every owned table (instances, agents, credits, policies, API keys, etc.).
AddedSide-by-side org-keyed credential, setting, and secret tables (org_credentials, org_settings, org_secrets) ready for service-layer migration.
AddedRole type (owner/admin/member) with hierarchical RolesGuard and @RequireRole decorator. Registered globally; not yet applied to any endpoint.
AddedClerkAuthGuard now resolves the active organization from a tropic_active_org cookie/header and attaches orgId + role to req.auth. Falls back to the user's personal org when unset, so solo users see no behavior change.
AddedSelf-hosted org membership (no Clerk B2B add-on dependency). Seed script + SQL migration that wires existing users into auto-created personal orgs. Migration not auto-applied — run manually before re-deploying.
AddedOrganizationsService skeleton (findById, findByClerkOrgId, findPersonalOrgForUser).
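A rough sketch of the hierarchical role check and the personal-org fallback described in this release. All names below are illustrative, and returning `null` for a non-member is an assumption about how a failed membership check is handled.

```typescript
type Role = "owner" | "admin" | "member";

// Higher rank includes every capability of the ranks below it.
const ROLE_RANK: Record<Role, number> = { member: 0, admin: 1, owner: 2 };

function roleAtLeast(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

interface AuthContext {
  orgId: string;
  role: Role;
}

// requestedOrgId stands in for the tropic_active_org cookie/header value.
function resolveActiveOrg(
  requestedOrgId: string | undefined,
  memberships: Map<string, Role>,
  personalOrgId: string,
): AuthContext | null {
  const orgId = requestedOrgId ?? personalOrgId; // unset: fall back to the personal org
  const role = memberships.get(orgId);
  return role === undefined ? null : { orgId, role }; // assumption: non-members are rejected
}

const memberships = new Map<string, Role>([["org_personal", "owner"], ["org_team", "member"]]);
console.log(resolveActiveOrg(undefined, memberships, "org_personal"));  // { orgId: 'org_personal', role: 'owner' }
console.log(resolveActiveOrg("org_team", memberships, "org_personal")); // { orgId: 'org_team', role: 'member' }
console.log(roleAtLeast("member", "admin"));                            // false
```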
v1.4.12026-04-12
  • Silenced unauthenticated WebSocket upgrade log spam that was flooding API logs
FixedUnauthenticated WebSocket upgrade retries (browser Control UI reconnecting before the session cookie is set) were producing 4 log entries each at LOG/WARN/ERROR levels every ~15 seconds. Removed the full headers dump, suppressed per-request LOG and WARN for WS upgrades, and stopped logging 401s as ERROR in the middleware catch.
v1.4.02026-04-12
  • Browser-based terminal for every running instance — open real shells to your EC2 and local VMs directly from the dashboard
  • Sessions persist across reconnects, device switches, and API redeploys via tmux
AddedNew /terminal page with sidebar + tabbed terminal panes powered by xterm.js. Multiple concurrent tabs per instance; tmux auto-attach keeps sessions alive across reconnects.
AddedTerminal icon button on every running agent card that deep-links to /terminal and auto-opens that instance in a new tab.
AddedRuntime enablement for local instances: sshd is enabled, tmux is installed, and the Tropic pubkey is trusted on first terminal open via a one-time SSM migration. Already-registered instances upgrade transparently with no user action.
AddedHost-key pinning on every terminal connect: a reinstalled local host now triggers a loud warning instead of being silently accepted.
FixedCredentialsService.rotateSSHKey no longer hardcodes /home/ubuntu on local instances — the new pubkey now lands in the real user's authorized_keys, branched on metadata.platformUser and metadata.platformType.
AddedLocal instance setup now captures platformUser + platformType during the setup callback, with a nightly SSM-based backfill cron for already-registered instances.
v1.3.02026-04-12
  • Telegram bots can now be opted into specific groups via a guided /start@bot handshake
  • Fixed a silent routing bug where the agent you picked when adding a Telegram bot never actually took effect
AddedPer-bot "Use in a group" flow in the Telegram drawer. Add your bot to a Telegram group, send /start@botusername as yourself, click the button in Tropic — we look up the group via getUpdates, show you the match, and on confirm write channels.telegram.accounts.<bot>.groups.<groupId> with { enabled: true, groupPolicy: "allowlist", requireMention: true, allowFrom: [your user id] }. Re-running for another teammate appends to allowFrom.
AddedNew bots now default to account-level groupPolicy: "disabled" so they ignore every Telegram group they're added to until you explicitly opt a group in. No more surprise group replies.
AddedAPI endpoints: POST /vm/:u/:i/telegram/bots/:accountId/discover-group, GET/POST /vm/:u/:i/telegram/bots/:accountId/groups, DELETE /vm/:u/:i/telegram/bots/:accountId/groups/:groupId.
ChangedTelegram drawer add-bot form now requires an agent selection: there is no "unagented" mode, and a bot must always be bound to an agent. Removed the confusing "Group channel / DM agent: Default" dropdown options from an earlier draft; they framed OpenClaw's Telegram channel routing incorrectly.
ChangedRemoved the old top-level Groups section of the Telegram drawer. It wrote per-group peer bindings without any allowlist, which was orthogonal to the new per-bot opt-in flow and confusing to have alongside it.
FixedwriteChannelBinding in vm.service.ts was hardcoding match.accountId to the literal string "default" for every channel binding it wrote. OpenClaw's routing normalizes account IDs at lookup time (src/routing/resolve-route.ts), so a binding keyed to "default" never matched a real bot account like "my_support_bot" — the selection silently fell through to resolveDefaultAgentId. Fixed: writeChannelBinding now takes an explicit accountId parameter; Telegram multi-bot passes the real bot accountId, legacy Telegram and WhatsApp continue to pass "default" (which is correct for their single-account shapes).
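The group opt-in entry written by the "Use in a group" flow can be sketched as below. The `GroupEntry` type, the function name, string-typed user IDs, and the de-duplication of `allowFrom` are assumptions; the field values come from the entry describing the flow above.

```typescript
// Shape of the value written at channels.telegram.accounts.<bot>.groups.<groupId>.
interface GroupEntry {
  enabled: boolean;
  groupPolicy: "allowlist";
  requireMention: boolean;
  allowFrom: string[]; // assumption: user IDs are handled as strings here
}

// First run creates the entry; re-running for another teammate appends to allowFrom.
function optInGroup(existing: GroupEntry | undefined, userId: string): GroupEntry {
  const allowFrom = existing ? existing.allowFrom.slice() : [];
  if (allowFrom.indexOf(userId) === -1) {
    allowFrom.push(userId); // assumption: duplicates are skipped
  }
  return { enabled: true, groupPolicy: "allowlist", requireMention: true, allowFrom };
}

const firstRun = optInGroup(undefined, "1001");
const secondRun = optInGroup(firstRun, "1002");
console.log(secondRun.allowFrom); // [ '1001', '1002' ]
```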
v1.2.52026-04-11
  • Metrics "Spend Over Time" chart is actually readable now
FixedSpend Over Time tooltip header was rendering dark-on-dark and was effectively invisible. Added explicit labelStyle so the timestamp header is readable on the dark tooltip background.
FixedSpend Over Time x-axis was labelling every bucket with just the date, so at 1h/6h/24h ranges every 5-minute/30-minute/1-hour bucket looked identical ("11 Apr", "11 Apr", "11 Apr"...). For sub-day ranges the axis now shows HH:MM per bucket; the tooltip header shows the full date + time so you can tell exactly which bucket you are hovering.
FixedSpend Over Time hover cursor was drawing a loud default rectangle that visually spilled across two adjacent bar slots, which combined with the duplicate "11 Apr" labels made it look like the tooltip was showing $0 for a bucket that had bars. Replaced with an explicit subtle cursor fill so the highlighted slot is unambiguous.
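The x-axis labelling rule from the fixes above might be sketched like this. The function name, the 24-hour cut-off for "sub-day", and the use of UTC (chosen here so the output is deterministic) are assumptions rather than the dashboard's actual code.

```typescript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

function pad2(n: number): string {
  return n < 10 ? `0${n}` : `${n}`;
}

// Ranges of up to 24 hours get HH:MM per bucket; longer ranges keep the date label.
function formatTick(timestampMs: number, rangeHours: number): string {
  const d = new Date(timestampMs);
  if (rangeHours <= 24) {
    return `${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}`;
  }
  return `${d.getUTCDate()} ${MONTHS[d.getUTCMonth()]}`;
}

const bucketStart = Date.UTC(2026, 3, 11, 9, 5); // 11 Apr 2026, 09:05 UTC
console.log(formatTick(bucketStart, 6));   // 09:05
console.log(formatTick(bucketStart, 168)); // 11 Apr
```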
v1.2.42026-04-11
  • Connect OpenAI banner now reliably appears on new OAuth VMs that haven't had tokens pushed yet
FixedConnect OpenAI banner was missing on freshly provisioned Cloud VMs that picked GPT-5.4 (Codex OAuth) at launch. The check was using "live VM model is openai-codex" as a proxy for "OAuth tokens are pushed", but provisioning sets the live model to openai-codex regardless, so the proxy was always true and the banner never appeared. Replaced with explicit per-instance metadata.oauthConnected — set false at provision time for openai-codex VMs, set true once completeOpenAIOAuth has actually pushed the tokens.
v1.2.32026-04-10
  • Cold-start is dramatically faster — the plugin reload storm and opus native-compile loop are both gone
  • memory-v2 plugin and diagnostic events are now reliably registered on every new VM
FixedRemoved the cold-start reload loop that ran ~46 plugin re-activation cycles over ~2 minutes on every fresh VM boot, spiking CPU/memory and pushing 1 GB into swap.
Fixed@discordjs/opus native addon is now pre-compiled during AMI build (build-essential + python3 added to the toolchain and `npm rebuild --build-from-source` forces the compile in a controlled environment), instead of silently failing at AMI build time and then retrying the install on every cold-start.
Fixedtropic-memory-v2 plugin and diagnostic events were silently stripped from openclaw.json at some point between AMI save and gateway startup, leaving the plugin auto-discovered from disk but never officially registered (which triggered the reload loop) and disabling model.usage / tool.loop telemetry events. Defensive re-register of both now runs in vm-provisioning right before systemctl start openclaw-gateway, so they survive whatever was stripping them.
Fixedtropic-telemetry plugin was calling api.on("onDiagnosticEvent", ...) which openclaw silently ignored as an unknown hook — the plugin received zero diagnostic events. Switched to the correct openclaw/plugin-sdk top-level import pattern (onDiagnosticEvent(handler)) so model.usage and tool.loop events actually reach the Metrics dashboard.
Fixedinstall.sh plugin-register jq step was using `jq ... && mv ...` which silently swallowed jq failures under set -e. Split the chain, capture exit code explicitly, and hard-fail the AMI build with a final assertion block if tropic-memory-v2 or diagnostics.enabled aren't in the final openclaw.json. Silent-bad-AMI regressions can no longer ship.
Removedlossless-claw stranded artifacts (1.1 GB of node_modules from the old cross-session memory plugin) were still being installed on VMs through a combination of dead provisioning code paths. Removed the dead code and cleaned up the AMI bake.
RemovedDead deployLosslessClaw method from agents.service.ts + POST /agents/lcm/deploy endpoint from agents.controller.ts — nothing called them.
RemovedDead deployMemoryV2Plugin method from vm-provisioning.service.ts — superseded by AMI baking, never called after that.
AddedDiagnostic checkpoint logging throughout install.sh — every openclaw.json mutation now prints a one-line summary (size, memory-v2 present, diagnostics enabled, meta present) so future packer builds reveal where regressions happen instead of silently producing broken AMIs.
AddedSecureClaw install + quick-harden + quick-audit now runs before the Tropic plugin registration block in install.sh (defensive ordering — see memory/feedback_secureclaw_ordering.md for the backstory).
v1.2.22026-04-10
  • WhatsApp allowlist saves now land in the per-account config OpenClaw actually enforces
FixedWhatsApp allowlist modal — saved phone numbers were written only to top-level channels.whatsapp.allowFrom, leaving the enforcement-canonical channels.whatsapp.accounts.default.allowFrom stale. jq now updates both locations (and dmPolicy) to match the provisioning shape.
v1.2.12026-04-02
  • FFmpeg Video Editor skill in the marketplace — combine clips with transitions, add text overlays, and more
AddedFFmpeg Video Editor marketplace skill — cut, trim, merge with 35+ transition types (crossfade, dissolve, wipe, slide, zoom, etc.), add styled text overlays, lower thirds, watermarks, and more.
v1.2.02026-04-01
  • Chain of Thought visualization on the Logs page — see agent reasoning, tool calls, and subagent trees per turn
  • Enriched telemetry plugin captures thinking blocks, full tool inputs/outputs, and run tracking
AddedChain of Thought view on Logs page — toggle between flat event list and per-turn COT visualization showing thinking blocks, tool calls with inputs/outputs, and final responses.
AddedSubagent tree nesting — COT view renders parent/child agent relationships with visual indentation and dashed borders.
AddedEnriched telemetry plugin — now captures thinking blocks, full tool input/output (up to 10KB), run IDs, and parent-child run tracking.
AddedGET /telemetry/cot endpoint — returns telemetry events grouped by turn with nested subagent children.
AddedrunId filter on GET /telemetry/events — filter existing event log by specific run/turn ID.
v1.1.02026-03-28
  • Spot vs on-demand instance toggle — choose your pricing mode in Settings
  • Differentiated credit rates: on-demand is billed at roughly 2.5x the spot rate, for compute cost transparency
AddedInstance pricing mode setting — toggle between on-demand (default, supports stop/start) and spot (~60% cheaper, terminated on stop) in Settings.
AddedDifferentiated credit rates — on-demand: t3.micro=3, t3.small=5, t3.medium=10, t3.large=15, t3.xlarge=25 cr/hr. Spot rates unchanged.
ChangedDefault instance mode is now on-demand (previously spot-first with fallback).
FixedStopping a spot instance no longer returns 500 Internal Server Error; the instance now terminates gracefully and an informative message is shown.
FixedClawScan skill scan no longer falsely reports critical issues when findings are 0 critical / 0 warning / N info.
v1.0.02026-03-25
  • Gateway loads in seconds instead of 20+s
  • Machine selector on Chat page — switch between Cloud and Local instances
  • EBS backup management with 7-day expiry and restore-to-new-VM
AddedMachine selector bar on Chat page — pick which Cloud or Local instance to connect to, with auto-detection of online machines.
AddedInstance polling on Chat page — newly started VMs appear automatically without page refresh.
AddedLoading indicator while gateway iframe loads, replacing blank white pane.
AddedEBS backup management — standalone /backups page with 7-day expiry, global banner, and restore-to-new-VM.
AddedSpot instances — EC2 provisioning uses spot with on-demand fallback. Auto-recovery for always-on plans.
AddedPlan tiers — Day ($10/$8yr) and Always-on ($18/$15yr) Stripe subscriptions with auto credit deposits.
AddedDay-mode auto-sleep — stops VMs at 11pm, starts at 7am in user timezone.
AddedPlatform audit events — VM and skill actions visible in Logs page.
AddedCross-session memory service — separate NestJS microservice using Ollama embeddings and pgvector.
FixedGateway load time reduced from 20+ seconds to ~2 seconds by removing the Clear-Site-Data header, which forced Chrome to purge cached assets on every /init.
FixedProxy now reads VM public IP from database instead of calling AWS DescribeInstances API, eliminating a ~400ms lookup per request.
FixedRemoved redundant gateway token pre-check that failed when token was not yet in DB (proxy /init handles retrieval with full fallback chain).
FixedChat page no longer flashes "No machines set up" while instances are loading.
ChangedCredit rates overhauled — t3.micro=1, t3.small=2, t3.medium=4, t3.large=6 cr/hr.
ChangedBrand CSS centralised in globals.css — sidebar matches landing page.
v0.9.42026-03-16
  • OpenClaw 2026.3.13 support
  • Per-machine model selector — change models directly on each agent card
  • Secrets page — view and manage secrets across all instances in one place
AddedPer-instance model selector on agent cards — change the active model (Sonnet, Opus, Haiku) per machine via openclaw CLI, replacing the misleading global model setting.
AddedSecrets page (formerly "API Keys") — view, add, and delete secrets across all agent machines in one place. Tropic API keys collapsed into a separate section.
AddedTelegram bot integration — connect a Telegram bot to your agent from the agent card.
AddedGoogle Sheets marketplace skill — install from the skills drawer to give your agent spreadsheet access.
AddedSSH key rotation — rotate your VM SSH keys from Settings with automatic expiry tracking.
AddedWireGuard VPN for local instances — direct encrypted tunnel between your machine and Tropic.
FixedTelegram plugin now explicitly disabled on new VMs — prevents 401 retry crash loop when no bot token is set.
FixedSkills drawer agent dropdown now filtered to current machine only — no longer shows agents from other instances.
FixedMarketplace seed files now correctly bundled in NestJS build — Google Sheets and other skills appear in the drawer.
FixedWireGuard peer deduplication — stopped log spam from repeated addPeer calls.
FixedGateway crash-loop from stale lossless-claw plugin reference in config.
ChangedAll docs pages now share the marketing layout topbar for consistent navigation.
v0.9.32026-03-10
  • OpenClaw 2026.3.8 support with device identity workaround
  • Scoped agent templates — private and restricted visibility for marketplace agents
  • Custom firewall port management in Settings
AddedScoped agent templates: templates now support public, private, and restricted visibility. Users can upload their own private agent templates from Settings.
AddedCustom firewall ports — open any port (1–65535) from Settings, no longer limited to preset ports.
AddedDB-backed OpenClaw version tracking — Packer builds now record the installed version to the database. Landing page reads from DB instead of querying AWS AMIs.
AddedEmail broadcast tracking — per-user open/click stats visible in Admin, with tracked "Sent from Tropic" footer link on all broadcasts.
AddedConsolidated agents page into a single API call (down from 7 requests).
FixedOpenClaw 2026.3.7+ device identity compatibility — patched gateway config to work around openclaw#40812.
FixedOpenClaw 2026.3.8 Control UI redirect flow — /init injects #token= fragment correctly for /chat access.
FixedAdmin broadcast page restyled to match dashboard conventions (Card components, consistent spacing).
FixedGateway health check now required before enabling chat input.
FixedLocal instances correctly verified as online before showing in sidebar.
ChangedSondera and Telemetry cards consolidated into single cards with inline icons in the skills drawer.
v0.9.22026-03-05
  • Dropped Redis/Bull dependency — distributed locks now use PostgreSQL advisory locks
  • OpenClaw 2026.3.2 compatibility fixes for Packer AMI builds
RemovedReplaced Bull+Redis job queues with PostgreSQL advisory locks for credit deduction and idle VM checks. Upstash Redis is no longer required.
FixedPacker AMI builds no longer freeze on needrestart prompts or systemctl status pager.
FixedClawHub skill pre-installation now authenticates to avoid rate limits.
BreakingOpenClaw 2026.3.2 defaults to messaging-only tools profile. Tropic now sets tools.profile to "full" during AMI provisioning — tool access controls are enforced through Tropic's security plane (Sondera policies) instead.
BreakingOpenClaw install method changed from curl installer to npm (avoids interactive prompts). openclaw doctor now runs before onboard.
v0.9.12026-03-01
  • WhatsApp dmPolicy enforcement for OpenClaw 2026.2.26 compatibility
  • AMI version tagging in Packer builds
AddedPacker AMI builds now tag AMIs with the installed OpenClaw version (OpenClawVersion tag) across all regions.
FixedWhatsApp config now sets dmPolicy: "allowlist" alongside allowFrom — required for OpenClaw 2026.2.26 enforcement.
v0.9.02026-02-28
  • Configurable EC2 instance types and model selection in Settings
  • Configurable credit billing with transparent rate breakdown
  • WhatsApp disconnect support
AddedEC2 instance type setting (t3.micro / t3.small) configurable in Settings — applies to new VMs.
AddedModel selection in Settings limited to supported models: Claude Sonnet 4.6, Opus 4.6, Haiku 4.5. Custom models still available for advanced users.
AddedPricing breakdown on Dashboard ("Read first" banner) and a dedicated Pricing section in Docs.
AddedWhatsApp disconnect button — you can now fully unlink WhatsApp without relinking to a new number.
AddedAgents page getting-started banner explaining what you can do: deploy agents, install skills, set policies, connect local machines.
Added"Further configurations" card on Dashboard linking to /agents.
AddedGoogle OAuth warning in agent deployment drawer — third-party vendors like Google may ban accounts that connect AI tools via OAuth.
ChangedDefault model for all accounts set to Claude Sonnet 4.6.
Changedt3.small credit rate set to 15 cr/hr (3x the t3.micro rate).
RemovedRemoved showLegacyVm feature flag — OpenClaw VM card is now always shown on Dashboard.
BreakingWith OpenClaw v2026.2.24, message:received events stopped working. We have since switched to before_tool_call hooks for event handling.