Safety Filters & Sandbox

Filters

  • Blocklist/Allowlist: literal matches (case-insensitive).
  • Regex Heuristics: basic credential and destructive command patterns.
  • Logits Masking: helper that sets the logits of banned token ids to -inf before sampling, so those tokens cannot be selected.
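
The three filters above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `BLOCKLIST`, `ALLOWLIST`, `is_flagged`, and `mask_logits` are hypothetical, the patterns are placeholders, and the rule that allowlisted phrases are exempted before blocklist matching is an assumption.

```python
import math
import re

# Hypothetical entries for illustration only.
BLOCKLIST = ("rm -rf /", "drop table")
ALLOWLIST = ("drop table if exists tmp_scratch",)

# Basic heuristics; real deployments would likely need broader patterns.
CREDENTIAL_RE = re.compile(
    r"\b(password|api_key|secret)\s*[=:]\s*\S+|\bAKIA[0-9A-Z]{16}\b",
    re.IGNORECASE,
)
DESTRUCTIVE_RE = re.compile(r"\brm\s+-[a-z]*r|\bmkfs\b|\bdd\s+if=", re.IGNORECASE)


def is_flagged(text):
    """Return True if text trips the blocklist or a regex heuristic."""
    remaining = text.lower()
    # Assumption: allowlisted phrases are removed first, so only the
    # rest of the text is checked against the blocklist and patterns.
    for allowed in ALLOWLIST:
        remaining = remaining.replace(allowed.lower(), " ")
    if any(blocked.lower() in remaining for blocked in BLOCKLIST):
        return True
    return bool(CREDENTIAL_RE.search(remaining) or DESTRUCTIVE_RE.search(remaining))


def mask_logits(logits, banned_ids):
    """Return a copy of logits with banned token ids set to -inf."""
    masked = list(logits)
    for token_id in banned_ids:
        if 0 <= token_id < len(masked):  # ignore ids outside the vocabulary
            masked[token_id] = -math.inf
    return masked
```

Because softmax maps -inf to probability zero, a masked token can never be sampled. The same idea applies to tensor-based logits, where the assignment would typically be done with index assignment on the logits tensor.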

Sandbox

  • Stdlib-only confinement via resource limits (CPU/memory/NOFILE), minimal env, umask 077, ephemeral working directory.
  • Optional: integrate with firejail or docker if present (not required).
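
A stdlib-only confinement along these lines might look like the sketch below. It is POSIX-only, and the function name `run_sandboxed`, the specific limit values, and the contents of `MINIMAL_ENV` are assumptions chosen for illustration. The firejail/docker integration is not shown.

```python
import os
import resource
import subprocess
import sys
import tempfile

# Assumed minimal environment; nothing is inherited from the parent.
MINIMAL_ENV = {"PATH": "/usr/bin:/bin", "LANG": "C"}


def _limit(kind, value):
    """Set soft and hard limits, never exceeding the current hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def _apply_limits():
    """Runs in the child process just before exec."""
    _limit(resource.RLIMIT_CPU, 5)           # seconds of CPU time
    _limit(resource.RLIMIT_AS, 1024 ** 3)    # 1 GiB of address space
    _limit(resource.RLIMIT_NOFILE, 64)       # open file descriptors
    os.umask(0o077)                          # new files are owner-only


def run_sandboxed(argv, timeout=10.0):
    """Run argv in a throwaway working directory with resource limits."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        return subprocess.run(
            argv,
            cwd=workdir,
            env=MINIMAL_ENV,
            preexec_fn=_apply_limits,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
```

For example, `run_sandboxed([sys.executable, "-c", "print('ok')"])` returns a `CompletedProcess` whose working directory has already been deleted. Note that this sketch limits resources and discards the working directory but does not by itself stop the child from writing elsewhere on the filesystem or using the network.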

Permissions & Logs

  • Output directories are created with mode 0o700; files end up 0o600 because the process runs with umask(0o077).
  • Logs scrub simple secret-looking strings (password, api_key, secret, AKIA).
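
As a rough sketch of the two points above, the snippet below creates an owner-only output directory and redacts secret-looking strings before a line is logged. The helper names `make_output_dir`, `scrub`, and `write_log`, the `[REDACTED]` placeholder, and the exact patterns are assumptions for illustration.

```python
import os
import re

# Matches e.g. "password=...", "api_key: ...", "secret = ..." (assumed format).
SECRET_RE = re.compile(r"\b(password|api_key|secret)(\s*[=:]\s*)\S+", re.IGNORECASE)
# Matches strings shaped like AWS access key ids.
AKIA_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scrub(line):
    """Replace secret-looking values with a placeholder."""
    line = SECRET_RE.sub(r"\1\2[REDACTED]", line)
    return AKIA_RE.sub("[REDACTED]", line)


def make_output_dir(path):
    """Create path as 0o700 and make later files default to 0o600."""
    os.umask(0o077)  # process-wide: affects every file created afterwards
    os.makedirs(path, mode=0o700, exist_ok=True)
    os.chmod(path, 0o700)  # enforce the mode even if the directory existed
    return path


def write_log(path, line):
    """Append one scrubbed line to the log file at path."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(scrub(line) + "\n")
```

Pattern-based scrubbing of this kind only catches secrets that follow the expected shapes, so it should be treated as a safety net rather than a guarantee.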

Limitations

  • Not a full VM; best-effort sandbox. For stronger isolation, use containers or VMs.
  • Network isolation is best-effort unless firejail/docker are used.

Acceptance: Dangerous outputs filtered; sandbox prevents filesystem modification during evaluation.