

Engineering

How We Built Policies in Nuon

Guardrails for continuously delivering with Nuon


With BYOC you deploy and operate software in cloud accounts you don’t own, and often every change is amplified across every customer. The customer is accountable for compliance and security in their environment. You're responsible for making sure your software works, and that it doesn't break anything else.

Both sides establish that contract on Day 1: permissions are scoped, configurations reviewed, security signs off. But what about Day 2? And every day after?

Changes keep flowing. A Helm chart update adds a public ingress. A Terraform apply recreates a database because of state drift. An emergency patch skips the normal review process. RBAC tells you who can act, but not whether the action is safe. And in the BYOC world, recovering from a bad change in someone else's cloud account is especially hard. Manual reviews don't scale to hundreds of customer environments under continuous delivery.

Customers needed something that could take the Day-1 contract and enforce it continuously, automatically, on every change, before it reaches customer infrastructure. The answer is to shift policy enforcement left into the orchestration layer itself, at every stage of the application lifecycle.

That constraint shaped everything about how we designed and built it.

Everything Is a Workflow

To understand how we enforce policies, you first need to understand how we deliver software.

In Nuon, vendors package their application as an App, a set of configs that connect their existing containers, charts, infrastructure, and automations into a single bundle that can be deployed.

Each customer gets an Install, a running instance of that app in their own cloud account, managed by a Runner agent operating inside their environment, alongside the software.

Every operation that modifies customer infrastructure runs as a Workflow, an ordered sequence of steps.

Creating an install? That's a workflow: provision the CloudFormation stack, start the runner, provision the sandbox, sync secrets, deploy each component in dependency order, run post-deploy actions. Updating a single Helm chart? Also a workflow. Reprovisioning the sandbox? Workflow. Running a health check? Workflow.

Workflows are the single chokepoint through which all changes flow to customer environments. Nothing bypasses them.

This is by design, and it's possible because workflows are powered by Temporal. Temporal gives us durable, exactly-once execution. If a workflow step starts, it will complete, even through process crashes, network partitions, or node failures. There's no way to skip a step by restarting a service or timing out a request. Temporal's workflow history provides a permanent, immutable record of every step, every status transition, every decision.

This matters enormously for policies: if we embed policy enforcement in the workflow, it's guaranteed to run. It can't be accidentally bypassed. And every evaluation is automatically part of the audit trail.

The Plan → Approve → Apply Lifecycle

Every infrastructure-modifying workflow step follows a three-phase lifecycle:

Plan. The runner, a VM running inside the customer's cloud, executes the plan phase: terraform plan, helm template, or a Kubernetes dry-run. The output is the exact diff of what's about to change: a Terraform JSON plan with resource changes, rendered Kubernetes manifests from a Helm chart, or dry-run output from kubectl apply. This structured plan data is sent back to the Nuon control plane.

Approve. The plan is presented to the operator with diffs, resource changes, and impact details. The operator reviews and approves, rejects, or retries. This is the human gate: nothing gets applied without explicit sign-off (or an explicit auto-approve configuration).

Apply. Once approved, the runner applies the change: terraform apply, helm upgrade, or kubectl apply. The customer's infrastructure is modified.

The key insight is this: after the plan phase, we have the exact structured representation of what's about to be applied. And there's already a gate, the approval step, where we can block dangerous changes.

Policy evaluation slots in between plan and approve:

Plan → Policy Check → Approve → Apply

The plan data is the policy input. The approval gate is the enforcement point. We didn't need to invent new infrastructure. We just needed to add a phase to the existing lifecycle.

How We Model Policies

Policies live alongside the rest of the app configuration, defined in policies.toml and synced to the Nuon control plane via nuon apps sync. Here's what a typical configuration looks like:

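As an illustration, a policies.toml might look something like this. The table and key names are assumptions based on the four fields described below, not Nuon's documented schema:

```toml
# Illustrative policies.toml — key names are assumptions, not Nuon's exact schema.

[[policy]]
name       = "require-bucket-encryption"
type       = "terraform_module"
engine     = "opa"
components = ["*"]                          # all terraform_module components
contents   = "policies/encryption.rego"     # file path reference

[[policy]]
name       = "no-public-ingress"
type       = "helm_chart"
engine     = "opa"
components = ["frontend"]
contents   = '''
package nuon

import rego.v1

deny contains msg if {
  input.request.object.kind == "Ingress"
  msg := "public ingress is not allowed"
}
'''
```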

Each policy declares four things:

  • Type: what kind of infrastructure it applies to: terraform_module, helm_chart, kubernetes_manifest, container_image, or sandbox
  • Engine: opa (Rego) or kyverno (YAML, for cluster-level resources)
  • Components: which components it targets, specific names or ["*"] for all components of that type
  • Contents: the policy document itself, either inline Rego/YAML or a file path reference

When synced, the policies config becomes part of the immutable AppConfig. Every install deployment references a specific AppConfig, which means the exact set of policies that govern any given deployment is always known and auditable.

Validation at sync time

We didn't want vendors to discover policy syntax errors during a deployment. During nuon apps sync, we validate every policy before it's uploaded:

  • OPA policies are parsed using OPA's own ast.ParseModule . If the Rego doesn't compile, you get an error immediately
  • Kyverno policies are validated as YAML
  • Type/engine compatibility is checked. For example, kubernetes_cluster only supports the Kyverno engine, while terraform_module requires OPA
  • Component targeting is validated. You can't mix wildcard * with specific component names

This gives vendors fast, local feedback. Client-side validation catches errors; server-side evaluation in the workflow provides the actual enforcement.

Taking a Plan and Running a Policy Over It

This is the core of the system. A workflow step has generated a plan. Now we need to evaluate policies against it.

Preparing the input

Getting OPA up and running to evaluate policies was the easy part. The part that needed work was normalizing wildly different infrastructure artifacts into shapes that Rego policies can reason about.

Each component type produces fundamentally different plan data, and each needs its own transformation:

Terraform plans are the most straightforward. Terraform already emits a structured JSON plan with resource_changes, each containing the resource type, address, and before/after values. We parse it and present it as input.plan:
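To give a sense of the shape involved, here's a minimal Go sketch that decodes the resource_changes portion of a Terraform JSON plan and wraps it as input.plan. The structs are pared down to a few illustrative fields and are not Nuon's actual types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourceChange mirrors a small slice of Terraform's JSON plan
// format, following the field names of `terraform show -json`.
type ResourceChange struct {
	Address string `json:"address"`
	Type    string `json:"type"`
	Change  struct {
		Actions []string       `json:"actions"`
		After   map[string]any `json:"after"`
	} `json:"change"`
}

type TerraformPlan struct {
	ResourceChanges []ResourceChange `json:"resource_changes"`
}

// parsePlan decodes a Terraform JSON plan so it can be presented to
// policies as input.plan.
func parsePlan(raw []byte) (TerraformPlan, error) {
	var plan TerraformPlan
	err := json.Unmarshal(raw, &plan)
	return plan, err
}

func main() {
	raw := []byte(`{"resource_changes":[{
	  "address": "aws_s3_bucket.backup",
	  "type": "aws_s3_bucket",
	  "change": {"actions": ["create"], "after": {"acl": "private"}}}]}`)

	plan, err := parsePlan(raw)
	if err != nil {
		panic(err)
	}
	// Wrap as the policy input document: policies read input.plan.
	input := map[string]any{"plan": plan}
	out, _ := json.Marshal(input)
	fmt.Println(string(out))
}
```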

Helm charts are more involved. The runner templates the chart with the configured values, but Nuon uses Go template expressions like {{ .nuon.install.sandbox.outputs.cluster.cluster_name }} in Helm values, which would cause template rendering to fail. So we sanitize these first, replacing {{ .nuon.* }} expressions with placeholder values. The rendered multi-document YAML is then split into individual Kubernetes resources, and each is wrapped as an AdmissionReviewInput, the same format Kubernetes admission controllers use. This means existing Kubernetes policy patterns (from tools like Gatekeeper or Kyverno) port directly to Nuon policies:
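The two preprocessing steps can be sketched with a regular expression for the sanitization and a plain multi-document split. The placeholder value and helper names here are illustrative assumptions, not Nuon's actual code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nuonExpr matches Nuon template expressions such as
// {{ .nuon.install.sandbox.outputs.cluster.cluster_name }}.
var nuonExpr = regexp.MustCompile(`\{\{\s*\.nuon\.[^}]*\}\}`)

// sanitize replaces Nuon expressions with a placeholder so that
// `helm template` can render the chart without the real values.
func sanitize(values string) string {
	return nuonExpr.ReplaceAllString(values, "nuon-placeholder")
}

// splitDocs breaks rendered multi-document YAML into individual
// resources, one per policy input document.
func splitDocs(rendered string) []string {
	var docs []string
	for _, d := range strings.Split(rendered, "\n---\n") {
		if strings.TrimSpace(d) != "" {
			docs = append(docs, d)
		}
	}
	return docs
}

func main() {
	v := "clusterName: {{ .nuon.install.sandbox.outputs.cluster.cluster_name }}"
	fmt.Println(sanitize(v)) // clusterName: nuon-placeholder
}
```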

Kubernetes manifests go through the same AdmissionReview transformation, from their dry-run output. This means Helm and Kubernetes manifest policies share identical input structures and can use the same Rego rules.

Container images require a completely different treatment: since there's no "plan" to evaluate, the policy input is the image's metadata. We use a two-phase approach: first, a lightweight check determines whether any container_image policies are even configured. If not, we skip entirely, avoiding expensive OCI registry calls. If policies exist, we dispatch a runner job into the customer's infrastructure to fetch image metadata from the registry: SBOM presence and format, cosign signatures, attestation manifests, OCI referrers, and in-toto statements. This metadata is assembled into input.metadata for the policy:
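A sketch of the assembled input, with illustrative field names for the fetched metadata (not Nuon's actual schema):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ImageMetadata sketches what a runner job might fetch from an OCI
// registry. Field names are illustrative, not Nuon's actual schema.
type ImageMetadata struct {
	Image        string   `json:"image"`
	SBOMFormat   string   `json:"sbom_format"`  // e.g. "spdx"; empty if absent
	Signatures   []string `json:"signatures"`   // cosign signature digests
	Attestations int      `json:"attestations"` // in-toto statement count
}

// policyInput wraps the metadata the way the post describes:
// policies read it as input.metadata.
func policyInput(md ImageMetadata) map[string]any {
	return map[string]any{"metadata": md}
}

func main() {
	md := ImageMetadata{Image: "registry.example.com/app:1.2.3", SBOMFormat: "spdx"}
	out, _ := json.Marshal(policyInput(md))
	fmt.Println(string(out))
}
```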

Each transformation runs as a Temporal activity named PrepPolicyEvaluation, which resolves the full context (deploy or sandbox run → install → app config → policies config), filters applicable policies by type, engine, and component name, and builds a list of PolicyToEvaluate items, each pairing a policy's Rego contents with a specific JSON input document and a human-readable identity for the input (e.g., deployment/default/nginx).

Parallel fan-out with Temporal

Once inputs are prepared, all policy evaluations execute in parallel. The workflow conductor's checkPolicies method fans out EvaluateSinglePolicy Temporal activities as futures, one per (policy × input document) pair.

For a Helm chart that renders 12 Kubernetes resources evaluated against 3 policies, that's 36 parallel evaluations. Each runs independently, with Temporal handling scheduling, timeouts, and retries. After all futures resolve, violations are collected and the workflow decides what to do next.

Here's the checkPolicies method going through prepare, fan out, and collect phases:

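The prepare → fan out → collect shape can be sketched with plain goroutines standing in for Temporal activity futures. Types and signatures here are illustrative, not Nuon's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// PolicyToEvaluate pairs one policy with one input document (the name
// echoes the post; the shape is illustrative).
type PolicyToEvaluate struct {
	PolicyID string
	Input    map[string]any
	Identity string // e.g. "deployment/default/nginx"
}

type Violation struct {
	PolicyID string
	Severity string // "deny" or "warn"
	Message  string
	Identity string
}

// checkPolicies fans out one evaluation per (policy × input) pair and
// collects the violations. In Nuon the fan-out is Temporal futures;
// goroutines stand in here.
func checkPolicies(items []PolicyToEvaluate,
	eval func(PolicyToEvaluate) []Violation) []Violation {

	results := make([][]Violation, len(items))
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		go func(i int, item PolicyToEvaluate) {
			defer wg.Done()
			results[i] = eval(item) // one independent evaluation
		}(i, item)
	}
	wg.Wait() // all futures resolve before deciding what to do next

	var all []Violation
	for _, vs := range results {
		all = append(all, vs...)
	}
	return all
}

func main() {
	items := []PolicyToEvaluate{{PolicyID: "p1", Identity: "deployment/default/nginx"}}
	vs := checkPolicies(items, func(it PolicyToEvaluate) []Violation {
		return []Violation{{PolicyID: it.PolicyID, Severity: "warn", Message: "example", Identity: it.Identity}}
	})
	fmt.Println(len(vs)) // 1
}
```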

processPolicyViolations separates violations into two categories. Deny violations fail the workflow step immediately: the step status becomes error with "Policy check failed", and the apply phase never executes. Warn violations are attached as step metadata, but the workflow continues to the approval gate. Both are visible in the dashboard and CLI output. This two-severity model was deliberate: it lets vendors start with warnings to understand their policy surface before turning on hard enforcement. In the future, it will also let us offer more advanced auto-approval settings, where vendors can switch to human-in-the-loop approval depending on policy evaluations.

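The deny/warn split can be sketched like this, a simplified stand-in for the actual processPolicyViolations with illustrative types:

```go
package main

import "fmt"

type Violation struct {
	PolicyID string
	Severity string // "deny" or "warn"
	Message  string
}

type StepResult struct {
	Status   string // "error" if any deny violation, else "ok"
	Warnings []Violation
}

// processPolicyViolations sketches the two-severity model: any deny
// violation fails the step; warn violations ride along as metadata.
func processPolicyViolations(vs []Violation) StepResult {
	res := StepResult{Status: "ok"}
	for _, v := range vs {
		switch v.Severity {
		case "deny":
			res.Status = "error" // "Policy check failed"; apply never runs
		case "warn":
			res.Warnings = append(res.Warnings, v)
		}
	}
	return res
}

func main() {
	r := processPolicyViolations([]Violation{{PolicyID: "p", Severity: "warn", Message: "m"}})
	fmt.Println(r.Status) // ok
}
```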

The OPA evaluation core

At the center of all this orchestration is a surprisingly small piece of code. The EvaluateSinglePolicy activity is about 60 lines, and it's intentionally minimal.

It takes two inputs: a policy's Rego source code and a JSON document. It runs two OPA queries against that document: data.nuon.deny and data.nuon.warn. Each query is compiled with rego.New(), prepared for evaluation with PrepareForEval(), and executed against the input with rego.EvalInput().

The result expressions are iterated to extract violation messages. We support two result formats for flexibility: plain strings (deny contains msg if { msg := "..." }) and structured objects with a msg field (deny contains obj if { obj := {"msg": "..."} }). Each violation is tagged with the policy ID, a severity level, and a human-readable identity of the input document it was evaluated against.

Here's the core: the activity evaluates the data.nuon.deny and data.nuon.warn queries, then extracts violation messages:

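The message-extraction step can be sketched without pulling in OPA itself. The values below stand in for the bound values OPA returns in its result expressions; in the real activity they would come from a rego.ResultSet:

```go
package main

import "fmt"

// extractMessages sketches the last step of EvaluateSinglePolicy:
// walking a query result's values and accepting both supported
// formats — plain strings and objects with a "msg" field.
func extractMessages(values []interface{}) []string {
	var msgs []string
	for _, v := range values {
		switch t := v.(type) {
		case string: // deny contains msg if { msg := "..." }
			msgs = append(msgs, t)
		case map[string]interface{}: // deny contains obj if { obj := {"msg": "..."} }
			if m, ok := t["msg"].(string); ok {
				msgs = append(msgs, m)
			}
		}
	}
	return msgs
}

func main() {
	vals := []interface{}{
		"bucket must be encrypted",
		map[string]interface{}{"msg": "ingress must not be public"},
	}
	fmt.Println(extractMessages(vals))
}
```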

Every policy must live in package nuon. This single convention is what makes the engine work uniformly across Terraform plans, Kubernetes resources, and container image metadata. The policy author writes Rego against input.*, and the evaluation engine doesn't need to know what kind of infrastructure produced that input. OPA does the heavy lifting; the activity is just the bridge between Temporal's durable execution model and OPA's query engine.

Build-time evaluation

For container images and Helm charts, policy evaluation runs during the build workflow rather than the deploy workflow. The same EvaluateSinglePolicy activity is reused; the only difference is the orchestration context.

For external container images, the build workflow checks for container image policies, dispatches a runner job to fetch OCI metadata from the registry, and evaluates policies against that metadata. If a deny rule matches, the build fails with status policy_failed. The image never becomes available for deployment. The violation is caught before it can reach any customer environment.

This is the shift-left in practice: by the time a deployment workflow runs, the artifacts it's working with have already passed their build-time policy checks. Deploy-time policies then validate the plan, the combination of that artifact with the specific customer's infrastructure state.

Policy Reports

Every policy evaluation, whether from a deploy, build, or sandbox run, produces a PolicyReport. This is a first-class database entity, not a log line that gets rotated away.

A PolicyReport captures the complete evaluation context:

  • Polymorphic ownership: Each report links to its source via OwnerID and OwnerType: a component build, an install deploy, or a sandbox run. This lets us query all policy evaluations for a specific deploy, or all evaluations across all deploys of a specific install.
  • Per-policy results: A JSONB array with per-policy status (pass, deny, or warn), violation counts, and how many input documents were evaluated.
  • Violations: Each violation records the policy name, severity, message, and a human-readable input identity (e.g., Deployment/default/nginx or aws_s3_bucket.backup).
  • Summary counts: Denormalized deny_count, warn_count, and pass_count fields enable efficient list views and filtering without deserializing the JSONB arrays.
  • Display names: Denormalized OrgName, AppName, InstallName, and ComponentName fields allow human-readable reports without expensive joins.

A PersistPolicyReport Temporal activity computes per-policy results from the raw violations, determines the overall report status, and creates the database record. It runs with retries, and policy evaluation can proceed even if report persistence temporarily fails, but we make a strong effort to record every result: failures here are logged and trigger alerts.
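The summary computation can be sketched as follows, assuming the natural precedence where deny outranks warn, which outranks pass; the precedence rule and function shape are assumptions, not Nuon's actual code:

```go
package main

import "fmt"

type PolicyResult struct {
	PolicyName string
	Status     string // "pass", "deny", or "warn"
}

// summarize computes the denormalized counts plus an overall report
// status: any deny makes the report a deny, otherwise any warn makes
// it a warn, otherwise it passes.
func summarize(results []PolicyResult) (denies, warns, passes int, overall string) {
	for _, r := range results {
		switch r.Status {
		case "deny":
			denies++
		case "warn":
			warns++
		case "pass":
			passes++
		}
	}
	switch {
	case denies > 0:
		overall = "deny"
	case warns > 0:
		overall = "warn"
	default:
		overall = "pass"
	}
	return
}

func main() {
	_, _, _, overall := summarize([]PolicyResult{{Status: "pass"}, {Status: "warn"}})
	fmt.Println(overall) // warn
}
```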

In the dashboard, policy results surface in two places: the Policy Evaluation card on each workflow step (showing pass/deny/warn counts with expandable violation details), and a dedicated Policy Evaluations list view per install that can be filtered by status, type, and component. The CLI shows policy status inline with workflow step listings.

This is what closes the loop. Policies aren't just enforcement, they produce a verifiable, auditable record of every evaluation. When a customer asks "was encryption validated on the last deployment?", the answer isn't "we think so", it's a specific PolicyReport with a timestamp, the exact policy that was evaluated, and the result.

Making It Production-Safe

A few design decisions were important for making the policy system reliable in production without becoming a liability:

Policy failures don't break core workflows. If PersistPolicyReport fails, the workflow logs a warning and continues. The policy evaluation is part of the critical path (deny violations must block), but report persistence is not. We'd rather have a missing report than a stuck deployment.

Lazy metadata fetching. For container image policies, we check whether any container_image policies are configured before dispatching a runner job to fetch OCI metadata. Registry calls can be slow and expensive; there's no reason to make them if nobody's written a policy for images.

The human gate remains. Even after policies pass, the approval step still presents the plan for human review. Policies augment human judgment; they don't replace it. A vendor can always inspect the diff and reject a change that policies didn't catch.

Temporal makes it durable. Because policy evaluation runs as Temporal activities within a Temporal workflow, there's no way to silently skip it. Process crash? Temporal replays the workflow and re-executes the activity. Network timeout? Temporal retries. The policy check is as reliable as the workflow engine itself.

Tying It All Up

Nuon provides software vendors with powerful abstractions to ship their applications in customer cloud accounts. Now we have added first-class support for policy enforcement in the BYOC deployment pipeline.

Vendors can detect and stop unwanted changes before they ever reach customer environments. This is a shift left from approaches like RBAC or runtime-specific policy engines, which only come into the picture once application configs and artifacts have already made it into customer environments.

Our policy engine provides software vendors with policy enforcement that works across all the components that make up their application. 

Read our docs to get started.

