The Nuon Runner Architecture: A Secure Approach to Bring Your Own Cloud (BYOC)

In this post we will dive deep into some of the common approaches for managing Bring Your Own Cloud (BYOC) installs, and share the Nuon Runner. We believe the Nuon Runner creates a more secure and robust approach by managing a customer's install from within their account, while combining policy, network, cluster and cloud-based permissions into a lifecycle-based permission model.

Our belief is that the Nuon Runner solves the security challenges of BYOC for customers, and the operational burden for vendors.

Delivering on our promise of BYOC, for everyone

Our premise at Nuon is BYOC for everyone.

The Nuon Runner is an isolated management process that is deployed alongside your application, inside each customer's cloud account. It is designed to securely manage, monitor and operate your application and gives you the tools you need to deliver a SaaS-like experience in your customer's environment.

When new installs are set up, the runner is deployed into the customer's account first, so it can then come online and set up the rest of your application. Customers install your application with a CloudFormation stack, Azure template or GCP template, depending on their account.

The runner is managed by Nuon, and powers our control plane giving you tools for:

Provisioning and managing the underlying cluster and its network access
Deploying the vendor-packaged app, including container images, Kubernetes and other infrastructure
Continuously monitoring the application and enabling debugging commands via Actions

Now, we will share some background on how we landed on this architecture and why we believe that a runner-based approach is the best way to securely operate SaaS in each customer's cloud account.

BYOC, SaaS From First Principles

Bring Your Own Cloud is a deployment model where software runs inside a customer's cloud account and is operated remotely by the vendor. This model is not "run everywhere", where software is packaged to be run in any environment. Instead, BYOC software is specifically designed to be deployed and operated inside AWS, GCP, or Azure accounts owned by the customer and should be able to leverage customer-managed infrastructure, applications and data.

In short, the software is remotely managed by the vendor, but needs to live in a customer-owned environment. It needs to be secure and reliable. Limited and controlled access needs to be granted so the vendor can set up the app, push updates and ensure quality of service.

How Vendors Build BYOC Today

We talk to a lot of vendors and have found a few standard approaches for building a BYOC offering.

Cross Account Permissions

Cross account permissions for BYOC are where the customer grants direct access to an external third party (the vendor) to manage their application remotely. This includes granting the vendor a set of permissions, continuously monitoring the usage of those permissions and modifying the boundary of what can be done, at different parts of the lifecycle.

In this model, the vendor can build tools and automation that use this permission set to manage the app on the customer's behalf.

Since, as a customer, you are ultimately granting direct third-party access to an account, keeping to a minimal set of permissions prohibits most day-to-day debugging. The customer is inclined to continually reduce permissions because they don't have other controls over what the role is doing. From there, the vendor and customer continually go back and forth to request new permissions, which makes managing the application cumbersome at best, and more work than the customer self-hosting it at worst.

Further, the cross account permissions model usually includes a single role, rather than scoped roles for different parts of the application's lifecycle. This model also limits the customer to controls at the cloud account level, not the network or application level. While things like using external IDs on accounts, adding/removing accounts and setting up access managers can be helpful, this model often requires a lot of upfront effort to convince customers that it is secure enough to meet enterprise requirements.
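To make the external ID control mentioned above concrete, here is a minimal sketch of the trust policy a customer might attach to a cross-account role on AWS. The account ID and external ID are hypothetical placeholders and the helper function is ours for illustration; the policy shape (sts:AssumeRole with an sts:ExternalId condition) follows AWS's documented format.

```python
import json


def cross_account_trust_policy(vendor_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets one vendor account assume a role,
    but only when it presents the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = cross_account_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

Note that a trust policy like this only controls who may assume the role. It says nothing about when in the lifecycle the role is used or what happens at the network or application level, which is the limitation described above.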

Tunnel / Network Controlled Kubernetes Access

Tunnel-based access is where the customer sets up a network connection that enables the vendor to access the application in their account, and run (limited) commands. Usually, this means granting access to the customer's Kubernetes cluster where the software is deployed.

This model has severe limitations and for many customers is not viable. Essentially, this is allowing direct network access to a third party (the vendor). In the best case, these permissions can be locked down, but that means:

The vendor can really only access the Kubernetes cluster and usually also needs a separate cross-account role that grants permissions to manage infrastructure.
The customer has limited (only network) audit-ability of what the vendor has done, and managing things at the application level requires additional inputs.
The integration is more complex, since each customer network has differences and some may want to set up the tunnel using a VPN, a private link or a peering connection.

This model also only works for Kubernetes applications, and requires the customer to bring their own Kubernetes cluster, or the vendor to build some automation to set it up for them. While Kubernetes is the most common deployment target in large enterprises, each individual customer managed cluster can have differences that make support and operations more difficult for the vendor.

Kubernetes Operator

Kubernetes operators are a way to ship management software into a Kubernetes cluster. In this model, the vendor can invest in building their management tools into the operator, and the customer can deploy it into an existing cluster, or the vendor can help automate the cluster setup.

The operator runs inside of the cluster, and RBAC can be used to update, manage and operate the software manually. However, this model has some downsides:

The operator is deployed inside the cluster, and cannot operate and manage the cluster itself when there are problems.
This limits the application to only Kubernetes and means that other applications that do not leverage Kubernetes cannot be managed with operators.
This requires continual updates to the operator itself, with each new management capability.

While teams with Kubernetes experience (and specifically comfort building operators) may find this model the easiest to set up, it does require the customer to manage the cluster and has significant permissions and robustness challenges related to it being deployed inside the same cluster as the application.

Nuon's Runner Model

Nuon's runner model attempts to take parts of each of the aforementioned approaches and balance them into a more secure and robust approach for BYOC vendors to build on.

Nuon's runner enables a toggleable set of permissions that allow the vendor to securely manage their app in the customer's account. The customer gets fine-grained control to enable/disable specific permissions and control over the security boundary.

The runner governs the application at the network, infrastructure and application layer and in some cases can be completely disabled by the customer.

The runner can apply policies and act as a trusted manager for the account, and can be deployed in the customer's account, a dedicated management account and more. The runner is completely auditable, and can be inspected by the customer to understand what operations it has taken to manage the software.

The runner is designed with security first to enable vendors to operate software in their customer's account and offers a balance of permissions, visibility and control.

Lifecycle Management

Software deployed into a customer's cloud account has 4 modes, or lifecycles:

Provision Mode - when the application is being configured. This involves a higher level of permissions than usual and can be disabled after the initial setup.
Deprovision Mode - for tearing the application down. This is disabled by default.
Maintenance Mode - for regular day-2 operations. This is a locked down role used for health-checks, basic debugging and updates.
Break Glass Mode - for extenuating circumstances where permissions need to be elevated to fix or mitigate issues.

Diagram showing a vendor-to-customer VPC deployment lifecycle. On the left, the "Vendor" box represents the vendor’s environment, and on the right, "Customer's VPC" represents the customer (StarFish Inc) environment. Both sides use Terraform, Helm, and Docker icons. The lifecycle flow between the two environments is illustrated by arrows connecting four modes below: Provision Mode (elevated access for setup/configuration), Maintenance Mode (maintenance access for updates/deployments), BreakGlass Mode (emergency access for fixes/incident response), and De-provision Mode (escalated access for resource cleanup/data removal). Each mode includes a description of authorized roles and typical actions. — Lifecycle modes for vendor-managed deployments in customer VPCs: from initial provisioning, ongoing maintenance, emergency breakglass access, to final de-provisioning and resource cleanup.

The runner is designed around these permission modes, and switches between them based on the lifecycle of the application. The runner can be granted both cloud and cluster access for each role depending upon the application lifecycle and what types of updates a vendor wants to make. For instance, a vendor could decide that no new infrastructure may be created during maintenance mode, while still allowing updates to some existing component infrastructure.
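The mode-based model can be sketched as a small lookup from lifecycle mode to permitted operations. This is an illustrative sketch only, not the runner's implementation: the permission names, the `LifecyclePermissions` class and the choice to start break glass disabled are all assumptions. In a real install each mode would map to cloud IAM and cluster roles.

```python
from enum import Enum


class Mode(Enum):
    PROVISION = "provision"
    DEPROVISION = "deprovision"
    MAINTENANCE = "maintenance"
    BREAK_GLASS = "break_glass"


# Hypothetical permission names per mode. Maintenance deliberately lacks
# "infra:create", mirroring the "no new infrastructure" example in the text.
MODE_PERMISSIONS = {
    Mode.PROVISION: {"infra:create", "infra:update", "app:deploy"},
    Mode.DEPROVISION: {"infra:delete"},
    Mode.MAINTENANCE: {"infra:update", "app:deploy", "app:read_logs"},
    Mode.BREAK_GLASS: {"infra:create", "infra:update", "app:deploy", "app:read_logs"},
}


class LifecyclePermissions:
    """Tracks which modes the customer has enabled and answers permission checks."""

    def __init__(self) -> None:
        # Deprovision is disabled by default, as described above.
        # Assumption: break glass also starts disabled until the customer enables it.
        self.enabled = {Mode.PROVISION, Mode.MAINTENANCE}

    def enable(self, mode: Mode) -> None:
        self.enabled.add(mode)

    def disable(self, mode: Mode) -> None:
        self.enabled.discard(mode)

    def is_allowed(self, mode: Mode, permission: str) -> bool:
        # A permission is granted only if the mode is enabled AND lists it.
        return mode in self.enabled and permission in MODE_PERMISSIONS[mode]


if __name__ == "__main__":
    perms = LifecyclePermissions()
    print(perms.is_allowed(Mode.MAINTENANCE, "infra:update"))
    print(perms.is_allowed(Mode.MAINTENANCE, "infra:create"))
    perms.disable(Mode.PROVISION)  # customer locks down after initial setup
    print(perms.is_allowed(Mode.PROVISION, "infra:create"))
```

Keeping the mode toggle separate from the permission table means a customer can switch a whole mode off without editing individual permissions.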

Since the runner is deployed as a VM inside the customer's cloud account, no cross account access is required, meaning the only thing that has these permissions is the runner itself. At any time, these permissions can be removed or disabled at the cloud account level, giving the customer more control to airgap or isolate their install.

Policy based execution

The runner is responsible for managing the software, and giving the vendor controls to manage the application once it is deployed. These jobs include things like:

syncing new images into the customer account
updating Terraform modules or Helm Charts
updating the cluster and its node groups
executing actions to fetch logs, rollback software and debug the application

The runner executes each job independently and can apply policies to limit provisioning and resources with finer-grained controls than network, cluster or cloud account permissions alone allow. The runner can manage policies inside the cluster and on the jobs or job plans themselves. By combining these primitives for each lifecycle, the runner model can create a more secure management experience for BYOC.
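As a rough illustration of a policy applied to a job plan, the sketch below checks a list of planned changes against the actions a policy allows. The `PlannedChange` type, the `violations` function and the resource type names are hypothetical; real policies would typically be evaluated against the actual plan output of the tool running the job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedChange:
    action: str         # "create", "update" or "delete"
    resource_type: str  # e.g. "aws_eks_node_group"


def violations(plan, allowed_actions, denied_resource_types=frozenset()):
    """Return human-readable reasons the plan breaks policy.

    An empty list means the plan may be applied.
    """
    problems = []
    for change in plan:
        if change.action not in allowed_actions:
            problems.append(f"{change.action} of {change.resource_type} is not allowed")
        elif change.resource_type in denied_resource_types:
            problems.append(f"{change.resource_type} may not be modified")
    return problems


if __name__ == "__main__":
    # A maintenance-style policy: updates only, no new infrastructure.
    plan = [
        PlannedChange("update", "aws_eks_node_group"),
        PlannedChange("create", "aws_iam_role"),
    ]
    print(violations(plan, allowed_actions={"update"}))
    # prints ['create of aws_iam_role is not allowed']
```

Because the check runs on the plan rather than on the credentials, it can reject a single change inside an otherwise permitted job.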

Auditability

The runner is designed to be auditable and allows a customer to inspect every action and operation it executes in their account. Since the runner is an independent virtual machine and is not deployed inside a Kubernetes cluster, it is able to manage the entire surface of the application. This means that a single consolidated view of the application, infrastructure and day-2 management jobs can be created.

Illustration showing “Visibility into Runner” for a customer (Big Bank Co.). Runner monitoring is broken into three components: Runner Logs (detailed runner job events in customer VPC), Health Checks (monitor runner instance health, status, deployment tags, metrics, shown with a bar graph of connectivity), and Runner Details (full runner operation history, including provision, maintenance, de-provision, breakglass audits, and completed job statuses). — Key aspects of runner observability in customer VPCs: view job logs, monitor instance health and connectivity, and access complete operation and audit histories for consistent deployment and maintenance oversight.

Along with policy-based execution, this gives the customer peace of mind (policies) and verification (audit log) of what is happening in their account. This ultimately leads to giving the vendor more control to operate the software and make it feel like SaaS.
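One way to picture the audit trail is as an append-only list of job records that the customer can filter by lifecycle mode or export. The sketch below is an assumption about shape only; the field names and the `AuditLog` class are hypothetical and do not describe the runner's actual log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str  # ISO 8601, UTC
    mode: str       # lifecycle mode the job ran under
    job: str        # e.g. "sync-images", "helm-upgrade"
    status: str     # "succeeded", "failed" or "denied"


class AuditLog:
    """Append-only record of every job executed in the customer's account."""

    def __init__(self) -> None:
        self._entries = []

    def record(self, mode: str, job: str, status: str) -> AuditEntry:
        entry = AuditEntry(datetime.now(timezone.utc).isoformat(), mode, job, status)
        self._entries.append(entry)
        return entry

    def entries(self, mode: Optional[str] = None) -> list:
        # Return a copy so callers cannot mutate the log itself.
        return [e for e in self._entries if mode is None or e.mode == mode]

    def export_jsonl(self) -> str:
        # One JSON object per line, convenient for shipping to a log store.
        return "\n".join(json.dumps(asdict(e)) for e in self._entries)


if __name__ == "__main__":
    log = AuditLog()
    log.record("maintenance", "fetch-logs", "succeeded")
    log.record("break_glass", "rollback", "succeeded")
    print(log.export_jsonl())
```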

Fine Grained Network Permissions

The runner is bootstrapped by the customer managed stack into a new VPC or a customer managed VPC (or network on other clouds). The runner is deployed first, so it can also preconfigure networks, routes and more (when granted the permissions by the customer).

Common architectures that can be supported include:

Deploying the runner into its own subnet for isolation.
Creating a new VPC that is dedicated for the application or using a customer-managed VPC.
Setting up a private-link, peering connection, or route-table.
Integrating with a customer managed hub-and-spoke network where the application networking is routed through a central hub.
Deploying behind a firewall.

Most customer environments have custom networking enabled, so the runner is designed to be flexible. For connectivity, the runner requires a single egress network rule, to a single API. The runner can be deployed in any environment and is designed around an egress-only model, so ingress into a customer environment is never required.
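An egress-only design is commonly implemented by having the agent poll for work over outbound requests, so nothing ever needs to connect in. The text does not specify the runner's protocol, so the loop below is a generic sketch under that assumption. The `fetch_job`, `execute` and `report` callables are hypothetical stand-ins for outbound API calls and local job execution.

```python
import time
from typing import Callable, Optional


def run_polling_loop(
    fetch_job: Callable[[], Optional[dict]],
    execute: Callable[[dict], str],
    report: Callable[[dict, str], None],
    max_polls: int,
    idle_seconds: float = 5.0,
) -> int:
    """Pull jobs using outbound calls only; returns the number of jobs executed."""
    executed = 0
    for _ in range(max_polls):
        job = fetch_job()  # outbound request to the control plane API
        if job is None:
            time.sleep(idle_seconds)  # nothing queued; wait before polling again
            continue
        status = execute(job)  # runs locally, inside the customer's account
        report(job, status)    # outbound request carrying the result
        executed += 1
    return executed


if __name__ == "__main__":
    # In-memory stand-ins for the control plane, for illustration only.
    queue = [{"id": 1, "type": "sync-images"}, None, {"id": 2, "type": "helm-upgrade"}]
    count = run_polling_loop(
        fetch_job=lambda: queue.pop(0) if queue else None,
        execute=lambda job: "succeeded",
        report=lambda job, status: print(job["id"], status),
        max_polls=3,
        idle_seconds=0,
    )
    print("jobs executed:", count)
```

Since every call originates from the runner, the customer's firewall only needs the single outbound rule described above.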

Runner Capabilities

The Nuon runner is designed to give the vendor everything they need to operate their software in a customer's account and make it feel like SaaS, all while giving the customer the right amount of controls to have peace of mind and a secure deployment.

Flexible Application Support

A critical requirement of the runner is that it is deployed externally from the software it manages. This makes it more robust (if the application has a problem, the runner is isolated and can still mitigate the problem) and enables more flexible architectures.

Diagram illustrating the operational flow between the Nuon Control Plane and the customer environment (Big Bank Co.). On the left, the Nuon Control Plane manages Terraform, Helm, and Docker-based applications. In the center, the Runner (with Lambda and Terraform icons) acts as a middleware, connecting the Nuon Control Plane to the customer’s VPC. On the right, the customer environment (Big Bank Co.) is shown running similar applications with an active Runner status, indicating integration. — Architecture depicting how the Nuon Control Plane interfaces with customer infrastructure through a Runner.

The runner is designed to manage network, cluster and cloud infrastructure. This allows vendors to deploy Serverless apps, integrate with new types of customer infrastructure and more. In other words, it's not just Kubernetes, and this gives the vendor more control over their application.

Robust Cluster Management

For apps that use Kubernetes, the runner can manage updating, autoscaling and monitoring the cluster. Since the runner is deployed externally, it is resilient to downtime of the cluster. This prevents the runner from being a noisy neighbor and enables things such as full lifecycle cluster management.

Some enterprise customers have standards around their Kubernetes environments and want to "bring their own cluster". In this model, they can grant access to the Nuon runner to manage applications in the cluster on their behalf. The runner can test the cluster capabilities before deploying the application and can have permissions removed as needed.
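Testing a customer-provided cluster before deploying can be thought of as a preflight check that compares what the cluster reports against what the application needs. The sketch below works on plain dictionaries so it stays self-contained. The requirement keys (`min_version`, `storage_classes`, `min_cpu`) are hypothetical, and a real check would gather the cluster facts from the Kubernetes API.

```python
import re


def parse_minor(version: str) -> tuple:
    """Turn 'v1.29.3', '1.29' or 'v1.29.3-eks-abc' into (1, 29)."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def preflight(cluster: dict, requirements: dict) -> list:
    """Return a list of unmet requirements; an empty list means the cluster passes."""
    failures = []
    if parse_minor(cluster["version"]) < parse_minor(requirements["min_version"]):
        failures.append(
            f"Kubernetes {requirements['min_version']}+ required, found {cluster['version']}"
        )
    missing = set(requirements.get("storage_classes", [])) - set(
        cluster.get("storage_classes", [])
    )
    for name in sorted(missing):
        failures.append(f"missing storage class: {name}")
    if cluster.get("allocatable_cpu", 0) < requirements.get("min_cpu", 0):
        failures.append("not enough allocatable CPU")
    return failures


if __name__ == "__main__":
    # Hypothetical cluster facts and application requirements.
    cluster = {"version": "v1.27.4", "storage_classes": ["gp2"], "allocatable_cpu": 8}
    requirements = {"min_version": "1.28", "storage_classes": ["gp2", "gp3"], "min_cpu": 4}
    for failure in preflight(cluster, requirements):
        print(failure)
```

Returning every failure at once, rather than stopping at the first, lets the customer fix their cluster in one pass.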

AWS architecture diagram showing a customer account with VPC containing an EKS cluster running Big Bank Co. application, connected to external dashboard, control plane API, and runner components via dashed lines. — AWS EKS cluster with IAM and RBAC roles enabling runner to manage Big Bank Co. application.

Sandboxed Remote Code Execution

The runner enables a secure and sandboxed remote code execution environment for vendors to manage applications in the customer account. The runner has audit-ability, policies and permissions built into each layer of the application - networking, app-level and cloud level.

This enables a vendor to define health-check actions, debug tools and other operational commands that can be used during maintenance or break-glass mode. The vendor can operate the software and ensure it is working, while giving the customer peace of mind about what is permitted.
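The idea of pre-defined actions gated by lifecycle mode can be sketched as follows. This sketch does not implement isolation itself (a real sandbox would add process, filesystem and network restrictions). It only shows an allowlisted command, a mode check and a timeout. The `Action` type and `run_action` function are hypothetical names for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    argv: tuple               # fixed command, never an ad hoc shell string
    allowed_modes: frozenset  # lifecycle modes in which the action may run
    timeout_seconds: float = 30.0


def run_action(action: Action, current_mode: str) -> dict:
    """Run a pre-registered action if the current mode permits it."""
    if current_mode not in action.allowed_modes:
        return {"action": action.name, "status": "denied", "output": ""}
    try:
        proc = subprocess.run(
            list(action.argv),
            capture_output=True,
            text=True,
            timeout=action.timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return {"action": action.name, "status": "timeout", "output": ""}
    status = "succeeded" if proc.returncode == 0 else "failed"
    return {"action": action.name, "status": status, "output": proc.stdout.strip()}


if __name__ == "__main__":
    # A stand-in health check that just prints "ok".
    health = Action(
        name="health-check",
        argv=(sys.executable, "-c", "print('ok')"),
        allowed_modes=frozenset({"maintenance", "break_glass"}),
    )
    print(run_action(health, "maintenance"))
    print(run_action(health, "provision"))
```

Because the command is fixed when the action is defined, the customer can review exactly what may run before enabling a mode.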

The Path Forward

We believe that the BYOC model is a balance between customer security and vendor control. Done right, this model should be more secure than Cloud SaaS or Self-Hosted. It should be as simple to use as a Cloud SaaS and the vendor should be able to focus on building and iterating on their product.

Nuon's runner is designed to serve as the contract for this balance - it enables secure but isolated and controlled access to the vendor, and has the primitives to enable a vendor to operate their application in an environment they do not have access to.

Today, we have found that our customers are able to create a better security posture with their customers, and unlock larger and more sensitive customers with our model than using cross account permissions, an operator, or a tunnel/ingress.

This is just the beginning. We have a huge list of features and controls we plan to build into the runner and share with you all!