Why

While building a service for cloud resource management, we noticed similarities between parts of our business logic and how Kubernetes operators function.

Our interest in Kubernetes made us curious about how custom operators are developed, so we used the opportunity to extract the core logic of our resource watcher and re-implement it as a custom operator.

For us, this project will serve as a foundation for a deeper dive into building solutions with custom Kubernetes resources.

What are the use cases

Since this started as something of a research project, learning from it was a priority. That continues to be our goal, and by making the project public we hope it can serve as another example of how Kubernetes operators can be used.

On the technical side, our main use case is managing cloud resources. Through a booking system, we can schedule machines to wake up, do some work, and then fall asleep again, which in effect cuts server costs and makes recurring computational jobs easier to set up.

In the future, this can extend to managing different types of resources across a wide range of providers.

How it works

The custom resource operator provides a friendly interface to manage cloud resources through bookings.

We start by grouping our cloud instances under a common tag name. Next, we need to create the resources we plan to manage on the cluster. There are two ways to do that:

  • Create a resource monitor which, depending on its type, will scan for tagged instances and automatically create their resource representations on the cluster.
  • Manually create resources on the cluster by applying the resource manifest.

Once we have resources, we can manage their state through bookings, which have a resource name, a start time, and an end time.

Example manifests can be found in the config/samples directory.


Check out Getting Started for a quick guide on how to use the operator.

To play with the operator

Prerequisites

Setup for your provider

Every provider might need a different initial setup. At the moment we support EC2 and RDS; check out the Integrations page to find out whether the service of your choosing is supported. For AWS, make sure to check the EC2 section for more details.

Try it locally

Grab the code

There are two ways to get the code:

  • Clone the repository
git clone https://github.com/kotaicode/resource-booking-operator.git
  • Pull the image
docker pull kotaicode/resource-booking-operator:latest

Run it

First, make sure a local Minikube cluster is up and running. If you don't have one yet, this will start one (assuming Minikube is already installed):
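
minikube start

With the cluster ready, we apply the CRD manifests with: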

make install

Then we are ready to run the operator with:

make run

At this point we should see log output from the running controller, indicating that things are functioning properly.

Next, we can move on to managing resources.

Run it on your cluster

Amazon Web Services

Permissions

In order for the operator to control EC2 or RDS instances, it needs permissions to start and stop them, along with a few other related actions. This sample policy document grants the necessary permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "allowec2",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:StartInstances",
                "ec2:DescribeTags",
                "ec2:StopInstances",
                "ec2:DescribeInstanceStatus",
                "ec2:DescribeTags",
                "ec2:CreateTags",
                "ec2:DeleteTags"
            ],
            "Resource": "*"
        },
        {
            "Sid": "allowrds",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances",
                "rds:StartDBInstance",
                "rds:StopDBInstance",
                "rds:ListTagsForResource",
                "rds:AddTagsToResource",
                "rds:RemoveTagsFromResource"
            ],
            "Resource": "*"
        }
    ]
}

Create a new policy with the permissions above and attach it to an IAM role that the operator can use, e.g. using IAM roles for service accounts.
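
For example, with the AWS CLI (the policy name is just a placeholder, and the JSON above is assumed to be saved as policy.json):

# policy name and file path are placeholders
aws iam create-policy \
    --policy-name resource-booking-operator \
    --policy-document file://policy.json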

Note: If your instances use KMS-encrypted EBS volumes, the operator also needs the kms:CreateGrant permission on the respective KMS keys.

Setting it up in EKS (Elastic Kubernetes Service)

  1. Create the required policy granting the permissions.

  2. Follow the AWS instructions to set up an IAM Role for the serviceaccount of the resource-booking-operator.

    Namespace of the operator is resource-booking-operator-system.
    Name of the serviceaccount is resource-booking-operator-controller-manager.

    The required trust-policy for the IAM role will look like this (replace $oidc-provider and $accountid with your values):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "arn:aws:iam::$accountid:oidc-provider/$oidc-provider"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "$oidc-provider:sub": "system:serviceaccount:resource-booking-operator-system:resource-booking-operator-controller-manager",
                        "$oidc-provider:aud": "sts.amazonaws.com"
                    }
                }
            }
        ]
    }
    
  3. Build and push the docker image to your container registry:

    make docker-build && make docker-push
    
  4. Build the manifests and deploy them to your cluster:

    make deploy
    

To make sure the resource booking operator is using the IAM role you created, you need to set the eks.amazonaws.com/role-arn annotation in the serviceaccount to the ARN of the role. Here's how to do this:

  • Find the ARN of the role you created.
  • Open the serviceaccount configuration for the resource booking operator.
  • Set the eks.amazonaws.com/role-arn annotation to the ARN of the role you found.
  • Save the changes to the serviceaccount configuration.

This will ensure that the resource booking operator is using the correct IAM role for its operations.
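
As a sketch with kubectl, using the namespace and serviceaccount names from above (the role ARN is a placeholder):

# replace the ARN with that of the role you created
kubectl -n resource-booking-operator-system annotate serviceaccount \
    resource-booking-operator-controller-manager \
    eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/resource-booking-operator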


For a general guide on how to control resources with bookings, head over to Managing Resources.

Integrations

We plan to expand the operator's functionality to control resources on other cloud providers. Our first integration is EC2.

EC2

Most often referred to as EC2, Amazon Elastic Compute Cloud provides scalable computing capacity in the Amazon Web Services (AWS) Cloud.

The only platform-specific action we need to perform before using EC2 instances is tagging them. Tags let us group multiple machines under a common, user-friendly name, and most importantly, they allow us to manage the machines in bulk.


Tagging instances is the first requirement to make our operator work. This action makes the instances visible to our codebase.

Configuring the AWS command-line interface is out of scope for this documentation. If you don't have it working already, head over to the official documentation for details.

The instances we’ll use for the operator need two tags to make them manageable:

  • resource-booking/managed - Used to mark the instance as managed by the operator.
  • resource-booking/application - The name of the resource to group the instances by.
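
If you prefer the raw AWS CLI over the make targets described below, setting both tags in one call looks roughly like this (the instance ID and resource name are placeholders; the managed tag takes true or false as its value):

# instance ID and tag values are placeholders
aws ec2 create-tags \
    --resources i-0123456789abcdef0 \
    --tags Key=resource-booking/managed,Value=true Key=resource-booking/application,Value=web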

Tagging using AWS CLI

The AWS command-line interface allows us to change our resources without wandering through the dashboard. We've extracted the parts responsible for instance tagging and listing into our Makefile to make this process a bit quicker.

List instances

Calling make list-instances lists our instances and shows which ones have the managed and resource tags. We can use it to keep track of our resources and spot instances that still need to be tagged.

$ make list-instances

-------------------------------------------------
|               DescribeInstances               |
+----------------------+----------+-------------+
|       Instance       | Managed  |  Resource   |
+----------------------+----------+-------------+
|  i-0b1b591a931567907 |  true    |  analytics  |
|  i-0073357fb586b5a74 |  None    |  None       |
|  i-09bf70d1832a14871 |  true    |  web        |
|  i-0e7f363c3871a249f |  true    |  analytics  |
|  i-0bf4736484ecdfef5 |  true    |  web        |
+----------------------+----------+-------------+

Mark as managed

Marking the instances as managed makes them visible to our tool. The make mark-managed command expects two parameters:

  • instances - a space-separated list of instance IDs
  • enable - Whether to set the instances as managed. Values can be true or false.

An example usage can be:

make mark-managed instances="i-09bf70d1832a14871 i-0bf4736484ecdfef5" enable=true

Create resource tags

We can add the resource tag directly to our instances using make tag-instances. The command expects two parameters:

  • instances - a space-separated list of instance IDs
  • tag - The resource name

An example usage can be:

make tag-instances instances="i-09bf70d1832a14871 i-0bf4736484ecdfef5" tag=web

Tagging using the AWS dashboard interface

We can also use the AWS dashboard to tag instances, but this is out of scope for this documentation. Head over to the official docs for more details.

Assuming a Role

We can assume a role by setting the AWS_ASSUME_ROLE_ARN environment variable with the ARN of the role we want to assume.

With this, we can make requests to AWS from the account that owns the role.
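
For example (the role ARN is a placeholder):

# replace with the ARN of the role you want to assume
export AWS_ASSUME_ROLE_ARN=arn:aws:iam::123456789012:role/resource-booking
make run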

RDS

Amazon Relational Database Service (RDS) is a managed service that makes it easy to set up, operate, and scale a relational database in the cloud.

The prerequisite work for booking RDS instances is the same as for EC2 instances. See the EC2 section for more information.
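
The make tagging targets shown in the EC2 section call the EC2 API; for RDS, the same two tags can be applied directly with the AWS CLI, roughly like this (the instance ARN and resource name are placeholders):

# RDS tagging works on the instance ARN; all values here are placeholders
aws rds add-tags-to-resource \
    --resource-name arn:aws:rds:eu-central-1:123456789012:db:analytics \
    --tags Key=resource-booking/managed,Value=true Key=resource-booking/application,Value=analytics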


Extending to another provider

Our operator has three custom resources: Resources, Resource Monitors, and Bookings. The following is general information on how they represent a running instance on a cloud provider.

Their example manifests can be found in the config/samples directory. Once we modify their details, we can directly apply them to the cluster.

Prerequisites

This section assumes that you've set up your cloud instances to be manageable by the operator. To make sure you've done that, check out tagging instances.

Adding Resources

We start with the resources. They represent one or more instances grouped by a tag name. Once we have tagged our resources per the information here, we are ready to create the resources on the cluster.

With Resource Monitor

A quick and easy way to represent tagged cloud instances as resources on the cluster is to use a resource monitor.

Resource monitors continuously scan the cloud provider for changes to the instances and apply those changes to the cluster. At the moment, only newly tagged instances that are not yet present on the cluster will trigger a change.

As seen in the sample manifest, they require just the type of a supported cloud resource.

apiVersion: manager.kotaico.de/v1
kind: ResourceMonitor
metadata:
  labels:
    app.kubernetes.io/name: resourcemonitor
    app.kubernetes.io/instance: ec2
    app.kubernetes.io/part-of: resource-booking-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: resource-booking-operator
  name: ec2
spec:
  type: ec2

We create the resource monitor on the cluster with kubectl:

kubectl apply -f config/samples/manager_v1_resourcemonitor.yaml

Once created, a resource monitor will populate the cluster with initial resources of the given type and will continue to scan for newly tagged instances until it is removed.
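
To verify that the monitor picked up our tagged instances, we can list the resources it created (assuming the CRD's plural name is resources):

kubectl get resources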

Manually

A more involved way of creating resources is applying their manifests directly to the cluster.

Initially, we can reuse the sample manifest in config/samples/manager_v1_resource.yaml that looks like this:

apiVersion: manager.kotaico.de/v1
kind: Resource
metadata:
  labels:
    app.kubernetes.io/name: resource
    app.kubernetes.io/instance: analytics
    app.kubernetes.io/part-of: resource-booking-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: resource-booking-operator
  name: ec2.analytics
spec:
  booked_by: ""
  booked_until: ""
  tag: analytics
  type: ec2

Note that the spec.booked_by and spec.booked_until fields need to start out empty, as this is our initially desired state; they are filled in by the controller only when there is an active booking.

We create the resource on the cluster with kubectl:

kubectl apply -f config/samples/manager_v1_resource.yaml

Creating Bookings

After we make sure that we have created a resource on the cluster, we can create a booking for it. The spec of a booking requires a resource name (the name of the Resource on the cluster, e.g. ec2.analytics) and the start and end times of the booking. The default datetime format we use is RFC3339.

The chosen time slot can start right away, which marks the booking status as IN PROGRESS and the resource as booked, or at some point in the future, which sets the booking status to SCHEDULED.

apiVersion: manager.kotaico.de/v1
kind: Booking
metadata:
  labels:
    app.kubernetes.io/name: booking
    app.kubernetes.io/instance: backup-jan10
    app.kubernetes.io/part-of: resource-booking-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: resource-booking-operator
  name: backup-jan10
spec:
  resource_name: ec2.analytics
  start_at: 2023-01-10T22:35:00Z
  end_at: 2023-01-10T22:45:00Z
  user_id: cd39ad8bc3

We create the booking on the cluster with kubectl:

kubectl apply -f config/samples/manager_v1_booking.yaml
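
To check whether the booking came up as IN PROGRESS or SCHEDULED, we can inspect its status (the exact status layout depends on the controller):

kubectl get booking backup-jan10 -o yaml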

Creating bookings on a schedule

For some purposes, creating bookings manually becomes cumbersome. This is where BookingSchedulers come in: they are a way to create bookings on a schedule. They require three fields:

  • spec.schedule - a cron expression that defines when the booking should be created (e.g. 0 0 * * * for every day at midnight)
  • spec.duration - the duration of the booking in minutes
  • spec.bookingTemplate - a template for the booking that will be created on the schedule, using the same fields that the booking resource expects.

A sample manifest looks like this:

apiVersion: manager.kotaico.de/v1
kind: BookingScheduler
metadata:
  labels:
    app.kubernetes.io/name: bookingscheduler
    app.kubernetes.io/instance: bookingscheduler-sample
    app.kubernetes.io/part-of: resource-booking-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: resource-booking-operator
  name: bookingscheduler-sample
spec:
  schedule: "0 0 * * *"
  duration: 20
  bookingTemplate:
    resource_name: ec2.analytics
    user_id: cd39ad8bc3

Under the hood, the schedulers create regular booking resources. They are just a scaffold for bookings, with extra capabilities for automation. The best way to debug a scheduler is to check the bookings it created. Note that:

  • Schedulers don't create future bookings upon creation; a single booking is created each time the schedule triggers.
  • Deleting a scheduler won't remove the bookings it created; it only stops new ones from being created.
  • Modifying a scheduler takes effect immediately: the scheduler's next execution uses the new values.
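
Listing the bookings on the cluster is usually enough to see what a scheduler has produced:

kubectl get bookings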

We create the scheduler on the cluster with kubectl:

kubectl apply -f config/samples/manager_v1_bookingscheduler.yaml

How do we watch for changes

Every change to a custom resource triggers its Reconcile controller function, which is responsible for updating the spec and status of the resource.

The Resource's Reconcile function runs every 30 seconds, which is needed to provide up-to-date information about the instances it watches over.

The Booking's Reconcile runs every minute, so that we can let the Resource know that there's an active booking, or that the currently active booking has finished. Finished bookings are not checked continuously: once a Booking is marked as FINISHED, its Reconcile is never called again.

Notifications

The operator has basic support for sending notifications. As of now, the only event that triggers a notification is a booking being 20 minutes from expiring, and the only supported notification type is email.

Configuration

Each type of notification might need a different set of credentials and configuration process.

Email

Here is the list of environment variables that are used to configure the email notifier:

SMTP_HOST, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, SMTP_SENDER
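
For example, a local run against a hypothetical SMTP server could be configured like this (all values are placeholders):

# all values are placeholders for your SMTP provider
export SMTP_HOST=smtp.example.com
export SMTP_PORT=587
export SMTP_USERNAME=operator
export SMTP_PASSWORD=secret
export SMTP_SENDER=operator@example.com
make run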

Adding notifications to a booking

To use a type of notification, we need to add the notifications field to the booking manifest. The notifications field is a collection of notifications to be sent, so in theory we can notify multiple recipients and even send different types of notifications at once.

Each item in the notifications collection needs a notification type and a recipient:

apiVersion: manager.kotaico.de/v1
kind: Booking
metadata:
  labels:
    app.kubernetes.io/name: booking
    app.kubernetes.io/instance: backup-jan10
    app.kubernetes.io/part-of: resource-booking-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: resource-booking-operator
  name: backup-jan10
spec:
  resource_name: ec2.analytics
  start_at: 2023-01-10T22:35:00Z
  end_at: 2023-01-10T22:45:00Z
  user_id: cd39ad8bc3
  notifications:
  - recipient: example@example.com
    type: email

The recipient is a string identifying who will receive the notification. It can be an email address, a phone number, a Slack channel, etc., depending on the notification type.

Writing a custom notifier

To write a custom notifier, we need to implement the Notifier interface:

type Notifier interface {
	Prepare(booking managerv1.Booking) Notifier
	Send() error
}

The Prepare method is used to prepare the notification to be sent. It receives the booking that triggered the notification and returns the notifier itself. It's usually used to fill the notification instance with the necessary data before sending it.

The Send method is where we call the external service to send the notification. It returns an error if the notification could not be sent.

Once we've implemented the new notifier, we need to add it to our factory method:

func NewNotifier(notification managerv1.Notification) (Notifier, error) {
	switch notification.Type {
	case "email":
		return &Email{Recipient: notification.Recipient}, nil
	default:
		return nil, errors.New("Notifier type not found")
	}
}

After we actually implement the functionality of the new type's methods, we are done. Now we can use our new notifier by adding it to the notifications field of the booking manifest.