Data Plane

Monorepo for all Data Plane (DP) components.

Prerequisites

  • Python 3.11 (for now)
  • Pants

Setup

  • Install Pants, a powerful build system that handles virtually all of our code-related tasks. Run pants help goals to see what it can do.

  • Clone the repo and open it in your IDE of choice. If you're working on a single component, you may prefer to open just that subtree of the file system to reduce noise. Git and Pants commands will still work, but relative paths will be different.

  • If you're doing development work, create a branch and open a PR on GitHub to trigger GitHub Actions, which will typically take care of your deployment-related needs. Since the Control Plane is serverless, we spin up a new environment for each PR.

Structure

The monorepo structure follows EDR-6:

  • /infra: Infrastructure as code (i.e. CDK) for deployments.
  • /components: Each of the independent components of the Data Plane system.
  • /.github/workflows: The CI/CD workflows that run on GitHub Actions.
  • /docs: Markdown docs for each component.
  • /3rdparty: Pants-related dependency lockfiles (generally ignore these, unless adding or upgrading dependencies - see below).

There are also many top-level config files for configuring various tools.

Development principles

  1. Kubernetes health checks shouldn't check any internal Vespa logic. Kubernetes' controller should be limited to checking the health of Vespa via Vespa's own health check endpoints. If those checks pass, Vespa should be capable of reacting appropriately to any internal failures; Kubernetes only needs to step in when they fail. This matters because two control planes manage the cluster: both Kubernetes and Vespa control the state of nodes. This principle keeps their responsibilities clearly separated.
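A minimal sketch of a probe that follows this principle. It assumes the probe runs as a Kubernetes exec probe in the Vespa pod and that Vespa serves its health report at `http://localhost:8080/state/v1/health` (both are assumptions about the deployment, not something this README specifies). The node counts as healthy only when Vespa itself reports status `up`; nothing else is inspected.

```python
"""Sketch of a probe that defers entirely to Vespa's own health report.

Assumptions (not specified by this repo): the probe runs as a Kubernetes exec
probe in the Vespa pod, and Vespa serves its health report at
http://localhost:8080/state/v1/health.
"""
import json
import sys
import urllib.request

# Assumed endpoint; adjust the host/port to match the actual deployment.
VESPA_HEALTH_URL = "http://localhost:8080/state/v1/health"


def is_vespa_up(report: object) -> bool:
    """True only if Vespa's report says {"status": {"code": "up"}}.

    Nothing about Vespa's internal state (schemas, documents, content nodes)
    is checked here: reacting to internal failures is Vespa's job.
    """
    status = report.get("status") if isinstance(report, dict) else None
    return isinstance(status, dict) and status.get("code") == "up"


def probe(url: str = VESPA_HEALTH_URL, timeout: float = 2.0) -> int:
    """Return an exec-probe exit code: 0 if Vespa reports healthy, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            report = json.load(resp)
    except (OSError, ValueError):
        # Unreachable endpoint, HTTP error or unparseable body: report failure
        # and let Kubernetes step in.
        return 1
    return 0 if is_vespa_up(report) else 1


if __name__ == "__main__":
    sys.exit(probe())
```

Keeping the decision in one tiny function makes it easy to see that the probe adds no logic of its own on top of Vespa's report.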

Development

You can change one or more components and/or infra directories in the same PR.

For the time being, deployments are manual, so you can manage how changes are rolled out after merging to main. However, you should always keep the main branch ready for deployment to production so as not to block the release of other changes. Whatever testing you need to feel confident should be automated on your branch.

Adding new roots

To add a new Pants root for your component/infra, update the pants.toml file in the top-level directory by adding the new path to the root_patterns variable in the [source] section.

[source]
root_patterns = [
...
"/path/to/new/root",
]

Then run:

pants tailor

Once the BUILD files are configured, add a new entry to the [python.resolves] section of pants.toml. This tells Pants where to write the component's dependency lockfile; generating the lockfile is covered in the next section.

[python.resolves]
my_component = "3rdparty/python/my_component.lock"

Changing dependencies

To add or update dependencies for a given component, first update the component's requirements.txt, then tell Pants to regenerate that component's lockfile under 3rdparty. For example, to add or update dependencies for the index_workflows component:

pants generate-lockfiles --resolve=index_workflows

These lockfiles must be committed to the repository, because CI uses them to resolve each component's dependencies.

Resolving shared dependencies

Sometimes Pants can get confused when multiple components depend on the same package at different versions. To keep each component's dependency resolve consistent, add the following to the main BUILD file at each component's root.

For example, in ./components/my_component/BUILD:

__defaults__(
all=dict(resolve="my_component"),
)

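This tells Pants to use the my_component resolve for every target in this directory and its subdirectories that has a resolve field, so the component's imports are resolved against its own lockfile.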
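To spot the situation described above (the same package pinned to different versions by different components), a rough sketch can compare exact `==` pins across components' requirements.txt files. It is a simplified illustration, not how Pants resolves dependencies: only `name==version` lines are understood, and the component names in the usage are placeholders.

```python
"""Sketch: find packages pinned to different versions by different components.

Only simple `name==version` lines are understood: comments, blank lines, extras
and other specifiers (>=, ~=, ...) are skipped, and any environment marker after
`;` is ignored.
"""
import re
from collections import defaultdict

_PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def normalize(name: str) -> str:
    """PEP 503-style name normalization, so `Foo_Bar` and `foo-bar` match."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_pins(requirements: dict[str, str]) -> dict[str, dict[str, str]]:
    """Map package -> {component: version} for packages with >1 distinct version.

    `requirements` maps a component name to the text of its requirements.txt.
    """
    pins: dict[str, dict[str, str]] = defaultdict(dict)
    for component, text in requirements.items():
        for line in text.splitlines():
            match = _PIN.match(line)
            if match:
                pins[normalize(match.group(1))][component] = match.group(2)
    return {
        pkg: by_component
        for pkg, by_component in pins.items()
        if len(set(by_component.values())) > 1
    }
```

Any package this reports is a candidate for the confusion that per-component resolves (via `__defaults__`) are meant to avoid.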

Testing

Once a new component is set up correctly, you can use the following commands to run its unit tests:

# Run all tests in the repository.
pants test ::

# Run all the tests in this directory.
pants test components/my_component:

# Run just the tests in this file.
pants test components/my_component/my_test_1.py

# Run just one test.
pants test components/my_component/my_test_1.py -- -k test_1
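For reference, a hypothetical test module these commands could pick up. Pants runs Python tests with pytest, and arguments after `--` are passed through to pytest, so `-k test_1` selects tests whose names match the expression `test_1`. Pants' default `python_tests` globs include `test_*.py` and `*_test.py`, so the file is named `test_my_module.py` here; `slugify` is a made-up stand-in for component code.

```python
"""components/my_component/test_my_module.py (hypothetical).

In a real component, `slugify` would be imported from the component's own
package; it is defined inline here so the example is self-contained.
"""
import re


def slugify(title: str) -> str:
    """Hypothetical code under test: lowercase and join words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def test_1() -> None:
    # Selected by `pants test ... -- -k test_1`.
    assert slugify("Hello, World") == "hello-world"


def test_2() -> None:
    # Not selected by `-k test_1`: its name does not contain "test_1".
    assert slugify("  Data   Plane ") == "data-plane"
```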

Lint and Formatting

This repo uses the ruff pants backend for our lint and formatting checks.

You can use the pants lint command to run lint against:

  • all components (pants lint ::),
  • a particular component (pants lint ./components/my_component/::),
  • or a specific file (pants lint ./components/my_component/my_specific_file.py).

Always run pants fmt before you push a commit, to autoformat your code and keep formatting consistent.

If lint reports violations that ruff can fix automatically (e.g. unused imports), run pants fix to apply those fixes. Syntax errors still have to be fixed by hand.

Both pants fmt and pants fix follow the same argument structure as pants lint.
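One way to make the fmt step hard to forget is a git pre-push hook. The sketch below is hypothetical (nothing in this repo installs it): it assumes `pants` is on PATH, that origin/main is the right base branch, and that it is saved as `.git/hooks/pre-push`. It uses Pants' `--changed-since` option so only changed files are formatted and linted.

```python
#!/usr/bin/env python3
"""Sketch of a git pre-push hook (.git/hooks/pre-push).

Assumptions: `pants` is on PATH and origin/main is the base branch.
The arguments and stdin git passes to pre-push hooks are ignored here.
"""
import subprocess
import sys


def pants_commands(base: str = "origin/main") -> list[list[str]]:
    """Build fmt and lint invocations limited to files changed since `base`."""
    changed = f"--changed-since={base}"
    return [["pants", changed, "fmt"], ["pants", changed, "lint"]]


def main(base: str = "origin/main") -> int:
    fmt_cmd, lint_cmd = pants_commands(base)
    if subprocess.run(fmt_cmd).returncode != 0:
        print("pre-push: pants fmt failed", file=sys.stderr)
        return 1
    # fmt edits the working tree, not the commits being pushed, so stop if any
    # tracked file now differs (this also trips on pre-existing uncommitted edits).
    if subprocess.run(["git", "diff", "--quiet"]).returncode != 0:
        print("pre-push: uncommitted changes (possibly from fmt); commit them first", file=sys.stderr)
        return 1
    if subprocess.run(lint_cmd).returncode != 0:
        print("pre-push: pants lint failed; try pants fix on the reported files", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```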