Network Restricted Databricks UCX Installation
A guide on installing UCX on Databricks CLI without opening up a restricted network to allow external services, e.g., GitHub access.
Have you ever written Terraform code and found that some of its code paths covered edge cases that only showed up in production? Or perhaps written a reusable Terraform module and needed some way to test it and prove it worked as expected? When you write software, you're usually expected to write tests that prove it works as intended and is relatively bug free. With Infrastructure as Code, testing is often an afterthought, if it is considered at all, which makes it difficult to add later. With tools such as Kitchen (also known as Test Kitchen), Inspec, and Kitchen Terraform (a plugin for Kitchen), you can follow test-driven development practices, improve quality, prove functionality, and document your Terraform.
In this post we'll show you how to add testing to your Terraform which you can incorporate into your workflow, by working through a reusable Terraform Module example for managing AWS VPCs.
All examples, a Dockerfile, and helper scripts are available here.
We've provided a Dockerfile and a helper script to ease the setup of all the tools you'll need while following this guide, but we'll still walk through getting everything set up.
If you've checked out our git repo from above, there's a Dockerfile provided. It should provide a suitable execution environment for running our tests.
cd kitchen-terraform
docker build -t kitchen-terraform:latest .
mkdir kitchen-terraform && cd kitchen-terraform
Package | Version (at time of writing) | Notes |
---|---|---|
ruby | 3.1.3p185 | |
gem | 3.3.26 | |
bundler | 2.3.26 | |
kitchen | 3.4.0 | |
inspec | 4.56.20 | |
kitchen-terraform | 6.1.0 | |
terraform | v1.3.6 | |
aws-cli | 2.9.4 | |
bash | >=4.0.0 | Needed for running validation scripts below |
You'll also need to have an IAM user with programmatic access in AWS with permissions to create and destroy the following resources: VPCs, Subnets, and NAT Gateways.
It's also advisable to have a read-only user with programmatic access for validation with Inspec.
Ruby
brew install ruby
Terraform
brew install terraform
AWS CLI
brew install awscli
Bash
brew install bash
Bundler
gem install bundler
Create a Gemfile
cat>Gemfile<<EOF
# frozen_string_literal: true
source "https://rubygems.org/"
gem "kitchen-terraform", "~> 6.1"
gem "inspec-bin", "~>4.56.20"
EOF
Install Kitchen-Terraform and Inspec via Bundler
bundle install
cat>validate_deps.sh<<'EOF'
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
# Expected version of each command-line tool
declare -A DEPS
DEPS=( [ruby]='3.1.3'
       [terraform]='v1.3.6'
       [aws]='2.9.4'
       [inspec]='4.56.20'
     )
for dep in "${!DEPS[@]}"; do
  printf "checking for %s version %s..." "$dep" "${DEPS[$dep]}"
  if ! command -v "$dep" >/dev/null; then
    # Skip the version check when the tool is missing
    printf "\n\tUnable to find %s...\n" "$dep"
    continue
  fi
  _version="$("$dep" --version)"
  [[ "$_version" =~ "${DEPS[$dep]}" ]] || printf "\n\t%s installed but version is %s...\n" "$dep" "$_version"
  printf "Done.\n"
done
printf "checking for kitchen-terraform..."
# grep may match nothing; "|| true" keeps pipefail from aborting the script
kitchen_terraform_ver="$(gem list | grep kitchen-terraform | sed -E 's/.* \(([0-9]+\.[0-9]+\.[0-9]+)\)/\1/' || true)"
[[ "$kitchen_terraform_ver" =~ "6.1.0" ]] || printf "\n\tkitchen-terraform is installed but version is %s...\n" "$kitchen_terraform_ver"
printf "Done.\n"
EOF
bash ./validate_deps.sh
checking for ruby...Done.
checking for terraform...Done.
checking for aws...Done.
checking for inspec...Done.
checking for kitchen-terraform...Done
Now that we have our tools set up, we can begin developing our Terraform module.
This example is available under examples/aws/simple_vpc.
The location for our Inspec tests is test/integration/<test suite>. Inside that directory we'll have the following subdirectories: controls and fixtures.
mkdir -p test/integration/default/{controls,fixtures}
Inspec is a framework that we'll use for testing our Terraform code. It uses a Domain Specific Language (DSL) similar to Ruby's RSpec, so it reads close to natural language and is easy to write and read. Inspec is configured via Profiles, which organize the tests and allow for the creation of complex test suites.
cat>test/integration/default/inspec.yml<<EOF
---
name: AWS_VPC_Default
title: AWS Default InSpec Profile
maintainer: Rearc Engineering
copyright: Rearc LLC
copyright_email: engineering@rearc.io
license: Apache-2.0
summary: >-
  An integration test profile for validation of creation of
  AWS Virtual Private Clouds
version: 0.1.0
inspec_version: ">= 2.2.7"
depends:
  - name: inspec-aws
    url: https://github.com/inspec/inspec-aws/archive/main.tar.gz
supports:
  - platform: aws
EOF
Let's look at the different fields in this file. Taken from here:
name: specifies a unique name for the profile. Always required.
title: specifies a human-readable name for the profile.
maintainer: specifies the profile maintainer.
copyright: specifies the copyright holder.
copyright_email: specifies support contact information for the profile, typically an email address.
license: specifies the license for the profile.
summary: specifies a one-line summary for the profile.
description: specifies a multiple-line description of the profile.
version: specifies the profile version.
inspec_version: places SemVer constraints on the version of Chef InSpec that the profile can run under.
supports: specifies a list of supported platform targets.
depends: defines a list of profiles on which this profile depends. This field is required when testing public cloud resources, as we have to specify the additional profile that extends the Inspec DSL with the verbs and nouns needed to write tests for the resources we wish to test.
inputs: defines a list of inputs you can use in your controls.
gem_dependencies: specifies a list of gem dependencies that must be installed for the profile to function correctly.
Now we'll write our first test. Inspec tests should go in test/integration/<test suite>/controls/<control>.rb
aws_region = input('output_region')
vpcs = input('output_vpcs')
title 'Default Section'
control 'aws-virtual-network' do
  impact 'critical'
  title 'Inspect Virtual Private Clouds' # A human-readable title
  vpcs.each do |_, vpc|
    describe aws_vpc(aws_region: aws_region, vpc_id: vpc[:vpc_id]) do
      it { should exist }
    end
  end
end
Let's break this down.
In the first part we set some variables by calling input() and passing the name of the Terraform output.
The kitchen-terraform plugin automatically maps all the Terraform outputs to Inspec inputs named output_<variable name>.
aws_region = input('output_region')
vpcs = input('output_vpcs')
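To make the naming convention concrete, here is a minimal plain-Ruby sketch of the mapping described above. It is only an illustration and not kitchen-terraform's actual code: the helper name outputs_to_inputs and the sample values are our own assumptions, and the sample hash is merely shaped like the JSON that terraform output -json prints.

```ruby
# Hypothetical Terraform outputs, shaped like `terraform output -json`.
terraform_outputs = {
  'region' => { 'value' => 'us-east-1' },
  'vpcs'   => { 'value' => { 'kitchen-terraform' => { 'vpc_id' => 'vpc-0123456789abcdef0' } } }
}

# Illustrative helper (not part of kitchen-terraform): prefix each output
# name with "output_" to get the name of the matching Inspec input.
def outputs_to_inputs(outputs)
  outputs.each_with_object({}) do |(name, details), inputs|
    inputs["output_#{name}"] = details['value']
  end
end

inspec_inputs = outputs_to_inputs(terraform_outputs)
puts inspec_inputs.keys.inspect
```

So an output named region in the fixture is what the control reads with input('output_region').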
The next block is the actual test where we test an attribute about the resource that is created with terraform.
control 'aws-virtual-network' do
  impact 'critical'
  title 'Inspect Virtual Private Clouds' # A human-readable title
  vpcs.each do |_, vpc|
    describe aws_vpc(aws_region: aws_region, vpc_id: vpc[:vpc_id]) do
      it { should exist }
    end
  end
end
First we define the control block, aws-virtual-network.
control 'aws-virtual-network' do
From the Inspec Glossary:
The control keyword is used to declare a control block. Here, the word ‘control’ means a ‘regulatory control, recommendation, or requirement’ - not a software engineering construct. A control block has a name (which usually refers to the assigned ID of the regulatory recommendation it implements), metadata such as descriptions, references, and tags, and finally groups together related describe blocks to implement the checks.
Now we set the impact for the control, which can be a string or a number.
impact 'critical'
Valid strings are none, low, medium, high, and critical.
Valid numeric ranges:
0.0 to <0.01 these are controls with no impact, they only provide information
0.01 to <0.4 these are controls with low impact
0.4 to <0.7 these are controls with medium impact
0.7 to <0.9 these are controls with high impact
0.9 to 1.0 these are critical controls
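The ranges above can be sketched as a small plain-Ruby helper. This is only an illustration of the boundaries listed here, not code from Inspec itself, and the method name impact_label is our own.

```ruby
# Illustrative helper (not an Inspec API): map a numeric impact in the
# range 0.0..1.0 to the label of the range it falls into.
def impact_label(score)
  raise ArgumentError, 'impact must be between 0.0 and 1.0' unless (0.0..1.0).cover?(score)

  if score < 0.01
    'none'
  elsif score < 0.4
    'low'
  elsif score < 0.7
    'medium'
  elsif score < 0.9
    'high'
  else
    'critical'
  end
end

puts impact_label(0.95)
```

Each upper bound is exclusive, so 0.4 is already medium and 0.9 is already critical.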
Next comes a statement that sets up a loop over the VPCs output by the test fixture.
vpcs.each do |_, vpc|
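If the two block arguments look odd, this plain-Ruby sketch shows what the loop walks over. The sample data is made up: we assume the vpcs input is a hash keyed by VPC name whose values hold the attributes of each VPC, and the ID shown is a placeholder.

```ruby
# Made-up sample of what input('output_vpcs') might hold: one entry per VPC,
# keyed by the VPC name, with that VPC's attributes as the value.
vpcs = {
  'kitchen-terraform' => { vpc_id: 'vpc-0123456789abcdef0', vpc_cidr_block: '10.0.0.0/16' }
}

vpc_ids = []
# The key (the VPC name) is not needed, so it is bound to "_" and ignored.
vpcs.each do |_, vpc|
  vpc_ids << vpc[:vpc_id]
end

puts vpc_ids.inspect
```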
Following that we have a describe block.
describe aws_vpc( aws_region: aws_region, vpc_id: vpc[:vpc_id] ) do
From the Inspec Glossary
The describe keyword is used with a describe block to refer to a Chef InSpec resource. You use the describe keyword along with the name of a resource to enclose related tests that apply to the resource. Multiple describe blocks are usually grouped together in a control, but you can also use them outside of a control.
A complete list of resources supported by Inspec is here
Finally we get to a test.
A describe block must contain at least one matcher, but may contain as many as required.
it { should exist }
A complete list of supported matchers/tests is here.
Finally, don't forget to close out all the blocks with end statements.
    end
  end
end
Now that we've got some tests, it'd be great if we could automate them in some fashion. That's where Kitchen comes in.
From the official documentation,
Use a kitchen.yml file to define what is required to run Test Kitchen, including drivers, provisioners, platforms, and test suites.
cat>kitchen.yml<<EOF
---
provisioner:
  name: terraform
verifier:
  name: terraform
platforms:
  - name: aws
suites:
  - name: default
    driver:
      name: terraform
      root_module_directory: test/integration/default/fixtures
    verifier:
      systems:
        - name: aws
          fail_fast: true
          backend: aws
EOF
Now let's take a deeper look at this file.
provisioner:
  name: terraform
Provisioner says how the tests will be set up/provisioned. Here we're setting it to terraform so as to use the kitchen-terraform plugin.
A more in-depth overview of the provisioner options can be found here
verifier:
  name: terraform
Verifier specifies which application to use when running tests, such as Inspec. In this case we're using the kitchen-terraform verifier, which calls Inspec.
Here is a more complete look at all the options available for the verifier.
platforms:
  - name: aws
Platforms are used to define attributes that are common to the collection of test suites. We're setting it to aws here since that's the cloud provider we're looking at. All of our test suites will be suffixed with this name.
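As a rough plain-Ruby illustration of that suffixing (this is not Test Kitchen's code, and the second suite name below is hypothetical), every suite is paired with every platform and the two names are joined with a hyphen:

```ruby
# "default" and "aws" come from kitchen.yml; "multi_vpc" is a hypothetical
# second suite added only to show the pairing.
suites = ['default', 'multi_vpc']
platforms = ['aws']

# Pair every suite with every platform and join the names with "-".
instances = suites.product(platforms).map { |suite, platform| "#{suite}-#{platform}" }

puts instances
```

With the single suite and platform in our kitchen.yml, this gives one instance named default-aws.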
suites:
  - name: default
    driver:
      name: terraform
      root_module_directory: test/integration/default/fixtures
    verifier:
      systems:
        - name: aws
          fail_fast: true
          backend: aws
Suites is where the majority of the automation configuration takes place.
First we set a name for the suite; this must map one-to-one with the directory path we set up earlier.
Next we specify the test driver which will run our test fixture creating the test resources. We pass the path to the fixture used to instantiate the terraform module we are testing, think of it like an example use of your terraform module.
Also keep in mind that if you're testing a "root" terraform module or at least something more complete than a reusable module, as we're building here, then you don't need to create fixtures and simply passing in input variables will be sufficient.
A more complete look at the driver configuration.
Finally we set up the verifier to have a system, provide some configuration ensuring we fail at the first error, and set the Inspec backend to use AWS.
The verifier options can be found here
Now that we've got our testing in place, let's see it in action. We're going to add our Terraform module.
This is the Terraform that we are going to be testing. The design is to be a reusable Terraform Module for building one or more Virtual Private Clouds in AWS.
This module is pretty simple: it takes in a list of objects, which define the VPCs to be built, and then passes them to the AWS VPC module from the Terraform Registry.
Pay particular attention to the outputs. These values are what will be passed to Inspec as inputs via kitchen-terraform, specifically those from the fixture.
terraform {
required_version = "~>1.3.0, <2.0.0"
}
module "this" {
for_each = local.vpcs
source = "terraform-aws-modules/vpc/aws"
version = "3.16.1"
name = each.value.name
cidr = each.value.cidr
azs = each.value.azs
private_subnets = each.value.private_subnets
public_subnets = coalesce(each.value.public_subnets, [])
enable_nat_gateway = coalesce(each.value.enable_nat_gateway, false)
enable_vpn_gateway = coalesce(each.value.enable_vpn_gateway, false)
single_nat_gateway = coalesce(each.value.single_nat_gateway, false)
enable_ipv6 = coalesce(each.value.enable_ipv6, false)
public_subnet_tags = coalesce(merge(each.value.public_subnet_tags, var.tags), tomap({}))
vpc_tags = coalesce(merge(each.value.vpc_tags, var.tags), tomap({}))
tags = coalesce(merge(each.value.tags, var.tags), tomap({}))
}
output "vpc" {
value = module.this
}
variable "vpcs" {
type = list(object({
name = string
cidr = string
azs = list(string)
private_subnets = list(string)
public_subnets = optional(list(string))
enable_nat_gateway = optional(bool)
enable_vpn_gateway = optional(bool)
single_nat_gateway = optional(bool)
enable_ipv6 = optional(bool)
public_subnet_tags = optional(map(string))
vpc_tags = optional(map(string))
tags = optional(map(string))
}))
}
variable "tags" {
type = map(string)
default = null
}
locals {
vpcs = { for i, v in var.vpcs : v.name => v }
}
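For readers more at home in Ruby than HCL, the locals block and the coalesce() calls in the module above can be read roughly like the sketch below. It is an analogy under stated assumptions rather than an exact translation: the sample data and the second VPC are made up, Terraform's coalesce also skips empty strings, and Terraform rejects duplicate keys in a for expression where Ruby silently keeps the last one.

```ruby
# Made-up input mirroring the shape of var.vpcs (only a few attributes shown).
vpcs = [
  { name: 'kitchen-terraform', cidr: '10.0.0.0/16', enable_nat_gateway: nil },
  { name: 'second-vpc', cidr: '10.1.0.0/16', enable_nat_gateway: true }
]

# Rough equivalent of: { for i, v in var.vpcs : v.name => v }
vpcs_by_name = vpcs.to_h { |vpc| [vpc[:name], vpc] }

# Rough equivalent of coalesce(value, default): the first non-nil argument.
def coalesce(*values)
  values.compact.first
end

vpcs_by_name.each do |name, vpc|
  puts "#{name}: enable_nat_gateway=#{coalesce(vpc[:enable_nat_gateway], false)}"
end
```

Keying the VPCs by name is what lets for_each create one instance of the registry module per VPC, and coalesce is what turns an omitted optional attribute into a concrete default.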
This is a Terraform module which will supply test data to our VPC module. It is also rather simple, and not dynamic. The point is to have predictable data that we can test for. This module is also a good source for documentation on how to use the VPC module.
The following TF files should go in test/integration/default/fixtures
terraform {
required_version = "~>1.3.0, <2.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~>4.0"
}
}
}
provider "aws" {
region = var.region
access_key = var.access_key
secret_key = var.secret_key
token = var.token
}
module "test" {
source = "../../../../"
vpcs = [{
name = "kitchen-terraform"
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_ipv6 = false
enable_nat_gateway = false
single_nat_gateway = true
tags = {
Terraform = "True"
Kitchen = "True"
Workspace = terraform.workspace
}
}]
}
output "terraform_workspace" {
value = terraform.workspace
}
output "terraform_state" {
description = "The path to the backend state file"
value = "${path.module}/terraform.tfstate.d/${terraform.workspace}/terraform.tfstate"
}
output "region" {
value = var.region
}
output "vpcs" {
value = module.test.vpc
}
variable "secret_key" {}
variable "token" { default="" }
variable "access_key" {}
variable "region" {}
Whew, we've finally made it to actually running this Rube Goldberg machine.
We highly suggest using separate users for the management of the test resources and the validation, with the validation user only having read-only permissions.
cat>.env<<EOF
# User with permissions to manage VPCs, Subnets, etc.
export TF_AWS_ACCESS_KEY_ID="<Access Key>"
export TF_AWS_SECRET_ACCESS_KEY="<Secret Access Key>"
# Uncomment if using MFA
# export TF_AWS_SESSION_TOKEN="<Session Token>"
# Read-only user used to validate Resources
export INSPEC_AWS_ACCESS_KEY_ID="<Access Key>"
export INSPEC_AWS_SECRET_ACCESS_KEY="<Secret Access Key>"
export AWS_REGION="us-east-1" # change to a region that works best for you
EOF
If using the same user to both manage and validate, set the TF_VAR_* variables to the same values as the respective AWS_* variables
export TF_VAR_access_key="<Access Key>"
export TF_VAR_secret_key="<Secret Access Key>"
# If using MFA
# export TF_VAR_token="<Session Token>"
# Read-only User
export AWS_ACCESS_KEY_ID="<Access Key>"
export AWS_SECRET_ACCESS_KEY="<Secret Access Key>"
export AWS_REGION="us-east-1" # change to a region that works best for you
# The fixture's "region" variable has no default, so it needs a value too
export TF_VAR_region="$AWS_REGION"
./run.sh test
kitchen test
You can see in the demo above that kitchen will automatically run terraform workspace new <test suite>, terraform workspace select <test suite>, terraform init, terraform validate, and terraform apply against the test fixture module.
Once the resources are built, it calls Inspec to run the controls and validate that the resources are what we expect them to be.
Finally it runs terraform destroy, terraform workspace select default, and terraform workspace delete <test suite>.
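The sequence just described can be written down as a small plain-Ruby table, which is handy as a checklist when reading the kitchen output. The phase labels build, verify, and cleanup are our own grouping and not Kitchen terminology; the commands are the ones listed above.

```ruby
# Phase labels are our own grouping; the steps are those described above.
LIFECYCLE = {
  build: [
    'terraform workspace new <test suite>',
    'terraform workspace select <test suite>',
    'terraform init',
    'terraform validate',
    'terraform apply'
  ],
  verify: [
    'run the Inspec controls'
  ],
  cleanup: [
    'terraform destroy',
    'terraform workspace select default',
    'terraform workspace delete <test suite>'
  ]
}.freeze

# Print every step prefixed with the phase it belongs to.
LIFECYCLE.each do |phase, steps|
  steps.each { |step| puts "#{phase}: #{step}" }
end
```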
Taking a look at the output just from the controls:
Profile: AWS Default InSpec Profile (AWS_VPC_Default)
Version: 0.1.0
Target: aws://us-east-1
✔ aws-virtual-network: Inspect Virtual Private Clouds
✔ VPC vpc-0d6060a4d30dc4883 in us-east-1 is expected to exist
Profile: Amazon Web Services Resource Pack (inspec-aws)
Version: 1.83.53
Target: aws://us-east-1
No tests executed.
Profile Summary: 1 successful control, 0 control failures, 0 controls skipped
Test Summary: 1 successful, 0 failures, 0 skipped
From this you can see that our single test is successful. The second profile is from inspec-aws. It only provides the extended resources and matchers for testing AWS resources, with no actual tests of its own.
Running the complete test is great for incorporating kitchen into your CI/CD pipeline, but kitchen is also really handy as part of a development workflow. For that there are a few kitchen commands you'll want to be familiar with: converge, verify, and destroy.
Let's take a look at each and add an additional check to the control for our simple VPC.
Change the current control to be the following:
aws_region = input('output_region')
vpcs = input('output_vpcs')
title 'Default Section'
control 'aws-virtual-network' do
  impact 'critical'
  title 'Inspect Virtual Private Clouds' # A human-readable title
  vpcs.each do |_, vpc|
    describe aws_vpc(aws_region: aws_region, vpc_id: vpc[:vpc_id]) do
      it { should exist }
      its('cidr_block') { should cmp '10.0.0.0/16' }
    end
  end
end
Here we'll ensure that the VPC as built ended up with the network we specified in the fixture. Instead of '10.0.0.0/16' you can also use the variable vpc[:vpc_cidr_block], which holds the value that Terraform provided in the outputs.
Now let's build our test resources by using the converge command
./run.sh converge
kitchen converge
Now that we have some test resources built, let's verify them by running our newly updated controls.
./run.sh verify
kitchen verify
Now that we've verified our resources, let's clean up.
./run.sh destroy
kitchen destroy
As you can see, using kitchen and inspec to test our Terraform in an automated fashion is very powerful. We've shown how it can act as a means for validation, proving your code does what it is intended to do. It can provide a means of documentation by showing how the Terraform module is expected to be used. Finally, it can act as a safety net ensuring that as you develop your Terraform modules you aren't making unintended or breaking changes.
This post is just the tip of the iceberg of what you can do with these tools, but we hope you find this introduction valuable.