
Network Restricted Databricks UCX Installation

Databricks' Unity Catalog Migration Assistant tool (UCX) requires network connectivity to GitHub to install the Labs project using the Databricks CLI. If you're working in a secure environment where network access to GitHub is restricted, you'll need to work around UCX's default installation requirements.

In this post, we'll walk you through installing UCX for the Databricks CLI without opening network access to GitHub.

Pre-Requisites

Make sure that you meet the minimum requirements for installing in your environment.

  1. A terminal environment with connectivity to your Databricks Workspaces.
  2. The Databricks CLI, v0.213 or higher.
    • We've tested this with version 0.224.1.
  3. The UCX Python dependencies for your target version are available in a PyPI mirror, or PyPI is not restricted.
  4. You can access the Databricks UCX project source code originating from the UCX releases page within your restricted network.
    • We have tested this process on UCX version 0.47.0.
    • Communicate with your security team to ensure the software is vetted and made available through your organization's preferred tooling.
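
The CLI version requirement above can be checked before you start. The following is a minimal shell sketch, not part of UCX or the Databricks CLI: the helper names parse_cli_version and version_ge are hypothetical, and it assumes that databricks --version prints a line containing a version such as v0.224.1 with purely numeric components.

```shell
#!/bin/sh
# Hypothetical helpers for checking the Databricks CLI version prerequisite.

# parse_cli_version TEXT: print the dotted version that follows a literal "v".
# Assumption: the version looks like "... v0.224.1" in the CLI's output.
parse_cli_version() {
  printf '%s\n' "$1" | sed -n 's/.*v\([0-9][0-9.]*\).*/\1/p'
}

# version_ge A B: succeed when dotted numeric version A is at least version B.
# Only the first three components are compared; missing ones count as 0.
version_ge() {
  for i in 1 2 3; do
    a=$(printf '%s.0.0' "$1" | cut -d. -f"$i")
    b=$(printf '%s.0.0' "$2" | cut -d. -f"$i")
    [ "$a" -gt "$b" ] && return 0
    [ "$a" -lt "$b" ] && return 1
  done
  return 0
}

# Example usage (requires the Databricks CLI on PATH):
#   installed=$(parse_cli_version "$(databricks --version)")
#   version_ge "$installed" 0.213 || echo "Databricks CLI v0.213 or higher is required" >&2
```

Pre-release suffixes (for example, a version ending in -dev) are not handled by this sketch and would need extra parsing.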

Installing UCX without Network Access to GitHub

Let's get started with the installation process.

  1. Download the compressed UCX source code
    • Download your UCX source code zip file.
  2. Decompress the UCX source code
    • Use a command-line tool, like unzip, which is available in the Databricks web terminal, to decompress the UCX source code: unzip v0.47.zip -d /path/to/ucx-source
  3. Navigate to the source directory
    • Change your working directory to the decompressed source code directory: cd /path/to/ucx-source
  4. Install local UCX release using the Databricks CLI
    • Run the installation command with your target workspace profile: databricks labs install . --profile <target_workspace_profile>
      • Replace <target_workspace_profile> with the name of your Databricks workspace profile.
    • Note:
      • If you don't set a Databricks workspace profile, the Databricks CLI will attempt to use your default Account profile for authentication.
      • If you are using a PyPI mirror, use the following command: PIP_INDEX_URL="<your organization's PyPI url>" databricks labs install . --profile <target_workspace_profile>
  5. Verify the Installation
    • After running the install command, follow the prompts to complete the installation.
    • Verify the installation by listing the installed Labs projects: databricks labs installed
    • The output lists the installed Labs projects, which will include the UCX project and the version you just installed, like so:
      Name  Description                            Version
      ucx   Unity Catalog Migration Toolkit (UCX)  v0.47.0

That's it! You've successfully installed UCX with the Databricks CLI without requiring network access to GitHub. After you run the install command in step 4, the Databricks CLI walks you through the typical UCX installation prompts.
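
The steps above can be collected into a small script. This is a sketch under stated assumptions, not an official installer: the function names run and install_ucx_offline and the DRY_RUN switch are hypothetical, and the only commands it issues are the ones shown in the steps. Setting DRY_RUN=1 prints the commands instead of running them, which can be handy for review before touching a workspace.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps; names here are illustrative only.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# install_ucx_offline ZIP SRC_DIR PROFILE [PIP_INDEX_URL]
install_ucx_offline() {
  zip_file=$1
  src_dir=$2
  profile=$3
  index_url=${4:-}

  # Step 2: decompress the UCX source code.
  run unzip "$zip_file" -d "$src_dir" || return 1

  # Steps 3-5 run in a subshell so the caller's directory and
  # environment stay unchanged.
  (
    if [ "${DRY_RUN:-0}" != "1" ]; then
      cd "$src_dir" || exit 1
    fi
    # Optional: point pip at a PyPI mirror.
    if [ -n "$index_url" ]; then
      PIP_INDEX_URL=$index_url
      export PIP_INDEX_URL
    fi
    run databricks labs install . --profile "$profile" || exit 1
    run databricks labs installed
  )
}
```

With DRY_RUN=1 exported, calling install_ucx_offline v0.47.zip /path/to/ucx-source <target_workspace_profile> prints the unzip, install, and installed commands in order without executing them.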

Closing Thoughts

This guide is a product of our experience helping customers migrate to Unity Catalog in secure and highly regulated environments. If you're curious about the issues we encountered and how we arrived at this solution, read on.

We first attempted to install UCX through a wheel file in a PyPI mirror, which failed because GitHub access was restricted. The error we encountered was:

Error: remote: read labs.yml from GitHub: Get "https://raw.githubusercontent.com/databrickslabs/UCX/v0.47.0/labs.yml": dial tcp: lookup raw.githubusercontent.com: no such host

It was clear that the Databricks CLI was still trying to reach GitHub before checking any PyPI repository for the UCX project. We decided to try to outsmart the Databricks CLI: we copied the directory of an already-installed UCX project from ~/.databricks/labs/ucx to a new location, then ran PIP_NO_BUILD_ISOLATION=0 databricks labs install ucx to prevent reinstalling build dependencies. Unfortunately, this also failed with the same error. We needed to understand how the Databricks CLI sets up the installation.

After reviewing how the Databricks CLI installs Labs projects, we discovered that the NewInstaller function begins by checking whether the project name is "." (if name == "."). This check offered us the best chance of installing UCX without requiring access to GitHub. When you specify . as the project name, the Databricks CLI installs the project from the local directory. Although intended for local development, this method is currently the easiest way to install UCX in a network-restricted environment using the Databricks CLI.
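
To make the behavior concrete, here is a toy shell model of that decision. It is not the Databricks CLI's actual implementation, and the function name resolve_install_source and its output labels are hypothetical; it only illustrates that the literal name . selects the current directory, while any other name leads to a GitHub lookup.

```shell
#!/bin/sh
# Toy model only: mirrors the described name check, not the CLI's real code.

# resolve_install_source NAME: print where a project with that name is read from.
resolve_install_source() {
  if [ "$1" = "." ]; then
    # A project name of "." means: install from the current working directory.
    printf 'local\n'
  else
    # Any other name is looked up remotely, which fails without GitHub access.
    printf 'github\n'
  fi
}
```

Here resolve_install_source . prints local, while resolve_install_source ucx prints github, which corresponds to the path that produced the labs.yml error earlier.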

We hope this unorthodox UCX installation guide can help you out until the Databricks CLI team adds native installation functionality for network-restricted environments. If you have any questions or want assistance with your Unity Catalog migration, please get in touch with us at Rearc.
