Databricks' Unity Catalog Migration Assistant tool (UCX) requires network connectivity to GitHub to install the Labs project using the Databricks CLI. If you're working in an environment where network access to GitHub is restricted, you'll need to work around the default installation requirements of UCX.
In this post, we'll walk you through installing UCX for the Databricks CLI without opening network access to GitHub.
Make sure that you meet the minimum requirements for installing UCX in your environment.
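Beyond the UCX requirements themselves, the steps in this post rely on two command-line tools, `unzip` and the Databricks CLI. A minimal sketch of a pre-flight check follows; the function name `check_prereqs` is hypothetical and introduced only for illustration, and it only checks that the tools are on your `PATH`, not their versions.

```shell
# Report any required command-line tools that are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this post you would call it as `check_prereqs unzip databricks` before starting.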
Let's get started with the installation process.
1. Use `unzip`, which is available in the Databricks web terminal, to decompress the UCX source code: `unzip v0.47.zip -d /path/to/ucx-source`
2. Change into the extracted source directory: `cd /path/to/ucx-source`
3. Install UCX from the local directory: `databricks labs install . --profile <target_workspace_profile>`. Replace `<target_workspace_profile>` with the name of your Databricks workspace profile. To resolve Python dependencies from your organization's PyPI index, set `PIP_INDEX_URL` on the same command: `PIP_INDEX_URL="<your organization's PyPi url>" databricks labs install . --profile <target_workspace_profile>`
4. Verify the installation: `databricks labs installed`. The `installed` command shows the labs projects installed, which will include the UCX project and version you just installed, like so:
| Name | Description | Version |
| --- | --- | --- |
| ucx | Unity Catalog Migration Toolkit (UCX) | v0.47.0 |
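The installation steps can be collected into a single shell function. This is a minimal sketch, not an official installer: `install_ucx_offline`, `run`, and the `DRY_RUN` switch are hypothetical names introduced here, and only the `unzip`, `cd`, and `databricks labs` commands come from the steps in this post. The optional fourth argument is passed to pip through `PIP_INDEX_URL`.

```shell
# Run a command, or just print it when DRY_RUN=1 (useful for reviewing the plan).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Usage: install_ucx_offline ARCHIVE DEST PROFILE [PIP_INDEX_URL]
# The body runs in a subshell, so the cd and variables do not leak to the caller.
install_ucx_offline() (
  if [ "$#" -lt 3 ]; then
    echo "usage: install_ucx_offline ARCHIVE DEST PROFILE [PIP_INDEX_URL]" >&2
    exit 2
  fi
  archive="$1"; dest="$2"; profile="$3"; index_url="${4:-}"

  # Decompress the UCX source code and enter the source directory.
  run unzip "$archive" -d "$dest" || exit 1
  run cd "$dest" || exit 1

  # "." tells the Databricks CLI to install the project in the current directory.
  if [ -n "$index_url" ]; then
    run env PIP_INDEX_URL="$index_url" databricks labs install . --profile "$profile" || exit 1
  else
    run databricks labs install . --profile "$profile" || exit 1
  fi

  # List installed Labs projects to confirm that ucx is present.
  run databricks labs installed
)
```

Running `DRY_RUN=1 install_ucx_offline v0.47.zip /path/to/ucx-source <target_workspace_profile>` prints the commands without executing them, which is a cheap way to confirm the paths and profile before the real run.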
That's it! You've successfully installed UCX with the Databricks CLI without requiring network access to GitHub. Once the install command above runs, the Databricks CLI walks you through a typical UCX installation.
This guide is a product of our experience helping customers migrate to Unity Catalog in secure and highly regulated environments. If you're curious about the issues we faced while solving this problem, read on for more details and how we came to this solution.
We first attempted to install UCX through a wheel file in a PyPI mirror, which failed because GitHub access was restricted. The error we encountered was:
It was clear that the Databricks CLI was still trying to reach out to GitHub before checking any PyPI repository for the UCX project. We decided to try to outsmart the Databricks CLI by copying the directory of an already-installed UCX project from `~/.databricks/labs/ucx` to a new location and running the `PIP_NO_BUILD_ISOLATION=0 databricks labs install ucx` command to prevent reinstalling build dependencies. Unfortunately, this also failed, with the same error we saw before. It was clear that we needed to understand how the Databricks CLI was setting up the installation.
After reviewing how the Databricks CLI installs Labs projects, we discovered that the [`NewInstaller`](https://github.com/databricks/cli/blob/main/cmd/labs/project/fetcher.go#L57) function checks if the project name is `"."` in the first line [`if name == "."`](https://github.com/databricks/cli/blob/main/cmd/labs/project/fetcher.go#L58). This check offered us the best chance of installing UCX without requiring access to GitHub. By specifying `.` as the project name, the Databricks CLI installs a local project. Although intended for local development, this method is currently the easiest way to install UCX in a network-restricted environment using the Databricks CLI.
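The branch described above can be paraphrased in a few lines of shell. This is only an illustration of the decision, not the CLI's actual Go code; the function name `labs_install_source` and its output format are hypothetical.

```shell
# Hypothetical paraphrase of the name check: "." means "install what is here".
labs_install_source() {
  if [ "$1" = "." ]; then
    # Local project: the current working directory is used as the source.
    echo "local:$PWD"
  else
    # Any other name is treated as a Labs project to fetch remotely (GitHub).
    echo "remote:$1"
  fi
}
```

This is why running the install from inside the extracted UCX source directory with `.` as the project name avoids the GitHub lookup that `ucx` as a name would trigger.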
We hope this unorthodox UCX installation guide can help you out until the Databricks CLI team adds native installation functionality for network-restricted environments. If you have any questions or want assistance with your Unity Catalog migration, please get in touch with us at [Rearc](https://www.rearc.io/contact).