Network Restricted Databricks UCX Installation
A guide on installing UCX on Databricks CLI without opening up a restricted network to allow external services, e.g., GitHub access.
The most interesting innovations and releases from the Databricks Data & AI Summit 2024, and our thoughts on what's coming next
DAIS 2024 was a landmark event that showcased the rapid evolution of data intelligence and collaborative AI technologies. With its focus on open-source initiatives, generative AI advancements, and secure data collaboration, this year's summit highlighted the transformative potential of these technologies across various industries. From groundbreaking demos to significant product launches, the event offered a glimpse into the future of data-driven decision-making and AI-powered innovation. The themes that arose throughout the week focused heavily on data intelligence, open-source initiatives and key acquisitions, and, of course, generative AI. Day one started off with a heavy-hitting keynote featuring a number of exciting announcements and customer-led showcases.
One of the most visually intriguing demos used the newly announced Shutterstock ImageAI to create an image for a sample company's Instagram post, demonstrating how the company's existing enterprise data could be used to generate an image that accurately represented the requested theme and goals. The demo also showcased the release of the Mosaic AI Agent Framework, which is a “suite of tooling designed to help developers build and deploy high-quality generative AI applications using RAG for output that is consistently measured and evaluated to be accurate, safe and governed”. This is yet another sign of Databricks' investment in offering one of the most complete platforms for developing generative AI. Shutterstock's ImageAI, built entirely from their vast collection of licensed images, opens up new avenues for enterprises to use image generation technologies with confidence. It provides assurance that the generated materials are derived from legally compliant and ethically sourced datasets, and it's no surprise they chose Databricks as the platform to launch on.
Along with the announcement of the Agent Framework came the release of Mosaic AI Agent Evaluation. This tool “enables developers to quickly and reliably evaluate the quality, latency, and cost of agentic generative AI applications, including the simpler forms of RAG applications and chains.” Rearc's own David Maxson goes into depth in one of our latest blog posts on using MLflow Tracing to identify, debug, and fix hallucinations for a specific scenario using a sample RAG chatbot application.
The clear advancement of the Mosaic AI platform on Databricks, along with the fact that numerous companies are already deploying models powered by Databricks, was just one of the many complementary developments announced during DAIS 2024.
Ali Ghodsi, the CEO of Databricks, made it clear that the company is committed to an open-source and collaborative data ecosystem, regardless of where an enterprise currently is in its data journey. To back up these claims, Databricks open-sourced Unity Catalog, a comprehensive data governance solution that provides unified access control, auditing, and discovery across data, AI, and analytics assets. This is a critical step forward for a number of reasons.
The open-sourcing of Unity Catalog represents a significant shift towards a more collaborative and transparent approach to data governance. By democratizing access to robust data governance tools, Databricks is empowering organizations of all sizes to harness the full potential of their data while maintaining the highest standards of security and compliance.
Another exciting announcement was the release of Clean Rooms. We here at Rearc partnered with Databricks and Mastercard on the initial testing and implementation of a project that used Clean Rooms heavily, and we are beyond excited that it is finally entering the Public Preview phase.
Clean Rooms on Databricks are described as allowing “businesses to easily collaborate in a secure environment with their customers and partners on any cloud in a privacy-safe way.” The feature was originally announced at DAIS 2023. To see what Clean Rooms on Databricks could do for your business, it helps to look at how other enterprises are currently engaging with this new technology.
One compelling use case for Clean Rooms is in addressing Know Your Customer (KYC) standards in the banking and financial services sector. These standards, designed to combat financial fraud and money laundering, require banks to verify the identity and assess the risk profile of their customers.
This process necessitates high-velocity collaboration involving sensitive and protected data sources across multiple organizations, including banks, credit bureaus, and government agencies. Clean Rooms offer a revolutionary solution to this complex challenge as they provide a secure pathway for creating aggregate results from datasets contributed by multiple collaborators without exposing the raw underlying data.
This approach allows financial institutions to cross-reference customer information against various databases, identify potential red flags, and comply with regulatory requirements while maintaining strict data privacy. For instance, a bank could verify a customer's identity and financial history by comparing their data against records from credit bureaus and government watchlists, all within the confines of a Clean Room.
The beauty of this system lies in its ability to generate only aggregated and approved results as output, ensuring that each participating organization's sensitive data remains protected. This not only enhances the efficiency and accuracy of KYC processes but also significantly reduces the risk of data breaches and unauthorized access to confidential information.
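The aggregate-only idea described above can be illustrated with a small, self-contained sketch in plain Python. This is not the Databricks Clean Rooms API; the function name `clean_room_aggregate`, the field names, and the `MIN_GROUP_SIZE` threshold are all hypothetical and chosen only to show the principle: two parties contribute row-level data, the join happens inside the controlled environment, and only approved aggregates above a minimum group size leave it.

```python
from collections import defaultdict

# Hypothetical threshold: groups smaller than this are suppressed so that
# an aggregate cannot be traced back to a handful of individuals.
MIN_GROUP_SIZE = 3


def clean_room_aggregate(bank_rows, bureau_rows, min_group_size=MIN_GROUP_SIZE):
    """Join two parties' rows on customer_id and return only per-band counts.

    Row-level data from either party is used inside the function but is
    never part of the return value.
    """
    # Look up each customer's risk band from the credit bureau's contribution.
    risk_by_customer = {row["customer_id"]: row["risk_band"] for row in bureau_rows}

    # Count how many of the bank's customers fall into each risk band.
    counts = defaultdict(int)
    for row in bank_rows:
        band = risk_by_customer.get(row["customer_id"])
        if band is not None:
            counts[band] += 1

    # Release only aggregates that meet the minimum group size.
    return {band: n for band, n in counts.items() if n >= min_group_size}


# Illustrative, made-up data for the two collaborators.
bank_rows = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1, 7)]
bureau_rows = [
    {"customer_id": 1, "risk_band": "low"},
    {"customer_id": 2, "risk_band": "low"},
    {"customer_id": 3, "risk_band": "low"},
    {"customer_id": 4, "risk_band": "high"},
    {"customer_id": 5, "risk_band": "low"},
    {"customer_id": 9, "risk_band": "high"},
]

result = clean_room_aggregate(bank_rows, bureau_rows)
print(result)
```

In this toy run, the single matched "high" customer is suppressed because the group is below the threshold, while the "low" band is released as a count only. A real clean room enforces this kind of rule at the platform level rather than trusting the analysis code.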
Another scenario where Clean Rooms play a crucial role is category management for retail and consumer goods. Clean Rooms enable retailers and suppliers to collaborate securely on sensitive data without exposing individual transaction details or customer information. This allows for joint analysis of sales trends, inventory levels, and consumer insights, leading to more accurate demand forecasting and optimized product assortments.
Retailers can share aggregated data with suppliers, enabling them to analyze product performance across different stores or regions without accessing raw sales data. This collaborative approach facilitates more effective pricing strategies, personalized marketing campaigns, and even new product development based on identified market gaps.
By leveraging Clean Rooms, retailers and suppliers can make more informed decisions, improve category performance, and enhance the overall shopping experience while maintaining strict data privacy and security standards.
DAIS 2024 showcased Databricks AI/BI, a groundbreaking business intelligence product designed to democratize analytics and insights across organizations. This AI-first approach to business intelligence combines two key capabilities: Dashboards for quick, interactive data visualizations, and Genie, a conversational interface that allows users to query data using natural language. What sets AI/BI apart is its deep integration with the Databricks Data Intelligence Platform, enabling it to understand an organization's unique data structures and business concepts. Putting the tooling to analyze and act on enterprise data into the hands of everyone in the organization is a revolutionary step forward. You can imagine this progressing to a point where it is used live during executive meetings to let non-technical staff derive insights, or to streamline a data analyst's approach to solving problems. The time previously spent on understanding and deciphering complex enterprise data structures can now be used to ask important questions and derive crucial insights faster than ever before.
The 2024 Databricks Data & AI Summit showcased a series of groundbreaking advancements that are set to reshape the landscape of data intelligence and collaboration. The open-sourcing of Unity Catalog, the introduction of Clean Rooms, and the enhancements to the Mosaic AI platform collectively point towards a future where data-driven decision-making becomes more accessible, secure, and powerful. It will be exciting to see how businesses leverage these tools to create value and drive transformation in the immediate future.