Since its introduction, Unity Catalog has been creating significant buzz in the data community. But what exactly is it, and why is enabling it in your Databricks workspace so important? This article dives into the essence of Unity Catalog, demonstrates how it revolutionizes data engineering in lakehouses, and provides guidelines for enabling it in your existing Databricks workspace.
So you’ve figured out how to write data in Delta format. Congratulations! You’ve joined the Delta Lake club and now have access to all the goodness that comes with Delta, such as ACID transactions and time travel. Now, how do you ensure that the underlying storage of your Delta tables is maintained so that, as inserts, updates, and deletes take place over time, your data is still stored in an optimal manner?
Data engineers and architects will often spend many hundreds of hours building a complex and fully automated solution for delivering data analysis capabilities to end users, yet forget or overlook the step of provisioning user access to those capabilities. Database logins, users, and role memberships are traditionally created manually in each environment for several reasons. In this article I’ll challenge this status quo and propose that user provisioning may be better managed via Azure Active Directory security groups. This works with Azure Synapse (Dedicated and Serverless), SQL Database, SQL Managed Instance, other Microsoft SQL variants, and all surrounding cloud services (e.g. data lake, Synapse Workspace).
Failing to use table distributions properly in Synapse Dedicated SQL Pools is easily the #1 shortcoming in Synapse implementations.