a microsoft data platform blog

Decoding Delta Lake Compatibility Between Fabric and Databricks

March 22, 2024

With Microsoft going all in on Delta Lake, the landscape for data architects deeply integrated with the Microsoft stack is undergoing a significant transformation. The introduction of Fabric and OneLake with a Delta Lake-driven architecture means that the decision on which data platform to use no longer hinges on the time and complexity of moving data into the platform’s data store. Instead, we can now virtualize our data where it sits, enabling various Delta-centric compute workloads, including Power BI.

Beyond Information Schema: Metadata Mastery in a Fabric Lakehouse

February 26, 2024

Have you ever needed to delve into the Information Schema within a notebook environment? There are myriad reasons for wanting to do so, such as: programmatically recreating view definitions in another lakehouse; identifying table dependencies via view definitions; and locating tables that include a soon-to-be-dropped column.

Cluster Configuration Secrets for Spark: Unlocking Parallel Processing Power

February 19, 2024

Something I’ve always found challenging in PaaS Spark platforms, such as Databricks and Microsoft Fabric, is efficiently leveraging compute resources to maximize parallel job execution while minimizing platform costs. It’s straightforward to spin up a cluster and run a single job, but what’s the optimal approach when you need to run hundreds of jobs simultaneously? Should you use one large high-concurrency cluster, or a separate job cluster for each task?

RLS in Databricks Unity Catalog and Power BI

January 26, 2024

Unity Catalog introduces many new concepts in Databricks, particularly around security and governance. One significantly improved security feature that Unity Catalog enables is Row Level Security (hereafter referred to as RLS).

Querying Databases in Apache Spark: Pandas vs. Spark API vs. Pandas-on-Spark

January 24, 2024

Apache Spark offers tremendous capability, regardless of the implementation—be it Microsoft Fabric or Databricks. However, with vast capabilities comes the risk of using the wrong “tool in the shed” and encountering unnecessary performance issues.
