Options, options, options. There are now plenty of documented ways to connect from a Spark (or, soon, Python) notebook and run Data Query Language (DQL) or Data Manipulation Language (DML) commands on top of the SQL Endpoint or Fabric Warehouse. Sandeep has already recapped these options on his blog, and Bob Duffy explores another method, using PyOdbc + SqlAlchemy, in his post. Each method has its pros and cons, but I wanted to jump in with yet another way to connect, one I believe is the simplest and most streamlined method available in Python.
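For context, here is a minimal sketch of the PyOdbc + SqlAlchemy route mentioned above, assuming the ODBC Driver 18 for SQL Server is installed; the server and database names are placeholders you would copy from your own SQL endpoint's connection details.

```python
import pandas as pd
import sqlalchemy as sa

# Placeholder values: copy these from the SQL endpoint's connection string in Fabric.
server = "yourworkspace.datawarehouse.fabric.microsoft.com"
database = "YourWarehouse"

# Build a SQLAlchemy URL for the mssql+pyodbc dialect with interactive Entra ID sign-in.
connection_url = sa.engine.URL.create(
    "mssql+pyodbc",
    host=server,
    database=database,
    query={
        "driver": "ODBC Driver 18 for SQL Server",
        "Authentication": "ActiveDirectoryInteractive",
        "Encrypt": "yes",
    },
)

engine = sa.create_engine(connection_url)

# Run a simple DQL statement and pull the result into pandas (table name is hypothetical).
with engine.connect() as conn:
    df = pd.read_sql(sa.text("SELECT TOP 10 * FROM dbo.my_table"), conn)

print(df)
```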
Fabric Spark Runtimes currently enable V-Order optimization by default via a Spark configuration setting. V-Order is a Parquet write-time optimization that seeks to logically organize data based on the same storage algorithm used in Power BI's VertiPaq engine.
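If you want to check or override that default, a session-level toggle is the simplest starting point. The sketch below assumes the commonly documented spark.sql.parquet.vorder.enabled key and a hypothetical table name; the exact key and table property can differ between Fabric runtime versions, so verify them against the docs for your runtime.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the current session-level V-Order setting (key assumed; may vary by runtime).
print(spark.conf.get("spark.sql.parquet.vorder.enabled", "not set"))

# Disable V-Order for this session, e.g. for write-heavy staging tables.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")

# Or control it per table with a table property (hypothetical table name).
spark.sql("""
    ALTER TABLE my_lakehouse_table
    SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'false')
""")
```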
One of the most critical challenges in large-scale data processing with Apache Spark is tracking what each job is doing. As Spark applications grow in complexity, understanding what’s running and when can become difficult, especially when looking at the Spark UI.
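One lightweight way to make the Spark UI easier to read is to label work before triggering it, using the standard setJobGroup and setJobDescription APIs. The sketch below is only illustrative; the group name, description, and toy aggregation are arbitrary.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Everything triggered from here on appears under this group and description in the Spark UI.
sc.setJobGroup("daily-load", "Load and deduplicate orders", interruptOnCancel=False)
sc.setJobDescription("orders: aggregate by key and count")

# A toy action so the labeled job actually shows up in the UI.
result = (
    spark.range(1_000_000)
    .selectExpr("id % 10 AS key")
    .groupBy("key")
    .count()
    .collect()
)

# Reset the labels so later, unrelated jobs are not mis-attributed.
sc.setJobDescription(None)
sc.clearJobGroup()
```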
Every software platform has its own terminology, and when terms overlap but don’t mean the same thing, it can be quite confusing. For example, coming from my years as a developer in Databricks land, I initially assumed that Fabric Spark Pools were just like Pools in Databricks. However, as I discovered, this assumption was completely wrong—and understanding this distinction is key to designing the right architecture.
At the time of writing this post, Fabric Spark Runtimes enable Optimized Write by default as a Spark configuration setting. This Delta feature aims to improve read performance by coalescing small writes into fewer, larger files closer to an optimal size. That said, what is the performance impact of this default setting, and are there scenarios where it should be disabled?
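For anyone who wants to experiment before accepting the default, the sketch below shows one way to check and disable Optimized Write at the session or table level. The session key shown is the one Microsoft documents for Synapse-style runtimes, so verify it against your Fabric runtime version; the table name is hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the current session-level setting (key assumed from Microsoft docs; verify for your runtime).
print(spark.conf.get("spark.microsoft.delta.optimizeWrite.enabled", "not set"))

# Disable Optimized Write for the whole session, e.g. when benchmarking small, frequent appends.
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "false")

# Or opt a single table out via the standard Delta table property (hypothetical table name).
spark.sql("""
    ALTER TABLE my_lakehouse_table
    SET TBLPROPERTIES ('delta.autoOptimize.optimizeWrite' = 'false')
""")
```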