I was 95% done writing a fun case study on how to parallelize API calls and other non-distributed tasks in Spark when I realized I was about to gloss over an extremely foundational topic in Spark: RDDs. While most developers understand at least the basics of DataFrames, RDDs are less commonly known, partly because they are a lower-level abstraction in Spark and DataFrames are full-featured enough that you can often get away without needing to know what an RDD is.
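To make the idea concrete, here is a minimal sketch of the pattern that case study builds on: `sc.parallelize` turns a plain Python list into an RDD and fans it out across executors, so each partition can make its own API calls in parallel. The endpoint URL and the `call_api` helper are hypothetical placeholders, and it assumes the `requests` package is available on the executors.

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Hypothetical list of record IDs to enrich via an external API.
ids = list(range(100))

def call_api(record_id):
    # Placeholder endpoint; swap in a real API in practice.
    resp = requests.get(f"https://api.example.com/items/{record_id}")
    return (record_id, resp.status_code)

# parallelize splits the local list into partitions distributed across executors.
rdd = sc.parallelize(ids, numSlices=8)

# Each task calls the API for its slice of IDs; collect brings results back to the driver.
results = rdd.map(call_api).collect()
```

The key point is that the work being distributed here is not a DataFrame transformation at all; the RDD is just a convenient way to spread arbitrary Python work across the cluster.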
Options, options, options. There are now plenty of documented ways to connect from a Spark (or soon Python) notebook and run Data Query Language (DQL) or Data Manipulation Language (DML) commands against a SQL endpoint or Fabric Warehouse. Sandeep has already recapped these options on his blog, and Bob Duffy explores another method, using PyOdbc + SqlAlchemy, in his post. While each method has pros and cons, I wanted to jump in with yet another way to connect, one I believe is the simplest and most streamlined method available in Python.
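For context, here is a rough sketch of the PyOdbc + SqlAlchemy route mentioned above (not the streamlined method this post argues for). The server and database values are placeholders, and it assumes the ODBC Driver 18 for SQL Server is installed and that Azure AD interactive authentication is an option in your environment.

```python
import sqlalchemy as sa
from urllib.parse import quote_plus

# Placeholder values: substitute your SQL analytics endpoint and warehouse name.
server = "<your-endpoint>.datawarehouse.fabric.microsoft.com"
database = "<your-warehouse>"

# Build a standard ODBC connection string; other auth modes (service principal,
# access token) are possible but omitted here for brevity.
odbc_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server={server};Database={database};"
    "Authentication=ActiveDirectoryInteractive;Encrypt=yes;"
)

# SQLAlchemy wraps the raw ODBC string via the odbc_connect query parameter.
engine = sa.create_engine(f"mssql+pyodbc:///?odbc_connect={quote_plus(odbc_str)}")

with engine.connect() as conn:
    rows = conn.execute(sa.text("SELECT TOP 5 name FROM sys.tables")).fetchall()
    print(rows)
```

It works, but you can already see the friction: driver installation, connection-string plumbing, and authentication details all land on you.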
Fabric Spark runtimes currently enable V-Order optimization by default through a Spark configuration setting. V-Order is a Parquet write optimization that seeks to logically organize data using the same storage algorithm as Power BI's VertiPaq engine.
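If you want to see where that default lives, the sketch below inspects and overrides the session-level setting. Note that the exact configuration key has varied across Fabric runtime versions, so treat the key name here as an assumption and confirm it against the documentation for your runtime.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-level V-Order flag; earlier Fabric runtimes documented
# "spark.sql.parquet.vorder.enabled", newer ones use a different key,
# so verify the name for your runtime before relying on it.
key = "spark.sql.parquet.vorder.enabled"

print(spark.conf.get(key, "not set"))  # inspect the current default
spark.conf.set(key, "false")           # disable V-Order for writes in this session
```

Flipping the flag only affects new writes from the current session; existing Parquet files keep whatever layout they were written with.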
One of the most critical challenges in large-scale data processing with Apache Spark is tracking what each job is doing. As Spark applications grow in complexity, understanding what’s running and when can become difficult, especially when looking at the Spark UI.
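One lightweight habit that helps is labeling work before you trigger it, using Spark's `setJobGroup` and `setJobDescription`, so the Spark UI shows meaningful names instead of anonymous job IDs. A minimal sketch (the group and description strings are just examples):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Label the jobs that follow so they are easy to spot in the Spark UI's job list.
sc.setJobGroup("nightly-load", "Load and aggregate sales data")
sc.setJobDescription("Count rows in the staged sales data")

df = spark.range(1_000_000)  # stand-in for a real DataFrame
print(df.count())            # this job appears under the description set above

# Clear the description so later, unrelated jobs are not mislabeled.
sc.setJobDescription(None)
```

With those labels in place, the Jobs page groups related work together and the descriptions read like your pipeline steps rather than internal action names.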
Every software platform has its own terminology, and when terms overlap but don’t mean the same thing, it can be quite confusing. For example, coming from my years as a developer in Databricks land, I initially assumed that Fabric Spark Pools were just like Pools in Databricks. However, as I discovered, this assumption was completely wrong—and understanding this distinction is key to designing the right architecture.