I’m guilty. I’ve peddled the #NotebookEverything tagline more than a few times.
Coming from a notebook-first Spark background, I wanted to write the introduction to Spark Job Definitions (SJDs) that I wish I had when I first encountered them. If you are first interested in why you might want to use a Spark Job Definition over a Notebook, see my blog here.
I’m excited to formally announce LakeBench, now at v0.3, the first Python-based multi-modal benchmarking library that supports multiple data processing engines on multiple benchmarks. You can find it on GitHub and PyPI.
Last December (2024) I published a blog exploring whether data engineers in Microsoft Fabric should ditch Spark for DuckDB or Polars. Six months have passed and all engines have gotten more mature. Where do things stand? Is it finally time to ditch Spark? Let The Small Data Showdown ‘25 begin!
This is part 2 of my prior post, continuing where I left off. I previously showed how you can use Resource folders in either the Notebook or Environment in Microsoft Fabric to do some pretty agile development of Python modules/libraries.