At the time of writing this post, Fabric Spark Runtimes enable Optimized Write by default as a Spark configuration setting. This Delta feature aims to improve read performance by providing optimal file sizes. That said, what is the performance impact of this default setting, and are there scenarios where it should be disabled?
How do you develop a Python library in Microsoft Fabric while maintaining the full ability to test code prior to packaging?
With Microsoft Build 2024 underway, the wave of announcements is hot off the press! This is a recap of some of the data engineering-specific updates that I’m particularly excited about.
LLMs like ChatGPT and Copilot are transforming every industry, so why not use them as a data engineer to free up time for more complex tasks? If there is one thing every data engineer (and most humans) is revolted by, it is repetitive tasks. Thankfully, we don’t live in the world of I, Robot, and all we need are tokens to pay the LLM masters to get our work done.
Photon is a native vectorized execution engine within Databricks, entirely written in C++, designed to massively boost performance on top of Spark by circumventing some of the JVM inefficiencies and better leveraging modern hardware.