In this episode we tackle the 1 Billion Row Challenge in Swift. It tests our assumptions about Swift's performance characteristics and leads to some optimizations that may not be obvious at first!
Length: about 2 hours
In this episode, I am thrilled to be joined by Matt Massicotte to kick off the "1 Billion Row Challenge" in Swift. The goal is to efficiently parse a massive file of weather station readings and calculate the min, max, and mean temperature for each city. We set up a Swift package, explore basic file I/O approaches such as memory mapping and chunked reading, and implement a naive solution that processes the data, setting the stage for significant optimization in upcoming episodes.
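A minimal sketch of memory-mapped reading, assuming Foundation's `Data(contentsOf:options:)` with the `.alwaysMapped` option, which is only a hint that the file should be mapped when that is possible and safe. The `countLines(atPath:)` helper and the sample file name are hypothetical and are not the code written in the episode; the point is that the mapped bytes can be scanned directly without first building Strings.

```swift
import Foundation

// Hypothetical helper (not from the episode): map a file into memory and
// count newline bytes by scanning the raw buffer, without creating Strings.
func countLines(atPath path: String) throws -> Int {
    let url = URL(fileURLWithPath: path)
    // .alwaysMapped is a hint: Foundation maps the file if possible and safe.
    let data = try Data(contentsOf: url, options: .alwaysMapped)
    let newline = UInt8(ascii: "\n")
    return data.withUnsafeBytes { buffer -> Int in
        var count = 0
        for byte in buffer where byte == newline {
            count += 1
        }
        return count
    }
}

// Usage with a tiny sample file in the challenge's "city;temperature" format.
let tmp = FileManager.default.temporaryDirectory
    .appendingPathComponent("measurements-sample.txt")
try "Hamburg;12.0\nBulawayo;8.9\n".write(to: tmp, atomically: true, encoding: .utf8)
let lineCount = try countLines(atPath: tmp.path)
print(lineCount) // prints 2
```

Chunked reading would instead pull fixed-size blocks from a file handle and must take care of lines that straddle block boundaries; mapping sidesteps that by exposing the whole file as one buffer.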
Our initial, naive solution to the 1 Billion Row Challenge split the input by line separator and then by semicolon, parsed the city names and temperatures, and finally collected and printed the results. It ran in approximately 11 minutes. In this episode we profile that solution to see where the time is spent, find some quick wins for optimization, and drastically improve the run time.
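The naive pipeline described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the code written in the episode: `Stats`, `aggregate(_:)`, and `formatResults(_:)` are illustrative names, and the `{city=min/mean/max, ...}` output shape is assumed from the original challenge's conventions.

```swift
import Foundation

// Hypothetical running statistics for one city.
struct Stats {
    var min: Double
    var max: Double
    var sum: Double
    var count: Int

    init(_ value: Double) {
        min = value; max = value; sum = value; count = 1
    }

    mutating func add(_ value: Double) {
        // Swift.min/Swift.max disambiguate from the stored properties.
        min = Swift.min(min, value)
        max = Swift.max(max, value)
        sum += value
        count += 1
    }

    var mean: Double { sum / Double(count) }
}

// Naive approach: split into lines, split each line on ";", parse, aggregate.
func aggregate(_ input: String) -> [String: Stats] {
    var stations: [String: Stats] = [:]
    for line in input.split(separator: "\n") {
        let parts = line.split(separator: ";")
        guard parts.count == 2, let temp = Double(parts[1]) else { continue }
        let city = String(parts[0])
        if stations[city] == nil {
            stations[city] = Stats(temp)
        } else {
            stations[city]!.add(temp)
        }
    }
    return stations
}

// Output sorted by city as {city=min/mean/max, ...}, one decimal place.
func formatResults(_ stations: [String: Stats]) -> String {
    let entries = stations.keys.sorted().map { city -> String in
        let s = stations[city]!
        return "\(city)=" + String(format: "%.1f/%.1f/%.1f", s.min, s.mean, s.max)
    }
    return "{" + entries.joined(separator: ", ") + "}"
}

let sample = "Hamburg;12.0\nBulawayo;8.9\nHamburg;34.2\n"
print(formatResults(aggregate(sample)))
// {Bulawayo=8.9/8.9/8.9, Hamburg=12.0/23.1/34.2}
```

Every row here allocates substrings and a String key and goes through the general-purpose `Double` parser; per-row costs like these are natural places to look when profiling a run over a billion rows.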