Plotting data distributions with Swift Charts
In this post I'm going to explore how to visualise data distributions with the Swift Charts framework.
I want to look at a dataset of some survey data collected from penguin colonies. The dataset captures many different anatomical details, such as weight, flipper length and bill length. To start my investigation, I will look at the distribution of values for each of these parameters.
To build a histogram, I need to group my data into bins and count how many samples fall within each bin. Charts provides a simple way to do the binning operation with the NumberBins data type.
The NumberBins data type has a range of initialisers. The simplest one to use is init(data:desiredCount:), which detects the best placement of bin thresholds from the input data.
let bins = NumberBins(
    data: dataset.map(\.billLength),
    desiredCount: 30
)
However, NumberBins on its own is not enough, because it doesn't group the data into bins. It only provides an easy way to determine which bin each value belongs to. To group the data, I need to use the Dictionary.init(grouping:by:) initialiser along with the bins.
let groups = Dictionary(
    grouping: dataset.map(\.billLength),
    by: bins.index
)
Now I have a dictionary that maps the bin index to an array of all the values in that bin.
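To make the shape of that dictionary concrete, here is a small stand-alone sketch that doesn't use Swift Charts at all. The uniformBinIndex helper and the sample values are hypothetical. It uses fixed-width bins, whereas NumberBins chooses its own thresholds, so the indices here won't match what the framework produces.

```swift
// Hypothetical stand-in for NumberBins.index(for:): fixed-width bins
// starting at `lowerBound`. NumberBins picks its own thresholds.
func uniformBinIndex(for value: Float, lowerBound: Float, width: Float) -> Int {
    Int(((value - lowerBound) / width).rounded(.down))
}

// Made-up bill lengths in millimetres, for illustration only.
let sampleBillLengths: [Float] = [39.1, 39.5, 40.3, 36.7, 46.5, 50.0, 45.2]

// Group the raw values by their bin index.
let sampleGroups = Dictionary(grouping: sampleBillLengths) { value in
    uniformBinIndex(for: value, lowerBound: 35, width: 5)
}

// Each key is a bin index; each value is the array of samples in that bin.
for (index, values) in sampleGroups.sorted(by: { $0.key < $1.key }) {
    print("bin \(index): \(values)")
}
```

Bins that contain no samples simply have no entry in the dictionary, which is worth keeping in mind when reading the chart later.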
To plot a histogram, I need to count how many of the data samples are in each bin.
let preparedData = groups.map { key, values in
    (
        index: key,
        range: bins[key],
        frequency: values.count
    )
}
I'm counting the samples in each bin and, at the same time, retrieving the range of values that corresponds to this bin.
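The frequency stored in each tuple is a raw count. If relative frequencies are preferred on the y axis, each count can be divided by the total number of samples. A minimal sketch with made-up numbers, where countsByBin is a hypothetical stand-in for the grouped data reduced to counts:

```swift
// Hypothetical per-bin counts, keyed by bin index.
let countsByBin: [Int: Int] = [0: 3, 1: 1, 2: 2, 3: 2]

// Total number of samples across all bins.
let totalCount = countsByBin.values.reduce(0, +)

// Divide each count by the total to get the proportion of samples per bin.
let proportionsByBin = countsByBin.mapValues { count in
    Double(count) / Double(totalCount)
}
```

The shape of the histogram stays the same either way; only the scale of the y axis changes.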
Since the number of samples is small enough, I will just place all of the above work into a computed property.
struct PenguinChart: View {
    let dataset: [PenguinsDataPoint]
    var binnedData: [
        (
            index: Int,
            range: ChartBinRange<Float>,
            frequency: Int
        )
    ] {
        let bins = NumberBins(
            data: dataset.map(\.billLength),
            desiredCount: 30
        )
        let groups = Dictionary(
            grouping: dataset.map(\.billLength),
            by: bins.index
        )
        let preparedData = groups.map { key, values in
            return (
                index: key,
                range: bins[key],
                frequency: values.count
            )
        }
        return preparedData
    }
    var body: some View { ... }
}
With the data aggregated into bins, it's time to plot a bar chart.
struct PenguinChart: View {
    let dataset: [PenguinsDataPoint]
    var binnedData: [(...)] { ... }
    var body: some View {
        Chart(
            self.binnedData, id: \.index
        ) { element in
            BarMark(
                x: .value(
                    "Bill Length",
                    element.range
                ),
                y: .value(
                    "Frequency",
                    element.frequency
                )
            )
        }
    }
}

The chart looks a little off, since by default the framework forces the axes to start at 0. Using the chartXScale(domain: .automatic(includesZero: false)) modifier, I can override this default.
struct PenguinChart: View {
    let dataset: [PenguinsDataPoint]
    var binnedData: [(...)] { ... }
    var body: some View {
        Chart(self.binnedData, id: \.index) { element in
            BarMark(
                x: .value(
                    "Bill Length",
                    element.range
                ),
                y: .value(
                    "Frequency",
                    element.frequency
                )
            )
        }
        .chartXScale(
            domain: .automatic(includesZero: false)
        )
    }
}

Let's explore the other numerical properties of the dataset. I can adjust the computed binnedData property to aggregate over other fields of the dataset.
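One way to avoid copying the computed property for every field is to pass the field in as a key path. The sketch below shows the idea in plain Swift, without the Charts framework. SamplePenguin, binCounts and the fixed-width binning are all hypothetical stand-ins for PenguinsDataPoint and NumberBins, and the sample values are made up.

```swift
// Hypothetical, trimmed-down version of the data point type.
struct SamplePenguin {
    let billLength: Float
    let flipperLength: Float
}

// Counts samples per fixed-width bin for whichever field `keyPath` selects.
// Fixed-width binning is an assumption standing in for NumberBins.
func binCounts(
    of data: [SamplePenguin],
    by keyPath: KeyPath<SamplePenguin, Float>,
    binWidth: Float
) -> [Int: Int] {
    let groups = Dictionary(grouping: data.map { $0[keyPath: keyPath] }) { value in
        Int((value / binWidth).rounded(.down))
    }
    return groups.mapValues { $0.count }
}

// Made-up measurements in millimetres.
let samplePenguins = [
    SamplePenguin(billLength: 39.1, flipperLength: 181),
    SamplePenguin(billLength: 46.5, flipperLength: 210),
    SamplePenguin(billLength: 38.9, flipperLength: 195),
]

// The same function aggregates over different fields.
let billCounts = binCounts(of: samplePenguins, by: \.billLength, binWidth: 5)
let flipperCounts = binCounts(of: samplePenguins, by: \.flipperLength, binWidth: 10)
```

The same approach should carry over to the chart view by swapping the hard-coded \.billLength for a key path parameter.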

Plotting just one attribute at a time is useful. However, I would love to drill down into the relationship between these variables.
# Building a 2D density plot
I feel it would be best to look into the relationship between flipper length and bill length, as both of these appear to be multimodal distributions.
I need to adjust how I'm aggregating the data. To start with, I will create bins for both flipper length and bill length.
let flipperLengthBins = NumberBins(
    data: dataset.map(\.flipperLength),
    desiredCount: 30
)
let billLengthBins = NumberBins(
    data: dataset.map(\.billLength),
    desiredCount: 30
)
To represent each grid square, I will create an intermediate data type that combines a flipperLengthBins index and a billLengthBins index.
struct TwoDimensionalBinIndex: Hashable {
    let xBinIndex: Int
    let yBinIndex: Int
}
TwoDimensionalBinIndex must conform to Hashable so that it can be used with Dictionary.init(grouping:by:). When grouping the data, I will create the corresponding TwoDimensionalBinIndex for each value.
let groupedData = Dictionary(
    grouping: dataset
) { element in
    TwoDimensionalBinIndex(
        xBinIndex:
            flipperLengthBins
                .index(for: element.flipperLength),
        yBinIndex:
            billLengthBins
                .index(for: element.billLength)
    )
}
I'm calling the index(for:) method of the NumberBins instances for each element in my dataset and assigning the resulting bin indices to the corresponding x and y attributes of the TwoDimensionalBinIndex that is used to group the data.
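The grouping step can be tried out in isolation. The following is a minimal sketch in plain Swift with made-up measurements. GridCell mirrors the shape of TwoDimensionalBinIndex but is renamed so that the sketch stands on its own, and the fixed-width bins (10 mm for flippers, 5 mm for bills) are an assumption standing in for the NumberBins instances.

```swift
// Same shape as TwoDimensionalBinIndex: a Hashable pair of bin indices.
struct GridCell: Hashable {
    let xBinIndex: Int
    let yBinIndex: Int
}

// Made-up (flipper length, bill length) pairs in millimetres.
let samplePairs: [(flipper: Float, bill: Float)] = [
    (flipper: 181, bill: 39.1),
    (flipper: 183, bill: 38.2),
    (flipper: 210, bill: 46.5),
    (flipper: 212, bill: 47.9),
    (flipper: 195, bill: 39.0),
]

// Group each sample under the grid cell it falls into.
// Hypothetical fixed-width bins replace NumberBins.index(for:).
let cells = Dictionary(grouping: samplePairs) { pair in
    GridCell(
        xBinIndex: Int((pair.flipper / 10).rounded(.down)),
        yBinIndex: Int((pair.bill / 5).rounded(.down))
    )
}

// Reduce each group to the number of samples in that cell.
let cellCounts = cells.mapValues { $0.count }
```

Because Swift synthesises Hashable for a struct whose stored properties are all Hashable, two GridCell values with the same pair of indices land in the same group without any extra code.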
As with the regular histogram, I now need to aggregate these values for each group. This time, rather than returning just a single range, I will need to return a range for x and a range for y.
let values = groupedData
    .map { key, values in
        return (
            index: key,
            xDataRange: flipperLengthBins[
                key.xBinIndex
            ],
            yDataRange: billLengthBins[
                key.yBinIndex
            ],
            frequency: values.count
        )
    }
Here the subscript(position:) method on the respective NumberBins instances is used to retrieve the ChartBinRange values for each axis. To bring this all together, it's best to first declare a typealias for the aggregated binned values.
typealias BinnedValue = (
    index: TwoDimensionalBinIndex,
    xDataRange: ChartBinRange<Float>,
    yDataRange: ChartBinRange<Float>,
    frequency: Int
)
In my chart view, as with the histogram, I use a regular computed property.
struct PenguinChart: View {
    let dataset: [PenguinsDataPoint]
    struct TwoDimensionalBinIndex: Hashable {
        let xBinIndex: Int
        let yBinIndex: Int
    }
    typealias BinnedValue = (
        index: TwoDimensionalBinIndex,
        xDataRange: ChartBinRange<Float>,
        yDataRange: ChartBinRange<Float>,
        frequency: Int
    )
    var bins: [BinnedValue] {
        let flipperLengthBins = NumberBins(
            data: dataset.map(\.flipperLength),
            desiredCount: 25
        )
        let billLengthBins = NumberBins(
            data: dataset.map(\.billLength),
            desiredCount: 25
        )
        let groupedData = Dictionary(
            grouping: dataset
        ) { element in
            TwoDimensionalBinIndex(
                xBinIndex: flipperLengthBins
                    .index(for: element.flipperLength),
                yBinIndex: billLengthBins
                    .index(for: element.billLength)
            )
        }
        let values = groupedData
            .map { key, values in
                return (
                    index: key,
                    xDataRange: flipperLengthBins[
                        key.xBinIndex
                    ],
                    yDataRange: billLengthBins[
                        key.yBinIndex
                    ],
                    frequency: values.count
                )
            }
        return values
    }
    var body: some View { ... }
}
With all these bins computed, it's now a matter of plotting the values.
struct PenguinChart: View {
    let dataset: [PenguinsDataPoint]
    ...
    var bins: [BinnedValue] { ... }
    var body: some View {
        Chart(
            self.bins, id: \.index
        ) { element in
            RectangleMark(
                x: .value(
                    "Flipper Length",
                    element.xDataRange
                ),
                y: .value(
                    "Bill Length",
                    element.yDataRange
                )
            )
            .foregroundStyle(
                by: .value(
                    "Frequency",
                    element.frequency
                )
            )
        }
        .chartXScale(
            domain: .automatic(
                includesZero: false
            )
        )
        .chartYScale(
            domain: .automatic(
                includesZero: false
            )
        )
    }
}
I use a RectangleMark to draw a rectangle for each x and y region, then I colour that region based on the frequency value. Furthermore, as with the histogram, I adjust the x and y axes so that they no longer start at 0.

Clearly, some styling is going to be needed. I will adjust the colour gradient and get rid of the spacing between the rectangles. The colour gradient can be adjusted using the chartForegroundStyleScale(range:) modifier, and by passing width and height values to the RectangleMark initialiser, I'm able to control the spacing.
struct PenguinChart: View {
    ...
    var body: some View {
        Chart(
            self.bins, id: \.index
        ) { element in
            RectangleMark(
                x: .value(
                    "Flipper Length",
                    element.xDataRange
                ),
                y: .value(
                    "Bill Length",
                    element.yDataRange
                ),
                width: .ratio(1),
                height: .ratio(1)
            )
            .foregroundStyle(
                by: .value(
                    "Frequency",
                    element.frequency
                )
            )
        }
        .chartXScale(
            domain: .automatic(
                includesZero: false
            )
        )
        .chartYScale(
            domain: .automatic(
                includesZero: false
            )
        )
        .chartForegroundStyleScale(
            range: Gradient(
                colors: [
                    Color.red.opacity(0.1),
                    Color.yellow
                ]
            )
        )
    }
}

While exploring this, I also tried using BarMark instead of RectangleMark. By varying the width and height, I was able to create some interesting effects. These could be very useful when animating between different plotted attributes, but that will need to wait for a follow-up post.

You can find the code for this post in our GitHub project, which includes the code to download and parse the CSV file.