Custom function evaluation on the GPU with MPSGraph
MPSGraph enables us to run math on the GPU without needing to write C++ based Metal shaders. We can combine logical and arithmetic nodes in a graph that the framework then stitches together.
In this post we will go over the steps needed to evaluate a polynomial function of variable size rapidly on the GPU. This can be easily modified for other arithmetic functions. In a future post I plan on documenting how we can use control flow nodes to provide GPU based solvers for finding approximate numerical solutions to computationally complex functions.
# Building the graph
We start by building the graph. Here are the inputs to our function. The exponents
and coefficients
will be used in the function f(location) = coefficients[0]*(location^ exponents[0]) + coefficients[1]*(location^exponents[1]) + ...
.
import Metal
import MetalPerformanceShaders
import MetalPerformanceShadersGraph
let graph = MPSGraph()
// Create placeholder for the exponents of polynomial
let exponents = graph.placeholder(
shape: nil,
dataType: .float32,
name: "exponents"
)
// Create placeholder coefficients of the polynomial
let coefficients = graph.placeholder(
shape: nil,
dataType: .float32,
name: "coefficients"
)
// Create placeholder for the x value that we will evaluate
let location = graph.placeholder(
shape: nil,
dataType: .float32,
name: "location"
)
At this point we are not constraining the shape of the values, since we would like to be able to reuse this code for many different sizes of the polynomial.
The first thing we need to do is raise the location
to the power in the exponents
. The framework provides power(::name) method that creates a node to do it.
let locationHat = graph.power(
location,
exponents,
name: "power"
)
Since location
is a single value, this node will return a new vector of the same length as the exponents, raising the location to the power of each corresponding component.
Next we create a multiplication(::name) node combining the coefficients
vector with the pairwise values from the locationHat
vector.
let value = graph.reductionSum(
with: values,
axes: nil,
name: "sum"
)
Then we'll need to apply a reductionSum(with:axis:name) to this output to resolve a single value as our output.
# Preparing data
There are multiple solutions for how to prepare data. Depending on how frequently you need to change the data, you might need to consider a more involved method. For simplicity here we will create the exponents and coefficients in Swift on the CPU and copy/share them with the GPU.
Let's consider a simple example of f(x) = 2 + 2x + 2x^2 + 2x^3
and define the values.
let exponentValues: [Float32] = [0, 1, 2, 3]
let coefficientValues: [Float32] = [2, 2, 2, 2]
With these values defined in Swift we need to copy them to the GPU. Create an extension on Array
so that we can produce a matching MTLBuffer
on a target metal device.
extension Array where Element == Float32 {
func asBuffer(on device: MTLDevice) -> MTLBuffer? {
self.withContiguousStorageIfAvailable { pointer -> MTLBuffer? in
guard let baseAddress = pointer.baseAddress else {
return nil
}
return device.makeBuffer(
bytes: baseAddress,
length: pointer.count * MemoryLayout<Float32>.size,
options: MTLResourceOptions.storageModeShared
)
} ?? nil
}
}
This buffer lacks information about the data type that is stored within it. We need to create an MPSVector
that wraps the buffer describing its value type.
extension Array where Element == Float32 {
func asVector(on device: MTLDevice) -> MPSVector? {
guard let buffer = self.asBuffer(on: device) else {
return nil
}
return MPSVector(
buffer: buffer,
descriptor: MPSVectorDescriptor(
length: self.count,
dataType: MPSDataType.float32
)
)
}
}
We can now create instances of MPSGraphTensorData
that the graph takes as input.
// Get the GPU
let device = MTLCreateSystemDefaultDevice()!
let exponentsTensorData = MPSGraphTensorData(
exponentValues.asVector(on: device)!
)
let coefficientsTensorData = MPSGraphTensorData(
coefficientsValues.asVector(on: device)!
)
We still need to create one more input: the value of x
that we want to evaluate. For simplicity we are just going to evaluate a single location.
let locationTensorData = MPSGraphTensorData(
MPSNDArray(device: device, scalar: 2.0)
)
# Running the graph
For running the graph we are going to use the simple run() method. It will compile your graph into a metal shader on the first call and then dispatch it to compute.
let results = graph.run(
feeds: [
location : locationTensorData,
exponents: exponentsTensorData,
coefficients: coefficientsTensorData
],
targetTensors: [value],
targetOperations: nil
)
You can instead pre-compile it and run it as part of a larger metal compute shader pipeline. This could be useful if you need to pre-compute/post-compute some of the data using other Metal shaders.
To retrieve the computed result from our graph we need to copy the value back from the GPU to the CPU, so that we can use the value within our regular Swift code.
// We can index the results returned by the `run()` method
// using the `value` tensor.
let result = results[value]
let outputNDArray = result?.mpsndarray()
// Allocate a local array of the correct size
// so we can copy the values in.
var outputValues: [Float32] = [0]
// Copy all the values from the Metal array to the Swift array.
outputNDArray?.readBytes(&outputValues, strideBytes: nil)
The outputValues
array will now contain the function evaluation. This method of evaluating polynomials is only worth it if you have a high order polynomial with 100s if not 1000s of terms.
# Evaluating multiple locations at once
We can modify the above code a little bit to enable us to evaluate many locations at once.
Firstly, we need to create a matrix of our locations with the values saved in the first column.
func asMatrix(on device: MTLDevice) -> MPSMatrix? {
guard let buffer = self.asBuffer(on: device) else {
return nil
}
return MPSMatrix(
buffer: buffer,
descriptor: .init(
rows: self.count,
columns: 1, rowBytes: MemoryLayout<Float32>.size,
dataType: .float32
)
)
}
We also need to use this when we declare the locationTensorData
.
let locationValues: [Float32] = [-1, 0, 1]
let locationTensorData = MPSGraphTensorData(
locationValues.asMatrix(on: device)!
)
Then we update the reductionSum
to correctly sum up the values for each location rather than all the values over all locations.
let value = graph.reductionSum(
with: values,
axes: [NSNumber(1)],
name: "sum"
)
And finally, when the graph finishes we need to ensure that outputValues
array is large enough to have a result for each input location.
var outputValues: [Float32] = [0, 0, 0]
With these changes we can evaluate many points at once on the GPU in parallel.