GPU-accelerated Calm Water Resistance In Simcenter STAR-CCM+ 2602

In Simcenter STAR-CCM+ 2602, Siemens has expanded general-purpose GPU coverage so that a larger part of the solver stack can execute on the GPU. Please refer to our previous blog article for a detailed walkthrough of the latest implementations. The idea behind general-purpose computing on graphics processing units (GPGPU computation) is to offload computations that are traditionally solved on the CPU to the GPU. Most, but not all, solvers in Simcenter STAR-CCM+ are compatible with running on GPGPUs. If you select a solver that is not GPGPU-compatible, the entire simulation runs on the CPU instead. That changes the practical question from “Can I run on GPUs?” to “When does a GPU workflow actually pay off for calm water resistance?”

The key limitation: DFBI Morphing runs on CPU

For most resistance predictions we want free sinkage and trim. In the VTT template this is typically handled with DFBI Morphing. The problem is that DFBI Morphing runs entirely on the CPU. As a result, part of the simulation cannot benefit from GPU acceleration. If that CPU-only portion becomes large enough, it can dominate the runtime and reduce the value of the GPU.
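A quick way to see why the CPU-only share matters is Amdahl's law. The sketch below is a generic illustration, not a model of Simcenter STAR-CCM+ internals: it assumes the original CPU runtime splits cleanly into a part that the GPU accelerates by a factor `gpu_speedup` and a remaining CPU-only part (such as the DFBI Morphing work) that is not accelerated at all. The fractions and the 10x factor are illustrative assumptions, not measurements.

```python
def overall_speedup(accelerated_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: speedup of the whole run when only `accelerated_fraction`
    of the original runtime is sped up by `gpu_speedup`; the rest (here the
    CPU-only DFBI Morphing work) keeps its original cost."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    cpu_only_fraction = 1.0 - accelerated_fraction
    return 1.0 / (cpu_only_fraction + accelerated_fraction / gpu_speedup)


# Illustrative inputs only: a flow solver assumed to be 10x faster on the GPU,
# combined with a growing CPU-only share.
for share in (1.0, 0.9, 0.8, 0.5):
    print(f"accelerated share {share:.0%}: overall {overall_speedup(share, 10.0):.2f}x")
```

Even with an infinitely fast GPU, the overall gain can never exceed 1 / (1 - accelerated share), so a CPU-only share of 20% caps the run at 5x.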

To understand how much this matters, I set a simple goal for my testing: quantify how much DFBI Morphing slows down a simulation when the rest of the solver runs on the GPU.

Figure: Wave pattern of the calm water KCS test case, simulation on CPU compared with simulation on GPU.

What Siemens’ benchmark shows (fixed ship)

Siemens’ published benchmark is a fixed-ship case that uses the VTT template for calm water resistance calculations of the KCS benchmark hull, with approximately 28 million cells.

In this example, the GPU run is about 19% slower in wall time, but it consumes around 70% less energy. However, this benchmark uses a fixed hull. That leaves two practical questions unanswered.
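Since the energy of a run is its average power draw multiplied by its wall time, the two benchmark figures also imply how much less power the GPU run draws on average. A back-of-the-envelope check, assuming both percentages refer to the complete run and were measured consistently for the CPU and GPU runs:

```python
def implied_power_ratio(time_ratio: float, energy_ratio: float) -> float:
    """Average-power ratio P_gpu / P_cpu implied by E = P_avg * t."""
    if time_ratio <= 0.0:
        raise ValueError("time_ratio must be positive")
    return energy_ratio / time_ratio


# Benchmark figures quoted above: GPU wall time about 19 % longer (ratio 1.19),
# energy about 70 % lower (ratio 0.30).
ratio = implied_power_ratio(time_ratio=1.19, energy_ratio=0.30)
print(f"GPU run draws about {ratio:.0%} of the CPU run's average power")  # about 25%
```

In other words, the GPU run takes slightly longer but draws only about a quarter of the average power, which is where the energy saving comes from.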

First, calm water resistance predictions normally include free sinkage and trim, which means enabling DFBI Morphing. That introduces CPU-only work that could significantly affect performance.

Second, not everyone has access to two A100 GPUs. Many of us will run these simulations on a workstation with more modest hardware. That is why the simulations in this blog were run on an NVIDIA L40S 46 GB with CUDA version 12.4.

A practical idea: a two-stage workflow

A natural idea is to treat calm water resistance as a two-stage convergence problem.

Stage A – GPU phase (fixed hull)

Run the simulation with a fixed hull long enough for the flow field and wave pattern to stabilize. The goal is to perform the bulk of the iterations while the solver runs efficiently on the GPU.

Stage B – CPU phase (DFBI Morphing)

Once the hydrodynamics are largely converged, enable DFBI Morphing to solve for free sinkage and trim. A shorter follow-up run then converges the motion and final forces.

In theory, this keeps most of the heavy computation in the GPU-accelerated regime.
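The appeal of the two-stage idea can be written as a simple cost model: total wall time is the number of steps in each stage multiplied by the cost per step in that stage. A minimal sketch, in which all step counts and per-step costs are hypothetical placeholders to be replaced with timings from your own runs:

```python
def two_stage_hours(gpu_steps: int, gpu_sec_per_step: float,
                    morphing_steps: int, morphing_sec_per_step: float) -> float:
    """Wall time (h): Stage A (fixed hull, GPU) followed by Stage B (DFBI Morphing)."""
    seconds = gpu_steps * gpu_sec_per_step + morphing_steps * morphing_sec_per_step
    return seconds / 3600.0


def single_stage_hours(total_steps: int, morphing_sec_per_step: float) -> float:
    """Wall time (h) when DFBI Morphing is active for every step."""
    return total_steps * morphing_sec_per_step / 3600.0


# Hypothetical inputs: 4000 fixed-hull steps at 1 s each, then 1000 steps with
# DFBI Morphing at 4 s each, versus all 5000 steps with DFBI Morphing.
staged = two_stage_hours(4000, 1.0, 1000, 4.0)
baseline = single_stage_hours(5000, 4.0)
print(f"two-stage: {staged:.2f} h, morphing throughout: {baseline:.2f} h")
```

The model assumes that Stage A carries no DFBI overhead at all; if part of that overhead remains, the Stage A cost per step rises and the saving shrinks accordingly.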

The practical constraint

Unfortunately, there is a complication.

DFBI Morphing cannot be enabled mid-run. It must be initialized at t = 0. That means the clean “GPU first, then activate DFBI” strategy is not directly possible.

The practical workaround is slightly different:

Initialize DFBI Morphing at the start of the simulation.
Freeze the 6-DOF Solver and Mesh Morpher during the early iterations.
Allow the flow field and wave pattern to converge on the GPU.
Unfreeze the 6-DOF Solver and Mesh Morpher later to solve for free sinkage and trim.
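The workaround above amounts to a step-based schedule. The sketch below only models that logic in Python to make the phases explicit; `SolverState`, `state_at` and `release_step` are hypothetical names, not part of the Simcenter STAR-CCM+ API, where the equivalent is done by freezing and unfreezing the solvers themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolverState:
    """Hypothetical snapshot of the motion-related setup at one time step."""
    morphing_initialized: bool
    six_dof_frozen: bool
    morpher_frozen: bool


def state_at(step: int, release_step: int) -> SolverState:
    """DFBI Morphing exists from step 0 (it cannot be added mid-run), but the
    6-DOF solver and the mesh morpher stay frozen until `release_step`, after
    which both are unfrozen together."""
    if step < 0 or release_step < 0:
        raise ValueError("steps must be non-negative")
    frozen = step < release_step
    return SolverState(morphing_initialized=True,
                       six_dof_frozen=frozen,
                       morpher_frozen=frozen)


# Usage with a hypothetical release after 500 steps.
for step in (0, 499, 500, 1000):
    print(step, state_at(step, release_step=500))
```

Both solvers are released together on purpose: unfreezing the 6-DOF solver while the morpher stays frozen would compute a motion that the mesh never follows.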

In principle, this keeps the early part of the simulation mostly GPU-driven while delaying the expensive motion solution.

What actually happens in practice

In testing, this workaround only provided a modest improvement.

Freezing DFBI motion does not remove the DFBI overhead entirely. Even when the motion is frozen, parts of the solver still run on the CPU. As a result, the performance difference is limited.

In my case:

DFBI Morphing active: 2.32 h
DFBI initialized but frozen: 2.07 h

This corresponds to a runtime reduction of roughly 10%.
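"Roughly 10%" can be stated precisely in two equivalent ways, as a reduction in wall time or as a speedup factor. Using the two measured runtimes above:

```python
def runtime_reduction(baseline_h: float, improved_h: float) -> float:
    """Fractional reduction in wall time relative to the baseline run."""
    return (baseline_h - improved_h) / baseline_h


def speedup(baseline_h: float, improved_h: float) -> float:
    """How many times faster the improved run is than the baseline run."""
    return baseline_h / improved_h


active, frozen = 2.32, 2.07  # measured runtimes in hours, from the list above
print(f"runtime reduction: {runtime_reduction(active, frozen):.1%}")  # 10.8%
print(f"speedup: {speedup(active, frozen):.2f}x")  # 1.12x
```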

The results are summarized in the figure above, which compares both the predicted resistance and the total solver runtime across several configurations.

The first thing to notice is that the predicted resistance remains fairly consistent across configurations. With only two exceptions (the fixed-hull case and the case with the frozen 6-DOF solver), the simulated results fall within roughly 1% of the reference value.
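The 1% criterion is a relative deviation from the reference resistance. A minimal helper for applying it, assuming a non-zero reference value; the numbers in the usage lines are placeholders, not the results shown in the figure:

```python
def relative_deviation(value: float, reference: float) -> float:
    """Relative deviation |value - reference| / |reference|."""
    if reference == 0.0:
        raise ValueError("reference must be non-zero")
    return abs(value - reference) / abs(reference)


def within_tolerance(value: float, reference: float, tol: float = 0.01) -> bool:
    """True if `value` lies within `tol` (default 1 %) of `reference`."""
    return relative_deviation(value, reference) <= tol


print(within_tolerance(100.5, 100.0))  # True: 0.5 % deviation
print(within_tolerance(102.0, 100.0))  # False: 2 % deviation
```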

The case using DFBI Translation and Rotation predicts resistance values very close to those of the Morphing case. This indicates that the main hydrodynamic behaviour is still captured without mesh morphing.

The runtime results reveal a much larger variation.

The GPU fixed-hull case is by far the fastest configuration. Removing DFBI Morphing allows the solver to run almost entirely on the GPU, which explains the dramatic reduction in runtime.

When DFBI Morphing is initialized but the motion is frozen, the runtime increases significantly compared with the fixed hull, but the run is still slightly faster than with fully active DFBI. The improvement is modest, roughly 10%, which confirms that freezing the motion does not eliminate the CPU-side overhead associated with Morphing.

The CPU-only case (32 cores) performs similarly to the GPU + DFBI Morphing configuration. This highlights an important point: once Morphing is active, the CPU portion of the workflow becomes large enough that the GPU advantage largely disappears.

Finally, the DFBI translation and rotation case is extremely fast, completing in about 0.12 h, which is only 40% of the solver time using 32 cores! Because this setup avoids Morphing, the solver can remain almost entirely GPU-accelerated.

What this means for GPU workflows

Taken together, the results reinforce a key point: the performance bottleneck is not the 6-DOF solver itself but the presence of DFBI Morphing in the model. This is why freezing the motion only produces a small speed improvement. The Morphing infrastructure is still active, so a significant part of the solver continues to run on the CPU.

For practical workflows, this means that the most efficient approach is to modify the VTT template to use DFBI Rotation and Translation motion instead of DFBI Morphing. Keep in mind, however, that with larger trim the whole mesh rotates with the hull, so we lose the advantage of keeping the free-surface refinement aligned with the free surface and well resolved in the far field of the hull.
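A rough geometric estimate helps to judge when this trade-off starts to matter. The sketch assumes that, with DFBI Rotation and Translation, a horizontal free-surface refinement band of half-thickness h moves rigidly with the hull, and it ignores wave elevation. Trim tilts the band by about d·sin θ at a horizontal distance d from the rotation pivot, and sinkage s shifts it everywhere, so in the worst case the undisturbed free surface leaves the band beyond d = (h - |s|) / sin θ. The function name and the example band thickness are hypothetical.

```python
import math


def max_resolved_distance(half_thickness_m: float, trim_deg: float,
                          sinkage_m: float = 0.0) -> float:
    """Horizontal distance (m) from the rotation pivot beyond which the
    undisturbed free surface leaves a refinement band that moves rigidly
    with the hull (worst-case rigid-body estimate, wave elevation ignored)."""
    margin = half_thickness_m - abs(sinkage_m)
    if margin <= 0.0:
        return 0.0  # sinkage alone already moves the band off the free surface
    tilt = abs(math.sin(math.radians(trim_deg)))
    if tilt == 0.0:
        return math.inf  # no trim: the band stays parallel to the free surface
    return margin / tilt


# Hypothetical band half-thickness of 0.05 m, for a few trim angles.
for trim in (0.1, 0.2, 0.5):
    print(f"trim {trim} deg: band holds out to about "
          f"{max_resolved_distance(0.05, trim):.1f} m from the pivot")
```

The distance shrinks in inverse proportion to the trim angle, which is why the Rotation and Translation shortcut is most attractive for cases with small trim.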

The Author

Florian Vesting, PhD
Contact: support@volupe.com
+46 768 51 23 46