In stellar dynamics, direct N-body codes are generally used due to their high accuracy in orbital integration. These codes consider all pairwise interactions and are thus very slow. On the other hand, N-body methods used in cosmology (such as Barnes–Hut tree) are much faster per particle, but they are usually unsuitable for star clusters. The advantage of using a direct code is diminished when parts of the stellar system (or all of it) have very long relaxation times. This is the case in galactic nuclei, dwarf galaxies, and other astrophysical systems. The self-consistent field (SCF) method is a way to estimate the gravitational field through a series of smooth potential terms. Based on it, I wrote a collisionless N-body code called ETICS. This code is parallel (using MPI) and GPU-accelerated.
It was a challenge to implement the SCF method on GPU efficiently. Special memory restrictions meant that algorithms designed for CPU, specifically to calculate spherical harmonics, would not work, and had to be devised from scratch using different recursion relations and order of loops.
As a follow-up project, I wrote GRAPite, a software layer that essentially transformed the φgrape N-body code into a hybrid of direct and SCF methods. This allows collisional dynamics to be followed accurately in parts of the stellar system where it is needed, allowing the rest of the system to be integrated much faster using SCF. The main challenge here was to wrap the hybridization mechanism in such a way that φgrape was left almost unmodified. A second issue was calculating the jerk in the SCF part of the code, which was done with divided differences.
Reference (for ETICS): Meiron et al. (2014)