Abstract
[en] Monte Carlo (MC) neutron transport simulations are widely used in the nuclear community to perform reference calculations with minimal approximations. The conventional MC method converges slowly, as dictated by the law of large numbers, which makes simulations computationally expensive. Cross section computation has been identified as the major performance bottleneck for MC neutron codes. Typically, cross section data are precalculated for each nuclide and stored in memory before the simulation, so that during the simulation only table lookups are required to retrieve the data and the compute cost is trivial. We implemented and optimized a large collection of lookup algorithms in order to accelerate this data retrieval process. Results show that significant speedup over the conventional binary search can be achieved on both CPU and MIC in unit tests, as opposed to real-case simulations. Vectorization has proved effective on the many-core architecture thanks to its 512-bit vector units; on CPU the improvement is limited by the smaller register size. Further optimizations such as memory reduction turn out to be very important, since they largely improve computing performance. As might be expected, all energy lookup proposals are entirely memory-bound: the compute units do little but wait for data. In other words, the computing capability of modern architectures is largely wasted. Another major issue with energy lookup is its huge memory requirement: cross section data at a single temperature for the up to 400 nuclides involved in a real-case simulation require nearly 1 GB of memory, which makes simulations with several thousand temperatures infeasible on current computer systems. To solve the problems associated with energy lookup, we began to investigate another cross section proposal, computed on the fly, called reconstruction.
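The conventional table lookup mentioned above can be illustrated with a minimal sketch. It assumes a sorted energy grid and linear interpolation between tabulated points; the function name `lookup_xs` and the list-based storage are hypothetical choices for illustration, not the layout used in the thesis.

```python
from bisect import bisect_right


def lookup_xs(energy_grid, xs_table, energy):
    """Return the cross section at `energy` from a pointwise table.

    Assumes `energy_grid` is sorted in ascending order and that
    `xs_table[i]` is the cross section tabulated at `energy_grid[i]`.
    """
    if not energy_grid[0] <= energy <= energy_grid[-1]:
        raise ValueError("energy outside the tabulated range")
    # Binary search for the interval [grid[i], grid[i + 1]] containing `energy`.
    # The clamp keeps the upper grid end inside the last interval.
    i = min(bisect_right(energy_grid, energy) - 1, len(energy_grid) - 2)
    e_lo, e_hi = energy_grid[i], energy_grid[i + 1]
    # Linear interpolation between the two bracketing table entries.
    frac = (energy - e_lo) / (e_hi - e_lo)
    return xs_table[i] + frac * (xs_table[i + 1] - xs_table[i])
```

Almost all of the work here is memory access: a handful of dependent reads during the search, followed by two multiplications. This is consistent with the memory-bound behaviour described in the abstract.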
The basic idea behind reconstruction is to perform the Doppler broadening computation (a convolution integral) on the fly each time a cross section is needed, with a formulation close to that of standard neutron cross section libraries and based on the same amount of data. Reconstruction converts the problem from memory-bound to compute-bound: only a few variables per resonance are required, instead of the conventional pointwise table covering the entire resolved resonance region. Although the memory footprint is greatly reduced, the method is very time-consuming. After a series of optimizations, results show that the reconstruction kernel benefits well from vectorization and can achieve 1806 GFLOPS (single precision) on a Knights Landing 7250, which represents 67% of its effective peak performance. Even though the optimization efforts significantly improve FLOP usage, this on-the-fly calculation is still slower than the conventional lookup method. We have therefore begun porting the code to GPGPU to exploit potentially higher performance as well as higher FLOP usage. In addition, a further evaluation is planned to compare lookup and reconstruction in terms of power consumption: with the help of hardware and software energy measurement support, we expect to find a compromise between performance and energy consumption in order to face the 'power wall' challenge that accompanies hardware evolution. (author)
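To make the memory-bound versus compute-bound contrast concrete, the following is a deliberately simplified sketch of on-the-fly broadening for a single resonance. It assumes a Lorentzian line shape at 0 K and a Gaussian thermal kernel, integrated with the trapezoidal rule. The function names and the parameters `e0`, `gamma`, `sigma0`, and `delta` are hypothetical, and this is not the formulation used in the thesis.

```python
import math


def lorentzian(e, e0, gamma, sigma0):
    """Unbroadened line shape: peak value sigma0 at e0, full width gamma."""
    x = 2.0 * (e - e0) / gamma
    return sigma0 / (1.0 + x * x)


def broadened(e, e0, gamma, sigma0, delta, n=2001, span=6.0):
    """Convolve the line shape with a Gaussian kernel of width delta.

    The kernel is proportional to exp(-(t / delta)**2) and is integrated
    over [-span * delta, +span * delta] with n trapezoidal points.
    """
    if delta <= 0.0:
        return lorentzian(e, e0, gamma, sigma0)
    h = 2.0 * span * delta / (n - 1)
    total = 0.0
    norm = 0.0
    for i in range(n):
        t = -span * delta + i * h
        w = math.exp(-(t / delta) ** 2)
        if i == 0 or i == n - 1:
            w *= 0.5  # trapezoidal end-point weight
        total += w * lorentzian(e - t, e0, gamma, sigma0)
        norm += w
    # Dividing by the summed weights normalizes the discrete kernel.
    return total / norm
```

Only three numbers per resonance are stored, but each evaluation costs thousands of floating-point operations. This is the trade-off described above: a much smaller memory footprint in exchange for a much higher arithmetic cost.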
[fr]
Access to the basic data, namely the cross sections, is the main performance bottleneck in solving the neutron transport equations by the Monte Carlo (MC) method. These cross sections characterize the probabilities of collision between neutrons and the nuclides that make up the material being traversed. They are specific to each nuclide and depend on the energy of the incident neutron and on the temperature of the material. Reference MC codes load these data into memory at all the temperatures occurring in the system and use a binary search algorithm on the tables that store the cross sections. On many-core architectures (typically Intel MIC), these methods are dramatically inefficient because of random memory accesses, which prevent the different levels of cache from being exploited, and because these algorithms lack vectorization. The first part of the thesis work consisted in finding alternatives to this basic algorithm, proposing the best trade-off between performance and memory footprint that takes advantage of the specific features of the MIC (multithreading and vectorization). In a second stage, we took a radically opposite approach, in which the data are not stored in memory but computed on the fly. A whole series of optimizations (of the algorithm, the data structures, vectorization, loop unrolling, and the influence of the data representation precision) yielded considerable gains over the initial implementation. Finally, the two approaches (data in memory and data computed on the fly) were compared in order to propose the best trade-off in terms of performance and memory footprint.
Beyond the targeted application (MC transport), this work is also a study that can be generalized on how to transform a problem initially limited by memory latency ('memory latency bound') into one that saturates the processor ('CPU-bound') and makes it possible to take advantage of many-core architectures.
Original Title
Optimisation du code Monte Carlo neutronique a l'aide d'accelerateurs de calculs
Primary Subject
Source
14 Dec 2017; 140 p; 121 refs.; Available from the INIS Liaison Officer for France, see the INIS website for current contact and E-mail addresses; Informatique
Record Type
Report
Literature Type
Thesis/Dissertation
Report Number
Country of publication
Reference Number
INIS Volume
INIS Issue