
Hardware Architecture

The code was run on Harvard's Odyssey cluster, a large-scale, heterogeneous high-performance computing cluster operated by Harvard University. Refer to the FAS RC website for more information.

Hardware Specification

To replicate the benchmarking results, please use the following specifications.

Software

To replicate the benchmarking results, please ensure that, at a minimum, the software dependencies below are satisfied.

Dependencies

Main Programming Language: Fortran 90

Secondary Programming Language: Python 2.7.x or 3.6.x (used only for I/O, plotting, and automating bash scripts)

Parallel Implementation:

Odyssey modules: intel/17.0.2-fasrc01 openmpi/2.1.0-fasrc01 fftw/3.3.5-fasrc01

Serial Implementation:

Odyssey modules: gcc/4.8.2-fasrc01 fftw/3.3.5-fasrc01

Libraries used: FFTW 3.3.5 (fftw/3.3.5-fasrc01) in both implementations; OpenMPI 2.1.0 (openmpi/2.1.0-fasrc01) in the parallel implementation.

Compiler

Parallel Implementation: Intel 17.0.2 (intel/17.0.2-fasrc01)

Serial Implementation: GCC 4.8.2 (gcc/4.8.2-fasrc01)

Installation

# Log in to Odyssey and load the required modules
ssh username@login.rc.fas.harvard.edu
module purge
module load intel/17.0.2-fasrc01
module load openmpi/2.1.0-fasrc01
module load fftw/3.3.5-fasrc01
# Clone the repository
git clone https://github.com/toledy/ParallelRayleighBenardConvection.git

Note: To clone a GitHub repository on Odyssey, it is strongly encouraged to use the SSH protocol (i.e. git clone git@github.com:toledy/ParallelRayleighBenardConvection.git), together with a personal SSH public key generated on Odyssey and added to your GitHub account. Instructions on how to do this are available here.

# In global.f90, set vtkloc (VTK output directory) and floc (field-data directory):
vim global.f90
character(len=84) :: vtkloc="PATH"
character(len=82) :: floc="PATH"
# Build the code
make all
# Adjust the run parameters and the SLURM submission script as needed
vim input.data
vim sbatchmpi.run
# Submit the job and monitor its status
sbatch sbatchmpi.run
sacct

Summary information will be written to Time_loop.out and any error logs to Time_loop.err. Relevant outputs (VTK files and Nusselt numbers at different time steps) are exported to the vtkdata and fdata directories, respectively, as specified in global.f90.
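As noted under Dependencies, Python is used for plotting; a quick look at the Nusselt-number history can be produced along the following lines. The file name fdata/nusselt.dat and the two-column (time, Nusselt number) layout are assumptions made purely for illustration and may not match the repository's actual output format.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical layout: a two-column text file (time, Nusselt number) in fdata
data = np.loadtxt("fdata/nusselt.dat")
plt.plot(data[:, 0], data[:, 1])
plt.xlabel("time")
plt.ylabel("Nusselt number")
plt.savefig("nusselt.png")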

Input File Structure

| Line # | Column 1 | Column 2 | Column 3 |
|--------|----------|----------|----------|
| 1 | $Pr$ | Initial $\alpha$ | $\alpha$ step |
| 2 | Initial $Ra$* | # of $Ra$s to do | $Ra$ increment multiplier |
| 3 | Time* | Time step* | Leave blank |
| 4 | $y$ at bottom | $y$ at top | Leave blank |
| 5 | Nx* | Ny* | Nz |
| 6 | x-refinement | y-refinement | z-refinement |
| 7 | Save to VTK* | Save field output* | Leave blank |

We retained the same input file structure used by the serial code. Because the code has been reduced to a time integrator, and the flowmap algorithm that finds steady solutions of the 2D Boussinesq equations has been removed, not all of the input parameters are relevant for the parallel code. The input.data file in the GitHub repository contains the recommended default parameter settings.

*Parameters that we modified for testing purposes.
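For orientation, the seven-line layout above can be read into named parameters as in the following Python sketch. This is illustrative only: the solver itself reads input.data from Fortran, and the field names and numeric types below are descriptive guesses rather than identifiers taken from the source code.

def read_input(path="input.data"):
    # Parse input.data according to the table above (illustrative sketch only)
    rows = [line.split() for line in open(path)]
    return {
        "Pr": float(rows[0][0]), "alpha0": float(rows[0][1]), "alpha_step": float(rows[0][2]),
        "Ra0": float(rows[1][0]), "n_Ra": int(rows[1][1]), "Ra_mult": float(rows[1][2]),
        "time": float(rows[2][0]), "dt": float(rows[2][1]),
        "y_bottom": float(rows[3][0]), "y_top": float(rows[3][1]),
        "Nx": int(rows[4][0]), "Ny": int(rows[4][1]), "Nz": int(rows[4][2]),
        "x_refine": float(rows[5][0]), "y_refine": float(rows[5][1]), "z_refine": float(rows[5][2]),
        "save_vtk": int(rows[6][0]), "save_fields": int(rows[6][1]),
    }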

I/O

At each specified timestep interval, a separate file per processor is exported to the fdata directory, containing the field solutions for $T$ and $u_y$ at that timestep. The files can be combined using the CombineFields.py script in the scripts directory.

cd scripts
python CombineFields.py
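CombineFields.py performs this merge in the repository. Conceptually, the operation stitches each processor's slab of the domain back into a single array; the sketch below illustrates the idea with NumPy, assuming, purely for illustration, plain-text per-rank files named fdata/<field>_<step>_<rank>.dat that each hold a contiguous slab of rows. The actual script's file names and format may differ.

import numpy as np

def combine_field(field, step, nprocs):
    # Hypothetical naming scheme: fdata/<field>_<step>_<rank>.dat, one file per rank
    slabs = [np.loadtxt("fdata/{}_{}_{}.dat".format(field, step, rank))
             for rank in range(nprocs)]
    # Stack the per-rank slabs back into the full 2D field
    return np.concatenate(slabs, axis=0)

# e.g. temperature field at step 1000 from an 8-process run
T_full = combine_field("T", 1000, 8)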

Performance Analysis

Given the extensive number of benchmarks run by varying (i) the domain size Nx/Ny, (ii) the number of nodes (-N), (iii) the number of MPI processes (-n), and (iv) the number of threads per process (-c), a Python script was written to automate the bash/SLURM workflow. The script followed roughly the approach sketched below; the results-directory layout, job-script template, and executable name in the sketch are placeholders rather than the exact contents of the production script.

import os
import subprocess

num_nodes_to_try = []      # list of node counts to pass to SLURM via -N
num_processes_to_try = []  # list of MPI process counts (-n)
num_threads_to_try = []    # list of thread counts per process (-c)
nx_to_try = []             # list of Nx domain sizes
ny_to_try = []             # list of Ny domain sizes

partition = "shared"

# SLURM job-script template; the mpirun line and ./EXECUTABLE are placeholders
sbatch_template = """#!/bin/bash
#SBATCH -N {N}
#SBATCH -n {n}
#SBATCH -c {c}
#SBATCH -p {partition}
mpirun -np {n} ./EXECUTABLE
"""

for N in num_nodes_to_try:
    for n in num_processes_to_try:
        for c in num_threads_to_try:
            for nx in nx_to_try:
                for ny in ny_to_try:
                    tag = "N{}_n{}_c{}_nx{}_ny{}".format(N, n, c, nx, ny)
                    outdir = os.path.join("results", tag)
                    # 2. Skip configurations that have already been benchmarked
                    if os.path.isdir(outdir):
                        continue
                    os.makedirs(outdir)
                    # 1. Rewrite line 5 of input.data (Nx Ny Nz) for this run
                    with open("input.data") as f:
                        lines = f.readlines()
                    nz = lines[4].split()[2]
                    lines[4] = "{} {} {}\n".format(nx, ny, nz)
                    with open("input.data", "w") as f:
                        f.writelines(lines)
                    # 3. Write the SBATCH.run script for this configuration
                    with open("SBATCH.run", "w") as f:
                        f.write(sbatch_template.format(N=N, n=n, c=c, partition=partition))
                    # 4. Submit; 5. outputs are copied into outdir once the run completes
                    subprocess.call(["sbatch", "SBATCH.run"])