ATLAS Scientific Computation on SMU ManeFrame

Welcome!

“ManeFrame” is the name of the SMU Scientific Computing Cluster, housed in the Data Center at the southern end of the SMU campus in Dallas, TX. ManeFrame runs SLC6 and provides a standard ATLAS Tier 3 computing environment to users who request it (see “First-Time Setup of ATLAS Environment” below), including all ATLAS software via CVMFS and the ability to transfer data to/from the GRID via common tools like RUCIO. The basic specs of the current (July 2015) ManeFrame system are as follows:

  • Interactive Login Systems: 3 general-use interactive login nodes are available (mflogin01.hpc.smu.edu, mflogin02, and mflogin03)
  • Batch and Interactive Computing Nodes - “Normal Memory”: 1084 nodes, each with 8 CPU cores and 24 GB of RAM (Intel(R) Xeon(R) CPU X5560 @ 2.80GHz)
  • Batch and Interactive Computing Nodes - “Big Memory”: 20 nodes, each with 8 CPU cores and 192 GB of RAM (Intel(R) Xeon(R) CPU X5560 @ 2.80GHz)
  • Resource Management System - SLURM: SLURM allocates access to resources (computing nodes) to users for some duration of time so they can perform work; it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes; and it arbitrates contention for resources by managing a queue of pending work. Users have access to various queues configured for different purposes (e.g. parallel computing jobs, serial jobs, interactive jobs, high-memory jobs). To get started, see “SLURM and Resource Management” below.
  • Filesystem - Shared LUSTRE filesystem with 1.2 PB of total scratch storage (not backed up). The LUSTRE system is configured as a GRID endpoint (SMU2_LOCALGROUPDISK). User home directories are stored on NFS; while limited in size (to a maximum of 3.1TB), they are backed up routinely.

Access - General Issues

You can contact Stephen Sekula (ssekula@smu.edu) if you are interested in conducting your ATLAS work on the SMU ManeFrame system. This request will be considered and, if approved, passed along to the Scientific Computing Administrator for ManeFrame, Amit Kumar (ahkumar@smu.edu).

Access by External Users

Access Policy

For now (July - December, 2015), SMU is hosting a few “ATLAS power users” to conduct their work on ManeFrame and provide quick access to resources for fast turnaround of time-sensitive results; the work these power users are conducting is synergistic with the technical, upgrade, and physics interests of the SMU ATLAS Group. We are exploring other models by which we might share our Tier 3 resources with ATLAS collaborators, and expect to make decisions about those based on this trial period with power users.

Mailing List

External ManeFrame users should request to be added to the mailing list atlas-maneframe-external (hosted at SMU) if they are not already members. If external ManeFrame users have any questions or concerns, they should feel free to post to this list.

Acknowledgement by External Users

For any user outside the institution, we have the following policy for acknowledging use of ManeFrame.

Acknowledgement Policy

During the period July 2015 - December 2016, we request that any external collaborator who uses ManeFrame acknowledge that use in any results that are shown internally within ATLAS. We suggest a line on the concluding slide of any presentation or in an internal ATLAS supporting document: “We gratefully acknowledge SMU's Center for Scientific Computation for their support and for the use of the SMU ManeFrame Tier 3 ATLAS System.”

We do not ask for acknowledgement in public papers, conference notes, or presentations; only in internal documents, where permitted.

If you produce any internal ATLAS results using ManeFrame and acknowledge the support from our system, please send Steve Sekula (ssekula@smu.edu) a link to the presentation or note. This will help us to assess the pattern of usage and any impact this usage is having on broader ATLAS activities. We will report such achievements to our University leadership at SMU, which will help them to understand the impact of ManeFrame.

Usage

Logging Into ManeFrame

For now, we request that external (non-SMU) ATLAS users log into the following machine:

  • mflogin03.hpc.smu.edu
ssh <USERNAME>@mflogin03.hpc.smu.edu

SMU users are encouraged to use any available interactive login machine.

Interactive machines are NOT to be used for any serious work, or even serious testing. Instead, testing and other heavy work prior to submitting batch jobs should be run using SLURM's “interactive” queue. See the supporting materials below for the SLURM resource management system to learn how to do this.

First-Time Setup of ATLAS Environment

To access standard ATLAS tools, you need to perform a one-time setup of your environment. Edit your .bash_profile login script and make sure it looks similar to the following:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# ATLAS Setup
source /grid/software/ATLASLocalRootBase/setup.sh
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh' 

Log out of ManeFrame, and then log back in. The next time (and all subsequent times) you log in, you should see something like this:

...Type localSetupAGIS to setup AGIS
...Type localSetupAtlantis to setup Atlantis
...Type localSetupDQ2Client to use DQ2 Client
...Type localSetupEIClient to setup EIClient
...Type localSetupEmi to use emi
...Type localSetupFAX to use FAX
...Type localSetupGanga to use Ganga
...Type localSetupGcc to use alternate gcc
...Type localSetupPacman to use Pacman
...Type localSetupPandaClient to use Panda Client
...Type localSetupPyAMI to setup pyAMI
...Type localSetupPoD to setup Proof-on-Demand
...Type localSetupROOT to setup (standalone) ROOT
...Type localSetupRucioClients to setup rucio-clients
...Type localSetupSFT to setup SFT packages
...Type localSetupXRootD to setup XRootD
...Type showVersions to show versions of installed software
...Type asetup to setup a release (changeASetup to change asetup version)
...Type rcSetup to setup an ASG release (changeRCSetup to change rcSetup ver.)
...Type diagnostics for diagnostic tools
...Type helpMe for more help
...Type printMenu to show this menu

19 Jun 2015 
  You are encouraged to use rucio instead of DQ2 clients, type
     localSetupRucioClients.
     For more info: https://twiki.cern.ch/twiki/bin/view/AtlasComputing/RucioClientsHowTo


02 Jul 2015 
   Standalone ROOT version 6.02.12 available for Linux (gcc48) and for MacOSX.

   If there are no problems, these will be made the default aftr 9 Jul 2015.

   (Usage: showVersions root).  


02 Jul 2015 
  New version of emi is available and will be the default next week.
  If you want to try it, simply do (instead of setupATLAS)
    setupATLAS --test=testEmi

[<USERNAME>@mflogin02 ~]$ 

You should now be able to run all standard ATLAS tools for setting up releases, checking out RootCore and AnalysisBase packages, running ATHENA, transferring data using RUCIO, compiling, running ROOT-based programs, etc.
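For example, a typical session to set up the ATLAS environment and pull a dataset from the GRID might look like the following. This is a minimal sketch: the dataset name is a placeholder, and transferring data requires a valid GRID certificate registered with the ATLAS VO.

setupATLAS                        # alias defined in .bash_profile above
localSetupRucioClients            # from the menu printed at login
voms-proxy-init -voms atlas       # create a GRID proxy (needs your GRID certificate)
rucio download <DATASET_NAME>     # <DATASET_NAME> is a placeholder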

Disk Storage Information

Your home directory is backed up but provides a limited amount of storage space. You should primarily store “irreplaceable” things in your home directory, like original code. Your home area is limited to a total of 3TB.

You also have access to “scratch” space totalling 1.2PB. This is a shared resource. You can find your scratch area here:

/scratch/users/<USERNAME>

Note that there is already a subdirectory in this location:

/scratch/users/<USERNAME>/_small

Anything written to _small/ goes first to the limited amount of fast disks that make up about one-third of the shared filesystem. If that area is full, data is automatically and transparently written to the much larger pool of slower disks. You should feel free to simply direct output to the _small area all the time; it will be intelligently stored on the best disk resources available at that time.
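For example, to create a working area under your scratch space and check how much it is using (a simple sketch using standard shell commands; “MyProject” is a placeholder name):

mkdir -p /scratch/users/$USER/_small/MyProject    # project area under the fast "_small" tier
du -sh /scratch/users/$USER/                      # total size of your scratch area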

A Comment on I/O speeds and LUSTRE

The very first time you read a file from LUSTRE, it can appear to be slow (e.g. 1 kHz read speeds from a typical DAOD). However, this first pass essentially caches the file for faster access on subsequent reads (for a short period of time, e.g. a day or so). So the first time you read a few thousand DAOD files, the jobs run at about 1 kHz; the second time you run on the same files, you may notice a 5-10 fold increase in read speed. This is normal, and typical of a filesystem like LUSTRE.

SLURM and Resource Management

We have an excellent tutorial on resource management and job submission via SLURM in the SMU Center for Scientific Computation documentation (see “Questions or Problems?” below for the link).

General documentation on SLURM can be found on the project's website, https://slurm.schedmd.com/.
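A few standard SLURM commands are enough to get oriented (the partitions you see are specific to ManeFrame; <JOBID> is a placeholder):

sinfo               # list the available partitions (queues) and their states
squeue -u $USER     # show your own pending and running jobs
scancel <JOBID>     # cancel one of your jobs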

Running tests of your code

The interactive frontend machines are shared resources with modest hardware specifications and are not intended for any serious code testing. That is what the batch worker machines are for, and you can even run on them interactively via SLURM!

To run a test of your code just like you would straight from the command line on the interactive frontend machine, simply do this:

srun -o test.log ./MyAnalysis.exe <ARGUMENTS> &

You can then tail the log file,

tail -f test.log

and watch your code run. It's not running on the interactive login machine, but rather in the “interactive” queue of the batch system.
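If you would rather work in a full shell on a worker node than wrap a single command with srun, you can request an interactive session. This is a sketch that assumes the partition is named “interactive”, as described above:

srun -p interactive --pty bash -i

When the shell exits, the allocation is released.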

What queue(s) to use?

If you are running a “standard” ATLAS xAOD analysis executable (e.g. a CxAOD production, or a program that reads CxAODs and generates output from them), you should probably use the “serial” queue. It's tuned for that style of job.

If you are not sure which queue to use, contact Stephen Sekula (ssekula@smu.edu) and Amit Kumar (ahkumar@smu.edu), describe the kind of work you are doing, and they can help you find the right queue or perhaps even create a new one to suit your needs.
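If you prefer to write your own batch script rather than use the maneframe-run wrapper described in the next section, a minimal sbatch script for the “serial” queue might look like the sketch below; the job name, time, memory, and executable are placeholders to adapt.

#!/bin/bash
#SBATCH -J myanalysis           # job name (appears in squeue)
#SBATCH -p serial               # the "serial" partition discussed above
#SBATCH -t 01:00:00             # wall-time limit (HH:MM:SS)
#SBATCH --mem=2G                # memory limit
#SBATCH -o myanalysis_%j.log    # log file; %j expands to the job ID

./MyAnalysis.exe <ARGUMENTS>    # placeholder executable and arguments

Submit it with sbatch myjob.sh and monitor it with squeue -u $USER.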

maneframe-run: a simple script to launch jobs

Just as prun (PANDA run) allows you, with just a few command-line options, to launch a GRID production, maneframe-run allows the same on ManeFrame. It is a simple Python-based wrapper script that collects input using options similar to prun, constructs SLURM batch configuration files, and launches the jobs once it has created the job configuration directories. Here is where to find the application, followed by its usage information:

~sekula/bin/maneframe-run

Usage:

maneframe-run --inDS <input directory> --outDS <output directory> --exec <job parameters> --email <email address> [-m/--maxNFilesPerJob <number>] [-t/--maxTimePerJob <HH:MM:SS>] [-h/--help] 
 
   -h/--help              Print help information (this info) 
   -i/--inDS              The location of a directory containing ROOT files to process [REQUIRED] 
   -o/--outDS             The location to which to write all output, including the SLURM job file and logs [REQUIRED] 
   -x/--exec              The program that should be run, along with its options [REQUIRED] 
   -e/--email             The destination email address for SLURM failure notifications [REQUIRED] 
   -m/--maxNFilesPerJob   The maximum number of ROOT files for each submitted job (Default: 5) 
   -t/--maxTimePerJob     The maximum allowed time for a job to finish execution (Default: 01:00:00, or 1 hour) 


EXAMPLE:

maneframe-run --inDS /scratch/users/me/_small/mc15_13TeV.410000/ --outDS mc15_13TeV.410000_processed/ --exec "MyAnalysis input.txt" --email me@someplace.com --maxNFilesPerJob 10 --maxTimePerJob "00:30:00"

So an example bash submission script might look like:

#!/bin/bash

INPUT=ThisIsYourInputFileDirectoryPATH
OUTPUT=/scratch/users/<USERNAME>/_small/YourProjectName/Processed/
EXECUTABLE="testRun submitDir input.txt"
EMAIL="YourEmail@smu.edu"
MAXFILES=10
MAXTIME=00:30:00

~sekula/bin/maneframe-run -i $INPUT -o $OUTPUT -x "$EXECUTABLE" -e $EMAIL -m $MAXFILES -t "$MAXTIME"

Example Code: Loading and Processing a list of ROOT files

maneframe-run creates a file called input.txt that contains a comma-separated list of ROOT files to be processed by a given batch job. You can enable your code to handle such a file fairly easily. Here, we assume you have created a <PACKAGE>/utils/MyAnalysis.cxx file following the xAOD Tutorial Page. You can add the following code to the example to allow it to open and parse the input.txt file:

  #include <fstream>
  #include <TObjArray.h>
  #include <TObjString.h>
  // This is an expansion of the example code from the xAOD Tutorial Page for ATLAS:
  //
  // use SampleHandler to scan all of the subdirectories of a directory for a particular MC single file:
  std::string pathstring = "$ALRB_TutorialData/r6630/";
  if (argc > 2) {
    pathstring = static_cast<std::string>(argv[2]);
  }
  const char* inputFilePath = gSystem->ExpandPathName (pathstring.c_str());

  if (TString(inputFilePath).Contains("input.txt")) {
    // This is an input.txt file - it contains a CSV list of ROOT files.                                                       
    // Parse it and add the root files to the job                                                                              
    SH::SampleLocal* sample = new SH::SampleLocal( "UNIQUE_LABEL" );
  
    std::fstream fs;
    fs.open(inputFilePath, std::ios::in);

    if (fs.is_open()) {
      std::string line;
      while (std::getline(fs, line)) {
        // each line may contain a comma-separated list of ROOT files
        TObjArray* rootfilelist = TString(line).Tokenize(",");

        for (Int_t i = 0; i < rootfilelist->GetEntries(); i++) {
          TString sampleName = ((TObjString*)rootfilelist->At(i))->GetString();
          sample->add(sampleName.Data());
        }

        delete rootfilelist; // Tokenize allocates a new TObjArray
      }
    }

    sh.add( sample );

  } else {
   // Handle the input file path using SampleHandler's scanning capabilities, as in the tutorial.
  }

so that (keeping with the tutorial's example) the final file content should be along the lines of:

#include "xAODRootAccess/Init.h"
#include "SampleHandler/SampleHandler.h"
#include "SampleHandler/ScanDir.h"
#include "SampleHandler/ToolsDiscovery.h"
#include "EventLoop/Job.h"
#include "EventLoop/DirectDriver.h"
#include "SampleHandler/DiskListLocal.h"
#include "SampleHandler/SampleLocal.h"
#include <EventLoop/OutputStream.h>
#include <EventLoopAlgs/NTupleSvc.h>
#include <TSystem.h>
#include <TObjArray.h>
#include <TObjString.h>
#include <fstream>

#include "MyAnalysis/MyxAODAnalysis.h"

int main (int argc, char *argv[]) {
  // Take the submit directory from the input if provided:
  std::string submitDir = "submitDir";

  if (argc > 1) submitDir = argv[1];

  // Set up the job for xAOD access:
  xAOD::Init().ignore();

  // Construct the samples to run on:
  SH::SampleHandler sh;

  // use SampleHandler to scan all of the subdirectories of a directory for particular MC single file:
  std::string pathstring = "$ALRB_TutorialData/r6630/";

  if (argc > 2) {
    pathstring = static_cast<std::string>(argv[2]);
  }
  const char *inputFilePath = gSystem->ExpandPathName(pathstring.c_str());

  if (TString(inputFilePath).Contains("input.txt")) {
    // This is an input.txt file - it contains a CSV list of ROOT files.
    // Parse it and add the root files to the job
    SH::SampleLocal *sample = new SH::SampleLocal("EnterNameOfYourOutputFileHere");

    std::fstream fs;
    fs.open(inputFilePath, std::ios::in);

    if (fs.is_open()) {
      std::string line;
      while (std::getline(fs, line)) {
        // each line may contain a comma-separated list of ROOT files
        TObjArray *rootfilelist = TString(line).Tokenize(",");

        for (Int_t i = 0; i < rootfilelist->GetEntries(); i++) {
          TString sampleName = ((TObjString*)rootfilelist->At(i))->GetString();
          sample->add(sampleName.Data());
        }

        delete rootfilelist; // Tokenize allocates a new TObjArray
      }
    }

    sh.add(sample);
  }
  else {
    SH::ScanDir().filePattern("GetThisFileNameFromTheTutorialOrYourOwnFiles").scan(
      sh, inputFilePath);
  }

  // Set the name of the input TTree. It's always "CollectionTree" for xAOD files.
  sh.setMetaString("nc_tree", "CollectionTree");

  // Print what we found:
  sh.print();

  // Create an EventLoop job:
  EL::Job job;
  job.sampleHandler(sh);
  job.options()->setDouble(EL::Job::optMaxEvents, 500); // limit the number of events run over

  // Create an output and an associated ntuple
  EL::OutputStream output("myOutput");
  job.outputAdd(output);
  EL::NTupleSvc *ntuple = new EL::NTupleSvc("myOutput");
  job.algsAdd(ntuple);

  // Add our analysis to the job:
  MyxAODAnalysis* alg = new MyxAODAnalysis();
  job.algsAdd(alg);

  alg->outputName = "myOutput"; // name of the output of the algorithm

  // Run the job using the local/direct driver:
  EL::DirectDriver driver;
  driver.submit(job, submitDir); // This will be obsolete when a SLURM driver is available

  return 0;
} // main
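Once this compiles in your RootCore/AnalysisBase work area, you can test it interactively before handing it to maneframe-run. A rough sketch (rc compile is the standard RootCore build command; “testRun” matches the executable name used in the submission-script example above, but it may be “MyAnalysis” depending on what you named your utils/ source file; the paths are placeholders):

rc compile                                      # build the work area containing your package
testRun submitDir /path/to/dataset/directory/   # run over a directory, as in the tutorial
testRun submitDir /path/to/job/input.txt        # or over an input.txt file from maneframe-run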

Combining Output Files

When maneframe-run submits your jobs, it does so in batches; each batch runs on ManeFrame and writes its output to the OUTPUT directory you passed to maneframe-run. You will then want to combine these outputs so you can look at the results in a single file. The following script uses ROOT's hadd utility to produce a single ROOT file (location and name given by the user) as output:

#!/bin/bash

SOURCE=()
TARGET=/This/Is/The/PATH/To/Your/Desired/OutputFile.root
i=0

# collect the output files of interest
# (adjust the grep pattern to match your own output directory names)
while IFS= read -r file; do
	SOURCE[i]=$file
	((i++))
done < <(find /scratch/users/$USER/ -print | grep -i '.*data.*/' | sort)

hadd "$TARGET" "${SOURCE[@]}"
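Alternatively, if all of the output files already sit in a single directory, you can call hadd on them directly; for example (paths are placeholders based on the maneframe-run example above):

hadd combined.root /scratch/users/$USER/_small/YourProjectName/Processed/*.root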

Using the EventLoop (EL) SlurmDriver

As of about November 2016, AnalysisBase releases (after 2.4.21) contain a new driver in the EventLoop package, SlurmDriver. Like the other drivers for running code locally or on other batch management systems, the EL::SlurmDriver allows you to execute your analysis binary and have it submit and manage jobs on the SLURM queue management system. This frees you from having to write sbatch scripts or your own independent scripts for submitting framework-based xAOD analysis jobs.

The present EL::SlurmDriver code submits one job per ROOT file processed by your code. To set up the driver in, for instance, your standalone executable's main() function:

  EL::SlurmDriver driver;
  driver.SetJobName("MyAnalysis");
  driver.SetAccount("default");
  driver.SetPartition("serial");
  driver.SetRunTime("01:00:00");
  driver.SetMemory("2G");
  driver.SetConstrain("");

When you create an EL::SlurmDriver, you MUST configure all of its options. You can leave some of them blank, but every option has to be set to something. The lines above set the following options:

  • The Job Name, which appears in squeue and labels your individual jobs (see the monitoring example after this list)
  • Account - this is always “default”, unless a special account has been set up to allow you to use a protected partition on SLURM
  • Partition - the name of the partition where jobs will run, e.g. serial, development, etc.
  • RunTime - the maximum running time of the job, in HH:MM:SS format; the example above requests one hour (01:00:00)
  • Memory - the memory limit for your job
  • Constrain - this constrains your job to specific resources, if there are multiple resource options in the partition
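Once the driver has submitted your jobs, you can monitor and control them with standard SLURM commands using the job name set above (a sketch; “MyAnalysis” matches the SetJobName example, and <JOBID> is a placeholder):

squeue -u $USER -n MyAnalysis    # list only your jobs with this job name
sacct -j <JOBID>                 # accounting/status information for a given job
scancel -n MyAnalysis            # cancel all of your jobs with this job name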

Known Issues

Clock Skew Warnings During Builds

There is a known issue in which builds produce many “clock skew” warnings. This is under investigation, but it appears to be harmless.

Questions or Problems?

Please contact the SMU scientific computing team with questions or comments. Learn more: http://faculty.smu.edu/csc/documentation/usage.html
