atlas_maneframe

Differences

This shows you the differences between two versions of the page.

atlas_maneframe [2015/07/05 09:08] – [SLURM and Resource Management] sekula
atlas_maneframe [2018/10/28 10:28] (current) sekula
Line 20: Line 20:
  
 For now (July - December, 2015), SMU is hosting a few "ATLAS power users" to conduct their work on ManeFrame and provide quick access to resources for fast turnaround of time-sensitive results; the work these power users are conducting is synergistic with the technical, upgrade, and physics interests of the SMU ATLAS Group. We are exploring other models by which we might share our Tier 3 resources with ATLAS collaborators, and expect to make decisions about those based on this trial period with power users.
 +
 +**Mailing List**
 +
 +External ManeFrame users should request to be added to the mailing list ''atlas-maneframe-external'' (hosted at SMU) if they are not already members. If external ManeFrame users have any questions or concerns, they should feel free to post to this list.
  
 ===== Acknowledgement by External Users =====
Line 27: Line 31:
 **Acknowledgement Policy**

-During the period July - December 2015, we request that any external collaborator who uses ManeFrame please acknowledge your use of the system in any results that are shown //internally within ATLAS//. We suggest a line on the concluding slide of any presentation or in an internal ATLAS supporting document: **"We gratefully acknowledge SMU's Center for Scientific Computation for their support and for the use of the SMU ManeFrame Tier 3 ATLAS System."**
 +During the period July 2015 - December 2016, we request that any external collaborator who uses ManeFrame please acknowledge your use of the system in any results that are shown //internally within ATLAS//. We suggest a line on the concluding slide of any presentation or in an internal ATLAS supporting document: **"We gratefully acknowledge SMU's Center for Scientific Computation for their support and for the use of the SMU ManeFrame Tier 3 ATLAS System."**

 We do not ask for acknowledgement in public papers, conference notes, or presentations; only in internal documents, where permitted.
Line 186: Line 190:
  
 <code>
-maneframe-run --inDS <input directory> --outDS <output directory> --exec <job parameters> --email <email address> [-m/--maxNFiledPerJob <number>] [-t/--maxTimePerJob <HH:MM:SS>] [-h/--help]
 +maneframe-run --inDS <input directory> --outDS <output directory> --exec <job parameters> --email <email address> [-m/--maxNFilesPerJob <number>] [-t/--maxTimePerJob <HH:MM:SS>] [-h/--help]

    -h/--help              Print help information (this info)
Line 202: Line 206:
 </code>
  
-===== Known Issues =====

-==== Clock Skew Warnings During Builds ====

-There is a known effect that, while doing builds, you get a lot of "clock skew" warnings. This is under investigation. It appears to be harmless
 +So an example bash submission script might look like:
 +<code>
 +#!/bin/bash

 +INPUT=ThisIsYourInputFileDirectoryPATH
 +OUTPUT=/scratch/users/<USERNAME>/_small/YourProjectName/Processed/
 +EXECUTABLE="testRun submitDir input.txt"
 +EMAIL="YourEmail@smu.edu"
 +MAXFILES=10
 +MAXTIME=00:30:00

 +~sekula/bin/maneframe-run -i $INPUT -o $OUTPUT -x "$EXECUTABLE" -e $EMAIL -m $MAXFILES -t "$MAXTIME"
 +</code>
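
Before submitting, it can be useful to sanity-check the values handed to maneframe-run. The following is a minimal sketch, not part of maneframe-run itself: ''valid_hms'' is a hypothetical helper name, and it only checks that a string has the HH:MM:SS shape shown in the usage message for -t/--maxTimePerJob. It does not check that the values are sensible.

```shell
#!/bin/bash

# Hypothetical helper (not part of maneframe-run): succeed only if the
# argument consists of three two-digit fields separated by colons.
valid_hms() {
  case "$1" in
    [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

MAXTIME=00:30:00
if valid_hms "$MAXTIME"; then
  echo "time limit OK: $MAXTIME"
else
  echo "time limit must look like HH:MM:SS, got: $MAXTIME" >&2
fi
```

A check like this could sit just above the maneframe-run call in a submission script, so that a mistyped time limit is caught before any jobs are queued.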
  
 +=== Example Code: Loading and Processing a list of ROOT files ===
  
-===== Questions or Problems? =====

-Please email both of the following people with questions, comments, suggestions, or any other matters that arise:

-  * Stephen Sekula (ssekula@smu.edu) - SMU ATLAS Computational Physics Liaison
-  * Amit Kumar (ahkumar@smu.edu) - SMU Scientific Computing Administrator

-Or just fill out the form below, which will result in us receiving an email.

-<form>
-Action mail ssekula@smu.edu ahkumar@smu.edu
-Thanks "Your information has been sent to Stephen Sekula and Amit Kumar. You should expect a response shortly."

-textarea "Your Name" x2
-select "Category" "Comment|Request|Suggestion|Question|Urgent Matter"
-textarea "Your Comments" x10 !

-email "Your E-Mail Address"

-Fieldset "Submit your message"
-submit "Submit"
-</form>
 +''maneframe-run'' creates a file called ''input.txt'' that contains a comma-separated list of ROOT files to be processed by a given batch job. You can enable your code to handle such a file fairly easily. Here, we assume you have created a ''<PACKAGE>/utils/MyAnalysis.cxx'' file following the [[https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/SoftwareTutorialxAODAnalysisInROOT#Alternative_Run_the_job_from_a_c|xAOD Tutorial Page]]. You can add the following code to that example to allow it to open and parse the ''input.txt'' file:

 +<code>
 +  #include <fstream>
 +  // This is an expansion of the example code from the xAOD Tutorial Page for ATLAS:
 +  //
 +  // use SampleHandler to scan all of the subdirectories of a directory for particular MC single file:
 +  std::string pathstring = "$ALRB_TutorialData/r6630/";
 +  if (argc > 2) {
 +    pathstring = static_cast<std::string>(argv[2]);
 +  }
 +  const char* inputFilePath = gSystem->ExpandPathName (pathstring.c_str());

 +  if (TString(inputFilePath).Contains("input.txt")) {
 +    // This is an input.txt file - it contains a CSV list of ROOT files.
 +    // Parse it and add the root files to the job
 +    SH::SampleLocal* sample = new SH::SampleLocal( "UNIQUE_LABEL" );
 +
 +    std::fstream fs;
 +    fs.open(inputFilePath);

 +    if (fs.good() && fs.is_open()) {
 +      while (!fs.eof()) {
 +        std::string line;
 +        getline( fs, line );

 +        TObjArray* rootfilelist = TString(line).Tokenize(",");

 +        for (Int_t i = 0; i < rootfilelist->GetEntries(); i++) {
 +          TString sampleName = ((TObjString*)rootfilelist->At(i))->GetString();
 +          sample->add(sampleName.Data());
 +        }
 +      }
 +    }

 +    sh.add( sample );

 +  } else {
 +   // Handle the input file path using SampleHandler's scanning capabilities, as in the tutorial.
 +  }
 + 
 +</code> 
 +so that (keeping with the tutorial's example) the final file content should be along the lines of 
 +<code> 
 +#include "xAODRootAccess/Init.h"
 +#include "SampleHandler/SampleHandler.h" 
 +#include "SampleHandler/ScanDir.h" 
 +#include "SampleHandler/ToolsDiscovery.h" 
 +#include "EventLoop/Job.h" 
 +#include "EventLoop/DirectDriver.h" 
 +#include "SampleHandler/DiskListLocal.h" 
 +#include "SampleHandler/SampleLocal.h" 
 +#include <EventLoop/OutputStream.h> 
 +#include <EventLoopAlgs/NTupleSvc.h> 
 +#include <TSystem.h> 
 +#include <fstream> 
 + 
 +#include "MyAnalysis/MyxAODAnalysis.h" 
 + 
 +int main (int argc, char *argv[]) { 
 +  // Take the submit directory from the input if provided: 
 +  std::string submitDir = "submitDir"; 
 + 
 +  if (argc > 1) submitDir = argv[1]; 
 + 
 +  // Set up the job for xAOD access: 
 +  xAOD::Init().ignore(); 
 + 
 +  // Construct the samples to run on: 
 +  SH::SampleHandler sh; 
 + 
 +  // use SampleHandler to scan all of the subdirectories of a directory for particular MC single file: 
 +  std::string pathstring = "$ALRB_TutorialData/r6630/"; 
 + 
 +  if (argc > 2) { 
 +    pathstring = static_cast<std::string>(argv[2]); 
 +  } 
 +  const char *inputFilePath = gSystem->ExpandPathName(pathstring.c_str()); 
 + 
 +  if (TString(inputFilePath).Contains("input.txt")) { 
 +    // This is an input.txt file - it contains a CSV list of ROOT files. 
 +    // Parse it and add the root files to the job 
 +    SH::SampleLocal *sample = new SH::SampleLocal("EnterNameOfYourOutputFileHere"); 
 + 
 +    std::fstream fs; 
 +    fs.open(inputFilePath); 
 + 
 +    if (fs.good() && fs.is_open()) { 
 +      while (!fs.eof()) { 
 +        std::string line; 
 +        getline(fs, line); 
 + 
 +        TObjArray *rootfilelist = TString(line).Tokenize(","); 
 + 
 +        for (Int_t i = 0; i < rootfilelist->GetEntries(); i++) { 
 +          TString sampleName = ((TObjString*)rootfilelist->At(i))->GetString(); 
 +          sample->add(sampleName.Data()); 
 +        } 
 +      } 
 +    } 
 + 
 +    sh.add(sample); 
 +  } 
 +  else { 
 +    SH::ScanDir().filePattern("GetThisFileNameFromTheTutorialOrYourOwnFiles").scan(
 +      sh, inputFilePath); 
 +  } 
 + 
 +  // Set the name of the input TTree. It's always "CollectionTree" for xAOD files. 
 +  sh.setMetaString("nc_tree", "CollectionTree"); 
 + 
 +  // Print what we found: 
 +  sh.print(); 
 + 
 +  // Create an EventLoop job: 
 +  EL::Job job; 
 +  job.sampleHandler(sh); 
 +  job.options()->setDouble(EL::Job::optMaxEvents, 500); // limit the number of events run over 
 + 
 +  // Create an output and an associated ntuple
 +  EL::OutputStream output("myOutput"); 
 +  job.outputAdd(output); 
 +  EL::NTupleSvc *ntuple = new EL::NTupleSvc("myOutput"); 
 +  job.algsAdd(ntuple); 
 + 
 +  // Add our analysis to the job: 
 +  MyxAODAnalysis* alg = new MyxAODAnalysis(); 
 +  job.algsAdd(alg); 
 + 
 +  alg->outputName = "myOutput"; // name of the output of the algorithm 
 + 
 +  // Run the job using the local/direct driver: 
 +  EL::DirectDriver driver; 
 +  driver.submit(job, submitDir); // This will be obsolete when a SLURM driver is available 
 + 
 +  return 0; 
 +} // main 
 +</code> 
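
Before running the analysis binary on a batch node, it can help to inspect the ''input.txt'' file from the shell. The following is a minimal sketch, assuming only what is stated above: that ''input.txt'' holds a comma-separated list of ROOT files. The helper name ''list_inputs'' is hypothetical and is not provided by maneframe-run.

```shell
#!/bin/bash

# Hypothetical helper: print one entry per line from a file that holds a
# comma-separated list, dropping empty entries and blank lines.
list_inputs() {
  tr ',' '\n' < "$1" | sed '/^[[:space:]]*$/d'
}

# Example: report any listed file that does not exist on disk.
if [ -f input.txt ]; then
  list_inputs input.txt | while IFS= read -r f; do
    [ -e "$f" ] || echo "missing: $f" >&2
  done
fi
```

This mirrors what the C++ code does with Tokenize(","), but lets you spot a missing or mistyped file path before a job is spent discovering it.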
 + 
 +=== Combining Output Files === 
 +When maneframe-run submits your jobs to ManeFrame, it does so in batches. Each batch runs on ManeFrame and returns its output to the OUTPUT directory you passed to maneframe-run. To look at the combined results in a single file, you then need to merge these outputs. The following script uses ROOT's hadd utility to produce a single ROOT file (with a location and name chosen by the user) as output:
 +<code> 
 +#!/bin/bash 
 + 
 +SOURCE=() 
 +TARGET=~This/Is/The/PATH/To/Your/Desired/OutputFile.root 
 +i=0 
 + 
 +# find all the .root files of interest 
 +while IFS= read -r file; do 
 + SOURCE[i]=$file 
 + ((i++)) 
 +done < <(find /scratch/users/$USER/ -type f -name '*.root' | grep -i 'data.*/' | sort)
 +
 +hadd "$TARGET" "${SOURCE[@]}"
 +</code> 
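
Since hadd will try to merge whatever it is handed, it can be worth previewing the file list first. The following is a minimal sketch of such a dry run. The helper name ''find_root_files'' and the ''OUTDIR'' variable are hypothetical, and the default path simply reuses the /scratch/users/$USER location from the script above.

```shell
#!/bin/bash

# Hypothetical helper: print every regular file ending in .root under the
# given directory, in sorted order.
find_root_files() {
  find "$1" -type f -name '*.root' | sort
}

# Dry run: list the files that would be merged, without calling hadd.
OUTDIR=${OUTDIR:-/scratch/users/$USER}
if [ -d "$OUTDIR" ]; then
  echo "files that would be passed to hadd:"
  find_root_files "$OUTDIR"
fi
```

If the list looks right, the same files can then be handed to hadd as in the script above.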
 + 
 +=== Using the EventLoop (EL) SlurmDriver === 
 + 
 +As of about November 2016, AnalysisBase releases after 2.4.21 include a new driver in the EventLoop package, SlurmDriver. Like the other drivers for running code locally or on other batch management systems, EL::SlurmDriver lets your analysis binary submit and manage jobs on the SLURM queue management system. This frees you from having to write sbatch scripts or your own independent scripts for submitting Framework-based xAOD analysis jobs.
 +
 +The present EL::SlurmDriver code submits one job per ROOT file processed by your code. To set up the driver in, for instance, the main() function of your standalone executable:
 + 
 +<code> 
 +  EL::SlurmDriver driver; 
 +  driver.SetJobName("MyAnalysis"); 
 +  driver.SetAccount("default"); 
 +  driver.SetPartition("serial"); 
 +  driver.SetRunTime("00:60:00"); 
 +  driver.SetMemory("2G"); 
 +  driver.SetConstrain(""); 
 +</code> 
 + 
 +When you create an EL::SlurmDriver, you MUST configure all of its options. You can leave some of them blank, but all of them have to be defaulted to something. The above are examples of setting the following options: 
 + 
 +  * The Job Name (appears in squeue and labels your individual jobs) 
 +  * Account - this is always "default", unless a special account has been set up to allow you to use a protected partition on SLURM
 +  * Partition - the name of the partition where jobs will run, e.g. serial or development, etc.
 +  * RunTime - the maximum running time of the job, in HH:MM:SS format. Here, the example code requests 60 minutes (written as 00:60:00)
 +  * Memory - the memory limit for your job 
 +  * Constrain - this will constrain your job to specific resources, if there are multiple resource options in the partition. 
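
Run times here and in maneframe-run are written as HH:MM:SS strings. The following is a small sketch for producing such a string from a number of minutes. The helper name ''to_hms'' is hypothetical, and it always writes the normalized form, so 60 minutes comes out as 01:00:00 and not 00:60:00.

```shell
#!/bin/bash

# Hypothetical helper: convert a whole number of minutes into HH:MM:SS.
to_hms() {
  printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

to_hms 30   # prints 00:30:00
to_hms 60   # prints 01:00:00
to_hms 90   # prints 01:30:00
```

The resulting string can be pasted into the run time setting shown above or passed to the -t option of maneframe-run.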
 +===== Known Issues ===== 
 + 
 +==== Clock Skew Warnings During Builds ==== 
 + 
 +There is a known effect that, while doing builds, you get a lot of "clock skew" warnings. This is under investigation. It appears to be harmless.
 + 
 + 
 +===== Questions or Problems? =====
  
-We welcome suggestions and feedback on how to make the ManeFrame/ATLAS experience better. We are grateful for your interest in using our Tier 3 resources to advance ATLAS research and development activities.
 +Please contact the SMU scientific computing team with questions or comments. Learn more: http://faculty.smu.edu/csc/documentation/usage.html
atlas_maneframe.1436101739.txt.gz · Last modified: 2015/07/05 09:08 by sekula