CRAB Rucio Stageout Tutorial

Prerequisite

Using Rucio CLI

Note that $USER stands for your CERN account username or your Rucio group account.

# Note: run the Rucio CLI outside the CMSSW environment.
voms-proxy-init --voms cms
source /cvmfs/cms.cern.ch/rucio/setup-py3.sh
export RUCIO_ACCOUNT=$USER

Quota

First, check your Rucio quota:

rucio list-account-usage $USER

Expected output:

[tseethon@lxplus808 ~]$ rucio list-account-usage tseethon
+------------+-----------+------------+--------------+
| RSE        | USAGE     | LIMIT      | QUOTA LEFT   |
|------------+-----------+------------+--------------|
| T2_CH_CERN | 0.000 B   | 100.000 GB | 100.000 GB   |
| T2_IT_Rome | 19.070 GB | 2.000 TB   | 1.981 TB     |
+------------+-----------+------------+--------------+
+------------------+---------+---------+--------------+
| RSE EXPRESSION   | USAGE   | LIMIT   | QUOTA LEFT   |
|------------------+---------+---------+--------------|
+------------------+---------+---------+--------------+

If you do not have any quota yet, see the quota request entry in the FAQs.
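
If you want to check the quota from a script rather than by eye, the table can be parsed as plain text. The following is a minimal sketch, assuming the output keeps the layout shown above; `parse_account_usage` and `SAMPLE_OUTPUT` are hypothetical names introduced for illustration and are not part of Rucio.

```python
# Sample text copied from the expected output above.
SAMPLE_OUTPUT = """\
+------------+-----------+------------+--------------+
| RSE        | USAGE     | LIMIT      | QUOTA LEFT   |
|------------+-----------+------------+--------------|
| T2_CH_CERN | 0.000 B   | 100.000 GB | 100.000 GB   |
| T2_IT_Rome | 19.070 GB | 2.000 TB   | 1.981 TB     |
+------------+-----------+------------+--------------+
+------------------+---------+---------+--------------+
| RSE EXPRESSION   | USAGE   | LIMIT   | QUOTA LEFT   |
|------------------+---------+---------+--------------|
+------------------+---------+---------+--------------+
"""


def parse_account_usage(text):
    """Hypothetical helper: map each RSE in the first table to its quota cells.

    Assumes the table layout shown in this tutorial; values are kept as strings.
    """
    quotas = {}
    in_rse_table = False
    for line in text.splitlines():
        line = line.strip()
        # Data and header rows start with "| "; borders and separators do not.
        if not line.startswith("| "):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4:
            continue
        if cells[1] == "USAGE":
            # Header row: only the per-RSE table is collected.
            in_rse_table = cells[0] == "RSE"
            continue
        if in_rse_table:
            quotas[cells[0]] = {
                "usage": cells[1],
                "limit": cells[2],
                "quota_left": cells[3],
            }
    return quotas


quotas = parse_account_usage(SAMPLE_OUTPUT)
print(quotas["T2_CH_CERN"]["quota_left"])
```

A site missing from the returned dictionary means no quota is listed for it, which is the case the FAQ entry covers.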

Submit task with Rucio stageout

We will submit a simple analysis task that uses a HammerCloud dataset as input.

pset.py:

from __future__ import division
import FWCore.ParameterSet.Config as cms

process = cms.Process('NoSplit')

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring('root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root'))
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))
process.options = cms.untracked.PSet(wantSummary = cms.untracked.bool(True))
process.output = cms.OutputModule("PoolOutputModule",
    outputCommands = cms.untracked.vstring("drop *", "keep recoTracks_globalMuons_*_*"),
    fileName = cms.untracked.string('output.root'),
)
process.out = cms.EndPath(process.output)

crabConfig.py:

from WMCore.Configuration import Configuration
config = Configuration()
config.section_('General')

config.General.transferLogs = False
config.General.requestName = 'rucio_transfers_tutorial'
config.section_('JobType')
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'pset.py'
config.JobType.maxJobRuntimeMin = 60
config.section_('Data')
config.Data.totalUnits = 10
config.Data.splitting = 'LumiBased'
config.Data.publication = True
config.Data.unitsPerJob = 1
config.Data.outputDatasetTag = 'ruciotransfer-tutorial'
config.Data.outLFNDirBase = '/store/user/rucio/tseethon/'
config.Data.inputDataset = '/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM'
config.section_('User')
config.section_('Site')
config.Site.storageSite = 'T2_CH_CERN'
config.section_('Debug')

The most important line in the CRAB config is

config.Data.outLFNDirBase = '/store/user/rucio/tseethon/'

CRAB recognizes Rucio stage-out only when the output LFN is prefixed with /store/{user,group}/rucio/${rucioaccount}.
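
To make the prefix rule concrete, here is a small sketch that checks a candidate `outLFNDirBase` before submitting. It only illustrates the rule as stated above and is not how CRAB itself performs the check; `RUCIO_LFN_PATTERN` and `rucio_account_from_lfn` are hypothetical names.

```python
import re

# /store/user/rucio/<account>/... or /store/group/rucio/<account>/...
RUCIO_LFN_PATTERN = re.compile(r"^/store/(user|group)/rucio/([^/]+)(/.*)?$")


def rucio_account_from_lfn(lfn):
    """Return the Rucio account named in the LFN prefix, or None if the
    LFN does not follow the Rucio stage-out prefix rule."""
    match = RUCIO_LFN_PATTERN.match(lfn)
    return match.group(2) if match else None


# The value used in crabConfig.py above follows the rule.
print(rucio_account_from_lfn("/store/user/rucio/tseethon/"))
# A plain /store/user/<username>/ path does not.
print(rucio_account_from_lfn("/store/user/tseethon/"))
```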

Then, submit the task with the usual crab submit:

crab submit -c crabConfig.py

Wait until some jobs finish and move to the PostJob stage (in crab status, these jobs change from "running" to "transferring").

Inspect "transferring" status

Assume that, after running crab submit, we get the task name:

230829_164047:tseethon_crab_rucio_transfers_publication_test12_20230829_184045
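
If you handle task names in scripts, the example above can be split into its parts. This is a sketch that assumes the layout `<YYMMDD_HHMMSS>:<username>_crab_<request name>`, inferred from this single example; `parse_task_name` is a hypothetical helper, and the time zone of the timestamp is not stated here.

```python
from datetime import datetime


def parse_task_name(task_name):
    """Split a task name into submission timestamp, username and request name.

    The layout is an assumption inferred from the example in this tutorial.
    """
    timestamp, _, rest = task_name.partition(":")
    username, _, request_name = rest.partition("_crab_")
    return {
        "submitted": datetime.strptime(timestamp, "%y%m%d_%H%M%S"),
        "username": username,
        "request_name": request_name,
    }


info = parse_task_name(
    "230829_164047:tseethon_crab_rucio_transfers_publication_test12_20230829_184045"
)
print(info["username"], info["request_name"])
```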

To inspect the transferring status:

  • Run crab status and look for the line "Transfer container's rule:".

    crab-status-rucio-rule.png

    Copy the link on that line and open it in your web browser.

  • Check the state field: "OK" means the files have been transferred to the destination (shown in the rse_expression field).

    rucio-rule-page-tutorial.png

  • You can look at individual files in Locks Overview:

    rucio-lock-overview-tutorial.png

  • Click the hyperlink in the "name" field to see the contents of the container.

    rucio-container-info-tutorial.png
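
The check on the rule page can also be written as a small function. This sketch assumes you already have the rule information as a dictionary with the same `state` and `rse_expression` fields shown on the rule page (for example, as returned by the Rucio Python client, if you use it); `summarize_rule` is a hypothetical helper, and the rule values below are made up for illustration.

```python
def summarize_rule(rule):
    """Describe a transfer rule using only its state and rse_expression fields."""
    state = rule.get("state")
    destination = rule.get("rse_expression", "unknown destination")
    if state == "OK":
        return f"all files transferred to {destination}"
    return f"transfer to {destination} not complete (state: {state})"


# Illustrative values only; a real rule carries many more fields.
print(summarize_rule({"state": "OK", "rse_expression": "T2_CH_CERN"}))
print(summarize_rule({"state": "REPLICATING", "rse_expression": "T2_CH_CERN"}))
```

Any state other than "OK" is reported as incomplete, matching the rule of thumb in the list above.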