This repository has tools and notes for demonstration and evaluation of Rucio for LIGO bulk data management.
Some notes on getting started
- RUCIO_HOME must point to a directory which includes etc/rucio.cfg
- rucio.cfg should look like:
[client]
rucio_host = https://rucio-ligo.grid.uchicago.edu:443
auth_host = https://rucio-ligo.grid.uchicago.edu:443
ca_cert = /etc/grid-security/certificates
client_x509_proxy = /tmp/x509up_p2411400.filearAiBG.1
request_retries = 3
auth_type = x509
client_cert = /tmp/x509up_p2411400.filearAiBG.1
client_key = /tmp/x509up_p2411400.filearAiBG.1
where client_cert and client_key should point to the output of
grid-proxy-info -path
- Admin tasks should have RUCIO_ACCOUNT=root
- User tasks should have RUCIO_ACCOUNT=jclark (for example)
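Since the proxy path changes whenever a new proxy is created, it can be convenient to regenerate rucio.cfg rather than edit it by hand. The following is a minimal sketch, assuming the [client] section shown above is all that is needed; the function name `build_rucio_cfg` and the example proxy path are hypothetical.

```python
import configparser
import io


def build_rucio_cfg(proxy_path,
                    host="https://rucio-ligo.grid.uchicago.edu:443"):
    """Return rucio.cfg text with a [client] section like the one above.

    proxy_path is expected to be the output of `grid-proxy-info -path`.
    """
    # interpolation=None so that a '%' in a path is stored literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["client"] = {
        "rucio_host": host,
        "auth_host": host,
        "ca_cert": "/etc/grid-security/certificates",
        "client_x509_proxy": proxy_path,
        "request_retries": "3",
        "auth_type": "x509",
        "client_cert": proxy_path,
        "client_key": proxy_path,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical proxy path, for illustration only.
    print(build_rucio_cfg("/tmp/x509up_example"))
```

The returned text would be written to $RUCIO_HOME/etc/rucio.cfg.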
The first thing we need is an RSE (Rucio Storage Element, a container for files) to upload our files to.
- Create the RSE (see e.g., the CLI admin examples):
rucio-admin rse add LIGOTEST
- Add supported protocols (e.g., srm, gsiftp, http, ...).  To begin with, we can just use gsiftp:
rucio-admin rse add-protocol \
    --prefix /user/ligo/rucio \
    --domain-json '{"wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy": 1}}' \
    --scheme gsiftp \
    --hostname red-gridftp.unl.edu \
    LIGOTEST
Note that rucio-admin operations should be performed with RUCIO_ACCOUNT=root
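If more RSEs or protocols need to be added later, the add-protocol command can be assembled programmatically so that the JSON quoting is always right. This is only a sketch: it uses the same flags as the command above and nothing else, and the helper name `add_protocol_argv` is hypothetical.

```python
import json
import shlex


def add_protocol_argv(rse, hostname, prefix, scheme="gsiftp"):
    """Build the argument list for `rucio-admin rse add-protocol`."""
    # Same WAN permissions as in the command shown above.
    domains = {"wan": {"read": 1, "write": 1, "delete": 1,
                       "third_party_copy": 1}}
    return [
        "rucio-admin", "rse", "add-protocol",
        "--prefix", prefix,
        "--domain-json", json.dumps(domains),
        "--scheme", scheme,
        "--hostname", hostname,
        rse,
    ]


if __name__ == "__main__":
    argv = add_protocol_argv("LIGOTEST", "red-gridftp.unl.edu",
                             "/user/ligo/rucio")
    # Print a shell-quoted form that can be pasted into a terminal.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The list can be handed to subprocess.check_call in an environment where RUCIO_ACCOUNT=root is set.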
At least for testing, we will designate scopes according to data-taking runs (engineering and observing runs). To create an ER8 scope:
rucio-admin scope add --account jclark --scope ER8
See e.g., rucio scope docs
Now that we have an RSE and a scope we can experiment with the CLI examples
- Uploading a single frame with scope "ER8"
rucio -v upload \
    /hdfs/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11262/H-H1_HOFT_C02-1126256640-4096.gwf \
    --rse LIGOTEST --scope ER8 \
    --name H-H1_HOFT_C02-1126256640-4096.gwf
This should generate output like:
2018-02-05 13:33:31,104    DEBUG    Extracting filesize (457680774) and checksum (ef00cf51) for file ER8:H-H1_HOFT_C02-1126256640-4096
2018-02-05 13:33:31,106    DEBUG    Automatically setting new GUID
2018-02-05 13:33:31,381    DEBUG    Using account root
2018-02-05 13:33:31,381    DEBUG    Skipping dataset registration
2018-02-05 13:33:31,381    DEBUG    Processing file ER8:H-H1_HOFT_C02-1126256640-4096 for upload
2018-02-05 13:33:39,285    INFO    Local files and file ER8:H-H1_HOFT_C02-1126256640-4096 recorded in Rucio have the same checksum. Will try the upload
2018-02-05 13:33:56,808    INFO    File ER8:H-H1_HOFT_C02-1126256640-4096.gwf successfully uploaded on the storage
2018-02-05 13:33:56,809    DEBUG    sending trace
2018-02-05 13:33:57,270    DEBUG    Finished uploading files to RSE.
2018-02-05 13:33:57,505    INFO    Will update the file replicas states
2018-02-05 13:33:57,586    INFO    File replicas states successfully updated
Completed in 34.7796 sec.
The next step is to set up a simple Python script to:
- Retrieve a list of frame files which corresponds to some nominal data set
- Loop through the list and call the Rucio API
This can be achieved using the pycbc datafind module and a pip install of Rucio.
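A rough sketch of the loop is given here. To stay within what these notes have actually exercised, it shells out to the `rucio upload` command shown earlier instead of calling the Python API, and it takes the list of frame paths as an argument rather than querying datafind. The function names and the `runner` parameter are hypothetical.

```python
import os
import subprocess


def upload_command(frame_path, rse, scope):
    """Build the `rucio upload` command for a single frame file."""
    name = os.path.basename(frame_path)
    return ["rucio", "-v", "upload", frame_path,
            "--rse", rse, "--scope", scope, "--name", name]


def upload_frames(frame_paths, rse, scope, runner=subprocess.check_call):
    """Upload each frame in turn and return the paths that failed.

    runner is called with the command list; it is a parameter so that
    the loop can be dry-run without touching Rucio.
    """
    failed = []
    for path in frame_paths:
        try:
            runner(upload_command(path, rse, scope))
        except (subprocess.CalledProcessError, OSError) as err:
            print("upload failed for %s: %s" % (path, err))
            failed.append(path)
    return failed


if __name__ == "__main__":
    frames = [
        "/hdfs/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11262/"
        "H-H1_HOFT_C02-1126256640-4096.gwf",
    ]
    # Dry run: print each command instead of executing it.
    upload_frames(frames, "LIGOTEST", "ER8", runner=print)
```

Collecting failures instead of stopping at the first one means a long list of frames can be retried selectively.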
cmsexample.py is a command line tool for registering a CMS dataset into Rucio. This set of slides describes the CMS evaluation. The CMS hierarchy is more complicated than the LIGO one (at least for our initial test). In CMS:
- Files: ~4GB
- Blocks (Rucio dataset): chunks of ~100 files. This is the typical unit of data transfer.
- Datasets (Rucio container): N blocks with some physical meaning
The (current) proposed LIGO arrangement is simpler:
- LIGO runs (ER8, O1, ...): Rucio scope
- LIGO dataset == Rucio dataset
Here's a run-through of cmsexample.py:
- Instantiate the DataSetInjector object, a general class for injecting a CMS dataset into Rucio
- DataSetInjector has methods to create containers and register files and datasets
- This class has methods for finding the Rucio URL and filenames
I do not need anything to do with Rucio containers (yet), so I can just mimic the parts associated with file and dataset registration, and some of the sanity checking. I should be able to swap out my existing routines for translating LIGO file URLs to Rucio DIDs.
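As a starting point, a container-free analogue might look something like the sketch below. It is not taken from cmsexample.py. The class name `FrameInjector`, the dataset name used in the example and the `client` argument are all hypothetical. The two client methods are modelled on `add_dataset` and `add_files_to_dataset` from Rucio's DIDClient, but the signatures should be checked against the installed version, and a real registration would also need file sizes and checksums. The sanity check relies on the standard frame naming pattern, observatory-type-GPS start-duration.gwf.

```python
import os
from urllib.parse import urlparse


def frame_url_to_did(url, scope):
    """Translate a frame file URL or path into a Rucio DID dict."""
    name = os.path.basename(urlparse(url).path)
    stem, ext = os.path.splitext(name)
    fields = stem.split("-")
    # Sanity check: <obs>-<type>-<gps start>-<duration>.gwf
    looks_like_frame = (
        ext == ".gwf"
        and len(fields) == 4
        and fields[2].isdigit()
        and fields[3].isdigit()
    )
    if not looks_like_frame:
        raise ValueError("not a recognised frame file name: %r" % name)
    return {"scope": scope, "name": name}


class FrameInjector:
    """Register frame files into a single Rucio dataset (no containers)."""

    def __init__(self, client, scope, dataset, rse):
        # client: any object providing add_dataset and
        # add_files_to_dataset (assumed to mirror Rucio's DIDClient).
        self.client = client
        self.scope = scope
        self.dataset = dataset
        self.rse = rse

    def register(self, urls):
        """Validate all URLs first, then create the dataset and attach files."""
        files = [frame_url_to_did(url, self.scope) for url in urls]
        self.client.add_dataset(scope=self.scope, name=self.dataset)
        self.client.add_files_to_dataset(scope=self.scope, name=self.dataset,
                                         files=files, rse=self.rse)
        return files
```

All names are validated before anything is sent to the client, so a single malformed file name stops the run before a partially filled dataset is created.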