Automatic Analysis : Frequently Asked Questions

This FAQ provides information on a collection of topics that may require some clarification. Many of the questions here are covered in more detail in the sidebar links. Even if you only need part of this information, it's helpful to skim the entire document to get a feel for the kinds of issues that arise when working with aa and the features that are available to address them. The more lengthy entries appear first.

TABLE OF CONTENTS

How do I change a module parameter?
What processing options are available for BIDS input?
Can I export my data to BIDS in aa?
Does aa offer nonparametric modeling?
What is a stream?
How (and why) do I rename a stream?
Where can I find a list of parameters that can be customized for a given module?
How do I refer to a module when more than one instance of the module appears in the tasklist?
How does aa know which modules in a tasklist have finished processing?
How do I force aa to rerun part of the analysis?
How do I set up aa for Matlab multicore operation?
Can I use aa on a computing cluster?
What is a provenance map?
What is a branched tasklist?
What is aa garbage collection?
What is stored in my home aa directory?
What is a module domain?
How should I process data collected using sparse sampling?
What are some good aa programming and data practices?

How do I change a module parameter?

Most modules have customizable parameters. Values for these parameters can be specified in the tasklist using an extraparameters tag. In the tutorial, a kernel width (FWHM) of 6 mm (isotropic) was specified for the module aamod_smooth:

	<module><name>aamod_smooth</name>
		<extraparameters>
			<aap><tasklist><currenttask><settings>
				<FWHM>6</FWHM>
			</settings></currenttask></tasklist></aap>
		</extraparameters>
	</module>

This overrides the module default FWHM value of 10 mm.

Alternatively, module parameters can be specified in the userscript using the aap tasksettings field. For example, the kernel width for smoothing could be specified as follows:

	aap.tasksettings.aamod_smooth.FWHM = 6;

A few parameters are cumbersome to set via XML, but otherwise the choice between setting a parameter in your userscript or in the tasklist is largely a matter of personal preference.

What processing options are available for BIDS input?

The general calling convention of aas_processBIDS is as follows:

aap = aas_processBIDS(aap,[sessions],[tasks],[subjects],[regcolumn])

Parameters listed in square brackets are optional. If these are omitted, all tasks, subjects, and sessions in the dataset are input. Otherwise, you may use these parameters to only process a subset of the data. (The parameter regcolumn has a different kind of function and is described later.)

Use of the parameters is most easily explained through an example. Consider OpenNeuro dataset ds000114. This dataset contains ten subjects (sub-01 through sub-10) performing five different tasks (finger-foot-lips, covertverb, overtword, overtverb, line_bisection) repeated in two sessions (sess-test and sess-retest).

A subset of subjects is selected by passing a cell array of subject identifiers:

aap = aas_processBIDS(aap,[],[],{'sub-01','sub-07'});

This usage would result in processing only subjects sub-01 and sub-07. Per Matlab syntax rules, an empty array is passed for the two parameters (sessions and tasks) which precede subjects in the function call; default values will be used for these variables (i.e., all sessions and tasks will be processed for the two subjects).

We analyze a task subset by passing a cell array for the task parameter:

aap = aas_processBIDS(aap,[],{'finger-foot-lips'});

This usage of aas_processBIDS will result in processing of only the finger-foot-lips motor task (for all subjects and sessions).

Finally, we select a session subset by passing a cell array for the session parameter:

aap = aas_processBIDS(aap,{'sess-test'});

Options can be combined as needed. For example:

aap = aas_processBIDS(aap,[],{'covertverb'},{'sub-01','sub-07'});

would process both sessions of the covertverb task only for subjects 01 and 07.

The regcolumn parameter

According to the BIDS specification, event information in a BIDS dataset should be organized as a collection of plaintext tsv files each having a minimum of four columns: 1) event onset (i.e., latency since start of scan), 2) event duration, 3) event auxiliary information, and 4) event type. These columns must be labeled onset, duration, weight, and trial_type. The tsv may contain additional columns defining alternate or auxiliary event information. For example, trial_type may be L or R to indicate a left- or right-hand button press and the tsv might include a column that appends response information (e.g., L_correct, L_incorrect, R_correct, and R_incorrect).

By default, aas_processBIDS uses the column labeled trial_type for the event definition. If you wish to use alternate data as the event type, pass the column label to aas_processBIDS in the regcolumn parameter. For example:

aap = aas_processBIDS(aap, [],[],[],'scored_response');

would process all tasks, subjects, and sessions in the data using the column labeled scored_response for the event type. Any value may be passed (as long as the label exists in the tsv file). Note regcolumn is a character variable, not a cell array.

FYI: Public BIDS data is notorious for violating the BIDS specification. Although aa attempts to accommodate common transgressions, you will sometimes need to edit the files or file contents of a BIDS dataset in order to use it. Running a BIDS validation tool is often helpful in diagnosing errors caused by bad BIDS formatting.

BIDS Event Name Modification

The tsv files in public BIDS datasets often use poor event naming practices that may result in runtime errors. To address this, aa provides four options for modifying event names during BIDS import. The fields are self-explanatory:

aap.acq_details.stripBIDSEventNames = true; % strip special characters from event names
aap.acq_details.omitNullBIDSEvents = true; % do not add "null" events to the model
aap.acq_details.convertBIDSEventsToUppercase = true; % convert event names to uppercase
aap.acq_details.maxBIDSEventNameLength = N; % truncate event names to N characters (set N to the desired maximum length)

Set these fields prior to calling aas_processBIDS.

Finally, you can opt out of BIDS event specification altogether when importing BIDS data:

aap.acq_details.omitBIDSmodeling = true; % do not process tsv data

Although this option may seem counterintuitive (it requires you to define events explicitly in your userscript), it allows you to implement features such as parametric modulation that aa supports but that are not yet part of the BIDS standard.
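
As a rough sketch (the module instance, event name, onsets, and duration below are made up, and the aas_addevent calling convention should be checked against its header comments), this might look like:

aap.acq_details.omitBIDSmodeling = true;   % skip tsv-based event import
aap = aas_processBIDS(aap);

% define events yourself; here '*' applies the event to all subjects and sessions
aap = aas_addevent(aap, 'aamod_firstlevel_model_00001', '*', '*', 'LISTEN', [0 30 60], 15);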

FYI: These options may be moved from acq_details into name-value pairs passed to aas_processBIDS in a future release.

Can I export my data to BIDS in aa?

Yes. Export to BIDS is done using aa_export_toBIDS, which will reformat the analysis data so that it is BIDS compliant. The function can be run either standalone or be included at the end of a userscript. See comments in aa_export_toBIDS.m for more information.
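
For example, a possible standalone call might look like the following (the destination path is hypothetical, and the exact argument list should be checked against the header comments in aa_export_toBIDS.m):

aa_export_toBIDS('/path/to/bids_export');   % write a BIDS-compliant copy of the analysis data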

Does aa offer nonparametric modeling?

Yes. Automatic Analysis implements nonparametric modeling for second-level analysis using FSL randomise. See the modules aamod_secondlevel_randomise and aamod_secondlevel_randomise_threshold. Note FSL must be installed to use these modules.

What is a stream?

Data transfer between modules is handled for you by aa. Each module takes data as an input, processes it, then makes the processed version of the data available to other modules as an output. Internally, this is implemented using a data abstraction called a stream. From your perspective, a stream is a label used to refer to a type of data appearing in the tasklist. For example, aa refers to the structural and functional data as the "structural" and "epi" streams, respectively. The same stream name is used throughout the analysis even though the data for the stream may have been altered. For example, the functional data is altered by realignment but both the input and output to the realignment module are referred to as the "epi" stream. This simplifies the analysis of data dependencies.

If an input stream required by a given module is not created as an output stream by some other module, the analysis will halt and aa will inform you of the missing stream. Pipeline design in aa is largely an effort of identifying a collection of modules that create an unbroken chain of streams connecting an input to a result of interest.

FYI: Stream handling is implemented using plaintext files in each module directory. These have names of the form:

stream_[streamname]_inputto_[modulename]
stream_[streamname]_outputfrom_[modulename]

These files are used by the scheduler to associate a file in the module directory with the named input or output stream. Although the contents can be viewed using an editor, you should not modify them.

How (and why) do I rename a stream?

The data dependencies implied by a tasklist as written sometimes require modification.

By default, an input stream for a given module is taken from the closest module appearing previously in the tasklist that generates the stream as an output. For example, in the tutorial pipeline aamod_segment_structural uses the "structural" stream from aamod_reorienttomiddle_structural because that is the first module that precedes it in the pipeline that has a structural stream as an output. Sometimes this is not the correct choice. A common case arises when multiple instances of aamod_firstlevel_model appear in a tasklist. On exit, aamod_firstlevel_model replaces the "epi" stream with the model residuals. As such, the aa scheduler will assign the residuals as the "epi" input of any module appearing after aamod_firstlevel_model in the tasklist that takes this stream as an input. This may be a correct assignment for some modules. However, for others -- a second instance of aamod_firstlevel_model, say -- you likely intended the "epi" input to be the processed functional data, not the model residuals.

Fortunately, it is possible to customize stream assignment in the userscript using aas_renamestream. Take as an example the above case of a second instance of aamod_firstlevel_model appearing in the tasklist. We need to explicitly assign its "epi" input. This is done as follows:

aap = aas_renamestream(aap,'aamod_firstlevel_model_00002','epi','aamod_smooth_00001.epi');

The passed parameters are the aap struct, the module which requires stream renaming, the name of the stream to be renamed, and the new source of the stream. Here we assume smoothing was the final operation applied to the functional data prior to modeling. As shown here, the syntax used to identify the stream output by a specific module is modulename.streamname. Note the module names must include the numerical identifier suffix.

FYI: You rename output streams by including an 'output' tag as the final parameter passed to aas_renamestream. However, input stream renaming is generally more common.
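
For example, a sketch of output renaming (the module and new stream name here are hypothetical):

% publish the 'epi' output of a hypothetical module under the name 'epi_denoised'
aap = aas_renamestream(aap,'aamod_mydenoise_00001','epi','epi_denoised','output');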

FYI: You can determine if a stream is renameable by the appearance of an isrenameable modifier on the stream name in the module header.

Placeholder Streams

Stream renaming is also used in modules that define a generic placeholder stream with an expectation that the proper stream name will be passed to the module at runtime. For example, the module aamod_firstlevel_LI implements a placeholder input stream (literally named "placeholder") because we may want to perform a lateralization analysis on different kinds of inputs. The selection is also made using aas_renamestream. For example:

aap = aas_renamestream(aap,'aamod_firstlevel_LI_00001','placeholder','firstlevel_spmts');

will analyze the first level t-map. However, we could have selected the beta map:

aap = aas_renamestream(aap,'aamod_firstlevel_LI_00001','placeholder','firstlevel_betas');

Note these examples do not explicitly identify a module for the input stream source. In such uses of aas_renamestream, the aa scheduler will apply the default rule for selecting the stream (i.e., the closest module appearing previously in the tasklist that generates firstlevel_spmts or firstlevel_betas).

FYI: Some modules that implement a placeholder stream use a standard default name such as epi or structural. This allows the module to be used as-is if the placeholder happens to be the stream you want to process while retaining the option to reassign the stream if it is not.

More information on working with streams can be found in the sidebar links.

Where can I find a list of parameters that can be customized for a given module?

The full set of parameters and their default values defined for a given module can be found in the module's XML header.

Occasionally, a parameter will not include a default value. For example, you can explicitly set the repetition time (TR) in aamod_firstlevel_model (aa usually determines the repetition time automatically from the data, but some data file formats omit this information). However, there is no such thing as a "default" TR, so none is assigned in the module header.
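
For example, a minimal sketch of setting the repetition time explicitly in the userscript (the field name and value shown are assumptions; check the module's XML header for the exact setting name and use your acquisition's actual TR):

aap.tasksettings.aamod_firstlevel_model.TR = 2;   % repetition time in seconds (illustrative value)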

If a required module parameter is not set, a runtime error will occur.

How do I refer to a module when more than one instance of the module appears in the tasklist?

When more than one instance of a module appears in the tasklist, you may need to refer to a specific instance (or all instances) when setting parameters in the userscript. Two coding approaches are available.

  1. When passing a module to an aa utility, use a numerical suffix or wildcard. For example:

    aap = aas_addevent(aap, 'aamod_firstlevel_model_00002', ...
    aap = aas_addevent(aap, 'aamod_firstlevel_model_00004', ...

to define events for the second and fourth instances of aamod_firstlevel_model and:

aap = aas_addevent(aap, 'aamod_firstlevel_model_*', ...

to define the event for all instances.

  2. Use Matlab array indexing when setting multiple module parameters using the tasksettings field:

    for index = 1:numel(aap.tasksettings.aamod_firstlevel_model)
        aap.tasksettings.aamod_firstlevel_model(index).xBF.T0 = 1;
    end

Programming tip: Check that instances of the module exist in the tasklist to prevent a runtime error:

if isfield(aap.tasksettings,'aamod_firstlevel_model')
	for index = 1:numel(aap.tasksettings.aamod_firstlevel_model)
		aap.tasksettings.aamod_firstlevel_model(index).xBF.T0 = 1;
	end
end

FYI: When only a single instance of a module appears in the tasklist, the numerical suffix can be omitted when referring to it. For example, a contrast was defined in the tutorial analysis using:

aap = aas_addcontrast(aap,'aamod_firstlevel_contrasts',...

and not

aap = aas_addcontrast(aap,'aamod_firstlevel_contrasts_00001',...

because aamod_firstlevel_contrasts appeared only once in the tasklist. However, the latter syntax would be acceptable.

How does aa know which modules in a tasklist have finished processing?

The aa scheduler writes a semaphore file named done_[modulename] to the module directory. This file is created when the module exits and is how the scheduler determines which modules have run to completion (e.g., if a partially completed analysis is restarted).

How do I force aa to rerun part of the analysis?

When an analysis is started, aa examines the contents of the results directory tree (if it exists) and determines which modules in the tasklist have completed and so need not be re-run (it does so by checking for the existence of a "done" file flag in the module results directory). This usually occurs without issue. However, you may occasionally need to intervene and force some or all of the previously completed modules to rerun. This can happen after changing module options, making changes to the model definition (including events or contrasts), renaming streams, or changing files in a DICOM data collection -- that is, after making any change the scheduler can't detect by the absence of a done flag. If this occurs, aa will echo the usual informational messages to the command window then exit without rerunning the affected (or any) modules.

The fix is straightforward: simply delete the directory of any module you want to force aa to rerun. The scheduler will rerun the module when the analysis is restarted, along with any downstream modules that depend on its output according to the tasklist data dependencies.

Of course, you can force the analysis to be rerun from scratch by deleting the entire results directory tree, but this is generally not an optimal solution.
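
If you prefer to do this from within Matlab, a minimal sketch (the path shown is hypothetical; where a module directory lives depends on your results tree layout and the module's domain):

% delete one module's results directory (including its done flag) so the scheduler reruns it
module_dir = '/path/to/results/sub-01/aamod_smooth_00001';   % hypothetical path
rmdir(module_dir, 's');                                      % 's' removes the directory and its contents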

How do I set up aa for Matlab multicore operation?

Automatic Analysis has code that will allow you to run an analysis using multiple CPU cores on your machine. These resources are selected by the wheretoprocess field under aap.options. There are currently three options:

aap.options.wheretoprocess = 'localsingle';
aap.options.wheretoprocess = 'parpool';
aap.options.wheretoprocess = 'matlab_pct';

localsingle (the default) carries out the analysis serially on your local machine, while parpool runs the analysis using the local multicore architecture. Additionally, matlab_pct is a version of parpool that uses the Parallel Computing Toolbox.

Two other parameters are required:

aap.directory_conventions.poolprofile = 'local';
aap.options.aaparallel.numberofworkers = 12;

Selecting the optimal value for the numberofworkers (shown as '12' in this example) can require some trial and error. Theoretically, this should be the number of cores in your machine minus one (ignoring multithreading issues). However, the aa scheduling engine places further restrictions on the value and may exhibit unpredictable behavior if set too high.

If your machine has multiple CPU cores available (type feature('numcores') in the Matlab command window), there is little reason not to use parpool or matlab_pct instead of localsingle. This will provide a speed-up of roughly a factor of the number of cores. However, some modules do not generate graphics (e.g., diagnostic and QA figures) under multicore operation. This can be remedied by running reporting in localsingle mode after the analysis completes.
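
Putting these pieces together, a minimal multicore configuration in the userscript might look like this (the worker count simply leaves one core free, per the rule of thumb above):

aap.options.wheretoprocess = 'parpool';
aap.directory_conventions.poolprofile = 'local';
aap.options.aaparallel.numberofworkers = feature('numcores') - 1;   % leave one core free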

FYI: The aa scheduler works differently depending on the wheretoprocess option. Under localsingle, modules are run strictly in the order in which they appear in the tasklist. In multicore (and cluster) operation, the scheduler runs modules as soon as their data dependencies are satisfied, which may not correspond to the order of the modules in the tasklist. It is possible for one subject to run to completion while others are waiting for processors to become available.

Can I use aa on a computing cluster?

Yes. Automatic Analysis provides options for submitting jobs to a computing cluster using several popular job schedulers including qsub, torque, and slurm. Use of a computing cluster offers a substantial speedup, but there is a nontrivial setup effort required. If you would like to explore this option, you will probably need to work with the system administrator of your local cluster. See the aa GitHub site for more information.

More information on running aa on a computing cluster can be found in the sidebar links.

What is a provenance map?

Automatic Analysis creates a graphical summary of the tasklist data flow called a provenance map. This consists of a schematic of the modules in the tasklist connected by labeled paths indicating input and output streams. The provenance map is generally helpful in verifying the correctness of a pipeline, although a complex tasklist may create a layout that is somewhat difficult to read.

The provenance map is saved at the top level of the results directory in the file aap_prov.dot. The file is created at the start of the analysis (so you need not wait for an analysis to complete to review it). However, viewing the map requires a tool that can read the "dot" language syntax. In Linux and Windows, GraphViz works well. However, OS-X support for GraphViz is rather spotty. You may want to seek out alternative tools that can convert and display a dot file.
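
If GraphViz is installed, one option is to render the file from within Matlab, run from the results directory (the PDF output format is just one choice; this assumes the dot executable is on your system path):

system('dot -Tpdf aap_prov.dot -o aap_prov.pdf');   % render the provenance map as a PDF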

What is a branched tasklist?

The aa tasklist parser recognizes a branching syntax that allows you to explore pipeline variations in a single tasklist. For example, the following excerpt includes two orderings of realignment and slice timing:

<module>
	<branch>
		<analysisid_suffix>_realign_then_slicetime</analysisid_suffix>
		<module><name>aamod_realign</name></module>
		<module><name>aamod_slicetiming</name></module>
	</branch>
	<branch>
		<analysisid_suffix>_slicetime_then_realign</analysisid_suffix>
		<module><name>aamod_slicetiming</name></module>
		<module><name>aamod_realign</name></module>
	</branch>            
</module>

After initial processing (not shown), two pipeline variations are introduced based on the order of realignment and slice timing. Automatic Analysis will run separate analyses using both orderings. Note all modules appearing after the branch point in the tasklist will be run in both branches. This is convenient because you need not paste copies of all the post-branch modules into each branch.

It is standard practice to include an analysisid_suffix tag in each branch. This will organize the output of the branches into separate named directories in the results directory. Note the usual rules for module numbering still apply -- this usually results in one results directory containing even-numbered modules and the other containing odd-numbered modules.

More information on branched tasklists can be found in the sidebar links.

What is aa garbage collection?

Imaging analysis is a disk hog and aa is no exception: it generates multiple copies of (typically large) files throughout the analysis directory tree (although it will use links* when it can). If your disk space is tight, you may want to use aa garbage collection to delete duplicate files at the conclusion of an analysis. Include the following module at the very end of your tasklist:

<module><name>aamod_garbagecollection</name></module>

As an alternative to garbage collection, some aa users prefer to monitor disk usage manually and delete old or unused analysis results as needed.

* Due to arcane programming reasons, aa uses hard links by default. These appear to be file duplicates (the only way to identify a hard link is to examine the file inode number) but in fact no additional disk space is consumed. Hard links are known to create problems on some filesystems; you can opt out of hard links by setting the parameter aap.options.hardlinks to false (0).
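
For example, to opt out in your userscript:

aap.options.hardlinks = false;   % use full file copies instead of hard links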

What is stored in my home aa directory?

Automatic Analysis assumes the default parameter file aap_parameters_user.xml is kept in the (hidden) directory named ".aa" in your home directory. Additionally, aa creates "worker" directories in ~/.aa to store temporary results. These directories will (eventually) be automatically deleted once they are no longer needed (you will see a message in the Matlab command window at the start of an analysis when this occurs). Occasionally the directories accumulate and you may want to delete them yourself. Doing so is generally harmless, but may increase runtime somewhat if a partial analysis is restarted.

What is a module domain?

Each module has a domain that determines execution details. The three most common domains are study, subject, and session. A study-level module runs once per analysis, a subject-level module runs once per subject, and a session-level module runs once per session (i.e., once per session per subject per analysis). The general rule of thumb is that operations on functional data are session level, operations on structural data are subject level, first-level modeling is subject level, and second level modeling operations are study level. The implementation details are transparent to the user, but the terminology appears in some aa documentation and messages and so is mentioned here. A module's domain definition can be found at the top of the module's XML header.

FYI: A recent addition to aa is domain wildcarding, which allows a single module to be used in multiple domains depending on context (e.g., norming a functional image is a session-level operation; norming a structural image is a subject-level operation). The module domain is listed as "*" in the xml header. Be advised some users have encountered runtime errors in modules that use domain wildcarding (the error will mention "attempt to load stream failed"). You may need to edit the module's header so that it explicitly refers to "session" or "subject" domain, as appropriate.

How should I process data collected using sparse sampling?

It is sometimes necessary to modify the HRF parameters aa passes to SPM for modeling. One example is sparse sampling, in which a stimulus (typically an auditory stimulus) must be presented between scans to avoid contamination by scanner noise. The HRF must be adjusted to account for the nonstandard timing of the response.

The two HRF quantities of interest are xBF.T (the number of microtime bins) and xBF.T0 (the microtime origin bin number). The default value of xBF.T is 16 and typically does not need to be changed. However, when using sparse sampling, the value of xBF.T0 should be changed to 1 (the telltale sign that xBF.T0 is incorrect is that parameter maps aa generates are all blank). Include this line in your userscript:

aap.tasksettings.aamod_firstlevel_model.xBF.T0 = 1;

If aamod_firstlevel_model appears more than once in the tasklist, you will need to change the parameter for each instance.

What are some good aa programming and data practices?

  1. Do not use special characters (!@#$%^&*()+{}[]|><,.?/) or whitespace (space and tab) in identifiers. "Identifiers" include directory names, filenames, and event or contrast labels. In particular, do not use > or < or + or - in a contrast label. Instead, use G for >, L for <, p for +, and m for -. In fact, don't use any characters in identifiers except letters, numbers, and an underscore. The first character in an identifier should be a letter.

  2. Do not use case-sensitive identifiers (e.g., tone and TONE to indicate soft and loud tone events). This is especially problematic under OS-X, wherein some system functions recognize case and some do not.

  3. Include a leading zero in numbers less than 10 (i.e., "subject_09" not "subject_9"). This will ensure consistent directory listing sort order.

  4. Do not use excessively long identifiers. These tend to be difficult to read in figures and may create filenames that interact badly with the OS. Even if your file system can handle longer names, keeping identifiers shorter than 16 characters is generally a good idea; shorter than 8 is even better.

  5. Don't include empty rows or trailing blanks in tsv files. Indicate missing entries with NA or -1. Don't mix strings and numeric data in the same column. Don't mix tabs and spaces between entries. Don't use double spaces.

  6. Be careful when cutting and pasting code examples from user guides (even this one!) into Matlab. Some editors use incompatible punctuation (e.g., smart quotes or em-dash) or add hidden formatting to text.

FYI: OpenNeuro contains many datasets that violate one or more of these rules (for example, embedding whitespace in event names). There are options for BIDS input in aa that can be used to correct some of these problems (such as stripping special characters from event names). However, the opportunity for errors remains.
