pept.Pipeline#

class pept.Pipeline(transformers)[source]#

Bases: PEPTObject

A PEPT processing pipeline, chaining multiple Filter and Reducer instances for efficient, parallel execution.

After a pipeline is constructed, the fit(samples) method can be called, which will apply the chain of filters and reducers on the samples of data.

A filter is simply a transformation applied to a sample (e.g. Voxelliser on a single sample of LineData). A reducer is a transformation applied to a list of all samples (e.g. Stack on all samples of PointData).

Note that only filters can be applied in parallel; reducers require collecting all intermediate results first. Still, a great advantage of a Pipeline is that it significantly reduces the amount of data copying and storage of intermediate results.
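
To illustrate the distinction, below is a minimal sketch of a user-defined filter. It assumes that pept.base.Filter subclasses only need to implement a fit_sample method; FirstHalf is a hypothetical name used here for illustration, not part of the library:

>>> import pept
>>> class FirstHalf(pept.base.Filter):
...     # Hypothetical filter: keep only the first half of each sample's lines
...     def fit_sample(self, sample):
...         keep = len(sample.lines) // 2
...         return pept.LineData(sample.lines[:keep])
>>> pipeline = pept.Pipeline([FirstHalf(), pept.tracking.Stack()])

Because FirstHalf only touches one sample at a time, it can run in parallel; Stack, being a reducer, then gathers all per-sample results.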

There are three execution policies at the moment: “sequential” (single-threaded; slower, but easy to debug), “joblib” (multi-threaded; very fast on medium datasets due to joblib’s caching), and any concurrent.futures.Executor subclass (e.g. MPIPoolExecutor for parallel processing on distributed clusters).

Examples

A pipeline can be created in two ways: either by adding (+) multiple transformers together, or explicitly constructing the Pipeline class.

The first method is the most straightforward:

>>> import pept
>>> filter1 = pept.tracking.Cutpoints(max_distance = 0.5)
>>> filter2 = pept.tracking.HDBSCAN(true_fraction = 0.1)
>>> reducer = pept.tracking.Stack()
>>> pipeline = filter1 + filter2 + reducer
>>> print(pipeline)
Pipeline
--------
transformers = [
    Cutpoints(append_indices = False, cutoffs = None, max_distance = 0.5)
    HDBSCAN(clusterer = HDBSCAN(), max_tracers = 1, true_fraction = 0.1)
    Stack(overlap = None, sample_size = None)
]
>>> lors = pept.LineData(...)        # Some samples of lines
>>> points = pipeline.fit(lors)

The chain of filters can also be applied to a single sample:

>>> point = pipeline.fit_sample(lors[0])

The pipeline’s fit method allows specifying an execution policy:

>>> points = pipeline.fit(lors, executor = "sequential")
>>> points = pipeline.fit(lors, executor = "joblib")
>>> from mpi4py.futures import MPIPoolExecutor
>>> points = pipeline.fit(lors, executor = MPIPoolExecutor)

The pept.Pipeline constructor can also be called directly, which allows the enumeration of filters:

>>> pipeline = pept.Pipeline([filter1, filter2, reducer])

Adding new filters is very easy:

>>> pipeline_extra = pipeline + filter2

Attributes

transformers : list[pept.base.Filter or pept.base.Reducer]

The list of Transformer to be applied; this includes both Filter and Reducer instances.

__init__(transformers)[source]#

Construct the class from an iterable of Filter, Reducer and/or other Pipeline instances (which will be flattened).
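
As a hedged sketch (reusing filter1, filter2 and reducer from the Examples above), nesting a Pipeline inside the constructor produces a single flat chain of transformers:

>>> inner = pept.Pipeline([filter1, filter2])
>>> outer = pept.Pipeline([inner, reducer])
>>> len(outer.transformers)      # the inner pipeline is flattened
3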

Methods

__init__(transformers)

Construct the class from an iterable of Filter, Reducer and/or other Pipeline instances (which will be flattened).

copy([deep])

Create a deep copy of an instance of this class, including all inner attributes.

fit(samples[, executor, max_workers, verbose])

Apply all transformers defined to all samples.

fit_sample(sample)

Apply all transformers, consecutively, to a single sample of data.

load(filepath)

Load a saved / pickled PEPTObject instance from filepath.

optimise(lines[, max_evals, executor, ...])

save(filepath)

Save a PEPTObject instance as a binary pickle object.

steps()

Return the order of processing steps to apply as a list where all consecutive sequences of filters are collapsed into tuples.

Attributes

filters

Only the Filter instances from the transformers.

reducers

Only the Reducer instances from the transformers.

transformers

The list of Transformer to be applied; this includes both Filter and Reducer instances.

property filters#

Only the Filter instances from the transformers. They can be applied in parallel.

property reducers#

Only the Reducer instances from the transformers. They require collecting all parallel results.

property transformers#

The list of Transformer to be applied; this includes both Filter and Reducer instances.
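
As a quick illustrative sketch (reusing the transformers defined in the Examples above), the three properties simply split the same underlying list:

>>> pipeline = pept.Pipeline([filter1, filter2, reducer])
>>> pipeline.filters             # illustrative: [filter1, filter2]
>>> pipeline.reducers            # illustrative: [reducer]
>>> pipeline.transformers        # illustrative: [filter1, filter2, reducer]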

fit_sample(sample)[source]#

Apply all transformers, consecutively, to a single sample of data. The output type is simply what the transformers return.

fit(samples, executor='joblib', max_workers=None, verbose=True)[source]#

Apply all transformers defined to all samples. Filters are applied according to the executor policy (e.g. parallel via “joblib”), while reducers are applied on a single thread.

Parameters
samples : Iterable

An iterable (e.g. list, tuple, LineData, list[PointData]) whose elements will be passed through the pipeline.

executor : “sequential”, “joblib”, or concurrent.futures.Executor subclass, default “joblib”

The execution policy controlling how the chain of filters is applied to each sample in samples; “sequential” is single-threaded (slow, but easy to debug), “joblib” is multi-threaded (very fast due to joblib’s caching). Alternatively, a concurrent.futures.Executor subclass can be used (e.g. MPIPoolExecutor for distributed computing on clusters).

max_workers : int, optional

The maximum number of workers to use for parallel executors. If None (default), the maximum number of CPUs is used.

verbose : bool, default True

If True, show extra information during processing, e.g. loading bars.
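
For example, a brief sketch combining these parameters (lors being the LineData from the Examples above):

>>> points = pipeline.fit(lors, executor = "joblib", max_workers = 4, verbose = False)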

steps()[source]#

Return the order of processing steps to apply as a list where all consecutive sequences of filters are collapsed into tuples.

E.g. [F, F, R, F, R, R, F, F, F] -> [(F, F), R, (F), R, R, (F, F, F)].
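
As an illustrative sketch (shown schematically; the actual output prints each transformer's repr), a pipeline with two consecutive filters followed by a reducer collapses the filters into one tuple:

>>> pipeline = pept.Pipeline([filter1, filter2, reducer])
>>> pipeline.steps()             # schematically: [(filter1, filter2), reducer]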

optimise(lines, max_evals=200, executor='joblib', max_workers=None, verbose=True, **free_parameters)[source]#

copy(deep=True)#

Create a deep copy of an instance of this class, including all inner attributes.

static load(filepath)#

Load a saved / pickled PEPTObject instance from filepath.

Most often the full object state was saved using the .save method.

Parameters
filepath : filename or file handle

If filepath is a path (rather than file handle), it is relative to where python is called.

Returns
pept.PEPTObject subclass instance

The loaded object.

Examples

Save a LineData instance, then load it back:

>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")
>>> lines_reloaded = pept.LineData.load("lines.pickle")

save(filepath)#

Save a PEPTObject instance as a binary pickle object.

Saves the full object state, including inner attributes, in a portable binary format. Load back the object using the load method.

Parameters
filepath : filename or file handle

If filepath is a path (rather than file handle), it is relative to where python is called.

Examples

Save a LineData instance, then load it back:

>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")
>>> lines_reloaded = pept.LineData.load("lines.pickle")