pept.Pipeline#

class pept.Pipeline(transformers)[source]#

Bases: PEPTObject

A PEPT processing pipeline, chaining multiple Filter and Reducer instances for efficient, parallel execution.

After a pipeline is constructed, the fit(samples) method can be called, which will apply the chain of filters and reducers on the samples of data.

A filter is simply a transformation applied to a sample (e.g. Voxelliser on a single sample of LineData). A reducer is a transformation applied to a list of all samples (e.g. Stack on all samples of PointData).

Note that only filters can be applied in parallel; reducers require collecting all intermediate results first. Still, a great advantage of a Pipeline is that it significantly reduces the amount of data copying and storage of intermediate results.
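
To illustrate the distinction, below is a minimal sketch of a user-defined filter. It assumes that pept.base.Filter subclasses only need to implement a fit_sample method; FirstHalf is a hypothetical name used here for illustration, not part of the library:

>>> import pept
>>> class FirstHalf(pept.base.Filter):
...     # Hypothetical filter: keep only the first half of each sample's lines
...     def fit_sample(self, sample):
...         keep = len(sample.lines) // 2
...         return pept.LineData(sample.lines[:keep])
>>> pipeline = pept.Pipeline([FirstHalf(), pept.tracking.Stack()])

Because FirstHalf only touches one sample at a time, it can run in parallel; Stack, being a reducer, then gathers all per-sample results.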

There are three execution policies at the moment: “sequential” (single-threaded; slower, but easy to debug), “joblib” (multi-threaded; very fast on medium datasets due to joblib’s caching), and any concurrent.futures.Executor subclass (e.g. MPIPoolExecutor for parallel processing on distributed clusters).

Examples

A pipeline can be created in two ways: either by adding (+) multiple transformers together, or explicitly constructing the Pipeline class.

The first method is the most straightforward:

>>> import pept
>>> filter1 = pept.tracking.Cutpoints(max_distance = 0.5)
>>> filter2 = pept.tracking.HDBSCAN(true_fraction = 0.1)
>>> reducer = pept.tracking.Stack()
>>> pipeline = filter1 + filter2 + reducer
>>> print(pipeline)
Pipeline
--------
transformers = [
    Cutpoints(append_indices = False, cutoffs = None, max_distance = 0.5)
    HDBSCAN(clusterer = HDBSCAN(), max_tracers = 1, true_fraction = 0.1)
    Stack(overlap = None, sample_size = None)
]
>>> lors = pept.LineData(...)        # Some samples of lines
>>> points = pipeline.fit(lors)

The chain of filters can also be applied to a single sample:

>>> point = pipeline.fit_sample(lors[0])

The pipeline’s fit method allows specifying an execution policy:

>>> points = pipeline.fit(lors, executor = "sequential")
>>> points = pipeline.fit(lors, executor = "joblib")
>>> from mpi4py.futures import MPIPoolExecutor
>>> points = pipeline.fit(lors, executor = MPIPoolExecutor)

The pept.Pipeline constructor can also be called directly, which allows the enumeration of filters:

>>> pipeline = pept.Pipeline([filter1, filter2, reducer])

Adding new filters is very easy:

>>> pipeline_extra = pipeline + filter2

Attributes

transformers : list[pept.base.Filter or pept.base.Reducer]

The list of Transformer to be applied; this includes both Filter and Reducer instances.

__init__(transformers)[source]#

Construct the class from an iterable of Filter, Reducer and/or other Pipeline instances (which will be flattened).
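
As a hedged sketch (reusing filter1, filter2 and reducer from the Examples above), nesting a Pipeline inside the constructor produces a single flat chain of transformers:

>>> inner = pept.Pipeline([filter1, filter2])
>>> outer = pept.Pipeline([inner, reducer])
>>> len(outer.transformers)      # the inner pipeline is flattened
3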

Methods

__init__(transformers)

Construct the class from an iterable of Filter, Reducer and/or other Pipeline instances (which will be flattened).

copy([deep])

Create a deep copy of an instance of this class, including all inner attributes.

fit(samples[, executor, max_workers, verbose])

Apply all transformers defined to all samples.

fit_sample(sample)

Apply all transformers, consecutively, to a single sample of data.

load(filepath)

Load a saved / pickled PEPTObject instance from filepath.

optimise(lines[, max_evals, executor, ...])

save(filepath)

Save a PEPTObject instance as a binary pickle object.

steps()

Return the order of processing steps to apply as a list where all consecutive sequences of filters are collapsed into tuples.

Attributes

filters

Only the Filter instances from the transformers.

reducers

Only the Reducer instances from the transformers.

transformers

The list of Transformer to be applied; this includes both Filter and Reducer instances.

property filters#

Only the Filter instances from the transformers. They can be applied in parallel.

property reducers#

Only the Reducer instances from the transformers. They require collecting all parallel results.

property transformers#

The list of Transformer to be applied; this includes both Filter and Reducer instances.
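
As a quick illustrative sketch (reusing the transformers defined in the Examples above), the three properties simply split the same underlying list:

>>> pipeline = pept.Pipeline([filter1, filter2, reducer])
>>> pipeline.filters             # illustrative: [filter1, filter2]
>>> pipeline.reducers            # illustrative: [reducer]
>>> pipeline.transformers        # illustrative: [filter1, filter2, reducer]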

fit_sample(sample)[source]#

Apply all transformers, consecutively, to a single sample of data. The output type is simply what the transformers return.

fit(samples, executor='joblib', max_workers=None, verbose=True)[source]#

Apply all transformers defined to all samples. Filters are applied according to the executor policy (e.g. parallel via “joblib”), while reducers are applied on a single thread.

Parameters
samples : Iterable

An iterable (e.g. list, tuple, LineData, list[PointData]) whose elements will be passed through the pipeline.

executor : “sequential”, “joblib”, or concurrent.futures.Executor subclass, default “joblib”

The execution policy controlling how the chain of filters is applied to each sample in samples; “sequential” is single-threaded (slow, but easy to debug), “joblib” is multi-threaded (very fast due to joblib’s caching). Alternatively, a concurrent.futures.Executor subclass can be used (e.g. MPIPoolExecutor for distributed computing on clusters).

max_workers : int, optional

The maximum number of workers to use for parallel executors. If None (default), the maximum number of CPUs is used.

verbose : bool, default True

If True, show extra information during processing, e.g. loading bars.
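
For example, a brief sketch combining these parameters (lors being the LineData from the Examples above):

>>> points = pipeline.fit(lors, executor = "joblib", max_workers = 4, verbose = False)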

steps()[source]#

Return the order of processing steps to apply as a list where all consecutive sequences of filters are collapsed into tuples.

E.g. [F, F, R, F, R, R, F, F, F] -> [(F, F), R, (F), R, R, (F, F, F)].
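
As an illustrative sketch (shown schematically; the actual output prints each transformer's repr), a pipeline with two consecutive filters followed by a reducer collapses the filters into one tuple:

>>> pipeline = pept.Pipeline([filter1, filter2, reducer])
>>> pipeline.steps()             # schematically: [(filter1, filter2), reducer]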

optimise(lines, max_evals=200, executor='joblib', max_workers=None, verbose=True, **free_parameters)[source]#

copy(deep=True)#

Create a deep copy of an instance of this class, including all inner attributes.

static load(filepath)#

Load a saved / pickled PEPTObject instance from filepath.

Most often the full object state was saved using the .save method.

Parameters
filepath : filename or file handle

If filepath is a path (rather than file handle), it is relative to where python is called.

Returns
pept.PEPTObject subclass instance

The loaded object.

Examples

Save a LineData instance, then load it back:

>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")
>>> lines_reloaded = pept.LineData.load("lines.pickle")

save(filepath)#

Save a PEPTObject instance as a binary pickle object.

Saves the full object state, including inner attributes, in a portable binary format. Load back the object using the load method.

Parameters
filepath : filename or file handle

If filepath is a path (rather than file handle), it is relative to where python is called.

Examples

Save a LineData instance, then load it back:

>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")
>>> lines_reloaded = pept.LineData.load("lines.pickle")