pept.Pipeline#
- class pept.Pipeline(transformers)[source]#
Bases: PEPTObject
A PEPT processing pipeline, chaining multiple Filter and Reducer for efficient, parallel execution.
After a pipeline is constructed, the fit(samples) method can be called, which will apply the chain of filters and reducers on the samples of data.
A filter is simply a transformation applied to a sample (e.g. Voxelliser on a single sample of LineData). A reducer is a transformation applied to a list of all samples (e.g. Stack on all samples of PointData).
Note that only filters can be applied in parallel, but the great advantage of a Pipeline is that it significantly reduces the amount of data copying and intermediate results’ storage. Reducers will require collecting all results.
There are three execution policies at the moment: “sequential”, which is single-threaded (slower, but easy to debug); “joblib”, which is multi-threaded (very fast on medium datasets due to joblib’s caching); and any concurrent.futures.Executor subclass (e.g. MPIPoolExecutor for parallel processing on distributed clusters).
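The split between filters and reducers can be pictured with a simplified model; this is a sketch of the idea, not pept's actual implementation. `double` and `total` below are hypothetical stand-ins for a per-sample Filter and a collect-all Reducer.

```python
from concurrent.futures import ThreadPoolExecutor

# Simplified model of the execution policies, NOT pept's implementation.
# `double` stands in for a per-sample Filter, `total` for a Reducer.
def double(sample):
    return sample * 2

def total(results):
    return sum(results)

samples = [1, 2, 3, 4]

# "sequential" policy: a plain single-threaded loop, easy to debug
sequential = [double(s) for s in samples]

# executor policy: filters run in parallel, one task per sample
with ThreadPoolExecutor(max_workers=2) as ex:
    parallel = list(ex.map(double, samples))

# either way, the reducer collects all results on a single thread
assert sequential == parallel
print(total(parallel))  # 20
```

Because each filter only sees one sample at a time, any executor that can map a function over the samples works; the reducer is the synchronisation point where all results must be gathered.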
Examples
A pipeline can be created in two ways: either by adding (+) multiple transformers together, or explicitly constructing the Pipeline class.
The first method is the most straightforward:
>>> import pept
>>> filter1 = pept.tracking.Cutpoints(max_distance = 0.5)
>>> filter2 = pept.tracking.HDBSCAN(true_fraction = 0.1)
>>> reducer = pept.tracking.Stack()
>>> pipeline = filter1 + filter2 + reducer
>>> print(pipeline)
Pipeline
--------
transformers = [
    Cutpoints(append_indices = False, cutoffs = None, max_distance = 0.5)
    HDBSCAN(clusterer = HDBSCAN(), max_tracers = 1, true_fraction = 0.1)
    Stack(overlap = None, sample_size = None)
]
>>> lors = pept.LineData(...)   # Some samples of lines
>>> points = pipeline.fit(lors)
The chain of filters can also be applied to a single sample:
>>> point = pipeline.fit_sample(lors[0])
The pipeline’s fit method allows specifying an execution policy:
>>> points = pipeline.fit(lors, executor = "sequential")
>>> points = pipeline.fit(lors, executor = "joblib")
>>> from mpi4py.futures import MPIPoolExecutor
>>> points = pipeline.fit(lors, executor = MPIPoolExecutor)
The pept.Pipeline constructor can also be called directly, which allows the enumeration of filters:
>>> pipeline = pept.Pipeline([filter1, filter2, reducer])
Adding new filters is very easy:
>>> pipeline_extra = pipeline + filter2
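The flattening mentioned in the constructor's description can be sketched in plain Python; the `Pipe` class and string "filters" below are illustrative stand-ins, not pept's actual classes.

```python
# Hypothetical sketch of how `+` could build and flatten pipelines; the
# real pept.Pipeline flattens nested Pipeline instances on construction.
class Pipe:
    def __init__(self, steps):
        flat = []
        for s in steps:
            # nested pipelines are unpacked into their component steps
            flat.extend(s.steps_ if isinstance(s, Pipe) else [s])
        self.steps_ = flat

    def __add__(self, other):
        # adding wraps both operands in a new (flattened) pipeline
        return Pipe([self, other])

p = Pipe(["f1"]) + "f2"   # adding a new filter to an existing pipeline
print(p.steps_)           # ['f1', 'f2']
```

This is why chaining `filter1 + filter2 + reducer` yields a single flat pipeline rather than pipelines nested inside pipelines.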
- Attributes
  - transformers : list[pept.base.Filter or pept.base.Reducer]
    The list of Transformer to be applied; this includes both Filter and Reducer instances.
- __init__(transformers)[source]#
  Construct the class from an iterable of Filter, Reducer and/or other Pipeline instances (which will be flattened).
Methods
- __init__(transformers): Construct the class from an iterable of Filter, Reducer and/or other Pipeline instances (which will be flattened).
- copy([deep]): Create a deep copy of an instance of this class, including all inner attributes.
- fit(samples[, executor, max_workers, verbose]): Apply all transformers defined to all samples.
- fit_sample(sample): Apply all transformers consecutively to a single sample of data.
- load(filepath): Load a saved / pickled PEPTObject object from filepath.
- optimise(lines[, max_evals, executor, ...])
- save(filepath): Save a PEPTObject instance as a binary pickle object.
- steps(): Return the order of processing steps to apply as a list where all consecutive sequences of filters are collapsed into tuples.
Attributes
- filters: Only the Filter instances from the transformers.
- reducers: Only the Reducer instances from the transformers.
- transformers: The list of Transformer to be applied; this includes both Filter and Reducer instances.
- property filters#
Only the Filter instances from the transformers. They can be applied in parallel.
- property reducers#
Only the Reducer instances from the transformers. They require collecting all parallel results.
- property transformers#
The list of Transformer to be applied; this includes both Filter and Reducer instances.
- fit_sample(sample)[source]#
Apply all transformers consecutively to a single sample of data. The output type is simply what the transformers return.
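Conceptually, fit_sample threads one sample through each transformer in turn. A minimal sketch of that chaining, with plain callables standing in for pept's filters:

```python
from functools import reduce

# Hypothetical stand-ins for filters: each takes one sample and returns
# the transformed sample (pept's real filters work on LineData etc.).
transformers = [
    lambda s: [x for x in s if x > 0],   # e.g. a cut-off filter
    lambda s: sorted(s),                 # e.g. an ordering step
]

def fit_sample(sample, transformers):
    # apply each transformer consecutively to a single sample
    return reduce(lambda s, t: t(s), transformers, sample)

print(fit_sample([3, -1, 2], transformers))   # [2, 3]
```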
- fit(samples, executor='joblib', max_workers=None, verbose=True)[source]#
Apply all transformers defined to all samples. Filters are applied according to the executor policy (e.g. parallel via “joblib”), while reducers are applied on a single thread.
- Parameters
  - samples : Iterable
    An iterable (e.g. list, tuple, LineData, list[PointData]) whose elements will be passed through the pipeline.
  - executor : "sequential", "joblib", or concurrent.futures.Executor subclass, default "joblib"
    The execution policy controlling how the chain of filters is applied to each sample in samples; "sequential" is single-threaded (slow, but easy to debug), "joblib" is multi-threaded (very fast due to joblib's caching). Alternatively, a concurrent.futures.Executor subclass can be used (e.g. MPIPoolExecutor for distributed computing on clusters).
  - max_workers : int, optional
    The maximum number of workers to use for parallel executors. If None (default), the maximum number of CPUs is used.
  - verbose : bool, default True
    If True, show extra information during processing, e.g. loading bars.
- steps()[source]#
Return the order of processing steps to apply as a list where all consecutive sequences of filters are collapsed into tuples.
E.g. [F, F, R, F, R, R, F, F, F] -> [(F, F), R, (F), R, R, (F, F, F)].
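The collapsing rule above can be sketched with itertools.groupby, using the strings "F" and "R" in place of real Filter and Reducer instances (note that in Python a single-element tuple prints as ('F',)):

```python
from itertools import groupby

# Sketch of the grouping described above, assuming "F" marks a filter
# and "R" a reducer; pept's steps() works on real transformer instances.
def collapse(transformers):
    out = []
    for is_filter, group in groupby(transformers, key=lambda t: t == "F"):
        if is_filter:
            out.append(tuple(group))  # consecutive filters -> one tuple
        else:
            out.extend(group)         # each reducer stays a separate step
    return out

print(collapse(["F", "F", "R", "F", "R", "R", "F", "F", "F"]))
# [('F', 'F'), 'R', ('F',), 'R', 'R', ('F', 'F', 'F')]
```

Each tuple of filters can be dispatched to the executor in one parallel pass, while each reducer forms a synchronisation barrier between passes.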
- optimise(lines, max_evals=200, executor='joblib', max_workers=None, verbose=True, **free_parameters)[source]#
- copy(deep=True)#
Create a deep copy of an instance of this class, including all inner attributes.
- static load(filepath)#
Load a saved / pickled PEPTObject object from filepath.
Most often the full object state was saved using the .save method.
- Parameters
  - filepath : filename or file handle
    If filepath is a path (rather than file handle), it is relative to where python is called.
- Returns
  pept.PEPTObject subclass instance
    The loaded object.
Examples
Save a LineData instance, then load it back:
>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")
>>> lines_reloaded = pept.LineData.load("lines.pickle")
- save(filepath)#
Save a PEPTObject instance as a binary pickle object.
Saves the full object state, including inner attributes, in a portable binary format. Load back the object using the load method.
- Parameters
  - filepath : filename or file handle
    If filepath is a path (rather than file handle), it is relative to where python is called.
Examples
Save a LineData instance, then load it back:
>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")
>>> lines_reloaded = pept.LineData.load("lines.pickle")
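Since the description above says save / load use a binary pickle format, their round-trip behaviour can be approximated with the standard pickle module directly; `Obj` below is a hypothetical stand-in for a PEPTObject subclass, not a pept class.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a PEPTObject subclass.
class Obj:
    def __init__(self, data):
        self.data = data

obj = Obj([1, 2, 3])
path = os.path.join(tempfile.mkdtemp(), "obj.pickle")

with open(path, "wb") as f:
    pickle.dump(obj, f)          # rough equivalent of obj.save(path)

with open(path, "rb") as f:
    reloaded = pickle.load(f)    # rough equivalent of Obj.load(path)

assert reloaded.data == [1, 2, 3]
```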