pept.tracking.Segregate

class pept.tracking.Segregate(window, cut_distance, min_trajectory_size=5, max_time_interval=1.7976931348623157e+308)

Bases: Reducer

Segregate the intertwined points from multiple trajectories into individual paths.

Reducer signature:

      pept.PointData -> Segregate.fit -> pept.PointData
list[pept.PointData] -> Segregate.fit -> pept.PointData
       numpy.ndarray -> Segregate.fit -> pept.PointData

The points in point_data (a numpy array or pept.PointData) are used to construct a minimum spanning tree in which every point can only be connected to the window points around it. This “window” refers to positions in the initial data array, sorted by the time column; therefore, only points within a certain timeframe can be connected. All edges (or “connections”) in the minimum spanning tree that are longer than cut_distance are removed (or “cut”), and the remaining connected “clusters” are deemed individual trajectories if they contain more than min_trajectory_size points.
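The clustering described above can be illustrated with a dependency-free sketch. It is not the library's implementation: `segregate_sketch` is a hypothetical helper, it uses plain Euclidean distance between positions rather than a custom spatio-temporal metric, and it swaps the minimum spanning tree for a union-find over all windowed pairs closer than cut_distance (cutting the long edges of a minimum spanning tree leaves the same connected clusters). Clusters with fewer than min_trajectory_size points are treated as noise, which is an assumption about the boundary case.

```python
import math


def segregate_sketch(points, window, cut_distance, min_trajectory_size=5):
    """Return one trajectory label per row of `points` ([time, x, y, z]).

    Hypothetical helper for illustration only; not part of pept.
    """
    n = len(points)
    # Sort row indices by time so the window refers to time-ordered neighbours.
    order = sorted(range(n), key=lambda i: points[i][0])
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair that is at most `window` rows apart in time order
    # and at most `cut_distance` apart in space.
    for a in range(n):
        for b in range(a + 1, min(a + window + 1, n)):
            i, j = order[a], order[b]
            if math.dist(points[i][1:4], points[j][1:4]) <= cut_distance:
                parent[find(i)] = find(j)

    # Count the size of each cluster.
    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1

    # Number the large clusters from 0 in order of first appearance;
    # small clusters are labelled -1 (noise).
    cluster_label = {}
    labels = [-1] * n
    for i in order:
        root = find(i)
        if sizes[root] >= min_trajectory_size:
            labels[i] = cluster_label.setdefault(root, len(cluster_label))
    return labels


# Two tracers moving downwards, 100 mm apart, with intertwined rows.
points = []
steps = {"A": 0, "B": 0}
for t, name in enumerate("ABABAABABB"):
    x = 0.0 if name == "A" else 100.0
    points.append((float(t), x, 0.0, 100.0 - 10.0 * steps[name]))
    steps[name] += 1

labels = segregate_sketch(points, window=10, cut_distance=15.0)
print(labels)  # [0, 1, 0, 1, 0, 0, 1, 0, 1, 1]
```

With window=1 on the same data, only adjacent rows can be joined, so no cluster reaches five points and every row is labelled -1.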

The trajectory indices (or labels) are appended to point_data. That is, for each data point (i.e. row) in point_data, a label will be appended starting from 0 for the corresponding trajectory; a label of -1 represents noise. If point_data is a numpy array, a new numpy array is returned; if it is a pept.PointData instance, a new instance is returned.
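As a rough illustration of how the appended labels might be consumed, the sketch below splits a labelled array into one array per trajectory and sets the noise aside. It assumes a plain numpy array whose last column holds the label; the `labelled` data is made up for the example.

```python
import numpy as np

# Made-up labelled points: columns [time, x, y, z, label]; -1 marks noise.
labelled = np.array([
    [0.0, 0.0, 0.0, 100.0, 0],
    [1.0, 100.0, 0.0, 100.0, 1],
    [2.0, 50.0, 50.0, 50.0, -1],
    [3.0, 0.0, 0.0, 90.0, 0],
    [4.0, 100.0, 0.0, 90.0, 1],
])

# The label column is stored as float alongside the data; cast it back.
labels = labelled[:, -1].astype(int)

# One [time, x, y, z] array per trajectory, keyed by its label.
trajectories = {
    int(label): labelled[labels == label, :4]
    for label in np.unique(labels)
    if label >= 0
}

# Rows that were not assigned to any trajectory.
noise = labelled[labels == -1, :4]
```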

This class uses single linkage clustering with a custom metric for spatio-temporal data to segregate trajectory points. The single linkage clustering was optimised for this use-case: points are only connected if they are within a certain window in the time-sorted input array. Sparse matrices are also used to minimise the memory footprint.

See also

Reconnect: Connect segregated trajectories based on tracer signatures.
PlotlyGrapher: Easy, publication-ready plotting of PEPT-oriented data.

Examples

A typical workflow would involve transforming LoRs into points using some tracking algorithm. These points include all tracers moving through the system, so they are intertwined (e.g. for two tracers A and B, the point_data array might have two entries for A, followed by three entries for B, then one entry for A, etc.). They can be segregated based on position alone using this class. Take for example two tracers that move downwards (below, ‘x’ is the position, and the array index at which that point is found is given in parentheses).

`points`, numpy.ndarray, shape (10, 4), columns [time, x, y, z]:
    x (1)                       x (2)
     x (3)                     x (4)
       x (5)                 x (7)
       x (6)                x (9)
      x (8)                 x (10)

>>> import pept
>>> window = 10
>>> cut_distance = 15    # mm
>>> segregated_trajectories = pept.tracking.Segregate(
...     window, cut_distance
... ).fit(points)

`segregated_trajectories`, numpy.ndarray, shape (10, 5),
columns [time, x, y, z, trajectory_label]:
    x (1, label = 0)            x (2, label = 1)
     x (3, label = 0)          x (4, label = 1)
       x (5, label = 0)      x (7, label = 1)
       x (6, label = 0)     x (9, label = 1)
      x (8, label = 0)      x (10, label = 1)

Attributes

window (int): Two points are “reachable” (i.e. they can be connected) if and only if they are within window rows of each other in the time-sorted input point_data. As the points from different trajectories are intertwined (e.g. for two tracers A and B, the point_data array might have two entries for A, followed by three entries for B, then one entry for A, etc.), this should optimally be the largest number of points in the input array between two consecutive points on the same trajectory. If window is too small, all points in the dataset will be unreachable. Naturally, a larger window corresponds to more pairs needing to be checked (and the computation will take longer to complete).
cut_distance (float): Once all the closest points are connected (i.e. the minimum spanning tree is constructed), separate all trajectories that are further apart than cut_distance.
min_trajectory_size (float, default 5): After the trajectories have been cut, declare all trajectories with fewer points than min_trajectory_size as noise.
max_time_interval (float, default np.finfo(float).max): Only connect points if the time difference between their timestamps is smaller than max_time_interval. Setting added in pept-0.5.2.
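One way to pick a starting value for window, assuming a short stretch of data whose tracer identities are already known, is to measure how far apart consecutive points of the same tracer sit in the time-sorted array. The helper `largest_index_gap` below is hypothetical, and it measures the gap as a difference in row index, which is one reading of the description above.

```python
def largest_index_gap(labels):
    """Largest row offset between consecutive points sharing a label.

    `labels` holds one tracer identifier per row of the time-sorted data.
    Hypothetical helper for illustration only; not part of pept.
    """
    last_seen = {}
    largest = 0
    for row, label in enumerate(labels):
        if label in last_seen:
            largest = max(largest, row - last_seen[label])
        last_seen[label] = row
    return largest


# Row order for two intertwined tracers A and B.
known_labels = list("ABABAABABB")
window = largest_index_gap(known_labels)
print(window)  # 3
```

A somewhat larger value than this estimate leaves a margin for stretches where one tracer is located less often, at the cost of more pairs to check.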

__init__(window, cut_distance, min_trajectory_size=5, max_time_interval=1.7976931348623157e+308)

Methods

`__init__`(window, cut_distance[, ...])
`copy`([deep])	Create a deep copy of an instance of this class, including all inner attributes.
`fit`(points)
`load`(filepath)	Load a saved / pickled PEPTObject object from filepath.
`save`(filepath)	Save a PEPTObject instance as a binary pickle object.

fit(points)

copy(deep=True): Create a deep copy of an instance of this class, including all inner attributes.

static load(filepath)

Load a saved / pickled PEPTObject object from filepath.

Most often the full object state was saved using the .save method.

Parameters

filepath (filename or file handle): If filepath is a path (rather than a file handle), it is relative to where Python is called.

Returns

pept.PEPTObject subclass instance: The loaded object.

Examples

Save a LineData instance, then load it back:

>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")

>>> lines_reloaded = pept.LineData.load("lines.pickle")

save(filepath)

Save a PEPTObject instance as a binary pickle object.

Saves the full object state, including inner attributes, in a portable binary format. Load back the object using the load method.

Parameters

filepath (filename or file handle): If filepath is a path (rather than a file handle), it is relative to where Python is called.

Examples

Save a LineData instance, then load it back:

>>> lines = pept.LineData([[1, 2, 3, 4, 5, 6, 7]])
>>> lines.save("lines.pickle")

>>> lines_reloaded = pept.LineData.load("lines.pickle")