tlseparation.classification package

Submodules

tlseparation.classification.classes_reference module

class tlseparation.classification.classes_reference.DefaultClass

Defines a default reference class to be used in classification of tree point clouds.

tlseparation.classification.classify_wood module

tlseparation.classification.classify_wood.reference_classification(point_cloud, knn_list, n_classes=4, prob_threshold=0.95)

Classifies wood material points from a point cloud. This function uses wlseparate_ref_voting to perform the basic classification and then applies class_filter to filter out potentially misclassified wood points.

Parameters:
point_cloud : numpy.ndarray

2D (n x 3) array containing n points in 3D space (x, y, z).

knn_list: list

List of knn values to be used iteratively in the voting separation.

n_classes : int

Number of intermediate classes. The minimum should be 3, but the default is 4 to accommodate noise/outlier classes.

prob_threshold : float

Classification probability threshold used to filter classes. This avoids selecting points that are not assigned to any class with sufficient confidence. Default is 0.95.

Returns:
wood_points : numpy.ndarray

2D (nw x 3) array containing nw wood points in 3D space (x, y, z).

tlseparation.classification.classify_wood.threshold_classification(point_cloud, knn, n_classes=3, prob_threshold=0.95)

Classifies wood material points from a point cloud. This function uses wlseparate_abs to perform the basic classification and then applies class_filter to filter out potentially misclassified wood points.

Parameters:
point_cloud : numpy.ndarray

2D (n x 3) array containing n points in 3D space (x, y, z).

knn : int

Number of neighbors to select around each point. Used to describe local point arrangement.

n_classes : int

Number of intermediate classes. Default is 3.

prob_threshold : float

Classification probability threshold used to filter classes. This avoids selecting points that are not assigned to any class with sufficient confidence. Default is 0.95.

Returns:
wood_points : numpy.ndarray

2D (nw x 3) array containing nw wood points in 3D space (x, y, z).
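The prob_threshold parameter of both functions above discards points whose class assignment is not confident enough. A minimal sketch of that filter, assuming the classifier returns an (n x c) matrix of class membership probabilities and that "confident" means the most likely class has a probability strictly higher than the threshold. The helper `confident_mask` is hypothetical and not part of tlseparation.

```python
import numpy as np

def confident_mask(probability, prob_threshold=0.95):
    """Hypothetical helper: True where a point's most likely class has a
    probability strictly higher than prob_threshold."""
    probability = np.asarray(probability, dtype=float)
    # Highest membership probability for each point (one row per point).
    return probability.max(axis=1) > prob_threshold

# Three points, two classes: only the first and third are confident.
probs = np.array([[0.97, 0.03],
                  [0.60, 0.40],
                  [0.01, 0.99]])
print(confident_mask(probs))  # [ True False  True]
```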

tlseparation.classification.gmm module

tlseparation.classification.gmm.class_select_abs(classes, cm, nbrs_idx, feature=5, threshold=0.5)

Selects, from the GMM classification results, which classes are wood and which are leaf, based on an absolute value threshold applied to a single feature in the parameter space.

Parameters:
classes : list or array

Class labels for each observation from the input variables.

cm : array

2D array (c x n) containing the mean value of each parameter (n) for each class (c).

nbrs_idx : array

Nearest-neighbor indices for every point of the array from which the class labels were derived.

feature : int

Column index of the feature to use as constraint.

threshold : float

Threshold value used to mask classes. All classes whose mean value for the selected feature is >= threshold are masked as True.

Returns:
mask : list

List of booleans where True represents wood points and False represents leaf points.
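The selection rule can be sketched as follows, assuming `classes` holds one integer label per point and row i of `cm` holds the mean parameter vector of class i. This simplified version ignores `nbrs_idx`, whose exact role is not described here, so it should be read as an illustration of the thresholding idea rather than the tlseparation implementation. The function name is hypothetical.

```python
import numpy as np

def select_abs_sketch(classes, cm, feature=5, threshold=0.5):
    """Sketch: mark points as wood (True) when the mean of their class
    in column `feature` is >= threshold. nbrs_idx is not used here."""
    classes = np.asarray(classes)
    cm = np.asarray(cm, dtype=float)
    # One boolean per class: is the class mean at or above the threshold?
    wood_classes = cm[:, feature] >= threshold
    # Broadcast each class decision back to its points via the labels.
    return wood_classes[classes]

cm = np.array([[0.1, 0.2, 0.3, 0.1, 0.2, 0.8],   # class 0: high feature 5
               [0.4, 0.1, 0.2, 0.3, 0.1, 0.2]])  # class 1: low feature 5
labels = [0, 1, 1, 0]
print(select_abs_sketch(labels, cm))  # [ True False False  True]
```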

tlseparation.classification.gmm.class_select_ref(classes, cm, classes_ref)

Selects from the classification results which classes are wood and which are leaf.

Parameters:
classes : list

List of class labels for each observation from the input variables.

cm : array

2D array (c x n) containing the mean value of each parameter (n) for each class (c).

classes_ref : array

Reference classes values.

Returns:
mask : array

List of booleans where True represents wood points and False represents leaf points.
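A minimal sketch of reference-based selection, assuming each intermediate class is matched to the reference class whose mean is nearest in Euclidean distance (the distance metric actually used is not stated here). The arrays `ref_means` and `ref_is_wood` are hypothetical stand-ins for `classes_ref`, and the function name is illustrative only.

```python
import numpy as np

def select_ref_sketch(classes, cm, ref_means, ref_is_wood):
    """Sketch: label each class with its nearest reference class (Euclidean
    distance between means), then map that decision back to every point."""
    cm = np.asarray(cm, dtype=float)
    ref_means = np.asarray(ref_means, dtype=float)
    # Pairwise distances between class means and reference means (c x r).
    dist = np.linalg.norm(cm[:, None, :] - ref_means[None, :, :], axis=2)
    nearest_ref = dist.argmin(axis=1)
    wood_classes = np.asarray(ref_is_wood, dtype=bool)[nearest_ref]
    return wood_classes[np.asarray(classes)]

ref_means = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical wood-like, leaf-like
ref_is_wood = [True, False]
cm = [[0.8, 0.2], [0.2, 0.7], [0.6, 0.3]]
print(select_ref_sketch([0, 1, 2, 1], cm, ref_means, ref_is_wood))
# [ True False  True False]
```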

tlseparation.classification.gmm.classify(variables, n_classes)

Classifies a dataset using sklearn’s Gaussian Mixture Models, fitted with Expectation Maximization.

Parameters:
variables : array

2D array (m x n) containing a set of parameters (n) for each of a set of observations (m).

n_classes : int

Number of classes to assign the input variables.

Returns:
classes : list

List of class labels for each observation from the input variables.

means : array

2D array (c x n) containing the mean of each parameter (n) for each class (c).

probability : array

Probability of each sample belonging to each class in the classification. The probabilities of each sample should sum to 1.

tlseparation.classification.path_detection module

tlseparation.classification.path_detection.detect_main_pathways(point_cloud, k_retrace, knn, nbrs_threshold, verbose=False, max_iter=100)

Detects the main pathways of an unordered 3D point cloud. All points detected as part of any pathway leading down to the base of the graph are set to True.

Parameters:
point_cloud : array

Three-dimensional point cloud of a single tree on which to perform the wood-leaf separation. This should be a 2D array (m x n) containing a set of coordinates (n) for each of a set of points (m).

k_retrace : int

Number of steps to retrace back towards the graph’s base. Every node in the graph will be moved k_retrace steps from the extremities towards the base.

knn : int

Number of neighbors used to fill gaps in detected paths. Larger values generally give better results but increase memory usage. Recommended values are between 50 and 150.

nbrs_threshold : float

Maximum distance to valid neighboring points used to fill gaps in detected paths.

verbose : bool

Option to set verbose on/off.

Returns:
path_mask : array

Boolean mask where ‘True’ represents points detected as part of the main pathways and ‘False’ represents points not part of the pathways.

Raises:
AssertionError:

point_cloud has the wrong shape or number of dimensions.

tlseparation.classification.path_detection.get_base(point_cloud, base_height)

Gets the base slice of a point cloud, defined by a given height from the bottom.

Parameters:
point_cloud : array

Three-dimensional point cloud of a single tree on which to perform the wood-leaf separation. This should be a 2D array (m x n) containing a set of coordinates (n) for each of a set of points (m).

base_height : float

Height of the base slice to mask.

Returns:
mask : array

Boolean mask with the points in the base slice set to True.
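Assuming the height axis is the third column (z) and the base slice is measured upward from the lowest point, the base mask reduces to a single comparison. This is a sketch; the exact datum that get_base measures from is an assumption, and the function name is hypothetical.

```python
import numpy as np

def base_mask_sketch(point_cloud, base_height):
    """Sketch: True for points within base_height of the lowest z value."""
    z = np.asarray(point_cloud, dtype=float)[:, 2]
    return z <= z.min() + base_height

pts = np.array([[0, 0, 10.0], [0, 1, 10.4], [1, 0, 11.0], [1, 1, 15.0]])
print(base_mask_sketch(pts, 0.5))  # [ True  True False False]
```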

tlseparation.classification.path_detection.path_detect_frequency(point_cloud, downsample_size, frequency_threshold)

Detects points from major paths in a graph generated from a point cloud. Detection works by counting how frequently each node appears across all paths in the graph. Nodes with a frequency larger than frequency_threshold are selected as detected. To fill pathway regions with low node density, neighboring points within a distance of downsample_size * 1.5 are also set as detected.

Parameters:
point_cloud : numpy.ndarray

2D (n x 3) array containing n points in 3D space (x, y, z).

downsample_size : float

Distance threshold used to group (downsample) the input point cloud. Simplifying the cloud by downsampling improves results and processing time.

frequency_threshold : float

Minimum path frequency for a node to be selected as part of major pathways.

Returns:
path_points : numpy.ndarray

2D (np x 3) array containing np points in 3D space (x, y, z) that belong to major pathways in the point cloud.
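The path-frequency idea can be illustrated on a small neighborhood graph. This sketch makes several assumptions not stated above: nodes are connected when closer than a hypothetical `max_edge` distance, the paths are shortest (Dijkstra) paths from the lowest point to every reachable node, and a node's frequency is the fraction of those paths that pass through it. It skips the downsampling and gap-filling steps and is not tlseparation's implementation.

```python
import heapq
import numpy as np

def path_frequency_sketch(points, max_edge):
    """Sketch: fraction of shortest paths (from the lowest point to every
    reachable node) that pass through each node."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    root = int(points[:, 2].argmin())  # assumed base: the lowest point
    # Dijkstra from the root, recording each node's predecessor.
    best = np.full(n, np.inf)
    prev = np.full(n, -1)
    best[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v in np.nonzero((dist[u] < max_edge) & (np.arange(n) != u))[0]:
            nd = d + dist[u, v]
            if nd < best[v]:
                best[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, int(v)))
    # Walk each reachable node's path back to the root, counting visits.
    counts = np.zeros(n)
    reachable = np.isfinite(best)
    for node in np.nonzero(reachable)[0]:
        while node != -1:
            counts[node] += 1
            node = prev[node]
    return counts / reachable.sum()

# A short vertical "trunk" with one side point (index 3).
pts = [[0, 0, 0], [0, 0, 1], [0, 0, 2], [1, 0, 2], [0, 0, 3]]
freq = path_frequency_sketch(pts, max_edge=1.5)
print(freq.tolist())  # [1.0, 0.8, 0.4, 0.2, 0.2]
```

Thresholding `freq` (for example `freq > 0.3`) then keeps the nodes shared by many paths, which is the role frequency_threshold plays above.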

tlseparation.classification.path_detection.voxel_path_detection(point_cloud, voxel_size, k_retrace, knn, nbrs_threshold, verbose=False)

Applies detect_main_pathways with voxelization to speed up processing.

Parameters:
point_cloud : array

Three-dimensional point cloud of a single tree on which to perform the wood-leaf separation. This should be a 2D array (m x n) containing a set of coordinates (n) for each of a set of points (m).

voxel_size : float

Size of each voxel.

k_retrace : int

Number of steps to retrace back towards the graph’s base. Every node in the graph will be moved k_retrace steps from the extremities towards the base.

knn : int

Number of neighbors used to fill gaps in detected paths. Larger values generally give better results but increase memory usage. Recommended values are between 50 and 150.

nbrs_threshold : float

Maximum distance to valid neighboring points used to fill gaps in detected paths.

verbose : bool

Option to set verbose on/off.

Returns:
path_mask : array

Boolean mask where ‘True’ represents points detected as part of the main pathways and ‘False’ represents points not part of the pathways.

Raises:
AssertionError:

point_cloud has the wrong shape or number of dimensions.
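Voxelization typically snaps points to a grid of cubic cells, keeps one representative per occupied cell, and maps results computed on the representatives back to all original points. The sketch below assumes cubic voxels and uses the voxel centroid as the representative; both details are assumptions rather than documented behavior of voxel_path_detection, and the function name is hypothetical.

```python
import numpy as np

def voxelize_sketch(points, voxel_size):
    """Sketch: return one centroid per occupied voxel and, for every input
    point, the index of the voxel it falls into."""
    points = np.asarray(points, dtype=float)
    # Integer voxel coordinates of each point.
    keys = np.floor(points / voxel_size).astype(np.int64)
    # Unique occupied voxels; `inverse` maps each point to its voxel.
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep it 1D across NumPy versions
    n_vox = inverse.max() + 1
    # Average the points falling in each voxel to get its centroid.
    sums = np.zeros((n_vox, points.shape[1]))
    np.add.at(sums, inverse, points)
    centroids = sums / np.bincount(inverse, minlength=n_vox)[:, None]
    return centroids, inverse

pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.3, 0.1], [1.5, 0.2, 0.1]])
centroids, inverse = voxelize_sketch(pts, voxel_size=1.0)
voxel_mask = np.array([True, False])   # hypothetical per-voxel result
point_mask = voxel_mask[inverse]       # mapped back to every point
print(point_mask)  # [ True  True False]
```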

tlseparation.classification.point_features module

tlseparation.classification.point_features.calc_features(e)

Calculates the geometric features using a set of eigenvalues, based on Ma et al. [1] and Wang et al. [2].

Parameters:
e : array

2D array (m x 3) containing one set of 3 eigenvalues per row (m rows).

Returns:
features : array

2D array (m x 6) containing the geometric features calculated from ‘e’.

References

[1] Ma et al., 2015. Improved Salient Feature-Based Approach for Automatically Separating Photosynthetic and Nonphotosynthetic Components Within Terrestrial Lidar Point Cloud Data of Forest Canopies.
[2] Wang et al., 2015. A Multiscale and Hierarchical Feature Extraction Method for Terrestrial Laser Scanning Point Cloud Classification.

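The six features computed by calc_features follow Ma et al. [1] and Wang et al. [2] and are not listed here. As an illustration only, the sketch below computes three eigenvalue-based descriptors widely used for point clouds (linearity, planarity and scattering), assuming positive eigenvalues that it sorts into descending order (e1 >= e2 >= e3). These are not necessarily the features calc_features returns, and the function name is hypothetical.

```python
import numpy as np

def eigen_descriptors_sketch(e):
    """Sketch: linearity, planarity and scattering for each row of
    eigenvalues. Not necessarily the features returned by calc_features."""
    # Sort each row so that e1 >= e2 >= e3.
    e = np.sort(np.asarray(e, dtype=float), axis=1)[:, ::-1]
    e1, e2, e3 = e[:, 0], e[:, 1], e[:, 2]
    linearity = (e1 - e2) / e1   # high for elongated, line-like neighborhoods
    planarity = (e2 - e3) / e1   # high for flat neighborhoods
    scattering = e3 / e1         # high for isotropic (volumetric) scatter
    return np.column_stack([linearity, planarity, scattering])

evals = np.array([[1.0, 0.1, 0.1],    # elongated neighborhood
                  [1.0, 1.0, 0.0]])   # planar neighborhood
print(eigen_descriptors_sketch(evals).round(2).tolist())
# [[0.9, 0.0, 0.1], [0.0, 1.0, 0.0]]
```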
tlseparation.classification.point_features.curvature(arr, nbrs_idx)

Calculates pointwise curvature of a point cloud.

Parameters:
arr : array

Three-dimensional point cloud stored as a 2D (m x n) array, where the columns (n) hold the coordinates and the rows (m) hold the points.

nbrs_idx : array

Array of indices from a nearest neighbors search of the point cloud in ‘arr’, where the rows (m) represent the points in ‘arr’ and the columns represent the indices of their nearest neighbors in ‘arr’.

Returns:
c : numpy.ndarray

1D (m x 1) array containing the curvature of each point in ‘arr’.
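A common definition of pointwise curvature (sometimes called surface variation) is the smallest eigenvalue of the neighborhood covariance matrix divided by the sum of its eigenvalues. The sketch below assumes that definition; whether curvature uses exactly this formula is not stated here, and the function name is hypothetical.

```python
import numpy as np

def curvature_sketch(arr, nbrs_idx):
    """Sketch: e3 / (e1 + e2 + e3) from each point's neighborhood covariance."""
    arr = np.asarray(arr, dtype=float)
    nbrs = arr[np.asarray(nbrs_idx)]               # (m x k x 3) neighborhoods
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)
    cov = np.einsum('mki,mkj->mij', centered, centered) / (nbrs.shape[1] - 1)
    evals = np.linalg.eigvalsh(cov)                # ascending for each point
    return evals[:, 0] / evals.sum(axis=1)

# Four coplanar points (z = 0): every neighborhood is flat.
pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
idx = np.array([[0, 1, 2, 3]] * 4)                 # all points as neighbors
print(np.allclose(curvature_sketch(pts, idx), 0))  # True
```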

tlseparation.classification.point_features.knn_evals(arr_stack)

Calculates eigenvalues of a stack of arrays.

Parameters:
arr_stack : array

3D array (l x m x n) containing a stack of data, where the rows (m) represent the points, the columns (n) represent the axis coordinates and the layers (l) represent the individual stacks of points.

Returns:
evals : array

2D array (l x n) of eigenvalues calculated from ‘arr_stack’. The rows (l) represent the stack layers of points in ‘arr_stack’ and the columns (n) represent the parameters in ‘arr_stack’.

tlseparation.classification.point_features.knn_features(arr, nbr_idx, block_size=200000)

Calculates geometric descriptors (salient features and tensor features) from an array and a nearest-neighbor index array with a fixed number of neighbors per point.

Parameters:
arr : array

Three-dimensional point cloud stored as a 2D (m x n) array, where the columns (n) hold the coordinates and the rows (m) hold the points.

nbr_idx : array

Array of indices from a nearest neighbors search of the point cloud in ‘arr’, where the rows (m) represent the points in ‘arr’ and the columns represent the indices of their nearest neighbors in ‘arr’.

Returns:
features : array

2D array (m x 6) of the calculated geometric descriptors, where the rows (m) represent the points from ‘arr’ and the columns represent the features.
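The block_size argument suggests that neighborhoods are processed in chunks to bound memory use. A sketch of that pattern, assuming per-point covariance eigenvalues are the intermediate quantity; the helper name and chunking details are hypothetical, not taken from tlseparation.

```python
import numpy as np

def knn_evals_blocked(arr, nbr_idx, block_size=200000):
    """Sketch: covariance eigenvalues (descending) for each point's
    neighborhood, computed block_size points at a time."""
    arr = np.asarray(arr, dtype=float)
    nbr_idx = np.asarray(nbr_idx)
    out = np.empty((len(nbr_idx), arr.shape[1]))
    for start in range(0, len(nbr_idx), block_size):
        stop = start + block_size
        stack = arr[nbr_idx[start:stop]]            # (b x k x n) neighborhoods
        centered = stack - stack.mean(axis=1, keepdims=True)
        cov = np.einsum('bki,bkj->bij', centered, centered) / (stack.shape[1] - 1)
        out[start:stop] = np.linalg.eigvalsh(cov)[:, ::-1]
    return out

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
idx = rng.integers(0, 50, size=(50, 8))
# Blocked and unblocked results agree.
print(np.allclose(knn_evals_blocked(pts, idx, 7), knn_evals_blocked(pts, idx)))  # True
```

Only one block of (block_size x k x n) neighborhoods is held in memory at a time, which is the usual reason for such a parameter.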

tlseparation.classification.point_features.svd_evals(arr)

Calculates eigenvalues of an array using SVD.

Parameters:
arr : array

(n x m) numpy.ndarray, where n is the number of samples and m is the number of dimensions.

Returns:
evals : array

(1 x m) numpy.ndarray containing the calculated eigenvalues in descending order.
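Eigenvalues of a covariance matrix can be obtained from the singular values of the mean-centered data, because the covariance equals X^T X / (n - 1) for centered X. A sketch, assuming svd_evals returns covariance eigenvalues; it may instead return raw singular values or use a different normalization. The function name is hypothetical.

```python
import numpy as np

def svd_evals_sketch(arr):
    """Sketch: covariance eigenvalues of an (n x m) array via SVD,
    in descending order."""
    arr = np.asarray(arr, dtype=float)
    centered = arr - arr.mean(axis=0)
    # Singular values are returned sorted in descending order.
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / (arr.shape[0] - 1)

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 3)) * [3.0, 1.0, 0.2]
ev = svd_evals_sketch(data)
# Matches a direct eigen-decomposition of the covariance matrix.
print(np.allclose(ev, np.linalg.eigvalsh(np.cov(data.T))[::-1]))  # True
```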

tlseparation.classification.point_features.vectorized_app(arr_stack)

Calculates the covariance of each array in a stack of arrays. This function uses Einstein summation to make the covariance calculation more efficient. Based on an answer by the Stack Overflow user Divakar [3].

Parameters:
arr_stack : array

3D array (l x m x n) containing a stack of data, where the rows (m) represent the points, the columns (n) represent the axis coordinates and the layers (l) represent the individual stacks of points.

Returns:
cov : array

3D array (l x n x n) of covariance values calculated from ‘arr_stack’. Each layer (l) contains the (n x n) covariance matrix of the corresponding layer in ‘arr_stack’.

References

[3] Divakar, 2016. http://stackoverflow.com/questions/35756952/quickly-compute-eigenvectors-for-each-element-of-an-array-in-python.
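The Einstein-summation covariance can be sketched as follows, in the spirit of the answer cited in [3]. Normalizing by (m - 1) is an assumption chosen here to match numpy.cov; the function name is hypothetical.

```python
import numpy as np

def stack_cov_sketch(arr_stack):
    """Sketch: (l x n x n) covariance matrices for an (l x m x n) stack."""
    arr_stack = np.asarray(arr_stack, dtype=float)
    diffs = arr_stack - arr_stack.mean(axis=1, keepdims=True)
    # Sum over the m points of each layer: cov[l] = diffs[l].T @ diffs[l].
    return np.einsum('lmi,lmj->lij', diffs, diffs) / (arr_stack.shape[1] - 1)

rng = np.random.default_rng(2)
stack = rng.normal(size=(5, 10, 3))
cov = stack_cov_sketch(stack)
print(cov.shape)  # (5, 3, 3)
print(all(np.allclose(cov[i], np.cov(stack[i].T)) for i in range(5)))  # True
```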

tlseparation.classification.wlseparation module

tlseparation.classification.wlseparation.fill_class(arr1, arr2, noclass, k)

Assigns noclass entries to either arr1 or arr2, based on a neighborhood majority analysis.

Parameters:
arr1 : array

Point coordinates for entries of the first class.

arr2 : array

Point coordinates for entries of the second class.

noclass : array

Point coordinates for noclass entries.

k : int

Number of neighbors to use in the neighborhood majority analysis.

Returns:
arr1 : array

Point coordinates for entries of the first class.

arr2 : array

Point coordinates for entries of the second class.
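Neighborhood majority assignment can be sketched with a brute-force nearest-neighbor search: each noclass point takes the label held by most of its k nearest classified points. The tie-breaking rule (ties go to arr1) and the brute-force search are assumptions, and the function name is hypothetical; the tlseparation implementation may differ.

```python
import numpy as np

def fill_class_sketch(arr1, arr2, noclass, k):
    """Sketch: move each noclass point to arr1 or arr2 by majority vote
    among its k nearest classified neighbors."""
    arr1, arr2, noclass = (np.asarray(a, dtype=float) for a in (arr1, arr2, noclass))
    labelled = np.vstack([arr1, arr2])
    is_first = np.r_[np.ones(len(arr1), bool), np.zeros(len(arr2), bool)]
    dist = np.linalg.norm(noclass[:, None, :] - labelled[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Count neighbors belonging to arr1; ties go to arr1 (assumption).
    to_first = is_first[nearest].sum(axis=1) * 2 >= k
    return (np.vstack([arr1, noclass[to_first]]),
            np.vstack([arr2, noclass[~to_first]]))

wood = [[0, 0, 0], [0, 0, 1], [0, 0, 2]]
leaf = [[5, 5, 5], [5, 6, 5], [6, 5, 5]]
unknown = [[0, 0, 1.5], [5, 5, 6]]
new_wood, new_leaf = fill_class_sketch(wood, leaf, unknown, k=3)
print(len(new_wood), len(new_leaf))  # 4 4
```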

tlseparation.classification.wlseparation.wlseparate_abs(arr, knn, knn_downsample=1, n_classes=3)

Classifies a point cloud (arr) into three main classes: wood, leaf and noclass.

The final class selection is based on the absolute value of the last geometric feature (see point_features module). Points will only be classified as wood or leaf if their classification probability is higher than prob_threshold. Otherwise, points are assigned to noclass.

Class selection masks points with a feature value larger than a given threshold as wood and the remaining points as leaf.

Parameters:
arr : array

Three-dimensional point cloud of a single tree on which to perform the wood-leaf separation. This should be a 2D array (m x n) containing a set of coordinates (n) for each of a set of points (m).

knn : int

Number of nearest neighbors to search to constitute the local subset of points around each point in ‘arr’.

knn_downsample : float

Downsample factor in the interval (0, 1] for the knn parameter. If less than 1, a sample of size (knn * knn_downsample) will be selected from the nearest neighbors indices. This option aims to maintain the spatial representation of the local subsets of points while reducing memory and processing overhead.

n_classes : int

Number of classes to use in the Gaussian Mixture Classification.

Returns:
class_indices : dict

Dictionary containing indices for wood and leaf classes.

class_probability : dict

Dictionary containing probabilities for wood and leaf classes.
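The knn_downsample option can be sketched as sampling a subset of columns from the neighbor index matrix. Random sampling without replacement, using the same columns for every point, and keeping at least one neighbor are all assumptions; the helper `downsample_neighbors` is hypothetical.

```python
import numpy as np

def downsample_neighbors(nbrs_idx, knn_downsample, seed=0):
    """Sketch: keep int(knn * knn_downsample) randomly chosen neighbor
    columns (the same columns for every point)."""
    nbrs_idx = np.asarray(nbrs_idx)
    knn = nbrs_idx.shape[1]
    if knn_downsample >= 1:
        return nbrs_idx  # no downsampling
    n_keep = max(1, int(knn * knn_downsample))
    rng = np.random.default_rng(seed)
    cols = np.sort(rng.choice(knn, size=n_keep, replace=False))
    return nbrs_idx[:, cols]

idx = np.arange(40).reshape(4, 10)   # 4 points, knn = 10
print(downsample_neighbors(idx, 0.3).shape)  # (4, 3)
```

Sampling columns across the whole neighbor list, rather than keeping only the first few, preserves neighbors at a range of distances, which matches the stated aim of keeping the spatial representation of each local subset.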

tlseparation.classification.wlseparation.wlseparate_ref_voting(arr, knn_lst, class_file, n_classes=3)

Classifies a point cloud (arr) into two main classes, wood and leaf. Although this function does not output a noclass category, it still filters results by classification confidence during voting: if the classification probability is lower than prob_threshold, the vote is not used for the current point and knn value.

The final class selection is based on a voting scheme applied to an approach similar to wlseparate_ref. In this case, the function iterates over a series of knn values and applies the reference distance criterion to select wood and leaf classes.

The class result for each knn value is accumulated in a list, and a vote is applied at the end. For each point, if the number of times it was classified as wood is larger than a threshold, the final class is set to wood; otherwise it is set to leaf.

Class selection masks points according to the distance between their class mean and the reference classes. The closest reference class is assigned to each intermediate class.

Parameters:
arr : array

Three-dimensional point cloud of a single tree on which to perform the wood-leaf separation. This should be a 2D array (m x n) containing a set of coordinates (n) for each of a set of points (m).

knn_lst : list

List of knn values used in the search to constitute the local subsets of points around each point in ‘arr’. A single knn value can be used, as long as it is passed as a list.

class_file : pandas dataframe or str

Dataframe or path to reference classes file.

n_classes : int

Number of classes to use in the Gaussian Mixture Classification.

Returns:
class_dict : dict

Dictionary containing indices for all classes in class_ref. Classes are labeled according to classes names in class_file.

count_dict : dict

Dictionary containing vote counts for all classes in class_ref. Classes are labeled according to classes names in class_file.

prob_dict : dict

Dictionary containing probabilities for all classes in class_ref. Classes are labeled according to classes names in class_file.
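The voting step described for wlseparate_ref_voting can be sketched as follows. The sketch assumes one boolean wood mask and one confidence value per point for each knn value, and expresses the vote threshold as a fraction of the confident votes; the function name, `vote_threshold` and the fraction-based rule are hypothetical choices, not the library's documented behavior.

```python
import numpy as np

def vote_sketch(wood_masks, probabilities, prob_threshold=0.95, vote_threshold=0.5):
    """Sketch: final wood/leaf label per point from several knn runs.

    wood_masks, probabilities: (n_knn x n_points) arrays, one row per knn
    value. Votes whose probability is below prob_threshold are ignored.
    """
    wood_masks = np.asarray(wood_masks, dtype=bool)
    valid = np.asarray(probabilities, dtype=float) >= prob_threshold
    wood_votes = (wood_masks & valid).sum(axis=0)
    valid_votes = valid.sum(axis=0)
    # Fraction of confident votes that said "wood"; points with no
    # confident vote default to leaf.
    frac = np.divide(wood_votes, valid_votes,
                     out=np.zeros(wood_votes.shape), where=valid_votes > 0)
    return frac > vote_threshold

masks = [[True, True, False],    # first knn value
         [True, False, False],   # second knn value
         [False, False, True]]   # third knn value
probs = [[0.99, 0.99, 0.99],
         [0.99, 0.50, 0.99],
         [0.50, 0.99, 0.99]]
print(vote_sketch(masks, probs))  # [ True False False]
```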

Module contents