tlseparation.classification package¶
Submodules¶
tlseparation.classification.classes_reference module¶
tlseparation.classification.classify_wood module¶
-
tlseparation.classification.classify_wood.
reference_classification
(point_cloud, knn_list, n_classes=4, prob_threshold=0.95)[source]¶ Classifies wood material points from a point cloud. This function uses wlseparate_ref_voting to perform the basic classification and then applies class_filter to filter out potentially misclassified wood points.
Parameters: - point_cloud: numpy.ndarray
2D (n x 3) array containing n points in 3D space (x, y, z).
- knn_list: list
List of knn values to be used iteratively in the voting separation.
- n_classes: int
Number of intermediate classes. The minimum is 3, but the default is set to 4 to accommodate noise/outlier classes.
- prob_threshold: float
Classification probability threshold to filter classes. This aims to avoid selecting points that are not confidently enough assigned to any given class. Default is 0.95.
Returns: - wood_points: numpy.ndarray
2D (nw x 3) array containing nw wood points in 3D space (x, y, z).
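The effect of prob_threshold can be pictured with a small NumPy sketch. This is not the package's class_filter: filter_by_probability is a hypothetical helper that assumes one classification probability per candidate point and keeps only the points at or above the threshold.

```python
import numpy as np


def filter_by_probability(points, probability, prob_threshold=0.95):
    """Keep points whose class probability is at least prob_threshold.

    Hypothetical helper for illustration; not part of tlseparation.
    """
    points = np.asarray(points, dtype=float)
    probability = np.asarray(probability, dtype=float)
    keep = probability >= prob_threshold
    return points[keep]


# Four candidate wood points with made-up probabilities.
candidates = np.array([[0.0, 0.0, 0.0],
                       [0.1, 0.0, 0.5],
                       [0.2, 0.1, 1.0],
                       [0.9, 0.8, 1.5]])
prob = np.array([0.99, 0.60, 0.97, 0.94])
confident = filter_by_probability(candidates, prob)
```

With the default threshold, only the first and third candidates are retained.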
-
tlseparation.classification.classify_wood.
threshold_classification
(point_cloud, knn, n_classes=3, prob_threshold=0.95)[source]¶ Classifies wood material points from a point cloud. This function uses wlseparate_abs to perform the basic classification and then applies class_filter to filter out potentially misclassified wood points.
Parameters: - point_cloud : numpy.ndarray
2D (n x 3) array containing n points in 3D space (x, y, z).
- knn : int
Number of neighbors to select around each point. Used to describe local point arrangement.
- n_classes: int
Number of intermediate classes. Default is 3.
- prob_threshold: float
Classification probability threshold to filter classes. This aims to avoid selecting points that are not confidently enough assigned to any given class. Default is 0.95.
Returns: - wood_points: numpy.ndarray
2D (nw x 3) array containing nw wood points in 3D space (x, y, z).
tlseparation.classification.gmm module¶
-
tlseparation.classification.gmm.
class_select_abs
(classes, cm, nbrs_idx, feature=5, threshold=0.5)[source]¶ Selects from GMM classification results which classes are wood and which are leaf, based on an absolute value threshold from a single feature in the parameter space.
Parameters: - classes : list or array
Classes labels for each observation from the input variables.
- cm : array
N-dimensional array (c x n) of each class (c) parameter space mean values (n).
- nbrs_idx : array
Nearest Neighbors indices relative to every point of the array that originated the classes labels.
- feature : int
Column index of the feature to use as constraint.
- threshold : float
Threshold value to mask classes. All classes with means >= threshold are masked as true.
Returns: - mask : list
List of booleans where True represents wood points and False represents leaf points.
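The thresholding rule described above can be sketched in a few lines of NumPy. The helper name select_classes_by_threshold is hypothetical, and the sketch ignores nbrs_idx because the text does not say how the neighbor indices enter the selection.

```python
import numpy as np


def select_classes_by_threshold(classes, cm, feature=5, threshold=0.5):
    """Return a per-point boolean mask (True = wood).

    Hypothetical sketch: a class counts as wood when its mean value for
    the chosen feature column is >= threshold.
    """
    classes = np.asarray(classes)
    cm = np.asarray(cm, dtype=float)
    class_is_wood = cm[:, feature] >= threshold
    # Broadcast the per-class decision to every observation.
    return class_is_wood[classes]


# Three classes, six features; only the last feature column is non-zero.
means = np.zeros((3, 6))
means[:, 5] = [0.2, 0.5, 0.9]
labels = np.array([0, 2, 1, 2, 0])
mask = select_classes_by_threshold(labels, means)
```

Classes 1 and 2 have means at or above 0.5, so every point labelled 1 or 2 is masked as True.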
-
tlseparation.classification.gmm.
class_select_ref
(classes, cm, classes_ref)[source]¶ Selects from the classification results which classes are wood and which are leaf.
Parameters: - classes : list
List of classes labels for each observation from the input variables.
- cm : array
N-dimensional array (c x n) of each class (c) parameter space mean values (n).
- classes_ref : array
Reference classes values.
Returns: - mask : array
List of booleans where True represents wood points and False represents leaf points.
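A minimal sketch of matching intermediate classes to reference classes, assuming Euclidean distance between class means and reference values. The helper nearest_reference is hypothetical, and deciding which reference rows stand for wood is left to the caller because the text does not specify it.

```python
import numpy as np


def nearest_reference(classes, cm, classes_ref):
    """Return, for every observation, the index of the closest reference class.

    Hypothetical sketch assuming Euclidean distance in parameter space.
    """
    classes = np.asarray(classes)
    cm = np.asarray(cm, dtype=float)
    classes_ref = np.asarray(classes_ref, dtype=float)
    # Distance from every class mean (c) to every reference class (r).
    dist = np.linalg.norm(cm[:, None, :] - classes_ref[None, :, :], axis=2)
    class_to_ref = dist.argmin(axis=1)
    return class_to_ref[classes]


class_means = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 0.8]])
reference = np.array([[0.0, 0.0], [1.0, 1.0]])
labels_ref = np.array([0, 1, 2, 2, 0])
point_ref = nearest_reference(labels_ref, class_means, reference)
# If reference row 1 were the wood class, the mask would be:
wood_mask = point_ref == 1
```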
-
tlseparation.classification.gmm.
classify
(variables, n_classes)[source]¶ Function to perform the classification of a dataset using sklearn’s Gaussian Mixture Models with Expectation Maximization.
Parameters: - variables : array
N-dimensional array (m x n) containing a set of parameters (n) over a set of observations (m).
- n_classes : int
Number of classes to assign the input variables.
Returns: - classes : list
List of classes labels for each observation from the input variables.
- means : array
N-dimensional array (c x n) of each class (c) parameter space means (n).
- probability : array
Probability of samples belonging to every class in the classification. Sum of sample-wise probability should be 1.
tlseparation.classification.path_detection module¶
-
tlseparation.classification.path_detection.
detect_main_pathways
(point_cloud, k_retrace, knn, nbrs_threshold, verbose=False, max_iter=100)[source]¶ Detects the main pathways of an unordered 3D point cloud. Sets as True all points that are part of detected pathways leading down to the base of the graph.
Parameters: - point_cloud : array
Three-dimensional point cloud of a single tree to perform the wood-leaf separation. This should be an n-dimensional array (m x n) containing a set of coordinates (n) over a set of points (m).
- k_retrace : int
Number of steps in the graph to retrace back to the graph’s base. Every node in the graph will be moved k_retrace steps from the extremities towards the base.
- knn : int
Number of neighbors to fill gaps in detected paths. The larger the better. A large knn will increase memory usage. Recommended value between 50 and 150.
- nbrs_threshold : float
Maximum distance to valid neighboring points used to fill gaps in detected paths.
- verbose: bool
Option to set verbose on/off.
Returns: - path_mask : array
Boolean mask where ‘True’ represents points detected as part of the main pathways and ‘False’ represents points not part of the pathways.
Raises: - AssertionError:
point_cloud has the wrong shape or number of dimensions.
-
tlseparation.classification.path_detection.
get_base
(point_cloud, base_height)[source]¶ Get the base of a point cloud based on a certain height from the bottom.
Parameters: - point_cloud : array
Three-dimensional point cloud of a single tree to perform the wood-leaf separation. This should be an n-dimensional array (m x n) containing a set of coordinates (n) over a set of points (m).
- base_height : float
Height of the base slice to mask.
Returns: - mask : array
Base slice masked as True.
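A base slice of this kind can be sketched as follows, assuming height is stored in the third column (z) and that the slice includes its upper boundary. base_mask is a hypothetical name, not the package function.

```python
import numpy as np


def base_mask(point_cloud, base_height):
    """Mask points lying within base_height of the lowest point.

    Hypothetical sketch; assumes z is the third column.
    """
    z = np.asarray(point_cloud, dtype=float)[:, 2]
    return z <= z.min() + base_height


cloud = np.array([[0.0, 0.0, 0.0],
                  [0.1, 0.2, 0.3],
                  [0.0, 0.1, 0.5],
                  [0.3, 0.1, 2.0]])
mask_base = base_mask(cloud, base_height=0.5)
```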
-
tlseparation.classification.path_detection.
path_detect_frequency
(point_cloud, downsample_size, frequency_threshold)[source]¶ Detects points from major paths in a graph generated from a point cloud. The detection is performed by comparing the frequency of paths in which each node is present. Nodes with a frequency larger than the threshold are selected as detected. To fill pathway regions with low node density, neighboring points within a distance of downsample_size * 1.5 are also set as detected.
Parameters: - point_cloud : numpy.ndarray
2D (n x 3) array containing n points in 3D space (x, y, z).
- downsample_size : float
Distance threshold used to group (downsample) the input point cloud. Simplifying the cloud by downsampling improves the results and processing times.
- frequency_threshold : float
Minimum path frequency for a node to be selected as part of major pathways.
Returns: - path_points: numpy.ndarray
2D (np x 3) array containing np points in 3D space (x, y, z) that belong to major pathways in the point cloud.
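The frequency criterion can be illustrated on a toy set of paths. This sketch assumes each path is a list of node indices running from an extremity to the base (node 0), counts how many paths visit each node, and selects nodes with a count strictly larger than the threshold. The helper frequent_nodes and the raw-count frequency measure are assumptions; the gap filling with neighboring points is not shown.

```python
import numpy as np


def frequent_nodes(paths, n_nodes, frequency_threshold):
    """Count path membership per node and flag the frequent ones.

    Hypothetical sketch: frequency is the raw number of paths through a node.
    """
    counts = np.zeros(n_nodes, dtype=int)
    for path in paths:
        # Count each node at most once per path.
        counts[np.unique(path)] += 1
    return counts, counts > frequency_threshold


# Four toy paths that all end at the base node 0.
toy_paths = [[3, 2, 1, 0], [4, 2, 1, 0], [5, 1, 0], [6, 0]]
counts, selected = frequent_nodes(toy_paths, n_nodes=7, frequency_threshold=2)
```

Nodes 0 and 1 are shared by most paths and are the only ones selected here.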
-
tlseparation.classification.path_detection.
voxel_path_detection
(point_cloud, voxel_size, k_retrace, knn, nbrs_threshold, verbose=False)[source]¶ Applies detect_main_pathways but with a voxelization option to speed up processing.
Parameters: - point_cloud : array
Three-dimensional point cloud of a single tree to perform the wood-leaf separation. This should be an n-dimensional array (m x n) containing a set of coordinates (n) over a set of points (m).
- voxel_size: float
Voxel dimensions’ size.
- k_retrace : int
Number of steps in the graph to retrace back to the graph’s base. Every node in the graph will be moved k_retrace steps from the extremities towards the base.
- knn : int
Number of neighbors to fill gaps in detected paths. The larger the better. A large knn will increase memory usage. Recommended value between 50 and 150.
- nbrs_threshold : float
Maximum distance to valid neighboring points used to fill gaps in detected paths.
- verbose: bool
Option to set verbose on/off.
Returns: - path_mask : array
Boolean mask where ‘True’ represents points detected as part of the main pathways and ‘False’ represents points not part of the pathways.
Raises: - AssertionError:
point_cloud has the wrong shape or number of dimensions.
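The voxelization step can be sketched with NumPy alone. The helper voxelize is hypothetical: it assumes cubic voxels aligned to the origin, returns one center per occupied voxel, and returns the index of the voxel each point falls in, so that a mask computed on voxels can be mapped back to the original points.

```python
import numpy as np


def voxelize(point_cloud, voxel_size):
    """Group points into cubic voxels.

    Hypothetical sketch. Returns (centers, inverse) where inverse[i] is the
    row in centers of the voxel that contains point i.
    """
    point_cloud = np.asarray(point_cloud, dtype=float)
    voxel_idx = np.floor(point_cloud / voxel_size).astype(int)
    unique_voxels, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    centers = (unique_voxels + 0.5) * voxel_size
    return centers, inverse


pts = np.array([[0.1, 0.1, 0.1],
                [0.2, 0.3, 0.4],
                [1.2, 0.1, 0.1]])
centers, inverse = voxelize(pts, voxel_size=1.0)

# A mask computed per voxel is expanded back to the points by indexing.
voxel_mask = np.array([True, False])
point_mask = voxel_mask[inverse]
```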
tlseparation.classification.point_features module¶
-
tlseparation.classification.point_features.
calc_features
(e)[source]¶ Calculates the geometric features using a set of eigenvalues, based on Ma et al. [1] and Wang et al. [2].
Parameters: - e : array
N-dimensional array (m x 3) containing sets of 3 eigenvalues per row (m).
Returns: - features : array
N-dimensional array (m x 6) containing the calculated geometric features from ‘e’.
References
[1] Ma et al., 2015. Improved Salient Feature-Based Approach for Automatically Separating Photosynthetic and Nonphotosynthetic Components Within Terrestrial Lidar Point Cloud Data of Forest Canopies. [2] Wang et al., 2015. A Multiscale and Hierarchical Feature Extraction Method for Terrestrial Laser Scanning Point Cloud Classification.
-
tlseparation.classification.point_features.
curvature
(arr, nbrs_idx)[source]¶ Calculates pointwise curvature of a point cloud.
Parameters: - arr : array
Three-dimensional (m x n) array of a point cloud, where the coordinates are represented in the columns (n) and the points are represented in the rows (m).
- nbrs_idx : array
N-dimensional array of indices from a nearest neighbors search of the point cloud in ‘arr’, where the rows (m) represent the points in ‘arr’ and the columns represent the indices of the nearest neighbors from ‘arr’.
Returns: - c : numpy.ndarray
1D (m x 1) array containing the curvature of each point in ‘arr’.
-
tlseparation.classification.point_features.
knn_evals
(arr_stack)[source]¶ Calculates eigenvalues of a stack of arrays.
Parameters: - arr_stack : array
N-dimensional array (l x m x n) containing a stack of data, where the rows (m) represent the point coordinates, the columns (n) represent the axis coordinates and the layers (l) represent the stacks of points.
Returns: - evals : array
N-dimensional array (l x n) of eigenvalues calculated from ‘arr_stack’. The rows (l) represent the stack layers of points in ‘arr_stack’ and the columns (n) represent the parameters in ‘arr_stack’.
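One way to obtain per-layer eigenvalues is to build a covariance matrix for each layer and pass the whole batch to numpy.linalg.eigvalsh. The sketch below is an assumption about the approach, not the package code; stack_eigenvalues is a hypothetical name and the descending order is a choice made here.

```python
import numpy as np


def stack_eigenvalues(arr_stack):
    """Eigenvalues of the covariance matrix of every layer, largest first.

    Hypothetical sketch; arr_stack has shape (l, m, n).
    """
    arr_stack = np.asarray(arr_stack, dtype=float)
    cov = np.array([np.cov(layer, rowvar=False) for layer in arr_stack])
    # eigvalsh works on the stacked (l, n, n) input and sorts ascending.
    return np.linalg.eigvalsh(cov)[:, ::-1]


# Layer 0: points on a straight line along x. Layer 1: random points.
line = np.array([[0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [2.0, 0.0, 0.0],
                 [3.0, 0.0, 0.0]])
noise = np.random.default_rng(0).normal(size=(4, 3))
evals = stack_eigenvalues(np.stack([line, noise]))
```

For the linear layer only the first eigenvalue is non-zero, which is the kind of contrast the geometric features rely on.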
-
tlseparation.classification.point_features.
knn_features
(arr, nbr_idx, block_size=200000)[source]¶ Calculates geometric descriptors: salient features and tensor features from an array and an indexing with fixed numbers of neighbors.
Parameters: - arr : array
Three-dimensional (m x n) array of a point cloud, where the coordinates are represented in the columns (n) and the points are represented in the rows (m).
- nbr_idx : array
N-dimensional array of indices from a nearest neighbors search of the point cloud in ‘arr’, where the rows (m) represent the points in ‘arr’ and the columns represent the indices of the nearest neighbors from ‘arr’.
Returns: - features : array
N-dimensional array (m x 6) of the calculated geometric descriptors, where the rows (m) represent the points from ‘arr’ and the columns represent the features.
-
tlseparation.classification.point_features.
svd_evals
(arr)[source]¶ Calculates eigenvalues of an array using SVD.
Parameters: - arr : array
nxm numpy.ndarray where n is the number of samples and m is the number of dimensions.
Returns: - evals : array
1xm numpy.ndarray containing the calculated eigenvalues in descending order.
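The link between SVD and eigenvalues can be sketched as follows: for data centered on its mean, the squared singular values divided by (n - 1) equal the eigenvalues of the sample covariance matrix, and numpy.linalg.svd already returns singular values in descending order. The helper svd_eigenvalues and the (n - 1) scaling are assumptions; the package function may scale its output differently.

```python
import numpy as np


def svd_eigenvalues(arr):
    """Covariance eigenvalues of an (n x m) array via SVD, largest first.

    Hypothetical sketch, assuming sample-covariance scaling.
    """
    arr = np.asarray(arr, dtype=float)
    centered = arr - arr.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / (arr.shape[0] - 1)


samples = np.random.default_rng(1).normal(size=(10, 3))
evals_svd = svd_eigenvalues(samples)
# Cross-check against a direct eigendecomposition of the covariance matrix.
evals_cov = np.linalg.eigvalsh(np.cov(samples, rowvar=False))[::-1]
```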
-
tlseparation.classification.point_features.
vectorized_app
(arr_stack)[source]¶ Function to calculate the covariance of a stack of arrays. This function uses Einstein summation to make the covariance calculation more efficient. Based on a reply from the user Divakar [3] on Stack Overflow.
Parameters: - arr_stack : array
N-dimensional array (l x m x n) containing a stack of data, where the rows (m) represent the point coordinates, the columns (n) represent the axis coordinates and the layers (l) represent the stacks of points.
Returns: - cov : array
N-dimensional array (l x n x n) of covariance values calculated from ‘arr_stack’. Each layer (l) contains a (n x n) covariance matrix calculated from the layers (l) in ‘arr_stack’.
References
[3] Divakar, 2016. http://stackoverflow.com/questions/35756952/quickly-compute-eigenvectors-for-each-element-of-an-array-in-python.
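A covariance computation over a stack of arrays with Einstein summation might look like the sketch below. stacked_covariance is a hypothetical name, and the (m - 1) normalization is an assumption chosen to match numpy.cov; the package function may normalize differently.

```python
import numpy as np


def stacked_covariance(arr_stack):
    """Covariance matrix of every layer of an (l, m, n) stack.

    Hypothetical sketch using numpy.einsum; returns shape (l, n, n).
    """
    arr_stack = np.asarray(arr_stack, dtype=float)
    n_points = arr_stack.shape[1]
    centered = arr_stack - arr_stack.mean(axis=1, keepdims=True)
    # Sum over the point axis (m) for every pair of coordinate axes (i, j).
    return np.einsum('lmi,lmj->lij', centered, centered) / (n_points - 1)


stack = np.random.default_rng(42).normal(size=(5, 20, 3))
cov = stacked_covariance(stack)
# Layer-by-layer reference computed with numpy.cov.
reference_cov = np.array([np.cov(layer, rowvar=False) for layer in stack])
```

The single einsum call replaces the Python loop used for the reference result, which is where the efficiency gain comes from on large stacks.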
tlseparation.classification.wlseparation module¶
-
tlseparation.classification.wlseparation.
fill_class
(arr1, arr2, noclass, k)[source]¶ Assigns noclass entries to either arr1 or arr2, depending on a neighborhood majority analysis.
Parameters: - arr1 : array
Point coordinates for entries of the first class.
- arr2 : array
Point coordinates for entries of the second class.
- noclass : array
Point coordinates for noclass entries.
- k : int
Number of neighbors to use in the neighborhood majority analysis.
Returns: - arr1 : array
Point coordinates for entries of the first class.
- arr2 : array
Point coordinates for entries of the second class.
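A neighborhood majority assignment can be sketched with a brute-force nearest neighbor search in NumPy. This swaps in an exhaustive distance matrix for whatever search structure the package uses, so it only suits small inputs. fill_by_majority is a hypothetical name, and sending ties to the second class is an assumption.

```python
import numpy as np


def fill_by_majority(arr1, arr2, noclass, k):
    """Assign each noclass point to the class holding most of its k neighbors.

    Hypothetical sketch with brute-force distances; ties go to arr2.
    """
    arr1, arr2, noclass = (np.asarray(a, dtype=float)
                           for a in (arr1, arr2, noclass))
    labelled = np.vstack((arr1, arr2))
    is_first = np.arange(labelled.shape[0]) < arr1.shape[0]
    # Distance from every noclass point to every labelled point.
    dist = np.linalg.norm(noclass[:, None, :] - labelled[None, :, :], axis=2)
    nbrs = np.argsort(dist, axis=1)[:, :k]
    votes_first = is_first[nbrs].sum(axis=1)
    to_first = votes_first > nbrs.shape[1] / 2
    return (np.vstack((arr1, noclass[to_first])),
            np.vstack((arr2, noclass[~to_first])))


first = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
second = np.array([[5.0, 5.0, 5.0], [5.1, 5.0, 5.0], [5.0, 5.1, 5.0]])
unassigned = np.array([[0.05, 0.05, 0.0], [4.9, 5.0, 5.0]])
new_first, new_second = fill_by_majority(first, second, unassigned, k=3)
```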
-
tlseparation.classification.wlseparation.
wlseparate_abs
(arr, knn, knn_downsample=1, n_classes=3)[source]¶ Classifies a point cloud (arr) into three main classes, wood, leaf and noclass.
The final class selection is based on the absolute value of the last geometric feature (see point_features module). Points are classified as wood or leaf only if their classification probability is higher than prob_threshold. Otherwise, points are assigned to noclass.
Class selection will mask points with feature value larger than a given threshold as wood and the remaining points as leaf.
Parameters: - arr : array
Three-dimensional point cloud of a single tree to perform the wood-leaf separation. This should be an n-dimensional array (m x n) containing a set of coordinates (n) over a set of points (m).
- knn : int
Number of nearest neighbors to search to constitute the local subset of points around each point in ‘arr’.
- knn_downsample : float
Downsample factor in the range (0, 1] for the knn parameter. If less than 1, a sample of size (knn * knn_downsample) will be selected from the nearest neighbors indices. This option aims to maintain the spatial representation of the local subsets of points while reducing overhead in memory and processing time.
- n_classes : int
Number of classes to use in the Gaussian Mixture Classification.
Returns: - class_indices : dict
Dictionary containing indices for wood and leaf classes.
- class_probability : dict
Dictionary containing probabilities for wood and leaf classes.
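The knn_downsample option can be pictured as keeping a random subset of the neighbor columns. In this sketch downsample_neighbors is a hypothetical helper, and using the same sampled columns for every point is an assumption made for simplicity.

```python
import numpy as np


def downsample_neighbors(nbrs_idx, knn_downsample, seed=0):
    """Keep int(knn * knn_downsample) randomly chosen neighbor columns.

    Hypothetical sketch; the same columns are kept for every point.
    """
    nbrs_idx = np.asarray(nbrs_idx)
    knn = nbrs_idx.shape[1]
    n_keep = max(1, int(knn * knn_downsample))
    rng = np.random.default_rng(seed)
    cols = np.sort(rng.choice(knn, size=n_keep, replace=False))
    return nbrs_idx[:, cols]


# Four points with ten (made-up) neighbor indices each.
nbrs = np.arange(40).reshape(4, 10)
reduced = downsample_neighbors(nbrs, knn_downsample=0.5)
```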
-
tlseparation.classification.wlseparation.
wlseparate_ref_voting
(arr, knn_lst, class_file, n_classes=3)[source]¶ Classifies a point cloud (arr) into two main classes, wood and leaf. Although this function does not output a noclass category, it still filters out results based on classification confidence interval in the voting process (if lower than prob_threshold, the vote is not used for the current point and knn value).
The final class selection is based on a voting scheme applied to an approach similar to wlseparate_ref. In this case, the function iterates over a series of knn values and applies the reference distance criterion to select wood and leaf classes.
Each knn class result is accumulated in a list and a vote is applied at the end. For each point, if the number of times it was classified as wood is larger than the threshold, the final class is set to wood. Otherwise it is set to leaf.
Class selection masks points according to the distance from their class mean to the reference classes. The closest reference class is assigned to each intermediate class.
Parameters: - arr : array
Three-dimensional point cloud of a single tree to perform the wood-leaf separation. This should be an n-dimensional array (m x n) containing a set of coordinates (n) over a set of points (m).
- knn_lst : list
List of knn values to use in the search to constitute local subsets of points around each point in ‘arr’. It can be a single knn value, as long as it has list data type.
- class_file : pandas dataframe or str
Dataframe or path to reference classes file.
- n_classes : int
Number of classes to use in the Gaussian Mixture Classification.
Returns: - class_dict : dict
Dictionary containing indices for all classes in class_ref. Classes are labeled according to classes names in class_file.
- count_dict : dict
Dictionary containing vote counts for all classes in class_ref. Classes are labeled according to classes names in class_file.
- prob_dict : dict
Dictionary containing probabilities for all classes in class_ref. Classes are labeled according to classes names in class_file.
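The voting step can be sketched as a count over per-knn results. Here each row of wood_masks holds the wood/leaf decision of one knn value, and a point is finally labelled wood when its vote count exceeds the threshold. The helper vote_wood is hypothetical, and the default threshold of half the number of knn values is an assumption, since the text does not state the value used.

```python
import numpy as np


def vote_wood(wood_masks, threshold=None):
    """Combine per-knn wood decisions by voting.

    Hypothetical sketch; wood_masks has shape (n_knn, n_points).
    """
    wood_masks = np.asarray(wood_masks, dtype=bool)
    votes = wood_masks.sum(axis=0)
    if threshold is None:
        threshold = wood_masks.shape[0] / 2  # assumed default
    return votes, votes > threshold


# Results for three knn values over four points (True = wood).
masks = np.array([[True, True, False, False],
                  [True, False, False, True],
                  [True, True, False, False]])
votes, is_wood = vote_wood(masks)
```

Points with three or two wood votes become wood; the points with zero or one vote are set to leaf.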