0 votes
1 view
in Machine Learning by (12.5k points)
As a newbie in Machine Learning, I have a set of trajectories that may be of different lengths. I wish to cluster them, because some of them are actually the same path and they just SEEM different due to the noise.

In addition, not all of them are of the same lengths. So maybe although Trajectory A is not the same as Trajectory B, yet it is part of Trajectory B. I wish to present this property after the clustering as well.

I have only a bit knowledge of K-means Clustering and Fuzzy N-means Clustering. How may I choose between them two? Or should I adopt other methods?

Any method that takes the "belongness" into consideration? (e.g. After the clustering, I have 3 clusters A, B and C. One particular trajectory X belongs to cluster A. And a shorter trajectory Y, although is not clustered in A, is identified as part of trajectory B.)

The aforementioned trajectories are the pedestrians' trajectories. They can be either presented as a series of (x, y) points or a series of step vectors (length, direction). The presentation form is under my control.

1 Answer

0 votes
by (32.8k points)
edited by

This is the best approach I have seen for clustering trajectories because:

  • Can discover common sub-trajectories.
  • Focuses on Segments instead of points (so it filters out noise-outliers).
  • It works over trajectories of different lengths.

Here is a 2 phase approach:

  1. Phase one - Partition: Divide trajectories into segments, this is done using MDL Optimization with complexity of O(n) where n is the numbers of points in a given trajectory. Here the input is a set of trajectories and output is a set of segments.

    • Complexity: O(n) where n is the number of points on a trajectory
    • Input: Set of trajectories.
    • Output: Set D of segments
  2. Phase two - Group: This phase discovers the clusters using some version of density-based clustering like in DBSCAN. Input in this phase is the set of segments obtained from phase one and some parameters of what constitutes a neighborhood and the minimum amount of lines that can constitute a cluster. The output is a set of clusters. Clustering is done over segments. They define their own distance measure made of 3 components: Parallel distance, perpendicular distance, and angular distance. This phase has a complexity of O(n log n) where n is the number of segments.

    • Complexity: O(n log n) where n is number of segments on set D
    • Input: Set D of segments, parameter E that sets neighborhood threshold and parameter MinLns that is the minimum number of lines.
    • Output: Set C of Cluster, that is a Cluster of segments (trajectories clustered).

Then calculate a for each cluster a representative trajectory, which is nothing else that a discovered common sub-trajectory in each cluster.

Hope this answer helps you!

If you want to learn K Means Clustering Algorithm then you can refer to the below video:

...