MachineLearning/ManifoldLearning

Manifold Learning

Several usual manifold learning are available here :

  • PCA
  • Isomap
  • LLE
  • Laplacian Eigenmaps
  • Hessian Eigenmaps
  • Diffusion Maps
  • CCA (Curvilinear Component Analysis)
  • NonLinear Mapping (Sammond)
  • ...

If you use this package and you publish an article, please consider citing :

@Article{
  Author         = {Brucher, Matthieu and Heinrich, Christian and Heitz, Fabrice and Armspach, Jean-Paul},
  Title          = {A metric multidimensional scaling-based manifold learning approach for unsupervised data reduction},
  Journal        = {EURASIP Journal of Advances in Signal Processing},
  month          = mar,
  year           = 2008
}

Here is a link to the article: http://dx.doi.org/10.1155/2008/862015

Tutorial

How to use the package ?

Dimensionality reduction

Every dimensionality reduction function is located in scikits.learn.machine.manifold_learning.compression

from scikits.learn.machine.manifold_learning.compression import isomap
coords = isomap(samples, 2)

Here are the available functions :

  • PCA,
  • isomap,
  • LLE,
  • NLM (NonLinear Mapping from Sammond with Euclidien distances),
  • geodesicNLM (NonLinear Mapping from Sammond with approximated geodesic distances),
  • isomapCompression (minimizes explicitely the Isomap cost function, uses the metric MDS),
  • laplacianEigenmap,
  • diffusionMap,
  • hessianEigenmap,
  • ccaCompression (minimizes the CCA cost function, uses the metric MDS),
  • robustCompression (minimizes a robust cost function, cf the article in EURASIP JASA, uses the metric MDS),
  • robustMultiresolutionCompression (minimizes a robust cost function, cf the article in EURASIP JASA, uses the metric MDS).

These functions have a set of common parameters :

  • samples is the array of original samples (shape : nbsamples * size),
  • nb_coords is the number of dimension of the reduced space,
  • neighbors is the number of neighbors to look for (default = 9),
  • neigh is the tool to get the nearest neighbors (default = distances.kneigh),
  • parameter is the size of the kernel in similarity computations (default = .5),

CCA :

  • max_dist is the cutoff (in percentage of the distances) for CCA cost function (default = 5),

Robust cost function :

  • epsilon is a small value to ensure derivability (default = 0.0000001),
  • sigma is the cutoff (in percentage of the distances) for small distances (default = 1),
  • tau is the cutoff (in percentage of the distances) for discrepancies in estimated distances (default = 60),

The generic optimization framework is used for the Compression algorithms.

Multidimensional Regression

A regression is the process of creating a mapping between the coordinates that were just created and the original points :

from scikits.learn.machine.manifold_learning.regression import MLPLMR
model = MLPLMR(samples, coords)
model.learn()

These are different available regression classes:

  • PCA, which uses a simple affine model for the mapping,
  • PLMR that will learn several models as long as points are not labeled,
  • MLPLMR, which tries to maximize an information criterion (AIC here) with the likelihood of the model,
  • CPLMR, a bloc-wise MLPLMR, pretty much experimental at this point.

These classes have a set of common arguments :

  • samples is the array of original samples (shape : nbsamples * size),
  • coords is the coordinates in the reduced space,
  • neighbors is the number of neighbors to look for (default = 9) when creating a new model (not needed for PCA),

Once the model is learned, a set of equations is set and a belonging_vector is created. Besides, a random_variable models the noise.

Projection

There are two main types of projections : those who can use the fact that we use piecewise functions and those who can't and must test the likelihood on a grid.

  • MLProjection projects on a PLMR with a maximum likelihood approach
  • MAPProjection projects on a PLMR with a maximum a posteriori approach
  • GridMLProjection projects on a custom model with a maximum likelihood approach
  • GridMAPProjection projects on a custom model with a maximum a posteriori approach

You create the projection with the model as the only constructor parameter. Then, use project(point) to project a new point. This function returns:

  • the coordinates of the point (with a trailing 1)
  • the projection on the manifold
  • the likelihood of the point

The grid projection needs the method named get_log_likelihood or get_MAP (depending on the kind of cost you want) and the method get_point that will take some coordinates as only parameter and return the associated point on the manifold.

How to extend this package : Interfaces

In order to provide more manifold learning algorithms, here is the interface of the current manifold learning package.

Dimensionality reduction

There are three ways of generically expanding the dimensionality reduction section:

  • using a new cost function (for the MDS framework)
  • adding new kernels for Laplacian Eigenmaps or Diffusion maps
  • proposing new neighbors algorithms to select the tangent space

Optimizing a cost function

The reduction main function is reduct(reduction, function, samples, nb_coords, **kwargs). It allows the choice of a specific reduction technique/function:

  • dimensionality_reduction.optimize_cost_function uses a Fleetcher-Reeves-modified Polak-Ribere-Polyak conjugate gradient and a String-Wolfe-Powell rule as a line search
  • quadratic_dimensionality_reduction.optimize_cost_function uses a Newton step with no line search
  • robust_dimensionality_reduction.optimize_cost_function uses a gradient step and an exact Fibonacci line search, before each iteration a small amount of noise is added to the result
  • multiresolution_dimensionality_reduction.optimize_cost_function uses a multiresolution approach to find an optimum to the function

Each time, the reduction function must have the following signature: (distances, function, nb_coords, **kwargs). The **kwargs will be passed to the function constructor. No function instance must be passed here, only the function class.

The generic optimization framework is used for these optimizations.

The signature of the cost function constructor must be: (self, distances, nb_coords, **kwargs). The distances argument is the array containing the estimated distances to preserve and nb_coords is the number of coordinates of the reduced space.

Similarities kernels

The kernels used in the Laplacian Eigenmaps can be modified. Laplacian Eigenmaps and Diffusion Maps call the function laplacian_map(samples, nbCoords, method= ??? , **kwargs). You can change the method by providing a new kernel.

The kernel function arguments are (samples, **kwargs), samples being the array with the samples in the original space. A sparse matrix can be returned.

Neighbors selection

The neigh argument specifies a class that will be called with a point indice and return a list of adjacent points. This can be used to robustly choose the neighbors for each technique that uses neighbors (geodsic-distance-based algorithms, Laplacian Eigenmaps, LLE, ...).

The class instance will be built with the arguments (samples, **kwargs). Then the instance will be called with an indice i and return a list with the indices of the points near i. The actual Parzen-window and K-neighbors is implemented as a function that returns a list of lists of indices, both solutions are valid.

Multidimensional Regression

A model must have:

  • a learn() method to train the model
  • equations, a sequence of linear models (arrays)
  • belonging_vector, a sequence of ints indicating which points is assigned to which model
  • RBFF is a sequence of the same length than equations, indicating the likelihood of a point in the reduced space to belong to a specific mdoel
  • random_variable is the error variable.

Projection

Well, I don't have much to say here ;)

Where can I help?

If you want to help enhancing this package, there are some algorithms to implement: - the Nyström extension for projection without a regression (but with the original points) - other ways of selecting neighbors (neighbors that lie near the tangent space for instance) - a Dijkstra algorithm for computing the geodesic distances (instead of the Floyd Warshall algorithm) - better approximation of the geodesic distance (see Improving geodesic distance estimation based on locally linear assumption for a starter) - ...