API and modules

ot:

backend

Multi-lib backend for POT

bregman

Solvers related to Bregman projections for entropic regularized OT

coot

CO-Optimal Transport solver

da

Domain adaptation with optimal transport

datasets

Simple example datasets

dr

Dimension reduction with OT

factored

Factored OT solvers (low rank, cost or OT plan)

gaussian

Optimal transport for Gaussian distributions

gmm

Optimal transport for Gaussian Mixtures

gnn

Layers and functions for optimal transport in Graph Neural Networks.

gromov

Solvers related to Gromov-Wasserstein problems.

lowrank

Low rank OT solvers

lp

Solvers for the original linear program OT problem.

mapping

Optimal Transport maps and variants

optim

Generic solvers for regularized OT or its semi-relaxed version.

partial

Partial OT solvers

plot

Functions for plotting OT matrices

regpath

Regularization path OT solvers

sliced

Sliced OT Distances

smooth

Smooth and Sparse (KL an L2 reg.) and sparsity-constrained OT solvers.

stochastic

Stochastic solvers for regularized OT.

unbalanced

Solvers related to Unbalanced Optimal Transport problems.

utils

Various useful functions

weak

Weak optimal ransport solvers

Main ot functions

Warning

The list of automatically imported sub-modules is as follows: ot.lp, ot.bregman, ot.optim ot.utils, ot.datasets, ot.gromov, ot.smooth ot.stochastic, ot.partial, ot.regpath , ot.unbalanced, ot.mapping . The following sub-modules are not imported due to additional dependencies: - ot.dr : depends on pymanopt and autograd. - ot.plot : depends on matplotlib

ot.barycenter(A, M, reg, weights=None, method='sinkhorn', numItermax=10000, stopThr=0.0001, verbose=False, log=False, warn=True, **kwargs)[source]

Compute the entropic regularized wasserstein barycenter of distributions \(\mathbf{A}\)

The function solves the following optimization problem:

\[\mathbf{a} = \mathop{\arg \min}_\mathbf{a} \quad \sum_i W_{reg}(\mathbf{a},\mathbf{a}_i)\]

where :

  • \(W_{reg}(\cdot,\cdot)\) is the entropic regularized Wasserstein distance (see ot.bregman.sinkhorn()) if method is sinkhorn or sinkhorn_stabilized or sinkhorn_log.

  • \(\mathbf{a}_i\) are training distributions in the columns of matrix \(\mathbf{A}\)

  • reg and \(\mathbf{M}\) are respectively the regularization term and the cost matrix for OT

The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [3]

Parameters:
  • A (array-like, shape (dim, n_hists)) – n_hists training distributions \(\mathbf{a}_i\) of size dim

  • M (array-like, shape (dim, dim)) – loss matrix for OT

  • reg (float) – Regularization term > 0

  • method (str (optional)) – method used for the solver either ‘sinkhorn’ or ‘sinkhorn_stabilized’ or ‘sinkhorn_log’

  • weights (array-like, shape (n_hists,)) – Weights of each histogram \(\mathbf{a}_i\) on the simplex (barycentric coordinates)

  • numItermax (int, optional) – Max number of iterations

  • stopThr (float, optional) – Stop threshold on error (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • warn (bool, optional) – if True, raises a warning if the algorithm doesn’t convergence.

Returns:

  • a ((dim,) array-like) – Wasserstein barycenter

  • log (dict) – log dictionary return only if log==True in parameters

References

ot.barycenter_unbalanced(A, M, reg, reg_m, method='sinkhorn', weights=None, numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]

Compute the entropic unbalanced wasserstein barycenter of \(\mathbf{A}\).

The function solves the following optimization problem with \(\mathbf{a}\)

\[\mathbf{a} = \mathop{\arg \min}_\mathbf{a} \quad \sum_i W_{u_{reg}}(\mathbf{a},\mathbf{a}_i)\]

where :

  • \(W_{u_{reg}}(\cdot,\cdot)\) is the unbalanced entropic regularized Wasserstein distance (see ot.unbalanced.sinkhorn_unbalanced())

  • \(\mathbf{a}_i\) are training distributions in the columns of matrix \(\mathbf{A}\)

  • reg and \(\mathbf{M}\) are respectively the regularization term and the cost matrix for OT

  • reg_mis the marginal relaxation hyperparameter

The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10]

Parameters:
  • A (array-like, shape (dim, n_hists)) – n_hists training distributions \(\mathbf{a}_i\) of dimension dim

  • M (array-like, shape (dim, dim)) – ground metric matrix for OT.

  • reg (float) – Entropy regularization term > 0

  • reg_m (float) – Marginal relaxation term > 0

  • weights (array-like, shape (n_hists,) optional) – Weight of each distribution (barycentric coordinates) If None, uniform weights are used.

  • numItermax (int, optional) – Max number of iterations

  • stopThr (float, optional) – Stop threshold on error (> 0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

Returns:

  • a (array-like, shape (dim,)) – Unbalanced Wasserstein barycenter

  • log (dict) – log dictionary return only if log==True in parameters

References

ot.binary_search_circle(u_values, v_values, u_weights=None, v_weights=None, p=1, Lm=10, Lp=10, tm=-1, tp=1, eps=1e-06, require_sort=True, log=False)[source]

Computes the Wasserstein distance on the circle using the Binary search algorithm proposed in [44]. Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\), takes the value modulo 1. If the values are on \(S^1\subset\mathbb{R}^2\), it is required to first find the coordinates using e.g. the atan2 function.

\[W_p^p(u,v) = \inf_{\theta\in\mathbb{R}}\int_0^1 |F_u^{-1}(q) - (F_v-\theta)^{-1}(q)|^p\ \mathrm{d}q\]

where:

  • \(F_u\) and \(F_v\) are respectively the cdfs of \(u\) and \(v\)

For values \(x=(x_1,x_2)\in S^1\), it is required to first get their coordinates with

\[u = \frac{\pi + \mathrm{atan2}(-x_2,-x_1)}{2\pi}\]

using e.g. ot.utils.get_coordinate_circle(x)

The function runs on backend but tensorflow and jax are not supported.

Parameters:
  • u_values (ndarray, shape (n, ...)) – samples in the source domain (coordinates on [0,1[)

  • v_values (ndarray, shape (n, ...)) – samples in the target domain (coordinates on [0,1[)

  • u_weights (ndarray, shape (n, ...), optional) – samples weights in the source domain

  • v_weights (ndarray, shape (n, ...), optional) – samples weights in the target domain

  • p (float, optional (default=1)) – Power p used for computing the Wasserstein distance

  • Lm (int, optional) – Lower bound dC

  • Lp (int, optional) – Upper bound dC

  • tm (float, optional) – Lower bound theta

  • tp (float, optional) – Upper bound theta

  • eps (float, optional) – Stopping condition

  • require_sort (bool, optional) – If True, sort the values.

  • log (bool, optional) – If True, returns also the optimal theta

Returns:

  • loss (float) – Cost associated to the optimal transportation

  • log (dict, optional) – log dictionary returned only if log==True in parameters

Examples

>>> u = np.array([[0.2,0.5,0.8]])%1
>>> v = np.array([[0.4,0.5,0.7]])%1
>>> binary_search_circle(u.T, v.T, p=1)
array([0.1])

References

ot.dist(x1, x2=None, metric='sqeuclidean', p=2, w=None)[source]

Compute distance between samples in \(\mathbf{x_1}\) and \(\mathbf{x_2}\)

Note

This function is backend-compatible and will work on arrays from all compatible backends.

Parameters:
  • x1 (array-like, shape (n1,d)) – matrix with n1 samples of size d

  • x2 (array-like, shape (n2,d), optional) – matrix with n2 samples of size d (if None then \(\mathbf{x_2} = \mathbf{x_1}\))

  • metric (str | callable, optional) – ‘sqeuclidean’ or ‘euclidean’ on all backends. On numpy the function also accepts from the scipy.spatial.distance.cdist function : ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.

  • p (float, optional) – p-norm for the Minkowski and the Weighted Minkowski metrics. Default value is 2.

  • w (array-like, rank 1) – Weights for the weighted metrics.

Returns:

M – distance matrix computed with given metric

Return type:

array-like, shape (n1, n2)

ot.emd(a, b, M, numItermax=100000, log=False, center_dual=True, numThreads=1, check_marginals=True)[source]

Solves the Earth Movers distance problem and returns the OT matrix

\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the metric cost matrix

  • \(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights

Warning

Note that the \(\mathbf{M}\) matrix in numpy needs to be a C-order numpy.array in float64 format. It will be converted if not in this format

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

This function will cast the computed transport plan to the data type of the provided input with the following priority: \(\mathbf{a}\), then \(\mathbf{b}\), then \(\mathbf{M}\) if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Note

An error will be raised if the vectors \(\mathbf{a}\) and \(\mathbf{b}\) do not sum to the same value.

Uses the algorithm proposed in [1].

Parameters:
  • a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)

  • b ((nt,) array-like, float) – Target histogram (uniform weight if empty list)

  • M ((ns,nt) array-like, float) – Loss matrix (c-order array in numpy with type float64)

  • numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.

  • log (bool, optional (default=False)) – If True, returns a dictionary containing the cost and dual variables. Otherwise returns only the optimal transportation matrix.

  • center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function ot.lp.center_ot_dual().

  • numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.

  • check_marginals (bool, optional (default=True)) – If True, checks that the marginals mass are equal. If False, skips the check.

Returns:

  • gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters

  • log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status

Examples

Simple example with obvious solution. The function emd accepts lists and perform automatic conversion to numpy arrays

>>> import ot
>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.emd(a, b, M)
array([[0.5, 0. ],
       [0. , 0.5]])

References

See also

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT

ot.emd2(a, b, M, processes=1, numItermax=100000, log=False, return_matrix=False, center_dual=True, numThreads=1, check_marginals=True)[source]

Solves the Earth Movers distance problem and returns the loss

\[ \begin{align}\begin{aligned}\min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the metric cost matrix

  • \(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

This function will cast the computed transport plan and transportation loss to the data type of the provided input with the following priority: \(\mathbf{a}\), then \(\mathbf{b}\), then \(\mathbf{M}\) if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Note

An error will be raised if the vectors \(\mathbf{a}\) and \(\mathbf{b}\) do not sum to the same value.

Uses the algorithm proposed in [1].

Parameters:
  • a ((ns,) array-like, float64) – Source histogram (uniform weight if empty list)

  • b ((nt,) array-like, float64) – Target histogram (uniform weight if empty list)

  • M ((ns,nt) array-like, float64) – Loss matrix (for numpy c-order array with type float64)

  • processes (int, optional (default=1)) – Nb of processes used for multiple emd computation (deprecated)

  • numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.

  • log (boolean, optional (default=False)) – If True, returns a dictionary containing dual variables. Otherwise returns only the optimal transportation cost.

  • return_matrix (boolean, optional (default=False)) – If True, returns the optimal transportation matrix in the log.

  • center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function ot.lp.center_ot_dual().

  • numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.

  • check_marginals (bool, optional (default=True)) – If True, checks that the marginals mass are equal. If False, skips the check.

Returns:

  • W (float, array-like) – Optimal transportation loss for the given parameters

  • log (dict) – If input log is true, a dictionary containing dual variables and exit status

Examples

Simple example with obvious solution. The function emd accepts lists and perform automatic conversion to numpy arrays

>>> import ot
>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.emd2(a,b,M)
0.0

References

See also

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT

ot.emd2_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1.0, dense=True, log=False)[source]

Solves the Earth Movers distance problem between 1d measures and returns the loss

\[ \begin{align}\begin{aligned}\gamma = arg\min_\gamma \sum_i \sum_j \gamma_{ij} d(x_a[i], x_b[j])\\s.t. \gamma 1 = a, \gamma^T 1= b, \gamma\geq 0\end{aligned}\end{align} \]

where :

  • d is the metric

  • \(x_a\) and \(x_b\) are the samples

  • a and b are the sample weights

This implementation only supports metrics of the form \(d(x, y) = |x - y|^p\).

Uses the algorithm detailed in [1]_

Parameters:
  • x_a (ndarray of float64, shape (ns,) or (ns, 1)) – Source dirac locations (on the real line)

  • x_b (ndarray of float64, shape (nt,) or (ns, 1)) – Target dirac locations (on the real line)

  • a (ndarray of float64, shape (ns,), optional) – Source histogram (default is uniform weight)

  • b (ndarray of float64, shape (nt,), optional) – Target histogram (default is uniform weight)

  • metric (str, optional (default='sqeuclidean')) – Metric to be used. Only works with either of the strings ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’.

  • p (float, optional (default=1.0)) – The p-norm to apply for if metric=’minkowski’

  • dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Only used if log is set to True. Due to implementation details, this function runs faster when dense is set to False.

  • log (boolean, optional (default=False)) – If True, returns a dictionary containing the transportation matrix. Otherwise returns only the loss.

Returns:

  • loss (float) – Cost associated to the optimal transportation

  • log (dict) – If input log is True, a dictionary containing the Optimal transportation matrix for the given parameters

Examples

Simple example with obvious solution. The function emd2_1d accepts lists and performs automatic conversion to numpy arrays

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd2_1d(x_a, x_b, a, b)
0.5
>>> ot.emd2_1d(x_a, x_b)
0.5

References

See also

ot.lp.emd2

EMD for multidimensional distributions

ot.lp.emd_1d

EMD for 1d distributions (returns the transportation matrix instead of the cost)

ot.emd_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1.0, dense=True, log=False, check_marginals=True)[source]

Solves the Earth Movers distance problem between 1d measures and returns the OT matrix

\[ \begin{align}\begin{aligned}\gamma = arg\min_\gamma \sum_i \sum_j \gamma_{ij} d(x_a[i], x_b[j])\\s.t. \gamma 1 = a, \gamma^T 1= b, \gamma\geq 0\end{aligned}\end{align} \]

where :

  • d is the metric

  • \(x_a\) and \(x_b\) are the samples

  • a and b are the sample weights

This implementation only supports metrics of the form \(d(x, y) = |x - y|^p\).

Uses the algorithm detailed in [1]_

Parameters:
  • x_a (ndarray of float64, shape (ns,) or (ns, 1)) – Source dirac locations (on the real line)

  • x_b (ndarray of float64, shape (nt,) or (ns, 1)) – Target dirac locations (on the real line)

  • a (ndarray of float64, shape (ns,), optional) – Source histogram (default is uniform weight)

  • b (ndarray of float64, shape (nt,), optional) – Target histogram (default is uniform weight)

  • metric (str, optional (default='sqeuclidean')) – Metric to be used. Only works with either of the strings ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’.

  • p (float, optional (default=1.0)) – The p-norm to apply for if metric=’minkowski’

  • dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Due to implementation details, this function runs faster when ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’ metrics are used.

  • log (boolean, optional (default=False)) – If True, returns a dictionary containing the cost. Otherwise returns only the optimal transportation matrix.

  • check_marginals (bool, optional (default=True)) – If True, checks that the marginals mass are equal. If False, skips the check.

Returns:

  • gamma (ndarray, shape (ns, nt)) – Optimal transportation matrix for the given parameters

  • log (dict) – If input log is True, a dictionary containing the cost

Examples

Simple example with obvious solution. The function emd_1d accepts lists and performs automatic conversion to numpy arrays

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd_1d(x_a, x_b, a, b)
array([[0. , 0.5],
       [0.5, 0. ]])
>>> ot.emd_1d(x_a, x_b)
array([[0. , 0.5],
       [0.5, 0. ]])

References

See also

ot.lp.emd

EMD for multidimensional distributions

ot.lp.emd2_1d

EMD for 1d distributions (returns cost instead of the transportation matrix)

ot.factored_optimal_transport(Xa, Xb, a=None, b=None, reg=0.0, r=100, X0=None, stopThr=1e-07, numItermax=100, verbose=False, log=False, **kwargs)[source]

Solves factored OT problem and return OT plans and intermediate distribution

This function solve the following OT problem [40]_

\[\mathop{\arg \min}_\mu \quad W_2^2(\mu_a,\mu)+ W_2^2(\mu,\mu_b)\]

where :

  • \(\mu_a\) and \(\mu_b\) are empirical distributions.

  • \(\mu\) is an empirical distribution with r samples

And returns the two OT plans between

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Uses the conditional gradient algorithm to solve the problem proposed in [39].

Parameters:
  • Xa ((ns,d) array-like, float) – Source samples

  • Xb ((nt,d) array-like, float) – Target samples

  • a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)

  • b ((nt,) array-like, float) – Target histogram (uniform weight if empty list))

  • numItermax (int, optional) – Max number of iterations

  • stopThr (float, optional) – Stop threshold on the relative variation (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

Returns:

  • Ga (array-like, shape (ns, r)) – Optimal transportation matrix between source and the intermediate distribution

  • Gb (array-like, shape (r, nt)) – Optimal transportation matrix between the intermediate and target distribution

  • X (array-like, shape (r, d)) – Support of the intermediate distribution

  • log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status

References

See also

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT

ot.fused_gromov_wasserstein(M, C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, alpha=0.5, armijo=False, G0=None, log=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\) (see [24]).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in\mathop{\arg\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

  • \(\alpha\): trade-off parameter

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{M}\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – Loss function used for the solver

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • armijo (bool, optional) – If True the step of the line-search is found via an armijo research. Else closed form is used. If there are convergence issues use False.

  • G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • log (bool, optional) – record log if True

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • T (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters.

  • log (dict) – Log dictionary return only if log==True in parameters.

References

ot.fused_gromov_wasserstein2(M, C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, alpha=0.5, armijo=False, G0=None, log=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Fused Gromov-Wasserstein distance between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\) (see [24]).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{FGW} = \mathop{\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

  • \(\alpha\): trade-off parameter

Note that when using backends, this loss function is differentiable wrt the matrices (C1, C2, M) and weights (p, q) for quadratic loss using the gradients from [38]_.

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{M}\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space.

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – Loss function used for the solver.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • armijo (bool, optional) – If True the step of the line-search is found via an armijo research. Else closed form is used. If there are convergence issues use False.

  • G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • log (bool, optional) – Record log if True.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • **kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.

Returns:

  • fgw-distance (float) – Fused Gromov-Wasserstein distance for the given parameters.

  • log (dict) – Log dictionary return only if log==True in parameters.

References

ot.gromov_barycenters(N, Cs, ps=None, p=None, lambdas=None, loss_fun='square_loss', symmetric=True, armijo=False, max_iter=1000, tol=1e-09, stop_criterion='barycenter', warmstartT=False, verbose=False, log=False, init_C=None, random_state=None, **kwargs)[source]

Returns the Gromov-Wasserstein barycenters of S measured similarity matrices \((\mathbf{C}_s)_{1 \leq s \leq S}\)

The function solves the following optimization problem with block coordinate descent:

\[\mathbf{C}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \mathrm{GW}(\mathbf{C}, \mathbf{C}_s, \mathbf{p}, \mathbf{p}_s)\]

Where :

  • \(\mathbf{C}_s\): metric cost matrix

  • \(\mathbf{p}_s\): distribution

Parameters:
  • N (int) – Size of the targeted barycenter

  • Cs (list of S array-like of shape (ns, ns)) – Metric cost matrices

  • ps (list of S array-like of shape (ns,), optional) – Sample weights in the S spaces. If let to its default value None, uniform distributions are taken.

  • p (array-like, shape (N,), optional) – Weights in the targeted barycenter. If let to its default value None, uniform distribution is taken.

  • lambdas (list of float, optional) – List of the S spaces’ weights. If let to its default value None, uniform weights are taken.

  • loss_fun (callable, optional) – tensor-matrix multiplication function based on specific loss function

  • symmetric (bool, optional.) – Either structures are to be assumed symmetric or not. Default value is True. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • armijo (bool, optional) – If True the step of the line-search is found via an armijo research. Else closed form is used. If there are convergence issues use False.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on relative error (>0)

  • stop_criterion (str, optional. Default is 'barycenter'.) – Stop criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’ uses absolute norm variations of estimated barycenters. Else if set to ‘loss’ uses the relative variations of the loss.

  • warmstartT (bool, optional) – Either to perform warmstart of transport plans in the successive fused gromov-wasserstein transport problems.s

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • init_C (bool | array-like, shape(N,N)) – Random initial value for the \(\mathbf{C}\) matrix provided by user.

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

  • C (array-like, shape (N, N)) – Similarity matrix in the barycenter space (permutated arbitrarily)

  • log (dict) – Only returned when log=True. It contains the keys:

    • \(\mathbf{T}\): list of (N, ns) transport matrices

    • \(\mathbf{p}\): (N,) barycenter weights

    • values used in convergence evaluation.

References

ot.gromov_wasserstein(C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, log=False, armijo=False, G0=None, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}\\ \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{\gamma} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space.

  • \(\mathbf{C_2}\): Metric cost matrix in the target space.

  • \(\mathbf{p}\): Distribution in the source space.

  • \(\mathbf{q}\): Distribution in the target space.

  • L: Loss function to account for the misfit between the similarity matrices.

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space.

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • armijo (bool, optional) – If True, the step of the line-search is found via an armijo search. Else closed form is used. If there are convergence issues, use False.

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • max_iter (int, optional) – Max number of iterations.

  • tol_rel (float, optional) – Stop threshold on relative error (>0).

  • tol_abs (float, optional) – Stop threshold on absolute error (>0).

  • **kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.

Returns:

  • T (array-like, shape (ns, nt)) –

    Coupling between the two spaces that minimizes:

    \(\sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\)

  • log (dict) – Convergence information and loss.

References

ot.gromov_wasserstein2(C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, log=False, armijo=False, G0=None, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Gromov-Wasserstein loss \(\mathbf{GW}\) between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\). To recover the Gromov-Wasserstein distance as defined in [13] compute \(d_{GW} = \frac{1}{2} \sqrt{\mathbf{GW}}\).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{GW} = \min_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}\\ \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{\gamma} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

Note that when using backends, this loss function is differentiable wrt the matrices (C1, C2) and weights (p, q) for quadratic loss using the gradients from [38]_.

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • armijo (bool, optional) – If True the step of the line-search is found via an armijo research. Else closed form is used. If there are convergence issues use False.

  • G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • gw_dist (float) – Gromov-Wasserstein distance

  • log (dict) – convergence information and Coupling matrix

References

ot.lowrank_gromov_wasserstein_samples(X_s, X_t, a=None, b=None, reg=0, rank=None, alpha=1e-10, gamma_init='rescale', rescale_cost=True, cost_factorized_Xs=None, cost_factorized_Xt=None, stopThr=0.0001, numItermax=1000, stopThr_dykstra=0.001, numItermax_dykstra=10000, seed_init=49, warn=True, warn_dykstra=False, log=False)[source]

Solve the entropic regularization Gromov-Wasserstein transport problem under low-nonnegative rank constraints on the couplings and cost matrices.

Squared euclidean distance matrices are considered for the target and source distributions.

The function solves the following optimization problem:

\[\mathop{\min_{(Q,R,g) \in \mathcal{C(a,b,r)}}} \mathcal{Q}_{A,B}(Q\mathrm{diag}(1/g)R^T) - \epsilon \cdot H((Q,R,g))\]

where :

  • \(A\) is the (dim_a, dim_a) square pairwise cost matrix of the source domain.

  • \(B\) is the (dim_a, dim_a) square pairwise cost matrix of the target domain.

  • \(\mathcal{Q}_{A,B}\) is quadratic objective function of the Gromov Wasserstein plan.

  • \(Q\) and R are the low-rank matrix decomposition of the Gromov-Wasserstein plan.

  • \(g\) is the weight vector for the low-rank decomposition of the Gromov-Wasserstein plan.

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (histograms, both sum to 1).

  • \(r\) is the rank of the Gromov-Wasserstein plan.

  • \(\mathcal{C(a,b,r)}\) are the low-rank couplings of the OT problem.

  • \(H((Q,R,g))\) is the values of the three respective entropies evaluated for each term.

Parameters:
  • X_s (array-like, shape (n_samples_a, dim_Xs)) – Samples in the source domain

  • X_t (array-like, shape (n_samples_b, dim_Xt)) – Samples in the target domain

  • a (array-like, shape (n_samples_a,), optional) – Samples weights in the source domain If let to its default value None, uniform distribution is taken.

  • b (array-like, shape (n_samples_b,), optional) – Samples weights in the target domain If let to its default value None, uniform distribution is taken.

  • reg (float, optional) – Regularization term >=0

  • rank (int, optional. Default is None. (>0)) – Nonnegative rank of the OT plan. If None, min(ns, nt) is considered.

  • alpha (int, optional. Default is 1e-10. (>0 and <1/r)) – Lower bound for the weight vector g.

  • rescale_cost (bool, optional. Default is False) – Rescale the low rank factorization of the sqeuclidean cost matrix

  • seed_init (int, optional. Default is 49. (>0)) – Random state for the ‘random’ initialization of low rank couplings

  • gamma_init (str, optional. Default is "rescale".) – Initialization strategy for gamma. ‘rescale’, or ‘theory’ Gamma is a constant that scales the convergence criterion of the Mirror Descent optimization scheme used to compute the low-rank couplings (Q, R and g)

  • numItermax (int, optional. Default is 1000.) – Max number of iterations for Low Rank GW

  • stopThr (float, optional. Default is 1e-4.) – Stop threshold on error (>0) for Low Rank GW The error is the sum of Kullback Divergences computed for each low rank coupling (Q, R and g) and scaled using gamma.

  • numItermax_dykstra (int, optional. Default is 2000.) – Max number of iterations for the Dykstra algorithm

  • stopThr_dykstra (float, optional. Default is 1e-7.) – Stop threshold on error (>0) in Dykstra

  • cost_factorized_Xs (tuple, optional. Default is None) – Tuple with two pre-computed low rank decompositions (A1, A2) of the source cost matrix. Both matrices should have a shape of (n_samples_a, dim_Xs + 2). If None, the low rank cost matrices will be computed as sqeuclidean cost matrices.

  • cost_factorized_Xt (tuple, optional. Default is None) – Tuple with two pre-computed low rank decompositions (B1, B2) of the target cost matrix. Both matrices should have a shape of (n_samples_b, dim_Xt + 2). If None, the low rank cost matrices will be computed as sqeuclidean cost matrices.

  • warn (bool, optional) – if True, raises a warning if the low rank GW algorithm doesn’t convergence.

  • warn_dykstra (bool, optional) – if True, raises a warning if the Dykstra algorithm doesn’t convergence.

  • log (bool, optional) – record log if True

Returns:

  • Q (array-like, shape (n_samples_a, r)) – First low-rank matrix decomposition of the OT plan

  • R (array-like, shape (n_samples_b, r)) – Second low-rank matrix decomposition of the OT plan

  • g (array-like, shape (r, )) – Weight vector for the low-rank decomposition of the OT

  • log (dict (lazy_plan, value and value_linear)) – log dictionary return only if log==True in parameters

References

ot.lowrank_sinkhorn(X_s, X_t, a=None, b=None, reg=0, rank=None, alpha=1e-10, rescale_cost=True, init='random', reg_init=0.1, seed_init=49, gamma_init='rescale', numItermax=2000, stopThr=1e-07, warn=True, log=False)[source]

Solve the entropic regularization optimal transport problem under low-nonnegative rank constraints on the couplings.

The function solves the following optimization problem:

\[\mathop{\inf_{(\mathbf{Q},\mathbf{R},\mathbf{g}) \in \mathcal{C}(\mathbf{a},\mathbf{b},r)}} \langle \mathbf{C}, \mathbf{Q}\mathrm{diag}(1/\mathbf{g})\mathbf{R}^\top \rangle - \mathrm{reg} \cdot H((\mathbf{Q}, \mathbf{R}, \mathbf{g}))\]

where :

  • \(\mathbf{C}\) is the (dim_a, dim_b) metric cost matrix

  • \(H((\mathbf{Q}, \mathbf{R}, \mathbf{g}))\) is the values of the three respective entropies evaluated for each term.

  • \(\mathbf{Q}\) and \(\mathbf{R}\) are the low-rank matrix decomposition of the OT plan

  • \(\mathbf{g}\) is the weight vector for the low-rank decomposition of the OT plan

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (histograms, both sum to 1)

  • \(r\) is the rank of the OT plan

  • \(\mathcal{C}(\mathbf{a}, \mathbf{b}, r)\) are the low-rank couplings of the OT problem

Parameters:
  • X_s (array-like, shape (n_samples_a, dim)) – samples in the source domain

  • X_t (array-like, shape (n_samples_b, dim)) – samples in the target domain

  • a (array-like, shape (n_samples_a,)) – samples weights in the source domain

  • b (array-like, shape (n_samples_b,)) – samples weights in the target domain

  • reg (float, optional) – Regularization term >0

  • rank (int, optional. Default is None. (>0)) – Nonnegative rank of the OT plan. If None, min(ns, nt) is considered.

  • alpha (int, optional. Default is 1e-10. (>0 and <1/r)) – Lower bound for the weight vector g.

  • rescale_cost (bool, optional. Default is False) – Rescale the low rank factorization of the sqeuclidean cost matrix

  • init (str, optional. Default is 'random'.) – Initialization strategy for the low rank couplings. ‘random’, ‘deterministic’ or ‘kmeans’

  • reg_init (float, optional. Default is 1e-1. (>0)) – Regularization term for a ‘kmeans’ init. If None, 1 is considered.

  • seed_init (int, optional. Default is 49. (>0)) – Random state for a ‘random’ or ‘kmeans’ init strategy.

  • gamma_init (str, optional. Default is "rescale".) – Initialization strategy for gamma. ‘rescale’, or ‘theory’ Gamma is a constant that scales the convergence criterion of the Mirror Descent optimization scheme used to compute the low-rank couplings (Q, R and g)

  • numItermax (int, optional. Default is 2000.) – Max number of iterations for the Dykstra algorithm

  • stopThr (float, optional. Default is 1e-7.) – Stop threshold on error (>0) in Dykstra

  • warn (bool, optional) – if True, raises a warning if the algorithm doesn’t convergence.

  • log (bool, optional) – record log if True

Returns:

  • Q (array-like, shape (n_samples_a, r)) – First low-rank matrix decomposition of the OT plan

  • R (array-like, shape (n_samples_b, r)) – Second low-rank matrix decomposition of the OT plan

  • g (array-like, shape (r, )) – Weight vector for the low-rank decomposition of the OT

  • log (dict (lazy_plan, value and value_linear)) – log dictionary return only if log==True in parameters

References

ot.max_sliced_wasserstein_distance(X_s, X_t, a=None, b=None, n_projections=50, p=2, projections=None, seed=None, log=False)[source]

Computes a Monte-Carlo approximation of the max p-Sliced Wasserstein distance

\[\mathcal{Max-SWD}_p(\mu, \nu) = \underset{\theta _in \mathcal{U}(\mathbb{S}^{d-1})}{\max} [\mathcal{W}_p^p(\theta_\# \mu, \theta_\# \nu)]^{\frac{1}{p}}\]

where :

  • \(\theta_\# \mu\) stands for the pushforwards of the projection \(\mathbb{R}^d \ni X \mapsto \langle \theta, X \rangle\)

Parameters:
  • X_s (ndarray, shape (n_samples_a, dim)) – samples in the source domain

  • X_t (ndarray, shape (n_samples_b, dim)) – samples in the target domain

  • a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain

  • b (ndarray, shape (n_samples_b,), optional) – samples weights in the target domain

  • n_projections (int, optional) – Number of projections used for the Monte-Carlo approximation

  • p (float, optional =) – Power p used for computing the sliced Wasserstein

  • projections (shape (dim, n_projections), optional) – Projection matrix (n_projections and seed are not used in this case)

  • seed (int or RandomState or None, optional) – Seed used for random number generator

  • log (bool, optional) – if True, sliced_wasserstein_distance returns the projections used and their associated EMD.

Returns:

  • cost (float) – Sliced Wasserstein Cost

  • log (dict, optional) – log dictionary return only if log==True in parameters

Examples

>>> n_samples_a = 20
>>> X = np.random.normal(0., 1., (n_samples_a, 5))
>>> sliced_wasserstein_distance(X, X, seed=0)  
0.0

References

ot.semidiscrete_wasserstein2_unif_circle(u_values, u_weights=None)[source]

Computes the closed-form for the 2-Wasserstein distance between samples and a uniform distribution on \(S^1\) Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\), takes the value modulo 1. If the values are on \(S^1\subset\mathbb{R}^2\), it is required to first find the coordinates using e.g. the atan2 function.

\[W_2^2(\mu_n, \nu) = \sum_{i=1}^n \alpha_i x_i^2 - \left(\sum_{i=1}^n \alpha_i x_i\right)^2 + \sum_{i=1}^n \alpha_i x_i \left(1-\alpha_i-2\sum_{k=1}^{i-1}\alpha_k\right) + \frac{1}{12}\]

where:

  • \(\nu=\mathrm{Unif}(S^1)\) and \(\mu_n = \sum_{i=1}^n \alpha_i \delta_{x_i}\)

For values \(x=(x_1,x_2)\in S^1\), it is required to first get their coordinates with

\[u = \frac{\pi + \mathrm{atan2}(-x_2,-x_1)}{2\pi},\]

using e.g. ot.utils.get_coordinate_circle(x)

Parameters:
  • u_values (ndarray, shape (n, ...)) – Samples

  • u_weights (ndarray, shape (n, ...), optional) – samples weights in the source domain

Returns:

loss – Cost associated to the optimal transportation

Return type:

float

Examples

>>> x0 = np.array([[0], [0.2], [0.4]])
>>> semidiscrete_wasserstein2_unif_circle(x0)
array([0.02111111])

References

ot.sinkhorn(a, b, M, reg, method='sinkhorn', numItermax=1000, stopThr=1e-09, verbose=False, log=False, warn=True, warmstart=None, **kwargs)[source]

Solve the entropic regularization optimal transport problem and return the OT matrix

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma)\\s.t. \ \gamma \mathbf{1} &= \mathbf{a}\\ \gamma^T \mathbf{1} &= \mathbf{b}\\ \gamma &\geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix

  • \(\Omega\) is the entropic regularization term \(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (histograms, both sum to 1)

Note

This function is backend-compatible and will work on arrays from all compatible backends.

The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [2]

Choosing a Sinkhorn solver

By default and when using a regularization parameter that is not too small the default sinkhorn solver should be enough. If you need to use a small regularization to get sharper OT matrices, you should use the ot.bregman.sinkhorn_stabilized() solver that will avoid numerical errors. This last solver can be very slow in practice and might not even converge to a reasonable OT matrix in a finite time. This is why ot.bregman.sinkhorn_epsilon_scaling() that relies on iterating the value of the regularization (and using warm start) sometimes leads to better solutions. Note that the greedy version of the sinkhorn ot.bregman.greenkhorn() can also lead to a speedup and the screening version of the sinkhorn ot.bregman.screenkhorn() aim at providing a fast approximation of the Sinkhorn problem. For use of GPU and gradient computation with small number of iterations we strongly recommend the ot.bregman.sinkhorn_log() solver that will no need to check for numerical problems.

Parameters:
  • a (array-like, shape (dim_a,)) – samples weights in the source domain

  • b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – samples in the target domain, compute sinkhorn with multiple targets and fixed \(\mathbf{M}\) if \(\mathbf{b}\) is a matrix (return OT loss + dual variables in log)

  • M (array-like, shape (dim_a, dim_b)) – loss matrix

  • reg (float) – Regularization term >0

  • method (str) – method used for the solver either ‘sinkhorn’,’sinkhorn_log’, ‘greenkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_epsilon_scaling’, see those function for specific parameters

  • numItermax (int, optional) – Max number of iterations

  • stopThr (float, optional) – Stop threshold on error (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • warn (bool, optional) – if True, raises a warning if the algorithm doesn’t convergence.

  • warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given (that is the logarithm of the u,v sinkhorn scaling vectors)

Returns:

  • gamma (array-like, shape (dim_a, dim_b)) – Optimal transportation matrix for the given parameters

  • log (dict) – log dictionary return only if log==True in parameters

Examples

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> M=[[0., 1.], [1., 0.]]
>>> ot.sinkhorn(a, b, M, 1)
array([[0.36552929, 0.13447071],
       [0.13447071, 0.36552929]])

References

See also

ot.lp.emd

Unregularized OT

ot.optim.cg

General regularized OT

ot.bregman.sinkhorn_knopp

Classic Sinkhorn [2]

ot.bregman.sinkhorn_stabilized

Stabilized sinkhorn [9] [10]

ot.bregman.sinkhorn_epsilon_scaling

Sinkhorn with epsilon scaling [9] [10]

ot.sinkhorn2(a, b, M, reg, method='sinkhorn', numItermax=1000, stopThr=1e-09, verbose=False, log=False, warn=False, warmstart=None, **kwargs)[source]

Solve the entropic regularization optimal transport problem and return the loss

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}W = \min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma)\\s.t. \ \gamma \mathbf{1} &= \mathbf{a}\\ \gamma^T \mathbf{1} &= \mathbf{b}\\ \gamma &\geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix

  • \(\Omega\) is the entropic regularization term \(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (histograms, both sum to 1)

and returns \(\langle \gamma^*, \mathbf{M} \rangle_F\) (without the entropic contribution).

Note

This function is backend-compatible and will work on arrays from all compatible backends.

The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [2]

Choosing a Sinkhorn solver

By default and when using a regularization parameter that is not too small the default sinkhorn solver should be enough. If you need to use a small regularization to get sharper OT matrices, you should use the ot.bregman.sinkhorn_log() solver that will avoid numerical errors. This last solver can be very slow in practice and might not even converge to a reasonable OT matrix in a finite time. This is why ot.bregman.sinkhorn_epsilon_scaling() that relies on iterating the value of the regularization (and using warm start) sometimes leads to better solutions. Note that the greedy version of the sinkhorn ot.bregman.greenkhorn() can also lead to a speedup and the screening version of the sinkhorn ot.bregman.screenkhorn() aim a providing a fast approximation of the Sinkhorn problem. For use of GPU and gradient computation with small number of iterations we strongly recommend the ot.bregman.sinkhorn_log() solver that will no need to check for numerical problems.

Parameters:
  • a (array-like, shape (dim_a,)) – samples weights in the source domain

  • b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – samples in the target domain, compute sinkhorn with multiple targets and fixed \(\mathbf{M}\) if \(\mathbf{b}\) is a matrix (return OT loss + dual variables in log)

  • M (array-like, shape (dim_a, dim_b)) – loss matrix

  • reg (float) – Regularization term >0

  • method (str) – method used for the solver either ‘sinkhorn’,’sinkhorn_log’, ‘sinkhorn_stabilized’, see those function for specific parameters

  • numItermax (int, optional) – Max number of iterations

  • stopThr (float, optional) – Stop threshold on error (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • warn (bool, optional) – if True, raises a warning if the algorithm doesn’t convergence.

  • warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given (that is the logarithm of the u,v sinkhorn scaling vectors)

Returns:

  • W ((n_hists) float/array-like) – Optimal transportation loss for the given parameters

  • log (dict) – log dictionary return only if log==True in parameters

Examples

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> M=[[0., 1.], [1., 0.]]
>>> ot.sinkhorn2(a, b, M, 1)
0.26894142136999516

References

See also

ot.lp.emd

Unregularized OT

ot.optim.cg

General regularized OT

ot.bregman.sinkhorn_knopp

Classic Sinkhorn [2]

ot.bregman.greenkhorn

Greenkhorn [21]

ot.bregman.sinkhorn_stabilized

Stabilized sinkhorn [9] [10]

ot.sinkhorn_lpl1_mm(a, labels_a, b, M, reg, eta=0.1, numItermax=10, numInnerItermax=200, stopInnerThr=1e-09, verbose=False, log=False)[source]

Solve the entropic regularization optimal transport problem with non-convex group lasso regularization

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg} \cdot \Omega_e(\gamma) + \eta \ \Omega_g(\gamma)\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the (ns, nt) metric cost matrix

  • \(\Omega_e\) is the entropic regularization term \(\Omega_e (\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)

  • \(\Omega_g\) is the group lasso regularization term \(\Omega_g(\gamma)=\sum_{i,c} \|\gamma_{i,\mathcal{I}_c}\|^{1/2}_1\) where \(\mathcal{I}_c\) are the index of samples from class c in the source domain.

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (sum to 1)

The algorithm used for solving the problem is the generalized conditional gradient as proposed in [5, 7].

Parameters:
  • a (array-like (ns,)) – samples weights in the source domain

  • labels_a (array-like (ns,)) – labels of samples in the source domain

  • b (array-like (nt,)) – samples weights in the target domain

  • M (array-like (ns,nt)) – loss matrix

  • reg (float) – Regularization term for entropic regularization >0

  • eta (float, optional) – Regularization term for group lasso regularization >0

  • numItermax (int, optional) – Max number of iterations

  • numInnerItermax (int, optional) – Max number of iterations (inner sinkhorn solver)

  • stopInnerThr (float, optional) – Stop threshold on error (inner sinkhorn solver) (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

Returns:

  • gamma ((ns, nt) array-like) – Optimal transportation matrix for the given parameters

  • log (dict) – log dictionary return only if log==True in parameters

References

See also

ot.lp.emd

Unregularized OT

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT

ot.sinkhorn_unbalanced(a, b, M, reg, reg_m, method='sinkhorn', reg_type='kl', c=None, warmstart=None, numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]

Solve the unbalanced entropic regularization optimal transport problem and return the OT plan

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}W = \arg \min_\gamma \ \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg} \cdot \mathrm{KL}(\gamma, \mathbf{c}) + \mathrm{reg_{m1}} \cdot \mathrm{KL}(\gamma \mathbf{1}, \mathbf{a}) + \mathrm{reg_{m2}} \cdot \mathrm{KL}(\gamma^T \mathbf{1}, \mathbf{b})\\s.t. \gamma \geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target unbalanced distributions

  • \(\mathbf{c}\) is a reference distribution for the regularization

  • KL is the Kullback-Leibler divergence

The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]

Warning

Starting from version 0.9.5, the default value has been changed to reg_type=’kl’ instead of reg_type=’entropy’. This makes the function more consistent with the literature and the other solvers. If you want to use the entropy regularization, please set reg_type=’entropy’ explicitly.

Parameters:
  • a (array-like, shape (dim_a,)) – Unnormalized histogram of dimension dim_a If a is an empty list or array ([]), then a is set to uniform distribution.

  • b (array-like, shape (dim_b,)) – One or multiple unnormalized histograms of dimension dim_b. If b is an empty list or array ([]), then b is set to uniform distribution. If many, compute all the OT costs \((\mathbf{a}, \mathbf{b}_i)_i\)

  • M (array-like, shape (dim_a, dim_b)) – loss matrix

  • reg (float) – Entropy regularization term > 0

  • reg_m (float or indexable object of length 1 or 2) – Marginal relaxation term. If \(\mathrm{reg_{m}}\) is a scalar or an indexable object of length 1, then the same \(\mathrm{reg_{m}}\) is applied to both marginal relaxations. The entropic balanced OT can be recovered using \(\mathrm{reg_{m}}=float("inf")\). For semi-relaxed case, use either \(\mathrm{reg_{m}}=(float("inf"), scalar)\) or \(\mathrm{reg_{m}}=(scalar, float("inf"))\). If \(\mathrm{reg_{m}}\) is an array, it must have the same backend as input arrays (a, b, M).

  • method (str) – method used for the solver either ‘sinkhorn’, ‘sinkhorn_stabilized’, ‘sinkhorn_translation_invariant’ or ‘sinkhorn_reg_scaling’, see those function for specific parameters

  • reg_type (string, optional) –

    Regularizer term. Can take two values:

    • Negative entropy: ‘entropy’: \(\Omega(\gamma) = \sum_{i,j} \gamma_{i,j} \log(\gamma_{i,j}) - \sum_{i,j} \gamma_{i,j}\). This is equivalent (up to a constant) to \(\Omega(\gamma) = \text{KL}(\gamma, 1_{dim_a} 1_{dim_b}^T)\).

    • Kullback-Leibler divergence (default): ‘kl’: \(\Omega(\gamma) = \text{KL}(\gamma, \mathbf{a} \mathbf{b}^T)\).

  • c (array-like, shape (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization. If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\). If \(\texttt{reg_type}=\)’entropy’, then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).

  • warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given (that is the logarithm of the u, v sinkhorn scaling vectors).

  • numItermax (int, optional) – Max number of iterations

  • stopThr (float, optional) – Stop threshold on error (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

Returns:

  • if n_hists == 1

    • gammaarray-like, shape(dim_a, dim_b)

      Optimal transportation matrix for the given parameters

    • logdict

      log dictionary returned only if log is True

  • else

    • ot_distancearray-like, shape (n_hists,)

      the OT distance between \(\mathbf{a}\) and each of the histograms \(\mathbf{b}_i\)

    • logdict

      log dictionary returned only if log is True

Examples

>>> import ot
>>> import numpy as np
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> M=[[0., 1.], [1., 0.]]
>>> np.round(ot.sinkhorn_unbalanced(a, b, M, 1, 1), 7)
array([[0.3220536, 0.1184769],
       [0.1184769, 0.3220536]])

References

See also

ot.unbalanced.sinkhorn_knopp_unbalanced

Unbalanced Classic Sinkhorn [10]

ot.unbalanced.sinkhorn_stabilized_unbalanced

Unbalanced Stabilized sinkhorn [9, 10]

ot.unbalanced.sinkhorn_reg_scaling_unbalanced

Unbalanced Sinkhorn with epsilon scaling [9, 10]

ot.unbalanced.sinkhorn_unbalanced_translation_invariant

Translation Invariant Unbalanced Sinkhorn [73]

ot.sinkhorn_unbalanced2(a, b, M, reg, reg_m, method='sinkhorn', reg_type='kl', c=None, warmstart=None, returnCost='linear', numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]

Solve the entropic regularization unbalanced optimal transport problem and return the cost

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}\min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg} \cdot \mathrm{KL}(\gamma, \mathbf{c}) + \mathrm{reg_{m1}} \cdot \mathrm{KL}(\gamma \mathbf{1}, \mathbf{a}) + \mathrm{reg_{m2}} \cdot \mathrm{KL}(\gamma^T \mathbf{1}, \mathbf{b})\\s.t. \gamma\geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target unbalanced distributions

  • \(\mathbf{c}\) is a reference distribution for the regularization

  • KL is the Kullback-Leibler divergence

The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]

Warning

Starting from version 0.9.5, the default value has been changed to reg_type=’kl’ instead of reg_type=’entropy’. This makes the function more consistent with the literature and the other solvers. If you want to use the entropy regularization, please set reg_type=’entropy’ explicitly.

Parameters:
  • a (array-like, shape (dim_a,)) – Unnormalized histogram of dimension dim_a If a is an empty list or array ([]), then a is set to uniform distribution.

  • b (array-like, shape (dim_b,)) – One or multiple unnormalized histograms of dimension dim_b. If b is an empty list or array ([]), then b is set to uniform distribution. If many, compute all the OT costs \((\mathbf{a}, \mathbf{b}_i)_i\)

  • M (array-like, shape (dim_a, dim_b)) – loss matrix

  • reg (float) – Entropy regularization term > 0

  • reg_m (float or indexable object of length 1 or 2) – Marginal relaxation term. If \(\mathrm{reg_{m}}\) is a scalar or an indexable object of length 1, then the same \(\mathrm{reg_{m}}\) is applied to both marginal relaxations. The entropic balanced OT can be recovered using \(\mathrm{reg_{m}}=float("inf")\). For semi-relaxed case, use either \(\mathrm{reg_{m}}=(float("inf"), scalar)\) or \(\mathrm{reg_{m}}=(scalar, float("inf"))\). If \(\mathrm{reg_{m}}\) is an array, it must have the same backend as input arrays (a, b, M).

  • method (str) – method used for the solver either ‘sinkhorn’, ‘sinkhorn_stabilized’, ‘sinkhorn_translation_invariant’ or ‘sinkhorn_reg_scaling’, see those function for specific parameters

  • reg_type (string, optional) –

    Regularizer term. Can take two values:

    • Negative entropy: ‘entropy’: \(\Omega(\gamma) = \sum_{i,j} \gamma_{i,j} \log(\gamma_{i,j}) - \sum_{i,j} \gamma_{i,j}\). This is equivalent (up to a constant) to \(\Omega(\gamma) = \text{KL}(\gamma, 1_{dim_a} 1_{dim_b}^T)\).

    • Kullback-Leibler divergence: ‘kl’: \(\Omega(\gamma) = \text{KL}(\gamma, \mathbf{a} \mathbf{b}^T)\).

  • c (array-like, shape (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization. If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\). If \(\texttt{reg_type}=\)’entropy’, then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).

  • warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given (that is the logarithm of the u,v sinkhorn scaling vectors).

  • returnCost (string, optional (default = "linear")) – If returnCost = “linear”, then return the linear part of the unbalanced OT loss. If returnCost = “total”, then return the total unbalanced OT loss.

  • numItermax (int, optional) – Max number of iterations

  • stopThr (float, optional) – Stop threshold on error (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

Returns:

  • ot_cost (array-like, shape (n_hists,)) – the OT cost between \(\mathbf{a}\) and each of the histograms \(\mathbf{b}_i\)

  • log (dict) – log dictionary returned only if log is True

Examples

>>> import ot
>>> import numpy as np
>>> a=[.5, .10]
>>> b=[.5, .5]
>>> M=[[0., 1.],[1., 0.]]
>>> np.round(ot.unbalanced.sinkhorn_unbalanced2(a, b, M, 1., 1.), 8)
0.19600125

References

See also

ot.unbalanced.sinkhorn_knopp

Unbalanced Classic Sinkhorn [10]

ot.unbalanced.sinkhorn_stabilized

Unbalanced Stabilized sinkhorn [9, 10]

ot.unbalanced.sinkhorn_reg_scaling

Unbalanced Sinkhorn with epsilon scaling [9, 10]

ot.unbalanced.sinkhorn_unbalanced_translation_invariant

Translation Invariant Unbalanced Sinkhorn [73]

ot.sliced_wasserstein_distance(X_s, X_t, a=None, b=None, n_projections=50, p=2, projections=None, seed=None, log=False)[source]

Computes a Monte-Carlo approximation of the p-Sliced Wasserstein distance

\[\mathcal{SWD}_p(\mu, \nu) = \underset{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}{\mathbb{E}}\left(\mathcal{W}_p^p(\theta_\# \mu, \theta_\# \nu)\right)^{\frac{1}{p}}\]

where :

  • \(\theta_\# \mu\) stands for the pushforwards of the projection \(X \in \mathbb{R}^d \mapsto \langle \theta, X \rangle\)

Parameters:
  • X_s (ndarray, shape (n_samples_a, dim)) – samples in the source domain

  • X_t (ndarray, shape (n_samples_b, dim)) – samples in the target domain

  • a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain

  • b (ndarray, shape (n_samples_b,), optional) – samples weights in the target domain

  • n_projections (int, optional) – Number of projections used for the Monte-Carlo approximation

  • p (float, optional =) – Power p used for computing the sliced Wasserstein

  • projections (shape (dim, n_projections), optional) – Projection matrix (n_projections and seed are not used in this case)

  • seed (int or RandomState or None, optional) – Seed used for random number generator

  • log (bool, optional) – if True, sliced_wasserstein_distance returns the projections used and their associated EMD.

Returns:

  • cost (float) – Sliced Wasserstein Cost

  • log (dict, optional) – log dictionary return only if log==True in parameters

Examples

>>> n_samples_a = 20
>>> X = np.random.normal(0., 1., (n_samples_a, 5))
>>> sliced_wasserstein_distance(X, X, seed=0)  
0.0

References

ot.sliced_wasserstein_sphere(X_s, X_t, a=None, b=None, n_projections=50, p=2, projections=None, seed=None, log=False)[source]

Compute the spherical sliced-Wasserstein discrepancy.

\[SSW_p(\mu,\nu) = \left(\int_{\mathbb{V}_{d,2}} W_p^p(P^U_\#\mu, P^U_\#\nu)\ \mathrm{d}\sigma(U)\right)^{\frac{1}{p}}\]

where:

  • \(P^U_\# \mu\) stands for the pushforwards of the projection \(\forall x\in S^{d-1},\ P^U(x) = \frac{U^Tx}{\|U^Tx\|_2}\)

The function runs on backend but tensorflow and jax are not supported.

Parameters:
  • X_s (ndarray, shape (n_samples_a, dim)) – Samples in the source domain

  • X_t (ndarray, shape (n_samples_b, dim)) – Samples in the target domain

  • a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain

  • b (ndarray, shape (n_samples_b,), optional) – samples weights in the target domain

  • n_projections (int, optional) – Number of projections used for the Monte-Carlo approximation

  • p (float, optional (default=2)) – Power p used for computing the spherical sliced Wasserstein

  • projections (shape (n_projections, dim, 2), optional) – Projection matrix (n_projections and seed are not used in this case)

  • seed (int or RandomState or None, optional) – Seed used for random number generator

  • log (bool, optional) – if True, sliced_wasserstein_sphere returns the projections used and their associated EMD.

Returns:

  • cost (float) – Spherical Sliced Wasserstein Cost

  • log (dict, optional) – log dictionary return only if log==True in parameters

Examples

>>> n_samples_a = 20
>>> X = np.random.normal(0., 1., (n_samples_a, 5))
>>> X = X / np.sqrt(np.sum(X**2, -1, keepdims=True))
>>> sliced_wasserstein_sphere(X, X, seed=0)  
0.0

References

ot.sliced_wasserstein_sphere_unif(X_s, a=None, n_projections=50, seed=None, log=False)[source]

Compute the 2-spherical sliced wasserstein w.r.t. a uniform distribution.

\[SSW_2(\mu_n, \nu)\]

where

  • \(\mu_n=\sum_{i=1}^n \alpha_i \delta_{x_i}\)

  • \(\nu=\mathrm{Unif}(S^1)\)

Parameters:
  • X_s (ndarray, shape (n_samples_a, dim)) – Samples in the source domain

  • a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain

  • n_projections (int, optional) – Number of projections used for the Monte-Carlo approximation

  • seed (int or RandomState or None, optional) – Seed used for random number generator

  • log (bool, optional) – if True, sliced_wasserstein_distance returns the projections used and their associated EMD.

Returns:

  • cost (float) – Spherical Sliced Wasserstein Cost

  • log (dict, optional) – log dictionary return only if log==True in parameters

Examples

>>> np.random.seed(42)
>>> x0 = np.random.randn(500,3)
>>> x0 = x0 / np.sqrt(np.sum(x0**2, -1, keepdims=True))
>>> ssw = sliced_wasserstein_sphere_unif(x0, seed=42)
>>> np.allclose(sliced_wasserstein_sphere_unif(x0, seed=42), 0.01734, atol=1e-3)
True

References:

ot.solve(M, a=None, b=None, reg=None, c=None, reg_type='KL', unbalanced=None, unbalanced_type='KL', method=None, n_threads=1, max_iter=None, plan_init=None, potentials_init=None, tol=None, verbose=False, grad='autodiff')[source]

Solve the discrete optimal transport problem and return OTResult object

The function solves the following general optimal transport problem

\[\min_{\mathbf{T}\geq 0} \quad \sum_{i,j} T_{i,j}M_{i,j} + \lambda_r R(\mathbf{T}) + \lambda_1 U(\mathbf{T}\mathbf{1},\mathbf{a}) + \lambda_2 U(\mathbf{T}^T\mathbf{1},\mathbf{b})\]

The regularization is selected with reg (\(\lambda_r\)) and reg_type. By default reg=None and there is no regularization. The unbalanced marginal penalization can be selected with unbalanced (\((\lambda_1, \lambda_2)\)) and unbalanced_type. By default unbalanced=None and the function solves the exact optimal transport problem (respecting the marginals).

Parameters:
  • M (array-like, shape (dim_a, dim_b)) – Loss matrix

  • a (array-like, shape (dim_a,), optional) – Samples weights in the source domain (default is uniform)

  • b (array-like, shape (dim_b,), optional) – Samples weights in the source domain (default is uniform)

  • reg (float, optional) – Regularization weight \(\lambda_r\), by default None (no reg., exact OT)

  • c (array-like, shape (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization. If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\). If \(\texttt{reg_type}=\)’entropy’, then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).

  • reg_type (str, optional) – Type of regularization \(R\) either “KL”, “L2”, “entropy”, by default “KL”. a tuple of functions can be provided for general solver (see cg). This is only used when reg!=None.

  • unbalanced (float or indexable object of length 1 or 2) – Marginal relaxation term. If it is a scalar or an indexable object of length 1, then the same relaxation is applied to both marginal relaxations. The balanced OT can be recovered using \(unbalanced=float("inf")\). For semi-relaxed case, use either \(unbalanced=(float("inf"), scalar)\) or \(unbalanced=(scalar, float("inf"))\). If unbalanced is an array, it must have the same backend as input arrays (a, b, M).

  • unbalanced_type (str, optional) – Type of unbalanced penalization function \(U\) either “KL”, “L2”, “TV”, by default “KL”.

  • method (str, optional) – Method for solving the problem when multiple algorithms are available, default None for automatic selection.

  • n_threads (int, optional) – Number of OMP threads for exact OT solver, by default 1

  • max_iter (int, optional) – Maximum number of iterations, by default None (default values in each solvers)

  • plan_init (array-like, shape (dim_a, dim_b), optional) – Initialization of the OT plan for iterative methods, by default None

  • potentials_init ((array-like(dim_a,),array-like(dim_b,)), optional) – Initialization of the OT dual potentials for iterative methods, by default None

  • tol (_type_, optional) – Tolerance for solution precision, by default None (default values in each solvers)

  • verbose (bool, optional) – Print information in the solver, by default False

  • grad (str, optional) – Type of gradient computation, either or ‘autodiff’, ‘envelope’ or ‘last_step’ used only for Sinkhorn solver. By default ‘autodiff’ provides gradients wrt all outputs (plan, value, value_linear) but with important memory cost. ‘envelope’ provides gradients only for value and and other outputs are detached. This is useful for memory saving when only the value is needed. ‘last_step’ provides gradients only for the last iteration of the Sinkhorn solver, but provides gradient for both the OT plan and the objective values. ‘detach’ does not compute the gradients for the Sinkhorn solver.

Returns:

res – Result of the optimization problem. The information can be obtained as follows:

  • res.plan : OT plan \(\mathbf{T}\)

  • res.potentials : OT dual potentials

  • res.value : Optimal value of the optimization problem

  • res.value_linear : Linear OT loss with the optimal OT plan

See OTResult for more information.

Return type:

OTResult()

Notes

The following methods are available for solving the OT problems:

  • Classical exact OT problem [1] (default parameters) :

\[ \begin{align}\begin{aligned}\min_\mathbf{T} \quad \langle \mathbf{T}, \mathbf{M} \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve(M, a, b)
  • Entropic regularized OT [2] (when reg!=None):

\[ \begin{align}\begin{aligned}\min_\mathbf{T} \quad \langle \mathbf{T}, \mathbf{M} \rangle_F + \lambda R(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0\end{aligned}\end{align} \]

can be solved with the following code:

# default is ``"KL"`` regularization (``reg_type="KL"``)
res = ot.solve(M, a, b, reg=1.0)
# or for original Sinkhorn paper formulation [2]
res = ot.solve(M, a, b, reg=1.0, reg_type='entropy')

# Use envelope theorem differentiation for memory saving
res = ot.solve(M, a, b, reg=1.0, grad='envelope') # M, a, b are torch tensors
res.value.backward() # only the value is differentiable

Note that by default the Sinkhorn solver uses automatic differentiation to compute the gradients of the values and plan. This can be changed with the grad parameter. The envelope mode computes the gradients only for the value and the other outputs are detached. This is useful for memory saving when only the gradient of value is needed.

  • Quadratic regularized OT [17] (when reg!=None and reg_type="L2"):

\[ \begin{align}\begin{aligned}\min_\mathbf{T} \quad \langle \mathbf{T}, \mathbf{M} \rangle_F + \lambda R(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve(M,a,b,reg=1.0,reg_type='L2')
  • Unbalanced OT [41] (when unbalanced!=None):

\[\min_{\mathbf{T}\geq 0} \quad \sum_{i,j} T_{i,j}M_{i,j} + \lambda_1 U(\mathbf{T}\mathbf{1},\mathbf{a}) + \lambda_2 U(\mathbf{T}^T\mathbf{1},\mathbf{b})\]

can be solved with the following code:

# default is ``"KL"``
res = ot.solve(M,a,b,unbalanced=1.0)
# quadratic unbalanced OT
res = ot.solve(M,a,b,unbalanced=1.0,unbalanced_type='L2')
# TV = partial OT
res = ot.solve(M,a,b,unbalanced=1.0,unbalanced_type='TV')
  • Regularized unbalanced regularized OT [34] (when unbalanced!=None and reg!=None):

\[\min_{\mathbf{T}\geq 0} \quad \sum_{i,j} T_{i,j}M_{i,j} + \lambda_r R(\mathbf{T}) + \lambda_1 U(\mathbf{T}\mathbf{1},\mathbf{a}) + \lambda_2 U(\mathbf{T}^T\mathbf{1},\mathbf{b})\]

can be solved with the following code:

# default is ``"KL"`` for both
res = ot.solve(M,a,b,reg=1.0,unbalanced=1.0)
# quadratic unbalanced OT with KL regularization
res = ot.solve(M,a,b,reg=1.0,unbalanced=1.0,unbalanced_type='L2')
# both quadratic
res = ot.solve(M,a,b,reg=1.0, reg_type='L2',unbalanced=1.0,unbalanced_type='L2')

References

ot.solve_gromov(Ca, Cb, M=None, a=None, b=None, loss='L2', symmetric=None, alpha=0.5, reg=None, reg_type='entropy', unbalanced=None, unbalanced_type='KL', n_threads=1, method=None, max_iter=None, plan_init=None, tol=None, verbose=False)[source]

Solve the discrete (Fused) Gromov-Wasserstein and return OTResult object

The function solves the following optimization problem:

\[\min_{\mathbf{T}\geq 0} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} + \lambda_r R(\mathbf{T}) + \lambda_u U(\mathbf{T}\mathbf{1},\mathbf{a}) + \lambda_u U(\mathbf{T}^T\mathbf{1},\mathbf{b})\]

The regularization is selected with reg (\(\lambda_r\)) and reg_type. By default reg=None and there is no regularization. The unbalanced marginal penalization can be selected with unbalanced (\(\lambda_u\)) and unbalanced_type. By default unbalanced=None and the function solves the exact optimal transport problem (respecting the marginals).

Parameters:
  • Ca (array-like, shape (dim_a, dim_a)) – Cost matrix in the source domain

  • Cb (array-like, shape (dim_b, dim_b)) – Cost matrix in the target domain

  • M (array-like, shape (dim_a, dim_b), optional) – Linear cost matrix for Fused Gromov-Wasserstein (default is None).

  • a (array-like, shape (dim_a,), optional) – Samples weights in the source domain (default is uniform)

  • b (array-like, shape (dim_b,), optional) – Samples weights in the source domain (default is uniform)

  • loss (str, optional) – Type of loss function, either "L2" or "KL", by default "L2"

  • symmetric (bool, optional) – Use symmetric version of the Gromov-Wasserstein problem, by default None tests whether the matrices are symmetric or True/False to avoid the test.

  • reg (float, optional) – Regularization weight \(\lambda_r\), by default None (no reg., exact OT)

  • reg_type (str, optional) – Type of regularization \(R\), by default “entropy” (only used when reg!=None)

  • alpha (float, optional) – Weight the quadratic term (alpha*Gromov) and the linear term ((1-alpha)*Wass) in the Fused Gromov-Wasserstein problem. Not used for Gromov problem (when M is not provided). By default alpha=None corresponds to alpha=1 for Gromov problem (M==None) and alpha=0.5 for Fused Gromov-Wasserstein problem (M!=None)

  • unbalanced (float, optional) – Unbalanced penalization weight \(\lambda_u\), by default None (balanced OT), Not implemented yet

  • unbalanced_type (str, optional) – Type of unbalanced penalization function \(U\) either “KL”, “semirelaxed”, “partial”, by default “KL” but note that it is not implemented yet.

  • n_threads (int, optional) – Number of OMP threads for exact OT solver, by default 1

  • method (str, optional) – Method for solving the problem when multiple algorithms are available, default None for automatic selection.

  • max_iter (int, optional) – Maximum number of iterations, by default None (default values in each solvers)

  • plan_init (array-like, shape (dim_a, dim_b), optional) – Initialization of the OT plan for iterative methods, by default None

  • tol (float, optional) – Tolerance for solution precision, by default None (default values in each solvers)

  • verbose (bool, optional) – Print information in the solver, by default False

Returns:

res – Result of the optimization problem. The information can be obtained as follows:

  • res.plan : OT plan \(\mathbf{T}\)

  • res.potentials : OT dual potentials

  • res.value : Optimal value of the optimization problem

  • res.value_linear : Linear OT loss with the optimal OT plan

  • res.value_quad : Quadratic (GW) part of the OT loss with the optimal OT plan

See OTResult for more information.

Return type:

OTResult()

Notes

The following methods are available for solving the Gromov-Wasserstein problem:

  • Classical Gromov-Wasserstein (GW) problem [3] (default parameters):

\[ \begin{align}\begin{aligned}\min_{\mathbf{T}\geq 0} \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j}\mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve_gromov(Ca, Cb) # uniform weights
res = ot.solve_gromov(Ca, Cb, a=a, b=b) # given weights
res = ot.solve_gromov(Ca, Cb, loss='KL') # KL loss

plan = res.plan # GW plan
value = res.value # GW value
  • Fused Gromov-Wasserstein (FGW) problem [24] (when M!=None):

\[ \begin{align}\begin{aligned}\min_{\mathbf{T}\geq 0} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j}\mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve_gromov(Ca, Cb, M) # uniform weights, alpha=0.5 (default)
res = ot.solve_gromov(Ca, Cb, M, a=a, b=b, alpha=0.1) # given weights and alpha

plan = res.plan # FGW plan
loss_linear_term = res.value_linear # Wasserstein part of the loss
loss_quad_term = res.value_quad # Gromov part of the loss
loss = res.value # FGW value
  • Regularized (Fused) Gromov-Wasserstein (GW) problem [12] (when reg!=None):

\[ \begin{align}\begin{aligned}\min_{\mathbf{T}\geq 0} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j}\mathbf{T}_{k,l} + \lambda_r R(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve_gromov(Ca, Cb, reg=1.0) # GW entropy regularization (default)
res = ot.solve_gromov(Ca, Cb, M, a=a, b=b, reg=10, alpha=0.1) # FGW with entropy

plan = res.plan # FGW plan
loss_linear_term = res.value_linear # Wasserstein part of the loss
loss_quad_term = res.value_quad # Gromov part of the loss
loss = res.value # FGW value (including regularization)
  • Semi-relaxed (Fused) Gromov-Wasserstein (GW) [48] (when unbalanced='semirelaxed'):

\[ \begin{align}\begin{aligned}\min_{\mathbf{T}\geq 0} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j}\mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T} \geq 0\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve_gromov(Ca, Cb, unbalanced='semirelaxed') # semirelaxed GW
res = ot.solve_gromov(Ca, Cb, unbalanced='semirelaxed', reg=1) # entropic semirelaxed GW
res = ot.solve_gromov(Ca, Cb, M, unbalanced='semirelaxed', alpha=0.1) # semirelaxed FGW

plan = res.plan # FGW plan
right_marginal = res.marginal_b # right marginal of the plan
  • Partial (Fused) Gromov-Wasserstein (GW) problem [29] (when unbalanced='partial'):

\[ \begin{align}\begin{aligned}\min_{\mathbf{T}\geq 0} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j}\mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} \leq \mathbf{a}\\ \mathbf{T}^T \mathbf{1} \leq \mathbf{b}\\ \mathbf{T} \geq 0\\ \mathbf{1}^T\mathbf{T}\mathbf{1} = m\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve_gromov(Ca, Cb, unbalanced_type='partial', unbalanced=0.8) # partial GW with m=0.8
res = ot.solve_gromov(Ca, Cb, M, unbalanced_type='partial', unbalanced=0.8, alpha=0.5) # partial FGW with m=0.8

References

ot.solve_sample(X_a, X_b, a=None, b=None, metric='sqeuclidean', reg=None, c=None, reg_type='KL', unbalanced=None, unbalanced_type='KL', lazy=False, batch_size=None, method=None, n_threads=1, max_iter=None, plan_init=None, rank=100, scaling=0.95, potentials_init=None, X_init=None, tol=None, verbose=False, grad='autodiff')[source]

Solve the discrete optimal transport problem using the samples in the source and target domains.

The function solves the following general optimal transport problem

\[\min_{\mathbf{T}\geq 0} \quad \sum_{i,j} T_{i,j}M_{i,j} + \lambda_r R(\mathbf{T}) + \lambda_u U(\mathbf{T}\mathbf{1},\mathbf{a}) + \lambda_u U(\mathbf{T}^T\mathbf{1},\mathbf{b})\]

where the cost matrix \(\mathbf{M}\) is computed from the samples in the source and target domains such that \(M_{i,j} = d(x_i,y_j)\) where \(d\) is a metric (by default the squared Euclidean distance).

The regularization is selected with reg (\(\lambda_r\)) and reg_type. By default reg=None and there is no regularization. The unbalanced marginal penalization can be selected with unbalanced (\(\lambda_u\)) and unbalanced_type. By default unbalanced=None and the function solves the exact optimal transport problem (respecting the marginals).

Parameters:
  • X_s (array-like, shape (n_samples_a, dim)) – samples in the source domain

  • X_t (array-like, shape (n_samples_b, dim)) – samples in the target domain

  • a (array-like, shape (dim_a,), optional) – Samples weights in the source domain (default is uniform)

  • b (array-like, shape (dim_b,), optional) – Samples weights in the source domain (default is uniform)

  • reg (float, optional) – Regularization weight \(\lambda_r\), by default None (no reg., exact OT)

  • c (array-like, shape (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization. If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\). If \(\texttt{reg_type}=\)’entropy’, then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).

  • reg_type (str, optional) – Type of regularization \(R\) either “KL”, “L2”, “entropy”, by default “KL”

  • unbalanced (float or indexable object of length 1 or 2) – Marginal relaxation term. If it is a scalar or an indexable object of length 1, then the same relaxation is applied to both marginal relaxations. The balanced OT can be recovered using \(unbalanced=float("inf")\). For semi-relaxed case, use either \(unbalanced=(float("inf"), scalar)\) or \(unbalanced=(scalar, float("inf"))\). If unbalanced is an array, it must have the same backend as input arrays (a, b, M).

  • unbalanced_type (str, optional) – Type of unbalanced penalization function \(U\) either “KL”, “L2”, “TV”, by default “KL”

  • lazy (bool, optional) – Return OTResultlazy object to reduce memory cost when True, by default False

  • batch_size (int, optional) – Batch size for lazy solver, by default None (default values in each solvers)

  • method (str, optional) – Method for solving the problem, this can be used to select the solver for unbalanced problems (see ot.solve), or to select a specific large scale solver.

  • n_threads (int, optional) – Number of OMP threads for exact OT solver, by default 1

  • max_iter (int, optional) – Maximum number of iteration, by default None (default values in each solvers)

  • plan_init (array-like, shape (dim_a, dim_b), optional) – Initialization of the OT plan for iterative methods, by default None

  • rank (int, optional) – Rank of the OT matrix for lazy solers (method=’factored’), by default 100

  • scaling (float, optional) – Scaling factor for the epsilon scaling lazy solvers (method=’geomloss’), by default 0.95

  • potentials_init ((array-like(dim_a,),array-like(dim_b,)), optional) – Initialization of the OT dual potentials for iterative methods, by default None

  • tol (_type_, optional) – Tolerance for solution precision, by default None (default values in each solvers)

  • verbose (bool, optional) – Print information in the solver, by default False

  • grad (str, optional) – Type of gradient computation, either or ‘autodiff’ or ‘envelope’ used only for Sinkhorn solver. By default ‘autodiff’ provides gradients wrt all outputs (plan, value, value_linear) but with important memory cost. ‘envelope’ provides gradients only for value and and other outputs are detached. This is useful for memory saving when only the value is needed.

Returns:

res – Result of the optimization problem. The information can be obtained as follows:

  • res.plan : OT plan \(\mathbf{T}\)

  • res.potentials : OT dual potentials

  • res.value : Optimal value of the optimization problem

  • res.value_linear : Linear OT loss with the optimal OT plan

  • res.lazy_plan : Lazy OT plan (when lazy=True or lazy method)

See OTResult for more information.

Return type:

OTResult()

Notes

The following methods are available for solving the OT problems:

  • Classical exact OT problem [1] (default parameters) :

\[ \begin{align}\begin{aligned}\min_\mathbf{T} \quad \langle \mathbf{T}, \mathbf{M} \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0, M_{i,j} = d(x_i,y_j)\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve_sample(xa, xb, a, b)

# for uniform weights
res = ot.solve_sample(xa, xb)
  • Entropic regularized OT [2] (when reg!=None):

\[ \begin{align}\begin{aligned}\min_\mathbf{T} \quad \langle \mathbf{T}, \mathbf{M} \rangle_F + \lambda R(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0, M_{i,j} = d(x_i,y_j)\end{aligned}\end{align} \]

can be solved with the following code:

# default is ``"KL"`` regularization (``reg_type="KL"``)
res = ot.solve_sample(xa, xb, a, b, reg=1.0)
# or for original Sinkhorn paper formulation [2]
res = ot.solve_sample(xa, xb, a, b, reg=1.0, reg_type='entropy')

# lazy solver of memory complexity O(n)
res = ot.solve_sample(xa, xb, a, b, reg=1.0, lazy=True, batch_size=100)
# lazy OT plan
lazy_plan = res.lazy_plan

# Use envelope theorem differentiation for memory saving
res = ot.solve_sample(xa, xb, a, b, reg=1.0, grad='envelope')
res.value.backward() # only the value is differentiable

Note that by default the Sinkhorn solver uses automatic differentiation to compute the gradients of the values and plan. This can be changed with the grad parameter. The envelope mode computes the gradients only for the value and the other outputs are detached. This is useful for memory saving when only the gradient of value is needed.

We also have a very efficient solver with compiled CPU/CUDA code using geomloss/PyKeOps that can be used with the following code:

# automatic solver
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss')

# force O(n) memory efficient solver
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss_online')

# force pre-computed cost matrix
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss_tensorized')

# use multiscale solver
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss_multiscale')

# One can play with speed (small scaling factor) and precision (scaling close to 1)
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss', scaling=0.5)
  • Quadratic regularized OT [17] (when reg!=None and reg_type="L2"):

\[ \begin{align}\begin{aligned}\min_\mathbf{T} \quad \langle \mathbf{T}, \mathbf{M} \rangle_F + \lambda R(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} = \mathbf{a}\\ \mathbf{T}^T \mathbf{1} = \mathbf{b}\\ \mathbf{T} \geq 0, M_{i,j} = d(x_i,y_j)\end{aligned}\end{align} \]

can be solved with the following code:

res = ot.solve_sample(xa, xb, a, b, reg=1.0, reg_type='L2')
  • Unbalanced OT [41] (when unbalanced!=None):

\[ \begin{align}\begin{aligned}\min_{\mathbf{T}\geq 0} \quad \sum_{i,j} T_{i,j}M_{i,j} + \lambda_u U(\mathbf{T}\mathbf{1},\mathbf{a}) + \lambda_u U(\mathbf{T}^T\mathbf{1},\mathbf{b})\\\text{with} \ M_{i,j} = d(x_i,y_j)\end{aligned}\end{align} \]

can be solved with the following code:

# default is ``"KL"``
res = ot.solve_sample(xa, xb, a, b, unbalanced=1.0)
# quadratic unbalanced OT
res = ot.solve_sample(xa, xb, a, b, unbalanced=1.0,unbalanced_type='L2')
# TV = partial OT
res = ot.solve_sample(xa, xb, a, b, unbalanced=1.0,unbalanced_type='TV')
  • Regularized unbalanced regularized OT [34] (when unbalanced!=None and reg!=None):

\[ \begin{align}\begin{aligned}\min_{\mathbf{T}\geq 0} \quad \sum_{i,j} T_{i,j}M_{i,j} + \lambda_r R(\mathbf{T}) + \lambda_u U(\mathbf{T}\mathbf{1},\mathbf{a}) + \lambda_u U(\mathbf{T}^T\mathbf{1},\mathbf{b})\\\text{with} \ M_{i,j} = d(x_i,y_j)\end{aligned}\end{align} \]

can be solved with the following code:

# default is ``"KL"`` for both
res = ot.solve_sample(xa, xb, a, b, reg=1.0, unbalanced=1.0)
# quadratic unbalanced OT with KL regularization
res = ot.solve_sample(xa, xb, a, b, reg=1.0, unbalanced=1.0,unbalanced_type='L2')
# both quadratic
res = ot.solve_sample(xa, xb, a, b, reg=1.0, reg_type='L2',
unbalanced=1.0, unbalanced_type='L2')
  • Factored OT [2] (when method='factored'):

This method solve the following OT problem [40]_

\[\mathop{\arg \min}_\mu \quad W_2^2(\mu_a,\mu)+ W_2^2(\mu,\mu_b)\]

where $mu$ is a uniform weighted empirical distribution of \(\mu_a\) and \(\mu_b\) are the empirical measures associated to the samples in the source and target domains, and \(W_2\) is the Wasserstein distance. This problem is solved using exact OT solvers for reg=None and the Sinkhorn solver for reg!=None. The solution provides two transport plans that can be used to recover a low rank OT plan between the two distributions.

res = ot.solve_sample(xa, xb, method='factored', rank=10)

# recover the lazy low rank plan
factored_solution_lazy = res.lazy_plan

# recover the full low rank plan
factored_solution = factored_solution_lazy[:]
  • Gaussian Bures-Wasserstein [2] (when method='gaussian'):

This method computes the Gaussian Bures-Wasserstein distance between two Gaussian distributions estimated from the empirical distributions

\[\mathcal{W}(\mu_s, \mu_t)_2^2= \left\lVert \mathbf{m}_s - \mathbf{m}_t \right\rVert^2 + \mathcal{B}(\Sigma_s, \Sigma_t)^{2}\]

where :

\[\mathbf{B}(\Sigma_s, \Sigma_t)^{2} = \text{Tr}\left(\Sigma_s + \Sigma_t - 2 \sqrt{\Sigma_s^{1/2}\Sigma_t\Sigma_s^{1/2}} \right)\]

The covariances and means are estimated from the data.

res = ot.solve_sample(xa, xb, method='gaussian')

# recover the squared Gaussian Bures-Wasserstein distance
BW_dist = res.value
  • Wasserstein 1d [1] (when method='1D'):

This method computes the Wasserstein distance between two 1d distributions estimated from the empirical distributions. For multivariate data the distances are computed independently for each dimension.

res = ot.solve_sample(xa, xb, method='1D')

# recover the squared Wasserstein distances
W_dists = res.value

References

ot.tic()[source]

Python implementation of Matlab tic() function

ot.toc(message='Elapsed time : {} s')[source]

Python implementation of Matlab toc() function

ot.toq()[source]

Python implementation of Julia toc() function

ot.unif(n, type_as=None)[source]

Return a uniform histogram of length n (simplex).

Parameters:
  • n (int) – number of bins in the histogram

  • type_as (array-like) – array of the same type of the expected output (numpy/pytorch/jax)

Returns:

h – histogram of length n such that \(\forall i, \mathbf{h}_i = \frac{1}{n}\)

Return type:

array-like, shape (n,)

ot.wasserstein_1d(u_values, v_values, u_weights=None, v_weights=None, p=1, require_sort=True)[source]

Computes the 1 dimensional OT loss [15] between two (batched) empirical distributions

It is formally the p-Wasserstein distance raised to the power p. We do so in a vectorized way by first building the individual quantile functions then integrating them.

This function should be preferred to emd_1d whenever the backend is different to numpy, and when gradients over either sample positions or weights are required.

Parameters:
  • u_values (array-like, shape (n, ...)) – locations of the first empirical distribution

  • v_values (array-like, shape (m, ...)) – locations of the second empirical distribution

  • u_weights (array-like, shape (n, ...), optional) – weights of the first empirical distribution, if None then uniform weights are used

  • v_weights (array-like, shape (m, ...), optional) – weights of the second empirical distribution, if None then uniform weights are used

  • p (int, optional) – order of the ground metric used, should be at least 1 (see [2, Chap. 2], default is 1

  • require_sort (bool, optional) – sort the distributions atoms locations, if False we will consider they have been sorted prior to being passed to the function, default is True

Returns:

cost – the batched EMD

Return type:

float/array-like, shape (…)

References

ot.wasserstein_circle(u_values, v_values, u_weights=None, v_weights=None, p=1, Lm=10, Lp=10, tm=-1, tp=1, eps=1e-06, require_sort=True)[source]

Computes the Wasserstein distance on the circle using either [45] for p=1 or the binary search algorithm proposed in [44] otherwise. Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\), takes the value modulo 1. If the values are on \(S^1\subset\mathbb{R}^2\), it requires to first find the coordinates using e.g. the atan2 function.

General loss returned:

\[OT_{loss} = \inf_{\theta\in\mathbb{R}}\int_0^1 |cdf_u^{-1}(q) - (cdf_v-\theta)^{-1}(q)|^p\ \mathrm{d}q\]

For p=1, [45]

\[W_1(u,v) = \int_0^1 |F_u(t)-F_v(t)-LevMed(F_u-F_v)|\ \mathrm{d}t\]

For values \(x=(x_1,x_2)\in S^1\), it is required to first get their coordinates with

\[u = \frac{\pi + \mathrm{atan2}(-x_2,-x_1)}{2\pi}\]

using e.g. ot.utils.get_coordinate_circle(x)

The function runs on backend but tensorflow and jax are not supported.

Parameters:
  • u_values (ndarray, shape (n, ...)) – samples in the source domain (coordinates on [0,1[)

  • v_values (ndarray, shape (n, ...)) – samples in the target domain (coordinates on [0,1[)

  • u_weights (ndarray, shape (n, ...), optional) – samples weights in the source domain

  • v_weights (ndarray, shape (n, ...), optional) – samples weights in the target domain

  • p (float, optional (default=1)) – Power p used for computing the Wasserstein distance

  • Lm (int, optional) – Lower bound dC. For p>1.

  • Lp (int, optional) – Upper bound dC. For p>1.

  • tm (float, optional) – Lower bound theta. For p>1.

  • tp (float, optional) – Upper bound theta. For p>1.

  • eps (float, optional) – Stopping condition. For p>1.

  • require_sort (bool, optional) – If True, sort the values.

Returns:

loss – Cost associated to the optimal transportation

Return type:

float

Examples

>>> u = np.array([[0.2,0.5,0.8]])%1
>>> v = np.array([[0.4,0.5,0.7]])%1
>>> wasserstein_circle(u.T, v.T)
array([0.1])

References

ot.weak_optimal_transport(Xa, Xb, a=None, b=None, verbose=False, log=False, G0=None, **kwargs)[source]

Solves the weak optimal transport problem between two empirical distributions

\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \sum_i \mathbf{a}_i \left(\mathbf{X^a}_i - \frac{1}{\mathbf{a}_i} \sum_j \gamma_{ij} \mathbf{X^b}_j \right)^2\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]

where :

  • \(X^a\) and \(X^b\) are the sample matrices.

  • \(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Uses the conditional gradient algorithm to solve the problem proposed in [39].

Parameters:
  • Xa ((ns,d) array-like, float) – Source samples

  • Xb ((nt,d) array-like, float) – Target samples

  • a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)

  • b ((nt,) array-like, float) – Target histogram (uniform weight if empty list))

  • G0 ((ns,nt) array-like, float) – initial guess (default is indep joint density)

  • numItermax (int, optional) – Max number of iterations

  • numItermaxEmd (int, optional) – Max number of iterations for emd

  • stopThr (float, optional) – Stop threshold on the relative variation (>0)

  • stopThr2 (float, optional) – Stop threshold on the absolute variation (>0)

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

Returns:

  • gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters

  • log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status

References

See also

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT