\(W_{reg}(\cdot,\cdot)\) is the entropic regularized Wasserstein
distance (see ot.bregman.sinkhorn())
if method is ‘sinkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_log’.
\(\mathbf{a}_i\) are training distributions in the columns of matrix
\(\mathbf{A}\)
reg and \(\mathbf{M}\) are respectively the regularization term and
the cost matrix for OT
The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling
algorithm as proposed in [3]
Parameters:
A (array-like, shape (dim, n_hists)) – n_hists training distributions \(\mathbf{a}_i\) of size dim
M (array-like, shape (dim, dim)) – loss matrix for OT
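A minimal usage sketch (the histogram and cost values below are illustrative, not from the original documentation); by symmetry of the two input histograms the barycenter puts roughly equal mass on both bins:

>>> import numpy as np
>>> import ot
>>> A = np.array([[0.7, 0.3],
...               [0.3, 0.7]])  # two histograms of size dim=2 in the columns
>>> M = np.array([[0.0, 1.0],
...               [1.0, 0.0]])  # cost matrix for OT
>>> bary = ot.bregman.barycenter(A, M, reg=1e-1)  # approximately [0.5, 0.5]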
Computes the Wasserstein distance on the circle using the binary search algorithm proposed in [44].
Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\),
their values are taken modulo 1.
If the values are on \(S^1\subset\mathbb{R}^2\), first recover the angular coordinates,
e.g. using the atan2 function.
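A hedged sketch of the coordinate convention above (the sample values are made up, and ot.binary_search_circle is assumed to be the documented solver): points given in \(\mathbb{R}^2\) are mapped to \(S^1\cong [0,1[\) with atan2 before calling the solver.

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> xy_s, xy_t = rng.randn(20, 2), rng.randn(30, 2)  # points in R^2
>>> u_s = np.arctan2(xy_s[:, 1], xy_s[:, 0]) / (2 * np.pi) % 1  # angles in [0, 1)
>>> u_t = np.arctan2(xy_t[:, 1], xy_t[:, 0]) / (2 * np.pi) % 1
>>> w = ot.binary_search_circle(u_s, u_t, p=2)  # Wasserstein distance on the circle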
Compute distance between samples in \(\mathbf{x_1}\) and \(\mathbf{x_2}\)
Note
This function is backend-compatible and will work on arrays
from all compatible backends.
Parameters:
x1 (array-like, shape (n1,d)) – matrix with n1 samples of size d
x2 (array-like, shape (n2,d), optional) – matrix with n2 samples of size d (if None then \(\mathbf{x_2} = \mathbf{x_1}\))
metric (str | callable, optional) – ‘sqeuclidean’ or ‘euclidean’ on all backends. On numpy the function also
accepts the following metrics from scipy.spatial.distance.cdist: ‘braycurtis’,
‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’,
‘euclidean’, ‘hamming’, ‘jaccard’, ‘kulczynski1’, ‘mahalanobis’,
‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’,
‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.
p (float, optional) – p-norm for the Minkowski and the Weighted Minkowski metrics. Default value is 2.
w (array-like, rank 1) – Weights for the weighted metrics.
Returns:
M – distance matrix computed with given metric
Return type:
array-like, shape (n1, n2)
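A small sketch of a call (values chosen for illustration): with the default ‘sqeuclidean’ metric the entries are pairwise squared Euclidean distances.

>>> import numpy as np
>>> import ot
>>> x1 = np.array([[0.0, 0.0], [1.0, 0.0]])
>>> x2 = np.array([[0.0, 0.0], [0.0, 2.0]])
>>> ot.dist(x1, x2)
array([[0., 4.],
       [1., 5.]])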
ot.emd(a, b, M, numItermax=100000, log=False, center_dual=True, numThreads=1, check_marginals=True)
Solves the Earth Mover's distance problem and returns the OT matrix
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Warning
Note that the \(\mathbf{M}\) matrix in numpy needs to be a C-order
numpy.array in float64 format. It will be converted if it is not in this
format.
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Note
This function will cast the computed transport plan to the data type
of the provided input with the following priority: \(\mathbf{a}\),
then \(\mathbf{b}\), then \(\mathbf{M}\) if marginals are not provided.
Casting to an integer tensor might result in a loss of precision.
If this behaviour is unwanted, please make sure to provide a
floating point input.
Note
An error will be raised if the vectors \(\mathbf{a}\) and \(\mathbf{b}\) do not sum to the same value.
a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)
b ((nt,) array-like, float) – Target histogram (uniform weight if empty list)
M ((ns,nt) array-like, float) – Loss matrix (c-order array in numpy with type float64)
numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization
algorithm if it has not converged.
log (bool, optional (default=False)) – If True, returns a dictionary containing the cost and dual variables.
Otherwise returns only the optimal transportation matrix.
center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function
ot.lp.center_ot_dual().
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize.
“max” selects the highest number possible.
check_marginals (bool, optional (default=True)) – If True, checks that the marginal masses are equal. If False, skips the
check.
Returns:
gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given
parameters
log (dict, optional) – If input log is true, a dictionary containing the
cost and dual variables and exit status
Examples
Simple example with obvious solution. The function emd accepts lists and
performs automatic conversion to numpy arrays.
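A sketch of such an example (toy data; the output shown is indicative): with matching weights and a symmetric cost, the diagonal coupling is optimal.

>>> import ot
>>> a = [0.5, 0.5]
>>> b = [0.5, 0.5]
>>> M = [[0.0, 1.0], [1.0, 0.0]]
>>> ot.emd(a, b, M)
array([[0.5, 0. ],
       [0. , 0.5]])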
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Note
This function will cast the computed transport plan and
transportation loss to the data type of the provided input with the
following priority: \(\mathbf{a}\), then \(\mathbf{b}\),
then \(\mathbf{M}\) if marginals are not provided.
Casting to an integer tensor might result in a loss of precision.
If this behaviour is unwanted, please make sure to provide a
floating point input.
Note
An error will be raised if the vectors \(\mathbf{a}\) and \(\mathbf{b}\) do not sum to the same value.
a ((ns,) array-like, float64) – Source histogram (uniform weight if empty list)
b ((nt,) array-like, float64) – Target histogram (uniform weight if empty list)
M ((ns,nt) array-like, float64) – Loss matrix (for numpy c-order array with type float64)
processes (int, optional (default=1)) – Number of processes used for multiple emd computations (deprecated)
numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization
algorithm if it has not converged.
log (boolean, optional (default=False)) – If True, returns a dictionary containing dual
variables. Otherwise returns only the optimal transportation cost.
return_matrix (boolean, optional (default=False)) – If True, returns the optimal transportation matrix in the log.
center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function
ot.lp.center_ot_dual().
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize.
“max” selects the highest number possible.
check_marginals (bool, optional (default=True)) – If True, checks that the marginal masses are equal. If False, skips the
check.
Returns:
W (float, array-like) – Optimal transportation loss for the given parameters
log (dict) – If input log is true, a dictionary containing dual
variables and exit status
Examples
Simple example with obvious solution. The function emd2 accepts lists and
performs automatic conversion to numpy arrays.
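A matching sketch for the loss variant (same toy data as above; output indicative): the diagonal coupling has zero cost.

>>> import ot
>>> a = [0.5, 0.5]
>>> b = [0.5, 0.5]
>>> M = [[0.0, 1.0], [1.0, 0.0]]
>>> ot.emd2(a, b, M)
0.0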
x_a ((ns,) or (ns, 1) ndarray, float64) – Source dirac locations (on the real line)
x_b ((nt,) or (nt, 1) ndarray, float64) – Target dirac locations (on the real line)
a ((ns,) ndarray, float64, optional) – Source histogram (default is uniform weight)
b ((nt,) ndarray, float64, optional) – Target histogram (default is uniform weight)
metric (str, optional (default='sqeuclidean')) – Metric to be used. Only works with either of the strings
‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’.
p (float, optional (default=1.0)) – The p-norm to apply if metric=’minkowski’
dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt).
Otherwise returns a sparse representation using scipy’s coo_matrix
format. Only used if log is set to True. Due to implementation details,
this function runs faster when dense is set to False.
log (boolean, optional (default=False)) – If True, returns a dictionary containing the transportation matrix.
Otherwise returns only the loss.
Returns:
loss (float) – Cost associated to the optimal transportation
log (dict) – If input log is True, a dictionary containing the Optimal transportation
matrix for the given parameters
Examples
Simple example with obvious solution. The function emd2_1d accepts lists and
performs automatic conversion to numpy arrays
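A sketch along those lines (uniform weights, default ‘sqeuclidean’ metric; the value follows from the monotone matching 0↔0 and 2↔3):

>>> import ot
>>> x_a = [2.0, 0.0]
>>> x_b = [0.0, 3.0]
>>> ot.emd2_1d(x_a, x_b)  # 0.5 * (0 - 0)**2 + 0.5 * (2 - 3)**2
0.5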
x_a ((ns,) or (ns, 1) ndarray, float64) – Source dirac locations (on the real line)
x_b ((nt,) or (nt, 1) ndarray, float64) – Target dirac locations (on the real line)
a ((ns,) ndarray, float64, optional) – Source histogram (default is uniform weight)
b ((nt,) ndarray, float64, optional) – Target histogram (default is uniform weight)
metric (str, optional (default='sqeuclidean')) – Metric to be used. Only works with either of the strings
‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’.
p (float, optional (default=1.0)) – The p-norm to apply if metric=’minkowski’
dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt).
Otherwise returns a sparse representation using scipy’s coo_matrix
format. Due to implementation details, this function runs faster when
‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’ metrics
are used.
log (boolean, optional (default=False)) – If True, returns a dictionary containing the cost.
Otherwise returns only the optimal transportation matrix.
check_marginals (bool, optional (default=True)) – If True, checks that the marginal masses are equal. If False, skips the
check.
Returns:
gamma ((ns, nt) ndarray) – Optimal transportation matrix for the given parameters
log (dict) – If input log is True, a dictionary containing the cost
Examples
Simple example with obvious solution. The function emd_1d accepts lists and
performs automatic conversion to numpy arrays
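A sketch along those lines (same toy data as for emd2_1d; output indicative): the optimal 1d plan matches the sorted locations.

>>> import ot
>>> a, b = [0.5, 0.5], [0.5, 0.5]
>>> x_a = [2.0, 0.0]
>>> x_b = [0.0, 3.0]
>>> ot.emd_1d(x_a, x_b, a, b)
array([[0. , 0.5],
       [0.5, 0. ]])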
\(\mu_a\) and \(\mu_b\) are empirical distributions.
\(\mu\) is an empirical distribution with r samples,
and the function returns the two OT plans between \(\mu_a\) and \(\mu\), and between \(\mu\) and \(\mu_b\).
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Uses the conditional gradient algorithm to solve the problem proposed in
[39].
Returns the Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\)
with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\) (see [24]).
The function solves the following optimization problem using Conditional Gradient:
\(\mathbf{M}\): metric cost matrix between features across domains
\(\mathbf{C_1}\): Metric cost matrix in the source space
\(\mathbf{C_2}\): Metric cost matrix in the target space
\(\mathbf{p}\): distribution in the source space
\(\mathbf{q}\): distribution in the target space
L: loss function to account for the misfit between the similarity and feature matrices
\(\alpha\): trade-off parameter
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Note
All computations in the conjugate gradient solver are done with
numpy to limit memory overhead.
Note
This function will cast the computed transport plan to the data
type of the provided input \(\mathbf{M}\). Casting to an integer
tensor might result in a loss of precision. If this behaviour is
unwanted, please make sure to provide a floating point input.
Parameters:
M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains
C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space
C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space
p (array-like, shape (ns,), optional) – Distribution in the source space.
If left to its default value None, the uniform distribution is taken.
q (array-like, shape (nt,), optional) – Distribution in the target space.
If left to its default value None, the uniform distribution is taken.
loss_fun (str, optional) – Loss function used for the solver
symmetric (bool, optional) – Whether C1 and C2 are assumed to be symmetric.
If left to its default value None, a symmetry test will be conducted.
If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
armijo (bool, optional) – If True, the step of the line-search is found via an Armijo line search. Otherwise the closed form is used.
If there are convergence issues, use False.
G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T.
Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
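A hedged end-to-end sketch (random point clouds; sizes are arbitrary, and the structure and feature costs are built with ot.dist):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(42)
>>> xs, xt = rng.randn(5, 2), rng.randn(6, 2)
>>> C1 = ot.dist(xs, xs)  # structure in the source space
>>> C2 = ot.dist(xt, xt)  # structure in the target space
>>> M = ot.dist(xs, xt)   # feature cost across domains
>>> T = ot.gromov.fused_gromov_wasserstein(M, C1, C2, alpha=0.5)  # uniform p, q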
Returns the Fused Gromov-Wasserstein distance between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\)
with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\) (see [24]).
The function solves the following optimization problem using Conditional Gradient:
\(\mathbf{M}\): metric cost matrix between features across domains
\(\mathbf{C_1}\): Metric cost matrix in the source space
\(\mathbf{C_2}\): Metric cost matrix in the target space
\(\mathbf{p}\): distribution in the source space
\(\mathbf{q}\): distribution in the target space
L: loss function to account for the misfit between the similarity and feature matrices
\(\alpha\): trade-off parameter
Note that when using backends, this loss function is differentiable wrt the
matrices (C1, C2, M) and weights (p, q) for quadratic loss using the gradients from [38].
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Note
All computations in the conjugate gradient solver are done with
numpy to limit memory overhead.
Note
This function will cast the computed transport plan to the data
type of the provided input \(\mathbf{M}\). Casting to an integer
tensor might result in a loss of precision. If this behaviour is
unwanted, please make sure to provide a floating point input.
Parameters:
M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains
C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space.
C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space.
p (array-like, shape (ns,), optional) – Distribution in the source space.
If left to its default value None, the uniform distribution is taken.
q (array-like, shape (nt,), optional) – Distribution in the target space.
If left to its default value None, the uniform distribution is taken.
loss_fun (str, optional) – Loss function used for the solver.
symmetric (bool, optional) – Whether C1 and C2 are assumed to be symmetric.
If left to its default value None, a symmetry test will be conducted.
If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
armijo (bool, optional) – If True, the step of the line-search is found via an Armijo line search.
Otherwise the closed form is used. If there are convergence issues, use False.
G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T.
Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
Cs (list of S array-like of shape (ns, ns)) – Metric cost matrices
ps (list of S array-like of shape (ns,), optional) – Sample weights in the S spaces.
If left to its default value None, uniform distributions are taken.
p (array-like, shape (N,), optional) – Weights in the targeted barycenter.
If left to its default value None, the uniform distribution is taken.
lambdas (list of float, optional) – List of the S spaces’ weights.
If left to its default value None, uniform weights are taken.
loss_fun (callable, optional) – tensor-matrix multiplication function based on a specific loss function
symmetric (bool, optional) – Whether the structure matrices are assumed to be symmetric. Default value is True.
If set to True (resp. False), the matrices will be assumed symmetric (resp. asymmetric).
armijo (bool, optional) – If True, the step of the line-search is found via an Armijo line search.
Otherwise the closed form is used. If there are convergence issues, use False.
max_iter (int, optional) – Max number of iterations
tol (float, optional) – Stop threshold on relative error (>0)
stop_criterion (str, optional. Default is 'barycenter'.) – Stop criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’
uses absolute norm variations of estimated barycenters. Else if set to ‘loss’
uses the relative variations of the loss.
warmstartT (bool, optional) – Whether to warm-start the transport plans in the successive
fused Gromov-Wasserstein transport problems.
verbose (bool, optional) – Print information along iterations.
\(\mathbf{C_1}\): Metric cost matrix in the source space.
\(\mathbf{C_2}\): Metric cost matrix in the target space.
\(\mathbf{p}\): Distribution in the source space.
\(\mathbf{q}\): Distribution in the target space.
L: Loss function to account for the misfit between the similarity matrices.
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Note
All computations in the conjugate gradient solver are done with
numpy to limit memory overhead.
Note
This function will cast the computed transport plan to the data
type of the provided input \(\mathbf{C}_1\). Casting to an integer
tensor might result in a loss of precision. If this behaviour is
unwanted, please make sure to provide a floating point input.
Parameters:
C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space.
C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space.
p (array-like, shape (ns,), optional) – Distribution in the source space.
If left to its default value None, the uniform distribution is taken.
q (array-like, shape (nt,), optional) – Distribution in the target space.
If left to its default value None, the uniform distribution is taken.
loss_fun (str, optional) – Loss function used for the solver, either ‘square_loss’ or ‘kl_loss’.
symmetric (bool, optional) – Whether C1 and C2 are assumed to be symmetric.
If left to its default value None, a symmetry test will be conducted.
If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
verbose (bool, optional) – Print information along iterations.
armijo (bool, optional) – If True, the step of the line-search is found via an Armijo line search. Otherwise the closed form is used.
If there are convergence issues, use False.
G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T.
Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
max_iter (int, optional) – Max number of iterations.
tol_rel (float, optional) – Stop threshold on relative error (>0).
tol_abs (float, optional) – Stop threshold on absolute error (>0).
**kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.
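A hedged usage sketch (random structure matrices; the log key name ‘gw_dist’ is assumed from the library’s logging conventions):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(5, 2), rng.randn(5, 2)
>>> C1, C2 = ot.dist(xs, xs), ot.dist(xt, xt)
>>> T, log = ot.gromov.gromov_wasserstein(C1, C2, loss_fun='square_loss', log=True)
>>> gw_loss = log['gw_dist']  # loss reached by the conditional gradient solver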
Returns the Gromov-Wasserstein loss \(\mathbf{GW}\) between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\).
To recover the Gromov-Wasserstein distance as defined in [13] compute \(d_{GW} = \frac{1}{2} \sqrt{\mathbf{GW}}\).
The function solves the following optimization problem using Conditional Gradient:
\(\mathbf{C_1}\): Metric cost matrix in the source space
\(\mathbf{C_2}\): Metric cost matrix in the target space
\(\mathbf{p}\): distribution in the source space
\(\mathbf{q}\): distribution in the target space
L: loss function to account for the misfit between the similarity
matrices
Note that when using backends, this loss function is differentiable wrt the
matrices (C1, C2) and weights (p, q) for quadratic loss using the gradients from [38].
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Note
All computations in the conjugate gradient solver are done with
numpy to limit memory overhead.
Note
This function will cast the computed transport plan to the data
type of the provided input \(\mathbf{C}_1\). Casting to an integer
tensor might result in a loss of precision. If this behaviour is
unwanted, please make sure to provide a floating point input.
Parameters:
C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space
C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space
p (array-like, shape (ns,), optional) – Distribution in the source space.
If left to its default value None, the uniform distribution is taken.
q (array-like, shape (nt,), optional) – Distribution in the target space.
If left to its default value None, the uniform distribution is taken.
loss_fun (str) – Loss function used for the solver, either ‘square_loss’ or ‘kl_loss’
symmetric (bool, optional) – Whether C1 and C2 are assumed to be symmetric.
If left to its default value None, a symmetry test will be conducted.
If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
verbose (bool, optional) – Print information along iterations
armijo (bool, optional) – If True, the step of the line-search is found via an Armijo line search. Otherwise the closed form is used.
If there are convergence issues, use False.
G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T.
Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
max_iter (int, optional) – Max number of iterations
tol_rel (float, optional) – Stop threshold on relative error (>0)
tol_abs (float, optional) – Stop threshold on absolute error (>0)
**kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver
Returns:
gw_dist (float) – Gromov-Wasserstein distance
log (dict) – convergence information and Coupling matrix
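Given the differentiability note above, a hedged torch sketch (assumes the torch backend is installed; the sizes and the symmetrization step are illustrative):

>>> import torch
>>> import ot
>>> C1 = torch.rand(5, 5, requires_grad=True)
>>> C2 = torch.rand(6, 6)
>>> gw = ot.gromov.gromov_wasserstein2(0.5 * (C1 + C1.T), 0.5 * (C2 + C2.T))
>>> gw.backward()  # gradients wrt C1 via the quadratic-loss gradients of [38]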
rank (int, optional. Default is None. (>0)) – Nonnegative rank of the OT plan. If None, min(ns, nt) is considered.
alpha (float, optional. Default is 1e-10. (>0 and <1/r)) – Lower bound for the weight vector g.
rescale_cost (bool, optional. Default is False) – Rescale the low rank factorization of the sqeuclidean cost matrix
seed_init (int, optional. Default is 49. (>0)) – Random state for the ‘random’ initialization of low rank couplings
gamma_init (str, optional. Default is "rescale".) – Initialization strategy for gamma, either ‘rescale’ or ‘theory’.
Gamma is a constant that scales the convergence criterion of the Mirror Descent
optimization scheme used to compute the low-rank couplings (Q, R and g)
numItermax (int, optional. Default is 1000.) – Max number of iterations for Low Rank GW
stopThr (float, optional. Default is 1e-4.) – Stop threshold on error (>0) for Low Rank GW
The error is the sum of Kullback Divergences computed for each low rank
coupling (Q, R and g) and scaled using gamma.
numItermax_dykstra (int, optional. Default is 2000.) – Max number of iterations for the Dykstra algorithm
stopThr_dykstra (float, optional. Default is 1e-7.) – Stop threshold on error (>0) in Dykstra
cost_factorized_Xs (tuple, optional. Default is None) – Tuple with two pre-computed low rank decompositions (A1, A2) of the source cost
matrix. Both matrices should have a shape of (n_samples_a, dim_Xs + 2).
If None, the low rank cost matrices will be computed as sqeuclidean cost matrices.
cost_factorized_Xt (tuple, optional. Default is None) – Tuple with two pre-computed low rank decompositions (B1, B2) of the target cost
matrix. Both matrices should have a shape of (n_samples_b, dim_Xt + 2).
If None, the low rank cost matrices will be computed as sqeuclidean cost matrices.
warn (bool, optional) – if True, raises a warning if the low rank GW algorithm doesn’t converge.
warn_dykstra (bool, optional) – if True, raises a warning if the Dykstra algorithm doesn’t converge.
rank (int, optional. Default is None. (>0)) – Nonnegative rank of the OT plan. If None, min(ns, nt) is considered.
alpha (float, optional. Default is 1e-10. (>0 and <1/r)) – Lower bound for the weight vector g.
rescale_cost (bool, optional. Default is False) – Rescale the low rank factorization of the sqeuclidean cost matrix
init (str, optional. Default is 'random'.) – Initialization strategy for the low rank couplings. ‘random’, ‘deterministic’ or ‘kmeans’
reg_init (float, optional. Default is 1e-1. (>0)) – Regularization term for a ‘kmeans’ init. If None, 1 is considered.
seed_init (int, optional. Default is 49. (>0)) – Random state for a ‘random’ or ‘kmeans’ init strategy.
gamma_init (str, optional. Default is "rescale".) – Initialization strategy for gamma, either ‘rescale’ or ‘theory’.
Gamma is a constant that scales the convergence criterion of the Mirror Descent
optimization scheme used to compute the low-rank couplings (Q, R and g)
numItermax (int, optional. Default is 2000.) – Max number of iterations for the Dykstra algorithm
stopThr (float, optional. Default is 1e-7.) – Stop threshold on error (>0) in Dykstra
warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.
Computes the closed-form for the 2-Wasserstein distance between samples and a uniform distribution on \(S^1\)
Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\),
their values are taken modulo 1.
If the values are on \(S^1\subset\mathbb{R}^2\), first recover the angular coordinates,
e.g. using the atan2 function.
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\Omega\) is the entropic regularization term
\(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target
weights (histograms, both sum to 1)
Note
This function is backend-compatible and will work on arrays
from all compatible backends.
The algorithm used for solving the problem is the Sinkhorn-Knopp matrix
scaling algorithm as proposed in [2]
Choosing a Sinkhorn solver
By default and when using a regularization parameter that is not too small,
the default sinkhorn solver should be enough. If you need to use a small
regularization to get sharper OT matrices, you should use the
ot.bregman.sinkhorn_stabilized() solver that will avoid numerical
errors. This last solver can be very slow in practice and might not even
converge to a reasonable OT matrix in a finite time. This is why
ot.bregman.sinkhorn_epsilon_scaling(), which relies on iterating over the value
of the regularization (and using warm starts), sometimes leads to better
solutions. Note that the greedy version of the sinkhorn,
ot.bregman.greenkhorn(), can also lead to a speedup, and the screening
version of the sinkhorn, ot.bregman.screenkhorn(), aims at providing a
fast approximation of the Sinkhorn problem. For use of GPU and gradient
computation with a small number of iterations we strongly recommend the
ot.bregman.sinkhorn_log() solver that does not need to check for
numerical problems.
Parameters:
a (array-like, shape (dim_a,)) – sample weights in the source domain
b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – sample weights in the target domain; Sinkhorn is computed with multiple targets
and a fixed \(\mathbf{M}\) if \(\mathbf{b}\) is a matrix
(returns OT loss + dual variables in log)
M (array-like, shape (dim_a, dim_b)) – loss matrix
method (str) – method used for the solver, either ‘sinkhorn’, ‘sinkhorn_log’,
‘greenkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_epsilon_scaling’; see
those functions for specific parameters
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.
warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given
(that is the logarithm of the u,v sinkhorn scaling vectors)
Returns:
gamma (array-like, shape (dim_a, dim_b)) – Optimal transportation matrix for the given parameters
log (dict) – log dictionary return only if log==True in parameters
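A sketch reconstructing the classic toy example (entries are approximate and assume default solver settings):

>>> import ot
>>> a = [0.5, 0.5]
>>> b = [0.5, 0.5]
>>> M = [[0.0, 1.0], [1.0, 0.0]]
>>> ot.sinkhorn(a, b, M, reg=1.0)
array([[0.36552929, 0.13447071],
       [0.13447071, 0.36552929]])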
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\Omega\) is the entropic regularization term
\(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target
weights (histograms, both sum to 1)
and returns \(\langle \gamma^*, \mathbf{M} \rangle_F\) (without
the entropic contribution).
Note
This function is backend-compatible and will work on arrays
from all compatible backends.
The algorithm used for solving the problem is the Sinkhorn-Knopp matrix
scaling algorithm as proposed in [2]
Choosing a Sinkhorn solver
By default and when using a regularization parameter that is not too small,
the default sinkhorn solver should be enough. If you need to use a small
regularization to get sharper OT matrices, you should use the
ot.bregman.sinkhorn_log() solver that will avoid numerical
errors. This last solver can be very slow in practice and might not even
converge to a reasonable OT matrix in a finite time. This is why
ot.bregman.sinkhorn_epsilon_scaling(), which relies on iterating over the value
of the regularization (and using warm starts), sometimes leads to better
solutions. Note that the greedy version of the sinkhorn,
ot.bregman.greenkhorn(), can also lead to a speedup, and the screening
version of the sinkhorn, ot.bregman.screenkhorn(), aims at providing a
fast approximation of the Sinkhorn problem. For use of GPU and gradient
computation with a small number of iterations we strongly recommend the
ot.bregman.sinkhorn_log() solver that does not need to check for
numerical problems.
Parameters:
a (array-like, shape (dim_a,)) – sample weights in the source domain
b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – sample weights in the target domain; Sinkhorn is computed with multiple targets
and a fixed \(\mathbf{M}\) if \(\mathbf{b}\) is a matrix
(returns OT loss + dual variables in log)
M (array-like, shape (dim_a, dim_b)) – loss matrix
warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.
warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given
(that is the logarithm of the u,v sinkhorn scaling vectors)
Returns:
W ((n_hists) float/array-like) – Optimal transportation loss for the given parameters
log (dict) – log dictionary return only if log==True in parameters
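The matching loss-variant sketch (value approximate; it equals \(\langle \gamma^*, \mathbf{M} \rangle_F\) for the plan shown in the sinkhorn example above):

>>> import ot
>>> a = [0.5, 0.5]
>>> b = [0.5, 0.5]
>>> M = [[0.0, 1.0], [1.0, 0.0]]
>>> ot.sinkhorn2(a, b, M, reg=1.0)  # approximately 2 * 0.13447071
0.26894142...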
\(\Omega_e\) is the entropic regularization term \(\Omega_e
(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\Omega_g\) is the group lasso regularization term
\(\Omega_g(\gamma)=\sum_{i,c} \|\gamma_{i,\mathcal{I}_c}\|^{1/2}_1\)
where \(\mathcal{I}_c\) are the index of samples from class c
in the source domain.
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (sum to 1)
The algorithm used for solving the problem is the generalized conditional
gradient as proposed in [5, 7].
Parameters:
a (array-like (ns,)) – samples weights in the source domain
labels_a (array-like (ns,)) – labels of samples in the source domain
b (array-like (nt,)) – samples weights in the target domain
M (array-like (ns,nt)) – loss matrix
reg (float) – Regularization term for entropic regularization >0
eta (float, optional) – Regularization term for group lasso regularization >0
numItermax (int, optional) – Max number of iterations
numInnerItermax (int, optional) – Max number of iterations (inner sinkhorn solver)
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target unbalanced distributions
\(\mathbf{c}\) is a reference distribution for the regularization
KL is the Kullback-Leibler divergence
The algorithm used for solving the problem is the generalized
Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]
Warning
Starting from version 0.9.5, the default value has been changed to reg_type=’kl’ instead of reg_type=’entropy’. This makes the function more consistent with the literature
and the other solvers. If you want to use the entropy regularization, please set reg_type=’entropy’ explicitly.
Parameters:
a (array-like (dim_a,)) – Unnormalized histogram of dimension dim_a
If a is an empty list or array ([]),
then a is set to uniform distribution.
b (array-like (dim_b,)) – One or multiple unnormalized histograms of dimension dim_b.
If b is an empty list or array ([]),
then b is set to uniform distribution.
If many, compute all the OT costs \((\mathbf{a}, \mathbf{b}_i)_i\)
reg_m (float or indexable object of length 1 or 2) – Marginal relaxation term.
If \(\mathrm{reg_{m}}\) is a scalar or an indexable object of length 1,
then the same \(\mathrm{reg_{m}}\) is applied to both marginal relaxations.
The entropic balanced OT can be recovered using \(\mathrm{reg_{m}}=float("inf")\).
For semi-relaxed case, use either
\(\mathrm{reg_{m}}=(float("inf"), scalar)\) or
\(\mathrm{reg_{m}}=(scalar, float("inf"))\).
If \(\mathrm{reg_{m}}\) is an array,
it must have the same backend as input arrays (a, b, M).
method (str) – method used for the solver, either ‘sinkhorn’, ‘sinkhorn_stabilized’, ‘sinkhorn_translation_invariant’ or
‘sinkhorn_reg_scaling’; see those functions for specific parameters
reg_type (string, optional) – Regularizer term. Can take two values:
+ Negative entropy: ‘entropy’:
\(\Omega(\gamma) = \sum_{i,j} \gamma_{i,j} \log(\gamma_{i,j}) - \sum_{i,j} \gamma_{i,j}\).
This is equivalent (up to a constant) to \(\Omega(\gamma) = \text{KL}(\gamma, 1_{dim_a} 1_{dim_b}^T)\).
+ Kullback-Leibler divergence (default): ‘kl’:
\(\Omega(\gamma) = \text{KL}(\gamma, \mathbf{a} \mathbf{b}^T)\).
c (array-like (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization.
If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\).
If \(\texttt{reg_type}='entropy'\), then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).
warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given
(that is the logarithm of the u, v sinkhorn scaling vectors).
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
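A hedged call sketch (no output asserted, since the numbers depend on the reg_type default discussed in the warning above):

>>> import ot
>>> a, b = [0.5, 0.5], [0.5, 0.5]
>>> M = [[0.0, 1.0], [1.0, 0.0]]
>>> G = ot.sinkhorn_unbalanced(a, b, M, reg=0.1, reg_m=1.0)  # (2, 2) OT plan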
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target unbalanced distributions
\(\mathbf{c}\) is a reference distribution for the regularization
KL is the Kullback-Leibler divergence
The algorithm used for solving the problem is the generalized
Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]
Warning
Starting from version 0.9.5, the default value has been changed to reg_type=’kl’ instead of reg_type=’entropy’. This makes the function more consistent with the literature
and the other solvers. If you want to use the entropy regularization, please set reg_type=’entropy’ explicitly.
Parameters:
a (array-like (dim_a,)) – Unnormalized histogram of dimension dim_a
If a is an empty list or array ([]),
then a is set to uniform distribution.
b (array-like (dim_b,)) – One or multiple unnormalized histograms of dimension dim_b.
If b is an empty list or array ([]),
then b is set to uniform distribution.
If many, compute all the OT costs \((\mathbf{a}, \mathbf{b}_i)_i\)
reg_m (float or indexable object of length 1 or 2) – Marginal relaxation term.
If \(\mathrm{reg_{m}}\) is a scalar or an indexable object of length 1,
then the same \(\mathrm{reg_{m}}\) is applied to both marginal relaxations.
The entropic balanced OT can be recovered using \(\mathrm{reg_{m}}=float("inf")\).
For semi-relaxed case, use either
\(\mathrm{reg_{m}}=(float("inf"), scalar)\) or
\(\mathrm{reg_{m}}=(scalar, float("inf"))\).
If \(\mathrm{reg_{m}}\) is an array,
it must have the same backend as input arrays (a, b, M).
method (str) – method used for the solver, either ‘sinkhorn’, ‘sinkhorn_stabilized’, ‘sinkhorn_translation_invariant’ or
‘sinkhorn_reg_scaling’; see those functions for specific parameters
reg_type (string, optional) – Regularizer term. Can take two values:
+ Negative entropy: ‘entropy’:
\(\Omega(\gamma) = \sum_{i,j} \gamma_{i,j} \log(\gamma_{i,j}) - \sum_{i,j} \gamma_{i,j}\).
This is equivalent (up to a constant) to \(\Omega(\gamma) = \text{KL}(\gamma, 1_{dim_a} 1_{dim_b}^T)\).
+ Kullback-Leibler divergence: ‘kl’:
\(\Omega(\gamma) = \text{KL}(\gamma, \mathbf{a} \mathbf{b}^T)\).
c (array-like (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization.
If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\).
If \(\texttt{reg_type}='entropy'\), then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).
warmstart (tuple of arrays, shape (dim_a, dim_b), optional) – Initialization of dual potentials. If provided, the dual potentials should be given
(that is the logarithm of the u,v sinkhorn scaling vectors).
returnCost (string, optional (default = "linear")) – If returnCost = “linear”, then return the linear part of the unbalanced OT loss.
If returnCost = “total”, then return the total unbalanced OT loss.
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
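The loss variant under the same hedged setup (returnCost left at its "linear" default):

>>> import ot
>>> a, b = [0.5, 0.5], [0.5, 0.5]
>>> M = [[0.0, 1.0], [1.0, 0.0]]
>>> loss = ot.sinkhorn_unbalanced2(a, b, M, reg=0.1, reg_m=1.0)  # linear part of the loss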
The regularization is selected with reg (\(\lambda_r\)) and reg_type. By
default reg=None and there is no regularization. The unbalanced marginal
penalization can be selected with unbalanced (\((\lambda_1, \lambda_2)\)) and
unbalanced_type. By default unbalanced=None and the function
solves the exact optimal transport problem (respecting the marginals).
Parameters:
M (array_like, shape (dim_a, dim_b)) – Loss matrix
a (array-like, shape (dim_a,), optional) – Samples weights in the source domain (default is uniform)
b (array-like, shape (dim_b,), optional) – Samples weights in the target domain (default is uniform)
reg (float, optional) – Regularization weight \(\lambda_r\), by default None (no reg., exact
OT)
c (array-like (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization.
If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\).
If \(\texttt{reg_type}='entropy'\), then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).
reg_type (str, optional) – Type of regularization \(R\) either “KL”, “L2”, “entropy”,
by default “KL”. a tuple of functions can be provided for general
solver (see cg). This is only used when reg!=None.
unbalanced (float or indexable object of length 1 or 2) – Marginal relaxation term.
If it is a scalar or an indexable object of length 1,
then the same relaxation is applied to both marginal relaxations.
The balanced OT can be recovered using \(unbalanced=float("inf")\).
For semi-relaxed case, use either
\(unbalanced=(float("inf"), scalar)\) or
\(unbalanced=(scalar, float("inf"))\).
If unbalanced is an array,
it must have the same backend as input arrays (a, b, M).
unbalanced_type (str, optional) – Type of unbalanced penalization function \(U\) either “KL”, “L2”,
“TV”, by default “KL”.
method (str, optional) – Method for solving the problem when multiple algorithms are available,
default None for automatic selection.
n_threads (int, optional) – Number of OMP threads for exact OT solver, by default 1
max_iter (int, optional) – Maximum number of iterations, by default None (default values in each solver)
plan_init (array_like, shape (dim_a, dim_b), optional) – Initialization of the OT plan for iterative methods, by default None
potentials_init ((array_like(dim_a,),array_like(dim_b,)), optional) – Initialization of the OT dual potentials for iterative methods, by default None
tol (float, optional) – Tolerance for solution precision, by default None (default values in each solver)
verbose (bool, optional) – Print information in the solver, by default False
grad (str, optional) – Type of gradient computation, either ‘autodiff’, ‘envelope’, ‘last_step’ or ‘detach’, used only for the
Sinkhorn solver. By default ‘autodiff’ provides gradients wrt all
outputs (plan, value, value_linear) but with an important memory cost.
‘envelope’ provides gradients only for the value and the other outputs are
detached. This is useful for memory saving when only the value is needed. ‘last_step’ provides
gradients only for the last iteration of the Sinkhorn solver, but for both the OT plan and the objective values.
‘detach’ does not compute the gradients for the Sinkhorn solver.
Returns:
res – Result of the optimization problem. The information can be obtained as follows:
res.plan : OT plan \(\mathbf{T}\)
res.potentials : OT dual potentials
res.value : Optimal value of the optimization problem
res.value_linear : Linear OT loss with the optimal OT plan
# default is "KL" regularization (reg_type="KL")
res = ot.solve(M, a, b, reg=1.0)
# or for original Sinkhorn paper formulation [2]
res = ot.solve(M, a, b, reg=1.0, reg_type='entropy')
# Use envelope theorem differentiation for memory saving
res = ot.solve(M, a, b, reg=1.0, grad='envelope')  # M, a, b are torch tensors
res.value.backward()  # only the value is differentiable
Note that by default the Sinkhorn solver uses automatic differentiation to
compute the gradients of the values and plan. This can be changed with the
grad parameter. The envelope mode computes the gradients only
for the value and the other outputs are detached. This is useful for
memory saving when only the gradient of value is needed.
Quadratic regularized OT [17] (when reg!=None and reg_type="L2"):
res = ot.solve(M, a, b, reg=1.0, reg_type='L2')  # quadratic regularization
Unbalanced OT (when unbalanced!=None):
# default is "KL" for both
res = ot.solve(M, a, b, reg=1.0, unbalanced=1.0)
# quadratic unbalanced OT with KL regularization
res = ot.solve(M, a, b, reg=1.0, unbalanced=1.0, unbalanced_type='L2')
# both quadratic
res = ot.solve(M, a, b, reg=1.0, reg_type='L2', unbalanced=1.0, unbalanced_type='L2')
The regularization is selected with reg (\(\lambda_r\)) and
reg_type. By default reg=None and there is no regularization. The
unbalanced marginal penalization can be selected with unbalanced
(\(\lambda_u\)) and unbalanced_type. By default unbalanced=None
and the function solves the exact optimal transport problem (respecting the
marginals).
Parameters:
Ca (array_like, shape (dim_a, dim_a)) – Cost matrix in the source domain
Cb (array_like, shape (dim_b, dim_b)) – Cost matrix in the target domain
M (array_like, shape (dim_a, dim_b), optional) – Linear cost matrix for Fused Gromov-Wasserstein (default is None).
a (array-like, shape (dim_a,), optional) – Samples weights in the source domain (default is uniform)
b (array-like, shape (dim_b,), optional) – Samples weights in the target domain (default is uniform)
loss (str, optional) – Type of loss function, either "L2" or "KL", by default "L2"
symmetric (bool, optional) – Use the symmetric version of the Gromov-Wasserstein problem; by default None,
which tests whether the matrices are symmetric, or True/False to avoid the test.
reg (float, optional) – Regularization weight \(\lambda_r\), by default None (no reg., exact
OT)
reg_type (str, optional) – Type of regularization \(R\), by default “entropy” (only used when
reg!=None)
alpha (float, optional) – Weight the quadratic term (alpha*Gromov) and the linear term
((1-alpha)*Wass) in the Fused Gromov-Wasserstein problem. Not used for
Gromov problem (when M is not provided). By default alpha=None
corresponds to alpha=1 for Gromov problem (M==None) and
alpha=0.5 for Fused Gromov-Wasserstein problem (M!=None)
unbalanced (float, optional) – Unbalanced penalization weight \(\lambda_u\), by default None
(balanced OT), Not implemented yet
unbalanced_type (str, optional) – Type of unbalanced penalization function \(U\) either “KL”, “semirelaxed”,
“partial”, by default “KL” but note that it is not implemented yet.
n_threads (int, optional) – Number of OMP threads for exact OT solver, by default 1
method (str, optional) – Method for solving the problem when multiple algorithms are available,
default None for automatic selection.
max_iter (int, optional) – Maximum number of iterations, by default None (default values in each
solver)
plan_init (array_like, shape (dim_a, dim_b), optional) – Initialization of the OT plan for iterative methods, by default None
tol (float, optional) – Tolerance for solution precision, by default None (default values in
each solver)
verbose (bool, optional) – Print information in the solver, by default False
Returns:
res – Result of the optimization problem. The information can be obtained as follows:
res.plan : OT plan \(\mathbf{T}\)
res.potentials : OT dual potentials
res.value : Optimal value of the optimization problem
res.value_linear : Linear OT loss with the optimal OT plan
res.value_quad : Quadratic (GW) part of the OT loss with the optimal OT plan
res = ot.solve_gromov(Ca, Cb, M)  # uniform weights, alpha=0.5 (default)
res = ot.solve_gromov(Ca, Cb, M, a=a, b=b, alpha=0.1)  # given weights and alpha
plan = res.plan  # FGW plan
loss_linear_term = res.value_linear  # Wasserstein part of the loss
loss_quad_term = res.value_quad  # Gromov part of the loss
loss = res.value  # FGW value
Regularized (Fused) Gromov-Wasserstein (GW) problem [12] (when reg!=None):
res = ot.solve_gromov(Ca, Cb, reg=1.0)  # GW entropy regularization (default)
res = ot.solve_gromov(Ca, Cb, M, a=a, b=b, reg=10, alpha=0.1)  # FGW with entropy
plan = res.plan  # FGW plan
loss_linear_term = res.value_linear  # Wasserstein part of the loss
loss_quad_term = res.value_quad  # Gromov part of the loss
loss = res.value  # FGW value (including regularization)
Semi-relaxed (Fused) Gromov-Wasserstein problem (when unbalanced='semirelaxed'):
res = ot.solve_gromov(Ca, Cb, unbalanced='semirelaxed')  # semirelaxed GW
res = ot.solve_gromov(Ca, Cb, unbalanced='semirelaxed', reg=1)  # entropic semirelaxed GW
res = ot.solve_gromov(Ca, Cb, M, unbalanced='semirelaxed', alpha=0.1)  # semirelaxed FGW
plan = res.plan  # FGW plan
right_marginal = res.marginal_b  # right marginal of the plan
Partial (Fused) Gromov-Wasserstein (GW) problem [29] (when unbalanced='partial'):
res = ot.solve_gromov(Ca, Cb, unbalanced_type='partial', unbalanced=0.8)  # partial GW with m=0.8
res = ot.solve_gromov(Ca, Cb, M, unbalanced_type='partial', unbalanced=0.8, alpha=0.5)  # partial FGW with m=0.8
where the cost matrix \(\mathbf{M}\) is computed from the samples in the
source and target domains such that \(M_{i,j} = d(x_i,y_j)\) where
\(d\) is a metric (by default the squared Euclidean distance).
The regularization is selected with reg (\(\lambda_r\)) and reg_type. By
default reg=None and there is no regularization. The unbalanced marginal
penalization can be selected with unbalanced (\(\lambda_u\)) and
unbalanced_type. By default unbalanced=None and the function
solves the exact optimal transport problem (respecting the marginals).
Parameters:
X_s (array-like, shape (n_samples_a, dim)) – samples in the source domain
X_t (array-like, shape (n_samples_b, dim)) – samples in the target domain
a (array-like, shape (dim_a,), optional) – Samples weights in the source domain (default is uniform)
b (array-like, shape (dim_b,), optional) – Samples weights in the target domain (default is uniform)
reg (float, optional) – Regularization weight \(\lambda_r\), by default None (no reg., exact
OT)
c (array-like (dim_a, dim_b), optional (default=None)) – Reference measure for the regularization.
If None, then use \(\mathbf{c} = \mathbf{a} \mathbf{b}^T\).
If \(\texttt{reg_type}='entropy'\), then \(\mathbf{c} = 1_{dim_a} 1_{dim_b}^T\).
reg_type (str, optional) – Type of regularization \(R\) either “KL”, “L2”, “entropy”, by default “KL”
unbalanced (float or indexable object of length 1 or 2) – Marginal relaxation term.
If it is a scalar or an indexable object of length 1,
then the same relaxation is applied to both marginal relaxations.
The balanced OT can be recovered using \(unbalanced=float("inf")\).
For semi-relaxed case, use either
\(unbalanced=(float("inf"), scalar)\) or
\(unbalanced=(scalar, float("inf"))\).
If unbalanced is an array,
it must have the same backend as input arrays (a, b, M).
unbalanced_type (str, optional) – Type of unbalanced penalization function \(U\) either “KL”, “L2”, “TV”, by default “KL”
lazy (bool, optional) – Return an OTResultLazy object to reduce the memory cost when True, by
default False
batch_size (int, optional) – Batch size for lazy solver, by default None (default values in each
solver)
method (str, optional) – Method for solving the problem, this can be used to select the solver
for unbalanced problems (see ot.solve), or to select a specific
large scale solver.
n_threads (int, optional) – Number of OMP threads for exact OT solver, by default 1
max_iter (int, optional) – Maximum number of iterations, by default None (default values in each solver)
plan_init (array_like, shape (dim_a, dim_b), optional) – Initialization of the OT plan for iterative methods, by default None
rank (int, optional) – Rank of the OT matrix for lazy solvers (method=’factored’), by default 100
scaling (float, optional) – Scaling factor for the epsilon scaling lazy solvers (method=’geomloss’), by default 0.95
potentials_init ((array_like(dim_a,),array_like(dim_b,)), optional) – Initialization of the OT dual potentials for iterative methods, by default None
tol (float, optional) – Tolerance for solution precision, by default None (default values in each solver)
verbose (bool, optional) – Print information in the solver, by default False
grad (str, optional) – Type of gradient computation, either ‘autodiff’ or ‘envelope’, used only for the
Sinkhorn solver. By default ‘autodiff’ provides gradients wrt all
outputs (plan, value, value_linear) but with an important memory cost.
‘envelope’ provides gradients only for the value and the other outputs are
detached. This is useful for memory saving when only the value is needed.
Returns:
res – Result of the optimization problem. The information can be obtained as follows:
res.plan : OT plan \(\mathbf{T}\)
res.potentials : OT dual potentials
res.value : Optimal value of the optimization problem
res.value_linear : Linear OT loss with the optimal OT plan
res.lazy_plan : Lazy OT plan (when lazy=True or lazy method)
# default is "KL" regularization (reg_type="KL")
res = ot.solve_sample(xa, xb, a, b, reg=1.0)
# or for original Sinkhorn paper formulation [2]
res = ot.solve_sample(xa, xb, a, b, reg=1.0, reg_type='entropy')
# lazy solver of memory complexity O(n)
res = ot.solve_sample(xa, xb, a, b, reg=1.0, lazy=True, batch_size=100)
# lazy OT plan
lazy_plan = res.lazy_plan
# Use envelope theorem differentiation for memory saving
res = ot.solve_sample(xa, xb, a, b, reg=1.0, grad='envelope')
res.value.backward()  # only the value is differentiable
Note that by default the Sinkhorn solver uses automatic differentiation to
compute the gradients of the values and plan. This can be changed with the
grad parameter. The envelope mode computes the gradients only
for the value and the other outputs are detached. This is useful for
memory saving when only the gradient of value is needed.
We also have a very efficient solver with compiled CPU/CUDA code using
geomloss/PyKeOps that can be used with the following code:
# automatic solver
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss')
# force O(n) memory efficient solver
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss_online')
# force pre-computed cost matrix
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss_tensorized')
# use multiscale solver
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss_multiscale')
# One can play with speed (small scaling factor) and precision (scaling close to 1)
res = ot.solve_sample(xa, xb, a, b, reg=1.0, method='geomloss', scaling=0.5)
Quadratic regularized OT [17] (when reg!=None and reg_type="L2"):
res = ot.solve_sample(xa, xb, a, b, reg=1.0, reg_type='L2')  # quadratic regularization
Unbalanced OT (when unbalanced!=None):
# default is "KL" for both
res = ot.solve_sample(xa, xb, a, b, reg=1.0, unbalanced=1.0)
# quadratic unbalanced OT with KL regularization
res = ot.solve_sample(xa, xb, a, b, reg=1.0, unbalanced=1.0, unbalanced_type='L2')
# both quadratic
res = ot.solve_sample(xa, xb, a, b, reg=1.0, reg_type='L2', unbalanced=1.0, unbalanced_type='L2')
where \(\mu\) is a uniformly weighted empirical distribution supported on r points,
\(\mu_a\) and \(\mu_b\) are the empirical measures associated
with the samples in the source and target domains, and \(W_2\) is the
Wasserstein distance. This problem is solved using exact OT solvers for
reg=None and the Sinkhorn solver for reg!=None. The solution provides
two transport plans that can be used to recover a low rank OT plan between
the two distributions.
res = ot.solve_sample(xa, xb, method='factored', rank=10)
# recover the lazy low rank plan
factored_solution_lazy = res.lazy_plan
# recover the full low rank plan
factored_solution = factored_solution_lazy[:]
Gaussian Bures-Wasserstein (when method='gaussian'):
The covariances and means are estimated from the data.
res = ot.solve_sample(xa, xb, method='gaussian')
# recover the squared Gaussian Bures-Wasserstein distance
BW_dist = res.value
Wasserstein 1d [1] (when method='1D'):
This method computes the Wasserstein distance between two 1d distributions
estimated from the empirical distributions. For multivariate data the
distances are computed independently for each dimension.
res = ot.solve_sample(xa, xb, method='1D')
# recover the squared Wasserstein distances
W_dists = res.value
Computes the 1 dimensional OT loss [15] between two (batched) empirical
distributions
It is formally the p-Wasserstein distance raised to the power p.
We do so in a vectorized way by first building the individual quantile functions then integrating them.
This function should be preferred to emd_1d whenever the backend is
different from numpy, and when gradients over
either sample positions or weights are required.
Parameters:
u_values (array-like, shape (n, ...)) – locations of the first empirical distribution
v_values (array-like, shape (m, ...)) – locations of the second empirical distribution
u_weights (array-like, shape (n, ...), optional) – weights of the first empirical distribution, if None then uniform weights are used
v_weights (array-like, shape (m, ...), optional) – weights of the second empirical distribution, if None then uniform weights are used
p (int, optional) – order of the ground metric used, should be at least 1 (see [2, Chap. 2]), default is 1
require_sort (bool, optional) – sort the distributions’ atom locations; if False they are assumed to have been sorted prior to being passed to
the function, default is True
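A worked sketch (uniform weights; shifting every atom by 0.5 gives \(W_2^2 = 0.25\)):

>>> import numpy as np
>>> import ot
>>> x_a = np.array([0.0, 1.0, 2.0])
>>> x_b = x_a + 0.5
>>> float(ot.wasserstein_1d(x_a, x_b, p=2))  # mean of squared gaps
0.25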
Computes the Wasserstein distance on the circle using either [45] for p=1 or
the binary search algorithm proposed in [44] otherwise.
Samples need to be in \(S^1\cong [0,1[\). If they are on \(\mathbb{R}\),
their values are taken modulo 1.
If the values are on \(S^1\subset\mathbb{R}^2\), first recover the angular coordinates,
e.g. using the atan2 function.
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Note
This function is backend-compatible and will work on arrays
from all compatible backends. But the algorithm uses the C++ CPU backend
which can lead to copy overhead on GPU arrays.
Uses the conditional gradient algorithm to solve the problem proposed
in [39].