API and modules

ot: solvers and utilities for optimal transport, including:

- Solvers for the original linear program OT problem
- Multi-lib backend for POT
- Bregman projections solvers for entropic regularized OT
- Smooth and sparse optimal transport solvers (KL and L2 reg.)
- Gromov-Wasserstein and Fused Gromov-Wasserstein solvers
- Generic solvers for regularized OT
- Domain adaptation with optimal transport
- Dimension reduction with OT
- Various useful functions
- Simple example datasets
- Functions for plotting OT matrices
- Stochastic solvers for regularized OT
- Regularized unbalanced OT solvers
- Regularization path OT solvers
- Partial OT solvers
- Sliced OT distances
- Weak optimal transport solvers
- Factored OT solvers (low-rank cost or OT plan)
Warning
The list of automatically imported submodules is as follows: ot.lp, ot.bregman, ot.optim, ot.utils, ot.datasets, ot.gromov, ot.smooth, ot.stochastic, ot.partial, ot.regpath, ot.unbalanced.
The following submodules are not imported due to additional dependencies:
- ot.dr: depends on pymanopt and autograd.
- ot.gpu: depends on cupy and a CUDA GPU.
- ot.plot: depends on matplotlib.
 ot.barycenter(A, M, reg, weights=None, method='sinkhorn', numItermax=10000, stopThr=0.0001, verbose=False, log=False, warn=True, **kwargs)[source]
Compute the entropic regularized Wasserstein barycenter of distributions \(\mathbf{A}\)
The function solves the following optimization problem:
\[\mathbf{a} = \mathop{\arg \min}_\mathbf{a} \quad \sum_i W_{reg}(\mathbf{a},\mathbf{a}_i)\]
where:
- \(W_{reg}(\cdot,\cdot)\) is the entropic regularized Wasserstein distance (see ot.bregman.sinkhorn()) if method is 'sinkhorn', 'sinkhorn_stabilized' or 'sinkhorn_log'
- \(\mathbf{a}_i\) are the training distributions in the columns of matrix \(\mathbf{A}\)
- reg and \(\mathbf{M}\) are respectively the regularization term and the cost matrix for OT
The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [3]
 Parameters
A (arraylike, shape (dim, n_hists)) – n_hists training distributions \(\mathbf{a}_i\) of size dim
M (arraylike, shape (dim, dim)) – loss matrix for OT
reg (float) – Regularization term > 0
method (str (optional)) – method used for the solver either ‘sinkhorn’ or ‘sinkhorn_stabilized’ or ‘sinkhorn_log’
weights (arraylike, shape (n_hists,)) – Weights of each histogram \(\mathbf{a}_i\) on the simplex (barycentric coordinates)
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.
 Returns
a ((dim,) arraylike) – Wasserstein barycenter
log (dict) – log dictionary returned only if log==True in parameters
References
 3
Benamou, J. D., Carlier, G., Cuturi, M., Nenna, L., & Peyré, G. (2015). Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2), A1111–A1138.
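The iterative Bregman projections scheme of [3] fits in a few lines. The sketch below is a plain-Python illustration under simplifying assumptions (dense square cost, no stopping criterion); the function and helper names are ours, not POT's vectorized, backend-aware implementation:

```python
import math

def barycenter_bregman(A, M, reg, weights, n_iter=200):
    """Entropic Wasserstein barycenter of the histograms stored in the
    columns of A, via iterative Bregman projections (Benamou et al. [3]).
    Illustrative pure-Python sketch of what ot.barycenter computes."""
    dim, n_hists = len(A), len(A[0])
    # Gibbs kernel of the cost matrix
    K = [[math.exp(-M[i][j] / reg) for j in range(dim)] for i in range(dim)]

    def mv(mat, vec):   # mat @ vec
        return [sum(mat[i][j] * vec[j] for j in range(dim)) for i in range(dim)]

    def mtv(mat, vec):  # mat.T @ vec
        return [sum(mat[j][i] * vec[j] for j in range(dim)) for i in range(dim)]

    v = [[1.0] * dim for _ in range(n_hists)]
    b = [1.0 / dim] * dim
    for _ in range(n_iter):
        # KL projection enforcing the fixed first marginals a_s
        Kv = [mv(K, v[s]) for s in range(n_hists)]
        u = [[A[i][s] / Kv[s][i] for i in range(dim)] for s in range(n_hists)]
        Ktu = [mtv(K, u[s]) for s in range(n_hists)]
        # the common second marginal is the weighted geometric mean
        b = [math.exp(sum(weights[s] * math.log(v[s][i] * Ktu[s][i])
                          for s in range(n_hists))) for i in range(dim)]
        v = [[b[i] / Ktu[s][i] for i in range(dim)] for s in range(n_hists)]
    return b
```

For two identical uniform histograms the barycenter is uniform; for two opposite Diracs on a line with squared cost, the mass concentrates (up to entropic blur) at the midpoint.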
 ot.barycenter_unbalanced(A, M, reg, reg_m, method='sinkhorn', weights=None, numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]
Compute the entropic unbalanced Wasserstein barycenter of \(\mathbf{A}\).
The function solves the following optimization problem with \(\mathbf{a}\)
\[\mathbf{a} = \mathop{\arg \min}_\mathbf{a} \quad \sum_i W_{u_{reg}}(\mathbf{a},\mathbf{a}_i)\]
where:
- \(W_{u_{reg}}(\cdot,\cdot)\) is the unbalanced entropic regularized Wasserstein distance (see ot.unbalanced.sinkhorn_unbalanced())
- \(\mathbf{a}_i\) are the training distributions in the columns of matrix \(\mathbf{A}\)
- reg and \(\mathbf{M}\) are respectively the regularization term and the cost matrix for OT
- reg_m is the marginal relaxation hyperparameter
The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10]
 Parameters
A (arraylike (dim, n_hists)) – n_hists training distributions \(\mathbf{a}_i\) of dimension dim
M (arraylike (dim, dim)) – ground metric matrix for OT.
reg (float) – Entropy regularization term > 0
reg_m (float) – Marginal relaxation term > 0
weights (arraylike (n_hists,) optional) – Weight of each distribution (barycentric coordinates). If None, uniform weights are used.
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (> 0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
 Returns
a ((dim,) arraylike) – Unbalanced Wasserstein barycenter
log (dict) – log dictionary returned only if log==True in parameters
References
 3
Benamou, J. D., Carlier, G., Cuturi, M., Nenna, L., & Peyré, G. (2015). Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2), A1111–A1138.
 10
Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.
 ot.dist(x1, x2=None, metric='sqeuclidean', p=2, w=None)[source]
Compute distance between samples in \(\mathbf{x_1}\) and \(\mathbf{x_2}\)
Note
This function is backend-compatible and will work on arrays from all compatible backends.
 Parameters
x1 (arraylike, shape (n1,d)) – matrix with n1 samples of size d
x2 (arraylike, shape (n2,d), optional) – matrix with n2 samples of size d (if None then \(\mathbf{x_2} = \mathbf{x_1}\))
metric (str | callable, optional) – ‘sqeuclidean’ or ‘euclidean’ on all backends. On numpy the function also accepts the following metrics from scipy.spatial.distance.cdist: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.
p (float, optional) – p-norm for the Minkowski and the Weighted Minkowski metrics. Default value is 2.
w (arraylike, rank 1) – Weights for the weighted metrics.
 Returns
M – distance matrix computed with given metric
 Return type
arraylike, shape (n1, n2)
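For intuition, the default ‘sqeuclidean’ metric simply fills M[i][j] with squared Euclidean distances. A tiny pure-Python equivalent (illustrative only; the function name is ours and POT's version is vectorized and backend-aware):

```python
def pairwise_sqeuclidean(x1, x2=None):
    """M[i][j] = ||x1[i] - x2[j]||^2, i.e. what ot.dist computes with
    its default metric. Illustrative pure-Python sketch."""
    if x2 is None:  # mirrors ot.dist's x2=None convention
        x2 = x1
    return [[sum((u - v) ** 2 for u, v in zip(p, q)) for q in x2] for p in x1]
```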
 ot.emd(a, b, M, numItermax=100000, log=False, center_dual=True, numThreads=1)[source]
Solves the Earth Mover’s distance problem and returns the OT matrix
\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the metric cost matrix
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Warning
Note that the \(\mathbf{M}\) matrix in numpy needs to be a C-order numpy.array in float64 format. It will be converted if not in this format
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
Note
This function will cast the computed transport plan to the data type of the provided input with the following priority: \(\mathbf{a}\), then \(\mathbf{b}\), then \(\mathbf{M}\) if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.
Uses the algorithm proposed in [1].
 Parameters
a ((ns,) arraylike, float) – Source histogram (uniform weight if empty list)
b ((nt,) arraylike, float) – Target histogram (uniform weight if empty list)
M ((ns,nt) arraylike, float) – Loss matrix (C-order array in numpy with type float64)
numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.
log (bool, optional (default=False)) – If True, returns a dictionary containing the cost and dual variables. Otherwise returns only the optimal transportation matrix.
center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function center_ot_dual.
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.
 Returns
gamma (arraylike, shape (ns, nt)) – Optimal transportation matrix for the given parameters
log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status
Examples
Simple example with obvious solution. The function emd accepts lists and performs automatic conversion to numpy arrays
>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> M = [[0., 1.], [1., 0.]]
>>> ot.emd(a, b, M)
array([[0.5, 0. ],
       [0. , 0.5]])
References
 1
Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). Displacement interpolation using Lagrangian mass transport. In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM.
See also
ot.bregman.sinkhorn
Entropic regularized OT
ot.optim.cg
General regularized OT
 ot.emd2(a, b, M, processes=1, numItermax=100000, log=False, return_matrix=False, center_dual=True, numThreads=1)[source]
Solves the Earth Mover’s distance problem and returns the loss
\[ \begin{align}\begin{aligned}\min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the metric cost matrix
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
Note
This function will cast the computed transport plan and transportation loss to the data type of the provided input with the following priority: \(\mathbf{a}\), then \(\mathbf{b}\), then \(\mathbf{M}\) if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.
Uses the algorithm proposed in [1].
 Parameters
a ((ns,) arraylike, float64) – Source histogram (uniform weight if empty list)
b ((nt,) arraylike, float64) – Target histogram (uniform weight if empty list)
M ((ns,nt) arraylike, float64) – Loss matrix (for numpy corder array with type float64)
processes (int, optional (default=1)) – Nb of processes used for multiple emd computation (deprecated)
numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.
log (boolean, optional (default=False)) – If True, returns a dictionary containing dual variables. Otherwise returns only the optimal transportation cost.
return_matrix (boolean, optional (default=False)) – If True, returns the optimal transportation matrix in the log.
center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function center_ot_dual.
numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.
 Returns
W (float, arraylike) – Optimal transportation loss for the given parameters
log (dict) – If input log is true, a dictionary containing dual variables and exit status
Examples
Simple example with obvious solution. The function emd2 accepts lists and performs automatic conversion to numpy arrays
>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> M = [[0., 1.], [1., 0.]]
>>> ot.emd2(a, b, M)
0.0
References
 1
Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). Displacement interpolation using Lagrangian mass transport. In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM.
See also
ot.bregman.sinkhorn
Entropic regularized OT
ot.optim.cg
General regularized OT
 ot.emd2_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1.0, dense=True, log=False)[source]
Solves the Earth Mover’s distance problem between 1d measures and returns the loss
\[ \begin{align}\begin{aligned}\gamma = arg\min_\gamma \sum_i \sum_j \gamma_{ij} d(x_a[i], x_b[j])\\s.t. \gamma 1 = a, \gamma^T 1= b, \gamma\geq 0\end{aligned}\end{align} \]where :
d is the metric
x_a and x_b are the samples
a and b are the sample weights
When ‘minkowski’ is used as a metric, \(d(x, y) = |x - y|^p\).
Uses the algorithm detailed in [1]
 Parameters
x_a ((ns,) or (ns, 1) ndarray, float64) – Source dirac locations (on the real line)
x_b ((nt,) or (nt, 1) ndarray, float64) – Target dirac locations (on the real line)
a ((ns,) ndarray, float64, optional) – Source histogram (default is uniform weight)
b ((nt,) ndarray, float64, optional) – Target histogram (default is uniform weight)
metric (str, optional (default='sqeuclidean')) – Metric to be used. Only strings listed in ot.dist() are accepted. Due to implementation details, this function runs faster when ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’ metrics are used.
p (float, optional (default=1.0)) – The p-norm to apply if metric=‘minkowski’
dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Only used if log is set to True. Due to implementation details, this function runs faster when dense is set to False.
log (boolean, optional (default=False)) – If True, returns a dictionary containing the transportation matrix. Otherwise returns only the loss.
 Returns
loss (float) – Cost associated to the optimal transportation
log (dict) – If input log is True, a dictionary containing the Optimal transportation matrix for the given parameters
Examples
Simple example with obvious solution. The function emd2_1d accepts lists and performs automatic conversion to numpy arrays
>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd2_1d(x_a, x_b, a, b)
0.5
>>> ot.emd2_1d(x_a, x_b)
0.5
References
 1
Peyré, G., & Cuturi, M. (2018). “Computational Optimal Transport”.
See also
ot.lp.emd2
EMD for multidimensional distributions
ot.lp.emd_1d
EMD for 1d distributions (returns the transportation matrix instead of the cost)
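The speed of the 1d solvers comes from the fact that on the real line the optimal plan is monotone: sorting both supports and sweeping the cumulative weights (a north-west corner rule) yields the cost directly. A hedged pure-Python sketch of this idea (function name and structure are ours, not POT's internals):

```python
def emd_1d_cost(x_a, x_b, a=None, b=None, metric=lambda u, v: (u - v) ** 2):
    """1-D OT cost: sort both supports, then sweep the cumulative
    weights, moving as much mass as possible at each step. Optimal on
    the line for costs that are convex in |x - y|. Illustrative sketch."""
    n, m = len(x_a), len(x_b)
    a = a or [1.0 / n] * n
    b = b or [1.0 / m] * m
    # sort support points together with their weights
    xa, wa = zip(*sorted(zip(x_a, a)))
    xb, wb = zip(*sorted(zip(x_b, b)))
    i = j = 0
    ra, rb = wa[0], wb[0]  # remaining mass at the current atoms
    cost = 0.0
    while i < n and j < m:
        m_ij = min(ra, rb)  # move as much mass as both atoms allow
        cost += m_ij * metric(xa[i], xb[j])
        ra -= m_ij
        rb -= m_ij
        if ra <= 1e-12:
            i += 1
            ra = wa[i] if i < n else 0.0
        if rb <= 1e-12:
            j += 1
            rb = wb[j] if j < m else 0.0
    return cost
```

On the doctest data above (x_a = [2., 0.], x_b = [0., 3.], uniform weights), the sorted matching pairs 0 with 0 and 2 with 3, giving a cost of 0.5, as in the example.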
 ot.emd_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1.0, dense=True, log=False)[source]
Solves the Earth Mover’s distance problem between 1d measures and returns the OT matrix
\[ \begin{align}\begin{aligned}\gamma = arg\min_\gamma \sum_i \sum_j \gamma_{ij} d(x_a[i], x_b[j])\\s.t. \gamma 1 = a, \gamma^T 1= b, \gamma\geq 0\end{aligned}\end{align} \]where :
d is the metric
x_a and x_b are the samples
a and b are the sample weights
When ‘minkowski’ is used as a metric, \(d(x, y) = |x - y|^p\).
Uses the algorithm detailed in [1]
 Parameters
x_a ((ns,) or (ns, 1) ndarray, float64) – Source dirac locations (on the real line)
x_b ((nt,) or (nt, 1) ndarray, float64) – Target dirac locations (on the real line)
a ((ns,) ndarray, float64, optional) – Source histogram (default is uniform weight)
b ((nt,) ndarray, float64, optional) – Target histogram (default is uniform weight)
metric (str, optional (default='sqeuclidean')) – Metric to be used. Only strings listed in ot.dist() are accepted. Due to implementation details, this function runs faster when ‘sqeuclidean’, ‘cityblock’, or ‘euclidean’ metrics are used.
p (float, optional (default=1.0)) – The p-norm to apply if metric=‘minkowski’
dense (boolean, optional (default=True)) – If True, returns \(\gamma\) as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Due to implementation details, this function runs faster when ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’ metrics are used.
log (boolean, optional (default=False)) – If True, returns a dictionary containing the cost. Otherwise returns only the optimal transportation matrix.
 Returns
gamma ((ns, nt) ndarray) – Optimal transportation matrix for the given parameters
log (dict) – If input log is True, a dictionary containing the cost
Examples
Simple example with obvious solution. The function emd_1d accepts lists and performs automatic conversion to numpy arrays
>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd_1d(x_a, x_b, a, b)
array([[0. , 0.5],
       [0.5, 0. ]])
>>> ot.emd_1d(x_a, x_b)
array([[0. , 0.5],
       [0.5, 0. ]])
References
 1
Peyré, G., & Cuturi, M. (2018). “Computational Optimal Transport”.
See also
ot.lp.emd
EMD for multidimensional distributions
ot.lp.emd2_1d
EMD for 1d distributions (returns cost instead of the transportation matrix)
 ot.factored_optimal_transport(Xa, Xb, a=None, b=None, reg=0.0, r=100, X0=None, stopThr=1e-07, numItermax=100, verbose=False, log=False, **kwargs)[source]
Solves the factored OT problem and returns the OT plans and the intermediate distribution
This function solves the following OT problem [40]
\[\mathop{\arg \min}_\mu \quad W_2^2(\mu_a,\mu)+ W_2^2(\mu,\mu_b)\]where :
\(\mu_a\) and \(\mu_b\) are empirical distributions.
\(\mu\) is an empirical distribution with r samples
and returns the two OT plans between the input distributions and \(\mu\).
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
Uses the conditional gradient algorithm to solve the problem proposed in [40].
 Parameters
Xa ((ns,d) arraylike, float) – Source samples
Xb ((nt,d) arraylike, float) – Target samples
a ((ns,) arraylike, float) – Source histogram (uniform weight if empty list)
b ((nt,) arraylike, float) – Target histogram (uniform weight if empty list)
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on the relative variation (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
 Returns
Ga (arraylike, shape (ns, r)) – Optimal transportation matrix between source and the intermediate distribution
Gb (arraylike, shape (r, nt)) – Optimal transportation matrix between the intermediate and target distribution
X (arraylike, shape (r, d)) – Support of the intermediate distribution
log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status
References
 40
Forrow, A., Hütter, J. C., Nitzan, M., Rigollet, P., Schiebinger, G., & Weed, J. (2019, April). Statistical optimal transport via factored couplings. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 24542465). PMLR.
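The two returned plans can be glued through the intermediate measure to recover a full (low-rank) coupling between source and target. A sketch of this composition, assuming the intermediate distribution carries weights mu of length r (the function and variable names here are illustrative, not part of the API):

```python
def compose_plans(Ga, Gb, mu):
    """Glue plans Ga (ns x r) and Gb (r x nt) through the intermediate
    measure mu (length r): G[i][j] = sum_z Ga[i][z] * Gb[z][j] / mu[z].
    Illustrative sketch of assembling a factored coupling."""
    ns, r, nt = len(Ga), len(mu), len(Gb[0])
    return [[sum(Ga[i][z] * Gb[z][j] / mu[z] for z in range(r))
             for j in range(nt)] for i in range(ns)]
```

The composed plan inherits the source marginal of Ga and the target marginal of Gb, while its rank is bounded by r.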
 ot.fused_gromov_wasserstein(M, C1, C2, p, q, loss_fun='square_loss', alpha=0.5, armijo=False, G0=None, log=False, **kwargs)[source]
Computes the FGW transport between two graphs (see [24])
\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad (1  \alpha) \langle \gamma, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}\\ \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{\gamma} &\geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the (ns, nt) metric cost matrix
\(\mathbf{p}\) and \(\mathbf{q}\) are source and target weights (sum to 1)
L is a loss function to account for the misfit between the similarity matrices
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
The algorithm used for solving the problem is conditional gradient as discussed in [24]
 Parameters
M (arraylike, shape (ns, nt)) – Metric cost matrix between features across domains
C1 (arraylike, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space
C2 (arraylike, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space
p (arraylike, shape (ns,)) – Distribution in the source space
q (arraylike, shape (nt,)) – Distribution in the target space
loss_fun (str, optional) – Loss function used for the solver
alpha (float, optional) – Tradeoff parameter (0 < alpha < 1)
armijo (bool, optional) – If True, the step of the line search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.
G0 (arraylike, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
log (bool, optional) – record log if True
**kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver
 Returns
gamma (arraylike, shape (ns, nt)) – Optimal transportation matrix for the given parameters.
log (dict) – Log dictionary returned only if log==True in parameters.
References
 24
Vayer Titouan, Chapel Laetitia, Flamary Rémi, Tavenard Romain and Courty Nicolas “Optimal Transport for structured data with application on graphs”, International Conference on Machine Learning (ICML). 2019.
 ot.fused_gromov_wasserstein2(M, C1, C2, p, q, loss_fun='square_loss', alpha=0.5, armijo=False, G0=None, log=False, **kwargs)[source]
Computes the FGW distance between two graphs (see [24])
\[ \begin{align}\begin{aligned}\min_\gamma \quad (1  \alpha) \langle \gamma, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}\\ \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{\gamma} &\geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the (ns, nt) metric cost matrix
\(\mathbf{p}\) and \(\mathbf{q}\) are source and target weights (sum to 1)
L is a loss function to account for the misfit between the similarity matrices
The algorithm used for solving the problem is conditional gradient as discussed in [24]
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
Note that when using backends, this loss function is differentiable wrt the matrices and weights for quadratic loss using the gradients from [38].
 Parameters
M (arraylike, shape (ns, nt)) – Metric cost matrix between features across domains
C1 (arraylike, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space.
C2 (arraylike, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space.
p (arraylike, shape (ns,)) – Distribution in the source space.
q (arraylike, shape (nt,)) – Distribution in the target space.
loss_fun (str, optional) – Loss function used for the solver.
alpha (float, optional) – Tradeoff parameter (0 < alpha < 1)
armijo (bool, optional) – If True, the step of the line search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.
G0 (arraylike, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
log (bool, optional) – Record log if True.
**kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.
 Returns
fgwdistance (float) – Fused Gromov-Wasserstein distance for the given parameters.
log (dict) – Log dictionary returned only if log==True in parameters.
References
 24
Vayer Titouan, Chapel Laetitia, Flamary Rémi, Tavenard Romain and Courty Nicolas “Optimal Transport for structured data with application on graphs” International Conference on Machine Learning (ICML). 2019.
 38
C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.
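Given a candidate coupling, the FGW objective above can be evaluated directly; the brute-force sketch below (O(ns²·nt²), illustrative only, names are ours) makes the (1 - α) linear term and the α quadratic term explicit. At alpha=1 it reduces to the pure Gromov-Wasserstein objective, and at alpha=0 to the linear OT cost:

```python
def fgw_objective(M, C1, C2, T, alpha=0.5, loss=lambda x, y: (x - y) ** 2):
    """Evaluate the FGW objective for a given coupling T:
    (1 - alpha) * <T, M>_F
      + alpha * sum_{i,j,k,l} L(C1[i][k], C2[j][l]) * T[i][j] * T[k][l].
    Brute-force illustrative sketch."""
    ns, nt = len(C1), len(C2)
    linear = sum(T[i][j] * M[i][j] for i in range(ns) for j in range(nt))
    quad = sum(loss(C1[i][k], C2[j][l]) * T[i][j] * T[k][l]
               for i in range(ns) for j in range(nt)
               for k in range(ns) for l in range(nt))
    return (1 - alpha) * linear + alpha * quad
```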
 ot.gromov_barycenters(N, Cs, ps, p, lambdas, loss_fun, max_iter=1000, tol=1e-09, verbose=False, log=False, init_C=None, random_state=None)[source]
Returns the Gromov-Wasserstein barycenters of S measured similarity matrices \((\mathbf{C}_s)_{1 \leq s \leq S}\)
The function solves the following optimization problem with block coordinate descent:
\[\mathbf{C} = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \mathrm{GW}(\mathbf{C}, \mathbf{C}_s, \mathbf{p}, \mathbf{p}_s)\]Where :
\(\mathbf{C}_s\): metric cost matrix
\(\mathbf{p}_s\): distribution
 Parameters
N (int) – Size of the targeted barycenter
Cs (list of S arraylike of shape (ns, ns)) – Metric cost matrices
ps (list of S arraylike of shape (ns,)) – Sample weights in the S spaces
p (arraylike, shape (N,)) – Weights in the targeted barycenter
lambdas (list of float) – List of the S spaces’ weights
loss_fun (callable) – tensormatrix multiplication function based on specific loss function
update (callable) – function(\(\mathbf{p}\), lambdas, \(\mathbf{T}\), \(\mathbf{Cs}\)) that updates \(\mathbf{C}\) according to a specific Kernel with the S \(\mathbf{T}_s\) couplings calculated at each iteration
max_iter (int, optional) – Max number of iterations
tol (float, optional) – Stop threshold on error (>0).
verbose (bool, optional) – Print information along iterations.
log (bool, optional) – Record log if True.
init_C (bool | arraylike, shape (N, N)) – Random initial value for the \(\mathbf{C}\) matrix provided by user.
random_state (int or RandomState instance, optional) – Fix the seed for reproducibility
 Returns
C (arraylike, shape (N, N)) – Similarity matrix in the barycenter space (permuted arbitrarily)
log (dict) – Log dictionary of errors during iterations. Returned only if log=True in parameters.
References
 12
Gabriel Peyré, Marco Cuturi, and Justin Solomon, “GromovWasserstein averaging of kernel and distance matrices.” International Conference on Machine Learning (ICML). 2016.
 ot.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss', log=False, armijo=False, G0=None, **kwargs)[source]
Returns the Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\)
The function solves the following optimization problem:
\[\mathbf{GW} = \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\]Where :
\(\mathbf{C_1}\): Metric cost matrix in the source space
\(\mathbf{C_2}\): Metric cost matrix in the target space
\(\mathbf{p}\): distribution in the source space
\(\mathbf{q}\): distribution in the target space
L: loss function to account for the misfit between the similarity matrices
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
 Parameters
C1 (arraylike, shape (ns, ns)) – Metric cost matrix in the source space
C2 (arraylike, shape (nt, nt)) – Metric cost matrix in the target space
p (arraylike, shape (ns,)) – Distribution in the source space
q (arraylike, shape (nt,)) – Distribution in the target space
loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’
max_iter (int, optional) – Max number of iterations
tol (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
armijo (bool, optional) – If True, the step of the line search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.
G0 (arraylike, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
**kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver
 Returns
T (arraylike, shape (ns, nt)) –
Coupling between the two spaces that minimizes:
\(\sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\)
log (dict) – Convergence information and loss.
References
 12
Gabriel Peyré, Marco Cuturi, and Justin Solomon, “GromovWasserstein averaging of kernel and distance matrices.” International Conference on Machine Learning (ICML). 2016.
 13
Mémoli, Facundo. Gromov–Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics 11.4 (2011): 417–487.
 ot.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss', log=False, armijo=False, G0=None, **kwargs)[source]
Returns the Gromov-Wasserstein discrepancy between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\)
The function solves the following optimization problem:
\[GW = \min_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\]Where :
\(\mathbf{C_1}\): Metric cost matrix in the source space
\(\mathbf{C_2}\): Metric cost matrix in the target space
\(\mathbf{p}\): distribution in the source space
\(\mathbf{q}\): distribution in the target space
L: loss function to account for the misfit between the similarity matrices
Note that when using backends, this loss function is differentiable wrt the matrices and weights for quadratic loss using the gradients from [38].
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
 Parameters
C1 (arraylike, shape (ns, ns)) – Metric cost matrix in the source space
C2 (arraylike, shape (nt, nt)) – Metric cost matrix in the target space
p (arraylike, shape (ns,)) – Distribution in the source space.
q (arraylike, shape (nt,)) – Distribution in the target space.
loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’
max_iter (int, optional) – Max number of iterations
tol (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
armijo (bool, optional) – If True, the step of the line search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.
G0 (arraylike, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
 Returns
gw_dist (float) – GromovWasserstein distance
log (dict) – Convergence information and coupling matrix.
References
 12
Gabriel Peyré, Marco Cuturi, and Justin Solomon, “GromovWasserstein averaging of kernel and distance matrices.” International Conference on Machine Learning (ICML). 2016.
 13
Mémoli, Facundo. Gromov–Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics 11.4 (2011): 417–487.
 38
C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.
 ot.max_sliced_wasserstein_distance(X_s, X_t, a=None, b=None, n_projections=50, p=2, projections=None, seed=None, log=False)[source]
Computes a Monte-Carlo approximation of the max p-Sliced Wasserstein distance
\[\mathcal{MaxSWD}_p(\mu, \nu) = \underset{\theta \in \mathbb{S}^{d-1}}{\max} [\mathcal{W}_p^p(\theta_\# \mu, \theta_\# \nu)]^{\frac{1}{p}}\]
where:
- \(\theta_\# \mu\) stands for the push-forward of \(\mu\) by the projection \(\mathbb{R}^d \ni X \mapsto \langle \theta, X \rangle\)
 Parameters
X_s (ndarray, shape (n_samples_a, dim)) – samples in the source domain
X_t (ndarray, shape (n_samples_b, dim)) – samples in the target domain
a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain
b (ndarray, shape (n_samples_b,), optional) – samples weights in the target domain
n_projections (int, optional) – Number of projections used for the Monte-Carlo approximation
p (float, optional) – Power p used for computing the sliced Wasserstein
projections (shape (dim, n_projections), optional) – Projection matrix (n_projections and seed are not used in this case)
seed (int or RandomState or None, optional) – Seed used for random number generator
log (bool, optional) – if True, max_sliced_wasserstein_distance returns the projections used and their associated EMD.
 Returns
cost (float) – Sliced Wasserstein Cost
log (dict, optional) – log dictionary returned only if log==True in parameters
Examples
>>> n_samples_a = 20
>>> X = np.random.normal(0., 1., (n_samples_a, 5))
>>> max_sliced_wasserstein_distance(X, X, seed=0)
0.0
References
 35
Deshpande, I., Hu, Y. T., Sun, R., Pyrros, A., Siddiqui, N., Koyejo, S., … & Schwing, A. G. (2019). Max-sliced Wasserstein distance and its use for GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10648-10656).
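For two uniform empirical distributions with the same number of samples, the max-sliced distance above can be sketched in a few lines of NumPy: project on random unit directions, get the closed-form 1D \(W_p\) by sorting both projections, and keep the worst direction (illustrative only; ot.max_sliced_wasserstein_distance handles general weights and sample counts, and the helper name is made up):

```python
import numpy as np

def max_sliced_w2(X_s, X_t, n_projections=50, seed=0):
    """Monte-Carlo sketch of the max-sliced 2-Wasserstein distance for
    equal-size uniform samples."""
    rng = np.random.default_rng(seed)
    d = X_s.shape[1]
    # Random directions on the unit sphere S^{d-1}
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    best = 0.0
    for t in theta:
        u = np.sort(X_s @ t)  # sorting matches quantiles in 1D
        v = np.sort(X_t @ t)
        w2 = np.mean((u - v) ** 2) ** 0.5  # 1D W_2 between uniform samples
        best = max(best, w2)
    return best

X = np.random.default_rng(1).normal(size=(20, 5))
print(max_sliced_w2(X, X))  # 0.0: identical samples in every direction
```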
 ot.sinkhorn(a, b, M, reg, method='sinkhorn', numItermax=1000, stopThr=1e-09, verbose=False, log=False, warn=True, **kwargs)[source]
Solve the entropic regularization optimal transport problem and return the OT matrix
The function solves the following optimization problem:
\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma)\\s.t. \ \gamma \mathbf{1} &= \mathbf{a}\\ \gamma^T \mathbf{1} &= \mathbf{b}\\ \gamma &\geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\Omega\) is the entropic regularization term \(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (histograms, both sum to 1)
Note
This function is backend-compatible and will work on arrays from all compatible backends.
The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [2]
Choosing a Sinkhorn solver
By default, and when using a regularization parameter that is not too small, the default sinkhorn solver should be enough. If you need to use a small regularization to get sharper OT matrices, you should use the ot.bregman.sinkhorn_stabilized() solver, which will avoid numerical errors. This last solver can be very slow in practice and might not even converge to a reasonable OT matrix in a finite time. This is why ot.bregman.sinkhorn_epsilon_scaling(), which relies on iterating the value of the regularization (and using warm starts), sometimes leads to better solutions. Note that the greedy version of the Sinkhorn, ot.bregman.greenkhorn(), can also lead to a speedup, and the screening version, ot.bregman.screenkhorn(), aims at providing a fast approximation of the Sinkhorn problem. For use on GPU and for gradient computation with a small number of iterations we strongly recommend the ot.bregman.sinkhorn_log() solver, which will not need to check for numerical problems.
 Parameters
a (array-like, shape (dim_a,)) – samples weights in the source domain
b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – samples in the target domain; compute sinkhorn with multiple targets and fixed \(\mathbf{M}\) if \(\mathbf{b}\) is a matrix (return OT loss + dual variables in log)
M (array-like, shape (dim_a, dim_b)) – loss matrix
reg (float) – Regularization term >0
method (str) – method used for the solver: either ‘sinkhorn’, ‘sinkhorn_log’, ‘greenkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_epsilon_scaling’; see those functions for specific parameters
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.
 Returns
gamma (array-like, shape (dim_a, dim_b)) – Optimal transportation matrix for the given parameters
log (dict) – log dictionary returned only if log==True in parameters
Examples
>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> M = [[0., 1.], [1., 0.]]
>>> ot.sinkhorn(a, b, M, 1)
array([[0.36552929, 0.13447071],
       [0.13447071, 0.36552929]])
References
 2
M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013
 9
Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.
 10
Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.
 34
Feydy, J., Séjourné, T., Vialard, F. X., Amari, S. I., Trouvé, A., & Peyré, G. (2019, April). Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2681-2690). PMLR.
See also
ot.lp.emd
Unregularized OT
ot.optim.cg
General regularized OT
ot.bregman.sinkhorn_knopp
Classic Sinkhorn [2]
ot.bregman.sinkhorn_stabilized
ot.bregman.sinkhorn_epsilon_scaling
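The Sinkhorn-Knopp iteration behind the solver above alternates scalings of a Gibbs kernel so that \(\gamma = \mathrm{diag}(u) K \mathrm{diag}(v)\) matches both marginals. A minimal NumPy sketch (no stabilization, logging or convergence check; use ot.sinkhorn in practice, and the helper name here is made up):

```python
import numpy as np

def sinkhorn_knopp(a, b, M, reg, num_iter=1000):
    """Minimal Sinkhorn-Knopp sketch for the entropic OT problem above."""
    a, b, M = map(np.asarray, (a, b, M))
    K = np.exp(-M / reg)       # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(num_iter):  # alternating marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # gamma = diag(u) K diag(v)

G = sinkhorn_knopp([.5, .5], [.5, .5], [[0., 1.], [1., 0.]], reg=1.)
print(G)  # ~[[0.3655, 0.1345], [0.1345, 0.3655]], as in the doctest above
```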
 ot.sinkhorn2(a, b, M, reg, method='sinkhorn', numItermax=1000, stopThr=1e-09, verbose=False, log=False, warn=False, **kwargs)[source]
Solve the entropic regularization optimal transport problem and return the loss
The function solves the following optimization problem:
\[ \begin{align}\begin{aligned}W = \min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma)\\s.t. \ \gamma \mathbf{1} &= \mathbf{a}\\ \gamma^T \mathbf{1} &= \mathbf{b}\\ \gamma &\geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\Omega\) is the entropic regularization term \(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (histograms, both sum to 1)
Note
This function is backend-compatible and will work on arrays from all compatible backends.
The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [2]
Choosing a Sinkhorn solver
By default, and when using a regularization parameter that is not too small, the default sinkhorn solver should be enough. If you need to use a small regularization to get sharper OT matrices, you should use the ot.bregman.sinkhorn_log() solver, which will avoid numerical errors. This last solver can be very slow in practice and might not even converge to a reasonable OT matrix in a finite time. This is why ot.bregman.sinkhorn_epsilon_scaling(), which relies on iterating the value of the regularization (and using warm starts), sometimes leads to better solutions. Note that the greedy version of the Sinkhorn, ot.bregman.greenkhorn(), can also lead to a speedup, and the screening version, ot.bregman.screenkhorn(), aims at providing a fast approximation of the Sinkhorn problem. For use on GPU and for gradient computation with a small number of iterations we strongly recommend the ot.bregman.sinkhorn_log() solver, which will not need to check for numerical problems.
 Parameters
a (array-like, shape (dim_a,)) – samples weights in the source domain
b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – samples in the target domain; compute sinkhorn with multiple targets and fixed \(\mathbf{M}\) if \(\mathbf{b}\) is a matrix (return OT loss + dual variables in log)
M (array-like, shape (dim_a, dim_b)) – loss matrix
reg (float) – Regularization term >0
method (str) – method used for the solver: either ‘sinkhorn’, ‘sinkhorn_log’ or ‘sinkhorn_stabilized’; see those functions for specific parameters
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.
 Returns
W (float or array-like, shape (n_hists,)) – Optimal transportation loss for the given parameters
log (dict) – log dictionary returned only if log==True in parameters
Examples
>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> M = [[0., 1.], [1., 0.]]
>>> ot.sinkhorn2(a, b, M, 1)
0.26894142136999516
References
 2
M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013
 9
Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.
 10
Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.
 21
Altschuler J., Weed J., Rigollet P. : Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration, Advances in Neural Information Processing Systems (NIPS) 31, 2017
 34
Feydy, J., Séjourné, T., Vialard, F. X., Amari, S. I., Trouvé, A., & Peyré, G. (2019, April). Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2681-2690). PMLR.
See also
ot.lp.emd
Unregularized OT
ot.optim.cg
General regularized OT
ot.bregman.sinkhorn_knopp
Classic Sinkhorn [2]
ot.bregman.greenkhorn
Greenkhorn [21]
ot.bregman.sinkhorn_stabilized
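As the doctest above suggests, the value returned by ot.sinkhorn2 corresponds to the linear transport cost \(\langle \gamma, \mathbf{M} \rangle_F\) of the entropic-regularized plan (an assumption based on that doctest; the loss is computed here from a bare Sinkhorn iteration, use ot.sinkhorn2 in practice):

```python
import numpy as np

# Recover the sinkhorn2 doctest value from the regularized plan itself:
# compute gamma with plain Sinkhorn-Knopp scalings, then <gamma, M>_F.
a = b = np.array([.5, .5])
M = np.array([[0., 1.], [1., 0.]])
K = np.exp(-M / 1.)  # reg = 1
u = np.ones(2)
for _ in range(1000):
    v = b / (K.T @ u)
    u = a / (K @ v)
gamma = u[:, None] * K * v[None, :]
print(np.sum(gamma * M))  # ~0.26894142..., matching ot.sinkhorn2
```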
 ot.sinkhorn_lpl1_mm(a, labels_a, b, M, reg, eta=0.1, numItermax=10, numInnerItermax=200, stopInnerThr=1e-09, verbose=False, log=False)[source]
Solve the entropic regularization optimal transport problem with non-convex group lasso regularization
The function solves the following optimization problem:
\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg} \cdot \Omega_e(\gamma) + \eta \ \Omega_g(\gamma)\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the (ns, nt) metric cost matrix
\(\Omega_e\) is the entropic regularization term \(\Omega_e (\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\Omega_g\) is the group lasso regularization term \(\Omega_g(\gamma)=\sum_{i,c} \|\gamma_{i,\mathcal{I}_c}\|^{1/2}_1\) where \(\mathcal{I}_c\) are the indices of samples from class c in the source domain.
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (sum to 1)
The algorithm used for solving the problem is the generalized conditional gradient as proposed in [5, 7].
 Parameters
a (array-like (ns,)) – samples weights in the source domain
labels_a (array-like (ns,)) – labels of samples in the source domain
b (array-like (nt,)) – samples weights in the target domain
M (array-like (ns,nt)) – loss matrix
reg (float) – Regularization term for entropic regularization >0
eta (float, optional) – Regularization term for group lasso regularization >0
numItermax (int, optional) – Max number of iterations
numInnerItermax (int, optional) – Max number of iterations (inner sinkhorn solver)
stopInnerThr (float, optional) – Stop threshold on error (inner sinkhorn solver) (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
 Returns
gamma ((ns, nt) array-like) – Optimal transportation matrix for the given parameters
log (dict) – log dictionary returned only if log==True in parameters
References
 5
N. Courty; R. Flamary; D. Tuia; A. Rakotomamonjy, “Optimal Transport for Domain Adaptation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1-1
 7
Rakotomamonjy, A., Flamary, R., & Courty, N. (2015). Generalized conditional gradient: analysis of convergence and applications. arXiv preprint arXiv:1510.06567.
See also
ot.lp.emd
Unregularized OT
ot.bregman.sinkhorn
Entropic regularized OT
ot.optim.cg
General regularized OT
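The non-convex group-lasso term \(\Omega_g\) can be evaluated in a few lines of NumPy: for each source class, take the \(\ell_1\) mass the class sends to each target point and apply the concave square root, which favors plans where each target point receives mass from a single class (a sketch of the penalty only, under that reading of the formula above; the helper name is made up):

```python
import numpy as np

def group_lasso_penalty(gamma, labels_a, p=0.5):
    """Sketch of Omega_g: for each source class c and each target point,
    the l1 norm of the class block of the coupling raised to the power p."""
    total = 0.0
    for c in np.unique(labels_a):
        block = gamma[labels_a == c, :]          # rows of class c
        total += np.sum(block.sum(axis=0) ** p)  # sum_j ||gamma_{I_c, j}||_1^p
    return total

# A plan that keeps classes separated scores lower than a mixed one
labels = np.array([0, 0, 1, 1])
sep = np.kron(np.eye(2), np.full((2, 2), .125))  # class-pure columns
mix = np.full((4, 4), 1 / 16)                    # every class everywhere
print(group_lasso_penalty(sep, labels),
      group_lasso_penalty(mix, labels))  # ~2.0 vs ~2.83
```

The concavity of the square root is what makes the penalty non-convex and sparsity-promoting across class groups.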
 ot.sinkhorn_unbalanced(a, b, M, reg, reg_m, method='sinkhorn', numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]
Solve the unbalanced entropic regularization optimal transport problem and return the OT plan
The function solves the following optimization problem:
\[ \begin{align}\begin{aligned}W = \min_\gamma \ \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma \mathbf{1}, \mathbf{a}) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma^T \mathbf{1}, \mathbf{b})\\s.t. \gamma \geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\Omega\) is the entropic regularization term, \(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target unbalanced distributions
KL is the KullbackLeibler divergence
The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]
 Parameters
a (array-like (dim_a,)) – Unnormalized histogram of dimension dim_a
b (array-like (dim_b,) or array-like (dim_b, n_hists)) – One or multiple unnormalized histograms of dimension dim_b. If many, compute all the OT distances \((\mathbf{a}, \mathbf{b}_i)_i\)
M (array-like (dim_a, dim_b)) – loss matrix
reg (float) – Entropy regularization term > 0
reg_m (float) – Marginal relaxation term > 0
method (str) – method used for the solver: either ‘sinkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_reg_scaling’; see those functions for specific parameters
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
 Returns
if n_hists == 1 –
 gamma ((dim_a, dim_b) array-like)
Optimal transportation matrix for the given parameters
 log (dict)
log dictionary returned only if log is True
else –
 ot_distance ((n_hists,) array-like)
the OT distance between \(\mathbf{a}\) and each of the histograms \(\mathbf{b}_i\)
 log (dict)
log dictionary returned only if log is True
Examples
>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> M = [[0., 1.], [1., 0.]]
>>> ot.sinkhorn_unbalanced(a, b, M, 1, 1)
array([[0.51122823, 0.18807035],
       [0.18807035, 0.51122823]])
References
 2
M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013
 9
Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.
 10
Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.
 25
Frogner C., Zhang C., Mobahi H., Araya-Polo M., Poggio T. : Learning with a Wasserstein Loss, Advances in Neural Information Processing Systems (NIPS) 2015
See also
ot.unbalanced.sinkhorn_knopp_unbalanced
Unbalanced Classic Sinkhorn [10]
ot.unbalanced.sinkhorn_stabilized_unbalanced
Unbalanced Stabilized sinkhorn [9, 10]
ot.unbalanced.sinkhorn_reg_scaling_unbalanced
Unbalanced Sinkhorn with epsilon scaling [9, 10]
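The generalized Sinkhorn-Knopp iteration for the unbalanced problem differs from the balanced one only by a damping exponent fi = reg_m / (reg_m + reg) on each scaling update, which trades exact marginal projection for the KL marginal relaxation. A minimal NumPy sketch (use ot.sinkhorn_unbalanced in practice; the helper name is made up):

```python
import numpy as np

def sinkhorn_unbalanced_knopp(a, b, M, reg, reg_m, num_iter=1000):
    """Minimal sketch of the generalized Sinkhorn-Knopp iteration for
    unbalanced OT with KL marginal relaxation."""
    a, b, M = map(np.asarray, (a, b, M))
    K = np.exp(-M / reg)
    fi = reg_m / (reg_m + reg)  # KL relaxation exponent damping the scalings
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(num_iter):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]

G = sinkhorn_unbalanced_knopp([.5, .5], [.5, .5],
                              [[0., 1.], [1., 0.]], reg=1., reg_m=1.)
print(G)  # ~[[0.5112, 0.1881], [0.1881, 0.5112]], as in the doctest above
```

With fi < 1 the marginals of the returned plan only approximate a and b, which is exactly the unbalanced behavior.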
 ot.sinkhorn_unbalanced2(a, b, M, reg, reg_m, method='sinkhorn', numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]
Solve the entropic regularization unbalanced optimal transport problem and return the loss
The function solves the following optimization problem:
\[ \begin{align}\begin{aligned}W = \min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma \mathbf{1}, \mathbf{a}) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma^T \mathbf{1}, \mathbf{b})\\s.t. \gamma\geq 0\end{aligned}\end{align} \]where :
\(\mathbf{M}\) is the (dim_a, dim_b) metric cost matrix
\(\Omega\) is the entropic regularization term, \(\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)
\(\mathbf{a}\) and \(\mathbf{b}\) are source and target unbalanced distributions
KL is the KullbackLeibler divergence
The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]
 Parameters
a (arraylike (dim_a,)) – Unnormalized histogram of dimension dim_a
b (arraylike (dim_b,) or arraylike (dim_b, n_hists)) – One or multiple unnormalized histograms of dimension dim_b. If many, compute all the OT distances \((\mathbf{a}, \mathbf{b}_i)_i\)
M (arraylike (dim_a, dim_b)) – loss matrix
reg (float) – Entropy regularization term > 0
reg_m (float) – Marginal relaxation term > 0
method (str) – method used for the solver: either ‘sinkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_reg_scaling’; see those functions for specific parameters
numItermax (int, optional) – Max number of iterations
stopThr (float, optional) – Stop threshold on error (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
 Returns
ot_distance ((n_hists,) array-like) – the OT distance between \(\mathbf{a}\) and each of the histograms \(\mathbf{b}_i\)
log (dict) – log dictionary returned only if log is True
Examples
>>> import ot
>>> a = [.5, .10]
>>> b = [.5, .5]
>>> M = [[0., 1.], [1., 0.]]
>>> ot.unbalanced.sinkhorn_unbalanced2(a, b, M, 1., 1.)
array([0.31912866])
References
 2
M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013
 9
Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.
 10
Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.
 25
Frogner C., Zhang C., Mobahi H., Araya-Polo M., Poggio T. : Learning with a Wasserstein Loss, Advances in Neural Information Processing Systems (NIPS) 2015
 ot.sliced_wasserstein_distance(X_s, X_t, a=None, b=None, n_projections=50, p=2, projections=None, seed=None, log=False)[source]
Computes a Monte-Carlo approximation of the p-Sliced Wasserstein distance
\[\mathcal{SWD}_p(\mu, \nu) = \underset{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}{\mathbb{E}}\left(\mathcal{W}_p^p(\theta_\# \mu, \theta_\# \nu)\right)^{\frac{1}{p}}\]where:
\(\theta_\# \mu\) stands for the push-forward of the projection \(X \in \mathbb{R}^d \mapsto \langle \theta, X \rangle\)
 Parameters
X_s (ndarray, shape (n_samples_a, dim)) – samples in the source domain
X_t (ndarray, shape (n_samples_b, dim)) – samples in the target domain
a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain
b (ndarray, shape (n_samples_b,), optional) – samples weights in the target domain
n_projections (int, optional) – Number of projections used for the MonteCarlo approximation
p (float, optional) – Power p used for computing the sliced Wasserstein
projections (shape (dim, n_projections), optional) – Projection matrix (n_projections and seed are not used in this case)
seed (int or RandomState or None, optional) – Seed used for random number generator
log (bool, optional) – if True, sliced_wasserstein_distance returns the projections used and their associated EMD.
 Returns
cost (float) – Sliced Wasserstein Cost
log (dict, optional) – log dictionary returned only if log==True in parameters
Examples
>>> n_samples_a = 20
>>> X = np.random.normal(0., 1., (n_samples_a, 5))
>>> sliced_wasserstein_distance(X, X, seed=0)
0.0
References
 31
Bonneel, Nicolas, et al. “Sliced and Radon Wasserstein barycenters of measures.” Journal of Mathematical Imaging and Vision 51.1 (2015): 22-45
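The expectation in the sliced distance above is estimated by averaging the 1D costs over random directions rather than maximizing, which is the only difference from the max-sliced variant. A NumPy sketch for equal-size uniform samples (ot.sliced_wasserstein_distance handles general weights; the helper name is made up):

```python
import numpy as np

def sliced_w2(X_s, X_t, n_projections=50, seed=0):
    """Monte-Carlo sketch of the sliced 2-Wasserstein distance: average
    the squared 1D costs over random directions, then take the root."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, X_s.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Sorting the 1D projections matches the quantiles of both measures
    proj_s = np.sort(X_s @ theta.T, axis=0)
    proj_t = np.sort(X_t @ theta.T, axis=0)
    return np.mean((proj_s - proj_t) ** 2) ** 0.5

X = np.random.default_rng(1).normal(size=(20, 5))
print(sliced_w2(X, X))  # 0.0
```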
 ot.unif(n, type_as=None)[source]
Return a uniform histogram of length n (simplex).
 Parameters
n (int) – number of bins in the histogram
type_as (array_like) – array of the same type as the expected output (numpy/pytorch/jax)
 Returns
h – histogram of length n such that \(\forall i, \mathbf{h}_i = \frac{1}{n}\)
 Return type
array_like (n,)
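The uniform histogram is simply a point of the probability simplex; a NumPy equivalent of ot.unif:

```python
import numpy as np

# ot.unif(n) returns the uniform histogram on n bins; in plain numpy:
n = 4
h = np.full(n, 1 / n)
print(h, h.sum())  # [0.25 0.25 0.25 0.25] 1.0
```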
 ot.wasserstein_1d(u_values, v_values, u_weights=None, v_weights=None, p=1, require_sort=True)[source]
Computes the one-dimensional OT loss [15] between two (batched) empirical distributions
It is formally the p-Wasserstein distance raised to the power p. We do so in a vectorized way by first building the individual quantile functions then integrating them.
This function should be preferred to emd_1d whenever the backend is different from numpy, and when gradients over either sample positions or weights are required.
 Parameters
u_values (array-like, shape (n, ...)) – locations of the first empirical distribution
v_values (array-like, shape (m, ...)) – locations of the second empirical distribution
u_weights (array-like, shape (n, ...), optional) – weights of the first empirical distribution; if None, uniform weights are used
v_weights (array-like, shape (m, ...), optional) – weights of the second empirical distribution; if None, uniform weights are used
p (int, optional) – order of the ground metric used, should be at least 1 (see [2, Chap. 2]), default is 1
require_sort (bool, optional) – sort the distributions’ atom locations; if False, they are assumed to have been sorted prior to being passed to the function, default is True
 Returns
cost – the batched EMD
 Return type
float or array-like, shape (…)
References
 15
Peyré, G., & Cuturi, M. (2018). Computational Optimal Transport.
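For p = 1, uniform weights and equal sample counts, the quantile-function integral described above reduces to the mean absolute gap between the sorted samples. A NumPy sketch (ot.wasserstein_1d handles weights, batching and general p; the helper name is made up):

```python
import numpy as np

def w1_1d(u_values, v_values):
    """1D 1-Wasserstein sketch for two equal-size uniform samples:
    sorting builds both quantile functions, the mean gap integrates them."""
    return np.mean(np.abs(np.sort(u_values) - np.sort(v_values)))

print(w1_1d([0., 1.], [1., 2.]))          # 1.0: every atom shifts by one
print(w1_1d([3., 1., 2.], [1., 2., 3.]))  # 0.0: same atoms after sorting
```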
 ot.weak_optimal_transport(Xa, Xb, a=None, b=None, verbose=False, log=False, G0=None, **kwargs)[source]
Solves the weak optimal transport problem between two empirical distributions
\[ \begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \|\mathbf{X_a} - \mathrm{diag}(\mathbf{1}/\mathbf{a}) \gamma \mathbf{X_b}\|_F^2\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align} \]where:
\(X_a\) \(X_b\) are the sample matrices.
\(\mathbf{a}\) and \(\mathbf{b}\) are the sample weights
Note
This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.
Uses the conditional gradient algorithm to solve the problem proposed in [39].
 Parameters
Xa ((ns,d) array-like, float) – Source samples
Xb ((nt,d) array-like, float) – Target samples
a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)
b ((nt,) array-like, float) – Target histogram (uniform weight if empty list)
numItermax (int, optional) – Max number of iterations
numItermaxEmd (int, optional) – Max number of iterations for emd
stopThr (float, optional) – Stop threshold on the relative variation (>0)
stopThr2 (float, optional) – Stop threshold on the absolute variation (>0)
verbose (bool, optional) – Print information along iterations
log (bool, optional) – record log if True
 Returns
gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters
log (dict, optional) – If input log is True, a dictionary containing the cost, dual variables and exit status
References
 39
Gozlan, N., Roberto, C., Samson, P. M., & Tetali, P. (2017). Kantorovich duality for general transport costs and applications. Journal of Functional Analysis, 273(11), 33273405.
See also
ot.bregman.sinkhorn
Entropic regularized OT
ot.optim.cg
General regularized OT
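For a fixed plan, the weak-OT objective above compares each source point with the barycenter of the target mass its row of \(\gamma\) carries. An evaluation sketch in NumPy (the helper name is made up; ot.weak_optimal_transport minimizes this objective over \(\gamma\)):

```python
import numpy as np

def weak_ot_objective(Xa, Xb, a, gamma):
    """||Xa - diag(1/a) gamma Xb||_F^2: squared distance between each
    source point and the barycenter of the target points it is coupled to."""
    barycenters = (gamma @ Xb) / a[:, None]  # where each source point is sent
    return np.sum((Xa - barycenters) ** 2)

rng = np.random.default_rng(0)
Xa = rng.normal(size=(4, 2))
a = np.full(4, .25)
# A diagonal plan couples each source point only to itself: zero cost
print(weak_ot_objective(Xa, Xa, a, np.diag(a)))  # 0.0
```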