# API and modules

• lp – Solvers for the original linear program OT problem
• backend – Multi-lib backend for POT
• bregman – Bregman projections solvers for entropic regularized OT
• smooth – Smooth and sparse optimal transport solvers (KL and L2 reg.)
• gromov – Gromov-Wasserstein and Fused-Gromov-Wasserstein solvers
• optim – Generic solvers for regularized OT
• da – Domain adaptation with optimal transport
• dr – Dimension reduction with OT
• utils – Various useful functions
• datasets – Simple example datasets
• plot – Functions for plotting OT matrices
• stochastic – Stochastic solvers for regularized OT
• unbalanced – Regularized unbalanced OT solvers
• regpath – Regularization path OT solvers
• partial – Partial OT solvers
• sliced – Sliced OT distances
• weak – Weak optimal transport solvers
• factored – Factored OT solvers (low rank, cost or OT plan)

Warning

The list of automatically imported sub-modules is as follows: ot.lp, ot.bregman, ot.optim, ot.utils, ot.datasets, ot.gromov, ot.smooth, ot.stochastic, ot.partial, ot.regpath, ot.unbalanced. The following sub-modules are not imported because they require additional dependencies:

• ot.dr : depends on pymanopt and autograd.

• ot.gpu : depends on cupy and a CUDA GPU.

• ot.plot : depends on matplotlib.

ot.barycenter(A, M, reg, weights=None, method='sinkhorn', numItermax=10000, stopThr=0.0001, verbose=False, log=False, warn=True, **kwargs)[source]

Compute the entropic regularized Wasserstein barycenter of distributions $$\mathbf{A}$$

The function solves the following optimization problem:

$\mathbf{a} = \mathop{\arg \min}_\mathbf{a} \quad \sum_i W_{reg}(\mathbf{a},\mathbf{a}_i)$

where :

• $$W_{reg}(\cdot,\cdot)$$ is the entropic regularized Wasserstein distance (see ot.bregman.sinkhorn()) if method is sinkhorn or sinkhorn_stabilized or sinkhorn_log.

• $$\mathbf{a}_i$$ are training distributions in the columns of matrix $$\mathbf{A}$$

• reg and $$\mathbf{M}$$ are respectively the regularization term and the cost matrix for OT

The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [3]

Parameters
• A (array-like, shape (dim, n_hists)) – n_hists training distributions $$\mathbf{a}_i$$ of size dim

• M (array-like, shape (dim, dim)) – loss matrix for OT

• reg (float) – Regularization term > 0

• method (str (optional)) – method used for the solver either ‘sinkhorn’ or ‘sinkhorn_stabilized’ or ‘sinkhorn_log’

• weights (array-like, shape (n_hists,)) – Weights of each histogram $$\mathbf{a}_i$$ on the simplex (barycentric coordinates)

• numItermax (int, optional) – Max number of iterations

• stopThr (float, optional) – Stop threshold on error (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

• warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.

Returns

• a ((dim,) array-like) – Wasserstein barycenter

• log (dict) – log dictionary return only if log==True in parameters

References

[3] Benamou, J. D., Carlier, G., Cuturi, M., Nenna, L., & Peyré, G. (2015). Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2), A1111-A1138.
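The iterative Bregman projection scheme of [3] is short enough to sketch in plain NumPy. The following is an illustrative re-implementation under simplifying assumptions (dense kernel, fixed iteration count, no stabilization); in practice ot.barycenter should be used instead:

```python
import numpy as np

def sinkhorn_barycenter(A, M, reg, weights, n_iter=1000):
    """Iterative Bregman projections for the entropic barycenter (illustrative sketch)."""
    dim, n_hists = A.shape
    K = np.exp(-M / reg)                          # Gibbs kernel
    v = np.ones((dim, n_hists))
    b = np.full(dim, 1.0 / dim)
    for _ in range(n_iter):
        u = A / (K @ v)                           # scale to match each marginal a_i
        Ktu = K.T @ u
        b = np.exp(np.log(v * Ktu) @ weights)     # weighted geometric mean of the free marginals
        v = b[:, None] / Ktu                      # scale to match the shared marginal b
    return b

# barycenter of two Gaussian-like histograms on a 1-D grid
x = np.arange(20.0)
a1 = np.exp(-0.5 * (x - 5.0) ** 2)
a1 /= a1.sum()
a2 = np.exp(-0.5 * (x - 14.0) ** 2)
a2 /= a2.sum()
A = np.stack([a1, a2], axis=1)
M = (x[:, None] - x[None, :]) ** 2
M /= M.max()
b = sinkhorn_barycenter(A, M, reg=1e-2, weights=np.array([0.5, 0.5]))
```

With equal weights, the resulting histogram has unit mass and peaks roughly halfway between the two inputs.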

ot.barycenter_unbalanced(A, M, reg, reg_m, method='sinkhorn', weights=None, numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]

Compute the entropic unbalanced Wasserstein barycenter of $$\mathbf{A}$$.

The function solves the following optimization problem:

$\mathbf{a} = \mathop{\arg \min}_\mathbf{a} \quad \sum_i W_{u_{reg}}(\mathbf{a},\mathbf{a}_i)$

where :

• $$W_{u_{reg}}(\cdot,\cdot)$$ is the unbalanced entropic regularized Wasserstein distance (see ot.unbalanced.sinkhorn_unbalanced())

• $$\mathbf{a}_i$$ are training distributions in the columns of matrix $$\mathbf{A}$$

• reg and $$\mathbf{M}$$ are respectively the regularization term and the cost matrix for OT

• reg_m is the marginal relaxation hyperparameter

The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10]

Parameters
• A (array-like (dim, n_hists)) – n_hists training distributions $$\mathbf{a}_i$$ of dimension dim

• M (array-like (dim, dim)) – ground metric matrix for OT.

• reg (float) – Entropy regularization term > 0

• reg_m (float) – Marginal relaxation term > 0

• weights (array-like (n_hists,) optional) – Weight of each distribution (barycentric coordinates). If None, uniform weights are used.

• numItermax (int, optional) – Max number of iterations

• stopThr (float, optional) – Stop threshold on error (> 0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

Returns

• a ((dim,) array-like) – Unbalanced Wasserstein barycenter

• log (dict) – log dictionary return only if log==True in parameters

References

[3] Benamou, J. D., Carlier, G., Cuturi, M., Nenna, L., & Peyré, G. (2015). Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2), A1111-A1138.

[10] Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.
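The generalized scaling updates of [10] that the barycenter routine alternates can be sketched for a single pair of histograms. sinkhorn_unbalanced_plan below is an illustrative helper, not POT's implementation; the exponent fi damps the marginal constraints, recovering balanced Sinkhorn as reg_m grows:

```python
import numpy as np

def sinkhorn_unbalanced_plan(a, b, M, reg, reg_m, n_iter=1000):
    """Generalized Sinkhorn scaling with KL marginal relaxation (illustrative sketch)."""
    K = np.exp(-M / reg)
    fi = reg_m / (reg_m + reg)         # damping exponent from the KL relaxation
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi        # damped projection toward marginal a
        v = (b / (K.T @ u)) ** fi      # damped projection toward marginal b
    return u[:, None] * K * v[None, :]

a = np.array([0.7, 0.3])
b = np.array([0.4, 0.6])
M = np.array([[0.0, 1.0], [1.0, 0.0]])
G = sinkhorn_unbalanced_plan(a, b, M, reg=0.05, reg_m=10.0)
```

Increasing reg_m tightens the marginal constraints: the plan's marginals drift toward a and b as the relaxation is penalized more heavily.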

ot.dist(x1, x2=None, metric='sqeuclidean', p=2, w=None)[source]

Compute distance between samples in $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$

Note

This function is backend-compatible and will work on arrays from all compatible backends.

Parameters
• x1 (array-like, shape (n1,d)) – matrix with n1 samples of size d

• x2 (array-like, shape (n2,d), optional) – matrix with n2 samples of size d (if None then $$\mathbf{x_2} = \mathbf{x_1}$$)

• metric (str | callable, optional) – ‘sqeuclidean’ or ‘euclidean’ on all backends. On numpy, the function also accepts the following metrics from scipy.spatial.distance.cdist: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.

• p (float, optional) – p-norm for the Minkowski and the Weighted Minkowski metrics. Default value is 2.

• w (array-like, rank 1) – Weights for the weighted metrics.

Returns

M – distance matrix computed with given metric

Return type

array-like, shape (n1, n2)
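For the default ‘sqeuclidean’ metric, the cost matrix can be reproduced with plain NumPy broadcasting. pairwise_sqeuclidean below is a hypothetical helper for illustration, not part of POT:

```python
import numpy as np

def pairwise_sqeuclidean(x1, x2):
    """Squared Euclidean cost matrix M[i, j] = ||x1[i] - x2[j]||^2 (illustrative helper)."""
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>, computed with broadcasting
    sq1 = (x1 ** 2).sum(axis=1)[:, None]
    sq2 = (x2 ** 2).sum(axis=1)[None, :]
    return np.maximum(sq1 + sq2 - 2.0 * x1 @ x2.T, 0.0)  # clip tiny negative round-off

x1 = np.array([[0.0, 0.0], [1.0, 0.0]])
x2 = np.array([[0.0, 1.0], [2.0, 0.0]])
M = pairwise_sqeuclidean(x1, x2)
```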

ot.emd(a, b, M, numItermax=100000, log=False, center_dual=True, numThreads=1)[source]

Solves the Earth Mover's distance problem and returns the OT matrix

\begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the metric cost matrix

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are the sample weights

Warning

Note that the $$\mathbf{M}$$ matrix in numpy needs to be a C-order numpy.array in float64 format. It will be converted if not in this format

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

This function will cast the computed transport plan to the data type of the provided input with the following priority: $$\mathbf{a}$$, then $$\mathbf{b}$$, then $$\mathbf{M}$$ if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Uses the algorithm proposed in [1].

Parameters
• a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)

• b ((nt,) array-like, float) – Target histogram (uniform weight if empty list)

• M ((ns,nt) array-like, float) – Loss matrix (c-order array in numpy with type float64)

• numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.

• log (bool, optional (default=False)) – If True, returns a dictionary containing the cost and dual variables. Otherwise returns only the optimal transportation matrix.

• center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function center_ot_dual.

• numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.

Returns

• gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters

• log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status

Examples

Simple example with obvious solution. The function emd accepts lists and performs automatic conversion to numpy arrays

>>> import ot
>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.emd(a, b, M)
array([[0.5, 0. ],
       [0. , 0.5]])


References

[1] Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). Displacement interpolation using Lagrangian mass transport. In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM.

See also

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT
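In the 2×2 example above, the linear program is simple enough to solve by hand: once the marginals are fixed, the plan has a single free entry, and the affine cost is minimized at one of its feasibility bounds. A sketch (emd_2x2 is an illustrative name, not a POT function):

```python
import numpy as np

def emd_2x2(a, b, M):
    """Exact EMD for 2x2 problems (illustration only): one free entry t = gamma[0, 0]."""
    lo = max(0.0, a[0] - b[1])                     # feasibility bounds on t
    hi = min(a[0], b[0])
    # the cost is affine in t; its slope decides which endpoint is optimal
    slope = M[0][0] - M[0][1] - M[1][0] + M[1][1]
    t = lo if slope > 0 else hi
    # fill the remaining entries from the marginal constraints
    return np.array([[t, a[0] - t], [b[0] - t, a[1] - (b[0] - t)]])

a, b = [0.5, 0.5], [0.5, 0.5]
M = [[0.0, 1.0], [1.0, 0.0]]
G = emd_2x2(a, b, M)
```

This recovers the plan from the documented example, with both marginal constraints satisfied exactly.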

ot.emd2(a, b, M, processes=1, numItermax=100000, log=False, return_matrix=False, center_dual=True, numThreads=1)[source]

Solves the Earth Mover's distance problem and returns the loss

\begin{align}\begin{aligned}\min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the metric cost matrix

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are the sample weights

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

This function will cast the computed transport plan and transportation loss to the data type of the provided input with the following priority: $$\mathbf{a}$$, then $$\mathbf{b}$$, then $$\mathbf{M}$$ if marginals are not provided. Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Uses the algorithm proposed in [1].

Parameters
• a ((ns,) array-like, float64) – Source histogram (uniform weight if empty list)

• b ((nt,) array-like, float64) – Target histogram (uniform weight if empty list)

• M ((ns,nt) array-like, float64) – Loss matrix (for numpy c-order array with type float64)

• processes (int, optional (default=1)) – Number of processes used for multiple emd computations (deprecated)

• numItermax (int, optional (default=100000)) – The maximum number of iterations before stopping the optimization algorithm if it has not converged.

• log (boolean, optional (default=False)) – If True, returns a dictionary containing dual variables. Otherwise returns only the optimal transportation cost.

• return_matrix (boolean, optional (default=False)) – If True, returns the optimal transportation matrix in the log.

• center_dual (boolean, optional (default=True)) – If True, centers the dual potential using function center_ot_dual.

• numThreads (int or "max", optional (default=1, i.e. OpenMP is not used)) – If compiled with OpenMP, chooses the number of threads to parallelize. “max” selects the highest number possible.

Returns

• W (float, array-like) – Optimal transportation loss for the given parameters

• log (dict) – If input log is true, a dictionary containing dual variables and exit status

Examples

Simple example with obvious solution. The function emd2 accepts lists and performs automatic conversion to numpy arrays

>>> import ot
>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.emd2(a,b,M)
0.0


References

[1] Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). Displacement interpolation using Lagrangian mass transport. In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM.

See also

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT

ot.emd2_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1.0, dense=True, log=False)[source]

Solves the Earth Mover's distance problem between 1d measures and returns the loss

\begin{align}\begin{aligned}\gamma = arg\min_\gamma \sum_i \sum_j \gamma_{ij} d(x_a[i], x_b[j])\\s.t. \gamma 1 = a, \gamma^T 1= b, \gamma\geq 0\end{aligned}\end{align}

where :

• d is the metric

• x_a and x_b are the samples

• a and b are the sample weights

When ‘minkowski’ is used as a metric, $$d(x, y) = |x - y|^p$$.

Uses the algorithm detailed in [1].

Parameters
• x_a ((ns,) or (ns, 1) ndarray, float64) – Source dirac locations (on the real line)

• x_b ((nt,) or (nt, 1) ndarray, float64) – Target dirac locations (on the real line)

• a ((ns,) ndarray, float64, optional) – Source histogram (default is uniform weight)

• b ((nt,) ndarray, float64, optional) – Target histogram (default is uniform weight)

• metric (str, optional (default='sqeuclidean')) – Metric to be used. Only strings listed in ot.dist() are accepted. Due to implementation details, this function runs faster when ‘sqeuclidean’, ‘minkowski’, ‘cityblock’, or ‘euclidean’ metrics are used.

• p (float, optional (default=1.0)) – The p-norm to apply for if metric=’minkowski’

• dense (boolean, optional (default=True)) – If True, returns $$\gamma$$ as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Only used if log is set to True. Due to implementation details, this function runs faster when dense is set to False.

• log (boolean, optional (default=False)) – If True, returns a dictionary containing the transportation matrix. Otherwise returns only the loss.

Returns

• loss (float) – Cost associated to the optimal transportation

• log (dict) – If input log is True, a dictionary containing the Optimal transportation matrix for the given parameters

Examples

Simple example with obvious solution. The function emd2_1d accepts lists and performs automatic conversion to numpy arrays

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd2_1d(x_a, x_b, a, b)
0.5
>>> ot.emd2_1d(x_a, x_b)
0.5


References

[1] Peyré, G., & Cuturi, M. (2017). “Computational Optimal Transport”, 2018.

See also

ot.lp.emd2

EMD for multidimensional distributions

ot.lp.emd_1d

EMD for 1d distributions (returns the transportation matrix instead of the cost)

ot.emd_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1.0, dense=True, log=False)[source]

Solves the Earth Mover's distance problem between 1d measures and returns the OT matrix

\begin{align}\begin{aligned}\gamma = arg\min_\gamma \sum_i \sum_j \gamma_{ij} d(x_a[i], x_b[j])\\s.t. \gamma 1 = a, \gamma^T 1= b, \gamma\geq 0\end{aligned}\end{align}

where :

• d is the metric

• x_a and x_b are the samples

• a and b are the sample weights

When ‘minkowski’ is used as a metric, $$d(x, y) = |x - y|^p$$.

Uses the algorithm detailed in [1].

Parameters
• x_a ((ns,) or (ns, 1) ndarray, float64) – Source dirac locations (on the real line)

• x_b ((nt,) or (nt, 1) ndarray, float64) – Target dirac locations (on the real line)

• a ((ns,) ndarray, float64, optional) – Source histogram (default is uniform weight)

• b ((nt,) ndarray, float64, optional) – Target histogram (default is uniform weight)

• metric (str, optional (default='sqeuclidean')) – Metric to be used. Only strings listed in ot.dist() are accepted. Due to implementation details, this function runs faster when ‘sqeuclidean’, ‘cityblock’, or ‘euclidean’ metrics are used.

• p (float, optional (default=1.0)) – The p-norm to apply for if metric=’minkowski’

• dense (boolean, optional (default=True)) – If True, returns $$\gamma$$ as a dense ndarray of shape (ns, nt). Otherwise returns a sparse representation using scipy’s coo_matrix format. Due to implementation details, this function runs faster when dense is set to False.

• log (boolean, optional (default=False)) – If True, returns a dictionary containing the cost. Otherwise returns only the optimal transportation matrix.

Returns

• gamma ((ns, nt) ndarray) – Optimal transportation matrix for the given parameters

• log (dict) – If input log is True, a dictionary containing the cost

Examples

Simple example with obvious solution. The function emd_1d accepts lists and performs automatic conversion to numpy arrays

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> x_a = [2., 0.]
>>> x_b = [0., 3.]
>>> ot.emd_1d(x_a, x_b, a, b)
array([[0. , 0.5],
       [0.5, 0. ]])
>>> ot.emd_1d(x_a, x_b)
array([[0. , 0.5],
       [0.5, 0. ]])


References

[1] Peyré, G., & Cuturi, M. (2017). “Computational Optimal Transport”, 2018.

See also

ot.lp.emd

EMD for multidimensional distributions

ot.lp.emd2_1d

EMD for 1d distributions (returns cost instead of the transportation matrix)
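For uniform weights and equal sample counts, the optimal 1-D plan is a permutation matrix scaled by 1/n, obtained by matching sorted samples. emd_1d_uniform_plan below is an illustrative helper, not part of POT; it reproduces the documented example:

```python
import numpy as np

def emd_1d_uniform_plan(x_a, x_b):
    """1-D OT plan for uniform, equal-size samples: scaled permutation matrix (sketch)."""
    n = len(x_a)
    G = np.zeros((n, n))
    ia = np.argsort(x_a)        # k-th smallest source point ...
    ib = np.argsort(x_b)        # ... is matched to k-th smallest target point
    G[ia, ib] = 1.0 / n
    return G

G = emd_1d_uniform_plan([2.0, 0.0], [0.0, 3.0])
```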

ot.factored_optimal_transport(Xa, Xb, a=None, b=None, reg=0.0, r=100, X0=None, stopThr=1e-07, numItermax=100, verbose=False, log=False, **kwargs)[source]

Solves the factored OT problem and returns the OT plans and the intermediate distribution

This function solves the following OT problem [40]

$\mathop{\arg \min}_\mu \quad W_2^2(\mu_a,\mu)+ W_2^2(\mu,\mu_b)$

where :

• $$\mu_a$$ and $$\mu_b$$ are empirical distributions.

• $$\mu$$ is an empirical distribution with r samples

And returns the two OT plans between $$\mu_a$$ and $$\mu$$, and between $$\mu$$ and $$\mu_b$$

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Uses the conditional gradient algorithm to solve the problem proposed in [40].

Parameters
• Xa ((ns,d) array-like, float) – Source samples

• Xb ((nt,d) array-like, float) – Target samples

• a ((ns,) array-like, float) – Source histogram (uniform weight if empty list)

• b ((nt,) array-like, float) – Target histogram (uniform weight if empty list)

• numItermax (int, optional) – Max number of iterations

• stopThr (float, optional) – Stop threshold on the relative variation (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

Returns

• Ga (array-like, shape (ns, r)) – Optimal transportation matrix between source and the intermediate distribution

• Gb (array-like, shape (r, nt)) – Optimal transportation matrix between the intermediate and target distribution

• X (array-like, shape (r, d)) – Support of the intermediate distribution

• log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status

References

[40] Forrow, A., Hütter, J. C., Nitzan, M., Rigollet, P., Schiebinger, G., & Weed, J. (2019, April). Statistical optimal transport via factored couplings. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2454-2465). PMLR.

See also

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT

ot.fused_gromov_wasserstein(M, C1, C2, p, q, loss_fun='square_loss', alpha=0.5, armijo=False, G0=None, log=False, **kwargs)[source]

Computes the FGW transport between two graphs (see [24])

\begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad (1 - \alpha) \langle \gamma, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}\\ \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{\gamma} &\geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the (ns, nt) metric cost matrix

• $$\mathbf{p}$$ and $$\mathbf{q}$$ are source and target weights (sum to 1)

• L is a loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

The algorithm used for solving the problem is conditional gradient as discussed in [24]

Parameters
• M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

• C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space

• C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space

• p (array-like, shape (ns,)) – Distribution in the source space

• q (array-like, shape (nt,)) – Distribution in the target space

• loss_fun (str, optional) – Loss function used for the solver

• alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

• armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Otherwise, a closed form is used. If there are convergence issues, use False.

• G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

• log (bool, optional) – record log if True

• **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns

• gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters.

• log (dict) – Log dictionary return only if log==True in parameters.

References

[24] Vayer Titouan, Chapel Laetitia, Flamary Rémi, Tavenard Romain and Courty Nicolas “Optimal Transport for structured data with application on graphs”, International Conference on Machine Learning (ICML). 2019.
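Given a feasible coupling, the FGW objective above can be evaluated directly. fgw_cost below is an illustrative sketch for small problems using the dense 4-D loss tensor with the square loss (the solver itself uses a factored form of the quadratic term for efficiency):

```python
import numpy as np

def fgw_cost(T, M, C1, C2, alpha=0.5):
    """Evaluate the FGW objective for a given coupling T with square loss (sketch)."""
    linear = np.sum(T * M)                               # feature term <T, M>_F
    # diff[i, j, k, l] = C1[i, k] - C2[j, l]
    diff = C1[:, None, :, None] - C2[None, :, None, :]
    # structure term: sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l]
    quad = np.einsum('ijkl,ij,kl->', diff ** 2, T, T)
    return (1.0 - alpha) * linear + alpha * quad

# sanity check on identical structures with the identity coupling
rng = np.random.default_rng(0)
C = rng.random((4, 4))
C = (C + C.T) / 2.0
T_id = np.eye(4) / 4.0
structure_only = fgw_cost(T_id, np.zeros((4, 4)), C, C, alpha=1.0)
```

With C1 == C2 and a diagonal coupling, the structure term vanishes, as expected for a perfect structural match.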

ot.fused_gromov_wasserstein2(M, C1, C2, p, q, loss_fun='square_loss', alpha=0.5, armijo=False, G0=None, log=False, **kwargs)[source]

Computes the FGW distance between two graphs (see [24])

\begin{align}\begin{aligned}\min_\gamma \quad (1 - \alpha) \langle \gamma, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}\\ \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{\gamma} &\geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the (ns, nt) metric cost matrix

• $$\mathbf{p}$$ and $$\mathbf{q}$$ are source and target weights (sum to 1)

• L is a loss function to account for the misfit between the similarity matrices

The algorithm used for solving the problem is conditional gradient as discussed in [24]

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note that when using backends, this loss function is differentiable w.r.t. the matrices and weights for the quadratic loss, using the gradients from [38].

Parameters
• M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

• C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space.

• C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space.

• p (array-like, shape (ns,)) – Distribution in the source space.

• q (array-like, shape (nt,)) – Distribution in the target space.

• loss_fun (str, optional) – Loss function used for the solver.

• alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

• armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Otherwise, a closed form is used. If there are convergence issues, use False.

• G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

• log (bool, optional) – Record log if True.

• **kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.

Returns

• fgw-distance (float) – Fused Gromov-Wasserstein distance for the given parameters.

• log (dict) – Log dictionary return only if log==True in parameters.

References

[24] Vayer Titouan, Chapel Laetitia, Flamary Rémi, Tavenard Romain and Courty Nicolas “Optimal Transport for structured data with application on graphs”, International Conference on Machine Learning (ICML). 2019.

[38] C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.

ot.gromov_barycenters(N, Cs, ps, p, lambdas, loss_fun, max_iter=1000, tol=1e-09, verbose=False, log=False, init_C=None, random_state=None)[source]

Returns the Gromov-Wasserstein barycenters of S measured similarity matrices $$(\mathbf{C}_s)_{1 \leq s \leq S}$$

The function solves the following optimization problem with block coordinate descent:

$\mathbf{C} = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \mathrm{GW}(\mathbf{C}, \mathbf{C}_s, \mathbf{p}, \mathbf{p}_s)$

Where :

• $$\mathbf{C}_s$$: metric cost matrix

• $$\mathbf{p}_s$$: distribution

Parameters
• N (int) – Size of the targeted barycenter

• Cs (list of S array-like of shape (ns, ns)) – Metric cost matrices

• ps (list of S array-like of shape (ns,)) – Sample weights in the S spaces

• p (array-like, shape (N,)) – Weights in the targeted barycenter

• lambdas (list of float) – List of the S spaces’ weights

• loss_fun (callable) – tensor-matrix multiplication function based on specific loss function

• update (callable) – function($$\mathbf{p}$$, lambdas, $$\mathbf{T}$$, $$\mathbf{Cs}$$) that updates $$\mathbf{C}$$ according to a specific Kernel with the S $$\mathbf{T}_s$$ couplings calculated at each iteration

• max_iter (int, optional) – Max number of iterations

• tol (float, optional) – Stop threshold on error (>0).

• verbose (bool, optional) – Print information along iterations.

• log (bool, optional) – Record log if True.

• init_C (bool | array-like, shape(N,N)) – Random initial value for the $$\mathbf{C}$$ matrix provided by user.

• random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns

• C (array-like, shape (N, N)) – Similarity matrix in the barycenter space (permuted arbitrarily)

• log (dict) – Log dictionary of error during iterations. Return only if log=True in parameters.

References

[12] Gabriel Peyré, Marco Cuturi, and Justin Solomon, “Gromov-Wasserstein averaging of kernel and distance matrices.” International Conference on Machine Learning (ICML). 2016.

ot.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss', log=False, armijo=False, G0=None, **kwargs)[source]

Returns the Gromov-Wasserstein transport between $$(\mathbf{C_1}, \mathbf{p})$$ and $$(\mathbf{C_2}, \mathbf{q})$$

The function solves the following optimization problem:

$\mathbf{GW} = \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}$

Where :

• $$\mathbf{C_1}$$: Metric cost matrix in the source space

• $$\mathbf{C_2}$$: Metric cost matrix in the target space

• $$\mathbf{p}$$: distribution in the source space

• $$\mathbf{q}$$: distribution in the target space

• L: loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Parameters
• C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

• C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

• p (array-like, shape (ns,)) – Distribution in the source space

• q (array-like, shape (nt,)) – Distribution in the target space

• loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’

• max_iter (int, optional) – Max number of iterations

• tol (float, optional) – Stop threshold on error (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

• armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Otherwise, a closed form is used. If there are convergence issues, use False.

• G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

• **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns

• T (array-like, shape (ns, nt)) –

Coupling between the two spaces that minimizes:

$$\sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}$$

• log (dict) – Convergence information and loss.

References

[12] Gabriel Peyré, Marco Cuturi, and Justin Solomon, “Gromov-Wasserstein averaging of kernel and distance matrices.” International Conference on Machine Learning (ICML). 2016.

[13] Mémoli, Facundo. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics 11.4 (2011): 417-487.

ot.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss', log=False, armijo=False, G0=None, **kwargs)[source]

Returns the Gromov-Wasserstein discrepancy between $$(\mathbf{C_1}, \mathbf{p})$$ and $$(\mathbf{C_2}, \mathbf{q})$$

The function solves the following optimization problem:

$GW = \min_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}$

Where :

• $$\mathbf{C_1}$$: Metric cost matrix in the source space

• $$\mathbf{C_2}$$: Metric cost matrix in the target space

• $$\mathbf{p}$$: distribution in the source space

• $$\mathbf{q}$$: distribution in the target space

• L: loss function to account for the misfit between the similarity matrices

Note that when using backends, this loss function is differentiable w.r.t. the matrices and weights for the quadratic loss, using the gradients from [38].

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Parameters
• C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

• C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

• p (array-like, shape (ns,)) – Distribution in the source space.

• q (array-like, shape (nt,)) – Distribution in the target space.

• loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’

• max_iter (int, optional) – Max number of iterations

• tol (float, optional) – Stop threshold on error (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

• armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Otherwise, a closed form is used. If there are convergence issues, use False.

• G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

Returns

• gw_dist (float) – Gromov-Wasserstein distance

• log (dict) – Convergence information and coupling matrix

References

[12] Gabriel Peyré, Marco Cuturi, and Justin Solomon, “Gromov-Wasserstein averaging of kernel and distance matrices.” International Conference on Machine Learning (ICML). 2016.

[13] Mémoli, Facundo. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics 11.4 (2011): 417-487.

[38] C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online Graph Dictionary Learning, International Conference on Machine Learning (ICML), 2021.

ot.max_sliced_wasserstein_distance(X_s, X_t, a=None, b=None, n_projections=50, p=2, projections=None, seed=None, log=False)[source]

Computes a Monte-Carlo approximation of the max p-Sliced Wasserstein distance

$\mathcal{Max-SWD}_p(\mu, \nu) = \underset{\theta \in \mathcal{U}(\mathbb{S}^{d-1})}{\max} [\mathcal{W}_p^p(\theta_\# \mu, \theta_\# \nu)]^{\frac{1}{p}}$

where :

• $$\theta_\# \mu$$ stands for the pushforward of $$\mu$$ by the projection $$\mathbb{R}^d \ni X \mapsto \langle \theta, X \rangle$$

Parameters
• X_s (ndarray, shape (n_samples_a, dim)) – samples in the source domain

• X_t (ndarray, shape (n_samples_b, dim)) – samples in the target domain

• a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain

• b (ndarray, shape (n_samples_b,), optional) – samples weights in the target domain

• n_projections (int, optional) – Number of projections used for the Monte-Carlo approximation

• p (float, optional) – Power p used for computing the sliced Wasserstein

• projections (shape (dim, n_projections), optional) – Projection matrix (n_projections and seed are not used in this case)

• seed (int or RandomState or None, optional) – Seed used for random number generator

• log (bool, optional) – if True, max_sliced_wasserstein_distance returns the projections used and their associated EMDs.

Returns

• cost (float) – Sliced Wasserstein Cost

• log (dict, optional) – log dictionary return only if log==True in parameters

Examples

>>> import numpy as np
>>> n_samples_a = 20
>>> X = np.random.normal(0., 1., (n_samples_a, 5))
>>> max_sliced_wasserstein_distance(X, X, seed=0)
0.0


References

35

Deshpande, I., Hu, Y. T., Sun, R., Pyrros, A., Siddiqui, N., Koyejo, S., … & Schwing, A. G. (2019). Max-sliced wasserstein distance and its use for gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10648-10656).

ot.sinkhorn(a, b, M, reg, method='sinkhorn', numItermax=1000, stopThr=1e-09, verbose=False, log=False, warn=True, **kwargs)[source]

Solve the entropic regularization optimal transport problem and return the OT matrix

The function solves the following optimization problem:

\begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma)\\s.t. \ \gamma \mathbf{1} &= \mathbf{a}\\ \gamma^T \mathbf{1} &= \mathbf{b}\\ \gamma &\geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the (dim_a, dim_b) metric cost matrix

• $$\Omega$$ is the entropic regularization term $$\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})$$

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are source and target weights (histograms, both sum to 1)

Note

This function is backend-compatible and will work on arrays from all compatible backends.

The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [2]

Choosing a Sinkhorn solver

By default, and when the regularization parameter is not too small, the default sinkhorn solver should be enough. If you need a small regularization to get sharper OT matrices, you should use the ot.bregman.sinkhorn_stabilized() solver, which avoids numerical errors. This last solver can be very slow in practice and might not even converge to a reasonable OT matrix in finite time. This is why ot.bregman.sinkhorn_epsilon_scaling(), which iterates over the value of the regularization (with warm starts), sometimes leads to better solutions. Note that the greedy version ot.bregman.greenkhorn() can also lead to a speedup, and the screening version ot.bregman.screenkhorn() aims at providing a fast approximation of the Sinkhorn problem. For GPU use and gradient computation with a small number of iterations, we strongly recommend the ot.bregman.sinkhorn_log() solver, which does not need to check for numerical problems.

Parameters
• a (array-like, shape (dim_a,)) – samples weights in the source domain

• b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – samples weights in the target domain; compute sinkhorn with multiple targets and fixed $$\mathbf{M}$$ if $$\mathbf{b}$$ is a matrix (return OT loss + dual variables in log)

• M (array-like, shape (dim_a, dim_b)) – loss matrix

• reg (float) – Regularization term >0

• method (str) – method used for the solver either ‘sinkhorn’,’sinkhorn_log’, ‘greenkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_epsilon_scaling’, see those function for specific parameters

• numItermax (int, optional) – Max number of iterations

• stopThr (float, optional) – Stop threshold on error (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

• warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.

Returns

• gamma (array-like, shape (dim_a, dim_b)) – Optimal transportation matrix for the given parameters

• log (dict) – log dictionary return only if log==True in parameters

Examples

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> M=[[0., 1.], [1., 0.]]
>>> ot.sinkhorn(a, b, M, 1)
array([[0.36552929, 0.13447071],
[0.13447071, 0.36552929]])


References

2

M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013

9

Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.

10

Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.

34

Feydy, J., Séjourné, T., Vialard, F. X., Amari, S. I., Trouvé, A., & Peyré, G. (2019, April). Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2681-2690). PMLR.

ot.sinkhorn2(a, b, M, reg, method='sinkhorn', numItermax=1000, stopThr=1e-09, verbose=False, log=False, warn=False, **kwargs)[source]

Solve the entropic regularization optimal transport problem and return the loss

The function solves the following optimization problem:

\begin{align}\begin{aligned}W = \min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma)\\s.t. \ \gamma \mathbf{1} &= \mathbf{a}\\ \gamma^T \mathbf{1} &= \mathbf{b}\\ \gamma &\geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the (dim_a, dim_b) metric cost matrix

• $$\Omega$$ is the entropic regularization term $$\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})$$

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are source and target weights (histograms, both sum to 1)

Note

This function is backend-compatible and will work on arrays from all compatible backends.

The algorithm used for solving the problem is the Sinkhorn-Knopp matrix scaling algorithm as proposed in [2]

Choosing a Sinkhorn solver

By default, and when the regularization parameter is not too small, the default sinkhorn solver should be enough. If you need a small regularization to get sharper OT matrices, you should use the ot.bregman.sinkhorn_log() solver, which avoids numerical errors. This last solver can be very slow in practice and might not even converge to a reasonable OT matrix in finite time. This is why ot.bregman.sinkhorn_epsilon_scaling(), which iterates over the value of the regularization (with warm starts), sometimes leads to better solutions. Note that the greedy version ot.bregman.greenkhorn() can also lead to a speedup, and the screening version ot.bregman.screenkhorn() aims at providing a fast approximation of the Sinkhorn problem. For GPU use and gradient computation with a small number of iterations, we strongly recommend the ot.bregman.sinkhorn_log() solver, which does not need to check for numerical problems.

Parameters
• a (array-like, shape (dim_a,)) – samples weights in the source domain

• b (array-like, shape (dim_b,) or ndarray, shape (dim_b, n_hists)) – samples weights in the target domain; compute sinkhorn with multiple targets and fixed $$\mathbf{M}$$ if $$\mathbf{b}$$ is a matrix (return OT loss + dual variables in log)

• M (array-like, shape (dim_a, dim_b)) – loss matrix

• reg (float) – Regularization term >0

• method (str) – method used for the solver either ‘sinkhorn’,’sinkhorn_log’, ‘sinkhorn_stabilized’, see those function for specific parameters

• numItermax (int, optional) – Max number of iterations

• stopThr (float, optional) – Stop threshold on error (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

• warn (bool, optional) – if True, raises a warning if the algorithm doesn’t converge.

Returns

• W ((n_hists) float/array-like) – Optimal transportation loss for the given parameters

• log (dict) – log dictionary return only if log==True in parameters

Examples

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> M=[[0., 1.], [1., 0.]]
>>> ot.sinkhorn2(a, b, M, 1)
0.26894142136999516


References

2

M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013

9

Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.

10

Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.

21

Altschuler J., Weed J., Rigollet P. : Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration, Advances in Neural Information Processing Systems (NIPS) 31, 2017

34

Feydy, J., Séjourné, T., Vialard, F. X., Amari, S. I., Trouvé, A., & Peyré, G. (2019, April). Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2681-2690). PMLR.

ot.sinkhorn_lpl1_mm(a, labels_a, b, M, reg, eta=0.1, numItermax=10, numInnerItermax=200, stopInnerThr=1e-09, verbose=False, log=False)[source]

Solve the entropic regularization optimal transport problem with nonconvex group lasso regularization

The function solves the following optimization problem:

\begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg} \cdot \Omega_e(\gamma) + \eta \ \Omega_g(\gamma)\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the (ns, nt) metric cost matrix

• $$\Omega_e$$ is the entropic regularization term $$\Omega_e (\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})$$

• $$\Omega_g$$ is the group lasso regularization term $$\Omega_g(\gamma)=\sum_{i,c} \|\gamma_{i,\mathcal{I}_c}\|^{1/2}_1$$ where $$\mathcal{I}_c$$ are the index of samples from class c in the source domain.

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are source and target weights (sum to 1)

The algorithm used for solving the problem is the generalized conditional gradient as proposed in [5, 7].

Parameters
• a (array-like (ns,)) – samples weights in the source domain

• labels_a (array-like (ns,)) – labels of samples in the source domain

• b (array-like (nt,)) – samples weights in the target domain

• M (array-like (ns,nt)) – loss matrix

• reg (float) – Regularization term for entropic regularization >0

• eta (float, optional) – Regularization term for group lasso regularization >0

• numItermax (int, optional) – Max number of iterations

• numInnerItermax (int, optional) – Max number of iterations (inner sinkhorn solver)

• stopInnerThr (float, optional) – Stop threshold on error (inner sinkhorn solver) (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

Returns

• gamma ((ns, nt) array-like) – Optimal transportation matrix for the given parameters

• log (dict) – log dictionary return only if log==True in parameters

References

5

N. Courty; R. Flamary; D. Tuia; A. Rakotomamonjy, “Optimal Transport for Domain Adaptation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.PP, no.99, pp.1-1

7

Rakotomamonjy, A., Flamary, R., & Courty, N. (2015). Generalized conditional gradient: analysis of convergence and applications. arXiv preprint arXiv:1510.06567.

ot.lp.emd

Unregularized OT

ot.bregman.sinkhorn

Entropic regularized OT

ot.optim.cg

General regularized OT

ot.sinkhorn_unbalanced(a, b, M, reg, reg_m, method='sinkhorn', numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]

Solve the unbalanced entropic regularization optimal transport problem and return the OT plan

The function solves the following optimization problem:

\begin{align}\begin{aligned}W = \min_\gamma \ \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma \mathbf{1}, \mathbf{a}) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma^T \mathbf{1}, \mathbf{b})\\s.t. \gamma \geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the (dim_a, dim_b) metric cost matrix

• $$\Omega$$ is the entropic regularization term, $$\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})$$

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are source and target unbalanced distributions

• KL is the Kullback-Leibler divergence

The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]

Parameters
• a (array-like (dim_a,)) – Unnormalized histogram of dimension dim_a

• b (array-like (dim_b,) or array-like (dim_b, n_hists)) – One or multiple unnormalized histograms of dimension dim_b. If many, compute all the OT distances $$(\mathbf{a}, \mathbf{b}_i)_i$$

• M (array-like (dim_a, dim_b)) – loss matrix

• reg (float) – Entropy regularization term > 0

• reg_m (float) – Marginal relaxation term > 0

• method (str) – method used for the solver either ‘sinkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_reg_scaling’, see those function for specific parameters

• numItermax (int, optional) – Max number of iterations

• stopThr (float, optional) – Stop threshold on error (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

Returns

• if n_hists == 1 :

• gamma ((dim_a, dim_b) array-like) – Optimal transportation matrix for the given parameters

• log (dict) – log dictionary returned only if log is True

• else :

• ot_distance ((n_hists,) array-like) – the OT distance between $$\mathbf{a}$$ and each of the histograms $$\mathbf{b}_i$$

• log (dict) – log dictionary returned only if log is True

Examples

>>> import ot
>>> a=[.5, .5]
>>> b=[.5, .5]
>>> M=[[0., 1.], [1., 0.]]
>>> ot.sinkhorn_unbalanced(a, b, M, 1, 1)
array([[0.51122823, 0.18807035],
[0.18807035, 0.51122823]])


References

2

M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013

9

Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.

10

Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.

25

Frogner C., Zhang C., Mobahi H., Araya-Polo M., Poggio T. : Learning with a Wasserstein Loss, Advances in Neural Information Processing Systems (NIPS) 2015

ot.unbalanced.sinkhorn_knopp_unbalanced

Unbalanced Classic Sinkhorn [10, 25]

ot.unbalanced.sinkhorn_stabilized_unbalanced

Unbalanced Stabilized sinkhorn [9, 10]

ot.unbalanced.sinkhorn_reg_scaling_unbalanced

Unbalanced Sinkhorn with epsilon scaling [9, 10]

ot.sinkhorn_unbalanced2(a, b, M, reg, reg_m, method='sinkhorn', numItermax=1000, stopThr=1e-06, verbose=False, log=False, **kwargs)[source]

Solve the entropic regularization unbalanced optimal transport problem and return the loss

The function solves the following optimization problem:

\begin{align}\begin{aligned}W = \min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F + \mathrm{reg}\cdot\Omega(\gamma) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma \mathbf{1}, \mathbf{a}) + \mathrm{reg_m} \cdot \mathrm{KL}(\gamma^T \mathbf{1}, \mathbf{b})\\s.t. \gamma\geq 0\end{aligned}\end{align}

where :

• $$\mathbf{M}$$ is the (dim_a, dim_b) metric cost matrix

• $$\Omega$$ is the entropic regularization term, $$\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})$$

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are source and target unbalanced distributions

• KL is the Kullback-Leibler divergence

The algorithm used for solving the problem is the generalized Sinkhorn-Knopp matrix scaling algorithm as proposed in [10, 25]

Parameters
• a (array-like (dim_a,)) – Unnormalized histogram of dimension dim_a

• b (array-like (dim_b,) or array-like (dim_b, n_hists)) – One or multiple unnormalized histograms of dimension dim_b. If many, compute all the OT distances $$(\mathbf{a}, \mathbf{b}_i)_i$$

• M (array-like (dim_a, dim_b)) – loss matrix

• reg (float) – Entropy regularization term > 0

• reg_m (float) – Marginal relaxation term > 0

• method (str) – method used for the solver either ‘sinkhorn’, ‘sinkhorn_stabilized’ or ‘sinkhorn_reg_scaling’, see those function for specific parameters

• numItermax (int, optional) – Max number of iterations

• stopThr (float, optional) – Stop threshold on error (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

Returns

• ot_distance ((n_hists,) array-like) – the OT distance between $$\mathbf{a}$$ and each of the histograms $$\mathbf{b}_i$$

• log (dict) – log dictionary returned only if log is True

Examples

>>> import ot
>>> a=[.5, .10]
>>> b=[.5, .5]
>>> M=[[0., 1.],[1., 0.]]
>>> ot.sinkhorn_unbalanced2(a, b, M, 1., 1.)
array([0.31912866])


References

2

M. Cuturi, Sinkhorn Distances : Lightspeed Computation of Optimal Transport, Advances in Neural Information Processing Systems (NIPS) 26, 2013

9

Schmitzer, B. (2016). Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems. arXiv preprint arXiv:1610.06519.

10

Chizat, L., Peyré, G., Schmitzer, B., & Vialard, F. X. (2016). Scaling algorithms for unbalanced transport problems. arXiv preprint arXiv:1607.05816.

25

Frogner C., Zhang C., Mobahi H., Araya-Polo M., Poggio T. : Learning with a Wasserstein Loss, Advances in Neural Information Processing Systems (NIPS) 2015

ot.unbalanced.sinkhorn_knopp

Unbalanced Classic Sinkhorn [10, 25]

ot.unbalanced.sinkhorn_stabilized

Unbalanced Stabilized sinkhorn [9, 10]

ot.unbalanced.sinkhorn_reg_scaling

Unbalanced Sinkhorn with epsilon scaling [9, 10]

ot.sliced_wasserstein_distance(X_s, X_t, a=None, b=None, n_projections=50, p=2, projections=None, seed=None, log=False)[source]

Computes a Monte-Carlo approximation of the p-Sliced Wasserstein distance

$\mathcal{SWD}_p(\mu, \nu) = \underset{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})}{\mathbb{E}}\left(\mathcal{W}_p^p(\theta_\# \mu, \theta_\# \nu)\right)^{\frac{1}{p}}$

where :

• $$\theta_\# \mu$$ stands for the pushforward of $$\mu$$ by the projection $$X \in \mathbb{R}^d \mapsto \langle \theta, X \rangle$$

Parameters
• X_s (ndarray, shape (n_samples_a, dim)) – samples in the source domain

• X_t (ndarray, shape (n_samples_b, dim)) – samples in the target domain

• a (ndarray, shape (n_samples_a,), optional) – samples weights in the source domain

• b (ndarray, shape (n_samples_b,), optional) – samples weights in the target domain

• n_projections (int, optional) – Number of projections used for the Monte-Carlo approximation

• p (float, optional) – Power p used for computing the sliced Wasserstein

• projections (shape (dim, n_projections), optional) – Projection matrix (n_projections and seed are not used in this case)

• seed (int or RandomState or None, optional) – Seed used for random number generator

• log (bool, optional) – if True, sliced_wasserstein_distance returns the projections used and their associated EMD.

Returns

• cost (float) – Sliced Wasserstein Cost

• log (dict, optional) – log dictionary return only if log==True in parameters

Examples

>>> import numpy as np
>>> n_samples_a = 20
>>> X = np.random.normal(0., 1., (n_samples_a, 5))
>>> sliced_wasserstein_distance(X, X, seed=0)
0.0


References

31

Bonneel, Nicolas, et al. “Sliced and Radon Wasserstein barycenters of measures.” Journal of Mathematical Imaging and Vision 51.1 (2015): 22-45

ot.tic()[source]

Python implementation of Matlab tic() function

ot.toc(message='Elapsed time : {} s')[source]

Python implementation of Matlab toc() function

ot.toq()[source]

Python implementation of Julia toq() function: returns the elapsed time without printing it

ot.unif(n, type_as=None)[source]

Return a uniform histogram of length n (simplex).

Parameters
• n (int) – number of bins in the histogram

• type_as (array_like) – array of the same type of the expected output (numpy/pytorch/jax)

Returns

h – histogram of length n such that $$\forall i, \mathbf{h}_i = \frac{1}{n}$$

Return type

array_like (n,)

ot.wasserstein_1d(u_values, v_values, u_weights=None, v_weights=None, p=1, require_sort=True)[source]

Computes the 1-dimensional OT loss [15] between two (batched) empirical distributions

It is formally the p-Wasserstein distance raised to the power p. We do so in a vectorized way by first building the individual quantile functions then integrating them.

This function should be preferred to emd_1d whenever the backend is different from numpy, and when gradients over either sample positions or weights are required.

Parameters
• u_values (array-like, shape (n, ...)) – locations of the first empirical distribution

• v_values (array-like, shape (m, ...)) – locations of the second empirical distribution

• u_weights (array-like, shape (n, ...), optional) – weights of the first empirical distribution, if None then uniform weights are used

• v_weights (array-like, shape (m, ...), optional) – weights of the second empirical distribution, if None then uniform weights are used

• p (int, optional) – order of the ground metric used, should be at least 1 (see [2, Chap. 2]), default is 1

• require_sort (bool, optional) – sort the distributions atoms locations, if False we will consider they have been sorted prior to being passed to the function, default is True

Returns

cost – the batched EMD

Return type

float/array-like, shape (…)

References

15

Peyré, G., & Cuturi, M. (2018). Computational Optimal Transport.

ot.weak_optimal_transport(Xa, Xb, a=None, b=None, verbose=False, log=False, G0=None, **kwargs)[source]

Solves the weak optimal transport problem between two empirical distributions

\begin{align}\begin{aligned}\gamma = \mathop{\arg \min}_\gamma \quad \|X_a - \mathrm{diag}(\mathbf{1}/\mathbf{a}) \gamma X_b\|_F^2\\s.t. \ \gamma \mathbf{1} = \mathbf{a}\\ \gamma^T \mathbf{1} = \mathbf{b}\\ \gamma \geq 0\end{aligned}\end{align}

where :

• $$X_a$$ and $$X_b$$ are the sample matrices.

• $$\mathbf{a}$$ and $$\mathbf{b}$$ are the sample weights

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Uses the conditional gradient algorithm to solve the problem proposed in [39].

Parameters
• Xa ((ns,d) array-like, float) – Source samples

• Xb ((nt,d) array-like, float) – Target samples

• a ((ns,) array-like, float) – Source histogram (uniform weights if empty list)

• b ((nt,) array-like, float) – Target histogram (uniform weights if empty list)

• numItermax (int, optional) – Max number of iterations

• numItermaxEmd (int, optional) – Max number of iterations for emd

• stopThr (float, optional) – Stop threshold on the relative variation (>0)

• stopThr2 (float, optional) – Stop threshold on the absolute variation (>0)

• verbose (bool, optional) – Print information along iterations

• log (bool, optional) – record log if True

Returns

• gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters

• log (dict, optional) – If input log is true, a dictionary containing the cost and dual variables and exit status

References

39

Gozlan, N., Roberto, C., Samson, P. M., & Tetali, P. (2017). Kantorovich duality for general transport costs and applications. Journal of Functional Analysis, 273(11), 3327-3405.

ot.bregman.sinkhorn
ot.optim.cg