ot.gromov

Solvers related to Gromov-Wasserstein problems.

ot.gromov.BAPG_fused_gromov_wasserstein(M, C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, alpha=0.5, G0=None, max_iter=1000, tol=1e-09, marginal_loss=False, verbose=False, log=False)[source]

Returns the Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\), estimated using the Bregman Alternated Projected Gradient method.

If marginal_loss=True, the function solves the following Fused Gromov-Wasserstein optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in\mathop{\arg\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else, the function solves an equivalent problem [63, 64], where constant terms depending only on the marginals \(\mathbf{p}\) and \(\mathbf{q}\) are discarded, while assuming that L decomposes as in Proposition 1 in [12]:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in\mathop{\arg\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F - \alpha \langle h_1(\mathbf{C}_1) \mathbf{T} h_2(\mathbf{C_2})^\top , \mathbf{T} \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{M}\): pairwise relation matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

    satisfying \(L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\)

  • \(\alpha\): trade-off parameter

Note

By algorithmic design the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Fused Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • M (array-like, shape (ns, nt)) – Pairwise relation matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • marginal_loss (bool, optional. Default is False.) – Whether to include the constant marginal terms in the objective function.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

Returns:

T – Optimal coupling between the two joint spaces

Return type:

array-like, shape (ns, nt)
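
Examples

A minimal usage sketch (illustrative, not taken from the library's own documentation): random point clouds provide the structure matrices C1, C2 and the cross-domain feature cost M, and the solver returns a coupling of shape (ns, nt).

>>> import numpy as np
>>> from ot.gromov import BAPG_fused_gromov_wasserstein
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(5, 2), rng.randn(4, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)  # (5, 5) source structure
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)  # (4, 4) target structure
>>> M = np.linalg.norm(xs[:, None] - xt[None, :], axis=-1)   # (5, 4) feature cost
>>> T = BAPG_fused_gromov_wasserstein(M, C1, C2, epsilon=0.1, alpha=0.5)
>>> T.shape
(5, 4)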

References

ot.gromov.BAPG_fused_gromov_wasserstein2(M, C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, alpha=0.5, G0=None, max_iter=1000, tol=1e-09, marginal_loss=False, verbose=False, log=False)[source]

Returns the Fused Gromov-Wasserstein loss between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\), estimated using the Bregman Alternated Projected Gradient method.

If marginal_loss=True, the function solves the following Fused Gromov-Wasserstein optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{FGW} = \mathop{\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else, the function solves an equivalent problem [63, 64], where constant terms depending only on the marginals \(\mathbf{p}\) and \(\mathbf{q}\) are discarded, while assuming that L decomposes as in Proposition 1 in [12]:

\[ \begin{align}\begin{aligned}\mathop{\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F - \alpha \langle h_1(\mathbf{C}_1) \mathbf{T} h_2(\mathbf{C_2})^\top , \mathbf{T} \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

    satisfying \(L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\)

  • \(\alpha\): trade-off parameter

Note

By algorithmic design the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Fused Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • marginal_loss (bool, optional. Default is False.) – Whether to include the constant marginal terms in the objective function.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

Returns:

fgw_dist – Fused Gromov-Wasserstein distance

Return type:

float
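
Examples

A minimal sketch of the scalar counterpart (illustrative, not taken from the library's own documentation):

>>> import numpy as np
>>> from ot.gromov import BAPG_fused_gromov_wasserstein2
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(5, 2), rng.randn(4, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> M = np.linalg.norm(xs[:, None] - xt[None, :], axis=-1)
>>> fgw_loss = BAPG_fused_gromov_wasserstein2(M, C1, C2, epsilon=0.1, alpha=0.5)
>>> bool(np.isfinite(fgw_loss))
True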

References

ot.gromov.BAPG_gromov_wasserstein(C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, G0=None, max_iter=1000, tol=1e-09, marginal_loss=False, verbose=False, log=False)[source]

Returns the Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\) estimated using the Bregman Alternated Projected Gradient method.

If marginal_loss=True, the function solves the following Gromov-Wasserstein optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg\min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else, the function solves an equivalent problem [63], where constant terms depending only on the marginals \(\mathbf{p}\) and \(\mathbf{q}\) are discarded, while assuming that L decomposes as in Proposition 1 in [12]:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in\mathop{\arg\min}_\mathbf{T} \quad - \langle h_1(\mathbf{C}_1) \mathbf{T} h_2(\mathbf{C_2})^\top , \mathbf{T} \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

    satisfying \(L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\)

Note

By algorithmic design the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • marginal_loss (bool, optional. Default is False.) – Whether to include the constant marginal terms in the objective function.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

Returns:

T – Optimal coupling between the two spaces

Return type:

array-like, shape (ns, nt)
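
Examples

A minimal usage sketch (illustrative, not taken from the library's own documentation):

>>> import numpy as np
>>> from ot.gromov import BAPG_gromov_wasserstein
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(6, 2), rng.randn(5, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)  # source structure
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)  # target structure
>>> T = BAPG_gromov_wasserstein(C1, C2, epsilon=0.1)
>>> T.shape
(6, 5)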

References

ot.gromov.BAPG_gromov_wasserstein2(C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, G0=None, max_iter=1000, tol=1e-09, marginal_loss=False, verbose=False, log=False)[source]

Returns the Gromov-Wasserstein loss \(\mathbf{GW}\) between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\) estimated using the Bregman Alternated Projected Gradient method.

If marginal_loss=True, the function solves the following Gromov-Wasserstein optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{GW} = \mathop{\min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else, the function solves an equivalent problem [63, 64], where constant terms depending only on the marginals \(\mathbf{p}\) and \(\mathbf{q}\) are discarded, while assuming that L decomposes as in Proposition 1 in [12]:

\[ \begin{align}\begin{aligned}\mathop{\min}_\mathbf{T} \quad - \langle h_1(\mathbf{C}_1) \mathbf{T} h_2(\mathbf{C_2})^\top , \mathbf{T} \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

    satisfying \(L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\)

Note

By algorithmic design the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • marginal_loss (bool, optional. Default is False.) – Whether to include the constant marginal terms in the objective function.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

Returns:

gw_dist – Gromov-Wasserstein distance

Return type:

float
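
Examples

A minimal sketch (illustrative, not taken from the library's own documentation); setting marginal_loss=True keeps the constant marginal terms in the objective:

>>> import numpy as np
>>> from ot.gromov import BAPG_gromov_wasserstein2
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(6, 2), rng.randn(5, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> gw_loss = BAPG_gromov_wasserstein2(C1, C2, epsilon=0.1, marginal_loss=True)
>>> bool(np.isfinite(gw_loss))
True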

References

ot.gromov.GW_distance_estimation(C1, C2, p, q, loss_fun, T, nb_samples_p=None, nb_samples_q=None, std=True, random_state=None)[source]

Returns an approximation of the Gromov-Wasserstein loss between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\) with a fixed transport plan \(\mathbf{T}\). To recover an approximation of the Gromov-Wasserstein distance as defined in [13] compute \(d_{GW} = \frac{1}{2} \sqrt{\mathbf{GW}}\).

The function gives an unbiased approximation of the following equation:

\[\mathbf{GW} = \sum_{i,j,k,l} L(\mathbf{C_{1}}_{i,k}, \mathbf{C_{2}}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\]

Where:

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • L : Loss function to account for the misfit between the similarity matrices

  • \(\mathbf{T}\): Matrix with marginals \(\mathbf{p}\) and \(\mathbf{q}\)

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,)) – Distribution in the source space

  • q (array-like, shape (nt,)) – Distribution in the target space

  • loss_fun (function: \(\mathbb{R} \times \mathbb{R} \mapsto \mathbb{R}\)) – Loss function used for the distance; the transport plan does not depend on the loss function

  • T (csr or array-like, shape (ns, nt)) – Transport plan matrix, either a sparse csr or a dense matrix

  • nb_samples_p (int, optional) – nb_samples_p is the number of samples (without replacement) along the first dimension of \(\mathbf{T}\)

  • nb_samples_q (int, optional) – nb_samples_q is the number of samples along the second dimension of \(\mathbf{T}\), for each sample along the first

  • std (bool, optional) – If True, also return the standard deviation associated with the prediction of the Gromov-Wasserstein cost

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

Gromov-Wasserstein cost

Return type:

float
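
Examples

A minimal sketch (illustrative, not taken from the library's own documentation): estimate the GW cost of the independent coupling \(\mathbf{p} \mathbf{q}^T\) with an elementwise squared loss; std=False requests the cost estimate alone.

>>> import numpy as np
>>> from ot.gromov import GW_distance_estimation
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(20, 2), rng.randn(30, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> p, q = np.full(20, 1 / 20), np.full(30, 1 / 30)
>>> T = np.outer(p, q)  # independent coupling, with marginals p and q
>>> loss = lambda a, b: (a - b) ** 2  # elementwise squared loss
>>> est = GW_distance_estimation(C1, C2, p, q, loss, T, std=False, random_state=0)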

References

ot.gromov.div_between_product(mu, nu, alpha, beta, divergence, nx=None)[source]

Fast computation of the Bregman divergence between two product measures. Only the Kullback-Leibler and half-squared L2 divergences are supported.

For half-squared L2 divergence:

\[\frac{1}{2} || \mu \otimes \nu - \alpha \otimes \beta ||^2 = \frac{1}{2} \Big[ ||\alpha||^2 ||\beta||^2 + ||\mu||^2 ||\nu||^2 - 2 \langle \alpha, \mu \rangle \langle \beta, \nu \rangle \Big]\]

For Kullback-Leibler divergence:

\[KL(\mu \otimes \nu, \alpha \otimes \beta) = m(\mu) * KL(\nu, \beta) + m(\nu) * KL(\mu, \alpha) + (m(\mu) - m(\alpha)) * (m(\nu) - m(\beta))\]

where:

  • \(\mu\) and \(\alpha\) are two measures having the same shape.

  • \(\nu\) and \(\beta\) are two measures having the same shape.

  • \(m\) denotes the mass of the measure

Parameters:
  • mu (array-like) – vector or matrix

  • nu (array-like) – vector or matrix

  • alpha (array-like) – vector or matrix with the same shape as mu

  • beta (array-like) – vector or matrix with the same shape as nu

  • divergence (string, default = "kl") – Bregman divergence, either “kl” (Kullback-Leibler divergence) or “l2” (half-squared L2 divergence)

  • nx (backend, optional) – If left to its default value None, the backend is inferred from the input arrays.

Returns:

Bregman divergence between the two product measures.
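
Examples

A minimal sketch (illustrative, not taken from the library's own documentation): the fast formula agrees with forming the two product measures explicitly.

>>> import numpy as np
>>> from ot.gromov import div_between_product
>>> mu, nu = np.array([0.3, 0.7]), np.array([0.5, 0.2, 0.3])
>>> alpha, beta = np.array([0.4, 0.6]), np.array([0.2, 0.4, 0.4])
>>> d = div_between_product(mu, nu, alpha, beta, divergence="l2")
>>> direct = 0.5 * np.sum((np.outer(mu, nu) - np.outer(alpha, beta)) ** 2)
>>> bool(np.isclose(d, direct))
True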

ot.gromov.div_to_product(pi, a, b, pi1=None, pi2=None, divergence='kl', mass=True, nx=None)[source]

Fast computation of the Bregman divergence between an arbitrary measure and a product measure. Only the Kullback-Leibler and half-squared L2 divergences are supported.

  • For half-squared L2 divergence:

\[\frac{1}{2} || \pi - a \otimes b ||^2 = \frac{1}{2} \Big[ \sum_{i, j} \pi_{ij}^2 + (\sum_i a_i^2) ( \sum_j b_j^2) - 2 \sum_{i, j} a_i \pi_{ij} b_j \Big]\]
  • For Kullback-Leibler divergence:

\[KL(\pi | a \otimes b) = \langle \pi, \log \pi \rangle - \langle \pi_1, \log a \rangle - \langle \pi_2, \log b \rangle - m(\pi) + m(a) m(b)\]

where:

  • \(\pi\) is the (dim_a, dim_b) transport plan

  • \(\pi_1\) and \(\pi_2\) are the marginal distributions

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target unbalanced distributions

  • \(m\) denotes the mass of the measure

Parameters:
  • pi (array-like (dim_a, dim_b)) – Transport plan

  • a (array-like (dim_a,)) – Unnormalized histogram of dimension dim_a

  • b (array-like (dim_b,)) – Unnormalized histogram of dimension dim_b

  • pi1 (array-like (dim_a,), optional (default = None)) – Marginal distribution with respect to the first dimension of the transport plan. Only used in case of Kullback-Leibler divergence.

  • pi2 (array-like (dim_b,), optional (default = None)) – Marginal distribution with respect to the second dimension of the transport plan. Only used in case of Kullback-Leibler divergence.

  • divergence (string, default = "kl") – Bregman divergence, either “kl” (Kullback-Leibler divergence) or “l2” (half-squared L2 divergence)

  • mass (bool, optional. Default is True.) – Only used in case of Kullback-Leibler divergence. If False, calculate the relative entropy. If True, calculate the Kullback-Leibler divergence.

  • nx (backend, optional) – If left to its default value None, the backend is inferred from the input arrays.

Returns:

Bregman divergence between an arbitrary measure and a product measure.
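
Examples

A minimal sketch (illustrative, not taken from the library's own documentation): half-squared L2 divergence between a plan and the product of its prescribed histograms, checked against the direct computation.

>>> import numpy as np
>>> from ot.gromov import div_to_product
>>> a, b = np.array([0.6, 0.4]), np.array([0.5, 0.5])
>>> pi = np.array([[0.4, 0.2], [0.1, 0.3]])
>>> d = div_to_product(pi, a, b, divergence="l2")
>>> direct = 0.5 * np.sum((pi - np.outer(a, b)) ** 2)
>>> bool(np.isclose(d, direct))
True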

ot.gromov.entropic_fused_gromov_barycenters(N, Ys, Cs, ps=None, p=None, lambdas=None, loss_fun='square_loss', epsilon=0.1, symmetric=True, alpha=0.5, max_iter=1000, tol=1e-09, stop_criterion='barycenter', warmstartT=False, verbose=False, log=False, init_C=None, init_Y=None, fixed_structure=False, fixed_features=False, random_state=None, **kwargs)[source]

Returns the Fused Gromov-Wasserstein barycenters of S measurable networks with node features \((\mathbf{C}_s, \mathbf{Y}_s, \mathbf{p}_s)_{1 \leq s \leq S}\) estimated using Fused Gromov-Wasserstein transports from Sinkhorn projections.

The function solves the following optimization problem:

\[\mathbf{C}^*, \mathbf{Y}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}, \mathbf{Y}\in \mathbb{Y}^{N \times d}} \quad \sum_s \lambda_s \mathrm{FGW}_{\alpha}(\mathbf{C}, \mathbf{C}_s, \mathbf{Y}, \mathbf{Y}_s, \mathbf{p}, \mathbf{p}_s)\]

Where:

  • \(\mathbf{Y}_s\): feature matrix

  • \(\mathbf{C}_s\): metric cost matrix

  • \(\mathbf{p}_s\): distribution

Parameters:
  • N (int) – Size of the targeted barycenter

  • Ys (list of array-like, each element has shape (ns,d)) – Features of all samples

  • Cs (list of S array-like of shape (ns,ns)) – Metric cost matrices

  • ps (list of S array-like of shape (ns,), optional) – Sample weights in the S spaces. If left to its default value None, uniform distributions are taken.

  • p (array-like, shape (N,), optional) – Weights in the targeted barycenter. If left to its default value None, the uniform distribution is taken.

  • lambdas (list of float, optional) – List of the S spaces’ weights. If left to its default value None, uniform weights are taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether the structures are to be assumed symmetric or not. Default value is True. If set to True (resp. False), all structures will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • stop_criterion (str, optional. Default is 'barycenter'.) – Stop criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’ uses absolute norm variations of estimated barycenters. Else if set to ‘loss’ uses the relative variations of the loss.

  • warmstartT (bool, optional) – Whether to warmstart the transport plans across the successive fused Gromov-Wasserstein transport problems.

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • init_C (array-like, shape (N, N), optional) – Initial value for the \(\mathbf{C}\) matrix provided by the user. If None, a random initialization is used.

  • init_Y (array-like, shape (N,d), optional) – Initialization for the barycenters’ features. If not set, a random initialization is used.

  • fixed_structure (bool, optional) – Whether to fix the structure of the barycenter during the updates.

  • fixed_features (bool, optional) – Whether to fix the features of the barycenter during the updates.

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

  • **kwargs (dict) – Parameters that can be directly passed to the ot.gromov.entropic_fused_gromov_wasserstein solver.

Returns:

  • Y (array-like, shape (N, d)) – Feature matrix in the barycenter space (permuted arbitrarily)

  • C (array-like, shape (N, N)) – Similarity matrix in the barycenter space (permuted as Y’s rows)

  • log (dict) – Only returned when log=True. It contains the keys:

    • \(\mathbf{T}\): list of (N, ns) transport matrices

    • \(\mathbf{p}\): (N,) barycenter weights

    • \((\mathbf{M}_s)_s\): all distance matrices between the features of the barycenter and the other features \((dist(\mathbf{X}, \mathbf{Y}_s))_s\), of shape (N, ns)

    • values used in convergence evaluation.
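
Examples

A minimal sketch (illustrative, not taken from the library's own documentation): a 3-node barycenter of two random attributed graphs; the (Y, C) return order follows the documentation above.

>>> import numpy as np
>>> from ot.gromov import entropic_fused_gromov_barycenters
>>> rng = np.random.RandomState(0)
>>> Ys = [rng.randn(5, 2), rng.randn(6, 2)]  # node features
>>> Cs = [np.abs(rng.randn(5, 5)), np.abs(rng.randn(6, 6))]
>>> Cs = [(C + C.T) / 2 for C in Cs]  # symmetrize the structures
>>> Y, C = entropic_fused_gromov_barycenters(3, Ys, Cs, epsilon=0.1, max_iter=20, random_state=0)
>>> Y.shape, C.shape
((3, 2), (3, 3))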

References

ot.gromov.entropic_fused_gromov_wasserstein(M, C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, alpha=0.5, G0=None, max_iter=1000, tol=1e-09, solver='PGD', warmstart=False, verbose=False, log=False, **kwargs)[source]

Returns the Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\), estimated using Sinkhorn projections.

If solver=”PGD”, the function solves the following entropic-regularized Fused Gromov-Wasserstein optimization problem using Projected Gradient Descent [12]:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l} - \epsilon H(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else if solver=”PPA”, the function solves the following Fused Gromov-Wasserstein optimization problem using Proximal Point Algorithm [51]:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in\mathop{\arg\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

  • H: entropy

  • \(\alpha\): trade-off parameter

Note

If the inner solver ot.sinkhorn did not converge, the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Fused Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • solver (string, optional) – Solver to use either ‘PGD’ for Projected Gradient Descent or ‘PPA’ for Proximal Point Algorithm. Default value is ‘PGD’.

  • warmstart (bool, optional) – Whether to warmstart the dual potentials across the successive Sinkhorn projections.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

  • **kwargs (dict) – Parameters that can be directly passed to the ot.sinkhorn solver, such as numItermax and stopThr to control its estimation precision; e.g., [51] suggests using numItermax=1.

Returns:

T – Optimal coupling between the two joint spaces

Return type:

array-like, shape (ns, nt)
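
Examples

A minimal sketch (illustrative, not taken from the library's own documentation), using the proximal point variant with a single inner Sinkhorn iteration as suggested in [51]:

>>> import numpy as np
>>> from ot.gromov import entropic_fused_gromov_wasserstein
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(5, 2), rng.randn(4, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> M = np.linalg.norm(xs[:, None] - xt[None, :], axis=-1)
>>> T = entropic_fused_gromov_wasserstein(M, C1, C2, epsilon=0.05, solver='PPA', numItermax=1)
>>> T.shape
(5, 4)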

References

ot.gromov.entropic_fused_gromov_wasserstein2(M, C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, alpha=0.5, G0=None, max_iter=1000, tol=1e-09, solver='PGD', warmstart=False, verbose=False, log=False, **kwargs)[source]

Returns the Fused Gromov-Wasserstein distance between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\), estimated using Sinkhorn projections.

If solver=”PGD”, the function solves the following entropic-regularized Fused Gromov-Wasserstein optimization problem using Projected Gradient Descent [12]:

\[ \begin{align}\begin{aligned}\mathbf{FGW} = \mathop{\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l} - \epsilon H(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else if solver=”PPA”, the function solves the following Fused Gromov-Wasserstein optimization problem using Proximal Point Algorithm [51]:

\[ \begin{align}\begin{aligned}\mathbf{FGW} = \mathop{\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

  • H: entropy

  • \(\alpha\): trade-off parameter

Note

If the inner solver ot.sinkhorn did not converge, the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Fused Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • solver (string, optional) – Solver to use either ‘PGD’ for Projected Gradient Descent or ‘PPA’ for Proximal Point Algorithm. Default value is ‘PGD’.

  • warmstart (bool, optional) – Whether to warmstart the dual potentials across the successive Sinkhorn projections.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

Returns:

fgw_dist – Fused Gromov-Wasserstein distance

Return type:

float
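
Examples

A minimal sketch (illustrative, not taken from the library's own documentation):

>>> import numpy as np
>>> from ot.gromov import entropic_fused_gromov_wasserstein2
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(5, 2), rng.randn(4, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> M = np.linalg.norm(xs[:, None] - xt[None, :], axis=-1)
>>> fgw_dist = entropic_fused_gromov_wasserstein2(M, C1, C2, epsilon=0.05)
>>> bool(np.isfinite(fgw_dist))
True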

References

ot.gromov.entropic_gromov_barycenters(N, Cs, ps=None, p=None, lambdas=None, loss_fun='square_loss', epsilon=0.1, symmetric=True, max_iter=1000, tol=1e-09, stop_criterion='barycenter', warmstartT=False, verbose=False, log=False, init_C=None, random_state=None, **kwargs)[source]

Returns the Gromov-Wasserstein barycenters of S measured similarity matrices \((\mathbf{C}_s)_{1 \leq s \leq S}\) estimated using Gromov-Wasserstein transports from Sinkhorn projections.

The function solves the following optimization problem:

\[\mathbf{C}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \mathrm{GW}(\mathbf{C}, \mathbf{C}_s, \mathbf{p}, \mathbf{p}_s)\]

Where:

  • \(\mathbf{C}_s\): metric cost matrix

  • \(\mathbf{p}_s\): distribution

Parameters:
  • N (int) – Size of the targeted barycenter

  • Cs (list of S array-like of shape (ns,ns)) – Metric cost matrices

  • ps (list of S array-like of shape (ns,), optional) – Sample weights in the S spaces. If left to its default value None, uniform distributions are taken.

  • p (array-like, shape (N,), optional) – Weights in the targeted barycenter. If left to its default value None, the uniform distribution is taken.

  • lambdas (list of float, optional) – List of the S spaces’ weights. If left to its default value None, uniform weights are taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether the structures are to be assumed symmetric or not. Default value is True. If set to True (resp. False), all structures will be assumed symmetric (resp. asymmetric).

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • stop_criterion (str, optional. Default is 'barycenter'.) – Convergence criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’ uses absolute norm variations of estimated barycenters. Else if set to ‘loss’ uses the relative variations of the loss.

  • warmstartT (bool, optional) – Whether to warmstart the transport plans across the successive Gromov-Wasserstein transport problems.

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • init_C (array-like, shape (N, N), optional) – Initial value for the \(\mathbf{C}\) matrix provided by the user. If None, a random initialization is used.

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

  • **kwargs (dict) – Parameters that can be directly passed to the ot.gromov.entropic_gromov_wasserstein solver.

Returns:

  • C (array-like, shape (N, N)) – Similarity matrix in the barycenter space (permuted arbitrarily)

  • log (dict) – Only returned when log=True. It contains the keys:

    • \(\mathbf{T}\): list of (N, ns) transport matrices

    • \(\mathbf{p}\): (N,) barycenter weights

    • values used in convergence evaluation.
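
Examples

A minimal sketch (illustrative, not taken from the library's own documentation): a 3-node structure barycenter of two random symmetric matrices.

>>> import numpy as np
>>> from ot.gromov import entropic_gromov_barycenters
>>> rng = np.random.RandomState(0)
>>> Cs = [np.abs(rng.randn(5, 5)), np.abs(rng.randn(6, 6))]
>>> Cs = [(C + C.T) / 2 for C in Cs]  # symmetrize the structures
>>> C = entropic_gromov_barycenters(3, Cs, epsilon=0.1, max_iter=20, random_state=0)
>>> C.shape
(3, 3)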

References

ot.gromov.entropic_gromov_wasserstein(C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, G0=None, max_iter=1000, tol=1e-09, solver='PGD', warmstart=False, verbose=False, log=False, **kwargs)[source]

Returns the Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\) estimated using Sinkhorn projections.

If solver=”PGD”, the function solves the following entropic-regularized Gromov-Wasserstein optimization problem using Projected Gradient Descent [12]:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg\min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l} - \epsilon H(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else if solver=”PPA”, the function solves the following Gromov-Wasserstein optimization problem using Proximal Point Algorithm [51]:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg\min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

  • H: entropy

Note

If the inner solver ot.sinkhorn did not converge, the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • solver (string, optional) – Solver to use either ‘PGD’ for Projected Gradient Descent or ‘PPA’ for Proximal Point Algorithm. Default value is ‘PGD’.

  • warmstart (bool, optional) – Whether to warmstart the dual potentials across the successive Sinkhorn projections.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

  • **kwargs (dict) – Parameters that can be directly passed to the ot.sinkhorn solver, such as numItermax and stopThr to control its estimation precision; e.g., [51] suggests using numItermax=1.

Returns:

T – Optimal coupling between the two spaces

Return type:

array-like, shape (ns, nt)
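
Examples

A minimal sketch (illustrative, not taken from the library's own documentation):

>>> import numpy as np
>>> from ot.gromov import entropic_gromov_wasserstein
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(6, 2), rng.randn(5, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> T = entropic_gromov_wasserstein(C1, C2, epsilon=0.05)
>>> T.shape
(6, 5)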

References

ot.gromov.entropic_gromov_wasserstein2(C1, C2, p=None, q=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, G0=None, max_iter=1000, tol=1e-09, solver='PGD', warmstart=False, verbose=False, log=False, **kwargs)[source]

Returns the Gromov-Wasserstein loss \(\mathbf{GW}\) between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\) estimated using Sinkhorn projections. To recover the Gromov-Wasserstein distance as defined in [13] compute \(d_{GW} = \frac{1}{2} \sqrt{\mathbf{GW}}\).

If solver=”PGD”, the function solves the following entropic-regularized Gromov-Wasserstein optimization problem using Projected Gradient Descent [12]:

\[ \begin{align}\begin{aligned}\mathbf{GW} = \mathop{\min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l} - \epsilon H(\mathbf{T})\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Else if solver=”PPA”, the function solves the following Gromov-Wasserstein optimization problem using Proximal Point Algorithm [51]:

\[ \begin{align}\begin{aligned}\mathbf{GW} = \mathop{\min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where:

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

  • H: entropy

Note

If the inner solver ot.sinkhorn did not converge, the optimal coupling \(\mathbf{T}\) returned by this function does not necessarily satisfy the marginal constraints \(\mathbf{T}\mathbf{1}=\mathbf{p}\) and \(\mathbf{T}^T\mathbf{1}=\mathbf{q}\). So the returned Gromov-Wasserstein loss does not necessarily satisfy distance properties and may be negative.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (string, optional (default='square_loss')) – Loss function used for the solver: either ‘square_loss’ or ‘kl_loss’

  • epsilon (float, optional) – Regularization term (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 will be used as the initial transport plan. G0 is not required to satisfy the marginal constraints, but we strongly recommend that it does so that the GW distance is correctly estimated.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • solver (string, optional) – Solver to use either ‘PGD’ for Projected Gradient Descent or ‘PPA’ for Proximal Point Algorithm. Default value is ‘PGD’.

  • warmstart (bool, optional) – Whether to warmstart the dual potentials across the successive Sinkhorn projections.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Record log if True.

  • **kwargs (dict) – Parameters that can be directly passed to the ot.sinkhorn solver, such as numItermax and stopThr to control its estimation precision; e.g., [51] suggests using numItermax=1.

Returns:

gw_dist – Gromov-Wasserstein distance

Return type:

float
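
Examples

A minimal sketch (illustrative, not taken from the library's own documentation), recovering the distance \(d_{GW} = \frac{1}{2} \sqrt{\mathbf{GW}}\) defined in [13] from the returned loss:

>>> import numpy as np
>>> from ot.gromov import entropic_gromov_wasserstein2
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(6, 2), rng.randn(5, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> gw_loss = entropic_gromov_wasserstein2(C1, C2, epsilon=0.05)
>>> d_gw = 0.5 * np.sqrt(max(gw_loss, 0))  # guard against small negative losses (see Note above)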

References

ot.gromov.entropic_partial_gromov_wasserstein(C1, C2, p=None, q=None, reg=1.0, m=None, loss_fun='square_loss', G0=None, numItermax=1000, tol=1e-07, symmetric=None, log=False, verbose=False)[source]

Returns the partial Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\)

The function solves the following optimization problem:

\[\gamma = \mathop{\arg \min}_{\gamma} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l})\cdot \gamma_{i,j}\cdot\gamma_{k,l} + \mathrm{reg} \cdot\Omega(\gamma)\]
\[ \begin{align}\begin{aligned}s.t. \ \gamma &\geq 0\\ \gamma \mathbf{1} &\leq \mathbf{p}\\ \gamma^T \mathbf{1} &\leq \mathbf{q}\\ \mathbf{1}^T \gamma^T \mathbf{1} = m &\leq \min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\end{aligned}\end{align} \]

where:

  • \(\mathbf{C_1}\) is the metric cost matrix in the source space

  • \(\mathbf{C_2}\) is the metric cost matrix in the target space

  • \(\mathbf{p}\) and \(\mathbf{q}\) are the sample weights

  • L: loss function to account for the misfit between the similarity matrices

  • \(\Omega\) is the entropic regularization term, \(\Omega=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)

  • m is the amount of mass to be transported

The formulation of the GW problem has been proposed in [12] and the partial GW in [29]

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • reg (float, optional. Default is 1.) – entropic regularization parameter

  • m (float, optional) – Amount of mass to be transported (default: \(\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\))

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • G0 (array-like, shape (ns, nt), optional) – Initialization of the transportation matrix

  • numItermax (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • log (bool, optional) – return log if True

  • verbose (bool, optional) – Print information along iterations

Examples

>>> import numpy as np
>>> import scipy as sp
>>> from ot.gromov import entropic_partial_gromov_wasserstein
>>> a = np.array([0.25] * 4)
>>> b = np.array([0.25] * 4)
>>> x = np.array([1,2,100,200]).reshape((-1,1))
>>> y = np.array([3,2,98,199]).reshape((-1,1))
>>> C1 = sp.spatial.distance.cdist(x, x)
>>> C2 = sp.spatial.distance.cdist(y, y)
>>> np.round(entropic_partial_gromov_wasserstein(C1, C2, a, b, 1e2), 2)
array([[0.12, 0.13, 0.  , 0.  ],
       [0.13, 0.12, 0.  , 0.  ],
       [0.  , 0.  , 0.25, 0.  ],
       [0.  , 0.  , 0.  , 0.25]])
>>> np.round(entropic_partial_gromov_wasserstein(C1, C2, a, b, 1e2,0.25), 2)
array([[0.02, 0.03, 0.  , 0.03],
       [0.03, 0.03, 0.  , 0.03],
       [0.  , 0.  , 0.03, 0.  ],
       [0.02, 0.02, 0.  , 0.03]])
Returns:

  • \(\gamma\) ((dim_a, dim_b) ndarray) – Optimal transportation matrix for the given parameters

  • log (dict) – log dictionary returned only if log is True

References

See also

ot.partial.partial_gromov_wasserstein

exact Partial Gromov-Wasserstein

ot.gromov.entropic_partial_gromov_wasserstein2(C1, C2, p=None, q=None, reg=1.0, m=None, loss_fun='square_loss', G0=None, numItermax=1000, tol=1e-07, symmetric=None, log=False, verbose=False)[source]

Returns the partial Gromov-Wasserstein discrepancy between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\)

The function solves the following optimization problem:

\[PGW = \min_{\gamma} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l})\cdot \gamma_{i,j}\cdot\gamma_{k,l} + \mathrm{reg} \cdot\Omega(\gamma)\]
\[ \begin{align}\begin{aligned}s.t. \ \gamma &\geq 0\\ \gamma \mathbf{1} &\leq \mathbf{p}\\ \gamma^T \mathbf{1} &\leq \mathbf{q}\\ \mathbf{1}^T \gamma^T \mathbf{1} = m &\leq \min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\end{aligned}\end{align} \]

where:

  • \(\mathbf{C_1}\) is the metric cost matrix in the source space

  • \(\mathbf{C_2}\) is the metric cost matrix in the target space

  • \(\mathbf{p}\) and \(\mathbf{q}\) are the sample weights

  • L: Loss function to account for the misfit between the similarity matrices.

  • \(\Omega\) is the entropic regularization term, \(\Omega=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})\)

  • m is the amount of mass to be transported

The formulation of the GW problem has been proposed in [12] and the partial GW in [29]

Parameters:
  • C1 (ndarray, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (ndarray, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If left to its default value None, the uniform distribution is taken.

  • reg (float) – entropic regularization parameter

  • m (float, optional) – Amount of mass to be transported (default: \(\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\))

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • G0 (ndarray, shape (ns, nt), optional) – Initialization of the transportation matrix

  • numItermax (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error (>0)

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • log (bool, optional) – return log if True

  • verbose (bool, optional) – Print information along iterations

Returns:

  • partial_gw_dist (float) – Partial Gromov-Wasserstein distance

  • log (dict) – log dictionary returned only if log is True

Examples

>>> import numpy as np
>>> import scipy as sp
>>> from ot.gromov import entropic_partial_gromov_wasserstein2
>>> a = np.array([0.25] * 4)
>>> b = np.array([0.25] * 4)
>>> x = np.array([1,2,100,200]).reshape((-1,1))
>>> y = np.array([3,2,98,199]).reshape((-1,1))
>>> C1 = sp.spatial.distance.cdist(x, x)
>>> C2 = sp.spatial.distance.cdist(y, y)
>>> np.round(entropic_partial_gromov_wasserstein2(C1, C2, a, b, 1e2), 2)
3.75

References

ot.gromov.entropic_semirelaxed_fused_gromov_wasserstein(M, C1, C2, p=None, loss_fun='square_loss', symmetric=None, epsilon=0.1, alpha=0.5, G0=None, max_iter=10000.0, tol=1e-09, log=False, verbose=False, random_state=0)[source]

Computes the entropic-regularized semi-relaxed FGW transport between two graphs (see [48]) estimated using a Mirror Descent algorithm following the KL geometry.

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_{\mathbf{T}} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

where:

  • \(\mathbf{M}\) is the (ns, nt) metric cost matrix between features

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): source weights (sum to 1)

  • L is a loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all the steps of the solver are differentiable.

The algorithm used for solving the problem is mirror descent, as discussed in [48]

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • epsilon (float) – Regularization term >0

  • symmetric (bool, optional) – Whether C1 and C2 are to be assumed symmetric or not. If left to its default None value, a symmetry test will be conducted. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None, the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor, it must satisfy the marginal constraints and will be used as the initial transport plan of the solver. If G0 is a string, it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error computed on transport plans

  • log (bool, optional) – record log if True

  • verbose (bool, optional) – Print information along iterations

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

Returns:

  • G (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters.

  • log (dict) – Log dictionary returned only if log is True.
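
Examples

A minimal sketch (illustrative, not taken from the library's own documentation); only the source marginal is constrained, so the column sums of G are free.

>>> import numpy as np
>>> from ot.gromov import entropic_semirelaxed_fused_gromov_wasserstein
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.randn(5, 2), rng.randn(4, 2)
>>> C1 = np.linalg.norm(xs[:, None] - xs[None, :], axis=-1)
>>> C2 = np.linalg.norm(xt[:, None] - xt[None, :], axis=-1)
>>> M = np.linalg.norm(xs[:, None] - xt[None, :], axis=-1)
>>> G = entropic_semirelaxed_fused_gromov_wasserstein(M, C1, C2, epsilon=0.1)
>>> bool(np.allclose(G.sum(axis=1), 1 / 5))  # row marginals equal p
True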

References

ot.gromov.entropic_semirelaxed_fused_gromov_wasserstein2(M, C1, C2, p=None, loss_fun='square_loss', symmetric=None, epsilon=0.1, alpha=0.5, G0=None, max_iter=10000.0, tol=1e-09, log=False, verbose=False, random_state=0)[source]

Computes the entropic-regularized semi-relaxed FGW divergence between two graphs (see [48]) estimated using a Mirror Descent algorithm following the KL geometry.

\[ \begin{align}\begin{aligned}\mathbf{srFGW}_{\alpha} = \min_{\mathbf{T}} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

where:

  • \(\mathbf{M}\) is the (ns, nt) metric cost matrix between features

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): source weights (sum to 1)

  • L is a loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all the steps of the solver are differentiable.

The algorithm used for solving the problem is mirror descent, as discussed in [48]

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space.

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If left to its default value None, the uniform distribution is taken.

  • loss_fun (str, optional) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • epsilon (float) – Regularization term >0

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor it must satisfy marginal constraints and will be used as initial transport of the solver. If G0 is a string it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error computed on transport plans

  • log (bool, optional) – record log if True

  • verbose (bool, optional) – Print information along iterations

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

Returns:

  • srfgw-divergence (float) – Semi-relaxed Fused Gromov-Wasserstein divergence for the given parameters.

  • log (dict) – Log dictionary return only if log==True in parameters.
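
Example

A minimal usage sketch (illustrative only, not from the original reference; small random data and default solver settings):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> X1, X2 = rng.randn(5, 3), rng.randn(4, 3)
>>> C1, C2, M = ot.dist(X1), ot.dist(X2), ot.dist(X1, X2)
>>> srfgw, log = ot.gromov.entropic_semirelaxed_fused_gromov_wasserstein2(
...     M, C1, C2, epsilon=0.1, alpha=0.5, log=True)
>>> float(srfgw) >= 0
True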

References

ot.gromov.entropic_semirelaxed_gromov_wasserstein(C1, C2, p=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, G0=None, max_iter=10000.0, tol=1e-09, log=False, verbose=False, random_state=0)[source]

Returns the entropic-regularized semi-relaxed Gromov-Wasserstein transport plan from \((\mathbf{C_1}, \mathbf{p})\) to \(\mathbf{C_2}\), estimated using a Mirror Descent algorithm following the KL geometry.

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • L: loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all the steps of the solver are differentiable.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • epsilon (float) – Regularization term >0

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor it must satisfy marginal constraints and will be used as initial transport of the solver. If G0 is a string it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error computed on transport plans

  • log (bool, optional) – record log if True

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

Returns:

  • G (array-like, shape (ns, nt)) –

    Coupling between the two spaces that minimizes:

    \(\sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\)

  • log (dict) – Convergence information and loss.
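
Example

A minimal usage sketch (illustrative only; random structure matrices, and only the source marginal is constrained by the semi-relaxed formulation):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C1, C2 = ot.dist(rng.randn(5, 2)), ot.dist(rng.randn(4, 2))
>>> T = ot.gromov.entropic_semirelaxed_gromov_wasserstein(C1, C2, epsilon=0.1)
>>> T.shape  # rows sum to p; the column marginal is left free
(5, 4)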

References

ot.gromov.entropic_semirelaxed_gromov_wasserstein2(C1, C2, p=None, loss_fun='square_loss', epsilon=0.1, symmetric=None, G0=None, max_iter=10000.0, tol=1e-09, log=False, verbose=False, random_state=0, **kwargs)[source]

Returns the entropic-regularized semi-relaxed Gromov-Wasserstein divergence from \((\mathbf{C_1}, \mathbf{p})\) to \(\mathbf{C_2}\) estimated using a Mirror Descent algorithm following the KL geometry.

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{srGW} = \min_{\mathbf{T}} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • L: loss function to account for the misfit between the similarity matrices

Note that when using backends, this loss function is differentiable with respect to the matrices (C1, C2) but not yet with respect to the weights p.

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all the steps of the solver are differentiable.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • epsilon (float) – Regularization term >0

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor it must satisfy marginal constraints and will be used as initial transport of the solver. If G0 is a string it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on error computed on transport plans

  • log (bool, optional) – record log if True

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

Returns:

  • srgw (float) – Semi-relaxed Gromov-Wasserstein divergence

  • log (dict) – Convergence information and coupling matrix.
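
Example

A minimal usage sketch (illustrative only; the function returns a scalar divergence):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C1, C2 = ot.dist(rng.randn(5, 2)), ot.dist(rng.randn(4, 2))
>>> srgw = ot.gromov.entropic_semirelaxed_gromov_wasserstein2(C1, C2, epsilon=0.1)
>>> float(srgw) >= 0
True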

References

ot.gromov.fgw_barycenters(N, Ys, Cs, ps=None, lambdas=None, alpha=0.5, fixed_structure=False, fixed_features=False, p=None, loss_fun='square_loss', armijo=False, symmetric=True, max_iter=100, tol=1e-09, stop_criterion='barycenter', warmstartT=False, verbose=False, log=False, init_C=None, init_X=None, random_state=None, **kwargs)[source]

Returns the Fused Gromov-Wasserstein barycenters of S measurable networks with node features \((\mathbf{C}_s, \mathbf{Y}_s, \mathbf{p}_s)_{1 \leq s \leq S}\) (see eq (5) in [24]), estimated using Fused Gromov-Wasserstein transports from Conditional Gradient solvers.

The function solves the following optimization problem:

\[\mathbf{C}^*, \mathbf{Y}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}, \mathbf{Y}\in \mathbb{R}^{N \times d}} \quad \sum_s \lambda_s \mathrm{FGW}_{\alpha}(\mathbf{C}, \mathbf{C}_s, \mathbf{Y}, \mathbf{Y}_s, \mathbf{p}, \mathbf{p}_s)\]

Where :

  • \(\mathbf{Y}_s\): feature matrix

  • \(\mathbf{C}_s\): metric cost matrix

  • \(\mathbf{p}_s\): distribution

Parameters:
  • N (int) – Desired number of samples of the target barycenter

  • Ys (list of array-like, each element has shape (ns,d)) – Features of all samples

  • Cs (list of array-like, each element has shape (ns,ns)) – Structure matrices of all samples

  • ps (list of array-like, each element has shape (ns,), optional) – Masses of all samples. If let to its default value None, uniform distributions are taken.

  • lambdas (list of float, optional) – List of the S spaces’ weights. If let to its default value None, uniform weights are taken.

  • alpha (float, optional) – Alpha parameter for the fgw distance.

  • fixed_structure (bool, optional) – Whether to fix the structure of the barycenter during the updates.

  • fixed_features (bool, optional) – Whether to fix the feature of the barycenter during the updates

  • p (array-like, shape (N,), optional) – Weights in the targeted barycenter. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’

  • armijo (bool, optional) – If True the step of the line-search is found via an Armijo search. Else closed form is used. If there are convergence issues use False.

  • symmetric (bool, optional) – Whether the structures are assumed symmetric or not. Default value is True. If set to True (resp. False), all structure matrices will be assumed symmetric (resp. asymmetric).

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on relative error (>0)

  • stop_criterion (str, optional. Default is 'barycenter'.) – Stop criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’ uses absolute norm variations of estimated barycenters. Else if set to ‘loss’ uses the relative variations of the loss.

  • warmstartT (bool, optional) – Whether to perform warmstart of transport plans in the successive fused Gromov-Wasserstein transport problems.

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • init_C (array-like, shape (N,N), optional) – Initialization for the barycenters’ structure matrix. If not set a random init is used.

  • init_X (array-like, shape (N,d), optional) – Initialization for the barycenters’ features. If not set a random init is used.

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

  • X (array-like, shape (N, d)) – Barycenters’ features

  • C (array-like, shape (N, N)) – Barycenters’ structure matrix

  • log (dict) – Only returned when log=True. It contains the keys:

    • \(\mathbf{T}\): list of (N, ns) transport matrices

    • \(\mathbf{p}\): (N,) barycenter weights

    • \((\mathbf{M}_s)_s\): all distance matrices between the feature of the barycenter and the other features \((dist(\mathbf{X}, \mathbf{Y}_s))_s\) shape (N, ns)

    • values used in convergence evaluation.

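Example

A minimal usage sketch (illustrative only; three small random attributed graphs, a barycenter of size N=3, and a reduced max_iter):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> Ys = [rng.randn(ns, 2) for ns in (4, 5, 6)]   # node features
>>> Cs = [ot.dist(Y) for Y in Ys]                 # structure matrices
>>> X, C = ot.gromov.fgw_barycenters(3, Ys, Cs, alpha=0.5, max_iter=20, random_state=0)
>>> X.shape, C.shape
((3, 2), (3, 3))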

References

ot.gromov.format_partitioned_graph(C, p, part, rep_indices, F=None, M=None, alpha=1.0, nx=None)[source]

Format an attributed graph \((\mathbf{C}, \mathbf{F}, \mathbf{p})\) with structure matrix \(\mathbf{C} \in \mathbb{R}^{n \times n}\), feature matrix \(\mathbf{F} \in \mathbb{R}^{n \times d}\) and node relative importance \(\mathbf{p} \in \Sigma_n\), into a partitioned attributed graph taking into account partitions and representants \(\mathcal{P} = \left\{ (\mathbf{P_{i}}, \mathbf{r_{i}}) \right\}_i\).

Parameters:
  • C (array-like, shape (n, n)) – Structure matrix.

  • p (array-like, shape (n,),) – Node distribution.

  • part (array-like, shape (n,)) – Array of partition assignment for each node.

  • rep_indices (list of array-like of ints, shape (npart,)) – indices for representative node of each partition sorted according to partition identifiers.

  • F (array-like, shape (n, d), optional. (Default is None)) – Optional feature matrix aligned with the graph structure.

  • M (array-like, shape (n, n), optional. (Default is None)) – Optional pairwise similarity matrix between features.

  • alpha (float, optional. Default is 1.) – Trade-off parameter in \(]0, 1]\) between structure and features. If alpha = 1 features are ignored. This trade-off is taken into account into the outputted relations between nodes and representants.

  • nx (backend, optional) – POT backend

Returns:

  • CR (array-like, shape (npart, npart)) – Structure matrix between partition representants.

  • list_R (list of npart arrays,) – List of relations between a representant and nodes in its partition, for each partition.

  • list_p (list of npart arrays,) – List of node distributions within each partition.

  • FR (array-like, shape (npart, d), if F != None.) – Feature matrix of representants.
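
Example

A minimal usage sketch (illustrative only) combining this function with the partition and representant helpers documented further below; the ‘random’ methods are used to avoid optional dependencies, and the three-output unpacking assumes F is left to None:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C = ot.dist(rng.randn(6, 2))   # structure matrix
>>> p = ot.unif(6)                 # node distribution
>>> part = ot.gromov.get_graph_partition(C, npart=2, part_method='random', random_state=0)
>>> rep = ot.gromov.get_graph_representants(C, part, rep_method='random', random_state=0)
>>> CR, list_R, list_p = ot.gromov.format_partitioned_graph(C, p, part, rep)
>>> CR.shape
(2, 2)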

References

ot.gromov.format_partitioned_samples(X, p, part, rep_indices, F=None, alpha=1.0, nx=None)[source]

Format an attributed graph \((\mathbf{D}(\mathbf{X}), \mathbf{F}, \mathbf{p})\) with euclidean structure matrix \(\mathbf{D}(\mathbf{X}) \in \mathbb{R}^{n \times n}\), feature matrix \(\mathbf{F} \in \mathbb{R}^{n \times d}\) and node relative importance \(\mathbf{p} \in \Sigma_n\), into a partitioned attributed graph taking into account partitions and representants \(\mathcal{P} = \left\{ (\mathbf{P_{i}}, \mathbf{r_{i}}) \right\}_i\).

Parameters:
  • X (array-like, shape (n, d)) – Sample matrix endowed with an euclidean geometry.

  • p (array-like, shape (n,),) – Node distribution.

  • part (array-like, shape (n,)) – Array of partition assignment for each node.

  • rep_indices (list of array-like of ints, shape (npart,)) – indices for representative node of each partition sorted according to partition identifiers.

  • F (array-like, shape (n, d), optional. (Default is None)) – Optional feature matrix aligned with the samples.

  • alpha (float, optional. Default is 1.) – Trade-off parameter in \(]0, 1]\) between structure and features. If alpha = 1 features are ignored. This trade-off is taken into account into the outputted relations between nodes and representants.

  • nx (backend, optional) – POT backend

Returns:

  • CR (array-like, shape (npart, npart)) – Structure matrix between partition representants.

  • list_R (list of npart arrays,) – List of relations between a representant and nodes in its partition, for each partition.

  • list_p (list of npart arrays,) – List of node distributions within each partition.

  • FR (array-like, shape (npart, d), if F != None.) – Feature matrix of representants.
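
Example

A minimal usage sketch (illustrative only), using the sample-based partition helper documented further below with the ‘random’ method; the three-output unpacking assumes F is left to None:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(6, 2)
>>> part, rep = ot.gromov.get_partition_and_representants_samples(X, npart=2, method='random', random_state=0)
>>> CR, list_R, list_p = ot.gromov.format_partitioned_samples(X, ot.unif(6), part, rep)
>>> CR.shape
(2, 2)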

References

ot.gromov.fused_gromov_wasserstein(M, C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, alpha=0.5, armijo=False, G0=None, log=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\) (see [24]).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in\mathop{\arg\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

  • \(\alpha\): trade-off parameter

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{M}\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – Loss function used for the solver

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • armijo (bool, optional) – If True the step of the line-search is found via an Armijo search. Else closed form is used. If there are convergence issues use False.

  • G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • log (bool, optional) – record log if True

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • T (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters.

  • log (dict) – Log dictionary return only if log==True in parameters.
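
Example

A minimal usage sketch (illustrative only; small random point clouds and uniform weights, checking that the returned plan satisfies both marginal constraints):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> X1, X2 = rng.randn(5, 3), rng.randn(4, 3)
>>> C1, C2, M = ot.dist(X1), ot.dist(X2), ot.dist(X1, X2)
>>> p, q = ot.unif(5), ot.unif(4)
>>> T = ot.gromov.fused_gromov_wasserstein(M, C1, C2, p, q, alpha=0.5)
>>> bool(np.allclose(T.sum(axis=1), p) and np.allclose(T.sum(axis=0), q))
True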

References

ot.gromov.fused_gromov_wasserstein2(M, C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, alpha=0.5, armijo=False, G0=None, log=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Fused Gromov-Wasserstein distance between \((\mathbf{C_1}, \mathbf{Y_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{Y_2}, \mathbf{q})\) with pairwise distance matrix \(\mathbf{M}\) between node feature matrices \(\mathbf{Y_1}\) and \(\mathbf{Y_2}\) (see [24]).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{FGW} = \mathop{\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity and feature matrices

  • \(\alpha\): trade-off parameter

Note that when using backends, this loss function is differentiable with respect to the matrices (C1, C2, M) and the weights (p, q) for the quadratic loss, using the gradients from [38].

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{M}\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space.

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – Loss function used for the solver.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • armijo (bool, optional) – If True the step of the line-search is found via an Armijo search. Else closed form is used. If there are convergence issues use False.

  • G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • log (bool, optional) – Record log if True.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • **kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.

Returns:

  • fgw-distance (float) – Fused Gromov-Wasserstein distance for the given parameters.

  • log (dict) – Log dictionary return only if log==True in parameters.
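
Example

A minimal usage sketch (illustrative only; returns the scalar FGW loss):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> X1, X2 = rng.randn(5, 3), rng.randn(4, 3)
>>> C1, C2, M = ot.dist(X1), ot.dist(X2), ot.dist(X1, X2)
>>> fgw = ot.gromov.fused_gromov_wasserstein2(M, C1, C2, alpha=0.5)
>>> float(fgw) >= 0
True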

References

ot.gromov.fused_gromov_wasserstein_dictionary_learning(Cs, Ys, D, nt, alpha, reg=0.0, ps=None, q=None, epochs=20, batch_size=32, learning_rate_C=1.0, learning_rate_Y=1.0, Cdict_init=None, Ydict_init=None, projection='nonnegative_symmetric', use_log=False, tol_outer=1e-05, tol_inner=1e-05, max_iter_outer=20, max_iter_inner=200, use_adam_optimizer=True, verbose=False, random_state=None, **kwargs)[source]

Infer Fused Gromov-Wasserstein linear dictionary \(\{ (\mathbf{C_{dict}[d]}, \mathbf{Y_{dict}[d]}, \mathbf{q}) \}_{d \in [D]}\) from the list of S attributed structures \(\{ (\mathbf{C_s}, \mathbf{Y_s},\mathbf{p_s}) \}_s\)

\[\begin{split}\min_{\mathbf{C_{dict}},\mathbf{Y_{dict}}, \{\mathbf{w_s}\}_{s}} \sum_{s=1}^S FGW_{2,\alpha}(\mathbf{C_s}, \mathbf{Y_s}, \sum_{d=1}^D w_{s,d}\mathbf{C_{dict}[d]},\sum_{d=1}^D w_{s,d}\mathbf{Y_{dict}[d]}, \mathbf{p_s}, \mathbf{q}) \\ - reg\| \mathbf{w_s} \|_2^2\end{split}\]

Such that \(\forall s \leq S\) :

  • \(\mathbf{w_s}^\top \mathbf{1}_D = 1\)

  • \(\mathbf{w_s} \geq \mathbf{0}_D\)

Where :

  • \(\forall s \leq S, \mathbf{C_s}\) is a (ns,ns) pairwise similarity matrix of variable size ns.

  • \(\forall s \leq S, \mathbf{Y_s}\) is a (ns,d) features matrix of variable size ns and fixed dimension d.

  • \(\mathbf{C_{dict}}\) is a (D, nt, nt) tensor of D pairwise similarity matrix of fixed size nt.

  • \(\mathbf{Y_{dict}}\) is a (D, nt, d) tensor of D features matrix of fixed size nt and fixed dimension d.

  • \(\forall s \leq S, \mathbf{p_s}\) is the source distribution corresponding to \(\mathbf{C_s}\)

  • \(\mathbf{q}\) is the target distribution assigned to every structure in the embedding space.

  • \(\alpha\) is the trade-off parameter of Fused Gromov-Wasserstein

  • reg is the regularization coefficient.

The stochastic algorithm used for estimating the attributed graph dictionary atoms is the one proposed in [38].

Parameters:
  • Cs (list of S symmetric array-like, shape (ns, ns)) – List of Metric/Graph cost matrices of variable size (ns,ns).

  • Ys (list of S array-like, shape (ns, d)) – List of feature matrix of variable size (ns,d) with d fixed.

  • D (int) – Number of dictionary atoms to learn

  • nt (int) – Number of samples within each dictionary atoms

  • alpha (float) – Trade-off parameter of Fused Gromov-Wasserstein

  • reg (float, optional) – Coefficient of the negative quadratic regularization used to promote sparsity of w. The default is 0.

  • ps (list of S array-like, shape (ns,), optional) – Distribution in each source space C of Cs. Default is None and corresponds to uniform distributions.

  • q (array-like, shape (nt,), optional) – Distribution in the embedding space whose structure will be learned. Default is None and corresponds to uniform distributions.

  • epochs (int, optional) – Number of epochs used to learn the dictionary. Default is 20.

  • batch_size (int, optional) – Batch size for each stochastic gradient update of the dictionary. Set to the dataset size if the provided batch_size is higher than the dataset size. Default is 32.

  • learning_rate_C (float, optional) – Learning rate used for the stochastic gradient descent on Cdict. Default is 1.

  • learning_rate_Y (float, optional) – Learning rate used for the stochastic gradient descent on Ydict. Default is 1.

  • Cdict_init (list of D array-like with shape (nt, nt), optional) – Used to initialize the dictionary structures Cdict. If set to None (Default), the dictionary will be initialized randomly. Otherwise Cdict must have shape (D, nt, nt), i.e. match the provided shapes.

  • Ydict_init (list of D array-like with shape (nt, d), optional) – Used to initialize the dictionary features Ydict. If set to None, the dictionary features will be initialized randomly. Otherwise Ydict must have shape (D, nt, d), where d is the feature dimension of the inputs Ys, and also match the provided shapes.

  • projection (str, optional) – If ‘nonnegative’ and/or ‘symmetric’ is in projection, the corresponding projection will be performed at each stochastic update of the dictionary. Otherwise the set of atoms is \(\mathbb{R}^{nt \times nt}\). Default is ‘nonnegative_symmetric’.

  • use_log (bool, optional) – If set to True, losses evolution by batches and epochs are tracked. Default is False.

  • use_adam_optimizer (bool, optional) – If set to True, adam optimizer with default settings is used as adaptative learning rate strategy. Else perform SGD with fixed learning rate. Default is True.

  • tol_outer (float, optional) – Solver precision for the BCD algorithm, measured by absolute relative error on consecutive losses. Default is \(10^{-5}\).

  • tol_inner (float, optional) – Solver precision for the Conjugate Gradient algorithm used to get optimal w at a fixed transport, measured by absolute relative error on consecutive losses. Default is \(10^{-5}\).

  • max_iter_outer (int, optional) – Maximum number of iterations for the BCD. Default is 20.

  • max_iter_inner (int, optional) – Maximum number of iterations for the Conjugate Gradient. Default is 200.

  • verbose (bool, optional) – Print the reconstruction loss every epoch. Default is False.

  • random_state (int, RandomState instance or None, default=None) – Determines random number generation. Pass an int for reproducible output across multiple function calls.

Returns:

  • Cdict_best_state (D array-like, shape (D,nt,nt)) – Metric/Graph cost matrices composing the dictionary. The dictionary leading to the best loss over an epoch is saved and returned.

  • Ydict_best_state (D array-like, shape (D,nt,d)) – Feature matrices composing the dictionary. The dictionary leading to the best loss over an epoch is saved and returned.

  • log (dict) – If use_log is True, contains loss evolutions by batches and epochs.
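
Example

A minimal usage sketch (illustrative only; a tiny dataset, two atoms of size nt=3, and very few epochs so it runs fast, whereas real use would need many more; the two-output unpacking assumes use_log is left to False):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> Ys = [rng.randn(ns, 2) for ns in (4, 5, 6)]
>>> Cs = [ot.dist(Y) for Y in Ys]
>>> Cdict, Ydict = ot.gromov.fused_gromov_wasserstein_dictionary_learning(
...     Cs, Ys, D=2, nt=3, alpha=0.5, epochs=2, batch_size=2, random_state=0)
>>> Cdict.shape, Ydict.shape
((2, 3, 3), (2, 3, 2))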

References

ot.gromov.fused_gromov_wasserstein_linear_unmixing(C, Y, Cdict, Ydict, alpha, reg=0.0, p=None, q=None, tol_outer=1e-05, tol_inner=1e-05, max_iter_outer=20, max_iter_inner=200, symmetric=True, **kwargs)[source]

Returns the Fused Gromov-Wasserstein linear unmixing of \((\mathbf{C},\mathbf{Y},\mathbf{p})\) onto the attributed dictionary atoms \(\{ (\mathbf{C_{dict}[d]},\mathbf{Y_{dict}[d]}, \mathbf{q}) \}_{d \in [D]}\)

\[\min_{\mathbf{w}} FGW_{2,\alpha}(\mathbf{C},\mathbf{Y}, \sum_{d=1}^D w_d\mathbf{C_{dict}[d]},\sum_{d=1}^D w_d\mathbf{Y_{dict}[d]}, \mathbf{p}, \mathbf{q}) - reg \| \mathbf{w} \|_2^2\]

such that:

  • \(\mathbf{w}^\top \mathbf{1}_D = 1\)

  • \(\mathbf{w} \geq \mathbf{0}_D\)

Where :

  • \(\mathbf{C}\) is a (ns,ns) pairwise similarity matrix of variable size ns.

  • \(\mathbf{Y}\) is a (ns,d) features matrix of variable size ns and fixed dimension d.

  • \(\mathbf{C_{dict}}\) is a (D, nt, nt) tensor of D pairwise similarity matrix of fixed size nt.

  • \(\mathbf{Y_{dict}}\) is a (D, nt, d) tensor of D features matrix of fixed size nt and fixed dimension d.

  • \(\mathbf{p}\) is the source distribution corresponding to \(\mathbf{C}\)

  • \(\mathbf{q}\) is the target distribution assigned to every structure in the embedding space.

  • \(\alpha\) is the trade-off parameter of Fused Gromov-Wasserstein

  • reg is the regularization coefficient.

The algorithm used for solving the problem is a Block Coordinate Descent as discussed in [38], algorithm 6.

Parameters:
  • C (array-like, shape (ns, ns)) – Metric/Graph cost matrix.

  • Y (array-like, shape (ns, d)) – Feature matrix.

  • Cdict (D array-like, shape (D,nt,nt)) – Metric/Graph cost matrices composing the dictionary on which to embed (C,Y).

  • Ydict (D array-like, shape (D,nt,d)) – Feature matrices composing the dictionary on which to embed (C,Y).

  • alpha (float,) – Trade-off parameter of Fused Gromov-Wasserstein.

  • reg (float, optional) – Coefficient of the negative quadratic regularization used to promote sparsity of w. The default is 0.

  • p (array-like, shape (ns,), optional) – Distribution in the source space C. Default is None and corresponds to uniform distribution.

  • q (array-like, shape (nt,), optional) – Distribution in the space depicted by the dictionary. Default is None and corresponds to uniform distribution.

  • tol_outer (float, optional) – Solver precision for the BCD algorithm. Default is \(10^{-5}\).

  • tol_inner (float, optional) – Solver precision for the Conjugate Gradient algorithm used to get optimal w at a fixed transport. Default is \(10^{-5}\).

  • max_iter_outer (int, optional) – Maximum number of iterations for the BCD. Default is 20.

  • max_iter_inner (int, optional) – Maximum number of iterations for the Conjugate Gradient. Default is 200.

Returns:

  • w (array-like, shape (D,)) – fused Gromov-Wasserstein linear unmixing of (C,Y,p) onto the span of the dictionary.

  • Cembedded (array-like, shape (nt,nt)) – embedded structure of \((\mathbf{C},\mathbf{Y}, \mathbf{p})\) onto the dictionary, \(\sum_d w_d\mathbf{C_{dict}[d]}\).

  • Yembedded (array-like, shape (nt,d)) – embedded features of \((\mathbf{C},\mathbf{Y}, \mathbf{p})\) onto the dictionary, \(\sum_d w_d\mathbf{Y_{dict}[d]}\).

  • T (array-like (ns,nt)) – Fused Gromov-Wasserstein transport plan between \((\mathbf{C},\mathbf{p})\) and \((\sum_d w_d\mathbf{C_{dict}[d]}, \sum_d w_d\mathbf{Y_{dict}[d]},\mathbf{q})\).

  • current_loss (float) – reconstruction error
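
Example

A minimal usage sketch (illustrative only; the dictionary here is random, whereas in practice it would come from fused_gromov_wasserstein_dictionary_learning):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C, Y = ot.dist(rng.randn(4, 2)), rng.randn(4, 2)
>>> Cdict = rng.rand(2, 3, 3)
>>> Cdict = 0.5 * (Cdict + Cdict.transpose(0, 2, 1))  # symmetrize the atoms
>>> Ydict = rng.randn(2, 3, 2)
>>> w, Cemb, Yemb, T, loss = ot.gromov.fused_gromov_wasserstein_linear_unmixing(
...     C, Y, Cdict, Ydict, alpha=0.5)
>>> w.shape
(2,)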

References

ot.gromov.fused_unbalanced_across_spaces_cost(M_linear, data, tuple_pxy_samp, tuple_pxy_feat, pi_samp, pi_feat, hyperparams, divergence, reg_type, nx)[source]

Return the fused unbalanced across-space divergence between two spaces

Parameters:
  • M_linear (tuple of arrays) – Pair of cost matrices corresponding to the Wasserstein terms w.r.t sample and feature couplings

  • data (tuple of arrays) – Tuple of input spaces represented as matrices

  • tuple_pxy_samp (tuple of arrays) – Tuple of reference measures in the marginal-relaxation and regularization terms w.r.t the sample coupling

  • tuple_pxy_feat (tuple of arrays) – Tuple of reference measures in the marginal-relaxation and regularization terms w.r.t the feature coupling

  • pi_samp (array-like) – Sample coupling

  • pi_feat (array-like) – Feature coupling

  • hyperparams (tuple of floats) – Hyperparameters of marginal-relaxation and regularization terms in the fused unbalanced across-domain divergence

  • divergence (string, default = "kl") – Bregman divergence, either “kl” (Kullback-Leibler divergence) or “l2” (half-squared L2 divergence)

  • reg_type (string,) –

    Type of regularization term in the fused unbalanced across-domain divergence

    • reg_type = “joint” corresponds to FUGW

    • reg_type = “independent” corresponds to UCOOT

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Return type:

Fused unbalanced across-space divergence between two spaces

ot.gromov.fused_unbalanced_across_spaces_divergence(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_feat=None, reg_marginals=10, epsilon=0, reg_type='joint', divergence='kl', unbalanced_solver='sinkhorn', alpha=0, M_samp=None, M_feat=None, rescale_plan=True, init_pi=None, init_duals=None, max_iter=100, tol=1e-07, max_iter_ot=500, tol_ot=1e-07, log=False, verbose=False, **kwargs_solver)[source]

Compute the fused unbalanced cross-spaces divergence between two matrices equipped with the distributions on rows and columns. We consider two types of matrices:

  • (Squared) similarity matrix in the Gromov-Wasserstein setting, whose rows and columns both represent the samples.

  • Arbitrary-size matrix in the Co-Optimal Transport setting, whose rows represent the samples and whose columns represent the corresponding features/dimensions.

More precisely, this function returns the sample and feature transport plans between \((\mathbf{X}, \mathbf{w}_{xs}, \mathbf{w}_{xf})\) and \((\mathbf{Y}, \mathbf{w}_{ys}, \mathbf{w}_{yf})\), by solving the following problem using a Block Coordinate Descent algorithm:

\[\begin{split}\mathop{\arg \min}_{\mathbf{P}, \mathbf{Q}} &\quad \sum_{i,j,k,l} (\mathbf{X}_{i,k} - \mathbf{Y}_{j,l})^2 \mathbf{P}_{i,j} \mathbf{Q}_{k,l} \\ &+ \rho_s \mathbf{Div}(\mathbf{P}_{\# 1} \mathbf{Q}_{\# 1}^T | \mathbf{w}_{xs} \mathbf{w}_{ys}^T) + \rho_f \mathbf{Div}(\mathbf{P}_{\# 2} \mathbf{Q}_{\# 2}^T | \mathbf{w}_{xf} \mathbf{w}_{yf}^T) \\ &+ \alpha_s \sum_{i,j} \mathbf{P}_{i,j} \mathbf{M^{(s)}}_{i, j} + \alpha_f \sum_{k, l} \mathbf{Q}_{k,l} \mathbf{M^{(f)}}_{k, l} + \mathbf{Reg}(\mathbf{P}, \mathbf{Q})\end{split}\]

Where:

  • \(\mathbf{X}\): Source input (arbitrary-size) matrix

  • \(\mathbf{Y}\): Target input (arbitrary-size) matrix

  • \(\mathbf{M^{(s)}}\): Additional sample matrix

  • \(\mathbf{M^{(f)}}\): Additional feature matrix

  • \(\mathbf{w}_{xs}\): Distribution of the samples in the source space

  • \(\mathbf{w}_{xf}\): Distribution of the features in the source space

  • \(\mathbf{w}_{ys}\): Distribution of the samples in the target space

  • \(\mathbf{w}_{yf}\): Distribution of the features in the target space

  • \(\mathbf{Div}\): Either Kullback-Leibler divergence or half-squared L2 norm.

  • \(\mathbf{Reg}\): Regularizer for sample and feature couplings.

We consider two types of regularizer:
  • Independent regularization used in unbalanced Co-Optimal Transport

\[\mathbf{Reg}(\mathbf{P}, \mathbf{Q}) = \varepsilon_s \mathbf{Div}(\mathbf{P} | \mathbf{w}_{xs} \mathbf{w}_{ys}^T) + \varepsilon_f \mathbf{Div}(\mathbf{Q} | \mathbf{w}_{xf} \mathbf{w}_{yf}^T)\]
  • Joint regularization used in fused unbalanced Gromov-Wasserstein

\[\mathbf{Reg}(\mathbf{P}, \mathbf{Q}) = \varepsilon \mathbf{Div}(\mathbf{P} \otimes \mathbf{Q} | (\mathbf{w}_{xs} \mathbf{w}_{ys}^T) \otimes (\mathbf{w}_{xf} \mathbf{w}_{yf}^T) )\]

Note

This function allows epsilon to be zero. In that case, unbalanced_solver must be either “mm” or “lbfgsb”.

Parameters:
  • X ((n_sample_x, n_feature_x) array-like, float) – Source input matrix.

  • Y ((n_sample_y, n_feature_y) array-like, float) – Target input matrix.

  • wx_samp ((n_sample_x, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix X. Uniform distribution by default.

  • wx_feat ((n_feature_x, ) array-like, float, optional (default = None)) – Histogram assigned on columns (features) of matrix X. Uniform distribution by default.

  • wy_samp ((n_sample_y, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix Y. Uniform distribution by default.

  • wy_feat ((n_feature_y, ) array-like, float, optional (default = None)) – Histogram assigned on columns (features) of matrix Y. Uniform distribution by default.

  • reg_marginals (float or indexable object of length 1 or 2) – Marginal relaxation terms for sample and feature couplings. If reg_marginals is a scalar or an indexable object of length 1, then the same value is applied to both marginal relaxations.

  • epsilon (scalar or indexable object of length 2, float or int, optional (default = 0)) – Regularization parameters for entropic approximation of sample and feature couplings. Allow the case where epsilon contains 0. In that case, the MM solver is used by default instead of Sinkhorn solver. If epsilon is scalar, then the same value is applied to both regularization of sample and feature couplings.

  • reg_type (string, optional) –

    • If reg_type = “joint”: then use joint regularization for couplings.

    • If reg_type = “independent”: then use independent regularization for couplings.

  • divergence (string, optional (default = "kl")) –

    • If divergence = “kl”, then Div is the Kullback-Leibler divergence.

    • If divergence = “l2”, then Div is the half squared Euclidean norm.

  • unbalanced_solver (string, optional (default = "sinkhorn")) –

    Solver for the unbalanced OT subroutine.

    • If divergence = “kl”, then unbalanced_solver can be: “sinkhorn”, “sinkhorn_log”, “mm”, “lbfgsb”

    • If divergence = “l2”, then unbalanced_solver can be “mm”, “lbfgsb”

  • alpha (scalar or indexable object of length 2, float or int, optional (default = 0)) – Coefficient parameter of the linear terms with respect to the sample and feature couplings. If alpha is scalar, then the same alpha is applied to both linear terms.

  • M_samp ((n_sample_x, n_sample_y), float, optional (default = None)) – Sample matrix associated to the Wasserstein linear term on sample coupling.

  • M_feat ((n_feature_x, n_feature_y), float, optional (default = None)) – Feature matrix associated to the Wasserstein linear term on feature coupling.

  • rescale_plan (boolean, optional (default = True)) – If True, then rescale the sample and feature transport plans within each BCD iteration, so that they always have equal mass.

  • init_pi (tuple of two matrices of size (n_sample_x, n_sample_y) and (n_feature_x, n_feature_y), optional (default = None)) – Initialization of sample and feature couplings. Uniform distributions by default.

  • init_duals (tuple of two tuples ((n_sample_x, ), (n_sample_y, )) and ((n_feature_x, ), (n_feature_y, )), optional (default = None).) – Initialization of sample and feature dual vectors if using Sinkhorn algorithm. Zero vectors by default.

  • max_iter (int, optional (default = 100)) – Number of Block Coordinate Descent (BCD) iterations.

  • tol (float, optional (default = 1e-7)) – Tolerance of BCD scheme. If the L1-norm between the current and previous sample couplings is under this threshold, then stop BCD scheme.

  • max_iter_ot (int, optional (default = 500)) – Number of iterations to solve each of the two unbalanced optimal transport problems in each BCD iteration.

  • tol_ot (float, optional (default = 1e-7)) – Tolerance of unbalanced solver for each of the two unbalanced optimal transport problems in each BCD iteration.

  • log (bool, optional (default = False)) – If True then the cost and four dual vectors, including two from sample and two from feature couplings, are recorded.

  • verbose (bool, optional (default = False)) – If True then print the COOT cost at every multiplier of eval_bcd-th iteration.

Returns:

  • pi_samp ((n_sample_x, n_sample_y) array-like, float) – Sample coupling matrix.

  • pi_feat ((n_feature_x, n_feature_y) array-like, float) – Feature coupling matrix.

  • log (dictionary, optional) –

    Returned if log is True. The keys are:

    • error (array-like of float) – List of L1 norms between the current and previous sample couplings.

    • duals_sample (tuple of arrays, shapes (n_sample_x,) and (n_sample_y,)) – Pair of dual vectors when solving the OT problem w.r.t the sample coupling.

    • duals_feature (tuple of arrays, shapes (n_feature_x,) and (n_feature_y,)) – Pair of dual vectors when solving the OT problem w.r.t the feature coupling.

    • linear (float) – Linear part of the cost.

    • ucoot (float) – Total cost.

    • backend – The proper backend for all input arrays
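
Example

A minimal usage sketch (illustrative only; two small random matrices, the ‘mm’ solver so that epsilon may stay at zero, and few BCD iterations):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(4, 3)  # 4 samples, 3 features
>>> Y = rng.rand(5, 2)  # 5 samples, 2 features
>>> pi_samp, pi_feat = ot.gromov.fused_unbalanced_across_spaces_divergence(
...     X, Y, reg_marginals=1, unbalanced_solver='mm', max_iter=20)
>>> pi_samp.shape, pi_feat.shape
((4, 5), (3, 2))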

ot.gromov.fused_unbalanced_gromov_wasserstein(Cx, Cy, wx=None, wy=None, reg_marginals=10, epsilon=0, divergence='kl', unbalanced_solver='mm', alpha=0, M=None, init_duals=None, init_pi=None, max_iter=100, tol=1e-07, max_iter_ot=500, tol_ot=1e-07, log=False, verbose=False, **kwargs_solve)[source]

Compute the lower bound of the fused unbalanced Gromov-Wasserstein (FUGW) between two similarity matrices. In practice, this lower bound is used interchangeably with the true FUGW.

More precisely, this function returns the transport plan between \((\mathbf{C^X}, \mathbf{w_X})\) and \((\mathbf{C^Y}, \mathbf{w_Y})\), by solving the following problem using a Block Coordinate Descent algorithm:

\[\begin{split}\mathop{\arg \min}_{\substack{\mathbf{P}, \mathbf{Q}: \\ mass(P) = mass(Q)}} &\quad \sum_{i,j,k,l} (\mathbf{C^X}_{i,k} - \mathbf{C^Y}_{j,l})^2 \mathbf{P}_{i,j} \mathbf{Q}_{k,l} + \frac{\alpha}{2} \sum_{i,j} (\mathbf{P}_{i,j} + \mathbf{Q}_{i,j}) \mathbf{M}_{i, j} \\ &+ \rho_1 \mathbf{Div}(\mathbf{P}_{\# 1} \mathbf{Q}_{\# 1}^T | \mathbf{w_X} \mathbf{w_X}^T) + \rho_2 \mathbf{Div}(\mathbf{P}_{\# 2} \mathbf{Q}_{\# 2}^T | \mathbf{w_Y} \mathbf{w_Y}^T) \\ &+ \varepsilon \mathbf{Div}(\mathbf{P} \otimes \mathbf{Q} | (\mathbf{w_X} \mathbf{w_Y}^T) \otimes (\mathbf{w_X} \mathbf{w_Y}^T))\end{split}\]

Where:

  • \(\mathbf{C^X}\): Source similarity matrix

  • \(\mathbf{C^Y}\): Target similarity matrix

  • \(\mathbf{M}\): Sample matrix corresponding to the Wasserstein term

  • \(\mathbf{w_X}\): Distribution of the samples in the source space

  • \(\mathbf{w_Y}\): Distribution of the samples in the target space

  • \(\mathbf{Div}\): Either Kullback-Leibler divergence or half-squared L2 norm.

Note

This function allows epsilon to be zero. In that case, unbalanced_solver must be either “mm” or “lbfgsb”.

Parameters:
  • Cx ((n_sample_x, n_sample_x) array-like, float) – Source similarity matrix.

  • Cy ((n_sample_y, n_sample_y) array-like, float) – Target similarity matrix.

  • wx ((n_sample_x, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix Cx. Uniform distribution by default.

  • wy ((n_sample_y, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix Cy. Uniform distribution by default.

  • reg_marginals (float or indexable object of length 1 or 2) – Marginal relaxation terms for sample and feature couplings. If reg_marginals is a scalar or an indexable object of length 1, then the same value is applied to both marginal relaxations.

  • epsilon (scalar, float or int, optional (default = 0)) – Regularization parameters for entropic approximation of sample and feature couplings. Allow the case where epsilon contains 0. In that case, the MM solver is used by default instead of Sinkhorn solver. If epsilon is scalar, then the same value is applied to both regularization of sample and feature couplings.

  • divergence (string, optional (default = "kl")) –

    • If divergence = “kl”, then Div is the Kullback-Leibler divergence.

    • If divergence = “l2”, then Div is the half squared Euclidean norm.

  • unbalanced_solver (string, optional (default = "mm")) –

    Solver for the unbalanced OT subroutine.

    • If divergence = “kl”, then unbalanced_solver can be: “sinkhorn”, “sinkhorn_log”, “mm”, “lbfgsb”

    • If divergence = “l2”, then unbalanced_solver can be “mm”, “lbfgsb”

  • alpha (scalar, float or int, optional (default = 0)) – Coefficient parameter of the linear terms with respect to the sample and feature couplings. If alpha is scalar, then the same alpha is applied to both linear terms.

  • M ((n_sample_x, n_sample_y), float, optional (default = None)) – Sample matrix associated to the Wasserstein linear term on sample coupling.

  • init_pi ((n_sample_x, n_sample_y) array-like, optional (default = None)) – Initialization of sample coupling. By default = \(w_X w_Y^T\).

  • init_duals (tuple of vectors ((n_sample_x, ), (n_sample_y, )), optional (default = None).) – Initialization of sample and feature dual vectors if using Sinkhorn algorithm. Zero vectors by default.

  • max_iter (int, optional (default = 100)) – Number of Block Coordinate Descent (BCD) iterations.

  • tol (float, optional (default = 1e-7)) – Tolerance of BCD scheme. If the L1-norm between the current and previous sample couplings is under this threshold, then stop BCD scheme.

  • max_iter_ot (int, optional (default = 500)) – Number of iterations to solve each of the two unbalanced optimal transport problems in each BCD iteration.

  • tol_ot (float, optional (default = 1e-7)) – Tolerance of unbalanced solver for each of the two unbalanced optimal transport problems in each BCD iteration.

  • log (bool, optional (default = False)) – If True then the cost and four dual vectors, including two from sample and two from feature couplings, are recorded.

  • verbose (bool, optional (default = False)) – If True then print the COOT cost at every multiplier of eval_bcd-th iteration.

Returns:

  • pi_samp ((n_sample_x, n_sample_y) array-like, float) – Sample coupling matrix. In practice, we use this matrix as solution of FUGW.

  • pi_samp2 ((n_sample_x, n_sample_y) array-like, float) – Second sample coupling matrix. In practice, we usually ignore this output.

  • log (dictionary, optional) –

    Returned if log is True. The keys are:

    • error (array-like of float) – List of L1 norms between the current and previous sample couplings.

    • duals (tuple of arrays, shapes (n_sample_x,) and (n_sample_y,)) – Pair of dual vectors when solving the OT problem w.r.t the sample coupling.

    • linear (float) – Linear part of the FUGW cost.

    • fugw_cost (float) – Total FUGW cost.

    • backend – The proper backend for all input arrays
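
Example

A minimal usage sketch (illustrative only; random square similarity matrices and few BCD iterations, with the default ‘mm’ solver):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> Cx, Cy = ot.dist(rng.randn(4, 2)), ot.dist(rng.randn(5, 2))
>>> pi_samp, pi_samp2 = ot.gromov.fused_unbalanced_gromov_wasserstein(
...     Cx, Cy, reg_marginals=1, max_iter=20)
>>> pi_samp.shape
(4, 5)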

References

ot.gromov.fused_unbalanced_gromov_wasserstein2(Cx, Cy, wx=None, wy=None, reg_marginals=10, epsilon=0, divergence='kl', unbalanced_solver='mm', alpha=0, M=None, init_duals=None, init_pi=None, max_iter=100, tol=1e-07, max_iter_ot=500, tol_ot=1e-07, log=False, verbose=False, **kwargs_solve)[source]

Compute the lower bound of the fused unbalanced Gromov-Wasserstein (FUGW) between two similarity matrices. In practice, this lower bound is used interchangeably with the true FUGW.

More precisely, this function returns the lower bound of the fused unbalanced Gromov-Wasserstein cost between \((\mathbf{C^X}, \mathbf{w_X})\) and \((\mathbf{C^Y}, \mathbf{w_Y})\), by solving the following problem using a Block Coordinate Descent algorithm:

\[\begin{split}\mathop{\min}_{\substack{\mathbf{P}, \mathbf{Q}: \\ mass(P) = mass(Q)}} &\quad \sum_{i,j,k,l} (\mathbf{C^X}_{i,k} - \mathbf{C^Y}_{j,l})^2 \mathbf{P}_{i,j} \mathbf{Q}_{k,l} + \frac{\alpha}{2} \sum_{i,j} (\mathbf{P}_{i,j} + \mathbf{Q}_{i,j}) \mathbf{M}_{i, j} \\ &+ \rho_1 \mathbf{Div}(\mathbf{P}_{\# 1} \mathbf{Q}_{\# 1}^T | \mathbf{w_X} \mathbf{w_X}^T) + \rho_2 \mathbf{Div}(\mathbf{P}_{\# 2} \mathbf{Q}_{\# 2}^T | \mathbf{w_Y} \mathbf{w_Y}^T) \\ &+ \varepsilon \mathbf{Div}(\mathbf{P} \otimes \mathbf{Q} | (\mathbf{w_X} \mathbf{w_Y}^T) \otimes (\mathbf{w_X} \mathbf{w_Y}^T))\end{split}\]

Where:

  • \(\mathbf{C^X}\): Source similarity matrix

  • \(\mathbf{C^Y}\): Target similarity matrix

  • \(\mathbf{M}\): Sample matrix corresponding to the Wasserstein term

  • \(\mathbf{w_X}\): Distribution of the samples in the source space

  • \(\mathbf{w_Y}\): Distribution of the samples in the target space

  • \(\mathbf{Div}\): Either Kullback-Leibler divergence or half-squared L2 norm.

Note

This function allows epsilon to be zero. In that case, unbalanced_solver must be either “mm” or “lbfgsb”. Also, the computation of gradients is only supported for the KL divergence, not for the half squared-L2 norm; when the half squared-L2 norm is used, gradients are computed with the KL divergence instead.

Parameters:
  • Cx ((n_sample_x, n_sample_x) array-like, float) – Source similarity matrix.

  • Cy ((n_sample_y, n_sample_y) array-like, float) – Target similarity matrix.

  • wx ((n_sample_x, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix Cx. Uniform distribution by default.

  • wy ((n_sample_y, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix Cy. Uniform distribution by default.

  • reg_marginals (float or indexable object of length 1 or 2) – Marginal relaxation terms for sample and feature couplings. If reg_marginals is a scalar or an indexable object of length 1, then the same value is applied to both marginal relaxations.

  • epsilon (scalar, float or int, optional (default = 0)) – Regularization parameters for entropic approximation of sample and feature couplings. Allow the case where epsilon contains 0. In that case, the MM solver is used by default instead of Sinkhorn solver. If epsilon is scalar, then the same value is applied to both regularization of sample and feature couplings.

  • divergence (string, optional (default = "kl")) –

    • If divergence = “kl”, then Div is the Kullback-Leibler divergence.

    • If divergence = “l2”, then Div is the half squared Euclidean norm.

  • unbalanced_solver (string, optional (default = "mm")) –

    Solver for the unbalanced OT subroutine.

    • If divergence = “kl”, then unbalanced_solver can be: “sinkhorn”, “sinkhorn_log”, “mm”, “lbfgsb”

    • If divergence = “l2”, then unbalanced_solver can be “mm”, “lbfgsb”

  • alpha (scalar, float or int, optional (default = 0)) – Coefficient parameter of the linear terms with respect to the sample and feature couplings. If alpha is scalar, then the same alpha is applied to both linear terms.

  • M ((n_sample_x, n_sample_y), float, optional (default = None)) – Sample matrix associated to the Wasserstein linear term on sample coupling.

  • init_pi ((n_sample_x, n_sample_y) array-like, optional (default = None)) – Initialization of sample coupling. By default = \(w_X w_Y^T\).

  • init_duals (tuple of vectors ((n_sample_x, ), (n_sample_y, )), optional (default = None).) – Initialization of sample and feature dual vectors if using Sinkhorn algorithm. Zero vectors by default.

  • max_iter (int, optional (default = 100)) – Number of Block Coordinate Descent (BCD) iterations.

  • tol (float, optional (default = 1e-7)) – Tolerance of BCD scheme. If the L1-norm between the current and previous sample couplings is under this threshold, then stop BCD scheme.

  • max_iter_ot (int, optional (default = 500)) – Number of iterations to solve each of the two unbalanced optimal transport problems in each BCD iteration.

  • tol_ot (float, optional (default = 1e-7)) – Tolerance of unbalanced solver for each of the two unbalanced optimal transport problems in each BCD iteration.

  • log (bool, optional (default = False)) – If True then the cost and four dual vectors, including two from sample and two from feature couplings, are recorded.

  • verbose (bool, optional (default = False)) – If True then print the COOT cost at every multiplier of eval_bcd-th iteration.

Returns:

  • fugw (float) – Total FUGW cost

  • log (dictionary, optional) –

    Returned if log is True. The keys are:

    • error (array-like of float) – List of L1 norms between the current and previous sample couplings.

    • duals (tuple of arrays, shapes (n_sample_x,) and (n_sample_y,)) – Pair of dual vectors when solving the OT problem w.r.t the sample coupling.

    • linear (float) – Linear part of the FUGW cost.

    • fugw_cost (float) – Total FUGW cost.

    • backend – The proper backend for all input arrays
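
Example

A minimal usage sketch (illustrative only; returns the scalar FUGW cost):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> Cx, Cy = ot.dist(rng.randn(4, 2)), ot.dist(rng.randn(5, 2))
>>> fugw = ot.gromov.fused_unbalanced_gromov_wasserstein2(Cx, Cy, reg_marginals=1, max_iter=20)
>>> float(fugw) >= 0
True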

References

ot.gromov.get_graph_partition(C, npart, part_method='random', F=None, alpha=1.0, random_state=0, nx=None)[source]

Partition a given graph with structure matrix \(\mathbf{C} \in \mathbb{R}^{n \times n}\) into npart partitions, either randomly (‘random’), using one of the {‘louvain’, ‘fluid’} algorithms from networkx, using ‘spectral’ clustering from scikit-learn, or using (Fused) Gromov-Wasserstein projections from POT.

Parameters:
  • C (array-like, shape (n, n)) – Structure matrix.

  • npart (int,) – number of partitions/clusters smaller than the number of nodes in \(\mathbf{C}\).

  • part_method (str, optional. Default is 'random'.) – Partitioning algorithm to use among {‘random’, ‘louvain’, ‘fluid’, ‘spectral’, ‘GW’, ‘FGW’}: ‘random’ for random sampling of points; ‘louvain’ and ‘fluid’ for graph partitioning algorithms that work well on adjacency matrices (if the louvain algorithm is used, npart is ignored); ‘spectral’ for spectral clustering; ‘(F)GW’ for (F)GW projection using sr(F)GW solvers.

  • F (array-like, shape (n, d), optional. (Default is None)) – Optional feature matrix aligned with the graph structure. Only used if part_method=”FGW”.

  • alpha (float, optional. (Default is 1.)) – Trade-off parameter between feature and structure matrices, taking values in [0, 1] and only used if F != None and part_method=”FGW”.

  • random_state (int, optional) – Random seed for the partitioning algorithm.

  • nx (backend, optional) – POT backend.

Returns:

part – Array of partition assignment for each node.

Return type:

array-like, shape (n,)
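
Example

A minimal usage sketch (illustrative only; the ‘random’ method avoids the optional networkx and scikit-learn dependencies):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C = ot.dist(rng.randn(6, 2))
>>> part = ot.gromov.get_graph_partition(C, npart=2, part_method='random', random_state=0)
>>> part.shape  # one partition label per node
(6,)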

References

ot.gromov.get_graph_representants(C, part, rep_method='pagerank', random_state=0, nx=None)[source]

Get a representative node for each partition given by \(\mathbf{part} \in \mathbb{R}^{n}\) of a graph with structure matrix \(\mathbf{C} \in \mathbb{R}^{n \times n}\). Selection is either done randomly or using the ‘pagerank’ algorithm from networkx.

Parameters:
  • C (array-like, shape (n, n)) – structure matrix.

  • part (array-like, shape (n,)) – Array of partition assignment for each node.

  • rep_method (str, optional. Default is 'pagerank'.) – Selection method for representant in each partition. Can be either ‘random’ i.e random sampling within each partition, or ‘pagerank’ to select a node with maximal pagerank.

  • random_state (int, optional) – Random seed for the partitioning algorithm

  • nx (backend, optional) – POT backend

Returns:

rep_indices – indices for representative node of each partition sorted according to partition identifiers.

Return type:

list, shape (npart,)

References

ot.gromov.get_partition_and_representants_samples(X, npart, method='kmeans', random_state=0, nx=None)[source]

Compute npart partitions and representants over samples \(\mathbf{X} \in R^{n \times d}\) using either a random or a kmeans algorithm.

Parameters:
  • X (array-like, shape (n, d)) – Samples endowed with an euclidean geometry.

  • npart (int,) – number of partitions smaller than the number of samples in \(\mathbf{X}\).

  • method (str, optional. Default is 'kmeans'.) – Partitioning and representant selection algorithms to use among {‘random’, ‘kmeans’}. ‘random’ for random sampling of points; ‘kmeans’ for k-means clustering using scikit-learn implementation where closest points to centroids are considered as representants.

  • random_state (int, optional) – Random seed for the partitioning algorithm.

  • nx (backend, optional) – POT backend.

Returns:

  • part (array-like, shape (n,)) – Array of partition assignment for each node.

  • rep_indices (list, shape (npart,)) – indices for representative node of each partition sorted according to partition identifiers.
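
Example

A minimal usage sketch (illustrative only; ‘random’ is used instead of the default ‘kmeans’ to avoid the scikit-learn dependency):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(8, 2)
>>> part, rep_indices = ot.gromov.get_partition_and_representants_samples(
...     X, npart=2, method='random', random_state=0)
>>> part.shape, len(rep_indices)
((8,), 2)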

References

ot.gromov.gromov_barycenters(N, Cs, ps=None, p=None, lambdas=None, loss_fun='square_loss', symmetric=True, armijo=False, max_iter=1000, tol=1e-09, stop_criterion='barycenter', warmstartT=False, verbose=False, log=False, init_C=None, random_state=None, **kwargs)[source]

Returns the Gromov-Wasserstein barycenters of S measured similarity matrices \((\mathbf{C}_s)_{1 \leq s \leq S}\)

The function solves the following optimization problem with block coordinate descent:

\[\mathbf{C}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \mathrm{GW}(\mathbf{C}, \mathbf{C}_s, \mathbf{p}, \mathbf{p}_s)\]

Where :

  • \(\mathbf{C}_s\): metric cost matrix

  • \(\mathbf{p}_s\): distribution

Parameters:
  • N (int) – Size of the targeted barycenter

  • Cs (list of S array-like of shape (ns, ns)) – Metric cost matrices

  • ps (list of S array-like of shape (ns,), optional) – Sample weights in the S spaces. If let to its default value None, uniform distributions are taken.

  • p (array-like, shape (N,), optional) – Weights in the targeted barycenter. If let to its default value None, uniform distribution is taken.

  • lambdas (list of float, optional) – List of the S spaces’ weights. If let to its default value None, uniform weights are taken.

  • loss_fun (str, optional) – Loss function used for the solver, either ‘square_loss’ or ‘kl_loss’

  • symmetric (bool, optional.) – Whether the structures are to be assumed symmetric or not. Default value is True. If set to True (resp. False), the structures \(\mathbf{C}_s\) will be assumed symmetric (resp. asymmetric).

  • armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on relative error (>0)

  • stop_criterion (str, optional. Default is 'barycenter'.) – Stop criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’ uses absolute norm variations of estimated barycenters. Else if set to ‘loss’ uses the relative variations of the loss.

  • warmstartT (bool, optional) – Whether to perform warmstart of transport plans in the successive Gromov-Wasserstein transport problems.

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • init_C (bool | array-like, shape (N, N)) – Initial value for the \(\mathbf{C}\) matrix provided by the user. If let to its default value None, a random initialization is used.

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

  • C (array-like, shape (N, N)) – Similarity matrix in the barycenter space (permuted arbitrarily)

  • log (dict) – Only returned when log=True. It contains the keys:

    • \(\mathbf{T}\): list of (N, ns) transport matrices

    • \(\mathbf{p}\): (N,) barycenter weights

    • values used in convergence evaluation.
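
A minimal sketch computing the barycenter of two random symmetric similarity matrices with default uniform weights (sizes are illustrative):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C1 = rng.rand(5, 5); C1 = (C1 + C1.T) / 2  # symmetric similarity matrices
>>> C2 = rng.rand(7, 7); C2 = (C2 + C2.T) / 2
>>> C = ot.gromov.gromov_barycenters(4, [C1, C2], random_state=0)
>>> C.shape
(4, 4)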

References

ot.gromov.gromov_wasserstein(C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, log=False, armijo=False, G0=None, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space.

  • \(\mathbf{C_2}\): Metric cost matrix in the target space.

  • \(\mathbf{p}\): Distribution in the source space.

  • \(\mathbf{q}\): Distribution in the target space.

  • L: Loss function to account for the misfit between the similarity matrices.

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space.

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • armijo (bool, optional) – If True, the step of the line-search is found via an armijo search. Else closed form is used. If there are convergence issues, use False.

  • G0 (array-like, shape (ns,nt), optional) – If None, the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • max_iter (int, optional) – Max number of iterations.

  • tol_rel (float, optional) – Stop threshold on relative error (>0).

  • tol_abs (float, optional) – Stop threshold on absolute error (>0).

  • **kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.

Returns:

  • T (array-like, shape (ns, nt)) –

    Coupling between the two spaces that minimizes:

    \(\sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\)

  • log (dict) – Convergence information and loss.
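
A minimal sketch matching two 1D point clouds through their intra-domain distance matrices; uniform weights are used by default:

>>> import numpy as np
>>> import ot
>>> xs = np.array([[0.0], [1.0], [2.0]])
>>> xt = np.array([[10.0], [11.0], [12.0]])
>>> C1 = ot.dist(xs, xs)  # squared euclidean by default
>>> C2 = ot.dist(xt, xt)
>>> T = ot.gromov.gromov_wasserstein(C1, C2)
>>> T.shape
(3, 3)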

References

ot.gromov.gromov_wasserstein2(C1, C2, p=None, q=None, loss_fun='square_loss', symmetric=None, log=False, armijo=False, G0=None, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, **kwargs)[source]

Returns the Gromov-Wasserstein loss \(\mathbf{GW}\) between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\). To recover the Gromov-Wasserstein distance as defined in [13] compute \(d_{GW} = \frac{1}{2} \sqrt{\mathbf{GW}}\).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{GW} = \min_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

Note that when using backends, this loss function is differentiable wrt the matrices (C1, C2) and weights (p, q) for quadratic loss using the gradients from [38].

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.

  • G0 (array-like, shape (ns,nt), optional) – If None the initial transport plan of the solver is pq^T. Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • gw_dist (float) – Gromov-Wasserstein distance

  • log (dict) – Convergence information and coupling matrix
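
A minimal sketch; comparing a space with itself, the returned loss is numerically close to zero:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> xs = rng.rand(4, 2)
>>> C1 = ot.dist(xs, xs)
>>> gw = ot.gromov.gromov_wasserstein2(C1, C1)  # close to 0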

References

ot.gromov.gromov_wasserstein_dictionary_learning(Cs, D, nt, reg=0.0, ps=None, q=None, epochs=20, batch_size=32, learning_rate=1.0, Cdict_init=None, projection='nonnegative_symmetric', use_log=True, tol_outer=1e-05, tol_inner=1e-05, max_iter_outer=20, max_iter_inner=200, use_adam_optimizer=True, verbose=False, random_state=None, **kwargs)[source]

Infer Gromov-Wasserstein linear dictionary \(\{ (\mathbf{C_{dict}[d]}, q) \}_{d \in [D]}\) from the list of structures \(\{ (\mathbf{C_s},\mathbf{p_s}) \}_s\)

\[\min_{\mathbf{C_{dict}}, \{\mathbf{w_s} \}_{s \leq S}} \sum_{s=1}^S GW_2(\mathbf{C_s}, \sum_{d=1}^D w_{s,d}\mathbf{C_{dict}[d]}, \mathbf{p_s}, \mathbf{q}) - reg\| \mathbf{w_s} \|_2^2\]

such that, \(\forall s \leq S\) :

  • \(\mathbf{w_s}^\top \mathbf{1}_D = 1\)

  • \(\mathbf{w_s} \geq \mathbf{0}_D\)

Where :

  • \(\forall s \leq S, \mathbf{C_s}\) is a (ns,ns) pairwise similarity matrix of variable size ns.

  • \(\mathbf{C_{dict}}\) is a (D, nt, nt) tensor of D pairwise similarity matrices of fixed size nt.

  • \(\forall s \leq S, \mathbf{p_s}\) is the source distribution corresponding to \(\mathbf{C_s}\)

  • \(\mathbf{q}\) is the target distribution assigned to every structure in the embedding space.

  • reg is the regularization coefficient.

The stochastic algorithm used for estimating the graph dictionary atoms is the one proposed in [38].

Parameters:
  • Cs (list of S symmetric array-like, shape (ns, ns)) – List of Metric/Graph cost matrices of variable size (ns, ns).

  • D (int) – Number of dictionary atoms to learn

  • nt (int) – Number of samples within each dictionary atoms

  • reg (float, optional) – Coefficient of the negative quadratic regularization used to promote sparsity of w. The default is 0.

  • ps (list of S array-like, shape (ns,), optional) – Distribution in each source space C of Cs. Default is None and corresponds to uniform distributions.

  • q (array-like, shape (nt,), optional) – Distribution in the embedding space whose structure will be learned. Default is None and corresponds to uniform distributions.

  • epochs (int, optional) – Number of epochs used to learn the dictionary. Default is 20.

  • batch_size (int, optional) – Batch size for each stochastic gradient update of the dictionary. Set to the dataset size if the provided batch_size is higher than the dataset size. Default is 32.

  • learning_rate (float, optional) – Learning rate used for the stochastic gradient descent. Default is 1.

  • Cdict_init (list of D array-like with shape (nt, nt), optional) – Used to initialize the dictionary. If set to None (Default), the dictionary will be initialized randomly. Else Cdict must have shape (D, nt, nt), i.e. match the provided shapes.

  • projection (str , optional) – If ‘nonnegative’ and/or ‘symmetric’ is in projection, the corresponding projection will be performed at each stochastic update of the dictionary. Else the set of atoms is \(R^{nt \times nt}\). Default is ‘nonnegative_symmetric’

  • use_log (bool, optional) – If set to True, losses evolution by batches and epochs are tracked. Default is True.

  • use_adam_optimizer (bool, optional) – If set to True, the Adam optimizer with default settings is used as the adaptive learning rate strategy. Else SGD with a fixed learning rate is performed. Default is True.

  • tol_outer (float, optional) – Solver precision for the BCD algorithm, measured by absolute relative error on consecutive losses. Default is \(10^{-5}\).

  • tol_inner (float, optional) – Solver precision for the Conjugate Gradient algorithm used to get optimal w at a fixed transport, measured by absolute relative error on consecutive losses. Default is \(10^{-5}\).

  • max_iter_outer (int, optional) – Maximum number of iterations for the BCD. Default is 20.

  • max_iter_inner (int, optional) – Maximum number of iterations for the Conjugate Gradient. Default is 200.

  • verbose (bool, optional) – Print the reconstruction loss every epoch. Default is False.

  • random_state (int, RandomState instance or None, default=None) – Determines random number generation. Pass an int for reproducible output across multiple function calls.

Returns:

  • Cdict_best_state (D array-like, shape (D,nt,nt)) – Metric/Graph cost matrices composing the dictionary. The dictionary leading to the best loss over an epoch is saved and returned.

  • log (dict) – If use_log is True, contains loss evolutions by batches and epochs.
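
A small-scale sketch (sizes, epochs and batch size are illustrative, not recommended settings); the function is assumed to return the dictionary and a log dict as documented above:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> Cs = [rng.rand(ns, ns) for ns in [6, 8, 7]]
>>> Cs = [(C + C.T) / 2 for C in Cs]  # symmetrize each structure
>>> Cdict, log = ot.gromov.gromov_wasserstein_dictionary_learning(
...     Cs, D=2, nt=4, epochs=2, batch_size=2, random_state=0)
>>> Cdict.shape
(2, 4, 4)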

References

ot.gromov.gromov_wasserstein_linear_unmixing(C, Cdict, reg=0.0, p=None, q=None, tol_outer=1e-05, tol_inner=1e-05, max_iter_outer=20, max_iter_inner=200, symmetric=None, **kwargs)[source]

Returns the Gromov-Wasserstein linear unmixing of \((\mathbf{C},\mathbf{p})\) onto the dictionary \(\{ (\mathbf{C_{dict}[d]}, \mathbf{q}) \}_{d \in [D]}\).

\[\min_{ \mathbf{w}} GW_2(\mathbf{C}, \sum_{d=1}^D w_d\mathbf{C_{dict}[d]}, \mathbf{p}, \mathbf{q}) - reg \| \mathbf{w} \|_2^2\]

such that:

  • \(\mathbf{w}^\top \mathbf{1}_D = 1\)

  • \(\mathbf{w} \geq \mathbf{0}_D\)

Where :

  • \(\mathbf{C}\) is the (ns,ns) pairwise similarity matrix.

  • \(\mathbf{C_{dict}}\) is a (D, nt, nt) tensor of D pairwise similarity matrices of size nt.

  • \(\mathbf{p}\) and \(\mathbf{q}\) are source and target weights.

  • reg is the regularization coefficient.

The algorithm used for solving the problem is a Block Coordinate Descent as discussed in [38], Algorithm 1.

Parameters:
  • C (array-like, shape (ns, ns)) – Metric/Graph cost matrix.

  • Cdict (D array-like, shape (D,nt,nt)) – Metric/Graph cost matrices composing the dictionary on which to embed C.

  • reg (float, optional.) – Coefficient of the negative quadratic regularization used to promote sparsity of w. Default is 0.

  • p (array-like, shape (ns,), optional) – Distribution in the source space C. Default is None and corresponds to uniform distribution.

  • q (array-like, shape (nt,), optional) – Distribution in the space depicted by the dictionary. Default is None and corresponds to uniform distribution.

  • tol_outer (float, optional) – Solver precision for the BCD algorithm, measured by absolute relative error on consecutive losses. Default is \(10^{-5}\).

  • tol_inner (float, optional) – Solver precision for the Conjugate Gradient algorithm used to get optimal w at a fixed transport. Default is \(10^{-5}\).

  • max_iter_outer (int, optional) – Maximum number of iterations for the BCD. Default is 20.

  • max_iter_inner (int, optional) – Maximum number of iterations for the Conjugate Gradient. Default is 200.

Returns:

  • w (array-like, shape (D,)) – Gromov-Wasserstein linear unmixing of \((\mathbf{C},\mathbf{p})\) onto the span of the dictionary.

  • Cembedded (array-like, shape (nt,nt)) – embedded structure of \((\mathbf{C},\mathbf{p})\) onto the dictionary, \(\sum_d w_d\mathbf{C_{dict}[d]}\).

  • T (array-like (ns, nt)) – Gromov-Wasserstein transport plan between \((\mathbf{C},\mathbf{p})\) and \((\sum_d w_d\mathbf{C_{dict}[d]}, \mathbf{q})\)

  • current_loss (float) – reconstruction error
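
A minimal sketch unmixing a random structure onto a random two-atom dictionary (values are illustrative):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C = rng.rand(6, 6); C = (C + C.T) / 2  # structure to unmix
>>> Cdict = rng.rand(2, 4, 4)
>>> Cdict = (Cdict + Cdict.transpose((0, 2, 1))) / 2  # two symmetric atoms
>>> w, Cembedded, T, loss = ot.gromov.gromov_wasserstein_linear_unmixing(C, Cdict)

The unmixing w lies on the simplex, so its entries sum to 1.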

References

ot.gromov.gwggrad(constC, hC1, hC2, T, nx=None)[source]

Return the gradient for Gromov-Wasserstein

The gradient is computed as described in Proposition 2 in [12]

Parameters:
  • constC (array-like, shape (ns, nt)) – Constant \(\mathbf{C}\) matrix in Eq. (6)

  • hC1 (array-like, shape (ns, ns)) – \(\mathbf{h1}(\mathbf{C1})\) matrix in Eq. (6)

  • hC2 (array-like, shape (nt, nt)) – \(\mathbf{h2}(\mathbf{C2})\) matrix in Eq. (6)

  • T (array-like, shape (ns, nt)) – Current value of transport matrix \(\mathbf{T}\)

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

grad – Gromov-Wasserstein gradient

Return type:

array-like, shape (ns, nt)

References

ot.gromov.gwloss(constC, hC1, hC2, T, nx=None)[source]

Return the Loss for Gromov-Wasserstein

The loss is computed as described in Proposition 1 Eq. (6) in [12]

Parameters:
  • constC (array-like, shape (ns, nt)) – Constant \(\mathbf{C}\) matrix in Eq. (6)

  • hC1 (array-like, shape (ns, ns)) – \(\mathbf{h1}(\mathbf{C1})\) matrix in Eq. (6)

  • hC2 (array-like, shape (nt, nt)) – \(\mathbf{h2}(\mathbf{C2})\) matrix in Eq. (6)

  • T (array-like, shape (ns, nt)) – Current value of transport matrix \(\mathbf{T}\)

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

loss – Gromov-Wasserstein loss

Return type:

float

References

ot.gromov.init_matrix(C1, C2, p, q, loss_fun='square_loss', nx=None)[source]

Return loss matrices and tensors for Gromov-Wasserstein fast computation

Returns the value of \(\mathcal{L}(\mathbf{C_1}, \mathbf{C_2}) \otimes \mathbf{T}\) with the selected loss function as the loss function of Gromov-Wasserstein discrepancy.

The matrices are computed as described in Proposition 1 in [12]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{T}\): A coupling between those two spaces

The square-loss function \(L(a, b) = |a - b|^2\) is read as :

\[ \begin{align}\begin{aligned}L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\\\mathrm{with} \ f_1(a) &= a^2\\ f_2(b) &= b^2\\ h_1(a) &= a\\ h_2(b) &= 2b\end{aligned}\end{align} \]

The kl-loss function \(L(a, b) = a \log\left(\frac{a}{b}\right) - a + b\) is read as :

\[ \begin{align}\begin{aligned}L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\\\mathrm{with} \ f_1(a) &= a \log(a) - a\\ f_2(b) &= b\\ h_1(a) &= a\\ h_2(b) &= \log(b)\end{aligned}\end{align} \]
Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,)) – Probability distribution in the source space

  • q (array-like, shape (nt,)) – Probability distribution in the target space

  • loss_fun (str, optional) – Name of loss function to use: either ‘square_loss’ or ‘kl_loss’ (default=’square_loss’)

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

  • constC (array-like, shape (ns, nt)) – Constant \(\mathbf{C}\) matrix in Eq. (6)

  • hC1 (array-like, shape (ns, ns)) – \(\mathbf{h1}(\mathbf{C1})\) matrix in Eq. (6)

  • hC2 (array-like, shape (nt, nt)) – \(\mathbf{h2}(\mathbf{C2})\) matrix in Eq. (6)
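
A short sketch tying init_matrix to gwloss and gwggrad above: the decomposition is computed once and then reused to evaluate the loss and its gradient at any coupling \(\mathbf{T}\):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C1 = rng.rand(5, 5)
>>> C2 = rng.rand(4, 4)
>>> p, q = ot.unif(5), ot.unif(4)
>>> constC, hC1, hC2 = ot.gromov.init_matrix(C1, C2, p, q, loss_fun='square_loss')
>>> T = np.outer(p, q)  # independent coupling as a starting point
>>> loss = ot.gromov.gwloss(constC, hC1, hC2, T)
>>> grad = ot.gromov.gwggrad(constC, hC1, hC2, T)
>>> grad.shape
(5, 4)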

References

ot.gromov.init_matrix_semirelaxed(C1, C2, p, loss_fun='square_loss', nx=None)[source]

Return loss matrices and tensors for semi-relaxed Gromov-Wasserstein fast computation

Returns the value of \(\mathcal{L}(\mathbf{C_1}, \mathbf{C_2}) \otimes \mathbf{T}\) with the selected loss function as the loss function of semi-relaxed Gromov-Wasserstein discrepancy.

The matrices are computed as described in Proposition 1 in [12] and adapted to the semi-relaxed problem where the second marginal is not a constant anymore.

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{T}\): A coupling between those two spaces

The square-loss function \(L(a, b) = |a - b|^2\) is read as :

\[ \begin{align}\begin{aligned}L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\\\mathrm{with} \ f_1(a) &= a^2\\ f_2(b) &= b^2\\ h_1(a) &= a\\ h_2(b) &= 2b\end{aligned}\end{align} \]

The kl-loss function \(L(a, b) = a \log\left(\frac{a}{b}\right) - a + b\) is read as :

\[ \begin{align}\begin{aligned}L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)\\\mathrm{with} \ f_1(a) &= a \log(a) - a\\ f_2(b) &= b\\ h_1(a) &= a\\ h_2(b) &= \log(b)\end{aligned}\end{align} \]
Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,)) – Probability distribution in the source space

  • loss_fun (str, optional) – Name of loss function to use: either ‘square_loss’ or ‘kl_loss’ (default=’square_loss’)

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

  • constC (array-like, shape (ns, nt)) – Constant \(\mathbf{C}\) matrix in Eq. (6) adapted to srGW

  • hC1 (array-like, shape (ns, ns)) – \(\mathbf{h1}(\mathbf{C1})\) matrix in Eq. (6)

  • hC2 (array-like, shape (nt, nt)) – \(\mathbf{h2}(\mathbf{C2})\) matrix in Eq. (6)

  • fC2t (array-like, shape (nt, nt)) – \(\mathbf{f2}(\mathbf{C2})^\top\) matrix in Eq. (6)

References

ot.gromov.lowrank_gromov_wasserstein_samples(X_s, X_t, a=None, b=None, reg=0, rank=None, alpha=1e-10, gamma_init='rescale', rescale_cost=True, cost_factorized_Xs=None, cost_factorized_Xt=None, stopThr=0.0001, numItermax=1000, stopThr_dykstra=0.001, numItermax_dykstra=10000, seed_init=49, warn=True, warn_dykstra=False, log=False)[source]

Solve the entropic regularization Gromov-Wasserstein transport problem under low-nonnegative rank constraints on the couplings and cost matrices.

Squared Euclidean distance matrices are considered for the target and source distributions.

The function solves the following optimization problem:

\[\mathop{\min_{(Q,R,g) \in \mathcal{C(a,b,r)}}} \mathcal{Q}_{A,B}(Q\mathrm{diag}(1/g)R^T) - \epsilon \cdot H((Q,R,g))\]

where :

  • \(A\) is the (dim_a, dim_a) square pairwise cost matrix of the source domain.

  • \(B\) is the (dim_b, dim_b) square pairwise cost matrix of the target domain.

  • \(\mathcal{Q}_{A,B}\) is the quadratic objective function of the Gromov-Wasserstein plan.

  • \(Q\) and \(R\) are the low-rank matrix decomposition of the Gromov-Wasserstein plan.

  • \(g\) is the weight vector for the low-rank decomposition of the Gromov-Wasserstein plan.

  • \(\mathbf{a}\) and \(\mathbf{b}\) are source and target weights (histograms, both sum to 1).

  • \(r\) is the rank of the Gromov-Wasserstein plan.

  • \(\mathcal{C}(a,b,r)\) are the low-rank couplings of the OT problem.

  • \(H((Q,R,g))\) is the values of the three respective entropies evaluated for each term.

Parameters:
  • X_s (array-like, shape (n_samples_a, dim_Xs)) – Samples in the source domain

  • X_t (array-like, shape (n_samples_b, dim_Xt)) – Samples in the target domain

  • a (array-like, shape (n_samples_a,), optional) – Sample weights in the source domain. If let to its default value None, uniform distribution is taken.

  • b (array-like, shape (n_samples_b,), optional) – Sample weights in the target domain. If let to its default value None, uniform distribution is taken.

  • reg (float, optional) – Regularization term >=0

  • rank (int, optional. Default is None. (>0)) – Nonnegative rank of the OT plan. If None, min(ns, nt) is considered.

  • alpha (float, optional. Default is 1e-10. (>0 and <1/r)) – Lower bound for the weight vector g.

  • rescale_cost (bool, optional. Default is True) – Rescale the low rank factorization of the sqeuclidean cost matrix

  • seed_init (int, optional. Default is 49. (>0)) – Random state for the ‘random’ initialization of low rank couplings

  • gamma_init (str, optional. Default is "rescale".) – Initialization strategy for gamma, either ‘rescale’ or ‘theory’. Gamma is a constant that scales the convergence criterion of the Mirror Descent optimization scheme used to compute the low-rank couplings (Q, R and g).

  • numItermax (int, optional. Default is 1000.) – Max number of iterations for Low Rank GW

  • stopThr (float, optional. Default is 1e-4.) – Stop threshold on error (>0) for Low Rank GW The error is the sum of Kullback Divergences computed for each low rank coupling (Q, R and g) and scaled using gamma.

  • numItermax_dykstra (int, optional. Default is 10000.) – Max number of iterations for the Dykstra algorithm

  • stopThr_dykstra (float, optional. Default is 1e-3.) – Stop threshold on error (>0) in Dykstra

  • cost_factorized_Xs (tuple, optional. Default is None) – Tuple with two pre-computed low rank decompositions (A1, A2) of the source cost matrix. Both matrices should have a shape of (n_samples_a, dim_Xs + 2). If None, the low rank cost matrices will be computed as sqeuclidean cost matrices.

  • cost_factorized_Xt (tuple, optional. Default is None) – Tuple with two pre-computed low rank decompositions (B1, B2) of the target cost matrix. Both matrices should have a shape of (n_samples_b, dim_Xt + 2). If None, the low rank cost matrices will be computed as sqeuclidean cost matrices.

  • warn (bool, optional) – if True, raises a warning if the low rank GW algorithm doesn’t converge.

  • warn_dykstra (bool, optional) – if True, raises a warning if the Dykstra algorithm doesn’t converge.

  • log (bool, optional) – record log if True

Returns:

  • Q (array-like, shape (n_samples_a, r)) – First low-rank matrix decomposition of the OT plan

  • R (array-like, shape (n_samples_b, r)) – Second low-rank matrix decomposition of the OT plan

  • g (array-like, shape (r, )) – Weight vector for the low-rank decomposition of the OT

  • log (dict (lazy_plan, value and value_linear)) – log dictionary returned only if log==True in parameters
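
A minimal sketch on random samples; with log=False the three low-rank factors are returned, and the dense plan can be rebuilt from them as in the formulation above:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> X_s = rng.rand(10, 2)
>>> X_t = rng.rand(12, 2)
>>> Q, R, g = ot.gromov.lowrank_gromov_wasserstein_samples(X_s, X_t, rank=3)
>>> T = Q @ np.diag(1.0 / g) @ R.T  # dense plan reconstructed from the factors
>>> T.shape
(10, 12)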

References

ot.gromov.partial_fused_gromov_wasserstein(M, C1, C2, p=None, q=None, m=None, loss_fun='square_loss', alpha=0.5, nb_dummies=1, G0=None, thres=1, numItermax=10000.0, tol=1e-08, symmetric=None, warn=True, log=False, verbose=False, **kwargs)[source]

Returns the Partial Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{F_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{F_2}, \mathbf{q})\), with pairwise distance matrix \(\mathbf{M}\) between node feature matrices.

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\\ \mathbf{1}^T \mathbf{T}^T \mathbf{1} = m &\leq \min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space.

  • \(\mathbf{C_2}\): Metric cost matrix in the target space.

  • \(\mathbf{p}\): Distribution in the source space.

  • \(\mathbf{q}\): Distribution in the target space.

  • m is the amount of mass to be transported

  • L: Loss function to account for the misfit between the similarity matrices.

The formulation of the problem has been proposed in [29]

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • m (float, optional) – Amount of mass to be transported (default: \(\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\))

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • nb_dummies (int, optional) – Number of dummy points to add (avoid instabilities in the EMD solver)

  • G0 (array-like, shape (ns, nt), optional) – Initialization of the transportation matrix

  • thres (float, optional) – quantile of the gradient matrix to populate the cost matrix when 0 (default: 1)

  • numItermax (int, optional) – Max number of iterations

  • tol (float, optional) – tolerance for stopping iterations

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • warn (bool, optional.) – Whether to raise a warning when EMD did not converge.

  • log (bool, optional) – return log if True

  • verbose (bool, optional) – Print information along iterations

  • **kwargs (dict) – parameters can be directly passed to the emd solver

Returns:

  • T (array-like, shape (ns, nt)) – Optimal transport matrix between the two spaces.

  • log (dict) – Convergence information and loss.
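
A minimal sketch with random structures and features; the total transported mass equals the requested m:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.rand(5, 2), rng.rand(6, 2)
>>> C1, C2 = ot.dist(xs, xs), ot.dist(xt, xt)
>>> M = ot.dist(rng.rand(5, 3), rng.rand(6, 3))  # feature cost across domains
>>> T = ot.gromov.partial_fused_gromov_wasserstein(M, C1, C2, m=0.5)  # T.sum() close to 0.5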


References

ot.gromov.partial_fused_gromov_wasserstein2(M, C1, C2, p=None, q=None, m=None, loss_fun='square_loss', alpha=0.5, nb_dummies=1, G0=None, thres=1, numItermax=10000.0, tol=1e-07, symmetric=None, warn=False, log=False, verbose=False, **kwargs)[source]

Returns the Partial Fused Gromov-Wasserstein discrepancy between \((\mathbf{C_1}, \mathbf{F_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{F_2}, \mathbf{q})\), with pairwise distance matrix \(\mathbf{M}\) between node feature matrices.

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{PFGW}_{\alpha} = \mathop{\min}_\mathbf{T} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\\ \mathbf{1}^T \mathbf{T}^T \mathbf{1} = m &\leq \min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\): metric cost matrix between features across domains

  • \(\mathbf{C_1}\): Metric cost matrix in the source space.

  • \(\mathbf{C_2}\): Metric cost matrix in the target space.

  • \(\mathbf{p}\): Distribution in the source space.

  • \(\mathbf{q}\): Distribution in the target space.

  • m is the amount of mass to be transported

  • L: Loss function to account for the misfit between the similarity matrices.

The formulation of the problem has been proposed in [29]

Note that when using backends, this loss function is differentiable wrt the matrices (M, C1, C2).

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (ndarray, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (ndarray, shape (nt, nt)) – Metric cost matrix in the target space

  • p (ndarray, shape (ns,)) – Distribution in the source space

  • q (ndarray, shape (nt,)) – Distribution in the target space

  • m (float, optional) – Amount of mass to be transported (default: \(\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\))

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • nb_dummies (int, optional) – Number of dummy points to add (avoid instabilities in the EMD solver)

  • G0 (ndarray, shape (ns, nt), optional) – Initialization of the transportation matrix

  • thres (float, optional) – quantile of the gradient matrix to populate the cost matrix when 0 (default: 1)

  • numItermax (int, optional) – Max number of iterations

  • tol (float, optional) – tolerance for stopping iterations

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • warn (bool, optional.) – Whether to raise a warning when EMD did not converge.

  • log (bool, optional) – return log if True

  • verbose (bool, optional) – Print information along iterations

  • **kwargs (dict) – parameters can be directly passed to the emd solver

Warning

When dealing with a large number of points, the EMD solver may face some instabilities, especially when the mass associated to the dummy point is large. To avoid them, increase the number of dummy points (this allows a smoother distribution of the mass over the points).

Returns:

  • partial_fgw_dist (float) – partial FGW discrepancy

  • log (dict) – log dictionary returned only if log is True
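
A minimal sketch returning the discrepancy value only:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.rand(5, 2), rng.rand(6, 2)
>>> C1, C2 = ot.dist(xs, xs), ot.dist(xt, xt)
>>> M = ot.dist(rng.rand(5, 3), rng.rand(6, 3))
>>> pfgw = ot.gromov.partial_fused_gromov_wasserstein2(M, C1, C2, m=0.5)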


References

ot.gromov.partial_gromov_wasserstein(C1, C2, p=None, q=None, m=None, loss_fun='square_loss', nb_dummies=1, G0=None, thres=1, numItermax=10000.0, tol=1e-08, symmetric=None, warn=True, log=False, verbose=False, **kwargs)[source]

Returns the Partial Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\\ \mathbf{1}^T \mathbf{T}^T \mathbf{1} = m &\leq \min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\end{aligned}\end{align} \]

where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space.

  • \(\mathbf{C_2}\): Metric cost matrix in the target space.

  • \(\mathbf{p}\): Distribution in the source space.

  • \(\mathbf{q}\): Distribution in the target space.

  • m is the amount of mass to be transported

  • L: Loss function to account for the misfit between the similarity matrices.

The formulation of the problem has been proposed in [29]

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • m (float, optional) – Amount of mass to be transported (default: \(\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\))

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • nb_dummies (int, optional) – Number of dummy points to add (avoid instabilities in the EMD solver)

  • G0 (array-like, shape (ns, nt), optional) – Initialization of the transportation matrix

  • thres (float, optional) – quantile of the gradient matrix to populate the cost matrix when 0 (default: 1)

  • numItermax (int, optional) – Max number of iterations

  • tol (float, optional) – tolerance for stopping iterations

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • warn (bool, optional.) – Whether to raise a warning when EMD did not converge.

  • log (bool, optional) – return log if True

  • verbose (bool, optional) – Print information along iterations

  • **kwargs (dict) – parameters can be directly passed to the emd solver

Returns:

  • T (array-like, shape (ns, nt)) – Optimal transport matrix between the two spaces.

  • log (dict) – Convergence information and loss.

Examples

>>> from ot.gromov import partial_gromov_wasserstein
>>> import numpy as np
>>> import scipy as sp
>>> a = np.array([0.25] * 4)
>>> b = np.array([0.25] * 4)
>>> x = np.array([1,2,100,200]).reshape((-1,1))
>>> y = np.array([3,2,98,199]).reshape((-1,1))
>>> C1 = sp.spatial.distance.cdist(x, x)
>>> C2 = sp.spatial.distance.cdist(y, y)
>>> np.round(partial_gromov_wasserstein(C1, C2, a, b),2)
array([[0.  , 0.25, 0.  , 0.  ],
       [0.25, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.25, 0.  ],
       [0.  , 0.  , 0.  , 0.25]])
>>> np.round(partial_gromov_wasserstein(C1, C2, a, b, m=0.25),2)
array([[0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.25, 0.  ],
       [0.  , 0.  , 0.  , 0.  ]])

References

ot.gromov.partial_gromov_wasserstein2(C1, C2, p=None, q=None, m=None, loss_fun='square_loss', nb_dummies=1, G0=None, thres=1, numItermax=10000.0, tol=1e-07, symmetric=None, warn=False, log=False, verbose=False, **kwargs)[source]

Returns the Partial Gromov-Wasserstein discrepancy between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{PGW} = \mathop{\min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\\ \mathbf{1}^T \mathbf{T}^T \mathbf{1} = m &\leq \min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\end{aligned}\end{align} \]

where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space.

  • \(\mathbf{C_2}\): Metric cost matrix in the target space.

  • \(\mathbf{p}\): Distribution in the source space.

  • \(\mathbf{q}\): Distribution in the target space.

  • m is the amount of mass to be transported

  • L: Loss function to account for the misfit between the similarity matrices.

The formulation of the problem has been proposed in [29]

Note that when using backends, this loss function is differentiable wrt the matrices (C1, C2).

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Note

This function will cast the computed transport plan to the data type of the provided input \(\mathbf{C}_1\). Casting to an integer tensor might result in a loss of precision. If this behaviour is unwanted, please make sure to provide a floating point input.

Parameters:
  • C1 (ndarray, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (ndarray, shape (nt, nt)) – Metric cost matrix in the target space

  • p (ndarray, shape (ns,)) – Distribution in the source space

  • q (ndarray, shape (nt,)) – Distribution in the target space

  • m (float, optional) – Amount of mass to be transported (default: \(\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}\))

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • nb_dummies (int, optional) – Number of dummy points to add (avoid instabilities in the EMD solver)

  • G0 (ndarray, shape (ns, nt), optional) – Initialization of the transportation matrix

  • thres (float, optional) – quantile of the gradient matrix to populate the cost matrix when 0 (default: 1)

  • numItermax (int, optional) – Max number of iterations

  • tol (float, optional) – tolerance for stopping iterations

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • warn (bool, optional.) – Whether to raise a warning when EMD did not converge.

  • log (bool, optional) – return log if True

  • verbose (bool, optional) – Print information along iterations

  • **kwargs (dict) – parameters can be directly passed to the emd solver

Warning

When dealing with a large number of points, the EMD solver may face some instabilities, especially when the mass associated to the dummy point is large. To avoid them, increase the number of dummy points (this allows a smoother distribution of the mass over the points).

Returns:

  • partial_gw_dist (float) – partial GW discrepancy

  • log (dict) – log dictionary returned only if log is True

Examples

>>> from ot.gromov import partial_gromov_wasserstein2
>>> import numpy as np
>>> import scipy as sp
>>> a = np.array([0.25] * 4)
>>> b = np.array([0.25] * 4)
>>> x = np.array([1,2,100,200]).reshape((-1,1))
>>> y = np.array([3,2,98,199]).reshape((-1,1))
>>> C1 = sp.spatial.distance.cdist(x, x)
>>> C2 = sp.spatial.distance.cdist(y, y)
>>> np.round(partial_gromov_wasserstein2(C1, C2, a, b),2)
3.38
>>> np.round(partial_gromov_wasserstein2(C1, C2, a, b, m=0.25),2)
0.0

References

ot.gromov.pointwise_gromov_wasserstein(C1, C2, p, q, loss_fun, alpha=1, max_iter=100, threshold_plan=0, log=False, verbose=False, random_state=None)[source]

Returns the Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\) using a stochastic Frank-Wolfe algorithm. This method has a \(\mathcal{O}(\mathrm{max\_iter} \times PN^2)\) time complexity with P the number of Sinkhorn iterations.

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{GW} = \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,)) – Distribution in the source space

  • q (array-like, shape (nt,)) – Distribution in the target space

  • loss_fun (function: \(\mathbb{R} \times \mathbb{R} \mapsto \mathbb{R}\)) – Loss function used for the distance; the transport plan does not depend on the loss function

  • alpha (float) – Step of the Frank-Wolfe algorithm, should be between 0 and 1

  • max_iter (int, optional) – Max number of iterations

  • threshold_plan (float, optional) – Threshold for deleting very small values in the transport plan. If above zero, the marginal constraints are violated.

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Gives the distance estimated and the standard deviation

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

T – Optimal coupling between the two spaces

Return type:

array-like, shape (ns, nt)
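
A minimal sketch; note that loss_fun is a callable here, unlike the string-based solvers above:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> xs, xt = rng.rand(6, 2), rng.rand(6, 2)
>>> C1, C2 = ot.dist(xs, xs), ot.dist(xt, xt)
>>> p, q = ot.unif(6), ot.unif(6)
>>> loss = lambda a, b: np.abs(a - b)  # pointwise loss on similarity values
>>> T = ot.gromov.pointwise_gromov_wasserstein(
...     C1, C2, p, q, loss, max_iter=10, random_state=0)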

References

ot.gromov.quantized_fused_gromov_wasserstein(C1, C2, npart1, npart2, p=None, q=None, C1_aux=None, C2_aux=None, F1=None, F2=None, alpha=1.0, part_method='fluid', rep_method='random', log=False, armijo=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, random_state=0, **kwargs)[source]

Returns the quantized Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{F_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{F_2}, \mathbf{q})\), whose samples are assigned to partitions and representants \(\mathcal{P_1} = \{(\mathbf{P_{1, i}}, \mathbf{r_{1, i}})\}_{i \leq npart1}\) and \(\mathcal{P_2} = \{(\mathbf{P_{2, j}}, \mathbf{r_{2, j}})\}_{j \leq npart2}\).

The function estimates the following optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l} + (1-\alpha) \langle \mathbf{T}, \mathbf{D}(\mathbf{F_1}, \mathbf{F}_2) \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\\ \mathbf{T}_{|\mathbf{P_{1, i}}, \mathbf{P_{2, j}}} &= T^{g}_{ij} \mathbf{T}^{(i,j)}\end{aligned}\end{align} \]

using a two-step strategy computing: i) a global alignment \(\mathbf{T}^{g}\) between representants across joint structure and feature spaces; ii) local alignments \(\mathbf{T}^{(i, j)}\) between partitions \(\mathbf{P_{1, i}}\) and \(\mathbf{P_{2, j}}\) seen as 1D measures.

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{F_1}\): Feature matrix in the source space

  • \(\mathbf{F_2}\): Feature matrix in the target space

  • \(\mathbf{D}(\mathbf{F_1}, \mathbf{F_2})\): Pairwise euclidean distance matrix between features

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • \(L\): quadratic loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Structure matrix in the source space.

  • C2 (array-like, shape (nt, nt)) – Structure matrix in the target space.

  • npart1 (int,) – number of partitions in the source space.

  • npart2 (int,) – number of partitions in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • C1_aux (array-like, shape (ns, ns), optional. Default is None.) – Auxiliary structure matrix in the source space to perform the partitioning and representant selection.

  • C2_aux (array-like, shape (nt, nt), optional. Default is None.) – Auxiliary structure matrix in the target space to perform the partitioning and representant selection.

  • F1 (array-like, shape (ns, d), optional. Default is None.) – Feature matrix in the source space.

  • F2 (array-like, shape (nt, d), optional. Default is None.) – Feature matrix in the target space

  • alpha (float, optional. Default is 1.) – FGW trade-off parameter in \(]0, 1]\) between structure and features. If alpha = 1, features are ignored, hence computing qGW; if alpha = 0, structures are ignored and we compute the quantized Wasserstein transport.

  • part_method (str, optional. Default is 'fluid'.) – Partitioning algorithm to use among {‘random’, ‘louvain’, ‘fluid’, ‘spectral’, ‘louvain_fused’, ‘fluid_fused’, ‘spectral_fused’, ‘GW’, ‘FGW’}. If part_method is in {‘louvain_fused’, ‘fluid_fused’, ‘spectral_fused’}, the corresponding graph partitioning algorithm {‘louvain’, ‘fluid’, ‘spectral’} will be used on the modified structure matrix \(\alpha \mathbf{C} + (1 - \alpha) \mathbf{D}(\mathbf{F})\) where \(\mathbf{D}(\mathbf{F})\) is the pairwise euclidean matrix between features. If part_method is in {‘GW’, ‘FGW’}, a (F)GW projection is used. If the louvain algorithm is used, the requested number of partitions is ignored.

  • rep_method (str, optional. Default is 'random'.) – Selection method for node representant in each partition. Can be either ‘random’, i.e. random sampling within each partition, or {‘pagerank’, ‘pagerank_fused’} to select a node with maximal pagerank w.r.t \(\mathbf{C}\) or \(\alpha \mathbf{C} + (1 - \alpha) \mathbf{D}(\mathbf{F})\).

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • random_state (int, optional) – Random seed for the partitioning algorithm

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • T_global (array-like, shape (npart1, npart2)) – Fused Gromov-Wasserstein alignment \(\mathbf{T}^{g}\) between representants.

  • Ts_local (dict of local OT matrices.) – Dictionary with keys \((i, j)\) corresponding to 1D OT between \(\mathbf{P_{1, i}}\) and \(\mathbf{P_{2, j}}\) if \(T^{g}_{ij} \neq 0\).

  • T (array-like, shape (ns, nt)) – Coupling between the two spaces.

  • log (dict) – Convergence information for inner problems and qGW loss.
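
A minimal structure-only sketch (alpha=1, so features are ignored); 'random' partitioning and representant selection avoid the networkx dependency of the defaults:

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C1 = rng.rand(10, 10); C1 = (C1 + C1.T) / 2
>>> C2 = rng.rand(12, 12); C2 = (C2 + C2.T) / 2
>>> T_global, Ts_local, T = ot.gromov.quantized_fused_gromov_wasserstein(
...     C1, C2, 2, 3, part_method='random', rep_method='random', random_state=0)
>>> T.shape
(10, 12)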

References

ot.gromov.quantized_fused_gromov_wasserstein_partitioned(CR1, CR2, list_R1, list_R2, list_p1, list_p2, MR=None, alpha=1.0, build_OT=False, log=False, armijo=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, nx=None, **kwargs)[source]

Returns the quantized Fused Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{F_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{F_2}, \mathbf{q})\), whose samples are assigned to partitions and representants \(\mathcal{P_1} = \{(\mathbf{P_{1, i}}, \mathbf{r_{1, i}})\}_{i \leq npart1}\) and \(\mathcal{P_2} = \{(\mathbf{P_{2, j}}, \mathbf{r_{2, j}})\}_{j \leq npart2}\). The latter must be precomputed and encoded, e.g. for the source, as: \(\mathbf{CR_1}\) the structure matrix between representants; list_R1 a list of relations between representants and their associated samples; list_p1 a list of node distributions within each partition; \(\mathbf{FR_1}\) the feature matrix of representants.

The function estimates the following optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l} + (1-\alpha) \langle \mathbf{T}, \mathbf{M}\rangle_F\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\\ \mathbf{T}_{|\mathbf{P_{1, i}}, \mathbf{P_{2, j}}} &= T^{g}_{ij} \mathbf{T}^{(i,j)}\end{aligned}\end{align} \]

using a two-step strategy computing: i) a global alignment \(\mathbf{T}^{g}\) between representants across joint structure and feature spaces; ii) local alignments \(\mathbf{T}^{(i, j)}\) between partitions \(\mathbf{P_{1, i}}\) and \(\mathbf{P_{2, j}}\) seen as 1D measures.

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{F_1}\): Feature matrix in the source space

  • \(\mathbf{F_2}\): Feature matrix in the target space

  • \(\mathbf{M}\): Pairwise similarity matrix between features

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • \(L\): quadratic loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the Gromov-Wasserstein conjugate gradient solver are done with numpy to limit memory overhead.

Parameters:
  • CR1 (array-like, shape (npart1, npart1)) – Structure matrix between partition representants in the source space.

  • CR2 (array-like, shape (npart2, npart2)) – Structure matrix between partition representants in the target space.

  • list_R1 (list of npart1 arrays,) – List of relations between representants and their associated samples in the source space.

  • list_R2 (list of npart2 arrays,) – List of relations between representants and their associated samples in the target space.

  • list_p1 (list of npart1 arrays,) – List of node distributions within each partition of the source space.

  • list_p2 (list of npart2 arrays,) – List of node distributions within each partition of the target space.

  • MR (array-like, shape (npart1, npart2), optional. (Default is None)) – Metric cost matrix between features of representants across spaces.

  • alpha (float, optional. Default is 1.) – FGW trade-off parameter in \(]0, 1]\) between structure and features. If alpha = 1 features are ignored hence computing qGW.

  • build_OT (bool, optional. Default is False) – Whether or not to build the OT plan between the non-partitioned structures.

  • log (bool, optional. Default is False) – record log if True

  • armijo (bool, optional) – If True, the step of the line-search is found via an Armijo search. Else closed form is used. If there are convergence issues, use False.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • nx (backend, optional) – POT backend

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • T_global (array-like, shape (npart1, npart2)) – Gromov-Wasserstein alignment \(\mathbf{T}^{g}\) between representants.

  • Ts_local (dict of local OT matrices.) – Dictionary with keys \((i, j)\) corresponding to 1D OT between \(\mathbf{P_{1, i}}\) and \(\mathbf{P_{2, j}}\) if \(T^{g}_{ij} \neq 0\).

  • T (array-like, shape (ns, nt)) – Coupling between the two spaces if build_OT=True else None.

  • log (dict, if log=True.) – Convergence information and losses of inner OT problems.
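
A sketch assembling the precomputed inputs by hand for two tiny graphs, under the assumption that each list_R entry holds the node indices attached to a representant and each list_p entry the corresponding mass vector (a hand-rolled illustration, not a prescribed pipeline):

>>> import numpy as np
>>> import ot
>>> rng = np.random.RandomState(0)
>>> C1 = rng.rand(6, 6); C1 = (C1 + C1.T) / 2
>>> C2 = rng.rand(8, 8); C2 = (C2 + C2.T) / 2
>>> rep1, rep2 = [0, 3], [0, 4]  # one representant per partition
>>> CR1 = C1[np.ix_(rep1, rep1)]  # structure between representants
>>> CR2 = C2[np.ix_(rep2, rep2)]
>>> list_R1 = [np.array([0, 1, 2]), np.array([3, 4, 5])]  # nodes per partition
>>> list_R2 = [np.array([0, 1, 2, 3]), np.array([4, 5, 6, 7])]
>>> list_p1 = [np.ones(3) / 6, np.ones(3) / 6]  # node masses per partition
>>> list_p2 = [np.ones(4) / 8, np.ones(4) / 8]
>>> T_global, Ts_local, T = ot.gromov.quantized_fused_gromov_wasserstein_partitioned(
...     CR1, CR2, list_R1, list_R2, list_p1, list_p2, build_OT=True)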

References

ot.gromov.quantized_fused_gromov_wasserstein_samples(X1, X2, npart1, npart2, p=None, q=None, F1=None, F2=None, alpha=1.0, method='kmeans', log=False, armijo=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, random_state=0, **kwargs)[source]

Returns the quantized Fused Gromov-Wasserstein transport between samples endowed with their respective Euclidean geometries \((\mathbf{D}(\mathbf{X_1}), \mathbf{F_1}, \mathbf{p})\) and \((\mathbf{D}(\mathbf{X_2}), \mathbf{F_2}, \mathbf{q})\), whose samples are assigned to partitions and representants \(\mathcal{P_1} = \{(\mathbf{P_{1, i}}, \mathbf{r_{1, i}})\}_{i \leq npart1}\) and \(\mathcal{P_2} = \{(\mathbf{P_{2, j}}, \mathbf{r_{2, j}})\}_{j \leq npart2}\).

The function estimates the following optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_\mathbf{T} \quad \alpha \sum_{i,j,k,l} L(\mathbf{D}(\mathbf{X_1})_{i,k}, \mathbf{D}(\mathbf{X_2})_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l} + (1-\alpha) \langle \mathbf{T}, \mathbf{D}(\mathbf{F_1}, \mathbf{F_2}) \rangle_F\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\\ \mathbf{T}_{|\mathbf{P_{1, i}}, \mathbf{P_{2, j}}} &= T^{g}_{ij} \mathbf{T}^{(i,j)}\end{aligned}\end{align} \]

using a two-step strategy computing: i) a global alignment \(\mathbf{T}^{g}\) between representants across joint structure and feature spaces; ii) local alignments \(\mathbf{T}^{(i, j)}\) between partitions \(\mathbf{P_{1, i}}\) and \(\mathbf{P_{2, j}}\) seen as 1D measures.

Where :

  • \(\mathbf{X_1}\): Samples in the source space

  • \(\mathbf{X_2}\): Samples in the target space

  • \(\mathbf{F_1}\): Feature matrix in the source space

  • \(\mathbf{F_2}\): Feature matrix in the target space

  • \(\mathbf{D}(\mathbf{F_1}, \mathbf{F_2})\): Pairwise Euclidean distance matrix between features

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • \(L\): quadratic loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. But the algorithm uses the C++ CPU backend which can lead to copy overhead on GPU arrays.

Note

All computations in the conjugate gradient solver are done with numpy to limit memory overhead.

Parameters:
  • X1 (array-like, shape (ns, ds)) – Samples in the source space.

  • X2 (array-like, shape (nt, dt)) – Samples in the target space.

  • npart1 (int,) – number of partitions in the source space.

  • npart2 (int,) – number of partitions in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • q (array-like, shape (nt,), optional) – Distribution in the target space. If let to its default value None, uniform distribution is taken.

  • F1 (array-like, shape (ns, d), optional. Default is None.) – Feature matrix in the source space.

  • F2 (array-like, shape (nt, d), optional. Default is None.) – Feature matrix in the target space

  • alpha (float, optional. Default is 1.) – FGW trade-off parameter in \(]0, 1]\) between structure and features. If alpha = 1, features are ignored, hence computing qGW; if alpha = 0, structures are ignored and we compute the quantized Wasserstein transport.

  • method (str, optional. Default is 'kmeans'.) – Partitioning and representant selection algorithms to use among {‘random’, ‘kmeans’, ‘kmeans_fused’}. If method == ‘kmeans_fused’, kmeans is performed on augmented samples \([\alpha \mathbf{X}; (1 - \alpha) \mathbf{F}]\).

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • armijo (bool, optional) – If True, the step of the line-search is found via an Armijo line-search. Else closed-form is used. If there are convergence issues, use False.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • random_state (int, optional) – Random seed for the partitioning algorithm

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • T_global (array-like, shape (npart1, npart2)) – Fused Gromov-Wasserstein alignment \(\mathbf{T}^{g}\) between representants.

  • Ts_local (dict of local OT matrices.) – Dictionary with keys \((i, j)\) corresponding to 1D OT between \(\mathbf{P_{1, i}}\) and \(\mathbf{P_{2, j}}\) if \(T^{g}_{ij} \neq 0\).

  • T (array-like, shape (ns, nt)) – Coupling between the two spaces.

  • log (dict) – Convergence information for inner problems and qGW loss.

References
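
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes a recent POT version exposing the signature above and scikit-learn installed for the default ‘kmeans’ partitioning; with alpha=1 and no features this computes quantized GW):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    X1 = rng.randn(30, 2)   # 30 source samples in R^2
    X2 = rng.randn(40, 3)   # 40 target samples in R^3

    # alpha=1: features are ignored, so this computes quantized GW (qGW)
    T_global, Ts_local, T, log = ot.gromov.quantized_fused_gromov_wasserstein_samples(
        X1, X2, npart1=4, npart2=5, alpha=1.0, method='kmeans', log=True,
        random_state=0)
    print(T_global.shape, T.shape)  # (4, 5) (30, 40)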

ot.gromov.sampled_gromov_wasserstein(C1, C2, p, q, loss_fun, nb_samples_grad=100, epsilon=1, max_iter=500, log=False, verbose=False, random_state=None)[source]

Returns the Gromov-Wasserstein transport between \((\mathbf{C_1}, \mathbf{p})\) and \((\mathbf{C_2}, \mathbf{q})\) using a 1-stochastic Frank-Wolfe. This method has a \(\mathcal{O}(\mathrm{max\_iter} \times N \log(N))\) time complexity by relying on the 1D Optimal Transport solver.

The function solves the following optimization problem:

\[ \begin{align}\begin{aligned}\mathbf{GW} = \mathop{\arg \min}_\mathbf{T} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T}^T \mathbf{1} &= \mathbf{q}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • \(\mathbf{q}\): distribution in the target space

  • L: loss function to account for the misfit between the similarity matrices

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,)) – Distribution in the source space

  • q (array-like, shape (nt,)) – Distribution in the target space

  • loss_fun (function: \(\mathbb{R} \times \mathbb{R} \mapsto \mathbb{R}\)) – Loss function used for the distance, the transport plan does not depend on the loss function

  • nb_samples_grad (int) – Number of samples to approximate the gradient

  • epsilon (float) – Weight of the Kullback-Leibler regularization

  • max_iter (int, optional) – Max number of iterations

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – Gives the distance estimated and the standard deviation

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

T – Optimal coupling between the two spaces

Return type:

array-like, shape (ns, nt)

References
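
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above and uses the standard POT helpers ot.dist and ot.unif to build inputs):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    xs = rng.randn(20, 2)              # source samples
    xt = rng.randn(30, 3)              # target samples (dimension may differ)

    C1, C2 = ot.dist(xs, xs), ot.dist(xt, xt)   # intra-domain cost matrices
    p, q = ot.unif(20), ot.unif(30)             # uniform marginals

    # loss_fun is a callable here (unlike most solvers in ot.gromov)
    def square_loss(a, b):
        return (a - b) ** 2

    T = ot.gromov.sampled_gromov_wasserstein(
        C1, C2, p, q, square_loss, nb_samples_grad=10, max_iter=50,
        random_state=0)
    print(T.shape)                     # (20, 30)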

ot.gromov.semirelaxed_fgw_barycenters(N, Ys, Cs, ps=None, lambdas=None, alpha=0.5, fixed_structure=False, fixed_features=False, p=None, loss_fun='square_loss', symmetric=True, max_iter=100, tol=1e-09, stop_criterion='barycenter', warmstartT=False, verbose=False, log=False, init_C=None, init_X=None, G0='product', random_state=None, **kwargs)[source]

Returns the Semi-relaxed Fused Gromov-Wasserstein barycenters of S measurable networks with node features \((\mathbf{C}_s, \mathbf{Y}_s, \mathbf{p}_s)_{1 \leq s \leq S}\) (see eq (44) in [48]), estimated using the semi-relaxed FGW transports from Conditional Gradient solvers.

The function solves the following optimization problem:

\[\mathbf{C}^*, \mathbf{Y}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}, \mathbf{Y}\in \mathbb{R}^{N \times d}} \quad \sum_s \lambda_s \mathrm{srFGW}_{\alpha}(\mathbf{C}_s, \mathbf{Y}_s, \mathbf{p}_s, \mathbf{C}, \mathbf{Y})\]

Where :

  • \(\mathbf{Y}_s\): input feature matrix

  • \(\mathbf{C}_s\): input metric cost matrix

  • \(\mathbf{p}_s\): input distribution

Parameters:
  • N (int) – Desired number of samples of the target barycenter

  • Ys (list of array-like, each element has shape (ns,d)) – Features of all samples

  • Cs (list of array-like, each element has shape (ns,ns)) – Structure matrices of all samples

  • ps (list of array-like, each element has shape (ns,), optional) – Masses of all samples. If let to its default value None, uniform distributions are taken.

  • lambdas (array-like of shape (S,), optional) – List of the S spaces’ weights. If let to its default value None, uniform weights are taken.

  • alpha (float, optional) – Alpha parameter for the srFGW divergence in \(]0, 1[\).

  • fixed_structure (bool, optional) – Whether to fix the structure of the barycenter during the updates.

  • fixed_features (bool, optional) – Whether to fix the features of the barycenter during the updates.

  • loss_fun (str, optional) – Loss function used for the solver either ‘square_loss’ or ‘kl_loss’

  • symmetric (bool, optional) – Whether the structures are to be assumed symmetric or not. Default value is True. If set to True (resp. False), all structures will be assumed symmetric (resp. asymmetric).

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on relative error (>0)

  • stop_criterion (str, optional. Default is 'barycenter'.) – Stop criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’ uses absolute norm variations of estimated barycenters. Else if set to ‘loss’ uses the relative variations of the loss.

  • warmstartT (bool, optional) – Whether to perform warmstart of transport plans in the successive fused Gromov-Wasserstein transport problems.

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • init_C (array-like, shape (N,N), optional) – Initialization for the barycenters’ structure matrix. If not set a random init is used.

  • init_X (array-like, shape (N,d), optional) – Initialization for the barycenters’ features. If not set a random init is used.

  • G0 (str, optional. Default is "product".) – Initialization method for transport plans calling ot.gromov.semirelaxed_init_plan(), and taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”. Transport plans are used to deduce an initial barycenter structure if init_C=None.

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

  • X (array-like, shape (N, d)) – Barycenters’ features

  • C (array-like, shape (N, N)) – Barycenters’ structure matrix

  • log (dict) – Only returned when log=True. It contains the keys:

    • \(\mathbf{T}_s\): list of (N, ns) transport matrices from which target masses can be deduced.

    • \((\mathbf{M}_s)_s\): all distance matrices between the feature of the barycenter and the other features \((dist(\mathbf{X}, \mathbf{Y}_s))_s\) shape (N, ns)

    • values used in convergence evaluation.

References
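
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above and builds toy structures from random features via ot.dist):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    Ys, Cs = [], []
    for n in (7, 9):                   # two attributed graphs of sizes 7 and 9
        Y = rng.rand(n, 2)
        Ys.append(Y)
        Cs.append(ot.dist(Y, Y))       # structures derived from the features

    X, C = ot.gromov.semirelaxed_fgw_barycenters(
        N=4, Ys=Ys, Cs=Cs, alpha=0.5, max_iter=20, random_state=0)
    print(X.shape, C.shape)            # (4, 2) (4, 4)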

ot.gromov.semirelaxed_fused_gromov_wasserstein(M, C1, C2, p=None, loss_fun='square_loss', symmetric=None, alpha=0.5, G0=None, log=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, random_state=0, **kwargs)[source]

Computes the semi-relaxed Fused Gromov-Wasserstein transport between two graphs (see [48]).

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_{\mathbf{T}} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) T_{i,j} T_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the (ns, nt) metric cost matrix

  • \(\mathbf{p}\) source weights (sum to 1)

  • L is a loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all steps of the conditional gradient are differentiable.

The algorithm used for solving the problem is conditional gradient, as discussed in [48].

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor it must satisfy marginal constraints and will be used as initial transport of the solver. If G0 is a string, it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • log (bool, optional) – record log if True

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • gamma (array-like, shape (ns, nt)) – Optimal transportation matrix for the given parameters.

  • log (dict) – Log dictionary return only if log==True in parameters.

References
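
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above, with toy structures built from the node features via ot.dist for simplicity):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    F1, F2 = rng.rand(12, 3), rng.rand(5, 3)   # node features
    M = ot.dist(F1, F2)                        # (ns, nt) feature cost
    C1 = ot.dist(F1, F1)                       # toy structures
    C2 = ot.dist(F2, F2)

    T, log = ot.gromov.semirelaxed_fused_gromov_wasserstein(
        M, C1, C2, alpha=0.5, log=True, G0='product')
    print(T.shape)                             # (12, 5)
    print(T.sum(axis=0))                       # target marginal is learned, not fixed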

ot.gromov.semirelaxed_fused_gromov_wasserstein2(M, C1, C2, p=None, loss_fun='square_loss', symmetric=None, alpha=0.5, G0=None, log=False, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, random_state=0, **kwargs)[source]

Computes the semi-relaxed FGW divergence between two graphs (see [48]).

\[ \begin{align}\begin{aligned}\mathbf{srFGW}_{\alpha} = \min_{\mathbf{T}} \quad (1 - \alpha) \langle \mathbf{T}, \mathbf{M} \rangle_F + \alpha \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) T_{i,j} T_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

where :

  • \(\mathbf{M}\) is the (ns, nt) metric cost matrix

  • \(\mathbf{p}\) source weights (sum to 1)

  • L is a loss function to account for the misfit between the similarity matrices

The algorithm used for solving the problem is conditional gradient, as discussed in [48].

Note that when using backends, this loss function is differentiable w.r.t. the matrices (C1, C2) but not yet w.r.t. the weights p.

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all steps of the conditional gradient are differentiable.

Parameters:
  • M (array-like, shape (ns, nt)) – Metric cost matrix between features across domains

  • C1 (array-like, shape (ns, ns)) – Metric cost matrix representative of the structure in the source space.

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix representative of the structure in the target space.

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str, optional) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • alpha (float, optional) – Trade-off parameter (0 < alpha < 1)

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor it must satisfy marginal constraints and will be used as initial transport of the solver. If G0 is a string, it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • log (bool, optional) – Record log if True.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

  • **kwargs (dict) – Parameters can be directly passed to the ot.optim.cg solver.

Returns:

  • srfgw-divergence (float) – Semi-relaxed Fused Gromov-Wasserstein divergence for the given parameters.

  • log (dict) – Log dictionary return only if log==True in parameters.

References

ot.gromov.semirelaxed_gromov_barycenters(N, Cs, ps=None, lambdas=None, loss_fun='square_loss', symmetric=True, max_iter=1000, tol=1e-09, stop_criterion='barycenter', warmstartT=False, verbose=False, log=False, init_C=None, G0='product', random_state=None, **kwargs)[source]

Returns the Semi-relaxed Gromov-Wasserstein barycenters of S measured similarity matrices \((\mathbf{C}_s)_{1 \leq s \leq S}\).

The function solves the following optimization problem with block coordinate descent:

\[\mathbf{C}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \mathrm{srGW}(\mathbf{C}_s, \mathbf{p}_s, \mathbf{C})\]

Where :

  • \(\mathbf{C}_s\): input metric cost matrix

  • \(\mathbf{p}_s\): distribution

Parameters:
  • N (int) – Size of the targeted barycenter

  • Cs (list of S array-like of shape (ns, ns)) – Metric cost matrices

  • ps (list of S array-like of shape (ns,), optional) – Sample weights in the S spaces. If let to its default value None, uniform distributions are taken.

  • lambdas (array-like of shape (S,), optional) – List of the S spaces’ weights. If let to its default value None, uniform weights are taken.

  • loss_fun (str, optional) – Loss function used for the solver, either ‘square_loss’ or ‘kl_loss’.

  • symmetric (bool, optional.) – Whether the structures are to be assumed symmetric or not. Default value is True. If set to True (resp. False), all structures will be assumed symmetric (resp. asymmetric).

  • max_iter (int, optional) – Max number of iterations

  • tol (float, optional) – Stop threshold on relative error (>0)

  • stop_criterion (str, optional. Default is 'barycenter'.) – Stop criterion taking values in [‘barycenter’, ‘loss’]. If set to ‘barycenter’ uses absolute norm variations of estimated barycenters. Else if set to ‘loss’ uses the relative variations of the loss.

  • warmstartT (bool, optional) – Whether to perform warmstart of transport plans in the successive fused Gromov-Wasserstein transport problems.

  • verbose (bool, optional) – Print information along iterations.

  • log (bool, optional) – Record log if True.

  • init_C (array-like of shape (N,N), optional.) – Random initial value for the \(\mathbf{C}\) matrix provided by the user. Default is None and relies on G0 to produce an initial structure.

  • G0 (str, optional. Default is 'product'.) – Initialization method for transport plans calling ot.gromov.semirelaxed_init_plan(), and taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”. Transport plans are used to deduce an initial barycenter structure if init_C=None.

  • random_state (int or RandomState instance, optional) – Fix the seed for reproducibility

Returns:

  • C (array-like, shape (N, N)) – Barycenters’ structure matrix

  • log (dict) – Only returned when log=True. It contains the keys:

    • \(\mathbf{T}\): list of (N, ns) transport matrices

    • \(\mathbf{p}\): (N,) barycenter weights

    • values used in convergence evaluation.

References
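
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above, with random symmetric structures as inputs):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    Cs = []
    for n in (6, 8, 10):               # three structures of different sizes
        C = rng.rand(n, n)
        Cs.append((C + C.T) / 2)       # symmetrize

    C_bar = ot.gromov.semirelaxed_gromov_barycenters(
        N=4, Cs=Cs, max_iter=50, random_state=0)
    print(C_bar.shape)                 # (4, 4)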

ot.gromov.semirelaxed_gromov_wasserstein(C1, C2, p=None, loss_fun='square_loss', symmetric=None, log=False, G0=None, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, random_state=0, **kwargs)[source]

Returns the semi-relaxed Gromov-Wasserstein divergence transport from \((\mathbf{C_1}, \mathbf{p})\) to \(\mathbf{C_2}\) (see [48]).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\mathbf{T}^* \in \mathop{\arg \min}_{\mathbf{T}} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • L: loss function to account for the misfit between the similarity matrices

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all steps of the conditional gradient are differentiable.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor it must satisfy marginal constraints and will be used as initial transport of the solver. If G0 is a string, it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • T (array-like, shape (ns, nt)) –

    Coupling between the two spaces that minimizes:

    \(\sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\)

  • log (dict) – Convergence information and loss.

References
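
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above; the source structure is a hypothetical random symmetric adjacency-like matrix, aligned to a 2-node template):

    import numpy as np
    import ot

    rng = np.random.RandomState(42)
    C1 = (rng.rand(15, 15) < 0.3).astype(float)
    C1 = (C1 + C1.T) / 2                       # symmetrize the source structure
    C2 = np.array([[0.0, 1.0], [1.0, 0.0]])    # 2-node target template

    T, log = ot.gromov.semirelaxed_gromov_wasserstein(
        C1, C2, loss_fun='square_loss', log=True, G0='product')
    print(T.sum(axis=1))   # rows sum to p (uniform); the column marginal is free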

ot.gromov.semirelaxed_gromov_wasserstein2(C1, C2, p=None, loss_fun='square_loss', symmetric=None, log=False, G0=None, max_iter=10000.0, tol_rel=1e-09, tol_abs=1e-09, random_state=0, **kwargs)[source]

Returns the semi-relaxed Gromov-Wasserstein divergence from \((\mathbf{C_1}, \mathbf{p})\) to \(\mathbf{C_2}\) (see [48]).

The function solves the following optimization problem using Conditional Gradient:

\[ \begin{align}\begin{aligned}\text{srGW} = \min_{\mathbf{T}} \quad \sum_{i,j,k,l} L(\mathbf{C_1}_{i,k}, \mathbf{C_2}_{j,l}) \mathbf{T}_{i,j} \mathbf{T}_{k,l}\\s.t. \ \mathbf{T} \mathbf{1} &= \mathbf{p}\\ \mathbf{T} &\geq 0\end{aligned}\end{align} \]

Where :

  • \(\mathbf{C_1}\): Metric cost matrix in the source space

  • \(\mathbf{C_2}\): Metric cost matrix in the target space

  • \(\mathbf{p}\): distribution in the source space

  • L: loss function to account for the misfit between the similarity matrices

Note that when using backends, this loss function is differentiable w.r.t. the matrices (C1, C2) but not yet w.r.t. the weights p.

Note

This function is backend-compatible and will work on arrays from all compatible backends. However, not all steps of the conditional gradient are differentiable.

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space

  • p (array-like, shape (ns,), optional) – Distribution in the source space. If let to its default value None, uniform distribution is taken.

  • loss_fun (str) – loss function used for the solver either ‘square_loss’ or ‘kl_loss’.

  • symmetric (bool, optional) – Either C1 and C2 are to be assumed symmetric or not. If let to its default None value, a symmetry test will be conducted. Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

  • verbose (bool, optional) – Print information along iterations

  • log (bool, optional) – record log if True

  • G0 (array-like of shape (ns,nt) or string, optional) – If G0=None the initial transport plan of the solver is \(\mathbf{p} \frac{\mathbf{1}_{nt}}{nt}^\top\). If G0 is a tensor it must satisfy marginal constraints and will be used as initial transport of the solver. If G0 is a string, it will be interpreted as a method for ot.gromov.semirelaxed_init_plan() taking values in “product”, “random_product”, “random”, “fluid”, “fluid_soft”, “spectral”, “spectral_soft”, “kmeans”, “kmeans_soft”.

  • max_iter (int, optional) – Max number of iterations

  • tol_rel (float, optional) – Stop threshold on relative error (>0)

  • tol_abs (float, optional) – Stop threshold on absolute error (>0)

  • random_state (int, optional) – Random seed used in stochastic initialization methods.

  • **kwargs (dict) – parameters can be directly passed to the ot.optim.cg solver

Returns:

  • srgw (float) – Semi-relaxed Gromov-Wasserstein divergence

  • log (dict) – Convergence information and coupling matrix.

References
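
Example

A sketch of backend differentiation (illustrative, not from the original documentation; it assumes PyTorch is installed and used through POT’s backend, relying on the note above that the loss is differentiable w.r.t. C1 and C2):

    import torch
    import ot

    A = torch.rand(10, 10, dtype=torch.float64)
    B = torch.rand(4, 4, dtype=torch.float64)
    C1 = ((A + A.T) / 2).requires_grad_(True)   # symmetric source structure
    C2 = ((B + B.T) / 2).requires_grad_(True)   # symmetric target structure

    srgw = ot.gromov.semirelaxed_gromov_wasserstein2(C1, C2)
    srgw.backward()                             # gradients w.r.t. C1 and C2
    print(C1.grad.shape, C2.grad.shape)         # (10, 10) (4, 4)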

ot.gromov.semirelaxed_init_plan(C1, C2, p, M=None, alpha=1.0, method='product', use_target=True, random_state=0, nx=None)[source]

Heuristics to initialize the semi-relaxed (F)GW transport plan \(\mathbf{T} \in \mathcal{U}_{nt}(\mathbf{p})\), between a graph \((\mathbf{C1}, \mathbf{p})\) and a structure matrix \(\mathbf{C2}\), where \(\mathcal{U}_{nt}(\mathbf{p}) = \{\mathbf{T} \in \mathbb{R}_{+}^{ns \times nt}, \mathbf{T} \mathbf{1}_{nt} = \mathbf{p} \}\). Available methods are:

  • “product” or “random_product”: \(\mathbf{T} = \mathbf{pq}^{T}\) with \(\mathbf{q}\) uniform or randomly sampled in the probability simplex of dimension nt.

  • “random”: random sampling in \(\mathcal{U}_{nt}(\mathbf{p})\).

  • “fluid”: Fluid algorithm from networkx for graph partitioning.

  • “spectral”, “kmeans” : Spectral or Kmeans clustering from sklearn.

  • “fluid_soft”, “spectral_soft”, “kmeans_soft”: \(\mathbf{T}_0\) given by corresponding clustering with target marginal \(\mathbf{q}_0\), further centered as \(\mathbf{T} = (\mathbf{T}_0 + \mathbf{pq}_0^T) / 2\) .

If a metric cost matrix between features across domains \(\mathbf{M}\) is provided, it will be used as the cost matrix in a semi-relaxed Wasserstein problem providing \(\mathbf{T}_M \in \mathcal{U}_{nt}(\mathbf{p})\). Then the output transport plan is \(\alpha \mathbf{T} + (1 - \alpha ) \mathbf{T}_{M}\).

Parameters:
  • C1 (array-like, shape (ns, ns)) – Metric cost matrix in the source space.

  • C2 (array-like, shape (nt, nt)) – Metric cost matrix in the target space.

  • p (array-like, shape (ns,), optional.) – Probability distribution in the source space. If let to None, uniform weights are assumed on C1.

  • M (array-like, shape (ns, nt), optional.) – Metric cost matrix between features across domains.

  • alpha (float, optional) – Trade-off parameter (0 <= alpha <= 1)

  • method (str, optional) – Method to initialize the transport plan. The default is ‘product’.

  • use_target (bool, optional.) – Whether or not to use the target structure/features to further align the transport plan provided by the method.

  • random_state (int, optional) – Random seed used for stochastic methods.

  • nx (backend, optional) – POT backend.

Returns:

T – Admissible transport plan for the sr(F)GW problems.

Return type:

array-like, shape (ns, nt)

References
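
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above and scikit-learn installed for the ‘kmeans’ method; the final check verifies that the plan lies in \(\mathcal{U}_{nt}(\mathbf{p})\)):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    X = rng.randn(12, 2)
    C1 = ot.dist(X, X)                 # source structure
    C2 = ot.dist(X[:3], X[:3])         # small target structure (nt = 3)
    p = ot.unif(12)

    T0 = ot.gromov.semirelaxed_init_plan(C1, C2, p, method='kmeans',
                                         random_state=0)
    # rows of T0 sum to p while the column marginal is free
    np.testing.assert_allclose(T0.sum(axis=1), p, atol=1e-8)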

ot.gromov.solve_gromov_linesearch(G, deltaG, cost_G, C1, C2, M, reg, alpha_min=None, alpha_max=None, nx=None, symmetric=False, **kwargs)[source]

Solve the linesearch in the FW iterations for any inner loss that decomposes as in Proposition 1 in [12].

Parameters:
  • G (array-like, shape(ns,nt)) – The transport map at a given iteration of the FW

  • deltaG (array-like (ns,nt)) – Difference between the optimal map found by linearization in the FW algorithm and the value at a given iteration

  • cost_G (float) – Value of the cost at G

  • C1 (array-like (ns,ns), optional) – Transformed Structure matrix in the source domain. For the ‘square_loss’ and ‘kl_loss’, we provide hC1 from ot.gromov.init_matrix

  • C2 (array-like (nt,nt), optional) – Transformed Structure matrix in the target domain. For the ‘square_loss’ and ‘kl_loss’, we provide hC2 from ot.gromov.init_matrix

  • M (array-like (ns,nt)) – Cost matrix between the features.

  • reg (float) – Regularization parameter.

  • alpha_min (float, optional) – Minimum value for alpha

  • alpha_max (float, optional) – Maximum value for alpha

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

  • symmetric (bool, optional) – Whether structures are to be assumed symmetric or not. Default value is False. If set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).

Returns:

  • alpha (float) – The optimal step size of the FW

  • fc (int) – Number of function calls. Not used here.

  • cost_G (float) – The value of the cost for the next iteration

References
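
Example

An illustrative sketch of one Frank-Wolfe step around this helper (not from the original documentation; it builds hC1 and hC2 with ot.gromov.init_matrix as the parameter descriptions indicate, and uses ot.gromov.gwggrad, ot.gromov.gwloss and ot.emd to form a plausible descent direction; M=0 and reg=1 mimic a pure, non-fused GW objective):

    import numpy as np
    import ot
    from ot.gromov import init_matrix, gwloss, gwggrad, solve_gromov_linesearch

    rng = np.random.RandomState(0)
    C1 = rng.rand(6, 6); C1 = (C1 + C1.T) / 2
    C2 = rng.rand(4, 4); C2 = (C2 + C2.T) / 2
    p, q = ot.unif(6), ot.unif(4)

    constC, hC1, hC2 = init_matrix(C1, C2, p, q, 'square_loss')
    G = np.outer(p, q)                               # current transport map
    Gc = ot.emd(p, q, gwggrad(constC, hC1, hC2, G))  # linearized subproblem
    deltaG = Gc - G
    cost_G = gwloss(constC, hC1, hC2, G)

    alpha, fc, cost_next = solve_gromov_linesearch(
        G, deltaG, cost_G, hC1, hC2, M=0.0, reg=1.0,
        alpha_min=0.0, alpha_max=1.0)
    print(alpha, cost_next <= cost_G)                # step size in [0, 1]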

ot.gromov.solve_partial_gromov_linesearch(G, deltaG, cost_G, df_G, df_Gc, M, reg, alpha_min=None, alpha_max=None, nx=None, **kwargs)[source]

Solve the linesearch in the FW iterations of partial (F)GW following eq.5 of [29].

Parameters:
  • G (array-like, shape(ns,nt)) – The transport map at a given iteration of the FW

  • deltaG (array-like (ns,nt)) – Difference between the optimal map Gc found by linearization in the FW algorithm and the value at a given iteration

  • cost_G (float) – Value of the cost at G

  • df_G (array-like (ns,nt)) – Gradient of the GW cost at G

  • df_Gc (array-like (ns,nt)) – Gradient of the GW cost at Gc

  • M (array-like (ns,nt)) – Cost matrix between the features.

  • reg (float) – Regularization parameter.

  • alpha_min (float, optional) – Minimum value for alpha

  • alpha_max (float, optional) – Maximum value for alpha

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

  • alpha (float) – The optimal step size of the FW

  • fc (int) – Number of function calls. Not used here.

  • cost_G (float) – The value of the cost for the next iteration

  • df_G (array-like (ns,nt)) – Updated gradient of the GW cost

References

ot.gromov.solve_semirelaxed_gromov_linesearch(G, deltaG, cost_G, C1, C2, ones_p, M, reg, fC2t=None, alpha_min=None, alpha_max=None, nx=None, **kwargs)[source]

Solve the linesearch in the Conditional Gradient iterations for the semi-relaxed Gromov-Wasserstein divergence.

Parameters:
  • G (array-like, shape(ns,nt)) – The transport map at a given iteration of the FW

  • deltaG (array-like (ns,nt)) – Difference between the optimal map found by linearization in the FW algorithm and the value at a given iteration

  • cost_G (float) – Value of the cost at G

  • C1 (array-like (ns,ns), optional) – Transformed Structure matrix in the source domain. Note that for the ‘square_loss’ and ‘kl_loss’, we provide hC1 from ot.gromov.init_matrix_semirelaxed

  • C2 (array-like (nt,nt), optional) – Transformed Structure matrix in the target domain. Note that for the ‘square_loss’ and ‘kl_loss’, we provide hC2 from ot.gromov.init_matrix_semirelaxed

  • ones_p (array-like (ns,1)) – Array of ones of size ns

  • M (array-like (ns,nt)) – Cost matrix between the features.

  • reg (float) – Regularization parameter.

  • fC2t (array-like (nt,nt), optional) – Transformed Structure matrix in the target domain. Note that for the ‘square_loss’ and ‘kl_loss’, we provide fC2t from ot.gromov.init_matrix_semirelaxed. If fC2t is not provided, it defaults to the one corresponding to the ‘square_loss’.

  • alpha_min (float, optional) – Minimum value for alpha

  • alpha_max (float, optional) – Maximum value for alpha

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

  • alpha (float) – The optimal step size of the FW

  • fc (int) – Number of function calls. Not used here.

  • cost_G (float) – The value of the cost for the next iteration

References

ot.gromov.tensor_product(constC, hC1, hC2, T, nx=None)[source]

Return the tensor for Gromov-Wasserstein fast computation

The tensor is computed as described in Proposition 1 Eq. (6) in [12]

Parameters:
  • constC (array-like, shape (ns, nt)) – Constant \(\mathbf{C}\) matrix in Eq. (6)

  • hC1 (array-like, shape (ns, ns)) – \(\mathbf{h1}(\mathbf{C1})\) matrix in Eq. (6)

  • hC2 (array-like, shape (nt, nt)) – \(\mathbf{h2}(\mathbf{C2})\) matrix in Eq. (6)

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

tens – \(\mathcal{L}(\mathbf{C_1}, \mathbf{C_2}) \otimes \mathbf{T}\): tensor-matrix multiplication result

Return type:

array-like, shape (ns, nt)

References
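
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above and builds constC, hC1, hC2 with ot.gromov.init_matrix as in Eq. (6) of [12]):

    import numpy as np
    import ot
    from ot.gromov import init_matrix, tensor_product

    rng = np.random.RandomState(0)
    C1 = rng.rand(8, 8); C1 = (C1 + C1.T) / 2
    C2 = rng.rand(5, 5); C2 = (C2 + C2.T) / 2
    p, q = ot.unif(8), ot.unif(5)
    T = np.outer(p, q)                 # independent coupling

    constC, hC1, hC2 = init_matrix(C1, C2, p, q, 'square_loss')
    tens = tensor_product(constC, hC1, hC2, T)
    print(tens.shape)                  # (8, 5), i.e. L(C1, C2) tensored with T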

ot.gromov.unbalanced_co_optimal_transport(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_feat=None, reg_marginals=10, epsilon=0, divergence='kl', unbalanced_solver='mm', alpha=0, M_samp=None, M_feat=None, rescale_plan=True, init_pi=None, init_duals=None, max_iter=100, tol=1e-07, max_iter_ot=500, tol_ot=1e-07, log=False, verbose=False, **kwargs_solve)[source]

Compute the unbalanced Co-Optimal Transport between two Euclidean point clouds (represented as matrices whose rows are samples and columns are the features/dimensions).

More precisely, this function returns the sample and feature transport plans between \((\mathbf{X}, \mathbf{w}_{xs}, \mathbf{w}_{xf})\) and \((\mathbf{Y}, \mathbf{w}_{ys}, \mathbf{w}_{yf})\), by solving the following problem using Block Coordinate Descent algorithm:

\[\begin{split}\mathop{\arg \min}_{\mathbf{P}, \mathbf{Q}} &\quad \sum_{i,j,k,l} (\mathbf{X}_{i,k} - \mathbf{Y}_{j,l})^2 \mathbf{P}_{i,j} \mathbf{Q}_{k,l} \\ &+ \rho_s \mathbf{Div}(\mathbf{P}_{\# 1} \mathbf{Q}_{\# 1}^T | \mathbf{w}_{xs} \mathbf{w}_{ys}^T) + \rho_f \mathbf{Div}(\mathbf{P}_{\# 2} \mathbf{Q}_{\# 2}^T | \mathbf{w}_{xf} \mathbf{w}_{yf}^T) \\ &+ \alpha_s \sum_{i,j} \mathbf{P}_{i,j} \mathbf{M^{(s)}}_{i, j} + \alpha_f \sum_{k, l} \mathbf{Q}_{k,l} \mathbf{M^{(f)}}_{k, l} \\ &+ \varepsilon_s \mathbf{Div}(\mathbf{P} | \mathbf{w}_{xs} \mathbf{w}_{ys}^T) + \varepsilon_f \mathbf{Div}(\mathbf{Q} | \mathbf{w}_{xf} \mathbf{w}_{yf}^T)\end{split}\]

Where:

  • \(\mathbf{X}\): Source input (arbitrary-size) matrix

  • \(\mathbf{Y}\): Target input (arbitrary-size) matrix

  • \(\mathbf{M^{(s)}}\): Additional sample matrix

  • \(\mathbf{M^{(f)}}\): Additional feature matrix

  • \(\mathbf{w}_{xs}\): Distribution of the samples in the source space

  • \(\mathbf{w}_{xf}\): Distribution of the features in the source space

  • \(\mathbf{w}_{ys}\): Distribution of the samples in the target space

  • \(\mathbf{w}_{yf}\): Distribution of the features in the target space

  • \(\mathbf{Div}\): Either Kullback-Leibler divergence or half-squared L2 norm.

Note

This function allows epsilon to be zero. In that case, unbalanced_solver must be either “mm” or “lbfgsb”.

Parameters:
  • X ((n_sample_x, n_feature_x) array-like, float) – Source input matrix.

  • Y ((n_sample_y, n_feature_y) array-like, float) – Target input matrix.

  • wx_samp ((n_sample_x, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix X. Uniform distribution by default.

  • wx_feat ((n_feature_x, ) array-like, float, optional (default = None)) – Histogram assigned on columns (features) of matrix X. Uniform distribution by default.

  • wy_samp ((n_sample_y, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix Y. Uniform distribution by default.

  • wy_feat ((n_feature_y, ) array-like, float, optional (default = None)) – Histogram assigned on columns (features) of matrix Y. Uniform distribution by default.

  • reg_marginals (float or indexable object of length 1 or 2) – Marginal relaxation terms for sample and feature couplings. If reg_marginals is a scalar or an indexable object of length 1, then the same value is applied to both marginal relaxations.

  • epsilon (scalar or indexable object of length 2, float or int, optional (default = 0)) – Regularization parameters for entropic approximation of sample and feature couplings. Allow the case where epsilon contains 0. In that case, the MM solver is used by default instead of Sinkhorn solver. If epsilon is scalar, then the same value is applied to both regularization of sample and feature couplings.

  • divergence (string, optional (default = "kl")) –

    • If divergence = “kl”, then Div is the Kullback-Leibler divergence.

    • If divergence = “l2”, then Div is the half squared Euclidean norm.

  • unbalanced_solver (string, optional (default = "sinkhorn")) –

    Solver for the unbalanced OT subroutine.

    • If divergence = “kl”, then unbalanced_solver can be: “sinkhorn”, “sinkhorn_log”, “mm”, “lbfgsb”

    • If divergence = “l2”, then unbalanced_solver can be “mm”, “lbfgsb”

  • alpha (scalar or indexable object of length 2, float or int, optional (default = 0)) – Coefficient parameter of linear terms with respect to the sample and feature couplings. If alpha is scalar, then the same alpha is applied to both linear terms.

  • M_samp ((n_sample_x, n_sample_y), float, optional (default = None)) – Sample matrix associated to the Wasserstein linear term on sample coupling.

  • M_feat ((n_feature_x, n_feature_y), float, optional (default = None)) – Feature matrix associated to the Wasserstein linear term on feature coupling.

  • rescale_plan (boolean, optional (default = True)) – If True, then rescale the sample and feature transport plans within each BCD iteration, so that they always have equal mass.

  • init_pi (tuple of two matrices of size (n_sample_x, n_sample_y) and (n_feature_x, n_feature_y), optional (default = None)) – Initialization of sample and feature couplings. Uniform distributions by default.

  • init_duals (tuple of two tuples ((n_sample_x, ), (n_sample_y, )) and ((n_feature_x, ), (n_feature_y, )), optional (default = None).) – Initialization of sample and feature dual vectors if using Sinkhorn algorithm. Zero vectors by default.

  • max_iter (int, optional (default = 100)) – Number of Block Coordinate Descent (BCD) iterations.

  • tol (float, optional (default = 1e-7)) – Tolerance of BCD scheme. If the L1-norm between the current and previous sample couplings is under this threshold, then stop BCD scheme.

  • max_iter_ot (int, optional (default = 500)) – Number of iterations to solve each of the two unbalanced optimal transport problems in each BCD iteration.

  • tol_ot (float, optional (default = 1e-7)) – Tolerance of unbalanced solver for each of the two unbalanced optimal transport problems in each BCD iteration.

  • log (bool, optional (default = False)) – If True then the cost and four dual vectors, including two from sample and two from feature couplings, are recorded.

  • verbose (bool, optional (default = False)) – If True then print the COOT cost at every multiplier of eval_bcd-th iteration.

Returns:

  • pi_samp ((n_sample_x, n_sample_y) array-like, float) – Sample coupling matrix.

  • pi_feat ((n_feature_x, n_feature_y) array-like, float) – Feature coupling matrix.

  • log (dictionary, optional) –

    Returned if log is True. The keys are:

    • error (array-like, float) – List of L1 norms between the current and previous sample coupling.

    • duals_sample ((n_sample_x, n_sample_y)-tuple, float) – Pair of dual vectors when solving OT problem w.r.t. the sample coupling.

    • duals_feature ((n_feature_x, n_feature_y)-tuple, float) – Pair of dual vectors when solving OT problem w.r.t. the feature coupling.

    • linear (float) – Linear part of the cost.

    • ucoot (float) – Total cost.

References
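
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above; epsilon=0 with the “mm” solver, as the note permits):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    X = rng.rand(10, 4)                # 10 samples, 4 features
    Y = rng.rand(8, 5)                 # 8 samples, 5 features

    pi_samp, pi_feat = ot.gromov.unbalanced_co_optimal_transport(
        X, Y, reg_marginals=10, epsilon=0, divergence='kl',
        unbalanced_solver='mm', max_iter=20, max_iter_ot=50)
    print(pi_samp.shape, pi_feat.shape)   # (10, 8) (4, 5)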

ot.gromov.unbalanced_co_optimal_transport2(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_feat=None, reg_marginals=10, epsilon=0, divergence='kl', unbalanced_solver='sinkhorn', alpha=0, M_samp=None, M_feat=None, rescale_plan=True, init_pi=None, init_duals=None, max_iter=100, tol=1e-07, max_iter_ot=500, tol_ot=1e-07, log=False, verbose=False, **kwargs_solve)[source]

Compute the unbalanced Co-Optimal Transport between two Euclidean point clouds (represented as matrices whose rows are samples and columns are the features/dimensions).

More precisely, this function returns the unbalanced Co-Optimal Transport cost between \((\mathbf{X}, \mathbf{w}_{xs}, \mathbf{w}_{xf})\) and \((\mathbf{Y}, \mathbf{w}_{ys}, \mathbf{w}_{yf})\), by solving the following problem using Block Coordinate Descent algorithm:

\[\begin{split}\mathop{\min}_{\mathbf{P}, \mathbf{Q}} &\quad \sum_{i,j,k,l} (\mathbf{X}_{i,k} - \mathbf{Y}_{j,l})^2 \mathbf{P}_{i,j} \mathbf{Q}_{k,l} \\ &+ \rho_s \mathbf{Div}(\mathbf{P}_{\# 1} \mathbf{Q}_{\# 1}^T | \mathbf{w}_{xs} \mathbf{w}_{ys}^T) + \rho_f \mathbf{Div}(\mathbf{P}_{\# 2} \mathbf{Q}_{\# 2}^T | \mathbf{w}_{xf} \mathbf{w}_{yf}^T) \\ &+ \alpha_s \sum_{i,j} \mathbf{P}_{i,j} \mathbf{M^{(s)}}_{i, j} + \alpha_f \sum_{k, l} \mathbf{Q}_{k,l} \mathbf{M^{(f)}}_{k, l} \\ &+ \varepsilon_s \mathbf{Div}(\mathbf{P} | \mathbf{w}_{xs} \mathbf{w}_{ys}^T) + \varepsilon_f \mathbf{Div}(\mathbf{Q} | \mathbf{w}_{xf} \mathbf{w}_{yf}^T)\end{split}\]

Where:

  • \(\mathbf{X}\): Source input (arbitrary-size) matrix

  • \(\mathbf{Y}\): Target input (arbitrary-size) matrix

  • \(\mathbf{M^{(s)}}\): Additional sample matrix

  • \(\mathbf{M^{(f)}}\): Additional feature matrix

  • \(\mathbf{w}_{xs}\): Distribution of the samples in the source space

  • \(\mathbf{w}_{xf}\): Distribution of the features in the source space

  • \(\mathbf{w}_{ys}\): Distribution of the samples in the target space

  • \(\mathbf{w}_{yf}\): Distribution of the features in the target space

  • \(\mathbf{Div}\): Either Kullback-Leibler divergence or half-squared L2 norm.

Note

This function allows epsilon to be zero. In that case, unbalanced_solver must be either “mm” or “lbfgsb”. Also, the computation of gradients is only supported for the KL divergence; the case of the half squared-L2 norm uses the gradients of the KL divergence.

Parameters:
  • X ((n_sample_x, n_feature_x) array-like, float) – Source input matrix.

  • Y ((n_sample_y, n_feature_y) array-like, float) – Target input matrix.

  • wx_samp ((n_sample_x, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix X. Uniform distribution by default.

  • wx_feat ((n_feature_x, ) array-like, float, optional (default = None)) – Histogram assigned on columns (features) of matrix X. Uniform distribution by default.

  • wy_samp ((n_sample_y, ) array-like, float, optional (default = None)) – Histogram assigned on rows (samples) of matrix Y. Uniform distribution by default.

  • wy_feat ((n_feature_y, ) array-like, float, optional (default = None)) – Histogram assigned on columns (features) of matrix Y. Uniform distribution by default.

  • reg_marginals (float or indexable object of length 1 or 2) – Marginal relaxation terms for sample and feature couplings. If reg_marginals is a scalar or an indexable object of length 1, then the same value is applied to both marginal relaxations.

  • epsilon (scalar or indexable object of length 2, float or int, optional (default = 0)) – Regularization parameters for entropic approximation of sample and feature couplings. Allow the case where epsilon contains 0. In that case, the MM solver is used by default instead of Sinkhorn solver. If epsilon is scalar, then the same value is applied to both regularization of sample and feature couplings.

  • divergence (string, optional (default = "kl")) –

    • If divergence = “kl”, then Div is the Kullback-Leibler divergence.

    • If divergence = “l2”, then Div is the half squared Euclidean norm.

  • unbalanced_solver (string, optional (default = "sinkhorn")) –

    Solver for the unbalanced OT subroutine.

    • If divergence = “kl”, then unbalanced_solver can be: “sinkhorn”, “sinkhorn_log”, “mm”, “lbfgsb”

    • If divergence = “l2”, then unbalanced_solver can be “mm”, “lbfgsb”

  • alpha (scalar or indexable object of length 2, float or int, optional (default = 0)) – Coefficient parameter of linear terms with respect to the sample and feature couplings. If alpha is scalar, then the same alpha is applied to both linear terms.

  • M_samp ((n_sample_x, n_sample_y), float, optional (default = None)) – Sample matrix associated to the Wasserstein linear term on sample coupling.

  • M_feat ((n_feature_x, n_feature_y), float, optional (default = None)) – Feature matrix associated to the Wasserstein linear term on feature coupling.

  • rescale_plan (boolean, optional (default = True)) – If True, then rescale the transport plans in each BCD iteration, so that they always have equal mass.

  • init_pi (tuple of two matrices of size (n_sample_x, n_sample_y) and (n_feature_x, n_feature_y), optional (default = None)) – Initialization of sample and feature couplings. Uniform distributions by default.

  • init_duals (tuple of two tuples ((n_sample_x, ), (n_sample_y, )) and ((n_feature_x, ), (n_feature_y, )), optional (default = None).) – Initialization of sample and feature dual vectors if using Sinkhorn algorithm. Zero vectors by default.

  • max_iter (int, optional (default = 100)) – Number of Block Coordinate Descent (BCD) iterations.

  • tol (float, optional (default = 1e-7)) – Tolerance of BCD scheme. If the L1-norm between the current and previous sample couplings is under this threshold, then stop BCD scheme.

  • max_iter_ot (int, optional (default = 500)) – Number of iterations to solve each of the two unbalanced optimal transport problems in each BCD iteration.

  • tol_ot (float, optional (default = 1e-7)) – Tolerance of unbalanced solver for each of the two unbalanced optimal transport problems in each BCD iteration.

  • log (bool, optional (default = False)) – If True then the cost and four dual vectors, including two from sample and two from feature couplings, are recorded.

  • verbose (bool, optional (default = False)) – If True then print the COOT cost at every multiplier of eval_bcd-th iteration.

Returns:

  • ucoot (float) – UCOOT cost.

  • log (dictionary, optional) –

    Returned if log is True. The keys are:

    • error (array-like, float) – List of L1 norms between the current and previous sample coupling.

    • duals_sample ((n_sample_x, n_sample_y)-tuple, float) – Pair of dual vectors when solving OT problem w.r.t. the sample coupling.

    • duals_feature ((n_feature_x, n_feature_y)-tuple, float) – Pair of dual vectors when solving OT problem w.r.t. the feature coupling.

    • linear (float) – Linear part of UCOOT cost.

    • ucoot (float) – UCOOT cost.

    • backend – The proper backend for all input arrays.

References
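
Example

A minimal, illustrative sketch of the scalar-cost variant (not part of the original documentation; same toy inputs as the sketch for the plan-returning function above; the default “sinkhorn” solver is replaced by “mm” since epsilon=0):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    X, Y = rng.rand(10, 4), rng.rand(8, 5)

    ucoot = ot.gromov.unbalanced_co_optimal_transport2(
        X, Y, reg_marginals=10, epsilon=0, unbalanced_solver='mm',
        max_iter=20, max_iter_ot=50)
    print(float(ucoot))                # scalar UCOOT cost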

ot.gromov.uot_cost_matrix(data, pi, tuple_p, hyperparams, divergence, reg_type, nx=None)[source]

The Block Coordinate Descent algorithm for FUGW and UCOOT requires solving a UOT problem in each iteration. In particular, we need to specify the following inputs:

  • Cost matrix

  • Hyperparameters (marginal-relaxations and regularization)

  • Reference measures in the marginal-relaxation and regularization terms

This method returns the cost matrix. The method ot.gromov.uot_parameters_and_measures returns the rest of the inputs.

Parameters:
  • data (tuple of arrays) – vector or matrix

  • pi (array-like) – vector or matrix

  • tuple_p (tuple of arrays) – Tuple of reference measures in the marginal-relaxation terms w.r.t. the (either sample or feature) coupling

  • hyperparams (tuple of floats) – Hyperparameters of marginal-relaxation and regularization terms in the fused unbalanced across-domain divergence

  • divergence (string, default = "kl") – Bregman divergence, either “kl” (Kullback-Leibler divergence) or “l2” (half-squared L2 divergence)

  • reg_type (string,) –

    Type of regularization term in the fused unbalanced across-domain divergence

    • reg_type = “joint” corresponds to FUGW

    • reg_type = “independent” corresponds to UCOOT

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Return type:

Cost matrix of the UOT subroutine for UCOOT and FUGW

ot.gromov.uot_parameters_and_measures(pi, tuple_weights, hyperparams, reg_type, divergence, nx)[source]

The Block Coordinate Descent algorithm for FUGW and UCOOT requires solving a UOT problem in each iteration. In particular, we need to specify the following inputs:

  • Cost matrix

  • Hyperparameters (marginal-relaxations and regularization)

  • Reference measures in the marginal-relaxation and regularization terms

The method ot.gromov.uot_cost_matrix returns the cost matrix. This method returns the rest of the inputs.

Parameters:
  • pi (array-like) – vector or matrix

  • tuple_weights (tuple of arrays) – Tuple of reference measures in the marginal-relaxation and regularization terms w.r.t. the (either sample or feature) coupling

  • hyperparams (tuple of floats) – Hyperparameters of marginal-relaxation and regularization terms in the fused unbalanced across-domain divergence

  • reg_type (string,) –

    Type of regularization term in the fused unbalanced across-domain divergence

    • reg_type = “joint” corresponds to FUGW

    • reg_type = “independent” corresponds to UCOOT

  • divergence (string, default = "kl") – Bregman divergence, either “kl” (Kullback-Leibler divergence) or “l2” (half-squared L2 divergence)

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Return type:

Tuple of hyperparameters and distributions (weights)

ot.gromov.update_barycenter_feature(Ts, Ys, lambdas, p=None, loss_fun='square_loss', target=True, check_zeros=True, nx=None)[source]

Updates the feature with respect to the S \(\mathbf{T}_s\) couplings calculated at each iteration of variants of the FGW barycenter problem with inner Wasserstein loss loss_fun (e.g. FGW [24], srFGW [48]). If target=True the barycenter is considered as the target, else as the source.

Parameters:
  • Ts (list of S array-like of shape (ns, N) if target=True else (N, ns).) – The S \(\mathbf{T}_s\) couplings calculated at each iteration.

  • Ys (list of S array-like, shape (ns, d)) – Feature matrices.

  • lambdas (list of float) – List of the S spaces’ weights

  • p (array-like, shape (N,) or (S,N)) – Masses or list of masses in the targeted barycenter.

  • loss_fun (str, optional. Default is 'square_loss') – Name of loss function to use in [‘square_loss’].

  • target (bool, optional. Default is True.) – Whether the barycenter is positioned as target (True) or source (False).

  • check_zeros (bool, optional. Default is True.) – Whether to check if marginals on the barycenter contains zeros or not. Can be set to False to gain time if marginals are known to be positive.

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

X

Return type:

array-like, shape (N, d)

References

ot.gromov.update_barycenter_structure(Ts, Cs, lambdas, p=None, loss_fun='square_loss', target=True, check_zeros=True, nx=None)[source]

Updates \(\mathbf{C}\) according to the inner loss L with the S \(\mathbf{T}_s\) couplings calculated at each iteration of variants of the GW barycenter problem (e.g GW [12], srGW [48]). If target=True it solves for:

\[\mathbf{C}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \sum_{i,j,k,l} L(\mathbf{C}^{(s)}_{i,k}, \mathbf{C}_{j,l}) \mathbf{T}^{(s)}_{i,j} \mathbf{T}^{(s)}_{k,l}\]

Else it solves the symmetric problem:

\[\mathbf{C}^* = \mathop{\arg \min}_{\mathbf{C}\in \mathbb{R}^{N \times N}} \quad \sum_s \lambda_s \sum_{i,j,k,l} L(\mathbf{C}_{j,l}, \mathbf{C}^{(s)}_{i,k}) \mathbf{T}^{(s)}_{i,j} \mathbf{T}^{(s)}_{k,l}\]

Where :

  • \(\mathbf{C}^{(s)}\): pairwise matrix in the \(s^{th}\) source space.

  • \(\mathbf{C}\): pairwise matrix in the target space.

  • \(L\): inner divergence for the GW loss

Parameters:
  • Ts (list of S array-like of shape (ns, N) if target=True else (N, ns).) – The S \(\mathbf{T}_s\) couplings calculated at each iteration.

  • Cs (list of S array-like, shape(ns, ns)) – Metric cost matrices.

  • lambdas (list of float,) – List of the S spaces’ weights.

  • p (array-like, shape (N,) or (S,N)) – Masses or list of masses in the targeted barycenter.

  • loss_fun (str, optional. Default is 'square_loss') – Name of loss function to use in [‘square_loss’, ‘kl_loss’].

  • target (bool, optional. Default is True.) – Whether the barycenter is positioned as target (True) or source (False).

  • check_zeros (bool, optional. Default is True.) – Whether to check if marginals on the barycenter contains zeros or not. Can be set to False to gain time if marginals are known to be positive.

  • nx (backend, optional) – If let to its default value None, a backend test will be conducted.

Returns:

C – Updated \(\mathbf{C}\) matrix.

Return type:

array-like, shape (N, N)

References
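
Example

A minimal, illustrative usage sketch (not part of the original documentation; it assumes the signature above; the couplings here are simple independent plans chosen for illustration only, whereas in a barycenter loop they would come from (sr)GW solvers):

    import numpy as np
    import ot

    rng = np.random.RandomState(0)
    N = 4
    Cs, Ts = [], []
    for n in (6, 8):
        C = rng.rand(n, n)
        Cs.append((C + C.T) / 2)                     # symmetric input structures
        Ts.append(np.outer(ot.unif(n), ot.unif(N)))  # (ns, N) plans, target=True

    C_new = ot.gromov.update_barycenter_structure(
        Ts, Cs, lambdas=[0.5, 0.5], loss_fun='square_loss', target=True)
    print(C_new.shape)                               # (N, N)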