# ot.datasets

Simple example datasets

## Functions

ot.datasets.make_1D_gauss(n, m, s)

Return a 1D histogram for a Gaussian distribution (n bins, mean m, and standard deviation s).

Parameters:
• n (int) – number of bins in the histogram

• m (float) – mean value of the Gaussian distribution

• s (float) – standard deviation of the Gaussian distribution

Returns:

h – 1D histogram for a Gaussian distribution

Return type:

ndarray (n,)

### Examples using ot.datasets.make_1D_gauss

Optimal Transport for 1D distributions

Smooth and sparse OT example

Regularized OT with generic solver

OT distances in 1D

Optimal Transport solvers comparison

Wasserstein 1D (flow and barycenter) with PyTorch

1D Wasserstein barycenter demo

Debiased Sinkhorn barycenter demo

1D Wasserstein barycenter: exact LP vs entropic regularization

Screened optimal transport (Screenkhorn)

Low rank Sinkhorn

Computing d-dimensional Barycenters via d-MMOT

1D Unbalanced optimal transport

1D Wasserstein barycenter demo for Unbalanced distributions

ot.datasets.make_2D_samples_gauss(n, m, sigma, random_state=None)

Return n samples drawn from a 2D Gaussian distribution $$\mathcal{N}(m, \sigma)$$.

Parameters:
• n (int) – number of samples to make

• m (ndarray, shape (2,)) – mean value of the Gaussian distribution

• sigma (ndarray, shape (2, 2)) – covariance matrix of the Gaussian distribution

• random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns:

X – n samples drawn from $$\mathcal{N}(m, \sigma)$$.

Return type:

ndarray, shape (n, 2)

### Examples using ot.datasets.make_2D_samples_gauss

Optimal Transport between 2D empirical distributions

Gromov-Wasserstein example

Weak Optimal Transport VS exact Optimal Transport

Sliced Wasserstein Distance on 2D distributions

2D examples of exact and entropic unbalanced optimal transport

Partial Wasserstein and Gromov-Wasserstein example

Regularization path of l2-penalized unbalanced optimal transport

ot.datasets.make_data_classif(dataset, n, nz=0.5, theta=0, p=0.5, random_state=None, **kwargs)

Dataset generation for classification problems

Parameters:
• dataset (str) – type of classification problem (see code)

• n (int) – number of training samples

• nz (float) – noise level (>0)

• p (float) – proportion of one class in the binary setting

• random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns:

• X (ndarray, shape (n, d)) – n observations of dimension d

• y (ndarray, shape (n,)) – labels of the samples.

### Examples using ot.datasets.make_data_classif

Dual OT solvers for entropic and quadratic regularized OT with Pytorch

OT with Laplacian regularization for domain adaptation

OT mapping estimation for domain adaptation

OTDA unsupervised vs semi-supervised setting

OT for domain adaptation on empirical distributions

OT for multi-source target shift