rnanorm.TMM

class rnanorm.TMM(m_trim=0.3, a_trim=0.05)

Trimmed mean of M-values (TMM) normalization.

In an RNA-seq experiment, a small fraction of genes is sometimes extremely overexpressed in some samples but not in others. This can artificially inflate the library size of those samples and therefore (after library size normalization) cause the remaining genes to appear under-sampled in them. Unless this effect is adjusted for, the remaining genes may falsely appear to be down-regulated in those samples. TMM is one approach to correcting for this imbalance. For a more detailed explanation, see the edgeR documentation.

The normalization procedure is described in Robinson & Oshlack (2010). In short:

  • Use raw counts

  • Define the reference sample (self.ref_)

  • Compute scaling factors
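    • Compute M values (log2 ratios of a sample's gene proportions to the reference's) and trim both tails with m_trim

    • Compute A values (mean log2 gene proportions of the sample and the reference) and trim both tails with a_trim

    • Compute log2 factors as the weighted mean of the M values of the genes that survive both trims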

    • Factors = 2 ** factors

    • Rescale factors so that their geometric mean is 1

  • “Adjusted library size” = library size * normalization factors

  • Compute CPM normalization with “Adjusted library size”
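
The steps above can be sketched in plain Python. This is an illustrative sketch only, not rnanorm's code: the names average_ranks, pairwise_factor, tmm_factors and ref_index are hypothetical, the reference sample is passed in by index rather than selected automatically, and the rank-based trimming and inverse-variance weights follow the edgeR procedure as we understand it.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pairwise_factor(obs, ref, m_trim=0.3, a_trim=0.05):
    """Unscaled factor of one sample (obs) against the reference (ref)."""
    n_obs, n_ref = sum(obs), sum(ref)
    m_vals, a_vals, variances = [], [], []
    for o, r in zip(obs, ref):
        if o == 0 or r == 0:
            continue  # M and A are undefined for zero counts
        p_obs, p_ref = o / n_obs, r / n_ref
        m_vals.append(math.log2(p_obs / p_ref))
        a_vals.append((math.log2(p_obs) + math.log2(p_ref)) / 2)
        # Assumed weighting: inverse of this approximate variance of M.
        variances.append((n_obs - o) / n_obs / o + (n_ref - r) / n_ref / r)
    if not m_vals or max(abs(m) for m in m_vals) < 1e-6:
        return 1.0  # nothing to compare, or same composition as the reference

    # Trim both tails by rank: m_trim for M values, a_trim for A values.
    n = len(m_vals)
    lo_m = math.floor(n * m_trim) + 1
    lo_a = math.floor(n * a_trim) + 1
    hi_m, hi_a = n + 1 - lo_m, n + 1 - lo_a
    m_rank, a_rank = average_ranks(m_vals), average_ranks(a_vals)
    keep = [i for i in range(n)
            if lo_m <= m_rank[i] <= hi_m and lo_a <= a_rank[i] <= hi_a]
    if not keep:
        return 1.0

    # Weighted mean of the surviving M values, then back from log2 scale.
    weight_sum = sum(1 / variances[i] for i in keep)
    log2_factor = sum(m_vals[i] / variances[i] for i in keep) / weight_sum
    return 2 ** log2_factor


def tmm_factors(counts, ref_index=0, m_trim=0.3, a_trim=0.05):
    """Factors for every sample, rescaled so their geometric mean is 1."""
    ref = counts[ref_index]
    raw = [pairwise_factor(sample, ref, m_trim, a_trim) for sample in counts]
    log_mean = sum(math.log(f) for f in raw) / len(raw)
    return [f / math.exp(log_mean) for f in raw]


# Rows are samples, columns are genes (raw counts).
counts = [
    [200, 300, 500, 2000, 7000],
    [400, 600, 1000, 4000, 14000],
    [200, 300, 500, 2000, 17000],
    [200, 300, 500, 2000, 2000],
]
factors = tmm_factors(counts)
adjusted_sizes = [sum(s) * f for s, f in zip(counts, factors)]
cpm = [[1e6 * c / size for c in s] for s, size in zip(counts, adjusted_sizes)]
```

Trimming by rank rather than by value means the single outlying gene in the third and fourth rows is dropped before the weighted mean is taken, so the factor is driven by the genes whose proportions shift together.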

This implementation is based on edgeR’s and has been validated to match it to at least 10 decimal places.

Parameters:
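  • m_trim (float) – Fraction of genes to trim from each tail of the M-values. Only genes whose M-values lie between the m_trim and 1 - m_trim quantiles are kept.

  • a_trim (float) – Fraction of genes to trim from each tail of the A-values. Only genes whose A-values lie between the a_trim and 1 - a_trim quantiles are kept.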

Examples

>>> from rnanorm.datasets import load_toy_data
>>> from rnanorm import TMM
>>> X = load_toy_data().exp
>>> X
          Gene_1  Gene_2  Gene_3  Gene_4  Gene_5
Sample_1     200     300     500    2000    7000
Sample_2     400     600    1000    4000   14000
Sample_3     200     300     500    2000   17000
Sample_4     200     300     500    2000    2000
>>> TMM().set_output(transform="pandas").fit_transform(X)
           Gene_1   Gene_2   Gene_3    Gene_4     Gene_5
Sample_1  20000.0  30000.0  50000.0  200000.0   700000.0
Sample_2  20000.0  30000.0  50000.0  200000.0   700000.0
Sample_3  20000.0  30000.0  50000.0  200000.0  1700000.0
Sample_4  20000.0  30000.0  50000.0  200000.0   200000.0
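
The output above can be read backwards to recover the factors that were applied. The sketch below uses only the numbers shown in the example. The variable names are illustrative, and the arithmetic assumes the relation stated earlier, CPM = 1e6 * count / (library size * factor).

```python
import math

# Raw counts from the example input (rows: Sample_1..Sample_4).
counts = [
    [200, 300, 500, 2000, 7000],
    [400, 600, 1000, 4000, 14000],
    [200, 300, 500, 2000, 17000],
    [200, 300, 500, 2000, 2000],
]
# Gene_1 column of the example output.
gene_1_cpm = [20000.0, 20000.0, 20000.0, 20000.0]

library_sizes = [sum(row) for row in counts]

# Invert CPM = 1e6 * count / adjusted_size to get the adjusted library size.
adjusted_sizes = [1e6 * row[0] / value for row, value in zip(counts, gene_1_cpm)]

# Adjusted library size = library size * factor, so divide to get the factor.
implied_factors = [a / n for a, n in zip(adjusted_sizes, library_sizes)]

# The factors are expected to have a geometric mean of 1.
geometric_mean = math.exp(
    sum(math.log(f) for f in implied_factors) / len(implied_factors)
)

print(library_sizes)
print(implied_factors)
print(geometric_mean)
```

Sample_3 and Sample_4 differ from Sample_1 only in Gene_5, so their implied factors (0.5 and 2) undo the effect that gene alone has on library size, which is why Gene_1 to Gene_4 end up with identical values in all four rows.
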

__init__(m_trim=0.3, a_trim=0.05)

Initialize class.

Methods

__init__([m_trim, a_trim])

Initialize class.

fit(X[, y])

Fit.

fit_transform(X[, y])

Fit to data, then transform it.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_metadata_routing()

Get metadata routing of this object.

get_norm_factors(X)

Get TMM normalization factors (normalized with geometric mean).

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform.