rnanorm.UQ

class rnanorm.UQ

Upper quartile (UQ) normalization.

In an RNA-seq experiment, a small fraction of genes is sometimes extremely overexpressed in some samples but not in others. This can artificially inflate the library size of those samples, so that after library size normalization the remaining genes appear under-sampled there. Unless this effect is adjusted for, the remaining genes may falsely appear to be down-regulated in those samples. Upper quartile normalization is one approach to correcting this imbalance. For more background, see the edgeR documentation.
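
The effect can be illustrated with plain library-size (CPM) normalization. The sketch below is not part of rnanorm; it uses Sample_1 and Sample_3 from the toy data in the Examples section, which have identical counts except for Gene_5, and computes CPM by hand with NumPy.

```python
import numpy as np

# Sample_1 and Sample_3 from the toy data: identical except for Gene_5.
counts = np.array(
    [
        [200, 300, 500, 2000, 7000],   # Sample_1
        [200, 300, 500, 2000, 17000],  # Sample_3
    ],
    dtype=float,
)

# Plain CPM: divide by the unadjusted library size of each sample.
library_size = counts.sum(axis=1)
cpm = counts / library_size[:, None] * 1e6

# Gene_1 has the same raw count (200) in both samples, yet its CPM is
# 20000 in Sample_1 and 10000 in Sample_3, because Gene_5 doubles the
# library size of Sample_3.
print(cpm[:, 0])
```

Without further adjustment, Gene_1 to Gene_4 would look two-fold down-regulated in Sample_3 although their raw counts are unchanged.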

The normalization procedure is described in Bullard et al. 2010; in short:

  • Use raw counts as input

  • Compute scaling factors
    • Remove genes that have zero count in all samples

    • Scaling factor of a sample is the 75th percentile of its counts divided by its library size

    • Rescale factors so that their geometric mean is 1

  • “Adjusted library size” = library size * factor

  • Return CPM normalization with “Adjusted library size”
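
The steps above can be sketched in a few lines of NumPy. This is an illustration, not the rnanorm implementation: the function name `uq_sketch` is hypothetical, samples are assumed to be rows and genes columns (as in the toy data), and the per-sample factor is assumed to be the 75th percentile of counts divided by library size, as edgeR computes it.

```python
import numpy as np


def uq_sketch(counts):
    """Hypothetical helper: upper quartile normalization, samples in rows."""
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1)

    # Remove genes that have zero count in all samples.
    expressed = counts[:, counts.sum(axis=0) > 0]

    # Assumed definition of the factor: 75th percentile of each sample's
    # counts after dividing by that sample's library size.
    upper_quartile = np.percentile(expressed / library_size[:, None], 75, axis=1)

    # Rescale factors so that their geometric mean is 1.
    # (A sample whose upper quartile is 0 would break the log here.)
    factors = upper_quartile / np.exp(np.mean(np.log(upper_quartile)))

    # CPM computed with the adjusted library size.
    adjusted_library_size = library_size * factors
    cpm = counts / adjusted_library_size[:, None] * 1e6
    return cpm, factors


# Toy data from the Examples section (Sample_1 .. Sample_4).
toy = [
    [200, 300, 500, 2000, 7000],
    [400, 600, 1000, 4000, 14000],
    [200, 300, 500, 2000, 17000],
    [200, 300, 500, 2000, 2000],
]
cpm, factors = uq_sketch(toy)
print(factors)  # factors of 1, 1, 0.5 and 2 for the four samples
```

On the toy data this yields the same values as the `UQ` output shown in the Examples section: Gene_1 to Gene_4 are identical across all four samples, and only Gene_5 differs.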

This implementation is based on edgeR and has been validated to match edgeR's output to at least 10 decimal places.

Examples

>>> from rnanorm.datasets import load_toy_data
>>> from rnanorm import UQ
>>> X = load_toy_data().exp
>>> X
          Gene_1  Gene_2  Gene_3  Gene_4  Gene_5
Sample_1     200     300     500    2000    7000
Sample_2     400     600    1000    4000   14000
Sample_3     200     300     500    2000   17000
Sample_4     200     300     500    2000    2000
>>> UQ().set_output(transform="pandas").fit_transform(X)
           Gene_1   Gene_2   Gene_3    Gene_4     Gene_5
Sample_1  20000.0  30000.0  50000.0  200000.0   700000.0
Sample_2  20000.0  30000.0  50000.0  200000.0   700000.0
Sample_3  20000.0  30000.0  50000.0  200000.0  1700000.0
Sample_4  20000.0  30000.0  50000.0  200000.0   200000.0

Methods

__init__()

fit(X[, y])

Fit.

fit_transform(X[, y])

Fit to data, then transform it.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_metadata_routing()

Get metadata routing of this object.

get_norm_factors(X)

Get UQ normalization factors (normalized with geometric mean).

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform.