rnanorm.UQ
- class rnanorm.UQ
Upper quartile (UQ) normalization.
In an RNA-seq experiment, a small fraction of genes is sometimes extremely overexpressed in some samples but not in others. This can artificially inflate the library size of those samples and therefore (after library size normalization) cause the remaining genes to be considered under-sampled in them. Unless this effect is adjusted for, the remaining genes may falsely appear to be down-regulated in those samples. Upper quartile normalization is one of the approaches that correct for such imbalance. For more background on the topic, see the edgeR documentation.
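The effect can be seen with a little arithmetic. The sketch below applies plain library-size (CPM) normalization to two hypothetical samples that differ only in one highly expressed gene; the helper name `cpm` and the counts are illustrative and not part of rnanorm.

```python
def cpm(sample):
    """Plain counts-per-million: scale each count by the sample's library size."""
    lib_size = sum(sample)
    return [1e6 * count / lib_size for count in sample]


# Two samples with identical counts for the first four genes.
# Only the last gene differs (7000 vs. 17000).
sample_1 = [200, 300, 500, 2000, 7000]   # library size 10000
sample_3 = [200, 300, 500, 2000, 17000]  # library size 20000

cpm_1 = cpm(sample_1)
cpm_3 = cpm(sample_3)

# The first gene has the same raw count in both samples, yet its CPM
# is halved in sample_3 because the last gene inflated the library size.
print(cpm_1[0], cpm_3[0])  # 20000.0 10000.0
```

After plain CPM, the four unchanged genes look two-fold lower in the second sample, which is the artifact that upper quartile normalization is meant to remove.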
Procedure for normalization is described in Bullard et al. 2010, but in short:
- Use raw counts as input
- Compute scaling factors
  - Remove genes that have zero count in all samples
  - Scaling factor is expression at the 75th percentile
  - Rescale factors so that their geometric mean is 1
- “Adjusted library size” = library size * factor
- Return CPM normalization with “Adjusted library size”
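The steps above can be sketched in plain Python. This is a minimal illustration, not rnanorm's or edgeR's actual code. It assumes that the 75th percentile is taken over counts divided by library size, that quantiles use linear interpolation, and that every sample has a nonzero upper quartile. The names `quantile` and `uq_sketch` are hypothetical.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def uq_sketch(counts):
    """counts: list of samples, each a list of raw gene counts (same gene order)."""
    n_genes = len(counts[0])
    # Remove genes that have zero count in all samples.
    keep = [g for g in range(n_genes) if any(s[g] > 0 for s in counts)]
    lib_sizes = [sum(s) for s in counts]
    # Scaling factor: 75th percentile of library-size-scaled counts (assumption).
    factors = [
        quantile([s[g] / lib for g in keep], 0.75)
        for s, lib in zip(counts, lib_sizes)
    ]
    # Rescale factors so that their geometric mean is 1
    # (assumes all factors are positive, otherwise log fails).
    geo_mean = math.exp(sum(math.log(f) for f in factors) / len(factors))
    factors = [f / geo_mean for f in factors]
    # Adjusted library size = library size * factor, then CPM.
    adjusted = [lib * f for lib, f in zip(lib_sizes, factors)]
    cpm_values = [[1e6 * c / adj for c in s] for s, adj in zip(counts, adjusted)]
    return factors, cpm_values


# Counts copied from the toy data shown in the Examples section.
X = [
    [200, 300, 500, 2000, 7000],
    [400, 600, 1000, 4000, 14000],
    [200, 300, 500, 2000, 17000],
    [200, 300, 500, 2000, 2000],
]
factors, cpm_values = uq_sketch(X)
print([round(f, 6) for f in factors])  # [1.0, 1.0, 0.5, 2.0]
```

Under these assumptions the first four genes come out equal across all samples (up to floating-point rounding), matching the output in the Examples section.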
This implementation is based on edgeR and has been validated to be identical to it to at least 10 decimal places.
Examples
>>> from rnanorm.datasets import load_toy_data
>>> from rnanorm import UQ
>>> X = load_toy_data().exp
>>> X
          Gene_1  Gene_2  Gene_3  Gene_4  Gene_5
Sample_1     200     300     500    2000    7000
Sample_2     400     600    1000    4000   14000
Sample_3     200     300     500    2000   17000
Sample_4     200     300     500    2000    2000
>>> UQ().set_output(transform="pandas").fit_transform(X)
           Gene_1   Gene_2   Gene_3    Gene_4     Gene_5
Sample_1  20000.0  30000.0  50000.0  200000.0   700000.0
Sample_2  20000.0  30000.0  50000.0  200000.0   700000.0
Sample_3  20000.0  30000.0  50000.0  200000.0  1700000.0
Sample_4  20000.0  30000.0  50000.0  200000.0   200000.0
- __init__()
Methods
__init__()
fit(X[, y])                              Fit.
fit_transform(X[, y])                    Fit to data, then transform it.
get_feature_names_out([input_features])  Get output feature names for transformation.
get_metadata_routing()                   Get metadata routing of this object.
get_norm_factors(X)                      Get UQ normalization factors (normalized with geometric mean).
get_params([deep])                       Get parameters for this estimator.
set_output(*[, transform])               Set output container.
set_params(**params)                     Set the parameters of this estimator.
transform(X)                             Transform.