Matching modalities

Alignment of cellular profiles from two different modalities

Description

Cellular function is regulated by the complex interplay of different types of biological molecules (DNA, RNA, proteins, etc.), which determine the state of a cell. Several recently described technologies allow for simultaneous measurement of different aspects of cellular state. For example, sci-CAR jointly profiles RNA expression and chromatin accessibility in the same cell, and CITE-seq measures surface protein abundance and RNA expression from each cell. These technologies enable us to better understand cellular function; however, such datasets are still rare, and profiling multiple modalities at once comes with tradeoffs.

Joint methods can be more expensive, lower in throughput, or noisier than measuring a single modality at a time. It is therefore useful to develop methods capable of integrating measurements of the same biological system obtained using different technologies on different cells.

Here the goal is to learn a latent space where cells profiled by different technologies in different modalities are matched if they have the same state. We use jointly profiled data as ground truth, so that we can evaluate whether observations of the same cell acquired using different modalities end up close together. A perfect result has each pair of matched observations sharing the same coordinates in the latent space.

Summary

Figure 1: Overview of the results per method. This figure shows the mean of the scaled scores (group Overall), the mean scores per dataset (group Dataset) and the mean scores per metric (group Metric).

Metrics

kNN Area Under the Curve¹: Let $f(i) \in F$ be the scRNA-seq measurement of cell $i$, and $g(i) \in G$ be the scATAC-seq measurement of cell $i$. kNN-AUC calculates the average percentage overlap of neighborhoods of $f(i)$ in $F$ with neighborhoods of $g(i)$ in $G$. Higher is better.
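The neighborhood overlap described above can be illustrated with a small sketch. This is not the benchmark's implementation: the function names, the brute-force neighbor search, and the choice to average the overlap over a list of neighborhood sizes `ks` (as a stand-in for the area under the curve) are all assumptions made for illustration.

```python
import math

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return {j for _, j in dists[:k]}

def neighborhood_overlap(F, G, k):
    """Mean fraction of k-nearest neighbours that cell i shares between F and G."""
    n = len(F)
    shared = sum(len(knn_indices(F, i, k) & knn_indices(G, i, k)) / k for i in range(n))
    return shared / n

def knn_auc_sketch(F, G, ks):
    """Average the overlap over several neighbourhood sizes (assumed AUC stand-in)."""
    return sum(neighborhood_overlap(F, G, k) for k in ks) / len(ks)

# Toy data: four cells on a line; in G, cells 1 and 3 have swapped positions.
F = [(0.0,), (1.0,), (3.0,), (7.0,)]
G = [(0.0,), (7.0,), (3.0,), (1.0,)]
same = knn_auc_sketch(F, F, ks=[1, 2, 3])
mixed = knn_auc_sketch(F, G, ks=[1, 3])
```

Identical embeddings give an overlap of 1 for every `k`, which is why higher values indicate better agreement between the two modalities.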

Mean squared error²: Mean squared error (MSE) is the average squared distance between each pair of matched observations of the same cell in the learned latent space. Lower is better.
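As a minimal sketch of this metric, assuming the two modalities are given as equally long lists of latent coordinates with row i of each belonging to the same cell: the name `latent_mse` and the choice to normalise by the number of cells only (not by the number of dimensions) are assumptions, and the benchmark may normalise or rescale differently.

```python
def latent_mse(X, Y):
    """Mean over cells of the squared Euclidean distance between matched rows."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same cells in the same order")
    total = 0.0
    for x, y in zip(X, Y):
        total += sum((a - b) ** 2 for a, b in zip(x, y))
    return total / len(X)

# Two cells in a 2-D latent space; the second cell is off by 2 along one axis.
X = [(0.0, 0.0), (1.0, 1.0)]
Y = [(0.0, 0.0), (1.0, 3.0)]
error = latent_mse(X, Y)    # (0 + 4) / 2
perfect = latent_mse(X, X)  # matched pairs share coordinates
```

A perfect alignment, where every matched pair shares the same coordinates, gives an error of zero.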

Results

Results table of the scores per method, dataset and metric (after scaling). The rows shown here are for the “Overall mean” dataset, which is the mean value across all datasets.

Method	Dataset	Mean score	kNN Area Under the Curve	Mean squared error	Runtime (s)	CPU (%)	Memory (GB)
Procrustes superimposition ³	Overall mean	0.21	0.15	0.27	317	146	0.70
Mutual Nearest Neighbors (log scran) ⁴	Overall mean	0.05	0.03	0.08	724	98	3.81
Mutual Nearest Neighbors (log CP10k) ⁴	Overall mean	0.04	0.05	0.04	577	93	2.02
Harmonic Alignment (log scran) ¹	Overall mean	0.01	0.02	0.01	975	137	3.81
Harmonic Alignment (sqrt CP10k) ¹	Overall mean	0.00	0.00	0.01	593	206	0.97

Details

Methods

Harmonic Alignment (log scran)¹: Harmonic alignment embeds cellular data from each modality into a common space by computing a mapping between the 100-dimensional diffusion maps of each modality. The mapping is obtained by computing an isometric transformation of the eigenmaps; the resulting diffusion maps are then concatenated into a joint 200-dimensional space. This joint diffusion map space is used as output for the task.

Harmonic Alignment (sqrt CP10k)¹: Harmonic alignment embeds cellular data from each modality into a common space by computing a mapping between the 100-dimensional diffusion maps of each modality. The mapping is obtained by computing an isometric transformation of the eigenmaps; the resulting diffusion maps are then concatenated into a joint 200-dimensional space. This joint diffusion map space is used as output for the task.

Mutual Nearest Neighbors (log CP10k)⁴: Mutual nearest neighbors (MNN) embeds cellular data from each modality into a common space by computing a mapping between modality-specific 100-dimensional SVD embeddings. The embeddings are integrated using the FastMNN version of the MNN algorithm, which generates an embedding of the second modality mapped to the SVD space of the first. This corrected joint SVD space is used as output for the task.

Mutual Nearest Neighbors (log scran)⁴: Mutual nearest neighbors (MNN) embeds cellular data from each modality into a common space by computing a mapping between modality-specific 100-dimensional SVD embeddings. The embeddings are integrated using the FastMNN version of the MNN algorithm, which generates an embedding of the second modality mapped to the SVD space of the first. This corrected joint SVD space is used as output for the task.

Procrustes superimposition³: Procrustes superimposition embeds cellular data from each modality into a common space by aligning the 100-dimensional SVD embeddings to one another, using an isomorphic transformation that minimizes the root mean squared distance between points. The unmodified SVD embedding and the transformed second modality are used as output for the task.
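To make the alignment step concrete, here is a toy 2-D version restricted to translation and rotation, where the optimal rotation angle has a closed form. It is an illustrative sketch only: the benchmarked method works on 100-dimensional SVD embeddings, a full Procrustes fit typically also handles scaling, and the function names are hypothetical. Unlike the description above, this sketch centres both point sets for simplicity.

```python
import math

def center(points):
    """Translate a 2-D point set so that its centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(p[0] - cx, p[1] - cy) for p in points]

def procrustes_rotation_2d(X, Y):
    """Rotate centred Y onto centred X, minimising the summed squared distance."""
    Xc, Yc = center(X), center(Y)
    # The rotation angle maximising sum_i x_i . R(theta) y_i is atan2(cross, dot).
    dot = sum(y[0] * x[0] + y[1] * x[1] for x, y in zip(Xc, Yc))
    cross = sum(y[0] * x[1] - y[1] * x[0] for x, y in zip(Xc, Yc))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    Y_aligned = [(c * y[0] - s * y[1], s * y[0] + c * y[1]) for y in Yc]
    return Xc, Y_aligned

# Y is X rotated by 90 degrees and shifted by (5, 5); alignment should undo both.
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
Y = [(5.0, 6.0), (4.0, 5.0), (5.0, 4.0), (6.0, 5.0)]
Xc, Y_aligned = procrustes_rotation_2d(X, Y)
max_residual = max(math.dist(x, y) for x, y in zip(Xc, Y_aligned))
```

After alignment the matched points coincide up to floating-point error, which is the ideal outcome for this task.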

Baseline methods

Random Features⁷: 20-dimensional SVD is computed on the first modality, and is then randomly permuted twice, once for use as the output for each modality, producing random features with no correlation between modalities.

True Features⁷: 20-dimensional SVD is computed on the first modality, and this same embedding is used as output for both modalities, producing perfectly aligned features from each modality.
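The two baselines can be sketched as follows, assuming the embedding is already available as a list of per-cell coordinate tuples. The function names are hypothetical, the SVD step is omitted, and permuting the cells (rows) is an assumption about what "randomly permuted" means here.

```python
import random

def true_features(embedding):
    """Positive control: both modalities receive the same embedding."""
    return list(embedding), list(embedding)

def random_features(embedding, seed=0):
    """Negative control: each modality receives an independent shuffle of the cells."""
    rng = random.Random(seed)
    mode1, mode2 = list(embedding), list(embedding)
    rng.shuffle(mode1)
    rng.shuffle(mode2)
    return mode1, mode2

# Toy "embedding" of five cells in two dimensions.
embedding = [(float(i), float(i * i)) for i in range(5)]
t1, t2 = true_features(embedding)
r1, r2 = random_features(embedding, seed=42)
```

These controls bracket the achievable range: the true-features output is perfectly aligned by construction, while the random output keeps the same set of coordinates but breaks the correspondence between cells.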

Datasets

CITE-seq Cord Blood Mononuclear Cells⁵: 8k cord blood mononuclear cells sequenced by CITE-seq, a multimodal addition to the 10x scRNA-seq platform that allows simultaneous measurement of RNA and protein.

sciCAR Cell Lines⁶: 5k cells from a time-series of dexamethasone treatment sequenced by sci-CAR, a combinatorial indexing-based co-assay that jointly profiles chromatin accessibility and mRNA.

sciCAR Mouse Kidney⁶: 11k cells from adult mouse kidney sequenced by sci-CAR, a combinatorial indexing-based co-assay that jointly profiles chromatin accessibility and mRNA.

Quality control results

✓ All checks succeeded!

Visualization of raw results