Single-cell spatial landscapes of the lung tumour immune microenvironment


Clinical cohort

A cohort of 416 patients with LUAD were included in this study with follow-up time ranging from February 1996 to July 2020. For the validation cohort, 60 patients with LUAD with follow-up time ranging from February 2012 to May 2022 were included with two distinct cores per patient. All samples obtained were primary treatment-naive LUADs diagnosed by a board-certified pathologist following surgical resection or biopsy. Clinical information on all patients included can be found in Supplementary Tables 1 and 12. Tissue microarrays were constructed by selecting one 1-mm2 core from the surgical tumour specimen. Patient samples and clinical information were obtained following written informed patient consent. The protocols for human sample biobanking were approved (ethics, scientific and final) through the IUCPQ Biobank, protocol number IRB #2022-3474, 22090, and the MUHC protocol numbers IRB #2014-1119 and 2019-5253.

Sample staining and IMC

Formalin-fixed paraffin-embedded (FFPE) slides were deparaffinized at 70 °C by incubation in EZ Prep solution (Roche Diagnostics) followed by antigen retrieval at 95 °C in standard cell conditioning 1 solution (Roche Diagnostics). The Ventana Discovery Ultra auto-stainer platform (Roche Diagnostics) was used for antigen retrieval. Slides were rinsed with 1× PBS and incubated for 45 min in Dako serum-free protein block solution (Agilent). Slides were stained with a cocktail containing metal-tagged antibodies at optimized dilutions overnight at 4 °C. All conjugations were performed by the Single Cell and Imaging Mass Cytometry Platform at the Goodman Cancer Institute (McGill University), using Maxpar Conjugation Kits (Fluidigm). Information on the antibodies used can be found in Supplementary Table 2. Slides were then washed with 0.2% Triton X and 1× PBS. An optimized dilution of the secondary antibody cocktail containing metal-conjugated anti-biotin was prepared in Dako antibody diluent. After a 1-h incubation, slides were washed with 0.2% Triton X and 1× PBS. Before IMC acquisition, Cell-ID Intercalator-Ir (Fluidigm) at a dilution of 1:400 was used to counterstain slides in 1× PBS for 30 min at room temperature. Slides were then rinsed for 5 min with distilled water and air-dried. IMC images were acquired at a resolution of roughly 1 μm. Cores were laser-ablated at a frequency of 200 Hz using the Hyperion Imaging System (Fluidigm) and raw data were compiled using the Fluidigm commercial acquisition software. Of note, in our validation cohort, we stained with alpha cleaved H3 (176Yb) instead of histone H3 (176Yb). Accordingly, this marker was excluded from validating our deep-learning predictions of progression.

Antibody optimization

Antibodies were optimized on control tissues including the spleen, tonsil, appendix, placenta, thymus, normal lung and LUAD. Multiplex quality-control staining of positive and negative control tissue can be seen in Extended Data Figs. 24, with four representative images staining for each of the 35 markers in our panel.

Data transformation and normalization

Data presented were not transformed. All analyses were based on raw IMC measurements. For heatmap visualization, expression data were normalized to the 95th percentile and z-scored cluster means were plotted. Single-cell marker expressions were summarized by mean pixel values for each channel.

Cell segmentation and lineage assignment

All markers underwent a staining quality check before cell segmentation (Extended Data Figs. 24). A small number of markers did not consistently stain every sample in our cohort, so we chose not to make any conclusions based on those markers (GM-CSFR, PD-1, PD-L1 and B7-H3). Note that CD163 (a putative ‘M2-like’ marker) was chosen to subdivide macrophage populations on the basis that this marker is often upregulated in tumour-associated macrophages and has been used to categorize macrophages in multiplex imaging studies14,42,43. Although the terms ‘M1/pro-inflammatory’ and ‘M2/anti-inflammatory’ have traditionally been used to classify macrophage activation states, these terms are outdated and were therefore avoided44,45. Using a novel cell segmentation pipeline that combines classical and modern machine-learning-based computer vision algorithms, we segmented all cells contained within the IMC images. The model used is a fully automated hybrid cell detection model that eliminates subjective bias and enables high-throughput image segmentation. It allows us to accurately distinguish cells across diverse tissue microenvironments and resolve low-resolution structures. The details of our image segmentation approach can be found here: Owing to existing phenotyping challenges for highly multiplexed imaging, we created a cell phenotyping pipeline to assign cell phenotypes. Our strategy relies on canonical lineage markers and uses a supervised hierarchal approach that integrates the staining quality, the expected population abundance and cell lineage maturation to assign cells. We used k-means clustering46 and a mixture of generalized Gaussian models47 to generate a mask or level for each marker within a multi-level image stack created based on staining intensity. This allowed us to evaluate the existence of a marker at a particular location. Each marker in our panel was assessed using six levels and the appropriate mask was subsequently manually curated for each marker. Each mask is produced using the following procedure:

  1. (1)

    The greyscale image channel is convolved with a median filter with a particular window size (3 × 3).

  2. (2)

    Each pixel in the image is clustered into six groups of intensity levels using the k-means algorithm.

  3. (3)

    For each channel, we then selected all groups up to a particular level as foreground (1) and the rest as background (0).

  4. (4)

    To obtain smoother binary masks, we also applied a morphological blob removal process in which binary blobs of a particular area are removed from masks to avoid noisy regions.

  5. (5)

    To further refine the accuracy of select markers, additional channel-specific morphological operations were applied by computing an additional binary mask obtained using the adaptive binarization method with a sensitivity of 0.4. This mask is then amalgamated with the mask obtained in step 4.

As a formula, for each cell ({c}_{i}), we consider the curated mask for each lineage marker ({M}_{k}), where (k=1,ldots ,n) and n is the number of lineage markers. Let us assume ({p}_{{c}_{i}}^{j}) be the jth pixel that lies in the surrounding of ({c}_{i}) and each pixel has the following presence vector based on the lineage markers:

$$E({p}_{{c}_{i}}^{j})={{p}_{{M}_{1}}^{j},{p}_{{M}_{2}}^{j},ldots ,{p}_{{M}_{n}}^{j}}$$

where ({p}_{{M}_{i}}={0,text{or},1}), which determines whether the pixel ({p}_{{c}_{i}}^{j}) is positive for a particular marker. Next, to determine whether each pixel within a cell is positive or negative for a given marker, we determined the majority vector by summing the presence of all vectors as:

$${M}_{{c}_{i}}=left{mathop{sum }limits_{j=1}^{{N}_{{c}_{i}}}{p}_{{M}_{1}}^{j},mathop{sum }limits_{j=1}^{{N}_{{c}_{i}}}{p}_{{M}_{2}}^{j},ldots ,mathop{sum }limits_{j=1}^{{N}_{{c}_{i}}}{p}_{{M}_{n}}^{j}right}$$

where ({N}_{{c}_{i}}) is the number of pixels in the cell ({c}_{i}). The maximum value in vector ({M}_{{c}_{i}}) determines the cell-type assignment. Cell lineages were assigned in rank priority order (Extended Data Fig. 1c). See the ‘Code availability’ section for additional details.

Cell–cell pairwise interaction

We performed a permutation-test-based analysis of spatial single-cell interactions to identify significant pairwise interaction–avoidance between cells12. Interacting cells were defined as those within six pixels. P values less than 0.01 were deemed significant.

Neighbourhood identification

To generate CNs, we used a ‘window’ capture strategy consisting of the number of cells (n) in closest proximity to a given cell as previously described14. Each window is a frequency vector consisting of the types of X (as indicated) closest cells to a given cell. Obtaining all the window vectors for each cell, initial cells (Extended Data Fig. 7a) were clustered using Scikit-learn, a software machine-learning library for Python, and MiniBatchKMeans clustering algorithm version 0.24.2 with default batch size = 100 and random_state = 0. Subsequent CN analysis was performed using the MiniBatchKMeans clustering algorithm version 1.1.2 with default batch size = 1,024 and random_state = 0. Every cell was subsequently allocated to a CN based on their defining window. The prevalence of each neighbourhood in each core was normalized so that the sum of neighbourhood prevalence for that core was 100%. Values were then z-scored and cores with a z-score above or equal to 0 and below 0 were compared for survival outcomes.


All t-SNE plots were generated in MATLAB (version 2019b) using default parameters. For visualization, expression data were normalized to the 95th percentile.

Deep learning

All deep-learning analysis steps were performed in Python (version 3.7.12). We used the TensorFlow (version 2.8.0) framework alongside Keras, which now acts as an interface for the TensorFlow library. We have two modes of data for our experiments: (1) raw IMC images, and (2) cell frequencies obtained from cell phenotyping. For raw IMC images, the pretrained ResNet-50 model with weights pretrained on ImageNet is first utilized to extract embeddings from each channel within the multiplex IMC channels. Each channel is fed to the three-channels of ResNet-50 and the embeddings are computed before the classificationlayers are obtained. Each channel produces an embedding vector size of 2,048 and then these are all concatenated into a single vector of features representing that specific core. We then reduced the dimensionality of the extracted feature vectors using mini-batch sparse principal components analysis to a specific number of principal components (for most applications we tried nine principal components). Principal components were later used to train a support vector machine with a radial basis function kernel with the parameters specified in our codebase. For the imbalanced datasets, we used a random oversampling to achieve an equal number of samples for each class during the training. The function used is RandomOverSampler (version 0.9.1) and it is available at: To compare with cell frequencies, we imagined that cell-frequency vectors also represent a core (in which each vector is simply a vector of cell prevalence of each type). Similar to images, we reduced the dimensionality of the extracted feature vectors to nine principal components and then trained a support vector machine with a radial basis function kernel with the same parameters. Various classes of Scikit-learn (version 1.0.2) machine-learning libraries have been utilized for the tasks of splitting the dataset, dimensionality reduction and training support vector machines for the prediction tasks. All feature extraction and training steps were performed on Google Cloud GPU/TPU servers. See the ‘Code availability’ section for additional details.

Statistical analysis and workflow

All image analysis steps were performed in MATLAB (version 2019b) and Python (version 3.7.12). Statistical analyses were performed using RStudio version 4.2.2 and GraphPad Prism 9 statistical software. Data are expressed as mean ± s.e.m. or mean ± s.d.; P < 0.05 was considered significant unless otherwise indicated. All statistical tests are indicated in the figure legends. Survival data were analysed by log-rank (Mantel–Cox) test.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


Please enter your comment!
Please enter your name here