Baited reconstruction with 2D template matching for high-resolution structure determination in vitro and in vivo without template bias

Version of Record

The authors declare this version of their article to be the Version of Record.

Version of Record published: November 27, 2023 (This version)
Reviewed preprint version 2: November 10, 2023
Reviewed preprint version 1: September 14, 2023
Preprint posted: July 11, 2023
Sent for peer review: June 30, 2023

1. Builds upon
In situ single particle classification reveals distinct 60S maturation intermediates in cells

Bronwyn A Lucas, Kexin Zhang ... Nikolaus Grigorieff

Research Advance Sep 5, 2022

Abstract

Previously we showed that 2D template matching (2DTM) can be used to localize macromolecular complexes in images recorded by cryogenic electron microscopy (cryo-EM) with high precision, even in the presence of noise and cellular background (Lucas et al., 2021; Lucas et al., 2022). Here, we show that once localized, these particles may be averaged together to generate high-resolution 3D reconstructions. However, regions included in the template may suffer from template bias, leading to inflated resolution estimates and making the interpretation of high-resolution features unreliable. We evaluate conditions that minimize template bias while retaining the benefits of high-precision localization, and we show that molecular features not present in the template can be reconstructed at high resolution from targets found by 2DTM, extending prior work at low resolution. Moreover, we present a quantitative metric for template bias to aid the interpretation of 3D reconstructions calculated with particles localized using high-resolution templates and fine angular sampling.

eLife assessment

This is an important demonstration of how the false-positive rate of high-resolution 2D template matching to find particles of a given target structure in 2D cryo-EM images (2DTM) relates to overfitting the data towards the template. The authors present new methods to measure the amount of model bias that gets introduced in high-resolution features of such maps, with compelling evidence that high-resolution features that are not present in the template can still be reconstructed in 3D from images obtained by 2DTM.

https://doi.org/10.7554/eLife.90486.3.sa0

Introduction

Over the last decade, single-particle cryogenic electron microscopy (cryo-EM) has emerged as a high-resolution technique to study molecules and their assemblies in a near-native state (Guaita et al., 2022). In the most favorable cases, close to 1 Å resolution can be achieved, rivaling results obtained by protein crystallography (Nakane et al., 2020; Yip et al., 2020). The resolution obtained from a single-particle dataset depends on the quality of the images, the accuracy of particle alignment and imaging parameters, the structural integrity of the sample, and the number of particles contributing to a reconstruction. For a high-quality dataset, between 20,000 and 70,000 asymmetric units of well-aligned and homogeneous particles have to be averaged to reach sub-2 Å resolution (Nakane et al., 2020; Yip et al., 2020). Methods development in a related cryo-EM technique has also enabled imaging particles in situ at high resolution using tomography and subtomogram averaging. These in situ subtomogram averages now approach 3 Å resolution (Tegunov et al., 2021), a resolution obtained routinely for single-particle reconstructions in vitro. Data collection for tomography requires more time than the single-particle technique because a tilt series has to be acquired, and processing tends to be computationally more expensive due to the additional degrees of freedom compared to the 2D images used in the single-particle technique. An additional complication of averaging images of molecules for in situ structure determination is the selection of valid targets, which have to be identified against a background of other molecules inside the cell or tissue being imaged. This is in contrast to a typical single-particle dataset, in which the particles have undergone a purification step that enriches the particle of interest and are imaged in solvent, which makes particle selection more reliable.

2D template matching (2DTM) is an approach that can be used to identify target molecules and complexes in cryo-EM images of cells and cell sections, using single images of nominally untilted specimens (Lucas et al., 2022; Lucas et al., 2021; Rickgauer et al., 2020; Rickgauer et al., 2017). This approach can be used in combination with 3D template matching (3DTM) to identify targets in tomograms collected from the same areas imaged for 2DTM. 2DTM is fundamentally limited by the background generated by overlapping molecules in untilted views of the sample, imposing a size limit on what can be detected (Rickgauer et al., 2017). A combined approach of 2D and 3DTM could benefit from the strengths of both approaches, with better target detection (lower false negative rate) of 3DTM in the tomograms, and the improved overall precision of 2DTM in the untilted views (Lucas et al., 2021). In our previous studies, we have demonstrated that the targets detected using 2DTM can be used to calculate 3D reconstructions showing novel details not present in the template (Lucas et al., 2022; Lucas et al., 2021; Rickgauer et al., 2017). 3D reconstruction is straightforward because for every detected target, 2DTM also determines its x,y location in the image, three Euler angles, and image defocus, that is, all the parameters needed to calculate a single-particle reconstruction. Using this approach, we revealed non-modeled density for the viral polymerase (VP1) bound to a rotavirus capsid (Rickgauer et al., 2017), for the small ribosomal subunit (SSU) and tRNAs (Lucas et al., 2022; Lucas et al., 2021), as well as structural differences between Mycoplasma pneumoniae and Bacillus subtilis large ribosomal subunits (LSUs) (Lucas et al., 2021).
Interpretation of reconstructions obtained from 2DTM targets can be hindered by template bias (Lucas et al., 2022; Lucas et al., 2021), that is, the reproduction in the reconstructions of modeled features included in the template, even though they do not correspond to structural features in the detected particles. This could result from inclusion of pure noise particles and/or local overfitting of particle parameters in the presence of signal. Our previous studies showed that template bias does not prevent the discovery of new structural features at low resolution that were not represented by the template, but it has yet to be determined if this is true for high-resolution features, which are more susceptible to noise overfitting (Stewart and Grigorieff, 2004). 2DTM could in principle be used to study the structure of targets at high resolution that would otherwise be too small to identify on their own, as long as they bind or are otherwise rigidly attached to a larger target that can be located by 2DTM. However, due to limitations in the number and heterogeneity of particles in previous studies, it was unclear whether this approach could indeed recover reliable high-resolution information.

In the present study, we explore this possibility further to assess the resolution that can be obtained in unmodeled regions omitted from the template. We analyze a published single-particle dataset of β-galactosidase (Bgal) using 2DTM, and 60S LSUs detected in images of Saccharomyces cerevisiae lamellae. In both cases, we show high resolution in areas of the reconstruction that were omitted in the template, demonstrating the utility of 2DTM for structure discovery. We present a new metric to quantify template bias in a template-based 3D reconstruction, making reconstruction from 2DTM targets a more broadly useful tool.

Results

Reconstruction of the Bgal ligand binding pocket

To show the potential of 2DTM to reveal new structural details at high resolution, we analyzed a published single-particle cryo-EM dataset of Escherichia coli Bgal bound to phenylethyl β-D-thiogalactopyranoside (PETG) (Saur et al., 2020). The dataset was used previously to calculate a 2.2 Å single-particle reconstruction (EMDB-10574) that displays density for a number of specifically bound water molecules in the structure, including in the PETG binding pocket. The authors also built an atomic model into the high-resolution map (PDB: 6TTE). For our template, however, we used an atomic model of ligand-free Bgal determined by X-ray crystallography at 1.7 Å (PDB: 1DP0) (Juers et al., 2000). Using a model that was built into a map that is independent from the data analyzed by 2DTM aids our demonstration of 2DTM as a tool that can make use of atomic models experimentally unrelated to the data being analyzed.

To demonstrate high resolution in areas omitted in the template, we removed atoms in the vicinity of all D2 symmetry-related ligand binding pockets, within a 10 Å radius centered on the side chain amide nitrogen atom of asparagine 102 (in PDB: 1DP0). The truncated atomic model was used to generate a template with cisTEM’s simulator (Himes and Grigorieff, 2021) (see Materials and methods). We searched 558 micrographs downloaded from the EMPIAR database (EMPIAR-10644) and obtained 59,259 targets with 2DTM SNRs above a threshold of 7.3 (Figure 1A), the standard threshold calculated to limit the average number of false positives (false positive rate) to one per micrograph, based on the given search parameters and a Gaussian noise model (Rickgauer et al., 2017) (see Equation 2 below). To reduce the particles to a number closer to the final dataset used to calculate the published 2.2 Å cryo-EM map (49,895), and to enrich for the particles most similar to the template (similar to selecting the best classes in Saur et al., 2020), we limited our targets to those with 2DTM SNRs above 9.0 and obtained a final dataset of 55,627 particles.
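The truncation step described above can be illustrated with a minimal Python sketch. It is not the procedure used to prepare the template: the function name `omit_spheres`, the tuple layout of the atoms, and the toy coordinates are all hypothetical, and a real workflow would read coordinates from the PDB file and list one center per symmetry-related pocket.

```python
import math

def omit_spheres(atoms, centers, radius=10.0):
    """Keep only atoms farther than `radius` (in Å) from every center.

    atoms   : list of (name, x, y, z) tuples (hypothetical layout)
    centers : list of (x, y, z) tuples, e.g. one per symmetry-related pocket
    """
    kept = []
    for name, x, y, z in atoms:
        # An atom is removed if it lies inside any of the omitted spheres.
        if all(math.dist((x, y, z), c) > radius for c in centers):
            kept.append((name, x, y, z))
    return kept

# Toy coordinates for illustration only.
atoms = [("N", 0.0, 0.0, 0.0), ("CA", 3.0, 4.0, 0.0), ("C", 20.0, 0.0, 0.0)]
centers = [(0.0, 0.0, 0.0)]
kept = omit_spheres(atoms, centers)
```

In this toy case the two atoms at 0 Å and 5 Å from the center are removed and the atom at 20 Å is kept.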

Figure 1 (with 1 supplement)

Baited reconstruction for visualization of β-galactosidase (Bgal) ligand binding pocket at high resolution.

(A) Reconstruction of Bgal from 2DTM coordinates using images from a previously published dataset (Saur et al., 2020) using a Bgal crystal structure (PDB: 1DP0) (Juers et al., 2000) as a template, with a 10 Å sphere around the phenylethyl β-D-thiogalactopyranoside (PETG) ligand omitted. (B) A 2D slice through the reconstruction in (A) including the region deleted from the density shows no obvious discontinuity in the density. (C) A view of the density in (A) indicated with a red box, with regions within 1.8 Å of the template model highlighted in red. Gray indicates density of Bgal outside of the template, purple indicates density consistent with the position of PETG, and blue indicates additional density that likely represents water molecules. (D) A stick diagram showing the locations of the atoms in the template used for template matching. (E) Published density from Saur et al., 2020 aligned and scaled as in (A). (F) As in (B), showing a region of the published density in (E). (G) As in (C), showing the same region of the published density in (E). (H) As in (D), showing all atoms annotated in the crystal structure, including those omitted before generating the 2DTM template.

The identified 55,627 targets were extracted together with their template-matched x,y positions, Euler angles, and CTFFIND4-derived defocus values using prepare_stack_matchtemplate (Lucas et al., 2021), and the particle stack and alignment parameters were imported into cisTEM as a refinement package for further single-particle processing. The Fourier shell correlation (FSC) (Harauz and van Heel, 1986) for the initial reconstruction calculated from the template-matched alignment parameters indicated a resolution of 2.4 Å (FSC = 0.143) (Rosenthal and Henderson, 2003). We performed further refinement against the template while keeping the refinement resolution limit at 3.0 Å: one cycle of defocus and beam tilt refinement, followed by a refinement of alignment parameters and another cycle of defocus and beam tilt refinement. The final reconstruction (Figure 1A–C) displayed a resolution according to the FSC of 2.2 Å (Figure 1—figure supplement 1F). As mentioned above and previously discussed (Lucas et al., 2021), resolution estimates based on the FSC can be affected by template bias. Therefore, the present estimate has to be considered unreliable, and has to be supported by additional evidence, such as inspection of features visible in the density map.

Peaks corresponding to detected targets are clearly visible (Figure 1—figure supplement 1B). The average 2DTM SNR for this dataset was 11.6, with a maximum of 16.3, which is in the range of what is expected for a 465 kDa target (Rickgauer et al., 2020; Rickgauer et al., 2017). The refined reconstruction shows clear density for PETG and water molecules in the ligand binding pocket that were omitted in the template (Figure 1C). Comparison of this reconstruction with the published map (Figure 1G) suggests that they are virtually identical and that there is little or no evidence of template bias in the 2DTM reconstruction. An assessment of the local resolution using Phenix (Liebschner et al., 2019) further indicates a resolution of about 1.8 Å in the binding pocket, consistent with the clear density for water.

Baited reconstruction visualizes ribosomes at near atomic resolution in FIB-milled lamellae

To investigate whether 2DTM can be used to generate reliable high-resolution reconstructions from images derived from cellular samples, we used a previously published dataset of 37 images of four FIB-milled lamellae generated from S. cerevisiae cells treated with the translation inhibitor cycloheximide (CHX) to enrich the ribosome population in a single state (Figure 2A; Lucas and Grigorieff, 2023). The lamella samples were not tilted during data collection and therefore exhibited a small tilt with respect to the electron beam of about 8° due to the milling angle during sample preparation. We identified 12,210 LSUs with 2DTM in the cytoplasm using a threshold of 7.85 which corresponds to an expectation of one false positive per image across the dataset, or ~0.3% of the particles (Figure 2—figure supplement 1A–C). Local positional and orientational refinement was performed using the cisTEM program refine_template (Lucas et al., 2021) and the original template as a reference. The refined 2DTM coordinates were used to calculate an initial reconstruction with a nominal resolution of 3.15 Å (FSC = 0.143) (Rosenthal and Henderson, 2003). One cycle of beam tilt refinement against the reconstruction improved the resolution to 3.1 Å (Figure 2A and B). Unlike for the in vitro Bgal reconstruction, further refinement of the other alignment parameters using the reconstruction as a reference caused the resolution to decrease to 8 Å. The reconstruction has reduced signal at high spatial frequencies relative to the template as indicated by the half-map FSC (Figure 2—figure supplement 1D). This, combined with the higher background and lack of low-resolution contrast of the ribosomes in the cellular lamella relative to a purified sample, may reduce the alignment accuracy relative to the high-resolution 2DTM template. 
This highlights the importance of high spatial frequencies for alignment of particles in images with strong background, in contrast to images of purified samples that show strong low-resolution features, which are important for reliable particle alignment (Stewart and Grigorieff, 2004).

Figure 2 (with 2 supplements)

Visualizing drugs and small molecules bound to the ribosome in vivo.

(A) A reconstruction of the ribosome from 2DTM coordinates identified in the cytoplasm of FIB-milled *S. cerevisiae* cell sections showing clear density for both the 60S (part of the template) and the 40S (outside of the template). (B) A slice of the reconstruction in (A), indicating the local resolution using the indicated color coding. The arrow indicates the P-site tRNA. (C) Regions of the density >3 Å from the template model are indicated in pink. The crystal structure PDB: 4U3U was aligned with the template model and the position of cycloheximide (CHX) was not altered. (D) As in C, showing density corresponding to a spermidine (PDB: 7R81) and unaccounted-for density outside of the template (black arrow), which may also represent a polyamine.

As previously reported (Lucas et al., 2022; Lucas et al., 2021), we found density consistent with the SSU and tRNAs, that did not derive from the template. In the present case, local resolution estimation shows that parts of the SSU are resolved at <4 Å resolution (Figure 2B). The SSU is conformationally variable and shows considerable positional heterogeneity relative to the LSU. Therefore, this value is likely an underestimate of the potential attainable resolution in reconstructions from 2DTM targets in cells. Although the map represents an average of all states identified, we observed clear density for tRNAs in the A/A and P/P state with apparent density for the polypeptide on the A site tRNA (Figure 2—figure supplement 2) and no clear density for E-site tRNAs. This allowed us to conclude that CHX stalls ribosomes in the classical PRE translocation state in vivo, likely by preventing transition of the P/P tRNA to the P/E state consistent with an in vitro structure of the translating Neurospora crassa ribosome (Shen et al., 2021) and inference from ribosome profiling data (Lareau et al., 2014; Wu et al., 2019). The relatively low resolution of the tRNAs likely reflects the mixed pool of tRNA depending on the codons on which the ribosome stalled as well as a mixture of states.

Visualization of drug-target interactions in cells

Drug-target interactions can be visualized at high-resolution in vitro with cryo-EM and X-ray crystallography. However, it is unclear whether this recapitulates the binding site in vivo, possibly missing weak interactions that are disrupted during purification. Visualizing drug-target interactions in cells is therefore an important goal. We observed additional density near the ribosomal E site not present in the template that is consistent with the position of CHX in a previously published crystal structure (Garreau de Loubresse et al., 2014; Figure 2C). The density was sufficiently well resolved to dock CHX and provide in vivo confirmation for the position and orientation of CHX binding in the E site.

We noted several key differences between the model built from the in vitro CHX-bound structure and the in situ CHX-bound structure. In particular, we did not observe density for eIF5A but did observe density consistent with binding of spermidine (Figure 2D), as has been observed previously for the in vitro CHX-bound N. crassa ribosome (PDB: 7R81) (Shen et al., 2021). This demonstrates that spermidine can bind to ribosomes within cells; however, it is unclear whether spermidine binds as part of the translation cycle or whether stalling of translation with CHX allowed spermidine to bind. Baited reconstruction with 2DTM could be used to further probe the function of polyamines in regulating translation in vivo.

Baited reconstruction using the LSU as a template model allowed us to visualize the binding of small molecules such as drugs and polyamines to the ribosome within cells. This demonstrates the power of baited reconstruction to reveal biologically relevant features that would only be evident at high resolution.

Omit templates reveal high-resolution features without template bias

The local resolution of parts of the LSU was measured at ~3 Å; however, this region overlapped with the template and therefore the resolution measured using standard tools may be unreliable. To assess the resolution in this region, we repeated this experiment with a template that lacked the ribosomal protein L7A. Since this protein was not present in the template, any density in this region cannot be due to template bias. We found that the local resolution of L7A was indistinguishable from that of the surrounding density and varied from 3.2 to 4.5 Å (Figure 3A–B). The density was sufficiently well resolved to observe side chains in regions that were lacking from the template (Figure 3C). This suggests that baited reconstruction with 2DTM coordinates can be used to generate high-resolution reconstructions from cellular samples, free from template bias, and demonstrates an approach to verify local resolution estimates.

Figure 3

Baited reconstruction reveals high-resolution features in vivo without template bias.

(A) Slice of a reconstruction using 2DTM coordinates identified with a template lacking the protein L7A. Color coding indicates the local resolution as indicated in the key. (B) As in (A), pink indicates the 2DTM template used to identify the targets used in the reconstruction. (C) The model PDB: 6Q8Y is shown in the density. Red corresponds to the protein L7A, which was omitted from the template used to identify targets for the reconstruction. Blue corresponds to model features that were present in the template. (D) Single nucleotide omit template and (E) reconstruction showing emergence of density outside of the template, including a phosphate bulge, black arrow. Single amino acid omit templates lacking Phe (F), Arg (H), or Ser (J) and density (**G, I, K**), respectively, showing emergence of features consistent with each amino acid.

To examine the recovery of high-resolution information with single-residue precision, we generated another truncated template by removing every 20th residue from each chain. This resulted in a total reduction of 51 kDa or ~3% of the template mass. We then localized 12,090 targets using the same 2DTM protocol as for the full-length template. The small difference in template mass minimally affected target detection: only 120 targets (<1%) were missed, and there were minimal deviations in the locations and orientations of the remaining targets. The 3D reconstruction generated from the detected targets showed clear density corresponding to nucleotides (Figure 3D and E) and various amino acids (Figure 3F–K) that were missing from the template and therefore cannot derive from template bias. This demonstrates that omitting small, scattered regions from a 2DTM template can be used to assess template bias throughout the reconstruction.
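As a rough illustration of how such an omit template could be defined, the Python sketch below drops every 20th residue of each chain. The function name `omit_every_nth`, the dictionary layout, and the choice to start counting at the first residue of each chain are assumptions made for this example; the text does not specify the starting offset, and this is not the tool used to prepare the template.

```python
def omit_every_nth(chains, n=20):
    """Split each chain into kept and omitted residues, omitting every n-th one.

    chains : dict mapping a chain ID to an ordered list of residue identifiers
             (hypothetical layout). Counting starts at 1 within each chain.
    """
    kept, omitted = {}, {}
    for chain_id, residues in chains.items():
        kept[chain_id] = [r for i, r in enumerate(residues, start=1) if i % n != 0]
        omitted[chain_id] = [r for i, r in enumerate(residues, start=1) if i % n == 0]
    return kept, omitted

# Toy chains for illustration: 45 and 20 residues, numbered from 1.
chains = {"A": list(range(1, 46)), "B": list(range(1, 21))}
kept, omitted = omit_every_nth(chains)
```

Keeping the list of omitted residues alongside the truncated model makes it straightforward to revisit exactly those positions later when comparing map densities.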

Quantifying template bias

The calculation of reconstructions from targets identified by 2DTM, which relies on a priori structural models, bears the danger of generating results that reproduce features of the template even when these features are absent from the targets to be detected. In the field of cryo-EM, this is often referred to as the ‘Einstein from noise’ problem (Henderson, 2013). The risk of template bias increases with dataset size (number of images), as well as the ratio of false positives vs true positives. Template bias in reconstructions generated from 2DTM targets is generally avoided because the scoring function (SNR threshold) is set to reject most false positives. To quantify template bias in reconstructions at various 2DTM SNR thresholds, we generated a series of reconstructions at different thresholds using targets identified with a full-length LSU template (‘full’ template) and the template lacking 3% of the residues (‘omit’ template) covering different areas of the model, while retaining most detections relative to the full template as described above (Figure 4A). We wrote a new cisTEM program measure_template_bias (see Materials and methods) that calculates the difference between map densities $ρ_{\mathrm{full}}$ and $ρ_{\mathrm{omit}}$ in these reconstructions, in the omitted regions:

$$Ω = \frac{ρ_{\mathrm{full}} - ρ_{\mathrm{omit}}}{ρ_{\mathrm{full}}} \qquad (1)$$
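To make the expression for Ω above concrete, the following Python sketch evaluates it on toy data. It is not the implementation of measure_template_bias: the function name `template_bias`, the flat-list representation of the maps, and the choice to sum densities over the voxels of the omitted regions are assumptions made for illustration.

```python
def template_bias(rho_full, rho_omit, omit_mask):
    """Evaluate Ω = (ρ_full − ρ_omit) / ρ_full over the omitted regions.

    rho_full, rho_omit : flat sequences of voxel densities from the two
                         reconstructions, on the same grid and scale
    omit_mask          : flat sequence of booleans, True for voxels that
                         belong to regions omitted from the omit template
    Assumption: densities are aggregated by summing over masked voxels.
    """
    full = sum(f for f, m in zip(rho_full, omit_mask) if m)
    omit = sum(o for o, m in zip(rho_omit, omit_mask) if m)
    if full == 0:
        raise ValueError("no density from the full-template map in the mask")
    return (full - omit) / full

# Toy maps: only the last two voxels lie in an omitted region.
rho_full = [1.0, 2.0, 4.0, 4.0]
rho_omit = [1.0, 2.0, 3.0, 3.0]
omit_mask = [False, False, True, True]
omega = template_bias(rho_full, rho_omit, omit_mask)
```

By construction, Ω is 0 when the omit-template map shows the same density as the full-template map in the omitted regions, and 1 when the density appears only if it was present in the template.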

Figure 4 (with 1 supplement)

Baited reconstruction provides a quantitative metric for template bias.

(A) Observed template bias ($Ω$) calculated using the *cis*TEM program measure_template_bias as a function of the 2DTM SNR threshold used to select targets from images of yeast lamellae. Blue arrows indicate the reconstructions shown in C. (B) Plot showing a comparison of the predicted false positive rate and the observed $Ω$. The plotted straight line indicates the best fit linear function $y = 0.96x - 0.05$. (C) Images showing the same region of maps resulting from reconstruction using targets identified with the indicated template at the indicated 2DTM SNR threshold. Red indicates the location of the omitted residue in the omit template.

As expected, for high 2DTM SNR thresholds (few or no false positives), the template bias $Ω$ was only a few percent, while for lower thresholds, it approached 100% (Figure 4A). This was consistent with increased density in the reconstructions using the full template relative to the omit template (Figure 4C). The observed lower limit of $Ω$ of ~8% (Figure 4A) is likely due to some overfitting of noise when template-matching true particles, rather than inclusion of false positives. This overfitting may manifest itself in small alignment errors of the targets against the matching template, and a bias of these errors toward compensating for any mismatch between target and template, such as omitted regions in the template. Further work to quantify $Ω$ at different spatial frequencies will be informative to assess the contribution of local overfitting to template bias.

If we assume that the template bias is proportional to the rate of false positive detection, $r_{f}$, we can plot the expected false positive rate, $r_{f,\mathrm{model}}$, against the observed template bias $Ω$ (Figure 4B). The expected false positive rate is given by the complementary error function (Rickgauer et al., 2017) as

$$r_{f,\mathrm{model}} = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\mathrm{SNR}_{t}}{\sqrt{2}}\right) \qquad (2)$$

where $\mathrm{SNR}_{t}$ is the 2DTM SNR threshold applied to the template search results. The plot shows that the template bias is not proportional to the expected false positive rate (Figure 4B). This is likely due to the variable background found in images of lamellae, which means that the spectral whitening that is applied to the images before the search (Rickgauer et al., 2017) does not whiten all areas of the images evenly. This results in local deviations of the background (noise) distribution from the Gaussian noise model implied in Equation 2, leading to higher-than-expected false positive rates at low SNR thresholds.
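The Gaussian noise model can be explored numerically with a short Python sketch using only the standard library. The first function evaluates the complementary error function expression above. The second inverts it by bisection to find the threshold at which a given number of search positions is expected to yield a chosen number of false positives. Both function names and the notion of a single count `n_search_positions` (for example, pixels multiplied by the orientations and defocus values sampled) are assumptions made for illustration, not a description of the cisTEM implementation.

```python
import math

def expected_false_positive_rate(snr_threshold):
    """Per-position false positive rate under a Gaussian noise model."""
    return 0.5 * math.erfc(snr_threshold / math.sqrt(2.0))

def threshold_for_expected_false_positives(n_search_positions, n_false=1.0):
    """Find the SNR threshold giving `n_false` expected false positives.

    Solves n_search_positions * rate(threshold) = n_false by bisection,
    using the fact that the rate decreases monotonically with the threshold.
    """
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if n_search_positions * expected_false_positive_rate(mid) > n_false:
            lo = mid  # too many expected false positives: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: the number of positions implied by a threshold of 7.85
# should map back to (approximately) the same threshold.
n_positions = 1.0 / expected_false_positive_rate(7.85)
threshold = threshold_for_expected_false_positives(n_positions)
```

Because the rate falls off very steeply with the threshold, large changes in the number of search positions shift the threshold only slightly, which is one way to see why thresholds for different searches lie in a narrow range.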

If we estimate the number of true targets at 12,090 (the number of targets identified by both templates at a threshold of 7.85) and recalculate the number of false positives as the overall number of detected targets in excess of this number, the template bias is approximately proportional to the false positive rate (red line in Figure 4B). Further work is required to develop an improved noise model that predicts the correct number of false positives in images of variable contrast, such as images of cellular lamellae. Furthermore, it is important to note that the 2DTM SNR threshold used here to exclude most of the false positives also leads to a rejection of true positives. The number of these false negatives depends on the 2DTM SNR generated by the targets, which is proportional to their molecular mass (Figure 4—figure supplement 1). For our data and 150 nm thick lamellae, this means that targets below about 300 kDa will not be detected. Improvements in cryo-EM instrumentation, sample preparation, image processing, and 2DTM methods will lower this limit (Russo et al., 2022).
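The empirical estimate described above can be sketched in a few lines of Python. The function name `observed_false_positive_fraction`, the definition of the rate as excess detections divided by all detections, and the counts in the example are hypothetical and chosen only to illustrate the arithmetic; they are not values from this study.

```python
def observed_false_positive_fraction(n_detected, n_true):
    """Estimate the fraction of detections that are false positives.

    Assumption: every detection in excess of the estimated number of true
    targets is counted as a false positive.
    """
    if n_detected <= 0:
        raise ValueError("n_detected must be positive")
    excess = max(n_detected - n_true, 0)
    return excess / n_detected

# Hypothetical detection counts at three SNR thresholds.
counts_by_threshold = {7.0: 2500, 7.5: 1250, 8.0: 1000}
n_true = 1000
fractions = {
    t: observed_false_positive_fraction(n, n_true)
    for t, n in counts_by_threshold.items()
}
```

Plotting such fractions against the measured template bias at each threshold is one way to check whether the two quantities are approximately proportional.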

Discussion

We show here that baited reconstruction with 2DTM can reveal high-resolution detail in regions not modeled in the template. Using a previously published single-particle dataset, we observe interactions of specific side chains with water and a ligand. Using particles localized in FIB-milled yeast lamellae, we observe specific binding of the drug CHX and polyamines to the ribosome in cells. We show that baited reconstruction can be used to recover high-resolution features in cells without template bias in regions omitted from the template, and quantify template bias in regions overlapping the template. The use of 2D images to generate high-resolution reconstructions makes this process significantly faster and less computationally expensive relative to tomography. Baited reconstruction is analogous to a ‘pulldown’ assay in molecular biology, wherein a ‘bait’ molecule is used to capture and identify novel interacting ‘prey’. This strategy is distinct from prior structure determination strategies because it makes use of a high-resolution template, traditionally avoided to prevent introducing template bias artifacts (Henderson, 2013). Baited reconstruction leverages the advantages of precise targeting with a high-resolution template, while avoiding template bias by focusing on regions omitted from, or external to, the template. Baited reconstruction can therefore leverage the wealth of existing structural data, as well as molecular models generated by the newly available structure prediction tools (Baek et al., 2021; Evans et al., 2022; Jumper et al., 2021), to approach biological and pharmacological questions in vitro and in vivo.

Implications for drug discovery

One of the most direct applications of this approach is to drug discovery. During the drug development pipeline, potentially thousands of variants of a lead compound are tested relative to a single protein target. Determining the structures of each in complex with its protein partner using the traditional single-particle cryo-EM workflow can be time-consuming and laborious, and often requires image processing expertise. The strategy presented here could be used to streamline this process substantially.

The ribosome is a major target of antibiotic and anticancer drugs. We have demonstrated that baited reconstruction with 2DTM can reveal drug-ribosome interactions directly in cells. The reconstructions are at comparable resolution relative to the state-of-the-art from tomography, while using a more streamlined data collection and processing pipeline that could be easily automated. This approach could therefore be used to more efficiently characterize the mechanism of action of antibiotic drugs directly in cells. Since 2DTM does not require purification, the interactions with other cellular complexes can also be investigated.

2DTM accelerates high-resolution in situ structure determination

Baited reconstruction offers a substantially faster and more streamlined pipeline for in situ structure determination than cryo-ET and subtomogram averaging. Current pipelines for in situ structure determination using cryo-ET and subtomogram averaging are time-consuming and require expert knowledge to curate an effective pipeline. We expect focused classification to identify sub-populations to further improve the resolution of in situ reconstructions from 2DTM targets. To help classify particles against a cellular background without introducing alignment errors (see Results), alignment parameters (Euler angles, x,y shifts) can remain fixed. While tomograms are required to provide the cellular 3D context of molecules, our work shows that it is not always necessary to use tomography to generate high-resolution reconstructions of macromolecular complexes in cells. 2DTM could reduce the manual effort and time for structure determination in cells when compared to subtomogram averaging, depending on the time it takes to annotate, refine, and classify the subtomograms.

Our approach also differs from the recently published isSPA method (Cheng et al., 2023; Cheng et al., 2021). isSPA follows the traditional single-particle workflow, applied to particles in their native environment. Particles are selected using a template that is limited to an intermediate resolution of 8 Å, resulting in an initial particle stack that contains many false positives. Selection of the top scores, followed by standard single-particle classification and alignment protocols, then yields reconstructions of the detected targets. This approach is particularly successful in situations where there is a high concentration of the particle of interest, such as Rubisco inside the carboxysome (Cheng et al., 2021), capsid proteins in viral capsids (Cheng et al., 2021), and phycobilisome and photosystem II in the thylakoid membranes inside P. purpureum cells (Cheng et al., 2023). In contrast, using 2DTM, we select targets with high fidelity, avoid false positives, and determine the molecule pose to high accuracy without the need for an intermediate reconstruction from the detected targets to act as reference for further refinement. By using the full resolution of the signal present in the images, 2DTM is also more sensitive than isSPA, detecting particles of 300 kDa in 150 nm lamellae (Figure 4—figure supplement 1). These differences mean that 2DTM can be used with fewer, and potentially smaller particles to achieve high-resolution structures compared to isSPA and other techniques following the canonical single-particle averaging workflow. As demonstrated here, the detection criterion used in 2DTM largely avoids overfitting artifacts in reconstructions by eliminating images that are not statistically distinguishable from noise. This makes 2DTM particularly useful for in situ structure determination, which is often limited by the low abundance of the target complexes inside the cell. 
By reducing the number of particles needed to achieve high-resolution reconstructions in cells, baited reconstruction with 2DTM will make it possible to determine the structures of less abundant complexes in cells.

Application to in vitro single-particle analysis

Our results from a single-particle dataset of purified Bgal demonstrate another use case for 2DTM. In the original analysis of this dataset using the traditional single-particle workflow, 136,013 particles were initially selected using template-based particle picking (Gautomatch 0.56, http://www.mrc-lmb.cam.ac.uk/kzhang/) (Saur et al., 2020). 2D classification, ab initio reconstruction, and further 3D classification eventually yielded a 2.2 Å reconstruction showing the bound ligand (PETG). The same result was achieved with a simple run of 2DTM, without requiring manual intervention or expert knowledge in the image processing workflow. In a separate 2DTM search using the first 277 images of the dataset and a crystal structure of GroEL (PDB: 1GRL) as a template – a particle of comparable size to Bgal – we detected only 53 targets above the default SNR threshold (excluding two images that had sharp black lines across them), and none above a threshold of 9.0. This further demonstrates the high level of discrimination of 2DTM between true and false positives, as shown earlier (Rickgauer et al., 2017). Besides offering a streamlined workflow, 2DTM can therefore also be used in the presence of impurities to reliably select the particles of interest. Using multiple templates, particles could be classified to arrive at quantitative estimates of particles occupying defined conformational states. The reduced need for sample purity and dataset size to perform such analyses may further accelerate the 2DTM workflow, compared to the traditional single-particle workflow, provided appropriate templates are available.

Furthermore, validation of map and model quality is a major challenge in cryo-EM. Current methods use low-pass filtered templates to avoid template bias at high spatial frequencies. Here, we present a quantitative estimate of local and global template bias in sequence space. This will allow the full resolution of the template to be used to localize particles more specifically and avoid false positives. This may assist in the identification of particle classes in the localization stage and can streamline the reconstruction process. Estimating template bias with baited reconstruction can provide a quantitative metric of map and model quality that may find broad utility in single-particle and in situ workflows.

Application to subtomogram averaging

Recently, higher-resolution template matching and finer angular sampling have also been explored for the analysis of cryo-ET 3D reconstructions (Chaillet et al., 2023; Cruz-León et al., 2023). This approach has clear advantages because it reduces false positives due to low-resolution overlap (Chaillet et al., 2023; Cruz-León et al., 2023) and provides more specific localization of targets in a crowded cellular environment. However, if the identified targets are subsequently used for subtomogram averaging, the reconstructions may exhibit template bias. Both baited reconstruction and the quality metrics we describe above could be applied to subtomogram averaging pipelines.

Future applications

We have shown that it is possible to recover single residue detail, and even the location of water molecules in the most favorable cases, using baited reconstruction with cryo-EM. This approach is analogous to the use of OMIT maps in X-ray crystallography to avoid model bias (Bhat and Cohen, 1984; Hodel et al., 1992) and the M-free score used to estimate reference bias in subtomogram averaging (Yu and Frangakis, 2014). Our approach differs by sampling random residues throughout the sequence and consequently provides higher precision in the estimation of template bias at high resolution. The observation that reconstructions with negligible template bias can be determined using particles identified with high-resolution template matching depends fundamentally on the noise model and threshold used to identify true positives and exclude false positives. We observe that the number of false positives does not perfectly match predictions based on a white Gaussian noise model, suggesting that the background is not perfectly Gaussian everywhere, for example due to local features with strong low-resolution contrast. When the noise model is uncertain or inaccurate, thresholding alone may not be sufficient to remove false positives. It is therefore important to validate features in reconstructions from targets found by template matching if they overlap with the template. In addition, overfitting could be assessed using the Omega metric described here, to quantify template bias in regions important for the study.

By further analogy to X-ray crystallography, the strategy we presented here could be extended by tiling through the template model, omitting overlapping features and combining the densities in each omitted region to form a continuous 3D map in which the density for each residue was omitted from the template, comparable in principle to a composite OMIT map (Terwilliger et al., 2008). While currently computationally expensive and therefore not feasible in most cases, this strategy could be regarded as a ‘gold standard’, yielding reconstructions that are devoid of template bias while retaining the benefits of precise localization and identification of the targets. If only some map regions are validated, as was done in the examples presented here, it is likely that the rest of the 3D map is also reliable, based on the assumption that false positives were excluded from the reconstruction. However, this reasoning may not strictly hold when there is partial and variable mismatch between the targets and the template, for example due to conformational heterogeneity in the detected target population. In such a situation, template bias may not be uniform across the reconstruction, and template bias has to be assessed more rigorously.

Materials and methods

Yeast culture and FIB-milling

S. cerevisiae strain BY4741 (ATCC) colonies were grown to mid-log phase in YPD, diluted to 10,000 cells/mL, and treated with 10 µg/mL CHX (Sigma) for 10 min at 30°C with shaking, as described in Lucas and Grigorieff, 2023. 3 µL were applied to a 2/1 or 2/2 Quantifoil 200 mesh SiO₂ Cu grid, allowed to rest for 15 s, and back-side blotted for 8 s at 27°C and 95% humidity, followed by plunge freezing in liquid ethane at –184°C using a Leica EM GP2 plunger. Frozen grids were stored in liquid nitrogen until FIB-milled. FIB-milling was performed as described in Lucas and Grigorieff, 2023.

Cryo-EM data collection and image processing

Bgal micrograph movie data were downloaded from the EMPIAR database (EMPIAR-10644) and processed with the cisTEM image processing package (Grant et al., 2018) using Unblur (Grant and Grigorieff, 2015) to align and average the exposure-weighted movie frames, and CTFFIND4 (Rohou and Grigorieff, 2015) to determine image defocus values. Four of the 562 micrographs were discarded based on lack of clear CTF Thon rings or ice crystal contamination. The remaining 558 images were processed using cisTEM’s template matching implementation (Lucas et al., 2021), yielding 59,259 targets with 2DTM SNRs above a threshold of 7.3.

Cryo-EM images of the yeast cytoplasm were previously published using imaging and processing pipelines as described in Lucas and Grigorieff, 2023, except that an additional seven images were included that were previously excluded because they contained organelle regions.

Simulating 3D templates

The atomic coordinates from the indicated PDBs were used to generate a 3D volume using the cisTEM (Grant et al., 2018) program simulate (Himes and Grigorieff, 2021). For the Bgal template, we used a pixel size of 0.672 Å, which is slightly smaller than published for this dataset (0.68 Å). The smaller pixel size was obtained by fitting the 1.7 Å X-ray structure (PDB: 1DP0) into the published 2.2 Å cryo-EM map of PETG-bound Bgal, and adjusting the pixel size of the map to achieve optimal density overlap between model and map in UCSF Chimera (Pettersen et al., 2004). Details on template generation are summarized in Table 1.

Table 1

Preparation and simulation of the 3D templates used in this study.

Template name	PDB	PDB modified?	Resolution of PDB map (Å)	Additional B-factor applied (Å²)	Pixel size (Å)	Box size (pixels)
Bgal	1DP0	10 Å sphere around Asp 102 deleted. HETATOMs excluded	1.7	50	0.672	512
LSU	6Q8Y	Only atomic coordinates corresponding to the LSU included. HETATOMs excluded	3.1	30	1.06	384
LSU (ΔL7A)	6Q8Y	Only atomic coordinates corresponding to the LSU included. Atomic coordinates corresponding to L7A excluded. HETATOMs excluded	3.1	30	1.06	384

2D template matching

2DTM was performed using the program match_template (Lucas et al., 2021) implemented in the cisTEM graphical user interface (Grant et al., 2018). For the Bgal searches, we used an in-plane angular step of 1.5°, an out-of-plane angular step of 2.5°, and D2 symmetry (no defocus search). This yielded a threshold of 7.30 calculated from a total number of ~6.88 × 10¹² search locations, identifying targets with an average of one false positive per image.

For the LSU, we used an in-plane angular step of 1.5°, an out-of-plane angular step of 2.5°, C1 symmetry, and a defocus search of ±1200 Å with a 200 Å step. This yielded a threshold of 7.85 calculated from a total number of ~4.88 × 10¹⁴ search locations, identifying targets with an average of one false positive per image.
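
The link between the number of search locations and the SNR threshold can be illustrated with a short calculation. The sketch below assumes, in line with the white Gaussian noise model referred to in this article, that 2DTM SNR values in the absence of signal follow a standard normal distribution, and it chooses the threshold at which the expected number of false positives over all search locations equals a requested value. The function name `snr_threshold` and its parameters are illustrative assumptions and are not part of cisTEM.

```python
from statistics import NormalDist


def snr_threshold(n_search_locations: float, n_false_positives: float = 1.0) -> float:
    """Return the SNR threshold for a requested expected false-positive count.

    Assumption: noise-only scores follow a standard normal distribution.
    """
    if n_search_locations <= 0:
        raise ValueError("n_search_locations must be positive")
    if not 0 < n_false_positives < n_search_locations:
        raise ValueError("n_false_positives must lie between 0 and n_search_locations")

    # One-sided tail probability allowed for each search location.
    tail_probability = n_false_positives / n_search_locations

    # inv_cdf returns the lower-tail quantile; by symmetry, the upper-tail
    # threshold is its negative. Working in the lower tail avoids forming
    # 1 - p, which would lose precision for such small probabilities.
    return -NormalDist().inv_cdf(tail_probability)


if __name__ == "__main__":
    # Search-location counts quoted in the text for the two searches.
    for label, n_locations in (("Bgal", 6.88e12), ("LSU", 4.88e14)):
        print(f"{label}: threshold = {snr_threshold(n_locations):.2f}")
```

Under these assumptions, the two search-space sizes quoted above give thresholds close to the reported values of 7.30 and 7.85. Where the background is not well described by Gaussian noise, the actual false-positive rate may differ from this prediction.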

Generating 3D reconstructions

The cisTEM program prepare_stack_matchtemplate (Lucas et al., 2021) was used to generate particle stacks from the refined coordinates from the 2DTM searches followed by reconstruction using the cisTEM program reconstruct3d as described in the text. Local resolution estimation was performed using the local resolution tool in Phenix (Liebschner et al., 2019) using a box size of 7 Å (Bgal) or 12 Å (ribosome). To visualize regions of the ribosome reconstruction outside of the LSU template, we used the UCSF ChimeraX (Pettersen et al., 2021) volume tools to segment the map using a radius of 3 Å from the template atoms. UCSF Chimera (Bgal) (Pettersen et al., 2004) or ChimeraX (ribosome) (Pettersen et al., 2021) were used for visualization.

Quantifying template bias

We wrote a program, measure_template_bias, which is part of the cisTEM software (Grant et al., 2018; source code available at https://github.com/timothygrant80/cisTEM, executables available at https://cistem.org/), to assess the degree of template bias present in a reconstruction calculated from detected 2DTM targets. The program requires two templates as input: one representing the full structure of the targets to be found (full template), and one in which elements of the structure have been omitted to serve as test regions for assessing template bias (omit template). The program also requires the two reconstructions that were calculated from targets detected by these two templates (full reconstruction and omit reconstruction). The two templates must be density-scaled identically to each other, as must the two reconstructions. Using the two templates, measure_template_bias calculates a difference map that is non-zero only in areas omitted from the omit template. The difference map is then used as a mask to identify the test regions used to assess template bias. The densities in the test regions are summed for the two input reconstructions, yielding $ρ_{full}$ and $ρ_{omit}$, respectively. The average degree of template bias ($Ω$) is then defined as the difference between $ρ_{full}$ and $ρ_{omit}$, relative to $ρ_{full}$ (Equation 1):

$$Ω = \frac{ρ_{full} - ρ_{omit}}{ρ_{full}} \qquad (1)$$

$Ω$ can assume values between 0 and 1 (100%), with 0 representing the least degree of template bias, and 1 representing the highest degree of template bias. If the degree of template bias has to be evaluated more locally, measure_template_bias also accepts a difference map, instead of the two templates, that will be used to identify the areas to be used for measuring template bias.
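
The calculation described in this section can be sketched in a few lines. This is a minimal illustration and not the source code of measure_template_bias: it assumes that the four volumes are supplied as flat sequences of voxel values on the same grid, that the difference between the two templates is converted into a binary mask using a small tolerance, and that Ω is the difference between the summed densities of the full and omit reconstructions in the masked region, relative to the full reconstruction. The function name `omega` and the `tolerance` parameter are hypothetical.

```python
from typing import Sequence


def omega(
    full_template: Sequence[float],
    omit_template: Sequence[float],
    full_reconstruction: Sequence[float],
    omit_reconstruction: Sequence[float],
    tolerance: float = 1e-6,
) -> float:
    """Estimate template bias from two templates and two reconstructions.

    Assumption: all volumes are flattened to the same length and are
    density-scaled consistently, as required in the text.
    """
    n_voxels = len(full_template)
    if not (
        len(omit_template) == n_voxels
        and len(full_reconstruction) == n_voxels
        and len(omit_reconstruction) == n_voxels
    ):
        raise ValueError("all volumes must have the same number of voxels")

    # Voxels where the templates differ mark the omitted test regions.
    mask = [abs(f - o) > tolerance for f, o in zip(full_template, omit_template)]
    if not any(mask):
        raise ValueError("the templates are identical; no test region found")

    # Sum the reconstructed densities inside the test regions only.
    rho_full = sum(v for v, keep in zip(full_reconstruction, mask) if keep)
    rho_omit = sum(v for v, keep in zip(omit_reconstruction, mask) if keep)
    if rho_full == 0:
        raise ValueError("full reconstruction has zero density in the test region")

    return (rho_full - rho_omit) / rho_full


if __name__ == "__main__":
    # Toy five-voxel "volumes"; voxels 2 and 3 are omitted from the omit template.
    full_t = [0.0, 1.0, 1.0, 1.0, 0.0]
    omit_t = [0.0, 1.0, 0.0, 0.0, 0.0]
    full_r = [0.0, 0.9, 1.0, 0.8, 0.1]
    omit_r = [0.0, 0.9, 0.9, 0.72, 0.1]
    print(f"Omega = {omega(full_t, omit_t, full_r, omit_r):.2f}")
```

In this toy example the omit reconstruction recovers 90% of the density seen in the full reconstruction within the test region, so the estimated bias is about 10%. Real volumes would be 3D density maps read from file and flattened before being passed to such a function.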

Data availability

All pre-existing and new computer code used in this study is available at https://github.com/timothygrant80/cisTEM (copy archived at Timothygrant80, 2023). Updated executables are available at https://cistem.org/.

The following previously published data sets were used

1. Saur M
2. Hartshorn MJ
3. Dong J
4. Reeks J
5. Bunkoczi G
6. Jhoti H
7. Williams PA
(2019) EMPIAR
Beta-galactosidase in complex with PETG.
https://doi.org/10.6019/EMPIAR-10644
1. Lucas BA
2. Grigorieff N
(2023) EMPIAR
Quantification of gallium cryo-FIB milling damage in biological lamella.
https://doi.org/10.6019/EMPIAR-11544

References

1. Baek M
2. DiMaio F
3. Anishchenko I
4. Dauparas J
5. Ovchinnikov S
6. Lee GR
7. Wang J
8. Cong Q
9. Kinch LN
10. Schaeffer RD
11. Millán C
12. Park H
13. Adams C
14. Glassman CR
15. DeGiovanni A
16. Pereira JH
17. Rodrigues AV
18. van Dijk AA
19. Ebrecht AC
20. Opperman DJ
21. Sagmeister T
22. Buhlheller C
23. Pavkov-Keller T
24. Rathinaswamy MK
25. Dalwadi U
26. Yip CK
27. Burke JE
28. Garcia KC
29. Grishin NV
30. Adams PD
31. Read RJ
32. Baker D
(2021) Accurate prediction of protein structures and interactions using a three-track neural network
Science 373:871–876.
https://doi.org/10.1126/science.abj8754
1. Bhat TN
2. Cohen GH
(1984) OMITMAP: An electron density map suitable for the examination of errors in a macromolecular model
Journal of Applied Crystallography 17:244–248.
https://doi.org/10.1107/S0021889884011456
(2023) Extensive angular sampling enables the sensitive localization of macromolecules in electron tomograms
International Journal of Molecular Sciences 24:13375.
https://doi.org/10.3390/ijms241713375
1. Cheng J
2. Li B
3. Si L
4. Zhang X
(2021) Determining structures in a native environment using single-particle cryoelectron microscopy images
Innovation 2:100166.
https://doi.org/10.1016/j.xinn.2021.100166
1. Cheng J
2. Liu T
3. You X
4. Zhang F
5. Sui SF
6. Wan X
7. Zhang X
(2023) Determining protein structures in cellular lamella at pseudo-atomic resolution by GisSPA
Nature Communications 14:1282.
https://doi.org/10.1038/s41467-023-36175-y
Preprint
1. Cruz-León S
2. Majtner T
3. Hoffmann PC
4. Kreysing JP
5. Tuijtel MW
6. Schaefer SL
7. Geißler K
8. Beck M
9. Turoňová B
10. Hummer G
(2023) High-Confidence 3D Template Matching for Cryo-Electron Tomography
bioRxiv.
https://doi.org/10.1101/2023.09.05.556310
Preprint
1. Evans R
2. O’Neill M
3. Pritzel A
4. Antropova N
5. Senior A
6. Green T
7. Žídek A
8. Bates R
9. Blackwell S
10. Yim J
11. Ronneberger O
12. Bodenstein S
13. Zielinski M
14. Bridgland A
15. Potapenko A
16. Cowie A
17. Tunyasuvunakool K
18. Jain R
19. Clancy E
20. Kohli P
21. Jumper J
22. Hassabis D
(2022) Protein Complex Prediction with AlphaFold-Multimer
bioRxiv.
https://doi.org/10.1101/2021.10.04.463034
(2014) Structural basis for the inhibition of the eukaryotic ribosome
Nature 513:517–522.
https://doi.org/10.1038/nature13737
1. Grant T
2. Grigorieff N
(2015) Measuring the optimal exposure for single particle cryo-EM using a 2.6 Å reconstruction of rotavirus VP6
eLife 4:e06980.
https://doi.org/10.7554/eLife.06980
(2018) cisTEM, user-friendly software for single-particle image processing
eLife 7:e35383.
https://doi.org/10.7554/eLife.35383
(2022) Recent advances and current trends in cryo-electron microscopy
Current Opinion in Structural Biology 77:102484.
https://doi.org/10.1016/j.sbi.2022.102484
1. Harauz G
2. van Heel M
(1986) Exact filters for general geometry three dimensional reconstruction
Optik 73:146–156.
1. Henderson R
(2013) Avoiding the pitfalls of single particle cryo-electron microscopy: Einstein from noise
PNAS 110:18037–18041.
https://doi.org/10.1073/pnas.1314449110
1. Himes B
2. Grigorieff N
(2021) Cryo-TEM simulations of amorphous radiation-sensitive samples using multislice wave propagation
IUCrJ 8:943–953.
https://doi.org/10.1107/S2052252521008538
(1992) Model bias in macromolecular crystal structures
Acta Crystallographica Section A Foundations of Crystallography 48:851–858.
https://doi.org/10.1107/S0108767392006044
1. Juers DH
2. Jacobson RH
3. Wigley D
4. Zhang XJ
5. Huber RE
6. Tronrud DE
7. Matthews BW
(2000) High resolution refinement of beta-galactosidase in a new crystal form reveals multiple metal-binding sites and provides a structural basis for alpha-complementation
Protein Science 9:1685–1699.
https://doi.org/10.1110/ps.9.9.1685
1. Jumper J
2. Evans R
3. Pritzel A
4. Green T
5. Figurnov M
6. Ronneberger O
7. Tunyasuvunakool K
8. Bates R
9. Žídek A
10. Potapenko A
11. Bridgland A
12. Meyer C
13. Kohl SAA
14. Ballard AJ
15. Cowie A
16. Romera-Paredes B
17. Nikolov S
18. Jain R
19. Adler J
20. Back T
21. Petersen S
22. Reiman D
23. Clancy E
24. Zielinski M
25. Steinegger M
26. Pacholska M
27. Berghammer T
28. Bodenstein S
29. Silver D
30. Vinyals O
31. Senior AW
32. Kavukcuoglu K
33. Kohli P
34. Hassabis D
(2021) Highly accurate protein structure prediction with AlphaFold
Nature 596:583–589.
https://doi.org/10.1038/s41586-021-03819-2
1. Lareau LF
2. Hite DH
3. Hogan GJ
4. Brown PO
(2014) Distinct stages of the translation elongation cycle revealed by sequencing ribosome-protected mRNA fragments
eLife 3:e01257.
https://doi.org/10.7554/eLife.01257
1. Liebschner D
2. Afonine PV
3. Baker ML
4. Bunkóczi G
5. Chen VB
6. Croll TI
7. Hintze B
8. Hung LW
9. Jain S
10. McCoy AJ
11. Moriarty NW
12. Oeffner RD
13. Poon BK
14. Prisant MG
15. Read RJ
16. Richardson JS
17. Richardson DC
18. Sammito MD
19. Sobolev OV
20. Stockwell DH
21. Terwilliger TC
22. Urzhumtsev AG
23. Videau LL
24. Williams CJ
25. Adams PD
(2019) Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix
Acta Crystallographica. Section D, Structural Biology 75:861–877.
https://doi.org/10.1107/S2059798319011471
1. Lucas BA
2. Himes BA
3. Xue L
4. Grant T
5. Mahamid J
6. Grigorieff N
(2021) Locating macromolecular assemblies in cells by 2D template matching with cisTEM
eLife 10:e68946.
https://doi.org/10.7554/eLife.68946
(2022) In situ single particle classification reveals distinct 60S maturation intermediates in cells
eLife 11:e79272.
https://doi.org/10.7554/eLife.79272
1. Lucas BA
2. Grigorieff N
(2023) Quantification of gallium cryo-FIB milling damage in biological lamellae
PNAS 120:e2301852120.
https://doi.org/10.1073/pnas.2301852120
1. Nakane T
2. Kotecha A
3. Sente A
4. McMullan G
5. Masiulis S
6. Brown PMGE
7. Grigoras IT
8. Malinauskaite L
9. Malinauskas T
10. Miehling J
11. Uchański T
12. Yu L
13. Karia D
14. Pechnikova EV
15. de Jong E
16. Keizer J
17. Bischoff M
18. McCormack J
19. Tiemeijer P
20. Hardwick SW
21. Chirgadze DY
22. Murshudov G
23. Aricescu AR
24. Scheres SHW
(2020) Single-particle cryo-EM at atomic resolution
Nature 587:152–156.
https://doi.org/10.1038/s41586-020-2829-0
1. Pettersen EF
2. Goddard TD
3. Huang CC
4. Couch GS
5. Greenblatt DM
6. Meng EC
7. Ferrin TE
(2004) UCSF Chimera--A visualization system for exploratory research and analysis
Journal of Computational Chemistry 25:1605–1612.
https://doi.org/10.1002/jcc.20084
1. Pettersen EF
2. Goddard TD
3. Huang CC
4. Meng EC
5. Couch GS
6. Croll TI
7. Morris JH
8. Ferrin TE
(2021) UCSF ChimeraX: Structure visualization for researchers, educators, and developers
Protein Science 30:70–82.
https://doi.org/10.1002/pro.3943
(2017) Single-protein detection in crowded molecular environments in cryo-EM images
eLife 6:e25648.
https://doi.org/10.7554/eLife.25648
Preprint
(2020) Label-Free Single-Instance Protein Detection in Vitrified Cells
bioRxiv.
https://doi.org/10.1101/2020.04.22.053868
1. Rohou A
2. Grigorieff N
(2015) CTFFIND4: Fast and accurate defocus estimation from electron micrographs
Journal of Structural Biology 192:216–221.
https://doi.org/10.1016/j.jsb.2015.08.008
1. Rosenthal PB
2. Henderson R
(2003) Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy
Journal of Molecular Biology 333:721–745.
https://doi.org/10.1016/j.jmb.2003.07.013
(2022) Cryomicroscopy in situ: what is the smallest molecule that can be directly identified without labels in a cell?
Faraday Discussions 240:277–302.
https://doi.org/10.1039/d2fd00076h
1. Saur M
2. Hartshorn MJ
3. Dong J
4. Reeks J
5. Bunkoczi G
6. Jhoti H
7. Williams PA
(2020) Fragment-based drug discovery using cryo-EM
Drug Discovery Today 25:485–490.
https://doi.org/10.1016/j.drudis.2019.12.006
1. Shen L
2. Su Z
3. Yang K
4. Wu C
5. Becker T
6. Bell-Pedersen D
7. Zhang J
8. Sachs MS
(2021) Structure of the translating Neurospora ribosome arrested by cycloheximide
PNAS 118:e2111862118.
https://doi.org/10.1073/pnas.2111862118
1. Stewart A
2. Grigorieff N
(2004) Noise bias in the refinement of structures derived from single particles
Ultramicroscopy 102:67–84.
https://doi.org/10.1016/j.ultramic.2004.08.008
1. Tegunov D
2. Xue L
3. Dienemann C
4. Cramer P
5. Mahamid J
(2021) Multi-particle cryo-EM refinement with M visualizes ribosome-antibiotic complex at 3.5 Å in cells
Nature Methods 18:186–193.
https://doi.org/10.1038/s41592-020-01054-7
(2008) Iterative-build OMIT maps: map improvement by iterative model building and refinement without model bias
Acta Crystallographica. Section D, Biological Crystallography 64:515–524.
https://doi.org/10.1107/S0907444908004319
Software
1. Timothygrant80
(2023) cisTEM, version swh:1:rev:f635c9b2ce0fbb2a35066126b52a52b7ab42be31
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:289fa0fe8914f57956206441493c534392265a13;origin=https://github.com/timothygrant80/cisTEM;visit=swh:1:snp:a93fac4ea39f3bf62e5210db3ab45798f6e96ec3;anchor=swh:1:rev:f635c9b2ce0fbb2a35066126b52a52b7ab42be31
1. Wu CCC
2. Zinshteyn B
3. Wehner KA
4. Green R
(2019) High-resolution ribosome profiling defines discrete ribosome elongation states and translational regulation during cellular stress
Molecular Cell 73:959–970.
https://doi.org/10.1016/j.molcel.2018.12.009
1. Yip KM
2. Fischer N
3. Paknia E
4. Chari A
5. Stark H
(2020) Atomic-resolution protein structure determination by cryo-EM
Nature 587:157–161.
https://doi.org/10.1038/s41586-020-2833-4
1. Yu Z
2. Frangakis AS
(2014) M-free: scoring the reference bias in sub-tomogram averaging and template matching
Journal of Structural Biology 187:10–19.
https://doi.org/10.1016/j.jsb.2014.05.007

Peer review

Reviewer #1 (Public Review):

This work continues a series of recent publications from the Grigorieff lab (https://doi.org/10.7554/eLife.25648, https://doi.org/10.7554/eLife.68946, https://doi.org/10.7554/eLife.79272, https://doi.org/10.1073/pnas.2301852120) showcasing the development of high-resolution 2D template matching (2DTM) for detection and reconstruction of macromolecules in cryo-electron microscopy (cryo-EM) images of crowded cellular environments. It is well known in the field of cryo-EM that searching noisy images with a template can result in retrieval of the template itself when averaging the candidate particles detected, an effect known as "Einstein-from-noise" (https://doi.org/10.1073/pnas.1314449110). Briefly, this occurs because it is statistically likely to find a match to an arbitrary motif over a large noisy dataset just by chance. The effect can be mitigated for example by limiting the resolution of the template, but this prevents the accurate detection of macromolecules in a crowded environment, as their "fingerprint" lies in the high-resolution range (https://doi.org/10.7554/eLife.25648). Here, the authors show through several experiments on in vitro and in situ data that features as small as drug compounds and water molecules can be reliably retrieved by 2DTM if they are searched by a template (the "bait") that contains expected neighboring features but not the targets themselves.

The ideas are generally clearly presented with appropriate references to related work, and claims are well supported by the data. The experiments for verifying the density of the ribosomal protein L7A, as well as the systematic removal of residues from the template model to assess bias, are particularly clever.

The revised version of the manuscript addresses essentially all of the concerns raised previously by this reviewer, with the addition of figures and extended discussion of the key concepts.

https://doi.org/10.7554/eLife.90486.3.sa1

Reviewer #2 (Public Review):

This paper by Lucas et al follows on from earlier work by the same group. They use high-resolution 2D template matching (2DTM) to find particles of a given target structure in 2D cryo-EM images, either of in vitro single-particle samples or of more complicated samples, such as FIB-milled cells (which would otherwise perhaps be used for 3D electron tomography). One major concern for high-resolution template matching has been the amount of model bias that gets introduced into a reconstruction that is calculated straight from the orientations and positions identified by the projection matching algorithm. This paper assesses the amount of model bias that gets introduced in high-resolution features of such maps.

For a high-signal-to-noise in vitro single-particle cryo-EM data set, the authors show that their approach does not yield much model bias. This is probably not very surprising, as their method is basically a low false-positive particle picker, which works very well on such data. Still, I guess that is the whole point of it, and it is good to see that they can reconstruct density for a small-molecule compound that was not present in the original template.

For FIB-milled lamellae of yeast cells with stalled ribosomes, the SNR is much lower and the dangers of model bias will be higher. This is also evidenced by the observation that further refinement of the orientations and positions initially identified by 2DTM worsens the map. This is obviously a more relevant SNR regime to assess their method. Still, they show convincing density for the CHX compound that was not present in the template, but was there in the reconstruction from the identified particles.

Quantification of the amount of model bias is then performed using omit maps, where every 20th residue is removed from the template and the corresponding reconstructions are compared (for those residues) with the full-template reconstructions. As expected, model bias increases with lower thresholds for the picking. Some model bias (Omega=8%) remains even for very high thresholds. The authors state this may be due to overfitting of noise when template-matching true particles, rather than to the introduction of false positives. Probably, that still represents some sort of problem. Especially because the authors then go on to show that their expected number of false positives does not always match the actual number of false positives, probably due to inaccuracies in the noise model for more complicated images, this may warrant further in-depth discussion in a revised manuscript.

Overall, I think this paper is well written and it has made me think differently (again) about the 2DTM technique and its usefulness in various applications, as outlined in the Discussion. Therefore, it will be a constructive contribution to the field.

After the first round of review, the authors addressed most points raised in a satisfying manner, which has led to a further (relatively minor) improvement of the manuscript.

https://doi.org/10.7554/eLife.90486.3.sa2

Reviewer #3 (Public Review):

The authors evaluate the effect of high-resolution 2D template matching on template bias in reconstructions and provide a quantitative metric for overfitting. It is an interesting manuscript that made me reevaluate and correct some mistakes in my understanding of overfitting and template bias, and I'm sure it will be of great use to others in the field.

The revised version of this manuscript addresses all of my concerns. The newly added Figure 4 supplement 1 provides a sobering outlook for the fraction of the proteome we can hope to identify in situ.

https://doi.org/10.7554/eLife.90486.3.sa3

Author response

The following is the authors’ response to the original reviews.

The authors thank the reviewers for their thoughtful and constructive comments. We address each comment below and have uploaded a revised manuscript.

Public Reviews

1. One key point that could use further clarification is how to interpret densities in the reconstruction that do overlap with the template. If the omitted regions can be reliably reconstructed, and the density is smooth throughout, it implies the detected particles are not only (mostly) true positives but also their poses must be essentially correct. Therefore, why cannot the entire reconstruction be trusted, including portions overlapping with the template? In the "Future applications" section, the authors state that in order to obtain a reconstruction that is entirely devoid of template bias, it would be necessary to successively omit parts of the template structure through its entirety. I wonder if that is really necessary and if the presented approach of omitting template portions could be better framed as a "gold-standard" validation procedure.

Our assumption is indeed that the entire reconstruction can be trusted if the omitted features are faithfully reproduced in the reconstruction. We have added a sentence in the discussion to clarify this. However, we think that assessing template bias will still require the omit test (see also our reply below). Also, as discussed in the manuscript, there is likely a little bias left, even if it is not directly visible in the reconstruction. Therefore, if the goal is an entirely unbiased reconstruction, the only way will be to successively omit parts of the template structure throughout the template.

1. In other words, given the compelling evidence provided by the reconstructions in the omitted areas, I find it hard to imagine how the procedure would be "hallucinating" features in the rest of the structure, as the entire reconstruction depends on the same pose and defocus parameters. A possible experiment to test this hypothesis would be to go the opposite way, deliberately adding an unrealistic feature to the bait and checking whether it comes up in the reconstruction, while at the same time checking how it behaves in omitted parts.

Template bias might be generated in different ways. A common situation is the presence of noise, which causes biased deviations of the best template match from the “true” match that would just align the target signal to the template. Another type of bias may occur when there is a mismatch between the template and the detected target. The target may still be detected if there is sufficient structural overlap with the template. Since there might not be a clear “correct” alignment of a mismatching target to the template, the best alignment may again be biased, generating artificial density in the reconstruction. This second case may produce bias that is more pronounced in the mismatching regions. The different origins of bias will have to be investigated more thoroughly in another study. For the present study, however, we maintain that unless there is some assessment of bias in a given location, one cannot completely rule out bias based on the absence of it elsewhere in the reconstruction.

1. When assessing their approach to in situ data (the yeast ribosome), it is intriguing to see that the resolution downgraded from 3.1 to 8 Å when refinement of the particle poses against the current reconstruction was attempted. The authors do provide some possible explanations, such as the reduced signal of the reconstruction at high resolution and the crowded background, but it leaves one to wonder if this means that a 3.1 Å reconstruction could never be obtained from these data by conventional single-particle analysis procedures.

The refinement results with our in situ data do indeed appear to be limited to low resolution when using the conventional single-particle pipeline and software. It might be possible to improve refinement by introducing certain priors, filters and masking functions that are optimized for the increased background and spectral properties of in situ data. Also, we have not tested all available software, and some might perform better than others. It is worth noting that in a different study using our data, by Cheng et al (2023) and cited in our manuscript, the resolution of the refined reconstruction using different software was ~7 Å, i.e., close to what we report here. Finally, refinement of the detected targets against a high-resolution template does work, but since it involves the template, we regard this as part of the template matching process.

1. Furthermore, in the section "Quantifying template bias", the authors make the intriguing statement that there can still be some overfitting of noise even in true positives. I understand this overfitting would occur in the form of errors in the pose and defocus estimation, but a clarification would be helpful.

We have added a sentence in the Discussion to clarify where this bias may come from.

1. In the Discussion, the claim that "it is not necessary to use tomography to generate high-resolution reconstructions of macromolecular complexes in cells" is a misconception, at least in part. As demonstrated in works by the same group and others (https://doi.org/10.1016/j.xinn.2021.100166, https://doi.org/10.1038/s41467-023-36175-y, https://doi.org/10.1038/s41586-023-05831-0), 2D imaging of native cellular environments does offer a faster and better way to obtain high-resolution reconstructions compared to tomography. However, tomography provides the entire 3D context of the macromolecules, such as their localization to membranes and the cellular architecture, which can be readily visualized in a tomogram even at low resolution, so methods for structure determination from tilt series data such as subtomogram averaging remain of paramount importance. Most likely, a combination of 2D and 3D imaging approaches will be necessary to retrieve both the highest structural resolution and their cellular context to address biological questions.

We agree and have modified our statement accordingly.

1. The "Materials and Methods" section lacks a description of transmission electron microscopy data collection.

We are sorry for this oversight and have added these details.

1. Finally, the preprint version of this work posted on bioRxiv (https://doi.org/10.1101/2023.07.03.547552) contains the following competing interests statement, which is missing from the submitted version: "The authors are listed as inventors on a closely related patent application named "Methods and Systems for Imaging Interactions Between Particles and Fragments", filed on behalf of the University of Massachusetts."

This is correct. The statement was missing in the first version of the uploaded manuscript and was added after consultation with the eLife editorial office.

1. Quantification of the amount of model bias is then performed using omit maps, where every 20th residue is removed from the template and corresponding reconstructions are compared (for those residues) with the full-template reconstructions. As expected, model bias increases with lower thresholds for the picking. Some model bias (Omega=8%) remains even for very high thresholds. The authors state this may be due to overfitting of noise when template-matching true particles, instead of introducing false positives. Probably, that still represents some sort of problem. Especially because the authors then go on to show that their expectation of the number of false positives does not always match the correct number of false positives, probably due to inaccuracies in the noise model for more complicated images. This may warrant further in-depth discussion in a revised manuscript.

We have added further thoughts regarding the mismatch between expected and actual number of false positives in the Discussion section. A full understanding of the issue likely requires further study, which is currently underway.
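The omit test discussed in this comment can be illustrated with a minimal sketch. It assumes the template is available as an ordered list of residue identifiers. The function name `split_omit_template`, the `offset` parameter, and the choice of which residue starts the pattern are hypothetical and are not taken from the manuscript or from cisTEM.

```python
def split_omit_template(residue_ids, step=20, offset=0):
    """Partition residues into those kept in the template and those omitted.

    Every `step`-th residue (starting at position `offset`) is omitted, so the
    omitted residues can later be inspected in the reconstruction without
    having contributed to the search template.
    """
    if not 0 <= offset < step:
        raise ValueError("offset must satisfy 0 <= offset < step")
    kept, omitted = [], []
    for index, residue_id in enumerate(residue_ids):
        if index % step == offset:
            omitted.append(residue_id)
        else:
            kept.append(residue_id)
    return kept, omitted


# Hypothetical 100-residue chain numbered 1..100 (illustrative only).
residues = list(range(1, 101))
kept, omitted = split_omit_template(residues)

# Cycling the offset from 0 to step - 1 omits each residue exactly once,
# which corresponds to successively omitting parts throughout the template.
all_omitted = sorted(
    residue
    for offset in range(20)
    for residue in split_omit_template(residues, offset=offset)[1]
)
```

Cycling the offset is one way to picture the successive omission mentioned in the response above; each pass would still require its own search and reconstruction.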

1. The authors evaluate the effect of high-resolution 2D template matching on template bias in reconstructions, and provide a quantitative metric for overfitting. It is an interesting manuscript that made me reevaluate and correct some mistakes in my understanding of overfitting and template bias, and I'm sure it will be of great use to others in the field. However, its main point is to promote high-resolution 2D template matching (2DTM) as a more universal analysis method for in vitro and, more importantly, in situ data. While the experiments performed to that end are sound and well-executed in principle, I fail to make that specific conclusion from their results.

We do not see 2DTM as a more universal analysis method for in vitro and in situ data, but simply as another method that can be used. We have added a sentence in the introduction to clarify this.

1. The authors correctly point out that overfitting is largely enabled by the presence of false-positives in the data set. They go on to perform their in situ experiments with ribosomes, which provide an extremely favorable amount of signal that is unrealistic for the vast majority of the proteome. This seems cherry-picked to keep the number of false-positives and false-negatives low. The relationship between overfitting/false-positive rate and the picking threshold will remain the same for smaller proteins (which is a very useful piece of knowledge from this study). However, the false-negative rate will increase a lot compared to ribosomes if the same high picking threshold is maintained. This will limit the applicability of 2DTM, especially for less-abundant proteins.

The reviewer is correct that the lower SNR of smaller targets poses a fundamental limit to 2DTM. We have stated this in previous studies and have added a sentence in the introduction of the current manuscript to clarify this.

1. I would like to see an ablation study: Take significantly smaller segments of the ribosome (for which the authors already have particle positions from full-template matching, which are reasonably close to the ground-truth), e.g. 50 kDa, 100 kDa, 200 kDa etc., and calculate the false-negative rate for the same picking threshold. If the resulting number of particles does plummet, it would be very helpful to discuss how that affects the utility of 2DTM for non-ribosomes in situ.

The suggested ablation study is a good idea and was reported by Rickgauer et al (2020), cited in our manuscript. We added our own analysis for this dataset in Figure 4-figure supplement 1 and show the proportion of LSUs detected as a function of template mass, indicating a detection limit of ~300 kDa. We also added a note in the Results section to explain that the threshold we use to limit false positives means that there are also false negatives, with a rate that depends on their molecular mass.

1. Another point of concern is the dramatic resolution decrease to 8 Å after multiple iterations of refinement against experimental reconstructions described in line 159. Was this a local search from the poses provided by 2DTM, or something more global? While this is not a manifestation of overfitting as the authors have conclusively shown, I think it adds an important point to the ongoing "But do we really need tomograms, or can we just 2D everything?" debate in the field, which is also central to the 2D part of 2DTM. Reaching 8 Å with 12k ribosome particles would be considered a rather poor subtomogram averaging result these days. Being in the "we need tilt series to be less affected by non-Gaussian noise" camp myself, I wonder if this indicates 2D images are inherently worse for in situ samples. If they are, the same limitations would extend to template matching. In that case, shouldn't the authors advocate for 3DTM instead of 2DTM? It may not be needed for ribosomes, but could give smaller proteins the necessary edge.

We have extensively discussed the advantages and disadvantages of both tomography and 2DTM (Lucas et al, 2021) and think it is not useful to talk in terms of “better” and “worse”. Instead, each technique has its areas of application, and we maintain that a combination of the two may give the best results. The limitation of 8 Å does not apply to reconstructions aligned against high-resolution templates, as demonstrated in the present study. Regarding noise models, there is also need for these in 3DTM, as explained in recent publications: Maurer et al (2023), bioRxiv, doi.org/10.1101/2023.09.06.556487; Cruz-León et al (2023), bioRxiv, doi.org/10.1101/2023.09.05.556310; Chaillet et al (2023), Int. J. Mol. Sci. 24, 13375.

1. Right now, this study is also an invitation to practitioners who do not understand the picking threshold used here and cannot relate it to other template-matching programs to do a lot of questionable template matching and claim that the results are true because templates are "unoverfittable". I think such undesirable consequences should be discussed prominently.

We have added a discussion of this point in the Discussion section.

Recommendations for the authors

1. Lines 58-59: What does "nominally untilted" mean? Has the lamella pre-tilt (milling angle) been taken into account or not? If yes, how?

The lamella milling angle was not taken into account, so there is a tilt built into the sample of about 8° that was not compensated for by a counter-tilt of the microscope goniometer. We have added a note to explain this in the text of the manuscript.

1. Lines 113-114: A brief explanation of the threshold calculation method from Rickgauer et al, 2017 to achieve an expected false positive rate of one per micrograph would be helpful here.

We describe the equation for estimating the false discovery rate later in the manuscript. We have added a note in the text to point the reader to the relevant section of the manuscript.
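As a rough illustration of the idea behind such a threshold, the sketch below assumes that SNR values at non-target search positions follow a standard normal distribution, and solves for the threshold at which the expected number of false positives equals a chosen value. This Gaussian assumption, the function name `snr_threshold`, and the example search-space size are ours for illustration; the sketch does not reproduce the equation given in the manuscript or in Rickgauer et al (2017).

```python
from statistics import NormalDist


def snr_threshold(num_search_positions, expected_false_positives=1.0):
    """Return the SNR threshold t such that, under a standard normal noise
    model, num_search_positions * P(Z > t) == expected_false_positives."""
    tail_probability = expected_false_positives / num_search_positions
    if not 0.0 < tail_probability < 1.0:
        raise ValueError("expected false positives must be below the number of searches")
    # P(Z > t) = p is equivalent to t = -inv_cdf(p) by symmetry of the normal
    # distribution; using the lower tail avoids rounding 1 - p to 1.0.
    return -NormalDist().inv_cdf(tail_probability)


# Hypothetical search space: 1e13 combinations of positions, orientations
# and defocus values per micrograph (illustrative number only).
example_threshold = snr_threshold(1e13)

# A larger search space demands a higher threshold for the same
# expected number of false positives.
larger_search_threshold = snr_threshold(1e14)
```

Under this model the threshold grows only slowly with the size of the search space, which is why adding search dimensions raises the threshold modestly rather than proportionally.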

1. For consistency, it would be interesting to include a plot of the SNR peaks found by 2DTM in the in situ dataset, that could be directly compared to Figure 1 - figure supplement 1B.

We have added this to Figure 2 - figure supplement 1A-C, to directly compare to Figure 1 – figure supplement 1A-C.

1. Showing model-map FSC curves between the density retrieved from the omitted areas and their respective models would provide further evidence not only that they are correct but to what extent.

An FSC calculation would be challenging for small regions, such as side chains and drugs, due to masking artifacts. Moreover, the model was built into an in vitro determined map and was not fit into the in vivo map calculated here. Therefore, deviations between the map and model may reflect differences between the two conditions and may not reflect the agreement of the map to the in vivo structure.

1. Lines 128-130: The figure references are wrong. Here, Figure 1B should probably be Figure 1A (or 1B), and Figure 1C clearly refers to Supplementary Figure 1F (FSC curve).

We have corrected the incorrect figure references.

1. Line 125: Wrong figure reference, Figure 1A here refers to Supplementary Figure 1B (cross-correlation peaks).

We have corrected the incorrect figure references.

1. I haven't been able to find mention of code availability in the manuscript. Given that it is a major outcome of the study, I think it should be provided.

The code is available from the cisTEM repository, github.com/timothygrant80/cisTEM, and an executable version of the program measure_template_bias has been posted for download on the cisTEM webpage, cistem.org. We have added a note in the Methods section to point the readers to these resources.

1. Line 50: "An additional complication of subtomogram averaging for in situ imaging is the selection of valid targets" - This is not specific to subtomogram averaging, but to in situ samples.

We agree and have updated the text to reflect this.

1. Line 77: "if this is true for high-resolution features, which are more susceptible to noise overfitting" - This is not intuitive to me. High-resolution features require more information to be overfitted with a constant set of model parameters, thus making their overfitting harder.

The reviewer is correct that there is more information at high resolution, partially compensating for the low SNR. However, the overall refinement behavior is still dominated by overfitting at high resolution, as we demonstrated in an earlier publication: Stewart & Grigorieff (2004), Ultramicroscopy 102, 67–84.

1. Line 316: "Baited reconstruction is substantially faster and a more streamlined" - To back this and other similar statements, it would be helpful if the authors provided some time measurements for the execution of their potentially very computationally expensive search.

The current implementation of 2DTM requires 45 GPU hours per template per K3 image to search 13 defocus planes. However, a fair comparison should also consider the manual work for annotation, as well as the additional processing needed to align and classify sub-tomograms to generate high-resolution averages. These are highly project-dependent and can exceed the time required for 3DTM manifold. We have clarified this in our Discussion section.

1. Line 319: "We expect focused classification to identify sub-populations to further improve the resolution" - How would this work if refining the 2D data without a high-resolution template resulted in significantly worse resolution even for a ribosome? Or is this meant to be done with prior knowledge of every state?

Classification can be done using existing single particle software. To avoid alignment errors, as described above, particle alignment angles and shifts are fixed during classification. This leaves only the particle occupancy per class to be refined, which appears to lead to good classification. We have added a brief note to explain this strategy. However, since this is not shown in this manuscript, we have not added a more extensive discussion of particle classification.

1. Line 354: "without requiring manual intervention or expert knowledge" - Previous expert knowledge was arguably provided in the form of a high-resolution structure.

We agree with the reviewer and have clarified our statement.

https://doi.org/10.7554/eLife.90486.3.sa4

Article and author information

Author details

Bronwyn A Lucas
1. RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, United States
2. Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, United States
3. Center for Computational Biology, University of California Berkeley, Berkeley, United States
Contribution
Conceptualization, Data curation, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
bronwynlucas@berkeley.edu

Competing interests
These authors are listed as inventors on a closely related patent application named "Methods and Systems for Imaging Interactions Between Particles and Fragments", filed on behalf of the University of Massachusetts. The patent relates to the use of the 2DTM method described in this manuscript, to image ligands and drugs bound to larger complexes that can be detected by 2DTM.

ORCID iD: 0000-0001-9162-0421
Benjamin A Himes
1. RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, United States
2. Howard Hughes Medical Institute, Chevy Chase, United States
Contribution
Conceptualization, Formal analysis, Writing – review and editing

Competing interests
These authors are listed as inventors on a closely related patent application named "Methods and Systems for Imaging Interactions Between Particles and Fragments", filed on behalf of the University of Massachusetts. The patent relates to the use of the 2DTM method described in this manuscript, to image ligands and drugs bound to larger complexes that can be detected by 2DTM.

ORCID iD: 0000-0001-7777-0298
Nikolaus Grigorieff
1. RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, United States
2. Howard Hughes Medical Institute, Chevy Chase, United States
Contribution
Conceptualization, Software, Formal analysis, Supervision, Funding acquisition, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
niko@grigorieff.org

Competing interests
This author is also listed as an inventor on a closely related patent application named "Methods and Systems for Imaging Interactions Between Particles and Fragments", filed on behalf of the University of Massachusetts. The patent relates to the use of the 2DTM method described in this manuscript, to image ligands and drugs bound to larger complexes that can be detected by 2DTM. Reviewing editor, eLife

ORCID iD: 0000-0002-1506-909X

Funding

Howard Hughes Medical Institute

Nikolaus Grigorieff

Chan Zuckerberg Initiative (2021-234617 (5022))

Nikolaus Grigorieff

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank the members of the Grigorieff lab for helpful discussions. We are also grateful for the use of and support from the cryo-EM facility at UMass Chan Medical School. BAL and NG gratefully acknowledge funding from the Chan Zuckerberg Initiative, grant # 2021-234617 (5022).

Senior Editor

Merritt Maduke, Stanford University School of Medicine, United States

Reviewing Editor

Sjors HW Scheres, MRC Laboratory of Molecular Biology, United Kingdom

Version history

Sent for peer review: June 30, 2023
Preprint posted: July 11, 2023 (view preprint)
Reviewed preprint version 1: September 14, 2023 (view preprint)
Reviewed preprint version 2: November 10, 2023 (view preprint)
Version of Record published: November 27, 2023 (version 1)

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.90486. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.