Single-cell transcriptome atlases of oilseed and lignocellulosic crops

In addition to building single-cell gene expression atlases of our four plant bioenergy species, the Plant Systems initiative also provides chromatin accessibility information to complement expression atlases, and develops spatial single-cell expression atlases of key tissues. This work enables deconstruction and targeted manipulation for biofuels and bioproducts, in collaboration with partner BRCs, through descriptions of transcriptome heterogeneity at the single-cell level, identification of cell types of interest, and spatial mapping of pathways of interest. Detailed and spatially resolved atlases describe gene transcription activities underlying biomass production, enabling their targeted manipulation to increase yield and usability.

Single-cell atlases offer a trove of data but are not available for the target bioenergy crops. Cell-specific transcriptomes of many cell types, including cells crucial for biofuel development, are not available in any species. Chromatin accessibility data through ATAC-seq can be used to identify cis-regulatory elements and constrain co-expression networks. A major limitation for the broader application of single-cell technologies in crops is the need to decode the identity of the cell populations using reporter gene activity of marker genes or RNA in situ hybridization, which are low-throughput and cumbersome. Spatially resolved scRNAseq overcomes this limitation and is particularly well-suited for use in the target species.

Since many plant products are synthesized in specific cell types, metabolic activities and the associated gene expression vary substantially throughout the plant body. Co-expression analyses of metabolic pathways, transcriptional networks, and signaling pathway components permit the reconstruction of metabolism, its regulation, and cross-talk between cells at spatial single-cell resolution.

Single-cell transcriptome profiling offers unprecedented resolution of the transcriptional programs of thousands of cells representing distinct cell types or cell states, including rare populations that could not be isolated by other means. Open chromatin profiles for each nucleus identify clusters with similar called peaks, defining cell types and states for comparison with single-cell expression data. Spatial expression information provides a layer of understanding of gene regulation that is lost in conventional single-cell transcriptome profiling, which relies on cell dissociation. Since plant development, growth, and metabolism are strongly position-dependent, decoding the spatial organization of the expression domains of thousands of genes provides key data points for predictive models of gene regulation. Different cell types contribute to the bulk of the biomass in the target lignocellulosic crops, and the mechanisms controlling lignin structure and deposition are expected to diverge among these species.

  • In poplar and switchgrass, initial cell types of interest include expanding and dividing cells in growing plant parts (above and below ground), the vascular elements (sieve elements, tracheary elements), and the associated vascular parenchyma, cambium, and fibers.
  • In seed oil crops, biomass production and accumulation depend on the coordination of many floral functions, including pollen development and pollen tube growth, fertilization, ovule development and the supply of assimilates to the growing seed. The synchronicity of these processes is poorly understood, and floral single-cell transcriptomes will provide a detailed understanding of ovule and seed development in Camelina and pennycress.

As a proof of concept, the Arabidopsis single-cell floral atlas has provided insight into the transcriptional profiles of diverse cell types, documented transcriptional heterogeneity in known cell types, and revealed novel cell types. The figure shows the scRNA-seq transcriptome space of the carpel. The transcriptional profiles of some cell types can be arranged in a pseudo-time progression representing developmental trajectories.

Single-cell atlas of the Arabidopsis carpel.
Each dot represents a cell.  Cells with similar transcriptional profiles are grouped together in distinct clusters, which represent distinct cell types or cell states.  The identity of the clusters is determined by expression of known marker genes and reporter activity of novel marker genes specifically expressed in each cluster.  Cluster identity is color-coded according to the legend.
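
The marker-based identification of clusters described in the caption can be illustrated with a minimal sketch. It assumes that the mean expression of each gene per cluster has already been computed, and it assigns each cluster the cell type whose known markers have the highest average expression. The function name, cluster names, gene names, and values are all hypothetical placeholders, not data from the atlas, and reporter-based validation is outside the scope of the sketch.

```python
from typing import Dict, List


def annotate_clusters(
    cluster_means: Dict[str, Dict[str, float]],
    markers: Dict[str, List[str]],
) -> Dict[str, str]:
    """Assign each cluster the cell type with the highest mean marker expression."""
    labels = {}
    for cluster, means in cluster_means.items():
        # Average expression of each cell type's marker genes in this cluster;
        # genes missing from the cluster profile count as 0.
        scores = {
            cell_type: sum(means.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Toy per-cluster mean expression (hypothetical values and gene names).
toy_cluster_means = {
    "cluster_0": {"markerA": 5.0, "markerB": 4.0, "markerC": 0.2},
    "cluster_1": {"markerA": 0.1, "markerB": 0.3, "markerC": 6.0},
}
# Hypothetical marker sets for two cell types.
toy_markers = {
    "cell_type_1": ["markerA", "markerB"],
    "cell_type_2": ["markerC"],
}
toy_labels = annotate_clusters(toy_cluster_means, toy_markers)
```

In practice, clusters whose best score is low or ambiguous would be left unlabeled and examined through novel marker genes, as the caption describes.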

Proteomic and interactomic insights

Next-generation interactome maps of the four plant species of interest facilitate a systems understanding of important biological processes involved in biofuel and bioproduct production, and provide a pathway to their manipulation. Spatially relevant interactomes that underlie the successful growth and development of bioenergy-relevant crop species include those involved in cell wall biosynthesis and modification, floral and seed development, and oil biosynthesis.

Interactome maps are constructed by:

  • Identifying protein targets from transcriptome atlases
  • Exploring protein-protein and protein-carbohydrate interactions critical to bioenergy production in vivo

Moving from transcriptome to proteins
While transcriptomes provide a wealth of data, they only give us information at the RNA level. Several recent studies in plants have indicated that RNA levels do not necessarily correlate with protein levels. To build robust networks underlying the growth and development of bioenergy-relevant crops, the analyses use bulk proteomic data generated for organs of interest.

Protein-Protein Interactions

Inside the cell.
The components of the plant gene expression machinery function in the context of large multimeric complexes. The composition of these complexes, their dynamic regulation (e.g., through phase separation), and their developmental dynamics are largely unknown. Describing functional complexes, dynamics, and the protein ‘neighborhoods’ of key developmental regulators controlling cell decisions in vivo will reveal regulation scenarios for cell types involved in plant growth and development. The interactions among components of the cell wall biosynthesis machinery inside the secretory system of the cell are also of interest; understanding which proteins interact with which will be key to mapping out wall biosynthesis pathways in a single-cell-specific manner.

Within the cell wall
The dynamic modification of cell wall polysaccharides is an essential component of plant growth and development; however, we know little about how this remodeling occurs through protein activity and what protein-protein interactions might be involved within the cell wall space. For example, it is not clear whether most modifying proteins act in concert or independently, and where interactions are known to be regulatory (e.g., pectin modification), the details are virtually non-existent. Describing functional complexes and the protein ‘neighborhoods’ of wall-modifying proteins is an essential component of understanding plant growth as it relates to biomass production.

Predicted disorder in SAUR19-related proteins.

Disorder prediction by residue position for AtSAUR19-similar sequences from Camelina (CsSAUR20) and poplar (PaSAUR21 and PtSAUR21).  Residues above the 0.5 probability threshold are predicted to be disordered.
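
The thresholding rule in the caption can be expressed as a short sketch that converts per-residue disorder probabilities into contiguous predicted-disordered regions. It assumes the probabilities are already available as a list from a disorder predictor (the predictor itself is not specified here); the function name and the toy scores are hypothetical and do not represent the SAUR data.

```python
from typing import List, Tuple


def disordered_regions(
    scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of residues scoring above threshold."""
    regions = []
    start = None  # start position of the region currently being extended
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos
        elif start is not None:
            # The region ended at the previous residue.
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        # A region that runs to the C-terminus.
        regions.append((start, len(scores)))
    return regions


# Hypothetical scores for a 10-residue toy sequence with disordered termini.
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1, 0.4, 0.7, 0.75, 0.95]
toy_regions = disordered_regions(toy_scores)
```

For the toy input, the function reports two regions, one at each terminus, which mirrors the pattern suggested for the SAUR19-related sequences.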

Proof of concept
The work on phase separation within the DOE IGP (Eisenberg) has been leveraged to explore Small Auxin Upregulated RNAs (SAURs). SAUR proteins are implicated as important growth and development regulators in several plant species, including switchgrass, but we have little understanding of how they are themselves regulated. Most higher plant species, including Arabidopsis, sorghum, and rice, contain between 60 and 140 SAUR genes in their genomes; the N- and C-terminal regions are the most variable and may be involved in functional regulation. Analysis of Arabidopsis SAUR proteins indicates that they may contain disordered domains (Braybrook, with Eisenberg). An example disorder prediction plot for Arabidopsis SAUR19 and its predicted homologs in Camelina and poplar indicates that both termini may be disordered (figure).

Protein-Carbohydrate Interactions
Beyond the unknown landscape of protein-protein interactions within the cell wall, we also have little knowledge of protein-carbohydrate interactions or of carbohydrate maps themselves. This has limited our ability to productively manipulate cell wall glycans to improve feedstock bioenergetics. Current methods rely on NMR or ex vivo enzymatic accessibility assays, so we have limited in situ knowledge of how these proteins and their carbohydrates interact under physiologically relevant conditions. While structures are known for a few wall-modifying proteins important for plant growth (e.g., a pectin methylesterase and a xyloglucan transferase hydrolase), many remain to be determined, and we have little understanding of how they interact with each other and with wall polysaccharides in situ. In addition, the structures and interactions exhibited by these proteins will vary with physiological context (e.g., temperature and wall acidity); this interplay between structure and environment will be critical for understanding how the cell wall is made and modified during plant growth and development. Lastly, we still have only an elementary understanding of which wall carbohydrates are located where within growing plants, mostly through antibody-based analyses (e.g., those from the CCRC in Georgia).

A finer understanding of in vivo binding of these antibodies will prove essential for our understanding of cell wall biosynthesis and function as it relates to growth-regulating networks.

Proof of concept
We are developing and deploying new technology to map protein-carbohydrate interaction sites in situ at high resolution utilizing the Fast Photochemical Iodination and Capture by Suzuki (FPICS) method. The FPICS process is described fully below but in brief involves iodine-tagging of carbohydrates followed by retrieval and mass-spec sequencing of spatially associated proteins. FPICS is groundbreaking because it (1) requires only a very small and easily incorporated iodo modification which limits potential functional interference; (2) is compatible with a wide range of biomolecules; (3) eliminates challenges associated with deconvolving the spectra of crosslinked peptides and the frequent unwanted fragmentation of large saccharides; and (4) globally maps interaction sites with residue-level resolution, which is essential for obtaining mechanistic insights into these interactions. Preliminary studies with animal cells confirm that proteins can be efficiently iodinated using a pulsed KrF excimer laser. Using iodobenzoic acid as an iodine source, apomyoglobin was found to be iodinated at 6 His and Tyr residues as determined by top-down (whole protein) MS (Figure, A). Tryptic digest and MS/MS analysis revealed that these residues flank the heme binding site, consistent with preferential labeling of residues proximal to putative binding sites. Extension of this method to interaction mapping in complex cell lysates revealed hundreds of unique labeled sites for test compounds iodocytidine and iodobenzoic acid (Figure, B). These data support compatibility of the method with interaction site mapping of a variety of iodinated molecules in both purified proteins and more complex systems.

Combining KrF excimer laser iodination of complex cell lysates with Suzuki–Miyaura chemoproteomics extends our method to proteome-wide identification of binding sites. Cell lysates were subjected to laser irradiation in the presence of iodobenzoic acid (500 µM), followed by biotinylation using our bioorthogonal Suzuki–Miyaura cross-coupling conditions. A streptavidin blot of the labeled lysates revealed substantial biotinylation in the laser-irradiated sample and minimal background in the control sample (Figure, C). In a pilot chemoproteomic study, 232 unique biotinylated tyrosine residues were identified when FPICS-treated cellular lysates were subjected to MS/MS analysis, which supports the compatibility of FPICS with binding site analysis in whole cell lysates. Ongoing studies are aimed at increasing the coverage of labeled peptides.

A. MS spectra of iodinated myoglobin.
B. Proteome-wide labeling.
C. FPICS Suzuki blot.

Network inference and exploration

The reconstruction of gene regulatory networks from single-cell data leverages chromatin accessibility and proteomics data generated to constrain gene co-expression networks. The networks can be interrogated by perturbation using mutants in relevant pathways.

The quantitative detection of expression levels of thousands of genes in thousands of cells provides a unique opportunity to study the correlation of expression levels among genes and to construct gene co-expression networks relevant to plant development. An extended approach allows us to chart where lignocellulosic biomass biosynthesis and modification relevant to plant growth occur at the single-cell level, with spatial, structural, and interactome information serving as functional constraints.
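
The idea of a co-expression network can be illustrated with a minimal sketch that computes the Pearson correlation between every pair of genes across cells and keeps pairs above a cutoff as network edges. This is an illustration only and not necessarily the method used in this work; the function names, the 0.8 cutoff, and the toy expression values are assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(
    expr: Dict[str, List[float]], min_abs_r: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (gene1, gene2, r) for gene pairs with |r| at or above the cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= min_abs_r:
            edges.append((g1, g2, r))
    return edges


# Toy expression of three hypothetical genes across four cells.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
toy_edges = coexpression_edges(toy_expr)
```

Correlation alone does not distinguish direct from indirect relationships, so edges of this kind need independent evidence before they are read as regulation.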

Differentiating between direct and indirect interactions through co-expression networks at the transcriptional level is challenging because current algorithms are ‘greedy’ and report both direct and indirect interactions. An independent way to constrain the network graphs is therefore required. Chromatin accessibility obtained from single-cell ATAC-seq and DNA-protein interactions in vivo provide a direct observation of promoter occupancy and can be used to infer direct regulation. Complementing chromatin accessibility data with mined protein complex reference maps (McWhite et al., 2020) and protein proximity labeling allows network constraint and differentiation between direct and indirect interactions, and between major regulators and their co-expressed cofactors.
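
The constraint step can be sketched as a filter on co-expression edges: an edge from a regulator to a target is treated as a candidate direct interaction only if independent evidence places the regulator at an accessible regulatory region of the target. The sketch below assumes such evidence has already been summarized as a mapping from regulator to supported targets; the function name, data layout, and identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (regulator, target, co-expression weight)


def split_by_binding_evidence(
    edges: List[Edge], binding_evidence: Dict[str, Set[str]]
) -> Tuple[List[Edge], List[Edge]]:
    """Separate edges with binding/accessibility support from those without."""
    supported, unsupported = [], []
    for regulator, target, weight in edges:
        # Regulators with no evidence at all fall back to an empty set.
        if target in binding_evidence.get(regulator, set()):
            supported.append((regulator, target, weight))
        else:
            unsupported.append((regulator, target, weight))
    return supported, unsupported


# Toy co-expression edges and evidence (hypothetical identifiers).
toy_edges = [
    ("TF1", "geneX", 0.90),
    ("TF1", "geneY", 0.85),
    ("TF2", "geneX", 0.80),
]
toy_evidence = {"TF1": {"geneX"}}
toy_direct, toy_indirect = split_by_binding_evidence(toy_edges, toy_evidence)
```

Unsupported edges are kept rather than discarded, since they may reflect indirect regulation or co-expressed cofactors.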

Expanded networks are expected to reveal novel insights into the regulation of growth, development, and cell wall biosynthesis. In collaboration with the Enabling Capabilities team, we can test whether these factors physically interact in vivo in the autologous system as a way to establish direct interactions and the directionality of computationally inferred gene co-expression networks, and to study higher-order interactions in transcriptional regulation.

Proof of concept

Gene co-expression networks for single-cell RNA sequencing data in the Arabidopsis flower are based on predicted cell populations; these were constrained by DAP-seq and available ChIP-seq data to generate gene regulatory networks describing diverse developmental and physiological processes in the flower, including cell wall elaboration (figure), the plant secretory program, cuticle formation, stress-induced pigment synthesis, glucosinolate synthesis, and cell separation and organ abscission, among others. These networks reveal known and novel connections between the expression of known regulators and their downstream targets, and open new avenues to explore the regulatory logic of floral development and floral cell differentiation. The modular structure of floral regulatory networks makes it possible to compare the involvement of individual modules, each tasked with a specific developmental output, across different cell populations, and to reconstruct the molecular events that led to the specialization, elaboration, and diversification of developmental programs and to the origin and evolution of plant cell types.

Network development in the Arabidopsis flower.  A shared module involved in lignin biosynthesis (yellow) is embedded in the cell type-specific co-expression networks of two distinct cell types, which lignify the vessel elements of the xylem and the endothecium of the anther wall.  Co-expression networks are estimated using machine learning methods (random forest regression and mutual information).  Cell wall elaborations (arrows) appear as spirals along the length of elongated cells, which are dead at maturity.
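
Of the two estimation methods named in the caption, mutual information is the simpler to illustrate. The sketch below computes the mutual information, in bits, between two genes whose expression has been discretized per cell (for example, 0 for off and 1 for on). The discretization scheme, the function name, and the toy vectors are assumptions made for illustration, and the random forest step is not shown.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (bits) between two discretized expression vectors."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # Compare the observed joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


# Toy discretized expression of hypothetical genes across four cells.
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # perfectly co-expressed with gene_a
gene_c = [0, 1, 0, 1]  # independent of gene_a

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
```

Unlike linear correlation, mutual information also captures non-linear dependence between genes, which is one reason it is a common choice for co-expression inference.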