TB Structural Genomics Consortium - Strategy


    The Consortium's Structural Genomics Strategy
There are three overall elements of our strategy. These are (1) the targeting of proteins from a single organism that have clearly identified functional importance, (2) engineering both of processes to make them high-throughput and of proteins to make them compatible with our processes, and (3) a consortium approach to structure determination and analysis that focuses diverse types of expertise on a critical problem.
    M. tuberculosis : a target for structural genomics with a major potential impact
Originally published 2001
Tuberculosis (TB) is the single leading cause of human adult death by an infectious organism. It accounts for over 2.3 million deaths per year, primarily in the developing world (Bloom, 1994). Multi-drug resistant M. tuberculosis strains are increasingly being found in the clinic. Early treatment with at least two effective drugs, e.g. isoniazid and rifampicin, can often reduce mortality in multi-drug resistant tuberculosis (Pablos-Mendez et al., 1996; Park et al., 1996; Turett et al., 1995), but outbreaks of tuberculosis resistant to seven or more drugs have now been reported (Frieden et al., 1996). Given the history of progressive drug resistance seen among clinical isolates, and given the rise in cases imported from abroad (McCray et al., 1997), it appears likely that strains completely resistant to existing antibiotics will become increasingly common. These results point to the critical need for new drugs that are active against alternative targets. Fortunately, because multi-drug resistance is mainly acquired by sequential point mutations in the genes encoding target proteins, multi-drug resistant strains should be susceptible to agents that are in novel structural classes and act against new targets.

One remarkable feature of M. tuberculosis that greatly complicates treatment is "persistence", the ability of the organism to go into a semi-dormant state for many years, during which time most drugs have limited efficacy (Bloom, 1994). Also, different strains of M. tuberculosis show different clinical virulence due to genetic variations in "virulence factor" proteins that are as yet only partially identified. Understanding persistence, reactivation from a persistent state, and virulence are major challenges for which a fundamental understanding of the metabolism of the organism can have direct importance for drug targeting and design.

Despite the clear importance of new antibiotics for treatment of tuberculosis, no new anti-tubercular drugs have been introduced since rifampin was first used in the early 1960s. Several reports describe new anti-M. tuberculosis drugs, but it is unclear whether any of them is likely to be effective in humans (Reddy, 1996; American Thoracic Society Ad Hoc Committee, 1995; Parenti, 1989; Frieden et al., 1996; Kennedy et al., 1996; Heifets et al., 1988; Yamamoto et al., 1996). The development of new drugs would be greatly facilitated by the identification of genes essential for viability of the bacillus, as well as persistence factors and virulence factors that contribute to pathogenesis. The corresponding proteins would be candidate drug targets, and drug discovery would be facilitated by structural and functional information about those proteins (e.g., Sharma et al., 1998; Gill et al., 1999; Li et al., 2000). This information is unlikely to be derived wholly from the private sector, because most pharmaceutical companies are currently reluctant to commit the resources that large-scale TB drug discovery projects require. However, they will commit resources to anti-tubercular drug discovery once much of the groundwork has been completed and a well-characterized target has been demonstrated. The information produced by this project would therefore be both useful and used. Members of the Consortium have active collaborations with Parke-Davis and Glaxo-Wellcome (see attached letters). In addition, several smaller pharmaceutical companies focused on developing antitubercular drugs, particularly those that do not have their own crystallography groups, could also benefit from the structural and functional information produced by this project.

Our understanding of M. tuberculosis is advancing rapidly (Cole et al., 1998), owing to the availability of the complete genome of the organism and, in part, to our development of a genetic system for studying gene function in M. tuberculosis. This progress will allow us to accurately target proteins based on their importance in virulence, persistence, reactivation, and overall viability of the organism. Recently, large-scale expression array analyses have been used to compare virulent and avirulent M. tuberculosis strains, allowing the identification of genes potentially involved in virulence (Behr et al., 1999; Jungblut et al., 1999). The M. tuberculosis genes responsible for persistence have been difficult to identify, but we now have genetic screens that permit us to begin to identify persistence factors as well (Cox et al., 1999; Daugelat & Jacobs, 1999). These screens will be extended to genes responsible for reactivation.

In addition to genes we identify as involved in persistence, virulence, and reactivation, other attractive drug targets involve gene products from important metabolic pathways such as cell wall synthesis and propagation of the organism. Functions and functional linkages in M. tuberculosis can now be identified by informatics approaches we have developed that use both direct comparison to other organisms and correlation of evolutionary and sequence properties (Marcotte et al., 1999). We will determine structures of proteins from these pathways and analyze them in terms of the functional information that we have available. In cases where the ligands, substrates, or co-factors of a protein can be identified, we will do binding and other biochemical experiments and, where useful, determine co-crystal structures to identify the features critical for binding or catalysis. Our collaborators in the pharmaceutical industry will use the functional and structural information we obtain in high-throughput drug screening and structure-based drug design.

Targeting of functionally important proteins.

We are emphasizing structure determination of proteins known to be functionally important for pathogenicity, although the precise function will in many cases not be known at the time structure determination is begun. It is our intent that the structures we determine will be useful not only as contributions to a database of folds and functions, but also as a basis for an in-depth understanding of biological processes. We view this targeting strategy, discussed in detail in the "Background" section, as a key to maximizing the scientific returns from this structural genomics project. Accordingly, we are targeting based on known function, genetic screens, informatics, and interest from the M. tuberculosis research community.

Engineering of processes and proteins.

Our approach to streamlining the process of structure determination and decreasing costs for the process has two related parts. First, we will develop and use techniques that are intrinsically well-suited for high throughput, and second, we will engineer our protein targets to take advantage of these techniques. We have developed methods for engineering proteins to make them highly soluble and expressed at high levels in E. coli and in vitro (Waldo et al., 1999). This technology will allow us to develop a standardized process to produce and purify selenomethionine-containing variants of most proteins in large quantity, even those that originally contain insufficient methionine for MAD structure determination. An ability to do this in turn will allow us to use the selenomethionine MAD approach and our fully automated structure solution software to solve most of our structures. In this process all steps except crystallization will be highly reliable, and we will use extensive crystallization screening to maximize the yield of crystals. NMR will be used to determine some of the structures of proteins that do not crystallize. The overall result will be a process that has about a 40% probability of structure determination for any given protein. Such a process is fundamentally different from one that uses screening to identify the few proteins for which structure solution is easy (Terwilliger et al., 1998). Though more difficult, it is vastly more valuable because the selection of targets can be based on importance of the structural information that is to be obtained, not on ease of structure solution.
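
To make the 40% figure concrete, the short sketch below works through what it would imply for a set of targets. It is only an illustration under assumptions that are not part of the Consortium's plan: every target is treated as having the same success probability, and outcomes are treated as independent. The function names are hypothetical.

```python
# Illustrative arithmetic only. Assumes (not stated in the text) that every
# target has the same, independent 40% chance of yielding a structure.

P_SUCCESS = 0.40  # "about a 40% probability" quoted above


def expected_structures(n_targets: int, p_success: float = P_SUCCESS) -> float:
    """Expected number of solved structures among n_targets attempts."""
    return n_targets * p_success


def prob_at_least_one(k_targets: int, p_success: float = P_SUCCESS) -> float:
    """Chance that at least one of k_targets yields a structure."""
    return 1.0 - (1.0 - p_success) ** k_targets


if __name__ == "__main__":
    for n in (10, 100):
        print(n, "targets: expected structures =", round(expected_structures(n), 1))
    for k in (1, 3, 5):
        print(k, "targets: P(at least one) =", round(prob_at_least_one(k), 3))
```

Under these assumptions, attempting three proteins from the same pathway would give roughly a 78% chance of obtaining at least one structure from that pathway, which is one way to see why targets can be chosen for importance rather than for ease of structure solution.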

The consortium approach to structural genomics.

Our Consortium for Structural Genomics is composed of laboratories from institutions in countries. At the time of inception, it consisted of 13 Members, who are to be directly funded by this Center, and 11 Associates, who have other sources of funding but are otherwise in every way equivalent to Members. The Consortium offers experience, facilities, and a record of innovation in every aspect of structural genomics that would be difficult to obtain from a single institution. We are collectively responsible for 3.3% of all protein structures in the Protein Data Bank (349 of 10561 protein structures as of 1/1/2000; Berman et al., 2000). Developing the consortium approach, we have already carried out a structural genomics project on a thermophile and have solved 6 new protein structures, identified bottlenecks and developed high-throughput methods for protein production. We have much experience with our target organism, M. tuberculosis. We have determined structures of 7 M. tuberculosis proteins (see "Preliminary Results"), and we have carried out combined structural and biochemical investigations to understand their functions in depth (e.g., Rozwarski, 1998; Dessen et al., 1995).

The Consortium has made fundamental innovations in all aspects of structural genomics that are central to success of the approach. Participants in the Consortium have developed targeting strategies for structural genomics based on genetic screens (Cox et al., 1999; Daugelat & Jacobs, 1999), whole genome functional analyses (Marcotte et al., 1999a; Pellegrini et al., 1999), and likelihood of obtaining novel fold information (Mallick et al., 2000). We have worked on the problem of inferring function from structure (Colovos et al., 1998). We have discovered how to engineer proteins for optimal expression and solubility (Waldo et al., 1999) and how to express proteins in vitro with high yield (Takanori et al., 1999), making it possible to develop a standardized procedure for protein production. We have developed methods for increasing the stability and solubility of membrane proteins (Zhou and Bowie, 2000), increasing the promise of crystallization for this important class of proteins. We have developed crystallization screening approaches that are applicable to high-throughput crystallography (Segelke, 1995; Segelke and Rupp, 1998). We have invented algorithms and written software used world-wide for macromolecular X-ray structure determination (e.g., Brunger et al., 1998; Terwilliger & Berendzen, 1999). Finally, Consortium members have invented and developed the concept of threading for template-based fold recognition and structure prediction (Bowie et al., 1991), and methods for detecting subtle errors in protein structures (Colovos & Yeates, 1993).

There are several major advantages to this consortium approach. One is that a world-wide effort can be devoted to a defined set of structural targets with a relatively small investment in new infrastructure. Because of the independent sources of funding for Associates, the Consortium will be able to focus approximately 50 FTE of effort on this project. Access to the methods and facilities of the Consortium will greatly amplify the efforts of all the participants. A second advantage is that the knowledge and experience of a diverse group of investigators can be combined and focused on one project. This is particularly important in the structural analysis step, because individual attention will be needed for analysis of every protein structure we determine. A third advantage is that new techniques can be developed and tested at several institutions and the best of them incorporated into our main production facilities. During the first year of the project, the basic steps of protein production and crystallization will be shared about equally by the facilities and individual investigators. By the middle of the project, however, the bulk of this work will be carried out at the facilities, allowing the individual investigators increased time for structure determination and analysis.