|
Tuberculosis (TB) is the single leading cause of human adult death by
an infectious organism. It accounts for over 2.3 million deaths per
year, primarily in the developing world (Bloom, 1994). Multi-drug
resistant M. tuberculosis strains are increasingly being found in the
clinic. Early treatment with at least two effective drugs,
e.g. isoniazid and rifampicin, can often reduce mortality from
multi-drug resistant tuberculosis (Pablos-Mendez et al., 1996; Park
et al., 1996; Turett et al., 1995), but outbreaks of tuberculosis
resistant to seven or more drugs have now been reported (Frieden et
al., 1996). It appears likely that strains which are completely
resistant to existing antibiotics will become increasingly common
given the history of progressive drug resistance seen among clinical
isolates, and given the rise in cases imported from abroad (McCray et
al., 1997). These results point to the critical need for new drugs
that are active against alternative targets. Fortunately, because
multi-drug resistance is mainly acquired by sequential point mutations
in the genes encoding target proteins, multi-drug resistant strains
should be susceptible to agents that are in novel structural classes
and act against new targets.
One remarkable feature of M. tuberculosis that greatly
complicates treatment is "persistence", the ability of the organism to
go into a semi-dormant state for many years, during which time most
drugs have limited efficacy (Bloom, 1994). Also, different strains of
M. tuberculosis show different clinical virulence due to genetic
variations in "virulence factor" proteins that are as yet only
partially identified. Understanding persistence, reactivation from a
persistent state, and virulence are major challenges for which a
fundamental understanding of the metabolism of the organism can have
direct importance for drug targeting and design.
Despite the clear importance of new antibiotics for treatment of
tuberculosis, no new anti-tubercular drugs have been introduced since
rifampicin was first used in the early 1960s. Several reports describe
new anti-M. tuberculosis drugs, but it is unclear whether any of them
is likely to be effective in humans (Reddy, 1996; American Thoracic
Society Ad Hoc Committee, 1995; Parenti, 1989; Frieden et al., 1996;
Kennedy et al., 1996; Heifets et al., 1988; Yamamoto et al.,
1996). The development of new drugs would be greatly facilitated by
the identification of genes essential for viability of the bacillus as
well as persistence factors and virulence factors that contribute to
pathogenesis. The corresponding proteins would be candidate drug
targets and drug discovery would be facilitated by structural and
functional information about those proteins (e.g., Sharma et al.,
1998; Gill et al., 1999; Li et al., 2000). This information is
unlikely to come wholly from the private sector, because most
pharmaceutical companies are currently reluctant to commit the
resources that large-scale TB drug discovery projects
require. However, they will commit resources to anti-tubercular drug
discovery when much of the ground work has been completed and a
well-characterized target has been demonstrated. The information
produced by this project would therefore be useful and used. Members
of the Consortium have active collaborations with Parke-Davis and
Glaxo-Wellcome (see attached letters). In addition, several smaller
pharmaceutical companies focused on developing anti-tubercular drugs,
particularly those that do not have their own crystallography groups,
could also benefit from the structural and functional information
produced by this project.
Owing to the availability of the complete genome sequence of
M. tuberculosis (Cole et al., 1998), and in part to our development of a
genetic system for studying gene function in the organism, our
understanding of M. tuberculosis is advancing rapidly and will allow us
to target proteins accurately based on their importance in virulence,
persistence, reactivation, and overall viability of the
organism. Recently, large-scale expression array analyses have been
used to compare M. tuberculosis strains that are virulent and
avirulent, allowing the identification of genes potentially involved
in virulence (Behr et al., 1999; Jungblut et al., 1999). The
M. tuberculosis genes responsible for persistence have been difficult
to identify, but we now have genetic screens that permit us to begin
to identify persistence factors as well (Cox et al., 1999; Daugelat &
Jacobs, 1999). These screens will be extended to genes responsible for
reactivation.
In addition to genes we identify as involved in persistence,
virulence, and reactivation, other attractive drug targets involve
gene products from important metabolic pathways such as cell wall
synthesis and propagation of the organism. Functions and functional
linkages in M. tuberculosis can now be identified by informatics
approaches we have developed that use both direct comparison to other
organisms and correlation of evolutionary and sequence properties
(Marcotte et al., 1999). We will determine structures of proteins
from these pathways and analyze them in terms of the functional
information that we have available. In cases where the ligands,
substrates, or co-factors of a protein can be identified, we will do
binding and other biochemical experiments and, where useful, determine
co-crystal structures to identify the features critical for binding or
catalysis. Our collaborators in the pharmaceutical industry will take
the functional and structural information we obtain and use it in
high-throughput drug screening and structure-based drug design.
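As an illustration of how correlated evolutionary properties can suggest functional linkages, the following is a minimal sketch of a phylogenetic-profile comparison, one informatics approach of this general kind. It is not the Consortium's software. The gene names, the five reference genomes, the presence/absence values, and the mismatch threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of a homolog of each gene in
# five reference genomes. Names and values are invented for illustration.
PROFILES = {
    "geneA": (1, 0, 1, 1, 0),
    "geneB": (1, 0, 1, 1, 0),
    "geneC": (0, 1, 1, 0, 1),
    "geneD": (1, 0, 1, 0, 0),
}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_mismatches=0):
    """Gene pairs whose profiles differ in at most max_mismatches genomes.

    Pairs are returned in alphabetical order. Similar profiles are taken
    here as a hint (not proof) that two genes may be functionally linked.
    """
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        if hamming(profiles[g1], profiles[g2]) <= max_mismatches:
            pairs.append((g1, g2))
    return pairs


if __name__ == "__main__":
    print(linked_pairs(PROFILES))                    # identical profiles only
    print(linked_pairs(PROFILES, max_mismatches=1))  # allow one disagreement
```

In this toy data set only geneA and geneB share an identical profile; relaxing the threshold to one mismatch also links geneD to both. A real analysis would need many more genomes and a statistical measure of significance.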
Targeting of functionally important proteins.
We are emphasizing structure determination of proteins known to be
functionally important for pathogenicity, although the precise
function will in many cases not be known at the time structure
determination is begun. It is our intent that the structures we
determine will be useful not only as contributions to a database of
folds and functions, but also as a basis for an in-depth understanding
of biological processes. We view this targeting strategy, discussed in
detail in the "Background" section, as a key to maximizing the
scientific returns from this structural genomics project.
Accordingly, we are selecting targets based on known function, genetic
screens, informatics, and interest from the M. tuberculosis research
community.
Engineering of processes and proteins.
Our approach to streamlining the process of structure determination
and decreasing its cost has two related parts. First, we
will develop and use techniques that are intrinsically well-suited for
high throughput, and second, we will engineer our protein targets to
take advantage of these techniques. We have developed methods for
engineering proteins to make them highly soluble and expressed at high
levels in E. coli and in vitro (Waldo et al., 1999). This technology
will allow us to develop a standardized process to produce and purify
selenomethionine-containing variants of most proteins in large
quantity, even those that originally contain insufficient methionine
for MAD structure determination. This capability will in turn
allow us to use the selenomethionine MAD approach and our fully
automated structure solution software to solve most of our structures.
In this process all steps except crystallization will be highly
reliable, and we will use extensive crystallization screening to
maximize the yield of crystals. NMR will be used to determine some of
the structures of proteins that do not crystallize. The overall result
will be a process that has about a 40% probability of structure
determination for any given protein. Such a process is fundamentally
different from one that uses screening to identify the few proteins
for which structure solution is easy (Terwilliger et al., 1998).
Though more difficult, it is vastly more valuable because the
selection of targets can be based on importance of the structural
information that is to be obtained, not on ease of structure solution.
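Two quantities in this plan lend themselves to a quick calculation: whether a target contains enough methionine for selenomethionine MAD phasing, and how many structures to expect from a given number of targets at a 40% per-target success rate. The sketch below illustrates both. The cutoff of 100 residues per methionine is a rule-of-thumb assumption that is not taken from this proposal, the function names are hypothetical, and the expected-yield calculation assumes that every target has the same success probability.

```python
import math


def residues_per_met(sequence):
    """Average number of residues per methionine in a one-letter sequence."""
    seq = sequence.upper()
    n_met = seq.count("M")
    return math.inf if n_met == 0 else len(seq) / n_met


def needs_added_met(sequence, max_residues_per_met=100):
    """True if the protein has fewer Met than the assumed cutoff allows.

    The default cutoff (one Met per 100 residues) is an illustrative
    assumption, not a value stated in the text.
    """
    return residues_per_met(sequence) > max_residues_per_met


def expected_structures(n_targets, p_success=0.40):
    """Expected number of solved structures, assuming each target
    succeeds with the same probability p_success."""
    return n_targets * p_success


if __name__ == "__main__":
    toy = "M" + "A" * 249  # hypothetical 250-residue protein with one Met
    print(needs_added_met(toy))      # too little Met under the assumed cutoff
    print(expected_structures(100))  # expected yield from 100 targets
```

Targets flagged by a check like this would be candidates for engineering in additional methionine codons before selenomethionine labeling.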
The consortium approach to structural genomics.
Our Consortium for Structural Genomics is composed of laboratories
from institutions in countries. At the time
of inception, it consisted of 13 Members, who
are to be directly funded by this Center, and 11 Associates, who have
other sources of funding but are otherwise in every way equivalent to
Members. The Consortium offers experience, facilities, and a record
of innovation in every aspect of structural genomics that would be
difficult to obtain from a single institution. We are collectively
responsible for 3.3% of all protein structures in the Protein Data
Bank (349 of 10561 protein structures as of 1/1/2000; Berman et al.,
2000). In developing the consortium approach, we have already carried
out a structural genomics project on a thermophile, in which we solved
6 new protein structures, identified bottlenecks, and developed
high-throughput methods for protein production. We have much
experience with our target organism, M. tuberculosis. We have
determined structures of 7 M. tuberculosis proteins (see "Preliminary
Results"), and we have carried out combined structural and biochemical
investigations to understand their functions in depth (e.g.,
Rozwarski, 1998; Dessen et al., 1995).
The Consortium has made fundamental innovations in all aspects of
structural genomics that are central to success of the approach.
Participants in the Consortium have developed targeting strategies for
structural genomics based on genetic screens (Cox et al., 1999;
Daugelat & Jacobs, 1999), whole genome functional analyses (Marcotte
et al., 1999a; Pellegrini et al., 1999), and likelihood of obtaining
novel fold information (Mallick et al., 2000). We have worked on the
problem of inferring function from structure (Colovos et al.,
1998). We have discovered how to engineer proteins for optimal
expression and solubility (Waldo et al., 1999) and how to express
proteins in vitro with high yield (Takanori et al., 1999), making it
possible to develop a standardized procedure for protein
production. We have developed methods for increasing the stability and
solubility of membrane proteins (Zhou & Bowie, 2000), increasing the
promise of crystallization for this important class of proteins. We
have developed crystallization screening approaches that are
applicable to high-throughput crystallography (Segelke, 1995; Segelke
& Rupp, 1998). We have invented algorithms and written software used
world-wide for macromolecular X-ray structure determination (e.g.,
Brunger et al., 1998; Terwilliger & Berendzen, 1999). Finally,
Consortium members have invented and developed the concept of
threading for template-based fold recognition and structure prediction
(Bowie et al., 1991), and methods for detecting subtle errors in
protein structures (Colovos & Yeates, 1993).
There are several major advantages to this consortium
approach. One is that a world-wide effort can be devoted to a defined
set of structural targets with a relatively small investment in new
infrastructure. Because of the independent sources of funding for
Associates, the Consortium will be able to focus approximately 50 FTE
of effort on this project. Access to the methods and facilities of
the Consortium will greatly amplify the efforts of all the
participants. A second advantage is that the knowledge and experience
of a diverse group of investigators can be combined and focused on one
project. This is particularly important in the structural analysis
step, because individual attention will be needed for analysis of
every protein structure we determine. A third advantage is that new
techniques can be developed and tested at several institutions and the
best of them incorporated into our main production facilities. During
the first year of the project, the basic steps of protein production
and crystallization will be shared about equally by the facilities and
individual investigators. By the middle of the project, however, the
bulk of this work will be carried out at the facilities, allowing the
individual investigators increased time for structure determination
and analysis.
|