CA2226898A1 - E1 endoglucanase cellulose binding domain - Google Patents

E1 endoglucanase cellulose binding domain

Info

Publication number
CA2226898A1
Authority
CA
Canada
Legal status
Abandoned
Application number
CA002226898A
Other languages
French (fr)
Inventor
Steven R. Thomas
Robert A. Laymon
William S. Adney
Michael E. Himmel
Current Assignee
Midwest Research Institute
Original Assignee
Midwest Research Institute
Application filed by Midwest Research Institute


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S- glycosyl compounds (3.2.1)
    • C12N9/2405 Glucanases
    • C12N9/2434 Glucanases acting on beta-1,4-glucosidic bonds
    • C12N9/2437 Cellulases (3.2.1.4; 3.2.1.74; 3.2.1.91; 3.2.1.150)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • C07K1/14 Extraction; Separation; Purification
    • C07K1/16 Extraction; Separation; Purification by chromatography
    • C07K1/22 Affinity chromatography or related techniques based upon selective absorption processes
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K17/00 Carrier-bound or immobilised peptides; Preparation thereof
    • C07K17/02 Peptides being immobilised on, or in, an organic carrier
    • C07K17/10 Peptides being immobilised on, or in, an organic carrier the carrier being a carbohydrate
    • C07K17/12 Cellulose or derivatives thereof
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01004 Cellulase (3.2.1.4), i.e. endo-1,4-beta-glucanase
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide

Abstract

The Acidothermus cellulolyticus E1 endoglucanase, domains thereof, fusion proteins containing such domains, and variants thereof are described. Nucleic acids encoding such proteins or polypeptides are also described. The cellulose binding domain of the E1 endoglucanase is useful in labelling or modifying a cellulose or other polysaccharide surface, and in purifying or immobilizing a binding domain fusion protein to cellulose or other polysaccharide.

Description

This application is a continuation-in-part of copending U.S. Serial No. 08/604,913, filed February 22, 1996. That application is a continuation-in-part of copending U.S. Serial No. 08/276,213, filed July 15, 1994, now U.S. Patent 5,536,655. Serial No. 08/276,213 is a continuation-in-part of copending U.S. Serial No. 08/125,115, filed September 21, 1993, now U.S. Patent 5,366,884. Serial No. 08/125,115 is a continuation-in-part of U.S. Serial No. 07/826,089, filed January 27, 1992, now U.S. Patent 5,275,944. Serial No. 07/826,089 is a continuation-in-part of U.S. Serial No. 07/412,434, filed September 26, 1989, now U.S. Patent 5,110,735. Each of the foregoing applications is incorporated herein by reference in its entirety.
The United States Government has rights in this invention under Contract No. DE-AC36-83CH10093 between the United States Department of Energy and the National Renewable Energy Laboratory, a Division of the Midwest Research Institute.
FIELD OF THE INVENTION
The invention relates to genes encoding Acidothermus cellulolyticus E1 endoglucanase or domains thereof. It also relates to microorganisms containing the E1 endoglucanase gene or recombinant derivatives thereof. Finally, it relates to the use of heterologous organisms to express the gene or its recombinant derivatives in order to produce any of the following: the E1 enzyme or derivatives thereof, segments of the E1 enzyme or derivatives thereof, or a hybrid protein or derivatives thereof.
BACKGROUND OF THE INVENTION
The fermentable fractions of biomass include cellulose (β-1,4-linked glucose) and hemicellulose. Cellulose consists of long, unbranched, covalently bonded, water-insoluble chains of glucose that resist depolymerization (Warren, J., 1996, ASM News 62:85-88). Hemicellulose is a highly branched, heterogeneous fraction of biomass. Depending on the plant species, it is composed of xylose, glucose and minor five- and six-carbon sugars, including arabinose, mannose and galactose.
Glucose is probably the most desirable fermentation feedstock for bioconversions. The complete enzymatic degradation of cellulose to glucose may be accomplished by the synergistic action of three distinct classes of enzymes.

The first class is the "endo-β-1,4-glucanases", or β-1,4-D-glucan-4-glucanohydrolases (EC 3.2.1.4). These enzymes act at random on soluble and insoluble β-1,4-glucan substrates to break the chains. Endoglucanase activity is commonly measured by detecting the reducing groups released from carboxymethylcellulose (CMC).

The second class is the "exo-β-1,4-glucanases". It includes the β-1,4-D-glucan glucohydrolases (EC 3.2.1.74), which liberate D-glucose from 1,4-β-D-glucans and hydrolyze cellobiose slowly. It also includes the β-1,4-D-glucan cellobiohydrolases (EC 3.2.1.91), which liberate D-cellobiose from β-1,4-glucans.

The third class is the "β-D-glucosidases", or β-D-glucoside glucohydrolases (EC 3.2.1.21). These enzymes release D-glucose units from soluble cellodextrins, especially cellobiose, and from an array of aryl-glycosides. β-D-Glucosidases that preferentially hydrolyze cellobiose are commonly referred to as "cellobiases."
An economical process for converting low-value lignocellulosic biomass to ethanol by fermentation requires the optimization of several key steps, especially economical cellulase production. Practical hydrolysis of cellulose to glucose requires large amounts of cellulase to fully depolymerize the cellulose. For example, about one kilogram of cellulase preparation may be required to fully digest fifty kilograms of cellulose.
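The 1:50 enzyme-to-substrate figure above translates directly into an enzyme demand estimate. The Python sketch below is illustrative only. The function name is hypothetical, and it assumes that the loading ratio stays constant as the batch is scaled, which the text does not state.

```python
def cellulase_required_kg(cellulose_kg: float, enzyme_per_cellulose: float = 1.0 / 50.0) -> float:
    """Estimate the mass of cellulase preparation needed to digest a cellulose charge.

    Assumption: the loading ratio (default about 1 kg enzyme per 50 kg cellulose,
    the figure quoted in the text) is constant regardless of batch size.
    """
    if cellulose_kg < 0:
        raise ValueError("cellulose mass must be non-negative")
    return cellulose_kg * enzyme_per_cellulose


# Hypothetical batch sizes, in kg of cellulose
for batch in (50.0, 1000.0, 25000.0):
    print(f"{batch:>8.0f} kg cellulose -> {cellulase_required_kg(batch):.1f} kg cellulase")
```

Even under this simple linear assumption, enzyme demand scales with feedstock throughput. This is why the text treats the cost of cellulase production as a key economic step.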
Economical production of cellulase is also complicated by the relatively slow growth rates of cellulase-producing fungi and the long times required for cellulase induction.
Improved or alternative cellulase production systems would therefore significantly reduce the cost of cellulose hydrolysis and make large-scale bioconversion of cellulosic biomass to ethanol more economical. Such systems could offer greater productivity, less expensive production schemes, higher cellulase specific activities, or faster growth rates than natural cellulase-producing fungi.
Cellulases have applications beyond converting cellulose to oligo- or monosaccharides and ethanol. Specifically, the independently folding cellulose binding domain (CBD) of a cellulase can be used to label cellulose or to modify the surface properties of a cellulose matrix or fibers. Surface modification can find application in the textile, pulp and paper industries.

CBDs can also be used in the affinity purification of CBD-fusion proteins, and to immobilize CBD-fusion proteins on a relatively inert cellulose support (Warren, J., 1996, supra; Tomme, P. et al., 1995, Adv. Microb. Physiol. 37:1-80). For example, a fusion of the Cellulomonas fimi exoglucanase (Cex) CBD with Agrobacterium sp. β-glucosidase has been shown to adsorb non-covalently to cellulose without loss of β-glucosidase activity (Ong et al., 1989, Bio/Technology 7:604-607). Other fusion proteins comprising CBDs from different bacterial cellulases, and their immobilization or purification on cellulose, are described by Greenwood et al., 1989, FEBS Lett. 244(1):127-131; Seeboth et al., 1992, Appl. Microbiol. Biotechnol. 37:621-625; and Kyaw et al., 1994, Bioresource Technology 50:31-35.
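CBD adsorption is often summarized with a single-site Langmuir isotherm. Under that assumption (not a claim of the text), you can estimate what fraction of a CBD-fusion protein a cellulose bed will capture during affinity purification or immobilization. The Python sketch below solves the equilibrium quadratic. The function name and all concentrations and dissociation constants are hypothetical placeholders, not measured values for any CBD described here.

```python
import math


def fraction_bound(total_protein_uM: float, total_sites_uM: float, kd_uM: float) -> float:
    """Fraction of a CBD fusion protein adsorbed to cellulose at equilibrium.

    Assumes single-site Langmuir binding: the bound concentration b solves
    b**2 - (P + S + Kd) * b + P * S = 0, where P is total protein and S is the
    total concentration of binding sites, both per liter of slurry.
    """
    p, s, kd = total_protein_uM, total_sites_uM, kd_uM
    if p <= 0 or s <= 0:
        return 0.0
    total = p + s + kd
    disc = max(total * total - 4.0 * p * s, 0.0)
    # Physically meaningful (smaller) root, written as 2PS / (total + sqrt(disc))
    # to avoid cancellation when binding is weak.
    bound = 2.0 * p * s / (total + math.sqrt(disc))
    return bound / p


# Hypothetical scenario: 1 µM fusion protein, 10 µM binding sites, varying affinity
for kd in (0.1, 1.0, 10.0):
    print(f"Kd = {kd:>4} µM -> {fraction_bound(1.0, 10.0, kd):.2f} bound")
```

In this model, a tighter-binding CBD (smaller Kd) or an excess of cellulose binding sites pushes capture toward completion.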
CBDs have been classified into eleven families (personal communication of Peter Tomme, November 1, 1996). Seven families (I, II, III, IV, VI, IX and X) are based on similarities in primary structure, and nearly all CBDs fall into these seven families.
Some of these CBDs have been demonstrated to have affinity for cellulose.
Three additional families (V, VII and VIII) contain functional CBDs with unrelated primary structures.

Families I and II have unrelated primary sequences and three-dimensional structures. Even so, both contain at least four conserved aromatic residues, which are believed to be important for the interaction with cellulose, and at least two cysteines that form disulfide bonds. The mechanism of CBD adsorption to cellulose is not known. However, the conservation of aromatic amino acids, particularly tryptophan and tyrosine, implies that these residues play an important role in cellulose binding through hydrophobic interaction. The exposed positions of the conserved aromatic residues in the C. fimi and T. reesei CBDs reflect this potential role.

CBDs vary in their affinity for different physical forms of cellulose (e.g., crystalline or amorphous). It has been theorized that some CBDs may disrupt non-covalent associations in crystalline cellulose, making the substrate more accessible to the catalytic domain. At present there is no direct evidence for this mechanism (Tomme, P. et al., 1995, Adv. Microb. Physiol. 37:1-80 and Tomme, P. et al., 1995, supra).
If cellulases or CBDs are to be used in industrial and research applications, they should often be relatively resistant to harsh process conditions and/or high temperatures. Shear forces can be applied during pumping and stirring steps, so additional stability to this stress is desirable. Resistance to pH changes is also desirable, for two reasons: yeasts ferment glucose to ethanol optimally at pH 5, and residual acid may remain after acid pretreatment of cellulosic materials. Finally, resistance to proteases produced by common microbial contaminants is desirable.
Highly thermostable cellulase enzymes secreted by the cellulolytic thermophile Acidothermus cellulolyticus gen. nov., sp. nov. have been found to have one or more of the foregoing properties. These cellulases are discussed in U.S. Patents 5,275,944 and 5,110,735. This bacterium was originally isolated from decaying wood in an acidic, thermal pool at Yellowstone National Park. It was deposited with the American Type Culture Collection (ATCC) under accession number 43068 (Mohagheghi et al., 1986, Int. J. Syst. Bacteriol. 36:435-443).
The cellulase complex produced by this organism is known to contain several different cellulase enzymes displaying maximal activities at temperatures of 75° to 83°C.
These cellulases are relatively resistant to inhibition by cellobiose, an end product of the reactions catalyzed by cellulase. The cellulases from Acidothermus cellulolyticus are also active over a broad pH range centered about pH 6. They remain quite active at pH 5, the pH at which yeasts ferment glucose to ethanol optimally. A high molecular weight cellulase isolated from growth broths of Acidothermus cellulolyticus was found to have a molecular weight of approximately 156,600 to 203,400 daltons by SDS-PAGE. This enzyme is described in U.S. Patent No. 5,110,735.
A novel cellulase enzyme, the E1 endoglucanase, is also secreted by Acidothermus cellulolyticus into the growth medium. It is described in detail in U.S. Patent No. 5,275,944. This endoglucanase has a temperature optimum of 83°C. At 85°C, its specific activity is 40 µmol of glucose released from carboxymethylcellulose per minute per mg of protein. The E1 endoglucanase was further characterized as having an isoelectric pH of 6.7 and a molecular weight of between about 57,420 and 74,580 daltons (see U.S. Patent No. 5,536,655).
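Specific activity in these units is the amount of reducing sugar released (expressed as glucose equivalents), divided by the reaction time and by the mass of enzyme in the assay. The Python sketch below shows that arithmetic only. The function name is illustrative, and the assay readings are hypothetical numbers chosen to reproduce the 40 µmol/min/mg figure quoted above. The sketch does not model how the reducing sugars are quantified.

```python
def specific_activity(glucose_umol: float, minutes: float, protein_mg: float) -> float:
    """Specific activity in µmol glucose equivalents released per minute per mg protein.

    One µmol of product per minute is one international unit (U),
    so the result can also be read as U/mg.
    """
    if minutes <= 0 or protein_mg <= 0:
        raise ValueError("reaction time and protein mass must be positive")
    return glucose_umol / minutes / protein_mg


# Hypothetical assay readings: 0.01 mg enzyme incubated on CMC for 30 min
print(specific_activity(glucose_umol=12.0, minutes=30.0, protein_mg=0.01))
```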
It has been proposed to use recombinant cellulase enzymes to either augment or replace costly fungal enzymes for cellulose degradation (Lejeune, Colson, and Eveleigh, in Biosynthesis and Biodegradation of Cellulose, C. Haigler and P.J. Weimer, Eds., Marcel Dekker, New York, NY, 1991, pp. 623-672). An abundant, inexpensive source of highly active, thermally stable enzymes could be obtained by cloning suitable genes into a production host. These genes could code for Acidothermus cellulolyticus cellulases or for fusion proteins containing the E1 endoglucanase CBD. Candidate hosts include Streptomyces lividans, E. coli, Bacillus spp., Pichia pastoris, Aspergillus spp., Trichoderma reesei, other microbial host organisms, or crop plants.
SUMMARY OF THE INVENTION
It is an object of the present invention to produce biochemically pure, isolated domains or domain variants of the Acidothermus cellulolyticus E1 endoglucanase protein, including the catalytic, linker and cellulose binding domains, and nucleic acids encoding them.

It is another object of the present invention to prepare hybrid or fusion endoglucanases, one part of which corresponds to a domain of the E1 endoglucanase such as the catalytic domain, the cellulose binding domain, the linker region, or combinations thereof.
It is a further object of the present invention to prepare hybrid or fusion proteins that bind to cellulose or other polysaccharide, and which comprise the E1 endoglucanase cellulose binding domain and another protein or polypeptide. The fusion proteins can be purified on a cellulose or other polysaccharide matrix. They are suitable for immobilization on the surface of cellulose or another polysaccharide, for labeling a cellulose or other polysaccharide surface, and for modifying the surface properties of a cellulose or polysaccharide matrix or fiber. Modification of cellulose or polysaccharide surface properties by the CBD alone or by CBD-fusion proteins can find application in the textile, pulp, paper, chemical and pharmaceutical industries.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 (SEQ ID NO: 1) shows the 3004 base pair nucleotide sequence of the region of Acidothermus cellulolyticus genomic DNA that contains the E1 endoglucanase gene.
Figure 2 (SEQ ID NO: 2) shows the amino acid translation of the open reading frame between bases 824 and 2512 in Fig. 1.
Figure 3 shows a schematic illustration of the domain architecture of the Acidothermus cellulolyticus E1 endoglucanase protein.
Figure 4 shows a schematic illustration of the putative transcriptional and translational regulatory sequences associated with the E1 endoglucanase gene, aligned with the nucleotide sequence coordinates of the E1 gene.
Figure 5 shows the regions remaining in the collection of deletion mutants of the original 2.2 kb E1 gene clone, and whether each remaining gene fragment expresses a protein with endoglucanase activity. This 2.2 kb fragment does not contain the entire E1 gene: it is C-terminally truncated (3' end missing) and lacks most of the coding sequence for the CBD.
Figure 6 shows an amino acid sequence comparison of the catalytic domains of three homologous family 5 endoglucanases from different bacteria, together with a consensus sequence (SEQ ID NOS: 5-9). The three endoglucanases are Bacillus polymyxa β-1,4-endoglucanase (SEQ ID NO: 3), Xanthomonas campestris β-1,4-endoglucanase A (SEQ ID NO: 4), and Acidothermus cellulolyticus E1 endoglucanase (SEQ ID NO: 2, residues 1-521).
Figure 7 shows an amino acid alignment identifying homologies among 28 family IIa cellulose binding domains of enzymes characterized from 12 bacteria (SEQ ID NOS: 2, 10-39).
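Alignments such as those of Figures 6 and 7 identify homologies by scanning the columns for residues shared by every sequence. The background discussion singles out conserved aromatic residues (tryptophan, tyrosine) and cysteines in CBDs. The Python sketch below performs that scan on a toy gapped alignment. The three short sequences are invented placeholders, not the actual SEQ ID NO sequences. A real consensus analysis would also tolerate partial conservation.

```python
AROMATIC = set("FWY")


def conserved_columns(alignment: list[str]) -> list[tuple[int, str]]:
    """Return (1-based column, residue) for columns identical in every sequence.

    Gap characters ('-') never count as conserved.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        if len(residues) == 1:
            (res,) = residues
            if res != "-":
                hits.append((col + 1, res))
    return hits


def flag_binding_candidates(alignment: list[str]):
    """Split conserved columns into aromatic positions and cysteine positions."""
    hits = conserved_columns(alignment)
    aromatic = [(c, r) for c, r in hits if r in AROMATIC]
    cysteine = [(c, r) for c, r in hits if r == "C"]
    return aromatic, cysteine


# Hypothetical toy alignment (not real CBD sequences)
TOY_ALIGNMENT = ["ACW-GTYNC", "SCWQGTYDC", "ACWAGSYNC"]
print(conserved_columns(TOY_ALIGNMENT))
print(flag_binding_candidates(TOY_ALIGNMENT))
```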
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The Acidothermus cellulolyticus E1 endoglucanase is a β-1,4-endoglucanase, or endocellulase. It preferentially hydrolyzes cellulose and, to some degree, xylan, and is hereafter referred to as E1 endoglucanase. The gene encoding the E1 endoglucanase lies within a 3004 base pair DNA fragment that is unique in nature and discretely defined.

The natural gene contains the following elements:
- a ribosome binding site, followed by three direct repeats of an 8 base sequence of unknown function;
- a 1686 base pair open reading frame, including the signal peptide coding sequence;
- a termination codon;
- a putative transcriptional terminator; and
- a putative upstream transcriptional regulatory sequence. This sequence shows homology to sequences found upstream of cellulase genes isolated from actinomycete bacteria (Lin and Wilson, 1988, J. Bacteriol. 170:3843-3846; Yablonsky et al., in Biochemistry and Genetics of Cellulose Degradation, Aubert, Beguin, Millet, Eds., Academic Press, New York, NY, 1988, pp. 249-266).
The cloned gene may be expressed in other microorganisms under its natural promoter or under another promoter recognized by the host microorganism.
Alternatively, additional copies of the gene may be introduced into Acidothermus cellulolyticus to enhance expression of the enzyme.

A nucleic acid encoding one or more domains or fragments of the Acidothermus cellulolyticus E1 endoglucanase may also be ligated to nucleic acid encoding domains or fragments from other compatible endoglucanases. The result is a novel recombinant nucleic acid capable of expressing a hybrid endoglucanase enzyme with beneficial properties from both endoglucanases or any portion thereof.

Further, a nucleic acid encoding the E1 endoglucanase CBD may be ligated to nucleic acid encoding other proteins or polypeptides. This produces a fusion protein that binds to cellulose or another polysaccharide, which permits purification or immobilization of the fusion protein, or modification of a cellulose or polysaccharide surface.
The 1686 base pair coding region of the E1 endoglucanase gene corresponds to 562 amino acids. The mature protein begins at amino acid residue 42 and is 521 amino acids long. Its amino terminus has been established by amino acid sequencing (Edman degradation). Presumably, the first 41 amino acids form a signal sequence that is cleaved to yield the active E1 endoglucanase enzyme. Alternatively, the coding sequence may begin at amino acid residue 14 (Met), resulting in a signal sequence of 28 residues.
The nucleotide and amino acid sequences are shown in Figures 1 and 2 (SEQ ID NOS: 1 and 2), respectively. The amino acid sequence deduced from the gene sequence indicates that the protein is architecturally similar to other cellulases. It is a multi-domain protein comprising three parts: an N-terminal catalytic domain, a linker region, and a C-terminal cellulose binding domain with a very characteristic amino acid sequence. The approximate enzyme architecture is shown in Figure 3.
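The residue numbering above can be checked with simple bookkeeping. The Python sketch below derives the segment lengths implied by the stated numbers (a 1686 bp coding region, the mature N-terminus at residue 42, and the alternative start at Met14); the helper name is illustrative. It also checks that the base 824-2512 span translated in Figure 2 equals the coding region plus a 3 bp stop codon, assuming those coordinates are inclusive and include the stop codon.

```python
def residue_bookkeeping(coding_bp: int, mature_start: int, alt_met: int) -> dict[str, int]:
    """Derive segment lengths (in residues) from 1-based residue coordinates."""
    if coding_bp % 3:
        raise ValueError("a coding region must be a whole number of codons")
    total = coding_bp // 3
    return {
        "total_residues": total,                        # codons in the coding region
        "signal_from_met1": mature_start - 1,           # residues 1 .. mature_start-1
        "signal_from_alt_met": mature_start - alt_met,  # residues alt_met .. mature_start-1
        "mature_residues": total - (mature_start - 1),  # residues mature_start .. total
    }


# Values from the text: 1686 bp coding region, mature start at 42, alternative Met14
print(residue_bookkeeping(1686, 42, 14))
# {'total_residues': 562, 'signal_from_met1': 41, 'signal_from_alt_met': 28, 'mature_residues': 521}

# Inclusive span translated in Figure 2 (bases 824-2512 of Figure 1)
orf_bp = 2512 - 824 + 1
print(orf_bp, orf_bp - 3)
# 1689 1686
```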
The subject invention comprises the native E1 endoglucanase protein (with or without signal sequences) and isolated domains thereof, including the signal sequences, catalytic domain, cellulose binding domain and linker domain. The subject invention also comprises nucleic acids that encode native E1 endoglucanase or domains thereof.
The Acidothermus cellulolyticus E1 endoglucanase gene was cloned using standard recombinant DNA techniques, as described below. Variations on these techniques are well known and may be used to reproduce the invention.
Alternatively, the DNA molecule of the present invention can be produced by any of a variety of other means, preferably by recombinant DNA techniques, the polymerase chain reaction (PCR), or chemical synthesis of the gene. Techniques for synthesizing such molecules are disclosed by, for example, Wu et al., Prog. Nucleic Acid Res. Mol. Biol. 21:101-141 (1978), and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor, NY (1989).
Standard reference works setting forth the general principles of recombinant DNA technology and cell biology include:
- Watson et al., Molecular Biology of the Gene, Volumes I and II, Benjamin/Cummings Publishing Co., Inc., Menlo Park, CA (1987);
- Darnell et al., Molecular Cell Biology, Scientific American Books, Inc., New York, NY (1986);
- Lewin, Genes II, John Wiley & Sons, New York, NY (1985);
- Old et al., Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2nd Ed., University of California Press, Berkeley, CA (1981);
- Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor, NY (1989); and
- Alberts et al., Molecular Biology of the Cell, 2nd Ed., Garland Publishing, Inc., New York, NY (1989).
Procedures for constructing recombinant molecules in accordance with the above-mentioned method are disclosed by Sambrook et al., supra. Briefly, a nucleic acid sequence encoding the E1 endoglucanase gene or a domain thereof may be recombined with vector nucleic acid; so may a recombinant hybrid nucleic acid containing the gene or domain. Conventional techniques for this include:
- the use of blunt-ended or cohesive termini for ligation;
- restriction enzyme digestion to provide appropriate termini;
- filling in of cohesive ends as appropriate;
- alkaline phosphatase treatment to avoid undesirable joining; and
- ligation with appropriate ligases.

Alternatively, part or all of the gene may be synthesized chemically in overlapping fragments, which are hybridized in groups and enzymatically ligated to form longer double-stranded DNA molecules. The resulting vector may then be introduced into a host cell by transformation, transfection, electroporation, etc. Techniques for introducing a vector into a variety of host cells are well known.
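The chemical-synthesis route described above depends on fragments whose ends overlap, so that each can anneal to its neighbor. The Python sketch below models only the splitting and reassembly logic, as a string-level stand-in on one strand. It does not simulate hybridization. The function names, fragment length and overlap are arbitrary assumptions. Practical designs also use both strands and balance melting temperatures across the overlaps.

```python
def split_overlapping(seq: str, frag_len: int, overlap: int) -> list[str]:
    """Cut seq into fragments of at most frag_len bases; consecutive
    fragments share `overlap` bases."""
    if not 0 <= overlap < frag_len:
        raise ValueError("require 0 <= overlap < frag_len")
    step = frag_len - overlap
    frags, start = [], 0
    while True:
        frags.append(seq[start:start + frag_len])
        if start + frag_len >= len(seq):
            return frags
        start += step


def reassemble(frags: list[str], overlap: int) -> str:
    """Join fragments, checking that each fragment's first `overlap` bases
    match the last `overlap` bases assembled so far."""
    if not frags:
        return ""
    out = frags[0]
    for frag in frags[1:]:
        if overlap and out[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        out += frag[overlap:]
    return out


# Hypothetical 26-base target cut into 10-base fragments overlapping by 4 bases
target = "ATGGCTAGCAAGGAGGTACCGATTAA"
pieces = split_overlapping(target, frag_len=10, overlap=4)
print(pieces)
print(reassemble(pieces, overlap=4) == target)
```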
A vector is a DNA molecule, derived from a plasmid, bacteriophage or hybrid, into which fragments of DNA may be inserted or cloned. A vector will contain one or more unique restriction sites, and may be capable of autonomous replication or integration into the genome of a defined host or vehicle organism such that the cloned sequence is genetically reproducible and can be passed from one generation to the next, indefinitely.
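A restriction site is most useful for cloning when it occurs exactly once in the vector, so that cutting opens the molecule at a single defined position. The Python sketch below scans a linear sequence for three standard recognition sites and reports those that occur once: EcoRI GAATTC, BamHI GGATCC and HindIII AAGCTT. The test sequence and function names are hypothetical. All three sites are palindromic, so scanning one strand suffices. Non-palindromic sites would also require a reverse-complement scan, and circular plasmids would need wrap-around handling.

```python
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def unique_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, int]:
    """Map enzyme name -> position for recognition sites found exactly once."""
    result = {}
    for name, motif in sites.items():
        positions = site_positions(seq, motif)
        if len(positions) == 1:
            result[name] = positions[0]
    return result


# Hypothetical linear fragment: one EcoRI site, two BamHI sites, no HindIII site
fragment = "TTGAATTCAAGGATCCTTGGATCCAA"
print(unique_sites(fragment))
```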
Another embodiment of the present invention relates specifically to the native 3004 nucleotide DNA sequence encoding the Acidothermus cellulolyticus E1 endoglucanase enzyme, together with its flanking sequences. This DNA encodes the amino acid sequence shown in Figure 2 (SEQ ID NO: 2). The molecular weight of the protein deduced from the amino acid sequence is 60,648 daltons, including a putative 41 amino acid signal peptide (or, alternatively, a 548 amino acid protein including a 28 amino acid signal peptide).

Other DNA sequences encoding the same 562 amino acids may readily be used, because most amino acids are encoded by more than one DNA triplet codon. The gene encoding the Acidothermus cellulolyticus E1 endoglucanase may therefore be any DNA that encodes the amino acid sequence of Fig. 2 (SEQ ID NO: 2). The mature E1 protein comprises 521 amino acids, with a predicted molecular weight of 56,415 daltons.
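A molecular weight deduced from an amino acid sequence, as for the 60,648 and 56,415 dalton figures above, is the sum of average residue masses plus one water for the free termini. The Python sketch below uses standard average residue masses. The peptide in the example is a placeholder, not the E1 sequence. Tools differ slightly in their mass tables, so this sketch should not be expected to reproduce the quoted values to the dalton.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # average mass of H2O, added back for the free N- and C-termini


def average_mw(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    protein = protein.upper()
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not protein:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


# Hypothetical placeholder peptide
peptide = "MKAWLLGTVC"
print(f"{average_mw(peptide):.2f} Da ({average_mw(peptide) / 1000:.3f} kDa)")
```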
An expression vector may also be used as the vehicle to clone the E1 endoglucanase gene or a domain or fusion protein derivative thereof. In that case, the host cell directs expression of the cloned E1 endoglucanase coding sequence. Expression may be driven by a heterologous promoter sequence that turns expression on or off under defined conditions, or it may be constitutive. The protein may be separated, purified and assayed, or assayed directly in crude host cell homogenates or culture medium.
An expression vector is any DNA element capable of replicating in a host cell independently of the host's chromosome, and which can control the expression of a coding sequence inserted into it at specific locations and in a particular orientation. Such DNA expression vectors include bacterial plasmids and phages and typically include well-characterized promoter sequences to facilitate gene transcription.
The DNA is said to be capable of expressing a polypeptide if it contains nucleotide sequences which contain signals for transcriptional and translational initiation, and such sequences are operably linked to nucleotide sequences which encode the polypeptide. An operable linkage is a linkage in which the signals for transcriptional and translational initiation and the DNA sequence sought to be expressed are connected in such a way as to permit gene expression. In particular, the transcriptional and translational initiation signals are located 5' and ligated in-frame to the polypeptide encoding open reading frame, so that the open reading frame is correctly translated into the desired polypeptide.
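The operable-linkage requirement amounts to reading the coding sequence in the correct frame, from its start codon through to a stop codon. The Python sketch below performs a minimal check of this under the standard genetic code. It assumes an ATG start and ignores alternative bacterial start codons such as GTG and TTG. The helper names and example sequences are illustrative, and characters other than A, C, G and T raise a KeyError.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf: str) -> str:
    """Translate complete codons from the first base (frame +1)."""
    orf = orf.upper()
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - len(orf) % 3, 3))


def is_intact_orf(orf: str) -> bool:
    """True if orf starts with ATG, is a whole number of codons, and has exactly
    one stop codon, located at the end (an internal stop would truncate the protein)."""
    orf = orf.upper()
    if len(orf) < 6 or len(orf) % 3 != 0 or not orf.startswith("ATG"):
        return False
    protein = translate(orf)
    return protein.endswith("*") and "*" not in protein[:-1]


print(translate("ATGGCTTAA"), is_intact_orf("ATGGCTTAA"))
print(is_intact_orf("ATGTAAGCTTAA"))  # internal stop after Met
print(is_intact_orf("AATGGCTTAA"))    # not a whole number of codons
```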
The precise nature of the signals required for gene expression varies from organism to organism.
The native promoter of the Acidothermus cellulolyticus E1 endoglucanase gene may not be functional or efficient in certain microbial hosts, especially hosts from distantly related phylogenetic groups such as yeasts and fungi. In such a situation, a suitable promoter region of DNA may be ligated upstream of the E1 endoglucanase coding sequence to control its expression. Regulatory sequences may also be included alongside the promoter to modulate the level and/or timing of gene expression.
The promoter, and therefore expression, may be controlled by an inducer or a repressor so that the recipient microorganism expresses the gene(s) only when desired.
A promoter is a transcriptional initiation site with an RNA polymerase binding site and possibly a regulatory region. It determines the precise location of the transcription start site in the gene and the relative strength of initiation of RNA transcription.
Downstream DNA sequences, when transcribed into RNA, will signal the initiation of protein synthesis by the incorporation of a ribosome binding sequence (RBS).
Bacterial promoter regions normally include the 5' non-coding sequences involved in initiating transcription and translation, such as the -10 and -35 sequences, the RBS, or other sequences with a similar function. Other sequences that influence gene expression are also considered regulatory sequences. In practice, the distinction may be blurred, because the two regions may overlap. These sequences may be the natural sequences from the Acidothermus cellulolyticus E1 endoglucanase gene, may be taken from other genes, may be synthetic, or may be a combination of these.
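A common first pass at locating -35 and -10 elements is to search for the E. coli σ70 consensus hexamers, TTGACA and TATAAT, separated by a spacer of roughly 15 to 19 bp. The Python sketch below is deliberately simplistic. The mismatch allowance, spacer range and test sequence are assumptions, and real promoter prediction uses position weight matrices rather than exact consensus matching.

```python
MINUS35 = "TTGACA"  # E. coli sigma-70 -35 consensus
MINUS10 = "TATAAT"  # E. coli sigma-70 -10 consensus


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like(seq: str, max_mm: int = 1, spacer_range=range(15, 20)):
    """Return (pos35, pos10, spacer, total_mismatches) for candidate promoter pairs.

    Positions are 0-based starts of each hexamer; spacer is the number of bases
    between the end of the -35 box and the start of the -10 box.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for spacer in spacer_range:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                break  # ran off the end; longer spacers will too
            mm10 = mismatches(box10, MINUS10)
            if mm10 <= max_mm:
                hits.append((i, j, spacer, mm35 + mm10))
    return hits


# Hypothetical test sequence with consensus boxes 17 bp apart
demo = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGC"
print(find_sigma70_like(demo))
```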
If desired, the non-coding region 3' to the sequence coding for E1 endoglucanase may be obtained by the above-described methods and used for its transcriptional termination regulatory sequences. Retaining the 3' region naturally contiguous to the coding sequence thus provides the transcriptional termination signals. Where these signals do not function satisfactorily in the host cell expression system, a 3' region functional in the host cell may be substituted.
Transcriptional terminators are characterized by large inverted repeat sequences, which can form extensive stem-loop secondary structures in DNA or RNA (Stryer, L., Biochemistry, 3rd Ed., W.H. Freeman & Co., New York, NY, 1995, pp. 710-711).
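Terminators are recognized by inverted repeats that can fold into a stem-loop. A simple first screen is therefore to look for a stem sequence whose reverse complement appears a short distance downstream. The Python sketch below is a naive exact-match search. The stem length, loop size limits and test sequence are assumptions. The sketch ignores G·U pairing, mismatches in the stem, the downstream T-tract typical of bacterial intrinsic terminators, and folding free energy.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_stem_loops(seq: str, stem: int = 8, min_loop: int = 3, max_loop: int = 8):
    """Return (stem_start, loop_len) for each exact inverted repeat found.

    A hit means seq[i:i+stem] is followed, after loop_len bases, by its
    reverse complement, so the two arms could pair to form a hairpin.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i, loop))
    return hits


# Hypothetical hairpin: GC-rich 8 bp stem, 4-base loop, non-pairing flanks
hairpin = "CC" + "GCCCGCCA" + "TTTT" + "TGGCGGGC" + "AAAAAA"
print(find_stem_loops(hairpin))
```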
A variety of hosts may be used to express the E1 endoglucanase gene, or a domain or fusion protein derivative thereof. These include most bacteria, yeasts, fungi and algae, as well as various animal and plant host systems. Organisms capable of secreting large amounts of protein into the external environment would make ideal hosts for cellulase gene expression.
If the host cell is a bacterium, a bacterial promoter and regulatory system will generally be used. For a typical bacterium such as E. coli, representative well-known promoters include trc, lac, tac, trp, bacteriophage lambda PL, the T7 RNA polymerase promoter, etc. When the expression system is a yeast, well-known promoters include GAL1/GAL10, alcohol dehydrogenase (ADH), HIS3, CYC1, and the methanol-inducible AOX1 (alcohol oxidase 1) promoter of Pichia pastoris.
Well-known fungal promoters include the glucose catabolite-repressed CBHI (cellobiohydrolase I) promoter and several Aspergillus promoters, such as:
- the glaA (glucoamylase) promoter from A. niger;
- the starch-inducible A. niger promoter;
- the alcA (alcohol dehydrogenase I) promoter from A. nidulans;
- the adhA (alcohol dehydrogenase I) promoter from A. niger;
- the gpdA (glyceraldehyde-3-phosphate dehydrogenase) promoter; and
- the tpi (triose phosphate isomerase) promoter.
Baculoviral promoters can also be used in insect cell and larval systems.
Further, promoters derived from transgenic mammalian lactation systems (e.g., cows, goats, pigs, mice) or transgenic plant systems may be used. For eukaryotic hosts, enhancers such as the yeast Ty enhancer, or the tobacco mosaic viral translational enhancer may be used.
Alternatively, the E1 endoglucanase gene, domain or fusion protein derivative may need to be expressed only at a particular time, such as after the culture or host organism has reached maturity. In that case an externally regulated promoter is particularly useful. Examples include promoters regulated by:
- the nutritional content of the medium (e.g., lac, trp, his);
- temperature (e.g., temperature-sensitive regulatory elements);
- heat shock (e.g., HSP80A, U.S. Patent 5,187,267);
- stress response (e.g., the plant EF1A promoter, U.S. Patent 5,177,011);
- chemical induction (e.g., tetracycline-inducible or salicylate-inducible promoters, U.S. Patent 5,057,422); and
- light.
Other suitable hosts for expressing E1 endoglucanase include Trichoderma, Aspergillus, Fusarium, Penicillium, Bacillus, Xanthomonas and Zymomonas. Each of these hosts may also serve as a source of endoglucanase genes for constructing mixed-domain genes that produce hybrid enzymes. Further suitable hosts for expressing E1 endoglucanase include insect cells, mammalian lactation systems and crop plants.

Expression of the native E1 endoglucanase gene has been demonstrated in E. coli and Streptomyces lividans. The native mature E1 protein has been expressed in Pichia pastoris using a host-specific promoter.

E1 endoglucanase has also been expressed in E. coli under control of a T7 bacteriophage promoter, and could be expressed from other promoters recognized by E. coli. With the constructs of the present invention, expression in E. coli has been enhanced by at least a factor of five relative to the native gene.

In S. lividans, the E1 endoglucanase coding sequence has also been expressed under control of the tipA promoter (thiostrepton-inducible) and the STI-II promoter (trypsin inhibitor; Himmel, M. et al., in press, in Fuels and Chemicals from Biomass, J. Woodward & B. Saha, Eds., ACS Series 666, American Chemical Society, Washington, D.C., 1997).

In P. pastoris, the E1 endoglucanase DNA sequence encoding the mature enzyme was expressed using the AOX1 (alcohol oxidase 1) promoter and signal sequence. This recombinant yeast system has been shown to synthesize at least 1.5 grams of functional E1 endoglucanase per liter of culture grown in fed-batch mode with methanol as the carbon source. This level of production represents approximately a 105-fold increase in expression level relative to the A. cellulolyticus strain from which the E1 endoglucanase gene was originally cloned.
Intact native, variant or hybrid E1 endoglucanase proteins can be efficiently synthesized in bacteria by providing a strong promoter and an acceptable ribosome binding site. Levels of expression may vary from less than 1% to more than 30% of total cell protein.
Chemical derivatives of the E1 endoglucanase native protein, E1 fusion protein or E1 domain, or of nucleic acids encoding them, are included within the invention.
Examples of chemical derivatives include, but are not limited to: labels attached to the molecule (e.g., dyes (colored or fluorescent), magnetic or other particles, radionuclides, enzymes, antibodies, enzyme inhibitors, cofactors or substrates, etc.), and chemical modification (methylation, acylation, thiolation, chemical modification of a base or amino acid, etc.).
The chemical moiety can be conjugated to the E1 endoglucanase domain or fusion protein by methods known in the art, such as covalent modification and non-covalent bonding (ionic bonding, hydrogen bonding, hydrophobic bonding, etc.). See, e.g., U.S. Pat. No. 5,340,731.
The nucleotide sequence may be altered to optimize the sequence for expression in a given host. Different organisms have different codon preferences, as has been reported previously. Codon usage may affect expression levels in host organisms.
Furthermore, the nucleotide sequence may be altered to give the mRNA a preferred three-dimensional configuration that enhances messenger RNA stability, ribosome binding affinity and expression level. Alternatively, the change can be made to enhance production of active enzyme, such as changing internal amino acids to permit cleavage of E1 endoglucanase from a fusion peptide, or to add or remove a site for various proteases (Oike, Y., et al., J. Biol. Chem. 257:9751-9758 (1982); Liu, C., et al., Int. J. Pept. Protein Res. 21:209-215 (1983)). It should be noted that separation of E1 endoglucanase from a leader sequence is not necessary, provided that the E1 endoglucanase activity is acceptable.
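Codon-usage differences between the source gene and a host can be examined by tabulating, for each amino acid, how often each synonymous codon is used. The sketch below is illustrative only: it carries a partial genetic code (alanine, glycine and stop codons) so it stays short, and comparing its output with a host's published codon-usage table is left out.

```python
from collections import Counter

# Partial standard genetic code: only alanine, glycine and stop codons,
# enough for the example. A real analysis needs all 64 codons.
PARTIAL_CODE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons, ignoring a partial tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def synonymous_fractions(cds: str) -> dict[str, dict[str, float]]:
    """For each amino acid, the fraction of its codons that use each triplet."""
    counts = Counter(codons(cds))
    totals: Counter = Counter()
    for codon, n in counts.items():
        totals[PARTIAL_CODE.get(codon, "?")] += n  # "?" = codon not in table
    result: dict[str, dict[str, float]] = {}
    for codon, n in counts.items():
        aa = PARTIAL_CODE.get(codon, "?")
        result.setdefault(aa, {})[codon] = n / totals[aa]
    return result
```

A codon that the host rarely uses but that dominates the source gene's usage for its amino acid is one candidate for silent replacement.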
Changes to the sequence such as insertions, deletions and site specific mutations can be made by random chemical or radiation induced mutagenesis, restriction endonuclease cleavage to create deletions and insertions, transposon or viral insertion, oligonucleotide-directed site specific mutagenesis, "sloppy" PCR, or by standard site specific mutagenesis techniques as taught in Botstein et al., Science, 229:193-(1985).
Such changes may be made in the present invention in order to alter the enzymatic activity, render the enzyme more susceptible or resistant to changes in temperature, pH or chemical environment, alter regulation of the E1 endoglucanase gene, alter mRNA or protein stability (half-life), or optimize gene expression in a given host. These changes may be either random or targeted to a particular portion of the E1 endoglucanase molecule believed to be involved with a particular function. To further enhance expression, the final host organism may be mutated to alter its gene regulation or its production of the E1 endoglucanase gene product.
Nucleotide sequence changes may be conservative and not alter the amino acid sequence. Such changes could be performed to change expression of the gene or the ability to easily manipulate the gene. Nucleotide sequence changes resulting in amino acid substitutions, insertions or deletions are generally performed to alter the enzyme product to impart different biological properties, enhance expression or secretion or for simplifying purification or gene manipulation. Changes in the DNA sequence outside the coding region may also be made to optimize expression of the gene or to improve the ease of DNA manipulation.
The natural amino acid sequence is believed to contain a signal region and three domains corresponding as follows:
Key      From  To   Description
SIGNAL   1     41   Putative signal peptide
SIGNAL   14    41   Putative signal peptide (alternative)
DOMAIN   42    404  Catalytic domain
DOMAIN   405   460  Linker
DOMAIN   461   562  Cellulose binding domain
Figure 2 (SEQ ID NO: 2) shows the amino acid translation of the open reading frame between bases 824 and 2512 in Fig. 1 (SEQ ID NO: 1). The N-terminal amino acid sequence determined by automated Edman degradation and/or tryptic fragment sequencing of native purified E1 endoglucanase corresponds to the boxed amino acids 42 to 79 of Figure 2. Thus, the mature N terminus of the E1 endoglucanase begins at residue 42. Amino acids upstream of the first boxed sequence constitute the putative signal sequence. Underlined amino acids correspond to the linker peptide, which links the amino-terminal catalytic domain to the carboxy-terminal CBD.
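Residue numbers in Fig. 2 and base positions in Fig. 1 are related by simple arithmetic, assuming residue 1 is encoded by bases 824-826 (the start of the open reading frame given above) and that both numberings are 1-based and inclusive. The sketch below checks the coordinates quoted in this description; the 562-residue length is taken from the Fig. 3 numbering.

```python
# Sketch: map residue numbers (Fig. 2) to nucleotide coordinates (Fig. 1),
# assuming the open reading frame starts at base 824 and numbering is
# 1-based and inclusive.
ORF_START = 824  # first base of the GTG start codon


def codon_span(residue: int, orf_start: int = ORF_START) -> tuple[int, int]:
    """Return the (first, last) base of the codon encoding `residue`."""
    if residue < 1:
        raise ValueError("residue numbers are 1-based")
    first = orf_start + 3 * (residue - 1)
    return first, first + 2


def stop_codon_span(n_residues: int, orf_start: int = ORF_START) -> tuple[int, int]:
    """Return the span of the stop codon that follows the last residue."""
    return codon_span(n_residues + 1, orf_start)


if __name__ == "__main__":
    print("end of signal peptide (residue 41):", codon_span(41))  # (944, 946)
    print("mature N terminus (residue 42):", codon_span(42))      # (947, 949)
    print("alternative Met start (residue 14):", codon_span(14))  # (863, 865)
    print("N-terminal peptide 42-79:", codon_span(42)[0], codon_span(79)[1])  # 947 1060
    print("stop codon after residue 562:", stop_codon_span(562))  # (2510, 2512)
```

The results agree with the 123-bp signal region (824-946), the methionine at nucleotide 863, the 947-1060 peptide span, and an ORF ending at base 2512.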
Figure 3 shows a schematic illustration of the domain architecture of the Acidothermus cellulolyticus E1 endoglucanase protein. This Figure includes the relative locations of the catalytic, linker, and cellulose binding domains aligned with the amino acid residues numbered 1-562 from the N terminus. Each domain is drawn in proportion to the number of amino acids it contains.
Figure 6 sets forth an amino acid comparison between the catalytic domains of three family 5 endoglucanases: Bacillus polymyxa β-1,4-endoglucanase (GUN_BACPO, SEQ ID NO: 3; Swiss-Prot Accession # P23548), Xanthomonas campestris β-1,4-endoglucanase A (GUNA_XANCP CAT, SEQ ID NO: 4; Swiss-Prot Accession # P19487), and Acidothermus cellulolyticus E1 endoglucanase (E1 Cat Domain, SEQ ID NO: 2, amino acids 1-521), together with a consensus sequence (100% level, SEQ ID NOS: 5-9).
Figure 7 shows an amino acid alignment identifying homologies among 28 family IIa cellulose binding domains (Tomme, P. et al., in Enzymatic Degradation of Insoluble Carbohydrates, Saddler, J.N. & Penner, M.H., Eds., ACS Symposium Series 618:142-163, 1995, American Chemical Society, Washington, D.C.; Tomme, P. et al., 1995, Adv. Microbiol. Physiol. 37:1-80) of enzymes characterized from 12 bacteria.
Abbreviations are as follows:
GUN1_ACICE_CBDcore: Acidothermus cellulolyticus E1 endoglucanase CBD core (residues 432-509 of SEQ ID NO: 2)
GUN1_BUTFI_CBDcore: Butyrivibrio fibrisolvens endoglucanase 1 CBD core (SEQ ID NO: 24)
GUNA_CELFI_CBDcore: Cellulomonas fimi endoglucanase A CBD core (SEQ ID NO: 12)
GUNB_CELFI_CBDcore: Cellulomonas fimi endoglucanase B CBD core (SEQ ID NO: 15)
CfiCenD_CBDcore: Cellulomonas fimi endoglucanase D CBD core (SEQ ID NO: 30)
CfiCbhA_CBDcore: Cellulomonas fimi cellobiohydrolase A CBD core (SEQ ID NO: 28)
CfiCbhB_CBDcore: Cellulomonas fimi cellobiohydrolase B CBD core (SEQ ID NO: 31)
GUX_CELFI_CBDcore: Cellulomonas fimi Cex (xylanase) CBD core (SEQ ID NO: 32)
CflX_CBDcore: Cellulomonas flavigena endoglucanase X CBD core (SEQ ID NO: 29)
GUND_CLOCL_CBDcore: Clostridium cellulovorans endoglucanase A CBD core (SEQ ID NO: 13)
GUNA_MICBI_CBDcore: Microbispora bispora endoglucanase A CBD core (SEQ ID NO: 10)
MceMcenA_CBDcore: Micromonospora cellulolyticum endoglucanase A CBD core (SEQ ID NO: 35)
GUNA_PSEFL_CBDcore: Pseudomonas fluorescens endoglucanase A CBD core (SEQ ID NO: 20)
GUNB_PSEFL_CBDcore: Pseudomonas fluorescens endoglucanase B CBD core (SEQ ID NO: 21)
GUNC_PSEFL_CBDcore: Pseudomonas fluorescens endoglucanase C CBD core (SEQ ID NO: 19)
XYNA_PSEFL_CBDcore: Pseudomonas fluorescens xylanase A CBD core (SEQ ID NO: 18)
XYNB_PSEFL_CBDcore: Pseudomonas fluorescens xylanase B CBD core (SEQ ID NO: 17)
XYNC_PSEFL_CBDcore: Pseudomonas fluorescens xylanase C CBD core (SEQ ID NO: 16)
GUNA_STRLI_CBDcore: Streptomyces lividans endoglucanase A CBD core (SEQ ID NO: 34)
SliCelB_CBDcore: Streptomyces lividans endoglucanase B CBD core (SEQ ID NO: 25)
CHIT_STRLI_CBDcore: Streptomyces lividans chitinase CBD core (SEQ ID NO: 23)
CHIT_STRPL_CBDcore: Streptomyces plicatus chitinase CBD core (SEQ ID NO: 22)
SroEglS_CBDcore: Streptomyces rochei endoglucanase S CBD core (SEQ ID NO: 26)
TfuE1_CBDcore: Thermomonospora fusca endoglucanase E1 CBD core (SEQ ID NO: 14)
GUN2_THEFU_CBDcore: Thermomonospora fusca endoglucanase E2 CBD core (SEQ ID NO: 11)
TfuE4_CBDcore: Thermomonospora fusca exoglucanase E4 CBD core (SEQ ID NO: 33)
GUN5_THEFU_CBDcore: Thermomonospora fusca endoglucanase E5 CBD core (SEQ ID NO: 27)

A "variant" of E1 endoglucanase refers to an altered E1 amino acid sequence that substantially retains one or more of the biological functions of the native E1 endoglucanase and has at least about 25% amino acid similarity with the E1 amino acid sequence of Figure 2. A "similarity" of at least 25% between the E1 endoglucanase amino acid sequence and a second polypeptide refers to at least 25% amino acid identity or conservative substitutions (as defined in Example 5, hereinbelow) when the alignment maximizes identical or similar residues. Amino acid similarity can be determined using GeneWorks™ version 2.5 (Oxford Molecular Group, Inc., formerly Intelligenetics, Inc., Menlo Park, CA) with the CLASSIC alignment option, which is based on an algorithm similar to FASTA (see references in GeneWorks Reference Manual, p. 255). Penalties for scoring alignments are as follows: cost to open a gap=5;
cost to lengthen a gap=25; minimum diagonal length=4; maximum diagonal offset=20; and consensus cutoff=50%. Variants also include those polypeptides having amino acid similarities of at least 30%, 40%, 50%, 60%, 70%, 80%, 90% and 95%. "Substantial" retention of a biological function means retention of at least about 10%, and preferably at least 25% or 50%, of the native E1 endoglucanase function or activity.
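Once two sequences have been aligned, the similarity figure is a count of identical or conservatively substituted positions. The sketch below assumes the alignment already exists (gaps written as "-"); it does not reproduce the GeneWorks CLASSIC aligner or its penalties. The conservative-substitution groups are a hypothetical stand-in, not the definition in Example 5, and dividing by the ungapped length of the reference (E1) sequence is likewise an assumption.

```python
# Sketch: percent similarity of two PRE-ALIGNED sequences ('-' = gap).
# CONSERVATIVE_GROUPS is a hypothetical grouping for illustration only;
# the document's own definition is in Example 5.
CONSERVATIVE_GROUPS = [
    set("AGST"), set("DENQ"), set("KRH"), set("ILMV"), set("FWY"),
]


def is_similar(a: str, b: str) -> bool:
    """Identical residues, or residues sharing an assumed conservative group."""
    if a == b:
        return True
    return any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def percent_similarity(ref: str, other: str) -> float:
    """Similar positions as a percentage of the ungapped reference length."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for c in ref if c != "-")
    hits = sum(
        1 for a, b in zip(ref, other)
        if a != "-" and b != "-" and is_similar(a, b)
    )
    return 100.0 * hits / ref_len
```

A candidate would then be screened against the 25% threshold (or 30%, 40%, and so on) with a simple comparison such as `percent_similarity(e1_aligned, candidate_aligned) >= 25.0`, where both names are hypothetical aligned strings.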
An E1 endoglucanase "domain variant" as used herein refers to polypeptides that have at least about 25% amino acid similarity to the relevant E1 endoglucanase domain.
Variants also include those sequences having amino acid similarities of at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% relative to the relevant E1 domain.
In addition, the domain variant preferably conserves identically those amino acid residues that are critical to the biological activity of the relevant E1 endoglucanase domain. For the A. cellulolyticus E1 endoglucanase cellulose binding domain, the critical residues are Trp475, Gly478, Trp496, Trp512, Tyr/Trp530 and Gly543 of Fig. 2, or equivalently Trp15, Gly18, Trp36, Trp52, Tyr/Trp70 and Gly83 of the Fig. 7 GUN1_ACICE CBD sequence (residues 432-509 of SEQ ID NO: 2).
Further, the spacing between these conserved residues is important. The preferred spacing is TrpX2(X)GlyX17TrpX14(X)(X)TrpX16(X)[Tyr/Trp]X12(X)Gly, where X is any amino acid, Xn denotes n consecutive such residues, and parentheses indicate an optional amino acid at that position. Multiple residues within brackets indicate that any one of the listed residues can be present at the position. Overall, the critical residues together with the remaining residues of the CBD domain variant must have at least 25% similarity to the GUN1_ACICE CBD sequence.
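The spacing rule for the critical CBD residues translates directly into a regular expression over one-letter amino acid codes: "X2(X)" becomes `.{2,3}`, "X14(X)(X)" becomes `.{14,16}`, and "[Tyr/Trp]" becomes `[YW]`. This sketch takes the Gly-to-Trp gap as exactly 17 residues, the spacing between Gly18 and Trp36 in the E1 CBD. The helper `synthetic_cbd` is hypothetical and builds a filler-only test string, not a real CBD sequence.

```python
import re

# Sketch: critical-residue spacing of the E1 CBD as a regex
# (W = Trp, G = Gly, [YW] = Tyr or Trp, "." = any residue).
CBD_SPACING = re.compile(r"W.{2,3}G.{17}W.{14,16}W.{16,17}[YW].{12,13}G")


def find_cbd_motif(seq: str):
    """Return (start, end) of the first match (0-based, end-exclusive) or None."""
    m = CBD_SPACING.search(seq.upper())
    return m.span() if m else None


def synthetic_cbd(gaps=(2, 17, 15, 17, 12), aromatic="Y", filler="A") -> str:
    """Hypothetical test string with anchor residues at the given spacings.

    The default gaps mirror the E1 positions Trp15, Gly18, Trp36, Trp52,
    Tyr70 and Gly83 (Fig. 7 numbering); every other position is filler.
    """
    anchors = ["W", "G", "W", "W", aromatic, "G"]
    parts = [anchors[0]]
    for gap, anchor in zip(gaps, anchors[1:]):
        parts.append(filler * gap + anchor)
    return "".join(parts)
```

A sequence that passes this filter still has to meet the separate 25% overall similarity requirement; the regex checks only the anchor residues and their spacing.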
The domain variant can also retain at least about 10%, and more preferably 25%, of at least one of the biological functions of the relevant E1 endoglucanase domain. Alternatively, the domain variant can retain at least about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 150%, 200%, 300%, etc. of a biological function of the relevant endoglucanase domain. The biological functions or activities of the catalytic domain are the hydrolysis of β-1,4-glucan (cellulose), β-1,4-xylan (xylan) or α-1,4-glucan (starch) substrates. The biological functions or activities of the cellulose binding domain are its binding to cellulose and to other polysaccharides such as xylan and starch.
The effect of a particular amino acid substitution in a domain variant can be determined by methods known in the art; see, for example, Linder et al., 1995, Prot. Sci. 4:1056-64; and Reinikainen et al., 1992, Proteins: Structure, Function and Genetics, 14:475-82.
The binding of the E1 endoglucanase cellulose binding domain or its variants to cellulose or other polysaccharides can be measured as the relative equilibrium association constant, Kr, using bacterial microcrystalline cellulose (BMCC) or another appropriate polysaccharide and the Langmuir adsorption isotherm model as described in Gilkes, N. et al. (1992, J. Biol. Chem. 267:6743-49). Alternatively, the binding of E1 endoglucanase CBD to a polysaccharide may be measured as the equilibrium association constant, Ka, using a standard Scatchard plot. The Langmuir adsorption isotherm model is more appropriate where the substrate has overlapping binding sites, which produce an upward-concave Scatchard plot. The cellulose binding domain variant can have a Kr or Ka of at least about 10% of that of the E1 endoglucanase CBD; a CBD variant Kr or Ka of at least about 25% is more preferred. Alternatively, the CBD variant can have a Kr or Ka of at least about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 150%, 200%, 300%, etc. of that of the E1 endoglucanase CBD.
Without wishing to be bound by theory, it is believed that the E1 endoglucanase CBD is roughly wedge-shaped. As the wedge binds to cellulose, its tip is believed to insert between microfibrils of the cellulose fiber, disrupting the crystalline structure and making cellulose linkages more accessible to the catalytic domain. This model is consistent with the structure of the CBD of the filamentous fungus T. reesei as determined by NMR spectroscopy (Hoffren, A. et al., 1995, Prot. Eng. 8:443-450) and with the C. fimi CBD structure, also determined by NMR (Warren, R., 1996, ASM News 62:85-88). By analogy to the Cex CBD structure, one of the E1 CBD wedge faces is believed to expose four aromatic residues at its surface for interaction with the cellulose or other polysaccharide substrate. These residues, Trp15, Trp36, Trp52 and Tyr70 of Fig. 7, are believed to be critical for E1 CBD binding to cellulose.
When binding to cellulose, the polar aromatic Tyr residue is believed to hydrogen bond with aliphatic hydroxyl groups accessible on the substrate surface, and the delocalized π electrons of the aromatic Trp and Tyr residues may become polarized and also interact with cellulose sugars. In E1 CBD domain variants, these four residues and the spacing between them are highly conserved.
For the purposes of this application, the terms "hybrid enzyme," "hybrid protein," "fusion enzyme" and "fusion protein" include all proteins or polypeptides having at least one domain or fragment originating substantially from the E1 endoglucanase protein fused to another domain or fragment originating substantially from at least one different protein, and any multi-subunit proteins containing same. The signal sequences and the catalytic, linking and cellulose binding regions may be considered E1 domains.
Hybrid or fusion enzymes of E1 endoglucanase may be prepared by ligating nucleic acid encoding one or more E1 endoglucanase domains to one or more domains from one or more different cellulase genes. Representative examples of other cellulase genes that may be used are Bacillus polymyxa β-1,4-endoglucanase (Baud et al., Journal of Bacteriology, 172:1576-86 (1992)) and Xanthomonas campestris β-1,4-endoglucanase A (Gough et al., Gene, 89:53-59 (1990)). The number of domains in the hybrid protein may be the same as or different from that of any natural enzyme. A large number of different combinations are possible, as many cellulases have now been cloned and sequenced.

It is further contemplated that more than one catalytic domain may be included in the hybrid enzyme. This may result in increased specific activity and/or altered functionality. A catalytic domain with a cellulase activity other than endoglucanase activity may also be included, to reduce the number of cellulase enzymes that must be added to a cellulosic substrate for polymer degradation.
Hybrid or fusion proteins can also be produced by ligating a nucleic acid encoding one or more E1 endoglucanase domains to one or more domains of a non-cellulase gene.
Such non-cellulase genes or domains can include enzymes, antibodies, drugs, hormones and the like. The domains in a fusion protein can be joined by amino acids that provide a site for later cleavage to facilitate downstream process steps such as purification. For example, the linking amino acids can encode a specific proteolytic site, or a site susceptible to chemical cleavage (e.g., by cyanogen bromide or acid).
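Before choosing a cleavable linker, it helps to list every position at which the chosen reagent would cut, because an internal susceptible residue fragments the product as well. The sketch below marks cyanogen bromide sites (C-terminal to every methionine) and sites for a protease motif; factor Xa's Ile-Glu-Gly-Arg recognition sequence (cleavage after the Arg) is used only as an assumed example, since no specific protease is named here, and the fusion sequences are hypothetical.

```python
# Sketch: predict cleavage points in a hypothetical fusion protein.


def cnbr_cut_sites(protein: str) -> list[int]:
    """0-based indices of residues after which CNBr would cut (every Met)."""
    return [i for i, aa in enumerate(protein.upper()) if aa == "M"]


def motif_cut_sites(protein: str, motif: str) -> list[int]:
    """0-based index of the last residue of each (possibly overlapping) hit."""
    protein, motif = protein.upper(), motif.upper()
    sites, start = [], protein.find(motif)
    while start != -1:
        sites.append(start + len(motif) - 1)
        start = protein.find(motif, start + 1)
    return sites


def fragments(protein: str, cut_after: list[int]) -> list[str]:
    """Split `protein` after each listed index; empty pieces are dropped."""
    pieces, prev = [], 0
    for idx in sorted(set(cut_after)):
        pieces.append(protein[prev:idx + 1])
        prev = idx + 1
    pieces.append(protein[prev:])
    return [p for p in pieces if p]


FACTOR_XA_SITE = "IEGR"  # assumed example linker: Ile-Glu-Gly-Arg

if __name__ == "__main__":
    fusion = "AWTGY" + FACTOR_XA_SITE + "SAKLV"  # hypothetical domain-linker-domain
    print(fragments(fusion, motif_cut_sites(fusion, FACTOR_XA_SITE)))
    # ['AWTGYIEGR', 'SAKLV']
    with_met = "AMWTGYMSAK"  # internal Mets give extra CNBr fragments
    print(fragments(with_met, cnbr_cut_sites(with_met)))
    # ['AM', 'WTGYM', 'SAK']
```

The second case shows why internal amino acids may need to be changed (for example, an internal methionine) when a chemical cleavage site is used to release E1 endoglucanase from a fusion.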
The E1 domain or E1 fusion protein can also be chemically derivatized with a chemical moiety. As discussed above, the chemical derivative can comprise a chemical moiety selected from the group consisting of a dye (e.g., colored or fluorescent), radionuclide, diagnostic reagent, magnetic or other particle, an enzyme, antibody, enzyme inhibitor, cofactor or substrate, and the like.
The E1 CBD or hybrid proteins comprising the E1 CBD can find application in protein purification, affinity chromatography matrix modification, and/or cellulose or other polysaccharide surface modification. Specifically, a fusion protein of E1 CBD and an enzyme can be affinity purified on a cellulose or other polysaccharide column. The fusion protein can then be cleaved at a proteolytic site engineered into the junction of the E1 CBD and the enzyme. Additionally, a fusion protein of E1 CBD and a protein or polypeptide can be used to modify the chemical or physical properties of a cellulose or polysaccharide matrix column. For example, a fusion protein of CBD and biotin would modify the chemical properties of a cellulose or polysaccharide matrix.
Further, E1 CBD can be used to modify (e.g., roughen or disrupt) a cellulose or polysaccharide fiber surface. Finally, a chemical derivative of an E1 CBD fusion protein, such as an E1 CBD derivatized with a dye, can be used to modify or label a cellulose or polysaccharide fiber surface.
The foregoing and similar applications are more fully described in U.S. Pat. Nos. 5,340,371; 5,137,819; and 5,202,247; and in WO 93/05226.
Another preferred embodiment is to use the E1 endoglucanase produced by recombinant cells to hydrolyse cellulose in cellulosic materials for the production of sugars per se or for fermentation to alcohol or other chemicals, single cell protein, etc.
The processes for the fermentation of sugars to alcohol and its many variations are well known.
In situations where it is desired to simultaneously ferment the sugars produced by hydrolysis of cellulose, one may use yeast, Zymomonas mobilis, or genetically engineered E. coli (U.S. Pat. #5,000,000) as hosts for introducing the E1 endoglucanase gene, or use a mixed culture of an alcohol-producing microbe and either the E1 endoglucanase enzyme or a microbe producing the enzyme. If insufficient endoglucanase protein is released, the culture conditions may be changed to enhance release of the enzyme. Other suitable hosts include any microorganism that ferments glucose to ethanol, such as Lactobacillus sp. or Clostridium sp., and microorganisms that ferment a pentose to ethanol, such as Pichia stipitis and metabolically engineered Zymomonas (Picataggio et al., U.S. Pat. No. 5,514,583). In addition to the glucose-fermenting host cells and the released endoglucanase enzyme, other enzymes are added to permit complete conversion of cellulose to glucose, as E1 endoglucanase alone will not completely convert cellulose to glucose.
Either yeast or Zymomonas may be employed as a recombinant host for cellulase gene expression. However, both yeast (Saccharomyces cerevisiae) and E. coli are known to be poor hosts for proteins when secretion into the medium is desired. At present, the capacity of Zymomonas to secrete large amounts of protein is not thoroughly understood. However, heterologous cellulase genes have been transferred into and expressed at fairly low levels in both S. cerevisiae (Bailey et al., Biotechnol. Appl. Biochem., 17:65-76 (1993)) and Zymomonas (Su et al., Biotech. Lett. 15:979- (1993)), as well as in other bacterial and fungal species.
For industrial uses, cellulase enzymes that display thermal stability, such as endoglucanase, generally have enhanced stability under harsh process conditions as well as at high temperatures. Based on the thermal stability of Acidothermus cellulolyticus E1 endoglucanase (see U.S. Pat. #5,275,944), the E1 CBD can be relatively resistant to the shear forces applied during industrial pumping and stirring, to pH changes, and to proteases produced by common microbial contaminants.
In particular, and by analogy to observed E1 activity on filter paper and CMC, the E1 CBD is sufficiently stable to bind cellulose at temperatures ranging from about 20 °C to about 100 °C and over a pH range of 2-9. The E1 CBD is expected to exhibit greater stability at pH 4-8 and optimal stability at pH 5. The optimum temperature for E1 CBD stability is expected to be about 83 °C. The foregoing properties of the E1 endoglucanase CBD are not found in CBDs derived from organisms not adapted to thermal environments. These properties of the E1 CBD impart unexpected advantages to industrial or other processes in which the E1 CBD or fusion proteins containing the E1 CBD bind to cellulose or other polysaccharide (e.g., xylan and starch) surfaces or matrices.
Even if the E1 endoglucanase gene product is not secreted, considerable cell death and cell lysis occur during processing due to shearing and pressure differences, thereby releasing some of the enzyme into the surrounding medium.
Leakage of enzyme may be enhanced by a number of culture conditions which increase cell membrane permeability such as temperature and osmotic changes, surfactants, lytic agents (proteases, antibiotics, bacteriophage infection, etc.) and physical shear.

Unless specifically defined otherwise, all technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described.

Genome Library Construction, Library Screening and Subcloning.
Genomic DNA was isolated from Acidothermus cellulolyticus and purified by banding on cesium chloride gradients. Genomic DNA was partially digested with Sau3A and separated on agarose gels. DNA fragments in the range of 9-20 kilobase pairs were isolated from the gels. This purified Sau3A-digested genomic DNA was ligated into the BamHI acceptor site of purified EMBL3 lambda phage arms (Clontech, San Diego, CA).
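Sau3A fragments can be ligated into a BamHI acceptor site because the two enzymes leave identical sticky ends: Sau3A cuts before GATC (^GATC) and BamHI cuts G^GATCC, so both leave a 5' GATC overhang. The sketch below models only the top strand and uses a short synthetic sequence; it also shows that every BamHI site is a Sau3A site, which is why a partial Sau3A digest yields large fragments that are compatible with the BamHI-cut vector arms.

```python
# Sketch: top-strand cut positions and 5' overhangs for Sau3A and BamHI.
SAU3A = ("GATC", 0)    # (recognition site, cut offset within the site): ^GATC
BAMHI = ("GGATCC", 1)  # G^GATCC
OVERHANG_LEN = 4       # both enzymes leave a 4-nt 5' overhang


def cut_positions(dna: str, enzyme) -> list[int]:
    """0-based top-strand positions immediately before each cut."""
    site, offset = enzyme
    positions, i = [], dna.find(site)
    while i != -1:
        positions.append(i + offset)
        i = dna.find(site, i + 1)
    return positions


def overhangs(dna: str, enzyme) -> list[str]:
    """Top-strand 5' overhang left at each cut."""
    return [dna[p:p + OVERHANG_LEN] for p in cut_positions(dna, enzyme)]


if __name__ == "__main__":
    dna = "TTGGATCCAAGATCTT"  # synthetic: one BamHI site, two Sau3A sites
    print(cut_positions(dna, BAMHI), overhangs(dna, BAMHI))  # [3] ['GATC']
    print(cut_positions(dna, SAU3A), overhangs(dna, SAU3A))  # [3, 10] ['GATC', 'GATC']
```

Because the Sau3A site is only four bases long, it occurs far more often than the six-base BamHI site, so partial rather than complete digestion is what keeps fragments in the 9-20 kb range.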
Phage DNA was packaged according to the manufacturer's specifications and plated with E. coli LE392 in top agar containing the soluble cellulose analog carboxymethylcellulose (CMC). The plates were incubated overnight (12-24 hours) to allow transfection, bacterial growth and plaque formation. Plates were stained with Congo Red and then destained with 1 M NaCl. Lambda plaques harboring endoglucanase clones appeared as unstained plaques on a red background.
Lambda clones which screened positive on CMC-Congo Red plates were purified by successive rounds of picking, plating and screening. Individual phage isolates were named SL-1 through SL-8. Subsequent subcloning efforts employed the SL-2 clone which contained an approximately 13.7 kb fragment of A. cellulolyticus genomic DNA.
Standard methods for subcloning DNA fragments can be found in Molecular Cloning: A Laboratory Manual (J. Sambrook, E.F. Fritsch and T. Maniatis, Cold Spring Harbor Laboratory Press, second edition, 1989). Purified SL-2 insert DNA was cut with BamHI, PvuI and EcoRI. The resulting DNA fragments were individually purified by electrophoretic separation on agarose gels. BamHI digestion yielded two fragments derived from the SL-2 insert DNA, 2.3 and 9 kb in length. PvuI digestion yielded fragments of 0.7, 0.9, 1.7, 2.4, 3.3 and 3.7 kb. EcoRI digestion produced insert-derived fragments of 0.2, 0.3, 1.9, 2.4 and 3.7 kb in length. Individual purified restriction fragments were ligated into plasmid vectors previously cut with the appropriate restriction enzyme. Specifically, the 2.3 and 9 kb BamHI fragments were ligated separately into BamHI-cut pBR322 and pGEM7. PvuI fragments were ligated separately into PvuI-cut pBR322. The 3.7 kb PvuI fragment was also blunt ended by treatment with T4 DNA polymerase and ligated into the SmaI site of pGEM7. EcoRI fragments were ligated into EcoRI-cut pBR322.
Ligation products were transformed into competent E. coli DH5α cells, plated onto appropriate selective media (LB + 15 µg/ml tetracycline or LB + 50 µg/ml ampicillin) containing 1 mM of the substrate analog 4-methylumbelliferyl-β-D-cellobioside (4-MUC), and grown overnight at 37 °C. Cleavage of 4-MUC by β-1,4-endoglucanase activity forms a highly fluorescent aglycone product, 4-methylumbelliferone. Plates were inspected under long-wave ultraviolet light for fluorescing colonies to determine which subclones harbor fragments of A. cellulolyticus DNA encoding functional cellulase genes. Plasmids were purified from fluorescing colonies and the size of the subcloned DNA was verified by restriction digestion. By these methods it was possible to determine that the 2.3 kb BamHI fragment encodes a cellulase activity, as does the 3.7 kb PvuI fragment. A combination of restriction mapping, Southern blot hybridization experiments and DNA sequencing has shown that the 2.3 kb BamHI fragment and the 3.7 kb PvuI fragment share a considerable amount of homologous DNA sequence (about 1.9 kb). DNA sequencing was performed with templates containing A. cellulolyticus DNA inserted into the plasmid pGEM7.

The 2.3 kb BamHI fragment and an overlapping 3.7 kb PvuI fragment from λSL-2 were also shown to express CMCase activity.
Bi-directional Deletion Subclones for Sequencing.
Bi-directional deletion subclones of the 2.3 kb BamHI subclone from SL-2 were produced using the commercially available ExoIII/Mung bean nuclease deletion kit from Promega. A 2.3 kb BamHI fragment isolated from clone SL-2 was cloned in both orientations into the BamHI site of an E. coli vector called pGEM-7Zf(+) (Promega Corp., Madison, WI). These clones are referred to as p52 and p53, respectively. Two sets of nested deletion clones were produced according to the manufacturer's specifications using the Erase-a-Base deletion system available from Promega.
Deletions were constructed by double digesting the plasmid with HindIII and KpnI. The 5' overhanging sequences resulting from HindIII cleavage provide a starting point for ExoIII deletion. The 3' overhanging sequences resulting from KpnI cleavage protect the vector DNA from ExoIII digestion. Thus, deletions in each clone are unidirectional from the HindIII site; bi-directional coverage of the insert comes from deleting the two opposite-orientation clones, p52 and p53.

Subclone name   Description
p52             2.3 kb BamHI fragment from λSL-2 in BamHI site of pGEM7
p53             2.3 kb BamHI fragment from λSL-2 in BamHI site of pGEM7 (opposite orientation)
4-5             3.7 kb PvuI fragment from λSL-2 in SmaI site of pGEM7
4-9             3.7 kb PvuI fragment from λSL-2 in SmaI site of pGEM7 (opposite orientation)

Double-digested plasmid DNA was then exposed to digestion by the 3' to 5' exodeoxyribonuclease ExoIII, and aliquots of the reaction were removed at various time points into a buffer that halts ExoIII activity. S1 nuclease, a single-strand-specific endonuclease, was then added to remove single-stranded DNA and to blunt both ends of the deletion products. T4 DNA ligase was then used to re-circularize the plasmid DNAs, and the products were transformed into competent E. coli cells.
A representative sampling of the resulting clones was screened by restriction enzyme analysis of plasmid DNAs to estimate the extent of deletion.
Deletion endpoints occurred fairly randomly along the sequence, and clones were selected for sequencing such that deletion endpoints are spaced at approximately 100 to 300 bp intervals from either end of the 2.3 kb BamHI fragment. One set of clones is a succession of progressively longer deletions from one end of clone p52, and the other is a similar set of successively longer deletions from p53. Please refer to Figure 5 for the approximate length of each deletion mutant. Each of the deletion clones was plated on 4-MUC indicator plates to determine which still exhibited endoglucanase activity.
Retention of β-1,4-glucanase activity in the deletion subclones is indicated by the symbol "+", and lack of activity by the symbol "-", after the name of each clone listed in Figure 5.
Manual DNA Sequencing.
Sequencing reactions were performed using double-stranded plasmid DNAs as templates. Templates used for DNA sequencing reactions included each of the plasmid DNAs described in Table 1 and diagrammed in Figure 5. To complete the sequencing of the E1 gene, another subclone was employed as a template in conjunction with synthetic oligonucleotides used as primers. The 3.7 kb PvuI fragment from SL-2 was blunt ended with T4 DNA polymerase and cloned in both orientations into the SmaI site of pGEM7, resulting in clones 4-5 and 4-9. The 3.7 kb PvuI fragment largely overlaps the 2.3 kb BamHI subclone (as shown in Figure 5). Newly synthesized oligonucleotide primers were used to sequence the 810 base pairs downstream of the internal BamHI site located at position 2288 of the DNA sequence.
The reactions were carried out using α-35S-dATP to label DNA synthesized with the T7 DNA polymerase (Sequenase) kit provided by United States Biochemicals.
Reaction products were separated on wedge acrylamide gels and autoradiographed after fixation and drying. X-ray films were read using an electronic gel reader apparatus (a model GP7 Mark II sonic digitizer, manufactured by Science Accessories Corp., Stratford, CT) and the GeneWorks™ software package provided by Intelligenetics, Inc. (Mountain View, CA). Sequences were checked and assembled using the same software package.

Analysis of the Gene Coding for E1 Endoglucanase.
Three peptide sequences have been obtained from purified endoglucanase E1 from Acidothermus cellulolyticus. Thirty-eight amino acids have been determined from the N terminus of the E1 protein by automated Edman degradation. The 38 amino acid sequence overlaps entirely with the 24 N-terminal amino acids determined previously (U.S. Patent #5,536,655) and extends that N-terminal sequence of the native protein by another 14 amino acids. The N-terminal sequences are as follows:
AGGGYWHTSG REILDANNVP VRIA (reported in U.S. #5,536,655), residues 1-24 of SEQ ID NO: 2
AGGGYWHTSG REILDANNVP VRIAGINWFG FETXNYW (this work), residues 1-38 of SEQ ID NO: 2
A comparison of the translation of the nucleotide sequence data in Figure 1 with the peptide sequences available from purified E1 endoglucanase indicates that this clone encodes the E1 endoglucanase protein. The N-terminal 38 amino acid sequence (first boxed region in Fig. 2) is in exact agreement with the translation of the DNA sequence between nucleotides 947 and 1060 in Figure 1. This long sequence of 38 amino acids was not found in any other entry in Swiss-Prot database versions 28 and 32.
Two internal tryptic peptide sequences from the purified E1 endoglucanase were also determined:
YKGNPTVVGFDLHNEPHDPA (residues 148-169 of SEQ ID NO: 2)
XXIFDPVGAXAXPXXQ (residues 352-367 of SEQ ID NO: 2)
"X" represents an unidentified amino acid. These internal peptide sequences are also boxed in Fig. 2.
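Matching a partially determined peptide, in which "X" marks an unidentified residue, against a translated sequence can be sketched as a regular-expression search in which each X matches any amino acid. The sketch below is illustrative only: the two excerpts are one-letter transcriptions of portions of the translated E1 sequence as printed in the sequence listing, and the function name is hypothetical.

```python
import re


def find_peptide(peptide: str, protein: str) -> list[int]:
    """Return 1-based start positions where `peptide` matches `protein`.

    An 'X' in the peptide (unidentified residue) matches any amino acid.
    """
    pattern = "".join("." if aa == "X" else re.escape(aa) for aa in peptide.upper())
    # A zero-width lookahead also reports overlapping matches.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", protein.upper())]


# One-letter excerpts transcribed from the listing (for illustration only).
MATURE_N_TERMINUS = "AGGGYWHTSGREILDANNVPVRIAGINWFGFETCNYVVHGLW"
LINKER_REGION = "APIKSSIFDPVGASASPSSQPS"

if __name__ == "__main__":
    print(find_peptide("AGGGYWHTSGREILDANNVPVRIA", MATURE_N_TERMINUS))  # [1]
    print(find_peptide("XXIFDPVGAXAXPXXQ", LINKER_REGION))              # [5]
```

A single hit per peptide is what the boxed regions in Fig. 2 imply; multiple hits would mean the peptide is too short or too degenerate to place unambiguously.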

Gene Architecture
While not wishing to be bound by a particular theory, the following hypothesis is presented. Figure 1 shows that the mature translation product begins with a GCG codon at positions 947-949 and extends to a TAA terminator codon at positions 2510-2512.
Since cellulases are secreted, presumably to gain access to their substrates, a signal peptide that assists secretion in vivo is likely to be present. The signal peptide of the E1 endoglucanase appears to be encoded by nucleotides 824-946. This 123-base-pair stretch encodes 41 amino acids, beginning with a GTG (valine) codon. The translation start site is postulated to be the GTG codon at positions 824-826, rather than the more usual ATG (methionine) codon, because of the proximity of this GTG codon to a putative upstream ribosome binding site (RBS) and because of the more favorable amino-terminal charge density of the longer signal peptide. Alternatively, the signal sequence may start with the methionine at nucleotide 863 of the sequence in Fig. 1 (amino acid 14 in Fig. 2). For the purpose of gene manipulations, either signal sequence may be used. In fact, the methionine construct has been observed to be more productive in E. coli.
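The coordinates quoted in this section follow from simple codon arithmetic. The sketch below assumes, as the text states, that residue 1 is the valine encoded by the GTG codon at nucleotide 824 of Figure 1, and further assumes that the 562 residues of SEQ ID NO: 2 run from that codon to the last residue before the stop codon; the function name is hypothetical.

```python
GTG_START = 824  # first nucleotide of the putative GTG start codon in Fig. 1


def codon_span(residue: int, start: int = GTG_START) -> tuple[int, int]:
    """Return the 1-based, inclusive nucleotide span of the codon for `residue`.

    Residue 1 is the GTG-encoded valine; each later residue adds 3 nucleotides.
    """
    first = start + 3 * (residue - 1)
    return first, first + 2


if __name__ == "__main__":
    print(codon_span(14))   # alternative Met start: (863, 865)
    print(codon_span(41))   # last signal-peptide residue: (944, 946)
    print(codon_span(42))   # first mature residue (GCG, Ala): (947, 949)
    print(codon_span(79))   # end of the 38-residue N-terminal peptide: (1058, 1060)
    print(codon_span(563))  # codon after residue 562, i.e. the stop codon: (2510, 2512)
```

Under these assumptions the 41-residue signal peptide occupies exactly 123 bp (824-946). More generally, a terminator codon can be in frame with the mature coding sequence only if it starts an exact multiple of three nucleotides downstream of position 947.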
The putative RBS for the E1 endoglucanase gene, at positions 772-779, is identified by its strong homology (8 of 9 residues) to the published 3' end of the S. lividans 16S rRNA (Bibb and Cohen, 1982, Mol. Gen. Genet. 187:265-77). Three direct repeats of an 8 bp sequence occur immediately downstream of the putative RBS, at positions 781-788, 795-802 and 810-817, and are boxed in Figure 1. Nucleotides 710-725 are underlined because they are homologous to the palindromic regulatory sequence first found by Lin and Wilson upstream of several cellulase genes isolated from Thermomonospora fusca (Lin and Wilson, 1988, J. Bacteriol. 170:3843-3846) and later in another actinomycete bacterium, Microbispora bispora (Yablonsky et al., in Biochemistry and Genetics of Cellulose Degradation; Aubert, Beguin, Millet, Eds., Academic Press: New York, NY, 1988, pp. 249-266).
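Direct repeats such as the three 8 bp copies boxed in Figure 1, and the number of identical positions between a candidate RBS and a reference sequence, can be found with elementary string operations. This is a minimal sketch assuming ungapped comparisons; the function names and toy sequences are hypothetical and do not reproduce Figure 1.

```python
from collections import defaultdict


def identity(a: str, b: str) -> int:
    """Count identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a.upper(), b.upper()))


def direct_repeats(seq: str, k: int = 8, min_copies: int = 2) -> dict[str, list[int]]:
    """Map every k-mer occurring at least `min_copies` times to its 1-based starts."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) >= min_copies}


if __name__ == "__main__":
    toy = "AAGATTACAGCCGATTACAGTTTGATTACAGAA"  # hypothetical: three 8 bp direct repeats
    print(direct_repeats(toy))                  # {'GATTACAG': [3, 13, 24]}
    print(identity("GGAGGTGAT", "GGAGGAGAT"))   # 8 (8 of 9 positions identical)
```

Exact k-mer matching finds only perfect repeats; imperfect repeats would need a mismatch-tolerant comparison such as `identity` applied to each pair of windows.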
Promoter sequences for the E1 endoglucanase gene are not readily defined.
Promoter sequences in actinomycete genes are extremely diverse. However, the inventors have observed a 6 base pair homology to the consensus E. coli -35 promoter sequence at bases 671-676 of the sequence in Fig. 1. Whether this sequence in fact functions as part of the E1 promoter remains to be determined. Regardless, the DNA sequence of Figure 1 contains the promoter. Nucleotides 2514-2560 are underlined because they comprise a nearly perfect dyad that may function as a transcriptional terminator, as observed for other bacterial genes (Molnar, in Recombinant Microbes for Industrial and Agricultural Applications, Murooka and Imanaka, Eds., Marcel Dekker, New York, NY, 1994).
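A scan for the E. coli -35 consensus (TTGACA) and for perfect inverted repeats that could fold into a terminator-like stem-loop can be sketched as follows. The stem length, loop limits and mismatch allowance are illustrative assumptions; practical terminator prediction also weighs GC content and any downstream T-tract. The function names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_box(seq: str, box: str = "TTGACA", max_mismatches: int = 0) -> list[int]:
    """1-based starts of windows differing from `box` at <= max_mismatches positions."""
    seq, box = seq.upper(), box.upper()
    k = len(box)
    return [i + 1 for i in range(len(seq) - k + 1)
            if sum(a != b for a, b in zip(seq[i:i + k], box)) <= max_mismatches]


def hairpins(seq: str, stem: int = 6, min_loop: int = 3,
             max_loop: int = 10) -> list[tuple[int, int]]:
    """Return (start, end), 1-based inclusive, of perfect stem-loops (dyads)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        right_arm = revcomp(seq[i:i + stem])  # sequence the 3' arm must have
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop               # 0-based start of the 3' arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == right_arm:
                hits.append((i + 1, j + stem))
    return hits


if __name__ == "__main__":
    print(find_box("GGTTGACAGG"))                # [3]
    print(hairpins("CACGCCGCCTTTTGGCGGCCAC"))   # [(4, 19)]: GCCGCC-loop-GGCGGC
```

Allowing one or two mismatches in `find_box` reflects how loosely natural promoters match the consensus; a "nearly perfect" dyad would likewise need a mismatch-tolerant stem comparison.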
Figure 2 shows the putative signal sequence in lower-case letters. An alternative signal sequence may begin at the methionine residue at position 14 in this sequence. The mature E1 protein begins at position 42, as demonstrated by N-terminal amino acid sequencing of the purified native E1 endoglucanase protein from culture supernatants of Acidothermus cellulolyticus (first boxed region). The underlined sequence in Figure 2 corresponds to a proline/serine/threonine-rich linker domain common to multi-domain microbial cellulases, whose sequences and lengths vary considerably. The sequences following the linker domain appear to comprise the cellulose binding domain (CBD). This sequence shows easily discernible, but not highly conserved, homology with CBD sequences from other cellulases (see Fig. 7).
Sequences preceding the underlined linker domain comprise the catalytic domain of the E1 endoglucanase. This catalytic domain sequence is similar, but not identical, to catalytic domain sequences from other cellulase proteins.
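One way to locate a Pro/Ser/Thr-rich linker such as the underlined region in Figure 2 is a sliding-window composition profile. The window size and threshold below are arbitrary illustrative choices, not parameters from the original work; the excerpt is a one-letter transcription of the region around the linker in SEQ ID NO: 2 as printed in the sequence listing, and the function names are hypothetical.

```python
# Excerpt around the linker of SEQ ID NO: 2 (transcribed for illustration only).
EXCERPT = ("DGYLAPIKSSIFDPVGASASPSSQ" "PSPSPSPSPS" "ASASR" "TPTPTPTPTA"
           "SPTPTLTPTA" "TPTPTA" "SPTPSPTAAS" "GARCTASYQVNSDWGNGFTVTV")


def pst_profile(protein: str, window: int = 15) -> list[float]:
    """Fraction of Pro+Ser+Thr per window; entry i covers residues i+1..i+window."""
    protein = protein.upper()
    return [sum(aa in "PST" for aa in protein[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def linker_candidates(protein: str, window: int = 15,
                      threshold: float = 0.8) -> list[tuple[int, int]]:
    """Merge windows at or above `threshold` into 1-based (start, end) regions."""
    regions: list[tuple[int, int]] = []
    for i, frac in enumerate(pst_profile(protein, window)):
        if frac >= threshold:
            start, end = i + 1, i + window
            if regions and start <= regions[-1][1] + 1:  # overlaps previous region
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    print(linker_candidates(EXCERPT))
```

Because the flanking catalytic and binding-domain residues are poor in Pro, Ser and Thr, the reported region should fall within the linker, although its exact ends shift with the chosen window and threshold.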

Expression of Truncated E1 Endoglucanase
When the E1 endoglucanase gene is expressed in E. coli, the gene product detected has a lower molecular weight than either the native gene product or the product expressed in S. lividans. The native and S. lividans products run at 72 kDa on SDS polyacrylamide gels, whereas the largest E1 product from E. coli runs at approximately 60 kDa. The predominant gene products were positively identified by Western blotting, using a monoclonal antibody specific for the E1 endoglucanase. This monoclonal antibody does not cross-react with any other protein in E. coli, S. lividans or A. cellulolyticus. The E. coli product was purified and its N terminus was sequenced by automated Edman degradation; the sequence is identical to that of the purified native E1 protein from A. cellulolyticus.
Since the N terminus is intact but the product is smaller, the recombinant E1 gene product from E. coli is evidently carboxy-terminally truncated by some mechanism in this host system.
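As a rough, back-of-the-envelope check, the 12 kDa difference in apparent size can be converted to an approximate number of missing residues. This sketch assumes an average residue mass of about 110 Da (a common rule of thumb) and treats SDS-PAGE mobility as proportional to mass; glycosylation of the native protein or anomalous migration of the Pro/Ser/Thr-rich linker would make the true figure differ. The function name is hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average mass of one amino acid residue (assumption)


def residues_for_mass(delta_kda: float, avg_residue_da: float = AVG_RESIDUE_DA) -> int:
    """Approximate number of residues corresponding to a mass difference in kDa."""
    return round(delta_kda * 1000 / avg_residue_da)


if __name__ == "__main__":
    native_kda, e_coli_kda = 72, 60  # apparent sizes on SDS-PAGE, from the text
    print(residues_for_mass(native_kda - e_coli_kda))  # 109
```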

Modified E1 Endoglucanase Genes
The nucleotide sequence may be modified by random or site-specific mutation without changing the amino acid sequence. In this manner, restriction endonuclease sites may be inserted into or removed from the gene with or without altering the amino acid sequence. Additionally, certain host microorganisms are well known to prefer certain codons for enhanced expression (see, for example, Gouy et al., Nucleic Acids Research, 10(22): 7055-74 (1982)). Any or all of the codons may be modified appropriately to enhance expression. This entire set of possible changes, and all possible combinations thereof, constitutes a conservative variant of the original DNA sequence.
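Whether a proposed nucleotide change is silent can be checked by translating the original and mutated coding sequences with the standard genetic code. In this sketch the table is built from the conventional TCAG-ordered encoding; it translates codons only by table lookup and does not model alternative start codons such as GTG. The function names and example codons are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (length must be a multiple of 3)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent(original: str, mutant: str) -> bool:
    """True if two equal-length coding sequences encode the same amino acids."""
    if len(original) != len(mutant):
        raise ValueError("sequences must have equal length")
    return translate(original) == translate(mutant)


if __name__ == "__main__":
    # Hypothetical: GGT TCC (Gly-Ser) -> GGA TCC creates a BamHI site (GGATCC).
    print(is_silent("GGTTCC", "GGATCC"))  # True
    print(is_silent("GCG", "GTG"))        # False (Ala -> Val)
```

The same check applies to codon-preference changes: any substitution that passes `is_silent` leaves the encoded protein unchanged.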
Site-specific mutation is a preferred method for inducing mutations in transcriptionally active genes (Kucherlapati, Prog. in Nucl. Acid Res. and Mol. Biol., 36:301 (1989)). This technique of homologous recombination was developed as a method for introducing specific mutations into a gene (Thomas et al., Cell, 44:419-428 (1986); Thomas and Capecchi, Cell, 51:503-512 (1987); Doetschman et al., Proc. Natl. Acad. Sci., 85:8583-8587 (1988)) or for correcting specific mutations within defective genes (Doetschman et al., Nature, 330:576-578 (1987)).
The nucleotide sequence may also be modified in the same manner to produce changes in the amino acid sequence, including substitutions, deletions and insertions of one or more amino acids. Similar techniques may be used in the present invention to alter the amino acid sequence to change a protease or other cleavage site, enhance expression or to change the chemical, physical or biological properties of the enzyme.
Small deletions and insertions may also be used to change the sequence. Introduction or deletion of restriction sites in or around the gene can facilitate gene manipulation.
For a detailed description of protein chemistry and structure, see Schulz, G.E. et al., Principles of Protein Structure, Springer-Verlag, New York, 1978;
Creighton, T.E., Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San Francisco, 1983;
and J. Kyte, Structure in Protein Chemistry, Garland Publishing, Inc., New York and London, 1995. The types of substitutions that may be made in the protein or peptide molecule of the present invention may be based on analysis of the frequencies of amino acid changes between homologous proteins of different species, such as those presented in Table 1-2 of Schulz et al. (supra) and Figures 3-9 of Creighton (supra).
Based on such analysis, conservative substitutions are defined herein as exchanges within one of the following five groups:
1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr (Pro, Gly);
2. Polar, negatively charged residues and their amides: Asp, Asn, Glu, Gln;
3. Polar, positively charged residues: His, Arg, Lys;
4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val (Cys); and
5. Large aromatic residues: Phe, Tyr, Trp.
The three amino acid residues in parentheses above have special roles in protein architecture. Gly is the only residue lacking any side chain and thus imparts flexibility to the chain. Pro, because of its unusual geometry, tightly constrains the polypeptide chain.
Cys can participate in disulfide bond formation which is important in protein folding.
Note that Schulz et al. would merge Groups 1 and 2, above. Note also that Tyr, because of its hydrogen bonding potential, has some kinship with Ser, Thr, etc.
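The five groups can be encoded as a lookup table for screening candidate substitutions. This is a minimal sketch using one-letter amino acid codes; the function name and the `include_special` switch for Pro, Gly and Cys are illustrative additions, not part of the definition above.

```python
# The five groups defined above; Pro, Gly (group 1) and Cys (group 4) are the
# parenthesized residues with special structural roles.
GROUPS = {
    1: "ASTPG",  # small aliphatic, nonpolar or slightly polar
    2: "DNEQ",   # polar, negatively charged residues and their amides
    3: "HRK",    # polar, positively charged
    4: "MLIVC",  # large aliphatic, nonpolar
    5: "FYW",    # large aromatic
}
GROUP_OF = {aa: group for group, members in GROUPS.items() for aa in members}
SPECIAL = set("PGC")


def is_conservative(a: str, b: str, include_special: bool = True) -> bool:
    """True if replacing residue `a` by `b` stays within one of the five groups.

    With include_special=False, any change involving Pro, Gly or Cys is treated
    as non-conservative, reflecting their special roles noted in the text.
    """
    a, b = a.upper(), b.upper()
    if a == b:
        return True
    if not include_special and (a in SPECIAL or b in SPECIAL):
        return False
    return GROUP_OF[a] == GROUP_OF[b]


if __name__ == "__main__":
    print(is_conservative("S", "T"))  # True  (both group 1)
    print(is_conservative("D", "K"))  # False (group 2 vs group 3)
    print(is_conservative("G", "A", include_special=False))  # False
```

Merging groups 1 and 2, as Schulz et al. would, only requires combining their member strings into one entry.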
Substantial changes in functional properties can be made by selecting substitutions that are less conservative, such as between, rather than within, the above five groups, which will differ more significantly in their effect on maintaining (a) the structure of the peptide backbone in the area of the substitution, for example, as a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain. Examples of such substitutions are (a) substitution of Gly and/or Pro by another amino acid or deletion or insertion of Gly or Pro; (b) substitution of a hydrophilic residue, e.g., Ser or Thr, for (or by) a hydrophobic residue, e.g., Leu, Ile, Phe, Val or Ala; (c) substitution of a Cys residue for (or by) any other residue; (d) substitution of a residue having an electropositive side chain, e.g., Lys, Arg or His, for (or by) a residue having an electronegative charge, e.g., Glu or Asp; or (e) substitution of a residue having a bulky side chain, e.g., Phe, for (or by) a residue not having such a side chain, e.g., Gly.
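The five classes of less conservative substitutions, (a) to (e), can likewise be flagged automatically. The residue sets below contain only the examples named in the text, so a fuller classification (for instance, other bulky or hydrophobic residues) would be an assumption left to the user; deletions and insertions of Gly or Pro are not modeled, and the function name is hypothetical.

```python
HYDROPHILIC = set("ST")     # examples given in (b)
HYDROPHOBIC = set("LIFVA")  # examples given in (b)
POSITIVE = set("KRH")       # examples given in (d)
NEGATIVE = set("ED")        # examples given in (d)
BULKY = set("F")            # example given in (e)
NO_SIDE_CHAIN = set("G")    # example given in (e)


def radical_flags(a: str, b: str) -> list[str]:
    """Return which of classes (a)-(e) the substitution a -> b falls into."""
    a, b = a.upper(), b.upper()

    def between(x: set, y: set) -> bool:
        # "for (or by)": the substitution counts in either direction.
        return (a in x and b in y) or (a in y and b in x)

    flags = []
    if a != b and (a in "GP" or b in "GP"):
        flags.append("a")  # Gly and/or Pro replaced or introduced
    if between(HYDROPHILIC, HYDROPHOBIC):
        flags.append("b")
    if a != b and "C" in (a, b):
        flags.append("c")
    if between(POSITIVE, NEGATIVE):
        flags.append("d")
    if between(BULKY, NO_SIDE_CHAIN):
        flags.append("e")
    return flags


if __name__ == "__main__":
    print(radical_flags("S", "L"))  # ['b']
    print(radical_flags("F", "G"))  # ['a', 'e']
    print(radical_flags("S", "T"))  # []
```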
Most deletions, insertions and substitutions according to the present invention are those that do not produce radical changes in the characteristics of the protein or peptide molecule. One skilled in the art can evaluate the effect of a substitution, deletion or insertion by using routine screening assays such as those described herein. For example, a variant typically is made by site-specific mutagenesis of the nucleic acid encoding the peptide molecule, expression of the variant nucleic acid in recombinant culture, and, optionally, purification from the culture, for example by immunoaffinity chromatography on a column carrying a specific antibody, such as the monoclonal antibody used in Example 4, to which the variant adsorbs by binding.
The activity of the microbial lysate or purified protein or peptide variant can be screened in a suitable screening assay for the desired characteristic. For example, the CMCase assay of Example 1 or the cellulose binding assay described herein can be used.
Modifications of such peptide properties as redox or thermal stability, hydrophobicity, susceptibility to proteolytic degradation, pH insensitivity, resistance to shear stress, biological activity, expression yield, or the tendency to aggregate with carriers or into multimers are assayed by methods well known to the ordinarily skilled artisan.

Mixed Domain E1 Endoglucanase Genes and Hybrid Enzymes
From the putative locations of the domains in the E1 endoglucanase gene given above and in Figure 3, and from comparable cloned cellulase genes from other species, one can separate individual domains and rejoin them to one or more domains from different genes. The similarity among the endoglucanase genes permits one to ligate one or more domains from the Acidothermus cellulolyticus E1 endoglucanase gene with one or more domains from endoglucanase genes of one or more other microorganisms.
Other representative endoglucanase genes include Bacillus polymyxa β-1,4-endoglucanase (Baird et al., Journal of Bacteriology, 172:1576-86 (1992)) and Xanthomonas campestris β-1,4-endoglucanase A (Gough et al., Gene, 89:53-59 (1990)). Upon expression, the fusion of two such domains yields a hybrid enzyme. For ease of manipulation, restriction enzyme sites may first be added to the respective genes by site-specific mutagenesis. If a domain in the fusion protein does not contribute to the desired function of the product, the nucleotide sequence encoding that domain can be deleted or modified to facilitate gene manipulation.
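At the DNA level, the essential constraint when joining domains from different genes is that each segment preserves the reading frame. The sketch below cuts domains by codon numbers and refuses fusions that would shift the frame; the function names, codon coordinates and toy genes are hypothetical and do not correspond to actual domain boundaries in E1 or any other endoglucanase.

```python
def domain_dna(gene: str, first_codon: int, last_codon: int) -> str:
    """Slice codons first_codon..last_codon (1-based, inclusive) from an in-frame gene."""
    if not 1 <= first_codon <= last_codon <= len(gene) // 3:
        raise ValueError("codon range outside the gene")
    return gene[3 * (first_codon - 1):3 * last_codon].upper()


def fuse_in_frame(*segments: str) -> str:
    """Join coding segments N- to C-terminal, rejecting any that would shift the frame."""
    for n, seg in enumerate(segments, start=1):
        if len(seg) % 3:
            raise ValueError(f"segment {n} is {len(seg)} nt, not a multiple of 3")
    return "".join(seg.upper() for seg in segments)


if __name__ == "__main__":
    gene_a = "ATGGCTGCTAAA"  # hypothetical: Met-Ala-Ala-Lys
    gene_b = "GGTTGGTGGTAA"  # hypothetical: Gly-Trp-Trp-stop
    hybrid = fuse_in_frame(domain_dna(gene_a, 1, 2), domain_dna(gene_b, 2, 4))
    print(hybrid)  # ATGGCTTGGTGGTAA
```

In practice, restriction sites introduced at the chosen boundaries (as the text suggests) would add codons at each junction, which the same frame check would still need to accommodate.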
Single Domain E1 Endoglucanase Genes and Polypeptides
Single E1 endoglucanase domains, such as the CBD, are useful for surface modification of cellulose or other polysaccharides (e.g., xylan and starch). These domains can be produced by expressing, in a suitable host as described hereinabove, nucleic acids encoding the amino acid sequence of the domain or domain variant of interest.
Such domains can be chemically derivatized with a chemical moiety such as a dye, radionuclide, chromophore, enzyme, antibody, enzyme inhibitor or cofactor, and the like.
The foregoing description of the specific embodiments reveals the general nature of the invention so that others can, by applying current knowledge, readily modify and/or adapt such specific embodiments for various applications without departing from the generic concept; therefore, such adaptations and modifications should be, and are intended to be, comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation.

All references mentioned in this application are incorporated in their entirety, by reference.
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT: Steven Thomas, Robert Laymon, William Adney, Michael Himmel (ii) TITLE OF INVENTION: E1 Endoglucanase Cellulose Binding Domain (iii) NUMBER OF SEQUENCES: 39 (iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Swanson & Bratschun, L.L.C.
(B) STREET: 8400 E. Prentice Avenue, Suite 200 (C) CITY: Englewood (D) STATE: Colorado (E) COUNTRY: USA
(F) ZIP: 80111 (v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette, 3 1/2 diskette, 1.44 MB
(B) COMPUTER: IBM pc compatible (C) OPERATING SYSTEM: MS-DOS
(D) SOFTWARE: Wordperfect 6.0 (vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: 08/
(B) FILING DATE:
(C) CLASSIFICATION:
(vii)PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:08/604,913 (B) FILING DATE:02/22/96 (vii)PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:08/276,213 (B) FILING DATE:07/15/94 (vii)PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:08/125,115 (B) FILING DATE:09/21/93 (vii)PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:07/826,089 (B) FILING DATE:01/27/92 (vii)PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:07/412,434 (B) FILING DATE:09/26/89 (viii)ATTORNEY/AGENT INFORMATION:
(A) NAME: Margaret M. Wall (B) REGISTRATION NUMBER: 33,462 (C) REFERENCE/DOCKET NUMBER: 95-56/CIP
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (303) 793-3333
(B) TELEFAX: (303) 793-3433
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3004 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Val Pro Arg Ala Leu Arg Arg Val Pro Gly Ser Arg Val Met Leu Arg Val Gly _35 -30 -25 Val Val Val Ala Val Leu Ala Leu Val Ala Ala Leu Ala _20 -15 Asn Leu Ala Val Pro Arg Pro Ala Arg Ala Ala Gly Gly -10 _ 1 Gly Tyr Trp His Thr Ser Gly Arg Glu Ile Leu Asp Ala Asn Asn Val Pro Val Arg Ile Ala Gly Ile Asn Trp Phe Gly Phe Glu Thr Cys Asn Tyr Val Val His Gly Leu Trp Ser Arg Asp Tyr Arg Ser Ile Leu Asp Gln Ile Lys Ser Leu Gly Tyr Asn Thr Ile Arg Leu Pro Tyr Ser Asp Asp Ile Leu Lys Pro Gly Thr Met Pro Asn Ser Ile Asn Phe Tyr Gln Met Asn Gln Asp Leu Gln Gly Leu Thr Ser Leu Gln Val Met Asp Lys Ile Val Ala Tyr Ala Gly Arg Ile Gly Leu Arg Ile Ile Leu Asp Arg His Arg Pro Asp Cys Ser Gly Gln Ser Ala Leu Trp Tyr Thr Ser Ser Val Ser Glu Ala Thr Trp Ile Ser Asp Leu Gln Ala Leu Ala Gln Arg Tyr Lys Gly Asn Pro Thr Val Val Gly Phe Asp Leu His Asn Glu Pro His Asp Pro Ala Cys Trp Gly Cys Gly Asp Pro Ser Ile Asp Trp Arg Leu Ala Ala Asp Arg Ala Gly Asn Ala Val Leu Ser Val Asn Pro Asn Leu Leu Ile GGT

Phe Gly Ser Gly Tyr Val Val Tyr Asp Glu Gln Asn Ser AAC

Trp Trp GlyGly Leu Gln Gln Ala GlyGln Tyr Arg Asn Val Val LeuAsn Val Pro Asn Arg Leu ValTyr Ser Ala His Asp TyrAla Thr Ser Val Tyr Pro GlnThr Trp Phe Ser Asp ProThr Phe Pro Asn Asn Met ProGly Ile Trp Asn Lys AsnTrp Gly Tyr Leu Phe Asn GlnAsn Ile Ala Pro Val TrpLeu Gly Glu Phe Gly Thr ThrLeu Gln Ser Thr Thr AspGln Thr Trp Leu Lys Thr LeuVal Gln Tyr Leu Arg ProThr Ala Gln Tyr Gly Ala AspSer Phe Gln Trp Thr PheTrp Ser Trp Asn Pro Asp SerGly Asp Thr Gly Gly IleLeu Lys Asp Asp Trp Gln ThrVal Asp Thr Val Lys AspGly Tyr Leu Ala Pro Ile LysSer Ser Ile GTC

Phe Asp Pro Gly Ala Ser Ala Ser ProSer Ser Gln Val Pro Ser Pro Ser Val Ser Pro Ser Pro Ser Pro Ser Pro Ser Ala Ser Arg Thr Pro Thr Pro Thr Pro Thr Pro Thr Ala Ser Pro Thr Pro Thr Leu Thr Pro Thr Ala Thr Pro Thr Pro Thr Ala Ser Pro Thr Pro Ser Pro Thr Ala Ala Ser Gly Ala Arg Cys Thr Ala Ser Tyr Gln Val Asn Ser Asp Trp Gly Asn Gly Phe Thr Val Thr Val Ala Val Thr Asn Ser Gly Ser Val Ala Thr Lys Thr Trp Thr Val Ser Trp Thr Phe Gly Gly Asn Gln Thr Ile Thr Asn Ser Trp Asn Ala Ala Val Thr Gln Asn Gly Gln Ser Val Thr Ala Arg Ile Met Ser Tyr Asn Asn Val Ile Gln Pro Gly Gln Asn Thr Thr Phe Gly Phe Gln Ala Ser Tyr Thr Gly Ser CCG

Asn Ala Ala Pro Thr Val Ala Cys Ala Ala Ser Stop AGGGT CC

AATCCGGAC GAACTG

ATCTCAAAAC
GGCTGC

TGAGCATCGC
AGCCTCCATC

GCCGCGACGC
ACGTCGACAA

CCCGTACTGG
GCGCAAGAAG

CACTCTCGCA
GCGAAAATGC

GGATGGACGC
CATCGCTGCG

ACGACATATC
TGGACGCCGC

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 562 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Val Pro Arg Ala Leu Arg Arg Val Pro G1y Arg Val Ser Met Leu Arg Val Gly Val Val Val Ala Val Ala Leu Leu Val Ala Ala Leu Ala Asn Leu Ala Val Pro Pro Ala Arg _15 -10 -5 Arg Ala Ala Gly Gly Gly Tyr Trp His Thr Gly Arg Ser G1u Ile Leu Asp Ala Asn Asn Val Pro Val Ile Ala Arg Gly Ile Asn Trp Phe Gly Phe Glu Thr Cys Tyr Val Asn Val His Gly Leu Trp Ser Arg Asp Tyr Arg Ile Leu Ser Asp Gln Ile Lys Ser Leu Gly Tyr Asn Thr Arg Leu Ile Pro Tyr Ser Asp Asp Ile Leu Lys Pro Gly Met Pro Thr Asn Ser Ile Asn Phe Tyr Gln Met Asn Gln Leu Gln Asp Gly Leu Thr Ser Leu Gln Val Met Asp Lys Val Ala Ile Tyr Ala Gly Arg Ile Gly Leu Arg Ile Ile Asp Arg Leu His Arg Pro Asp Cys Ser Gly Gln Ser Ala Leu Trp Tyr Thr Ser Ser Val Ser Glu Ala Thr Trp Ile Ser Asp Leu Gln Ala Leu Ala Gln Arg Tyr Lys Gly Asn Pro Thr Val Val Gly Phe Asp Leu His Asn Glu Pro His Asp Pro Ala Cys Trp Gly Cys Gly Asp Pro Ser Ile Asp Trp Arg Leu Ala Ala Asp Arg Ala Gly Asn Ala Val Leu Ser Val Asn Pro Asn Leu Leu Ile Phe Val Glu Gly Val Gln Ser Tyr Asn Gly Asp Ser Tyr Trp Trp Gly Gly Asn Leu Gln Gln Ala Gly Gln Tyr Arg Val Val Leu Asn Val Pro Asn Arg Leu Val Tyr Ser Ala His Asp Tyr Ala Thr Ser Val Tyr Pro Gln Thr Trp Phe Ser Asp Pro Thr Phe Pro Asn Asn Met Pro Gly Ile Trp Asn Lys Asn Trp Gly Tyr Leu Phe Asn Gln Asn Ile Ala Pro Val Trp Leu Gly Glu Phe Gly Thr Thr Leu Gln Ser Thr Thr Asp Gln Thr Trp Leu Lys Thr Leu Val Gln Tyr Leu Arg Pro Thr Ala Gln Tyr Gly Ala Asp Ser Phe Gln Trp Thr Phe Trp Ser Trp Asn Pro Asp Ser Gly Asp Thr Gly Gly Ile Leu Lys Asp Asp-Trp Gln Thr Val Asp Thr Val Lys Asp Gly Tyr Leu Ala Pro Ile Lys Ser Ser Ile Phe Asp Pro Val Gly Ala Ser Ala Ser Pro Ser Ser Gln Pro Ser Pro Ser Pro Ser Pro Ser Pro Ser Pro Ser Ala Ser Ala Ser Arg Thr Pro Thr Pro Thr Pro Thr Pro Thr Ala Ser Pro Thr Pro Thr Leu Thr Pro Thr Ala Thr Pro Thr Pro Thr Ala Ser Pro Thr Pro Ser Pro Thr Ala Ala Ser Gly Ala Arg Cys Thr Ala Ser Tyr Gln Val Asn Ser Asp Trp Gly Asn Gly Phe Thr Val Thr Val Ala Val Thr Asn Ser Gly Ser Val Ala Thr Lys Thr Trp Thr 
Val Ser Trp Thr Phe Gly Gly Asn Gln Thr Ile Thr Asn Ser Trp Asn Ala Ala Val Thr Gln Asn Gly Gln Ser Val Thr Ala Arg Ile Met Ser Tyr Asn Asn Val Ile Gln Pro Gly Gln Asn Thr Thr Phe Gly Phe Gln Ala Ser Tyr Thr Gly Ser Asn Ala Ala Pro Thr Val Ala Cys Ala Ala Ser (2) INFORMATION FOR SEQ ID N0:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 397 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:3:
Met Lys Lys Lys Gly Leu Lys Lys Thr Phe Phe Val Ile Ala Ser Leu Val Met Gly Phe Thr Leu Tyr Gly Tyr Thr Pro Val Ser Ala Asp Ala Ala Ser Val Lys Gly Tyr Tyr His Thr Gln Gly Asn Lys Ile Val Asp Glu Ser Gly Lys Glu Ala Ala Phe Asn Gly Leu Asn Trp Phe Gly Leu Glu Thr Pro Asn Tyr Thr Ile His Gly Leu Trp Ser Arg Ser Met Asp Asp Met Leu Asp Gln Val Lys Lys Glu Gly Tyr Asn Leu Ile Arg Leu Pro Tyr Ser Asn Gln Leu Phe Asp g5 100 Ser Ser Ser Arg Pro Asp Ser Ile Asp Tyr His Lys Asn Pro Asp Leu Val Gly Leu Asn Pro Ile Gln Thr Met Asp Lys Leu Thr Glu Lys Ala Gly Gln Arg Gly Ile Gln Ile Ile Leu Asp Arg His Arg Pro Gly Ser Gly Gly Gln Ser Glu Leu Trp Tyr Thr Ser Gln Tyr Pro Glu Ser Arg Trp Ile Ser Asp Trp Lys Met Leu Ala Asp Arg Tyr Lys Asn Asn Pro Thr Val Ile Gly Ala Asp Leu His Asn Glu Pro His Gly Gln Ala Ser Trp Gly Thr Gly Asn Ala Ser Thr Asp Trp Arg Leu Ala Ala Cys Arg Ala Gly Asn Ala Ile Leu Ser Val Asn Pro Asn Trp Leu Ile Asp Val Glu Gly Val Asp His Asn Val Gln Gly Asn Asn Ser Gln Tyr Trp Trp Gly Gly Asn Leu Thr Gly Val Ala Asn Tyr Pro Val Val Leu Asp Val Pro Asn Arg Val Val Tyr Ser Pro His Asp Tyr Gly Pro Gly Val Ser Ser Gln Pro Trp Phe Asn Asp Pro Ala Phe Pro Ser Asn Leu Pro Ala Ile Trp Asp Gln Thr Trp Gly Tyr Ile Ser Lys Gln Asn Ile Ala Pro Val Leu Val Gly Glu Phe Gly Gly Arg Asn Val Asp Leu Ser Ser Pro Glu Gly Lys Trp Gln Asn Ala Leu Val His Tyr Ile Gly Ala Asn Asn Leu Tyr Phe Thr Tyr Trp Ser Leu Asn Pro Asn Ser Gly Asp Thr Gly Gly Leu Leu Leu Asp Asp Trp Thr Thr Trp Asn Arg Pro Lys Gln Asp Met Leu Gly Arg Ile Met Lys Pro Val Val Ser Val Ala Gln Gln Ala Glu Ala Ala Ala Glu (2) INFORMATION FOR SEQ ID N0:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 349 amino acids (B) TYPE: amino acid

(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:4:
Tyr Ser Ile Asn Asn Ser Arg Gln Ile Val Asp Asp Ser Gly Lys Val Val Gln Leu Lys Gly Val Asn Val Phe Gly Glu Glu Thr Gly Asn His Val Met His Gly Leu Trp Ala Arg Asn Trp Lys Asp Met Ile Val Gln Met Gln Gly Leu Gly Glu Asn Ala Val Arg Leu Pro Phe Cys Pro Ala Thr Leu Arg Ser Asp Thr Met Pro Ala Ser Ile Asp Tyr Ser Arg Asn Ala Asp Leu Gln Gly Leu Thr Ser Leu Gln Ile Leu Asp Lys Val Thr Ala Glu Phe Asn Ala Arg Gly Met Tyr Val Leu Leu Asp His His Thr Pro Asp Cys Ala Gly Ile Ser Glu Leu Trp Tyr Thr Gly Ser Tyr Thr Glu Ala Gln Trp Leu Ala Asp Leu Arg Phe Val A1a Asn Arg Tyr Lys Asn Val Pro Tyr Val Leu Gly Leu Asp Leu Lys Asn y 145 150 155 Glu Pro His Gly Ala Ala Thr Trp Gly Thr Gly Asn Ala Ala Thr Asp Trp Asn Lys Ala Ala Phe Arg Gly Ser Ala Ala Val Leu Ala Val Ala Pro Lys Trp Leu Ile Ala Val Glu Gly Ile Thr Asp Asn Pro Val Cys Ser Thr Asn Gly Gly Ile Phe Trp Gly Gly Asn Leu Gln Pro Leu Ala Cys Thr Pro Leu Asn Ile Pro Ala Asn Arg Leu Leu Leu Ala Pro His Val Tyr Gly Pro Asp Val Phe Val Gln Ser Tyr Phe Asn Asp Ser Asn Phe Pro Asn Asn Met Pro Ala Ile Trp Glu Arg His Phe Gly Gln Phe Ala Gly Thr His Ala Leu Leu Leu Gly Glu Phe Gly Gly Lys Tyr Gly Glu Gly Asp Ala Arg Asp Lys Thr Trp Gln Asp Ala Leu Val Lys Tyr Leu Arg Ser Lys Gly Ile Asn Gln Gly Phe Tyr Trp Ser Trp Asn Pro Asn Ser Gly Asp Thr Gly Gly Ile Leu Arg Asp Asp Trp Thr Ser Val Arg Gln Asp Lys Met Thr Ile Leu Arg Thr Leu Trp Gly Thr Ala Gly Asn (2) INFORMATION FOR SEQ ID N0:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:5:
His Gly Leu Trp (2) INFORMATION FOR SEQ ID N0:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids (B) TYPE: amino acid

(C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:6:
Leu Trp Tyr Thr (2) INFORMATION FOR SEQ ID N0:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:7:
Asn Glu Pro His (2) INFORMATION FOR SEQ ID N0:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:8:
Trp Gly Gly Asn Leu (2) INFORMATION FOR SEQ ID N0:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:9:
Ser Gly Asp Thr Gly Gly (2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 78 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
Asn Gln Trp Pro Gly Gly Phe Gln Ala Glu Val Thr Val Lys Asn Thr Gly Ser Ser Pro Ile Asn Gly Trp Thr Val Gln Trp Thr Leu Pro Ser Gly Gln Ser Ile Thr Gln Leu Trp Asn Gly Asp Leu Ser Thr Ser Gly Ser Asn Val Thr Val Arg Asn Val Ser Trp Asn Gly Asn Val Pro Ala Gly Gly Ser Thr Ser Phe Gly Phe Leu Gly Ser Gly Thr Gly (2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
Asn Glu Trp Asn Asp Gly Phe Gln Ala Thr Val Thr Val Thr Ala Asn Gln Asn Ile Thr Gly Trp Thr Val Thr Trp Thr Phe Thr Asp Gly Gln Thr Ile Thr Asn Ala Trp Asn Ala Asp Val Ser Thr Ser Gly Ser Ser Val Thr Ala Arg Asn Val Gly His Asn Gly Thr Leu Ser Gln Gly Ala Ser Thr Glu Phe Gly Phe Val Gly Ser Lys Gly Asn (2) INFORMATION FOR SEQ ID N0:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:12:
Asn Gln Trp Pro Gly Gly Phe Gly Ala Asn Val Thr Ile Thr Asn Leu Gly Asp Pro Val Ser Ser Trp Lys Leu Asp Trp Thr Tyr Thr Ala Gly Gln Arg Ile Gln Gln Leu Trp Asn Gly Thr Ala Ser Thr Asn Gly Gly Gln Val Ser Val Thr Ser Leu Pro Trp Asn Gly Ser Ile Pro Thr Gly Gly Thr Ala Ser Phe Gly Phe Asn Gly Ser Trp Ala Gly (2) INFORMATION FOR SEQ ID N0:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 79 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:13:
Asn Ser Trp Gly Ser Gly Ala Ser Val Asn Val Thr Ile Lys Asn Asn Gly Thr Thr Pro Ile Asn Gly Trp Thr Leu Lys Trp Thr Met Pro Ile Asn Gln Thr Ile Thr Asn Met Trp Ser Ala Ser Phe Val Ala Ser Gly Thr Thr Leu Ser Val Thr Asn Ala Gly Tyr Asn Gly Thr Ile Ala Ala Asn Gly Gly Thr Gln Ser Phe Gly Phe Asn Ile Asn Tyr Ser Gly (2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:14:
Asn Asp Trp Pro Gly Gly Phe Thr Ala Ser Val Thr Leu Thr Asn Thr Gly Ser Thr Pro Trp Asp Trp Glu Leu Arg Phe Thr Phe Pro Ser Gly Gln Thr Val Ser His Gly Trp Ser Ala Asn Trp Gln Gln Ser Gly Ser Asp Val Thr Ala Thr Ser Leu Pro Trp Asn Gly Ser Val Pro Pro Gly Gly Gly Ser Val Asn Ile Gly Phe Asn Gly Thr Trp Gly Gly Ser Asn (2) INFORMATION FOR SEQ ID N0:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:15:
Ser Trp Asn Val Gly Phe Thr Gly Ser Val Lys Ile Thr Asn Thr Gly Thr Thr Pro Leu Thr Trp Thr Leu Gly Phe Ala Phe Pro Ser Gly Gln Gln Val Thr Gln Gly Trp Ser Ala Thr Trp Ser Gln Thr Gly Thr Thr Val Thr Ala Thr Gly Leu Ser Trp Asn Ala Thr Leu Gln Pro Gly Gln Ser Thr Asp Ile Gly Phe Asn Gly Ser His Pro Gly (2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:16:
Ser Glu Trp Ser Thr Gly Phe Thr Ala Asn Ile Thr Leu Lys Asn Asp Thr Gly Ala Ala Ile Asn Asn Trp Asn Val Asn Trp Gln Tyr Ser Ser Asn Arg Met Thr Ser Gly Trp Asn Ala Asn Phe Ser Gly Thr Asn Pro Tyr Asn Ala Thr Asn Met Ser Trp Asn Gly Ser Ile Ala Pro Gly Gln Ser Ile Ser Phe Gly Leu Gln Gly Glu Lys Asn Gly (2) INFORMATION FOR SEQ ID N0:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:17:
Ser Glu Trp Ser Thr Gly Phe Thr Ala Asn Ile Thr Leu Lys Asn Asp Thr Gly Ala Ala Ile Asn Asn Trp Asn Val Asn Trp Gln Tyr Ser Ser Asn Arg Met Thr Ser Gly Trp Asn Ala Asn Phe Ser Gly Thr Asn Pro Tyr Asn Ala Thr Asn Met Ser Trp Asn Gly Ser Ile Ala Pro Gly Gln Ser Ile Ser Phe Gly Leu Gln Gly Glu Lys Asn Gly (2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:18:
Asn Glu Trp Asn Thr Gly Tyr Thr Gly Asp Ile Thr Ile Thr Asn Arg Gly Ser Ser Ala Ile Asn Gly Trp Ser Val Asn Trp Gln Tyr Ala Thr Asn Arg Leu Ser Ser Ser Trp Asn Ala Asn Val Ser Gly Ser Asn Pro Tyr Ser Ala Ser Asn Leu Ser Trp Asn Gly Asn Ile Gln Pro Gly Gln Ser Val Ser Phe Gly Phe Gln Val Asn Lys Asn Gly (2) INFORMATION FOR SEQ ID N0:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:19:
Asn Ser Trp Gly Ser Gly Phe Thr Ala Ala Ile Arg Ile Thr Asn Ser Thr Ser Ser Val Ile Asn Gly Trp Asn Val Ser Trp Gln Tyr Asn Ser Asn Arg Val Thr Asn Leu Trp Asn Pro Asn Leu Ser Gly Ser Asn Pro Tyr Ser Ala Ser Asn Leu Ser Trp Asn Gly Thr Ile Gln Pro Gly Gln Thr Val Glu Phe Gly Phe Gln Gly Val Thr Asn Ser (2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:20:
Asn Gln Trp Asn Asn Gly Phe Thr Ala Val Ile Arg Val Arg Asn Asn Gly Ser Ser Ala Ile Asn Arg Trp Ser Val Asn Trp Ser Tyr Ser Asp Gly Ser Arg Ile Thr Asn Ser Trp Asn Ala Asn Val Thr Gly Asn Asn Pro Tyr Ala Ala Ser Ala Leu Gly Trp Asn Ala Asn Ile Gln Pro Gly Gln Thr Ala Glu Phe Gly Phe Gln Gly Thr Lys Gly Ala (2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:21:
Asn Glu Trp Gly Ser Gly Phe Thr Ala Ser Ile Arg Ile Thr Asn Asn Gly Ser Ser Thr Ile Asn Gly Trp Ser Val Ser Trp Asn Tyr Thr Asp Gly Ser Arg Val Thr Ser Ser Trp Asn Ala Gly Leu Ser Gly Ala Asn Pro Tyr Ser Ala Thr Pro Val Gly Trp Asn Thr Ser Ile Pro Ile Gly Ser Ser Val Glu Phe Gly Val Gln Gly Asn Asn Gly Ser (2) INFORMATION FOR SEQ ID N0:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 78 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID N0:22:
Ser Asp Trp Gly Thr Gly Phe Gly Gly Lys Trp Thr Val Lys Asn Thr Gly Thr Thr Ser Leu Ser Ser Trp Thr Val Glu Trp Asp Phe Pro Ser Gly Thr Lys Val Thr Ser Ala Trp Asp Ala Thr Val Thr Asn Ser Ala Asp His Trp Thr Ala Lys Asn Val Gly Trp Asn Gly Thr Leu Ala Pro Gly Ala Ser Val Ser Phe Gly Phe Asn Gly Ser Gly Pro Gly (2) INFORMATION FOR SEQ ID N0:23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 78 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
Ser Asp Trp Gly Thr Gly Phe Gly Gly Ser Trp Thr Val Lys Asn Thr Gly Thr Thr Ser Leu Ser Ser Trp Thr Val Glu Trp Asp Phe Pro Thr Gly Thr Lys Val Thr Ser Ala Trp Asp Ala Thr Val Thr Asn Ser Gly Asp His Trp Thr Ala Lys Asn Val Gly Trp Asn Gly Thr Leu Ala Pro Gly Ala Ser Val Ser Phe Gly Phe Asn Gly Ser Gly Pro Gly
(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
Asn Asn Trp Gly Ser Gly Tyr Gln Val Leu Ile Lys Val Lys Asn Asp Ser Ala Ser Arg Val Asp Gly Trp Thr Leu Lys Ile Ser Lys Ser Glu Val Lys Ile Asp Ser Ser Trp Cys Val Asn Ile Ala Glu Glu Gly Gly Tyr Tyr Val Ile Thr Pro Met Ser Trp Asn Ser Ser Leu Glu Pro Ser Ala Ser Val Asp Phe Gly Ile Gln Gly Ser Gly Ser Ile
(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
Asn Val Trp Gln Asp Gly Phe Thr Ala Asp Val Thr Val Thr Asn Thr Gly Thr Ala Pro Val Asp Gly Trp Gln Leu Ala Phe Thr Leu Pro Ser Gly Gln Arg Ile Thr Asn Ala Trp Asn Ala Ser Leu Thr Pro Ser Ser Gly Ser Val Thr Ala Thr Gly Ala Ser His Asn Ala Arg Ile Ala Pro Gly Gly Ser Leu Ser Phe Gly Phe Gln Gly Thr Tyr Gly Gly Ala Phe
(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
Asn Val Trp Pro Gly Gly Phe Thr Ala Asn Val Thr Val Thr Asn Asn Gly Ser Ala Pro Val Asp Gly Trp Arg Leu Ala Phe Thr Leu Pro Ser Gly Gln Ser Val Val His Ala Trp Asn Ala Ser Val Ser Pro Ser Ser Gly Ala Val Thr Ala Thr Gly Pro Ala Glu Ser Ala Arg Ile Ala Ala Gly Gly Ser Gln Ser Phe Gly Phe Gln Gly Ala Tyr Ser Gly Ser Phe
(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 78 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
Ser Ser Trp Asp Asn Gly Tyr Ser Ala Ser Val Thr Val Arg Asn Asp Thr Ser Ser Thr Val Ser Gln Trp Glu Val Val Leu Thr Leu Pro Gly Gly Thr Thr Val Ala Gln Val Trp Asn Ala Gln His Thr Ser Ser Gly Asn Ser His Thr Phe Thr Gly Val Ser Trp Asn Ser Thr Ile Pro Pro Gly Gly Thr Ala Ser Ser Gly Phe Ile Ala Ser Gly Ser Gly
(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
Ser Ser Trp Asn Thr Gly Phe Thr Gly Thr Val Glu Val Lys Asn Asn Gly Thr Ala Ala Leu Asn Gly Trp Thr Leu Gly Phe Ser Phe Ala Asp Gly Gln Lys Val Ser Gln Gly Trp Ser Ala Glu Trp Ser Gln Ser Gly Thr Ala Val Thr Ala Lys Asn Ala Pro Trp Asn Gly Thr Leu Ala Ala Gly Ser Ser Val Ser Ile Gly Phe Asn Gly Thr His Asn Gly Thr Asn
(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
Ser Ser Trp Asn Thr Gly Phe Thr Ala Ser Val Arg Val Thr Asn Thr Gly Thr Thr Ala Leu Asn Gly Trp Thr Leu Thr Phe Pro Phe Ala Asn Gly Gln Thr Val Gln Gln Gly Trp Ser Ala Asp Trp Ser Gln Ser Gly Thr Thr Val Thr Ala Lys Asn Ala Ala Trp Asn Gly Ser Leu Ala Ala Gly Gln Thr Val Asp Ile Gly Phe Asn Gly Ala His Asn Gly Thr Asn
(2) INFORMATION FOR SEQ ID NO:30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
Asn Gly Trp Ser Gly Gly Phe Thr Ala Ala Val Thr Leu Thr Asn Thr Gly Thr Thr Ala Leu Ser Gly Trp Thr Leu Gly Phe Ala Phe Pro Ser Gly Gln Thr Leu Thr Gln Gly Trp Ser Ala Arg Trp Ala Gln Ser Gly Ser Ser Val Thr Ala Thr Asn Glu Ala Trp Asn Ala Val Leu Ala Pro Gly Ala Ser Val Glu Ile Gly Phe Ser Gly Thr His Thr Gly Thr Asn
(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
Ser Ser Trp Asn Ser Gly Gly Phe Thr Ala Ser Val Arg Ile Thr Asn Thr Gly Thr Thr Thr Ile Asn Gly Trp Ser Leu Gly Phe Asp Leu Thr Ala Gly Gln Lys Val Gln Gln Gly Trp Ser Ala Thr Trp Thr Gln Ser Gly Ser Thr Val Thr Ala Thr Asn Ala Pro Trp Asn Gly Thr Leu Ala Pro Gly Gln Thr Val Asp Val Gly Phe Asn Gly Ser His Thr Gly Gln
(2) INFORMATION FOR SEQ ID NO:32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 77 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
Gln Trp Asn Thr Gly Phe Thr Ala Asn Val Thr Val Lys Asn Thr Ser Ser Ala Pro Val Asp Gly Trp Thr Leu Thr Phe Ser Phe Pro Ser Gly Gln Gln Val Thr Gln Ala Trp Ser Ser Thr Val Thr Gln Ser Gly Ser Ala Val Thr Val Arg Asn Ala Pro Trp Asn Gly Ser Ile Pro Ala Gly Gly Thr Ala Gln Phe Gly Phe Asn Gly Ser His Thr Gly
(2) INFORMATION FOR SEQ ID NO:33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
Asn Asp Trp Asp Ser Gly Phe Thr Ala Ser Ile Arg Ile Thr Tyr His Gly Thr Ala Pro Leu Ser Ser Trp Glu Leu Ser Phe Thr Phe Pro Ala Gly Gln Gln Val Thr His Gly Trp Asn Ala Thr Trp Arg Gln Asp Gly Ala Ala Val Thr Ala Thr Pro Met Ser Trp Asn Ser Ser Leu Ala Pro Gly Ala Thr Val Glu Val Gly Phe Asn Gly Ser Trp Ser Gly Ser Asn
(2) INFORMATION FOR SEQ ID NO:34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 78 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
Ser Gln Trp Glu Gly Gly Phe Gln Ala Gly Val Lys Ile Thr Asn Leu Gly Asp Pro Val Ser Gly Trp Thr Leu Gly Phe Thr Met Pro Asp Ala Gly Gln Arg Leu Val Gln Gly Trp Asn Ala Thr Trp Ser Gln Ser Gly Ser Ala Val Thr Ala Gly Gly Val Asp Trp Asn Arg Thr Leu Ala Thr Gly Ala Ser Ala Asp Leu Gly Phe Val Gly Ser Phe Thr Gly
(2) INFORMATION FOR SEQ ID NO:35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 80 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
Thr Gln Trp Asn Gly Gly Phe Thr Ala Ser Val Asn Val Thr Ala Gly Ser Ala Ile Asn Gly Trp Thr Val Thr Val Ala Leu Pro Gly Gly Ala Ala Ile Thr Gly Thr Trp Asn Ala Gln Ala Ser Gly Thr Ser Gly Thr Val Arg Phe Thr Asn Val Gly Tyr Asn Gly Gln Val Gly Ala Gly Gln Thr Thr Asn Phe Gly Phe Gln Gly Thr Gly Thr Gly Gln Gly Ala Thr
(2) INFORMATION FOR SEQ ID NO:36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
Gly Phe Thr Ala
(2) INFORMATION FOR SEQ ID NO:37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
Val Thr Val Thr Asn
(2) INFORMATION FOR SEQ ID NO:38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
Gly Trp Thr Leu
(2) INFORMATION FOR SEQ ID NO:39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ii) MOLECULAR TYPE:
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
Val Thr Ala Thr Asn
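The listing gives each sequence in three-letter codes, while SEQ ID NO:36 to NO:39 are short motifs (GFTA, VTVTN, GWTL and VTATN in one-letter form). As a minimal sketch that is not part of the patent, the Python below converts a three-letter record to one-letter codes, checks it against the declared LENGTH, and reports the 1-based position where each motif first occurs in SEQ ID NO:30. The helper names `to_one_letter` and `motif_position` are hypothetical.

```python
# Hypothetical helpers for working with the three-letter sequence listing.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Translate whitespace-separated three-letter codes into a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in listing.split())


def motif_position(seq: str, motif: str):
    """Return the 1-based start of the first occurrence of motif, or None."""
    idx = seq.find(motif)
    return idx + 1 if idx >= 0 else None


# SEQ ID NO:30, copied from the listing (declared LENGTH: 80 amino acids).
SEQ_ID_30 = (
    "Asn Gly Trp Ser Gly Gly Phe Thr Ala Ala "
    "Val Thr Leu Thr Asn Thr Gly Thr Thr Ala "
    "Leu Ser Gly Trp Thr Leu Gly Phe Ala Phe "
    "Pro Ser Gly Gln Thr Leu Thr Gln Gly Trp "
    "Ser Ala Arg Trp Ala Gln Ser Gly Ser Ser "
    "Val Thr Ala Thr Asn Glu Ala Trp Asn Ala "
    "Val Leu Ala Pro Gly Ala Ser Val Glu Ile "
    "Gly Phe Ser Gly Thr His Thr Gly Thr Asn"
)

# SEQ ID NO:36 to NO:39 in one-letter form.
MOTIFS = {36: "GFTA", 37: "VTVTN", 38: "GWTL", 39: "VTATN"}

seq30 = to_one_letter(SEQ_ID_30)
if len(seq30) != 80:
    raise ValueError("parsed length does not match the declared LENGTH")

for seq_id, motif in MOTIFS.items():
    print(f"SEQ ID NO:{seq_id} {motif}: {motif_position(seq30, motif)}")
```

For this record the sketch finds GFTA, GWTL and VTATN (at positions 6, 23 and 51) but not VTVTN; any other record can be checked the same way by pasting its three-letter text.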

Claims (17)

1. A compound comprising an amino acid sequence of:
SGARCTASYQ VNSDWGNGFT VTVAVTNSGS VATKTWTVSW TFGGNQTITN SWNAAVTQNG
QSVTARNMSY NNVIQPGQNT TFGFQASYTG SNAAPTVACA AS (Residues 420-521 of SEQ ID NO:2), or a domain variant thereof.
2. The compound or domain variant of claim 1 wherein said compound or domain variant binds to cellulose over a temperature range from about 20°C to about 100°C.
3. The compound or domain variant of claim 1 wherein said compound or domain variant binds to cellulose over a pH range of about 2 to about 9.
4. The domain variant of claim 1, wherein the variant has at least about 25% amino acid similarity with the compound of claim 1.
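Claim 4 does not say how "amino acid similarity" is to be measured. As a rough illustration under stated assumptions, the Python below computes percent identity (identical residues only, no credit for conservative substitutions, no gaps) between two sequences that are already aligned at equal length. It uses SEQ ID NO:22 and NO:23 from the listing, transcribed here into one-letter codes, because they have the same length; comparing a variant against the 102-residue compound of claim 1 would first require a proper alignment (for example with BLAST or a Needleman-Wunsch implementation), which this sketch does not attempt. The function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions carrying identical residues (ungapped)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# SEQ ID NO:22 and NO:23 (78 residues each), converted to one-letter codes.
SEQ_22 = ("SDWGTGFGGKWTVKNTGTTSLSSWTVEWDFPSGTKVTSAW"
          "DATVTNSADHWTAKNVGWNGTLAPGASVSFGFNGSGPG")
SEQ_23 = ("SDWGTGFGGSWTVKNTGTTSLSSWTVEWDFPTGTKVTSAW"
          "DATVTNSGDHWTAKNVGWNGTLAPGASVSFGFNGSGPG")

print(f"{percent_identity(SEQ_22, SEQ_23):.1f}% identity")
```

The two records differ at three of 78 positions, giving about 96.2% identity. Identity is a stricter measure than similarity, so over the same alignment it can only understate a similarity score.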
5. An isolated and purified nucleic acid encoding the compound or domain variant of claim 1.
6. A hybrid protein comprising the compound or domain variant of claim 1.
7. A nucleic acid encoding the hybrid protein of claim 6.
8. A chemical derivative of the compound or domain variant of claim 1.
9. The chemical derivative of claim 8 comprising a chemical moiety selected from the group consisting of a colored or fluorescent dye, a radionuclide, a magnetic or other particle, an enzyme, an antibody, an enzyme inhibitor, an enzyme cofactor and an enzyme substrate.
10. A chemical derivative of the hybrid protein of claim 6.
11. The chemical derivative of claim 10 comprising a chemical moiety selected from the group consisting of a colored or fluorescent dye, a radionuclide, a magnetic or other particle, an enzyme, an antibody, an enzyme inhibitor, an enzyme cofactor and an enzyme substrate.
12. The nucleic acid of claim 5 comprising the sequence:
TCCGGAGCCC GCTGCACCGC GAGTTACCAG GTCAACAGCG ATTGGGGCAA TGGCTTCACG
GTAACGGTGG CCGTGACAAA TTCCGGATCC GTCGCGACCA AGACATGGAC GGTCAGTTGG
ACATTCGGCG GAAATCAGAC GATTACCAAT TCGTGGAATG CAGCGGTCAC GCAGAACGGT
CAGTCGGTAA CGGCTCGGAA TATGAGTTAT AACAACGTGA TTCAGCCTGG TCAGAACACC
ACGTTCGGAT TCCAGGCGAG CTATACCGGA AGCAACGCGG CACCGACAGT CGCCTGCGCA
GCAAGT (Residues 2204-2509 of SEQ ID NO:1).
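The 306-nucleotide sequence of claim 12 should encode the 102-residue sequence of claim 1 (306 = 3 × 102, consistent with residue ranges 2204-2509 and 420-521). A minimal Python sketch, assuming the standard genetic code and a reading frame that starts at the first nucleotide, translates the DNA as printed and compares it with claim 1; the helper names are illustrative, not from the patent.

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order; "*" marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Claim 12 nucleotide sequence (residues 2204-2509 of SEQ ID NO:1), as printed.
CLAIM_12_DNA = """
TCCGGAGCCC GCTGCACCGC GAGTTACCAG GTCAACAGCG ATTGGGGCAA TGGCTTCACG
GTAACGGTGG CCGTGACAAA TTCCGGATCC GTCGCGACCA AGACATGGAC GGTCAGTTGG
ACATTCGGCG GAAATCAGAC GATTACCAAT TCGTGGAATG CAGCGGTCAC GCAGAACGGT
CAGTCGGTAA CGGCTCGGAA TATGAGTTAT AACAACGTGA TTCAGCCTGG TCAGAACACC
ACGTTCGGAT TCCAGGCGAG CTATACCGGA AGCAACGCGG CACCGACAGT CGCCTGCGCA
GCAAGT
"""

# Claim 1 amino acid sequence (residues 420-521 of SEQ ID NO:2), as printed.
CLAIM_1_PROTEIN = """
SGARCTASYQ VNSDWGNGFT VTVAVTNSGS VATKTWTVSW TFGGNQTITN SWNAAVTQNG
QSVTARNMSY NNVIQPGQNT TFGFQASYTG SNAAPTVACA AS
"""


def compact(text: str) -> str:
    """Drop the spaces and line breaks used for the 10-character grouping."""
    return "".join(text.split())


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    seq = compact(dna).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


protein = translate(CLAIM_12_DNA)
print(len(compact(CLAIM_12_DNA)), len(protein))  # 306 102
print(protein == compact(CLAIM_1_PROTEIN))       # True
```

Building the codon table from `itertools.product` avoids typing 64 entries by hand; a library such as Biopython offers the same translation with support for alternative genetic codes.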
13. A method of modifying a polysaccharide surface comprising contacting said polysaccharide with the compound or domain variant of claim 1 or a hybrid protein comprising the compound or domain variant.
14. A method of purifying the hybrid protein of claim 6 comprising contacting said hybrid protein with a polysaccharide matrix.
15. A method of immobilizing the hybrid protein of claim 6 on a support comprising contacting said hybrid protein with a polysaccharide matrix.
16. A method of producing the compound or domain variant of claim 1, or a hybrid protein comprising said compound or domain variant, said method comprising the steps of:
(a) inserting a nucleic acid sequence encoding the compound, domain variant or hybrid protein into a vector;
(b) introducing the vector into a host cell;
(c) expressing the nucleic acid sequence of step (a).
17. A domain variant of the isolated and purified nucleic acid of claim 5 encoding a compound having the following amino acid sequence:

AA104-AA105-AA106-AA107-AA108-AA109-AA110
wherein AA1 is serine, alanine, glycine, glutamine, valine, or non-existent;
AA2 is glycine, serine, threonine, proline, alanine, aspartic acid, glutamine, or non-existent;
AA3 is alanine, glycine, serine, asparagine, arginine, proline, threonine, or non-existent;
AA4 is alanine, glycine, serine, arginine, glutamic acid, threonine, cysteine, valine or non-existent;
AA5 is cysteine, leucine, valine, serine, lysine, or non-existent;
AA6 is threonine, glutamic acid, alanine, arginine, serine, glutamine, lysine, isoleucine, or non-existent;
AA7 is valine, alanine, serine or non-existent;
AA8 is threonine, serine, glutamic acid, aspartic acid, lysine, glycine, leucine, isoleucine, valine, alanine, or non-existent;
AA9 is tyrosine, phenylalanine, tryptophan, valine or threonine;
AA10 is threonine, alanine, serine, glutamine, asparagine, glycine, valine, or arginine;
AA11 is isoleucine, valine, alanine, lysine, leucine, or non-existent;
AA12 is threonine, serine, alanine, valine, isoleucine, asparagine, aspartic acid or non-existent;
AA13 is asparagine, serine, or glutamine;
AA14 is aspartic acid, glutamic acid, glutamine, serine, valine, glycine, tryptophan, asparagine, or non-existent;
AA15 is tryptophan or asparagine;
AA16 is glycine, asparagine, serine, proline, aspartic acid, glutamic acid, or glutamine;
AA17 is glycine, threonine, serine, asparagine, aspartic acid, or valine;
AA18 is glycine or alanine;
AA19 is phenylalanine, tyrosine, or alanine;
AA20 is threonine, glutamine, glycine or serine;
AA21 is alanine, glycine or valine;
AA22 is serine, asparagine, alanine, threonine, aspartic acid, glutamic acid, lysine, valine, leucine or glycine;
AA23 is valine, isoleucine, tryptophan or phenylalanine;
AA24 is threonine, alanine, arginine, lysine, glutamic acid or asparagine;
AA25 is valine, isoleucine or leucine;
AA26 is threonine, lysine or arginine;
AA27 is asparagine, alanine, tyrosine or non-existent;
AA28 is threonine, asparagine, aspartic acid, serine, leucine, arginine, histidine or non-existent;
AA29 is glycine, glutamine, threonine, alanine or serine;
AA30 is serine, threonine, asparagine, aspartic acid, glycine or alanine;
AA31 is serine, threonine, alanine, valine or non-existent;
AA32 is proline, alanine, threonine, serine, aspartic acid, arginine, valine or non-existent;
AA33 is isoleucine, leucine, valine, tryptophan or threonine;
AA34 is asparagine, serine, aspartic acid, threonine or lysine;
AA35 is glycine, serine, asparagine, threonine, arginine or non-existent;
AA36 is tryptophan;
AA37 is threonine, serine, asparagine, glutamic acid, lysine, glutamine or arginine;
AA38 is lysine, valine or phenylalanine;
AA39 is glycine, serine, threonine, asparagine, lysine, alanine, glutamic acid, aspartic acid, arginine or glutamine;
AA40 is tryptophan, phenylalanine, valine or isoleucine;
AA41 is threonine, serine, glutamine, alanine, aspartic acid, asparagine, lysine or proline;
AA42 is phenylalanine, tyrosine, leucine, methionine or lysine;
AA43 is proline, threonine, serine, alanine, or non-existent;
AA44 is aspartic acid, alanine, asparagine, glycine or non-existent;
AA45 is serine, threonine, alanine, glycine, asparagine, isoleucine or non-existent;
AA46 is glycine, asparagine, aspartic acid, glutamic acid or non-existent;
AA47 is glutamine, asparagine, threonine, serine, alanine, isoleucine or valine;
AA48 is arginine, threonine, lysine, glutamine, serine, alanine or valine;
AA49 is valine, isoleucine, leucine, methionine or phenylalanine;
AA50 is threonine, serine, glutamine, valine or aspartic acid;
AA51 is glutamine, asparagine, serine, histidine or glycine;
AA52 is glycine, alanine, serine, leucine, threonine, tyrosine or methionine;
AA53 is tryptophan;
AA54 is asparagine, serine, aspartic acid or cysteine;
AA55 is alanine, glycine, valine, serine or proline;
AA56 is threonine, asparagine, glycine, serine, aspartic acid, glutamic acid, arginine, glutamine, alanine or lysine;
AA57 is valine, tryptophan, leucine, phenylalanine, isoleucine or alanine;
AA58 is serine, threonine, alanine, arginine, glutamine, or valine;
AA59 is glutamine, glycine, threonine, asparagine, glutamic acid, proline or alanine;
AA60 is serine, threonine, asparagine, glutamic acid, aspartic acid or alanine;
AA61 is glycine, serine, alanine, asparagine or non-existent;
AA62 is serine, asparagine, glycine, threonine, aspartic acid, alanine or glutamine;
AA63 is proline, alanine, threonine, serine, histidine, tyrosine, asparagine, aspartic acid or glutamine;
AA64 is valine, tyrosine, tryptophan or leucine;
AA65 is threonine, serine, asparagine, valine, alanine or arginine;
AA66 is alanine, valine, isoleucine or phenylalanine;
AA67 is threonine, arginine, lysine, serine or glycine;

AA68 is asparagine, glycine, proline, serine or alanine;
AA69 is valine, leucine, methionine, alanine, glutamic acid, proline or glutamine;
AA70 is serine, glycine, proline, alanine or aspartic acid;
AA71 is tryptophan, tyrosine, histidine, or glutamic acid;
AA72 is asparagine, lysine or serine;
AA73 is glycine, alanine, serine, threonine, asparagine or arginine;
AA74 is threonine, serine, asparagine, valine, arginine or glutamine;
AA75 is isoleucine, leucine or valine;
AA76 is alanine, serine, glutamine, glycine, leucine, glutamic acid, proline or non-existent;
AA77 is proline, alanine, threonine, glycine or non-existent;
AA78 is glycine, alanine, serine, threonine, asparagine, isoleucine or non-existent;
AA79 is glycine, alanine, serine or asparagine;
AA80 is glutamine, glycine, alanine, serine or non-existent;
AA81 is serine, threonine or asparagine;
AA82 is valine, alanine, threonine, isoleucine, glutamine or leucine;
AA83 is serine, glutamic acid, aspartic acid, asparagine, valine, glutamine or threonine;
AA84 is phenylalanine, isoleucine, valine or leucine;
AA85 is glycine or serine;
AA86 is phenylalanine, leucine, isoleucine or valine;
AA87 is asparagine, glutamine, valine, serine or leucine;
AA88 is glycine, alanine, isoleucine or valine;
AA89 is serine, threonine, asparagine, alanine, glutamic acid or valine;

AA90 is histidine, glycine, lysine, tyrosine, tryptophan, phenylalanine, asparagine or threonine;
AA91 is threonine, asparagine, glycine, serine, proline, alanine or lysine;
AA92 is glycine, serine, asparagine, alanine, valine or isoleucine;
AA93 is serine, threonine, glycine, alanine, valine, isoleucine, glutamine or non-existent;
AA94 is asparagine, glutamine, phenylalanine, serine, leucine or non-existent;
AA95 is threonine, alanine, proline, serine, asparagine, leucine or non-existent;
AA96 is alanine, serine, threonine, lysine, asparagine, proline, glutamic acid, glutamine, arginine, valine, phenylalanine or non-existent;
AA97 is proline, alanine, serine, arginine, valine, glutamic acid or non-existent;
AA98 is threonine, alanine, glutamic acid, serine, glutamine, tyrosine or non-existent;
AA99 is serine, alanine, glycine, arginine, lysine, valine, threonine, aspartic acid, asparagine or non-existent;
AA100 is phenylalanine, proline, cysteine, threonine or non-existent;
AA101 is threonine, serine, alanine, lysine, glutamine, arginine, aspartic acid or non-existent;
AA102 is valine, leucine, isoleucine, glycine or non-existent;
AA103 is asparagine, threonine, alanine, glycine or non-existent;
AA104 is glycine, threonine, isoleucine or non-existent;
AA105 is alanine, serine, threonine, valine, glycine, glutamic acid, leucine, isoleucine or non-existent;
AA106 is alanine, threonine, valine, serine, proline, isoleucine, glutamic acid or non-existent;
AA107 is cysteine, glutamine or non-existent;
AA108 is threonine, serine, alanine, glycine, aspartic acid, asparagine, glutamine or non-existent;

AA109 is glycine, alanine, valine, threonine, leucine, isoleucine or non-existent;
AA110 is serine, glycine, threonine, valine, alanine, arginine, lysine or non-existent.
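Claim 17 defines the variant position by position, with "non-existent" meaning the position may be absent. One way to make such a definition machine-checkable is to turn each position into a regular-expression character class and make the "non-existent" positions optional. The Python sketch below encodes only AA1 to AA20, with the residue sets transcribed from the claim into one-letter codes; `POSITIONS`, `build_pattern` and `matching_prefix` are hypothetical names, and a match on this prefix says nothing about positions AA21 to AA110.

```python
import re

# (allowed one-letter residues, may be "non-existent") for AA1-AA20 of claim 17.
POSITIONS = [
    ("SAGQV", True),       # AA1
    ("GSTPADQ", True),     # AA2
    ("AGSNRPT", True),     # AA3
    ("AGSRETCV", True),    # AA4
    ("CLVSK", True),       # AA5
    ("TEARSQKI", True),    # AA6
    ("VAS", True),         # AA7
    ("TSEDKGLIVA", True),  # AA8
    ("YFWVT", False),      # AA9
    ("TASQNGVR", False),   # AA10
    ("IVAKL", True),       # AA11
    ("TSAVIND", True),     # AA12
    ("NSQ", False),        # AA13
    ("DEQSVGWN", True),    # AA14
    ("WN", False),         # AA15
    ("GNSPDEQ", False),    # AA16
    ("GTSNDV", False),     # AA17
    ("GA", False),         # AA18
    ("FYA", False),        # AA19
    ("TQGS", False),       # AA20
]


def build_pattern(positions):
    """One character class per position; optional positions get a '?' quantifier."""
    parts = [f"[{residues}]" + ("?" if optional else "") for residues, optional in positions]
    return re.compile("".join(parts))


PREFIX_PATTERN = build_pattern(POSITIONS)


def matching_prefix(seq: str):
    """Return the N-terminal stretch matching AA1-AA20, or None if there is none."""
    m = PREFIX_PATTERN.match(seq)
    return m.group(0) if m else None


# Start of the claim 1 sequence.
print(matching_prefix("SGARCTASYQVNSDWGNGFTVTVAVTNSGS"))
```

On the claim 1 sequence the pattern consumes the first 20 residues, SGARCTASYQVNSDWGNGFT, one per position. Because optional positions can be skipped, a regex engine explores alternative alignments automatically; extending `POSITIONS` to all 110 entries would test the full definition.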

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002226898A CA2226898A1 (en) 1998-03-25 1998-03-25 E1 endoglucanase cellulose binding domain

Publications (1)

Publication Number Publication Date
CA2226898A1 (en) 1999-09-25

Family

ID=29409066

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002226898A Abandoned CA2226898A1 (en) 1998-03-25 1998-03-25 E1 endoglucanase cellulose binding domain

Country Status (1)

Country Link
CA (1) CA2226898A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1179051A1 (en) * 1999-05-19 2002-02-13 Midwest Research Institute E1 endoglucanase variants y245g, y82r and w42r
EP1179051A4 (en) * 1999-05-19 2003-04-23 Midwest Research Inst E1 endoglucanase variants y245g, y82r and w42r

Legal Events

Date Code Title Description
FZDE Dead