Biotechnology 2ed (2015)

Presentation Transcript

slide 1:

Biotechnology
Second Edition

David P. Clark, Department of Microbiology, Southern Illinois University, Carbondale, Illinois, USA
Nanette J. Pazdernik, Washington University School of Medicine, St. Louis, Missouri, USA

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Academic Cell is an imprint of Elsevier

slide 2:

Academic Cell is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

Copyright © 2016 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-12-385015-7

British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data: A catalog record for this book is available from the Library of Congress.

For information on all Academic Cell publications visit our website at http://store.elsevier.com/
Typeset by TNQ Books and Journals, www.tnq.co.in
Printed and bound in the United States of America

slide 3:

ONLINE STUDY GUIDE

An Online Study Guide is now available with your textbook, containing a relevant journal article with a case study to focus understanding and discussion about each chapter.

1. To access the Online Study Guide, as well as other online resources for the book, please visit: http://booksite.elsevier.com/9780123850157
2. For instructor-only materials, please visit: http://textbooks.elsevier.com/web/Manuals.aspx?isbn=9780123850157

slide 4:

This book is dedicated to Donna. —DPC This book is dedicated to my children and husband. Their patience and understanding have given me the time and inspiration to research and write this text. —NJP

slide 5:

ACADEMIC CELL: HOW WE GOT HERE

In speaking with professors across the biological sciences and going to conferences, we, the editors at Academic Press and Cell Press, saw how often journal content was being incorporated in the classroom. We understood the benefits students were receiving by being exposed to journal articles early: to add perspective, improve analytical skills, and bring the most current content into the classroom. We also learned how much additional preparation time was required on the part of instructors: finding the articles, then obtaining the images for presentations and providing additional assessment.

So we collaborated to offer instructors and students a solution, and Academic Cell was born. We offer the benefits of a traditional textbook (to serve as a reference to students and a framework to instructors), but we also offer much more. With the purchase of every copy of an Academic Cell book, students can access an online study guide containing relevant, recent Cell Press articles and providing bridge material, in the form of a case study, to help ease them into the articles. In addition, the images from the articles are available as zipped .jpg files, and we have optional test bank questions.

We plan to expand this initiative, as future editions will be further integrated with unique pedagogical features incorporating current research from the pages of Cell Press journals into the textbook itself.

slide 6:

PREFACE

From the simple acts of brewing beer and baking bread has emerged a field now known as biotechnology. Over the ages, the meaning of the word biotechnology has evolved along with our growing technical knowledge. Biotechnology began by using cultured microorganisms to create a variety of food and drinks, despite its early practitioners not even knowing of the existence of the microbial world. Today, biotechnology is still defined as any application of living organisms or bioprocesses to create new products. Although the underlying idea is unchanged, the use of genetic engineering and other modern scientific techniques has revolutionized the area.

The fields of genetics, molecular biology, microbiology, and biochemistry are merging their respective discoveries into the expanding applied field of biotechnology, and advances are occurring at a record pace. Two or three years of research can dramatically alter the approaches that are of practical use. For example, the simple discovery that double-stranded RNA can block expression of any gene with a matching sequence has revolutionized how we study and apply genetic interactions in less than a ten-year period. This rapid increase in knowledge is very hard to incorporate into a textbook format, and often instructors who teach advanced molecular biology classes rely on the primary research to teach students novel concepts and applications. This type of teaching is difficult and requires many hours to plan and organize.

The new partnership between Academic Press and Cell Press has adopted a solution to teaching advanced molecular biology and biotechnology courses. The partnership combines years of textbook publishing experience with the most relevant and high-impact research. What has emerged is a new teaching paradigm. In Biotechnology, the basic ideas and methodologies are explained using very clear and concise language. These techniques are supplemented with a wide variety of diagrams and illustrations to simplify the complex biotechnology processes. These basics are then supported with a Biotechnology online study guide that not only tests the student’s knowledge of the textbook chapter but also contains primary research articles. The articles are chosen from the Cell Press family of journals, which includes such high-impact journals as Cell, Molecular Cell, and Current Biology. The articles expand upon a topic presented in each chapter or provide an exemplary research paper for that particular chapter. The entire full-color research article is included online.

In addition to the article itself, the Biotechnology study guide includes a synopsis of the research paper. The synopsis includes a thorough discussion of the relevant background information. This material is often absent from primary research articles because their authors assume that readers are also experts. Then each synopsis breaks the paper into sections, explaining each individual experiment separately. Each experiment is explained by defining the underlying hypothesis or question, the methods used to study the question, and the results. The final section of the synopsis provides the overall conclusions for the paper. This approach reinforces the basic scientific method. The instructor does not have to find an article, create a presentation on the background, and then work with the student to explain each of the methods and results. The study guide synopsis provides all of this information already. The online format ensures that only the most recent papers are associated with the chapter.

The combination of the online study guide with the newest relevant research and a solid basic textbook provides the instructor with the best of both worlds. You can teach students the basic concepts using the textbook and then use the relevant research paper to stretch the student’s knowledge of current research in the field of biotechnology.

slide 7:

ACKNOWLEDGMENTS

We would like to thank the following individuals for their help in providing information, suggestions for improvement, and encouragement: Laurie Achenbach, Rubina Ahsan, Phil Cunningham, Donna Mueller, Dan Nickrent, Holly Simmonds, and Dave Pazdernik. Special thanks go to Marshall Spector for helping us understand bioethics, to Michelle McGehee for writing the questions and online supplements, and to Karen Fiorino for creating most of the original artwork for the first edition. Alex Berezow was responsible for writing a major part of the following chapters: Chapter 16, Transgenic animals; Chapter 22, Biowarfare and bioterrorism; and Chapter 24, Bioethics in biotechnology.

slide 8:

INTRODUCTION

MODERN BIOTECHNOLOGY RELIES ON ADVANCES IN MOLECULAR BIOLOGY AND COMPUTER TECHNOLOGY

Traditional biotechnology goes back thousands of years. It includes the selective breeding of livestock and crop plants as well as the invention of alcoholic beverages, dairy products, paper, silk, and other natural products. Only in the past couple of centuries has genetics emerged as a field of scientific study. Recent rapid advances in this area have in turn allowed the breeding of crops and livestock by deliberate genetic manipulation rather than trial and error. The so-called green revolution of the period from 1960 to 1980 applied genetic knowledge to natural breeding and had a massive impact on crop productivity in particular. Today, plants and animals are being directly altered by genetic engineering.

New varieties of several plants and animals have already been made, and some are in agricultural use. Animals and plants used as human food sources are being engineered to adapt them to conditions that were previously unfavorable. Farm animals that are resistant to disease and crop plants that are resistant to pests are being developed in order to increase yields and reduce costs. The impact of these genetically modified organisms on other species and on the environment is presently a controversial issue.

Modern biotechnology applies not only modern genetics but also advances in other sciences. For example, dealing with vast amounts of genetic information depends on advances in computing power. Indeed, the sequencing of the human genome would have been impossible without the development of ever more sophisticated computers and software. It is sometimes claimed that we are in the middle of two scientific revolutions, one in information technology and the other in molecular biology. Both involve handling large amounts of encoded information. In one case, the information is human made, or at any rate man-encoded, and the mechanisms are artificial; the other case deals with the genetic information that underlies life.

However, there is a third revolution that is just emerging: nanotechnology. The development of techniques to visualize and manipulate atoms individually or in small clusters is opening the way to an ever-finer analysis of living systems. Nanoscale techniques are now beginning to play significant roles in many areas of biotechnology.

This raises the question of what exactly defines biotechnology. To this there is no real answer. A generation ago, brewing and baking would have been viewed as biotechnology. Today, the application of modern genetics or other equivalent modern technology is usually seen as necessary for a process to count as “biotechnology.” Thus, the definition of biotechnology has become partly a matter of fashion. In this book, we regard modern biotechnology as resulting, in a broad manner, from the merger of classical biotechnology with modern genetics, molecular biology, computer technology, and nanotechnology.

The resulting field is of necessity large and poorly defined. It includes more than just agriculture: it also affects many aspects of human health and medicine, such as vaccine development and gene therapy. We have attempted to provide a unified approach that is based on genetic information while at the same time indicating how biotechnology has begun to sprawl, often rather erratically, into many related fields of human endeavor.

slide 9:

CHAPTER 1
Basics of Biotechnology

Advent of the Biotechnology Revolution
Chemical Structure of Nucleic Acids
Packaging of Nucleic Acids
Bacteria as the Workhorses of Biotechnology
Escherichia coli Is the Model Bacterium
Many Bacteria Contain Plasmids
Other Bacteria in Biotechnology
Basic Genetics of Eukaryotic Cells
Yeast and Filamentous Fungi in Biotechnology
Yeast Mating Types and Cell Cycle
Multicellular Organisms as Research Models
Caenorhabditis elegans, a Small Roundworm
Drosophila melanogaster, the Common Fruit Fly
Zebrafish Are Models for Developmental Genetics
Mus musculus, the Mouse, Is Genetically Similar to Humans
Animal Cell Culture In Vitro
Arabidopsis thaliana, a Model Flowering Plant
Viruses Used in Genetics Research
Subviral Infectious Agents and Other Gene Creatures

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00001-6

slide 10:

ADVENT OF THE BIOTECHNOLOGY REVOLUTION

Biotechnology involves the use of living organisms in industrial processes, particularly in agriculture, food processing, and medicine. Biotechnology has been around ever since humans began manipulating the natural environment to improve their food supply, housing, and health. Biotechnology is not limited to humankind. Beavers cut up trees to build homes. Elephants deliberately eat fermented fruit to get an alcohol buzz. People have been making wine, beer, cheese, and bread for centuries (Fig. 1.1). The earliest evidence of wine production has been dated to c. 6000 BC. All these processes rely on microorganisms to modify the original ingredients. Ever since the beginning of human civilization, farmers have chosen higher yielding crops by trial and error, so that many modern crop plants have much larger fruit or seeds than their ancestors (Fig. 1.2).

FIGURE 1.1 Traditional Biotechnology Products. Bread, cheese, wine, and beer have been made worldwide using microorganisms such as yeast. Photo taken by Karen Fiorino, Clay Lick Creek Pottery, IL, USA.

FIGURE 1.2 Teosinte versus Modern Corn. Since early civilization, people have improved many plants for higher yields. Teosinte (smaller cob and green seeds) is considered the ancestor of commercial corn (larger cob; a blue-seeded variety is shown). Courtesy of Wayne Campbell, Hila Science Camp.

We think of biotechnology as modern because of recent advances in molecular biology and genetic engineering. Huge strides have been made in our understanding of microorganisms, plants, and livestock, as well as the human body and the natural environment. This has caused an explosion in the number and variety of biotechnology products. Face creams contain antioxidants, supposedly to fight the aging process. Genetically modified plants have genes inserted to protect them from insects, thus increasing the crop yield while decreasing the amount of insecticides used. Medicines are becoming more specific and compatible with our physiology. For example, insulin for diabetics is now genuine human insulin, although produced by genetically modified bacteria. Almost everyone has been affected by the recent advances in genetics and biochemistry.

Mendel’s early work that described how genetic characteristics are inherited from one generation to the next was the beginning of modern genetics (see Box 1.1). Next came the discovery of the chemical material of which genes are made: DNA (deoxyribonucleic acid). This in turn led to the central dogma of genetics: the concept that genes made of DNA are expressed as an RNA (ribonucleic acid) intermediary that is then decoded to make proteins. These three steps are universal, applying to every type of living organism on earth. Yet these three steps are so malleable that life is found in almost every available niche on our planet.

Biotechnology affects all of our lives and has altered everything we encounter in life.
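The flow of information in the central dogma (DNA to RNA to protein) can be sketched in a few lines of Python. This is only a toy illustration: the 12-base "gene" is hypothetical, the codon table lists just the four codons the example needs (the standard genetic code has 64), and the function names are made up for this sketch. It relies on the fact that RNA carries uracil (U) where DNA carries thymine (T).

```python
# Toy illustration of the central dogma: DNA -> RNA -> protein.
# Only the codons needed for this example are listed (assumption:
# standard genetic code, which has 64 codons in total).
CODON_TABLE = {
    "AUG": "Met",   # methionine, also the usual start codon
    "GCU": "Ala",   # alanine
    "UGG": "Trp",   # tryptophan
    "UAA": "Stop",  # stop codon
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


gene = "ATGGCTTGGTAA"      # hypothetical 12-base "gene"
mrna = transcribe(gene)
print(mrna)                # AUGGCUUGGUAA
print(translate(mrna))     # ['Met', 'Ala', 'Trp']
```

A real translation routine would need the full codon table and would have to locate the start codon; both are left out here to keep the three steps visible.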

slide 11:

Box 1.1 Gregor Johann Mendel (1822–1884): Founder of Modern Genetics

As a young man, Mendel spent his time doing genetics research and teaching math, physics, and Greek to high school children in Brno (now in the Czech Republic). Mendel studied the inheritance of various traits of the common garden pea, Pisum sativum, because he was able to raise two generations a year. He studied many different physical traits of the pea, such as flower color, flower position, seed color and shape, and pod color and shape.

Mendel grew different plants next to each other, looking for traits that mixed together. Luckily, the traits he studied were each due to a single gene that was either dominant or recessive, although he did not know this at the time. Consequently, he never saw them “mix.” For example, when he grew yellow peas next to green peas, the offspring looked exactly like their parents. This showed that traits do not blend in the offspring, contrary to a common theory at the time.

Next, Mendel moved pollen from one plant to another with different traits. He counted the number of offspring that inherited each trait and found that they were inherited in specific ratios. For example, when he cross-pollinated the yellow and green pea plants, their offspring, the F1 generation, was all yellow. Thus, the yellow trait must dominate or mask the green trait. He then let the F1 plants produce offspring and grew all of the seeds. These, the F2 generation, segregated into 3/4 yellow and 1/4 green. When green seeds reappeared after skipping a generation, Mendel concluded that a “factor” for the trait (what we call a gene today) must have been present in the parent, even though the trait was not actually displayed.

Mendel demonstrated many principles that form the basis of modern genetics. First, units or factors (now called genes) for each trait are passed on to successive generations. Each parent has two copies of each gene but contributes only one copy of the gene to each offspring. This is called the principle of segregation. Second, the principle of independent assortment states that different offspring from the same parents can get separate sets of genes. The same phenotype (the observable physical traits) can be represented by different genotypes (combinations of genes). In other words, although a gene is present, the corresponding trait may not be seen in each generation.

When Mendel began these experiments, he used purebred pea plants; that is, each trait always appeared the same in each generation. So when he first crossed a yellow pea with a green pea, each parent had two identical copies, or alleles, of each gene. The green pea had two green alleles, and the yellow pea had two yellow alleles. Consequently, each F1 offspring received one yellow allele and one green allele. Despite this, the F1 plants all had yellow peas. Thus, yellow is dominant to green. Finally, when the F1 generation was self-pollinated, the F2 plants included some that inherited two recessive green alleles and had a green phenotype (Fig. A).

Mendel published these results, but no one recognized the significance of his research until after his death. Later in life he became the abbot of a monastery and did not pursue his genetics research.

FIGURE A Relationship of Genotype and Phenotype. (A) F1: yellow peas × green peas. Each parent has two alleles, either two yellow (YY) or two green (GG). Any offspring will be heterozygous (YG), each having a yellow and a green allele. Since the yellow allele is dominant, the peas look yellow. (B) When the heterozygous F1 offspring self-fertilize, the green phenotype re-emerges in one-fourth of the F2 generation.
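The ratios in Box 1.1 follow directly from the principle of segregation and can be checked by enumerating every allele combination, as in a Punnett square. The sketch below is illustrative only: the function names are invented for this example, and the genotype letters Y (yellow, dominant) and G (green, recessive) follow the labels used in Figure A.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes from a one-gene cross.

    Each parent is a two-letter genotype such as "YG". Every offspring
    receives one allele from each parent (principle of segregation), so
    all four allele pairings are enumerated, as in a Punnett square.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "GY" and "YG" are counted as the same genotype.
        genotype = "".join(sorted(allele1 + allele2, reverse=True))
        offspring[genotype] += 1
    return offspring


def phenotype(genotype: str, dominant: str = "Y") -> str:
    """Yellow shows whenever at least one dominant allele is present."""
    return "yellow" if dominant in genotype else "green"


f1 = cross("YY", "GG")   # purebred yellow x purebred green
f2 = cross("YG", "YG")   # F1 self-fertilization

print(dict(f1))          # {'YG': 4}  -> all F1 peas are yellow
print(dict(f2))          # {'YY': 1, 'YG': 2, 'GG': 1}

f2_phenotypes = Counter()
for genotype, count in f2.items():
    f2_phenotypes[phenotype(genotype)] += count
print(dict(f2_phenotypes))   # {'yellow': 3, 'green': 1}
```

The 3:1 phenotype ratio hides a 1:2:1 genotype ratio, which is why green peas can reappear in the F2 generation after being absent from F1.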

slide 12:

CHEMICAL STRUCTURE OF NUCLEIC ACIDS

The upcoming discussions introduce the organisms used extensively in molecular biology and genetics research. Each of these has genes made of DNA that can be manipulated and studied. Thus, a discussion of the basic structure of DNA is essential. The genetic information carried by DNA, together with the mechanisms by which it is expressed, unifies every creature on earth and is what determines our identity.

Nucleic acids include two related molecules: deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA and RNA are polymers of subunits called nucleotides, and the order of these nucleotides determines the information content. Nucleotides have three components: a phosphate group, a five-carbon sugar, and a nitrogen-containing base (Fig. 1.3). The five-carbon sugar, or pentose, is different for DNA and RNA. DNA has deoxyribose, whereas RNA uses ribose. These two sugars differ by one hydroxyl group. Ribose has a hydroxyl at the 2′ position that is missing in deoxyribose. There are five potential bases that can be attached to the sugar. In DNA, guanine, cytosine, adenine, or thymine is attached to the sugar. In RNA, thymine is replaced with uracil (see Fig. 1.3).

Each phosphate connects two sugars via a phosphodiester bond. This connects the nucleotides into a chain that runs in a 5′ to 3′ direction. The 5′-OH of the sugar of one nucleotide is linked via oxygen to the phosphate group. The 3′-OH of the sugar of the following nucleotide is linked to the other side of the phosphate.

The nucleic acid bases jut out from the sugar phosphate backbone and are free to form connections with other molecules. The most stable structure occurs when another single strand of nucleotides aligns with the first to form a double-stranded molecule, as seen in the DNA double helix. Each base forms hydrogen bonds to a base in the other strand. The two strands are antiparallel; that is, they run in opposite directions, with the 5′ end of the first strand opposite the 3′ end of its partner and vice versa.

Box 1.1 Gregor Johann Mendel (1822–1884): Founder of Modern Genetics (cont’d)

FIGURE A Relationship of Genotype and Phenotype (cont’d). (B) F2: heterozygous self-fertilization. When the heterozygous F1 offspring self-fertilize, the green phenotype re-emerges in one-fourth of the F2 generation.
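Because the two strands of the double helix are antiparallel, the partner of a strand written 5′ to 3′ has to be read backwards to be written 5′ to 3′ as well. A minimal sketch of that bookkeeping follows; it assumes the standard Watson-Crick pairs (A with T, G with C), and the function name and example sequence are invented for illustration.

```python
# Standard Watson-Crick base pairs in DNA (assumption for this sketch).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    Pairing each base gives the partner read 3' to 5'; reversing it
    accounts for the antiparallel orientation of the two strands.
    """
    complement = [DNA_PAIRS[base] for base in strand_5_to_3.upper()]
    return "".join(reversed(complement))


top = "ATGCCG"                  # hypothetical sequence, 5' -> 3'
bottom = partner_strand(top)
print(bottom)                   # CGGCAT (5' -> 3')

# Applying the operation twice recovers the original strand.
print(partner_strand(bottom))   # ATGCCG
```

This operation is usually called taking the reverse complement, and it is the first step in many sequence-analysis tasks.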

slide 13:

FIGURE 1.3 Nucleic Acid Structure. (A) Structure of DNA. DNA has two strands antiparallel to each other. The structure of the subcomponents (thymine, cytosine, guanine, adenine, deoxyribose, phosphate, and the phosphodiester linkage) is shown to the sides. (B) Structure of RNA. RNA is usually single-stranded and has two chemical differences from DNA. First, an extra hydroxyl group (-OH) is found at the 2′ position of ribose, and second, thymine is replaced by uracil.

slide 14:

The bases are of two types: purines (guanine and adenine) and pyrimidines (cytosine and thymine). Each base pair consists of one purine connected to a pyrimidine via hydrogen bonds. Guanine pairs only with cytosine (G-C) via three hydrogen bonds. Adenine pairs only with thymine (A-T) in DNA or uracil (A-U) in RNA. Because an adenine–thymine (A-T) or adenine–uracil (A-U) base pair is held together with only two hydrogen bonds, it requires less energy to break the connection between the bases than in a G-C pair.

Double-stranded DNA takes the three-dimensional shape that has the lowest energy constraints. The most stable shape is a double-stranded helix. The helix turns around a central axis in a clockwise manner and is considered a right-handed helix. One complete turn is 34 Å in length and has about 10 base pairs. DNA is not static but can alter its conformation in response to various environmental changes. The typical conformation just described is the B-form of DNA and is most prevalent in aqueous environments with low salt concentrations. When DNA is in a high-salt environment, the helix alters, making an A-form that has closer to 11 base pairs per turn. Another conformation of DNA is the Z-form, which has a left-handed helix with 12 base pairs per turn. This form occurs when certain proteins bind to the DNA in regions around genes and induce the change in shape. In this form, the phosphate backbone has a zigzag conformation. These forms are biologically relevant under certain conditions, but the exact role the shape of DNA plays in cellular function is still under investigation.

DNA and RNA are both structures with alternating phosphate and sugar residues linked to form a backbone. Base residues attach to the sugar and stick out from the backbone. These bases can base-pair with another strand to form double-stranded helices.

PACKAGING OF NUCLEIC ACIDS

Most bacteria have just a few thousand genes, each approximately 1000 nucleotides long. These are carried on a chromosome that is a single giant circular molecule of DNA, although there are exceptions. A single DNA double helix with this many genes is about 1000 times too long to fit inside a bacterial cell without being condensed somehow in order to take up less space. In bacteria, the double helix undergoes supercoiling to condense it. Supercoiling is induced by the enzyme DNA gyrase, which twists the DNA in a left-handed direction so that about 200 nucleotides are found in one supercoil. The twisting causes the DNA to condense. Extra supercoils are removed by topoisomerase I. The supercoiled DNA forms loops that connect to a protein scaffold (see Fig. 1.4).

In humans and plants, much more DNA must be packaged, so just adding supercoils is not sufficient. Eukaryotic DNA is wound around proteins called histones first. Histones have a positive charge to them, and this neutralizes the negatively charged phosphate backbone. DNA plus histones looks like beads on a string and is called chromatin. Each bead, or nucleosome, has about 200 base pairs of DNA and nine histones: two H2A, two H2B, two H3, two H4, and one H1. All the histones form the “bead” except for H1, which connects the beads by holding the DNA in the linker region. The histones are highly conserved proteins that are found in all eukaryotes and, in simplified form, in archaebacteria. Histone tails stick out from the nucleosome and are important in regulation. In regions of DNA that are expressed, the histones are loose, allowing regulatory proteins and enzymes access to the DNA. In regions that are not expressed, the histones

slide 15:

FIGURE 1.4 Packaging of DNA in Bacteria and Eukaryotes. (A) Prokaryote: bacterial DNA is supercoiled and attached to a scaffold to condense its size to fit inside the cell. (B) Eukaryote: eukaryotic DNA is wrapped around histones (H1, H2A, H2B, H3, and H4) to form a nucleosome. Nucleosomes are further condensed into a 30-nm fiber (six nucleosomes per helical twist) attached to proteins at MAR sites.

are condensed, preventing other proteins from accessing the DNA; this structure is called heterochromatin. Chromatin is not condensed enough to fit the entire eukaryotic DNA genome into the nucleus. It is coiled into a helical structure, the 30-nanometer fiber, which has about six nucleosomes per turn. These fibers loop back and forth, and the ends of the loops are attached to a protein scaffold or chromosome axis. These attachments occur at matrix attachment regions (MAR) and are mediated by MAR proteins. These sites are 200–1000

slide 16:

base pairs in length and have 70% A/T. The structure of A/T-rich DNA is slightly bent, and these bends promote the connection between proteins in the matrix and the DNA. Often, enhancer and regulatory elements are also found at these regions, suggesting that the structure here may favor the binding of protein activators or repressors. This structure refers to chromosomes during normal cellular growth. When a eukaryotic chromosome readies for mitosis and cell division, it condenses even more. The nature of this condensation is still uncertain.

BACTERIA AS THE WORKHORSES OF BIOTECHNOLOGY

DNA is the common thread of life. DNA is found in every living organism on Earth and even in some entities that are not considered living, such as viruses (see later discussion). Only a tiny selection of living organisms has been studied in the molecular biology laboratory. These few chosen species have special traits or features that make them easy to grow, study, and manipulate genetically. Each of the model organisms has had its entire genome sequenced. The model organisms are used both as a guide to understand other related organisms not investigated in detail and for various more practical biotechnological purposes.

Bacteria are the workhorse of model organisms. Bacteria live everywhere on the planet and are an amazing part of the ecosystem. There are an estimated 5 × 10³⁰ bacteria on the Earth, with about 90% of these living in the soil and the ocean subsurface. If this estimate is accurate, then about 50% of all living matter is microbial. Bacteria have been found in every environmental niche. Some bacteria live in icy lakes of Antarctica that only thaw a few months each year. Others live in extremely hot environments such as hot sulfur springs or the thermal vents at the bottom of the ocean (Fig. 1.5). There has been great interest in these extreme microbes because of their physiological differences. For example, Thermus aquaticus, a bacterium from hot springs, can survive at temperatures near boiling point and at a pH near 1. Like others, this bacterium replicates its DNA using the enzyme DNA polymerase. The difference is that T. aquaticus DNA polymerase has to function at high temperatures and is therefore considered thermostable. Molecular biologists have exploited this enzyme for procedures like polymerase chain reaction, or PCR (see Chapter 4), which is carried out at high temperatures. Other bacteria from extreme environments provide interesting proteins and enzymes that may be used for new biotechnological applications. Hydrothermal vents found on the ocean floor have revealed a fascinating array of novel organisms (see Fig. 1.5). Water temperatures in different vents range from 25°C to 450°C.

Bacteria are highly evolved into every niche of the planet and provide researchers with many unique properties.

FIGURE 1.5 Hydrothermal Vent. Mineral-rich fluid is escaping from an opening in the bottom of the ocean along the East Pacific Rise, which has temperatures as high as 403°C. Surprisingly, bacteria are able to survive in this high-heat environment. The vent base is covered with a bed of tube worms, and a probe surrounds the vent. Photo courtesy of NOAA PMEL EOI program and obtained from http://www.pmel.noaa.gov/eoi/gallery/.

DNA must be condensed by supercoiling and wrapping around nucleosomes to form chromatin, and finally attached to protein scaffolds, in order to fit into the nucleus.
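The packaging problem described in the section on nucleic acid packaging can be put into rough numbers. The back-of-the-envelope sketch below uses the chapter's figures for B-form DNA (one helical turn of 34 Å per 10 base pairs) and genes of about 1000 nucleotides. The gene count of 4000 and the cell length of 2 microns are assumptions chosen for illustration (the chapter says only "a few thousand genes"), so the result is an order-of-magnitude estimate, not a measurement.

```python
# Rough estimate of how long a bacterial chromosome is when stretched out.
GENES = 4000                 # assumption: "a few thousand genes"
BP_PER_GENE = 1000           # from the text: ~1000 nucleotides per gene
RISE_PER_BP_NM = 3.4 / 10    # B-form DNA: 34 angstroms (3.4 nm) per 10 bp
CELL_LENGTH_NM = 2000        # assumption: a cell about 2 microns long

genome_bp = GENES * BP_PER_GENE
dna_length_nm = genome_bp * RISE_PER_BP_NM
fold_too_long = dna_length_nm / CELL_LENGTH_NM

print(f"genome size: {genome_bp:,} bp")                 # genome size: 4,000,000 bp
print(f"stretched length: {dna_length_nm / 1e6:.2f} mm")  # stretched length: 1.36 mm
print(f"about {round(fold_too_long)} times the cell length")
```

With these inputs the DNA comes out several hundred times longer than the cell, the same order of magnitude as the chapter's "about 1000 times too long," which is why supercoiling and attachment to a scaffold are needed.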

slide 17:

ESCHERICHIA COLI IS THE MODEL BACTERIUM

Although extreme bacteria are interesting and useful, more typical bacteria are the routine workhorses for research in molecular biology and biotechnology. The most widely used is Escherichia coli, a rod-shaped bacterium about 1 by 2.5 microns in size. E. coli normally inhabits the colon of mammals, including humans (Fig. 1.6). E. coli is a Gram-negative bacterium that has an outer membrane, a thin cell wall, and a cytoplasmic membrane surrounding the cellular components. Like all prokaryotes, E. coli does not have a nucleus or nuclear membrane, and its chromosome is free in the cytoplasm. The outer surface of E. coli carries about 10 flagella that propel the bacteria to different locations and thousands of pili that allow the cells to attach to surfaces.

FIGURE 1.6 Subcellular Structure of Escherichia coli. (A) Scanning electron micrograph of E. coli. The rod-shaped bacteria are approximately 0.6 microns by 1–2 microns. Courtesy of Rocky Mountain Laboratories, NIAID, NIH. (B) Gram-negative bacteria (e.g., E. coli) have three structural layers surrounding the cytoplasm. The outer membrane and cytoplasmic membrane are lipid bilayers, and the cell wall is made of peptidoglycan. Unlike eukaryotes, no membrane surrounds the chromosome, leaving the DNA readily accessible to the cytoplasm. (Panel B labels: outer membrane, periplasmic space, cytoplasmic membrane, ribosomes, polysome, pili, flagella, single condensed chromosome, storage granule or inclusion body.)

slide 18:

Bacteria provide many advantages for research. Their growth characteristics are very useful when large numbers of identical cells are needed. A culture of bacteria can be grown in a few hours and can contain up to 10^9 bacterial cells per milliliter. Growth can be strictly controlled; that is, the amount and types of nutrients, the temperature, and the time may all be adjusted based on the desired result. E. coli is so easy to grow that it needs only mineral salts, water, and a sugar source. The cells can be grown in liquid cultures or as solid cultures on agar plates (Fig. 1.7). Liquid cultures can be stored in a refrigerator for weeks, and the bacteria will not be harmed. Additionally, bacteria can be frozen at −70°C for 20 years or more, so different strains can be maintained without having to constantly culture them. E. coli is normally grown in air but can grow anaerobically if an experiment requires that oxygen be eliminated.

Bacteria are single-celled organisms. The cells in a bacterial culture are identical, in contrast to mammalian cells, where even a single tissue contains many different types of cells. Each E. coli has one circular chromosome with one copy each of about 4000 genes. This is significantly fewer than in humans, who have two copies each of about 25000 genes on 46 chromosomes. This makes genetic analysis much easier in bacteria (Fig. 1.8).

MANY BACTERIA CONTAIN PLASMIDS

Because many different types of bacteria are found in every environment, competition for nutrients and habitat occurs regularly. Many bacteria compete using a form of

Escherichia coli is the model bacterial organism used in basic molecular biology and biotechnology research. The organism is simple in structure, grows easily in the laboratory, and contains very few genes.

Although the media often report on E. coli-contaminated food, E. coli is usually harmless. However, occasional strains of E. coli are pathogenic and secrete toxins that cause diarrhea by damaging the intestinal wall. This results in fluid being released into the colon rather than being extracted. E. coli O157:H7 is a particularly potent pathogenic strain with two toxin genes that can cause bloody diarrhea. It is especially dangerous to young children, the elderly, and those with compromised immune systems.

FIGURE 1.7 Bacteria Are Easy to Grow
(A) Bacteria growing in liquid culture. (B) Bacteria growing on agar. This photo shows a mixture of bacterial colonies from the blue/white method for screening plasmid insertions. (C) Fast-growing bacteria can double in number in short periods. Here the number of bacteria doubles after approximately 45 minutes and reaches a density of 5 × 10^9 cells/mL in about 5 hours.
Panel C plot: cells per mL (× 10^8) against time (hours); doubling time approx. 45 min; population may reach 5 × 10^9 cells/mL.
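The growth curve in Figure 1.7C follows simple doubling arithmetic, which can be sketched in a few lines of Python. This is an illustration only: the function names are invented here, and the starting density of 1 × 10^8 cells/mL is an assumption read off the plot axis rather than a value stated in the text.

```python
import math

DOUBLING_TIME_H = 0.75   # 45 minutes, from the Figure 1.7 caption
START_DENSITY = 1e8      # cells/mL; ASSUMED starting point read off the plot axis


def cells_per_ml(n0, hours, doubling_time_hours):
    """Density after `hours` of unrestricted exponential growth."""
    return n0 * 2 ** (hours / doubling_time_hours)


def hours_to_reach(n0, target, doubling_time_hours):
    """Time needed to grow from n0 to target at a fixed doubling time."""
    return doubling_time_hours * math.log2(target / n0)


if __name__ == "__main__":
    # Print the density after 0, 1, 2, 3 and 4 doublings.
    for doublings in range(5):
        t = doublings * DOUBLING_TIME_H
        density = cells_per_ml(START_DENSITY, t, DOUBLING_TIME_H)
        print(f"{t:4.2f} h: {density:.1e} cells/mL")
    t_final = hours_to_reach(START_DENSITY, 5e9, DOUBLING_TIME_H)
    print(f"{t_final:.1f} h to reach 5e9 cells/mL")
```

Under these assumptions the culture needs a little over four hours of uninterrupted doubling to reach 5 × 10^9 cells/mL, which is in line with the caption's "about 5 hours".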

slide 19:

biological warfare and secrete toxins called bacteriocins, which kill neighboring bacteria. For example, nisin, a bacteriocin from Lactococcus lactis, kills food-borne pathogens such as Listeria monocytogenes and Staphylococcus aureus. E. coli also produces bacteriocins, called colicins. Bacteriocin is a general term, whereas colicin specifically refers to toxins produced by E. coli. Sometimes colicin is used as a general term, but this is not strictly correct. E. coli makes different types of colicins, such as colicin E1 or colicin M, to kill neighboring cells. Colicins act by two main mechanisms. Some puncture the cell membrane, allowing vital cellular ions to leak out and destroying the proton motive force that drives ATP production. Others are nucleases that degrade DNA and RNA. These toxins do not affect their producer cells because the cell that makes the toxin also makes an immunity protein that recognizes the toxin and neutralizes it.

The ability to make colicin is due to the presence of an extrachromosomal genetic element called a plasmid. Plasmids are small rings of DNA that exist within the cytoplasm of bacteria and some eukaryotes, such as yeast. A colicin-producing plasmid has several genes: the gene for the colicin, the gene for the immunity protein, and genes that control plasmid replication and copy number. In addition, all plasmids contain an origin for DNA replication. When the host cell divides, the plasmid divides in step (Fig. 1.9).

These colicin plasmids are used extensively in molecular biology. The colicin genes have been removed and the remaining segments have been greatly modified so that other genes can be expressed efficiently in bacteria. The resulting recombinant plasmids are the crux of all molecular biology. All the modern advances in biotechnology started with the ability to express heterologous proteins in bacteria (see Chapter 3 for cloning vectors).

FIGURE 1.8 The E. coli Chromosome
The E. coli chromosome is divided into 100 map units, arbitrarily starting at the thrABC operon. Various genes and their locations are shown. The replication origin (oriC) and termination zone (terB and terC) are indicated.
Gene labels, in map order: thrABC, lac, gal, trpABCDE, terC, terB, hisABCDEFGHI, argA, argG, malQPT, oriC, argBCEH, malEFGKM. Positions marked: 0/100, 7.8, 17.0, 25, 28.3, 34.6, 36.2, 45.1, 50, 63.5, 71.4, 75, 76.5, 84.5, 89.5, 91.3.

Another useful trait of E. coli is the presence of extrachromosomal elements called plasmids. These small rings of DNA are easily removed from the bacteria, modified by adding or modifying genes, and reinserted into a new bacterial cell where the new genes can be evaluated.

OTHER BACTERIA IN BIOTECHNOLOGY

Other bacteria besides E. coli are used to produce biotechnology products. Bacillus subtilis is a Gram-positive bacterium that is used as a research organism to study the biology and genetics of Gram-positive organisms. Bacillus can form hard spores that can survive almost indefinitely. It is also used in biotechnology. For industrial production, secreting proteins through the single membrane of Gram-positive bacteria is much easier than secreting them through the double membrane of Gram-negative bacteria; therefore, Bacillus strains are used to make extracellular enzymes such as proteases and amylases on a large scale. Pseudomonas putida is a bacterium that normally lives in water. It is a Gram-negative bacterium like E. coli but is commonly used in environmental studies because it is able to degrade many aromatic compounds. Streptomyces coelicolor is a Gram-positive soil bacterium. This organism degrades cellulose and chitin and also produces a large number of different antibiotics. Another example of a common industrial microorganism is Corynebacterium glutamicum, which is used to produce L-glutamic acid and L-lysine for the biotechnology industry.
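Because the E. coli chromosome in Figure 1.8 is circular, the distance between two loci is the shorter of the two arcs around the 100-unit map. The sketch below shows that calculation. The pairing of gene names with map positions is an assumption made here by reading the figure labels in order, and the names `LOCI` and `map_distance` are illustrative.

```python
MAP_LENGTH = 100.0  # the E. coli map is divided into 100 units

# ASSUMED pairing of gene labels and positions, read in order from Figure 1.8.
LOCI = {
    "thrABC": 0.0,
    "lac": 7.8,
    "gal": 17.0,
    "trpABCDE": 28.3,
    "terC": 34.6,
    "terB": 36.2,
    "hisABCDEFGHI": 45.1,
    "oriC": 84.5,
}


def map_distance(a, b, loci=LOCI):
    """Shortest distance in map units between two loci on a circular map."""
    direct = abs(loci[a] - loci[b]) % MAP_LENGTH
    # Going the other way round the circle may be shorter.
    return min(direct, MAP_LENGTH - direct)


if __name__ == "__main__":
    for pair in [("lac", "gal"), ("thrABC", "oriC"), ("oriC", "terC")]:
        print(pair, round(map_distance(*pair), 1))
```

With these assumed positions, oriC and terC come out almost half a map apart, which fits the caption's description of a replication origin and a separate termination zone.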

slide 20:

Many different bacteria are used for biotechnology research because of their unique qualities.

FIGURE 1.9 Plasmids Encode the Genes for Colicin
ColE1 plasmids are extrachromosomal DNA elements that are maintained by bacteria for producing a toxin (cea gene). They also carry genes for toxin release and immunity. These plasmids have been modified to carry genes useful in genetic engineering.
Labels: condensed chromosome; plasmid ColE1 (6466 bp); mobilization genes (required for mobilization by F-plasmid); colicin E1 gene (cea); colicin immunity gene (imm); kil (required for cell lysis and colicin release); origin region; gene for Rom protein.

BASIC GENETICS OF EUKARYOTIC CELLS

Most eukaryotes are diploid; that is, they have two homologous copies of each chromosome. This is the case for humans, mice, zebrafish, Drosophila, Arabidopsis, Caenorhabditis elegans, and most other eukaryotes. Having more than two copies of the genome is extremely rare in animals, and only one rat from Argentina has been discovered with four copies of its genome. On the other hand, many plants, especially crop plants, are polyploid and contain multiple copies of their genomes. For example, ancestral wheat has seven pairs of chromosomes (i.e., its diploid state is 2n = 14), whereas the wheat grown for food today has 42 chromosomes. Thus modern wheat is hexaploid. Domestic oats, peanuts, sugar cane, white potato, tobacco, and cotton also have four to six copies of their genome. This makes genetic analysis very difficult.

In animals there is a division between germline and somatic cells. Germline cells are the only ones that divide to give haploid descendants. Diploid germline cells give rise to haploid gametes (the eggs and sperm that propagate the species) by undergoing meiosis. After mating, the two haploid cells fuse to become diploid, forming the zygote. Somatic cells, on the other hand, are normally diploid and make up the individual. Any mutations in a somatic cell disappear when the organism dies, whereas a mutation in a germline cell is passed on to the next generation (Fig. 1.10).

If a somatic cell is mutated early in development, all the somatic cells derived from this ancestral cell will receive the defect. Suppose this ancestral cell is the precursor of the left eye and that this defect prevents the manufacture of the brown pigment responsible for brown eyes. The right eye will be brown, but the mutant left eye will be blue (Fig. 1.11). Blue eyes are not due to blue pigment; they simply lack the brown pigment. People or animals with eyes that don't match are unusual but not incredibly rare. Such events are known as somatic mutations. They are not passed on to the offspring. Nonetheless, mutations in somatic cells can cause severe problems, as they are the cause of most cancers (see Chapter 19).

In plants, the division between germline and somatic cells is less distinct because many plant cells are totipotent. A single plant cell has the ability to form any part of the plant, reproductive or not. This is not true for the majority of animal cells. Nevertheless, many animal cells do have the potential to form several different types of cells. A cell able to differentiate into multiple cell types is called a stem cell. Research on embryonic stem cells has become a hot political topic because of the potential ability to form an embryo. However, research on adult stem cells holds much promise (see Chapter 18). For example, researchers are hoping to identify stem cells that can form new neurons so that patients with spinal cord injuries can be cured.
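The ploidy figures quoted for wheat earlier in this section follow from dividing the total chromosome count by the number of chromosomes in one set. A minimal sketch of that arithmetic is given below; the function name and the error handling are illustrative choices, not part of the text.

```python
def ploidy(total_chromosomes, base_number):
    """Number of chromosome sets, given the chromosomes in a single set."""
    sets, remainder = divmod(total_chromosomes, base_number)
    if remainder:
        raise ValueError("chromosome count is not a whole multiple of the base number")
    return sets


if __name__ == "__main__":
    # Wheat has seven chromosomes per set (seven pairs in the diploid ancestor).
    print("ancestral wheat:", ploidy(14, 7))   # diploid
    print("modern wheat:", ploidy(42, 7))      # hexaploid
    # Humans: 46 chromosomes in two sets of 23.
    print("human:", ploidy(46, 23))
```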

slide 21:

YEAST AND FILAMENTOUS FUNGI IN BIOTECHNOLOGY

Fungi are incredibly useful microorganisms in the world of biotechnology. Anyone who has grown mold on a loaf of bread understands the ease with which they are cultured. Fungi are traditionally used in food applications. Yeasts are used in baking and brewing, and other fungi in cheese making, mushroom cultivation, and making foods such as soy sauce. Cheese production uses a variety of fungi. For example, a mold called Penicillium roqueforti makes the blue veins in cheeses such as Roquefort, and Penicillium candidum, Penicillium caseicolum, and Penicillium camemberti make the hard surfaces of Camembert and Brie cheeses. Soy sauce is made from soybeans that are fermented with Aspergillus oryzae.

Fungi are responsible for the production of many industrial chemicals and pharmaceuticals. The most famous is penicillin, which is manufactured by Penicillium notatum in large tanks called bioreactors. Citric acid is a chemical additive to food that occurs naturally in lemons. It gives the fruit their sour taste. Rather than being extracted from lemons, citric acid has been manufactured since about 1923 by culturing Aspergillus niger.

Much like bacteria, yeast has a two-fold purpose in biotechnology. It offers many of the same advantages as bacteria, with the additional advantage of being a eukaryote. Yeasts are also important for production of biotechnological products. The most common research strain of yeast is brewer's or baker's yeast, Saccharomyces cerevisiae. This is the same little creature that makes the alcohol in beer and makes bread soft and fluffy by releasing carbon dioxide bubbles that get trapped in the dough.

Yeast is a single-celled eukaryote that has its cellular components compartmentalized (Fig. 1.12). Like all eukaryotes, yeasts have their genomes encased in a nuclear envelope. The nucleus and cytoplasm are separated, but they communicate with each other through gated channels called nuclear pores. Saccharomyces cerevisiae has 16 linear chromosomes that have telomeres and centromeres, two features not found in bacteria. The yeast genome was the first eukaryotic genome sequenced in its entirety. It has 12 Mb of DNA with about 6000 different genes. Unlike higher eukaryotes, yeast genes have very few intervening sequences, or introns (see Chapter 2). Outside the nucleus, yeast has organelles including the endoplasmic reticulum, Golgi apparatus, and mitochondria to carry out vital cellular functions.

Like bacteria, yeast grow as single cells. A culture of yeast has identical cells, making genetic and biochemical analysis easier. The culture medium can be either liquid or solid, and the amount and composition of nutrients can be controlled. The temperature and time of growth may also be controlled. Under ideal circumstances, yeast doubles in number in about 90 minutes, as opposed to E. coli, which doubles in 20 minutes. Although slower than bacteria, the growth of yeast is fast in comparison to other eukaryotes. Like bacteria, yeast cells can be stored for weeks in the refrigerator and may be frozen for years at −70°C.

Much like bacteria, some yeast cells also have extrachromosomal elements within their nuclei. The most common element is a plasmid called the 2-micron circle. Like the

Eukaryotic cells are more complex than bacteria. Eukaryotic cells are also specialized; that is, some cells are for reproduction, some cells are stem cells that can differentiate into somatic cells, and some cells are specialized in function and shape.

FIGURE 1.10 Somatic versus Germline Cells
During development, cells become either somatic cells, which form the body, or germline cells, which form either eggs or sperm. The germline cells are the only cells whose genes are passed on to future generations.
Labels: mother (egg), father (sperm), early embryo, somatic cells (body), germ cells (egg or sperm, next generation).

slide 22:

FIGURE 1.11 Somatic Mutations
The early embryo has the same genetic information in every cell. During division of a somatic cell, a mutation may occur that affects the organ or tissue it gives rise to. Because the mutation was isolated in a single precursor cell, other parts of the body and the germline cells will not contain the mutation. Consequently, the mutation will not be passed on to any offspring.
Labels: early embryo; somatic stem cells; germ line; other organs; mutation (all cells in this lineage have the defective gene); blue left eye; brown right eye; sperm or eggs to next generation, passing on genetic information for brown eyes.

FIGURE 1.12 Structure of Yeast Cell
This yeast cell undergoing division is starting to partition components into the bud. Eventually the bud will grow in size and be released from the mother (lower oval), leaving a scar on the surface of the cell wall.
Labels: cell wall, cytoplasmic membrane, nucleus, nuclear pore, DNA, Golgi complex, endoplasmic reticulum, ribosomes, mitochondrion, storage granules, storage vacuole, bud of a daughter cell forming.

slide 23:

chromosomes of all eukaryotes, the DNA of this plasmid is also wound around histones. This element has been exploited as a cloning vector (see Chapter 3) to express heterologous genes in yeast. The plasmid has two perfect DNA repeats (FRT sites) on opposite sides of the circle. The plasmid also has a gene for Flp protein (also called Flp recombinase or flippase). This enzyme recognizes the FRT sites and flips one half of the plasmid relative to the other via DNA recombination (Fig. 1.13). Flippase recombines any DNA segments carrying FRT sites, no matter what organism they are in. Consequently, flippase is used in transgenic engineering in higher organisms (see Chapter 16). In plants, a related system (Cre recombinase plus LoxP sites) is used in a similar way (see Chapter 15).

YEAST MATING TYPES AND CELL CYCLE

Yeast cells grow and divide by budding. Cellular organelles such as mitochondria and some cellular proteins are partitioned into the growing bud. Finally, mitosis creates another nucleus, and when the bud has reached a sufficient size, the new daughter cell is released, leaving a scar on the surface of the mother cell. Budding creates genetically identical cells because the genome divides by mitosis.

Yeast has diploid and haploid phases in its life cycle, greatly simplifying genetic analysis. Most yeast found in the environment is diploid, having two copies of its genome. Under poor environmental conditions, yeast can undergo meiosis, creating four haploid spores, called ascospores, contained within an ascus. These are released to find a new environment with more nutrients. If the spores find a better environment, they germinate. In the laboratory the haploid cells can be isolated and grown separately, but in the wild haploid cells quickly fuse with one another, forming diploid cells again (Fig. 1.14). This life cycle allows individual genes to be followed during segregation and inheritance patterns to be analyzed, much as with Mendel's peas. However, the shorter life cycle of yeast allows greater numbers to be analyzed.

Just as meiosis creates haploid male and female gametes in humans, meiosis in yeast creates haploid cells of two different mating types. Because they are structurally the same, rather than male and female, the yeast mating types are called a and α. Fusion may occur only between different mating types; that is, only an a cell plus an α cell can merge, forming a diploid. Each mating type expresses a distinct mating pheromone that binds to receptors on the opposite mating type. The pheromones are secreted into the environment. For example, when an a cell encounters the α pheromone, a cell-surface receptor (the α receptor) binds the α pheromone, readying the yeast for fusion. Conversely, when α cells encounter an a pheromone, the cell-surface a receptor binds the a pheromone and readies the cell for mating. The two cells then fuse, combining two different genomes into one. The exchange of genes during sex is important for evolution, as it forms new genetic combinations that may have an advantage in different environments.

Yeast offer a variety of advantages to biotechnology. They are single-celled organisms that grow fast. Yeast are eukaryotes with chromosomes that have telomeres and centromeres, like the human genome. Yeast cells have extrachromosomal elements similar to plasmids that allow researchers to study new genes.

FIGURE 1.13 The 2-Micron Plasmid of Yeast
Two different forms (A and B) of the 2-micron plasmid are shown. The enzyme Flp recombinase recognizes the FRT sites and recombines them, thus flipping one half of the plasmid relative to the other half.
Labels: Rep1, Rep2, Rep3, RepD, FLP, IVR1, IVR2, ori, FRT sites.
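The flipping of the 2-micron plasmid between its two forms can be modeled as inverting the stretch of DNA that lies between the two FRT sites. The sketch below treats the plasmid as an ordered list of features with strands. It is a simplified illustration: the feature order in `FORM_A` is a hypothetical layout, not the real map of the plasmid, and `flp_invert` is a name invented here.

```python
# HYPOTHETICAL feature order, read clockwise; not the real 2-micron map.
FORM_A = [
    ("ori", "+"),
    ("FRT", "+"),
    ("FLP", "+"),
    ("REP2", "-"),
    ("FRT", "-"),
    ("REP1", "+"),
]


def flp_invert(plasmid):
    """Invert the segment lying between the two FRT sites.

    `plasmid` is a list of (feature, strand) tuples; exactly two entries must
    be named "FRT". Features between the sites are reversed in order and each
    one switches strand, as happens when a DNA segment is inverted.
    """
    sites = [i for i, (name, _) in enumerate(plasmid) if name == "FRT"]
    if len(sites) != 2:
        raise ValueError("expected exactly two FRT sites")
    left, right = sites
    segment = plasmid[left + 1:right]
    flipped = [(name, "-" if strand == "+" else "+")
               for name, strand in reversed(segment)]
    return plasmid[:left + 1] + flipped + plasmid[right:]


if __name__ == "__main__":
    form_b = flp_invert(FORM_A)
    print(form_b)
    # A second recombination event restores the original form.
    print(flp_invert(form_b) == FORM_A)
```

Applying the inversion twice returns the starting arrangement, which matches the idea that the plasmid switches back and forth between two forms.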

slide 24:

Diploid yeast will also form genetic clones by budding when plenty of nutrients are available for growth.

Yeast, like other eukaryotic organisms, can create new genetic combinations through sexual reproduction. The two forms of haploid yeast are a and α, which mate to form a new, genetically unique diploid cell.

FIGURE 1.14 Alternating Haploid and Diploid Phases of Yeast
Haploid cells come in two different forms: a and α. These express mating pheromones, a factor and alpha (α) factor, which attract the two forms to each other. When the pheromones bind to receptors on the opposite cell type, the two haploid cells become competent to fuse into a diploid cell. Diploid cells sporulate under growth-limiting conditions. Otherwise, the diploid cells form genetic clones by budding.
Labels: mating, cell division, sporulation, release of ascospores, diploid cell (2n), ascus, haploid cells (a and α), a factor, α factor, a factor receptor, α factor receptor.

MULTICELLULAR ORGANISMS AS RESEARCH MODELS

Single-celled creatures offer many advantages, but understanding human physiology requires information about cellular interactions. Although single-celled organisms interact with each other, this is not the same as in multicellular organisms, where one cell is surrounded by other cells on all sides. The location of cells affects both their role and development. The cells in our hair follicles are different from our skin cells. Bone cells differ drastically from the long nerve cells of our spinal cord. Much basic work on cellular interactions, development of multicellular organisms, and cellular physiology in different tissues has been done on the roundworm Caenorhabditis elegans. Although this is a multicellular organism, it is still relatively simple compared to mammals or other vertebrates.

Caenorhabditis elegans, a Small Roundworm

C. elegans is a small roundworm that is found in soil, particularly rotting vegetation, where it feeds on bacteria (Fig. 1.15). There are two sexes, a self-fertilizing hermaphrodite and a male, allowing genetic studies on both self- and cross-fertilization. The body is shaped as a simple nonsegmented tube that is encased in a cuticle layer to prevent dehydration. Inside C. elegans there are 959 somatic cells, which include more than 300 neurons. The head has many sense organs that respond to taste, smell, temperature, and touch, but no eyes. There is a nerve ring that serves as the brain and a nerve cord that runs down the back of the body. The digestive system consists of a pharynx followed by an intestine and anus. There are 81 muscle cells that control the sinusoidal movement of the worm around its environment. The reproductive system occupies the largest volume within the worm. In the hermaphrodite the tail is long and tapered, whereas the male has a blunt end. The hermaphrodite has a vulval opening where it lays eggs. The sperm cells come either from itself or from a male C. elegans in a sexual encounter.

C. elegans has many advantages for molecular biology and genetics. These creatures are transparent and can be studied in real time using various fluorescent techniques. They have many physiological characteristics similar to higher animals. For example, they undergo programmed cell death, and the genes involved are similar to genes found in humans (see Chapter 20). C. elegans is used to study development, aging, sexual dimorphism, alcohol metabolism, cellular differentiation, and many other phenomena that apply to humans.

The life cycle of C. elegans is conducive to research. One generation lasts about 3 days. First, a sperm and egg cell fuse, and a single-celled embryo partially develops within the hermaphrodite's body. After the embryo hatches from the chitin shell, the larval stages begin. There are four larval stages that culminate in the adult worm, with sexual development occurring last (Fig. 1.16). Spermatogenesis is the limiting factor in the number of offspring that a hermaphrodite produces, which is about 300 progeny.

slide 25:

FIGURE 1.15 Caenorhabditis elegans
Plate of C. elegans. Small dark spots are embryos that are going to hatch, and the long adult worms are moving in a sinusoidal pattern across the surface. Courtesy of Jill Bettinger, Virginia Commonwealth University, Richmond, VA.

Drosophila melanogaster, the Common Fruit Fly

Another multicellular model organism widely used because of its genetics is Drosophila melanogaster, the common fruit fly, usually referred to simply as Drosophila. This insect is about 3 mm in length and can often be found around rotting fruit. These flies are easy to grow and maintain in a lab. They need a food source and are kept in a bottle capped with cotton so they cannot escape. Their entire life span is 2 weeks and starts with an egg about 0.5 mm in length (Fig. 1.17). The embryo hatches into a worm-like larva after about 24 hours. There are three larval instars that develop 1 day, 2 days, and 4 days after the first instar larva. Each instar grows and eats continuously and molts to form the next instar. The third larval instar forms a pupa that is immobile. The pupa usually clings to the side of the flask, where it stays for 4–6 days. During this time the larva transforms into the winged adult fly. Wings, legs, antennae, segmented bodies, eyes, and hair are formed.

The main focus of Drosophila research is genetics. Many different mutations are available, from simple changes such as longer or shorter body hairs to dramatic mutations where body segments are duplicated. That is, some

C. elegans is a model multicellular eukaryotic organism. Biotechnology research uses this organism because it is easy to grow, it is transparent, and it is a hermaphrodite, so it can create either genetic clones or novel genetic organisms.

FIGURE 1.16 Life Cycle of Caenorhabditis elegans
When the C. elegans sperm fuses with an egg, a small worm develops (L1). The larva goes through multiple stages until it reaches the sexually mature adult phase.
Stage durations shown (25°C): L1 12 hours, L2 7 hours, L3 7 hours, L4 9 hours. Other labels: egg, adult, vulva.
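The stage durations in Figure 1.16 can be added up to see how much of the roughly 3-day generation time is spent as a larva. The sketch below does that bookkeeping. The pairing of each stage with its duration is read from the figure labels and should be treated as an assumption, and the helper name is illustrative.

```python
# Larval stage durations in hours at 25 °C, as read from the Figure 1.16 labels
# (the stage-to-duration pairing is an ASSUMPTION based on label order).
LARVAL_STAGE_HOURS = {"L1": 12, "L2": 7, "L3": 7, "L4": 9}

GENERATION_HOURS = 3 * 24  # "One generation lasts about 3 days"


def hours_to_end_of(stage):
    """Cumulative hours after hatching when the given larval stage ends."""
    total = 0
    for name, hours in LARVAL_STAGE_HOURS.items():
        total += hours
        if name == stage:
            return total
    raise KeyError(stage)


if __name__ == "__main__":
    for stage in LARVAL_STAGE_HOURS:
        print(f"end of {stage}: {hours_to_end_of(stage)} h after hatching")
    larval_total = hours_to_end_of("L4")
    print(f"remaining for embryo and adult phases: {GENERATION_HOURS - larval_total} h")
```

On these figures the four larval stages take 35 hours in total, leaving about 37 hours of the generation for embryonic development and the adult phase.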

slide 26:

FIGURE 1.17 Life Cycle of Drosophila melanogaster
Drosophila fruit flies start as tiny eggs that develop into worm-like larvae. After a series of larval stages, the larva forms a pupa, where the adult form develops.
Labels: Drosophila adult female, egg, first instar larva, second instar larva, third instar larva, pupa, metamorphosis.

mutants of Drosophila have four wings or extra legs. Studying these mutants has identified many genes that determine basic body patterns in Drosophila and, based on homology, in humans too.

The genome of Drosophila has been sequenced and has 165 Mb of DNA divided among three autosomes and the X/Y sex chromosomes. There are a predicted 12000 genes in the genome. During the rapid growth of the larval stages, the number of cells actually remains fairly constant. The size of the cells does increase dramatically, though. In order for these large cells to work, a large amount of extra protein and mRNA needs to be made, and the chromosomes replicate many times over to provide multiple gene copies. Although they duplicate, they do not divide but stay attached to each other, creating thick polytene chromosomes (Fig. 1.18). Because they are so large, they can be visualized under a light microscope. The polytene chromosomes have characteristic banding patterns, with each section of each chromosome being unique. The banding pattern allows some mutations to be localized. For example, a deletion that causes white eyes in the adult would alter the banding pattern on the corresponding polytene chromosome. Thus the mutation can easily be mapped to its chromosome location.

Zebrafish Are Models for Developmental Genetics

The small zebrafish, Danio rerio, is a simple vertebrate used in molecular biology research. It is a common fish found in pet stores for keeping in freshwater aquariums. The qualities that have made it so prevalent as a pet also make it attractive for research. It is easy to maintain and breed in an aquarium. A wide variety of mutations exist, which makes the fish handy for genetics research. The adult is about an inch long with black horizontal stripes down its body (Fig. 1.19). The mother lays about 200 eggs at one time, so many offspring can be studied after one mating. Embryonic development occurs outside the mother. The embryos are completely transparent, so the effects of mutations that affect embryo development can be seen with ease. Moreover, different cells can either be destroyed or moved to

FIGURE 1.18 Polytene Chromosome
Fluorescent staining of a polytene chromosome from Drosophila. Photo courtesy of LPLT/Wikimedia Commons.

The true sexual reproduction of Drosophila, which allows genetic manipulation, and the complex alterations that occur during the metamorphosis from pupa to adult fly are two key characteristics studied by researchers.
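The genome figures quoted in this chapter for yeast (12 Mb, about 6000 genes) and Drosophila (165 Mb, a predicted 12000 genes) can be turned into a rough average spacing between genes. The sketch below is only a back-of-the-envelope comparison: it ignores how genes are actually distributed along the chromosomes, and the names used are illustrative.

```python
# (genome size in Mb, approximate gene count), as quoted in the chapter text.
GENOMES = {
    "Saccharomyces cerevisiae": (12, 6000),
    "Drosophila melanogaster": (165, 12000),
}


def kb_per_gene(size_mb, gene_count):
    """Average amount of DNA per gene, in kilobases."""
    return size_mb * 1000 / gene_count


if __name__ == "__main__":
    for organism, (size_mb, genes) in GENOMES.items():
        print(f"{organism}: {kb_per_gene(size_mb, genes):.2f} kb per gene")
```

On these numbers yeast averages about 2 kb of DNA per gene and Drosophila nearly 14 kb, so the fly genome is far less densely packed with genes.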

slide 27:

new locations and the effect on development can be traced. Such experiments are insightful for deciphering the effect of position on cellular development. The embryos develop from a single cell to a tiny fish in about 24 hours, so studies of development can be done relatively quickly.

The zebrafish genome has been sequenced. There are 25 pairs of chromosomes with a haploid genome size of 1700 Mb of DNA. About 70% of human genes that code for proteins have orthologs in zebrafish. Thus, when a new gene function is identified in the fish, it suggests possible roles for the corresponding human genes, and researchers are turning to mutations in zebrafish to create disease model organisms. For example, in humans, porphyria causes skin sensitivity to light and porphyrin metabolic precursors to be secreted. Zebrafish with a mutation in uroporphyrinogen decarboxylase (UROD) have the same phenotype, suggesting that mutations in this gene are responsible for this disease. In addition, human studies have identified a potential mutation in a ribosomal protein (RPS19) as the causative agent for Diamond–Blackfan anemia. To confirm that this gene was responsible for the disease, a zebrafish was developed that did not express the RPS19 ortholog. This mutant fish had anemic symptoms much like the human disease and so recapitulated the disease.

Another key advantage of zebrafish is the large number of offspring and their ability to grow outside their mother. The embryos are easily used in drug screens to find compounds that treat these diseases. For example, melanocytes in zebrafish arise from the neural crest cells. To treat melanoma in humans, drugs are needed to stop the proliferation of these cells. When zebrafish were grown in various chemical compounds, one compound, leflunomide, was found to inhibit the developmental migration of neural crest cells (melanocyte precursors). Further studies found the same compound was effective in inhibiting melanoma metastasis, and studies are underway to determine whether this compound will work in humans.

Mus musculus, the Mouse, Is Genetically Similar to Humans

The model organism most closely related to humans is the mouse. The mouse genome has about 2500 Mb of DNA on 20 different chromosomes. Less than 1% of the genes have no human gene counterpart, so mouse genetics relates to humans very readily. Mice are easy to manipulate genetically, and animals with one or more genes inactivated (knockout mice) are fairly easy to generate. In addition to genetic deletions, extra genes can be inserted and expressed in the mouse, giving transgenic animals (see Chapter 16). The effect of such genetic manipulations on growth, development, or physiology can be determined.

FIGURE 1.19 The Zebrafish, Danio rerio
This fish is used as a model vertebrate to study genetics, cell biology, and developmental biology. Photo courtesy of Wikipedia Commons.

Zebrafish are key organisms for studying embryonic development because their embryos develop outside the mother. In addition, as many as 70% of our genes have zebrafish orthologs. The combination of their life cycle and their genetic relatedness makes zebrafish a good model organism for drug screens.

Researchers consider mice very similar to humans because they have so many genes in common.

slide 28:

Basics of Biotechnology 20 ANIMAL CELL CULTURE IN VITRO Another way to approximate human physiology is by studying mammalian cells cultured in vitro Fig. 1.20. Many different cell lines have been generated from humans and monkeys and they can be grown in plastic dishes or fasks using culture media containing growth fac - tors and nutrients. Cell lines must be maintained at 37°C and require an atmosphere rich in carbon dioxide. Adherent cell lines stick to and divide on the plastic dishes whereas suspen- sion cells grow and divide in liquid culture. Most cell lines are one particular type of cell from a particular tissue and many different cell lines have been grown from kidney liver heart and so forth. The original cell lines cannot divide in culture forever see Chapter 20. Primary cells as they are called can be maintained for only a short time. Using cancer cells overcomes this limitation since cancer cells do not stop dividing see Chapter 19 for discussion. These cell lines are immortal and can in principle be grown under the correct circumstances forever. The best aspect of using cultured human cells is the ability to do genetic studies. Different genes can be expressed in cultured cells and their effect on cellular physiology can be deter- mined. In addition gene deletions or mutations can be examined. Cultured mammalian cells are also important for production of recombinant proteins which are then isolated and purifed for medicine research and other biotechnology applications. Cell lines have also been developed from insects Fig. 1.21. They are primarily used to express heterologous proteins for the biotechnology industry. Insect cells are preferred to mammalian cells because they require fewer nutrients for growth and survive in media free of serum. Mammalian cells require serum from fetal cows which is very expensive and in limited supply. 
Insect cells also grow at lower temperatures without carbon dioxide and therefore do not require special incubation chambers. Insect cells are used in research to study viruses that are transmitted between insects and plants, as well as cell signaling pathways. Insect cell lines are primarily derived from Spodoptera frugiperda (fall armyworm), Trichoplusia ni (cabbage looper), Drosophila melanogaster (fruit fly), Heliothis virescens (tobacco budworm), the mosquito, and others. The most common cell lines are those from ovarian tissue of S. frugiperda, which include Sf9 and Sf21 cells; those from embryonic cells of T. ni, which include the “High Five” cell lines; and those from late-stage Drosophila embryos, which include Schneider S2 cells.

FIGURE 1.20 Human HeLa Cells Grown In Vitro. HeLa cells were taken from the tumor of Henrietta Lacks, a woman suffering from cervical cancer in the 1950s, and have been cultured continuously ever since. (A) Viewed under phase contrast. (B) Viewed under differential interference contrast. Courtesy of Michael W. Davidson, Optical Microscopy Group, National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida.

Studying cells in a dish rather than an organism provides the researcher with another way to study genes. The cell lines are useful for genetic manipulations such as expressing new genes or deleting existing genes.

slide 29:

FIGURE 1.21 Insect Cells in Culture. (A) HvT1 cells from tobacco budworm testes are strongly attached to the surface of the dish. (B) TN368 cells from cabbage looper ovary are only loosely attached. Courtesy of Dwight E. Lynn, Insect Biocontrol Lab, USDA, Beltsville, MD.

ARABIDOPSIS THALIANA, A MODEL FLOWERING PLANT

The model organism most widely used in plant genetics and molecular biology is the weed Arabidopsis thaliana (wild mustard weed or mouse-ear cress) (Fig. 1.22). Growing different crops to feed the world population is incredibly important, and much money is invested in research on the crops most used for food, such as rice, soybean, wheat, and corn. These plants have huge genomes, and most are polyploid (wheat, for example, is hexaploid). Therefore, a model organism is essential to learn the basic biology of plants. Arabidopsis has much the same responses to stress and disease as crop plants. Moreover, many genes involved in reproduction and development are homologous to those in plants with more complex genomes.

Arabidopsis has many convenient features. First, it is easily grown and maintained in a laboratory setting. The plant is small and grows to match its environment. If there is plenty of space and nutrients, the plant can grow to over a foot in height and width. If the environment is a small culture dish in a lab, the plant will grow to about 1 cm in height and width. At either size, the plant forms flowers and seeds. An entire generation, from seed to adult to seeds, is finished in 6–10 weeks, which is relatively quick for a plant. Note that for corn or soybeans, only one generation can occur in the span of a summer. Arabidopsis produces many seeds on each plant, which aids genetic analysis. Much like yeast, Arabidopsis can be maintained in a haploid state. Arabidopsis has a small genome for a plant, containing only five chromosomes with a total of 125 Mb of sequence.
The genome was completely sequenced in 2000, allowing researchers to identify about 25,000 genes and important sequence features. Rice has also been sequenced and has an estimated 40,000 to 50,000 genes. This tops the number of predicted human genes, and so rice, and doubtless many other plants, may be more “advanced” than us lowly humans.

Plant research also relies on a model organism. Arabidopsis thaliana is used because of its size, ease of growth, and small genome.

slide 30:

FIGURE 1.22 Arabidopsis thaliana. The plant most used as a model for plant biology research is A. thaliana, a member of the mustard family, Brassicaceae. Courtesy of Dr. Jeremy Burgess, Science Photo Library.

VIRUSES USED IN GENETIC RESEARCH

Viruses are entities that border on living. But unlike genuine living organisms, viruses cannot survive outside a host organism. Viruses are pathogens that invade host cells and subvert them to manufacture more viruses. Viruses are simple in principle and yet very powerful. They consist of a protein shell, called a capsid, surrounding a genome made of RNA or DNA. The particle is called a virion and, unlike a living cell, has no way to make its own energy or duplicate its own genome. The virus relies on the host to do this work.

Viruses come in many different types and can inhabit every living thing, from bacteria to humans to plants. Viral diseases in humans are extremely common, and most cause only minor symptoms. For example, when rhinovirus invades, the victim ends up with a runny nose and other cold symptoms and usually feels miserable for a few days. However, viruses do cause a significant number of serious diseases, such as AIDS, smallpox, hepatitis, and Ebola.

When viruses invade bacteria, the infected bacteria usually die. Bacterial viruses are called bacteriophage, or phage, and they normally destroy the bacterial cell in the process of making new viral particles. When bacteria grow on an agar plate, they form a hazy or cloudy layer (a bacterial lawn) over the top of the agar. If the culture of bacteria is infected with bacteriophage, the viruses eat holes, or plaques, into the bacterial lawn, leaving clear zones where all the bacteria were killed.

Bacteriophages, like other types of viruses, have the following stages in their life cycle (Fig. 1.23):
a. Attachment of the virion to the correct host cell
b. Entry of the virus genome
c. Replication of the virus genome
d. Manufacture of new virus proteins
e. Assembly of new virus particles
f. Release of new virions from the host

Not every virus kills the host cell, and in fact many viruses have a latent phase in which they lie dormant within the cell, not producing any proteins or new viruses. Latency, as it is called in animal cells, is called lysogeny when referring to bacteria. In contrast, the phase of viral growth in which the host cell is destroyed is called the lytic phase. Sometimes a virus becomes latent by inserting its genome into the genome of the cell. The viral genome integrates into a host chromosome and remains inactive until some stimulus triggers it to reactivate. The integrated virus is called a provirus, or a prophage if the virus invades bacteria.

The great variety of viruses can be divided into groups based on capsid shape or the type of genome. The three major shapes are spherical, filamentous, and complex. Spherical viruses actually have 20 flat triangular sides and are thus icosahedrons. Complex viruses come in various shapes, but some have legs that attach to the host cell, a linear segment that injects the DNA or RNA genome into the host, and a structure that stores the viral genome. This type of complex virus is common among bacteriophages, several of which are widely used in molecular biology research. Bacteriophages T4, lambda, P1, and Mu all look like the Apollo lunar landers (Fig. 1.24).

Viral genomes vary in size, but all contain sufficient genetic information to get the host cell to make more copies of the virus genome and make more capsid proteins to package it. At the very least, a virus needs a gene to replicate its genome, a gene for capsid protein, and a gene to release new viruses from the host cell. Bacteriophage Qβ infects bacteria; its entire

slide 31:

genome is only 3500 base pairs, and the entire genome consists of only four genes. On the other hand, large complex viruses may have more than 200 genes that are used at different times after infecting the host cell. The genes are then divided into categories based on when they are active. Some genes are considered early genes and are active immediately after infecting the host, whereas late genes are active only after the virus has been inside the host cell for some time.

[Figure 1.23 panel labels: bacterial virus attaches to a bacterial cell; viral genome is replicated; viral proteins are synthesized and assembled into viral particles; host cell lyses and viruses are released. Other labels: bacterial cell, bacterial chromosome, bacterial virus, newly synthesized viral DNA.]

FIGURE 1.23 Virus Life Cycle. The life cycle of a virus starts when the viral DNA or RNA enters the host cell. Once inside, the virus uses the host cell to manufacture more copies of the virus genome and to make the protein coats for assembly of virus particles. Once multiple copies of the virus have been assembled, the host cell bursts open, allowing the progeny to escape and find other hosts to invade.

Viral genomes are made from either DNA or RNA, can be double-stranded or single-stranded, and can be circular or linear. When viruses use a single strand of RNA as genome, this can be either the positive, or plus (+), strand or the negative, or minus (–), strand. The positive strand corresponds to the coding strand and the negative strand to the template strand (see Chapter 2). When a positive-strand RNA virus injects its genome into the host, the RNA can be used directly as a messenger RNA to make protein. If the RNA virus has a negative-strand genome, the RNA must first be converted into double-stranded form, the replicative form or RF. Then each strand is used: the negative strand is used as a template to make more positive-stranded genomes, and the positive strand is used to make proteins.
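The relationship between plus and minus RNA strands can be illustrated with a short Python sketch. It is a toy model only: strands are plain strings written 5′ to 3′, the nine-base sequence is hypothetical, and the function name `complementary_rna` is an assumption made for this example rather than anything from a real library.

```python
# Toy model: RNA strands are strings written 5'->3'.
# The sequence below is hypothetical and chosen only for illustration.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_rna(strand: str) -> str:
    """Return the complementary RNA strand, also written 5'->3'."""
    # Reverse first, because the two strands of a duplex run antiparallel.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(strand))


plus_strand = "AUGGCUUAA"                   # coding (mRNA-sense) strand
minus_strand = complementary_rna(plus_strand)

# Using the minus strand as a template regenerates the plus strand,
# which is the step that yields new plus-strand copies from the RF.
regenerated_plus = complementary_rna(minus_strand)

print(minus_strand)       # UUAAGCCAU
print(regenerated_plus)   # AUGGCUUAA
```

The sketch shows why the minus strand cannot serve directly as a message: its sequence is the complement of the coding strand, so it must be copied once before the coding information reappears.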
Some viruses actually use both RNA and DNA versions of their genome (Fig. 1.25). Retroviruses infect animals and include such members as human immunodeficiency virus (HIV). The genome inside a retrovirus particle is a single-stranded RNA that is converted to DNA once it enters the host. Reverse transcriptase is the enzyme that manufactures the DNA copy of the RNA genome and is used extensively in molecular biology and genetic engineering (see Chapter 3). Once the DNA copy is made, it is inserted into the host DNA using two repeated DNA sequences at the ends, called long terminal repeats (LTRs). Once integrated, the retrovirus becomes part of the host’s genome. This is why there is no complete cure for acquired immunodeficiency syndrome (AIDS). The host can never rid itself of the retroviral DNA once it becomes integrated. The viral genes then direct the synthesis of new viral particles that infect neighboring cells. Reverse transcriptase is an example of a viral gene product that is synthesized by the host and packaged inside the virions for use in the next infection cycle.

Retroviral genomes have three major genes (gag, pol, and env) as well as several minor genes. The tat and rev genes regulate the expression of the other retroviral genes. Nef, vif, vpr, and vpu encode four accessory proteins that block the host cell’s immune defense and increase the efficiency of virus production. Gag, pol, and env each give single mRNA transcripts that encode multiple proteins. Gag encodes three proteins involved with making the capsid. Pol gives three proteins: a protease that digests other proteins during particle assembly, reverse transcriptase that makes the DNA copy of the genome, and an integrase that integrates the viral DNA into the host chromosome. Env codes for two structural proteins: one forms the outer spikes, and the other helps the virus enter the host cell.
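The conversion of a retroviral RNA genome into integrated double-stranded DNA can be sketched in a few lines of Python. This is a deliberately simplified illustration: the sequences are hypothetical, the function names are assumptions made for this example, and the long terminal repeats and the enzymology of integrase are left out.

```python
# Simplified sketch of the retroviral steps: RNA -> first DNA strand ->
# second DNA strand -> insertion into host DNA.
# All sequences are hypothetical; LTRs and integrase details are omitted.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """First DNA strand, complementary to the RNA genome (written 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(dna: str) -> str:
    """Second DNA strand, complementary to the first (written 5'->3')."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))


def integrate(host: str, provirus: str, position: int) -> str:
    """Insert the proviral DNA into one strand of the host at `position`."""
    return host[:position] + provirus + host[position:]


genome_rna = "GGAUGCC"                       # plus-strand RNA genome
first_dna = reverse_transcribe(genome_rna)   # made while RNA is the template
provirus = second_strand(first_dna)          # same sense as the RNA, T for U
host_after = integrate("AAAATTTT", provirus, 4)

print(first_dna)    # GGCATCC
print(provirus)     # GGATGCC
print(host_after)   # AAAAGGATGCCTTTT
```

Note that the second DNA strand has the same sequence as the original RNA with thymine in place of uracil, which is why the integrated provirus can later be transcribed back into viral RNA by the host.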

slide 32:

FIGURE 1.24 Examples of Different Viruses. Viruses come in a variety of shapes and sizes that determine whether the entire virus or only its genome enters the host cells. The viruses shown are:
Tobacco mosaic virus: ssRNA (+), non-enveloped
Adenovirus: dsDNA, non-enveloped
Herpesvirus: dsDNA, enveloped
Reovirus: dsRNA, non-enveloped
Bacteriophage: dsDNA, non-enveloped
Retrovirus: ssRNA, enveloped

Viruses are used extensively in biotechnology research because they specialize in inserting their genome into the host genome. They subvert the host into expressing their genes and making more copies of themselves. Researchers exploit these characteristics to study new genes, to alter the genomes of other model organisms, and to do gene therapy on humans.

slide 33:

SUBVIRAL INFECTIOUS AGENTS AND OTHER GENE CREATURES

We have used the term gene creatures to refer to various genetic entities that are sometimes called subviral infectious agents. These creatures exist but are not considered living because none of them can produce their own energy, duplicate their own genomes, or live independent of a host. The main advantage a virus has over a gene creature is the ability to survive as an inactive particle outside the host cell. Gene creatures are not normally found outside the host cell.

Satellite viruses are defective viruses. They can either replicate their genome or package their genome into a capsid, but they are unable to do both by themselves. Satellite viruses rely on a helper virus to supply the missing components or genes. For example, hepatitis delta virus (HDV) is a small single-stranded RNA satellite virus that infects the liver. Its helper is hepatitis B virus. Bacteriophage P4 is a satellite virus that infects E. coli. It is a double-stranded DNA virus that can replicate as a plasmid or integrate into the host chromosome, but it cannot form virus particles by itself. It relies on P2 bacteriophage to supply the structural proteins. P4 sends transcription factors to the P2 genome to control expression of the genes it pirates.

Gene creatures also include genetic elements that may be helpful to the host. For example, the plasmids of E. coli and yeast are genetic elements that cannot produce their own energy and rely on the host cell to replicate their genome. They cannot survive outside a host cell. These traits qualify plasmids as gene creatures. Like viruses and satellite viruses, plasmids are replicons; that is, they have sufficient information in their genome to direct their own replication.

Plasmids may confer positive traits on the host. For example, plasmids can provide antibacterial enzymes, such as bacteriocins, that help their host compete with other bacteria for nutrients (see earlier discussion).
Plasmids may carry genes for antibiotic resistance, thus allowing the host bacteria to survive after encountering an antibiotic. Plasmids may confer virulence, making the host bacteria more aggressive and deadly. Finally, some plasmids contain genes that help the host degrade a new carbon source to provide food.

Plasmids are usually found as circles of DNA, although some linear plasmids have been found. Plasmids come in all sizes but are usually much smaller than the bacterial chromosome. The genes on plasmids are often beneficial to the host. Because the plasmid coexists within the cytoplasm of the host cell, it does not generally harm its host.

[Figure 1.25 labels: ssRNA (+); make DNA copy; RNA:DNA hybrid; degrade RNA strand; make second DNA strand; dsDNA; integrate; left LTR; right LTR; host chromosome; new viral particles.]

FIGURE 1.25 Retroviral Life Cycle. Retroviral genomes are made of positive RNA. Once the RNA enters the host, a DNA copy of the genome is made using reverse transcriptase. The original RNA strand is then degraded and replaced with DNA. Then the entire double-stranded DNA version of the retrovirus genome can integrate into the host genome.

The F plasmid is found in some E. coli, and it is about 1% of the size of the chromosome. It was named “F” for fertility because it confers the ability to mate. F plasmids can transfer themselves from one cell to the next in a process called conjugation (Fig. 1.26). The plasmid has genes for the formation of a specialized pilus, the sex pilus, which physically attaches an F+ E. coli to an F− cell. After contact, a junction, the conjugation bridge, forms between the two cells. During replication of the F plasmid, one strand is cut at the origin, and the free end enters the cytoplasm of the F− cell via the conjugation bridge. Inside the recipient, a complementary strand of DNA is made, and the plasmid is recircularized. The

slide 34:

other strand of the parent plasmid remains in the original F+ cell and is also duplicated. Thus, after conjugation, both cells become F+. Occasionally the F plasmid integrates into the host chromosome. If an integrated F plasmid is transferred to another cell via conjugation, parts of the host chromosome may also get transferred. Therefore, bacteria can exchange chromosomal genetic information through conjugation.

[Figure 1.26 labels: donor cell; recipient cell; replication; transfer; F plasmid; chromosome; double-stranded DNA; origin of transfer; single-strand nick is made; 5′ end; single strand enters recipient cell; synthesis of new DNA complementary to unbroken strand; complementary strand synthesized in donor; complementary strand synthesized in recipient.]

FIGURE 1.26 Conjugation in E. coli. (A) During bacterial conjugation, the F plasmid of E. coli is transferred to a new cell by rolling circle replication. First, one strand of the F plasmid is nicked at the origin of transfer. The two strands start to separate, and synthesis of a new strand starts at the origin (green strand). (B) The single strand of F plasmid DNA that is displaced (pink strand) crosses the conjugation bridge and enters the recipient cell. The second strand of the F plasmid is synthesized inside the recipient cell. Once the complete plasmid has been transferred, it is recircularized.

Another gene creature that is very useful in biotechnology is the transposable element, or transposon. This genetic element is merely a length of DNA that cannot exist or replicate as an independent molecule. To survive, it integrates into another DNA molecule. Mobile DNA and jumping genes are two terms used to describe transposons. When the transposon moves from one location to another, the process is called transposition. Unlike plasmids, transposons lack an origin of replication and are not considered replicons.
They can only be replicated by integrating themselves into a host DNA molecule such as a chromosome, plasmid, or viral genome. Transposons can move from site to site within the same host DNA or move from one host molecule of DNA to another. If a transposon loses its ability to move, its DNA remains in place on the chromosome or other DNA molecule.

Transposons come in several varieties and are classified based on the mechanism of movement. Transposons have two inverted DNA repeats, one at each end, and a gene for transposase, the enzyme needed for movement. Transposase recognizes the inverted repeats at the ends of the transposon and excises the entire element from the chromosome. Next, transposase recognizes a target sequence of 3 to 9 base pairs in length on the host DNA. The transposon is then inserted into the target sequence, which is duplicated in the process. One copy is found on each side of the transposon. When a transposon is completely removed from one site and moved to another, the mechanism is conservative transposition, or cut-and-paste transposition (Fig. 1.27). This leaves behind a double-stranded break that must be repaired by the host cell. Several cellular mechanisms exist to make this type of repair.

An alternative mechanism is replicative transposition, where a second copy of the transposon is made. Complex transposons use this method. Much as before, transposase recognizes the inverted repeats of the transposon. However, in this case it only makes single-stranded nicks at the ends. Transposase then makes two single-stranded nicks, one at each end of the target site. Each single DNA strand of the transposon is joined to one host strand at the target site. This creates two single-stranded copies of the transposon. The host responds to such single-stranded DNA regions by making the second complementary strand of the transposon. This gives two copies of the transposon. Notice how the transposon itself does not replicate.
It tricks the host into making the replica.
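The conservative (cut-and-paste) mechanism and its target-site duplication can be made concrete with a small Python sketch. It is a toy model on single-strand strings, not a simulation of the real enzyme: the sequences, the 3-base target, and the function name `cut_and_paste` are all assumptions chosen for illustration, and repair of the donor break is reduced to simply rejoining the flanking DNA.

```python
# Toy model of conservative (cut-and-paste) transposition.
# Sequences are hypothetical single-strand strings; host repair of the
# donor break is modeled as simply rejoining the two flanks.


def cut_and_paste(donor: str, transposon: str, recipient: str, target: str):
    """Move `transposon` from `donor` into `recipient` at `target`.

    Returns (donor_after, recipient_after). The target sequence ends up
    duplicated, with one copy on each side of the inserted transposon.
    """
    # Excise the whole element from the donor molecule.
    start = donor.index(transposon)
    donor_after = donor[:start] + donor[start + len(transposon):]

    # Insert at the target site, duplicating the target sequence.
    site_end = recipient.index(target) + len(target)
    recipient_after = (
        recipient[:site_end] + transposon + target + recipient[site_end:]
    )
    return donor_after, recipient_after


# Inverted repeats: the right end is the reverse complement of the left end.
transposon = "ACCGT" + "GGG" + "ACGGT"
donor = "TTTT" + transposon + "TTTT"
recipient = "CCCCGATCCCCC"

donor_after, recipient_after = cut_and_paste(donor, transposon, recipient, "GAT")
print(donor_after)       # TTTTTTTT
print(recipient_after)   # CCCCGATACCGTGGGACGGTGATCCCCC
```

In the output, the 3-base target GAT appears once on each side of the inserted element, which matches the duplication described in the text. Replicative transposition is not modeled here; it would leave the donor unchanged while still adding a copy to the recipient.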

slide 35:

Transposon movement can cause problems for the host. When the transposon moves, there is a potential for insertions, deletions, and inversions in the host DNA. If two copies of a transposon are found on a plasmid and the target sequence is on the host chromosome, a segment of the plasmid flanked by the transposons may be inserted into the host DNA. More generally, when multiple transposons are near each other, the ends of two neighboring but separate transposons may be used for transposition. When the two ends move to a new location, the DNA between them will be carried along. Whole genes or segments of genes may be deleted from the original location in this process. Conversely, regions of chromosome may become duplicated. If transposons are active and move often, the genome will become very damaged, and the host cells often commit suicide (see Chapter 20). Because the transposon will be destroyed along with its host, many transposons move only rarely. Controlling their movement preserves their existence within the genome and keeps the host cell from committing suicide.

[Figure 1.27 labels: (A) replicative transposition: original host DNA, transposon, target DNA molecule, target sequence. (B) conservative transposition: donor DNA molecule, transposon, recipient DNA molecule, target sequence, donor DNA with break, transposon in new location.]

FIGURE 1.27 Transposons Move by Replicative or Conservative Transposition. (A) Replicative transposition leaves the original transposon in its original place, and a copy is inserted at another site within the host genome. (B) During conservative transposition, the original transposon excises from its original site and integrates at a different location.

Gene creatures is a term to describe genetic elements that exist within the confines of a host cell yet are separate from the original host genome. Gene creatures include satellite viruses, plasmids, and transposons.
The plasmid is a unique gene creature because it confers positive traits such as resistance to antibiotics, bacteriocins, and the ability to transfer genetic material between two cells.

Transposons do not contain origins for their independent replication, as plasmids do. These elements subvert the cell to make their copies by inducing breaks in the genome.

slide 36:

Summary

This chapter introduces the variety of different organisms used to study genes useful for biotechnology. Each organism, even the lowly gene creatures, is based on DNA. DNA and RNA have unique structures that ensure their survival and existence in all facets of life. Each structure has a backbone of alternating phosphate molecules with sugar residues. In DNA, the sugar, deoxyribose, is missing a hydroxyl group on the 2′ carbon. The bases, which attach at the 1′ carbon, form pairs so that adenine joins with thymine and guanine joins with cytosine. These pairs are held together with hydrogen bonds that induce the two backbones to twist into a double-stranded helix. In RNA, the sugar, ribose, has one extra hydroxyl group, and the base thymine is replaced with uracil.

Many different organisms are used in biotechnology research, and each has a particular trait that is useful for studying new genes. Bacteria are genetic clones that are easily grown and stored for long periods of time. Two key traits are their simple genomes and the availability of plasmids to alter their genetic makeup. Although useful, bacteria are prokaryotes and differ greatly from humans. Therefore, eukaryotic model organisms are also used for research. Yeasts are single-celled eukaryotes that have similar traits to human cells, such as multiple chromosomes, a nucleus, and various organelles. In addition, yeasts also have plasmids to which extra genes can be added for study in a model organism. Finally, the chapter outlines the key traits of multicellular organisms, from barely visible roundworms such as C. elegans to mice; cultured human, animal, and insect cells; and the model plant organism, Arabidopsis.

Besides real organisms, research in biotechnology relies on gene creatures such as viruses, transposons, and plasmids. These genetic vehicles are critical to manipulating the genome of the model organisms. In fact, viruses may be the key to accomplishing gene therapy in humans also.
Viruses are used as vehicles to inject foreign DNA into a host cell. Transposons are also used to deliver new genes into the host DNA. Plasmids are used for the same purpose but do not work in higher organisms and therefore are restricted to cultured cells, yeast, and bacteria. The use of gene creatures and model organisms is key to biotechnology research.

End-of-Chapter Questions

1. Which statement best describes the central dogma of genetics?
a. Genes are made of DNA, expressed as an RNA intermediary that is decoded to make proteins.
b. The central dogma only applies to yellow and green peas from Mendel’s experiments.
c. Genes are made of RNA, expressed as a DNA intermediary, which is decoded to make proteins.
d. Genes made of DNA are directly decoded to make proteins.
e. The central dogma only applies to animals.

2. What is the difference between DNA and RNA?
a. DNA contains a phosphate group, but RNA does not.
b. Both DNA and RNA contain a sugar, but only DNA has a pentose.
c. The sugar ring in RNA has an extra hydroxyl group that is missing in the pentose of DNA.

slide 37:

d. DNA consists of five different nitrogenous bases, but RNA only contains four different bases.
e. RNA only contains pyrimidines, and DNA only contains purines.

3. Which of the following statements about eukaryotic DNA packaging is true?
a. The process involves DNA gyrase and topoisomerase I.
b. All of the DNA in eukaryotes can fit inside of the nucleosome without being packaged.
c. Chromatin is only used by prokaryotes and is not necessary for eukaryotic DNA packaging.
d. Eukaryotic DNA packaging is a complex of DNA wrapped around proteins called histones and further coiled into a 30-nanometer fiber.
e. Once eukaryotic DNA is packaged, the genes on the DNA can never again be expressed.

4. Which statement about Thermus aquaticus is false?
a. T. aquaticus was isolated from a hot spring.
b. The DNA polymerase from T. aquaticus is used in molecular biology for a procedure called polymerase chain reaction (PCR).
c. The DNA polymerase from T. aquaticus is able to withstand very high temperatures.
d. T. aquaticus can survive high temperatures and low pH.
e. T. aquaticus is found in the frozen lakes of Antarctica.

5. Which statement about Escherichia coli is not correct?
a. E. coli is called “the workhorse of molecular biology.”
b. E. coli can grow in a simple solution of water, a carbon source, and mineral salts.
c. All E. coli strains are pathogenic and therefore must be handled accordingly.
d. The chromosome of E. coli consists of one circular DNA molecule containing approximately 4000 genes.
e. All of the above answers are correct.

6. Plasmids from bacteria can be described by which of the following statements?
a. Plasmids provide an advantage to the host bacterium to compete against non-plasmid-containing bacteria for nutrients.
b. Plasmids are used as a molecular biology tool to express other genes efficiently in the host bacterium.
c. Plasmids are extrachromosomal segments of DNA that carry several genes beneficial to the host organism.
d. Plasmids have their own origin of replication.
e. All of the above statements describe plasmids.

7. Which of the following statements is not correct about the usefulness of fungi in biotechnology research?
a. Fungi produce the blue veins in some types of cheeses.
b. Yeast is responsible for the alcohol in beer and for bread rising.
c. Fungi are called “the workhorses of molecular biology.”
d. The 2-micron circle is a useful extrachromosomal element from yeast that can be utilized in molecular biology research.
e. Fungi produce many industrial chemicals and pharmaceuticals.

slide 38:

8. What mechanism does yeast utilize to control mating type in the cells?
a. Yeast is only able to reproduce through mitosis.
b. The MAT locus in the yeast genome contains two divergent genes that encode the pheromones a and α, along with the pheromone receptors.
c. The mating type of yeast is determined by pheromones called b and β.
d. There are no mechanisms to control mating type in yeast because all of the cells are structurally the same.
e. Yeast mating types are generally referred to as either male or female.

9. Which of the following yeast cellular components is typically not found in bacteria?
a. centromeres
b. telomeres
c. nuclear pores
d. nuclear envelope
e. All of the above are found in yeast and not bacteria.

10. Identify the statement about multicellular model organisms that is correct.
a. C. elegans has been used extensively to study multicellular interactions, partly because the creature can reproduce by self-fertilization (genetic clones) or sexually (novel genetic organisms).
b. Based on homology, research on Drosophila mutants has identified genes in the human genome responsible for body patterns.
c. The zebrafish, or Danio rerio, is used to study developmental genetics because the embryonic cells are easily destroyed or manipulated and the effects can be observed within 24 hours.
d. The mouse is a model organism for studying human genetics, physiology, and development because less than 1% of the genes in the mouse genome have no genetic homology in humans.
e. All of the statements are correct.

11. What is the main advantage of studying cells in culture rather than in a whole organism?
a. Cell lines in culture are easily manipulated genetically to introduce new genes or delete other genes.
b. Cell lines are not very stable, and therefore it is more advantageous to study cells within the organism itself.
c. There is no advantage to studying cells in a cell line rather than in a live organism.
d. The information obtained from studying cell lines, as opposed to live organisms, is not relevant to what happens in vivo.
e. None of the above is the main advantage.

12. Why is Arabidopsis thaliana used as a model organism for plant genetics and biology?
a. Arabidopsis responds to stress and disease similarly to important crop plants such as rice, wheat, and corn.
b. Arabidopsis is easy to grow and maintain in the laboratory.
c. The genome of Arabidopsis is relatively small compared to other plants.
d. The generation cycle of Arabidopsis is shorter than that of most crop plants, and the plant produces many seeds for further study.
e. All of the above statements are reasons for using Arabidopsis as a model organism.

slide 39:

13. Why are viruses significant to biotechnology?
a. They are able to insert their genome into the host genome, thus integrating genes in the process.
b. Viruses can be used to alter the genomes of other organisms.
c. Reverse transcriptase, an enzyme used in molecular biology, is encoded in a retroviral genome.
d. Viruses play an important role in delivering gene therapy to humans.
e. All of the above statements are reasons why viruses are significant to biotechnology research.

14. Which statement best describes the F plasmid?
a. F plasmids contain genes for formation of a specialized pilus that initiates the formation of a conjugation bridge between two cells for the purpose of transferring genetic material.
b. The F plasmid does not have an origin of replication and can therefore not replicate itself.
c. The primary host for the F plasmid is Saccharomyces cerevisiae.
d. The F plasmid is not important for biotechnology research.
e. All of the above statements describe the F plasmid.

15. Which of the following elements is important in biotechnology research?
a. transposons
b. F plasmid
c. satellite viruses
d. plasmids
e. all of the above

Further Reading

Ablain, J., Zon, L. I. (2013). Of fish and men: using zebrafish to fight human diseases. Trends in Cell Biology 23, 584–586.
Steele, J. H., Lutz, R. A. (2001). Hydrothermal vent biota. In Encyclopedia of Ocean Sciences, 2nd ed. Amsterdam: Academic Press, pp. 133–143.
Verma, A. S., Singh, A., Tsuiji, H., Yamanaka, K. (2014). Animal models for neurodegenerative disorders. In Animal Biotechnology: Models in Discovery and Translation. Amsterdam: Academic Press, pp. 39–56.
Verma, A. S., Singh, A., Ram, K. R., Chowdhuri, D. K. (2014). Drosophila: A model for biotechnologists. In Animal Biotechnology: Models in Discovery and Translation. Amsterdam: Academic Press, pp. 3–19.

slide 40:

CHAPTER 2
DNA, RNA, and Protein

The Central Dogma of Molecular Biology
Transcription Expresses Genes
Making RNA
Transcription Stop Signals
The Number of Genes on an mRNA Varies
Eukaryotic Transcription Is More Complex
Regulation of Transcription in Prokaryotes
Prokaryotic Sigma Factors Regulate Gene Expression
Lactose Operon Demonstrates Specific and Global Activation
Control of Activators and Repressors
Regulation of Transcription in Eukaryotes
Eukaryotic Transcription Enhancer Proteins
Epigenetics and DNA Regulation
Eukaryotic mRNA Is Processed Before Making Protein
Translating the Genetic Code into Proteins
The Genetic Code Is Read as Triplets or Codons
Protein Synthesis Occurs at the Ribosome
Differences between Prokaryotic and Eukaryotic Translation
Mitochondria and Chloroplasts Synthesize Their Own Proteins

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00002-8

slide 41:

THE CENTRAL DOGMA OF MOLECULAR BIOLOGY
Two essential features of living creatures are the ability to reproduce their own genome and to manufacture their own energy. To accomplish these feats, an organism must be able to make proteins using information encoded in its DNA. Proteins are essential for cellular architecture, giving the cell a particular shape and structure. Proteins include enzymes that catalyze reactions used to make energy. Proteins control cellular processes such as replication. Proteins provide channels in the membrane for cells to communicate with each other or share metabolites. Making proteins is a key operation for all living organisms.

The central dogma of molecular biology states that information flows from DNA to RNA to protein (Fig. 2.1). First, this chapter focuses on how RNA is made from DNA in a process called transcription. Next, the mechanisms used to control transcription are discussed. We then discuss how particular RNA molecules, called mRNA or messenger RNA, are used to make protein in a process called translation. By examining these processes, the reader will gain an understanding of the complexity involved in engineering cells for the purposes of biotechnology.

The central dogma of molecular biology is that DNA is transcribed into RNA, which in turn is translated into proteins.

FIGURE 2.1 The Central Dogma
Cells store genetic information as DNA, which is able to replicate so that daughter cells have the same information as the parent. When a protein is needed, DNA is transcribed into RNA, which in turn is translated into a protein. RNA also functions in a cell to regulate gene expression and as ribozymes that carry out catalytic reactions.
Figure labels: DNA (double-stranded), RNA (single-stranded), protein (linked amino acids); replication of DNA, transcription, translation.

TRANSCRIPTION EXPRESSES GENES
Gene expression involves making an RNA copy of the DNA code, a process called transcription.
Making RNA involves uncoiling the DNA, melting the strands at the start of the gene and moving any histones out of the way, making an RNA molecule that is complementary in sequence to the template strand of the DNA with an enzyme called RNA polymerase, and stopping at the end of the gene. The newly made RNA is released from the DNA, which then returns to its supercoiled form.

Two long-standing questions in biology are how a cell turns genes on and off, and which genes are transcribed at what time in development or function. These questions have multiple answers that depend on the type of gene. Housekeeping genes encode proteins that are used continually. Inducible genes are expressed only under certain circumstances. For instance, in Escherichia coli, genes that encode proteins involved in the utilization of lactose are expressed only when lactose is present (see later discussion). The same principle applies to the genes for using other nutrients. Various inducers and accessory proteins control whether or not these genes are expressed, that is, made into RNA; they are discussed in more detail in upcoming sections.

The final product encoded by a gene is often a protein but may also be RNA. Genes that encode proteins are transcribed to give messenger RNA (mRNA), which is then translated to give the protein. Other RNA molecules, such as tRNA, rRNA, snRNA, and other regulatory noncoding RNAs, are used directly; that is, they are not translated to make proteins. Some RNA molecules, called ribozymes, catalyze enzymatic reactions. One well-researched ribozyme is an rRNA found in the large subunit of the ribosome (see Chapter 5). The genes that ultimately code for a protein via an mRNA intermediate are studied most often, since they have historically been thought to be the most important to the function of the organism. The coding region of a gene, sometimes called a cistron or a structural gene, has the code to make a

slide 42:

protein or a nontranslated RNA. The term cistron was originally defined by genetic complementation using the cis/trans test. In contrast, an open reading frame (ORF) is a stretch of DNA (or the corresponding RNA) that encodes a protein and therefore is not interrupted by any stop codons for protein translation (see later discussion).

During transcription, the enzymes that make RNA must identify the start site within the DNA sequence. Every gene has a region upstream of the coding sequence called a promoter (Fig. 2.2). RNA polymerase recognizes this region and starts transcription here. Bacterial promoters have two major recognition sites: the −10 and −35 regions. The numbers refer to their approximate location upstream of, or before, the transcriptional start site. By convention, positive numbers refer to nucleotides downstream of (after) the transcription start site, and negative numbers refer to those upstream (before). The exact sequences at −10 and −35 vary, but the consensus sequences are TATAAT and TTGACA, respectively. When a gene is transcribed continually, or constitutively, the promoter sequence closely matches the consensus sequence. If the gene is expressed only under special conditions, activator proteins or transcription factors must bind to the promoter region before RNA polymerase will recognize it. Such promoters rarely look like the consensus.

The transcription start site lies just after the promoter and marks where RNA polymerase starts adding nucleotides complementary to the template strand. Between the transcription start site and the ORF is a region called the 5′ untranslated region (5′ UTR). This region is not made into protein but contains translation regulatory elements such as the ribosome binding site. Next is the ORF, where no translational stop codons are found. After the ORF there is another untranslated region, known as the 3′ untranslated region (3′ UTR).
This region is not made into protein either and often contains important regulatory elements that modulate the rate of translation. Finally, transcription stops at the termination sequence.

Bacterial RNA polymerase is made of different protein subunits. The sigma subunit recognizes the −10 and −35 regions, and the core enzyme catalyzes RNA synthesis. RNA polymerase adds nucleotides only in the 5′ to 3′ direction. The core enzyme has five protein subunits: a dimer of two α proteins, a β protein, a related β′ subunit, and an ω subunit. The β and β′ subunits form the catalytic site, and the α subunit helps recognize the promoter. The 3D structure of RNA polymerase shows a deep groove that can hold the template DNA and a minor groove to hold the growing RNA.

FIGURE 2.2 The Structure of a Typical Gene
Genes are regions of DNA that are transcribed to give RNA. RNA can be translated into protein or used directly. The gene has a promoter region plus transcriptional start and stop points that flank the region that is converted into mRNA. After transcription, the mRNA has a 5′ untranslated region (5′ UTR) and a 3′ untranslated region (3′ UTR), which are not translated; only the ORF is translated into protein.
Figure labels: DNA (promoter, transcription start, 5′ UTR, ORF, 3′ UTR, transcription stop); mRNA (5′ UTR, ORF, 3′ UTR); protein; transcription, translation.

Genes have a transcriptional promoter where RNA polymerase attaches to the DNA and begins making an RNA copy of the template strand. The RNA has three regions: the 5′ UTR contains information important for making the protein, the ORF has the actual coding region translated into amino acids during translation, and the 3′ UTR contains other important regulatory elements.

MAKING RNA
In bacteria, once the sigma subunit of RNA polymerase recognizes the −10 and −35 regions, the core enzyme forms a transcription bubble where the two DNA strands are separated from each other (Fig. 2.3).
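How closely a promoter resembles the −10 and −35 consensus can be made concrete by counting mismatches. The following is only a minimal sketch: it assumes the commonly cited six-base consensus strings TTGACA (−35) and TATAAT (−10), the candidate sites are invented for illustration, and real sigma-factor recognition also depends on spacing and flanking sequence, which are ignored here.

```python
# Illustrative sketch: consensus strings and example sites are assumptions,
# not data taken from the chapter.
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"


def mismatches(site: str, consensus: str) -> int:
    """Count positions where a candidate site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site.upper(), consensus))


def promoter_score(minus_35: str, minus_10: str) -> int:
    """Total mismatches over both sites; 0 means a perfect consensus match."""
    return (mismatches(minus_35, MINUS_35_CONSENSUS)
            + mismatches(minus_10, MINUS_10_CONSENSUS))


# A hypothetical constitutive promoter matches the consensus exactly,
# while a hypothetical regulated promoter deviates at several positions.
strong = promoter_score("TTGACA", "TATAAT")
weak = promoter_score("TTTACA", "TATGTT")
print(strong, weak)
```

A low score corresponds to the constitutive promoters described above; a high score corresponds to promoters that need an activator before RNA polymerase recognizes them.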
The strand used by RNA polymerase is called the template strand (also known as the noncoding or antisense strand) and is complementary to the resulting mRNA. The core

slide 43:

enzyme adds RNA nucleotides in the 5′ to 3′ direction based on the sequence of the template strand of DNA. The newly made RNA anneals to the template strand of the DNA via hydrogen bonds between base pairs. The opposite strand of DNA is called the coding strand (also known as the nontemplate or sense strand). Because the coding strand is complementary to the template strand, its sequence is identical to the RNA except for the replacement of thymine with uracil in RNA.

RNA synthesis normally starts at a purine (normally an A) in the DNA that is flanked by two pyrimidines. The most typical start sequence is CAT, but sometimes the A is replaced with a G. The rate of elongation is about 40 nucleotides per second, which is much slower than replication (∼1000 bp/sec). RNA polymerase unwinds the DNA and creates positive supercoils as it travels down the DNA strand. Behind RNA polymerase, the DNA is partially unwound and has surplus negative supercoils. DNA gyrase and topoisomerase I insert or remove negative supercoils, respectively, returning the DNA to its normal level of supercoiling (see Chapter 4).

RNA polymerase makes a copy of the gene using the noncoding, or template, strand of DNA. RNA has uracils instead of thymines.

FIGURE 2.3 RNA Polymerase Synthesizes RNA at the Transcription Bubble
RNA polymerase is a complex enzyme that can hold a stretch of double-stranded DNA open to form a transcription bubble and add ribonucleotides to create RNA complementary to the template strand.
Figure labels: transcription bubble, growing mRNA chain, RNA polymerase, direction of synthesis.

TRANSCRIPTION STOP SIGNALS
RNA polymerase continues transcribing DNA until it reaches a termination signal. In bacteria, the Rho-independent terminator is a region of DNA with two inverted repeats separated by about six bases, followed by a stretch of As. As RNA polymerase transcribes these sequences, the two inverted repeats in the RNA form a hairpin structure.
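The relationship between the template strand, the coding strand, and the mRNA described under MAKING RNA can be sketched in a few lines. This is an illustration under stated assumptions, not a model of the enzyme: the nine-base template is invented, it is written 3′ to 5′ so that the RNA reads 5′ to 3′, and the elongation estimate simply divides length by the roughly 40 nucleotides per second quoted above.

```python
# Base-pairing rules: DNA template base -> RNA base, and DNA base -> DNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def coding_strand(template_3_to_5: str) -> str:
    """Return the coding (sense) strand, 5'->3', paired with the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5)


def elongation_seconds(length_nt: int, rate_nt_per_s: float = 40.0) -> float:
    """Rough transcription time at a constant elongation rate."""
    return length_nt / rate_nt_per_s


template = "TACGGCATT"            # hypothetical template, written 3'->5'
mrna = transcribe(template)
coding = coding_strand(template)

# The coding strand equals the mRNA apart from T in place of U.
print(mrna, coding, coding.replace("T", "U") == mrna)
print(elongation_seconds(1200))   # a hypothetical 1200-nt transcript
```

The final comparison is the point of the sketch: reading the coding strand and swapping thymine for uracil gives the same sequence as transcribing the template.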
The secondary structure causes RNA polymerase to pause. As the stretch of As is transcribed into Us, the DNA/RNA hybrid molecule becomes unstable (A/U base pairs have only two hydrogen bonds). RNA polymerase "stutters" and then falls off the template strand of DNA in the middle of the As.

Bacteria also have Rho-dependent terminators, which have two inverted repeats but lack the string of As. Rho (ρ) protein is a special helicase that unwinds DNA/RNA hybrid double helices. Rho binds upstream of the termination site in a region containing many cytosines. After RNA polymerase passes the Rho binding site, Rho attaches to the RNA and moves along the RNA transcript until it catches RNA polymerase at the hairpin structure. Rho then unwinds the DNA/RNA helix and separates the two strands. The RNA is then released.

Transcription terminates either in a Rho-independent manner or in a Rho-dependent manner.

THE NUMBER OF GENES ON AN mRNA VARIES
Bacterial and eukaryotic chromosomes are organized very differently. In prokaryotes, the distance between genes is much smaller, and genes associated with one metabolic pathway are often found next to each other. For example, the lactose operon contains several clustered

slide 44:

genes for lactose metabolism. Operons are clusters of genes that share the same promoter and are transcribed as a single large mRNA that contains multiple structural genes, or cistrons. Thus, the mRNA transcripts are called polycistronic mRNA (Fig. 2.4). The multiple cistrons are translated individually to give separate proteins. In eukaryotes, genes are often separated by large stretches of DNA that do not encode any protein. In eukaryotes, each mRNA has only one cistron and is therefore called monocistronic mRNA. If a polycistronic transcript is expressed in eukaryotes, the ribosome translates only the first cistron, and the other encoded proteins are not made.

Bacterial mRNA transcripts have multiple open reading frames for proteins in the same metabolic pathway. Eukaryotes tend to have only one open reading frame in a single mRNA transcript.

EUKARYOTIC TRANSCRIPTION IS MORE COMPLEX
There are several differences between eukaryotic and prokaryotic transcription, with more complexity associated with eukaryotic transcription. The simple fact that eukaryotic mRNA is synthesized in a nucleus makes the process more involved than bacterial transcription, but this is only one of the differences. In contrast to the single RNA polymerase in prokaryotes, eukaryotes have three different RNA polymerases that each transcribe different types of genes. RNA polymerase I transcribes the eukaryotic genes for large ribosomal RNA. These two rRNAs are transcribed as one long precursor RNA that is cleaved into two different transcripts: the 18S rRNA and the 28S rRNA. These are used directly and not translated into protein. RNA polymerase III transcribes the genes for tRNA, 5S rRNA, and other small RNA molecules. RNA polymerase II transcribes the genes that encode proteins and has been studied the most.

Starting transcription of eukaryotic genes is more complex than in bacteria. The layout of the eukaryotic promoter is quite different.
RNA polymerase II needs three different regions: the initiator box, the TATA box, and various upstream elements that bind proteins known as transcription factors. The initiator box is the site where transcription starts and is separated by about 25 base pairs from the TATA box. The upstream elements vary from gene to gene and help control which proteins are expressed at what time.

Many proteins are involved in positioning eukaryotic RNA polymerase II at the transcriptional start site (Fig. 2.5 and Table 2.1). RNA polymerase II requires several general transcription factors to initiate transcription at all promoters. In addition, specific transcription factors are needed that vary depending on the particular gene (see later discussion). The TATA binding protein, or TATA box protein (TBP), recognizes the TATA box. This factor is used by all three RNA polymerases in eukaryotes. For RNA polymerase II, TBP is found with other proteins in a complex called TFIID. For the other RNA polymerases, TBP associates with different proteins. After this complex binds, TFIIB binds to

FIGURE 2.4 Monocistronic versus Polycistronic
Eukaryotes transcribe genes in single units, where each mRNA encodes only one protein. Prokaryotes transcribe genes in operons as one single mRNA and then translate the proteins as separate units.
Figure labels: EUKARYOTES (DNA with promoter and structural gene, monocistronic mRNA, single protein); PROKARYOTES (operon DNA with promoter and structural genes, polycistronic mRNA, several proteins); transcription, translation.

slide 45:

the promoter, which then triggers the binding of RNA polymerase II and TFIIA. RNA polymerase is associated with TFIIF, which probably helps it bind to the promoter. Once RNA polymerase II has bound to the promoter, it still requires TFIIE, TFIIH, and TFIIJ to initiate transcription. In particular, TFIIH phosphorylates the tail of RNA polymerase II, which allows it to move along the DNA. As RNA polymerase II leaves the promoter, it leaves behind all of the general complexes except TFIIH.

Bacterial RNA polymerase can function with a promoter containing no upstream elements. However, in eukaryotes, the upstream elements are essential to RNA polymerase II function, and a promoter with no upstream elements is extremely inefficient at initiating transcription. These elements are from 50 to 200 base pairs in length and vary based on the gene being expressed. They bind regulatory proteins known as specific transcription factors, as opposed to the general transcription factors shared by all promoters that use RNA polymerase II. For example, the specific transcription factors Oct-1 and Oct-2 bind only to the octamer elements. Oct-1 is found in all tissues, whereas Oct-2 is found only in immune cells. A plethora of specific factors exists, which is beyond the scope of this discussion.

FIGURE 2.5 Eukaryotic Transcription
Many different general transcription factors help RNA polymerase II find the TATA and initiator box region of a eukaryotic promoter. Specific transcription factors bind to upstream control elements and transmit the activation signal to the general transcription factors and RNA polymerase II.
Figure 2.5 labels: upstream control element, TATA box, initiator box, transcription start site (+1), gene; TFIIA, TFIIB, TFIID, TFIIF, TFIIH, TFIIJ, RNA pol II (with clamp); a specific transcription factor with its DNA binding domain and activator domain.

Table 2.1 General Transcription Factors for RNA Polymerase II
Factor   Function
TBP      Binds to TATA box; part of TFIID
TFIID    Includes TBP; recognizes Pol II specific promoter
TFIIA    Binds upstream of TATA box; required for binding of RNA Pol II to promoter
TFIIB    Binds downstream of TATA box; required for binding of RNA Pol II to promoter
TFIIF    Accompanies RNA Pol II as it binds to promoter
TFIIE    Required for promoter clearance and elongation
TFIIH    Phosphorylates the tail of RNA Pol II; retained by polymerase during elongation
TFIIJ    Required for promoter clearance and elongation

Eukaryotes have three different RNA polymerases that transcribe different genes. RNA polymerase II binds to the TATA and initiator boxes of the promoter region of protein-encoding genes. Different general transcription factors facilitate RNA polymerase II binding. Eukaryotes also require specific transcription factors to initiate gene transcription, and there is a large number of different specific transcription factors.

REGULATION OF TRANSCRIPTION IN PROKARYOTES
In prokaryotes, various activator and repressor proteins control which genes are transcribed into mRNA. The activators and repressors work by binding to DNA in the promoter region and either stimulating or blocking the action of bacterial RNA polymerase. In E. coli, about 1000 of the 4000 total genes are expressed at one time. Activator proteins work by positive regulation; in other words, genes are expressed only when the activator gives a positive signal. In contrast, repressors work by negative regulation. Here the gene is expressed only

slide 46:

when the repressor is removed. Some repressors block RNA polymerase from binding to the DNA; others prevent initiation of transcription even though RNA polymerase has bound.

Regulation of transcription is complex even in simple prokaryotes. Many genes are controlled by a variety of factors. Some operons in bacteria have multiple repressors and activators. Less often, regulatory proteins may block elongation, either by slowing the actual rate of elongation or by signaling premature termination. Conversely, a few antiterminator proteins are known that override termination and allow genes downstream of the termination site to be expressed.

Prokaryotes use positive regulation, where activator proteins signal RNA polymerase to transcribe the gene, or negative regulation, where the transcription factor inhibits RNA polymerase.

Prokaryotic Sigma Factors Regulate Gene Expression
Prokaryotic RNA polymerase includes a sigma (σ) subunit, which recognizes the promoter first and binds the catalytic portion of the enzyme, the core enzyme. There are many different sigma subunits, and each one recognizes a different set of genes. The σ70 subunit, or RpoD, is the most commonly used form. It recognizes most of the housekeeping genes in E. coli. During the stationary phase, when E. coli is not growing rapidly, σ38, or RpoS, activates the necessary genes. Sigma subunits are named either by σ plus their molecular weight or by Rpo (for RNA polymerase) plus their function: D (default), S (stationary), and so on.

Another sigma factor, RpoH or σ32, activates genes needed during heat shock. Normally, E. coli grows at body temperature (37°C) and stops growing at temperatures much above 43°C. At such higher temperatures, proteins begin to unfold and are degraded. RpoH activates expression of chaperonins that help proteins fold correctly and prevent aggregation. RpoH also activates proteases that degrade proteins too damaged by the heat to be saved.
The transcription and translation of RpoH depend on temperature. When E. coli grows at a normal temperature, very few misfolded proteins are present. DnaK, a chaperonin, and HflB, a protease, are found in the cytoplasm, but since there are few misfolded proteins for them to act on, they bind to RpoH and degrade it. They even degrade partially translated RpoH protein. When high temperatures promote unfolding and aggregation of proteins, DnaK and HflB bind to the aberrant proteins and no longer destroy RpoH. The sigma factor then initiates transcription of other genes associated with heat shock.

Sigma (σ) subunits are transcription factors that associate with prokaryotic RNA polymerase and control which genes are transcribed.

Lactose Operon Demonstrates Specific and Global Activation
Many genes require specific regulator proteins to activate RNA polymerase binding and transcription. Some of these proteins exist in two forms: active (binds to DNA in the promoter region) and inactive (nonbinding). The forms are interconverted by small signal molecules, or inducers, that alter the shape of the protein. For example, the inducer allo-lactose controls the regulator protein for the lactose operon.

The lactose, or lac, operon is well characterized genetically, and the importance of each of the DNA elements in the promoter region has been studied. The promoter region controls three structural genes: lacZ, lacY, and lacA. The lacZYA genes are transcribed as a polycistronic message. Upstream of the promoter, and transcribed in the opposite direction from the lacZYA

slide 47:

region, is another gene, which encodes the LacI protein, the lac operon repressor (Fig. 2.6). The lacZ gene encodes β-galactosidase, which cleaves the disaccharide lactose into galactose and glucose. The lacY gene encodes lactose permease, which transports lactose across the cytoplasmic membrane into the bacteria. Finally, the lacA gene encodes the protein lactose acetylase, with an unknown role.

The promoter has a binding site (lacO) for the repressor protein, which overlaps the binding site for RNA polymerase. This region is also known as the operator, and when the repressor binds, RNA polymerase cannot transcribe the operon. There is also a binding site for Crp (cyclic AMP receptor protein), also known as CAP (catabolite activator protein). This global regulator activates transcription of many different operons for using alternate sugar sources. It is active when E. coli does not have glucose to utilize as an energy source.

The environment controls whether or not the lactose operon is expressed (Fig. 2.7). When E. coli has plenty of glucose, the lactose operon is turned off, as are the operons for other sugars such as maltose or fructose. When glucose is present, levels of a small inducer, cyclic AMP (cAMP), are low. If E. coli consumes all the available glucose, the levels of cAMP increase. cAMP binds to Crp, the global regulator, which then dimerizes so that it can bind to the Crp sites in various promoters, such as that of the lactose operon. Crp binding alone will not activate the lactose operon; lactose must also be present to activate transcription. If lactose is available, β-galactosidase converts some lactose into allo-lactose. This acts as an inducer and binds to the tetrameric LacI repressor protein, which releases the repressor from the promoter. The lactose operon is expressed only when glucose levels are low and lactose is present.
The control relies on two inducer molecules: cAMP binds to the global activator, Crp, and allo-lactose binds to the specific repressor, LacI. One control is global (Crp) because it controls many different operons, and one control is specific (LacI) because it regulates only the lactose operon.

Many researchers use the lactose promoter to control expression of other genes. In the lab, a gratuitous inducer, IPTG (isopropyl-thiogalactoside), replaces allo-lactose (Fig. 2.8). IPTG is not cleaved by β-galactosidase because its two halves are linked through a sulfur rather than an oxygen. Since it is not metabolized, IPTG does not have to be added continually throughout the experiment, as would be the case for allo-lactose.

FIGURE 2.6 Components of the lac Operon
The lac operon consists of three structural genes, lacZYA, which are all transcribed from a single promoter, designated lacP. The promoter is regulated by binding of the repressor at the operator, lacO, and of Crp protein at the Crp site. Note that in reality the operator partly overlaps both the promoter and the lacZ structural gene. The single lac mRNA is translated to produce the LacZ, LacY, and LacA proteins. The lacI gene that encodes the LacI repressor has its own promoter and is transcribed in the opposite direction from the lacZYA operon.
Figure labels: regulatory region (promoter for lacI, lacI structural gene, Crp site, lacP, lacO, +1); structural genes lacZ, lacY, lacA; terminator; mRNA; proteins LacI, LacZ, LacY, LacA; transcription, translation.

The lac operon is important to understand because its inducers and regulators are used to control new genes that are engineered into model organisms.

Control of Activators and Repressors
Various mechanisms control gene activators and repressors. In some cases, the repressor or activator binds to the promoter of its own gene and controls its own transcription; this is called autogenous regulation. Many activators and repressors rely on activation by small molecules, as for Crp and LacI.
In some cases, a repressor needs a co-repressor in order to be active. For example, ArgR represses the arginine biosynthetic operon when arginine is present. Arginine is a co-repressor and ensures that the bacteria do not make the amino acid when it is not needed.

slide 48:

In many cases, activators or repressors are covalently modified by the addition of groups such as phosphate, methyl, acetyl, AMP, and ADP-ribose. The two-component regulatory systems of bacteria transfer phosphate groups from a sensor protein to a regulator protein (Fig. 2.9). The first protein, the sensor kinase, senses a change in the environment and changes shape. This causes the kinase to phosphorylate itself using ATP. The phosphate group is then transferred to the regulator protein (an activator or repressor), which changes shape to its DNA binding form. The phosphorylated regulator then binds to its recognition site in the target promoter. This either stimulates or represses transcription of the operon.

FIGURE 2.7 Control of Lactose Operon
The lactose operon is transcribed into a polycistronic mRNA only when glucose is absent and lactose is present. When glucose is available, the global activator protein Crp does not activate binding of RNA polymerase. When there is no glucose, Crp binds to the promoter and stimulates RNA polymerase to bind. The lack of lactose keeps LacI protein bound to the operator site and prevents RNA polymerase from transcribing the operon. Only when lactose is present is LacI released from the DNA.
Figure panels (state of the lactose operon): no glucose, lactose present: ON; glucose present, lactose present: OFF; glucose present, no lactose: OFF; no glucose, no lactose: OFF.
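The four conditions in Figure 2.7 amount to a small piece of Boolean logic, which can be sketched as follows. This is a deliberately simplified, all-or-nothing model: the function and variable names are invented for illustration, and real expression levels are graded rather than simply on or off.

```python
def lac_operon_on(glucose_present: bool, lactose_present: bool) -> bool:
    """Simplified on/off model of lac operon control."""
    # Low glucose -> high cAMP -> cAMP-bound Crp occupies the Crp site.
    crp_bound = not glucose_present
    # No lactose -> no allo-lactose -> LacI stays bound to the operator.
    laci_bound = not lactose_present
    # Transcription needs the global activator present and the
    # specific repressor removed.
    return crp_bound and not laci_bound


# Print the four conditions, mirroring the panels of Figure 2.7.
for glucose in (False, True):
    for lactose in (False, True):
        state = "ON" if lac_operon_on(glucose, lactose) else "OFF"
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {state}")
```

Only one of the four combinations switches the operon on, which is why the global control (Crp) and the specific control (LacI) have to be considered together.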

slide 49:

There are many different two-component systems in bacteria that respond to a variety of environmental conditions. For example, when there is low oxygen, the ArcAB system modifies gene expression to compensate. The ArcB protein is the sensor kinase, and it has three phosphorylation sites. The ArcA regulator has only one site for phosphorylation. The phosphate group transfers from one site to the next in a phosphorelay system, ultimately regulating transcription of the genes. These types of phosphorelays are very common, particularly in eukaryotes, where there are often more than two components.

Phosphorylation, co-repressors, and co-activators modulate prokaryotic gene activators and repressors of transcription to express genes only in appropriate conditions.

FIGURE 2.8 Structures of Lactose, allo-Lactose, and IPTG
IPTG is a nonmetabolizable analog of the lactose operon inducer, allo-lactose. β-galactosidase cannot break the sulfur linkage and therefore does not cleave IPTG in two.

REGULATION OF TRANSCRIPTION IN EUKARYOTES
Just as the initiation of transcription is more complex in eukaryotes, so is its control. The mechanisms that regulate which gene is expressed at what time are very complicated. The fact that eukaryotic DNA wraps around histones hinders many proteins from binding to the DNA, delaying access by activators and repressors. In addition, the nuclear membrane prevents the access of most proteins to the nucleus. Consider the complexity of the human body with its multiple tissues. Each gene in each cell needs to be expressed only when needed and only in the amount needed. In addition to normal organ functions and to changes during development, the environment has a huge impact on our bodies, and changes in gene expression help us adapt. Overall, the numbers and types of controls for
Figure 2.8 structures shown: lactose (galactose linked to glucose), allo-lactose, and isopropyl-β-D-thiogalactoside (IPTG).

FIGURE 2.9 Model of Two-Component Regulatory System
The two-component regulatory system includes a membrane component, the sensor kinase, and a cytoplasmic component, the response regulator. Outside the cell, the sensor domain of the kinase detects an environmental change, which leads to phosphorylation of the transmitter domain. The response regulator protein receives the phosphate group and consequently changes configuration to bind the DNA.
Figure labels: sensor kinase (sensor domain, transmitter domain), ATP, ADP, phosphate (P), response regulator, DNA binding form of response regulator, DNA.
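The sequence of events in the two-component model of Figure 2.9 can be followed step by step in a toy state model. The sketch below is only an illustration: the class and method names are invented, each protein carries a single phosphate flag, and the multiple phosphorylation sites of a real phosphorelay such as ArcAB are not represented.

```python
class ResponseRegulator:
    """Cytoplasmic regulator; binds DNA only in its phosphorylated form."""

    def __init__(self) -> None:
        self.phosphorylated = False

    def binds_dna(self) -> bool:
        return self.phosphorylated


class SensorKinase:
    """Membrane sensor that autophosphorylates when it detects a signal."""

    def __init__(self) -> None:
        self.phosphorylated = False

    def sense(self, signal_present: bool) -> None:
        # An environmental change triggers autophosphorylation (ATP -> ADP).
        if signal_present:
            self.phosphorylated = True

    def transfer_phosphate(self, regulator: ResponseRegulator) -> None:
        # The phosphate group is handed from the kinase to the regulator.
        if self.phosphorylated:
            self.phosphorylated = False
            regulator.phosphorylated = True


kinase = SensorKinase()
regulator = ResponseRegulator()

kinase.transfer_phosphate(regulator)   # no signal yet, nothing to transfer
before = regulator.binds_dna()

kinase.sense(signal_present=True)      # hypothetical signal, e.g. low oxygen
kinase.transfer_phosphate(regulator)
after = regulator.binds_dna()

print(before, after)
```

Whether the phosphorylated regulator then stimulates or represses transcription depends on the particular system, so that step is left out of the model.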

slide 50:

gene expression are staggering. Other eukaryotes, such as mice, rats, Arabidopsis, C. elegans, and even the relatively simple yeast, have similarly complex control systems.

There are many different transcription factors for eukaryotic genes, yet they all have at least two domains: one binds to DNA, and the other binds to some part of the transcription apparatus. The two domains are connected, yet may function when separated from each other (Fig. 2.10). If the DNA binding domain of one transcriptional regulator is connected to the activation domain of another, the hybrid protein will work, each part retaining its original characteristics. That is, the DNA binding domain will bind the same sequence as it did before, and the activation domain will activate transcription as before. This property can be exploited to identify protein-to-protein interactions of newly characterized proteins in the yeast two-hybrid screen (see Chapter 9).

Transcription factors work via an assembly of many proteins called the mediator complex (Fig. 2.11A). These proteins receive the signals from each of the activator proteins, compile the message, and transmit it to RNA polymerase II. The mediator contains 26 different subunits, most of which make up the core. The presence of other proteins may vary depending on the cell or organism. These accessory proteins were originally thought to be co-activators or co-repressors, since their presence varies based on tissue.

The mediator complex sits directly on RNA polymerase II, waiting for information from activators or repressors. These may bind to regions just upstream of RNA polymerase II and the mediator complex. However, eukaryotic transcription factors may also bind to DNA sequences known as enhancers that may be thousands of base pairs away from the promoter. Even so, the regulatory proteins bind directly to the mediator complex.
The enhancer elements work in either orientation but affect only genes that are in the general vicinity. The prevailing theory is that the DNA loops around so that the enhancer is brought near the promoter. GAL4 site Promoter DNA LexA site Promoter DNA GAL4 site Promoter DNA LexA site Promoter DNA GAL4 site Promoter DNA LexA site Promoter DNA A Binding and transcription Natural GAL4 protein Hybrid protein D Binding and transcription B No binding C No binding GAL4 DNA- binding GAL4 activator LexA DNA- binding GAL4 activator TFIIF TFIID TFII H TFII B TFIIA RNA pol II TFIIJ TFIIF TFIID TFII H TFII B TFIIA RNA pol II TFIIJ Clamp CTD Clamp CTD FIGURE 2.10 Transcription Factors Have Two Independent Domains A One domain of the GAL4 transcription factor normally binds to the GAL4 DNA recognition sequence and the other binds the transcription apparatus. B If the LexA site on the DNA is substituted for GAL4 site the transcription factor does not recognize or bind the DNA. C An artifcial protein made by combining a LexA binding domain with a GAL4 activator domain will not recognize the GAL4 site on the DNA but D will bind to the LexA recognition sequence and activate transcription. Thus the GAL4 activator domain acts independently of any particular recognition sequence.
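The domain-swap logic of Fig. 2.10 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class `TranscriptionFactor`, the functions `activates` and `make_hybrid`, and the idea of representing a recognition sequence by a plain label are all hypothetical names introduced for illustration, and LexA is modeled simply as a DNA-binding protein with no activation domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionFactor:
    """Toy two-domain regulator (illustrative names only)."""
    name: str
    binds_site: str       # label of the DNA site recognized by the DNA-binding domain
    has_activator: bool   # True if an activation domain is present


def activates(tf: TranscriptionFactor, site_at_promoter: str) -> bool:
    """Transcription requires binding at the site AND an activation domain."""
    return tf.binds_site == site_at_promoter and tf.has_activator


def make_hybrid(binding_donor: TranscriptionFactor,
                activation_donor: TranscriptionFactor) -> TranscriptionFactor:
    """Fuse the DNA-binding domain of one protein to the activation domain of another."""
    return TranscriptionFactor(
        name=f"{binding_donor.name}-{activation_donor.name} hybrid",
        binds_site=binding_donor.binds_site,
        has_activator=activation_donor.has_activator,
    )


gal4 = TranscriptionFactor("GAL4", binds_site="GAL4 site", has_activator=True)
# Assumption: LexA contributes only a DNA-binding domain in this model.
lexa = TranscriptionFactor("LexA", binds_site="LexA site", has_activator=False)
hybrid = make_hybrid(binding_donor=lexa, activation_donor=gal4)

for tf in (gal4, hybrid):
    for site in ("GAL4 site", "LexA site"):
        print(f"{tf.name:16s} at {site}: transcription = {activates(tf, site)}")
```

The four printed combinations correspond to panels A through D of the figure: each domain keeps its own behavior regardless of which protein it came from.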

slide 51:

FIGURE 2.11 Enhancer and Insulator Sequences. (A) Enhancer elements are found many hundreds of base pairs from the gene they control. They bind specific proteins that interact with the mediator complex by looping the DNA around. (B) Insulator-binding protein (IBP) connects DNA at the insulator-binding sequence to form large loops of DNA. This arrangement keeps the correct enhancers associated with the correct genes.
[Figure labels: (A) enhancers, spacer DNA, specific transcription factor with DNA-binding domain and activation domain, mediator complex, upstream control element, TATA box, initiator box, TFIIA, TFIIB, TFIID, TFIIH, TFIIJ, clamp, CTD, RNA Pol II, gene (+1); no transcription versus transcription proceeds. (B) gene X, gene Y, enhancer, insulator region, insulator-binding protein (IBP)]

slide 52:

Insulators are DNA sequences that prevent enhancers from activating the wrong genes. Insulators are located between enhancers and the genes they must not regulate. The insulator-binding protein (IBP) recognizes the insulator sequences and blocks the action of enhancers that are not within the looped region (see Fig. 2.11B). Insulator sequences may be controlled by methylation. When the DNA sequence is methylated, IBP cannot bind, and the enhancer is allowed to access promoters beyond the insulator.

The eukaryotic transcription factor AP-1 consists of two different proteins that work as a dimer. These transcription factors control gene expression, but which genes are activated depends on the constituents of the dimer, the type and amount of post-translational modifications, and the interaction with other modifier proteins.

The eukaryotic transcription factor has two domains. The DNA-binding domain binds to the DNA at the promoter, and the activator domain has sites for initiating RNA polymerase action. Eukaryotic transcription factors control gene expression by binding to the mediator complex. Insulator sequences prevent transcription factors from binding to the wrong promoter. Enhancer sequences are far from the promoter but may loop around to directly bind to the mediator complex.

Eukaryotic Transcription Enhancer Proteins

AP-1 (activator protein-1) activates a wide variety of genes and provides an example of the complexity involved in eukaryotic gene expression. This protein responds to a wide range of different stimuli. The most potent stimulators of AP-1 include growth factors and UV irradiation, two disparate stimuli with different end points. The former stimulates cell growth whereas the latter induces cell death, yet both work through the same transcription factor, AP-1. The complex effects of this single transcription factor are still being investigated.

AP-1 is actually a dimer of two proteins from the Fos and Jun families of transcription factors. In addition, members of the ATF/CREB family can replace one of the Fos or Jun proteins in the dimer. The dimer recognizes the palindromic sequence 5′-TGA(C/G)TCA-3′ (Fig. 2.12). AP-1 belongs to a family of DNA-binding proteins called bZIP proteins. Each of these proteins has a dimerization domain and a DNA-binding domain. Jun family members dimerize with themselves or form heterodimers with Fos family members. In contrast, Fos members can bind to Jun but cannot dimerize by themselves. Fos and Jun also have activation domains that receive cellular signals that increase or attenuate their activity.

When AP-1 is stimulated, two different effects are involved. First, the cell makes more Fos and Jun proteins through increased expression of their genes. In addition, the proteins themselves become more stable and are not degraded as quickly. Second, the activity of Fos and Jun is stimulated by phosphorylation of their activation domains by JNK (Jun amino-terminal kinase). Many other cellular signaling proteins can alter Jun and Fos activity, but JNK is the most potent. Phosphorylation of Jun and Fos triggers their interaction with the protein mediator complex and RNA polymerase II. It also affects other signal proteins and triggers other genes.

Epigenetics and DNA Regulation

Epigenetics is any heritable change in DNA other than changes in nucleotide sequence (Fig. 2.13). The different types of epigenetic changes that affect gene expression include

slide 53:

histone post-translational modifications, DNA methylation, nucleosome remodeling, and RNA-associated silencing. These four mechanisms work in conjunction with a variety of transcription factors, enhancers, repressors, and other proteins that modify gene expression to generate, from DNA that has so little variability, the entire biological diversity seen on our planet.

FIGURE 2.12 Eukaryotic Regulation of Transcription. (A) The eukaryotic transcription factor AP-1 is a dimer of two proteins from the Jun family, Fos family, or ATF/CREB family. These two proteins interact through their leucine zippers. (B) To activate transcription, AP-1 must itself first be activated by phosphorylation by the kinase JNK. Only then does Jun stimulate RNA polymerase II to transcribe the appropriate genes.
[Figure labels: (A) Jun-Jun and Jun-Fos dimers, basic region, leucine zipper. (B) AP-1 recognition sequence in promoter DNA (5′ to 3′ and 3′ to 5′ strands), JNK, phosphate groups (P) on Jun, RNA polymerase]
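The AP-1 recognition site 5′-TGA(C/G)TCA-3′ is called palindromic because the two allowed versions are reverse complements of each other, so the site reads the same on either strand. The short Python sketch below checks this and scans a sequence for either version. The function names and the example promoter fragment are invented for illustration, and exact string matching is a simplification of how a protein recognizes DNA.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# The two versions of the AP-1 site: central base C or G.
AP1_SITES = ("TGACTCA", "TGAGTCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_ap1_sites(dna: str) -> list[int]:
    """Return 0-based start positions of exact AP-1 site matches on the given strand."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 6) if dna[i:i + 7] in AP1_SITES]


# The two site variants are reverse complements of one another.
print(reverse_complement("TGACTCA"))      # TGAGTCA

# Hypothetical promoter fragment, made up for this example.
print(find_ap1_sites("GGCATGACTCATTCC"))  # [4]
```

Because either variant on one strand appears as the other variant on the opposite strand, scanning a single strand for both is enough to find every site.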

slide 54:

FIGURE 2.13 Epigenetic Changes Control Gene Expression. (A) Histone post-translational modifications such as acetylation can loosen nucleosomes, making them more accessible to various regulatory proteins. (B) DNA methylation can induce heterochromatin formation. First, the area to be silenced is methylated. The methyl groups attract methylcytosine-binding protein (MeCP), which in turn attracts histone deacetylases (HDACs). Once HDAC removes the acetyl groups from the histone tails, the histones aggregate tightly. The closeness of histones excludes any DNA-binding proteins and hence turns off gene expression in the area. (C) Nucleosome remodeling moves nucleosomes by sliding them from promoter areas or can remodel histones to be further apart or closer together. (D) A noncoding RNA called Xist, which is produced from both copies of the X chromosome, covers the surface of one of the X chromosomes, which in turn induces methylation and heterochromatin formation of that X chromosome. The inactivation of one X chromosome ensures that females have the same level of protein expression as males, who have only one X chromosome.
[Figure panels: (A) post-translational modifications of histones (aggregated versus disaggregated nucleosomes, acetyl groups, histones H2A, H2B, H3, H4); (B) DNA methylation (methylation, methylcytosine-binding protein arrives, histone deacetylase binds to MeCP, deacetylation of histones, aggregation of nucleosomes); (C) nucleosome remodeling; (D) X-inactivation by noncoding RNA (production of Xist RNA, coating of one X chromosome by Xist RNA spreading outwards from the Xist gene, inactivation of one X chromosome by methylation)]

slide 55:

Most epigenetic changes in DNA-associated proteins such as histones modulate the access that various transcription factors and regulatory proteins have to genes. Eukaryotic DNA wraps around histones to form nucleosomes, the “beads on a string” structure of chromatin (see Chapter 1). Loosely packed nucleosomes provide access for transcription factors, whereas tightly packed nucleosomes exclude regulatory proteins. Therefore, controlling the density of nucleosomes regulates transcription initiation. The histone proteins have protein extensions, or “tails,” that can be modified by enzymes called histone acetyl transferases (HATs), which transfer acetyl groups to lysines within the tail. These histone tails normally stabilize DNA by binding neighboring nucleosomes, so aggregating the nucleosomes. When HAT transfers acetyl groups to the tails, they can no longer bind to neighboring nucleosomes and the structure loosens. To tighten the nucleosomes, histone deacetylases (HDACs) remove the acetyl groups and the histones reaggregate (Fig. 2.13A). Although this is generally true, some histone post-translational modifications decrease the accessibility of the DNA; post-translational modifications are therefore considered context specific for gene expression.

Another epigenetic modulation of eukaryotic DNA important to gene expression is methylation (see Fig. 2.13B). In prokaryotes, differences in methylation patterns distinguish the newly synthesized strand of DNA from the template during replication. In eukaryotes, methylation is used to silence various regions of DNA and prevent their expression. Methylation of cytosine by two different enzymes occurs in the sequence CpG or CpNpG, where p represents the phosphate group that links neighboring nucleotides on the strand of DNA and N is any nucleotide. Maintenance methylases add methyl groups to newly synthesized DNA to give the same pattern as in the template strand. De novo methylases add new methyl groups, and demethylases remove unwanted methyl groups. Many genes are located near stretches of DNA containing many CpG sequences, called CpG islands. If these are methylated, the nearby genes are not expressed, whereas if they are not methylated, the genes are expressed. Methylation patterns depend on the tissue. For example, muscle cells will not methylate CpG islands in front of genes necessary for muscle function. In other tissues, though, the muscle-specific genes will have methylated CpG islands.

Silencing can occur for one gene or for areas as large as an entire chromosome. For large regions to become “silenced,” CpG or CpNpG sequences are methylated, which then attracts methylcytosine-binding proteins that block binding of other DNA-binding proteins. These proteins also recruit HDACs that deacetylate the histone tails, thus condensing the chromatin. The areas turn into heterochromatin, which prevents any further gene expression.

The third epigenetic modification also affects nucleosome spacing. Chromatin remodeling complexes slide histones, remove histones, remodel histones, alter nucleosome spacing, and affect nucleosome assembly. These complexes move histones to expose promoter regions or can completely remodel histones so that more DNA is accessible (Fig. 2.13C). They affect the overall spacing of nucleosomes around different genes. These complexes are also able to replace histone proteins with variant histones. Histone variants replace normal histones in different regions of the genome and mark regions of active transcription. For example, the SWR1 complex in yeast, a chromatin remodeling complex, replaces histone 2A with histone 2A.Z (H2A.Z), which protects regions from turning into heterochromatin and therefore marks areas of gene activation.

Methylation patterns have a major impact during development because gene expression is tightly controlled for proper development. Some genes remain methylated in the gamete, whereas other genes must be demethylated so they can become active. Imprinting occurs when a gene from the parent is methylated in the gamete and remains methylated in the new organism. This affects relatively few genes. In contrast, genes that are not imprinted change their methylation patterns during development. Imprinting

slide 56:

can be different for male gametes and female gametes, setting the stage for sexual differences in gene expression. One special type of imprinting is X-inactivation, in which the entire second X chromosome in females is silenced by methylation and heterochromatin formation, except for a few loci. This inactivation is triggered by the final epigenetic modification, noncoding RNA regulation (Fig. 2.13D). The Xist gene is initially expressed on both X chromosomes, but one copy is methylated and silenced. The chromosome containing the silenced Xist gene remains active. The other chromosome continues to make Xist RNA, which then coats the surface of the chromosome and triggers methylation and heterochromatin formation. Therefore, the Xist gene is one of the few loci of this X chromosome that is expressed. This ensures that females with two X chromosomes have the same level of gene expression as males, who have an X/Y chromosome pair. In addition, other RNAs that affect gene expression include antisense RNA and small interfering RNAs (siRNAs).

EUKARYOTIC mRNA IS PROCESSED BEFORE MAKING PROTEIN

Bacterial mRNA is translated without any processing. Indeed, bacteria often start translating their mRNA while it is still being transcribed, which is known as coupled transcription/translation. However, eukaryotic mRNA is processed in a variety of ways before it leaves the nucleus. First, eukaryotic mRNA must have a cap added to the 5′ end of the message (Fig. 2.14). The cap is a GTP that is added in reverse orientation and is methylated on position 7 of the guanine base. Methyl groups may also be added to the first one or two nucleotides of the mRNA.

The second modification of eukaryotic mRNA is the addition of a long stretch of adenines to the 3′ end, the polyA tail. Three sequences at the end of an mRNA mediate the addition of the tail: the recognition sequence for the polyadenylation complex (AAUAAA), the cut site for cleavage binding factor, and the recognition sequence for polyadenylation binding protein (a length of GU repeats). First, the polyadenylation complex binds to the AAUAAA, and an endonuclease in the complex cuts the mRNA after a CA dinucleotide downstream from the AAUAAA recognition sequence. Next, polyA polymerase adds 100 to 200 adenine nucleotides. Finally, the polyA-binding protein binds to both the polyA tail and the cap structure. This circularizes the mRNA.

A third modification made to eukaryotic mRNA is the removal of introns. Eukaryotic DNA contains many stretches of intervening sequence (introns) between regions that will ultimately code for a protein (exons). First, the entire region is transcribed into an RNA molecule called the primary transcript. After capping and tailing, this is processed to remove the introns. The exons are spliced together to form the mRNA. Proteins called splicing factors recognize the exon/intron borders, cut the RNA, and join the neighboring exons. Of these three modifications, capping probably occurs first, since it takes place when transcription starts, whereas polyA tail addition and splicing occur simultaneously. The polyA tail addition releases the mRNA from RNA polymerase and marks the end of transcription. Whether or not splicing has been completed by this endpoint is not yet known.

Epigenetic regulation of DNA includes DNA methylation, histone modification by acetylation, nucleosome remodeling, and silencing of DNA by various types of RNA, including antisense RNA, noncoding RNA, and small interfering RNA. The density of nucleosome packing can exclude transcription factors from binding to the enhancer regions. Adding methyl (−CH3) groups to the cytosine of CpG areas controls expression of nearby genes. These groups can prevent binding of various transcription factors and have implications in setting up male/female differences during development.
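To make the CpG and CpNpG methylation targets concrete, the following Python sketch locates both motifs in a DNA string and applies a deliberately crude rule for the effect of methylation on a nearby gene. Everything here is an assumption for illustration: the function names are invented, the example sequence is made up, and the all-or-none rule in `nearby_gene_expressed` simplifies the statement that genes near methylated CpG islands are not expressed.

```python
import re


def cpg_sites(dna: str) -> list[int]:
    """0-based positions of every C that is immediately followed by G (CpG)."""
    return [m.start() for m in re.finditer("CG", dna.upper())]


def cpnpg_sites(dna: str) -> list[int]:
    """0-based positions of every C in a C-N-G triplet (N = any base).

    A lookahead is used so that overlapping triplets are all reported.
    """
    return [m.start() for m in re.finditer(r"(?=C[ACGT]G)", dna.upper())]


def nearby_gene_expressed(island_sites: list[int], methylated: set[int]) -> bool:
    """Toy rule: the gene is expressed only if no CpG in the island is methylated."""
    return not any(pos in methylated for pos in island_sites)


dna = "ATCGCGTACAGCGG"           # made-up sequence
sites = cpg_sites(dna)
print(sites)                      # [2, 4, 11]
print(cpnpg_sites(dna))           # [8, 11]
print(nearby_gene_expressed(sites, methylated=set()))   # True
print(nearby_gene_expressed(sites, methylated={4}))     # False
```

Real methylation analysis works with the fraction of methylated sites across a region rather than a yes/no flag; the binary rule is used only to mirror the wording of the text.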

slide 57:

FIGURE 2.14 Processing Eukaryotic mRNA. (A) Eukaryotic RNA is processed before exiting the nucleus for translation into protein. A cap (reversed GTP with a methyl group on position 7 of the guanine base) is added to the 5′ end of the message, a polyA tail is added to the 3′ end, and the introns are spliced out. These modifications stabilize the message for protein translation in the cytoplasm. (B) Detailed structure of a processed eukaryotic mRNA. The cap structure is followed by the 5′ UTR, the protein-coding exons, the 3′ UTR, and a polyA tail.
[Figure labels: (A) gene (DNA) with promoter, transcription start, exons, and introns; transcription; primary transcript (RNA); processing (add cap, add tail, remove introns); messenger RNA with cap and polyA tail; translation; protein. (B) 5′ end, cap, 5′ untranslated region, start codon (AUG), exons, 3′ untranslated region, tail signal, polyA tail, 3′ end]

Eukaryotic RNA is transcribed as a primary transcript, a cap is added at the 5′ end, a polyA tail is added, and the introns are removed. The mRNA is then transported from the nucleus to the endoplasmic reticulum for translation by ribosomes.
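The three processing steps (capping, tailing, and intron removal) can be mimicked on a string model of a primary transcript. The Python sketch below is a schematic only: the segment list, the sequences, and the function name `process_primary_transcript` are hypothetical, the cap is represented by the label `m7G` rather than a real chemical structure, and intron boundaries are given in advance instead of being recognized by splicing factors.

```python
def process_primary_transcript(segments: list[tuple[str, str]],
                               tail_length: int = 150) -> dict[str, str]:
    """Return a processed mRNA as cap, spliced body, and polyA tail.

    segments: ordered (kind, sequence) pairs, kind being "exon" or "intron".
    tail_length: number of adenines added; the text gives 100 to 200.
    """
    if not 100 <= tail_length <= 200:
        raise ValueError("polyA tail is described as 100 to 200 adenines")
    # Splicing: drop the introns and join the neighboring exons in order.
    body = "".join(seq for kind, seq in segments if kind == "exon")
    return {"cap": "m7G", "body": body, "tail": "A" * tail_length}


# Made-up primary transcript with three exons and two introns.
primary = [
    ("exon", "GCAUGGCU"),
    ("intron", "GUAAGUCUCAG"),
    ("exon", "UUCAAG"),
    ("intron", "GUGAGUUUAG"),
    ("exon", "GGUUAAUC"),
]

mrna = process_primary_transcript(primary)
print(mrna["cap"], mrna["body"], f"polyA x {len(mrna['tail'])}")
```

Keeping the cap, body, and tail as separate fields makes it easy to see that only the body changes during splicing, while the cap and tail are additions at the two ends.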

slide 58:

TRANSLATING THE GENETIC CODE INTO PROTEINS

The Genetic Code Is Read as Triplets or Codons

Messenger RNA provides the information a ribosome needs to make proteins. This process is known as translation because it involves translating information carried by nucleic acids into the sequences of amino acids that make up proteins. Before the mechanism is discussed, the code used to assemble proteins must first be understood. Nucleic acids each have four different bases (the T in DNA is equivalent to U in RNA). However, proteins consist of 20 different amino acids. If each nucleotide corresponded to an amino acid, this would encode only four different amino acids. Two nucleotides give only 16 combinations, still not enough. Groups of three nucleotides, which give 64 combinations, are the smallest that can encode all 20 amino acids (Fig. 2.15). Messenger RNA is read in groups of three bases known as triplets or codons. Each triplet of bases codes for one amino acid. Because there are 64 triplets but only 20 amino acids, many triplets are redundant, so multiple codons are translated into the same amino acid. For example, valine is encoded by GUU, GUC, GUA, or GUG.

The genetic code listed in Fig. 2.15 is considered the universal genetic code. Not all organisms use precisely this code, although exceptions are rare. For example, UGA normally signals stop, but in Mycoplasma UGA encodes tryptophan, and in the protozoan Euplotes UGA encodes cysteine.

Small RNA molecules known as transfer RNA (tRNA) recognize the individual codons on mRNA and carry the corresponding amino acids. Although tRNA is synthesized as a single strand, it folds back on itself to form regions of double-stranded RNA. The final shape of tRNA is a folded “L” shape with the anticodon at one end and the acceptor stem at the other. The anticodon consists of three bases complementary to those of the corresponding codon, and it therefore recognizes the codon by base pairing. The acceptor stem is the place where the amino acid is added to the free 3′ end of the tRNA (Fig. 2.16).

How does each specific tRNA carry the correct amino acid? A group of enzymes called aminoacyl tRNA synthetases attaches the correct amino acid to the corresponding tRNA. These enzymes are very specific and recognize the correct tRNA by its sequence at the anticodon or elsewhere along the RNA structure. There is a specific aminoacyl tRNA synthetase for each amino acid. The enzymes catalyze the addition of the correct amino acid onto the end of the correct tRNA. In fact, some aminoacyl tRNA synthetases also have domains that edit their work, ensuring that the correct amino acid connects to the correct tRNA.

The first base of the anticodon binds the third base of the codon in the mRNA. Because this nucleotide in tRNA is not constrained by neighboring nucleotides, it can wobble instead of forming a perfect double helix. This allows nonstandard base pairs to form. For example, if the first anticodon base were G, it would normally pair with C in the third position of the codon. Because of wobble, G can also pair with U. Thus the tRNA for

FIGURE 2.15 The Genetic Code. The 64 codons found in mRNA are shown with their corresponding amino acids. As usual, bases are read from 5′ to 3′ so that the first base is at the 5′ end of the codon. Three codons (UAA, UAG, UGA) have no cognate amino acid but signal stop. AUG (encoding methionine) and, much less often, GUG (encoding valine) act as start codons. To locate a codon, find the first base in the vertical column on the left, the second base in the horizontal row at the top, and the third base in the vertical column on the right.
1st base   2nd (middle) base                             3rd base
           U          C          A          G
U          UUU Phe    UCU Ser    UAU Tyr    UGU Cys    U
U          UUC Phe    UCC Ser    UAC Tyr    UGC Cys    C
U          UUA Leu    UCA Ser    UAA stop   UGA stop   A
U          UUG Leu    UCG Ser    UAG stop   UGG Trp    G
C          CUU Leu    CCU Pro    CAU His    CGU Arg    U
C          CUC Leu    CCC Pro    CAC His    CGC Arg    C
C          CUA Leu    CCA Pro    CAA Gln    CGA Arg    A
C          CUG Leu    CCG Pro    CAG Gln    CGG Arg    G
A          AUU Ile    ACU Thr    AAU Asn    AGU Ser    U
A          AUC Ile    ACC Thr    AAC Asn    AGC Ser    C
A          AUA Ile    ACA Thr    AAA Lys    AGA Arg    A
A          AUG Met    ACG Thr    AAG Lys    AGG Arg    G
G          GUU Val    GCU Ala    GAU Asp    GGU Gly    U
G          GUC Val    GCC Ala    GAC Asp    GGC Gly    C
G          GUA Val    GCA Ala    GAA Glu    GGA Gly    A
G          GUG Val    GCG Ala    GAG Glu    GGG Gly    G
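The counting argument behind the triplet code, and the lookup that Fig. 2.15 describes, can be reproduced with a short Python sketch. It builds the standard table from a 64-letter string in U, C, A, G order (one-letter amino acid codes, with `*` for stop) and translates a made-up message. The names `CODON_TABLE` and `translate` and the example mRNA are illustrative, and the sketch assumes that the reading frame starts at the first base supplied.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, first base changing slowest and third base fastest.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"   # first base U
               "LLLLPPPPHHQQRRRR"   # first base C
               "IIIMTTTTNNKKSSRR"   # first base A
               "VVVVAAAADDEEGGGG")  # first base G
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One base gives 4 possibilities, two give 16, three give 64 (enough for 20).
for n in (1, 2, 3):
    print(f"{n} base(s) per codon: {4 ** n} combinations")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":        # stop codon: no amino acid is added
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("AUGGUUCAUUGA"))   # MVH (Met-Val-His, then stop)
```

The redundancy mentioned in the text shows up directly in the table: all four GUN codons map to `V` (valine), and three of the 64 entries are stops.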

slide 59:

histidine has the anticodon GUG and recognizes both CAC and CAU in the mRNA. Similarly, U in the first position of the anticodon can base pair with A or G in the third position of the codon. Wobble explains how the same tRNA can read multiple codons that all encode the same amino acid. Each organism has a preference as to which triplet codon is used most often for a particular amino acid. This is called codon bias (see Box 2.1).

FIGURE 2.16 Structure of tRNA Allows Wobble in the Third Position. Transfer RNA recognizes the codons along mRNA and presents the correct amino acid for each codon. The first position of the anticodon on tRNA matches the third position of the codon.
[Figure labels: D loop, T loop, anticodon, codon, mRNA, 5′ end, 3′ end, valine with R group]

During protein translation, each tRNA recognizes a specific three-nucleotide sequence and has the correct amino acid attached to the opposite end. A family of specific enzymes, the aminoacyl tRNA synthetases, ensures that each tRNA has the correct amino acid.

Box 2.1 Codon Bias

Several amino acids are encoded by multiple codons and have more than one corresponding tRNA. Thus valine is encoded by GUU, GUC, GUA, and GUG. One tRNA for valine recognizes GUU and GUC by wobble, but another tRNA is necessary for the other two codons. However, many organisms tend to use only one or two of the codons for amino acids with multiple codons, a phenomenon known as codon bias. Consequently, they make low amounts of tRNA for the rarely used codons. Furthermore, different organisms show different codon preferences. This becomes an issue when genes from one organism are expressed in another. Plants and animals often prefer different codons than bacteria for the same amino acids. When bacteria express plant or animal proteins, not enough tRNA is available for the nonpreferred codons, and the ribosomes stall and fall off, making protein yield very low. To remedy this problem, researchers may genetically engineer the genes so that abundant tRNAs recognize their codons (see Chapter 14). Alternatively, bacterial host strains may be engineered to express higher levels of the necessary tRNAs.
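The wobble rules quoted in the text (G in the first anticodon position pairs with C or U; U in the first anticodon position pairs with A or G) can be written as a small lookup. The Python sketch below lists the codons a given anticodon can read. It covers only those two wobble cases, treats every other position as strict Watson-Crick pairing, and uses invented names (`WOBBLE`, `codons_read_by`); modified bases found in real tRNAs are ignored.

```python
# Strict Watson-Crick pairing for RNA.
STANDARD = {"A": "U", "U": "A", "G": "C", "C": "G"}

# First anticodon base -> bases it can pair with at the third codon position.
# Only the two wobble cases mentioned in the text are modeled.
WOBBLE = {"G": "CU", "U": "AG"}


def codons_read_by(anticodon: str) -> list[str]:
    """Codons (5'->3') recognized by an anticodon written 5'->3'.

    The strands pair antiparallel, so anticodon position 1 faces codon
    position 3 and anticodon position 3 faces codon position 1.
    """
    first, second, third = anticodon
    codon_pos1 = STANDARD[third]
    codon_pos2 = STANDARD[second]
    codon_pos3_options = WOBBLE.get(first, STANDARD[first])
    return [codon_pos1 + codon_pos2 + base for base in codon_pos3_options]


print(codons_read_by("GUG"))   # ['CAC', 'CAU']  histidine
print(codons_read_by("GAC"))   # ['GUC', 'GUU']  valine, first tRNA
print(codons_read_by("UAC"))   # ['GUA', 'GUG']  valine, second tRNA
```

The two valine anticodons together cover all four GUN codons, which matches the statement in Box 2.1 that one valine tRNA reads GUU and GUC while another is needed for GUA and GUG.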

slide 60:

Protein Synthesis Occurs at the Ribosome

The molecular machine called a ribosome unites mRNA with the appropriate tRNAs and then catalyzes the linkage of amino acids into a chain. Prokaryotic ribosomes consist of two subunits, called the 30S and 50S, which combine to form a functional 70S ribosome. A ribosome consists of several RNA molecules (ribosomal RNA, or rRNA) and many proteins. The 30S subunit has a 16S rRNA plus 21 proteins; the 50S subunit has two rRNAs, the 5S and 23S, plus 34 proteins. The larger subunit has three binding sites for tRNA, called A (for acceptor), P (for peptide), and E (for exit), referring to the action occurring at each site. The 23S rRNA actually catalyzes the addition of amino acids to the growing polypeptide chain and is therefore a ribozyme. Ribozymes are discussed in Chapter 5.

In prokaryotes, various factors besides the ribosome are involved in protein synthesis (Fig. 2.17). First, a ribosome must assemble at the start site and begin protein synthesis at the correct start codon. The 5′ untranslated region of the mRNA (see above) has the signal for ribosome binding in front of the start codon. In prokaryotes, translation begins at the first AUG codon after the Shine–Dalgarno sequence, or ribosome binding site, which has the consensus sequence UAAGGAGG. The anti-Shine–Dalgarno sequence is found in the 16S rRNA of the smaller 30S subunit. So first, the small ribosomal subunit binds the Shine–Dalgarno sequence. A derivative of methionine, N-formyl-methionine (fMet), and a special initiator tRNA (tRNAi) are used to initiate translation in prokaryotes. Only initiator tRNA charged with fMet (referred to as tRNAi^fMet) can bind the small subunit of the ribosome. Translation factors are proteins needed to recruit and assemble the components of the ribosome and translational complex. Initiation factors (IF1, IF2, and IF3) assemble the 30S initiation complex, which is the 30S ribosomal subunit plus tRNAi^fMet. The IF3 factor then leaves the complex, and the 50S ribosomal subunit binds, forming the 70S initiation complex (see Fig. 2.17A).

Finally, polypeptide assembly can begin (see Fig. 2.17B). The tRNAi^fMet occupies the P-site on the ribosome. Another tRNA recognizes the next codon and enters the A-site; the peptidyl transferase activity of 23S rRNA then catalyzes the peptide bond between the first and second amino acids. fMet releases its tRNA, which moves into the E-site. This allows the second tRNA to move into the P-site, and the cycle begins again. A third tRNA, complementary to the next codon, enters the A-site, a peptide bond forms between amino acids 2 and 3, and then the second tRNA moves into the E-site of the ribosome and exits.

Adding successive amino acids is called elongation and requires elongation factors. EF-T, which is a pair of proteins (EF-Tu and EF-Ts), uses the energy of GTP to deliver a new tRNA into the A-site (EF-Tu), converting GTP to GDP in the process. After the reaction, the GDP is exchanged for a fresh GTP for the next cycle (EF-Ts). The movement of tRNA from the P-site to the E-site is called translocation, and the mRNA simultaneously moves one codon sideways relative to the ribosome. The E-site and A-site cannot be occupied at the same time, and the used tRNA must exit before the next tRNA enters. EF-G oversees the translocation step.

Amino acids are added to the growing chain, and the process continues until the ribosome encounters a stop codon (UAA, UAG, or UGA). None of the tRNAs in a cell recognize the stop codons. Instead, proteins known as release factors bind the stop codons (see Fig. 2.17C). RF1 and RF2 recognize the different stop codons and stimulate the 23S rRNA to split the bond between the last amino acid and its tRNA. The whole ribosome assembly falls off the mRNA and dissociates. Its components are recycled for translation of another mRNA. The new polypeptide chain folds to form its final structure. In prokaryotes multiple ribosomes

slide 61:

FIGURE 2.17 Translation in Prokaryotes. (A) Initiation. Initiation of translation begins with the association of the small ribosome subunit with the Shine–Dalgarno sequence (S-D sequence) on the mRNA. Next, initiation factors (IF1, IF2, and IF3; not shown) charge or connect the initiator tRNA with fMet. The charged initiator tRNA (tRNAi^fMet) associates with the small ribosome subunit and finds the start codon. Finally, the large ribosomal subunit joins the small subunit and situates the initiator tRNA at the P-site. (B) Elongation. During elongation, peptide bonds are formed between the amino acids at the A-site and the P-site. The movement of the ribosome along the mRNA and the addition of a new tRNA to the A-site are controlled by elongation factors (also not shown). (C) Termination. Termination requires release factors. The various components dissociate. The completed protein folds into its proper three-dimensional shape.
[Figure labels: (A) mRNA (5′ to 3′), S-D sequence, start codon (AUG), fMet tRNA, 30S initiation complex, 70S initiation complex, 30S and 50S subunits, E, P, and A sites. (B) elongation. (C) last translated codon, stop codon, final amino acid, protein, release factors (RF1 + RF2)]
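The rule that prokaryotic translation begins at the first AUG after the Shine–Dalgarno sequence and ends at a stop codon can be sketched as a short Python routine. This is a simplified illustration, not a gene-finding method: it requires an exact match to the consensus UAAGGAGG quoted in the text (real sites usually match only partially), ignores the spacing between the site and the start codon, and writes the initial fMet as a plain `M`. The function name and example mRNA are made up.

```python
from itertools import product
from typing import Optional

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SHINE_DALGARNO = "UAAGGAGG"   # consensus sequence given in the text


def translate_prokaryotic(mrna: str) -> Optional[str]:
    """Translate from the first AUG downstream of an exact S-D consensus match.

    Returns the peptide in one-letter code, or None if no S-D site or no
    downstream AUG is found.
    """
    sd_start = mrna.find(SHINE_DALGARNO)
    if sd_start == -1:
        return None                      # small subunit has nothing to bind
    start = mrna.find("AUG", sd_start + len(SHINE_DALGARNO))
    if start == -1:
        return None
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":            # stop codon: release factors, no tRNA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Made-up message: an upstream AUG (ignored), the S-D site, a spacer, an ORF.
mrna = "GGAUGCC" + "UAAGGAGG" + "UACACU" + "AUGGCUCAUAAAUGA" + "GGC"
print(translate_prokaryotic(mrna))       # MAHK
```

The AUG placed before the Shine–Dalgarno site is skipped, which is the point of the example: in this model the ribosome binding site, not the 5′ end, determines where translation starts.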

slide 62:

bind to the same mRNA to form a polysome. Because there is no nucleus, transcription and translation are often simultaneous. As partially made mRNA comes off the DNA, ribosomes bind and start synthesizing protein.

DIFFERENCES BETWEEN PROKARYOTIC AND EUKARYOTIC TRANSLATION

Translation in eukaryotes differs from that in prokaryotes in many ways (Fig. 2.18). First of all, mRNA is made in the nucleus, but translation occurs on the ribosomes in the cytoplasm. Therefore, there is no coupled transcription and translation in eukaryotes. Eukaryotic ribosomes have 60S and 40S subunits that combine to form an 80S ribosome, which is a little larger than bacterial ribosomes. Additionally, eukaryotes have more initiation factors than prokaryotes, and they assemble the initiation complex in a different order. Overall, more proteins are involved in eukaryotic translation to deal with the greater complexity of regulation (see Table 2.2).

Despite this, the binding of the mRNA is simpler in eukaryotes. Eukaryotic mRNA does not have a Shine–Dalgarno sequence. Instead, eukaryotic ribosomes recognize the 5′ cap structure and the Kozak sequence, which is a loosely conserved sequence found around the first AUG. Only one gene per mRNA is found, unlike bacteria, which often have polycistronic messages and whose ribosomes recognize separate Shine–Dalgarno sequences for each coding sequence. The first amino acid in each new polypeptide is methionine, as in bacteria. However, unlike in bacteria, this methionine is not modified with a formyl group. Finally, many eukaryotic proteins are modified after translation by addition of chemical groups. Although bacteria do modify some proteins, this is much rarer and the variety of additions is much more limited.

MITOCHONDRIA AND CHLOROPLASTS SYNTHESIZE THEIR OWN PROTEINS

The mitochondria and chloroplasts found in eukaryotes have their own genomes and make some of their own proteins. The symbiotic theory of organelle origin argues that these organelles were once free-living bacteria or blue-green algae (cyanobacteria) that formed a symbiotic relationship with a single-celled ancestral eukaryote. The bacteria supplied energy to the early eukaryote. Over time, the bacteria gave up many duplicate functions and came to rely on the host for precursor molecules. Eventually, the symbiotic mitochondria and chloroplasts lost the majority of their genes, yet today they still maintain a small version of their genome. These genomes have many genes associated with protein synthesis (Fig. 2.19).

Translation in prokaryotes starts in the 5′ UTR of the mRNA, where the ribosome scans for the first start codon. After an initiator methionine is added to the AUG, the ribosome catalyzes the addition of more amino acids. Ribosomes work with elongation factors and release factors to control the movement down the mRNA until the stop codon. Ribosomes have three different sites of action. The A-site accepts the next tRNA with the correct anticodon and amino acid. The P-site holds the previous tRNA with its amino acid. The E-site is occupied briefly after the amino acids are linked, as the empty tRNA exits the ribosome.

Eukaryotic mRNA has information for one protein. The ribosome recognizes the cap structure, scans until it finds the first AUG, and starts translating the message into protein. Many different initiation factors, elongation factors, and termination factors are important for eukaryotic translation.

slide 63:

FIGURE 2.18 Translation Initiation in Eukaryotes
[Figure labels: cap-binding complex; 43S pre-initiation complex; 40S ribosome; 80S ribosome; translation begins.]
(A) The cap-binding complex includes polyA-binding protein (PABP), eIF4A, eIF4B, eIF4E, and eIF4G, which is in an unphosphorylated state when unbound to mRNA. ATP transfers a phosphate to the complex to make it competent for binding the mRNA (top left of part A). The 43S initiation complex forms, bringing the small ribosomal subunit together with the initiator tRNA (tRNAiMet). This complex uses GTP to attach the tRNA to the 40S subunit via eIF2. In addition, initiation factors eIF1, eIF1A, eIF3, eIF5, and eIF2B guide the complex and make it competent to bind to the 5′ UTR of mRNA (top right of part A). Finally, the activated cap-binding complex recognizes the cap and polyA tail of the mRNA, causing the mRNA to loop around into a circular shape. Then the 43S pre-initiation complex can attach and start scanning for the first AUG. (B) After the complex stops at the first AUG, the remaining 60S subunit and associated factors combine to form the final competent 80S ribosome.

slide 64:

Organelle genes are often more closely related to bacterial genes than to eukaryotic nuclear genes. Moreover, the ribosomal subunits in animal mitochondria are 28S and 39S in size, closer to the 30S and 50S subunits of bacteria. The ribosomal RNAs of mitochondria and bacteria are also much more similar in sequence than either is to the rRNA encoded by the eukaryotic nucleus.

Mitochondria and chloroplasts have their own genome that includes many genes for transcription and translation. These may have been free-living bacteria that formed a symbiotic relationship with a unicellular eukaryote.

Table 2.2 Translation Factors: Prokaryotes versus Eukaryotes

Stage         Prokaryotes   Eukaryotes
Initiation    IF1           eIF1A
              IF2           eIF5B (GTPase)
              IF3           eIF1
              -             eIF2 (α, β, γ; GTPase)
              -             eIF2B (α, β, γ, δ, ε)
              -             eIF3 (13 subunits)
              -             eIF4A (RNA helicase)
              -             eIF4B (activates eIF4A)
              -             eIF4E (cap-binding protein)
              -             eIF4G (eIF4 complex scaffold)
              -             eIF4H
              -             eIF5
              -             eIF6
              -             PABP (polyA-binding protein)
Elongation    EF-Tu         eEF1A
              EF-Ts         eEF1B (2–3 subunits)
              -             SBP2
              EF-G          eEF2
Termination   RF1           eRF1
              RF2           -
              RF3           eRF3
Recycling     RRF, EF-G     eIF3, eIF3j, eIF1A, eIF1

Functionally homologous factors are in the same row. Adapted from Table 1 of Rodnina MV, Wintermeyer W (2009). Recent mechanistic insights into eukaryotic ribosomes. Curr Opin Cell Biol 21, 435–443.

slide 65:

Summary

This chapter briefly explains the processes of transcription and translation, highlighting the differences between eukaryotes and prokaryotes. Transcription occurs when RNA polymerase makes a complementary copy of the gene using ribose, phosphate, uracil, guanine, cytosine, and adenine. The complementary copy is called mRNA, and this form is translated into protein. The ribosome holds the mRNA so that two triplet codons, starting at AUG, are stable. Then a complementary tRNA that is holding the correct amino acid is held close to the mRNA by the ribosome. A second tRNA–amino acid complex moves next to the first, and the ribosome connects the two amino acids using its peptidyl transferase activity. The ribosome translocates to the next triplet codon on the mRNA and continues to link the amino acids to form a polypeptide. These basic mechanisms of transcription and translation are very similar in prokaryotes and eukaryotes.

The regulation of transcription and translation varies significantly between prokaryotes and eukaryotes. First, proteins called transcription factors control the expression of the correct gene at the correct time in the correct amount. In prokaryotes, the lactose operon demonstrates how activator proteins and repressor proteins work together so that lactose utilization genes are only expressed when lactose is the only sugar source for the bacteria. Prokaryotes have different sigma factors, an integral part of RNA polymerase, which specify the correct gene expression. In eukaryotes, many different transcription factors control gene expression by binding to the mediator complex. In addition, eukaryotes use epigenetic changes to control gene expression. These include methylation of DNA, post-translational modification of histones, histone remodeling complexes, and noncoding RNAs, antisense RNAs, and other forms of RNA to control expression of genes.
During translation, eukaryotes are actually less complex and express the mRNA transcript as a single message. In prokaryotes, the mRNA may contain multiple coding regions that are translated into proteins simultaneously as the transcript is made from the DNA.

FIGURE 2.19 Human Mitochondrial DNA
[Figure: circular map of human mtDNA (16,569 bp) showing the 12S and 16S rRNA genes; the protein-coding genes ND1, ND2, ND3, ND4, ND4L, ND5, ND6, CO1, CO2, CO3, ATP6, ATP8, and CytB; and the tRNA genes.]
The mitochondrial DNA of humans contains the genes for ribosomal RNA (16S and 12S), some transfer RNAs (single-letter amino acid codes mark these on the genome), and some proteins of the electron transport chain.

slide 66:

End-of-Chapter Questions

1. Which of the following are important features for transcription?
  a. promoter
  b. RNA polymerase
  c. 5′ and 3′ UTRs
  d. ORF
  e. all of the above

2. Adenine in DNA is complementary to
  a. uracil
  b. adenine
  c. guanine
  d. cytosine
  e. inosine

3. Which of the following is not necessary during Rho-independent termination of transcription?
  a. RNA polymerase
  b. Rho protein
  c. hairpin structure
  d. repeating As in the DNA sequence
  e. All of the above are necessary.

4. Which of the following statements is not true about mRNA?
  a. Prokaryotic mRNA may contain multiple structural genes on the same transcript, known as polycistronic mRNA.
  b. Eukaryotes only transcribe one gene at a time on mRNA, called monocistronic mRNA.
  c. Some eukaryotes are capable of having polycistronic mRNA.
  d. Eukaryotes almost always produce polycistronic mRNA.
  e. The genes for metabolic pathways in bacteria are typically located close together and transcribed on one mRNA.

5. In what way is eukaryotic transcription more complex than prokaryotic transcription?
  a. Eukaryotes have three different RNA polymerases, whereas prokaryotes only have one RNA polymerase.
  b. Eukaryotic transcription initiation is much more complex than prokaryotic initiation because of the various transcription factors involved.
  c. Upstream elements are required for efficient transcription in eukaryotic cells, but these elements are not usually necessary in prokaryotes.
  d. Eukaryotic mRNA is made in the nucleus.
  e. All of the above statements outline ways that eukaryotic transcription is more complex.

6. Why is the lac operon of E. coli important to biotechnology research?
  a. IPTG is a cheaper additive than lactose to growing cultures.
  b. The lac operon is not used in biotechnology research.
  c. The inducers and regulators of the lac operon are used to control the expression of genes in model organisms.
  d. The lac operon controls the amount of lactose that E. coli metabolizes.
  e. All of the above.

slide 67:

7. What feature about eukaryotic transcription factors is useful to biotechnology research?
  a. They have two domains, both of which bind to DNA.
  b. They have two domains, both of which bind to separate proteins.
  c. They have two domains: one domain binds DNA and the other binds to some part of the transcription apparatus.
  d. They have only one domain that binds to RNA polymerase.
  e. They have two domains, but neither domain can be engineered, and they are therefore not useful to biotechnology research.

8. Which of the following DNA structure modifications are used to regulate transcription?
  a. acetylation/deacetylation of the histone tails
  b. methylation of specific bases in the DNA sequence
  c. use of non-coding regulatory RNA to alter DNA accessibility
  d. nucleosome remodeling
  e. All of the above are important modifications for transcription regulation.

9. Which of the following statements about eukaryotic mRNA processing is not correct?
  a. The mRNA transcript must be exported from the nucleus.
  b. A 5′ cap and a 3′ polyA tail must be added.
  c. The introns are removed.
  d. A 3′ cap and a 5′ polyA tail must be added.
  e. Exons are spliced together to form the mRNA transcript.

10. Which of the following statements about protein translation is not correct?
  a. The genetic code is read in triplets, also called codons.
  b. The enzyme aminoacyl tRNA synthetase is responsible for adding the amino acid to the tRNA.
  c. The anticodon of the tRNA must recognize the codon on the mRNA exactly.
  d. Because of the wobble effect, a tRNA for one amino acid often recognizes multiple codons in the mRNA.
  e. The genetic code is universal.

11. Codon bias can be overcome by which scenario?
  a. Genetically engineering host organisms to express rarer tRNAs.
  b. Nothing can be done to overcome codon bias when expressing proteins.
  c. Genetically engineering the gene so that the codons are recognized by more abundant tRNAs.
  d. Genetically engineering the gene to remove the codons for rare tRNAs.
  e. Both A and C are suitable scenarios.

12. Choose the statement about translation that is not correct.
  a. The ribosome is composed of multiple subunits containing both ribosomal RNA and proteins.
  b. The consensus sequence UAAGGAGG is called the Shine–Dalgarno sequence and is recognized by the ribosome.
  c. Translation requires three initiation factors, two elongation factors, and two release factors.
  d. Transcription and translation are coupled in eukaryotes.
  e. There are three sites (E, P, and A) on the ribosome that can be occupied by a tRNA.

slide 68:

13. Which of the following statements does not highlight a difference in eukaryotic and prokaryotic translation?
  a. The first methionine in eukaryotic translation contains a formyl group.
  b. In eukaryotes, mRNA is made in the nucleus but translated in the cytoplasm.
  c. Prokaryotes often couple transcription and translation, forming a polysome.
  d. Eukaryotic mRNA does not have a Shine–Dalgarno sequence, but prokaryotic mRNA does.
  e. Many eukaryotic proteins are chemically modified after translation, which is a much rarer phenomenon in prokaryotes.

14. Why do mitochondria and chloroplasts contain their own genes?
  a. They are free-living prokaryotes able to survive outside of the host cell.
  b. They are thought to have once been free-living organisms similar to bacteria that formed a symbiotic relationship with a unicellular eukaryote.
  c. They do not contain their own genetic material.
  d. They contain genetic material but do not make their own proteins.
  e. None of the above is correct.

15. Methylation of DNA _______________.
  a. silences gene expression in eukaryotes
  b. enhances the binding of RNA polymerase to the promoters in the methylated region
  c. results in the removal of histones from the methylated regions
  d. causes histone tail fibers to become acetylated
  e. remodels nucleosomes to allow entry of transcription machinery

Further Reading

Bustamante, C., Cheng, W., & Mejia, Y. X. (2011). Revisiting the central dogma one molecule at a time. Cell, 144, 480–497.
Clark, D., & Pazdernik, N. (2013). Molecular Biology. Waltham, MA: Elsevier Inc.
Kato, S., Yokoyama, A., & Fujiki, R. (2011). Nuclear receptor coregulators merge transcriptional coregulation with epigenetic regulation. Trends in Biochemical Sciences, 36, 272–281.

slide 69:

CHAPTER 3
Recombinant DNA Technology

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00003-X

DNA Isolation and Purification
Electrophoresis Separates DNA Fragments by Size
Restriction Enzymes Cut DNA; Ligase Joins DNA
Methods of Detection for Nucleic Acids
Radioactive Labeling of Nucleic Acids and Autoradiography
Fluorescence Detection of Nucleic Acids
Chemical Tagging with Biotin or Digoxigenin
Complementary Strands Melt Apart and Reanneal
Hybridization of DNA or RNA in Southern and Northern Blots
Fluorescence In Situ Hybridization (FISH)
General Properties of Cloning Vectors
Useful Traits for Cloning Vectors
Specific Types of Cloning Vectors
Getting Cloned Genes into Bacteria by Transformation
Constructing a Library of Genes
Screening the Library of Genes by Hybridization
Eukaryotic Expression Libraries
Features of Expression Vectors
Recombineering Increases the Speed of Gene Cloning
Gateway® Cloning Vectors

slide 70:

DNA ISOLATION AND PURIFICATION

Basic to all biotechnology research is the ability to manipulate DNA. First and foremost, for recombinant DNA work, researchers need a method to isolate DNA from different organisms. Isolating DNA from bacteria is the easiest procedure because bacterial cells have little structure beyond the cell wall and cell membrane. Bacteria such as E. coli are the preferred organisms for manipulating any type of gene because of the ease with which DNA can be isolated. E. coli maintain both genomic and plasmid DNA within the cell. Genomic DNA is much larger than plasmid DNA, allowing the two different forms to be separated.

To release the DNA from a cell, the cell membrane must be destroyed. For bacteria, an enzyme called lysozyme digests the peptidoglycan, which is the main component of the cell wall. Next, a detergent such as sodium dodecyl sulfate (SDS) bursts the cell membranes by disrupting the lipid bilayer. For other organisms, disrupting the cell depends on their architecture. Tissue samples from animals and plants have to be ground up to release the intracellular components. Plant cells are mechanically sheared in a blender to break up the tough cell walls, and then the wall tissue is digested with enzymes that break the long polymers of lignin and cellulose into monomers. DNA from the tail tip of a mouse is isolated after proteinase K degrades the tissue and detergent dissolves the cell membranes. Cells cultured in dishes are probably the easiest, since they do not have cell walls or other structures outside their cell membrane. Detergent alone disrupts the cell membrane to release the intracellular components. Every organism or tissue needs slight variations in the procedure for releasing intracellular components, including DNA.

Once released, the intracellular components are separated from the insoluble remains, such as the cellular membranes, bones, cartilage, and/or cell wall, by either centrifugation or chemical extraction.
Centrifugation separates components according to size because heavier or larger molecules sediment at a faster rate than smaller molecules. In addition, materials that are insoluble in the liquid phase form aggregates that sediment to the bottom of a centrifuge tube faster. For example, after the cell wall has been digested, its fragments are smaller than the large DNA molecules. Centrifugation causes the DNA to form a pellet, but the soluble cell wall fragments stay in solution.

Another method of separating cellular components, chemical extraction, uses the properties of phenol to remove unwanted proteins from the DNA. Phenol is an acid that dissolves 60% to 70% of all living matter, especially proteins. Phenol is not very soluble in water, and when it is mixed with an aqueous sample of DNA and protein, the two phases separate, much like oil and water. The protein dissolves in the phenol layer and the nucleic acids in the aqueous layer. The two phases are separated by centrifugation, and the aqueous DNA layer is removed from the phenol.

Once the proteins are removed, the sample still contains RNA along with the DNA. Because RNA is also a nucleic acid, it is not soluble in phenol. Luckily, the enzyme ribonuclease (RNase) digests RNA into ribonucleotides. Ribonuclease treatment leaves a sample of DNA in a solution containing short pieces of RNA and ribonucleotides. When an equal volume of alcohol is added, the extremely large DNA falls out of the aqueous phase and is isolated by centrifugation. The smaller ribonucleotides stay soluble. The DNA is then ready for use in various experiments.

ELECTROPHORESIS SEPARATES DNA FRAGMENTS BY SIZE

Gel electrophoresis is used to separate DNA fragments by size (Fig. 3.1). The gel consists of agarose, a polysaccharide extracted from seaweed that behaves like gelatin. Agarose is a powder that dissolves in water only when heated. After the solution cools, the agarose hardens.
DNA can be isolated by first removing the cell wall and cell membrane components. Next, the proteins are removed by phenol, and finally the RNA is removed by ribonuclease.

slide 71:

FIGURE 3.1 Electrophoresis of DNA
(A) Photo of electrophoresis supplies. The electrophoresis chamber holds an agarose gel in the center portion, and the rest of the tank is filled with buffer solution. The red and black leads are then attached to an electrical source. FisherBiotech Horizontal Electrophoresis Systems, Midigel System, Standard, 13 × 16-cm gel size, 800 mL buffer volume, Model No. FB-SB-1316. (B) Agarose gel separation of DNA. The size of the fragments can be calculated by comparing them to the standard DNA marker in lane 1. The brighter bands in the marker are 1000 base pairs and 500 base pairs, with the 1000 base-pair marker closer to the wells (marked with numbers 1–8). Used with permission from Ghadaksaz et al. (2014). The prevalence of some Pseudomonas virulence genes related to biofilm formation and alginate production among clinical isolates. J Appl Biomed 13(1), 61–68. DOI: 10.1016/j.jab.2014.05.002.

For visualizing DNA, molten agarose is cast in a special tray, where it solidifies into a rectangular slab about 1/4 inch thick. Inserting a comb at one end of the tray before the agarose hardens makes small wells, or holes. After the gel solidifies, the comb is removed, leaving small wells at one end.

Gel electrophoresis uses electric current to separate DNA molecules by size. The agarose slab is immersed in a buffer-filled tank that has a positive electrode at one end and a negative electrode at the other. DNA samples are loaded into the wells, and when an electrical field is applied, the DNA migrates through the gel. The phosphate backbone of DNA is negatively charged, so it moves away from the negative electrode and toward the positive electrode. Polymerized agarose acts as a sieve, with small holes between the tangled chains of agarose. The DNA must migrate through these gaps. Agarose separates the DNA by size because larger pieces of DNA are slowed down more by the agarose.
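Because larger fragments are slowed more than smaller ones, the size of an unknown band is usually estimated from a marker lane. A common working assumption (an approximation that holds only over a limited size range, and not something stated in the text above) is that the logarithm of fragment size falls roughly linearly with migration distance. The sketch below fits that line by least squares; the marker distances are invented for illustration.

```python
import math


def fit_standard_curve(markers):
    """Least-squares fit of log10(size in bp) against migration distance (mm).

    markers: list of (distance_mm, size_bp) pairs read from the marker lane.
    Returns (slope, intercept) of the fitted line.
    """
    xs = [distance for distance, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(markers, distance_mm):
    """Estimate the size (bp) of a band from how far it migrated."""
    slope, intercept = fit_standard_curve(markers)
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical marker lane: (distance from the well in mm, fragment size in bp).
marker_lane = [(10.0, 10000), (20.0, 1000), (30.0, 100)]
print(round(estimate_size(marker_lane, 25.0)))  # a band between the 1000 and 100 bp markers
```

With real gels the fit is better restricted to the marker bands that bracket the unknown band, since the relationship bends at both ends of the size range.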
To visualize the DNA, the agarose gel is removed from the tank and immersed into a solution of ethidium bromide. This dye intercalates between the bases of DNA or RNA, although less dye binds to RNA because it is single-stranded. When the gel is exposed to ultraviolet light, the stained nucleic acid fluoresces bright orange. Since ethidium bromide is a mutagen and carcinogen, less dangerous DNA dyes such as SYBR Safe® are used in most laboratories now. This DNA dye is also excited by ultraviolet light, emitting a green fluorescence. In Figure 3.1, the DNA fragments are visualized by a positively charged dye from the thiazin family. The dye interacts with the negatively charged backbone of the DNA and is a nontoxic alternative that does not require ultraviolet light sources.

The size of DNA being examined affects what type of gel is used. DNA molecules of the same size usually form a tight band, and the size can be determined by comparing to a set of molecular weight standards run in a different well. Because the standards are of known size, the experimental DNA fragment can be compared directly. When DNA samples are separated by size through an agarose gel, DNA fragments from about 200 base pairs to 10,000 base pairs can be separated. For DNA fragments from 50 to 1000 base pairs, polyacrylamide gels are used instead. These gels are able to resolve DNA fragments that vary by only one base pair and are essential to sequencing DNA with the Sanger method (see Chapter 4). For very large DNA fragments (10 kilobases to 10 megabases), agarose is used, but the current is alternated at two different angles. Pulsed field gel electrophoresis (PFGE), as this is

slide 72:

called, allows very large pieces of DNA to migrate further than if the current flows in only one direction. Each change in direction loosens large pieces of DNA that are stuck inside the gel matrix, letting them migrate further. Finally, gradient gel electrophoresis can be used to resolve fragments that are very close in size. A concentration gradient of acrylamide, buffer, or electrolyte can reduce compression (i.e., crowding of similar-sized fragments due to secondary structure) and/or slow the smaller fragments at the lower end of the gel.

Fragments of DNA are separated by size using gel electrophoresis. A current causes the DNA fragments to move away from the negative electrode and toward the positive. As the DNA travels through agarose, the larger fragments get stuck in the gel pores more than the smaller DNA fragments. Pulsed field gel electrophoresis separates large pieces of DNA by alternating the electric current at right angles.

FIGURE 3.2 Type II Restriction Enzymes: Blunt versus Sticky Ends
HpaI is a blunt-end restriction enzyme; that is, it cuts both strands of DNA in exactly the same position. EcoRI is a sticky-end restriction enzyme. The enzyme cuts between the G and A on both strands, which generates four-base overhangs on the ends of the DNA. Since these ends may base pair with complementary sequences, they are considered “sticky.”

Cut by HpaI (blunt ends):
5′-GTTAAC-3′        5′-GTT    AAC-3′
3′-CAATTG-5′        3′-CAA    TTG-5′

Cut by EcoRI (sticky ends):
5′-GAATTC-3′        5′-G        AATTC-3′
3′-CTTAAG-5′        3′-CTTAA        G-5′

RESTRICTION ENZYMES CUT DNA; LIGASE JOINS DNA

The ability to isolate, separate, and visualize DNA fragments would be useless unless some method was available to cut the DNA into fragments of different sizes. Luckily, naturally occurring restriction enzymes, or restriction endonucleases, are the key to making DNA fragments. These bacterial enzymes bind to specific recognition sites on DNA and cut the backbone of both strands.
They evolved to protect bacteria from foreign DNA, such as that of viruses. The enzymes do not cut their own cell’s DNA because they are methylation sensitive; that is, if one of the nucleotide bases in the recognition sequence is methylated, then the restriction enzyme cannot bind and therefore cannot cut the methylated DNA. Bacteria produce modification enzymes that recognize the same sequence as the corresponding restriction enzyme. These methylate each recognition site in the bacterial genome. Therefore, the bacteria can make the restriction enzyme without endangering their own DNA.

Restriction enzymes have been exploited to cut DNA at specific sites, since each restriction enzyme has a particular recognition sequence. Differences in cleavage site determine the type of restriction enzyme. Type I restriction enzymes cut the DNA strand 1000 or more base pairs from the recognition sequence. Type II restriction enzymes cut in the middle of the recognition sequence and are the most useful for genetic engineering. Type II restriction enzymes can either cut both strands of the double helix at the same point, leaving blunt ends, or they can cut at different sites on each strand, leaving single-stranded ends, sometimes called sticky ends (Fig. 3.2). The recognition sequences of Type II restriction enzymes are usually inverted repeats, so that the enzyme cuts between the same bases on both strands. Some commonly used restriction enzymes for biotechnology applications are listed in Table 3.1. Since restriction enzymes recognize a specific nucleic acid sequence, they can also be used to compare the nucleotide sequences of different organisms or individuals (see Box 3.1).

The number of base pairs in the recognition sequence determines the likelihood of cutting. Finding a particular sequence of four nucleotides is much more likely than finding a six base-pair recognition sequence. So to generate fewer, longer fragments, restriction enzymes with six or more base-pair recognition sequences are used.
Conversely, four base-pair enzymes give more, shorter fragments from the same original segment of DNA.

When two different DNA samples are cut with the same sticky-end restriction enzyme, all the fragments will have identical overhangs. This allows DNA fragments from two sources (e.g., two different organisms) to be linked together (Fig. 3.3). Fragments are linked, or ligated, using DNA ligase, the same enzyme that ligates the Okazaki fragments during replication (see Chapter 4).
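The link between recognition-site length and cutting frequency described above can be put into rough numbers. Assuming a random sequence in which all four bases are equally frequent (an idealization that real genomes do not satisfy), a site of n base pairs is expected about once every 4^n base pairs. The sketch below also checks the inverted-repeat property of a site; all function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(site):
    """True if the site reads the same 5' to 3' on both strands."""
    return site == reverse_complement(site)


def expected_spacing(site_length):
    """Average distance in bp between sites in random, equal-base DNA."""
    return 4 ** site_length


def expected_cut_count(dna_length, site_length):
    """Rough expected number of sites in a random sequence of dna_length bp."""
    return dna_length / expected_spacing(site_length)


print(expected_spacing(4))            # 256
print(expected_spacing(6))            # 4096
print(is_inverted_repeat("GAATTC"))   # True (EcoRI site)
print(is_inverted_repeat("GTTAAC"))   # True (HpaI site)
```

Under these assumptions a four base-pair cutter produces fragments averaging a few hundred base pairs, while a six base-pair cutter produces fragments averaging a few thousand.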

slide 73:

The most common ligase used is actually from T4 bacteriophage. Ligase catalyzes linkage between the 3′-OH of one strand and the 5′-PO4 of the other DNA strand. Ligase is much more efficient with overhanging sticky ends but can also link blunt ends, although much more slowly.

Table 3.1 Common Restriction Enzymes

Enzyme    Source Organism                 Recognition Sequence
HpaII     Haemophilus parainfluenzae      C/CGG          GGC/C
MboI      Moraxella bovis                 /GATC          CTAG/
NdeII     Neisseria denitrificans         /GATC          CTAG/
EcoRI     Escherichia coli RY13           G/AATTC        CTTAA/G
EcoRII    Escherichia coli RY13           /CCWGG         GGWCC/
EcoRV     Escherichia coli J62/pGL74      GAT/ATC        CTA/TAG
BamHI     Bacillus amyloliquefaciens      G/GATCC        CCTAG/G
SauI      Staphylococcus aureus           CC/TNAGG       GGANT/CC
BglI      Bacillus globigii               GCCNNNN/NGGC   CGGN/NNNNCCG
NotI      Nocardia otitidis-caviarum      GC/GGCCGC      CGCCGG/CG
DraII     Deinococcus radiophilus         RG/GNCCY       YCCNG/GR

/ = position where the enzyme cuts. N = any base; R = any purine; Y = any pyrimidine; W = A or T.

Restriction enzymes are naturally occurring enzymes that recognize a particular DNA sequence and cut the phosphate backbone. When two pieces of DNA are cut by the same restriction enzyme, the two ends have compatible overhangs that can be reconnected by ligase.

Restriction enzymes are useful for many different applications. Because the DNA sequence is different in each organism, the pattern of restriction sites will also be different. The source of isolated DNA can be identified by this pattern. If genomic DNA is isolated from one organism and cut with one particular restriction enzyme, a specific set of fragments can be separated and identified by electrophoresis. If DNA from a different organism is cut by the same restriction enzyme, a different set of fragments will be generated. This technique can be applied to DNA from two individuals from the same species. Although the DNA sequence differences will be small, restriction enzymes can be used to identify these differences.
If the sequence difference falls in a restriction enzyme recognition site, it gives a restriction fragment length polymorphism (RFLP) (Fig. A). When the restriction enzyme patterns are compared, the number and size of one or two fragments will be affected for each base difference that affects a cut site.

Box 3.1 Restriction Fragment Length Polymorphisms Identify Individuals (continued)

slide 74:

FIGURE A RFLP Analysis
[Figure: two related DNA molecules (I and II) are cut with the same enzyme and run on gels. Molecule I has five cut sites (i–v) and yields fragments a–f. In molecule II one cut site is mutated, so bands c and d are missing and a combined band cd appears.]
DNA from related organisms shows small differences in sequence that cause changes in restriction sites. In the example shown, cutting a segment of DNA from the first organism yields six fragments of different sizes (labeled a–f on the gel). If the equivalent region of DNA from a related organism were digested with the same enzyme, a similar pattern would be expected. Here a single-nucleotide difference is present, which eliminates one of the restriction sites. Consequently, digesting this DNA produces only five fragments. Fragments c and d are no longer seen but form a new band labeled cd.
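The effect described in Box 3.1 can be reproduced with a toy digest. The sketch below is a simplified illustration that searches one strand of a linear molecule for an exact recognition site; the two sequences are invented, differ by a single base inside the second EcoRI site, and the `digest` helper is hypothetical rather than part of any library.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G/AATTC).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(edges, edges[1:])]


ECORI_SITE = "GAATTC"

# Hypothetical sequences from two individuals; the second has one base
# changed (A to G) inside the second EcoRI site, which destroys that site.
individual_1 = "A" * 30 + "GAATTC" + "C" * 50 + "GAATTC" + "T" * 20
individual_2 = "A" * 30 + "GAATTC" + "C" * 50 + "GAGTTC" + "T" * 20

print(digest(individual_1, ECORI_SITE, 1))  # [31, 56, 25]
print(digest(individual_2, ECORI_SITE, 1))  # [31, 81]
```

The 56 and 25 base-pair fragments of the first individual are replaced by a single 81 base-pair fragment in the second, which is the same pattern as bands c and d merging into band cd in Figure A.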

slide 75:

FIGURE 3.3 Compatible Overhangs Are Linked Using DNA Ligase
[Figure: BamHI digests 5′-GGATCC-3′ / 3′-CCTAGG-5′ and BglII digests 5′-AGATCT-3′ / 3′-TCTAGA-5′. The resulting ends, each with a 5′ phosphate and a 3′ hydroxyl, anneal and are joined into one piece by ligase using ATP.]
BamHI and BglII generate the same overhanging, or sticky, ends: a 3′-CTAG-5′ overhang plus a 5′-GATC-3′ overhang. These are complementary and base pair by hydrogen bonding. The breaks in the DNA backbones are sealed by T4 DNA ligase, which hydrolyzes ATP to energize the reaction.

METHODS OF DETECTION FOR NUCLEIC ACIDS

Recombinant DNA methodologies require the ability to detect DNA. One of the easiest ways to detect the amount of DNA or RNA in solution is to measure the absorbance of ultraviolet light at 260 nm (Fig. 3.4). DNA absorbs ultraviolet light because of the ring structures in the bases. Single-stranded RNA and free nucleotides also absorb ultraviolet light. In fact, they

slide 76:

Radioactive Labeling of Nucleic Acids and Autoradiography

Ultraviolet light absorption is a general method for detecting DNA but does not distinguish between different DNA molecules. DNA can also be detected with radioactive isotopes (Fig. 3.5). During replication, radioactive precursors such as 32P in the form of a phosphate group and 35S in the form of phosphorothioate can be incorporated. Because native DNA does not contain sulfur atoms, one of the oxygen atoms of a phosphate group is replaced with sulfur to make phosphorothioate. Most radioactive molecules used in laboratories are short lived. 32P has a half-life of 14 days and 35S has a half-life of about 87 days, so the isotopes decay fairly fast. Although radioactive DNA is invisible, photographic film will turn black when exposed to the radioactive DNA. Radioactively labeled DNA is considered “hot,” whereas unlabeled DNA is considered “cold.”

Autoradiography identifies the location of radioactively labeled DNA in the gel (Fig. 3.6). If the gel is thin, like most polyacrylamide gels, it is dried with heat and vacuum. If the gel is thick, like agarose gels, the DNA is transferred to a nylon membrane using capillary action (see Fig. 3.9 later). The dried gel or nylon membrane is placed next to photographic film. As the radioactive phosphate decays, the radiation turns the photographic film black. Only the areas next to radioactive DNA will have black spots or bands. The use of film detects where the hot DNA is on a gel, and the use of ethidium bromide shows where all of the DNA, hot or cold, is. Together, these two methods allow one DNA fragment to be distinguished from another.

Fluorescence Detection of Nucleic Acids

Autoradiography has its merits, but working with and disposing of radioactive waste are costly, both monetarily and environmentally. Using fluorescently tagged nucleotides was

Radioactive isotopes are incorporated into the DNA backbone during replication.
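The practical meaning of a short half-life for a labeled probe can be sketched with the standard exponential decay relation, fraction remaining = (1/2)^(t / half-life). The example uses the 14-day figure given above for 32P; the function name is illustrative only.

```python
def fraction_remaining(days_elapsed, half_life_days):
    """Fraction of the original radioactivity left after days_elapsed."""
    return 0.5 ** (days_elapsed / half_life_days)


P32_HALF_LIFE_DAYS = 14  # value quoted in the text for 32P

for days in (0, 14, 28, 42):
    left = fraction_remaining(days, P32_HALF_LIFE_DAYS)
    print(f"day {days}: {left:.3f} of the label remains")
```

After three half-lives (six weeks for 32P) only one eighth of the original signal is left, which is why labeled probes are normally prepared shortly before use.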
Autoradiography identifies the radioactively labeled “hot” DNA.

FIGURE 3.4 Determining the Concentration of DNA
[Figure labels: UV source; nucleic acid polymer; free bases spread out and absorb more.]
All nucleic acids absorb UV light via the aromatic rings of the bases. Stacked nucleotides (on the left) absorb less UV than scattered bases (on the right) because of the ordered structure.

absorb more light because their structures are looser. Since the absorbance of UV light depends on the amount of DNA and the molecular structure, the relationship between UV absorbance and concentration is

$$\text{double-stranded DNA concentration } (\mu\text{g/ml}) = \mathrm{OD}_{260} \times \frac{50\ \mu\text{g DNA/ml}}{1\ \mathrm{OD}_{260}\ \text{unit}}$$

$$\text{RNA concentration } (\mu\text{g/ml}) = \mathrm{OD}_{260} \times \frac{40\ \mu\text{g RNA/ml}}{1\ \mathrm{OD}_{260}\ \text{unit}}$$

In addition to the amount of DNA, a second absorbance reading at 280 nm is commonly used to determine the purity of the sample. The ratio of the 260 nm absorbance value divided by the 280 nm absorbance value will indicate whether the sample is pure. If the sample is pure DNA, then the 260/280 ratio is 1.8, whereas the 260/280 ratio for pure RNA is 2.0. When the ratios deviate from the expected value, there could be residual phenol from the purification or a very low concentration of DNA or RNA.

The concentration of DNA or RNA in a liquid can be determined by measuring the absorbance of UV light at 260 nm.
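The absorbance relationships above translate directly into a small calculation. The following sketch uses the conversion factors from the text (50 μg/ml per OD260 unit for double-stranded DNA, 40 μg/ml for RNA); the `dilution_factor` parameter is an added assumption for samples that were diluted before reading, and the function names are illustrative only.

```python
DSDNA_UG_PER_ML_PER_OD260 = 50.0  # double-stranded DNA, from the text
RNA_UG_PER_ML_PER_OD260 = 40.0    # RNA, from the text


def dsdna_concentration(od260, dilution_factor=1.0):
    """Double-stranded DNA concentration in micrograms per ml."""
    return od260 * DSDNA_UG_PER_ML_PER_OD260 * dilution_factor


def rna_concentration(od260, dilution_factor=1.0):
    """RNA concentration in micrograms per ml."""
    return od260 * RNA_UG_PER_ML_PER_OD260 * dilution_factor


def purity_ratio(od260, od280):
    """260/280 ratio: about 1.8 for pure DNA and 2.0 for pure RNA."""
    return od260 / od280


# Hypothetical readings from a sample diluted 10-fold before measurement.
print(dsdna_concentration(0.5, dilution_factor=10))  # 250.0
print(round(purity_ratio(0.9, 0.5), 2))              # 1.8
```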

slide 77:

CHAPTER 3 71

[Figure 3.5 panels: 32P-LABELED DNA; 35S-LABELED DNA]

FIGURE 3.5 Radioactively Labeled DNA
DNA can be synthesized with radioactive precursor nucleotides. These nucleotides have 32P rather than nonradioactive 31P phosphorus, or 35S replacing oxygen in the phosphate backbone.

[Figure 3.6 labels: gel with radioactive but invisible bands of DNA; lay film on gel and keep in dark, then develop film; film shows position of bands]

FIGURE 3.6 Autoradiography
A gel containing radioactive DNA or RNA is dried and a piece of photographic film is laid over the top. The two are loaded into a cassette case that prevents light from entering. After some time (hours to days), the film is developed and dark lines appear where the radioactive DNA was present.

developed as a better method of DNA detection (Fig. 3.7). Fluorescent tags absorb light of one wavelength, which excites the tag and raises its energy state. The excited tag then releases a photon of light at a different, longer wavelength and returns to the ground state. The emitted photon is detected with a photodetector. There are many different fluorescent tags, and each emits a different wavelength of light. Some photodetector systems are sensitive enough to distinguish between these different tags; therefore, if different bases have different fluorescent labels, the photodetector can determine which base is present. This is the basis for most modern DNA sequencing machines (see Chapter 4).
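To make the earlier remark about short-lived isotopes concrete, the fraction of label remaining after a given time follows the standard exponential decay law. This is a small sketch under that assumption; the function name is illustrative, and the 14-day half-life for 32P is the value quoted in the text.

```python
def fraction_remaining(days_elapsed, half_life_days):
    """Fraction of a radioactive label left after `days_elapsed` days.

    Uses the standard decay relation N/N0 = (1/2) ** (t / half-life).
    """
    if half_life_days <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (days_elapsed / half_life_days)


if __name__ == "__main__":
    # 32P, half-life of 14 days as quoted in the text.
    for days in (0, 14, 28, 42):
        print(days, fraction_remaining(days, 14))
```

After two half-lives (28 days for 32P) only a quarter of the original activity is left, which is why labeled probes are typically prepared shortly before use.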

slide 78:

Recombinant DNA Technology 72 Fluorescently labeled nucleotides can be used to incorporate a fuorescent tag on DNA during replication or PCR amplifcation. Fluorescent tag Fluorescence Exciting light beam Excited state Ground state Excitation shorter wavelength photon Fluorescence longer wavelength photon S 1 S 0 S 1 Excited state Relaxation FLUORESCENT TAGGING OF DNA AB ENERGY LEVELS IN FLUORESCENCE DNA ENERGY 1 2 3 FIGURE 3.7 Fluorescent Labeling of DNA A Fluorescent tagging of DNA. During synthesis a nucleotide linked to a fuorescent tag is incorporated at the 3′ end of the DNA. A beam of light excites the fuorescent tag which in turn releases light of a longer wavelength. B Energy levels in fuorescence. The fuorescent molecule attached to the DNA has three different energy levels: S 0 S 1 ′ and S 1 . The S 0 or ground state is the state before exposure to light. When the fuorescent molecule is exposed to a light photon the fuorescent tag absorbs the energy and enters the frst excited state S 1 ′. Between S 1 ′ and S 1 the fuorescent tag relaxes slightly but doesn’t emit any light. Eventually the high-energy state releases its excess energy by emitting a longer wavelength photon. This release of fuorescence returns the molecule to the ground state. Chemical T agging with Biotin or Digoxigenin Biotin is a vitamin and digoxigenin is a steroid from the foxglove plant. Using these two molecules allows scientists to label DNA without radioactivity or costly photodetectors. To incorporate the label into DNA a biotin or digoxigenin molecule is chemically linked to uracil therefore DNA is synthesized with the labeled uracil replacing thymine using in vitro DNA replication as described in Chapter 4. Single-stranded DNA template DNA polymerase a short DNA primer and nucleotides dATP dGTP dCTP plus dUTP linked to either biotin or digoxigenin are mixed in a tube and incubated. 
DNA polymerase synthesizes the comple- mentary strand to the template incorporating biotin- or digoxigenin-linked uracil opposite all the adenines. The labeled DNA is visualized in a two-step process Fig. 3.8. To visualize biotin a mol- ecule of avidin or streptavidin which both have a high affnity for biotin is added to the DNA sample. Avidin originally identifed in egg whites has a higher tendency for aggrega - tion and is glycosylated therefore streptavidin is used more often. Streptavidin is isolated from Streptomyces avidinii and is not glycosylated. In contrast a specifc antibody is used to visualize digoxigenin. Both avidin and the digoxigenin antibody are conjugated to either a fuorophore or reporter enzyme which allow for visible detection. One example of a reporter enzyme is alkaline phosphatase an enzyme that removes phosphates from a variety of sub- strates. Several different chromogenic molecules act as substrates for alkaline phosphatase but the most widely used one is X-Phos. Once alkaline phosphatase removes the phosphate group from X-Phos the intermediate molecule reacts with oxygen and forms a blue precipi- tate. This blue color reveals the location of the labeled DNA. Another substrate of alkaline phosphatase is Lumi-Phos which is chemiluminescent and emits visible light when the phosphate is removed. Much like autoradiography when photographic flm is placed over

slide 79:

labeled DNA treated with Lumi-Phos, the emitted light causes the film to turn dark. Another possible reporter enzyme is horseradish peroxidase (HRP), an enzyme that reacts with luminol to release light. As before, the light can be detected with photographic film.

COMPLEMENTARY STRANDS MELT APART AND REANNEAL

The complementary antiparallel strands of DNA form an elegant molecule that is able to unzip (melt) and come back together (reanneal) (Fig. 3.9). The hydrogen bonds that hold the two strands together are relatively weak. Heating a sample of DNA breaks the hydrogen bonds, resulting in two complementary single strands. If the same sample of DNA is slowly cooled, the two strands reanneal so that G pairs with C and A pairs with T, as before.

The proportion of G/C base pairs affects how much heat is required to melt a double helix of DNA. G/C base pairs have three hydrogen bonds to melt, whereas A/T base pairs have only two. Consequently, DNA with a higher percentage of GC requires more energy to melt than DNA with fewer GC base pairs. The GC ratio is defined as follows:

\[ \text{GC ratio} = \frac{G + C}{A + G + C + T} \times 100 \]

The ability to zip and unzip DNA is crucial to cellular function and has also been exploited in biotechnology. Replication (see Chapter 4) and transcription (see Chapter 2) rely on strand separation to generate new DNA or RNA strands, respectively. In molecular biology research, many techniques, from PCR to DNA sequencing, exploit the complementary nature of DNA strands.

[Figure 3.8 labels: DNA; biotin; streptavidin; digoxigenin; antibody to digoxigenin; fluorophore; reporter enzyme; substrate; light or color or precipitate]

Biotin- and digoxigenin-labeled DNA are detected using either streptavidin or antibody to digoxigenin. Either detector can be conjugated to alkaline phosphatase, which reacts with X-Phos to leave a blue precipitate or with Lumi-Phos to emit visible light.
Another reporter enzyme commonly used is horseradish peroxidase. These reporter enzymes are used to identify, quantify, or locate the labeled DNA.

FIGURE 3.8 Labeling and Detecting DNA with Biotin or Digoxigenin
DNA can be synthesized in vitro with a uracil nucleotide linked to biotin or digoxigenin. Streptavidin binds tightly to biotin (left panels), and antibody to digoxigenin binds to digoxigenin (right panels). Streptavidin and antibody to digoxigenin can be conjugated to a fluorophore that emits light of specific wavelengths (top panels) or to reporter enzymes such as horseradish peroxidase or alkaline phosphatase (lower panels). Reporter enzymes act on different substrates, some of which release light and others that form a colored precipitate.

FIGURE 3.9 Heat Melts DNA; Cooling Reanneals DNA
Hydrogen bonds readily break when heated, leaving the two strands intact but separate. When the temperature returns to normal, the hydrogen bonds form again.
[Figure 3.9 labels: HEAT; COOL SLOWLY; paired and separated strands of an example sequence]
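The GC ratio defined in the melting discussion is straightforward to compute for any sequence. The sketch below is illustrative only: the function names are assumptions, the hydrogen-bond tally simply applies the three-bonds-per-G/C and two-bonds-per-A/T rule stated in the text, and the example sequence is a short stretch read from one strand of Fig. 3.9.

```python
def gc_ratio(sequence):
    """Return 100 * (G + C) / (A + G + C + T) for a DNA sequence."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


def hydrogen_bonds(sequence):
    """Count base-pairing hydrogen bonds: 3 per G/C pair, 2 per A/T pair."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 3 * gc + 2 * at


if __name__ == "__main__":
    strand = "CATATGCG"
    print(gc_ratio(strand), hydrogen_bonds(strand))
```

A sequence with a higher GC ratio has more hydrogen bonds per base pair, which is consistent with the statement that GC-rich DNA needs more energy to melt.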

slide 80:

Recombinant DNA Technology 74 FIGURE 3.10 Capil­ lary Action Transfers DNA from Gel to Membrane Single-stranded DNA from a gel will transfer to the mem- brane. The flter paper wicks buffer from the tank through the gel and membrane and into the paper towels. As the buffer liquid moves the single-stranded DNA also travels from the gel and sticks to the membrane. The weight on top of the setup keeps the membrane and gel in contact and helps wick the liquid from the tank. HYBRIDIZATION OF DNA OR RNA IN SOUTHERN AND NORTHERN BLOTS If two different double helixes of DNA are melted the single strands can be mixed together before cooling and reannealing. If the two original DNA molecules have similar sequences a single strand from one may pair with the opposite strand from the other DNA molecule. This is known as hybridization and can be used to determine whether sequences in two separate samples of DNA or RNA are related. In hybridization experiments the term probe molecule refers to a known DNA sequence or gene that is used to screen the experimental sample or target DNA for similar sequences. Southern blots are used to determine how closely DNA from one source is related to a DNA sequence from another source. The technique involves forming hybrid DNA molecules by mixing DNA from the two sources. A Southern blot has two components: the probe sequence e.g. a known gene of interest from one organism and the target DNA often from a different organism. A typical Southern blot begins by isolating the target DNA from one organism digesting it with a restriction enzyme that gives fragments from about 500 to 10000 base pairs in length and separating these fragments by electrophoresis. The separated fragments will be double-stranded but if the gel is incubated in a strong acid the DNA separates into single strands. Using capillary action the single strands can be transferred to a membrane as shown in Fig. 3.10. The DNA remains single-stranded once attached to the membrane. 
Next the probe is prepared. First the known sequence or gene must be isolated and labeled in some way see earlier discussion. Identifying genes has become easier now that many genomes have been entirely sequenced. For example a scientist can easily obtain a copy of a human gene for use as a probe to fnd similar genes in other organisms. Alternatively using sequence data a unique oligonucleotide probe can be designed that recognizes only the gene of interest see Chapter 4. If an oligonucleotide has a common sequence it will bind to many other sequences. Therefore oligo- nucleotide probes must be long enough to have sequences that bind to only one or very few specifc sites in the target genome. To prepare DNA probes for a Southern blot they are labeled using radioactivity bio- tin or digoxigenin see earlier discussion. Finally the labeled DNA is denatured at high temper- ature to make it single-stranded. Synthetic oligonucleotides do not require treatment as they are already single-stranded. To perform a Southern blot the single-stranded probe is incubated with the membrane carrying the single-stranded target DNA Fig. 3.1 1. These are incubated at a temperature that allows hybrid DNA strands to form with only a low amount of mismatch. The temperature and hence the level of mismatching tolerated can be varied depending on how closely identical the probe and target sequence are expected to be. At a high temperature the probe will only Weight to press down on gel Stack of paper towels Membrane Gel Filter paper The complementary strands of DNA are easily separated by heat and spontaneously reanneal as the DNA mixture cools.

slide 81:

CHAPTER 3 75 stay attached at locations with almost identical sequences whereas at a low temperature the probe will bind locations with multiple mismatched nucleotides. If the probe is radioactive then the membrane is exposed to photographic flm. If the probe is labeled with biotin or digoxigenin the membrane may be treated with chemiluminescent substrate to detect the labeled probe and target DNA hybrid and then exposed to photographic flm. Dark bands on the flm reveal the positions of fragments with similar sequence to the probe. Alterna - tively biotin or digoxigenin labels may be visualized by treatment with a chromogenic substrate. In this case blue bands will form directly on the membrane at the position of the related sequences. Northern blots are also based on nucleic acid hybridization. The difference is that RNA is the target in a Northern blot. The probe for a Northern blot is either a fragment of a gene or a unique oligonucleotide just as in a Southern blot. The target RNA is usually messenger RNA. In eukaryotes screening mRNA is more effcient because genomic DNA has many introns which may interfere with probes binding to the correct sequence. Besides mRNA is already single-stranded so the agarose gel does not have to be treated with strong acid. Much like a Southern blot Northern blots begin by separating mRNA by size using electrophoresis. The mRNA is transferred to a nylon membrane and incubated with a sin- gle-stranded labeled probe. As before the probe can be labeled with biotin digoxigenin or radioactivity. The membrane is processed and exposed to flm or chromogenic substrate. A variation of these hybridization techniques is the dot blot Fig. 3.12. Here the target sample is not separated by size. The DNA or mRNA target is simply attached to the nylon membrane as a small dot. As in Southern blots the DNA sample must be made single-stranded before it is attached to the membrane. 
As before the dot-blot membrane is allowed to hybridize with a labeled probe. The membranes are processed and exposed to flm. If the dot of DNA or mRNA contains a sequence similar to the probe the flm will turn black in that area. Dot blots are a quick and easy way to determine if the target sample has a related sequence before more detailed analysis by Southern or Northern blotting. Another advantage of dot blots is that multiple samples can be processed in a smaller amount of space. Loading slots LOAD AND RUN GEL TRANSFER Invisible bands NYLON SHEET SOUTHERN BLOT DIP IN PROBE SOLUTION PHOTOGRAPHIC FILM PLACE FILM OVER BLOT FIGURE 3.11 Hybrid DNA Molecules Can Detect Related Sequences in Southern Blots Southern blotting requires the target DNA to be cut into smaller fragments and run on an agarose gel. The fragments are denatured chemically to give single strands and then transferred to a nylon membrane. A radioactive probe also single-stranded is incubated with the mem- brane at a temperature that allows hybrids with some mismatches to form. When photographic flm is placed over the top of the membrane the location of the radioactive hybrid molecules is revealed. Southern blots form hybrid DNA molecules to determine if a sample of DNA has a homologous sequence to another DNA probe. Northern blots determine if a sample of mRNA has a homologous sequence to a DNA probe. In large genomes using mRNA is more effcient because all the introns are removed. FIGURE 3.12 Dot Blot Dot blots begin by spotting DNA or RNA samples onto a nylon membrane. Often different concentrations of the sample are dotted side by side. The membrane is incubated with a radioactive probe and then exposed to photographic flm. Samples that contain DNA or RNA complementary to the probe will leave a black spot on the flm. Different samples in each row Dot different concentrations of sample 1 2 3 4 5 DOT ssDNA OR mRNA DOT BLOT EXPOSE TO FILM DIP IN PROBE SOLUTION 1 2 3 4 5 PHOTOGRAPHIC FILM
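The requirement that an oligonucleotide probe be long enough to bind only one or a few sites can be estimated with a rough calculation. The sketch below assumes a random target sequence with all four bases equally frequent and counts one strand only; real genomes are not random, so the result is a lower bound rather than a design rule. The function names and the genome sizes used in the example are illustrative assumptions.

```python
def expected_random_sites(genome_bp, probe_length):
    """Expected number of exact matches to a probe in a random sequence.

    ASSUMPTION: each position matches with probability (1/4) ** probe_length.
    """
    return genome_bp / 4 ** probe_length


def minimum_unique_length(genome_bp):
    """Smallest probe length whose expected random match count is below 1."""
    length = 1
    while 4 ** length <= genome_bp:
        length += 1
    return length


if __name__ == "__main__":
    # Round, illustrative genome sizes (assumptions, not taken from the text).
    for name, size in (("bacterial-scale", 4_600_000),
                       ("mammalian-scale", 3_000_000_000)):
        print(name, minimum_unique_length(size))
```

Under these assumptions a probe needs roughly 12 bases to be unique in a bacterial-scale genome and about 16 in a mammalian-scale one; practical probes are usually longer to allow for repeated and related sequences.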

slide 82:

FLUORESCENCE IN SITU HYBRIDIZATION (FISH)

The previously discussed hybridization techniques rely on purified DNA or RNA run in an agarose gel. In fluorescence in situ hybridization (FISH), the probe is hybridized directly to DNA or RNA within the cell (Fig. 3.13). As described earlier, the probe is a small segment of DNA that has been labeled with fluorescent tags in order to be visualized. The target DNA or RNA is located within the cell and requires some special processing. The target cells may be extremely thin sections of tissue from a particular organism. For example, when a person has a biopsy, a small piece of tissue is removed for analysis. This tissue is preserved and then cut into extremely thin sections to be analyzed under a microscope. These can be used to determine the presence of a gene with FISH. Another source of target cells for FISH is cultured mammalian or insect cells (see Chapter 1). Additionally, blood can be collected and processed to isolate the white blood cells. (Note: Red blood cells do not contain a nucleus and therefore do not contain DNA.) Chromosomes from white blood cells can be isolated

[Figure 3.13 labels: fluorescent DNA probe; fluorescent tag; denature, mix, and anneal; cell nucleus; view by fluorescence microscope; chr17 map around TP53 showing deletions associated with developmental delay and deletions associated with early-onset cancer; FISH images for father, mother, and child (17p13.1 del)]

FIGURE 3.13 Gene Location on Chromosomes by FISH
(A) FISH can localize a gene to a specific place on a chromosome. First, metaphase chromosomes are isolated and attached to a microscope slide. The metaphase DNA is denatured into single-stranded pieces that remain attached to the slide. The fluorescently labeled DNA probe hybridizes to the corresponding gene. When the slide is illuminated, the hybrid molecules fluoresce and reveal the location of the gene of interest.
B A cell with intact DNA in its nucleus is treated to denature the DNA into single strands. The fuorescently labeled DNA probe is added and anneals to the corresponding sequence inside the nucleus. The hybrid molecule will fuoresce when the fuorescent tag is excited by the correct wavelength of light and identifes the location of the gene in the nucleus. C An analysis of chromo - some structure using TP53 red and 17ptel green probes. Normal human DNA has two copies of the region of DNA complementary to both the TP53 and 17ptel probes. Note: Metaphase chromosomes have four copies because the DNA has been replicated. The parents have a normal DNA structure for these two probe sequences. In comparison the child has only two red TP53 spots on one chromosome and the other chromosome has no red spots suggesting this region of one of the child’s chromosomes is deleted. The deletion is visible in both metaphase chromosomes top and interphase nuclei bottom. From Shlien et al. 2010. A common molecular mechanism underlies two phenotypically distinct 17p13.1 microdeletions syndromes. Am J Hum Gen 87 631–642.

slide 83:

and dropped onto a glass slide. FISH can be done on either interphase or metaphase chromosomes. Whether the target DNA is from blood, cells cultured in dishes, or actual tissue sections, the cells must be heated to make the DNA single-stranded. Samples where RNA is the target do not need to be heated. The fluorescently labeled probe hybridizes to complementary sequences in the DNA or RNA, and when the cells are illuminated at the appropriate wavelength, the probe location on the chromosome can be identified by fluorescence.

GENERAL PROPERTIES OF CLONING VECTORS

Cloning vectors are specialized plasmids or other genetic elements that will hold any piece of foreign DNA for further study or manipulation. The numbers and types of plasmids available for cloning have grown. In addition, other DNA elements are now used, including viruses and artificial chromosomes. Once a fragment of DNA has been cloned and inserted into a suitable vector, large amounts of DNA can be manufactured, the sequence can be determined, and any genes in the fragment can be expressed in other organisms. Studying human genes in humans is virtually impossible because of the ethical ramifications. In contrast, studying a human gene expressed in bacteria provides useful information that can often be applied to humans. Modern biotechnology depends on the ability to express foreign genes in model organisms. Before discussing how a gene is cloned, the properties of vectors are considered.
Useful Traits for Cloning Vectors

Although many specialized vectors now exist, the following properties are convenient and found in most modern generalized cloning plasmids:
• Small size, making them easy to manipulate once they are isolated
• Easy to transfer from cell to cell (usually by transformation)
• Easy to isolate from the host organism
• Easy to detect and select
• Multiple copies, which helps in obtaining large amounts of DNA
• Clustered restriction sites (polylinker) to allow insertion of cloned DNA
• Method to detect presence of inserted DNA (e.g., alpha complementation)

Most bacterial plasmids satisfy the first three requirements. The next key trait of cloning vectors is an easy way to detect their presence in the host organism. Bacterial cloning plasmids often have antibiotic resistance genes that make bacteria resistant to particular antibiotics. When treated with the antibiotic, only bacteria with the plasmid-encoded resistance gene will survive. Other bacteria will die. Other traits have been exploited to detect plasmids. Vectors derived from the yeast 2μ plasmid often carry genes for synthesizing essential amino acids such as leucine, which allow yeast with mutations in leucine biosynthesis to grow on media lacking leucine.

Plasmids vary in their copy number. Some plasmids exist in just one or a few copies in their host cells, whereas others exist in multiple copies. Such multicopy plasmids are, in general, more useful because the amount of plasmid DNA is higher, making them easier to isolate and purify. The type of origin of replication controls the copy number, since this region on the plasmid determines how often DNA replication is initiated.

Most cloning vectors have several unique restriction enzyme sites. Usually these sites are grouped in one location called the multiple cloning site (MCS) or polylinker (Fig. 3.14). This allows

FISH is a technique in which a labeled probe is incubated with cells that have had their DNA denatured by heat.
The probe hybridizes to its homologous sequence on the chromosome.

slide 84:

researchers to open the cloning vector at one site without disrupting any of the vector’s replication genes. Fragments of foreign DNA are digested with enzymes matching those in the polylinker. Ligase connects the vector and insert. Specific restriction enzyme sites can be added using PCR primers or synthetic DNA oligomers (see Chapter 4).

Some vectors have ways to detect whether or not they contain an insert. The simplest way to do this is insertional inactivation of an antibiotic resistance gene (Fig. 3.15A). Here the vector has two different antibiotic resistance genes. The foreign DNA is inserted into one of the antibiotic resistance genes. Thus the host bacteria will be resistant to one antibiotic and sensitive to the other. Alternatively, alpha complementation may be used (see Fig. 3.15B). The vector has a short portion of the β-galactosidase gene (the alpha fragment), and the bacterial chromosome has the rest of the gene. If both gene fragments are transcribed and then translated into proteins, the partial proteins combine to form functional β-galactosidase. If DNA is inserted into the plasmid-borne gene segment, the encoded subunit is not made and β-galactosidase is not produced. When β-galactosidase is expressed, the bacteria can degrade X-gal, which turns the bacterial colony blue. If a piece of DNA is inserted into the alpha fragment gene, the bacteria cannot split X-gal and they stay white.

Once an appropriate vector has been chosen for the gene of interest or other insert, the two pieces are ligated into one construct. The term construct refers to any recombinant DNA molecule that has been assembled by genetic engineering. If both the vector and insert are cut with the same restriction enzyme, the two pieces have complementary ends and require only ligase to link them. Tricks are used to make two pieces of DNA with unrelated ends compatible. Sometimes short oligonucleotides are synthesized and added onto the ends of the insert to make them compatible with the vector. These short oligonucleotides are called linkers, and they add one or a few new restriction enzyme sites to the ends of a segment of DNA. In addition to adding linkers, ends of DNA fragments can be made compatible with a vector multicloning site by PCR amplification. The primers can have extensions of DNA sequence containing the recognition site for a restriction enzyme. After PCR amplification, the DNA can be cut with the restriction enzyme and ligated into the vector (see Chapter 4 for discussion).

SPECIFIC TYPES OF CLONING VECTORS

Because E. coli is the main host organism used for manipulating DNA, most vectors are based on plasmids or viruses that can survive in E. coli or similar bacteria. Most vectors have

FIGURE 3.14 Typical Polylinker or Multiple Cloning Site
The restriction enzyme sites within the polylinker region are unique. This ensures that the plasmid is cut only once by each restriction enzyme.
[Figure 3.14 labels: plasmid; recognition sites for restriction enzymes 1 to 7]

Cloning vectors have multiple cloning sites with many unique restriction enzyme sites, they have genes for antibiotic resistance that make the bacterial cell able to grow with the antibiotic present, and they have a way to detect when a foreign piece of DNA is present, such as alpha complementation.
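The point made in Fig. 3.14, that each polylinker site should cut the plasmid only once, can be checked computationally for a known sequence. The sketch below is a simplified illustration: the plasmid sequence is a made-up toy example, the helper names are hypothetical, and only three well-known recognition sites are included (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT). Because plasmids are circular, the search also covers sites that span the arbitrary start of the written sequence.

```python
# Recognition sequences of three commonly used restriction enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def count_sites(plasmid, site):
    """Count occurrences of `site` in a circular plasmid sequence."""
    seq = plasmid.upper()
    # Append the first len(site) - 1 bases so matches across the origin count.
    extended = seq + seq[: len(site) - 1]
    count = 0
    position = extended.find(site)
    while position != -1:
        count += 1
        position = extended.find(site, position + 1)
    return count


def unique_cutters(plasmid, sites=RECOGNITION_SITES):
    """Names of enzymes that cut the circular plasmid exactly once."""
    return sorted(name for name, site in sites.items()
                  if count_sites(plasmid, site) == 1)


if __name__ == "__main__":
    # Toy sequence (hypothetical): one BamHI site, two HindIII sites, and
    # one EcoRI site that spans the join between the end and the start.
    toy_plasmid = "TTCAAAGGATCCTTTAAGCTTCCCAAGCTTTTTGAA"
    print(unique_cutters(toy_plasmid))
```

In this toy case HindIII would be a poor choice for cloning because it cuts twice, whereas BamHI and EcoRI each open the circle at a single position.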

slide 85:

CHAPTER 3 79 bacterial origins of replication and antibiotic resistance genes. The polylinker or multiple cloning site is usually placed between prokaryotic promoter and terminator sequences Fig. 3.16A. Some vectors may also supply the ribosome binding site so any inserted cod- ing sequence will be expressed as a protein. Many other features are present in specialized cloning vectors. The following discussion will introduce some of the different categories of vectors with their essential features. Many yeast vectors are based on the 2μ circle of yeast. The native version of the 2μ circle has been modifed in a variety of ways for use as a cloning vector. A shuttle vector contains origins of replication for two organisms plus any other sequences necessary to survive in either organism see Fig. 3.16B. Shuttle vectors that are based on the 2μ plasmid have the components needed for survival in yeast and bacteria plus antibiotic resistance and a polylinker. The Cen sequence is a eukaryotic centromere Cen sequence that keeps the plasmid in the correct location during mitosis and meiosis in yeast. Because yeast cells are eukaryotic and also have such a thick cell LIGATE INSERT INTO ANTIBIOTIC RESISTANCE GENE 2 CELLS CARRYING VECTOR ARE RESISTANT TO BOTH ANTIBIOTICS INSERTIONAL INACTIVATION ALPHA COMPLEMENTATION CELLS CARRYING VECTOR WITH INSERT ARE RESISTANT TO FIRST ANTIBIOTIC ONLY Antibiotic resistance gene 1 Antibiotic resistance gene 2 MCS Cut site Insert Vector Vector plus insert Plasmid α fragment combines with rest of LacZ protein to form active β-galactoside β-gal metabolizes X-gal to form blue dye Chromosome lacZ gene TRANSCRIPTION AND TRANSLATION lacZα Plasmid Νο α fragment so β-gal is inactive β-gal cannot metabolize X-gal and bacteria stay white Chromosome lacZ gene TRANSCRIPTION AND TRANSLATION lacZα is split A B FIGURE 3.15 Detecting Inserts in Plasmids A Insertional inactivation. Cells with an insert become sensitive to the second antibiotic. 
Cells without an insert remain resistant to both antibiotics. (B) Alpha complementation. Alpha complementation refers to the ability of β-galactosidase to be expressed as two protein fragments, which assemble to form a functional protein. In cells without an insert in the plasmid, β-galactosidase is active and splits X-gal to form a blue dye. In cells with an insert, the alpha fragment is not made and β-galactosidase is inactive. These cells remain white on media with X-gal.
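The two insert-detection schemes of Fig. 3.15 amount to simple decision rules, which can be written out explicitly. This is only a sketch of the logic described in the caption; the function names are hypothetical, and the colony-color function assumes the plates also contain the antibiotic to which the plasmid confers resistance, so cells without a plasmid do not grow.

```python
def insertional_inactivation(has_plasmid, has_insert):
    """Resistance pattern when the insert disrupts resistance gene 2."""
    return {
        "antibiotic_1": has_plasmid,
        "antibiotic_2": has_plasmid and not has_insert,
    }


def xgal_colony_color(has_plasmid, has_insert):
    """Colony color on X-gal plates under alpha complementation.

    ASSUMPTION: the plates also contain the selective antibiotic, so cells
    without the plasmid do not form colonies (returned as None).
    """
    if not has_plasmid:
        return None
    return "white" if has_insert else "blue"


if __name__ == "__main__":
    for plasmid, insert in ((False, False), (True, False), (True, True)):
        print(plasmid, insert,
              insertional_inactivation(plasmid, insert),
              xgal_colony_color(plasmid, insert))
```

In both schemes the colonies of interest are the ones that have lost a function: sensitivity to the second antibiotic, or white rather than blue color on X-gal.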

slide 86:

Recombinant DNA Technology 80 att SHUTTLE VECTOR Antibiotic resistance gene Origin and replication genes for bacteria Yeast origin of replication Leucine biosynthesis gene for selection in yeast Yeast centromere sequence Multiple cloning site CLONING PLASMID Bacterial promoter Bacterial terminator MCS Origin of replication bacterial LacZα gene Gene for antibiotic resistance TYPICAL BACTERIAL VECTOR YEAST SHUTTLE VECTOR LAMBDA REPLACEMENT VECTOR cos 0 kb Head and tail proteins Regulation and DNA synthesis Nonessential region may be replaced by cloned DNA Integration and recombination kb 10 20 30 40 48 cos Head Tail IN VITRO PACKAGING INFECTION cos end Linear form 50 kb DNA insert ori cos cos AmpR COSMID DIGEST WITH BamHI TO LINEARIZE MCS cen ARS Amp ori Tel Tel BamHI BamHI Yeast selectable marker Telomere Bacterial origin of replication Selectable marker for bacteria Centromere sequence Selectable marker for yeast Autonomously replicating sequence yeast origin Multiple cloning site Circular Plasmid form will replicate in bacteria Linear YAC form will replicate in yeast ARTIFICIAL CHROMOSOME cen Amp ori Tel Tel ARS MCS AB C E. coli cell Circular form Bacterial chromosome cos DE FIGURE 3.16 Various Cloning Vectors A Typical bacterial cloning vector. This vector has bacterial sequences to initiate replication and transcription. In addition it has a multiple cloning site embedded within the lacZ α gene so that the insert can be identifed by alpha-complementation. The antibiotic resistance gene allows the researcher to identify any E. coli cells that have the plasmid. B Yeast shuttle vector. This vector can survive in either bacteria or yeast because it has both yeast and bacterial origin of replication a yeast centromere and selectable markers for yeast and bacteria. As with most cloning vectors there is a polylinker. C Lambda replacement vectors. Because lambda phage is easy to grow and manipulate its genome has been modifed to accept foreign DNA inserts. 
The region of the genome shown in green is nonessential for lambda growth and packaging. This region can be replaced with large inserts of foreign DNA, up to about 23 kb. (D) Cosmids. Cosmids are small multicopy plasmids that carry cos sites. They are linearized and cut so that each half has a cos site (not shown). Next, foreign DNA is inserted to relink the two halves of the cosmid DNA. This construct is packaged into lambda virus heads and used to infect E. coli. (E) Artificial chromosomes. Yeast artificial chromosomes have two forms: a circular form for growing in bacteria and a linear form for growing in yeast. The circular form is maintained like any other plasmid in bacteria, but the linear form must have telomere sequences to be maintained in yeast. The linear form can hold up to 2000 kb of cloned DNA and is very useful for genomics research.

slide 87:

CHAPTER 3 81 wall most antibiotics do not kill yeast. Therefore a different strategy is used to detect the presence of plasmids in yeast. A gene for synthesis of an amino acid such as leucine allows strains of yeast that require leucine to grow. Bacteriophage vectors are viral genomes that have been modifed so that large pieces of nonviral DNA can be pack - aged in the virus particle. Lambda bacteriophages have linear genomes with two cohesive ends—cos sequences lambda cohesive ends. These are 12-base overlapping sticky ends. When inside the virus coat the cohesive ends are coated with protein to prevent them from annealing. After lambda attaches to E. coli it inserts just the linear DNA. The proteins that protect the cohesive ends are lost and the genome circu- larizes with the help of DNA ligase. The circular form is the replicative form RF and it replicates by the rolling circle mechanism see Chapter 4. Expression of various lambda genes produces the proteins that assemble into protein coats. Each coat is packaged with one genome and after many of these are assembled the E. coli host explodes releasing the new bacteriophage to infect other cells. The lambda bacteriophage is a widely used cloning vector see Fig. 3.16C. The middle segment of the lambda genome has been deleted and a polylinker has been added. An insert of 37 to 52 kb can be ligated into the polylinker and packaged into viral particles. To work with the bacteriophage DNA without killing the entire E. coli culture the researcher deletes one or more genes necessary for packaging. When the researcher wants to form fully packaged bacteriophages coat proteins from helper virus can be added Fig. 3.17. The helper viruses do not con- tain foreign DNA but supply the missing genes for the coat proteins. Because coat proteins self-assemble in vitro helper lysates are mixed with recombinant lambda DNA and complete virus particles containing DNA are produced. This is known as in vitro packaging. 
Cosmid vectors can hold pieces of DNA up to 45 kb in length (see Fig. 3.16D). These are highly modified lambda vectors with all the sequences between the cos sites removed and replaced with the insert. The DNA of interest is ligated between the two cos sites using restriction enzymes and ligase. This construct is packaged into a lambda particle produced by helper phage (see Fig. 3.17), and the particles are then used to infect E. coli.
Artificial chromosomes hold the largest pieces of DNA (see Fig. 3.16E). These include yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), and P1 bacteriophage artificial chromosomes (PACs). They are used to contain lengths of DNA from 150 kb to 2000 kb. YACs hold the largest amount of DNA, up to about 2000 kb. YACs have yeast centromeres and yeast telomeres for maintenance in yeast. BACs can be circularized and grown in bacteria; therefore, they have a bacterial origin of replication and antibiotic resistance genes.
FIGURE 3.17 In Vitro Packaging. A lambda cloning vector containing cloned DNA must be packaged in a phage head before it can infect E. coli. First, one culture of E. coli cells is infected with a mutant lambda that lacks the gene for one of the head proteins, called E. A different culture of E. coli is infected with a different mutant, which lacks the phage head protein D. The two cultures are induced to lyse, which releases the tails, assembly proteins, and head proteins, but no complete heads because of the missing proteins. When these are mixed with a lambda replacement vector, the three spontaneously form complete viral particles containing DNA. These are then used to infect E. coli.
Labels: λE amber lysogen (tails, assembly proteins, protein D; no pre-heads due to lack of E); λD amber lysogen (tails, assembly proteins, protein E; no pre-heads due to lack of D); induce λ lysis; mix with lambda DNA with insert; successful packaging (head, tail).
Many different cloning vectors are available to biotechnology research.
The smaller genes are studied using bacterial plasmids or shuttle vectors, whereas the larger genes are studied in bacteriophage vectors, cosmids, and artificial chromosomes. Shuttle vectors have sequences that enable them to survive in two different organisms, such as yeast and bacteria.
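The size limits quoted in this section can be collected into a small lookup. The following Python sketch is only an illustration: the names `VECTOR_CAPACITY_KB` and `candidate_vectors` are hypothetical, the upper limits are the approximate figures given in the text and in the Fig. 3.16 caption, and the lower bound of 0 kb for lambda and cosmid vectors is an assumption made for simplicity.

```python
# Approximate insert-size ranges in kb, taken from the figures quoted in this
# chapter. The 0 kb lower bounds are a simplifying assumption, not a stated fact.
VECTOR_CAPACITY_KB = {
    "lambda replacement vector": (0, 23),
    "cosmid": (0, 45),
    "artificial chromosome (YAC/BAC/PAC)": (150, 2000),
}


def candidate_vectors(insert_kb):
    """Return the vector classes whose quoted size range includes insert_kb."""
    return [
        name
        for name, (low, high) in VECTOR_CAPACITY_KB.items()
        if high >= insert_kb >= low
    ]


if __name__ == "__main__":
    for size in (20, 40, 100, 500):
        print(size, "kb ->", candidate_vectors(size) or "no class quoted for this size")
```

Sizes that fall between the quoted ranges (for example 100 kb) return an empty list, which simply reflects that the chapter gives no figure for them.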

slide 88:

GETTING CLONED GENES INTO BACTERIA BY TRANSFORMATION
Once the gene of interest is cloned into a vector, the construct can be put back into a bacterial cell through a process called transformation (Fig. 3.18; see Box 3.2). Here the DNA construct is mixed with competent E. coli cells. To make the cells competent, that is, able to take up DNA, the cell wall must be temporarily opened up. E. coli cells are mixed with calcium ions on ice and then shocked at a higher temperature, such as 42°C, for a few minutes, which destabilizes the membrane and cell wall. Most of the cells die during the treatment, but some survive and take up the DNA. Another method to make E. coli cells competent is to expose them to a high-voltage shock. This method, called electroporation, opens the cell wall, allowing the DNA to enter. It is much faster and more versatile, and it is used for other types of bacteria as well as yeast.
Bacteria can carry different plasmids, but the plasmids must have different origins of replication. If there are two different plasmids with the same origin of replication, one of the two will be lost during bacterial replication. For example, if genes A and B are both cloned into the same kind of vector and both cloned genes get into the same bacterial cell, the bacteria will lose one plasmid and keep the other. This is due to plasmid incompatibility, which prevents one bacterial cell from harboring two of the same type of plasmid. Incompatibility stems from conflicts between two plasmids with identical or related origins of replication. Only one is allowed to replicate in any given cell. If a researcher wants a cell to have two cloned genes, then two different types of plasmids could be used, or both genes could be put onto the same plasmid.
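The incompatibility rule can be expressed as a simple check. In this Python sketch the function `incompatible_pairs` and the plasmid names are hypothetical, and the origin labels (ColE1, p15A) are used only as examples of two different origin types. The sketch compares origin labels for equality, so it does not capture origins that are merely related.

```python
from itertools import combinations


def incompatible_pairs(plasmids):
    """Return pairs of plasmids that share an origin of replication.

    plasmids maps a plasmid name to a label for its origin of replication.
    Two plasmids with the same label are treated as incompatible.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(plasmids), 2)
        if plasmids[a] == plasmids[b]
    ]


if __name__ == "__main__":
    # Genes A and B cloned into the same kind of vector: one plasmid will be lost.
    print(incompatible_pairs({"pGeneA": "ColE1", "pGeneB": "ColE1"}))
    # Genes A and B on vectors with different origins: both can be maintained.
    print(incompatible_pairs({"pGeneA": "ColE1", "pGeneB": "p15A"}))
```

An empty result means the plasmid set can, under this simplified rule, be maintained together in one cell.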
FIGURE 3.18 Transformation. Bacterial cells are able to take up DNA, such as recombinant plasmids, by incubation with metal ions such as Ca2+ on ice. This destabilizes the bacterial cell wall, and after a brief heat shock, some of the bacteria take up the DNA or plasmid. If the DNA integrates in the chromosome, the recombinant cell will express any genes found on the DNA. Whole plasmids can also be taken up by a bacterial cell, and these exist as extrachromosomal elements with their own origin of replication.
Labels: original bacterial cell (chromosome); destroy cell and purify DNA; fragments of DNA; add DNA to recipient cell; transformed cell; recombination; recombinant cell.
Box 3.2 Discovery of Recombinant DNA
In 1972, two researchers met at a conference in Hawaii to discuss plasmids, the small rings of extrachromosomal DNA found in bacteria. Herbert W. Boyer, PhD, was a faculty member at the University of California, San Francisco, and he was studying restriction and modification enzymes. He had just presented his research on EcoRI. Stanley N. Cohen, MD, was a faculty member at Stanford, and he was interested in how plasmids could confer resistance to different antibiotics. His lab perfected laboratory transformation of Escherichia coli using calcium chloride to permeabilize the cells. After the talks ended, the two met over corned beef sandwiches and combined their two ideas. They isolated different fragments of DNA from animals, other bacteria, and viruses and, using restriction enzymes, ligated the fragments into a small plasmid from E. coli. This was the first recombinant DNA made. Finally, they transformed the engineered plasmid back into E. coli. The cells expressed the normal plasmid genes as well as those inserted into the plasmid artificially. This sparked the revolution in genetic engineering, and since then every biotechnology lab has used some variation of their technique. Boyer and Cohen applied for a patent on recombinant DNA technology. In fact, Boyer cofounded Genentech with Robert Swanson, a venture capitalist.
Genentech was one of the first biotechnology companies in the United States, and under Boyer and Swanson the company produced human somatostatin in E. coli.

slide 89:

CONSTRUCTING A LIBRARY OF GENES
Gene libraries are used to find new genes, to sequence entire genomes, and to compare genes from different organisms (Fig. 3.19). Gene libraries are made when the entire DNA from one particular organism is digested into fragments using restriction enzymes, and then each of the fragments is cloned into a vector and transformed into an appropriate host. The basic steps used to construct a library are as follows:
1. Isolate the chromosomal DNA from an organism such as E. coli, yeast, or humans.
2. Digest the DNA with one or two different restriction enzymes.
3. Linearize a suitable cloning vector with compatible restriction enzyme sites.
4. Mix the cut chromosome fragments with the linearized vector and ligate.
5. Transform this mixture into E. coli.
6. Isolate large numbers of E. coli transformants.
FIGURE 3.19 Creating a DNA Library. Genomic DNA from the chosen organism is first partially digested with a restriction enzyme that recognizes a four base-pair sequence. Partial digestions are preferred because some of the restriction enzyme sites are not cut and larger fragments are generated. If every recognition site were cut by the restriction enzyme, then the genomic DNA would not contain many whole genes. The genomic fragments are cloned into an appropriate vector and transformed and maintained in bacteria.
Labels: partial digestion with 4-base specific restriction enzyme (cut sites); mixture of fragments, some still with cut sites; clone fragments into plasmid vector; transform plasmids into bacteria; bacterial colonies each carrying a different cloned fragment of DNA.
The type of restriction enzyme affects the type of library. Because restriction sites are not evenly spaced in the genome, some inserts will be large and others small. Using a restriction enzyme that recognizes only four base pairs will give a mixture of mostly small fragments, whereas a restriction enzyme that has a six or eight base-pair recognition sequence will generate larger fragments. Note that finding a particular four base-pair sequence in a genome is more likely

slide 90:

than finding a six base-pair sequence. Even if an enzyme that recognizes a four base-pair recognition sequence is used to digest the entire genome, these sites are not equally spaced throughout the genome of an organism. The digested genome will contain some segments too large to be cloned and some segments too small. To avoid cutting pieces too small, partial digestion is often used. The enzyme is allowed to cut the DNA for only a short time, and many of the restriction enzyme sites are not cut, leaving larger pieces for the library. In addition, it is customary to construct another library using a different restriction enzyme.
SCREENING THE LIBRARY OF GENES BY HYBRIDIZATION
Once the library is assembled, researchers often want to identify a particular gene or segment of DNA within the library. Sometimes the gene of interest is similar to one from another organism. Sometimes the gene of interest contains a particular sequence. For example, many enzymes use ATP to provide energy. Enzymes that bind ATP share a common signature sequence whether they come from bacteria or humans. This sequence can be used to find other enzymes that bind ATP. Such common sequence motifs may also suggest that a protein will bind various cofactors, other proteins, or DNA, to name a few examples.
Screening DNA libraries by hybridization requires preparing the library DNA and preparing the labeled probe. A gene library is stored as a bacterial culture of E. coli cells, each having a plasmid with a different insert. The culture is grown up, diluted, and plated onto many different agar plates so that the colonies are spaced apart from one another. The colonies are transferred to a nylon filter, and the DNA from each colony is released from the cells by lysing them with detergent. The cellular components are rinsed from the filters. The DNA sticks to the nylon membrane and is then denatured to form single strands (Fig. 3.20).
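The library-construction reasoning about four versus six base-pair recognition sites, and about partial digestion, can be sketched numerically. The Python below is a rough illustration under stated assumptions: it treats the genome as a random sequence with equal base frequencies (so a site of n base pairs is expected about once every 4^n bp), it cuts at the first base of each site rather than at the enzyme's true cut position, and it models partial digestion as an independent cutting probability per site. All function names are hypothetical, and GATC is used only as an example of a four base-pair site.

```python
import random


def expected_spacing(site_length):
    """Average distance in bp between sites of this length in random DNA."""
    return 4 ** site_length


def find_sites(seq, site):
    """Return the start positions of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site, cut_fraction=1.0, rng=None):
    """Return fragment lengths after cutting a fraction of the sites.

    cut_fraction=1.0 models complete digestion; smaller values model a
    partial digestion in which each site is cut with that probability.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    cuts = [p for p in find_sites(seq, site) if cut_fraction > rng.random()]
    bounds = [0] + cuts + [len(seq)]
    return [
        bounds[i + 1] - bounds[i]
        for i in range(len(bounds) - 1)
        if bounds[i + 1] > bounds[i]
    ]


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, "bp site: about one site every", expected_spacing(n), "bp")
    demo = "AAGATCTTGATCAA"
    print(digest(demo, "GATC"))                    # complete digestion
    print(digest(demo, "GATC", cut_fraction=0.0))  # no sites cut
```

Lowering `cut_fraction` leaves more sites uncut, so the fragments are on average longer, which is the reason partial digestion is preferred for libraries.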
FIGURE 3.20 Screening a Library with DNA Probe. First, bacterial colonies containing the library inserts are grown and plated on large, shallow agar-filled dishes. Many different colonies are plated so that every cloned piece of DNA is present. These colonies are transferred to nylon filters and lysed open. The cell remains are washed away, while the genomic and plasmid DNA sticks to the nylon. The sequences are made single-stranded by incubating the filters in a strong base. When these are incubated with a radioactive single-stranded probe at the appropriate temperature, the probe hybridizes to any matching sequences.
Labels: bacterial colonies on agar each carry a cloned fragment of DNA; transfer to membrane or filter; lysis of bacterial cells and denaturation of DNA; add labeled DNA probe; probe binds to DNA from colonies with matching sequences.
Gene libraries are used for many purposes because they contain almost the entire genome of a particular organism.
If a scientist is looking for a particular gene in the target organism, the probe for the library may be the corresponding gene from a related organism. The probe is usually just a segment of the gene because a smaller piece is easier to manipulate. The probe DNA is then synthesized and labeled, either with radioactivity or with a chemiluminescent tag. Single-stranded probe DNA is mixed with the library DNA on the nylon filters. The probe hybridizes with matching sequences in the library. The level of match needed for binding can be adjusted by incubating at various temperatures. The higher the temperature, the more stringent the conditions, that is, the more closely matched the sequences must be. The lower the temperature, the less stringent.
If the probe is labeled with radioactivity, photographic film will turn black where the probe and library DNA hybridized. The black spot is aligned with the original bacterial colony. Usually the most likely colony plus its neighbors are selected, grown, plated, and rescreened with

slide 91:

the same probe to ensure that a single transformant is isolated. Then the DNA from this isolate can be analyzed by sequencing (see Chapter 4).
EUKARYOTIC EXPRESSION LIBRARIES
In expression libraries, the vector has sequences required for transcription and translation of the insert. This means that the insert DNA is expressed as RNA and then translated into a protein. An expression library, in essence, generates a protein from every cloned insert, whether it is a real gene or not. When eukaryotic DNA is studied, expression libraries are constructed using complementary DNA (cDNA) to help ensure the insert is truly a gene. Eukaryotic DNA, especially in higher plants and animals, is largely noncoding, with coding regions spaced far apart. Even genes are interrupted with noncoding introns. cDNA is a double-stranded DNA copy of mRNA. cDNA is made by reverse transcriptase, an enzyme first identified in retroviruses (see Chapter 1). It is used in eukaryotic research to eliminate the introns and generate a version of a gene consisting solely of an uninterrupted coding sequence. In contrast, bacteria have very little noncoding DNA, and their genes are not interrupted by introns; therefore, genomic DNA can be used directly in expression libraries.
Eukaryotic DNA is first made into cDNA in order to construct an expression library (Fig. 3.21). To make cDNA, the messenger RNA is isolated from the organism of interest by binding to a column containing poly(T) (i.e., a DNA strand consisting of repeated thymines). This isolates only mRNA because poly(T) anneals to the poly(A) tail of eukaryotic mRNA.
Screening a library has two parts. First, the library clones growing in E. coli are attached to nylon filters, the cellular components are washed away, and the DNA is denatured to form single-stranded pieces. Second, a probe is labeled with radioactivity, heated to melt the helix into single strands, and finally added to the nylon membranes, where it hybridizes to its matching sequence.
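The two-part screen summarized above depends on how closely a probe matches each clone. As a rough illustration, the Python sketch below scores a probe against a denatured clone (both strands) and reports hybridization when the best ungapped match reaches a chosen identity threshold. The threshold stands in for temperature (a higher threshold mimics higher stringency); the mapping from temperature to a threshold value is not given in the text and is an assumption, as are all function names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def best_identity(query, strand):
    """Best fraction of matching bases over all ungapped windows of strand."""
    best = 0.0
    for i in range(len(strand) - len(query) + 1):
        window = strand[i:i + len(query)]
        matches = sum(a == b for a, b in zip(query, window))
        best = max(best, matches / len(query))
    return best


def hybridizes(probe, clone, min_identity):
    """True if the single-stranded probe can pair with either clone strand.

    The probe base-pairs with a strand that resembles the probe's reverse
    complement, so that is the sequence searched for on both strands.
    """
    target = reverse_complement(probe)
    score = max(
        best_identity(target, clone),
        best_identity(target, reverse_complement(clone)),
    )
    return score >= min_identity


if __name__ == "__main__":
    probe = "ATGGCCAAGT"
    exact = "TTT" + reverse_complement(probe) + "GGG"
    related = "TTTACTTGGCGATGGG"  # one mismatch relative to the exact clone
    print(hybridizes(probe, exact, 1.0))     # high stringency, perfect match
    print(hybridizes(probe, related, 0.95))  # high stringency rejects the mismatch
    print(hybridizes(probe, related, 0.8))   # lower stringency accepts it
```

This mirrors the use of a probe from a related organism: lowering the stringency lets a similar but not identical gene be detected.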
FIGURE 3.21 Making a cDNA Library from Eukaryotic mRNA. First, eukaryotic cells are lysed and the mRNA is purified. Next, reverse transcriptase plus primers containing oligo(dT) stretches are added. The oligo(dT) hybridizes to the adenines in the mRNA poly(A) tail and acts as a primer for reverse transcriptase. This enzyme makes the complementary DNA strand, forming an mRNA/cDNA heteroduplex. The mRNA strand is digested with ribonuclease H, and DNA polymerase I is added to synthesize the opposite DNA strand, thus creating double-stranded cDNA. Next, S1 nuclease is added to trim off any single-stranded ends, and linkers are added to the ends of the dsDNA. The linkers have convenient restriction enzyme sites for cloning into an expression vector.
Labels: eukaryotic cells; lyse cells, extract, and purify mRNA; mRNA mixture (poly(A) tails); reverse transcriptase using oligo(dT) primer; mRNA/cDNA heteroduplex; (1) ribonuclease H; single-stranded cDNA; (2) DNA polymerase I; S1 nuclease; double-stranded cDNA; ligate linkers with cut site; digest with restriction enzyme (sticky end); ligate into vector.
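The strand relationships in Fig. 3.21 can be sketched at the sequence level. The Python below is a minimal illustration, not a model of the enzymes: it only checks for a poly(A) tail (the minimum tail length of six is an arbitrary assumption), writes the first cDNA strand as the reverse complement of the mRNA, and writes the second strand as the reverse complement of the first. Function names are hypothetical, and all sequences are written 5' to 3'.

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_poly_a_tail(mrna, min_length=6):
    """True if the mRNA ends in at least min_length adenines."""
    return mrna.endswith("A" * min_length)


def first_strand(mrna):
    """DNA strand made by reverse transcriptase primed with oligo(dT)."""
    if not has_poly_a_tail(mrna):
        raise ValueError("oligo(dT) primer needs a poly(A) tail to anneal to")
    return mrna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand(first):
    """DNA strand made by DNA polymerase I after the mRNA is removed."""
    return first.translate(DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    mrna = "AUGGCUUAA" + "AAAAAA"
    first = first_strand(mrna)
    print(first)                 # begins with the T stretch from the primer
    print(second_strand(first))  # DNA version of the mRNA sequence
```

The second strand has the same sequence as the mRNA with T in place of U, which is why the double-stranded cDNA is described as a DNA copy of the mRNA.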

slide 92:

The mRNA is converted into cDNA using reverse transcriptase, which synthesizes a DNA complement to mRNA. An enzyme then removes the mRNA part of the mRNA/cDNA heteroduplex, and DNA polymerase makes the second strand of DNA (see Fig. 3.21). The final product is a double-stranded DNA copy of the mRNA sequence.
The cDNA is then ligated into an expression vector with sequences that initiate transcription and translation of the insert. In some cases the insert will have its own translation start site (e.g., a full-length cDNA). If the insert does not contain a translation start, then the reading frame becomes an issue. Because the genetic code is read in triplets, each insert can be translated in three different reading frames. A protein may be produced for all three reading frames, but only one frame will actually produce the correct protein. To ensure obtaining inserts with the correct reading frame, each cDNA is cloned in all three reading frames by using linkers with several different restriction sites. The number of transformants to screen for a protein of interest is therefore increased.
The cloned genes are transformed into bacteria, which express the foreign DNA. The bacteria are grown on agar, and the colonies are then transferred to a nylon membrane and lysed. The proteins released are attached to the nylon membranes and are screened in various ways. Most often an antibody to the protein of interest is used (see Chapter 6). This recognizes the protein and can be identified using a secondary antibody that is conjugated to a detection system. Usually alkaline phosphatase is conjugated to the secondary antibody. The whole complex can be identified because alkaline phosphatase cleaves X-Phos, leaving a blue color where the bacterial colony expressed the right protein (Fig. 3.22). E. coli cannot perform most of the post-translational modifications that eukaryotic proteins often undergo. Therefore, the proteins are not always in their native form.
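The reading-frame problem described above can be illustrated by translating one short insert in each of the three forward frames. This Python sketch uses the standard genetic code; the function name `translate` and the example sequence are hypothetical, and stop codons are simply shown as `*` rather than ending translation.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1, or 2).

    Incomplete codons at the end are ignored; '*' marks a stop codon.
    """
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    insert = "ATGGCCAAGTAA"
    for frame in range(3):
        print("frame", frame, translate(insert, frame))
```

Only one of the three frames gives the intended peptide, which is why each cDNA is cloned in all three frames when the insert lacks its own translation start.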
Nonetheless, appropriate antibodies can detect most proteins of interest.
FIGURE 3.22 Immunological Screening of an Expression Library. Bacteria expressing foreign genes are grown on an agar plate, transferred to a membrane, and lysed. Released proteins are bound to the membrane. This figure shows only one attached protein, although in reality many different proteins are present. These include both expressed library clones and bacterial proteins. The membrane is incubated with a primary antibody that binds only the protein of interest. To detect this protein:antibody complex, a second antibody with a detection system, such as alkaline phosphatase, is added. The bacterial colony expressing the protein of interest will turn blue when X-Phos is added. This allows the vector with the correct insert to be isolated.
Labels: protein bound to membrane; add antibody specific for target protein (antibody 1); add second antibody that binds first antibody (antibody 2, carrying the detection system).
Complementary DNA, or cDNA, is constructed by isolating mRNA and making a DNA copy with reverse transcriptase. Expression libraries express the foreign DNA insert as a protein because expression vectors contain sequences for both transcription and translation. The protein of interest is identified by incubating the library with an antibody to the protein of interest.
FEATURES OF EXPRESSION VECTORS
Because foreign protein can be toxic to E. coli, especially if made in large amounts, the promoter used to express the foreign gene is critical. If too much foreign protein is made, the host cell may die. To control protein production, expression vectors have promoters with

slide 93:

on/off switches; therefore, the host cell is allowed to grow, and then, after sufficient amounts of bacteria are produced, the gene of interest is turned on. One commonly used promoter is a mutant version of the lac promoter, lacUV, which drives a very high level of transcription but only under induced conditions (Fig. 3.23). It has the following elements: a binding site for RNA polymerase, a binding site for the LacI repressor protein, and a transcription start site. The vector has strong transcription stop sites downstream of the polylinker region. The vector also has the gene for LacI so that high levels of repressor protein are made, thus keeping the cloned genes repressed. Like all vectors, it has an origin of replication and an antibiotic resistance gene for selection in bacteria. When a gene library is cloned behind this promoter, the genes are not expressed because of the high levels of LacI repressor. When an inducer such as IPTG is added, LacI is released from the DNA, and RNA polymerase transcribes the cloned gene.
Another common promoter in expression vectors is the lambda left promoter, or PL. It has a binding site for the lambda repressor. The gene of interest or library fragment is not expressed unless the repressor is removed. Rather than relying on the natural inducer, these vectors use a mutant version of the repressor that releases its binding site at high temperatures. So when the culture is shifted to 42°C, the repressor falls off the DNA, and RNA polymerase transcribes the cloned genes.
Another expression system uses a promoter whose RNA polymerase binding site is recognized only by RNA polymerase from the bacteriophage T7. Bacterial RNA polymerase will not transcribe the gene of interest. This system is designed to work only in bacteria that have the gene for T7 RNA polymerase integrated into the chromosome and under the control of an inducible promoter.
Some expression vectors contain a small segment of DNA that encodes a protein tag.
These are primarily used when the gene of interest is already cloned rather than for screening libraries. The gene of interest must be cloned in frame with the DNA for the protein tag. The tag can be of many varieties, but 6HIS, Myc, and FLAG® tags are three popular forms (Fig. 3.24). 6HIS is a stretch of six histidine residues put at the beginning or end of the protein of interest. The histidines bind strongly to nickel. This allows the tagged protein to be isolated by binding to a column with nickel attached. Myc and FLAG® are epitopes that allow the expressed protein to be purified by binding to the corresponding antibody. The antibodies may be attached to a column, used for a Western blot, or used to see the protein in vivo by staining the cells with fluorescently tagged versions of the Myc or FLAG® antibodies. The histidine tag can also be recognized with a specific antibody if desired.
FIGURE 3.23 Expression Vectors Have Tightly Regulated Promoters. An expression vector contains sequences upstream of the cloned gene that control transcription and translation of the cloned gene. The expression vector shown uses the lacUV promoter, which is very strong but inducible. To stimulate transcription, the artificial inducer IPTG is added. IPTG binds to the LacI repressor protein, which then detaches from the DNA. This allows RNA polymerase to transcribe the gene. Before IPTG is added, the LacI repressor prevents expression of the cloned gene.
Labels: expression vector; lacUV promoter; cloning sites; cloned gene; transcription terminators; lacI; Amp r.
The most important feature of expression vectors is a tightly controlled promoter region. The proteins of the expression library are expressed only under certain conditions, such as presence of an inducer, removal of a repressor, or change in temperature. Small tags can be fused to the protein of interest using expression vectors. These tags allow the protein of interest to be isolated and purified.
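The requirement that the gene be cloned in frame with the tag can be checked with a few lines of code. The Python sketch below builds an N-terminal 6HIS fusion and translates it with the standard genetic code. It is an illustration under assumptions: the histidine codon CAC, the function names, the example coding sequence, and the idea of a separate `linker` string between tag and gene are all choices made here, not details from the text.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

HIS_TAG_DNA = "CAC" * 6  # six histidine codons


def translate(dna):
    """Translate from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def fuse_his_tag(coding_sequence, linker=""):
    """Return start codon + 6HIS tag + linker + coding sequence."""
    return "ATG" + HIS_TAG_DNA + linker + coding_sequence


def is_in_frame(linker):
    """The gene stays in frame only if the linker length is a multiple of 3."""
    return len(linker) % 3 == 0


if __name__ == "__main__":
    gene = "GCCAAGTAA"  # hypothetical coding sequence without its own start codon
    print(translate(fuse_his_tag(gene)))              # tag followed by the intended protein
    print(translate(fuse_his_tag(gene, linker="G")))  # one extra base shifts the frame
```

A single extra base between tag and gene leaves the histidines intact but changes every downstream amino acid, so the tagged product would no longer be the protein of interest.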

slide 94:

FIGURE 3.24 Using Tags to Isolate Proteins. Some expression vectors have DNA sequences that code for short protein tags. The 6HIS tag (A) codes for six histidine residues. When fused in frame with the coding sequence for the cloned gene, the tag is fused to the protein. The 6HIS tag specifically binds to nickel ions; therefore, binding to a nickel ion column isolates 6HIS-tagged proteins. Additionally, antibodies to the 6HIS tag can be used to isolate the tagged proteins. Other tags, such as Myc or FLAG® (B), are specific antibody epitopes that work in a similar manner. Myc-tagged or FLAG®-tagged proteins can be isolated or identified by binding to antibodies to Myc or FLAG®, respectively.
Labels: (A) 6HIS tag: promoter, 6His coding sequence, cloned gene, terminator; after transcription and translation, the His-His-His-His-His-His tag binds to a 6His antibody or to a nickel column. (B) Myc or FLAG tag: promoter, Myc or FLAG coding sequence, cloned gene, terminator; after transcription and translation, an antibody to Myc or FLAG is used to detect the expressed protein.
RECOMBINEERING INCREASES THE SPEED OF GENE CLONING
Assembling new DNA vectors with different genes of interest can become difficult when the gene is long, since it can be hard to identify unique restriction enzymes compatible with a polylinker that do not cut within the gene. Large recombinant DNA vectors can instead be created using homologous recombination, a process called recombineering (Fig. 3.25). To facilitate recombination, enzymes from lambda phage called RED are engineered to be expressed by a specific host strain of bacteria. They recognize homologous sequences and recombine them to form a single molecule. These proteins are so efficient that as little as a 45 base-pair region of homology is enough to initiate recombination. In practice, E. coli have the genes for the RED proteins under the control of a heat-inducible promoter. The gene of interest is electroporated into bacteria that have the lambda RED proteins active in the cytoplasm. They recognize the ends of the gene of interest and find their homologous sequences. In this figure, the homologous sequences are found on the BAC, or bacterial artificial chromosome. The enzymes break the BAC at the appropriate

slide 95:

FIGURE 3.25 Recombineering. A gene of interest is flanked by sequences homologous to the BAC. Once inside the bacteria, the RED proteins recognize the ends of the gene of interest and facilitate homologous recombination between the BAC and gene of interest. The RED proteins are produced only when the bacteria are exposed to high temperatures, since their genes are controlled by a heat-inducible promoter.
Labels: gene of interest flanked on each side by a 30 to 50 bp sequence homologous to the BAC; electroporate to get gene fragment into bacteria; E. coli chromosome carrying the gene for λ RED recombinase under a temperature-sensitive promoter; bacterial artificial chromosome (BAC) with sequences homologous to the ends of the gene of interest; λ RED proteins integrate fragment; grow at 32°C to stop RED protein production; recombineered BAC.

slide 96:

location and add the gene of interest. The engineered BAC is removed from this strain of E. coli to prevent any residual RED proteins from initiating further recombination.
Identifying which E. coli have the gene of interest is different for recombineering because the bacteria have the vector whether or not the insert recombines. Instead of using a positive selection scheme such as antibiotic resistance, a selection/counterselection scheme is used to identify the recombined vector containing the gene of interest (Fig. 3.26). First, the vector contains galK, the gene for galactose kinase, which is essential for growth on galactose. The bacteria produce galactose kinase and are able to grow on minimal media that contain only galactose as a carbon source. GalK protein also converts 2-deoxygalactose (2-DOG) into a toxic substance, so bacteria expressing GalK die when grown on 2-DOG. After the recombination reaction occurs, the bacteria are plated onto minimal media that have only 2-deoxygalactose. If any bacteria still have galK, galactose kinase creates toxin from 2-deoxygalactose and the bacteria die. When galK is replaced with the gene of interest, no toxin is produced and the bacteria grow.
FIGURE 3.26 Selection and Counterselection in Recombineering. Recombineering vectors use a selection and counterselection method to identify which bacterium harbors the vector containing the gene of interest. In part A, the gene for galK encodes a galactose kinase. When bacteria expressing GalK are grown on 2-deoxygalactose (2-DOG), a toxin is produced, which kills the bacteria (top plate). The GalK also allows the bacteria to grow on galactose minimal media (bottom plate). In part B, recombineering replaces the galK gene with the gene of interest, and therefore the bacteria can no longer grow on galactose minimal media (bottom plate). The lack of GalK allows the bacteria to grow on 2-DOG (top plate).
Labels: (A) vector without gene of interest (galK gene, antibiotic resistance gene, regions of homology with gene of interest); bacteria with galK do not grow on 2-deoxygalactose and grow on galactose minimal media. (B) after recombination, vector with gene of interest (gene of interest, antibiotic resistance gene); bacteria with gene of interest grow on 2-deoxygalactose and do not grow on galactose minimal media.
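The selection and counterselection logic of Fig. 3.26 amounts to a small truth table. The Python sketch below encodes it directly; the function names and the medium labels are hypothetical, and the model considers only the presence or absence of galK, ignoring every other requirement for growth.

```python
def grows(has_galk, medium):
    """Predict growth from galK status on one of the two test media."""
    if medium == "galactose minimal":
        # Galactose kinase is needed to use galactose as the carbon source.
        return has_galk
    if medium == "2-deoxygalactose":
        # Galactose kinase converts 2-DOG into a toxic product.
        return not has_galk
    raise ValueError("unknown medium: " + medium)


def classify(grows_on_galactose, grows_on_2dog):
    """Interpret a pair of plate results for one bacterial isolate."""
    if grows_on_2dog and not grows_on_galactose:
        return "galK replaced (candidate recombinant)"
    if grows_on_galactose and not grows_on_2dog:
        return "galK still present (no recombination)"
    return "inconsistent result, retest"


if __name__ == "__main__":
    for has_galk in (True, False):
        gal = grows(has_galk, "galactose minimal")
        dog = grows(has_galk, "2-deoxygalactose")
        print("galK present:", has_galk, "->", classify(gal, dog))
```

Growth on 2-deoxygalactose therefore selects for loss of galK, while failure to grow on galactose minimal media provides a second, independent check of the same event.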

slide 97:

GATEWAY® CLONING VECTORS
A newer cloning system uses the lambda phage integration and excision sites for cloning genes from one vector to another (Fig. 3.27). Lambda phage exists as a phage but also integrates into the E. coli chromosome at the attB site to form a prophage. The integration reaction occurs when integrase makes staggered cuts in the center of the phage site, attP, and in the center of the bacterial site, attB. The ends then connect so that the phage DNA is integrated, but notice that the resulting sequences are different from the originals. These are called attL and attR after integration. This reaction can be reversed, but since the two sites are different after integration, another enzyme called Xis, or excisionase, is needed to remove the inserted DNA and religate the broken DNA.
The Gateway® cloning vectors exploit the lambda phage integration/excision system to study a gene of interest. The vectors include sequences for expressing the gene of interest as protein, adding a protein tag, shuttling the gene between different model organisms, or sequencing the gene. The first step of the system is to get the gene of interest between two attL sites (Fig. 3.28). This can be done by cloning the gene into a multicloning site found in the entry clone. The entry clone has a gene called ccdB in the middle of the multicloning site, which encodes a toxin that kills the host when expressed. When this gene is expressed in E. coli, the bacteria die, which ensures that any bacteria that harbor the original vector die. When the gene of interest replaces the ccdB gene, the bacteria are able to grow. A special strain of E. coli with an antitoxin to the ccdB gene product allows researchers to maintain the entry clone before the gene of interest is cloned into the vector.
Once the gene is in the entry vector, two different reactions move it among the different destination clones (Fig. 3.29).
The LR reaction removes the gene of interest by cutting at the attL sites and moves it into the attR1 and attR2 sites in the destination vector. This results in an expression clone that has the gene of interest fanked by attB1 and attB2. The entry vector no longer has the gene of interest which is replaced with the toxin gene ccdB. Any bacteria receiving this vector die. The only surviving bacteria are those with the gene of interest in the destination vector. Just as with the lambda phage the reaction is reversible. BP reaction removes the gene of interest from the destination vector and can put it back into any vector with attL sites. The ease at which the cloned gene can move between vectors makes this system very adaptable to different research. There are Gateway® cloning vectors for protein expression in bacteria adding different tags such as HIS6 expressing the gene of interest in insect human or mouse cells or sequencing the gene of interest. FIGURE 3.27 Inte­ gration of Lambda DNA Phage DNA has an attach- ment sequence called attP. Bacterial DNA has an attachment sequence called attB. Bacterial DNA and λ-phage DNA align at the “O” region of the attachment sequences. During integra- tion int protein induces two double-stranded breaks that are resolved resulting in the integration of the phage DNA into the bacterial DNA. The process is reversible and requires int protein and xis protein to excise the phage DNA from the bacterial DNA. Notice that the integrated phage DNA “O” site is fanked with one side from the phage and one side from the bacteria. These are called the attL and attR sites. 
[Figure 3.27 labels: phage DNA with attP; bacterial DNA with attB; prophage flanked by attL and attR. Integration requires Int; excision requires Int and Xis.]

FIGURE 3.28 Gateway® Entry Clone
The entry clone for the Gateway® system has an origin of replication for growing in bacteria, an antibiotic resistance gene for selecting bacteria with the vector, and a multicloning site containing the gene ccdB in between two attL sites (attL1 and attL2). The gene of interest replaces the ccdB gene during standard cloning using unique restriction enzyme sites. The ccdB gene produces a toxin that kills its host bacteria unless the bacteria have a corresponding gene for an antitoxin.

[Figure 3.28 labels: Gateway® cloning entry vector with T1, T2, attL1, ccdB, attL2, pUC ori, and kanamycin resistance. Multiple cloning site: NspV, XmnI, NcoI, SalI, BamHI, EcoRI, EcoRI, NotI, XhoI, EcoRV.]
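The site bookkeeping of the LR and BP reactions described above can be sketched as a small model. This is only an illustrative sketch, not part of the Gateway® protocol: the `Vector` class, the `recombine` and `host_survives` functions, and the single-letter site codes are hypothetical names introduced here, and the model tracks only which att site and which cassette each backbone carries.

```python
from dataclasses import dataclass

# Hypothetical toy model: each vector backbone carries one resistance marker
# and one cassette flanked by a pair of att sites of a single type.
@dataclass(frozen=True)
class Vector:
    name: str      # illustrative label for the backbone
    marker: str    # antibiotic resistance gene on the backbone
    site: str      # "L", "R", "B", or "P" (attL, attR, attB, attP)
    cassette: str  # DNA between the two att sites

# attL recombines with attR (LR reaction); attB recombines with attP (BP reaction).
VALID_PAIRS = {frozenset("LR"), frozenset("BP")}

# Site left on each backbone after the cassettes swap:
# LR: the attL backbone ends with attP, the attR backbone ends with attB.
# BP: the reverse.
SITE_AFTER = {"L": "P", "P": "L", "R": "B", "B": "R"}

def recombine(a: Vector, b: Vector) -> tuple[Vector, Vector]:
    """Swap the cassettes of two vectors and update their att sites."""
    if frozenset((a.site, b.site)) not in VALID_PAIRS:
        raise ValueError(f"att{a.site} does not recombine with att{b.site}")
    return (
        Vector(a.name, a.marker, SITE_AFTER[a.site], b.cassette),
        Vector(b.name, b.marker, SITE_AFTER[b.site], a.cassette),
    )

def host_survives(v: Vector, has_antitoxin: bool = False) -> bool:
    """A host without the antitoxin dies if the vector still carries ccdB."""
    return v.cassette != "ccdB" or has_antitoxin

entry = Vector("entry", "KanR", "L", "gene of interest")
destination = Vector("destination", "AmpR", "R", "ccdB")

# LR reaction: the gene moves into the destination backbone.
by_product, expression_clone = recombine(entry, destination)
print(expression_clone, host_survives(expression_clone))
print(by_product, host_survives(by_product))
```

Running `recombine` again on the two products models the BP reaction and returns the starting cassette arrangement, which mirrors the reversibility described in the text.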

slide 98:

Summary

Recombinant DNA technology is the basis for almost all biotechnology research. Understanding these techniques is essential to understanding the rest of the textbook. First, DNA must be isolated from the organism in order to identify novel genes, to recover new recombinant vectors, or to purify a new gene. The DNA is isolated from the cellular components using enzymes, followed by centrifugation, RNA digestion with RNase, and precipitation with ethanol. Each organism requires special adaptations of this basic process in order to remove the cellular and extracellular components.

Purified DNA can be manipulated in many different ways. Restriction enzymes cut the phosphate backbone of the DNA into smaller fragments, which can be visualized by gel electrophoresis. Specific DNA pieces are visualized using radioactively labeled nucleotides followed by autoradiography. To avoid using radioactivity, researchers can synthesize DNA with digoxigenin- or biotin-linked nucleotides, which are then bound by an antibody to digoxigenin or by streptavidin, respectively. To visualize the antibody or streptavidin, researchers link either one to a fluorophore or to a reporter enzyme that converts a substrate into light or a colored precipitate. Hybridization of related sequences is a key technique for FISH, Southern blots, Northern blots, and dot blots, as well as for screening a genomic library for a particular sequence.

The chapter also outlines the key characteristics of vectors, including plasmids, bacteriophage vectors, cosmids, and artificial chromosomes. These extrachromosomal genetic elements vary in their uses but are very important for getting a foreign gene expressed in a host organism. Vectors require a region that is convenient for adding a foreign piece of DNA, such as a multicloning site; they need a gene for selection; and they need some easy way to identify whether the vector contains the foreign piece of DNA.
A genomic library simply contains all the DNA of the organism of interest, cut into smaller fragments and cloned into a vector. The recombinant library vectors are then returned to host bacterial cells so that only one fragment of the original DNA is inside each bacterium. Expression libraries start with mRNA rather than genomic DNA. The mRNA is converted into cDNA with reverse transcriptase. Libraries are screened for particular DNA sequences of interest by hybridization of related DNA sequences. In the case of expression libraries, each of the DNA pieces is made into protein by the bacteria. These proteins are then screened using antibodies.

[Figure 3.29 labels: entry vector (KanR; gene of interest between attL1 and attL2) + destination vector (AmpR; ccdB between attR1 and attR2). LR clonase and BP clonase interconvert this pair with a destination vector carrying the gene of interest between attB1 and attB2 + an entry vector carrying ccdB between attP1 and attP2.]

FIGURE 3.29 Gateway® BP and LR Reactions
Moving a gene of interest from the entry clone to the destination vector is done in the LR reaction. The excisionase and integrase enzymes work to remove the gene of interest in the entry clone by cutting at the attL and attR sites of the entry clone and destination vector. The gene of interest and ccdB swap positions, therefore changing the att sites to become attB and attP. The BP reaction works in reverse, moving the gene of interest back into the entry clone.

Cloning genes using traditional restriction enzyme digests can be very difficult for large genes, since the gene is likely to contain the restriction enzyme sites. To overcome this obstacle, recombineering uses recombination between homologous DNA sequences to insert the gene of interest into a vector. Recombination is also used in the Gateway® cloning system, but instead of using regions of homology, these vectors use the lambda phage attB, attP, attR, and attL

slide 99:

recognition sequences and the enzymes integrase and excisionase. In both systems, the vector has a gene that produces a toxin (ccdB) or converts a specific substrate to a toxin (galK). If the gene of interest does not replace ccdB or galK, then the host bacteria die. If the gene of interest recombines into the vector, then the host bacteria live and propagate the recombinant vector.

End-of-Chapter Questions

1. Which of the following statements about DNA isolation from E. coli is not correct?
a. Chemical extraction using phenol removes proteins from the DNA.
b. RNA is removed from the sample by RNase treatment.
c. Detergent is used to break apart plant cells to extract DNA.
d. Lysozyme digests peptidoglycan in the bacterial cell wall.
e. Centrifugation separates cellular components based on size.

2. Which of the following is important for gel electrophoresis to work?
a. Negatively charged nucleic acids to migrate through the gel.
b. Ethidium bromide to provide a means to visualize the DNA in the gel.
c. Agarose or polyacrylamide to separate the DNA based on size.
d. Known molecular weight standards.
e. All of the above are important for gel electrophoresis.

3. How are restriction enzymes and ligase used in biotechnology?
a. Restriction enzymes cut DNA at specific locations, producing ends that can be ligated back together with ligase.
b. Only restriction enzymes that produce blunt ends after cutting DNA can be ligated with ligase.
c. Only restriction enzymes that produce sticky ends on the DNA can be ligated with ligase.
d. Restriction enzymes can both cut DNA at specific sites and ligate them back together.
e. Restriction enzymes randomly cut DNA, and the cut fragments can be ligated back together with ligase.

4. Which of the following is an appropriate method for detecting nucleic acids?
a. Measuring absorbance at 260 nm.
b. Autoradiography of radiolabeled nucleic acids.
c. Chemiluminescence of DNA labeled with biotin or digoxigenin.
d. Measuring the light emitted after excitation by fluorescent-labeled nucleic acids on a photodetector.
e. All of the above are appropriate methods for detecting nucleic acids.

5. Why does the GC content of a particular DNA molecule affect the melting of the two strands?
a. The G and C bond only requires two hydrogen bonds, thus requiring a lower temperature to “melt” the DNA.
b. Because G and C base-pairing requires three hydrogen bonds and a higher temperature is required to “melt” the DNA.
c. The percentage of As and Ts in the molecule is more important to melting temperature than the percentage of Gs and Cs.
d. The nucleotide content of a DNA molecule is not important to know for biotechnology and molecular biology research.
e. None of the above.

slide 100:

6. What is the difference between Southern and Northern hybridizations?
a. Southern blots hybridize a DNA probe to a digested DNA sample, but Northern blots hybridize a DNA probe to (usually) mRNA.
b. Southern blots use an RNA probe to hybridize to DNA, but Northern blots use an RNA probe to hybridize to RNA.
c. Southern blots determine if a particular gene is being expressed, but Northern blots determine the homology between mRNA and a DNA probe.
d. Southern blots determine the homology between mRNA and a DNA probe, but Northern blots determine if a particular gene is being expressed.
e. Southern and Northern blots are essentially the same technique performed in different hemispheres of the world.

7. What might be a use for fluorescence in situ hybridization (FISH)?
a. For identification of a specific gene in a DNA extraction by hybridization to a DNA probe.
b. For identification of a specific gene by hybridization to a DNA probe within live cells that have had their DNA denatured by heat.
c. For identification of an mRNA within an RNA extraction by hybridization to a DNA probe.
d. For identification of both mRNA and DNA in cellular extracts using an RNA probe.
e. None of the above.

8. Which of the following are useful traits of cloning vectors?
a. An antibiotic resistance gene on the plasmid for selection of cells containing the plasmid.
b. A site that contains unique clustered restriction enzyme sequences for cloning foreign DNA.
c. A high copy number plasmid so that large amounts of DNA can be obtained.
d. Alpha complementation to determine if the foreign DNA was inserted into the cloning site.
e. All of the above are useful traits.

9. Which of the following vectors holds the largest pieces of DNA?
a. plasmids
b. bacteriophage
c. YACs
d. PACs
e. cosmids

10. Besides a high voltage shock, what is another method to make E. coli competent to take up “naked” DNA?
a. high concentrations of calcium ions followed by high temperature
b. high concentrations of calcium ions and several hours on ice
c. large amounts of DNA added directly to a bacterial culture growing at 37 °C
d. high concentrations of minerals followed by high temperature
e. A high voltage shock is the only way to make E. coli competent.

11. Why are gene libraries constructed?
a. To find new genes.
b. To sequence whole genomes.
c. To compare genes to other organisms.
d. To create a “bank” of all the genes in an organism.
e. All of the above.

slide 101:

12. Which of the following statements about gene libraries is correct?
a. Genes in a library can be compared to genes from other organisms by hybridization with a probe.
b. A gene library is only necessary to maintain known genes.
c. Every gene in the library must be sequenced first in order to compare genes in the library to genes from other organisms.
d. Gene libraries are only created for eukaryotic organisms.
e. Gene libraries can only be created in prokaryotes.

13. Why must reverse transcriptase be used to create a eukaryotic expression library?
a. Reverse transcriptase is only used to create prokaryotic expression libraries.
b. Reverse transcriptase creates cDNA from mRNA in prokaryotes.
c. Reverse transcriptase ensures the gene is in the correct orientation within the expression vector to create protein.
d. Reverse transcriptase creates cDNA from mRNA because genes in eukaryotes have large numbers of non-coding regions.
e. No other enzymes are used to create expression libraries except restriction enzymes.

14. Which of the following are common features of expression vectors?
a. Small segments of DNA that encode tags for protein purification.
b. Transcriptional start and stop sites.
c. A tightly controlled promoter that can only be induced under certain circumstances.
d. Antibiotic resistance gene.
e. All of the above are common features of expression vectors.

15. Which method is used to construct large recombinant vectors when polylinker restriction enzymes are not useful?
a. recombineering
b. FISH
c. gene libraries
d. YACs
e. hybridization

16. Which statement about Gateway® cloning is not true?
a. Gateway® cloning vectors exploit a bacteriophage recombination system.
b. Integrase cuts within attB and attP sites.
c. The ccdB gene produces a toxin within host cells carrying the gene of interest.
d. Excisionase recognizes attL and attR sites to remove the recombined DNA fragment.
e. The cloning into the entry vector is necessary to generate attL sites on the ends of the gene of interest.

Further Reading

Clark, D. P. (2013). Molecular Biology (2nd ed.). San Diego, CA: Elsevier Academic Press.
Shlien et al. (2010). A common molecular mechanism underlies two phenotypically distinct 17p13.1 microdeletion syndromes. American Journal of Human Genetics 87, 631–642.

slide 102:

CHAPTER 4

DNA Synthesis In Vivo and In Vitro

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00004-1

Introduction
Replication of DNA
Uncoiling the DNA
Priming DNA Synthesis
Structure and Function of DNA Polymerase
Synthesizing the Lagging Strand
Repairing Mistakes after Replication
Comparing Replication in Gene Creatures, Prokaryotes, and Eukaryotes
In Vitro DNA Synthesis
Chemical Synthesis of DNA
Chemical Synthesis of Complete Genes
Polymerase Chain Reaction Uses In Vitro Synthesis to Amplify Small Amounts of DNA
Modifications of Basic PCR
Reverse Transcriptase PCR
PCR in Genetic Engineering
PCR of DNA Can Determine the Sequence of Bases
Next-Generation Sequencing Technologies

slide 103:

INTRODUCTION

Replication copies the entire set of genomic DNA so that the cell can divide in two. During replication, the entire genome must be uncoiled and copied exactly. This elegant process occurs extremely fast in E. coli, where DNA polymerase copies about 1000 nucleotides per second. Although the process is slower in eukaryotes, DNA polymerase still copies 50 nucleotides per second. Many biotechnology applications use the principles and ideas behind replication; therefore, this chapter first introduces the basics of DNA replication as it occurs in the cell. We then review some of the most widely used techniques in genetic engineering and biotechnology, including chemical synthesis of DNA, polymerase chain reaction, and DNA sequencing.

REPLICATION OF DNA

To maintain the integrity of an organism, the entire genome must be replicated identically. Even for plasmids, viruses, or transposons, replication is critical for their survival. The complementary two-stranded structure of DNA is the key to understanding its duplication during cell division. The double-stranded helix unwinds, and the hydrogen bonds holding the bases together melt apart to form two single strands. This Y-shaped region of DNA is the replication fork (Fig. 4.1). Replication starts at a specific site on the chromosome called an origin of replication (ori). The origin is called oriC on the E. coli chromosome and covers about 245 base pairs of DNA. The origin has mostly AT base pairs, which require less energy to break than GC base pairs. Once the replication fork is established, a large assembly of enzymes and factors called a replisome assembles to synthesize the complementary strands of DNA (Fig. 4.2). The replisome starts synthesizing the complementary strand on one side of the fork by adding complementary bases in a 5′ to 3′ direction. The leading strand is synthesized continuously because there is always a free 3′-OH group. Because DNA polymerase synthesizes only in a 5′ to 3′ direction, the other strand, called the lagging strand, is synthesized as small fragments called Okazaki fragments. As DNA polymerase makes this strand, the clamp loader must continually release and reattach at a new location. This results in the single-stranded region bubbling out from the replisome. The lagging strand fragments are ligated together by an enzyme called DNA ligase. Ligase links the 3′-OH and the 5′-PO4 of neighboring nucleotides, forming a phosphodiester bond. The final step is to add methyl (-CH3) groups along the new strand (Fig. 4.3). The original double-stranded helix is now two identical double-stranded helices, each containing one strand from the original molecule and one new strand. This is why the process is called semiconservative replication.

[Figure 4.1 labels: parent DNA, replication fork (DNA + replisome protein), old strand, new strand, two identical daughter molecules.]

FIGURE 4.1 Replication
Replication enzymes open the double helix around the origin to make it single-stranded. DNA polymerase adds complementary nucleotides. In replication, DNA polymerase synthesizes the leading strand as one continuous piece and the lagging strand as Okazaki fragments. Each copy has one strand from the original helix and one new strand.

Uncoiling the DNA

Because DNA is condensed into supercoils in order to fit inside the cell, several different enzymes are needed to open and relax the DNA before replication can start (Fig. 4.2). DNA helicase and DNA gyrase attach near the replication fork and untwist the strands

slide 104:

of DNA. DNA gyrase removes the supercoiling, and helicase unwinds the double helix by dissolving the hydrogen bonds between the paired bases. The two strands are kept apart by single-stranded binding protein (SSB), which coats the single-stranded regions. This prevents the two strands from reannealing so that other enzymes can gain access to the origin and begin replication.

As DNA polymerase travels along the DNA, more positive supercoils are added ahead of the replication fork. Because the bacterial chromosome is negatively supercoiled initially, the new positive supercoils relax the DNA. After about 5% of the genome has been replicated, though, the positive supercoils begin to accumulate and need to be removed. DNA gyrase cancels the positive supercoils by adding negative supercoils.

[Figure 4.2 labels: helicase, DNA gyrase, single-stranded binding protein, sliding clamp (β), sliding clamp loader (subunits τ, γ, δ, δ′, χ, ψ; ATP to ADP), DNA polymerase, nucleotides, leading strand, lagging strand, Okazaki fragment. Arrows: DNA moves toward helicase, DNA moves away from helicase, DNA moves to DNA polymerase, DNA bulges out.]

FIGURE 4.2 DNA Polymerase III Replication Assembly
During replication, the sliding clamp loader complex makes contacts with single-stranded binding protein and the sliding clamps. This complex stabilizes the two single-stranded DNA strands and provides a stable binding site for two DNA polymerase III molecules. The unwound single-stranded DNA templates move toward the clamp loader complex. On the leading strand (left), the strand is unwinding in a 3′ to 5′ direction, so DNA polymerase can add complementary nucleotides in the 5′ to 3′ direction. On the lagging strand (right), the template strand is antiparallel, and therefore the strand is unwinding in a 5′ to 3′ direction. Since DNA polymerase III must also synthesize this new strand in a 5′ to 3′ direction, the template strand must move toward the helicase. This causes the lagging strand to bubble out from the complex. Once DNA polymerase III reaches the end of the previous Okazaki fragment, the replicated DNA is released by the clamp loader and a new section of single-stranded DNA is reloaded.

When circular chromosomes are replicated, the two daughter copies may become catenated, or connected like two links

slide 105:

of a chain (Fig. 4.4). Topoisomerase IV releases catenated daughter chromosomes by introducing a double-stranded break into one chromosome. The second copy can then pass through the first, giving two separated molecules.

DNA helicase, DNA gyrase, and topoisomerase IV untwist and untangle the supercoiled DNA during replication.

[Figure 4.4 labels: untangling chromosomes; topoisomerase IV.]

FIGURE 4.4 Untangling Circular Chromosomes
Sometimes, after the replication of circular genomes is complete, the two rings are catenated, or linked together like links in a chain. Topoisomerase IV untangles the two chromosomes so they can partition into the daughter cells.

Priming DNA Synthesis

DNA polymerase cannot initiate new strands of nucleic acid synthesis because it can only add a nucleotide onto a pre-existing 3′-OH. Therefore, a stretch of RNA 11 to 12 bases long, called an RNA primer, is made at the beginning of each new strand of DNA. Since the leading strand is synthesized as a single piece, there is only one RNA primer at the origin. On the lagging strand, each Okazaki fragment begins with a single RNA primer. DNA polymerase then makes DNA starting from each RNA primer. At the origin, a protein called PriA displaces the SSB proteins so that a special RNA polymerase called primase (DnaG) can enter and synthesize short RNA primers using ribonucleotides. Two molecules of DNA polymerase III bind to the primers on the leading and lagging strands and synthesize new DNA from the 3′ hydroxyls (Fig. 4.5).

[Figure 4.3 labels: replication of fully methylated DNA (CH3 groups on both strands at the GATC and CCTGG sites) gives two molecules of hemimethylated DNA (CH3 groups on the old strand only); Dam methylase and Dcm methylase then methylate the new strands.]
FIGURE 4.3 Hemimethylated DNA: Old Strands versus New
When DNA is replicated, the old strand is methylated, but there is a delay in methylating the new strand, and thus the DNA double helix is hemimethylated. Dam methylase and Dcm methylase add the methyl groups onto the newly synthesized DNA.

Primase, a special RNA polymerase, works with PriA to displace the SSB proteins and synthesize a short RNA primer at the origin. DNA polymerase then starts synthesis of the new DNA strand using the 3′-OH of the RNA primer. This synthesis occurs at multiple locations on the lagging strand.
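The rule that DNA polymerase can only add nucleotides onto a pre-existing 3′-OH can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: the function name `extend_primer` is hypothetical, the primer is written with DNA letters even though the cellular primer is RNA, and the template is written 3′ to 5′ so that it lines up base by base with the new strand written 5′ to 3′.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_3to5: str, primer_5to3: str) -> str:
    """Extend a primer along a template, adding bases only at the 3' end.

    The primer is assumed to anneal at the first bases of the template.
    Without a primer there is no 3'-OH, so synthesis cannot begin.
    """
    if not primer_5to3:
        raise ValueError("no primer: DNA polymerase needs a 3'-OH to extend")
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    # The primer must be complementary to the template where it anneals.
    for template_base, primer_base in zip(template_3to5, primer_5to3):
        if COMPLEMENT[template_base] != primer_base:
            raise ValueError("primer does not base-pair with the template")
    new_strand = list(primer_5to3)
    # Each added base is appended to the growing 3' end (5' -> 3' synthesis).
    for template_base in template_3to5[len(primer_5to3):]:
        new_strand.append(COMPLEMENT[template_base])
    return "".join(new_strand)

print(extend_primer("TACGGATC", "ATG"))
```

In this picture each Okazaki fragment on the lagging strand would be a separate call with its own primer, whereas the leading strand needs only one.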

slide 106:

Structure and Function of DNA Polymerase

DNA polymerase III (Pol III) is the major form of DNA polymerase used to replicate bacterial chromosomes and consists of multiple protein subunits (see Fig. 4.2). The sliding clamp is a donut-shaped protein consisting of a dimer of DnaN proteins, also called the β-subunits. Two clamps encircle the two single strands of DNA at the replication fork. A cluster of accessory proteins called the clamp loader complex loads the clamps onto the DNA strands. The two sliding clamps bind two core enzymes, one for each strand of DNA. The core enzyme consists of three subunits: DnaE (α subunit), which links the nucleotides together; DnaQ (ε subunit), which proofreads the new strand; and HolE (θ subunit), which stabilizes the two other subunits (not shown in Fig. 4.2). As the α subunit adds new nucleotides, the ε subunit recognizes any distortions and removes any mismatched bases. A correct nucleotide is then added. Bacterial DNA polymerase III can add up to 1000 bases per second, which is an extraordinarily fast rate of enzyme activity.

Synthesizing the Lagging Strand

After the new lagging strand of DNA has been made, it has many segments of RNA derived from multiple RNA primers, as well as multiple breaks, or nicks, along the backbone that need to be sealed (Fig. 4.6). DNA polymerase I removes the RNA primers from the lagging strand. DNA polymerase I has exonuclease activity that removes the RNA bases, and then its polymerase activity fills in the regions with DNA bases. The RNA bases may also be removed by RNaseH, an enzyme that specifically identifies RNA:DNA heteroduplexes and removes the RNA bases. Finally, the DNA fragments of the lagging strand are linked together with a ligation reaction by DNA ligase. DNA polymerase I and DNA ligase are both very important enzymes in molecular biology and are used extensively in biotechnology.
Repairing Mistakes after Replication

After replication is complete, the mismatch repair system corrects mistakes made by DNA polymerase. If the wrong base is inserted and DNA polymerase does not correct the error itself, there will be a small bulge in the helix at that location. Identifying which of the two bases is correct is critical. The cell assumes that the base on the new strand is wrong and the original parental base is correct. The mismatch repair system of E. coli (MutSHL) deciphers which strand is the original by monitoring methylation. Immediately after replication, the DNA is hemimethylated; that is, the old strand still has methyl groups attached to various bases, but the new strand has not been methylated yet (see Fig. 4.3). Two different E. coli enzymes add methyl groups: DNA adenine methylase (Dam) adds a methyl group to the adenine in GATC, and DNA cytosine methylase (Dcm) adds a methyl group to the cytosine in CCAGG or CCTGG. These enzymes methylate the new strand after replication, but they are slow. This allows mismatch repair to find and fix any mistakes first.

The multiple subunits of DNA polymerase III work together to synthesize a new strand of DNA. The core has two essential subunits: the α subunit links the nucleotides, and the ε subunit ensures that they are accurate.

Because the lagging strand is synthesized in small pieces, either DNA polymerase I or RNaseH excises the RNA bases and replaces them with DNA. DNA ligase closes the nicks in the sugar/phosphate backbone of the new DNA strand.
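The recognition sequences named above for Dam (GATC) and Dcm (CCAGG or CCTGG) are simple enough to locate with a pattern search. The following is a minimal sketch; the function name `methylation_sites` and the example sequence are made up for illustration, and the positions reported are 0-based starts of each site on the strand as written.

```python
import re

# Recognition sequences taken from the text.
DAM_SITE = re.compile(r"GATC")      # Dam methylates the adenine in GATC
DCM_SITE = re.compile(r"CC[AT]GG")  # Dcm methylates a cytosine in CCAGG or CCTGG

def methylation_sites(sequence: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of Dam and Dcm sites in a sequence."""
    sequence = sequence.upper()
    return {
        "dam": [match.start() for match in DAM_SITE.finditer(sequence)],
        "dcm": [match.start() for match in DCM_SITE.finditer(sequence)],
    }

# Made-up example sequence containing two Dam sites and two Dcm sites.
example = "TTGATCAACCAGGTTCCTGGGATC"
print(methylation_sites(example))
```

Because GATC reads the same on both strands, each Dam site found this way marks a position where the old strand carries a methyl group and the new strand does not until Dam catches up.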

slide 107:

Three genes of E. coli are responsible for mismatch repair: mutS, mutL, and mutH (Fig. 4.7). MutS protein recognizes the bulge or distortion in the sequence. MutH finds the nearest GATC site and nicks the nonmethylated strand, that is, the newly made strand. MutL holds the MutS-mismatch complex and the MutH-GATC complex together (these may be far apart on the DNA helix). Finally, the DNA on the new strand is degraded and replaced with the correct sequence by DNA polymerase III.

In E. coli, the mismatch repair proteins (MutSHL) identify a mistake in replication, excise the new nucleotides around the mistake, and recruit DNA polymerase III to the single-stranded region to make the new strand without a mistake.

[Figure 4.5 panels: (A) PriA displaces SSB protein; (B) primase binds; (C) primase makes short RNA primer. Labels: parental DNA, SSB protein, PriA, primase, primosome, RNA primer, direction of movement of primosome.]

FIGURE 4.5 Strand Initiation Requires an RNA Primer
DNA polymerase cannot synthesize new DNA without a pre-existing 3′-OH. Thus, DNA replication requires an RNA primer to initiate strand formation. (A) First, the PriA protein displaces the SSB proteins. (B) Second, primase associates with the PriA protein. (C) Last, the primase makes the short RNA primer.

COMPARING REPLICATION IN GENE CREATURES, PROKARYOTES, AND EUKARYOTES

Although the basic mechanism for replication is the same for most organisms, the timing, direction, and sites for initiation and termination are variable. The major differences in replication occur mainly because of the special challenges posed by circular and linear genomes. Normal DNA replication occurs bidirectionally in prokaryotes and eukaryotes, whether the genome is linear or circular. Two replication forks travel in opposite directions, unwinding the DNA helix as they go. In bacteria such as E. coli, there is only one origin (oriC), and replication occurs in both directions around the circular chromosome until the forks meet at the other side, the terminus (terC). Halfway through this process, the chromosome looks like the Greek letter θ; therefore, this process is often called theta replication (Fig. 4.8). The single circular chromosome then becomes two. Theta replication is also used by many plasmids, such as the F plasmid of E. coli when growing and dividing asexually, as opposed to transferring its genome to another cell via conjugation.

Some plasmids and many viruses replicate their genomes by a process called rolling circle replication (Fig. 4.9). At the origin of replication, one strand of the DNA is nicked and unrolled. The intact strand thus rolls relative to its partner (hence “rolling circle”). DNA is synthesized from the origin using the circular strand as a template. As DNA polymerase circles the template strand, the new strand of DNA is base-paired to the circular template. Meanwhile, the other parental strand is dangling free. This dangling strand is removed, ligated to form another circle, and finally a second strand is synthesized. This process results in two rings of plasmid or viral DNA, each with one strand from the original molecule and one newly synthesized strand.

Some viral genomes use rolling circle replication but continue to make more and more copies of the original circular template. They continue rolling around the circle, synthesizing more and more copies that are all dangling as a long single strand. The long strand of new DNA may be made double-stranded or left single-stranded, depending on the

slide 108:

type of virus. Finally, the dangling strand is chopped into genome-sized units and packaged into viral particles. Some viruses circularize these copies before packaging; others simply leave the genomes linear.

Long linear DNA molecules such as human chromosomes pose several problems for replication. The ends pose a particularly difficult problem because the RNA primer is synthesized at the very end of the lagging strand. When the RNA primer is removed by an exonuclease, there is no upstream 3′-OH for addition of new nucleotides to fill the gap. In eukaryotes, there is no equivalent to the dual-function DNA polymerase I. A separate exonuclease (MF1) removes the RNA primers, and DNA polymerase δ fills in the gaps for the lagging strand. Over successive rounds of replication, the ends of linear chromosomes get shorter and shorter. Special structures called telomeres are found at the tips of each linear chromosome and prevent chromosome shortening from affecting important genes. Telomeres have multiple tandem repeats of a short sequence (TTAGGG in humans). The enzyme telomerase can regenerate the telomere by using an RNA template to synthesize new repeats. This happens only in some cells; in others, the telomeres shorten every time the cell replicates its DNA. One theory regards telomere shortening as a molecular clock, aging the cell and eventually triggering suicide (see Chapter 20).

The length of linear chromosomes also poses a problem. The time it takes to synthesize an entire human chromosome would be too long if replication began at only one origin. To solve this issue, multiple origins exist, each initiating new strands in both directions. These are elongated until they meet the new strands from the other direction.

The cellular structure of eukaryotes also poses some problems for replication. In bacteria, the chromosome simply replicates, the two copies move to each end of the cell, and a new wall forms in the middle. There are no nuclear membranes or organelles to divide; there is just one chromosome plus perhaps some plasmids. In eukaryotes, the cell has a specific cell cycle with four different phases, and replication occurs at specific points (Fig. 4.10). During G1, the cell rests for a period before DNA synthesis begins. This period varies, lasting about 25 minutes for yeast. The next phase is S, or synthesis, during which the entire genome is replicated. This is usually the longest phase, lasting about 40 minutes in yeast. The third phase, G2, is another resting phase before the cell undergoes mitosis in the M phase. During mitosis, cells divide their walls and membranes into two separate cells, partitioning the new chromosomes and other cellular components into each half. The signal that triggers cell division depends on many factors, including environment, size, and age.

Eukaryotic mitosis is a dynamic process with much movement and repositioning of cellular components. First, the nuclear membrane must be dissolved before the chromosomes can separate. After replication, the two sets of chromosomes are partitioned to separate sides of the cell. The chromosomes attach to long fibers making up the spindle via special sequences called centromeres. They slide along the spindle fibers until they reach separate ends of the cell. A new cell membrane separating the two halves is then synthesized. Other cellular components, including mitochondria, endoplasmic reticulum, lysosomes, and so forth, are split between the two daughter cells. Finally, a new nuclear membrane must be assembled around the chromosomes of each new daughter cell. The dynamics of this process are still being investigated, and new proteins and molecules are still being discovered that mediate different parts of mitosis in eukaryotic cells.
[Figure 4.6 labels: parental DNA, Okazaki fragments, RNA primer, Pol I movement, new DNA nucleotides inserted, discarded RNA nucleotides, nick, DNA ligase, nick sealed.]

FIGURE 4.6 Joining the Okazaki Fragments
When first made, the lagging strand is composed of alternating Okazaki fragments and RNA primers. Next, DNA polymerase I binds to the primer region, and as it moves forward, it degrades the RNA and replaces it with DNA. Finally, DNA ligase seals the nick in the phosphate backbone.
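The argument that a single origin would be too slow for a human chromosome can be checked with rough arithmetic, using the polymerase rates quoted in this chapter (about 1000 nucleotides per second in E. coli and 50 in eukaryotes). The sketch below rests on assumptions that are not in the text: an E. coli genome of roughly 4.6 million base pairs, a round figure of 150 million base pairs for a human chromosome, evenly spaced origins that all fire at once, and forks that never stall. The function name is hypothetical.

```python
def replication_time_s(length_bp: float, rate_nt_per_s: float, n_origins: int = 1) -> float:
    """Idealized replication time with two forks leaving each origin."""
    forks = 2 * n_origins  # bidirectional replication
    return length_bp / (forks * rate_nt_per_s)

# Assumed sizes (illustrative round numbers, not taken from the text).
E_COLI_GENOME_BP = 4_600_000
HUMAN_CHROMOSOME_BP = 150_000_000

e_coli_s = replication_time_s(E_COLI_GENOME_BP, 1000)
human_one_origin_s = replication_time_s(HUMAN_CHROMOSOME_BP, 50)
human_many_origins_s = replication_time_s(HUMAN_CHROMOSOME_BP, 50, n_origins=1000)

print(f"E. coli, one origin: {e_coli_s / 60:.0f} min")
print(f"Human chromosome, one origin: {human_one_origin_s / 86_400:.1f} days")
print(f"Human chromosome, 1000 origins: {human_many_origins_s / 60:.0f} min")
```

Under these assumptions the time falls in direct proportion to the number of origins, which is why many origins compensate for the slower eukaryotic polymerase.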

slide 109:

FIGURE 4.7 Mismatch Repair Occurs after Replication. MutS recognizes a mismatch shortly after DNA replication. MutS recruits MutL and two MutH proteins to the mismatch. MutH locates the nearest GATC of the new strand by identifying the methyl group attached to the “parent” strand. MutH cleaves the nonmethylated strand, and the DNA between the cut and the mismatch is degraded. The region is replaced and the mismatch is corrected.

[Figure panels: MutS binds the G/T mismatch; MutL and MutH join MutS; one of the MutH subunits binds to the methyl group on GATC and the DNA in between loops out; the nonmethylated strand is cut by MutH; DNA in the mismatch region is degraded; new DNA is inserted and the mismatch is corrected to G:C.]

slide 110:

FIGURE 4.8 Theta Replication. In circular genomes or plasmids, replication enzymes recognize the origin of replication, unwind the DNA, and start synthesis of two new strands of DNA, one in each direction. The net result is a replication bubble that makes the chromosome or plasmid look similar to the Greek letter theta (θ). The two replication forks keep moving around the circle until they meet on the opposite side.

IN VITRO DNA SYNTHESIS

Making DNA in the laboratory relies on the same basic principles outlined for replication (Fig. 4.11). DNA replication needs the following “reagents”: enzymes to melt the two template DNA strands apart, an RNA primer with a 3′-hydroxyl for DNA polymerase to synthesize a new DNA strand, a pool of nucleotide precursors, plus DNA polymerase to catalyze the addition of new nucleotides. To perform DNA replication in the laboratory, the researcher makes a few modifications. First, the enzymes that open and unwind the template DNA are not used. Instead, double-stranded DNA is converted to single-stranded DNA using heat or a strong base to disrupt the hydrogen bonds that hold the two strands together. Alternatively, template DNA can be made by using a virus that packages its DNA in single-stranded form. For example, M13 is a bacteriophage that infects E. coli, amplifies its genome using rolling circle replication, and packages the single-stranded DNA in viral particles that are released without lysing open the E. coli cell. If template DNA is cloned into the M13 genome, then the template will also be manufactured in a single-stranded form. This DNA can be isolated directly from the viral particles. During in vitro synthesis of DNA, an RNA primer is not used because RNA is very unstable and degrades easily. Instead, a short single-stranded oligonucleotide of DNA is used as a

Bacteria and viruses use either theta replication or rolling circle replication to create new genomes.
Eukaryotic cells have chromosomes with multiple origins of replication. Telomeres protect the ends of the chromosomes because each round of replication shortens the DNA. Replication in eukaryotes occurs only at a specific point during the cell cycle.

FIGURE 4.9 Rolling Circle Replication. During rolling circle replication, one strand of the plasmid or viral DNA is nicked, and the broken strand (pink) separates from the circular strand (purple). The gap left by the separation is filled in with new DNA starting at the origin of replication (green strand). The newly synthesized DNA keeps displacing the linear strand until the circular strand is completely replicated. The linear single-stranded piece is fully “unrolled” in the process.

slide 111:

primer. As long as the primer has a free 3′-hydroxyl, DNA polymerase will add nucleotides onto the end. The primers are synthesized chemically (see later discussion) and mixed with the single-stranded template DNA. The oligonucleotide primer has a sequence complementary to a short region on the DNA template. Therefore, at least some sequence information must be known about the template. If the sequence of the template DNA is unknown, it may be cloned into a vector, and the primer is then designed to match sequences of the vector, such as the polylinker region, that are close to the inserted DNA. Finally, purified DNA polymerase plus a pool of nucleotides (dATP, dCTP, dGTP, and dTTP) is added to the primer and template. The primer anneals to its complementary sequence, and DNA polymerase elongates the primer, creating a new strand of DNA complementary to the template DNA.

In vitro replication requires a single-stranded piece of template DNA, a primer, nucleotide precursors, and DNA polymerase.

FIGURE 4.10 Eukaryotic Cell Cycle. DNA replication occurs during the S phase of the cell cycle, but the chromosomes are actually separated later, during mitosis or the M phase. The S and M phases are separated by G1 and G2.

FIGURE 4.11 In Vitro DNA Synthesis. DNA synthesis in the laboratory uses single-stranded template DNA plus DNA polymerase, an oligonucleotide primer, and nucleotide precursors. After all the components are incubated at the appropriate temperature, double-stranded DNA is made.

CHEMICAL SYNTHESIS OF DNA

Making DNA chemically rather than biologically was one of the first new technologies to be applied by the biotechnology industry.
The ability to make short synthetic stretches of DNA is crucial to using DNA replication in laboratory techniques. DNA polymerase cannot synthesize DNA without a free 3′-OH end to elongate. Therefore, to use DNA polymerase in vitro, the researcher must supply a short primer. Such primers are used to sequence DNA (see later discussion), to amplify DNA with PCR (see later discussion), and even to find genes in library screening (see Chapter 3). So a short review of how primers are synthesized is included here. Research into chemical synthesis of DNA began shortly after Watson and Crick published their research on the crystal structure of DNA. H. Gobind Khorana at the University of Chicago was an early pioneer in the study of oligonucleotide synthesis (see Box 4.1). Technically, oligonucleotides are

slide 112:

any piece of DNA less than 20 nucleotides in length, but today oligonucleotide denotes a short piece of DNA that is chemically synthesized. In 1970, Khorana’s lab synthesized an active tRNA molecule of 72 nucleotides (Agarwal et al., 1970). The chemistry he used was inefficient and cumbersome, but some of his ideas are still used in current oligonucleotide synthesis.

Today, chemical synthesis is done with an automated DNA synthesizer that creates DNA by sequentially adding one nucleotide after another in the correct sequence order. Unlike in vivo DNA synthesis, artificial synthesis is done in the 3′ to 5′ direction. The first step is attaching the first nucleotide to a porous material made of controlled pore glass (CPG). The first nucleotide is not attached directly but is linked to the surface via a spacer molecule that binds to the 3′-OH of the nucleotide (Fig. 4.12). The column pores allow reagents to be washed through and removed easily. Using CPG is one improvement over Khorana’s technology. He used polymer beads to couple the reaction but found that the polymer swelled as the reagents passed through the column, which inhibited synthesis. CPG is superior because it does not swell.

Har Gobind Khorana, Marshall W. Nirenberg, and Robert W. Holley are pioneers in the field of molecular biology. The three scientists received the Nobel Prize in Physiology or Medicine in 1968 for their combined efforts in identifying which triplet codons coded for which amino acid. Khorana originally began chemical synthesis of DNA in order to help elucidate the role of different enzymes. He wanted to understand the mode of action for nucleases and phosphodiesterases, but without being able to chemically synthesize a defined nucleic acid, the work on enzymes would be very difficult.
Khorana’s lab determined ways to synthesize dinucleotide, trinucleotide, and tetranucleotide sequences using chemical synthesis. Rather than using single nucleotide additions, his lab focused on synthesizing nucleotides in blocks. His ability to chemically synthesize blocks of DNA was the backbone experiment, but many other discoveries were instrumental in determining the amino acid codes. Matthaei and Nirenberg (1961) experimentally determined that polyuridylate (polyU) mixed with a bacterial cell-free amino acid incorporating system created polyphenylalanine. This experiment determined that the codon UUU encoded the amino acid phenylalanine. During this time, Robert Holley was working on tRNA. He specifically identified the structure of the tRNA for alanine by purifying tRNA-alanine from yeast, fragmenting the tRNA into pieces with nucleases, and logically piecing together the size of the fragments and the sites that the enzymes recognized. Other important discoveries included the purification of DNA polymerase and RNA polymerase.

These experiments were woven into an elegant method of determining which triplet nucleotide sequence encoded which amino acid. First, Khorana’s group began synthesizing dinucleotide, trinucleotide, and tetranucleotide double-stranded DNA fragments. For example, one of these fragments had the following structure:

5′ TCTCTC 3′
3′ AGAGAG 5′

Arthur Kornberg had previously won the Nobel Prize for his discovery and purification of DNA polymerase I. Khorana’s group mixed their short synthesized DNA with pure DNA polymerase to create long polydeoxynucleotides with a known sequence. Next, the DNA pieces were mixed with RNA polymerase to create long polyribonucleotides of known sequence. These were mixed with the cell-free system devised by Matthaei and Nirenberg, which made polypeptides. The preceding dinucleotide example resulted in a polypeptide of repeating serine and leucine.
The experiment demonstrated that TCT and CTC encoded serine and leucine, but there was no way to determine definitively which codon matched which amino acid, so more experiments were needed. The final important contributions to making the final assignments used purified tRNAs labeled with 14C. Nirenberg and Leder (1964) mixed Khorana’s synthetic trinucleotides with the labeled tRNAs and ribosomes. (Note: The isolation of pure tRNA was not possible without Robert W. Holley’s work.) They looked for binding of the labeled tRNA to the trinucleotide sequence. These experiments provided clear answers for many of the trinucleotide sequences, but many times the results were not very clear. It was the combination of these experiments with Khorana’s work that determined the genetic code.

(Box 4.1 Khorana, Nirenberg, and Holley)

FIGURE 4.12 Addition of a Spacer Molecule and First Base to the CPG. The first nucleotide is linked to a glass bead via a spacer molecule attached to its 3′-OH group. The structure of the spacer varies, but it is important to keep the synthesis away from the glass surface and to allow efficient removal of the completed oligonucleotide.

slide 113:

When the spacer is linked to the nucleotide 3′-OH, a chemical blocking group is attached to the 5′-OH. Thus the 3′-OH is the only available reactive group. Khorana’s early synthesis was revolutionary in this respect because he chose the dimethoxytrityl (DMT) group, which is still used as a blocking group in today’s synthesizers. DMT has a strong orange color and is easily removed from the 5′-OH so that another nucleotide can be linked to the first. In practice, the CPG–spacer–first nucleotide is washed, and then the DMT group is removed by mild acid such as trichloroacetic acid (TCA). The 5′-OH is now ready to accept the next nucleotide. The efficiency of removing DMT is critical. If DMT is not removed completely, many of the potential oligonucleotides will fail to elongate. The orange color reveals the efficiency of removal and is easily measured optically.

Each nucleotide is added as a phosphoramidite, which is a nucleotide that has a blocking group protecting a 3′-phosphite group (Fig. 4.13). One problem with early oligonucleotide synthesis technology was branching. Rather than the incoming nucleotide adding to the 5′ end, it sometimes attached to the phosphate linking two nucleotides. To prevent branching, every added nucleotide has a di-isopropylamine group attached to the 3′ phosphite group, which also stabilizes the nucleotides, allowing long-term storage. Before another nucleoside is added, the 3′ phosphite group is activated by tetrazole. The next nucleotide is then added, and it reacts with the phosphite to form a dinucleotide (Fig. 4.14).

If the terminal nucleotide of a growing chain fails to react with an incoming nucleotide, the chain must be capped off to prevent generation of an incorrect sequence by later reactions.
The 5′-OH of all unreacted nucleotides is acetylated with acetic anhydride. This terminates the chain so that no other nucleoside phosphoramidites can be added (Fig. 4.15).

At this stage of the synthesis, the column has CPG–spacer–first nucleoside–phosphite–second nucleoside–DMT. Phosphites are used because they react much faster, but they are unstable. Adding iodine oxidizes the phosphite triester into the normal phosphodiester, which is more stable under acidic conditions (Fig. 4.16). The column can now be prepared to add the third nucleotide. The DMT is removed with TCA, and the third phosphoramidite nucleotide is added. The chains are capped so that any dinucleotides that failed to react with the third phosphoramidite are prevented from adding any more nucleosides. Finally, the phosphite triester is oxidized to phosphodiester. This process continually repeats until all the desired nucleotides are added and the final oligonucleotide has the correct sequence (Fig. 4.17).

FIGURE 4.13 Nucleoside Phosphoramidites Are Used for Chemical Synthesis of DNA. Nucleotides are modified to ensure that the correct group reacts with the growing oligonucleotide. Each nucleotide has a DMT group blocking its 5′-OH. The 3′-OH is activated by a phosphoramidite group, which is originally also protected by di-isopropylamine.

After the final phosphoramidite nucleoside is added, the oligonucleotide still has DMT protecting the 5′-OH, cyanoethyl groups attached to the phosphates, and amino-protecting groups on the bases. Amino groups would react with the reagents during synthesis; therefore, chemical groups are added to protect the bases before they are added to the column. All three types of protective groups must be removed. The organic salts of the protecting groups are then removed by desalting, and the final oligonucleotide is cleaved from the CPG surface. Finally, the 5′-OH must be phosphorylated to make the oligonucleotides

slide 114:

biologically active. A kinase from bacteriophage T4 is used to transfer a phosphate group from ATP to the 5′ end of the oligonucleotides. The newly synthesized oligonucleotide is now ready for use.

FIGURE 4.14 Adding the Second Nucleotide. During chemical synthesis of DNA, nucleotides are added in a 3′ to 5′ direction, the opposite of in vivo DNA synthesis. Therefore, the 3′-OH of an incoming nucleotide must be activated, but the 5′-OH must be blocked (see top nucleotide). For nucleotides already attached to the bead, the opposite must be done. Here the blocking group on the 5′-OH of nucleotide 1 is removed by treatment with a mild acid. When the second nucleotide is added, it reacts to form a dinucleotide.

[Figure panels: couple first nucleotide to CPG; phosphoramidite nucleotide activation (loss of the di-isopropylamino group); coupling.]

Chemical synthesis of DNA occurs by successively adding phosphoramidite nucleotides to the previous base attached to controlled pore glass (CPG) columns. Synthesis occurs in a 3′ to 5′ direction by removing the 5′-blocking group from the existing nucleotide and adding the new activated phosphoramidite nucleotide. After coupling, the unreacted nucleotides are capped, and the phosphite triester is oxidized to a phosphodiester group. Synthesis ends by removal of all blocking groups from the bases, removing the cyanoethyl groups, and cleavage from the CPG.
CHEMICAL SYNTHESIS OF COMPLETE GENES

As mentioned earlier, at each nucleoside addition in chemical synthesis, a proportion of oligonucleotides do not react with the next base, and these are capped with an acetyl group. The efficiency of nucleoside addition is critical because if each step has low efficiency, the number of full-length oligonucleotides will decrease exponentially. For example, if the efficiency is 50% at each round, only half of the oligonucleotides add the second base, one-fourth would add the

slide 115:

third base, one-eighth would get four bases, one-sixteenth would get the fifth base, and so on. Even if the final product were merely 10 bases in length, poor coupling would yield minuscule amounts of full-length product. It is critical for DNA synthesizers to have about 99% efficiency in each round, and then truncated products are the minority of the final sample. With high efficiencies, it is possible to synthesize longer segments of DNA. At 99% efficiency, an oligonucleotide that is 100 nucleotides long would give about 30–40% final yield. If the desired oligonucleotide is separated from the truncated products by electrophoresis (see Chapter 3), it is possible to get plenty of full-length products.

Complete genes can be synthesized by linking smaller oligonucleotides together (Fig. 4.18). If the complete sequence of a gene is known, then long oligonucleotides can be synthesized identical to that sequence. The efficiency of the DNA synthesizer usually limits the length of each segment to about 100 bases; therefore, the gene segments are made with overlapping ends. Because oligonucleotides are single-stranded, both

FIGURE 4.15 Capping of Unreacted Nucleotides. If any of the first nucleotides are not coupled to a second nucleotide, these could react with a subsequent nucleotide, creating an internal deletion of the oligonucleotide. To prevent this error, the unreacted 5′-OH is capped with an acetyl group from acetic anhydride.

FIGURE 4.16 Oxidation Converts Phosphite Triester into a Phosphodiester. The phosphite triester is oxidized to a phosphodiester by adding iodine. This stabilizes the dinucleotide for further additions.
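The yield figures quoted above follow from multiplying the per-step efficiency by itself once per coupling. The sketch below is a minimal illustration, not a model of a real synthesizer: the function name is hypothetical, and it assumes the first nucleotide is already attached to the support (so an oligonucleotide of n bases needs n - 1 couplings) and that every coupling succeeds independently with the same efficiency.

```python
def full_length_fraction(coupling_efficiency: float, length: int) -> float:
    """Fraction of chains that reach full length.

    Assumption (not from the text): the first base is pre-attached to the
    support, so `length` bases require `length - 1` coupling steps, each
    succeeding independently with probability `coupling_efficiency`.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    # Compare poor (50%) and good (99%) coupling for short and long oligos.
    for efficiency in (0.50, 0.99):
        for n_bases in (10, 100):
            fraction = full_length_fraction(efficiency, n_bases)
            print(f"efficiency {efficiency:.0%}, {n_bases}-mer: "
                  f"{fraction:.4%} full length")
```

Under these assumptions, a 100-base oligonucleotide at 99% efficiency falls inside the 30–40% range given in the text, while a 10-base product at 50% efficiency is well under 1% full length.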

slide 116:

POLYMERASE CHAIN REACTION USES IN VITRO SYNTHESIS TO AMPLIFY SMALL AMOUNTS OF DNA

The polymerase chain reaction (PCR) amplifies small samples of DNA into large amounts, much as a photocopier makes many copies of a sheet of paper. The DNA is amplified using the principles of replication; that is, the DNA is replicated over and over by DNA polymerase until a large amount is manufactured. Kary Mullis invented this technique while working at Cetus in 1983. He later won the Nobel Prize in Chemistry for PCR because of its huge impact on biology and science. PCR is used in forensic medicine to identify victims or criminals by amplifying the minuscule amounts of DNA left at a crime scene (see Chapter 23); PCR can identify infectious diseases such as HIV before symptoms emerge (see Chapter 21); PCR can amplify specific segments of genes without the need for cloning the segment first; in fact, PCR is now used in all aspects of the biological sciences.

Just as the photocopier needs more paper, ink, and a machine to make the copies, PCR requires specific reagents. The sample to be copied is called the template DNA, and this is often a known sequence or gene. The template DNA is typically double-stranded, and extremely small quantities are sufficient. The template DNA can be found within a complex mixture, such as whole genomic DNA samples, or within a fairly simple sample of bacterial plasmid DNA. The second reagent needed for PCR is a pair of oligonucleotide primers, which have sequences complementary to the ends of the template DNA. The DNA primers are oligonucleotides about 8 to 20 nucleotides long. One primer anneals to the 3′ end of the sense strand and the other anneals to the 3′ end of the antisense strand of the target sequence, so that DNA polymerase extends each primer across the target. The primer sequences specify the exact target region of the DNA sample, thus focusing the reaction on the template DNA even if it is found within a complex mixture of genomic DNA. The third reagent is a supply of
strands of the gene must be synthesized and annealed to each other, and then the segments are linked using ligase. Another strategy for assembly is to create strands that overlap only partially and then use DNA polymerase I to fill in the large single-stranded gaps.

FIGURE 4.17 Flow Chart of Oligonucleotide Synthesis. Oligonucleotide synthesis has many steps that are repeated. The first nucleotide is coupled to a bead with a spacer molecule. Next, the 5′-DMT is removed, and activated phosphoramidite nucleotide is added to the 5′ end of the first nucleotide. All the first nucleotides that were not linked to a second nucleotide are capped to prevent any further extension. Next, the phosphite triester is converted to a phosphodiester. These steps (in green) are repeated for the entire length of the oligonucleotide. Once the oligonucleotide has the appropriate length, the steps in tan are performed on the entire molecule.

[Flow chart: couple spacer + first base to CPG; then, repeated for each nucleotide: DEPROTECTION (remove DMT from 5′-OH), COUPLING (add phosphoramidite), CAPPING (cap all unreacted nucleotides), STABILIZATION (oxidize phosphite triester to phosphodiester); finally: remove all other protecting groups, phosphorylate 5′ end of oligo, elute from column.]

DNA can be synthesized in long segments provided each base is added efficiently. These long segments can be linked into one complete gene.

slide 117:

nucleoside triphosphates, and the final reagent is Taq DNA polymerase from Thermus aquaticus, which actually makes the copies.

The basic mechanism of PCR includes heat denaturation of the template, annealing of the primers, and making a complementary copy using DNA polymerase, each step found in DNA replication. The three steps are repeated over and over until one template strand generates millions of identical copies. An amount of DNA too small to be seen can be copied so that it can be cloned into a vector or visualized on an agarose gel (see Chapter 3). The process requires changing the temperature in a cyclic manner. Changing temperatures is accomplished by a thermocycler, a machine designed to change the temperature of its heat block rapidly so that each cycle can be completed in minutes. The temperature cycles among 94°C to denature the template, 50°C–60°C to anneal the primer (depending on the length and sequence of the primer), and 72°C for Taq polymerase to make new DNA. Before thermocyclers were developed, PCR was accomplished by moving the mixture among three different water baths at different temperatures every few minutes, which was very tedious.

In principle, the PCR cycle resembles DNA replication with a few modifications (Fig. 4.19). Like other in vitro DNA synthesis reactions, the double-stranded template is denatured with high heat rather than enzymes. Then the temperature is lowered so that the primers anneal to their binding sites. The primers are made so that each binds to opposite strands of the template, one at the beginning and one at the end of the gene. Then DNA polymerase elongates both primers and converts both single template strands to double-stranded DNA. (Note: During sequencing, only one primer is used and only one strand of the template is replicated, but during PCR, both strands are copied.)
Taq polymerase is the most widely used polymerase for PCR because it is very stable at high temperatures and does not denature at the high temperatures needed to separate the strands of the template DNA. Taq polymerase comes from Thermus aquaticus, a bacterium that grows in the hot springs of Yellowstone Park, USA. After the first replication cycle, the whole process is repeated. The two DNA strands are denatured at high heat, and then the temperature drops to allow the primers

FIGURE 4.18 Synthesis and Assembly of a Gene. (A) Complete synthesis of both strands. Small genes can be chemically synthesized by making overlapping oligonucleotides. The complete sequence of the gene, both coding and noncoding strands, is made from small oligonucleotides that anneal to each other, forming a double-stranded piece of DNA with nicks along the phosphate backbone. The nicks are then sealed by DNA ligase. (B) Partial synthesis followed by polymerase. To manufacture longer pieces of DNA, oligonucleotides are synthesized so that a small portion of each oligonucleotide overlaps with the next. The entire sequence is manufactured, but gaps exist in both the coding and noncoding strands. These gaps are filled using DNA polymerase I, and the remaining nicks are sealed with DNA ligase.
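The statement that the two primers bind opposite strands, one at each end of the target, can be made concrete with a short sketch. It assumes the target is given as its top strand written 5′ to 3′; the function names and the default primer length of 18 are illustrative choices, and real primer design would also check melting temperature, hairpins, and specificity, none of which is attempted here.

```python
# Translation table for complementary DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pair(target_top_strand: str, length: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5' to 3'.

    The forward primer matches the start of the top strand, so it anneals
    to the bottom strand. The reverse primer is the reverse complement of
    the end of the top strand, so it anneals to the top strand.
    """
    target = target_top_strand.upper()
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    if length < 1 or len(target) < 2 * length:
        raise ValueError("target must be at least twice the primer length")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = primer_pair("ATGGCCAAATTTGGGCCCTAA", length=5)
    print("forward:", fwd)
    print("reverse:", rev)
```

Both primers point inward, toward each other, so each round of extension copies the region between them.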

slide 118:

FIGURE 4.19 PCR, the First Three Cycles. In the first cycle, double-stranded template DNA (light purple) is denatured, complementary primers are annealed to the primer binding sites, and a new copy of the template is generated by Taq polymerase (red). In the second cycle, the two double-stranded products from the first cycle are denatured to form four single-stranded templates. The same set of primers anneals to the four template strands, and Taq polymerase makes each of the four double-stranded (dark purple). In the third cycle, the four double-stranded products from the second cycle are denatured, the primers anneal, and the four products from the second cycle become eight (light blue). Each subsequent round of denaturation, primer annealing, and extension doubles the number of copies, turning a small amount of template into a large amount of PCR product.

[Figure note: the short product that first appears in the third cycle will become the majority when the full number of cycles is complete.]
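The doubling described in the caption, and the way the short product comes to dominate, can be followed with a small bookkeeping sketch. It is an idealized model, not a description of a real reaction: it assumes one double-stranded template at the start and that every strand is copied in every cycle. The three strand classes and the function name are labels chosen for this illustration.

```python
def pcr_strand_counts(cycles: int) -> dict[str, int]:
    """Count single strands by type after a number of ideal PCR cycles.

    original: the two strands of the starting template
    long:     copies of an original strand (one end set by a primer)
    short:    copies of a long or short product (both ends set by primers)
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    original, long_count, short_count = 2, 0, 0
    for _ in range(cycles):
        new_long = original                   # each original strand is copied once
        new_short = long_count + short_count  # products yield primer-bounded copies
        long_count += new_long
        short_count += new_short
    return {"original": original, "long": long_count, "short": short_count}


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 30):
        counts = pcr_strand_counts(n)
        total = sum(counts.values())
        share = counts["short"] / total
        print(f"cycle {n:2d}: {counts}  total={total}  short fraction={share:.3f}")
```

In this model the total number of strands doubles every cycle, the long products grow by only two per cycle, and after a few dozen cycles essentially every strand is the short product flanked by the two primers.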

slide 119:

to anneal to their target sequences. Taq polymerase synthesizes the next four strands, and now there are four double-stranded copies of the target sequence. Early in the process, some longer strands are generated; however, eventually only the segment flanked by the two primers is amplified. Ultimately, the template strands and early PCR products become the minority. The shorter products become the majority.

The primers are key to the process of PCR. If the primers do not anneal in the correct location, if the span between the primers is too large, or if the primers form hairpin regions rather than annealing to the target, then Taq polymerase will not be able to amplify the segment. Also, if both primers anneal to the same strand, the reaction will not work. If the template has a known sequence, primers are synthesized based on the sequences upstream and downstream of the region to be amplified. Modifications exist that allow researchers to analyze unknown sequences by PCR (see later discussion).

MODIFICATIONS OF BASIC PCR

Many different permutations of PCR have been devised since Kary Mullis developed the basic procedure. All rely on the same basic PCR reaction, which takes a small amount of DNA and amplifies it by in vitro replication. Many of these variant protocols are essential tools for recombinant DNA research.

Several strategies allow amplifying a DNA segment by PCR even if its sequence is unknown. For example, the unknown sequence may be cloned into a vector whose sequence is known. The primers are then designed to anneal to the regions of the vector just outside the insert. In another scenario, the sequence of an encoded protein is used to generate PCR primers. Remember that most amino acids are encoded by more than one codon. Thus, during translation of a gene, one or more codons are used for the same amino acid. Therefore, if a protein sequence is converted backwards into nucleotide sequence, the sequence is not unique.
For example, two different codons exist for histidine and for glutamine, and four codons exist for valine. Consequently, the nucleotide sequence encoding the amino acid sequence histidine–glutamine–valine can be one of 16 different combinations (2 × 2 × 4). If primers are made that depend on protein sequence, they will be degenerate primers, and they will have a mixture of two or three different bases at the wobble positions in the triplet codon. During oligonucleotide synthesis, more than one phosphoramidite nucleotide can be added to the column at a particular step. Some of the primers will have one of the nucleotides, whereas other primers will have the other nucleotide. If many different wobble bases are added, a population of primers is created, each with a slightly different sequence. Within this population, some will bind to the target DNA perfectly, some will bind with only a few mismatches, and some won’t bind at all. Of course, the annealing temperature for degenerate primers is adjusted to allow for some mismatches.

Inverse PCR is a trick used when sequence information is known only on one side of the target region (Fig. 4.20). First, a restriction enzyme is chosen that does not cut within the stretch of known DNA. The length of the recognition sequence should be six or more base pairs in order to generate reasonably long DNA segments for amplification by PCR. The target DNA is then cut with this restriction enzyme to yield a piece of DNA that has compatible sticky ends, one upstream of the known sequence and one downstream. The two ends are ligated to form a circle. The PCR primers are designed to recognize the end regions of the known sequence. Each primer binds to a different strand of the circular DNA, and they both point “outward” into the unknown DNA. PCR then amplifies the unknown DNA to give linear molecules with short stretches of known DNA at the ends and the restriction enzyme site in the middle.

PCR is a process that uses DNA polymerase in an in vitro replication reaction.
Here a double-stranded template is replicated to make two copies. Each of these products is replicated to make four and the process continues exponentially.
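The degenerate-primer example discussed earlier (histidine–glutamine–valine giving 16 possible coding sequences) can be checked by enumeration. The sketch below uses the standard DNA codons for only those three amino acids; the dictionary names are choices made for this illustration, and the IUPAC table is deliberately partial, covering just the ambiguity codes this example needs (Y = C or T, R = A or G, N = any base).

```python
from itertools import product

# Standard DNA codons for the three amino acids in the example only.
CODONS = {
    "His": ("CAT", "CAC"),
    "Gln": ("CAA", "CAG"),
    "Val": ("GTT", "GTC", "GTA", "GTG"),
}

# Partial IUPAC ambiguity table: only the codes needed for this example.
IUPAC = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("CT"): "Y",
    frozenset("AG"): "R",
    frozenset("ACGT"): "N",
}


def coding_sequences(peptide: list[str]) -> list[str]:
    """Enumerate every DNA sequence that could encode the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


def degenerate_consensus(sequences: list[str]) -> str:
    """Collapse equal-length sequences into one degenerate sequence."""
    return "".join(IUPAC[frozenset(column)] for column in zip(*sequences))


if __name__ == "__main__":
    seqs = coding_sequences(["His", "Gln", "Val"])
    print(len(seqs), "possible sequences")
    print("degenerate primer:", degenerate_consensus(seqs))
```

Each ambiguity code in the consensus marks a wobble position where the synthesizer would be given a mixture of phosphoramidites at that step.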

slide 120:

FIGURE 4.20 Inverse PCR
Inverse PCR allows unknown sequences to be amplified by PCR, provided that they are located near a known sequence. The DNA is cut with a restriction enzyme that cuts upstream and downstream of the known region but not within it. The linear piece of DNA is circularized and then amplified with primers that anneal in the known region. The PCR products contain the unknown DNA from the left and right of the known sequence. These can be cloned and sequenced.

Degenerate primers are designed based on amino acid sequences and contain different nucleotides at the wobble position. Inverse PCR amplifies DNA near a known sequence by finding a restriction enzyme recognition sequence some distance away in the unknown region, cutting out this template, and amplifying the entire piece with Taq polymerase.

REVERSE TRANSCRIPTASE PCR

Reverse transcriptase PCR (RT-PCR) uses the enzyme reverse transcriptase to make a cDNA copy of mRNA from an organism and then uses PCR to amplify the cDNA (Fig. 4.21). The advantage of this technique is evident when trying to use PCR to amplify a gene from eukaryotic DNA. Eukaryotic genes have introns, some extremely long, which interrupt the coding segments. After transcription, the primary RNA transcript is processed to remove all the introns, hence becoming mRNA. Using mRNA as the source of the target

slide 121:

DNA relies on the cell removing the introns. In practice, RT-PCR has two steps. First, reverse transcriptase recognizes the 3′ end of primers containing repeated thymines and synthesizes a DNA strand that is complementary to the mRNA. The thymines base-pair with the poly(A) tail of the mRNA. Then the RNA strand is replaced with another DNA strand, leaving a double-stranded DNA, that is, the cDNA. Next, the cDNA is amplified using a normal PCR reaction containing appropriate primers (one usually recognizes the poly(A) tail), Taq polymerase, and nucleotides.

PCR IN GENETIC ENGINEERING

PCR allows scientists to clone genes or segments of genes for identification and analysis. PCR also allows scientists to manipulate a gene that has already been identified. Various modified PCR techniques allow scientists to fuse two separate genes or gene segments into one, delete or invert regions of DNA, and alter single nucleotides to change the gene and its encoded protein in a more subtle way.

PCR can make cloning a foreign piece of DNA easier. Special PCR primers can generate new restriction enzyme sites at the ends of the target sequence (Fig. 4.22). The primer is synthesized so that its 5′ end has the desired restriction enzyme site and its 3′ end has sequence complementary to the target. Obviously, the 5′ end of the primer does not bind to the target DNA, but as long as the 3′ end has enough matches to the target, the primer will still anneal. Taq polymerase extends the primer from its 3′ end; therefore the enzyme is not bothered by mismatched 5′ sequences. The resulting PCR product can easily be digested with the corresponding restriction enzyme and ligated into the appropriate vector.

Rather than incorporating restriction enzyme sites into the ends of the PCR product, TA cloning will clone any PCR product directly (Fig. 4.23).
Taq polymerase has terminal transferase activity that generates a single adenine overhang on the ends of the PCR products it makes. Special vectors containing a single thymine overhang have been developed, and simply mixing the PCR product with the TA cloning vector plus DNA ligase clones the PCR product into the vector without any special modifications.

RT-PCR uses reverse transcriptase to convert mRNA into double-stranded DNA, and then the gene without any introns can be amplified by regular PCR.

FIGURE 4.21 Reverse Transcriptase PCR
RT-PCR is a two-step procedure that involves making a cDNA copy of the mRNA and then using PCR to amplify the cDNA.

FIGURE 4.22 Incorporation of Artificial Restriction Enzyme Sites
Primers for PCR can be designed to have nonhomologous regions at the 5′ end that contain the recognition sequence for a particular restriction enzyme. After PCR, the amplified product has the restriction enzyme sites at both ends. If the PCR product is digested with the restriction enzyme, this generates sticky ends that are compatible with a chosen vector.
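The primer design of Fig. 4.22 can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the target sequence is made up, the function names are hypothetical, the annealing length of 12 bases is arbitrary, and EcoRI (recognition sequence GAATTC) is used only as a familiar example of a restriction site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def tailed_primers(target, site, anneal_len=12):
    """Build primers whose 5' end carries the site and whose 3' end matches the target."""
    if site in target:
        raise ValueError("restriction site also occurs inside the target")
    forward = site + target[:anneal_len]
    reverse = site + reverse_complement(target[-anneal_len:])
    return forward, reverse

def pcr_product(target, forward, reverse, anneal_len=12):
    """Top strand of the product: the primer tails become part of both ends."""
    left_tail = forward[:-anneal_len]
    right_tail = reverse_complement(reverse[:-anneal_len])
    return left_tail + target + right_tail

ECORI = "GAATTC"  # EcoRI recognition sequence
# Made-up target sequence, for illustration only.
target = "ATGGCTAGCAAGGGTGAAGAACTGTTTACCGGTGTTGTGCCGATTCTGTAA"

fwd, rev = tailed_primers(target, ECORI)
product = pcr_product(target, fwd, rev)
print(fwd)
print(rev)
print(product)
```

The printed product begins and ends with the EcoRI site even though the original target contains no such site. In practice a few extra bases are often placed 5′ of the site, which this sketch omits.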

slide 122:

PCR can also be used to manipulate cloned genes. Two different gene segments can be fused into one using overlap PCR (Fig. 4.24). Here PCR amplification occurs with three primers: one is complementary to the beginning of the first gene segment, one is complementary to the end of the second gene segment, and a third is half complementary to the end of gene segment 1 and half complementary to the beginning of gene segment 2. During PCR, the two gene segments become fused into one by a mechanism that is hard to visualize but probably involves looping of some of the early PCR products.

PCR can be used to create large deletions or insertions in a gene (Fig. 4.25). Once again, the design of the PCR primers is key to the construction. For example, primers to generate insertions have two regions: the first half is homologous to the sequence around the insertion point, and the second half has sequences complementary to the insert sequence. Suppose an antibiotic resistance gene such as npt (which confers resistance to neomycin) is to be inserted into a cloning vector. The primers would have their 5′ ends complementary to the sequence flanking the insertion point on the vector and their 3′ ends complementary to the ends of the npt gene. First, the primers are used to amplify the npt gene and give a product with sequences homologous to the vector flanking both ends. Next, the PCR product is transformed into bacteria harboring the vector. The npt gene recombines with the insertion point by homologous recombination, resulting in insertion of the npt gene into the vector. The insertion points determine whether the antibiotic cassette causes just an insertion or both an insertion and a deletion. If the two PCR primers recognize separate homologous recombination sites, then the incoming PCR segment will recombine at these two sites.
Homologous recombination then results in the npt gene replacing a piece of the vector rather than merely inserting at one particular location.

FIGURE 4.23 TA Cloning of PCR Products
When Taq polymerase amplifies a piece of DNA during PCR, the terminal transferase activity adds an extra adenine at the 3′ ends. The TA cloning vector was designed so that, when linearized, it has a single 3′-thymine overhang. The PCR product can be ligated into this vector without the need for special restriction enzyme sites.

FIGURE 4.24 Overlap PCR
Overlapping primers can be used to link two different gene segments. In this scheme, the overlapping primer has one half with sequence complementary to target sequence 1 and the other half matching target sequence 2. The PCR reaction will create a product with these two regions linked together.

PCR can also generate nucleotide changes in a gene by directed mutagenesis (Fig. 4.26). Usually only one or a few adjacent nucleotides are changed. First, a mutagenic PCR primer is synthesized that has nucleotide mismatches in the middle region of the primer. The primer

slide 123:

will anneal to the target site with the mismatch in the center. The primer needs to have enough matching nucleotides on both sides of the mismatch so that binding is stable during the PCR reaction. The mutagenic primer is paired with a normal primer. The PCR reaction then amplifies the target DNA, incorporating the changes at the end with the mutagenic primer. These changes may be relatively subtle, but if the right nucleotides are changed, then a critical amino acid may be changed. One amino acid change can alter the entire function of a protein. Such an approach is often used to assess the importance of particular amino acids within a protein.

PCR OF DNA CAN DETERMINE THE SEQUENCE OF BASES

Being able to quickly and easily determine the sequence of any gene has been the driving force for the recent advances made in biotechnology. Frederick Sanger developed a method for sequencing a gene in vitro in 1974. He was interested in the amino acid sequence of insulin and decided to deduce the sequence of the protein from the nucleotide sequence. He invented the chain termination sequencing method, which is still used today (Fig. 4.27). Much like DNA replication, chain termination sequencing requires a primer, DNA polymerase, a single-stranded DNA template, and deoxynucleotides. During in vitro sequencing reactions, these components are mixed and DNA polymerase makes many copies of the original template. The first trick needed to deduce the sequence is to stop synthesis of the newly synthesized DNA chains at each base. Consequently, the fragments generated will differ in size by one base and, when separated by gel electrophoresis, create a ladder of fragments. The next step is to figure out the identity of the last nucleotide. If the final base of each fragment is known, the sequence may be read directly from the gel, reading from bottom to top.
But how do we know what the final base is for each fragment on the sequencing ladder?

The 5′ end of PCR primers does not need to be complementary to the template DNA and can be designed to add restriction enzyme sites to the PCR product. The terminal transferase activity of Taq polymerase adds a single adenine onto the 3′ end of the PCR product. These traits allow the PCR product to be cloned into a vector. PCR can be used to delete, insert, and even fuse different gene segments. PCR can be used to make small changes in nucleotide sequences by directed mutagenesis.

FIGURE 4.25 Generation of Insertions or Deletions by PCR
In the first step, a specifically targeted cassette is constructed by PCR. This contains both a suitable marker gene and upstream and downstream sequences homologous to the target site. The engineered cassette is transformed into the host cell, and homologous crossing over occurs. Recombinants are selected by the antibiotic resistance carried on the cassette. The barcode sequence is a unique DNA sequence found only in the cassette and is used to identify the location of the cassette.

DNA polymerase synthesizes a new strand of DNA based on the template sequence. The chain consists of deoxynucleotides, each with a hydroxyl group at the 3′ position on the deoxyribose ring. DNA polymerase adds the next nucleotide by linking the phosphate of the incoming nucleotide to the 3′-hydroxyl of the previous nucleotide. If a nucleotide lacks

slide 124:

the 3′-hydroxyl, no further nucleotides can be added and the chain is terminated (Fig. 4.28). During a sequencing reaction, a certain percentage of nucleotides with no 3′-hydroxyl, called dideoxynucleotides, are mixed with the normal deoxynucleotides. Such reactions typically have a maximum read length of about 800 nucleotides. The fragments are relatively small for DNA and vary in length by only one nucleotide; therefore they must be separated by size using polyacrylamide gel electrophoresis (see Chapter 3). The principle is the same as for agarose gel electrophoresis, but polyacrylamide has smaller pores, and so smaller fragments can be separated with higher resolution. The sequence is read from the bottom of the gel to the top because the fragments terminated closest to the primer are smaller and hence run faster than the ones terminated further from the primer. The bands appear as a ladder, each separated by one nucleotide; therefore each band represents the fragments ending with the dideoxynucleotide complementary to that position of the template strand.

FIGURE 4.26 Directed Mutagenesis Using PCR
The gene to be mutated is cloned and the entire sequence is known. To alter one specific nucleotide, normal and mutagenic primers are combined in a PCR reaction. The mutagenic primer will have a mismatch in the middle, but the remaining sequences will be complementary. The PCR product will incorporate the sequence of the mutagenic primer.

Automated DNA sequencing uses a PCR-type reaction to sequence DNA. In PCR sequencing, or cycle sequencing, the template DNA with unknown sequence is amplified by Taq polymerase as in any normal PCR reaction. Taq DNA polymerase was modified to remove its proofreading ability and increase the speed at which it incorporates nucleotides. Cycle sequencing

slide 125:

reaction mixtures include all four deoxynucleotides, all four dideoxynucleotides, a single primer, template DNA, and Taq polymerase. To discern the identity of the dideoxynucleotides, each of the four is linked to a unique fluorophore.

FIGURE 4.27 Chain Termination Method of Sequencing
During chain termination, DNA polymerase synthesizes many different strands of DNA from the single-stranded template. DNA polymerase will stop at each nucleotide such that strands of all possible lengths are made. They are separated by size using electrophoresis. The smallest fragments are at the bottom and represent the primer plus only the first one or two nucleotides of the template DNA. Longer fragments contain the primer plus longer stretches of synthesized DNA complementary to the template DNA.

The samples are amplified in a thermocycler. First, the template DNA is denatured at a high temperature; then the temperature is lowered to anneal the primer; and finally the temperature is raised to 72°C, the optimal temperature for Taq polymerase to make DNA copies of the template. During polymerization, dideoxynucleotides are incorporated and cause chain termination. The ratio of dideoxynucleotides to deoxynucleotides is adjusted to ensure that some fragments stop at each G, A, T, or C of the template strand. After Taq polymerase makes thousands of copies of the template, each stopping at a different nucleotide, the entire

slide 126:

mixture is separated in one lane of a sequencing gel (Fig. 4.29). Bands of four different colors are seen, corresponding to the four fluorescently labeled dideoxynucleotides and hence the four bases.

Cycle sequencing has many advantages. During cycle sequencing, each round brings the temperature to 95°C, which destroys any secondary structures or double-stranded regions. Another advantage of cycle sequencing is control over primer hybridization. Some primers do not work well with regular sequencing reactions because they bind to closely related sequences. During cycle sequencing, the primer annealing temperature is controlled and can be set quite high in order to combat nonspecific binding. Finally, cycle sequencing requires very little template DNA; therefore sequencing can be done from smaller samples.

Another advance in sequencing has been the detection system. Automatic DNA sequencers detect each of the fluorescent tags and record the sequence of bases (Fig. 4.30). Some automatic DNA sequencers can read up to 384 different DNA samples using capillary tubes filled with gel matrix to separate the DNA fragments. At the bottom of each capillary tube is a fluorescent activator, which emits light to excite the fluorescent dyes. On the other side is the detector, which reads the wavelength of light that the fluorescent dye emits. As each fragment passes the detector, it measures the wavelength and records the data as a peak on a graph. For each fluorescent dye, a peak is recorded and assigned to the appropriate base. An attached computer records and compiles the data into the DNA sequence.

Automated sequencing has a large startup cost because the sequence analyzer is quite expensive, but these machines run multiple samples at one time, and thus the cost per sample is quite low. Many universities and companies have a centralized facility that does the sequencing for all the researchers. In fact, sequencing has become so automated that many researchers just send their template DNA and primers to a company that specializes in sequencing.

Chain-terminating dideoxynucleotides are the key to determining DNA sequence. When these are incorporated into an in vitro replication reaction, DNA polymerase cannot add any more nucleotides and the synthesis reaction ends. In cycle sequencing, a PCR reaction includes a controlled amount of fluorescently labeled dideoxynucleotides. Taq polymerase stops adding nucleotides when a dideoxynucleotide is incorporated. The fluorescent tag is used to identify the ending base of each fragment using an automated sequencer.

FIGURE 4.28 Chain Termination by Dideoxynucleotides
During the sequencing reaction, DNA polymerase makes multiple copies of the original sequence. Sequencing reaction mixtures contain dideoxynucleotides that terminate growing DNA chains. The example here shows a sample reaction, which includes triphosphates of both deoxyguanosine (dG) and dideoxyguanosine (ddG). Whenever ddG is incorporated (shown in red), it causes termination of the growing chain. If dG (blue) is incorporated, the chain will continue to grow. When the sequencing reaction containing the ddG is separated on a polyacrylamide gel, the fragments are separated by size. Each band directly represents a fragment ending in G from the original sequence.
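The logic of reading a sequencing ladder can be sketched in Python. This is an idealized illustration, not a model of real reaction kinetics: it assumes exactly one terminated fragment for every position, uses the short strand ACGATTAG from Fig. 4.27 as the newly synthesized strand, and the function names are hypothetical.

```python
import random

def terminated_fragments(new_strand):
    """One chain-terminated product ending at every position of the new strand."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_ladder(fragments):
    """Sort by length (smallest runs fastest) and read each fragment's final base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

def lane(fragments, base):
    """Fragments that would appear in a reaction containing only one dideoxynucleotide."""
    return [f for f in sorted(fragments, key=len) if f.endswith(base)]

new_strand = "ACGATTAG"                  # strand synthesized in Fig. 4.27
fragments = terminated_fragments(new_strand)
random.shuffle(fragments)                # fragments are mixed together in the tube

print(read_ladder(fragments))            # ACGATTAG
print(lane(fragments, "G"))              # ['ACG', 'ACGATTAG']
```

Sorting by length plays the role of electrophoresis, and the final base of each fragment plays the role of the fluorescent tag on the dideoxynucleotide.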

slide 127:

FIGURE 4.29 Cycle Sequencing
During cycle sequencing, the reaction contains template DNA, primer, Taq polymerase, deoxynucleotides (dATP, dTTP, dGTP, dCTP), and dideoxynucleotides (ddATP, ddTTP, ddGTP, ddCTP). Each of the different dideoxynucleotides has a different fluorescent label attached. The automated sequencer detects the color and compiles the sequence data. The example template shown is 5′ TGCTACCAGCGGTCCGA 3′.

FIGURE 4.30 Data from an Automated Sequencer
A representative set of data from an automated sequencer. The fluorescent peaks for the individual bases are shown. The computer compiles the information into a sequence file for the researcher.

NEXT-GENERATION SEQUENCING TECHNOLOGIES

Sequencing DNA using chain termination was the workhorse for the initial sequencing of the first human genome. Throughout the Human Genome Project, the cost per base of DNA dropped as advances were made in the capillary electrophoresis chain-termination method. The cost for sequencing one million bases of DNA in September 2001, the end of the initial sequence, was $5292, and so the cost for a whole human genome was over $95 million. Because of the advances in chain termination sequencing, the Human Genome Project was finished early and under budget. As of October 2013, the cost to sequence one million base pairs of DNA had dropped to less than 6 cents. The cost to sequence an entire human genome is therefore a mere $5096. The incredible decrease in cost stems from the advent of massively parallel sequencing, which is a descriptive name for next-generation sequencing. These

slide 128:

technologies use a type of platform that can hold millions of DNA fragments in separate locations. There are many different chemistries used in next-generation sequencing, and they are rapidly changing. Two sequencing platforms, 454 sequencing and Illumina, are outlined here.

The first step of any next-generation technology is to prepare the DNA for amplification by PCR (Fig. 4.31). Genomic DNA is isolated from the organism of interest according to a standard DNA isolation protocol. The pure DNA is then sheared into small fragments using sonication. To amplify each of the fragments, the end of each piece of DNA must have known sequence. This is impossible, especially for genomes that have never been sequenced. Even if the genome sequence is known, sonication creates random breaks in the DNA, so there is no way to truly know the sequence at each end. The trick to circumvent this problem is to add linkers or adaptors, which are short DNA pieces with a known sequence. They are added to the ends using the TA cloning technology. The linker or adaptor sequence depends on which of the next-generation sequencing technologies is employed. A barcode sequence or index sequence is a key feature of the adaptor. The barcode or index sequence is much like a zip code in your address: the sequence is unique to the sample of DNA, and it allows multiple samples of DNA to be analyzed at the same time, a procedure called multiplexing.

Once the DNA sample is fragmented and adaptors are added onto each end, the DNA is attached to a solid surface so that individual DNA fragments are separated from each other. In 454 sequencing, the DNA fragments are attached to beads via the adaptors. The set of beads, which carry small DNA oligonucleotides complementary to the adaptor, is mixed with the DNA at a ratio such that one DNA fragment will attach to a single bead. Ensuring that a single DNA from the genome attaches to a single bead is a critical step for sequencing.
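The zip code analogy for barcodes can be made concrete with a small sorting routine. This is a simplified sketch: the barcodes, sample names, and reads are invented, and it assumes that each barcode sits at the very start of the read and matches exactly, which is not how every platform reports index sequences.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Sort pooled reads into samples by the barcode at the start of each read."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the genomic sequence is kept.
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)

# Hypothetical barcodes and pooled reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled_reads = [
    "ACGTGGGACTTTA",
    "TGCAGAATTACCT",
    "ACGTCCCGTGGGA",
    "GGGGTTTTAAAA",
]

for sample, reads in demultiplex(pooled_reads, barcodes).items():
    print(sample, reads)
```

Reads whose first bases match no known barcode are set aside rather than guessed at, which mirrors how pooled samples are kept from contaminating one another during multiplexing.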
In the Illumina sequencing methodology, the same principle applies, but the DNA fragments are added to the surface of a flow cell. The surface has DNA primers complementary to the adaptor scattered across it. The attached fragments must also be far enough apart from each other to appear as separate locations to the sensors at the bottom of the flow cell.

The next step for next-generation sequencing is to create multiple copies of each single piece of DNA using PCR. For 454 sequencing, emulsion PCR creates multiple copies of the single piece of DNA that attaches to the bead. The process begins by creating an emulsion of oil and water such that only one bead is found in each of the water droplets. In addition, the water droplets contain free deoxynucleotides, primers complementary to the adaptors, and Taq DNA polymerase. Within the droplets, the DNA fragment is amplified using the traditional denaturation, annealing, and elongation steps. The final result is a bead coated with identical copies of the DNA fragment. The emulsion prevents the DNA from one bead diffusing to a different bead.

In a similar fashion, DNA fragments attached to the surface of a flow cell for Illumina sequencing are amplified by incubating the flow cell with deoxynucleotides and DNA polymerase in a process called bridge amplification. The primers used for amplifying the DNA fragment are attached to the flow cell, so the DNA anneals to another primer on the surface, forming a bridge. These are amplified and released to form a cluster of identical DNA fragments. Once a cluster of identical DNA pieces is produced on the bead or flow cell, these pieces are denatured into single-stranded DNAs competent for sequencing.

Sequencing for 454 and Illumina occurs as the single-stranded DNA is replicated. Each technology uses a different detection method for identifying the sequence, but in both 454 and Illumina sequencing each nucleotide is identified one by one; that is, as a nucleotide is added to the complementary strand, its identity is recorded by a sensor and stored by an attached computer. This method of sequencing is called sequencing by synthesis. In 454 sequencing the beads

slide 129:

FIGURE 4.31 Next-Generation Sequencing
Panels (454 sequencing on the left, Illumina sequencing on the right): fragment DNA and add adaptors to ends; separate each template into individual droplets or sites on the chip; amplify each single template.
During next-generation sequencing, the DNA is prepared for amplification by isolating and fragmenting the sample. Adaptors are added to the ends of the fragments, which are then annealed to complementary oligonucleotides on the surface of the bead for 454 sequencing (left) or the flow cell for Illumina sequencing (right). These are added such that one unique fragment attaches to one bead (left) or the spacing of the attached DNA is sufficient for recognition by the detector below the flow cell. Each single DNA template is amplified and denatured to be single-stranded. Sequence is determined after annealing a primer to one end of the template and determining the identity of each individually added nucleotide (see text for details). The sequence is detected by sensors below the well of the picotiter plate or below the flow cell and recorded by an attached computer.

slide 130:

coated with copies of DNA are separated into a picotiter plate such that only one bead is within each well. A picotiter plate has over a million individual wells, or holes, in its surface, each holding 75 picoliters. The lower surface of the well is optically clear to allow the light to be visualized by the detector. After a single bead enters an individual well, a primer is annealed to the adaptor sequence on the fragment. As in pyrosequencing, one of the four deoxynucleotides and DNA polymerase are floated across the picotiter plate, and if the template in the well has the complementary base, DNA polymerase adds the nucleotide to the primer, releasing a molecule of pyrophosphate. The well of the picotiter plate also contains sulfurylase and luciferase: sulfurylase converts the released pyrophosphate into ATP, which luciferase uses to produce a flash of light. At the bottom of the well, the attached sensor records the flash of light and sends the information to the computer. The flash of light appears only in the wells where the added nucleotide was incorporated, and the other wells remain dark. Each of the four nucleotides is added separately and washed away before adding another.

Illumina sequencing also determines each nucleotide as it is incorporated but uses a unique, reversible fluorescent dye terminator for each of guanine, cytosine, adenine, and thymine. After a primer is annealed to the adaptor sequence on the DNA template, all four nucleotides are added to the flow cell simultaneously. The wavelength of light released at each DNA cluster is recorded by the computer. The fluorescent dye terminator is removed, and another batch of four fluorescently labeled nucleotides is added. Again the signal for each spot is recorded and stored. As each nucleotide is added, the computer compiles a sequence for each spot on the flow cell.

For both methods, the data recorded from each well of the picotiter plate or spot on the flow cell are compiled as a sequence for each separate DNA. Each of these pieces of sequence information is called a read. The technology for each method limits the number of nucleotides that can be determined with certainty, ranging from 50 to 400 base pairs depending on the sequencing machine and sequencing technology. To enhance the quality of the sequencing data, paired-end reads confirm the sequence information by repeating the entire sequencing reaction but using a primer to the opposite side of the fragment, thus essentially sequencing from both ends of the DNA fragment.

FIGURE 4.31 Cont'd
Sequence of reactions. 454 sequencing: polymerase adds a nucleotide to the annealed primer, sulfurylase generates ATP, and luciferase converts luciferin to oxyluciferin plus light, followed by a wash. Illumina sequencing: add all four nucleotides, each with a different fluorophore; image; remove fluorophore; wash.

The amount of sequence data compiled by next-generation sequencing is tremendous, and without the increase in computer power and storage there would be no possible way to compile the

slide 131:

data into a linear DNA sequence. Each read (that is, each set of sequence information from each of the 1.7 million wells of the picotiter plate in 454 sequencing or each of the approximately 2 million DNA clusters on the flow cell in Illumina sequencing) represents a small unique piece of an entire genome. Computers compare each piece of sequence back to a reference genome, if available, and the final output has the reference sequence across the top with each of the reads aligned below (Fig. 4.32). The number of sequences that map to a region in the genome is called read depth and is used to ensure that a change is not an error in the sequencing process. If, for example, a single nucleotide substitution was found in a single read, but the other 29 reads for that region were identical to the reference genome, this substitution would most likely be considered an error. If, on the other hand, there was a single nucleotide substitution in 29 of 30 reads for that region, then this is most likely a true difference between the genome sequenced and the reference genome. If for some reason a region of the genome had a read depth of only 2 and a mutation was identified in this region, its validity would be highly suspect. On the other hand, if there was a mutation in 98 of the reads in a region where the read depth was 200, then the mutation is most likely a real substitution.

The ability to sequence an entire human genome has gone from a multiyear, multibillion dollar project to a simple procedure done in a few hours to a few days, depending on the machine and technology used. In fact, the goal is to be able to reduce the cost of a human genome sequence to $1000. The ability to quickly assess the genomic sequence is going to change many disciplines. Understanding one genome can be misleading, but now we have the ability to compare genomes among different organisms and even among different people.
In fact, a consortium of different universities and companies has compiled an integrated map of genetic variations by sequencing over 1000 different human genomes. Another major goal for this new technology is to compare cancer genomes to the normal genome. Early studies are identifying which mutations are common to cancers and which mutations in genes determine whether or not the patient will respond to a particular therapy. The ability to ascertain so much information so fast is bound to have applications that have yet to be discovered.

Summary

This chapter outlines the process of DNA replication. First, to replicate the DNA, DNA gyrase and DNA helicase relax the coiling in the DNA. The relaxed DNA is open and ready for the replisome to assemble at the origin. Single-stranded binding protein coats the open DNA, which keeps the DNA stable. Then PriA and primase prepare an RNA primer at the origin to provide a 3′-OH group to which DNA polymerase attaches the complementary bases during replication. DNA polymerase makes new DNA only in a 5′ to 3′ direction, so on the leading strand the whole strand is made in one piece. Because the lagging strand is antiparallel, DNA polymerase has to make that strand in smaller segments called Okazaki fragments.

[Figure 4.32: the reference genome sequence across the top, with the individual reads aligned beneath the matching region]

FIGURE 4.32 Data from Illumina Sequencing
The reference sequence is listed across the top. The sequences of the individual reads are shown aligned under the identical sequence in the reference genome. The yellow boxes mark the locations where a read and the reference genome differ. If these differences are found in every read, then they are most likely a true difference. If the difference is seen in only one read, then the change is most likely a sequencing error. Read depth varies from one nucleotide to the next (see blue rectangles). The higher the read depth, the more confidence the researcher has in the data.
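The read-depth reasoning described for Fig. 4.32 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any real sequencing pipeline: the function names `pileup` and `classify`, the ungapped `(start, sequence)` read format, and the cutoffs `min_depth=10` and `min_fraction=0.2` are all hypothetical choices made for the example.

```python
from collections import Counter


def pileup(reference, reads):
    """Count the bases that the aligned reads place at each reference position.

    reads is a list of (start, sequence) pairs; start is the 0-based position
    of the read's first base on the reference (ungapped alignment assumed).
    """
    columns = [Counter() for _ in reference]
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                columns[position][base] += 1
    return columns


def classify(ref_base, column, min_depth=10, min_fraction=0.2):
    """Label one position from its read depth and the share of differing reads.

    min_depth and min_fraction are illustrative cutoffs, not published values.
    """
    depth = sum(column.values())
    if depth == 0:
        return "no coverage"
    differing = depth - column[ref_base]
    if differing == 0:
        return "matches reference"
    if depth < min_depth:
        return "low read depth, suspect"
    if differing / depth >= min_fraction:
        return "likely true difference"
    return "likely sequencing error"


# Scenarios from the text: 1 of 30 reads differs, 29 of 30 reads differ,
# and a difference seen where the read depth is only 2.
print(classify("A", Counter({"A": 29, "G": 1})))
print(classify("A", Counter({"A": 1, "G": 29})))
print(classify("A", Counter({"A": 1, "G": 1})))
```

The depth check comes before the fraction check on purpose: with only two reads, even a 50% difference says little, which mirrors the caution in the text about regions with low read depth.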

slide 132:

In vitro DNA synthesis can be carried out by purified DNA polymerase or by chemically linking nucleotides. In reactions done with chemical reagents, the DNA is single-stranded and is short because the process is not very efficient. Chemical synthesis of DNA is primarily used for making short primers or oligonucleotides. In vitro DNA synthesis by DNA polymerase is very versatile and can be used to amplify a piece of DNA from a few copies to millions using PCR. Modifications of PCR include inverse PCR, which amplifies unknown regions of DNA, and RT-PCR, which starts from mRNA rather than DNA and so creates copies of genes without any introns. Additionally, PCR can be used to clone copies of genomic DNA into a vector using TA cloning or by adding novel restriction enzyme sites at the ends of the PCR product. Finally, PCR can mutate template DNA by inserting or deleting regions, linking two separate regions together, or mutating single nucleotides.

In vitro DNA synthesis is the basis for determining the sequence of DNA. In cycle sequencing, a single reaction contains four different fluorescently labeled dideoxynucleotides and unlabeled deoxynucleotides at a ratio that ensures one dideoxynucleotide incorporates at each nucleotide position of the template. The final reaction creates a tube filled with DNA fragments that end at each possible position, and that end nucleotide is fluorescently labeled. As the fragments are separated by size in a capillary tube filled with gel matrix, the smallest fragments exit the bottom of the tube first. As each subsequent fragment passes a detector, the identity of the fluorescent tag is determined and recorded.

In contrast, next-generation sequencing reads nucleotides one by one as they are added to a primer. 454 sequencing employs pyrosequencing to determine which nucleotide is added. Since the release of pyrophosphate is the same for each of the four nucleotides, only one kind of nucleotide is added at a time. The flash of light is recorded for each DNA fragment template where the nucleotide was incorporated. Illumina next-generation sequencing uses reversible 3′-fluorescent dye-linked nucleotides, which are added to the DNA template. Because each of the four nucleotides carries its own dye, all four are added simultaneously to a flow cell containing the DNA templates. After the identity of the nucleotide that is added at each cluster is recorded, the fluorescent dye is removed from the nucleotide and washed away.

The main difference between typical chain-termination sequencing and next-generation sequencing is the scale. Chain-termination sequencing occurs on one single template DNA. In contrast, 454 sequencing uses a picotiter plate with 1.7 million wells and Illumina sequencing uses a flow cell with several million DNA clusters, each well or cluster representing a single unique piece of DNA from the genome.

End-of-Chapter Questions

1. Which of the following enzymes aid in uncoiling DNA?
a. DNA gyrase
b. DNA helicase
c. topoisomerase IV
d. single-stranded binding protein
e. all of the above

2. Why is an RNA primer necessary during replication?
a. DNA polymerase III requires a 3′-OH to elongate DNA.
b. An RNA primer is not needed for elongation.
c. DNA polymerase requires a 5′-phosphate before it can elongate the DNA.
d. A DNA primer is needed for replication instead of an RNA primer.
e. An RNA primer is only needed once the DNA has been elongated and DNA polymerase is trying to fill in the gaps.

slide 133:

3. What are the functions of the two essential subunits of DNA polymerase III?
a. Both subunits synthesize the lagging strand only.
b. One subunit links nucleotides and the other ensures accuracy.
c. They both function as a clamp to hold the complex to the DNA.
d. The subunits function to break apart the bonds in the DNA strand.
e. One subunit removes the RNA primer and the other synthesizes DNA.

4. Which of the following statements about mismatch repair is incorrect?
a. MutSHL excise the mismatched nucleotides from the DNA.
b. Mismatch repair proteins identify a mistake in DNA replication.
c. The mismatch proteins recruit DNA polymerase III to synthesize new DNA after the proteins have excised the mismatched nucleotides.
d. MutSHL can synthesize new DNA after a mismatch has been excised.
e. MutSHL monitors the methylation state of the DNA to determine which strand contains the correct base when there is a mismatch.

5. Which of the following statements is incorrect regarding DNA replication?
a. Rolling circle and theta replication are common for prokaryotes and viruses.
b. Each round of replication for linear chromosomes, such as in eukaryotes, shortens the length of the chromosome.
c. Prokaryotic chromosomes have multiple origins of replication.
d. Eukaryotic replication only occurs during the S-phase of the cell cycle.
e. Eukaryotic chromosomes have multiple origins of replication.

6. During in vitro DNA replication, which of the following components is not required?
a. single-stranded DNA
b. a primer containing a 3′-OH
c. DNA helicase to separate the strands
d. DNA polymerase to catalyze the reaction
e. nucleotide precursors

7. Which of the following is not a step in the chemical synthesis of DNA?
a. The 3′ phosphate group is added using phosphorylase.
b. The addition of a blocking compound to protect the 3′ phosphite from reacting improperly.
c. The 5′-OH is phosphorylated by bacteriophage T4 kinase.
d. The addition of acetic anhydride and dimethylaminopyridine to cap the 5′-OH group of unreacted nucleotides.
e. The amino groups on the bases are modified by other chemical groups to prevent the bases from reacting during the elongation process.

8. During chemical synthesis of DNA, a portion of the nucleotides does not react. How can the efficiency of such reactions be increased?
a. The unreacted nucleosides are not acetylated so that more can be added in subsequent reactions.
b. The efficiency of the reaction is not critical. Instead, the quality of the final product is more important than the quantity.
c. The desired oligonucleotide can be separated from the truncated oligos by electrophoresis.
d. Oligonucleotides should be made using DNA polymerase III instead of in vitro chemical synthesis.

slide 134:

e. The reaction times can be increased to allow the reaction to be more efficient.

9. Which of the following components terminates the chain in a sequencing reaction?
a. dideoxynucleotides
b. Klenow polymerase
c. DNA polymerase III
d. deoxynucleotides
e. DNA primers

10. Which of the following statements about PCR is incorrect?
a. The DNA template is denatured using helicase.
b. PCR is used to obtain millions of copies of a specific region of DNA.
c. A thermostable DNA polymerase is used because of the high temperatures required in PCR.
d. Template DNA, a set of primers, deoxynucleotides, a thermostable DNA polymerase, and a thermocycler are the important components in PCR.
e. Primers are needed because DNA polymerase cannot initiate synthesis but can only elongate from an existing 3′-OH.

11. Which of the following is not an advantage of automated cycle sequencing over the chain termination method of sequencing?
a. The reactions in an automated sequencer can be performed faster.
b. The reactions performed in an automated sequencer can be read by a computer rather than a human.
c. Higher temperatures are used during cycle sequencing, which prevent secondary structures from forming in the DNA and early termination of the reaction.
d. In cycle sequencing, nonspecific interactions by the primer can be controlled by raising the annealing temperature.
e. All of the above are advantages of cycle sequencing.

12. Which of the following statements about degenerate primers is not correct?
a. Degenerate primers have a mixture of two or three bases at the wobble position in the codon.
b. Because of the nature of degenerate primers, the annealing temperature during PCR using these primers must be lowered to account for the mismatches.
c. Degenerate primers are often designed by working backwards from a known amino acid sequence.
d. Degenerate primers are used even when the sequence of DNA is known.
e. Within a population of degenerate primers, some will bind perfectly, some will bind with mismatches, and others will not bind.

13. Which of the following techniques would allow a researcher to determine the genetic relatedness between two samples of DNA?
a. inverse PCR
b. reverse transcriptase PCR
c. TA cloning
d. overlap PCR
e. randomly amplified polymorphic DNA

slide 135:

14. Why would a researcher want to use RT-PCR?
a. RT-PCR is used to compare two different samples of DNA for relatedness.
b. RT-PCR creates an mRNA molecule from a known DNA sequence.
c. RT-PCR generates a protein sequence from mRNA.
d. RT-PCR generates a DNA molecule without the noncoding introns from eukaryotic mRNA.
e. All of the above are applications for RT-PCR.

15. Which of the following is an application for PCR?
a. site-directed mutagenesis
b. creation of insertions, deletions, and fusions of different gene segments
c. amplification of specific segments of DNA
d. for cloning into vectors
e. all of the above

16. In _______________ sequencing, the DNA fragments are bound to a solid surface via a flow cell.
a. Illumina
b. 454
c. chain termination
d. Sanger
e. cycle

17. Flashes of light are emitted whenever a base is added in _______________ sequencing.
a. Illumina
b. 454
c. chain termination
d. Sanger
e. cycle

slide 136:

CHAPTER 5

RNA-Based Technologies

http://dx.doi.org/10.1016/B978-0-12-385015-7.00005-3

Noncoding RNA Plays Many Roles
RNA Coordinates Genomic Integrity in Eukaryotes
RNA Protects Genomes from Invading Viruses
RNA Modulates Transcription
Antisense RNA Modulates mRNA Expression
Antisense RNA Controls a Variety of Biological Phenomena
Using Antisense RNA
Delivery of Antisense Therapies
RNA Interference Uses Antisense RNA to Silence Gene Expression
MicroRNAs Modulate Gene Expression
Applications of RNAi for Studying Gene Expression
Noncoding RNAs Take Part in RNA Processing
Riboswitches Are Controlled by Effector Molecules
RNA Catalyzes Enzyme Reactions
Allosteric Deoxyribozymes Catalyze Specific Reactions
Engineering Allosteric Riboswitches and Ribozymes

slide 137:

NONCODING RNA PLAYS MANY ROLES

RNA plays a multifaceted role in biology that is adaptable for many different applications in biotechnology. The most widely understood role of RNA is in protein synthesis, which involves messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA) (see Chapter 2). However, RNA plays many other roles. Several small RNAs, such as snRNA, snoRNA, and gRNA, take part in RNA processing, such as removing introns. Some RNA sequences can catalyze enzyme reactions. Ribozymes, as they are called, are found in many organisms, catalyzing cleavage and ligation of various substrates.

Between the increased speed and accuracy of sequencing and a heightened awareness of RNA in the cell, an ever-increasing number of roles has been found for RNA in the regulation of gene expression and in cell defense. Entirely new classes of noncoding RNAs (ncRNAs) have been discovered and characterized. Table 5.1 summarizes the major RNA classes and their functions.

Indeed, several classes of regulatory RNA modulate gene expression at the stage of translating mRNA into protein, rather than transcribing DNA to give mRNA. For example, in some organisms, antisense RNA controls protein translation. Antisense RNA binds to the complementary mRNA and blocks translation. From this discovery came the potential use of antisense RNA to block or attenuate synthesis of proteins that cause various diseases. Several of the RNAs in Table 5.1 are subclasses of antisense RNA. For example, microRNA is found in eukaryotes, where it often regulates development and cellular differentiation, and many small bacterial regulatory RNAs act via an antisense mechanism.

RNA also takes part in defending the cell against foreign genetic elements, including viruses, plasmids, and transposable elements. In eukaryotes, RNA interference (RNAi) plays a major role in protecting against RNA viruses. Here, noncoding small-interfering RNAs (siRNA) identify specific mRNAs and trigger their degradation.
This fortuitous finding opened the door to a specific technique for controlling protein translation. Since RNA interference was discovered in 1993, its application has become widespread. Bacteria lack RNA interference but instead possess the CRISPR system, which uses small RNAs (crRNA) to identify and combat both DNA and RNA viruses. CRISPR acts by a mechanism quite distinct from RNA interference but is still very useful in biotechnology. CRISPR RNAs can be introduced into a eukaryotic cell in order to make small deletions in endogenous genes or to insert different tags or markers (e.g., GFP, FLAG, HA) into specific genes. The use of CRISPR in genome editing is described in Chapter 17, and the basic process is explained later in this chapter.

This chapter presents examples of how RNA affects genome defense, transcription, RNA processing, protein translation, and enzyme function, and it focuses on applications of these different categories in biotechnology.

In addition to taking part in translation, noncoding RNA plays many roles in molecular biology. Several classes of RNA have found major application in biotechnology. Antisense RNA and RNA interference regulate gene expression, and the CRISPR system is used in genetic engineering.

RNA COORDINATES GENOMIC INTEGRITY IN EUKARYOTES

RNA plays several roles in maintaining genome stability in eukaryotes. It is required for the proper synthesis of chromosome ends (telomeres) and for dosage compensation in diploid animals. Suppressing the replication and movement of transposable elements in the germline also depends on RNA.

slide 138:

Table 5.1 Major Classes of RNA

Class | Abbreviation | Size in Nucleotides | Role | Distribution

Genomic Integrity and Protection
Piwi-interacting RNA | piRNA | 25–32 | Transposon silencing in germline cells | Eukaryotes
Small-interfering RNA | siRNA | 22 | Defense against foreign RNA | Eukaryotes
Telomerase RNA | TERC | 451 | Synthesis of telomeres | Eukaryotes
CRISPR RNA | crRNA | 24–48 | Defense against foreign RNA and DNA | Bacteria plus Archaea
Xist RNA | __ | 17000 | X chromosome inactivation | Eukaryotes

Transcription
Antisense RNA | aRNA | 19–25 | Genetic regulation | All organisms
Enhancer RNAs | eRNAs | 200–500 | Genetic regulation | Eukaryotes
6S RNA | 6S RNA | 184 (E. coli) | Regulating transcription | Bacteria
Micro RNA | miRNA | 22 | Regulating mRNA degradation and translation | Eukaryotes
Circular RNA | circRNA | 1000 or more | Regulation of miRNA abundance | Eukaryotes
Long noncoding RNA | lncRNA | Wide range | Various regulatory roles | Eukaryotes
Small RNA regulators | sRNA | 300 | Gene regulators (various mechanisms) | Bacteria

RNA Processing
Guide RNA | gRNA | __ | Editing of mRNA | Protozoa
Small nuclear RNA | snRNA | 100–300 | Splicing of RNA | Eukaryotes plus Archaea
Small nucleolar RNA | snoRNA | 60–300 | RNA nucleotide modification | Eukaryotes plus Archaea

slide 139:

Table 5.1 Major Classes of RNA (continued)

Class | Abbreviation | Size in Nucleotides | Role | Distribution

Protein Translation
Messenger RNA | mRNA | Wide range | Protein synthesis | All organisms
Transfer RNA | tRNA | 70–90 | Protein synthesis | All organisms
Ribosomal RNA | rRNA | 120, 160, 1868, 5025 | Protein synthesis | All organisms (sizes shown are for higher animals)
Transfer-messenger RNA | tmRNA | __ | Rescues stalled ribosomes | Bacteria
Riboswitch | __ | 40–140 | Controls translation, transcription, or splicing of attached mRNA | All organisms (very rare in eukaryotes)
Dual-function RNA | __ | __ | Protein coding plus various regulatory roles | All organisms

Enzymatic Function
Ribozymes | __ | 250 | Function as enzymes | All organisms
Signal recognition particle RNA | 7SL RNA or SRP RNA | 300 | Membrane insertion of proteins | All organisms

Eukaryotic chromosomes consist of a linear DNA molecule with special sequences called telomeres at each end. During DNA replication, the ends of chromosomes cannot be replicated, since DNA polymerase cannot synthesize DNA without a pre-existing 3′-OH. During a typical replication round, DNA synthesis begins with an RNA primer, created by RNA polymerase, that supplies the 3′-OH group. At the ends, the scenario is different. Telomerase is an enzyme that uses an RNA component (TERC) to regenerate the ends that are not created during replication, thus maintaining the chromosome structure. Rather than serving as a primer, the RNA component acts as a template to actually increase the length of the ends. Without the RNA component of telomerase, the ends of chromosomes shorten, eventually leading to chromosomal fusions and deletions. In addition, mutations in either the protein or RNA portion of telomerase are associated with cancers and with diseases such as dyskeratosis congenita. The biology of telomere maintenance in relationship to aging is discussed in Chapter 20.

Telomerase consists of the RNA template (TERC) plus the telomerase reverse transcriptase subunit (TERT protein) (Fig. 5.1).
When the TERC RNA folds into its proper secondary structure, the RNA template sequence lies near the reverse transcriptase binding region. From this core unit, three different arms jut out and interact with other accessory proteins that stabilize the structure. The RNA component thus provides the scaffold for proper telomerase assembly as well as the template sequence.
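The idea that telomerase copies an RNA template onto the chromosome end, rather than using the RNA as a primer, can be illustrated with a toy Python sketch. Everything named here is an assumption made for illustration: `copy_template` and `extend_end` are hypothetical helpers, and the six-base `TOY_TEMPLATE` is a simplified stand-in chosen so that its DNA copy reads TTAGGG, the repeat found at human telomeres. The real TERC template region, and the alignment and translocation steps of the enzyme, are not modeled.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def copy_template(rna_template):
    """Return the DNA (5'->3') made by reverse-transcribing an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template))


def extend_end(dna_3prime_end, rna_template, rounds):
    """Append one DNA copy of the template to the 3' end for each round."""
    return dna_3prime_end + copy_template(rna_template) * rounds


TOY_TEMPLATE = "CCCUAA"  # illustrative only, not the full TERC template region

print(copy_template(TOY_TEMPLATE))          # TTAGGG
print(extend_end("ACGT", TOY_TEMPLATE, 2))  # ACGTTTAGGGTTAGGG
```

The sketch shows only the template role of the RNA: the same short RNA sequence is read repeatedly, so the chromosome end grows by identical DNA repeats.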

slide 140:

Gene dosage compensation equalizes the amount of protein produced from genes on the sex chromosomes in diploid organisms. In both insects and humans, females have two copies of the X chromosome, whereas males have only one copy. Insects and mammals both compensate for this, but by completely different mechanisms. One common factor is that both mechanisms rely on special RNA molecules.

In the fruit fly Drosophila, the male (XY) equalizes expression in comparison to the female by doubling gene expression from the single X chromosome. In Drosophila, two noncoding RNAs called roX1 and roX2 complex with five different proteins to form the MSL complex. The complex then binds to the genes on the male X chromosome and increases transcription.

In humans and other mammals, the second X chromosome in females is inactivated. Thus, males and females both essentially function with only one active X chromosome. The inactivation is due to a long noncoding RNA called Xist, which coats the inactive X chromosome. The Xist gene of the active X chromosome is inactivated by methylation, and the Xist gene on the inactivated X chromosome is transcribed. Expressing the Xist gene thus inactivates the X chromosome that carries it. Furthermore, an antisense RNA, Tsix, which is transcribed from the Xist locus but in the reverse direction, regulates the expression of the Xist gene on the active X chromosome. Using genome editing to move the Xist gene to another chromosome, which is then shut down, is being considered as a possible approach to curing Down syndrome (see Chapter 17).

Piwi-interacting RNAs (piRNA) are another class of small RNAs essential to maintaining the genome in eukaryotes. piRNA are 24–30 nucleotides in length, have a monophosphate group, preferably attached to a uridine, at the 5′ end, and have a 2′-O-methyl group at the 3′ end (Fig. 5.2). The piRNAs are encoded in the genome and are found in large clusters or within the introns of other genes.
They are complementary in sequence to endogenous transposons, which are clustered in the centromere area or the telomere regions. When piRNA gene clusters are expressed into RNA, members of the Argonaute protein family recognize the piRNA, cleave it into small pieces, and then use these pieces as single-stranded templates to bind and cleave any complementary RNA produced by the transposon. This action prevents endogenous transposons from moving to new locations.

FIGURE 5.1 Arrangement of Proteins around the RNA Core of Telomerase
Telomerase reverse transcriptase, or TERT, is the major telomerase protein and contains the active site for DNA synthesis that uses the TERC RNA as a template. Several other proteins are needed for stability. The protein names shown here are those for human telomerase. (Figure labels: TERC RNA, TERT, template, TCAB1, dyskerin, GAR1, NHP2, NOP10.)

FIGURE 5.2 Role of Piwi-Interacting RNAs
Genomic clusters of piRNA are transcribed into long RNA precursors. They are cleaved into shorter piRNA precursor molecules. After the PIWI complex binds these molecules, they are trimmed to generate the final piRNA. The PIWI complex then uses piRNA as a template to locate and silence sequences derived from transposable elements. Two variants of the PIWI complex exist: one specialized for nuclear silencing and the other for cytoplasmic silencing. (Figure steps: transcription of the piRNA gene cluster into an RNA precursor; cutting; loading onto PIWI; 3′ trimming; terminal methylation by Hen1, with SAM converted to SAH, adding the 2′-OCH3 group; mature PIWI complex.)

slide 141:

The arrangement of the eukaryotic genome within the nucleus was originally thought to be an amorphous soup of chromatin fibrils. However, the interior of the nucleus is highly organized. Ribosome assembly is localized in the nucleolus, a spherical structure within the nucleus. Furthermore, the use of chromosome-specific markers has revealed that each chromosome is found in a specific domain. Recent studies have shown that expressed genes, in the form of euchromatin, are localized to the central core of the nucleus, while nontranscribed DNA, as heterochromatin, occupies the region closest to the nuclear envelope. This organization is critical to function, and the role of RNA, especially lncRNA, in maintenance of the structure is only starting to be understood. These transcripts are produced from regions of the genome, then stay within the nucleus and partition to the chromatin, suggesting they play a role in chromatin structure or regulation.

Several classes of RNA promote genome integrity in eukaryotes. Maintenance of telomeres, control of gene dosage, and nuclear organization all involve noncoding RNA. In addition, piRNA plays a major role in protecting the genome during reproduction.

RNA PROTECTS GENOMES FROM INVADING VIRUSES

In addition to promoting internal genome stability, RNA is involved in protecting against external genetic elements, especially viruses. In eukaryotes, RNA interference is the major mechanism of RNA-mediated virus protection, whereas in prokaryotes the CRISPR system operates instead. RNA interference is discussed later, since it shows many features in common with the microRNA system that regulates cellular genes. Indeed, the two systems probably share a common evolutionary origin. RNA interference protects only against viruses with RNA genomes, not against DNA viruses.

The CRISPR system is found in both bacteria and Archaea but not in eukaryotes. It varies considerably in its components among different bacteria and is not found in all species. CRISPR differs in its components and mechanism from RNA interference. Moreover, CRISPR can protect against viruses with RNA or DNA genomes, as well as hostile plasmids and transposons. In consequence, CRISPR has been applied to genome editing (see Chapter 17). Here we outline the basic mechanism of the CRISPR system.

CRISPR is based on memory. The CRISPR system stores an array of short sequence fragments derived from foreign genetic elements. CRISPR, which stands for clustered regularly interspaced short palindromic repeats, refers to the way foreign genetic sequences are stored on the bacterial chromosome. When nucleic acids appear whose sequences contain matches to those stored, they are destroyed (Fig. 5.3). Both DNases and RNases are present among the CAS proteins, and thus the CRISPR system can defend bacteria against both RNA viruses and DNA viruses. It also prevents the entry into bacteria of foreign plasmids and transposons.

There is considerable variation in the enzyme components of the CRISPR system between different bacteria. Some bacteria lack the CRISPR system entirely, some have very simple systems, and others have multiple CRISPR arrays with many different degradative enzymes. Bacteria that occupy environments where there is a major threat from viruses tend to have the more complex CRISPR systems.

The antiviral defense systems, RNA interference in eukaryotes and CRISPR in prokaryotes, operate using distinct mechanisms, but both rely on noncoding RNA.
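The "memory" principle of CRISPR described above, in which stored fragments are compared against incoming nucleic acid and a match triggers destruction, can be sketched as a simple lookup. This is a deliberately reduced illustration, not a model of any real CRISPR system: `matching_fragments` and `respond` are hypothetical names, the stored sequences are invented, only exact matches on DNA are considered, and none of the CAS protein machinery is represented.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the opposite strand of a DNA sequence, read 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def matching_fragments(stored_fragments, incoming):
    """Return the stored fragments found on either strand of the incoming DNA."""
    hits = []
    for fragment in stored_fragments:
        if fragment in incoming or reverse_complement(fragment) in incoming:
            hits.append(fragment)
    return hits


def respond(stored_fragments, incoming):
    """Destroy incoming DNA that matches the stored memory; otherwise ignore it."""
    return "destroy" if matching_fragments(stored_fragments, incoming) else "ignore"


# Invented example sequences standing in for the stored array.
memory = ["GATTACAGATTACA", "CCGGTTAACCGGAA"]

print(respond(memory, "TTTGATTACAGATTACAAAA"))  # destroy (contains the first fragment)
print(respond(memory, "ACGTACGTACGT"))          # ignore (no stored match)
```

Keeping the memory as a plain list mirrors the array of stored fragments on the chromosome: protection extends only to elements whose sequences have already been recorded.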

slide 142:

RNA MODULATES TRANSCRIPTION

All organisms regulate gene transcription during development. In addition, transcription controls homeostasis of organisms, coordinating the proper protein complement for each environment or condition the organism experiences. Proteins known as transcription factors bind immediately before the gene, in a region termed the promoter, to activate or repress RNA polymerase. In addition to short-range control of the gene via the promoter, chromatin can loop around so that enhancers that are thousands of base pairs away connect to the transcriptional machinery and activate transcription. RNA controls and modulates gene expression too, which adds a whole extra layer of complexity to gene expression.

In bacteria, a variety of small RNA (sRNA) molecules take part in genetic regulation. Most of them act by an antisense mechanism, binding to mRNA to prevent its translation. However, various other mechanisms are also found. Some sRNA molecules bind to mRNA but activate translation by altering the secondary structure. Other sRNA molecules act via binding to proteins.

In eukaryotes, there is a much greater number and variety of regulatory RNAs. MicroRNAs are short RNAs that act via an antisense mechanism to prevent translation or promote degradation of mRNA. A variety of longer RNAs (e.g., enhancer RNA, circular RNA, lncRNA) are also involved in regulation. The role of many of these is as yet poorly characterized.

Noncoding RNA takes part in regulating transcription in both bacteria and eukaryotes, although by different mechanisms.

FIGURE 5.3 Overview of the CRISPR System
Foreign DNA sequences are stored as an array on the bacterial chromosome, separated by identical repeats. This region is transcribed into a long RNA and then processed into smaller individual RNA guides (crRNA). The CAS nucleases use these guides to find and destroy intruding foreign nucleic acids, both RNA and DNA. (Figure labels: part of bacterial chromosome with CAS genes, repeats, and unique virus sequences; transcription; cutting by CAS proteins; virus DNA or RNA; virus sequence recognized by CRISPR RNA; cutting by CAS nucleases.)

slide 143:

[Figure 5.4: double-stranded DNA with sense and antisense strands. Transcribing in the sense direction gives normal mRNA (5′ AUGCCGUAAUUCGG 3′); transcribing in the antisense direction gives antisense RNA (5′ CCGAAUUACGGCAU 3′). The normal and antisense RNA can anneal.]

FIGURE 5.4 Antisense RNA Is Complementary to Messenger RNA
Transcription from both strands of DNA creates two different RNA molecules: on the left, the messenger RNA, and on the right, antisense RNA. These two have complementary sequences and can form double-stranded RNA.

Antisense RNA Modulates mRNA Expression

Antisense refers to the orientation of complementary strands during transcription. The two complementary strands of DNA are referred to as sense (coding or plus) and antisense (noncoding or minus) (see Chapter 2). Transcription uses the antisense strand as template, resulting in an mRNA that is identical in sequence to the sense strand except for the replacement of thymine by uracil. Antisense RNA is synthesized using the sense strand as template; therefore, it has a sequence complementary to mRNA (Fig. 5.4).

Antisense RNA is made in normal cells of many different organisms, including humans. Artificial antisense RNA is also made for manipulating gene expression in laboratory settings.

slide 144:

When a cell has both the mRNA (i.e., the sense strand of RNA) plus a complementary antisense copy, the two single strands anneal to form double-stranded RNA. The duplex can either inhibit protein translation by blocking the ribosome binding site or inhibit mRNA splicing by blocking a splice site (Fig. 5.5A). When antisense sequences are made in the laboratory, they are usually synthesized as DNA because this is more stable than RNA (see Chapter 4). In this case, the DNA:RNA duplex is digested by RNase H (see Fig. 5.5B). RNase H is a cellular enzyme that normally functions during replication. It recognizes and cleaves the RNA backbone of a DNA:RNA duplex, targeting the antisense DNA:mRNA duplex for further degradation. RNase H recognizes a 7-base-pair heteroduplex, so the region of homology between the antisense DNA and target mRNA need not be very long.

Antisense RNA sequences are complementary to a target mRNA. Antisense RNA forms double-stranded regions that block either protein translation or splicing of introns.

Antisense RNA Controls a Variety of Biological Phenomena

Naturally occurring antisense genes have been found that control a variety of different processes. When antisense genes are transcribed, they produce an RNA molecule that is complementary to the mRNA of their target genes. One example of natural antisense control is found in Neurospora. This fungus follows a strict schedule based on circadian rhythms and forms hyphae only at specific times during the day. Many mutants have been identified that do not follow this timetable. The genes affected by these mutations are regulators of the circadian rhythm of Neurospora. One of the first to be identified was frequency (frq). Mutations in this gene change how often the fungus forms hyphae. The amount of normal frq mRNA fluctuates, with the highest levels during the day and the lowest at night. Conversely, antisense frq RNA also cycles but in reverse, with the lowest levels during the day and the highest during the night.
Although the exact mechanism is uncertain, Neurospora that do not produce antisense frq RNA have disrupted circadian rhythms. In addition, both the antisense and sense RNAs are induced by light and therefore respond directly to the environment to maintain the correct circadian rhythm.

Using antisense to regulate gene expression is so widespread in nature that scientists became curious how many potential antisense/sense partners exist in various genomes. In the human genome, 20–40% of all protein-coding genes also have antisense partners. They can be complementary to promoters, introns, exons, and even the 3′ UTR region of the gene. The position of the antisense gene within the genome categorizes the antisense transcript as cis or trans: cis antisense partners are found adjacent to or within the complementary gene, whereas trans antisense partners are found at different locations in the genome. Related to natural antisense transcripts are small noncoding regulatory RNAs called microRNAs (miRNAs), which inhibit gene expression through an antisense mechanism (see later discussion). Using computer searches, around 1000 potential microRNAs have been identified in humans, but because these are only about 20 nucleotides long, identifying them conclusively by computer is very difficult.

EXAMPLES OF NATURAL ANTISENSE CONTROL

- Control of Circadian Rhythm in Neurospora: The time of day controls when the fungus forms hyphae by regulating the antisense and sense mRNA for the frq gene.

slide 145:

RNA-Based Technologies 140

FIGURE 5.5 Antisense RNA Blocks Protein Expression
(A) The complementary sequence of antisense RNA binds to specific regions on mRNA. This can block the ribosome binding sites or splice junctions. (B) Antisense DNA targets mRNA for degradation. When antisense DNA binds to mRNA, the heteroduplex of RNA and DNA triggers RNase H to degrade the mRNA.
Figure labels: splice sites; ribosome binding site; sense mRNA; antisense mRNA; ribosomes cannot bind; splicing factors cannot bind; antisense DNA; RNase H.

slide 146:

• Iron Metabolism in Bacteria: FatB/RNAα are sense/antisense partners that control regulation of iron uptake in the fish pathogen Vibrio anguillarum. When iron is plentiful, higher amounts of RNAα prevent fatA and fatB expression. When iron is scarce, the bacteria need to get iron from the environment. RNAα is degraded, and fatA and fatB are expressed so Vibrio can ingest iron.
• Control of HIV-1 Gene Expression: Antisense env mRNA binds to the Rev Response Element (RRE) on env mRNA. When antisense blocks the RRE, Env protein is not produced. When antisense env mRNA is absent, Env protein is produced.
• Control of Eukaryotic Transcription Factors: The transcription factor hypoxia-induced factor (HIF-1) is a basic helix–loop–helix dimeric protein that turns on genes associated with oxygen and glucose metabolism, including glucose transporters 1 and 3 and enzymes of the glycolytic pathway. Antisense mRNA to the α subunit mRNA controls the expression of the transcription factor. The level of antisense RNA is modulated by the amount of oxygen in the environment.
• Control of RNA Editing: Antisense/sense loops are formed between complementary exon and intron sequences of the gene for the glutamate-gated ion channel in human brain. These loops are recognized by dsRNA-specific adenosine deaminase (DRADA), which converts adenosine to inosine by deamination. This alters the sequence of the final mRNA and hence of the protein, thus reducing the permeability of the ion channel.
• Alternate Splicing of Thyroid Hormone Receptor mRNA: Antisense RNA transcribed from the thyroid hormone locus inhibits splicing of the c-erbAα gene. Two alternately spliced transcripts give the authentic thyroid hormone receptor and a decoy receptor that does not bind thyroid hormone. These two forms modulate cellular responses to thyroid hormone.
• Control of ColE1 Plasmid Replication: RNAI and RNAII mRNA are sense/antisense partners that prevent DNA polymerase from initiating plasmid replication. The amount of antisense RNAI controls how often replication is initiated.

Organisms have antisense genes and microRNAs that bind to a target mRNA and prevent its translation into protein. These modulate a large number of systems, including hyphae formation in Neurospora, development, replication, and many more.

ANTISENSE TRANSCRIPTS CAN INDUCE FORMATION OF HETEROCHROMATIN

Trans-acting antisense transcripts are often transcribed from pseudogenes and may suppress or activate the regular gene. One example is the gene for phosphatase and tensin homolog (PTEN), a tumor suppressor gene whose level of expression correlates with the severity of cancer. There is a pseudogene for PTEN (PTENpg) that produces three noncoding RNAs: PTENpg1 sense (green), PTENpg1 antisense α (longer red line), and PTENpg1 antisense β (shorter red line) (Fig. 5.6). The PTENpg1 sense sequence is 95% identical to the PTEN gene even though it is made from a different gene. All three RNAs regulate PTEN, but by two different mechanisms. The α antisense RNA converts the PTEN genomic region into heterochromatin, repressing further transcription. Curiously, this antisense RNA does not bind to the sense strand even though it is complementary in sequence. Instead, the α antisense transcript attracts two chromatin-modifying proteins, DNMT3A and EZH2, to compact the histones so that RNA polymerase cannot access the promoter. The β antisense transcript is shorter, as it begins at an internal transcription start site. The β antisense RNA forms a complex with the third transcript from the pseudogene, PTENpg1 sense. The resulting double-stranded RNA attracts miRNAs that are targeted against PTEN and therefore keeps them from acting on PTEN mRNA (see later discussion of miRNA function).

slide 147:

FIGURE 5.6 PTEN Pseudogene Encodes Three Noncoding RNAs That Regulate PTEN Expression
(A) The pseudogene for PTEN has three transcription start sites. Transcription of the top strand (green) produces PTENpg1 sense RNA. Transcription on the lower strand (red) produces two different forms, a longer α antisense RNA and a shorter β antisense RNA. (B–C) PTENpg1 sense RNA and β antisense RNA (asRNA β) anneal over complementary areas. This duplex attracts miRNAs (light blue) and prevents the miRNAs from promoting the degradation of PTEN mRNA from the normal PTEN gene. (E–G) The α form recruits two chromatin modification enzymes, DNMT3A and EZH2, to condense the histones (purple spheres) around the PTEN gene, which excludes RNA polymerase (RNP II) and prevents transcription.
Figure labels: PTEN; PTEN mRNA; PTENpg1 sense; PTENpg1 antisense; asRNA α; asRNA β; DNMT3A; EZH2; RNP II; Chr 9; Chr 10.

Using Antisense RNA

In the laboratory, antisense RNA can be made by using two different methods (Fig. 5.7). The easiest method is to chemically synthesize oligonucleotides that are complementary to the target gene. The oligonucleotides are then injected or transformed into the target cell (see later discussion). Alternatively, the gene of interest can be cloned in the opposite orientation so that transcription gives antisense RNA. The vector carrying the anti-gene is then transformed into the target organism.

Full-length antisense RNA can be transcribed from a vector that has been inserted into the cell. See Chapters 15 and 16 for more details on inserting foreign DNA into plant and animal cells. First, the target gene is cloned in reverse orientation so that antisense RNA is produced instead of sense mRNA (see Fig. 5.7B). This method is believed to inactivate the cellular target mRNA by forming a heteroduplex of sense/antisense RNA. Heteroduplex formation relies on both RNAs first unfolding.
If either RNA has a very stable secondary or tertiary structure, then the construct may not work inside the cell. The advantage of internal synthesis of antisense RNA is that the antisense expression can be controlled. If the antisense gene is cloned behind an inducible promoter, then the antisense RNA is not made until the gene is induced by specific signals or conditions. This capability may be useful to allow organ-specific expression of an antisense gene. Another advantage is that the antisense RNA may be continuously expressed internally over a long-term period. This avoids the inconvenience and expense of constant administration of external antisense oligonucleotides.

slide 148:

FIGURE 5.7 Making Antisense RNA in the Laboratory
(A) Antisense oligonucleotides. Small oligonucleotides are synthesized chemically and injected into a cell to block mRNA translation. (B) Antisense genes. Genes are cloned in inverted orientation so that the sense strand is used as the template for transcription. This yields antisense RNA that anneals to the normal mRNA, preventing its expression.
Figure labels: cellular mRNA; synthesized oligonucleotide; chromosome; antisense gene in vector; antisense mRNA; sense mRNA; anneal.
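The base-pairing rule behind Figure 5.7 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the example mRNA are hypothetical, pairing is strict Watson-Crick, and RNA folding is ignored. It derives an antisense DNA oligonucleotide for a chosen 7-base window (the heteroduplex length the text gives for RNase H recognition) and locates where that oligonucleotide would pair on the mRNA.

```python
# Watson-Crick pairing tables (helper names are hypothetical, for illustration only).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
RNA_TO_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(rna_5to3: str, as_dna: bool = True) -> str:
    """Return the antisense strand (written 5' to 3') of an RNA written 5' to 3'."""
    table = RNA_TO_DNA_COMPLEMENT if as_dna else RNA_TO_RNA_COMPLEMENT
    # Complement every base, reading backwards so the result is also 5' to 3'.
    return "".join(table[base] for base in reversed(rna_5to3.upper()))


def binding_site(mrna_5to3: str, oligo_dna_5to3: str) -> int:
    """Index in the mRNA where a DNA oligo pairs along its whole length, or -1."""
    # The paired stretch of mRNA is the reverse complement of the oligo, as RNA.
    oligo_as_rna = oligo_dna_5to3.upper().replace("T", "U")
    site = antisense(oligo_as_rna, as_dna=False)
    return mrna_5to3.upper().find(site)


# Made-up mRNA fragment; the 7-base window starting at index 3 is the target.
mrna = "AUGGCAUCGAAACCGUUA"
oligo = antisense(mrna[3:10])          # antisense DNA oligonucleotide
print(oligo, binding_site(mrna, oligo))
```

Passing `as_dna=False` gives the antisense RNA that an inverted gene would produce instead of a synthetic DNA oligonucleotide.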

slide 149:

FIGURE 5.8 Modifications to Oligonucleotides
Replacing the nonbridging oxygen with sulfur (upper left) increases oligonucleotide resistance to nuclease degradation. Adding an O-alkyl group to the 2′-OH on the ribose (upper right) makes the oligonucleotide resistant to nuclease degradation and also to RNase H. Morpholino-antisense oligonucleotides and peptide nucleic acids are two more substantial changes in the standard oligonucleotide structure (lower left and lower right). Both are resistant to RNase H degradation. The RNase H-resistant oligonucleotides are used to target splice junctions or ribosome binding sites in order to prevent translation of their target mRNA.
Figure panels: normal phosphodiester; phosphorothioate; RNA 2′-O-methyl; morpholino; peptide nucleic acids.

In practice, shorter, chemically synthesized antisense oligonucleotides are more often used. In fact, they are traditionally made of DNA rather than RNA for two reasons: DNA is more stable in the laboratory, and DNA synthesis is an established and automated procedure. Inside the cell, DNA oligonucleotides are still very susceptible to degradation by endonucleases; therefore, various chemical modifications are added to increase stability. The most common modification is to replace one of the nonbridging oxygens in the phosphate groups with sulfur (Fig. 5.8) to make a phosphorothioate oligonucleotide. This makes the phosphorus a chiral center; one diastereomer is resistant to nuclease degradation, but the

slide 150:

other is still sensitive, leaving about half of the antisense molecules functional inside the cell. This modification does not affect the solubility of the oligonucleotides or their susceptibility to RNase H degradation. These types of antisense oligonucleotides have been developed to inhibit cancers such as melanoma and some lung cancers. The most common side effect with phosphorothioate oligonucleotides is nonspecific interactions, especially with proteins that interact with sulfur-containing molecules.

Two other modifications have fewer nonspecific interactions than phosphorothioate oligonucleotides. Adding an O-alkyl group to the 2′-OH of the ribose makes the oligonucleotide resistant to DNase and RNase H degradation (see Fig. 5.8). Inserting an amine into the ribose ring, thus changing the five-carbon ribose into a morpholino ring, creates morpholino-antisense oligonucleotides (see Fig. 5.8). In addition to the morpholino ring, a second amine replaces the nonbridging oxygen to create a phosphorodiamidate linkage. This amine neutralizes the charged phosphodiester of typical oligonucleotides. The loss of charge affects their uptake into cells, but alternative methods have been developed to get these antisense molecules into the cells (see later discussion). Both types of modified oligonucleotides are resistant to RNase H. Therefore, they do not promote degradation of an mRNA:DNA hybrid target. Consequently, their use is restricted to blocking splicing sites in the pre-mRNA transcript or blocking ribosome binding sites.

The most different modified oligonucleotides are peptide nucleic acids (PNAs), which have the standard nucleic acid bases attached to a polypeptide backbone normally found in proteins, rather than a sugar-phosphate backbone (see Fig. 5.8). The polypeptide backbone has been modified so that the RNA bases are spaced at the same distance as in the typical oligonucleotide.
The spacing is critical to function because the bases of a PNA must match the bases in the target RNA. This molecule is also uncharged and works through non-RNase H-dependent mechanisms, as with morpholino antisense oligonucleotides. Antisense PNA has been developed to inhibit translation of the HIV viral transcript gag-pol and to block translation of two cancer genes: Ha-ras and bcl-2.

As noted earlier, antisense oligonucleotides that are resistant to RNase H must be made to target splice sites and/or ribosome binding sites in order to block the target mRNA. In some cases, these sequences are not well characterized in the target mRNA, so modified antisense oligonucleotides become useless. Making mixed or chimeric antisense oligonucleotides can restore the targeting of RNase H to the mRNA while allowing the use of modified structures to prevent degradation (see later discussion). In these chimeric antisense oligonucleotides, the core has a short (∼7 base-pair) span of phosphorothioate linkages, which are RNase H sensitive, flanked on each side by sequences consisting of one of the RNase H-resistant modifications (Fig. 5.9). The flanking regions contain 2′-O-methyl groups, morpholino structures, or even PNA. These chimeric molecules can target any accessible regions of the mRNA, not just splice sites or ribosome binding sites.

A target gene can be cloned in the inverse direction to create an antisense gene that can be carried on a vector. Alternatively, shorter antisense oligonucleotides can be made artificially. The antisense RNA and endogenous mRNA will bind, thus preventing translation of target mRNA into protein. Various structural modifications are incorporated to stabilize artificially made antisense RNA.

Delivery of Antisense Therapies

Getting antisense oligonucleotides into cells requires special techniques because they do not cross cell membranes easily enough on their own to be effective.
Moreover, targeting the antisense oligonucleotide to the correct intracellular location poses a further obstacle. Although the natural uptake of oligonucleotides occurs by an unknown mechanism, the process is active and depends on temperature, oligonucleotide concentration, and cell type.

slide 151:

Because oligonucleotides are highly charged, they cannot cross lipid membranes and are probably taken up by endocytosis. This, however, leaves the oligonucleotide trapped inside the uptake vesicle rather than free in the cytoplasm. Escape from these vesicles is slow and poorly understood. It has been suggested that oligonucleotides may also enter via membrane-bound receptors, but this suggestion is controversial.

A common method to deliver oligonucleotides to cells is to use liposomes (Fig. 5.10A). Liposomes are small vesicles made of bilayers of phospholipids and cholesterol. Whether the liposome is neutral or positively charged depends on the type of phospholipid used to manufacture it. The oligonucleotides ride on the exterior if the liposome is positively charged or reside in the aqueous interior if neutral. Positively charged liposomes are drawn to the cell surface because it is negatively charged, and the entire liposome, oligonucleotides and all, is engulfed by endocytosis. Some liposomes contain "helper" molecules that make the endosomal membrane unstable and release the liposome directly into the cytoplasm. Other delivery "vehicles" are cationic polymers, which include poly-L-lysine and polyethylenimine. They operate via electrostatic interactions as discussed earlier, but they are toxic when taken into the cell; therefore, they are not used very often.

When the endosomal pathway is used for uptake, as with liposomes, there is a good chance that the antisense oligonucleotide will be degraded or not released to the cytoplasm. To alleviate this problem, antisense oligonucleotides may be attached to basic peptides (see Fig. 5.10B). They include the Tat protein of HIV-1, the N-terminal segment of the HA2 subunit of influenza virus agglutinin protein, and Antennapedia peptide from Drosophila, which normally acts as a transcription factor. These peptides are able to enter the cell nucleus.
When they are attached, the antisense oligonucleotides are taken directly into the nucleus.

Other methods to get oligonucleotides into the cells require chemically or manually disrupting the membrane. Membrane pores can be generated by streptolysin O permeabilization (see Fig. 5.10C) or electroporation (see Chapter 3). Streptolysin O is a toxin from Streptococci bacteria that aggregates after binding to cholesterol in the membrane, forming a pore. The oligonucleotide passes through the pore and enters the cytoplasm directly. Antisense oligonucleotides can also be microinjected directly into each cell, but this method cannot be used for treating patients and is useful only for small-scale experiments on cultured cells (Fig. 5.11A). Another mechanical method is called scrape-loading (see Fig. 5.11B). Here, adherent cultured cells are gently scraped off the dish while the oligonucleotide bathes the cells. Removal of the cells probably creates small openings that allow the oligonucleotides to enter the cytoplasm.

FIGURE 5.9 Chimeric Oligonucleotides with RNase H-Sensitive Cores
Chimeric oligonucleotides are made using different chemistries. The core region maintains RNase H sensitivity, whereas the outer regions are RNase H resistant. When the oligonucleotide hybridizes to target molecules in the cell, RNase H will digest the hybrid of oligonucleotide plus mRNA only where the central domain forms a heteroduplex. RNase H will not digest any nonspecific complexes between the chimeric oligonucleotide ends and the wrong mRNA.
Figure labels (5′ to 3′): morpholino backbone (RNase H resistant); phosphorothioate backbone (RNase H sensitive); morpholino backbone (RNase H resistant).
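The chimeric design in Figure 5.9 can be pictured as a per-position assignment of backbone chemistry. The Python sketch below is a simplified illustration, not a design tool: the function names, the centered placement of the core, and the example sequence are assumptions made here. Only the roughly 7-base phosphorothioate core and the RNase H-resistant flanks come from the text.

```python
def chimeric_layout(oligo: str, core_len: int = 7, flank_chem: str = "2'-O-methyl"):
    """Pair each base of an antisense oligonucleotide with a backbone chemistry.

    The central `core_len` positions get phosphorothioate linkages (RNase H
    sensitive); every other position gets `flank_chem` (RNase H resistant).
    Centering the core is an assumption of this sketch.
    """
    if core_len > len(oligo):
        raise ValueError("core cannot be longer than the oligonucleotide")
    start = (len(oligo) - core_len) // 2
    end = start + core_len
    return [
        (base, "phosphorothioate" if start <= i < end else flank_chem)
        for i, base in enumerate(oligo)
    ]


def rnase_h_window(layout) -> str:
    """Bases that can direct RNase H cleavage once paired with the target mRNA."""
    return "".join(base for base, chem in layout if chem == "phosphorothioate")


# Made-up 15-base oligonucleotide with morpholino flanks.
layout = chimeric_layout("TTCGATGCCATAGCA", flank_chem="morpholino")
for base, chem in layout:
    print(base, chem)
print("RNase H-sensitive core:", rnase_h_window(layout))
```

Swapping `flank_chem` for `"PNA"` or `"2'-O-methyl"` models the other flanking chemistries the text mentions without changing the core.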

slide 152:

FIGURE 5.10 Methods of Antisense Oligonucleotide Uptake by Cells
(A) Liposomes are spherical structures made of lipids and cholesterol. Oligonucleotides are either encapsulated in the central core or ride on the exterior surface of the liposome. The complexes enter the cell via endocytosis and are released into the cytoplasm. (B) Basic peptides are naturally occurring proteins that normally enter the nucleus of target cells. The oligonucleotide can be fused to these peptides and ride into the nucleus with the basic peptide. (C) Streptolysin O, a toxin from Streptococci bacteria, aggregates at the membrane to form a pore-like structure. The oligonucleotides can pass into the cell through the pore.
Figure labels: liposomes; basic peptides; streptolysin O; oligonucleotide; peptide; liposome; nucleus; endosome; nuclear localization; streptolysin O pore; cholesterol.

Antisense oligonucleotides enter the target cell by endocytosis of oligonucleotide-filled liposomes, by riding on basic peptides that normally enter the nucleus, by passing through pores created by streptolysin O, by microinjection, or by mechanical shearing.

RNA Interference Uses Antisense RNA to Silence Gene Expression

RNA interference (RNAi) is a pathway for gene regulation where short double-stranded RNA (dsRNA) segments trigger an enzyme complex to degrade a target mRNA. In essence, the short dsRNA pieces decrease target protein expression by degrading its corresponding mRNA. RNAi was discovered in a variety of different organisms, including plants, fungi, mammals, flies, and worms. Different organisms have variations of the same basic response. Mutations in the enzymes responsible for RNAi affect a wide range of cellular processes. Some affect development of the organism; others affect the ability to fend off viruses, particularly RNA viruses. In still other cases, mutations affecting RNAi

slide 153:

increase transposon movement, suggesting that RNAi may also prevent transposon jumping. All these processes rely on regulating mRNA translation or mRNA degradation.

RNAi occurs in two different stages: the initiation phase and the effector phase. Initiation begins with the formation of the shortened dsRNA. The full-length dsRNA can arise from three main sources. First, externally infecting RNA viruses replicate through a dsRNA intermediate, which can trigger RNAi. One theory suggests that the RNAi mechanism may have evolved to combat these infecting viruses (see Box 5.1). Another source is the organism's own genomic DNA, which contains sequences that code for microRNAs, specific mediators of RNAi (see later discussion). Finally, dsRNA can be produced from aberrant transcription of a genetically engineered gene (Fig. 5.12A). After the cellular enzymes recognize the dsRNA, an endonuclease called Dicer cuts the dsRNA into small fragments about 21 to 23 nucleotides in length, called small-interfering RNAs (siRNAs) (see Fig. 5.12B). Dicer is

FIGURE 5.11 Microinjection and Scrape-Loading
(A) Oligonucleotides can be injected directly into a cell using a very fine needle. Microinjection can be done on individual cells grown in culture. (B) Scrape-loading is a mechanical method of getting oligonucleotides into cultured cells. As the cells are scraped off the bottom of the culture dish, the membranes break open, allowing the oligonucleotides to enter. When the cell membrane reseals, the oligonucleotides are trapped within the cells.
Figure labels: microinjection; scrape loading; needle; cell; nucleus; small hockey stick coated with rubber; dish with cells growing on bottom; add oligonucleotides; scrape cells off dish; cell with broken membrane from scraping.

slide 154:

It is well established that plants use RNAi to protect themselves from viruses. Virus-derived siRNAs are produced when the plant is infected with either DNA or RNA viruses. Plants with mutations in various RNAi components are more susceptible to viral diseases. In plants, the RNAi signal spreads to uninfected regions, thus protecting the neighboring tissues. Finally, some plant viruses have genes/proteins that suppress the RNAi pathway.

In mammals, RNAi plays a lesser role in protection from viruses. When a mammal is infected with a virus, potent immune responses protect the organism from many different infections. Cellular proteins such as toll-like receptors, protein kinase R, and retinoic acid-inducible gene I are activated by virus entry. These proteins activate many different genes, most notably type I interferons and nonspecific RNases. These genes work in unison to fight the infection. Mammals cannot spread the RNAi signal to uninfected tissues as do plants and invertebrates. Thus, RNAi is not the major mechanism for antiviral defense in mammals.

Nonetheless, recent evidence does suggest that RNAi helps to limit virus invasion of mammalian cells. First, some proteins from mammalian-specific viruses target RNAi proteins. For example, NS1 from influenza virus binds to siRNA in vitro and suppresses RNA silencing when expressed in plants. Another viral protein, Tat from HIV, has been shown to inhibit purified Dicer in vitro. The experimental work that uses siRNA and plasmid-encoded shRNA to block viral infections in mammalian cells is the most convincing. Many studies have found that administering siRNA or shRNA to animal models reduces virus replication and protects the organism from lethal infections. So although mammalian cells have other defense mechanisms, activating the RNAi system does protect against viral assaults.
Box 5.1 Does RNAi Protect Mammalian Cells from Viral Infections?

a dsRNA-dependent RNA endonuclease that belongs to the RNase III family. The siRNAs have a two-nucleotide overhang on the 3′ ends, characteristic of RNase III–type enzymes. The 5′ ends are phosphorylated by a kinase associated with Dicer, making the siRNAs competent for the next phase.

In the effector phase, Dicer transfers the siRNA to a ribonucleoprotein complex called the RNA-induced silencing complex (RISC). RISC is activated by the siRNAs and uses an RNA helicase to unwind the double-stranded fragments, making single strands. The antisense single-stranded siRNA is then kept as a guide to find complementary sequences in the cytoplasm. When RISC binds complementary sequences, the Argonaut (AGO) family member associated with the RISC complex cleaves the target mRNA, which is then degraded by exonucleases in the cytosol. This destroys all of the mRNA that is complementary to the siRNA. Both Dicer and the RISC complex are dependent on ATP for energy. The antisense siRNA specifies which mRNA is targeted, ensuring that no nonspecific mRNAs are degraded.

RNAi does not require many molecules of siRNA. In fact, as few as 50 copies of siRNA may destroy the entire cellular content of target mRNA. The ability to target so many mRNA molecules with so few siRNA copies relies on amplification by the enzyme RNA-dependent RNA polymerase (RdRP), which creates dsRNA. RdRP uses the cleaved target mRNA as template to synthesize more dsRNA. Dicer recognizes the new dsRNA and cleaves it into more siRNA, thus amplifying the number of siRNA molecules (Fig. 5.13).

The final aspect of RNAi is its ability to modulate DNA expression by converting copies of the target gene into heterochromatin (Fig. 5.14). The siRNA can direct the heterochromatin-forming enzymes and proteins to the target gene location. Once the open, expressed DNA conformation is converted into heterochromatin, no more mRNA is produced.
Therefore, RNAi can repress gene expression permanently.

RNAi has two phases: The initiation phase forms double-stranded RNA approximately 21 to 23 nucleotides long, called siRNA, and the effector phase makes the double-stranded siRNA into a single-stranded template that searches out complementary mRNAs and destroys them.
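The two phases summarized above can be mimicked with plain string operations. The Python sketch below is a toy model under loud assumptions: the function names and all sequences are invented for illustration, every window is exactly 21 nucleotides, and the 3′ overhangs, 5′ phosphorylation, ATP dependence, and RdRP amplification described in the text are left out.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Antiparallel partner of an RNA strand, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def dicer(sense_strand: str, size: int = 21) -> list:
    """Initiation phase (toy): cut a dsRNA trigger into consecutive windows.

    Returns the antisense (guide) strand of each window. Any remainder
    shorter than `size` is discarded in this sketch.
    """
    guides = []
    for start in range(0, len(sense_strand) - size + 1, size):
        guides.append(reverse_complement(sense_strand[start:start + size]))
    return guides


def risc_targets(guides: list, mrnas: dict) -> list:
    """Effector phase (toy): names of mRNAs fully complementary to any guide."""
    sites = [reverse_complement(guide) for guide in guides]
    return [name for name, seq in mrnas.items()
            if any(site in seq for site in sites)]


# Made-up 42-nucleotide trigger (sense strand of the dsRNA) and two mRNAs.
trigger = "GCAUCGAUCGGAUUACGCAUA" + "CCGGAUAUCGCGAUUAGCUAG"
mrnas = {
    "target": "AAAA" + "GCAUCGAUCGGAUUACGCAUA" + "GGGG",
    "bystander": "AUAUAUAUAUAUAUAUAUAUAUAUAUAU",
}
guides = dicer(trigger)
print(guides)
print(risc_targets(guides, mrnas))
```

Only the mRNA that contains a stretch complementary to a guide strand is reported, which reflects the specificity the text attributes to the antisense siRNA.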

slide 155:

FIGURE 5.12 Cellular Mechanism of RNAi
(A) Double-stranded RNA triggers RNA interference. dsRNA is produced by RNA viruses during infections, microRNA encoded by the genome, or overexpression of transgenes. (B) RNA interference degrades all the RNA that is complementary to segments of double-stranded RNA. First, Dicer recognizes dsRNA and cuts it into pieces of 21–23 nucleotides. A kinase phosphorylates the 5′ end of each piece. Next, RISC unwinds the siRNAs and uses one strand to search out complementary mRNA, which is degraded by associated enzymes.
Panel A, sources of dsRNA: RNA viruses have dsRNA intermediates during replication; microRNAs are transcribed from genes in the genome; misfolded mRNAs (from a transgene) have double-stranded RNA regions.
Panel B, RNA interference: Dicer cleaves dsRNA into siRNAs; RISC unwinds siRNAs; RISC binds to complementary message and degrades it.

slide 156:

FIGURE 5.13 Amplification of RNAi
After RISC-associated enzymes cleave the complementary mRNA in a cell, another enzyme, RNA-dependent RNA polymerase, binds to some of these fragments. RdRP synthesizes complementary strands, making more double-stranded RNA. Dicer recognizes these fragments and creates more siRNA.
Figure labels: RISC cleaves target mRNA; RNA-dependent RNA polymerase (RdRP) binds to RNA fragment; more dsRNA is synthesized; Dicer makes more siRNAs; activates more RISC to degrade target mRNA.

RNAi IN PLANTS AND FUNGI

RNAi was actually first observed in plants, where it was named post-transcriptional gene silencing (PTGS). This phenomenon was first noted when some early transgenic experiments in plants gave strange results. When an extra copy of a gene was inserted to increase protein production, both the inserted gene (i.e., the transgene) and the resident gene were silenced. The result was a plant that made less of the target protein rather than more. For example, in 1990, researchers inserted a gene to make petunia flowers a darker purple. Instead, the plant made white flowers. Both the transgene and the endogenous gene were suppressed, leaving the flower without any pigment. A similar phenomenon was seen in the fungus Neurospora, where it was called quelling. After the discovery of RNAi in Caenorhabditis elegans, it was recognized that RNAi, PTGS, and quelling all operate via the same mechanism.

These processes all act after the stage of transcription. Thus, plenty of mRNA was produced from the silenced transgenes, at least initially. After some time, the mRNA for the transgene was found in two fragments, suggesting an endonuclease cleaved it in two. Later, the target mRNA was found in smaller and smaller fragments, suggesting that exonucleases were digesting the large mRNA segments. Finally, the genes were converted into heterochromatin, and transcription was shut down.
How does an extra copy of a gene induce a system that is triggered by dsRNA? One theory is that overproduction of certain mRNAs triggers RdRP to make dsRNA from the excess. This dsRNA activates Dicer to create siRNAs that quench mRNA both from the transgene and from any closely related endogenous gene. Alternatively, when certain transgenes are expressed, some regions of the mRNA may fold back on themselves to form hairpins. These double-stranded segments may also activate Dicer. Genetic analysis of the model plant Arabidopsis has shown that the RdRP encoded by the SDE1 gene is necessary for transgene silencing but is not needed for antiviral RNAi. In the latter case, the virus RNA polymerase would make dsRNA, and the plant RdRP enzyme would therefore not be necessary. This favors the first model for transgene-triggered silencing.

The most interesting aspect of PTGS is the ability of silencing to propagate from one part of the plant to the next. Plants can be grafted; that is, a leaf or stem can be attached to a different plant. If the grafted piece has a transgene silenced by PTGS, this will also silence the corresponding endogenous gene. The effect of RNAi then travels through the vascular system of the plant and affects regions without the transgene. RNAi in C. elegans also has the ability to spread, not only from tissue to tissue but also from parent to progeny. In contrast, mammals lack the ability to spread the RNAi signal.

The ability of RNAi to spread may not rely solely on siRNA movement. In plants, the potyviruses produce an inhibitor of RNAi called helper component proteinase (Hc-Pro). This protein blocks the accumulation of siRNA. Despite this, the RNAi signal still spreads to other parts of the plant and triggers methylation of DNA, thus turning it into heterochromatin. Other viral genes that inhibit different steps of the RNAi process will, it is hoped, illuminate the mechanism of spreading.

slide 157:

Other terms used to describe variants of RNAi are transcriptional gene silencing, cosuppression, and virus-induced silencing. Transcriptional gene silencing refers to the silencing of gene expression by converting the gene into heterochromatin. Cosuppression is an early name for PTGS. Virus-induced silencing occurs when the viral genome has a double-stranded RNA intermediate, which triggers Dicer and RISC. A comprehensive term, GENE impedance (GENEi), has been proposed to encompass all these phenomena but is rarely used.

Early experiments in making transgenic plants identified a phenomenon now called RNAi. RNAi is also known as post-transcriptional gene silencing (PTGS), quelling, transcriptional gene silencing, cosuppression, and virus-induced silencing.

FIGURE 5.14 Heterochromatin Formation by RNAi
The RISC complex containing single-stranded siRNA can also recognize and bind to complementary DNA sequences. When RISC associates with a repetitive DNA element, various histone-modifying enzymes and silencing complexes are activated to turn that region of DNA into heterochromatin, thus silencing the region from any further expression.
Figure labels: RdRP synthesizes second strand; Dicer makes siRNA; RISC binds repetitive DNA; histone-modifying enzyme; silencing protein; silencing complex; histone methyl transferase; form heterochromatin; repetitive DNA; histones; CH3.

slide 158:

FIGURE 5.15 Pathway of miRNA Inactivation of Gene Translation in Drosophila
Two cleavage events create miRNAs. First, the gene for the miRNA is transcribed into an RNA that folds into a stem loop, called a pri-miRNA. Drosha cleaves pri-miRNA in the nucleus to create a pre-miRNA, which is then competent to exit the nucleus. In the cytoplasm, Dicer cuts the ends of the pre-miRNA to form a mature miRNA. In some organisms, the miRNA does not have perfectly matched sequences and therefore has regions that bulge due to the mismatches. The RISC complex creates the single-stranded template and searches the cytoplasm for any matching sequences. When a match is found, the Argonaut component of RISC cuts the target mRNA, marking it for degradation.
Figure labels: nucleus; gene for miRNA; transcription; Drosha; pri-miRNA; pre-miRNA; nuclear pore; nuclear membrane; cytoplasm; Dicer; miRNA; RISC; Argonaut; imperfect match; 80S ribosome; translational repression.

MicroRNAs Modulate Gene Expression

The development from embryo to adult of the worm C. elegans requires RNAi to turn off genes at appropriate times. In this case, RNAi is not triggered by intrusion of external sequences such as transgenes or viruses. During development, small noncoding RNA molecules known as microRNAs (miRNAs) are transcribed from the worm's own genome. These miRNAs regulate gene expression by blocking the translation of target mRNA. MicroRNAs, first identified in C. elegans, are now known to be present in plants and animals, including humans.

RNAi induced by miRNAs is similar to the mechanism described previously. The mRNA targets are identified by antisense; that is, the miRNA has sequences that are complementary to part of the target mRNA. Some miRNAs bind to the target mRNA and block the initiation of translation. In other cases, the miRNA binds to the 3′ UTR of the mRNA. MicroRNAs are transcribed as longer precursor molecules (pri-microRNAs) of approximately 70 nucleotides in length.
In Drosophila pri-microRNAs are transcribed as polycistronic messages which are then cleaved by an endonuclease called Drosha. The cleaved products are called pre-miRNAs. Pre-miRNAs exit the nucleus. In the cytoplasm Dicer recognizes the stem-loop and cleaves the loop structure. The RISC complex then separates the two strands. The miRNAs found in animals such as C. elegans can tolerate a few mismatched base pairs within the binding domain. In animals the antisense miRNA strand blocks translation of the target mRNA Fig. 5.15 which is not degraded. In contrast in plants microRNA must have perfect matches and relies on RISC-mediated recognition and cleavage to degrade the target mRNA. Most microRNA molecules target multiple mRNAs and they rarely inactivate any mRNA totally. They serve to coordinate and modulate regulation of multiple genes and conse- quently often play a role in development. A newly discovered class of RNA known as cir- cular RNA circRNA may counteract the effect of miRNA. The circRNA molecules are large around 1500 nucleotides and have multiple binding sites over 70 in some cases for the corresponding miRNA. The circRNA acts as a molecular sponge that absorbs the miRNA and prevents it from acting in its target mRNAs. Applications of RNAi for Studying Gene Expression By inhibiting translation RNAi allows the elimination of a particular protein from an organism without the need for genetic modifcation. RNAi thus provides a powerful tool to study the roles that particular proteins play. The application of RNAi to studying C. elegans is especially well understood. Three different methods are used to get dsRNA into C. elegans and stimulate RNAi to block expression of a target gene Fig. 5.16. The little MicroRNAs miRNAs modulate expression of various genes during development in many organisms. 
They are frst translated as pri-miRNAs from the organism’s own genome processed into 21 to 23 nucleotide pieces and then RISC separates the strands and searches the cytoplasm for mRNAs with complementary sequences.
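The antisense recognition step described for miRNAs can be illustrated with a short script. This is only a minimal sketch under stated assumptions: the function names, the 22-nucleotide miRNA, and the toy mRNA are hypothetical, pairing is restricted to Watson-Crick base pairs (G:U wobble pairs and bulges are ignored), and the `max_mismatches` parameter is a crude stand-in for the observation that animal miRNAs tolerate a few mismatches whereas plant miRNAs require a perfect match.

```python
# Watson-Crick pairing partners for RNA bases (wobble pairs are ignored here).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def find_target_sites(mirna, mrna, max_mismatches=0):
    """List (position, mismatches) for mRNA windows that can pair with the miRNA.

    The window that pairs perfectly with the miRNA is its reverse complement.
    Each window of the mRNA is compared with that ideal site, and windows with
    at most `max_mismatches` differences are reported.
    """
    site = reverse_complement_rna(mirna)
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical 22-nt miRNA and a toy mRNA carrying one perfectly matched site.
TOY_MIRNA = "AUGCCGUAAGCUUGACCAGUAC"
TOY_MRNA = "GGGAAA" + reverse_complement_rna(TOY_MIRNA) + "CCCUUU"

print(find_target_sites(TOY_MIRNA, TOY_MRNA))
```

Calling `find_target_sites` with `max_mismatches=0` mimics the strict matching described for plants, while a small positive value mimics the more tolerant matching described for animals. Real target prediction also weighs features such as site accessibility, which this sketch does not attempt.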

slide 159:

worm has an uncanny ability to take up exogenous DNA or RNA. Worms can be fed Escherichia coli bacteria expressing the dsRNA of interest. When the worm digests the bacteria, the dsRNA is taken up into the intestinal cells and triggers RNAi. Dicer then cleaves the dsRNA into small siRNAs that activate RISC to block the target mRNA. Another method to deliver dsRNA is simply bathing the worms in a solution of dsRNA. The exogenous dsRNA is absorbed into the worm, where it activates RNAi. A third method to induce RNAi is to inject worm eggs with dsRNA. The worm develops with the dsRNA inside, and RNAi is activated in all the cells. Bathing worms in dsRNA or feeding them dsRNA-expressing bacteria gives incomplete delivery (some cells are not penetrated), yet because the signal can spread from cell to cell, the method works.

Surprisingly, the RNAi effect can pass from parent to offspring. The progeny of a worm with a silenced gene will silence the same gene. Current research suggests that epigenetic influences such as histone modification play a major role in passing the RNAi gene silencing from parent to offspring.

Delivering dsRNA to Drosophila is not quite as easy. The dsRNA must be microinjected directly into a developing Drosophila egg. The dsRNA enters the cells as the embryo develops. This method knocks out the protein of interest through all stages of development. Obviously, if the protein is essential for development, activating RNAi too early can stop development and kill the fly. Luckily, Drosophila has a feature not found in C. elegans. Fly cells can be cultured in vitro in nutrient medium, and dsRNA can be transfected into the cultured cells. Therefore, if RNAi kills the embryo, the corresponding protein can still be examined in cultured cells.

FIGURE 5.16 Delivering dsRNA to C. elegans. (A) C. elegans can absorb dsRNA expressed in bacteria that they eat. (B) C. elegans can absorb dsRNA by swimming in a solution containing dsRNA. (C) Injecting dsRNA into an egg will trigger gene silencing in the developing worm.
[Figure labels: chromosome; dsRNA vector; dsRNA; bacteria; C. elegans; C. elegans eggs; gonad; needle.]

slide 160:

RNAi FOR STUDYING MAMMALIAN GENES

An important application for RNAi is testing the individual roles of human proteins, which can reveal new targets for curing diseases or can identify the causative agent for a disease. Until recently, using RNAi in mammalian systems was not possible. Applying dsRNA to cultured mammalian cells or whole mice induces a potent antiviral response. Interferon is produced, which triggers the cells to degrade all RNA transcripts and shut down protein synthesis. Thus, the methods used in C. elegans and Drosophila kill mammalian cells.

Recognizing that small-interfering RNA (siRNA) triggers RNAi was the key to its application in mammals. Instead of using long dsRNA as in C. elegans and Drosophila, exposing mammalian cells to dsRNA shorter than 30 nucleotides activated the mammalian counterparts of Dicer and RISC. This in turn abolished expression of the target mRNA. Such short dsRNA segments thus act like endogenously produced siRNA.

As described earlier, siRNAs made by Dicer are short double-stranded RNAs about 21 to 23 base pairs in length. In addition, the siRNAs have a two-base 3′ overhang that is more stable when it consists of two uracils. To study a particular target mRNA in mammalian cells, chemically synthesized siRNAs with these characteristics are designed to have the complementary sequence to the target. Like antisense oligonucleotides, these may have modifications to make them more stable, such as methyl groups added to the 2′-OH of the ribose. The most difficult aspect of in vitro siRNA construction is determining an effective sequence. The sequence on the target mRNA must be accessible to the siRNA, which may be challenging because of RNA secondary structure. Many candidate siRNAs are designed, and each is tested for its ability to activate RNAi. These siRNAs are delivered to mammalian cells in much the same way as antisense oligonucleotides, by methods including transfection, liposomes, and microinjection.
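The siRNA design rules above (a short duplex complementary to the target, with two-base 3′ overhangs of uracil) can be sketched in a few lines. The layout used here, a 19-nucleotide paired core plus a two-nucleotide overhang on each strand to give 21-nucleotide strands, is an assumption chosen for illustration, as are the function names and the toy mRNA. The sketch does not model target accessibility, which the text identifies as the hard part of design.

```python
# Watson-Crick pairing partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def design_sirna(target_mrna, start, core_len=19, overhang="UU"):
    """Build the two strands of an siRNA duplex aimed at one mRNA window.

    Both strands are returned 5'->3'. The sense strand copies the target
    window; the antisense (guide) strand is its reverse complement. Each
    strand carries the same 3' overhang (two uracils by default).
    """
    if start < 0:
        raise ValueError("start must not be negative")
    core = target_mrna[start:start + core_len]
    if len(core) != core_len:
        raise ValueError("target window falls outside the mRNA")
    sense = core + overhang
    antisense = reverse_complement_rna(core) + overhang
    return sense, antisense


# Hypothetical target mRNA fragment, used only to exercise the function.
toy_mrna = "AUGGCUAGCUAGGCUAAUCGGAUCCGAUUAC"
sense_strand, antisense_strand = design_sirna(toy_mrna, start=3)
print("sense:    ", sense_strand)
print("antisense:", antisense_strand)
```

In practice, several windows along the target would be generated this way and each candidate tested experimentally, as the text describes.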
C. elegans can take up dsRNA pieces that activate RNAi by ingesting transgenic bacteria that are expressing dsRNA, by bathing in a solution of the pure dsRNA, or by having the dsRNA injected into the eggs. Drosophila is also used as a model organism to study protein function with RNAi, by microinjecting dsRNA into Drosophila eggs or transfecting cultured cell lines.

Rather than chemically synthesizing siRNA, the target mRNA can be mixed with purified Dicer to cleave the mRNA into siRNA pieces (Fig. 5.17). Purified Dicer generates multiple siRNAs, as it would in vivo. To supply the target mRNA, scientists amplify the chosen target gene with PCR, using PCR primers that also contain a promoter sequence for T7 RNA polymerase. The dsDNA is then transcribed into RNA by T7 RNA polymerase. The RNA strands are allowed to anneal spontaneously, and the resulting dsRNA is then mixed with purified Dicer, which digests it into multiple different siRNAs. These are then transfected into mammalian cells.

FIGURE 5.17 In Vitro Treatment with Dicer Generates siRNAs. The key to making siRNAs in vitro is cloning the target gene so that both the sense and antisense strands are transcribed into RNA. The two strands anneal spontaneously, and when purified Dicer is added, small siRNAs are produced.
[Figure labels: target gene (gene to be suppressed by RNAi); PCR amplify with primers containing phage promoter sequences; dsDNA flanked by phage promoters; in vitro transcription with phage polymerase makes double-stranded RNA; add purified Dicer; siRNAs; add to mammalian cells.]

Mammalian cells can also be induced to activate RNAi by expressing short hairpin RNAs (shRNAs) that mimic the structure of microRNAs. A gene for the shRNA is constructed in a vector. The shRNA can be transcribed as two complementary strands by two different promoters facing in opposite directions, or simply made as one transcript with complementary ends interrupted by a sequence that forms a loop (Fig. 5.18). In both constructs the complementary

slide 161:

sequences come together to form a double-stranded RNA region. The promoter for mammalian RNA polymerase III is most commonly used to express shRNA in vivo. This enzyme normally transcribes small noncoding RNAs. RNA polymerase III starts transcription at a specific sequence and stops when it encounters four to five consecutive thymines. In addition, RNA polymerase III does not activate the enzymes that add a cap and a poly(A) tail to the transcript. Thus, polymerase III does precisely what is necessary to create an shRNA that mimics those found in eukaryotic cells.

FIGURE 5.18 Design of shRNA Expression Vectors to Activate RNAi. Vectors can be designed to express different shRNA molecules. This retrovirus-based vector has two complementary sequences, about 20 nucleotides in length, that form a stem separated by a loop region. When the vector is transformed into a cell, the shRNA is transcribed and activates gene silencing.
[Figure labels: promoter; 20-nt sequence complementary to target mRNA; loop; reverse complement of target sequence; 5′ LTR; 3′ LTR; antibiotic resistance for mammalian cells; antibiotic resistance for bacteria; bacterial origin; transcription; shRNA.]

Rather than designing the shRNA constructs from scratch, another strategy uses pre-existing microRNA. First, the stem portion is replaced with sequences that match the target mRNA. Because of the change in sequence, the newly designed microRNA will trigger RNAi for the target mRNA rather than for its endogenous target.

Vectors that express shRNA have some advantages over adding siRNA. When siRNA is added to mammalian cultures, the effect is temporary. When the siRNA is gone, the effect ends. An shRNA vector has a more sustained effect, as it continues to produce shRNA for a considerable time. In addition, expression of the shRNA may be controlled using promoters that are inducible or tissue-specific. In a clinical setting, the ability to deliver the siRNA only to those tissues that need it is crucial, and using a vector system can accomplish this task.
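The single-transcript shRNA layout described above (a 20-nucleotide stem sequence, a loop, the reverse complement of the stem, and a run of thymines that stops RNA polymerase III) can be assembled programmatically. The following is a minimal sketch rather than a validated design tool: the function names are hypothetical, the loop sequence is an assumed placeholder and not one taken from the text or Figure 5.18, and the check for internal runs of four thymines simply applies the termination rule stated in the text.

```python
# Watson-Crick pairing partners for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))


def build_shrna_insert(target, loop="TTCAAGAGA", terminator="TTTTT"):
    """Assemble the coding-strand DNA for a single-transcript shRNA.

    Layout: 20-nt target sequence + loop + reverse complement + T run.
    The loop default is an assumed placeholder, not a value from the text.
    """
    target = target.upper()
    if len(target) != 20 or set(target) - set("ACGT"):
        raise ValueError("target must be 20 nt of A, C, G, T")
    hairpin = target + loop + reverse_complement_dna(target)
    # Four or more consecutive thymines inside the hairpin could end
    # polymerase III transcription before the hairpin is complete.
    if "TTTT" in hairpin:
        raise ValueError("hairpin contains a run of four or more thymines")
    return hairpin + terminator


# Hypothetical 20-nt target sequence, used only to exercise the function.
toy_target = "GCTAGCTAGGCTAATCGGAT"
print(build_shrna_insert(toy_target))
```

The insert is written as the non-template strand, so the transcript has the same sequence with uracil in place of thymine. Cloning details such as restriction-site overhangs depend on the vector and are deliberately left out.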

slide 162:

RNAi can be triggered in mammalian cells by using chemically synthesized siRNA, by expressing an shRNA that directs degradation of the target mRNA, or by modifying existing miRNA to recognize a different target mRNA.

FUNCTIONAL SCREENING WITH RNAi LIBRARIES

RNAi can be used to screen the entire genome by constructing an RNAi library. In these vector-borne gene libraries, each gene sequence is expressed as dsRNA rather than DNA. Thus, each library clone targets the corresponding gene for suppression by RNA interference. In theory, such libraries could be used to screen each protein in an organism for its functional role. At present, there are still technical problems with reliability and reproducibility. Nonetheless, some gene products have been characterized, for example, those that affect resistance to several viruses.

An RNAi library containing most of the predicted genes in the genome of C. elegans has been constructed in E. coli. These dsRNA-expressing E. coli are then fed to C. elegans, thus triggering RNAi and removing one protein from the organism. This allows the effect of suppressing expression of each single protein in the worm to be assayed. Using this library, more than 900 genes have been identified whose suppression kills the embryo or causes gross developmental defects. Most of these genes had no known function before. Similarly, an RNAi library of about 7200 genes (about 91% of the predicted genes in the genome) has been constructed in Drosophila. With Drosophila, multiple investigators have examined the same regulatory pathways by RNAi screening. Unfortunately, agreement between different studies is very low; often merely 20–30% overlap is seen.

To construct an RNAi library, scientists isolate the genes as cDNA clones (Fig. 5.19) and then amplify them using PCR (see Chapter 4). The PCR primers are specific for each gene and are designed to add two different promoters at the ends of each gene. For example, the 3′ end would have a T7 polymerase promoter and the 5′ end would have a T3 polymerase promoter. The PCR products are then cloned into a suitable vector. When the vector is present in a cell with both T7 and T3 RNA polymerase, both an antisense and a sense transcript are transcribed for each of the clones. The two strands spontaneously anneal to form dsRNA, which then activates RNAi.

For mammals, RNAi libraries must be constructed with siRNAs or shRNAs rather than full-length dsRNA because full-length dsRNA is toxic. This kind of RNAi library is analyzed either with multiwell plates or live cell microarrays (Fig. 5.20). For multiwell plates, the library is transformed into a large number of cells that are then inoculated into the wells of the plate. The number of cells in each well is adjusted so that only one siRNA is found in each well. Another method for screening an siRNA or shRNA library is to spot each clone onto slides in a microarray. Live cells are then added and take up the siRNA or shRNA at these locations. In either case, the cells are analyzed for any noticeable symptoms.

RNAi libraries are designed to express dsRNA for each gene in the genome. Each library clone targets one protein by promoting degradation of the corresponding mRNA. RNAi libraries are used to identify the role of unknown proteins. Mammalian cells can be screened for defects induced by an RNAi library clone using a live cell microarray or by using a multiwell-plate assay.

FIGURE 5.19 Constructing an RNAi Library. Each clone in the library must have two different promoters flanking the coding region. When the clone is transcribed, both an antisense and a sense transcript will be produced, and the two strands will come together to form double-stranded RNA.
[Figure labels: cDNAs flanked by T3 and T7 promoters; in vitro transcription with T3 and T7 RNA polymerase; dsRNA pieces for each clone in library.]
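The primer strategy for library construction, in which gene-specific PCR primers carry a T3 promoter at the 5′ end of the gene and a T7 promoter at the 3′ end, can be sketched as follows. The promoter strings are the commonly quoted minimal T7 and T3 promoter sequences and should be checked against the polymerase supplier's documentation before use. The function names, the 20-nucleotide annealing length, and the toy cDNA are assumptions made for illustration.

```python
# Commonly quoted minimal phage promoter sequences (verify before real use).
T7_PROMOTER = "TAATACGACTCACTATAGGG"
T3_PROMOTER = "AATTAACCCTCACTAAAGGG"

# Watson-Crick pairing partners for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))


def promoter_tailed_primers(cdna, anneal_len=20):
    """Return (forward, reverse) primers that add T3 and T7 promoters.

    The forward primer anneals at the 5' end of the cDNA and carries the T3
    promoter; the reverse primer anneals at the 3' end and carries the T7
    promoter, so the two polymerases transcribe opposite strands.
    """
    cdna = cdna.upper()
    if len(cdna) < anneal_len:
        raise ValueError("cDNA is shorter than the annealing region")
    forward = T3_PROMOTER + cdna[:anneal_len]
    reverse = T7_PROMOTER + reverse_complement_dna(cdna[-anneal_len:])
    return forward, reverse


# Hypothetical 50-nt cDNA fragment, used only to exercise the function.
toy_cdna = "ATGGCTAGCTAGGCTAATCGGATCCGATTACGGATCCTTAGCAAGCTTGA"
fwd, rev = promoter_tailed_primers(toy_cdna)
print("forward:", fwd)
print("reverse:", rev)
```

A real design would also balance primer melting temperatures and screen for primer dimers, which this sketch leaves out.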

slide 163:

NONCODING RNAs TAKE PART IN RNA PROCESSING

The role of RNA in RNA processing and editing has been known for some time. Noncoding RNAs take part in processing other RNA molecules, such as those involved in protein synthesis: tRNA, rRNA, and mRNA. In all three of these RNA molecules, the structure of the mature RNA is different from the original RNA transcript, and other noncoding RNAs are essential for these modifications. For example, the enzyme ribonuclease P (RNase P) trims the 5′ end of tRNA precursors. RNase P is unusual in being a complex of RNA and protein, but it is the RNA component that cleaves tRNA. Because of its catalytic activity, RNase P is considered a ribozyme.

Another example of a small noncoding RNA working as a key component of RNA processing occurs when the primary transcript is processed into mRNA in the nucleus of eukaryotes. A noncoding RNA called snRNA (small nuclear RNA) removes introns from the primary transcript. snRNA and protein work together in a complex called the spliceosome, which identifies the intron/exon borders and removes the nonprotein-coding introns. The key proteins and snRNA assemble at the splice sites that border the exon and intron sequences. During the splicing reaction, three essential sequences are required. The 5′ splice site is recognized by U1 snRNA, the 3′ splice site is recognized by U2AF protein, and the branch site within the intron is recognized by U2 snRNA. U1 and U2 RNA pull the two exons close together, cleave the RNA at the two splice sites, and ligate the exons into one piece. The intron is released as a lariat structure (Fig. 5.21). Researchers have found that defects in these components may cause diseases like lupus and spinal muscular atrophy.

In addition, noncoding RNAs are involved in alternate splicing. Not all exons within a gene are always used, and therefore the final mRNA may not include all of the available exons. Alternate splicing occurs in almost 95% of human genes, a fact only recently discovered by next-generation sequencing technology. The role of snRNA and other noncoding RNAs in these processes is still being investigated. Alternative splicing of mRNA is widely used to control gene expression in eukaryotes. Mutations affecting this process may cause some forms of inherited disease. Recently, antisense oligonucleotides have been used to correct the genetic errors in alternative splicing that cause beta-thalassemia and some cancers (see Box 5.2).

FIGURE 5.20 Multiwell Plate Assays and Live Cell Microarrays. (A) Small-interfering RNA and short-hairpin RNA libraries can be transfected into mammalian cells. Each cell can then be assessed for altered phenotypes such as loss of adherence, mitotic arrest, or changed cell shape. Clones that cause interesting phenotypes are isolated and sequenced to identify the protein that was suppressed. (B) Rather than transforming cells, the siRNA or shRNA can be spotted onto microscope slides. As cells grow and divide on the slide, they take up RNAs. This initiates RNAi. The cells are then screened for phenotype changes over the spot.
[Figure labels: expression vectors with siRNA or shRNA; (A) transfect/transform into cells, grow in small wells, look for physical phenotype; (B) add cells and media, assay phenotype of cell over each spot; nucleus.]

slide 164:

Another example of noncoding RNA function occurs in the nucleolus. Here, rRNAs are transcribed by RNA polymerase I, but the transcript cannot be used as is. Instead, rRNA ribonucleotides are modified. Noncoding RNAs known as snoRNAs (small nucleolar RNAs) are responsible for this RNA modification. The human genome contains 400 snoRNA species, but only half of these have a known target. Most snoRNAs are encoded within introns and are transcribed with their parent gene. When the splicing machinery removes the intron, the snoRNA sequence is spliced from the intron, debranched, and then processed to the correct size by an endonuclease. These snoRNAs guide key proteins to modify rRNA and snRNA. The H/ACA-box (SNORA) family of snoRNAs directs pseudouridylation, the replacement of uridine with pseudouridine, whereas SNORD family members direct methylation of the ribose 2′-O position.

In addition to their role in rRNA maturation, snoRNAs are processed into smaller pieces called snoRNA-derived RNAs. These RNAs have an unknown function, although some studies suggest they play a role in the regulation of cellular growth and therefore affect some cancers.

The three major RNAs involved in translation are all processed. In all cases, processing requires the participation of other noncoding RNA species.

FIGURE 5.21 Operation of the Spliceosome. (A) The spliceosome consists of several ribonucleoproteins (U1 to U6), also known as “snurps,” which take part in splicing. These assemble at the splice sites at the intron/exon boundaries. (B) The binding of U1 at the 5′ splice site and of U2 at the branch site is shown in greater detail.
[Figure labels: exon 1; intron; exon 2; boundary; 5′ splice site; branch site; 3′ splice site; snurps U1, U2, U4, U5, U6; U2AF; U1 binds 5′ splice site; U2 binds branch site.]

Box 5.2 Antisense Oligonucleotides Cure Splicing Defects

Some diseases are caused by aberrant patterns of mRNA splicing. For example, beta-thalassemia is a blood disorder in which red blood cells cannot carry enough oxygen because of defective hemoglobin. Some cases of the disorder arise from aberrant splicing of the beta-globin pre-mRNA. These cases are due to mutations that generate extra splice sites between exons two and three in the beta-globin gene. This results in the inclusion of part of the intron sequence in the mRNA and final protein. Antisense morpholino-oligonucleotides have been designed to target and suppress the extra mutant splice sites. The antisense oligonucleotide corrects the splicing pattern and restores the correct protein in red blood cells taken from patients with this form of beta-thalassemia (Fig. A, part A).

The Bcl-x gene in humans produces two different proteins by alternate splicing. Bcl-xL, the longer protein, includes a segment of coding material between exons 1 and 2. Bcl-xS, the shorter protein, lacks this segment. Bcl-xL protein promotes cell growth by blocking apoptosis, whereas Bcl-xS opposes this and causes cells to die via apoptosis (see Chapter 20). In some cancers, the long form is overexpressed. Thus, devising a method to inhibit Bcl-xL expression and enhance Bcl-xS expression could induce these cancer cells to

slide 165:

Box 5.2 Antisense Oligonucleotides Cure Splicing Defects (cont’d)

undergo apoptosis and die. Antisense 2′-O-methyl-oligonucleotides complementary to the splice site in the pre-mRNA prevent the longer form from being made. The Bcl-xS protein then counteracts the cancerous growth by promoting apoptosis (Fig. A, part B). Those cancer cells that were “fixed” by antisense therapy also became more sensitive to chemotherapeutic agents because of the restoration of apoptosis.

FIGURE A Antisense Oligonucleotides Correct Splicing Errors. (A) Beta-thalassemia is a blood disorder in which extra splice sites are found between exon 2 and exon 3 (middle). Antisense oligonucleotides that block the extra splicing junctions restore the original structure of the gene during splicing (right). (B) The Bcl-x gene makes two different proteins through alternate splicing. Bcl-xS is made when a normal cell becomes defective and promotes cell death via apoptosis (left). Some cancerous cells do not produce Bcl-xS. Instead, the longer form, Bcl-xL, is produced via alternate splicing and protects the cancerous cell from apoptosis (middle). To resensitize the cancerous cell to apoptosis, antisense oligonucleotides that block the splicing junction for Bcl-xL restore Bcl-xS protein production and ultimately sensitivity to apoptosis (right).
[Figure labels: (A) β-thalassemia: normal, mutant, and antisense oligonucleotide panels showing exon 2, exon 3, and splice sites; mutations add two splice sites to the intron; mutations only partially remove intron sequence. (B) The ratio of Bcl-xL and Bcl-xS controls whether the cancer cell lives or dies: normal defective cell (Bcl-xS induces cell death); cancerous cell (Bcl-xL prevents defective cell from dying); antisense oligonucleotide (Bcl-xS induces cancer cell to die); exon 1; exon 2.]

RIBOSWITCHES ARE CONTROLLED BY EFFECTOR MOLECULES

Examples of RNAs that regulate gene expression have proliferated over the last decade. Such regulation may occur at a variety of levels, including both transcription and translation. Riboswitches provide a fascinating example in which the regulatory RNA is actually part of the RNA molecule whose expression is being regulated. Riboswitches are segments of RNA located on mRNA molecules close to the 5′ end. The riboswitch domain alternates between two different RNA secondary structures that determine whether or not the mRNA is expressed. Unlike most regulatory RNAs, riboswitches bind small effector molecules such as amino acids or other nutrients. Binding of the effector molecule triggers a conformational change in the riboswitch. In most cases, effector binding terminates mRNA transcription prematurely or prevents mRNA translation.

slide 166:

The vast majority of riboswitches are found in bacteria, mostly in genes for biosynthetic enzymes. For example, in E. coli the thiamine riboswitch is controlled by thiamine pyrophosphate, a vitamin. When the vitamin is abundant, it binds to the THI box (i.e., a riboswitch) close to the 5′ end of the mRNA, and transcription of the mRNA is aborted. When the vitamin is absent, the mRNA is transcribed and translated to give enzymes that make more thiamine. Similar control occurs for riboflavin biosynthesis in Bacillus subtilis. The vitamin itself binds to the riboswitch domain of the mRNA and controls whether or not the mRNA is expressed.

Riboswitches normally work by changing the stem and loop structure of the mRNA transcript. In attenuation riboswitches, the effector molecule binds to the mRNA as it is being transcribed. If the effector binds, changes in structure create a terminator loop, which causes the transcriptional machinery to fall off prematurely. The incomplete mRNA is degraded. When the effector is in short supply, the mRNA is transcribed to completion (Fig. 5.22A). Alternatively, some riboswitches work through translational inhibition. Here, the riboswitch controls whether or not protein translation occurs by sequestering the Shine–Dalgarno sequence. When the effector molecule is abundant, its binding changes the stem-loop structure so that the Shine–Dalgarno sequence is not accessible to the ribosomes (see Fig. 5.22B).

FIGURE 5.22 Riboswitches Control mRNA Expression. Riboswitches alternate between two stem and loop structures depending on the presence or absence of the signal metabolite. (A) In the attenuation mechanism, the presence of the signal metabolite results in formation of the terminator structure, and transcription is aborted. (B) In the translational inhibition mechanism, the presence of the metabolite results in sequestration of the Shine–Dalgarno sequence, which prevents translation of the mRNA.
[Figure labels: (A) attenuation mechanism: mRNA 5′ end; regions 1 to 4; signal metabolite binds; base pairing; terminator; premature termination; continued transcription. (B) translational inhibition: Shine–Dalgarno sequence; start codon; coding sequence; signal metabolite binds; translation proceeds; translation prevented.]

A novel riboswitch was identified in Bacillus subtilis that controls the expression of a biosynthetic gene, glmS, for a cell wall component (Fig. 5.23). As for other riboswitches, a product of the biosynthetic pathway controls whether or not the mRNA is expressed. However, instead of hiding the Shine–Dalgarno sequence or creating a terminator loop, the change in RNA secondary structure creates a self-cleaving ribozyme. The glmS gene of B. subtilis encodes the enzyme glutamine fructose 6-phosphate amidotransferase, which converts fructose 6-phosphate plus glutamine into glucosamine 6-phosphate (GlcN6P). This is further converted into a component of the cell wall, UDP-GlcNAc. When this is abundant, it binds to glmS mRNA, altering the secondary structure. The new structure functions as a ribozyme that cuts the mRNA, preventing any further translation.

Although the vast majority of riboswitches bind small molecules, a few are exceptional. The T box riboswitch binds tRNA, a rather large molecule. This controls expression of amino acid metabolism genes depending on whether the tRNA is charged or uncharged. Other riboswitches do not use an effector molecule at all. Instead, they respond directly to thermal stress. For example, the rpoH gene of E. coli is involved in the heat shock response. In addition to other forms of

slide 167:

FIGURE 5.23 Ribozyme Riboswitch of B. subtilis glmS Gene. (A) Cell wall synthesis occurs during growth conditions. When the cell is growing, levels of UDP-GlcNAc are low and are quickly converted into cell wall components. (B) If the cell is not growing, UDP-GlcNAc is not incorporated into the cell wall and accumulates. The excess UDP-GlcNAc binds to a riboswitch on the glmS gene. Once bound, it activates the self-cleaving ribozyme to degrade the mRNA and halts the production of glutamine fructose 6-phosphate amidotransferase.
[Figure labels: (A) low UDP-GlcNAc concentrations: glmS gene; mRNA; riboswitch; glutamine fructose-6-P amidotransferase; fructose 6-P + glutamine; GlcN6P; UDP-GlcNAc; cell wall synthesis. (B) high UDP-GlcNAc concentrations: glmS gene; mRNA; riboswitch; UDP-GlcNAc; degraded mRNA.]

slide 168:

regulation, the mRNA contains a thermosensor domain, which controls the amount of translation. At normal temperatures, the thermosensor has a stem-loop structure that prevents translation. When the temperature increases, the stem-loop structure falls apart and translation can occur.

The only type of riboswitch found so far in eukaryotes is the thiamine riboswitch. Despite being homologous in sequence to the thiamine riboswitches of bacteria, it operates by a different mechanism and controls alternative splicing of the mRNA precursor. Fungi and plants possess this thiamine riboswitch, but it is absent in animals.

Riboswitches are mRNA sequences that bind directly to effector molecules to control the expression of the mRNA into protein.

RNA CATALYZES ENZYME REACTIONS

Ribozymes are RNA molecules that bind to specific targets and catalyze enzymatic reactions. Some ribozymes consist of RNA associated with proteins, but the RNA catalyzes the actual reaction. Some ribozymes work like allosteric enzymes; that is, binding an effector molecule alters the ribozyme structure so that the ribozyme becomes competent to cleave its substrate. Ribozymes are naturally occurring, but biotechnology research has started to exploit their unique characteristics for medical and industrial applications. There are eight known classes of ribozymes at present, with the distinct possibility that more will be identified.

Ribozymes are classified as large or small. The large ones range from several hundred nucleotides to 3000 nucleotides in length. Large ribozymes were the first identified, and the first of these were the group I introns of Tetrahymena. These intron sequences, found in pre-mRNA, are able to self-splice. They do not use splicing factors such as the snRNA-protein particles known as snurps. Group I introns are common in fungal and plant mitochondria, in nuclear rRNA genes, in chloroplast DNA, in viruses, and in the tRNA genes of chloroplasts and eubacteria. The important aspect of intron self-cleavage is the RNA structure. RNA is a linear polymer, but because of base pairing between different regions, RNA also has a secondary structure. Multiple stem-loop structures fold into different configurations, leading to a three-dimensional structure much like a protein. The example shown is the second group I intron within the orf142 gene of bacteriophage Twort, which infects Staphylococcus aureus (Fig. 5.24). The three-dimensional structure of group I introns brings the two exons close together, facilitating removal of the intron between them (Fig. 5.25).

Group II introns are also self-splicing sequences found within genes. They are less common than group I introns, being found only in fungal and plant mitochondria, in chloroplasts of plants and Euglena, in algae, and in eubacteria. These introns do not self-splice in vitro under physiological conditions; they require conditions far from physiological to work. The three-dimensional structure of the intron creates these abnormal conditions in vivo, affecting the microenvironment to create the correct ionic concentrations. The 3D structure of group II introns brings the two exons together, facilitating intron removal and exon ligation (Fig. 5.26). Interestingly, the structure of these introns is similar to the structure of snRNA, suggesting that group II introns may be evolutionary precursors to the snRNAs and the spliceosome.

Another naturally occurring large ribozyme is RNase P from bacteria. This is an RNA-protein complex, but the RNA component is the catalytic entity. RNase P cleaves the 5′ end of pre-tRNA molecules to remove the leader sequence. RNase P can act on multiple substrates, unlike the group I and group II introns, which naturally act only on themselves.

Ribozymes are naturally occurring RNAs that can facilitate an enzymatic reaction. Group I and group II introns are two types of ribozymes that can cleave the phosphate backbone, release themselves from the mRNA molecule, and rejoin the ends without using any protein enzymes.

slide 169:

SMALL NATURALLY OCCURRING RIBOZYMES

In contrast to the large ribozymes, small ribozymes are only about 30 to 80 nucleotides long. Small ribozymes include the hammerhead and hairpin ribozymes, the hepatitis delta virus ribozyme, the Varkud satellite ribozyme, and the twister ribozyme. They are often found in viroids, virusoids, and satellite viruses, which are subviral agents. Viroids are self-replicating pathogens of plants that are merely naked single strands of RNA with no protein coat. Satellite viruses are small RNA molecules that require a helper virus for either replication or capsid formation. Their genomes may encode proteins. Virusoids are even less functional and are often considered a subtype of satellite virus. Virusoids are single strands of circular RNA that encode no proteins. They rely on helper viruses for both replication and a protein coat.

FIGURE 5.24 Structure of the Twort Ribozyme
(A) Primary and secondary structure of the wild-type intron. The P1-P2 domain is highlighted in red, the P3-P7 region is green, the P4-P6 domain is blue, the P9-P9.1 region is purple, the P7.1-P7.2 subdomain is yellow, and the product oligonucleotide is cyan. Dashed lines indicate key tertiary structure contacts. Nucleotides in italics (P5a region) are disordered in the crystal. IGS, internal guide sequence. (B) Ribbon diagram colored as in (A). The backbone ribbon is drawn through the phosphate positions in the backbone. From Golden BL, Kim H, Chase E (2004). Crystal structure of a phage Twort group I ribozyme-product complex. Nat Struct Mol Biol 12, 82–89. Reprinted by permission from Macmillan Publishers Ltd., copyright 2005.
Figure 5.24, panels A and B (image not reproduced). Labeled elements include the internal guide sequence (IGS), the paired regions P2 through P12, and the joining regions J4/5, J6/6a, and J8/7.

slide 170:

The hammerhead ribozyme is a small catalytic RNA that can catalyze a self-cleavage reaction. Hammerhead ribozymes take part in the replication of some viroids and satellite RNAs (Fig. 5.27). These both exist as single-stranded RNA genomes that form rod-like structures that are resistant to cellular ribonucleases. During viroid replication, the positive RNA strand is replicated by the host cell RNA polymerase, resulting in a long concatemer of negative-strand genomes. RNA polymerase then uses this as a template to make a positive strand. The long RNA is cut into individual unit genomes by the hammerhead motif. Hammerhead ribozymes first cleave the ribose phosphate backbone of RNA and then ligate the linear unit genomes into circular genomes.

Another small ribozyme is the hairpin ribozyme (Fig. 5.28). It is found in pathogenic plant satellite viruses such as tobacco ring spot virus. The hairpin ribozyme from tobacco ring spot virus was originally called the "paperclip" ribozyme, a rather better description of the structure. In vivo, hairpin ribozymes cleave the linear concatemers of ssRNA genomes, much like hammerhead ribozymes, and then ligate the linear segments into circular genomes.

Two other small ribozymes are the Varkud satellite (VS) ribozyme from Neurospora and the hepatitis delta virus (HDV) ribozyme of humans. Both use similar reaction mechanisms for self-cleavage and ligation. The VS ribozyme helps replicate the small Varkud plasmid found within the mitochondria of Neurospora. HDV is a viroid-like satellite virus of hepatitis B virus. Hepatitis B infects the liver and can cause liver scarring and liver failure. In patients with hepatitis B, the presence of HDV amplifies the symptoms, causing a very severe and often fatal form of the disease. HDV has a single-stranded RNA genome that occurs in both positive (genomic) and negative (antigenomic) forms in liver cells. Both forms have regions that fold into an active ribozyme, which catalyzes RNA cleavage and ligation. Unlike plant viroids, HDV also has an open reading frame encoding the delta antigen protein. Delta antigen plus coat proteins from hepatitis B virus are needed to package HDV into small spherical particles. These particles can be spread from cell to cell and from person to person via bodily fluids such as saliva and semen.

FIGURE 5.25 Mechanism of Group I Self-Splicing Reaction
(A) Group I self-splicing. The secondary structure of group I introns shows multiple hairpins that mediate the cleavage reaction. In step 1, a free guanosine (red, G-OH) mediates the cleavage of the exon 1–intron boundary. In step 2, the free end of exon 1 cleaves and ligates to exon 2. (B) Mechanism of group I ribozyme cleavage. First, the exon sequences are brought near the catalytic core via the internal guide sequence (IGS). Exon 1 has an important uridine (U) that forms a UG base pair with the IGS (dotted line). The other end of the ribozyme has a binding site for the nucleophile, a free guanosine (red), which initiates intron removal by attacking the end of exon 1 with the 3′-OH of its ribose. The free 3′-OH on the exon then reacts with the splice site on exon 2. The intron is spliced out and the two exons are united (not shown). Although it appears that this reaction requires energy, the actual number of bonds stays the same and no net energy is needed.

slide 171:

ENGINEERING RIBOZYMES FOR PRACTICAL APPLICATIONS

Ribozymes can be engineered to suppress the expression of genes, such as those that promote cancer or those from pathogenic viruses. The ribozyme catalytic core is linked to a sequence that recognizes the target gene mRNA, usually an antisense probe, thus combining the two strategies (Fig. 5.29). The target region must be free of secondary structure and have no protein-binding sites. The antisense sequence is split so that the 5′ half is in front of the ribozyme catalytic core and the 3′ half is behind. When this chimeric ribozyme is mixed with target mRNA, the antisense regions base-pair with the target and the ribozyme cleaves the target mRNA. The two halves of the target mRNA are further degraded by other enzymes. The engineered ribozyme can attack many target mRNA molecules because it is an enzyme, not merely an inhibitor. Using a ribozyme is much better than using antisense

FIGURE 5.27 Life Cycle of Viroids
Viroids are single-stranded circular RNA genomes with no protein coat, but they have the ability to self-replicate. First, the plus-stranded genome is converted into a concatemer of negative-stranded genomes by rolling-circle replication. RNA polymerase converts the negative-stranded genomes into plus-stranded genomes, which are separated and ligated into circular genomes. The hammerhead ribozyme embedded within the viroid genome catalyzes the separation and ligation into a circle. Steps labeled in the diagram: self-cleavage by the hammerhead ribozyme motif, followed by ligation.

FIGURE 5.26 Group II Intron Splicing Reactions
The secondary and tertiary structure of group II introns brings the two exons together, but the reaction mechanism does not require an external nucleophile. Instead, the 2′-OH of an internal conserved adenosine acts as a nucleophile, attacking the 5′ splice site and cleaving the phosphate backbone. The 3′-OH left at the 5′ splice site (the free end of exon 1) then attacks the 3′ splice site, resulting in two ligated exons and a free intron. The intron forms a lariat structure.

Small naturally occurring ribozymes are found in small subviral agents such as viroids and satellite viruses. They have common motifs that catalyze RNA cleavage.
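The split-arm design described in the engineering section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the target is a plain RNA string, a candidate cleavage site is taken to be any U followed by A, C, or U (a simplified form of the commonly cited hammerhead site rule, with "H" as used in the Fig. 5.30 caption), and the function names `reverse_complement_rna` and `candidate_sites` are hypothetical. The sketch ignores secondary structure and protein-binding sites, both of which the text says must also be ruled out, and it does not include any catalytic core sequence.

```python
def reverse_complement_rna(seq: str) -> str:
    """Return the antisense (reverse-complement) strand of an RNA sequence."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def candidate_sites(mrna: str, arm: int = 7):
    """Yield (cut_index, five_prime_arm, three_prime_arm) for each candidate site.

    Assumption: cleavage happens just after a 'UH' dinucleotide, H = A, C or U.
    The antisense sequence is split at the cut: the ribozyme's 5' arm pairs
    with the target downstream of the cut, and its 3' arm pairs with the
    target upstream of the cut. The catalytic core would sit between the arms.
    """
    for i in range(len(mrna) - 1):
        if mrna[i] == "U" and mrna[i + 1] in "ACU":
            cut = i + 2  # position immediately after the H nucleotide
            if cut - arm < 0 or cut + arm > len(mrna):
                continue  # not enough flanking sequence for both arms
            upstream = mrna[cut - arm:cut]
            downstream = mrna[cut:cut + arm]
            yield (cut,
                   reverse_complement_rna(downstream),
                   reverse_complement_rna(upstream))


if __name__ == "__main__":
    # Hypothetical target fragment, used only to exercise the function.
    for site in candidate_sites("AAGGCCGUCGGAACC", arm=4):
        print(site)
```

Keeping the two arms as separate return values mirrors the modular idea in the text: the arms can be changed to retarget the construct while the catalytic core stays the same.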

slide 172:

FIGURE 5.28 Secondary Structure of Hairpin Ribozyme
(A) The minus strand of the tobacco ring spot virus genome is shown with the cleavage site indicated by the red arrow. (B) The three-dimensional representation of the hairpin ribozyme. Reproduced from Salter et al. (2006).

FIGURE 5.29 Antisense Construct with Ribozyme Inactivates Target mRNA
The chimeric antisense ribozyme not only has the ability to bind to a specific target mRNA but also cleaves the target mRNA. A traditional antisense oligonucleotide must rely on recruiting RNase H to digest the target mRNA. However, RNase H also degrades the antisense oligonucleotide, which therefore cannot be reused. In contrast, antisense ribozymes are not cleaved or degraded; therefore they can continue to catalyze degradation of target sequences. Panel labels: antisense ribozyme (the ribozyme can degrade more mRNA); antisense oligonucleotide (blocks translation or recruits RNase H; the oligonucleotide is not reusable).

slide 173:

inhibition alone because antisense constructs are degraded along with the target mRNA (see Fig. 5.29).

The catalytic core used for the constructs just described is usually from either hairpin or hammerhead ribozymes (Fig. 5.30). Altering the ribozymes from group I introns, from group II introns, or from RNase P is difficult because of their large size and complex structure. Small ribozymes have a natural division between their catalytic centers and the sequences that specify their target. Thus it is easy to change the target that the ribozyme is intended to cleave. Hammerhead ribozymes have a lower propensity for ligation and are often used preferentially over hairpin ribozymes.

FIGURE 5.30 Nuclease-Resistant Ribozyme Bound to a Target mRNA
The ribozyme consists of 2′-O-methyl nucleotides and phosphorothioate linkages. At the 3′ end, a 3′-3′ deoxyabasic sugar (iB) is added. All three modifications prevent nuclease degradation of the ribozyme. The five green nucleotides (rA or rG) form the catalytic core and cut the target mRNA at the cleavage site. The H at the cleavage site represents an A, C, or U. Labeled elements: target mRNA, ribozyme, stem I, stem-loop II, stem III, and the cleavage site.

New substrates for a known ribozyme are found by incubating the pure ribozyme with a large pool of random RNA sequences. Any RNA sequence that binds to the ribozyme is a potential substrate.

RNA SELEX IDENTIFIES NEW BINDING PARTNERS FOR RIBOZYMES

Natural ribozymes normally act on only one specific substrate. One goal of biotechnology is to increase the number of substrates for the known ribozymes. A procedure called RNA SELEX (Systematic Evolution of Ligands by EXponential enrichment) isolates new substrates for existing ribozymes from a large population (about 10^15) of random-sequence RNA oligonucleotides (Fig. 5.31).

First, a mixture of random DNA oligonucleotides is chemically synthesized. These oligonucleotides are converted into double-stranded DNA (dsDNA) using a 5′ primer and Klenow polymerase. The 5′ primer contains the promoter sequence for T7 RNA polymerase, which is added to the pool of dsDNA to make multiple single-stranded RNA (ssRNA) copies of each oligonucleotide. The ribozyme of interest is then mixed with this large pool of ssRNA oligonucleotides, and those RNA molecules that bind to the ribozyme are isolated. The ribozyme is immobilized on beads to facilitate isolation. Any nonspecifically bound RNAs are washed away and the specific ones are isolated. Each repeated cycle of selection removes nonspecifically bound RNAs. After the selection is repeated and the final RNA bound to the ribozyme is purified, the RNA is converted into cDNA using a 3′ primer and reverse transcriptase. Because the actual number of specific binding molecules is low, they are amplified using PCR before sequencing.

The use of SELEX extends beyond ribozymes, and it is used in drug design and delivery. The process can be applied to finding DNA binding substrates for different enzymes. In DNA SELEX, the initial pool of random-sequence oligonucleotides is not converted to RNA. Instead, the oligonucleotides are used directly in substrate binding and selection.

Adding a ribozyme motif, such as the hammerhead or hairpin region from the small ribozymes, can make antisense oligonucleotides more stable because they are not degraded. These constructs cut the target mRNA without the use of RNase H.
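Why repeated SELEX cycles are needed can be seen from a rough enrichment model. The sketch below is an assumption-laden illustration, not a description of any real experiment: it supposes that each cycle retains a fixed fraction of true binders (`keep_binder`) and a smaller fixed fraction of nonspecific RNAs (`keep_nonbinder`), and that PCR amplification restores the pool size without changing its composition. The function name and all numbers are hypothetical.

```python
def binder_fraction(f0: float, keep_binder: float,
                    keep_nonbinder: float, rounds: int) -> list[float]:
    """Return the fraction of true binders in the pool after each cycle.

    f0             -- starting fraction of binders (0 < f0 <= 1)
    keep_binder    -- probability that a binder survives one wash/selection
    keep_nonbinder -- probability that a nonspecific RNA survives one wash
    rounds         -- number of selection cycles
    """
    binders, others = f0, 1.0 - f0
    fractions = [f0]
    for _ in range(rounds):
        binders *= keep_binder      # binders retained on the beads
        others *= keep_nonbinder    # nonspecific carry-over
        fractions.append(binders / (binders + others))
    return fractions


if __name__ == "__main__":
    # Illustrative values only: one binder per 10^12 molecules at the start.
    for cycle, frac in enumerate(binder_fraction(1e-12, 0.5, 0.01, 10)):
        print(f"cycle {cycle}: binder fraction = {frac:.3e}")
```

Under these assumptions the enrichment per cycle is the ratio `keep_binder / keep_nonbinder`, so a single cycle cannot pull a rare binder out of a pool of 10^15 sequences, while several cycles can.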

slide 174:

FIGURE 5.31 RNA SELEX Identifies New Ribozyme Substrates
The key to RNA SELEX is using a very large pool of random RNA sequences. First, DNA oligonucleotides are chemically synthesized to create a large pool of random sequences. These are converted into dsDNA with a primer and Klenow polymerase. The primer adds an RNA polymerase binding site. The dsDNA oligonucleotides are transcribed into RNA with RNA polymerase. This large pool of random RNA is then screened for binding to the ribozyme. Any RNA molecules that bind are kept and the rest are discarded. Those that bind are converted to cDNA and amplified with PCR. The selection process is repeated numerous times to enrich for RNA sequences that bind more tightly to the ribozyme.
Steps labeled in the diagram: pool of 10^15 synthetic oligonucleotides (30-60 nt random sequence flanked by fixed sequences); add a primer to the 3′ end that has an RNA polymerase (T7) promoter; use Klenow polymerase to make the second strand; add T7 polymerase to synthesize RNA; select RNA that binds the target molecule; convert to cDNA and amplify with PCR; continue selection, leaving a small number of the original 10^15 different sequences.

IN VITRO EVOLUTION AND IN VITRO SELECTION OF RIBOZYMES

It is also possible to generate new ribozymes with novel enzymatic capabilities from large pools of random RNA sequences. Using in vitro selection allows new ribozyme reactions to be identified from random nucleotide sequences (Fig. 5.32). For example, a ribozyme that catalyzes the ligation of a particular sequence can be identified. This approach begins by synthesizing a set of random oligonucleotide sequences. However, these represent the pool of potential ribozymes rather than substrates, as seen in RNA SELEX. Each random sequence is flanked by two known sequences. The 5′ end sequence is one substrate for the desired ligation reaction. The 5′ end also has a terminal triphosphate to energize ligation. The 3′ end has a sequence domain that binds a chosen effector molecule, which allows the ligation reaction to be regulated. The second substrate for ligation is mixed with the pool of potential ribozymes and incubated in conditions that favor ligation. If one of the random RNA sequences ligates the second substrate to its 5′ end, the resulting RNA molecule i.e. ribozyme plus

slide 175:

ligation product will run more slowly on an agarose gel. The slower molecules are isolated from the gel. The candidate ribozyme is then converted to DNA with reverse transcriptase. Finally, the DNA is amplified with PCR using primers that match the 5′ and 3′ ends of the original RNA constructs.

In vitro evolution enhances in vitro selection by adding a mutagenesis step after each cycle of selection (Fig. 5.33). This method begins with a pool of random oligonucleotides as before. The pool of random sequences is then mutagenized. The most efficient method is to use error-prone PCR (see Chapter 4) to amplify the initial pool of sequences. The pool becomes larger and its sequences diversify even more. This

FIGURE 5.32 In Vitro Selection of Ribozyme Ligation
The pale pink molecule represents the large pool of random RNA sequences. At the 5′ end there is a substrate sequence with a terminal triphosphate. At the 3′ end, a blue effector molecule is bound to facilitate ligation. The second substrate (purple) is then incubated with the random pool of oligonucleotides. If any of the random sequences catalyze the ligation of the substrates, the resulting species will be larger and may be separated out by gel electrophoresis. The ligated oligonucleotide is isolated from the gel, amplified by PCR, and finally sequenced.
Steps labeled in the diagram: large pool of random RNA sequences (N90, a random region 90 nucleotides long, with a 5′ triphosphate, a ligation site, and a bound effector molecule); add second substrate; select all the RNA oligonucleotides that added the substrate; remove substrate, then convert to dsDNA and amplify, giving a small subset of the original pool.

slide 176:

FIGURE 5.33 In Vitro Evolution of Ribozymes
In vitro evolution tries to find an RNA sequence that works as a ribozyme. In this example, the researcher is looking for a ribozyme that catalyzes the addition of a metal ion (M+) to a porphyrin ring. The pool of random RNA sequences is created and amplified with error-prone PCR to increase the odds of finding one or two sequences that catalyze the reaction. Each successive round of selection and mutation improves any ribozymes that are found.
Steps labeled in the diagram: pool of random RNA sequences; use error-prone PCR to mutate and amplify; add metal ion M+ and porphyrin ring; select the ribozyme that catalyzes the M+ addition to porphyrin (ribozymes that potentially catalyze the reaction).

pool is then selected for the sequence that carries out the desired reaction. For example, artificial ribozymes have been evolved to add metal ions to mesoporphyrin IX (see Fig. 5.33). The mutagenesis and selection steps can be repeated over and over to improve the ribozyme. Once an efficient ribozyme is obtained, the sequence is determined after converting the RNA into cDNA.
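The mutate-then-select loop of in vitro evolution can be mimicked with a toy simulation. The sketch below is purely illustrative and rests on loud assumptions: catalytic activity is replaced by a made-up score (the number of positions matching a hidden motif), error-prone PCR is modeled as independent random substitutions at a fixed rate, and unmutated copies of the selected sequences are carried into the next round. The names `mutate`, `activity`, and `evolve` are hypothetical.

```python
import random

BASES = "ACGU"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Copy a sequence, replacing each base with a random base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else base
                   for base in seq)


def activity(seq: str, motif: str) -> int:
    """Toy stand-in for catalytic activity: positions that match a hidden motif."""
    return sum(a == b for a, b in zip(seq, motif))


def evolve(motif: str, pool_size: int = 200, keep: int = 20,
           rate: float = 0.05, rounds: int = 30, seed: int = 0) -> list[int]:
    """Run selection plus mutagenesis and return the best score seen in each round."""
    rng = random.Random(seed)
    pool = ["".join(rng.choice(BASES) for _ in motif) for _ in range(pool_size)]
    best_per_round = []
    for _ in range(rounds):
        # Selection step: keep only the most active sequences.
        pool.sort(key=lambda s: activity(s, motif), reverse=True)
        survivors = pool[:keep]
        best_per_round.append(activity(survivors[0], motif))
        # Mutagenesis plus amplification: survivors and error-containing copies.
        copies = [mutate(rng.choice(survivors), rate, rng)
                  for _ in range(pool_size - keep)]
        pool = survivors + copies
    return best_per_round


if __name__ == "__main__":
    print(evolve("GGAUCCGAUGCA"))
```

Because the selected sequences are carried over unchanged, the best score can never fall from one round to the next in this model. A real experiment has no such guarantee, which is one reason the stringency of the selection step matters.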

slide 177:

Allosteric Deoxyribozymes Catalyze Specific Reactions

Because RNA may display catalytic properties, researchers investigated whether DNA can do the same. Although no natural DNA enzymes are known, DNA nonetheless can catalyze various reactions in a manner analogous to RNA-based ribozymes. Indeed, in vitro selection has been used to create a variety of artificial deoxyribozymes, or DNAzymes, that catalyze various reactions. Most DNAzymes catalyze reactions involved in processing RNA or DNA because they are easiest for SELEX-type schemes to select. Examples include RNA cleavage, DNA cleavage, DNA depurination, RNA ligation, DNA phosphorylation, and thymine dimer cleavage.

One of the most interesting DNAzymes can split thymine dimers caused by UV radiation of DNA. Different organisms have various mechanisms to deal with these dimers. For example, excision repair removes the damaged strand and replaces it with new DNA. Another mechanism involves photolyase enzymes, which recognize and repair thymine dimers when activated by blue light. To isolate a DNA sequence to perform the photolyase reaction, scientists carried out in vitro selection on a pool of random DNA oligonucleotide sequences. The random sequences were first linked to a substrate that consisted of two DNA oligonucleotides joined via a thymine dimer. If a random DNA oligonucleotide split the thymine dimer after exposure to blue light, then the overall length of the DNA construct would be smaller. The smaller species were isolated by gel electrophoresis. This experiment was successful, and a specific DNAzyme, UV1C, that could catalyze a photolyase reaction was identified (Fig. 5.34).

Artificial ribozymes have been made to carry out nucleophilic attacks at various centers, including phosphoryl, carbonyl, and alkyl halides. There is also an artificial ribozyme that can isomerize a 10-member ring structure. In each of these cases, the initial pool of RNA molecules was selected for the ability to carry out the specific reaction. In both in vitro selection and in vitro evolution, the key to success is the selection step. It must be stringent enough that most of the nonfunctional RNA molecules are eliminated but not so stringent that ribozymes with weak activity are eliminated too early.

A few attempts have been made to construct ribozymes with clinical applications. For example, a hammerhead ribozyme has been created that can inhibit HIV uptake into cells. This ribozyme recognized seven sites within the human CCR5 mRNA. CCR5 is the coreceptor for HIV entry into immune cells but is not essential for humans, and indeed people without this coreceptor are resistant to HIV infection. The ribozyme was targeted against a human gene because HIV is so highly mutable that it might become resistant to any ribozyme. The ribozyme effectively eliminates CCR5 expression in vitro. Another ribozyme, which does target the HIV genome, has also been used to treat HIV-positive patients. The ribozyme is added to CD34+ immune cells (HIV target cells) and then given to patients. However, so far no ribozymes have provided a survival advantage.

In vitro selection can also generate new ribozymes by mixing random sequences that represent potential ribozymes with a specific substrate. Adding a mutagenesis step to the in vitro selection procedure allows the ribozyme to "evolve" into a better enzyme.

FIGURE 5.34 Deoxyribozyme That Repairs Thymine Dimers
A model for the deoxyribozyme UV1C–substrate complex. Light energy is absorbed by the guanine quadruplex. The thymine dimer is thought to lie close to the guanine cluster within the folded deoxyribozyme. This allows electron flow from the excited guanines to the thymine dimer. From Chinnapen DJ, Sen D (2007). Towards elucidation of the mechanism of UV1C, a deoxyribozyme with photolyase activity. J Mol Biol 365, 1326–1336. Reprinted with permission.
Labels in the diagram: incident light (hν, 300 nm), the guanines of the UV1C DNAzyme, the TDP substrate containing the thymine dimer (TT), and the transferred electron (e−).

Deoxyribozymes are DNA sequences that catalyze an enzymatic reaction. All are artificial.

slide 178:

FIGURE 5.35 Designing Allosteric Ribozymes
(A) Modular design of a ribozyme. The ribozyme has three different domains joined together. The substrate domain (light green background) base-pairs with the ribozyme domain (light purple background), and the aptamer domain binds the allosteric effector, ATP in this example. (B) In vitro selection scheme to identify ribozymes that are active only when bound to an effector (i.e., are allosteric). First, all ribozymes that cleave substrate in the absence of an effector are removed. If the substrate is cleaved without the effector, the ribozyme will move faster during electrophoresis. Only the uncleaved ribozyme/substrate band is isolated from the gel. Next, the ribozymes are mixed with an effector molecule. This time, the ribozymes that cleave the substrate are isolated. Repeated cycles of isolation will identify a ribozyme that works only with the effector.
Labels in the diagram: (A) allosteric ribozyme with ATP aptamer domain, ribozyme, and substrate; (B) in vitro selection: large pool of random sequence attached to ribozyme; isolate uncleaved product to collect inactive ribozymes; add effector (cAMP, etc.); select all the ribozymes that cleave when effector is present; isolate cleaved ribozymes.

Engineering Allosteric Riboswitches and Ribozymes

Artificial or modified ribozymes have enormous potential in medicine and biotechnology. Consequently, the ability to control the activity of a ribozyme would be very advantageous. Ribozymes can be combined with riboswitches to achieve control by using the small effector molecule that triggers the riboswitch. To engineer a ribozyme to cleave only in the presence of a certain effector molecule, scientists use a combination of modular design and in vitro selection. Modular design takes various domains from different ribozymes and merges them to create a new molecule. For example, the catalytic core of a particular hammerhead ribozyme can be genetically linked to the binding domain of another, changing the binding specificity of the original ribozyme (Fig. 5.35A).
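The two-step logic of the selection scheme in Fig. 5.35B (discard anything that cleaves without the effector, then keep only what cleaves with it) can be written as a pair of filters. This is a minimal sketch under simplifying assumptions: each construct is reduced to two yes/no properties that a real experiment could only observe through gel mobility, and the `Construct` class, its field names, and the example pool are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    """One member of the ribozyme plus random-domain pool (illustrative only)."""
    name: str
    cleaves_without_effector: bool
    cleaves_with_effector: bool


def select_allosteric(pool: list[Construct]) -> list[Construct]:
    """Apply negative selection, then positive selection."""
    # Negative selection: no effector present; isolate the slower, uncleaved band.
    uncleaved = [c for c in pool if not c.cleaves_without_effector]
    # Positive selection: effector added; isolate the faster, cleaved band.
    return [c for c in uncleaved if c.cleaves_with_effector]


if __name__ == "__main__":
    pool = [
        Construct("always_on", True, True),    # unregulated, removed in step 1
        Construct("dead", False, False),       # inactive, removed in step 2
        Construct("switch", False, True),      # effector-dependent, kept
        Construct("inverse", True, False),     # active only without effector
    ]
    print([c.name for c in select_allosteric(pool)])
```

The order of the two filters matters in the laboratory because the uncleaved band from the first gel is the material carried into the second step.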

slide 179:

Artificial allosteric riboswitches have been selected by combining the ribozyme catalytic core with a pool of many different random sequences (see Fig. 5.35B). Some of the random sequences will have the ability to bind the chosen effector and thus represent a pool of possible riboswitches. Some of the combinations will catalyze self-cleavage or substrate cleavage without regulation, and they must be eliminated. If the ribozyme construct cleaves itself, the products will move faster during electrophoresis. Therefore, the pool of possible riboswitch/ribozymes is electrophoresed, and the slower moving uncleaved RNAs are isolated from the gel. Next, the uncleaved ribozymes are mixed with the chosen effector and incubated under cleavage-promoting conditions. In this positive selection step, any ribozyme that undergoes cleavage in the presence of the effector is isolated. As before, the ribozymes are separated by gel electrophoresis, but this time the cleaved, shorter, and faster molecules are isolated. Cloning and sequencing of the isolated ribozyme constructs determine the sequence of the riboswitch domain. Some effectors that researchers have used to control riboswitches include cyclic GMP, cyclic AMP, and cyclic CMP. Allosteric ribozymes have been artificially created that respond not only to small organic molecules such as cyclic AMP but also to oligonucleotides, proteins, and even metal ions.

Ribozymes can be created with riboswitches. The riboswitch controls the ribozyme so that it is active only when the effector molecule is present.

Summary

Although RNA was once thought of as an intermediary in the transfer of genetic information from DNA to protein, it is now recognized that RNA plays a wide variety of other roles. Indeed, RNA is the most functionally diverse of all the biological macromolecules. RNA helps maintain genomic structure, protects genomes from invading viral DNA, modulates transcription and translation, and can even perform enzymatic functions. The roles of tRNA, rRNA, and mRNA in transcription and translation are well known. The roles of small regulatory RNAs such as snRNA and snoRNA in processing mRNA and rRNA, respectively, are critical to proper cellular function, and precisely how, when, and where they act is still under investigation. Translation is also controlled by riboswitches, which are RNA sequences that alter shape after binding a small effector molecule. They may be used in biotechnology to control the expression of various genetic constructs.

Antisense RNAs modulate gene transcription in many different organisms. In fact, a large fraction of protein-coding genes have antisense genes found either in cis or in trans. These complementary RNA sequences match the mRNA, bind to the sequence, and inhibit protein translation. The double-stranded RNA cannot be translated into protein because ribosomes cannot bind. In addition, double-stranded RNA triggers enzymes to degrade the duplex, which obliterates the target mRNA. Other antisense RNAs suppress transcription by converting the gene to heterochromatin. The application of antisense in the laboratory is a natural evolution of this knowledge. Altering the phosphate-sugar backbone is essential to stabilize artificial antisense oligonucleotides. Efficient uptake of oligonucleotides is a major problem, and several approaches have been used.

In RNA interference (RNAi), double-stranded RNA (dsRNA) activates Dicer to cleave the RNA into segments of 21 to 23 nucleotides, known as small interfering RNA or siRNA. The RISC enzyme complex unwinds the double-stranded siRNAs and uses the single-stranded RNA as a template to find similar sequences. When RISC finds complementary sequences, it cleaves them. This destroys both foreign mRNA and the dsRNA characteristic of RNA virus replication. A similar process is used for developmental gene regulation. In this case, the original dsRNA is transcribed from the genome as pri-miRNA that is then processed into microRNAs. RNAi is now widely used

slide 180:

in biotechnology and has largely replaced the use of other RNA-based approaches. RNAi may be used to eliminate one protein from the cell at a time to investigate its role. This approach provides insight into proteins whose function is still unknown.

Ribozymes are RNA molecules with enzymatic activity. Naturally occurring ribozymes are classified as large or small. The small ribozymes have compact motifs and therefore are useful for designing ribozymes in the laboratory. For example, the hammerhead motif can be linked to an antisense RNA sequence that recognizes mRNA from a disease-causing virus. The antisense segment will bind the target mRNA, and the hammerhead motif will then cut the target. Large pools of random RNA sequences can be created either to find new substrates for an existing ribozyme or to find new sequences that are ribozymes. These can be "evolved" into better ribozymes by adding a mutagenesis step before the selection. In addition to RNA ribozymes, DNA has also been shown to have catalytic power.

End-of-Chapter Questions

1. Which of the following statements about antisense RNA is true?
a. Antisense RNA binds to form double-stranded regions on RNA to either block translation or intron splicing.
b. Antisense RNA is transcribed using the sense strand of DNA as a template.
c. The sequence of antisense RNA is complementary to mRNA.
d. Antisense RNA is made naturally in cells and also artificially in the laboratory.
e. All of the above statements about antisense RNA are true.

2. Which biological function is not controlled by antisense RNA?
a. iron metabolism in bacteria
b. the circadian rhythm of Neurospora
c. replication of prokaryotic genomic DNA
d. replication of ColE1 plasmid
e. developmental control of basic fibroblast growth factor

3. Which of the following is a modification of antisense oligonucleotide structure to increase intracellular stability?
a. insertion of an amine into the ribose ring to create a morpholino structure
b. attachment of nucleic acid bases to a peptide backbone instead of a sugar-phosphate backbone
c. replacement of one of the oxygen atoms in the phosphate group with a sulfur atom to inhibit nuclease degradation in some molecules
d. addition of an O-alkyl group to the 2′-OH of the ribose group to make the molecule resistant to nuclease degradation
e. all of the above

4. How can antisense RNA be expressed within a cell?
a. The target gene can be cloned inversely into a vector and under the control of an inducible promoter.
b. The antisense RNA cannot be expressed within a cell and instead must be delivered via liposomes.
c. Antisense RNA can be expressed within cells, but this is unfavorable because of the high degree of non-specific interactions.
d. No system has been designed to express antisense RNA within a cell.
e. None of the above is correct.

slide 181:

5. Which of the following terms describes gene regulation in which short dsRNA molecules trigger an enzymatic reaction that degrades the mRNA of a target gene?
a. post-transcriptional gene silencing
b. quelling
c. co-suppression
d. RNA interference
e. all of the above

6. Which statement about RNAi is not correct?
a. RNAi was first discovered in plants.
b. RNAi has two phases: initiation and effector.
c. During the initiation phase of RNAi, a protein called Dicer cuts dsRNA into small fragments called siRNAs.
d. Non-specific interactions between the antisense siRNA and mRNA often cause mRNAs to be degraded that should not have been.
e. The RNA-induced silencing complex has both helicase and endonuclease activities.

7. Which of the following is not a method for delivering dsRNA for RNAi into Drosophila or C. elegans?
a. ingestion of transgenic bacteria that express dsRNA
b. injection of dsRNA into eggs
c. bathing in a solution of pure dsRNA
d. injection of dsRNA into cell culture lines

8. How can RNAi be triggered in mammalian cells?
a. transfection of siRNA
b. chemically synthesized siRNA
c. degradation of target mRNA through shRNA creation
d. modification of an existing shRNA to recognize a different mRNA
e. all of the above

9. What information has been obtained through the creation of RNAi libraries?
a. the function of unknown proteins by degrading all of the mRNA for that protein
b. the mechanism by which E. coli delivers dsRNA to C. elegans
c. the mechanism by which heterochromatin formation occurs after some RNAi
d. all of the above
e. none of the above

10. What is a ribozyme?
a. an enzyme that cuts ribosomes
b. an RNA molecule that binds to specific targets and catalyzes reactions
c. an enzyme that catalyzes the degradation of dsRNA
d. an RNA molecule that catalyzes the degradation of ribonucleases
e. none of the above

11. Which of the following is a large ribozyme?
a. hairpin ribozyme
b. hammerhead ribozyme
c. Twort ribozyme
d. hepatitis delta virus
e. Varkud satellite ribozyme

slide 182:

12. What process is used to identify possible ribozyme substrates?
   a. DNA SELEX
   b. DNA BLAST
   c. RISC
   d. RNA SELEX
   e. GENEi

13. What property must a ribozyme possess in order to be used in clinical medicine?
   a. stability and resistance to degradation
   b. no deleterious side effects to the host
   c. expression within a diseased cell only
   d. be able to be delivered to the correct location
   e. all of the above

14. What is a riboswitch?
   a. an mRNA sequence that binds directly to an effector molecule to control the translation of the mRNA into protein
   b. an enzyme that converts ribozymes into deoxyribozymes
   c. the effector molecule responsible for translational control of a particular mRNA
   d. an RNA molecule that switches between being translated into protein or being a ribozyme
   e. none of the above

15. Which of the following is not an example of an effector molecule for riboswitches?
   a. some cyclic mononucleotides
   b. oligonucleotides
   c. metal ions
   d. some proteins
   e. antisense RNAs

16. Which RNA is incorrectly paired with its function?
   a. snRNA – RNA processing
   b. circRNA – transcriptional regulation
   c. piRNA – RNA processing
   d. Xist – chromosomal structure
   e. lncRNA – transcriptional regulation

17. Which of the following helps replicate telomeres?
   a. snoRNA
   b. Xist
   c. TERC
   d. gRNA
   e. circRNA

18. In Drosophila, two non-coding RNAs called roX1 and roX2 are used to _______________.
   a. double the expression of X chromosomal genes in males
   b. inhibit the expression of one X chromosome in females
   c. inactivate X chromosomal gene expression in males
   d. regulate the replication of sex chromosomes in males and females

19. What is the role of the alpha (α) antisense form in PTEN expression?
   a. Anneals over complementary areas to prevent degradation.
   b. Attraction of miRNAs to promote PTEN mRNA degradation.
   (Continued)

slide 183:

19. (continued)
   c. Recruits two chromatin modification enzymes to condense histones surrounding PTEN genes.
   d. Activates transcription of the PTEN gene by modifying chromatin structure.
   e. Activates PTEN mRNA degradation.

20. What role do piRNAs play?
   a. Serve as a template for transposon silencing.
   b. Serve as a guide to mRNA degradation enzymes.
   c. Structural component of some ribozymes.
   d. Antisense RNA involved in RNA processing.
   e. Silences the second X chromosome in human females.

slide 185:

CHAPTER 6
Immune Technology

Introduction
Antibodies, Antigens, and Epitopes
The Great Diversity of Antibodies
Structure and Function of Immunoglobulins
Monoclonal Antibodies for Clinical Use
Humanization of Monoclonal Antibodies
Humanized Antibodies in Clinical Applications
Antibody Engineering
Diabodies and Bispecific Antibody Constructs
ELISA Assay
The ELISA as a Diagnostic Tool
Visualizing Cell Components Using Antibodies
Fluorescence-Activated Cell Sorting
Immune Memory and Vaccination
Creating a Vaccine
Making Vector Vaccines Using Homologous Recombination
Reverse Vaccinology
Identifying New Antigens for Vaccines
DNA Vaccines Bypass the Need to Purify Antigens
Edible Vaccines

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00006-5

slide 186:

INTRODUCTION

The world is full of infectious microorganisms, all looking for a suitable host to infect. Bacteria, viruses, and protozoans are constantly attempting to gain entry into our tissues. If nothing prevented these attempts at invasion, no human could survive. Fortunately, cells of the immune system patrol the organism, protecting the entire body from attack. Any foreign macromolecules that are not recognized as being “self” are regarded as signs of an intrusion and trigger an immune response. In particular, proteins that are exposed on the surfaces of invading microorganisms attract the attention of the immune system. These molecules are called antigens. Some of the immune system molecules that recognize and bind to them are called antibodies (Fig. 6.1).

To be prepared for any possible invasion, the B cells of the adaptive immune system generate billions of different antibodies. Most antibodies are secreted into the lymph, but some remain bound to the cell surface and are called B-cell receptors (BCRs). Eventually, when a foreign antigen appears, a few of the billions of predesigned antibodies will fit the antigen reasonably well (Fig. 6.2). Those B cells that make antibodies that recognize the antigen now divide rapidly and go into mass production. Thus the antigen determines which antibody is amplified and produced. Once a matching antibody has bound invading antigens, the immune system brings other mechanisms into play to destroy the invaders.

Although the antibody that originally recognized the invading pathogen was a good fit for the antigen, there is a stage of refinement during which those antibodies that bound to the invading antigen are modified by mutation to fit the antigen better. In addition, the immune system keeps a record of antibodies that are actually used. If the same invader ever returns, the corresponding antibodies can be rushed into action faster and in greater numbers than before. Vaccines exploit this capacity by stimulating the immune system to store the antibodies that recognize and destroy a pathogenic virus, such as smallpox. Yet the vaccines cause no disease symptoms themselves (see later discussion).

FIGURE 6.1 Foreign Antigens Are Recognized by Antibodies
(A) Antibodies are Y-shaped molecules produced by the immune system in vertebrates. They bind to specific portions of proteins or antigens of any invading pathogen. (B) The variable region mediates the binding of an antigen to the antibody.
Figure labels: antibody binds antigen; constant region; variable region; hinge; antigen-binding site; antigen (foreign molecule).

The immune system keeps a repertoire of B cells that are poised to make antibodies to invading pathogens. When one of these B-cell antibodies is needed, the B cell starts dividing so that many antibodies can be produced and are available to attack the pathogen. Some of these clones are refined by mutation to make a more specific antibody.

ANTIBODIES, ANTIGENS, AND EPITOPES

The term antigen refers to any foreign molecule that provokes a response by the immune system. In practice, most antigens are proteins made by invading bacteria or viruses. In particular, glycoproteins (which carry carbohydrate residues) and lipoproteins (which carry lipid residues) generate strong immune responses; that is, they are highly antigenic. Other macromolecules can also work as antigens. Polysaccharides are often found as surface components of infiltrating germs and may act as antigens. Even DNA can be antigenic under certain

slide 187:

circumstances. Not surprisingly, the antigens exposed on the surface of an alien microorganism will usually be detected first (Fig. 6.3). Later in infection, especially after the cells of some invaders have been disrupted by the immune system, molecules from the interior of the infectious agent may be liberated and also act as antigens.

FIGURE 6.2 Predesigned Antibodies Are Ready for Foreign Antigens
Long before an attack by a pathogen, an army of B cells produces a large repertoire of antibodies (A). When one of the antibodies binds to an antigen (B), that particular B cell starts dividing and expanding (C). The majority of the B cells refine the antibody so that the antigen/antibody complex binds more tightly, and they fight the pathogens (D). A small subset of B cells become long-lived memory cells that wait for another attack by the same pathogen (E).
Panel labels:
(A) Preparation: Gene shuffling generates vast numbers of different antibodies. Each B cell makes one type.
(B) Recognition: A foreign antigen appears and is recognized.
(C) Response: B cells whose antibody fits the antigen divide and manufacture more antibody.
(D) Refinement: Mutation improves antibody binding to antigen.
(E) Memory: Memory B cells remember antigens.

slide 188:

The immune system mediates immunity to various infectious agents through specific, or acquired, immunity. Acquired immunity can be subdivided into humoral immunity and cell-mediated immunity. Humoral immunity is mediated by antibodies in the blood plasma, which are also called immunoglobulins. Cell-mediated immunity is mediated by antigen-specific cells called T lymphocytes, which are divided into TH (T helper) cells and TC (T cytotoxic) cells. Antibodies generally bind to whole proteins, whereas T-cell receptors bind to fragments of protein.

When an antibody binds to a protein, it recognizes a relatively small area on the surface of the protein, such as dimples or projections sticking out from the surface. Such recognition sites are known as epitopes (Fig. 6.4). Because intact proteins are large molecules, they may have several epitopes on their surfaces. Consequently, several different antibodies may be able to bind the same protein.

T cells work in the same manner but recognize only antigens expressed on the surface of other body cells (particularly macrophages, cells infected with a virus, or antibody-making B cells) as opposed to the microorganism itself. T cells recognize these other cells via cell surface receptor proteins called the class I and class II major histocompatibility complexes (class I and class II MHCs). Class I MHCs activate TC cells, and class II MHCs activate TH cells. MHC receptors are encoded by a family of genes that are different for every person. They may be used to distinguish people and must be matched in organ transplantation to prevent rejection. Another name for the MHC receptors is human leukocyte antigens (HLAs).

FIGURE 6.3 Surface Antigens of Microorganisms
The surfaces of bacteria and viruses are coated with glycoproteins and lipoproteins that are recognized by antibodies in the host organism.
Figure labels: bacterium; virus; surface proteins and carbohydrates.

FIGURE 6.4 Antibodies Bind to Epitopes on an Antigen
Antibodies recognize only a small ridge on the surface of a protein. The region of the antigen that binds to the antibody is called an epitope.
Figure labels: antibody; protein antigen; epitope.

Acquired immunity is divided into two branches. Humoral immunity is mediated by antibodies in blood plasma, which are produced by B cells. The second part is cellular immunity, which is mediated by T cells. Antibodies recognize epitopes, or specific regions, of the invading pathogen. T cells recognize cell surface receptors called class I and class II major histocompatibility complexes that are expressed on the surface of body cells that become infected with an invading pathogen.

THE GREAT DIVERSITY OF ANTIBODIES

Since there is an almost infinite variety of possible antigens, a correspondingly vast number of different antibody molecules are needed. The amino acids making up protein molecules can certainly be arranged to give an almost infinite number of different sequences

slide 189:

and therefore of different shapes. However, this leads to a major genetic problem. If a separate gene encoded each antibody, this would require a gigantic number of genes and a vast amount of DNA. Even if the entire mammalian genome was coding DNA, it could encode only a few million antibodies, which is far too few.

Rather than a unique gene for each antibody, the immune system generates a vast array of different protein sequences from a relatively small number of genes by shuffling gene segments in a process called VDJ recombination. Instead of storing complete genes for each antibody, the immune system assembles antibody genes from a collection of shorter DNA segments. Shuffling and joining these partial genes allows the generation of an immense variety of antibodies. In Figure 6.5 this idea is illustrated using three alternative front ends and three rear ends. Combining them in all possible ways gives nine different genes. The immune system is a fascinating example of how massive genetic diversity can be generated by shuffling relatively few segments of genetic information. Animals can make billions of possible antibodies from only a few thousand gene segments. The detailed genetics of antibody diversity is a complex issue and is described in textbooks on immunology.

The rest of this chapter discusses those aspects of immunology of importance to biotechnology. They include antibody structure, the bioengineering of antibodies, biotechnological techniques that use antibodies, and finally vaccines. The chapter ends with techniques used to identify and produce new vaccines.

STRUCTURE AND FUNCTION OF IMMUNOGLOBULINS

Depending on the type of heavy chain, antibodies are categorized into different classes, and they assume different roles in the immune system (see Table 6.1). The most abundant and typical antibody has a gamma heavy chain and is called immunoglobulin G (IgG). IgG has four different subclasses, but as a whole, IgG is found mainly in blood serum. About 75% of the serum antibodies are IgG, and they are critical to stimulate immune cells to engulf invading pathogens. IgG is the only antibody able to transfer across the placenta during pregnancy.

The second most common antibody in serum is secretory IgA. This antibody is also found in mucosal secretions as well as colostrum and breast milk. It is extremely important in fighting respiratory and gastrointestinal infections, especially in infants, in whom gastrointestinal illnesses are particularly deadly.

The third most common is IgM, which is usually found as a pentamer. The unusual structure of IgM provides multiple binding sites for antigens (10 in IgM versus 2 in IgG). This structure makes IgM good for clumping microorganisms and then stimulating immune cells to digest the entire complex.

IgD is found at low levels, and its role is still uncertain. IgE is the least common antibody in serum and is primarily found attached to mast cells. IgE is the antibody that stimulates allergic responses by triggering mast cells to release the histamine that causes the common symptoms of allergies, including runny noses, sneezing, and coughing.

Antibodies are very diverse in structure so that all the pathogens can be recognized. Antibodies are produced by shuffling gene segments rather than having one gene code for each different antibody.

FIGURE 6.5 Modular Gene Assembly
Linking different segments of genes creates exponential numbers of unique combinations.
Figure labels: fronts; ends; combine in all possible ways; nine new “genes.”
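The modular assembly idea of Figure 6.5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segment names below are hypothetical placeholders, not real gene segments, and the "+" join simply stands for linking a front end to a rear end.

```python
from itertools import product

# Hypothetical placeholder names for the three "front" and three "rear"
# segments of Figure 6.5; they are not real gene names.
fronts = ["front_A", "front_B", "front_C"]
rears = ["rear_1", "rear_2", "rear_3"]

# Join every front segment to every rear segment.
assembled_genes = [f"{front}+{rear}" for front, rear in product(fronts, rears)]

print(len(assembled_genes))  # 9
```

The count is the product of the number of choices at each position (3 × 3 = 9), which is why a modest stock of segments can yield a very large number of assembled genes.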

slide 190:

Table 6.1 Different Types and Functions of Human Antibodies

Antibody | Subtype | Light chain | Heavy chain | Function | Structure (figure labels)
IgA | IgA1, IgA2 | κ or λ | α1, α2 | Prevents pathogen attachment | Monomer or secretory form; secretory piece; J chain
IgE | none | κ or λ | ε | Allergic reaction; inhibits parasites | Extra domain
IgD | none | κ or λ | δ | Activates lymphocytes | Tail piece
IgM | none | κ or λ | μ | Clumps microbes and activates complement | Monomer or pentamer; extra domain; J chain; tail piece
IgG | IgG1, IgG2, IgG3, IgG4 | κ or λ | γ1, γ2, γ3, γ4 | Activates complement; activates other immune cells | IgG1, IgG2, IgG3, IgG4

Note: Light chains are depicted in light blue and heavy chains are purple.

Each IgG antibody consists of four protein subunits, two light chains and two heavy chains, arranged in a Y-shape (Fig. 6.6). The light chains are encoded by one of two gene loci, κ and λ. Disulfide bonds between cysteine amino acid residues hold the chains together. Each of the light and heavy chains consists of one to four constant regions and a single variable region. The constant region is the same for all chains of the same class. The variable regions work together to form the paratope, which is the region or surface of the antibody that binds to the target molecule, the antigen. There are millions of different variable regions, which are generated by genetic shuffling called VDJ recombination. In Figure 6.7 the possible segments encoded in the germline are shown at the top; they can be inverted or deleted due to rearrangements. These recombination

slide 191:

events occur in the bone marrow during early B-cell development and are initiated by RAG1 and RAG2, which nick the DNA backbones. Nonhomologous end joining (NHEJ) enzymes reconnect the ends to form inversions or deletions. Interestingly, NHEJ enzymes imprecisely reconnect these ends, thus inducing insertions or deletions. During transcription of the recombined segment, various segments are skipped due to alternative splicing, producing a transcript with a single V, single J, and single C. Further processing during translation produces the complete unique κ light chain.

FIGURE 6.6 Structure of an Antibody
Y-shaped antibodies consist of two light chains and two heavy chains. Each consists of segments: CH1, CH2, and CH3 are heavy-chain constant regions; CL is the light-chain constant region; VH is the heavy-chain variable region; and VL is the light-chain variable region. Antigens bind to the variable regions.
Figure labels: constant region; variable region; antigen binds here; light chain; heavy chain; disulfide (S-S) bonds; antigen.

FIGURE 6.7 VDJ Recombination
The κ locus, found on chromosome 2, has the segments for the light chain. In the germline there are 75 V segments (only around 30 are actually functional), 5 J segments, and 1 C or constant region. Each segment has a recombination signal sequence at the end that elicits the recombination between 1 V and 1 J segment. Every light chain has a combination of 1 V segment, 1 J segment, and the constant region (C).
Stages shown: germline; rearrangement (deletion or inversion); transcription; mRNA; initial polypeptide; mature κ L chain.

The heavy-chain locus is encoded on chromosome 14 and has about

slide 192:

39 functional V segments, 27 D segments, and 6 J segments. Using one segment of each type gives 39 × 27 × 6 = 9,126, or roughly 10^4, possible combinations from segment choice alone. Further diversity arises from rearrangements of the D segments by alternative splicing, from nucleotide insertion or deletion during recombination, and from the random addition of nucleotides by terminal deoxynucleotidyl transferase (TdT) during recombination. The possible combinations increase to greater than 10^7 from a single heavy-chain locus when these other events are included.

Breaking an antibody at the “hinge” where the heavy chains bend yields three chunks: two identical Fab fragments and one Fc fragment (Fig. 6.8). Fab, meaning “fragment antigen binding,” consists of one light chain plus half of a heavy chain. Fc, meaning “fragment crystallizable,” contains the lower halves of both heavy chains. Other components of the immune system often recognize and bind to the Fc region of an antibody (see later discussion).

FIGURE 6.8 Fab Fragments and Fc Fragment of an Antibody
Antibodies can be split into two Fab fragments and one Fc fragment by breaking the molecule at the hinge region.
Figure labels: antibody; light chain; parts of heavy chain; antigen binds here; complement recognition; Fc receptor recognition; chemical breakage; two Fab fragments; one Fc fragment.

IgG antibodies have a Y-shaped structure. The hinge or bend region divides the two Fab fragments from the Fc fragment. There are two light chains and two heavy chains.

MONOCLONAL ANTIBODIES FOR CLINICAL USE

There are many clinical uses for antibodies. They are used in diagnostic procedures (including the ELISA; see later discussion) for pregnancy testing and to detect the presence of proteins characteristic of particular disease-causing agents. In the future they may be used to specifically kill cancer cells or destroy viruses. Such uses need relatively large amounts of a pure antibody that specifically recognizes a single antigen.

Even if an experimental animal is inoculated with a purified single antigen, its blood serum will contain a mixture of antibodies to that antigen. Remember that a single antigen has multiple epitopes, and thus antibodies will vary in both specificity and affinity. Nowadays such a mixture is referred to as polyclonal antibody because it results from antibody production by many different clones of B cells, which all recognize the same antigen. Such a mixture is of little use either for a specific, accurate assay or for other techniques in biotechnology.

slide 193:

To create large amounts of a pure antibody, scientists must isolate and grow a single line of B cells making one particular antibody in culture. Such a pure antibody made by a single line of cells is known as a monoclonal antibody. Unfortunately, B cells live for only a few days and survive poorly outside the body. The solution to this problem is to use cancer cells. Myelomas are naturally occurring cancers derived from B cells; they therefore express immunoglobulin genes. Like many tumor cells, myeloma cells will continue to grow and divide in culture forever if given proper nutrients. To make monoclonal antibodies, scientists fuse the relatively delicate B cell, which is making the required antibody, to a myeloma cell (Fig. 6.9). To avoid confusion, scientists use a myeloma that has lost the ability to make its own antibody. The resulting hybrid is called a hybridoma. In principle, the fused cells can live forever in culture and will make the desired antibody.

In practice, an animal such as a mouse is injected with the antigen against which antibodies are needed. When antibody production has reached its peak, a sample of antibody-secreting B cells is removed from the animal. These cells are fused to immortal myeloma cells to give a mixture of many different hybridoma cells. The tedious part comes next. Many individual hybridoma cell lines must be screened to find one that recognizes the target antigen. Once it is found, the hybridoma is grown in culture to give large amounts of the monoclonal antibody.

Monoclonal antibodies recognize only one epitope on the antigen and derive from one single B cell. Fusing antigen-stimulated B cells from a mouse spleen with a myeloma cell line produces an immortalized hybridoma. Each of the cells can be grown in vitro and evaluated for its affinity to the original antigen to make a monoclonal antibody.

HUMANIZATION OF MONOCLONAL ANTIBODIES

Monoclonal antibodies could target human cancer cells by recognizing specific molecules appearing only on the surface of cancer cells. Ironically, the main problem with their use as a therapy is that the human immune system regards antibodies from mice or other animals as foreign molecules themselves and so attempts to destroy them. One approach that may partly solve this problem is using genetic engineering to make humanized monoclonals (Fig. 6.10A). Since the variable or V-region of the antibody recognizes the antigen, the constant or C-region may therefore be replaced with a humanized version. To accomplish this, scientists isolate and culture the first-generation hybridoma, generally using mouse B cells. Then the DNA encoding the mouse monoclonal antibody is isolated and cloned. The DNA for the constant region of the mouse antibody is then replaced with the corresponding human DNA sequence. The V-region is left alone. The human/mouse hybrid gene is then put back into a second mouse myeloma cell for production of antibody in culture. Although not fully human, the hybrid is less mouse-like and provokes much less reaction from the human immune system.

Further humanization can be accomplished by altering those parts of the V-region that are not directly involved in binding the antigen. A closer look at the V-region of each chain shows that most of the variation is restricted to three short segments that form loops on the surface of the antibody, thus forming the antigen-binding site (see Fig. 6.10B). These are known as hypervariable regions or as complementarity determining regions (CDRs). Overall, each antigen-binding site consists of six CDRs: three from the light chain and three from the heavy chain. Full humanization of an antibody involves cutting out the coding regions for these six CDRs from the original antibody and splicing them into the genes for human light and heavy chains.
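The CDR-grafting step can be pictured with a toy Python model. This is a rough sketch under stated assumptions, not a laboratory protocol: each variable region is represented as seven named segments (the conventional alternation of framework regions, FR1 to FR4, with CDR1 to CDR3, which is not spelled out in the text), the segment values are placeholder labels rather than real sequences, and `graft_cdrs` is a hypothetical helper name.

```python
# Each variable region is modeled as alternating framework (FR) and CDR
# segments. The FR/CDR naming is an assumption made for illustration.
SEGMENT_ORDER = ["FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"]


def graft_cdrs(donor, acceptor):
    """Return a chain with the donor's CDRs set into the acceptor's frameworks."""
    return {
        name: donor[name] if name.startswith("CDR") else acceptor[name]
        for name in SEGMENT_ORDER
    }


# Placeholder labels, not real sequences.
mouse_heavy = {name: f"mouse_{name}" for name in SEGMENT_ORDER}
human_heavy = {name: f"human_{name}" for name in SEGMENT_ORDER}

humanized_heavy = graft_cdrs(mouse_heavy, human_heavy)
print("-".join(humanized_heavy[name] for name in SEGMENT_ORDER))
# human_FR1-mouse_CDR1-human_FR2-mouse_CDR2-human_FR3-mouse_CDR3-human_FR4
```

Applying the same function to a light chain would graft the other three CDRs, giving the six mouse-derived CDRs per antigen-binding site described above.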

slide 194:

FIGURE 6.9 Principle of the Hybridoma
Monoclonal antibodies derive from a single antibody-producing B cell. The antigen is first injected into a mouse to provoke an immune response. The spleen is harvested because it harbors many activated B cells. The spleen cells are short-lived in culture, so they are fused to immortal myeloma cells. The hybridoma cells are cultured and isolated so each hybrid is separate from the other. Each hybrid clone can then be screened for the best antibody to the target protein.
Figure labels: antigen injected; spleen cells; cancer cells grown in vitro (myeloma cells); fusion; clones 1 to 4; test for antibodies against target antigen.

FIGURE 6.10 Humanization of Monoclonal Antibodies
Antibodies from a mouse can be altered to become more like a human antibody. (A) The entire constant region of the heavy and light chain can be replaced with constant regions from a human. (B) Antibodies have six CDRs that determine the actual antigen-binding site. The entire antibody except the CDR region can be replaced with human sequence.
Panel titles: (A) Stage I: V-region of mouse plus C-region of human. (B) Stage II: only CDRs are from mouse; rest is human.
Figure labels: mouse; human; CDRs (complementarity determining domains); variable region; constant region.

slide 195:

CHAPTER 6 191

HUMANIZED ANTIBODIES IN CLINICAL APPLICATIONS

Many different humanized monoclonal antibodies are currently in development to treat a variety of conditions, and many have already been approved by the Food and Drug Administration (FDA). Table 6.2 presents a partial list of FDA-approved antibodies. One of the first humanized monoclonal antibodies approved for clinical use, trastuzumab (Herceptin), is for the treatment of breast cancer. The FDA approved this therapeutic agent in 1998. Herceptin recognizes a cell surface receptor called human epidermal growth factor receptor type 2 (HER2). This receptor is part of a larger family including HER3, HER4, and the founding member, the epidermal growth factor receptor (EGFR). These receptors control whether a cell proliferates, differentiates, or undergoes programmed suicide by signaling a variety of intracellular proteins that modulate gene expression. In breast cancer patients, when the HER2 receptor is overproduced, the breast cancer is much more resistant to chemotherapy. Excess receptor is thus a good indicator that the patient will not survive as long. Herceptin binds to the extracellular domain of HER2, preventing the receptor from being internalized. This prevents the cancer cell from dividing and induces the immune system to attack the cell (Fig. 6.11). When Herceptin is used in combination with chemotherapy to treat breast cancer, patients survive much longer. The main point to keep in mind is that Herceptin binds one specific protein; therefore, the particular breast cancer must have excess amounts of HER2 in order for the treatment to be effective.

Removing the constant regions of a mouse antibody and replacing them with human constant regions makes humanized antibodies. Human cells do not reject these antibodies.
Table 6.2 FDA-Approved Antibodies

Product | Antigen | Target | Trade Name

Murine Monoclonals
Arcitumomab | Carcinoembryonic antigen | Metastatic colorectal cancer (detection) | CEA Scan
Capromab pentetate | Tumor surface antigen PSMA | Prostate adenocarcinoma (detection) | ProstaScint

Chimeric
Infliximab | TNFα | Crohn's disease | Remicade

Antibody Fragments
Nofetumomab (murine Fab) | Antigen associated with cancer | Detection of small cell lung cancer | Verluma

Humanized
Trastuzumab | Her-2 | Metastatic breast cancer | Herceptin
Palivizumab | Respiratory syncytial virus (RSV) F protein | Respiratory tract disease | Synagis

Human (Phage Display/Synthetic Antibody)
Adalimumab | TNFα | Immune disorders, Crohn's disease | Humira

Table is a subset of information from Khan FH (2014). Antibodies and their applications. In Animal Biotechnology. Oxford, UK, and Waltham, MA, USA: Academic Press, p. 482.

slide 196:

Another antibody approved by the FDA is the chimeric antibody Remicade (infliximab), which is used to treat rheumatoid arthritis (RA). The antibody targets tumor necrosis factor alpha (TNFα), which is present in the joints of people with arthritis. TNFα regulates inflammation and immune system function. Antibodies to TNFα inhibit inflammation in RA by blocking the release of IL-1, a pro-inflammatory cytokine. The researchers first created a hybridoma that expressed antibodies that recognized TNFα and then cloned the variable segments within the heavy-chain gene that were important for binding to the antigen. They also isolated the variable segments of the light-chain gene. The light-chain variable segments were then joined to the human κ light-chain gene, and the heavy-chain variable region was joined to the human constant region. These fusions were then transfected into a new myeloma cell and induced to produce the chimeric antibody. After extensive research on efficacy and safety, the antibody was released for treatment of inflammatory diseases such as RA. The overall cost of treatment is high, but the drug was one of the ten top-selling drugs on the market in 2012.

ANTIBODY ENGINEERING

Natural antibodies consist of an antigen-binding site, called the paratope, joined to an effector region that is responsible for activating complement and/or binding to immune cells. From a biotechnological viewpoint, the incredibly high specificity with which antibodies bind to a target protein is useful for a variety of purposes. Consequently, antibody engineering uses the antigen-binding region of the antibody. These regions are manipulated and attached to other molecular fragments. To separate an antigen-binding site from the rest of the antibody, scientists subclone gene segments encoding portions of the variable antibody chains and express them in bacterial cells.
Bacterial signal sequences are added to the amino-terminus of the partial antibody chains, which results in export of the chains into the periplasmic space. Here the VH and VL domains fold up correctly and form their disulfide bonds. The antibody fragments used include Fab, Fv, and single-chain Fv (scFv) (Fig. 6.12). In a Fab fragment, an interchain disulfide bond holds the two chains together. However, the Fv fragment lacks this region of the antibody chains and thus is less stable. This led to development of the single-chain Fv fragment, in which the VH and VL domains are linked together by a short peptide chain

FIGURE 6.11 Herceptin Helps Kill Cancer Cells with HER2
Herceptin is a humanized monoclonal antibody that recognizes the HER2 receptor on breast cancer cells. When the antibody binds to the receptor, the immune system helps destroy the cancer cell and the cancer cell becomes more sensitive to chemotherapeutic treatments.
Figure labels: Her2 receptor; death; stay alive; cancer cell; immune system kills cell with Herceptin bound.

Monoclonal antibodies to HER2 inhibit breast cancer cells from growing and are used as a treatment for breast cancer patients. Chimeric antibodies to TNFα are used to treat rheumatoid arthritis and are among the top-selling drugs.

slide 197:

usually 15 to 20 amino acids long. This chain is introduced at the genetic level so that a single artificial gene expresses the whole structure (VH-linker-VL or VL-linker-VH). A tag sequence (such as a His6-tag or FLAG-tag; see Chapter 9) is often added to the end to allow detection and purification. Such a scFv fragment is quite small, about 25 kDa in molecular weight.

Such scFv fragments are attached to various other molecules by genetic engineering. The role of the scFv fragment is to recognize some target molecule, perhaps a protein expressed only on the surface of a virus-infected cell or a cancer cell. A variety of toxins, cytokines, or enzymes may be attached to the other end of the scFv fragment to provide the active portion of the final recombinant antibody. In principle, this approach provides a way of delivering a therapeutic agent in an extremely specific manner. At present, the clinical applications of engineered antibodies are under experimental investigation.

Recent work on camel antibody structure has revealed an antibody structure not seen in any other model organism studied to date. Antibodies in camels and their relatives, llamas and alpacas, have only the heavy chain and no light chains and are called heavy-chain antibodies (hcAb) (Fig. 6.13). The ends of the heavy chain carry the binding sites for foreign antigens, or paratopes. The streamlined structure has major implications for creating antibodies for therapeutic purposes. The ability to create a small molecule from only the heavy-chain antigen-binding region offers many advantages over other antibody therapeutics.
The variable domain of the single heavy-chain antibody, called VHH, is 12–15 kDa in size, which is much smaller than even scFv, and

FIGURE 6.12 Fab and Fv Antibody Fragments
Fab fragments are produced by protease digestion of the hinge region. A disulfide bond holds the heavy and light chains together. To make an antibody fragment without any constant region, the genes for the VH domain and the VL domain are expressed in bacteria from a plasmid vector. This structure is unstable because of a lack of disulfide bonds. Therefore, disulfide bonds are engineered into the two halves (dsFv fragment) or a linker is added to hold the VH and VL domains together (scFv fragment).
Figure labels: IgG with Fab, Fv, hinge, and Fc regions; domains VH, VL, CH1, CL, CH2, and CH3; disulfide (S-S) bonds; antigen binding; complement activation; macrophage binding; Fab fragment; dsFv fragment; scFv fragment (leader, VH, linker peptide, VL).

slide 198:

therefore a recombinant protein containing only this domain is called a nanobody (Nb) (Fig. 6.13). The structure makes these more amenable to protein engineering, since Nbs are small, work as monomers, have no disulfide bonds, and are very stable, even maintaining their structure in high heat or denaturing conditions. They have a very high affinity for the antigen, but what is most interesting is the structure. As depicted in Figure 6.4, typical antibodies recognize protruding regions of the antigen, but the paratope of the VHH region is flexible. Nanobodies can recognize epitopes that protrude, as regular antibodies do, and they can also recognize epitopes that are dimples or concave in shape. That means VHH domains can bind directly to enzyme active sites buried within a protein. Another key to their potential function is size: the engineered form of

FIGURE 6.13 Heavy-Chain Antibodies and Nanobodies
A conventional antibody has two heavy chains (purple) and two light chains (orange) held together with disulfide bonds. A heavy-chain antibody derives from a single protein and therefore does not have disulfide bonds. The nanobody is the isolated variable region from the heavy-chain antibody; it is very small but has high affinity for its target antigens.
Figure labels: conventional antibody (Fab, Fv, hinge, Fc; VH, VL, CH1, CL, CH2, CH3; S-S bonds; antigen binding); heavy-chain antibody (VHH, CH2, CH3); VHH/nanobody (2.5 nm).

slide 199:

VHH without the constant regions can easily pass through the kidney, so nanobodies are rapidly cleared from the body. They can pass through the blood–brain barrier to target regions of the brain. For these reasons, one of the potential uses is for in vivo imaging or potentially as a biosensor. Nbs can also be humanized and conjugated to different small-molecule therapeutics, just as scFvs can. At the time of writing, Nbs to TNFα are in clinical trials for treatment of rheumatoid arthritis.

The antigen-binding regions used in antibody engineering may be derived from pre-existing monoclonal antibodies, such as the TNFα antibody that was humanized. Alternatively, a library of DNA segments encoding V-regions may be obtained from a pool of B cells obtained from an animal or human blood sample. Such a library should, in theory, contain V-regions capable of recognizing any target molecule. Using a human source avoids the necessity for the complex humanization procedures described earlier. However, in this case it is necessary to screen the V-region library for an antibody fragment that binds to the desired target molecule. This may be done by the phage display procedure outlined in Chapter 9. The library of V-region constructs is expressed on the surface of the phage, and the target molecule is attached to some solid support and used to screen out those phages carrying the required antibody V-region.

DIABODIES AND BISPECIFIC ANTIBODY CONSTRUCTS

Various engineered antibody constructs are presently being investigated. A diabody consists of two single-chain Fv (scFv) fragments assembled together. Shortening the linker from 15 to 5 amino acids drives dimerization of two scFv chains, because the short linker no longer allows intrachain assembly of the linked VH and VL regions. The dimer consists of two scFv fragments arranged in a crisscross manner (Fig. 6.14). The resulting diabody has two antigen-binding sites pointing in opposite directions.
If two different scFv fragments are used, the result is a bispecific diabody that will bind to two different target proteins simultaneously. Note that formation of such a bispecific diabody requires that VH-A be linked to VL-B and VH-B to VL-A. It is, of course, possible to engineer both sets of VH and VL regions onto a single polypeptide chain encoded by a single recombinant gene, as shown in Figure 6.14. In the same manner, nanobodies can also be engineered into a bivalent or bispecific arrangement. Doing so increases their potency against their target antigens. Bispecific diabodies have a variety of potential uses in therapy because they may be used to bring together any two other molecules; for example, they might be used to target toxins to cancer cells.

Another way to construct an engineered bispecific antibody is to connect the two different scFv fragments to other proteins that bind together (Fig. 6.15). Two popular choices are streptavidin and leucine zippers. Streptavidin is a small biotin-binding protein from the bacterium Streptomyces. It forms tetramers, so it allows up to four antibody fragments to be assembled together. Furthermore, binding to a biotin column can purify the final constructs. Leucine zipper regions are used by many transcription factors that form dimers (see Chapter 2). Often such proteins form mixed dimers when their leucine zippers recognize each other and bind together. Leucine zipper regions from two different transcription factors that associate (e.g., the Fos and Jun proteins) may therefore be used to assemble two different scFv fragments.

Linking two scFv fragments together with either polypeptide linker regions or proteins (e.g., streptavidin or leucine zipper proteins) creates divalent antibodies in which each side can recognize a different antigen. These constructs are useful to bring two different proteins into close proximity in the cell. Heavy-chain antibodies from camels can be engineered to create small nanobodies.
Nanobodies and single-chain Fvs are linked to various toxins, cytokines, or enzymes to create recombinant antibodies. These antibodies can be used to precisely deliver the toxin, cytokine, or enzyme to the antigen that the scFv or nanobody recognizes in vivo.
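The scFv and diabody designs described in this section can be sketched as string assembly. The following Python example is a rough illustration only: the `build_scfv` and `expected_assembly` helpers are hypothetical, the domain strings are placeholders, the glycine/serine linker sequences are an assumption (the text gives only linker lengths), and the length cut-offs simply reuse the two lengths the text mentions (15 and 5 amino acids).

```python
def build_scfv(vh, vl, linker, tag="", vh_first=True):
    """Assemble one artificial gene product: VH-linker-VL or VL-linker-VH, plus an optional tag."""
    first, second = (vh, vl) if vh_first else (vl, vh)
    return first + linker + second + tag


def expected_assembly(linker):
    """Rule of thumb from the text: long linkers fold within one chain, short linkers force pairing between chains."""
    if len(linker) >= 15:
        return "monomeric scFv"
    if len(linker) <= 5:
        return "diabody"
    return "not specified in the text"


# Placeholder domains and assumed linker/tag sequences, for illustration only.
LONG_LINKER = "GGGGSGGGGSGGGGS"   # 15 residues
SHORT_LINKER = "GGGGS"            # 5 residues
HIS6_TAG = "HHHHHH"

construct = build_scfv("[VH]", "[VL]", LONG_LINKER, tag=HIS6_TAG)
print(construct)                        # [VH]GGGGSGGGGSGGGGS[VL]HHHHHH
print(expected_assembly(LONG_LINKER))   # monomeric scFv
print(expected_assembly(SHORT_LINKER))  # diabody
```

For a bispecific diabody, the same helper would be called twice with crossed partners (VH-A with VL-B, and VH-B with VL-A), as the text requires.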

slide 200:

ELISA ASSAY

The enzyme-linked immunosorbent assay (ELISA) is widely used to detect and estimate the concentration of a protein in a sample. The protein to be detected is regarded as the antigen. Therefore, the first step is to make an antibody specific for the target protein. A detection system is then attached to the rear of the antibody. Usually this system consists of an enzyme that generates a colored product from a colorless substrate. Alkaline phosphatase, which converts X-Phos to a blue dye (see Chapter 8), is a common choice. The samples to be assayed are immobilized on the surface of a membrane or in the wells of a microtiter dish (Fig. 6.16). The antibody plus detection system is added and allowed to bind. The membrane or microtiter

FIGURE 6.14 Engineered Diabody Constructs
(A) Engineering a diabody construct begins by genetically fusing the variable domains of the heavy and light chain (VH and VL) with a linker. The long linker allows a single polypeptide to form a single antibody-binding domain. The short linker allows two polypeptides to complex into a diabody with two antibody-binding domains. (B) Instead of identical Fv units, two different Fv chains can be coexpressed in the bacterial cell. The two different Fv chains will unite into a diabody with two different antibody-binding domains, a different one on each side. (C) Bispecific antibodies can be made as one single transcript with a linker between VHA and VLB, a linker between the two halves, and finally a linker between VHB and VLA.
Figure labels: DNA; promoter; RBS; signal sequence; VH; VL; linker; 15 aa linker (scFv); 5 aa linker (bivalent diabody); VHA, VLA, VHB, VLB; bispecific diabody; bispecific single-chain diabody.

slide 201:

dish is then rinsed to remove any unbound antibody. The substrate is added, and the intensity of color produced indicates the amount of target protein in the original sample.

A variety of modifications of the ELISA exist. Often binding and detection are done in two stages using two different antibodies. The first antibody is specific for the target protein. The second antibody recognizes the first antibody and carries the detection system. For example, antibodies could be raised in rabbits to a series of target proteins. The second antibody, which recognizes rabbit antibodies, could be produced in sheep. These are called secondary antibodies and are often described as, for example, sheep anti-rabbit. The secondary antibody has the detection system, and because it will recognize any antibody made in a rabbit, it does not have to be re-engineered for each different target protein. This allows the use of the same final antibody detection system in each assay, even if different primary antibodies are used to identify different proteins.

THE ELISA AS A DIAGNOSTIC TOOL

The ELISA is used in many different fields. Diagnostic kits that rely on the ELISA are produced for clinical diagnosis of human disease, dairy and poultry diseases, and even plant diseases. The diagnostic kits are so simple that most require no laboratory equipment, and using them takes as little as 5 minutes. ELISA kits can be used to detect a particular plant disease by crushing a leaf and smearing the leaf tissue on the antibody. When the disease-specific antigen reacts with the antibody, the antibody spot turns blue. In clinical applications, ELISA kits can detect the presence of minute amounts of pathogenic viruses or bacteria, even before the pathogen has a chance to cause major damage. Clinical ELISA kits detect various disease markers.
In certain diseases characteristic

FIGURE 6.15 Engineered Bispecific Antibody Constructs
Instead of genetic linkers, various proteins can also hold scFv fragments together. Proteins with a leucine zipper domain dimerize; therefore, when scFv genes are genetically fused to these proteins, the scFv domains come together as dimers. Proteins such as streptavidin or those with four-helix-bundle domains can be genetically fused to scFv domains. When expressed, there are four scFv domains on the outside, providing four different antibody-binding sites.
Figure labels: leucine zipper stabilized scFv dimers; four-helix-bundle stabilized scFv tetramers; streptavidin-scFv; leucine zipper; streptavidin.

Antibodies are used in ELISA assays to determine the relative concentration of the target protein or antigen in a sample. Primary antibodies recognize the target protein or antigen. Secondary antibodies recognize the primary antibody and often carry a detection system. Secondary antibodies are made to recognize any antibody that is made in sheep, cow, rabbit, goat, or mouse.

slide 202:

proteins mark the start of disease progression long before the patient exhibits any symptoms. Detecting such markers can help diagnose and treat a problem before the disease causes serious damage.

ELISA diagnostic testing is even available for you to try at home. Home pregnancy kits are a simple over-the-counter ELISA assay for human chorionic gonadotropin (hCG). This protein is produced by the placenta and secreted into the bloodstream and urine of pregnant women. The actual pregnancy test has four important features (Fig. 6.17). First, the entire test is on a piece of paper that wicks the urine from one end to the other. This paper has three regions: first, a region where anti-hCG antibody is loosely attached to the paper strip; second, a region called the pregnancy window; and finally, a control window. As the urine wicks up the paper strip, any hCG present is bound by the anti-hCG antibody. If the woman is pregnant, the anti-hCG/hCG complex moves up the paper strip. If the woman is not pregnant, the anti-hCG antibody moves up the paper strip alone. Even if the woman is pregnant, there is excess anti-hCG, and so unbound anti-hCG antibody is always found. If the woman is pregnant, the anti-hCG/hCG complex reaches the pregnancy window, where it binds to secondary antibody 1. This is attached to the paper in the shape of a plus sign and cannot move. The secondary antibody has a color detection system attached to it. When the anti-hCG/hCG complex binds to the secondary antibody, it triggers color release and a plus sign forms. The control window contains secondary antibody 2. This antibody recognizes only anti-hCG antibody that is not bound to hCG, so its color is activated whether or not the woman is pregnant.

VISUALIZING CELL COMPONENTS USING ANTIBODIES

Antibodies can be used to visualize the location of specific proteins within the cell.
Immunocytochemistry refers to the visualization of specific antigens in cultured cells, whereas immunohistochemistry refers to their visualization in prepared tissue sections. In either technique, the first step is to prepare the cells. They must be treated to maintain

FIGURE 6.16 Principle of the ELISA
ELISA detects and quantifies the amount of a particular protein bound to the well of a microtiter dish. Anti-A antibody is linked to an enzyme such as alkaline phosphatase. The antibody recognizes only the circular protein and not the triangular protein (A). After the antibody binds to its target, the unbound antibody is washed from the dish (B). A colorimetric substrate of alkaline phosphatase is added to each well (C), and wherever there is antibody, the substrate is cleaved to form its colorful product (D). The amount of color is proportional to the amount of protein.
Figure labels: (A) add anti-A antibody covalently linked to enzyme; (B) wash away unbound antibody; (C) add colorless substrate for enzyme; (D) enzyme makes colored product; (E) measure absorbance of light by colored product; Sample A; Sample B.

ELISA is a powerful diagnostic tool because antibodies can be made to almost any protein. For pregnancy tests, any hCG in the urine binds to antibodies to hCG, which in turn bind to immobilized secondary antibody to form the plus sign.
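Because the amount of color in an ELISA is proportional to the amount of protein, an unknown sample can be estimated from wells containing known amounts. The Python sketch below shows one simple way to do this, assuming the readings fall in a linear range; the helper names, the standard concentrations, and the absorbance values are all hypothetical, and real assays often need a nonlinear curve because the signal saturates at high concentrations.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate concentration from a reading."""
    return (absorbance - intercept) / slope


# Hypothetical standards: known concentrations (ng/mL) and their absorbances.
standard_conc = [0.0, 10.0, 20.0, 40.0]
standard_abs = [0.05, 0.25, 0.45, 0.85]

slope, intercept = fit_line(standard_conc, standard_abs)
unknown = concentration_from_absorbance(0.65, slope, intercept)
print(round(unknown, 2))  # about 30 ng/mL for these made-up standards
```

The intercept corresponds to the background color of a well with no target protein, which is why a blank is included among the standards.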

slide 203:

their cellular architecture so that the cells appear much as they would if still alive. Usually the cells are treated with cross-linking agents such as formaldehyde or with denaturants like acetone or methanol. In immunohistochemistry, tissue samples can be frozen and then sliced into thin sections (about 4 μm), providing a two-dimensional view of the tissue. Another option is to embed the tissue sample in paraffin wax. Here the cells are first dehydrated in a series of alcohol solutions and then treated with the wax. The tissue is then sectioned into thin two-dimensional slices, as for frozen tissues.

Once a single thin layer of prepared cells or tissue sections is ready, the preserved cells are permeabilized to make the antigen more accessible to the antibody. If in wax, the tissue sections are dewaxed and rehydrated. Fixed tissue sections can be irradiated with microwaves, which break the cross-links induced by the fixative, or the samples can be heated under pressure. Both methods allow the primary antibody to find its antigen within the sample.

A secondary antibody contains the detection system to visualize the location of the antigen. The secondary antibody binds to the primary antibody/antigen complex, and then the appropriate reagents are added to visualize the location of the complex. In some cases, a single antibody with an attached detection system is used.

Antibody detection systems include enzymes or fluorescent labels. A common enzyme-mediated detection system is alkaline phosphatase, as with the ELISA (see Fig. 6.16). Fluorescently labeled antibodies must be excited with UV light, upon which the fluorescent label emits light at a longer wavelength. Samples are directly visualized with a microscope attached to a UV light source (Fig. 6.18). Fluorescent antibodies tend to bleach out when exposed to excess UV; therefore, the microscope is attached to a camera to record the data as a digital image.
FIGURE 6.17 Home Pregnancy Tests Are an ELISA Diagnostic Tool
The pregnancy test shown here has four important areas along the paper wick. The urine or blood is applied on the far left and wicks to the right. The anti-hCG antibodies loosely attached to the paper are next. If the urine has hCG, this binds to its antibody and travels along the paper as a complex. In the next area, a secondary antibody that recognizes only the hCG–primary antibody complex is firmly attached in a plus pattern. When the hCG complex binds to the secondary antibody, the detection system turns blue. The final spot is a different secondary antibody that recognizes the primary antibody without any hCG. This is a positive control to ensure that the antibody was released and wicked up the paper with the urine.
Figure labels: paper wick from home pregnancy test; urine or blood applied here; liquid wicks from left; anti-hCG; hCG bound to some anti-hCG antibodies; secondary antibody 1 (colorless, turns blue when hCG/anti-hCG attaches); secondary antibody 2 (turns blue if anti-hCG is wicked to this end of paper).

Immunocytochemistry and immunohistochemistry use a primary antibody to a specific cellular target protein to visualize its location within the cell. The primary antibody is visualized by adding a secondary antibody with a detection system.
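The read-out logic of the pregnancy strip in Figure 6.17 can be written as a small truth table. The Python sketch below is an illustration, not a description of any commercial kit: the `read_strip` function and its argument names are hypothetical, and labeling a strip with a blank control window as "invalid test" is an assumption drawn from the caption's description of that window as a positive control.

```python
def read_strip(hcg_present, antibody_wicked=True):
    """Return which windows turn blue and the overall reading of the strip."""
    # Secondary antibody 1 binds only the hCG/anti-hCG complex.
    pregnancy_window = antibody_wicked and hcg_present
    # Secondary antibody 2 binds free anti-hCG, which is always in excess.
    control_window = antibody_wicked

    if not control_window:
        result = "invalid test"   # antibody never travelled up the paper
    elif pregnancy_window:
        result = "positive"
    else:
        result = "negative"
    return {
        "pregnancy_window": pregnancy_window,
        "control_window": control_window,
        "result": result,
    }


print(read_strip(hcg_present=True)["result"])    # positive
print(read_strip(hcg_present=False)["result"])   # negative
print(read_strip(hcg_present=True, antibody_wicked=False)["result"])  # invalid test
```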

slide 204:

FIGURE 6.18 Fluorescent Antibody Staining
Keratin fibers provide structural integrity to all types of cells and also comprise skin, hair, and nails. The different keratin forms that are created by post-translational modification can be visualized by using isoform-specific antibodies that recognize the different keratins. (A) Antibodies to keratin K8/K18 isoform were added to a section of mouse intestine (ileum). The secondary antibody for K8/K18 was labeled with a red fluorescent tag. In addition, antibodies to keratin K20 were added to the ileum section, and the secondary to this antibody was labeled with a green fluorescent tag. The areas that had both types of keratins fluoresced yellow. DNA was labeled with propidium iodide, which fluoresced blue. (B) Mouse ileum cells were labeled with two antibodies: one specific to keratin K8/K18 isoforms and one specific to the phosphorylated form of K8. The location where both proteins were found is yellow. DNA fluoresces blue. (C and D) Human cirrhotic liver cells were stained with K8/K18 (red) and antibodies to the phosphorylated form of K8 (green). Arrowheads highlight the phosphorylated isoform. (E and F) Mouse pancreas cell sections were incubated with antibodies to keratin K20 (green). The DNA in the nuclei is labeled blue. From Ku NO et al. (2004). Studying simple epithelial keratins in cells and tissues. Methods in Cell Biol 78, 489–517. Used with permission.
Panel titles: (A) Ileum, K8/K18 + K20; (B) Ileum, K8/K18 + K8 pS79; (C) Liver, K8/K18 + K8 pS79; (D) Liver, K8/K18 + M30; (E, F) Pancreas, K20.

FLUORESCENCE-ACTIVATED CELL SORTING

As explained in the preceding section, fluorescent antibodies are used to find the location of intracellular proteins. Fluorescent antibodies are also able to bind to surface antigens. Many cells of the immune system have specific antigens on their surface that distinguish them from others. Each immune cell can have over 100,000 antigen molecules on its surface.
These surface antigens characterize the different types of immune cells and are systematically named by assigning them a cluster of differentiation (CD) antigen number. The antigens

slide 205:

were mostly identified before their physiological function was known. For example, CD4 antigens are associated with T-helper cells and CD8 with killer T cells. Monoclonal antibodies are available to label many CD antigens, especially the most common.

Fluorescence-activated cell sorting (FACS) involves the mechanical separation of a mixture of cells into different tubes based on their surface antigens (Fig. 6.19). Because the antibody attaches to the outside of the cell, the cell does not have to be prepared as described previously. In this example, helper T cells and killer T cells can be separated from other white blood cells based on the presence of CD4 or CD8 surface antigens. First, the cell suspension is labeled with monoclonal antibodies to the surface antigens of interest. In this example, antibodies to CD4 and CD8 are used, each carrying a different fluorescent label. The labeled cell suspension is loaded into a charging electrode. Drops of liquid containing only one cell are released to the bottom, and the fluorescence detector notes whether the drop of liquid is labeled for CD4 or CD8 by the color of its fluorescence. If the drop has an antigen, an electrical charger pulls or pushes the droplet to the right or left, separating the two antigens into separate tubes. If the drop has no antigen in it, it gets no electrical charge and goes into a third tube. Usually two different antibodies are used, but some of the newer FACS machines can sort up to 12 different fluorescently labeled antibodies and can sort up to 300,000 cells per minute.

Flow cytometry is a related technique to analyze fluorescently labeled cells. As with FACS, cells are labeled with monoclonal antibodies to cell-surface antigens. The antibodies are conjugated to a variety of different fluorescent labels, and each antibody is detected based on its fluorescence. The cells are loaded into a charging electrode and released in small droplets.
During flow cytometry the cells are not sorted and saved

FIGURE 6.19 FACS Separates CD4+ and CD8+ Cells
FACS machines can separate fluorescently labeled cells into different compartments. A mixture of CD4+, CD8+, and unlabeled cells is separated based on their fluorescence. When the fluorescence detector notes green, the charged metal plates pull that drop to the left or minus plate, allowing those cells to collect in the left tube. If no fluorescence is detected, the drop stays neutral and is collected in the middle tube. If the drop fluoresces red, the charged plates pull the drop to the plus side and it collects in the right tube.
Figure labels: cell suspension; charging electrode; laser; fluorescence detector; photodetector; computer; charged metal plates; CD4+; CD8+; unlabeled.
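The sorting rule in the Figure 6.19 caption amounts to a lookup from detected color to collection tube. The short Python sketch below illustrates that rule only; the `sort_droplets` helper and the sample run are hypothetical, and because the caption does not say which color marks CD4 or CD8, the sketch sorts by color alone.

```python
from collections import Counter

# Deflection rule as described in the figure caption.
DEFLECTION = {
    "green": "left tube",    # pulled toward the minus plate
    "red": "right tube",     # pulled toward the plus plate
    None: "middle tube",     # no fluorescence: droplet stays neutral
}


def sort_droplets(droplet_colors):
    """Count how many single-cell droplets end up in each collection tube."""
    return Counter(DEFLECTION[color] for color in droplet_colors)


# Hypothetical run of seven droplets, one cell each.
run = ["green", None, "red", "green", None, None, "red"]
tubes = sort_droplets(run)
print(dict(tubes))
```

For this made-up run, two droplets go left, two go right, and three stay in the middle tube.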

slide 206:

instead the sample of cells is measured and discarded. As the cells pass the detector, the computer records the fluorescence and plots the number of cells with each of the fluorescent labels. The cells are plotted with a small dot representing each cell (Fig. 6.20).

IMMUNE MEMORY AND VACCINATION

Individuals who survive an infection normally become immune to that particular disease, although not to other diseases. The reason is that the immune system "remembers" foreign antigens, a process called immune memory. The next time the same antigen appears, it triggers a far swifter and more aggressive response than before. Consequently, the invading microorganisms will usually be overwhelmed before they cause noticeable illness.

Immune memory is due to specialized B cells called memory cells. As discussed earlier, virgin B cells are triggered to divide if they encounter an antigen that matches their own individual antibody. Most of the new B cells are specialized for antibody synthesis, and they live only a few days. However, a few active B cells become memory cells, and instead of making antibodies, they simply wait. If one day the antigen that they recognize appears again, most of the memory cells switch over very rapidly to antibody production.

Vaccination takes advantage of immune memory. Vaccines consist of various derivatives of infectious agents that no longer cause disease but are still antigenic; that is, they induce an immune response. For example, bacteria killed by heat are sometimes used. The antigens on the dead bacteria stimulate B-cell division. Some of the B cells form memory cells, so later, when living germs corresponding to the vaccine attack the vaccinated person, the immune system is prepared. The makers of vaccines are constantly trying to find different ways to stimulate the immune system without causing disease.
CREATING A VACCINE

Because vaccines are such a huge part of the biotechnology industry and such an important part of our health-care system, much research and money are invested in finding new and improved vaccines. Many vaccines are administered to young babies; thus, ensuring the safety and effectiveness of vaccines is critical (see Box 6.1). Many different methods of developing a vaccine exist.

FACS and flow cytometry use monoclonal antibodies to surface antigens. The FACS machine can sort the cells into individual samples, whereas flow cytometry simply records the fluorescent label and plots the data on a graph.

FIGURE 6.20 Example of Flow Cytometry Data
Peripheral blood mononuclear cells (PBMCs) were infected with dengue virus without (left) and with (right) an antibody that stimulates viral infection. The cells were then fixed, permeabilized, and labeled with antibodies against two viral proteins, E protein and NS1. The cells that express both viral proteins represent infected PBMCs, which are increased in number in the presence of the stimulatory antibody. From Fu Y, et al. (2014). Development of a FACS-based assay for evaluating antiviral potency of compound in dengue infected peripheral blood mononuclear cells. J Virological Methods 196, 18–24. Reprinted with permission.
[Figure: two dot plots, panel titles "PBMCs without virus" (0.01) and "PBMCs with virus MOL 1" (0.15); axes: NS1 and E-protein fluorescence, log scale from 10^0 to 10^4]
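A two-color dot plot such as Figure 6.20 is usually summarized by counting the dots that fall in each quadrant. The following is a minimal sketch of that bookkeeping, assuming each cell is reported as a pair of intensities and that a single gate value per channel separates "positive" from "negative"; the function name, the gate values, and the quadrant labels are hypothetical and are not taken from the cited study.

```python
def quadrant_percentages(cells, e_gate, ns1_gate):
    """Return the percentage of cells in each quadrant of a two-color dot plot.

    cells: iterable of (e_protein, ns1) fluorescence intensities, one pair per cell.
    """
    counts = {"E+NS1+": 0, "E+NS1-": 0, "E-NS1+": 0, "E-NS1-": 0}
    total = 0
    for e_protein, ns1 in cells:
        key = ("E+" if e_protein >= e_gate else "E-") + ("NS1+" if ns1 >= ns1_gate else "NS1-")
        counts[key] += 1
        total += 1
    if total == 0:
        raise ValueError("no cells were measured")
    return {quadrant: 100.0 * n / total for quadrant, n in counts.items()}


if __name__ == "__main__":
    measured = [(500.0, 700.0), (5.0, 3.0), (4.0, 900.0), (2.0, 1.0)]
    print(quadrant_percentages(measured, e_gate=100.0, ns1_gate=100.0))
```

In this toy model, the "E+NS1+" quadrant corresponds to the double-positive cells that the caption interprets as infected.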

slide 207:

Most vaccines are simply the disease agent killed with high heat or denatured chemically. Heat or chemical treatment inactivates the virus or bacterium so it cannot cause disease, yet enough of the original structure remains to stimulate immunity. When the live agent infects the vaccinated person, memory B cells are activated and the disease is suppressed. Such whole vaccines elicit the best immune response, but many disease agents cannot be isolated or cultured to

In the United States, infants receive vaccines for many different illnesses, including diphtheria, tetanus, pertussis (whooping cough), measles, mumps, rubella, chickenpox, polio, and hepatitis A and B. All these vaccines are given to children before they enter school. The list is long, but many of the vaccines are combined into one shot. Paradoxically, the effectiveness of vaccines has made many question their use. Many argue that vaccines are not needed because so few people actually get these diseases. It is easy to forget that the reason why very few people get diphtheria or measles is that so many are vaccinated. In 1980, about 4 million people contracted measles, but only about 10% of the world population had received measles vaccine. In 2012, about 122,000 cases of measles were recorded in the world, but about 84% of the world population had received the vaccine for measles. The percentage of children receiving the vaccine has increased from 72% in 2000 to 84%, which has resulted in a 78% decrease in the number of children diagnosed with measles worldwide. In the United States, the number of cases of measles varies from year to year, with only 37 in 2004 and 220 in 2011. Overall, these numbers are low, so if you are not vaccinated, the likelihood of contracting measles is very slim. However, the more people who opt not to vaccinate their children, the more cases of the disease there will be, and those who remain unvaccinated will gradually be at increased risk.
In 2014, a record number of measles cases was reported in the United States, with a total of 644 cases from 27 states reported to the CDC. In early 2015, a multi-state outbreak of measles occurred and was linked to an amusement park in California. The outbreaks are probably due to an unvaccinated child from another country and demonstrate that the more people who do not receive a vaccine, the more easily a disease can spread. These types of outbreaks raise awareness that a vaccine protects not just the individual but also the community, especially those persons who cannot receive vaccines because of other health issues or allergies to vaccine components. If these individuals are surrounded by vaccinated people, they are less likely to contract the disease.

Other vaccines have been eliminated from the childhood immunization schedule because the diseases have been eradicated. For example, so many people across the world were vaccinated against smallpox that the disease was not seen at all for years. Now smallpox vaccine is no longer given to the entire population. The only smallpox virus that exists is kept in two different labs, one in the United States and one in Russia. The fear of smallpox re-emerging as a disease is always present, but mass immunization is not needed while there are no people with smallpox.

Other vaccines have the opposite issue: even with widespread vaccination for pertussis, the number of cases of whooping cough is on the rise. In 2012 there were 48,277 cases and 20 deaths from infection, whereas in 2002 the Centers for Disease Control reported 9,771 cases in the entire United States. Unfortunately, the deaths were mainly in infants who were too young to be immunized for pertussis. To prevent babies from contracting pertussis, doctors now urge pregnant mothers to get re-immunized for the disease.
In addition, another booster shot is now recommended for teens and for any adults who did not receive a booster shot of pertussis vaccine in their teens. Many different theories try to explain the increase in whooping cough. Some attribute it to the use of a more sensitive test to diagnose whooping cough, and others suggest it may be a natural cycle of B. pertussis pathogenicity. Still others attribute the increase to waning immunity: once a child receives the last booster shot at age 5, the immunity to whooping cough wanes after about 10 years.

Vaccines cause some adverse side effects. In most cases, vaccines cause a local reaction (pain and swelling at the injection site). Other possible side effects are systemic, perhaps a fever or a mild form of the disease, as is the case with the flu shot. Some vaccines can cause allergic reactions because of impurities in the vaccines. Some vaccines are made in eggs, and traces of egg proteins may remain in the vaccines. Often people with allergies to eggs still tolerate the vaccine, but some may have an allergic reaction. Another potentially allergenic component is gelatin. Of course, anyone allergic to a vaccine component cannot be vaccinated and therefore relies on the surrounding people being vaccinated for protection from the disease.

Other safety concerns about vaccines are based on the preservatives. Until 1999, the most common preservative was thimerosal, a mercury-containing compound. Thimerosal can cause allergic reactions in some children and has also been thought to cause autism. Unfortunately, the timing of autism diagnosis and the timing of the vaccines coincide, and therefore many people believe that the shot was the cause of the onset of autism. Although the timing is coincident, there are no studies that show that the vaccine causes autism. The number of children that develop autism is identical for those who are vaccinated and those who are not.
The cause of autism is an extremely active area of research, and hopefully some answers will identify the true cause of this devastating diagnosis.

Box 6.1 Vaccine Safety

slide 208:

FIGURE 6.21 Whole Vaccines Include Killed or Attenuated Pathogens
(A) High heat or chemical treatment kills pathogens but leaves enough antigens intact to elicit an immune response. Once the person is exposed to a dead virus or bacterium, memory B cells are established and prevent the live pathogen from making the person sick. (B) Attenuated viruses or bacteria have been mutated or genetically engineered to remove the genes that cause illness. The immune system generates antibodies to kill the attenuated pathogen and establishes memory B cells that prevent future attack.
[Figure labels: (A) virus; heat-denatured virus. (B) bacterium; virulence proteins; chromosome; virulent gene is deleted; other virulence gene mutated; attenuated bacteria. Virulent: causes disease and immune reaction. Attenuated: elicits immune reaction without disease.]

Box 6.2 Cowpox and Smallpox
Infection with cowpox produces only mild disease but gives immunity to the frequently fatal smallpox. In medieval times, a substantial proportion of the population caught smallpox. About 20% to 30% of those infected died, and the survivors ended up with ugly pockmarks on their faces, hence the name smallpox. Milkmaids rarely suffered from smallpox because most had already caught cowpox from their cows. Consequently, milkmaids were seldom pockmarked and gained a reputation for beauty due to their unblemished skin. This observation led to Edward Jenner’s classic experiments, in which he inoculated children with cowpox and demonstrated that inoculation protected against infection with smallpox. The term vaccination is derived from vacca, the Latin for “cow.”

make whole vaccines. Other times, the cost of culturing the pathogen is prohibitive. Moreover, growing live viruses is a dangerous job, with potential exposure of lab workers. With these limitations in mind, many different strategies have been developed to make improved vaccines.
Attenuated vaccines are still-living pathogens that no longer express the toxin or proteins that cause the disease symptoms (Fig. 6.21). Sometimes viruses or bacteria are genetically engineered to remove the genes that cause disease. Other attenuated vaccines are related but nonpathogenic strains of the infectious agent (see Box 6.2). Making attenuated virus does not pose the same risks as growing virulent live virus. However, much research is needed to identify the genes that cause disease. Another disadvantage is that an attenuated virus might revert to the pathogenic version, especially if it has only one of the disease-causing antigens destroyed or mutated.

Subunit vaccines are directed against one component or protein of the disease agent rather than the whole agent (Fig. 6.22). Subunit vaccines are available only because of recombinant DNA technology. The first step in creating a subunit vaccine is identifying a potential protein

slide 209:

Killed pathogens, attenuated pathogens, and single proteins or epitopes from a disease-causing pathogen are used as vaccines. They are isolated and injected into people to elicit an immune response without causing the disease. Multivalent vaccines contain antigens from different proteins of a pathogen or family of pathogens.

or part of a protein that elicits a good immune response. Most subunit vaccines are made from proteins found on the outer surface of the virus or bacterium because they elicit the strongest immune response. Experiments must be done to evaluate the protein chosen for the subunit vaccine. Once a suitable protein is identified, its gene is isolated and then expressed in cultured mammalian cells, eggs, or some other easily maintained system. The target protein is isolated from other proteins and used to immunize mice to test its effectiveness. After extensive testing in animals, the purified protein can be used as a vaccine.

Sometimes subunit vaccines fail, perhaps because the protein does not form the correct structure when expressed in mammalian cells or eggs. In these cases, peptide vaccines are created. These vaccines use just a small region of the protein. Since such peptides are small, they are conjugated to a carrier or adjuvant to stimulate a stronger immune response (Fig. 6.23).

Other vaccines target multiple proteins from a virus, or multiple related viruses, in one dose to decrease the number of immunizations administered. These multivalent vaccines are common and include the flu vaccine and the MMR vaccine (measles, mumps, and rubella). These vaccines have antigens from a number of different related viruses. In the case of the flu vaccine, heptavalent forms include antigens from the seven most commonly found strains of influenza circulating in the population. Injection of the different antigens elicits an immune response to each of the different types.
Unfortunately, influenza viruses evolve and change rapidly, so although the vaccine will protect the person from the seven known strains, a newly developed influenza type could still cause an infection in an immunized patient.

FIGURE 6.22 Subunit Vaccines Rely on a Single Antigen
A single antigenic protein from a pathogen is identified, and its gene is cloned into an expression vector. The gene is expressed in cultured mammalian cells such as Chinese hamster ovary (CHO) cells, and the protein is isolated, purified, and used as a vaccine.
[Figure labels: virus; clone gene for surface antigen; viral gene; nucleus; CHO cells; isolate secreted viral protein; purified protein used as a vaccine]

MAKING VECTOR VACCINES USING HOMOLOGOUS RECOMBINATION

Another method of displaying a foreign antigen for use as a vaccine is the vector vaccine. Here, genetic engineering creates a nonpathogenic virus or bacterium that expresses an antigen from the disease-causing virus. When this virus or bacterium infects a person, it induces immunity both to the nonpathogenic microorganism and to the attached antigen.

slide 210:

For example, vaccinia virus is a nonpathogenic relative of the smallpox virus. Vaccination with vaccinia virus was so effective that smallpox was eradicated. If vaccinia virus expresses an antigen from another deadly virus, the person vaccinated would gain immunity to smallpox and the other virus at the same time. Indeed, multiple genes could be inserted, conferring resistance to multiple diseases. The benefit of using vaccinia virus is that it is very potent and stimulates development of both B cells and T cells.

Inserting genes into the vaccinia genome is awkward because the genome has very few restriction enzyme sites, but genes can be added using homologous recombination (Fig. 6.24). In homologous recombination, two segments of similar, or homologous, DNA sequence align, and one strand of each DNA helix is broken and exchanged to form a crossover. A single crossover creates a hybrid molecule; if two crossovers occur close together, entire regions of DNA are exchanged. During homologous recombination in vaccinia, a region of single-stranded DNA is generated from a double-stranded break in the incoming new gene. The single-stranded region invades the double helix of the vaccinia genome to form a triple helix. One of the strands from vaccinia is then free to hybridize to the single-stranded homologous region on the incoming gene. If this occurs on both sides, the foreign gene is inserted into the vaccinia genome.

There are many examples of vaccines that use vaccinia virus as a way to stimulate an immune reaction. For example, a pentavalent vaccinia virus that expresses five different antigens to protect patients from H5N1 influenza virus is under development. The antigens include H5 hemagglutinin, N1 neuraminidase, a nucleoprotein (NP), and two matrix proteins (M1 and M2). In addition, the adjuvant contains IL-15, a cytokine produced by the immune system, which functions to stimulate natural killer cells and the innate immune response.
Mice that were injected with the IL-15 mixed with adjuvant had a higher serum concentration of antibodies to the five influenza antigens. These antibodies were produced faster and in greater numbers than in the control mice, which received only the vaccinia vector vaccine without IL-15, suggesting this strategy could provide quicker and stronger immunity to the flu (Fig. 6.25).

Changing a harmless virus or bacterium so that it expresses a protein from a disease-causing pathogen on its surface can trick the immune system into making antibodies to the disease-causing pathogen.

FIGURE 6.23 Peptide Vaccines Are Conjugated to Carrier Proteins
Peptide vaccines are small regions of an antigenic protein from a pathogen. The peptide is often an epitope that elicits a strong immune response. Because the peptide is small, multiple peptides are conjugated to a carrier protein to prevent degradation and to stimulate the immune system.
[Figure labels: viral protein antigen; antigenic epitope; clone and express antigenic epitope; purify peptide; link peptides to carrier; linker; carrier protein]

slide 211:

REVERSE VACCINOLOGY

Many genomes from infectious agents have now been sequenced. Reverse vaccinology takes advantage of this information to find new antigens for use in immunization (Fig. 6.26). The primary research begins with cloning each of the genes from the infectious organism into an expression library. Each of the proteins in the library is expressed and isolated. Complex mixtures of these different proteins are screened in mice for an immune response, and when a pool induces a response, the proteins are subdivided until each protein is tested for stimulating the immune system and for its ability to protect the mice from the actual infectious agent. The proteins that elicit the best response can either be combined into a subunit vaccine or used as separate vaccines.

Reverse vaccinology has been used to create a vaccine for Neisseria meningitidis serogroup B, which is a major cause of meningitis in children. Attenuated bacteria were not effective as vaccines, and until the sequencing of the N. meningitidis genome, no vaccine was available. A library of 350 different N. meningitidis proteins was expressed in E. coli and purified. Each was individually assessed computationally for its ability to induce an immune response. Surface proteins were then screened to see if they elicited an immune response. Of the 350 tested proteins, only 29 became potential candidates. The three most promising are factor H-binding protein (fHbp), Neisserial heparin-binding antigen (NHBA), and Neisserial adhesin A (NadA). At the time this chapter was written, a fusion protein of NHBA, fHbp, and NadA had been evaluated and was so far successful, but it was still awaiting full approval. Without the ability to sequence genomes, vaccine development was often impossible, but now new and emerging diseases can be studied to find potential vaccines.
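The pooled screen described above, in which a responding pool is subdivided until individual proteins are tested, resembles a divide-and-conquer search. The sketch below is a toy model under two stated assumptions: pools are split in half at each step (the text says only that they are "subdivided"), and a pool responds exactly when it contains at least one immunogenic protein. The callback `induces_response` is a hypothetical stand-in for the mouse immunization experiment.

```python
def find_immunogenic(pool, induces_response):
    """Recursively subdivide a responding pool until single proteins are identified."""
    if not pool or not induces_response(pool):
        return []            # a non-responding pool is discarded as a whole
    if len(pool) == 1:
        return list(pool)    # a responding pool of one protein is a hit
    middle = len(pool) // 2
    return (find_immunogenic(pool[:middle], induces_response)
            + find_immunogenic(pool[middle:], induces_response))


if __name__ == "__main__":
    # Toy data: which proteins are immunogenic is unknown in a real screen.
    hidden_hits = {"protein_3", "protein_7"}
    library = [f"protein_{i}" for i in range(10)]

    def mouse_assay(pool):
        return any(protein in hidden_hits for protein in pool)

    print(find_immunogenic(library, mouse_assay))
```

The appeal of pooling is that non-responding pools are discarded in a single test, so far fewer animal experiments are needed than when every protein is tested on its own.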
Reverse vaccinology uses the expressed genomic sequences to find new potential vaccines. Normal vaccines are created using the pathogenic organism. The term reverse refers to the use of expressed DNA rather than purified proteins from the organism itself.

FIGURE 6.24 Homologous Recombination Adds New Genes to the Vaccinia Genome
The plasmid contains two regions homologous to the virus thymidine kinase gene, one on each side of the cloned antigen gene. When the plasmid aligns with the vaccinia genome, the regions of homology elicit a recombination event. The recombinant vaccinia will acquire the cloned antigen gene and lose the gene for thymidine kinase.
[Figure labels: plasmid with vaccinia virus promoter and cloned antigen gene flanked by vaccinia virus thymidine kinase DNA; vaccinia virus DNA with thymidine kinase gene; recombinant vaccinia virus]
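The outcome of the double crossover shown in Figure 6.24 can be mimicked with plain string operations: whatever lies between the two homology arms in the genome is replaced by whatever lies between the same arms on the plasmid. This is only a sketch of the end result, not of the strand-invasion mechanism, and the sequences and the function name `double_crossover` are invented for illustration.

```python
def double_crossover(genome: str, left_arm: str, insert: str, right_arm: str) -> str:
    """Replace the genome segment between two homology arms with the plasmid insert.

    Returns the genome unchanged if either arm cannot be found in the right order.
    """
    left_start = genome.find(left_arm)
    if left_start == -1:
        return genome
    left_end = left_start + len(left_arm)
    right_start = genome.find(right_arm, left_end)
    if right_start == -1:
        return genome
    # Keep everything up to the end of the left arm, splice in the insert,
    # then resume at the start of the right arm.
    return genome[:left_end] + insert + genome[right_start:]


if __name__ == "__main__":
    left, right = "ATGGCC", "TTGACA"          # toy thymidine kinase flanks
    tk_core = "GGGGGG"                        # toy interior of the kinase gene
    vaccinia = "CCTA" + left + tk_core + right + "AGTC"
    antigen_cassette = "acgtacgt"             # toy promoter plus antigen gene
    print(double_crossover(vaccinia, left, antigen_cassette, right))
```

As in the figure, the recombinant keeps both flanking regions but loses the interior of the thymidine kinase gene, which is why the recombinant virus lacks a functional kinase.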

slide 212:

FIGURE 6.25 Amount of Antibodies to H5 Hemagglutinin after Vaccination with Flu Vaccine with or without IL-15
The amount of antibodies to H5 hemagglutinin was measured by ELISA and recorded as OD at 450 nm. The serum from mice before vaccination (pre-bleed, black line) was compared to that of mice vaccinated with commercially available H5N1 vaccine (Aventis H5N1, green line) and mice vaccinated with the pentavalent vaccine to H5N1 (Wyeth/IL-15/5Flu, red line, and Wyeth/mut IL-15/5Flu, blue line). The addition of IL-15 (red) stimulated a greater amount of antibodies to H5 hemagglutinin in comparison to mice stimulated with a mutated version of IL-15 (blue) that does not stimulate the immune system. Serum titers were compared on four different days after vaccination: 6, 9, 12, and 28. With the pentavalent vaccine, antibodies to H5 hemagglutinin appeared at days 9 to 12 after vaccination, whereas the commercially available vaccine took 28 days for the same response. From Poon LL, et al. (2009). Vaccinia virus-based multivalent H5N1 avian influenza vaccines adjuvanted with IL-15 confer sterile cross-clade protection in mice. J Immunol 182, 3063–3071. Reprinted with permission.
[Figure panels: Day 6, Day 9, Day 12, Day 28; x-axis: serum dilution (500 to 4000); y-axis: OD at 450 nm (0.0 to 0.8); legend: Wyeth/IL-15/5Flu, Wyeth/mut IL-15/5Flu, Aventis H5N1, Prebleed]

IDENTIFYING NEW ANTIGENS FOR VACCINES

Another approach to creating vaccines is to identify bacterial pathogen genes that are expressed when the pathogen enters the host. These genes usually encode proteins that are different from surface antigens. They encompass a variety of adaptations the pathogen makes in order to live within the host organism.
Typically, bacteria that enter animals are engulfed by phagocytes and digested by the enzymes within the lysosome. Some pathogens are engulfed as usual but avoid digestion. Modifications required to live intracellularly include changes in nutrition and metabolism and mechanisms to protect against host attacks. Many different types of genes are needed for this switch, and the products of these genes are potential antigens for vaccine development.

Traditionally, identifying genes that are expressed only in host cells relies on gene fusions. Suspected genes or their promoters are genetically fused to a reporter such as β-galactosidase, luciferase, or green fluorescent protein (GFP), or to epitope tags such as FLAG or myc (see Chapter 9). The fusion gene is introduced into the pathogenic organism, which is then allowed to infect the host. The amount of reporter gene expression correlates with the expression level of that particular genomic region, whether it is a promoter or an actual gene with its promoter. For example, if a gene linked to GFP increases in fluorescence after host cell invasion, then the target gene is a potential vaccine candidate because it may be important for bacterial pathogenesis.

Individual gene fusions are fine for suspected genes, but screening for novel genes with this method would be tedious. Instead, differential fluorescence induction (DFI) uses a combination of GFP fusions and FACS sorting (see earlier discussion) to identify novel genes involved with host invasion (Fig. 6.27). First, a library of genes or genomic fragments from the pathogenic organism is genetically linked to GFP. The library is transformed into bacterial cells, where the gene fusions are expressed to give GFP. The bacteria are then given a specific stimulus related to host invasion. For example, when phagocytes engulf them, bacteria leave a neutral environment (pH 7) and enter a compartment that is acidic (pH 4). To

slide 213:

determine whether the pH change induces gene expression, the bacteria with the fusion library are shifted to an acidic environment. They are then sorted using FACS to collect those with high GFP expression. If the novel gene fused to GFP is truly induced by low pH, its GFP levels should drop when it is shifted back to neutral pH. Therefore, the cells with high GFP expression are shifted to pH 7 and re-sorted, but this time bacteria with low levels of GFP are collected. The smaller pool of bacteria is again stimulated with low pH and sorted, collecting those with high GFP expression. This sorting scheme eliminates genes that are constitutively expressed plus those that are not induced by low pH. The remaining genes are acid-induced genes that adapt the organism to living within the host. They may then be evaluated as antigens for vaccine development.

Another method to identify new antigens for vaccine development is in vivo induced antigen technology (IVIAT) (Fig. 6.28). This method takes serum from patients who have been infected with a particular disease for which a vaccine is needed. The serum is a rich source of antibodies against the chosen disease agent. The serum is then mixed with a sample of the disease-causing microorganism grown in culture. Doing so removes antibodies that bind to cell-surface proteins expressed by the cultured microorganism. This process leaves a pool of antibodies against proteins that are expressed only during infection. To identify the proteins corresponding to these antibodies, scientists construct a genomic expression library containing all the genes from the microorganism. The library is expressed in E. coli and is probed with the remaining antibodies. When an antibody matches a library clone, the gene insert is sequenced to identify the protein antigen. This method directly identifies protein antigens that stimulated antibody production during a genuine infection; therefore, antigens identified by this method are likely vaccine candidates.
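The logic of IVIAT amounts to a set subtraction followed by a lookup. In the sketch below, every antibody is represented simply by the name of the protein it recognizes, which is a large simplification; the function name, argument names, and example protein names are all hypothetical.

```python
def iviat_candidates(serum_targets, cultured_proteins, expression_library):
    """Return library clones recognized by antibodies that survive preabsorption.

    serum_targets: proteins recognized by antibodies in the patient serum.
    cultured_proteins: proteins displayed by the organism grown in culture.
    expression_library: dict mapping clone id -> protein encoded by its insert.
    """
    # Mixing serum with the cultured organism removes antibodies to its proteins.
    remaining = set(serum_targets) - set(cultured_proteins)
    # Probe the library: keep clones whose protein is bound by a remaining antibody.
    return sorted(clone for clone, protein in expression_library.items()
                  if protein in remaining)


if __name__ == "__main__":
    serum = ["pilin", "acid_shock_protein", "iron_uptake_protein"]
    in_culture = ["pilin", "flagellin"]
    library = {"clone_01": "pilin", "clone_02": "acid_shock_protein",
               "clone_03": "flagellin", "clone_04": "iron_uptake_protein"}
    print(iviat_candidates(serum, in_culture, library))
```

The clones that survive are the ones whose inserts would be sequenced to identify the infection-specific antigens.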
FIGURE 6.26 Reverse Vaccinology
Reverse vaccinology uses the genes identified in the genome of pathogenic agents. First, the genes are cloned into expression vectors and expressed to give proteins. Each potential antigen is screened for an immune response.
[Figure labels: expression library of genes from infectious organism; isolate proteins; check each protein for immune response in mouse]

Pathogens must change their metabolism when changing from a free-living organism to the environment within their host. The proteins that help the pathogen adapt to this switch are potential proteins to which a vaccine could be made.

DFI and IVIAT are two techniques to identify proteins that allow pathogens to live within an organism. DFI fuses the potential proteins to fluorescent tags and selects the clones that are expressed only inside the organism. IVIAT uses serum from patients who have been infected with the pathogen to find the antibodies that bind intracellular pathogenic proteins.
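The three-round DFI sorting scheme (collect high GFP at low pH, collect low GFP at neutral pH, then collect high GFP at low pH again) can be sketched as successive filters. The model below assumes each clone has one fixed GFP level per pH and that a single gate value separates "high" from "low"; the `Clone` type, the gate of 100, and the example clones are all hypothetical.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    gfp_at_ph4: float   # fluorescence after the shift to acidic conditions
    gfp_at_ph7: float   # fluorescence at neutral pH


def dfi_screen(clones, gate: float = 100.0):
    """Return names of clones whose GFP fusion is induced only at low pH."""
    # Round 1: shift to low pH and keep cells with high GFP.
    high_in_acid = [c for c in clones if c.gfp_at_ph4 >= gate]
    # Round 2: shift back to neutral pH and keep cells with low GFP
    # (this removes constitutively expressed fusions).
    low_at_neutral = [c for c in high_in_acid if c.gfp_at_ph7 < gate]
    # Round 3: shift to low pH again and keep cells with high GFP.
    confirmed = [c for c in low_at_neutral if c.gfp_at_ph4 >= gate]
    return [c.name for c in confirmed]


if __name__ == "__main__":
    library = [
        Clone("constitutive", 900.0, 880.0),
        Clone("never_expressed", 5.0, 4.0),
        Clone("acid_induced", 750.0, 12.0),
    ]
    print(dfi_screen(library))
```

In this deterministic model the third round changes nothing; at the bench it serves to remove cells that slipped through an earlier sort by chance.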

slide 214:

FIGURE 6.27 Differential Fluorescence Induction (DFI)
First, genes from the pathogen of interest are cloned in frame with the GFP gene. The fusion proteins are then expressed in bacteria. The entire recombinant bacterial population is exposed to low pH. The bacteria expressing GFP are isolated. These clones either express the GFP protein constitutively or were induced by the low pH. To isolate the clones that are expressed only at low pH, scientists shift the green cells to neutral pH, and this time they keep only the colorless cells. Repeating this procedure will ensure a pure set of genes that are induced only under low pH.
[Figure labels: bacteria; plasmid; chromosome; random DNA segments from infectious organism; gene for green fluorescent protein (GFP); shift to low pH and save all expressing GFP; shift to neutral pH and save all not expressing GFP; potential acid-induced genes and potential antigens for a vaccine]

DNA VACCINES BYPASS THE NEED TO PURIFY ANTIGENS

The principle of the DNA vaccine is to administer just DNA that encodes appropriate antigens instead of providing whole microorganisms or even purified proteins. Naked DNA vaccines consist of plasmids carrying the gene for the antigen under control of a strong promoter. The immediate early promoter from cytomegalovirus is often used because of its strong expression. The DNA is then injected directly into muscle tissue. The foreign genes are expressed for a few weeks, and the encoded protein is made in amounts sufficient to trigger an immune response. The immune response is localized to the chosen muscle, which helps avoid side effects. In addition, purified DNA is much cheaper to prepare than purified protein and can be stored dry at room temperature, avoiding the need for refrigeration.

The best method of delivering DNA is attaching it to a microparticle with a cationic surface (Fig. 6.29) because the surface binds to the negatively charged phosphate backbone.
After the DNA-coated microparticle enters the cells, the DNA is slowly released from the bead and is then expressed as protein. The slow release of DNA elicits a better immune response than a large direct dose of DNA because the immune system has to create more and more antibodies to the proteins.

slide 215:

One problem with DNA vaccines is that certain DNA sequences may induce an immune response directly. In particular, some DNA sequence motifs found in bacterial DNA may elicit strong immune responses, which in turn may cause the body to target its own DNA, thus generating an autoimmune response.

Rather than injecting a protein, some vaccines are simply the DNA of a gene that will elicit an immune response. After the DNA enters the cell, it is expressed as protein, which elicits the immune response to create memory B cells to that protein.

FIGURE 6.28 In Vivo Induced Antigen Technology (IVIAT)
Finding novel antigens to make a new vaccine relies on identifying proteins that elicit an immune response. IVIAT identifies antigens directly from patients who have been exposed to the pathogenic organism. First, an expression library is established that includes each of the genes from the pathogen of interest. Next, serum from infected patients is collected and preabsorbed to the infectious organism grown in culture to remove the antibodies that recognize surface proteins. The remaining antibodies are used to screen the expression library. When an antibody recognizes a cloned protein, the specific DNA clone is sequenced to identify the gene product.
[Figure labels: infectious organism; serum from infected patient with antibodies; remove antibodies to surface proteins of infectious organism; expression library from infectious organism; E. coli expressing each gene; isolate plasmid and sequence insert]

slide 216:

EDIBLE VACCINES

Many vaccines are susceptible to heat and degrade when not refrigerated. In developed countries this is not an issue, but in developing countries proper storage is hard to find. In addition, needles and qualified personnel are needed to administer injected vaccines. An alternative to injection is to use oral vaccines. These vaccines are taken by mouth in liquid or pill form. Of course, the antigen that is delivered orally must not be degraded by digestive enzymes and must still stimulate the immune system. One example is the oral polio vaccine, which contains live attenuated polio virus, whereas the injected polio vaccine contains inactivated virus. The advantage of the oral vaccine is that the attenuated viruses colonize the intestine and stimulate the immune system the same way that the virulent form of polio would. The disadvantage is the possibility that the live attenuated virus may convert back to a virulent form and the recipient would get polio. The estimate for this happening is 1 virulent dose in 2.5 million. Where polio itself is very rare, this risk is too great. Most children in the United States now receive the inactivated form of polio vaccine.

Another method of creating heat-stable, low-cost vaccines is to express the antigens in plants and then eat the plant. The benefits of edible vaccines include being able to “manufacture” the vaccine in large quantities cheaply. The patient has to eat a certain portion of plant tissue to acquire immunity. Distributing the vaccine in developing countries is easy, and storage is the same as for standard crops. Recent advances in expressing foreign proteins in plants (see Chapter 14) have facilitated the development of edible vaccines. Genetically engineered potatoes containing a hepatitis B vaccine have entered human trials. The volunteers ate finely chopped chunks of raw potato expressing a surface protein from hepatitis B.
Sixty percent of those who ate the vaccine had more antibodies against hepatitis B. All participants had previously received the traditional vaccine, so the potato vaccine simply boosted immunity.

The main drawback of using vaccines in a food source is the possibility of the vaccine vegetables being confused with normal vegetables and used as food. Instead of food-based vaccines, researchers are now developing heat-stable oral vaccines. Instead of crops like corn and potato, other edible plants are being developed to express the vaccine. One potential plant is Nicotiana benthamiana, a relative of tobacco that is edible but is not used for food. Another potential organism is the single-celled alga Chlamydomonas reinhardtii, which is a great model organism for studying how cilia form and function. In addition, Chlamydomonas is useful for studying chloroplast function.

Edible vaccines are either live attenuated virus, like the oral polio vaccine, or an antigenic protein that is expressed in a food.

FIGURE 6.29 DNA-Coated Microbeads
Microbeads are coated with plasmid DNA encoding an antigen gene and injected into a patient. Once inside the cells, the plasmid DNA is slowly released, and the protein antigen is expressed over a period of time. The expressed protein elicits an immune response without causing disease, thereby vaccinating the person against future exposures to the pathogen.
[Figure labels: double-stranded plasmid DNA; microbead]

Summary

The immune system has two different components: humoral immunity and cell-mediated immunity. Humoral immunity involves antibodies, which are produced by B cells and are found in the serum and other bodily fluids. The antibodies have a general Y shape that consists of two heavy chains and two light chains. The hinge region of the Y divides the molecule into the Fc (constant) and Fab (variable) regions. Cell-mediated immunity involves the activation of T cells, a subset of white blood cells. The T cells become active when a pathogen invades a cell

slide 217:

and the cell starts presenting fragments of the pathogen on the cell-surface major histocompatibility complexes. In both arms of the immune reaction, the antibody or T cell recognizes only small epitopes, or distinct regions, of the pathogenic proteins. The immune system can make many different antibodies to one protein because only these small areas are recognized.

In the laboratory, antibodies can be made to specific proteins by injecting an animal such as a mouse or rabbit with a pure sample of the protein. To make monoclonal antibodies, scientists fuse mouse B cells to immortal myeloma cells to make hybridomas. Each B-cell fusion makes an antibody to one specific epitope of the protein. Polyclonal antibodies, on the other hand, include all the antibodies to the protein; that is, the antibodies recognize multiple epitopes. Antibodies are used in ELISA, where the amount of the target protein in a mixture can be estimated by the amount of antibody that binds. In immunohistochemistry and immunocytochemistry, an antibody to the target protein is used to localize its position within the cell. Antibodies are also used to sort samples of cells by FACS and are used to count a specific type of cell in flow cytometry.

Vaccines stimulate our immune systems to form antibodies and memory B cells without causing the disease for which the vaccine is providing protection. Vaccines are live attenuated viruses, inactivated or dead viruses, subunits of a virus, or simply peptides from a viral protein. The vaccines could also be made from a related but harmless virus or bacteria that express a protein from the pathogenic virus or bacteria. Reverse vaccines and DNA vaccines are created from genomic DNA sequences that are expressed into protein. Reverse vaccines are made in a laboratory, whereas DNA vaccines are injected directly into the muscular tissue as DNA. Also, some vaccines are made by expressing pathogenic proteins in edible crops. A person can receive resistance to the pathogen by simply ingesting these plants. New antigenic proteins are the key to making a good vaccine. DFI and IVIAT are two methods to identify potential antigenic proteins from the pathogenic organism.

End-of-Chapter Questions

1. What are antigens and antibodies?
   a. Antigens are foreign bodies and antibodies are immune system components that recognize antigens.
   b. Antibodies are foreign bodies and antigens recognize them and work to destroy them.
   c. Antigens are produced by B cells in response to antibody accumulation.
   d. Antigens are foreign bodies and antibodies are a specific cell type from the immune system.
   e. none of the above

2. Which of the following is an accurate description of B and T cells?
   a. B cells recognize antigens expressed on the surface of other cells and T cells produce antibodies.
   b. B cells are components of the cell-mediated immunity and T cells comprise the humoral immunity.
   c. Major histocompatibility complexes are associated with B cells whereas T cells produce antibodies.
   d. B cells produce antibodies and T cells recognize antigens expressed on the surface of other cells.
   e. none of the above

3. How are the variants of antibodies produced?
   a. Each variant is encoded on one gene.
   b. by post-translational modification of the antibodies

slide 218:

   c. by shuffling a small number of gene segments around
   d. by splicing the transcript into various configurations
   e. all of the above

4. Which of the following statements about antibodies is not correct?
   a. Antibodies consist of two light chains and two heavy chains.
   b. Polyclonal antibodies are derived from hybridomas.
   c. Antibodies are classified into classes and have distinct roles in the immune system.
   d. One particular antibody made from a clonal B cell is called a monoclonal antibody.
   e. Monoclonal antibodies are made by fusing B cells to myelomas, culturing the hybridomas, and screening for appropriate antigen recognition.

5. Which of the following statements about humanized antibodies is correct?
   a. Humanized antibodies to the ClfA protein of S. aureus may provide a way to eliminate the antibiotic-resistant pathogen in patients with nosocomial infections.
   b. Herceptin has been effective in treating some patients with breast cancer.
   c. Humanized monoclonal antibodies are created by removing the constant regions of mouse antibodies and replacing them with human constant regions.
   d. Full humanization of an antibody involves removing the hypervariable regions and splicing them into the heavy and light chains of human antibodies.
   e. All of the above are correct.

6. How is the creation of recombinant antibodies useful to researchers?
   a. Recombinant antibodies can be used to precisely deliver toxins, cytokines, and enzymes directly to the antigen.
   b. The production of recombinant antibodies is strictly theoretical and probably will serve no purpose to biotechnology research.
   c. Recombinant antibodies allow for more efficient production and isolation of the scFv.
   d. Recombinant antibodies can deliver toxins, cytokines, and enzymes but are disseminated throughout the organism.
   e. none of the above

7. Why is an ELISA used?
   a. to quantify the amount of a specific protein or antigen in a sample
   b. to quantify the amount of DNA in a sample
   c. to determine the amount of antibody within a sample
   d. to dilute out antibody from serum in a microtiter plate
   e. none of the above

8. Which of the following is an example of how ELISA is used?
   a. home pregnancy test
   b. detection of pathogenic organisms
   c. detection of plant diseases
   d. detection of dairy and poultry diseases
   e. all of the above

9. In which application are fluorescent antibodies used?
   a. immunocytochemistry
   b. flow cytometry

slide 219:

   c. immunohistochemistry
   d. fluorescence activated cell sorting
   e. all of the above

10. Which of the following statements about immunity is not true?
   a. Vaccines use a live infectious agent that is still capable of producing disease in order to elicit an immune response.
   b. The immune system remembers foreign antigens through memory B cells.
   c. Vaccines consist of an antigen from an infectious agent that induces an immune response.
   d. Immunity to a fatal disease can often be triggered by infection with a closely related infectious agent, as in the cases of cowpox and smallpox.
   e. Antibody-producing B cells normally live only a few days, but memory cells survive for a long time.

11. How are vaccines made so that they do not cause disease?
   a. killing the infectious agent with heat or denaturing the infectious agent with chemicals
   b. using a component or protein of the infectious agent instead of the organism itself
   c. genetically engineering the infectious agent to remove the genes that cause disease
   d. using a related but non-pathogenic strain of an infectious agent
   e. all of the above

12. What is reverse vaccinology?
   a. the removing of B cells from a person’s body, exposing them to an infectious agent in vitro, and then returning them to the body
   b. the use of expressed genes from an expression library to find proteins that elicit an immune response in mice to create new vaccine candidates
   c. the vaccination of a person with a related but non-pathogenic strain to elicit an immune response
   d. the vaccination of a person after he or she has already been exposed to the pathogen
   e. none of the above

13. What is critical to finding novel antigens for vaccine development?
   a. the growth of live infectious agents to create whole vaccines
   b. the engineering of genes to attenuate infectious agents
   c. the identification of proteins that elicit an immune response
   d. the identification of the immune system components unique to specific infectious agents
   e. none of the above

14. Which of the following statements about edible vaccines is not true?
   a. In developing countries, proper storage and availability of needles and personnel to administer the vaccine are limiting factors in vaccinating the population.
   b. Edible vaccines must not be destroyed by the digestive system and must still elicit an immune response.

slide 220:

   c. A problem with using edible vaccines is the possibility that vaccine vegetables could be mistaken for normal vegetables used as food.
   d. Edible vaccines are usually too expensive to be manufactured in large quantities.
   e. All of the above are true.

15. Which of the following is not a risk associated with vaccines?
   a. adverse side effects
   b. allergic reactions
   c. preservatives containing mercury
   d. induction of autoimmunity in some individuals
   e. all of the above are potential risks associated with vaccination

16. Which of the following is the mechanism of action for Herceptin?
   a. Herceptin binds to the HER2 promoter to prevent transcription, thus lowering amounts of receptor.
   b. Herceptin binds to intracellular HER2 proteins to prevent cancer cells from dividing.
   c. Herceptin binds to HER3 and HER4 cell surface receptors to activate the immune system.
   d. Herceptin binds to the extracellular domain of HER2 and prevents internalization and subsequent cancer cell division.
   e. Herceptin targets TNFα.

17. All of the following statements about Remicade are true except ________.
   a. Remicade targets TNFα in the joints of people with rheumatoid arthritis.
   b. Remicade is a chimeric antibody.
   c. Remicade’s mechanism of action includes enhancing the release of IL-1.
   d. Remicade is a fusion protein produced in a myeloma cell line.
   e. All of the above are true.

18. Heavy chain antibodies have major implications for therapeutic purposes because ________.
   a. they have smaller variable domains and can be purified more easily
   b. they only have heavy chains and no light chains and are more easily engineered
   c. nanobodies are easily constructed from the constant regions of these antibodies
   d. they are derived from camels and related animals and therefore would not be recognized by the human immune system
   e. they recognize much smaller antigens

19. All of the following are features or functions of nanobodies except ________.
   a. small and monomeric, lack disulfide bonds, and resist denaturation
   b. high affinity to the antigen
   c. recognize protruding and recessed paratopes
   d. cannot be humanized
   e. consist of only the VHH domain of heavy chain antibodies

slide 221:


slide 222:

CHAPTER 7
Nanobiotechnology

http://dx.doi.org/10.1016/B978-0-12-385015-7.00007-7

Introduction
Visualization at the Nanoscale
Scanning Tunneling Microscopy
Atomic Force Microscopy
Weighing Single Bacteria and Virus Particles
Nanoparticles and Their Uses
Nanoparticles for Labeling
Quantum Size Effect and Nanocrystal Colors
Nanoparticles for Delivery of Drugs, DNA, or RNA
Nanoparticles in Cancer Therapy
Assembly of Nanocrystals by Microorganisms
Nanotubes
Antibacterial Nanocarpets
Detection of Viruses by Nanowires
Ion Channel Nanosensors
Nanoengineering of DNA
DNA Origami
DNA Mechanical Nanodevices
Controlled Denaturation of DNA by Gold Nanoparticles
Controlled Change of Protein Shape by DNA
Biomolecular Motors

slide 223:

INTRODUCTION

In 1959, Richard Feynman was the first scientist to suggest that devices and materials could someday be fabricated to atomic specifications: “The principles of physics, as far as I can see, do not speak against the possibility of maneuvering things atom by atom.”

Molecular biology originated largely from the study of microorganisms. One micrometer is one millionth of a meter, and cells of Escherichia coli, the geneticist’s favorite bacterium, are roughly 1 micrometer (“micron”) in length. A nanometer is one thousandth of a micrometer (10⁻⁹ meters; Fig. 7.1). The terms micro- and nano- are both from Greek. Mikros means “small.” More imaginative is nanos, a little old man or dwarf. Pico- comes from Spanish, where it means a small quantity or beak (from Latin beccus, “beak,” ultimately of Celtic origin). Prefixes for even smaller quantities are shown in Table 7.1. As far as length is concerned, these are applicable only to subatomic dimensions. Nevertheless, when dealing with masses and volumes on the nanoscale, we may find femtograms and zeptoliters.

Recently, science has advanced into the area of nanotechnology. As the name indicates, the impetus has come from pursuing practical applications, especially in the fields of electronics and materials science, rather than a quest for theoretical knowledge. Nanotechnology involves the individual manipulation of single molecules or even atoms. Building components atom by atom or molecule by molecule in order to create materials with novel or vastly improved properties was perhaps the original goal of nanotechnologists. However, the field has expanded in a rather ill-defined way and tends to include any structures so tiny that their study or manipulation was impossible or impractical until recently. At the nanoscale, quantum effects emerge and materials often behave strangely compared to their bulk properties.

The internal components of biological cells are on the same scale as those studied by nanotechnology. As a consequence, nanotechnologists have looked to cell biology for useful structures, processes, and information. Cellular organelles such as ribosomes may be regarded as programmable “nanomachines” or “nano-assemblers.” Thus nanotechnology is spilling over into molecular biology. Much of “nanobiotechnology” is in fact molecular biology viewed from the perspective of materials science and described in novel terminology.

All chemical reactions operate at a molecular level. What distinguishes true nanotechnology is that single molecules or nanostructures are assembled following specific instructions. A ribosome does not merely polymerize amino acids into a chain. It takes specified amino acids, one at a time, according to information provided and links them in a specific order. Thus the critical properties of a nano-assembler include the ability not merely to assemble structures at the molecular level but to do so in a specific and controlled manner.

FIGURE 7.1 Size Comparisons
The objects range in size from 1 meter to 1 picometer. (Figure shows a logarithmic scale from 1 m to 1 pm with a large dog, small insect, eukaryotic cell, bacteria, T2 phage, protein, carbon atom, hydrogen atom, and electron, together with the ranges visible to the unaided eye, the light microscope, and the electron microscope.)

slide 224:

The main practical objective of nanobiotechnology is to use biological components to achieve nanoscale tasks. Some of these tasks are nonbiological and have applications in such areas as electronics and computing, whereas others are applicable to biology or medicine. The purpose of this chapter is to show, by selected examples, how biological approaches can contribute to nanoscience.

TABLE 7.1 Prefixes and Sizes

Length unit                 Meters    Examples
5.9 terameters                        Mean distance from Sun to Pluto
Terameter                   10¹²
150 gigameters                        Distance to the Sun
Gigameter                   10⁹
380 megameters                        Distance to the Moon
6.3 megameters                        Radius of the Earth
3.2 megameters                        Length of Great Wall of China
Megameter                   10⁶
Kilometer                   10³
30 meters                             Blue whale
Meter                       1         Large dog
Millimeter                  10⁻³      Small insect
Micrometer                  10⁻⁶      Bacterial cell
500 nanometers                        Wavelength of visible light
100 nanometers                        Size of typical virus
3.4 nanometers                        One turn of DNA double helix
Nanometer                   10⁻⁹      Molecules
350 picometers                        Molecular diameter of water
260 picometers                        Atomic spacing in solid copper
77 picometers                         Atomic radius of carbon; resolution limit of atomic force microscope (as of 2004)
32 picometers                         Atomic radius of hydrogen
Ångstrom (100 picometers)   10⁻¹⁰
2.4 picometers                        Wavelength of electron
Picometer                   10⁻¹²
Femtometer                  10⁻¹⁵     Radius of atomic nucleus
Attometer                   10⁻¹⁸     Radius of proton
Zeptometer                  10⁻²¹
Yoctometer                  10⁻²⁴     Radius of neutrino

Many internal components of biological cells are in the nanoscale range. As nanotechnology advances, it is developing many links with biotechnology and genetic engineering.
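The prefixes in Table 7.1 differ only by powers of ten, so converting between them is a matter of shifting the exponent. The following is a minimal Python sketch of that arithmetic; the dictionary and the function name `convert` are illustrative choices, not part of the original text.

```python
# SI length prefixes from Table 7.1, stored as powers of ten in meters.
PREFIX_EXPONENT = {
    "terameter": 12, "gigameter": 9, "megameter": 6, "kilometer": 3,
    "meter": 0, "millimeter": -3, "micrometer": -6, "nanometer": -9,
    "picometer": -12, "femtometer": -15, "attometer": -18,
    "zeptometer": -21, "yoctometer": -24,
}

def convert(value, from_unit, to_unit):
    """Convert a length between two units listed in PREFIX_EXPONENT."""
    shift = PREFIX_EXPONENT[from_unit] - PREFIX_EXPONENT[to_unit]
    return value * 10.0 ** shift

# One turn of the DNA double helix (3.4 nm) expressed in picometers.
helix_pm = convert(3.4, "nanometer", "picometer")

# A bacterial cell of roughly 1 micrometer expressed in nanometers.
cell_nm = convert(1, "micrometer", "nanometer")

print(helix_pm, cell_nm)
```

Storing exponents rather than floating-point factors keeps the table readable and avoids writing out very small numbers such as 10⁻²⁴ by hand.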

slide 225:

VISUALIZATION AT THE NANOSCALE

To manipulate matter on an atomic scale, we need to see individual atoms and molecules. Although individual molecules have been visualized with the electron microscope, it was the development of scanning probe microscopes that opened up the field of nanotechnology. These instruments all rely on a miniature probe that scans across the surface under investigation.

All scanning probe microscopes work by measuring some property, such as electrical resistance, magnetism, temperature, or light absorption, with a tip positioned extremely close to the sample. The microscope raster-scans the probe over the sample (Fig. 7.2) while measuring the property of interest. The data are displayed as a raster image similar to that on a television screen. Unlike traditional microscopes, scanned-probe systems do not use lenses, so the size of the probe rather than diffraction limits their resolution. Some of these instruments can be used to alter samples as well as visualize them.

The first of these instruments was the scanning tunneling microscope (STM), which was developed by Gerd Binnig and Heinrich Rohrer at IBM (see following section). They received the Nobel Prize in 1986. The STM sends electrons, that is, an electric current, through the sample and so measures electrical resistance. The atomic force microscope (AFM) is especially useful in biology and measures the force between the probe tip and the sample.

SCANNING TUNNELING MICROSCOPY

When a metal tip comes close to a conducting surface, electrons can tunnel from one to the other in either direction. The probability of tunneling depends exponentially on the distance apart. Surface contours can be mapped by keeping the current constant and measuring the height of the tip above the surface. This allows resolution of individual atoms on the surface being studied. This is the principle of the scanning tunneling microscope (Fig. 7.3).

Atoms may also be moved using the STM.
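The exponential dependence of tunneling on tip-to-surface distance is what gives the STM its atomic resolution. As a rough illustration, the Python sketch below uses the standard one-dimensional rectangular-barrier approximation, in which the current falls off as exp(−2κd). This model, and the barrier height of 4 eV (a typical order of magnitude for a metal work function), are assumptions introduced here and are not taken from the text.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant, J*s
ELECTRON_MASS = 9.1093837e-31   # electron mass, kg
EV = 1.602176634e-19            # joules per electron volt

def decay_constant(barrier_ev):
    """Decay constant kappa (1/m) for an electron under a barrier of given height."""
    return math.sqrt(2 * ELECTRON_MASS * barrier_ev * EV) / HBAR

def relative_current(gap_m, barrier_ev=4.0):
    """Tunneling current relative to zero gap, in the simple exp(-2*kappa*d) model."""
    return math.exp(-2 * decay_constant(barrier_ev) * gap_m)

# Compare the current at a 0.5 nm gap with the current at a 0.6 nm gap.
ratio = relative_current(0.5e-9) / relative_current(0.6e-9)
print(f"Moving the tip 0.1 nm closer raises the current about {ratio:.1f}-fold")
```

In this simplified picture, a change in gap of a single ångström alters the current several-fold, which is why holding the current constant while scanning traces the surface contours so precisely.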
In 1989, in perhaps the most famous experiment in nanotechnology, D. M. Eigler and E. K. Schweizer fabricated the IBM logo by arranging 35 xenon atoms on a nickel surface. They chose nickel because the valleys between rows of nickel atoms are deep enough to hold xenon atoms in place yet small enough to allow the xenon atoms to be pulled over the surface. To move xenon atoms, they placed the STM tip above a xenon atom using imaging mode. Next, scanning mode was turned off and the tip lowered until the tunneling current increased several-fold (“fabrication mode”). The xenon atom was attracted to the STM tip and was dragged by moving the tip horizontally. The atom was deposited at its new location by reducing the tunneling current. Since then, several diagrams have been made in the same way. Carbon monoxide man is shown in Figure 7.4.

Visualization of individual molecules or even atoms is possible using scanning probe microscopes.

FIGURE 7.2 Principle of Raster Scanning
In raster scanning, the probe moves to and fro across the target region. The probe scans only while moving in one direction (“scan”). When the probe travels in the reverse direction, it moves more rapidly without making contact (“flyback”).

From a biological perspective, the weakness of STM is that it requires a conducting surface, in practice generally a metal layer of some sort. The atomic force microscope, see the following

slide 226:

section, has the advantage of not needing conductive material and has therefore been more widely applied in biology.

ATOMIC FORCE MICROSCOPY

Visualization at the nanoscale is often performed using atomic force microscopy. As the name indicates, it operates by measuring force, not by using a stream of particles such as photons (as in light microscopy) or electrons (as in electron microscopy). Physicists sometimes compare the operation of an AFM to an old-fashioned record player, which uses a needle to scrape the surface of a record. Perhaps to a biologist the difference between a light microscope and AFM is like the difference between reading text with the eyes and feeling Braille.

FIGURE 7.3 Principle of Scanning Tunneling Microscope
The probe tip and surface atoms of the sample are shown in the inset. (Figure labels: tip atoms; sample atoms; current; probe detection mechanism; fine positioning control; coarse positioning control; feedback mechanism; display.)

The scanning tunneling microscope can be used to detect or move individual atoms on a conducting surface.

The atomic force microscope was invented in 1985 by Gerd Binnig, Calvin Quate, and Christof Gerber. The AFM uses a sharp probe that moves over the surface of the sample and that bends in response to the force between the tip and the sample. The movement of the

slide 227:

probe performs a raster scan, and the resulting topographical image is displayed on-screen. During scanning, the movement of the tip or sample is performed by an extremely precise positioning device that is made from piezoelectric ceramics. These are materials that change shape in response to an applied voltage. The positioner usually takes the form of a tube scanner that is capable of sub-Ångstrom resolution in all three directions.

The AFM probe is a tip on the end of a cantilever. As the cantilever bends because of the force on the tip, a laser monitors its displacement, as shown in Figure 7.5. The beam from the laser is reflected onto a split photodiode. The difference between the A and B signals measures the changes in the bending of the cantilever. For small displacements, the displacement is proportional to the force applied. Hence the force between the tip and the sample can be derived.

The distance between tip and sample is adjusted so that it lies in the repulsive region of the intermolecular force curve; that is, the AFM probe is repelled by its molecular interaction with the surface. The repulsion gives a measure of surface topography, and this is what is generally displayed, with color-coding indicating relative height. It is possible to scan a surface for topography and then raise the AFM probe and rescan to detect electrostatic or magnetic forces. These can then be plotted for comparison with the topography.

As with the STM, it is possible to use the AFM to move single atoms, although this was only achieved in 2003. Researchers at Osaka University in Japan removed a single silicon atom from a surface and then replaced it.

Through use of the AFM it is possible to visualize polymeric biological molecules such as DNA or cellulose, and even to see the individual monomers and, at high resolution, even the atoms of which they are composed.
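The chain from photodiode signal to force described above can be made concrete with a short calculation. The Python sketch below assumes that the A − B difference is normalized by the total signal (a common practice, but not stated in the text) and that a calibration factor converts this normalized signal to cantilever deflection. The function names and all numerical values (sensitivity, spring constant, photodiode readings) are hypothetical.

```python
def normalized_signal(a, b):
    """Difference between the two photodiode halves, normalized by total light."""
    return (a - b) / (a + b)

def tip_force(a, b, sensitivity_m, spring_constant_n_per_m):
    """Estimate the tip-sample force from split-photodiode readings.

    sensitivity_m: assumed calibration, meters of deflection per unit of
        normalized signal (hypothetical value supplied by the user).
    spring_constant_n_per_m: cantilever stiffness; for small displacements
        force = stiffness * deflection.
    """
    deflection = sensitivity_m * normalized_signal(a, b)
    return spring_constant_n_per_m * deflection

# Hypothetical readings: slightly more light falls on half A than on half B.
force = tip_force(a=1.02, b=0.98, sensitivity_m=50e-9,
                  spring_constant_n_per_m=0.1)
print(f"Estimated force: {force:.2e} N")
```

With these made-up numbers the cantilever deflects by about one nanometer and the force comes out at roughly a tenth of a nanonewton; real instruments require the sensitivity and stiffness to be calibrated for each cantilever.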
Perhaps more interesting is the use of AFM to monitor the formation of biological complexes, such as the binding of proteins to single molecules of DNA. An example is the formation of the complex between DNA and the bacterial protein RecA, which is involved in repair of damaged DNA (Fig. 7.6). As shown, RecA assembles onto the DNA starting at one end. Eventually the whole length of the DNA molecule will be coated.

FIGURE 7.4 Carbon Monoxide Man by Zeppenfeld
The atoms were arranged by STM. The medium is carbon monoxide on platinum. Courtesy of International Business Machines Corporation. © 1995 IBM.

The atomic force microscope can detect atoms or molecules by scanning a surface for shape or electromagnetic properties.

FIGURE 7.5 The Atomic Force Microscope (AFM)
The deflection of the tip of the probe by the surface is monitored by a laser. (Figure labels: laser beam; position-sensitive detector; probe tip.)

WEIGHING SINGLE BACTERIA AND VIRUS PARTICLES

It has been known for many years that bacteria are on the order of 1000 nanometers in size and 1 picogram in weight. However, in addition to detecting microorganisms via nanotechnology, it is now possible to weigh them individually. The oscillation frequency of a diving board depends on the mass applied. Scaling down, it is possible to construct a cantilever of micrometer dimensions, approximately 6 microns long by 0.5 micron wide, with an end platform about 1 micron square. The oscillation frequency can be measured by using a laser and observing the altered light reflection. Addition of single bacterial cells or even

slide 228:

virus particles changes the oscillation frequency of the cantilever. The mass of single cells or virus particles has been measured this way in the laboratory of Harold Craighead at Cornell University (Fig. 7.7). To hold the bacteria or viruses in place, the cantilever is coated with an antibody that recognizes the microorganism to be weighed. A single cell of E. coli was 1430 × 730 nanometers in size and weighed 665 femtograms (665 × 10⁻¹⁵ grams). Viruses weighing around 1 femtogram can be detected by reducing the size of the cantilever and enclosing it in a vacuum. By mid-2005 this technique had been refined to weigh a single macromolecule: a double-stranded DNA of approximately 1500 base pairs, roughly the size of a typical coding sequence. Today this approach allows measurements of small proteins and other molecules in the zeptogram range (10⁻²¹ grams).

A possible future use for weighing individual particles is in identifying and quantitating infectious agents. Although bacteria change weight as they grow and divide, individual viruses have characteristic weights because they are assembled. Using antibodies attached to the cantilevers to capture the target particle (bacterium, virus, or even proteins) provides specificity. This approach can even be used to detect and count the number of particles of the prion protein responsible for mad cow disease (see Chapter 21 for more information about prions). An array of cantilevers, rather than a single cantilever, would be used when counting particles for quantification.

NANOPARTICLES AND THEIR USES

Nanotechnology began with advances in viewing and measuring the incredibly small. It then moved on to building structures at the nanoscale. Simple nanostructures are now being used for a variety of analytical purposes, and a second generation is being developed for clinical use. As their name indicates, nanoparticles are particles of submicron scale, in practice from 100 nm down to 5 nm in size. They are usually spherical, but rods, plates, and other shapes are sometimes used. They may be solid or hollow and are composed of a variety of materials, often in several discrete layers with separate functions. Typically there is a central functional layer, a protective layer, and an outer layer allowing interaction with the biological world.

The central functional layer usually displays some useful optical or magnetic behavior. Most popular is fluorescence. The protective layer shields the functional layer from chemical damage by air, water, or cell components and conversely shields the cell from any toxic properties of the chemicals composing the functional layer. The outer layer or layers allow nanoparticles to be “biocompatible.” This generally involves two aspects: water solubility and specific recognition. For biological use, nanoparticles are often made water soluble by adding a hydrophilic outer layer. In addition, chemical groups must be present on the exterior to allow specific attachment to other molecules or structures (Fig. 7.8).

FIGURE 7.6 AFM Shows RecA Attached to DNA at One End
When RecA protein binds to DNA, the complexes form first at one end of the DNA, as shown here by atomic force microscopy. (Panels A and B; labels: DNA; RecA attachments; scale bar 250 nm.) From Li BS, Wei B, Goh MC (2012). Direct visualization of the formation of RecA/dsDNA complexes at the single-molecule level. Micron 43, 1073–1075.

Laser monitoring of the oscillation of a nanoscale cantilever allows single bacteria or viruses to be individually weighed.
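The cantilever weighing summarized above rests on the fact that a loaded oscillator vibrates more slowly. A minimal Python sketch follows, treating the cantilever as a simple harmonic oscillator with frequency f = (1/2π)√(k/m) and the captured particle as a point mass at the free end. This idealized model, the function names, and the stiffness and effective-mass values are assumptions made here for illustration; only the 665 femtogram cell mass comes from the text.

```python
import math

def resonant_frequency(stiffness, mass):
    """Frequency (Hz) of a simple harmonic oscillator: f = sqrt(k/m) / (2*pi)."""
    return math.sqrt(stiffness / mass) / (2 * math.pi)

def added_mass(stiffness, f_unloaded, f_loaded):
    """Mass (kg) added to the oscillator, inferred from its frequency shift."""
    return stiffness / (4 * math.pi ** 2) * (1 / f_loaded ** 2 - 1 / f_unloaded ** 2)

# Hypothetical cantilever: stiffness 0.01 N/m, effective mass 1 picogram.
k = 0.01          # N/m (assumed)
m_eff = 1e-15     # kg  (assumed)
cell = 665e-18    # kg, the 665 femtogram E. coli cell quoted in the text

f0 = resonant_frequency(k, m_eff)
f1 = resonant_frequency(k, m_eff + cell)

recovered = added_mass(k, f0, f1)
print(f"Unloaded {f0/1e3:.0f} kHz, loaded {f1/1e3:.0f} kHz, "
      f"recovered mass {recovered*1e18:.0f} fg")
```

The sketch also suggests why smaller cantilevers are needed for viruses: the lighter the cantilever itself, the larger the fractional frequency shift produced by a given added mass.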

slide 229:

Nanoparticles (see Box 7.1) have a variety of uses in the biological arena:
(a) Fluorescent labeling and optical coding
(b) Detection of pathogenic microorganisms and/or specific proteins
(c) Purification and manipulation of biological components
(d) Delivery of pharmaceuticals and/or genes
(e) Tumor destruction by chemical or thermal means
(f) Contrast enhancement in magnetic resonance imaging (MRI)

FIGURE 7.7 Weighing a Single Bacterium
(A) Scanning electron micrograph of cantilever oscillator with length 6 microns, width 0.5 microns, and a 1 micron by 1 micron paddle. Scale bar corresponds to 2 microns. From Ilic B et al. (2004). Virus detection using nanoelectromechanical devices. Appl Phys Lett 95, 2604–2607. Copyright 2004. Reprinted with permission from the American Institute of Physics. (B) Scanning EM of a single E. coli cell attached to the cantilever by antibody. Courtesy of Craighead group, Cornell University. From Ilic B et al. (2001). Single cell detection with micromechanical oscillators. J Vac Sci Technol B 19, 2825–2828. Copyright 2001. Reprinted with permission from the American Institute of Physics.

Nanoparticles are now widely used in a range of biological procedures. They include both analytical and clinical applications.

Box 7.1 Trendy Terminology
Nanoparticles are referred to by a variety of nanoterms depending on their shape and structure. The meanings of nanorod, nanocrystal, nanoshell, nanotube, nanowire, and so forth should be obvious enough. And despite what you might think, quantum dots are not a new brand of frozen snack but an alternative name for fluorescent nanocrystals small enough to show quantum confinement and used in biological labeling.

slide 230:

Chap TER 7 227 NaNOpaRTICLES FOR LBELING Consider luminescent CdSe nanorods as an example of nanoparticles used for labeling Fig. 7.9. These nanorods can be used as fuorescent labels for molecular biology because they absorb light from the UV to around 550 nm and emit strongly at 590 nm. They were made—appropriately enough—in the lab of Thomas Nann in Freiberg Germany. These nanorods measure approximately 3 nm in width by 10 to 20 nm in length. A core of luminous cadmium selenide CdSe is sur- rounded by a shell of ZnS zinc sulfde wurtz - ite that protects the core against oxidation. Outside this is a layer of silica which allows coupling of phosphonates or amines to the exterior of the nanorod. These hydrophilic groups make the nanorods water soluble. These outer chemical groups also allow attachment of the nanorods to proteins. The scaffold inside eukaryotic cells is built from cylindrical protein structures known as microtubules. These protein structures are often disassembled into monomers known as tubulin and reassembled in different locations. Nanorods can be used to follow this remodeling by attaching them to the tubulin monomers. The addition of guanosine triphos- phate GTP stimulates the assembly of microtubules and the fuorescent nanorods can be seen aggregating into linear structures. Why use a complex multilayered nanostructure instead of a simple fuorescent dye a Although nanocrystals have narrow emission peaks they have broad absorption peaks rather than narrow ones like typical dyes. Consequently they do not bleach during excitation and can therefore be used for continuous long-term irradiation and monitoring. b Nanocrystals have high brightness—the product of molar absorptivity and quantum yield. Molar absorptivity is the absorbance of a one molar solution of pure solute at a given wavelength the higher it is the more light is absorbed. The quantum yield is the ratio of photons absorbed to photons emitted during fuorescence. 
(c) The emission maximum of a nanocrystal depends on its size and so can be set to any desired wavelength by making crystals of the appropriate size (see later discussion).

FIGURE 7.8 Typical Layered Structure of Nanoparticles
Several layers surround the physically active core. Chemical groups are often added to the exterior to allow attachment of biological molecules. [Figure labels: physically active internal layers; protective layer; hydrophilic layer; tether; attached molecules.]

FIGURE 7.9 CdSe Nanorods
Luminescent CdSe nanorods are encased in protective layers of zinc sulfide and of silica. Hydrophilic chemical groups on the outside allow proteins or other biological molecules to be attached. [Figure labels: cadmium selenide core, 3 nm by 10–20 nm; zinc sulfide; silica; amine and phosphonate hydrophilic groups attached to the surface.]
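The brightness relationship in point (b) can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names are ours, and the numerical values for the "dye" and the "nanocrystal" are hypothetical placeholders chosen to show the arithmetic, not measured data.

```python
def quantum_yield(photons_emitted, photons_absorbed):
    """Fraction of absorbed photons that are re-emitted as fluorescence."""
    return photons_emitted / photons_absorbed


def brightness(molar_absorptivity, qy):
    """Brightness = molar absorptivity (M^-1 cm^-1) x quantum yield (0 to 1)."""
    if not 0.0 <= qy <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return molar_absorptivity * qy


# Hypothetical values for illustration only (not measurements).
dye = brightness(80_000, quantum_yield(90, 100))
nanocrystal = brightness(500_000, quantum_yield(50, 100))

print(f"nanocrystal / dye brightness ratio: {nanocrystal / dye:.2f}")
```

The point of the sketch is that a label with a modest quantum yield can still be the brighter one if it absorbs much more light per mole.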

slide 231:

Nanoparticles can also be targeted to specific tissues, such as cancer cells, by adding appropriate antibodies or receptor proteins to the nanoparticle surface. Fluorescent nanoparticles are often known as quantum dots and are now commercially available for a wide range of biological labeling. Although fluorescent dyes can be attached to other molecules, nanoparticles are more versatile in this regard. Quantum dots can be used to label DNA molecules as well as proteins. Thus labeling of PCR primers with quantum dots results in fluorescently labeled PCR products, a variant referred to as quantum dot PCR.

A variety of materials have been used to give better contrast enhancement in magnetic resonance imaging (MRI). Nanoparticles containing assorted compounds are seeing increasing use in this area. For example, super-paramagnetic iron oxide nanoparticles (SPIONs) act as good MRI contrast agents. Their magnetic properties vary with particle size. Larger particles, of greater than 300 nm, are used for bowel, liver, and spleen. Smaller particles, of 20 to 40 nm, show higher diagnostic accuracy for detecting early tumors in lymph nodes than conventional materials. Because they are relatively safe, SPIONs have also been suggested as possible carriers for delivery of both drugs and DNA.

QUANTUM SIZE EFFECT AND NANOCRYSTAL COLORS

When materials are subdivided into sufficiently small fragments, quantum effects begin to influence their physical properties. The fluorescent nanoparticles discussed earlier are in fact semiconductors that are small enough to show such quantum effects. Semiconductors are substances that conduct electricity under some conditions but not others. In N-type semiconductors, as in normal electric wires, the current consists of negatively charged electrons. In P-type semiconductors, the current consists of holes. A hole is the absence of an electron from an atom. Although not physical particles, holes can move from atom to atom.
Electrons and holes may combine and cancel out, a process that releases energy. Conversely, energy absorbed by certain semiconductors may generate an electron-hole pair whose two components may then move off in different directions.

Nanoparticle labels can be made with different emission wavelengths covering the UV, the visible spectrum, and the near infrared. Emission wavelengths obviously vary depending on the semiconductor material. In addition, however, the quantum size effect (Fig. 7.10) allows the same semiconductor to emit at different wavelengths depending on the size of the nanoparticle. The smaller the nanoparticle, the shorter the wavelength (i.e., the higher the energy) it emits.

Fluorescent nanoparticles may be regarded as miniaturized light-emitting diodes (LEDs). These are semiconductors that work by absorbing energy, either electrical or light, and creating electron-hole pairs. When the electrons and holes recombine, light is emitted. For bulk material, the energy, and hence the wavelength, of the emitted light depends on the chemical composition of the semiconductor. However, at nanoscale dimensions, quantum effects become significant. If the physical size of the semiconductor is smaller than the natural radius (the Bohr radius) of the electron-hole pair, extra energy is needed to confine the electron-hole pair. This is referred to as quantum confinement and occurs with nanocrystals of around 20 nm or less. The smaller the semiconductor crystal, the more energy is needed and the more energetic (shorter in wavelength) is the light released.

Fluorescent nanoparticles are widely used in biological labeling. They last longer than traditional fluorescent dyes and are often brighter. The emission wavelength of a fluorescent nanoparticle depends on its size, so an experimenter can tune it by choosing the particle size.
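The size dependence described above can be sketched numerically. The model below is not taken from the text: it uses the Brus effective-mass approximation for a spherical nanocrystal, which adds a confinement term (growing as 1/R²) and subtracts an electron-hole Coulomb term (growing as 1/R) from the bulk band gap. The CdSe parameters (band gap, effective masses, dielectric constant) are assumed approximate literature-style values, so the output should be read as a qualitative trend only, and it applies to spheres rather than to the nanorods of Fig. 7.9.

```python
import math

# Physical constants (SI units)
HBAR = 1.054571817e-34       # reduced Planck constant, J s
H_PLANCK = 6.62607015e-34    # Planck constant, J s
C_LIGHT = 2.99792458e8       # speed of light, m/s
M0 = 9.1093837015e-31        # electron rest mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m

# ASSUMED approximate bulk CdSe parameters (illustrative, not from the text)
E_GAP_EV = 1.74   # bulk band gap, eV
M_E = 0.13        # electron effective mass, in units of M0
M_H = 0.45        # hole effective mass, in units of M0
EPS_R = 10.6      # relative dielectric constant


def emission_energy_ev(radius_nm):
    """Lowest transition energy (eV) of a sphere of the given radius (Brus model)."""
    r = radius_nm * 1e-9
    confinement = (HBAR**2 * math.pi**2) / (2 * r**2) * (1 / (M_E * M0) + 1 / (M_H * M0))
    coulomb = 1.8 * E_CHARGE**2 / (4 * math.pi * EPS0 * EPS_R * r)
    return E_GAP_EV + (confinement - coulomb) / E_CHARGE


def emission_wavelength_nm(radius_nm):
    """Convert the transition energy to a wavelength in nanometers."""
    energy_joules = emission_energy_ev(radius_nm) * E_CHARGE
    return H_PLANCK * C_LIGHT / energy_joules * 1e9


for radius in (2.0, 3.0, 5.0):
    print(f"radius {radius} nm -> about {emission_wavelength_nm(radius):.0f} nm emission")
```

Under these assumptions, shrinking the crystal raises the transition energy above the bulk band gap and shifts the emission toward shorter wavelengths, which is the trend shown in Fig. 7.10.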

slide 232:

NANOPARTICLES FOR DELIVERY OF DRUGS, DNA, OR RNA

Because nanoparticles can be targeted to specific tissues, they can be used to deliver a variety of biologically active molecules, including both pharmaceuticals and genetic engineering constructs.

Large polymeric molecules such as DNA may themselves be compacted to form nanoparticles of around 50 to 200 nm in size. This involves the addition of positively charged molecules (e.g., cationic lipids, polylysine) to neutralize the negative charge of the phosphate groups of the nucleic acid backbone. Other molecules may be added to promote selectivity for certain cells or tissues.

Alternatively, hollow nanoparticles (nanoshells) may be used to carry other, smaller molecules. Such nanoshells must be made from biocompatible materials such as polyethyleneimine (PEI) or chitosan (Fig. 7.11). The latter alternative is popular because it is both naturally derived and biodegradable. Chitin is a beta-1,4-linked polymer of N-acetyl-D-glucosamine. It is found in the exoskeletons of insects and the cell walls of fungi and, among biopolymers, is second only to cellulose in natural abundance. Chitosan is derived from chitin by removing most of the acetyl groups through alkali treatment and has been shown to be safe for administration to humans.

An interesting combination of two modern technologies is using nanoshells to carry short-interfering RNA (siRNA). Delivery of siRNA triggers RNA interference, which results in the destruction of target mRNA (see Chapter 5). The siRNA may be targeted against mRNA from genes expressed preferentially in cancer cells (see below) or genes characteristic of certain viruses (see Chapter 21).

FIGURE 7.10 Quantum Size Effect
Nanocrystals of different sizes absorb UV light and re-emit the energy. The wavelength of emission depends on the size of the nanocrystal. The smaller the crystal, the more energetic the emission. From Riegler J, Nann T (2004). Application of luminescent nanocrystals as labels for biological molecules. Anal Bioanal Chem 379(7–8), 913–919. With kind permission from Springer Science and Business Media.

Hollow nanoparticles may be used to deliver DNA, RNA, or proteins.

slide 233:

NANOPARTICLES IN CANCER THERAPY

It is possible to destroy tumor cells through use of a variety of toxic chemicals or localized heating. In both cases, a major issue is delivering the fatal reagent to the cancer cells and avoiding nearby healthy tissue.

When toxic chemical reagents are used, the reagent must not only be delivered specifically to the target cells but also be prevented from diffusing out of the cancer cells. Both related objectives may be achieved by using hollow nanoparticles to carry the reagent. Nanoparticles may be targeted to tumors by adding specific receptors or reactive groups to the outside of the nanoparticles. These are chosen to recognize proteins that are solely or predominantly displayed on the surface of cancer cells. It is hoped that such nanoparticles will be safe to give by mouth. Diffusion is more difficult to deal with but may be limited to some extent by designing nanoparticles for slow release of the reagent.

A clever alternative is to produce the toxic agent inside the nanoparticle after it has entered the cancer cell. Photodynamic cancer therapy involves generating singlet oxygen by using a laser to irradiate a photosensitive dye. The singlet oxygen is highly reactive and, in particular, destroys biological membranes via oxidation of lipids. After diffusing out of the nanoparticle, the toxic oxygen reacts so fast that it never leaves the cancer cell (Fig. 7.12).

Nanoparticles may also be used to kill cancer cells through localized heating. In one approach, nanoparticles with a magnetic core are used. An alternating magnetic field is used to supply energy, and it heats the nanoparticle to a temperature lethal to mammalian cells. Another approach uses metal nanoshells. They consist of a core, often silica, surrounded by a thin metal layer such as gold.
Varying the size of the core and the thickness of the metal layer allows such nanoparticles to be tuned to absorb from any region of the spectrum, from UV through the visible to the IR. Because living tissue absorbs least in the near infrared, the nanoparticles are designed to absorb radiant energy in this region of the spectrum. As a result, externally applied near-infrared radiation is absorbed specifically by the nanoparticles, which then heat the surrounding tissue.

FIGURE 7.11 Chitosan Nanoparticle
The two main types of chitosan nanocarriers: matrix systems (e.g., nanoparticles) and core-shell systems (e.g., nanocapsules). The chitosan interacts strongly with mucosal surfaces, resulting in three potential applications: vaccination, protein delivery, and gene therapy. [Figure labels: matrix-type system; core-shell system; transmucosal delivery; immunization; protein delivery; gene delivery.] Modified after Garcia-Fuentes M, Alonso MJ (2012). Chitosan-based drug nanocarriers: where do we stand? J Control Release 161, 496–504.

slide 234:

Another possible way to use nanoparticles against cancer is to control blood vessel development, or angiogenesis. For cancer cells to develop into a genuine tumor, a blood supply is needed. The balance between pro- and antiangiogenic growth factors that bind to cell surface receptors controls the formation of new blood vessels. Gold nanoparticles that carry relatively short peptides on their surfaces have been constructed to bind to these receptors. Peptides that bind to the neuropilin-1 receptor, such as KATWLPPR, block blood capillary formation.

Box 7.2 The Nanobacteria: Nanotechnology Meets Nanomythology
Cryptozoology is the study of “undiscovered” creatures such as the Loch Ness monster or Bigfoot. However, students of microbiology no longer need to feel left out. The new field of nanocryptobiology is here. It is perhaps not surprising that some investigators claimed to have discovered “nanobacteria.” They were supposedly 100-fold smaller than typical bacteria, yet capable of growth and replication. They were proposed as causative agents in the formation of kidney stones and then linked to heart disease and cancer. Unfortunately, “nanobacteria” are too small to contain ribosomes or chromosomes, and it has become clear that they are mineral artifacts. Their supposed replication was due to the fact that certain minerals can act as nuclei for further crystallization. It scarcely needs adding that “fossilized nanobacteria” have also been seen in meteorites from Mars and have been claimed as evidence for life on Mars. However, similar mineral structures have been found in both lunar meteorites and terrestrial rocks.

Nanoparticles may be used to kill cancer cells by localized heating or by local generation of a toxic product such as singlet oxygen. Another approach to cancer therapy is using nanoparticles to regulate blood vessel formation.
FIGURE 7.12 Nanoparticle for Singlet Oxygen Release
The near-infrared laser excites the dyes attached to the nanoparticle. Energy transfer to photosensitizers by fluorescence resonance energy transfer (FRET) results in conversion of normal triplet oxygen to singlet oxygen. [Figure labels: near-infrared laser; excited dye aggregates; organically modified silica nanoparticle; FRET to photosensitizer; triplet oxygen (³O₂) to singlet oxygen (¹O₂); membrane destruction.]

ASSEMBLY OF NANOCRYSTALS BY MICROORGANISMS

It has been known for many years that bacteria may accumulate a variety of metallic elements and may modify them chemically, usually by oxidation or reduction. For example, many bacteria accumulate anions of selenium or tellurium and reduce them to elemental selenium or tellurium, which is then deposited as a precipitate either on the cell surface or internally. Certain species of the bacterium Pseudomonas that live in metal-contaminated areas and the fungus Verticillium can both generate silver nanocrystals. However, so-called “nanobacteria” are a mineral artifact (see Box 7.2).

Recently it has been found that when E. coli is exposed to cadmium chloride (CdCl₂) and sodium sulfide, it precipitates cadmium sulfide (CdS) as particles in the 2- to 5-nm size range. In other words, bacteria can “biosynthesize” semiconductor nanocrystals. Increasing the level of the sulfhydryl compound glutathione improves the yield. When cadmium chloride and potassium tellurite (K₂TeO₃) were provided, a mixture of fluorescent nanocrystals of different sizes and colors, made of CdTe, was generated (Fig. 7.13).

Bizarrely enough, even earthworms can make nanocrystals. If cadmium chloride and potassium tellurite were added to soil, wild-type earthworms made green fluorescent CdTe nanocrystals.

slide 235:

Rather more sophisticated is the use of phage display to select peptides capable of organizing semiconductor nanowires. As described in Chapter 9, phage display is a technique that allows the selection of peptides that bind any chosen target molecule. In brief, stretches of DNA encoding a library of peptide sequences are engineered into the gene for a bacteriophage coat protein. The extra sequences are attached at either the C terminus or the N terminus, where they do not disrupt normal functioning of the coat protein. When the hybrid protein is assembled into the phage capsid, the inserted peptides are displayed on the outside of the phage particle. The library of phages is then screened against a target molecule. Those phages that bind the target are kept.

Phage display libraries have been screened to find peptides capable of binding ZnS or CdS nanocrystals. Protein VIII of bacteriophage M13 was used for peptide insertion. For example, ZnS was bound by the peptide VISNHAGSSRRL and CdS by the peptide SLTPLTTSHLRS. Because the bacteriophage capsid contains many copies of the coat protein, the displayed peptide is also present in many copies. Consequently, an array of nanocrystals forms on the phage surface. Because M13 is a filamentous phage, the result is a semiconductor nanowire (Fig. 7.14).

Nanocrystals and nanowires may be assembled using unmodified bacteria, genetically engineered bacteria, or sophisticated phage display techniques.

FIGURE 7.13 CdTe Nanoparticles Made by Bacteria
Purification and size determination of CdTe nanoparticles produced by E. coli. (A) UV-exposed cell suspensions of E. coli AG1/pCA24NgshA, untreated or exposed to CdCl₂ or CdCl₂/K₂TeO₃ (from left to right). (B) Purified fractions exposed to UV light. (C and D) Particle sizes of samples from cells exposed to CdCl₂ or CdCl₂/K₂TeO₃, respectively. [Panels C and D: histograms of volume versus size (nm); Cd, average 5.98; Cd + Te, average 4.8.] From Monrás JP, et al. (2012). Enhanced glutathione content allows the in vivo synthesis of fluorescent CdTe nanoparticles by E. coli. PLoS One 7(11), e48657.

NANOTUBES

Carbon nanotubes are cylinders made of pure carbon with diameters of 1 to 50 nanometers. However, they may be up to approximately 10 micrometers long. Pure elemental carbon exists as diamond or graphite. In diamond, each carbon is covalently linked to four others, forming a 3D tetrahedral lattice that is extremely strong. In contrast, graphite consists of flat sheets of

slide 236:

carbon atoms that form a hexagonal pattern. In the sheets of graphite, each carbon atom is covalently bonded to three neighbors, and the sheets can slide sideways over each other because there are no covalent linkages between atoms in different sheets.

To form a nanotube, a single sheet of graphite is rolled into a cylinder. The sheets may be rolled up straight or at an angle to the carbon lattice and may be of various diameters. Depending on the diameter and the torsion, the nanotube may act as a metallic conductor or a semiconductor. Not surprisingly, carbon nanotubes are now finding many uses in electronics, a topic beyond the scope of this book. Single-walled carbon nanotubes (i.e., those consisting of a single layer of graphite) are especially useful in biology, as they enter cells very readily.

In biotechnology, nanotubes are beginning to find applications. The critical issue is attaching useful biomolecules, such as enzymes, hormone receptors, or antibodies, to the nanotube surface. A major problem in attaching proteins is that the surface of carbon nanotubes is hydrophobic. One approach is to first modify the surface by adding nonionic detergents such as Triton X-100. The hydrophobic portion of the detergent binds to the nanotube surface, and the hydrophilic region can be used to bind proteins. Alternatively, chemical reagents that react with the carbon surface of the nanotube are used to generate side chains carrying reactive functional groups. Proteins can then be linked covalently by reaction with these groups (Fig. 7.15). Proteins can also be attached to natural magnetic nanoparticles (see Box 7.3).

Possible applications of carbon nanotubes in biotechnology and medicine include:
(a) Imaging. Even without attached dyes, carbon nanotubes show luminescence in the near infrared. This can be used directly for imaging by near-infrared microscopy (NIRM).
FIGURE 7.14 Nanowire Assembly on Bacteriophage
Phage display yielded engineered versions of the M13 coat protein (protein VIII) with inserted peptides. Some of these are capable of binding CdS. In the presence of CdS crystals, a nanowire forms along the surface of the bacteriophage. [Figure labels: wild-type bacteriophage M13 with protein VIII; screen phage display for CdS-binding peptide; engineered bacteriophage M13 with protein VIII displaying SLTPLTTSHLRS; addition of CdS; nanowire on bacteriophage M13.]

FIGURE 7.15 Attaching Organic Functional Groups to Nanotubes
(A) Carbon nanotubes can be treated with acids to purify them and generate carboxylic groups. (B) Alternatively, they may react with amino acid derivatives and aldehydes to add more complex hydrophilic groups to the external surface. [Figure labels: (A) H₂SO₄/HNO₃ treatment giving COOH groups; (B) step 1, R-NHCH₂CO₂H with (CH₂O)n in DMF; step 2, HCl in DCM, giving NH₃⁺ groups.] From Bianco A, Kostarelos K, Prato M (2005). Applications of carbon nanotubes in drug delivery. Curr Opin Chem Biol 9, 674–679. Reprinted with permission.

slide 237:

Box 7.3 Magnetosomes: Natural Bacterial Magnetic Nanoparticles
Naturally occurring magnetic nanoparticles are made by magnetotactic bacteria such as Magnetospirillum. These microorganisms can detect magnetic fields and orient themselves in response. They contain magnetosomes consisting of nanosized crystals of magnetic iron oxide (magnetite, Fe₃O₄) or, less often, iron sulfide (greigite, Fe₃S₄) inside an envelope of protein. The magnetosomes are aligned in chains along the cell axis. Synthesis of the protein shell and mineralization of the magnetic core are under genetic control. At least in some cases, the genes responsible for the magnetosome are clustered on the bacterial chromosome.

It is possible to attach other molecules to the outside of magnetosomes by genetically modifying proteins of the magnetosome envelope (Fig. A). The gene for the Mms16 protein of Magnetospirillum magneticum has been fused to the genes for luciferase and the dopamine receptor in the lab of Dr. Tadashi Matsunaga of the Tokyo University of Agriculture and Technology. The fused proteins are displayed on the surface of the magnetosomes. After the bacterial cells are disrupted, the magnetosomes carrying the attached proteins can be purified by magnetic separation. This should allow easier analysis of membrane-bound receptors, such as those from the human nervous system, in a simplified system.

FIGURE A Protein Display on Magnetosome Membrane
(A) Display of hydrophilic protein using MagA as an anchor. (B) Display of transmembrane protein using Mms16 as an anchor. [Figure labels: transmembrane anchor protein with hydrophilic foreign protein; membrane-anchoring anchor protein with transmembrane foreign protein.] From Matsunaga T, Okamura Y (2003). Genes and proteins involved in bacterial magnetic particle formation. Trends Microbiol 11, 536–542. Reprinted with permission.

(b) Electrochemical sensors. The electrical properties of carbon nanotubes change when attached chemical groups bind or react with other molecules. Thus electrical signals can be generated upon detection of target molecules.
(c) Photothermal killing of cancer cells. As noted previously, nanoparticles can be used to kill cancer cells. Carbon nanotubes absorb near-infrared radiation and can generate local heating that causes cell death. As above, the nanotubes must carry molecules that target them specifically to the cancer cells.
(d) Drug delivery. When using carbon nanotubes, drugs are attached to the outside rather than being encapsulated. This normally requires an intervening linker molecule, as shown in Figure 7.16.
(e) Tissue regeneration. Nanotubes that carry appropriate chemical groups may be used as scaffolds for regenerating tissues such as bone and nerves. This approach is still experimental.

A range of uses has been found for hollow carbon nanotubes that are fabricated to carry a variety of biologically useful side chains.

slide 238:

ANTIBACTERIAL NANOCARPETS

Nanocarpets are structures formed by stacking a large number of nanotubes together vertically with their cylindrical axes aligned. Nanocarpets capable of changing color and of killing bacteria have been assembled from specially designed lipids that spontaneously assemble into a variety of nanostructures depending on the conditions. In water, nanotubes are formed. Partial rehydration of dried nanotubes generates a side-by-side array: the nanocarpet. The lipid consists of a long hydrocarbon chain (25 carbons) with a diacetylenic group in the middle of the chain. The individual nanotubes are about 100 nm in diameter by 1000 nm in length. The walls of the nanotubes consist of five bilayers of the lipid.

Both the separate lipid molecules and the assembled nanocarpet kill bacteria. Like other long-chain amino compounds, they act as detergent molecules and disrupt the cell membrane. Consequently, the nanocarpet provides a surface lethal to bacteria. This property could be very useful if nanocarpets are used in biomedical applications.

Diacetylenic compounds have the interesting ability to change color. The nanocarpet starts out white, but if exposed to ultraviolet light it turns deep blue. UV irradiation causes cross-links to form by reaction between acetylenic groups on neighboring molecules. This polymerization stabilizes the nanocarpet. Blue nanocarpets change color on exposure to a variety of reagents. Detergents and acids change them from blue to red or yellow, and the presence of bacteria such as E. coli gives red and pink shades. Eventually such materials may be used both as biosensors and for protection against bacterial contamination.

DETECTION OF VIRUSES BY NANOWIRES

Nanowires are what their name suggests. They have nanoscale diameters but may be several microns long. They may be metallic and act as electrical conductors, or they may be made from semiconductor materials.
Nanotubes may be assembled to create surfaces (nanocarpets) that are antibacterial or act as biosensors.

FIGURE 7.16 Drug Delivery by Carbon Nanotubes
A carbon nanotube has been modified for paclitaxel (PTX) delivery. Single-walled carbon nanotubes with bound phospholipids were attached to branched polyethylene glycol (PEG) chains. These in turn were linked to PTX. The OH group inside the blue ring indicates where the PTX was linked to the PEG. [Figure labels: phospholipid; polyethylene glycol (PEG); paclitaxel (PTX).] From Gomez-Gualdrón DA, et al. (2011). Carbon nanotubes: engineering biomedical applications. Prog Mol Biol Transl Sci 104, 175–245.

Biosensors can be made using silicon semiconductor nanowires. They may be coated with antibodies that bind to a specific virus. Binding of the virus to the antibody triggers a change in conductance of the nanowire. For a p-type silicon nanowire, the conductance decreases when the surface charge on the virus particle is positive and conversely increases if the virus surface

slide 239:

is negative. Single viruses may be detected by using this approach (Fig. 7.17). It is also possible to attach single-stranded DNA to the nanowire. In this case, binding of the complementary single strand triggers changes in conductance. Possible future applications include both clinical testing and sensors for monitoring food, water, and air for public health and/or biodefense.

ION CHANNEL NANOSENSORS

Somewhat more complex than nanotubes and nanowires are nanoscale ion channels that are assembled into membranes. These channels are designed so that they can be controlled to permit the movement of ions only under certain conditions. The ion flow generates an electrical current that is detected, amplified, and displayed by appropriate electronic apparatus.

Ion channels can be used as biosensors by attaching a binding site for the target molecule at the entry to the channel. Attached antibodies are often used for the binding sites. The simplest arrangement results in the channel being open in the absence of the target molecule and shut when it is detected. A drop in ion flow therefore signals detection of the target molecule.

At present, such ion channels are being developed using modified biological components. The ion channel itself can be made using the peptide antibiotic gramicidin A, made by the bacterium Bacillus brevis. This transports monovalent cations, especially protons and sodium ions. Natural gramicidin spans half of a standard biological membrane. A short-lived channel is formed when two gramicidin molecules line up, as shown in Figure 7.18. Permanent channels may be made by covalently linking two gramicidin molecules together. Up to 10⁷ ions/second flow through a single gramicidin channel. This gives a current in the picoampere range that is easily measured. An alternative is to monitor a change in the pH due to movement of H⁺ ions. An optical sensor and a fluorescent pH indicator may be used to do this.
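The step from ion flow to a picoampere current is simple arithmetic: each monovalent ion carries one elementary charge, so the current is the number of ions per second multiplied by that charge. The short sketch below (the function name is ours) performs the conversion for the flow rate quoted above.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs carried by one monovalent ion


def channel_current_pa(ions_per_second, valence=1):
    """Current in picoamperes for a given ion flow through a single channel."""
    amperes = ions_per_second * valence * ELEMENTARY_CHARGE
    return amperes * 1e12  # convert amperes to picoamperes


print(f"{channel_current_pa(1e7):.2f} pA")  # about 1.60 pA for 10^7 ions per second
```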
The channels are made responsive by attaching an appropriate ligand molecule to the front end of the gramicidin so that it projects outward from the membrane surface. This ligand is chosen to bind the target molecule and may be an antigen or other small molecule that is recognized by an antibody or protein receptor. It is also possible to attach a single-stranded segment of DNA that will recognize and bind the complementary sequence. Thus biosensors may be designed to respond to the presence of a variety of biological molecules.

The membrane itself may be a lipid bilayer made using natural membrane lipids. Typical phospholipids span half a membrane (i.e., one monolayer), and the two monolayers can therefore slide relative to one another. Including lipids that span the whole width of the membrane will stabilize it. Such lipids may be found naturally in certain Archaea or may be synthesized artificially. Lipid bilayers are relatively fragile and in practice must be assembled on some solid support. Building a long-lasting and stable membrane structure has so far proven difficult, and practical ion channel sensors are still under development.

DNA is a long, thin molecule that can move through nanoscale channels in response to a voltage. Because DNA is negatively charged, it is drawn through the nanopore toward the positive side. A novel DNA sequencing method based on this approach is now being marketed by Oxford Nanopore Technologies. When DNA occupies the channel, the normal ion current is reduced. The amount of reduction depends on the base sequence (G, C, T, A). Thus a computer can measure the current and decipher the sequence based on the differences.

Nanowire sensors are capable of detecting specific individual viruses. Binding of a virus particle changes the conductance of the nanowire.

Ion channel sensors operate by opening or closing the channel in response to binding a specific molecule. They may be used to detect a variety of target molecules.
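The idea of reading a sequence from the ion current can be illustrated with a deliberately simplified toy decoder. Everything numerical here is hypothetical: the four current levels are invented, and the sketch assumes one clean measurement per base, whereas a real device senses several neighboring bases at once and needs far more elaborate signal processing. The sketch only shows the principle of mapping a measured current to the nearest expected level.

```python
# HYPOTHETICAL residual-current levels (pA) for each base, invented for illustration.
LEVELS_PA = {"G": 40.0, "C": 50.0, "T": 60.0, "A": 70.0}


def call_base(current_pa):
    """Return the base whose assumed current level is closest to the measurement."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current_pa))


def decode(trace_pa):
    """Decode a list of per-base current measurements into a sequence string."""
    return "".join(call_base(current) for current in trace_pa)


noisy_trace = [41.2, 68.9, 59.1, 71.5, 49.0, 52.3]  # made-up measurements
print(decode(noisy_trace))  # GATACC
```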

slide 240:

FIGURE 7.17 Nanowire Biosensors
[Plots: conductance (nS) versus time (sec) for panels A to C; panel C shows traces for nanowires NW1, NW2, and NW3.] (A) A single virus particle binding to and detaching from the surface of a SiNW nanowire coated with antibody receptors. The corresponding changes in conductance are shown for each step. (B) Conductance and optical data on addition of influenza A virus. (C) Schematic of multiplexed single virus detection. Conductance versus time is recorded simultaneously for three channels specific for different viruses. NW1 responds to influenza A and NW3 to adenovirus. The NW2 channel was not used here. Black arrows 1–4 correspond to addition of adenovirus, influenza A, pure buffer, and a 1:1 mixture of adenovirus plus influenza A. Red and blue arrows indicate conductance changes due to the diffusion of viral particles past the nanowire without specific binding, for influenza and adenovirus respectively. Courtesy of Charles M. Lieber, Harvard University, Cambridge, MA.
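Conductance traces such as those in Figure 7.17 are interpreted by looking for excursions away from the baseline. The following is a minimal sketch of that idea, not the analysis used by the original investigators: the function name, the threshold rule, and the sample trace are all assumptions made for illustration. For a p-type nanowire, a downward excursion (sign −1) would correspond to a positively charged particle and an upward excursion (sign +1) to a negatively charged one.

```python
def find_binding_events(conductance_ns, baseline_ns, threshold_ns):
    """Return (start_index, end_index, sign) for each excursion from the baseline.

    An excursion starts when the reading differs from the baseline by at least
    threshold_ns and ends when it returns to within the threshold.
    """
    events = []
    start = None
    sign = 0
    for index, reading in enumerate(conductance_ns):
        delta = reading - baseline_ns
        active = abs(delta) >= threshold_ns
        if active and start is None:
            start, sign = index, (1 if delta > 0 else -1)
        elif not active and start is not None:
            events.append((start, index - 1, sign))
            start = None
    if start is not None:  # trace ended while a particle was still bound
        events.append((start, len(conductance_ns) - 1, sign))
    return events


# Made-up trace (nS): one conductance drop followed by one conductance rise.
trace = [2080, 2081, 2079, 2060, 2061, 2059, 2080, 2082, 2100, 2099, 2080]
print(find_binding_events(trace, baseline_ns=2080, threshold_ns=10))
# [(3, 5, -1), (8, 9, 1)]
```

A fixed baseline and threshold keep the sketch short; real recordings drift, so a practical analysis would need to estimate the baseline from the data.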

slide 241:

NANOENGINEERING OF DNA

In “classical” genetic engineering, the sequence of DNA is deliberately altered in order to generate new combinations of genetic information. Even when major rearrangements are made, the DNA must remain a base-paired, double-stranded helix with an overall linear structure in order to function as genetic information. In nanoengineering, the objective is to build structures using DNA merely as structural material rather than to manipulate genetic information. DNA is attractive because the double helix is a convenient structural module. Moreover, its natural base-pairing properties can be used to link separate DNA molecules together.

However, a critical requirement for assembling 3D structures is branched DNA. Although branched structures do form in biological situations, especially the Holliday junction involved in crossing over during recombination, they are not permanent or stable. Mixing four carefully designed single strands with different sequences can generate cross-shaped DNA. Each strand base-pairs with two of the other strands over half its length (Fig. 7.19). If sticky ends are included in the initial strands, it is possible to link the crosses together into a two-dimensional matrix. The nicks can be sealed by DNA ligase if desired.

The principles used in branching can be extended to three dimensions, and it is possible to build cubical DNA lattices. The DNA double helix is about 2 nm wide with a helical pitch of about 3.5 nm. Hence it can be used to build nanoscale frameworks. These frameworks can be used for the assembly of other components such as metallic nanowires or nanocircuits. Note that while DNA is flexible over longer distances, it is relatively rigid over nanoscale lengths up to about 50 nm.

The cross-shaped DNA molecules and their 3D counterparts have the drawback that the junctions are flexible and do not maintain rigid 90-degree angles.
Rigid DNA components have been made by using double-crossover (DX) DNA molecules. Two isomers of antiparallel DX DNA exist, with an odd (DAO) or even (DAE) number of half-turns between the crossover points (Fig. 7.20). Double-crossover molecules with parallel strands also exist, but they behave poorly from a structural viewpoint. DAE or DAO units can be assembled into a rigid array by providing appropriate sticky ends. It is possible to replace the central short DNA strand of the DAE structure with a longer protruding strand of DNA (DAE+J). This allows assembly of branched structures.

What is the purpose of building arrays and 3D structures from DNA? Perhaps the most plausible purpose suggested so far is to use the DNA as the framework for assembling nanoscale electronic circuits. So far, normal unbranched DNA has been used as a scaffold to create linear metallic nanowires. Various metals (gold, silver, copper, palladium, platinum) have been used to coat the DNA, and diameters range from 100 nm down to 3 nm. Eventually it should be possible to put together these two approaches and build circuits from metal-covered 3D DNA structures.

FIGURE 7.18 Natural and Modified Gramicidin Ion Channels
Gramicidin forms a channel for Na⁺ and K⁺ ions. (A) Natural gramicidin channels are formed when two gramicidin molecules align within a membrane. (B) Two gramicidin monomers joined by a photosensitive linker. (C) Absorption of light changes the conformation of the N=N bond (red) of the linker from cis to trans and opens or closes the channel. (D) Channels are opened or closed by blocking groups (blue circles) attached by photosensitive linkers to each gramicidin monomer. [Figure labels: Na⁺ and K⁺ ions; light (hν₁, hν₂).]

DNA may be viewed solely as a structural molecule. Three-dimensional frameworks may be built from DNA whose sequence is designed to generate branched structures. Such DNA structures may be used as nanoscale scaffolds for metallic nanowires and circuits.
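The design rule for the cross-shaped junction described earlier in this section, in which each of four strands base-pairs with two of the others over half its length, can be checked with a short script. This is a sketch under stated assumptions: the arm sequences are arbitrary hypothetical examples (not those of Fig. 7.19), all arms have equal length, and the script checks only Watson-Crick complementarity, not whether the junction is stable or immobile.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def four_arm_junction(arms):
    """Build one strand per arm.

    Each arm sequence is read 5' to 3' toward the branch point. Strand i runs
    into the junction along arm i and leaves along the next arm, pairing with it.
    """
    count = len(arms)
    return [arms[i] + reverse_complement(arms[(i + 1) % count]) for i in range(count)]


def halves(strand):
    """Split a strand into its 5' half and its 3' half."""
    middle = len(strand) // 2
    return strand[:middle], strand[middle:]


# Hypothetical arm sequences of equal length, for illustration only.
arms = ["GCACGAGT", "TGATACCG", "CTTGACAG", "CGAATGCG"]
strands = four_arm_junction(arms)

for i, strand in enumerate(strands):
    five_prime_half, _ = halves(strand)
    _, neighbor_three_prime_half = halves(strands[i - 1])
    # The 5' half of each strand must pair with the 3' half of the previous strand.
    assert five_prime_half == reverse_complement(neighbor_three_prime_half)
    print(f"strand {i + 1}: {strand}")
```

Sticky ends for building a lattice could be modeled the same way, by appending short overhangs whose reverse complements appear on the partner arm.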

slide 242:

FIGURE 7.20 Rigid DNA Nanomodules
Arrays may be assembled from double-crossover (DX) molecules. (A) DAE and DAO are two antiparallel DX isomers. DAE+J is a DAE molecule in which an extra junction replaces the nick in the green strand of DAE. (B) Two-dimensional array derived from DX molecules. Complementary sticky ends are depicted by complementary geometrical shapes. In the array, A is a conventional DX molecule, but B is a DX+J molecule with a vertically protruding DNA hairpin (black circle). From Seeman NC (1999). DNA engineering and its application to nanotechnology. Trends Biotechnol 17, 437. Reprinted with permission.

FIGURE 7.19 Branched DNA from Four Single Strands
(A) A branched DNA molecule with four arms. Four different color-coded strands combine to produce four arms (I, II, III, and IV). The branch point of this molecule is fixed. (B) Formation of a two-dimensional lattice from a four-arm junction with sticky ends. X and Y are sticky ends, and X′ and Y′ are their complements. Four monomers are complexed in parallel orientation to yield the lattice structure. DNA ligase can seal the nicks left in the lattice. From Seeman NC (1999). DNA engineering and its application to nanotechnology. Trends Biotechnol 17, 437. Reprinted with permission.

DNA ORIGAMI

Building nanostructures by assembling multiple different DNA molecules becomes extremely difficult beyond a certain level of complexity. The DNA origami approach greatly simplifies building DNA nanostructures by using one very long DNA strand and folding it up to form a scaffold. A number of much shorter “staple strands” are added in excess to help folding. The staple strands bind at specific sites along the longer scaffold strand to drive folding (Fig. 7.21).

With this approach, it is no longer necessary to strictly control the ratio of different DNA strands, as it is for the “traditional” DNA assembly described in the previous section (see especially Fig. 7.19). Assembly is much faster and yields are much higher with the origami approach.

Except in very simple cases, DNA origami relies on computer-aided design, in particular for specifying the DNA sequences required for building the chosen shapes. Figure 7.22 illustrates the procedure for building a complex structure by using this approach. The traditional approach for such structures would take weeks and involve synthesizing and purifying multiple long strands. These must then be assembled in the correct proportions and in the correct order. DNA origami, in contrast, requires one scaffold strand plus a roughly 10-fold excess of the staple strands. These strands are all mixed together, heated, and then cooled slowly to anneal. This process takes only a few hours.
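The idea that a staple strand binds at specific sites along the scaffold can be sketched in a few lines. The toy function below builds a staple whose two halves are complementary to two separate scaffold regions, so that binding would draw those regions together. It is a simplification under stated assumptions: the scaffold sequence, region positions, and function names are hypothetical, and the sketch ignores helical geometry, crossover placement, and the order in which real design software joins the halves.

```python
# Toy sketch of staple design for DNA origami.
# Hypothetical names and sequences; real design tools also account
# for helix geometry and crossover positions, which are ignored here.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with seq, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_staple(scaffold: str, first: int, second: int, length: int) -> str:
    """Build a staple that pairs with two scaffold regions of equal length.

    The first half pairs with scaffold[first:first+length] and the second
    half pairs with scaffold[second:second+length]. The half order is an
    arbitrary convention chosen for this sketch.
    """
    region_a = scaffold[first:first + length]
    region_b = scaffold[second:second + length]
    if len(region_a) != length or len(region_b) != length:
        raise ValueError("regions must lie entirely within the scaffold")
    return reverse_complement(region_a) + reverse_complement(region_b)


# Invented 30-base "scaffold"; real scaffolds are thousands of bases long.
SCAFFOLD = "ATGCGTACCTGAAGTCCGATAGGCTTACGA"

if __name__ == "__main__":
    staple = design_staple(SCAFFOLD, first=0, second=20, length=8)
    print("staple:", staple)
```

Because each staple is defined only by the scaffold regions it must bind, the full staple set follows from the chosen fold. This is one way to see why the design step is normally left to software.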

slide 243:

DNA MECHANICAL NANODEVICES

A rather more futuristic use for 3D DNA structures is as frameworks for mechanical nanodevices. The essential components are moving parts of some kind. Several prototype “DNA machines” that illustrate the concept have been designed or constructed. They all use reversible changes in the conformation of a DNA structure, driven by changes in base pairing. Such changes may be caused either by changing the physical conditions (heat, salt, etc.) or by adding segments of single-stranded DNA (ssDNA) that base-pair to some region of the DNA machine, as illustrated in Figure 7.23. If ssDNA is used, then another single strand, complementary to the first, is added to convert the machine back to its original conformation. The result is a mechanical cycle that could, in principle, be used to perform some sort of task. The ssDNA molecules may be regarded as “fuel,” and the final waste product is a double-stranded DNA consisting of the two paired ssDNA fuel elements. Note that this scheme does not involve breaking covalent chemical bonds. It is thus not an enzymatic reaction and is distinct from using DNA as a deoxyribozyme, as described in Chapter 5.

CONTROLLED DENATURATION OF DNA BY GOLD NANOPARTICLES

DNA hybridization is widely used to detect target sequences, both in the laboratory and in clinical diagnosis. Before hybridization can occur, the DNA double helix must be denatured into single strands. This is normally accomplished by heating the DNA in bulk. However, newly emerging nanotechnology may allow specific individual DNA molecules to be dissociated when required. Nanoparticles of about 1.4 nm, containing fewer than 100 atoms of gold, are attached to double-stranded DNA. When the structure is exposed to radio waves generated by an alternating magnetic field, the gold acts as an antenna. It absorbs energy and heats the DNA molecule to which it is attached. This melts the DNA double helix and converts it to single strands.

FIGURE 7.21 Principle of DNA Origami
(A) The traditional approach uses multiple strands to build DNA nanostructures. (B) DNA origami uses one long scaffold strand plus several short staple strands that guide folding. From Rothemund PWK (2005). Design of DNA origami. Proc Int Conf Computer-Aided Design (ICCAD), 471–479; figure provided by Paul Rothemund, Computation and Neural Systems, Caltech.

In DNA origami, nanostructures are built by folding up a single very long strand of DNA. Many much smaller staple strands assist in the folding.

DNA has been proposed as a framework for nanomachines. Proof-of-concept prototypes have been constructed.

slide 244:

Heating extends over a zone of about 10 nm, so surrounding molecules are unaffected. The heat is dissipated in less than 50 picoseconds, so the DNA may be rapidly switched between the double- and single-stranded states by turning the magnetic field on and off. The procedure may be applied to dsDNA with two separate single strands (Fig. 7.24) or to stem-and-loop structures formed by folding a single strand of DNA.

Practical applications are several years away. However, because radio waves penetrate living tissue very effectively, it may eventually be possible to control the behavior of individual DNA molecules from outside an organism. Metal antennas of different materials or sizes could be used to tune different DNA molecules to radio waves of different frequencies.

FIGURE 7.22 Steps in Designing DNA Origami
(A) Fill the chosen shape with helices plus the crossovers needed for stability. (B) Convert the design to a single long folded scaffold. (C) Insert staple strands to bind the scaffold into shape. (D) Helical representation. Annotations in the figure: fill the shape with helices and a periodic array of crossovers; raster-fill the helices with a single long scaffold strand; add helper strands to bind the scaffold together; helices approximate the shape within a single turn; 10.6 bases span 3.6 nm; 32 bases make 3 turns; the interhelix gap is 1 nm. From Rothemund PWK (2005). Design of DNA origami. Proc Int Conf Computer-Aided Design (ICCAD), 471–479; figure provided by Paul Rothemund, Computation and Neural Systems, Caltech.

Attachment of a metallic antenna allows radio waves to melt DNA into single strands. It might eventually be possible to control the behavior of DNA from outside an organism.
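The helix dimensions annotated in Figure 7.22 can be turned into a quick calculation. The sketch below assumes that the annotation means one helical turn spans 10.6 bases and 3.6 nm, which is an interpretation of the figure labels rather than something the text states outright. The function names are hypothetical.

```python
# Rough helix arithmetic based on the Figure 7.22 annotations.
# Assumption: one helical turn = 10.6 bases = 3.6 nm.

BASES_PER_TURN = 10.6
NM_PER_TURN = 3.6


def helical_turns(n_bases: float) -> float:
    """Number of helical turns spanned by n_bases."""
    return n_bases / BASES_PER_TURN


def helix_length_nm(n_bases: float) -> float:
    """Approximate length in nanometers of a helix of n_bases."""
    return helical_turns(n_bases) * NM_PER_TURN


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, "bases:", round(helical_turns(n), 2), "turns,",
              round(helix_length_nm(n), 1), "nm")
```

Under this assumption, 32 bases come to very nearly three full turns, which agrees with the "32 bases, 3 turns" annotation in the figure.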

slide 245:

CONTROLLED CHANGE OF PROTEIN SHAPE BY DNA

Allosteric proteins change shape in response to the binding of signal molecules (allosteric effectors) at a specific site. The essence of allosteric control is that the shape change is transmitted throughout the protein and affects the conformation of distant regions of the protein. In allosteric enzymes, binding of an allosteric effector at a distant site alters the conformation of the active site and may change its affinity for the substrate. In this way, some enzymes are switched on and off in response to signal molecules. For example, phosphofructokinase is switched on by the buildup of AMP, which signals that energy is in short supply. The response increases flow into the glycolytic pathway, which increases energy generation. Similarly, many DNA-binding proteins, such as repressors and activators, also change shape on binding small signal molecules.

It is possible to change the shape of a protein artificially by mechanical force.

FIGURE 7.24 Controlling DNA Denaturation by Gold Nanoparticles
One strand of DNA has a gold nanoparticle (Au) attached to one end and a fluorescent dye (F) at the other end. The complementary strand has a biotin tag (B) at one end. The biotin is bound by streptavidin, which anchors that DNA strand to an agarose bead. When the gold absorbs energy, it melts the two strands of DNA apart. The DNA strand with the fluorescent dye is released into the supernatant, and its fluorescence is monitored.

FIGURE 7.23 Prototype DNA Machine
A DNA nanomotor designed by J. J. Li and W. Tan. Successive addition of the complementary DNA strands labeled alpha and beta causes a change in conformation. The DNA nanomotor alternates between a folded quadruplex structure and a double-stranded structure. The nanomotor expands and contracts in a wormlike motion. From Ito Y, Fukusaki E (2004). DNA as a nanomaterial. J Mol Catalysis B: Enzymatic 28, 155–166. Reprinted with permission.

slide 246:

Shape change by mechanical force has been demonstrated by attaching a single-stranded 60-base segment of DNA between the two poles of a protein. Attaching the DNA requires chemical “handles.” These handles are engineered into the target protein by replacing amino acids at appropriate positions with cysteine. The reactive SH group is then used to chemically attach the DNA. Double-helical DNA is much more rigid than ssDNA. Consequently, the addition of the complementary strand generates tension as it binds and creates a double helix.

This approach has been demonstrated in the laboratory of Giovanni Zocchi at UCLA with maltose-binding protein and with an enzyme, guanylate kinase. When maltose-binding protein is stretched, its binding site for maltose opens wider than optimum and the affinity for the sugar decreases. For guanylate kinase (Fig. 7.25), applying tension decreases enzyme activity by lowering the affinity for substrate binding. In this case, releasing the tension by adding DNase to digest the DNA switches the enzyme on again.

Potential applications are far in the future. However, it is possible to imagine biosensors that detect DNA sequences based on this mechanism. In addition, it might be possible to externally control enzymes or other proteins by adding appropriate ssDNA or, of course, RNA.

The shape of a protein may be changed artificially by applying force. This may be demonstrated by attaching DNA strands to the protein. Pairing single-stranded DNA with its complementary strand generates tension and stretches the protein.

FIGURE 7.25 Controlling Protein Shape by DNA
Protein-ssDNA chimera for guanylate kinase from Mycobacterium tuberculosis (PDB structure 1S4Q). The purple attachment points for the molecular spring correspond to the mutations Thr75 → Cys and Arg171 → Cys. (A) Unstretched: a single strand of DNA is attached to the protein. (B) The protein is stretched by addition of the complementary DNA strand (purple). Courtesy of Giovanni Zocchi.
BIOMOLECULAR MOTORS

A major aim of nanotechnology is to develop molecular-scale machinery that can carry out the programmed synthesis or rearrangement of single molecules or even atoms, or perform other similar nanoscale tasks. The term nanoassembler refers to a nanomachine that can build nanoscale structures molecule by molecule or atom by atom. The term nanoreplicator refers to a nanomachine able to build copies of itself when provided with raw materials and energy. This, of course, sounds remarkably like a living cell. Indeed, the organelles of living cells may be regarded as nanomachines and have provided both inspiration and components for nanotechnologists.

To operate, nanomachines will need energy, which will be provided by “molecular motors.” At present, such devices are still in development. It has been suggested that biological structures might be used for this purpose. Examples include the ATP synthase, the flagellar motor of bacterial cells, various enzymes that move along DNA or RNA, and assorted motor proteins of eukaryotic cells.

slide 247:

Several of these systems are presently being investigated in the hope of making usable nanodevices that can be coupled to nanomachines to provide energy and/or moving parts.

The ATP synthase is a rotary motor whose natural role is to generate ATP. It is embedded in the mitochondrial membrane and uses energy from the proton motive force. The ATP synthase takes three steps to complete each rotation, and at each step it makes an ATP. For use in nanotechnology, the F1 subunit would be detached from the membrane and run in reverse; that is, it would be given ATP as fuel and, from a biological perspective, rotate backward.

Kinesin and dynein are motor proteins that use ATP as energy to move along the microtubules of eukaryotic cells. They therefore act as linear step motors (Fig. 7.26). Their natural role is to transport material. Kinesin moves cargo from the center to the periphery of the cell, whereas dynein carries cargo from the periphery to the center. Kinesin takes steps of 8 nanometers and can move at 100 steps per second (approximately 3 mm/hour). Each step consumes one ATP for energy. The microtubules they use as tracks are protein cylinders with an outside diameter of about 25 nm.

In the not so distant future, complex chemical analyses might be carried out on the nanoscale (see Box 7.4).

Proteins that interconvert chemical and mechanical energy have been suggested as possible molecular motors to power future nanomachines.

Box 7.4 From Merely Micro to Truly Nano: Lab-in-a-Cell

Microfluidics, sometimes known as “lab-on-a-chip,” refers to the manipulation of liquid samples at the scale of micrometers. Microfluidic devices are available today and are used to process large numbers of small samples. Applications include DNA or protein analysis of blood samples.

The volumes involved are usually in the microliter range, although some microfluidic devices can use volumes of less than 1 microliter, that is, nanoliter volumes. You might think that this entitles them to be regarded as nanotechnology, but remember that volume scales as the cube of linear measure. Thus a cube with sides of 1 micrometer (10^-6 m) has a volume of 10^-18 cubic meters, or 10^-15 liters (one femtoliter). A nanoliter (10^-9 liters) is the volume of a cube with sides of 100 micrometers. So handling nanoliters is not nanotechnology.

Future prospects include scaling down liquid sample processing to the true nanoscale: “lab-in-a-cell.” This would involve a microchip platform that uses modified single cells as analytical devices. This idea is still in the conceptual stage, but given the rapid progress in nanotechnology, it may not be so far in the future.

FIGURE 7.26 Kinesin Linear Motor on Microtubules
Kinesin consists of light and heavy chains. The light chains bind to kinesin receptors on vesicles that are to be transported. The heavy chains each include motor domains that use ATP as energy to move kinesin, plus the attached cargo, along the surface of a microtubule. Labels in the figure: cargo, kinesin receptor, kinesin light chain, kinesin heavy chain, C-terminal tail, coiled domain, motor domain, microtubule.
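Two pieces of arithmetic on this page are easy to check with a few lines of code: the kinesin speed quoted in the text, and the cube volumes used in Box 7.4. The sketch below uses only the figures given above (8 nm steps, 100 steps per second, cubes with sides of 1 and 100 micrometers). The function names are hypothetical.

```python
# Unit-conversion sketch for the kinesin speed and the Box 7.4 volumes.
# Function names are illustrative; the input figures come from the text.


def kinesin_speed_mm_per_hour(step_nm: float = 8.0,
                              steps_per_second: float = 100.0) -> float:
    """Distance covered in one hour, converted from nanometers to millimeters."""
    nm_per_hour = step_nm * steps_per_second * 3600.0
    return nm_per_hour / 1e6  # 1 mm = 1e6 nm


def cube_volume_liters(side_m: float) -> float:
    """Volume of a cube with the given side length in meters, in liters."""
    return side_m ** 3 * 1000.0  # 1 cubic meter = 1000 liters


if __name__ == "__main__":
    print("kinesin speed (mm/hour):", kinesin_speed_mm_per_hour())
    print("1 micrometer cube (liters):", cube_volume_liters(1e-6))
    print("100 micrometer cube (liters):", cube_volume_liters(100e-6))
```

The kinesin figure works out to 2.88 mm/hour, consistent with the "approximately 3 mm/hour" in the text. The two cubes come to one femtoliter and one nanoliter, as stated in Box 7.4.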

slide 248:

Summary

Many techniques from nanotechnology are now being applied to biological systems. Conversely, biological macromolecules and structures are being used in nanotechnology. Scanning probe microscopes have been used to visualize single molecules of biological importance. Individual bacteria and viruses can be detected and weighed by nanoscale devices. An ever-increasing variety of nanoparticles is already in use for biological labeling and various other analytical purposes. Nanoparticles are also being developed for clinical use and can be used to deliver drugs or to kill cancer cells by localized heating. On the other hand, microorganisms are capable of assembling nanocrystals from inorganic compounds. More complex nanodevices made from protein or DNA components are being assembled and their properties investigated. In particular, DNA is being used to build frameworks for the assembly of nanoscale structures. Some nanoscale sensors and motors are based on biological models and are being built at least partly from biological components.

End-of-Chapter Questions

1. What is nanotechnology?
   a. the individual manipulation of molecules and atoms to create materials with novel or improved properties
   b. the creation of new terms to describe very small, almost unimaginable particles in physics
   c. the term used to describe the size of cellular components
   d. the transition of molecular biology into the physical sciences
   e. none of the above

2. Which property is measured with a scanning probe microscope?
   a. magnetism
   b. electric resistance
   c. light absorption
   d. temperature
   e. all of the above

3. What is considered a weakness of scanning tunneling microscopy (STM)?
   a. the inability to move and arrange atoms to create a design
   b. the possibility of destroying the surface with the metal tip on the microscope
   c. the requirement for a conducting surface to work properly
   d. the inability to apply this technology to biology
   e. all of the above

4. What is an atomic force microscope?
   a. The AFM detects the force between molecular bonds in an object.
   b. The AFM detects atoms or molecules by scanning the surface.
   c. The AFM uses photons to predict the structures present on any surface.
   d. The AFM detects atoms or molecules on a conducting surface.
   e. none of the above

slide 249:

5. Which principle is utilized to weigh a single bacterial cell or virus particle?
   a. Oscillation frequency is dependent upon the mass applied.
   b. It is impossible to weigh a single cell or particle.
   c. Oscillation frequency affects the amount of light reflection.
   d. Scanning electron microscopy can identify the length and width of a cell, which can further be converted to mass.
   e. none of the above

6. What is a potential use of nanoparticles in the field of biology?
   a. delivery of pharmaceuticals or genetic material
   b. tumor destruction
   c. fluorescent labeling
   d. detection of microorganisms or proteins
   e. all of the above

7. What is an advantage to using complex multilayered nanocrystals over fluorescent dyes?
   a. They do not bleach during excitation because they have broad absorption peaks.
   b. Nanocrystals are often brighter than fluorescent dyes.
   c. The emission maximum of nanocrystals can be controlled by adjusting crystal size.
   d. Nanostructures are longer-lived than fluorescent dyes.
   e. All of the above are advantages.

8. Why is chitin the most popular material to construct nanoshells?
   a. Chitin is easy to synthesize.
   b. Chitin has properties that enable it to bind strongly to DNA, RNA, and other small molecules.
   c. Chitin is stable and easy to store at room temperature.
   d. Chitin is naturally derived and biodegradable.
   e. Chitin is easier to manipulate than the alternative for the creation of nanoshells.

9. How can nanoparticles be used to treat cancer?
   a. Nanotubes can create pores in the cancer cells, thus leaking out the cellular components and killing the cell.
   b. Some nanoparticles can bind to specific enzymes in cancer cell metabolism to block reactions.
   c. Nanoparticles can be designed to absorb radiant energy in the IR spectrum, which produces heat that destroys only the cancer cells because living tissue does not absorb IR energy.
   d. Nanoparticles can recruit immune system components directly to the cancer cells.
   e. All of the above are uses.

10. What characteristic of bacteriophage M13 makes it ideal for synthesizing nanowires?
   a. M13 accumulates certain nanoparticle building blocks in high concentrations.
   b. M13 phage is easily manipulated in the laboratory to secrete peptides that nanowires can be assembled upon.
   c. Nanowires can be constructed directly on M13 capsid proteins without any further modifications.
   d. M13 is filamentous.
   e. Nanowires are usually created in bacterial systems, not viral systems.

slide 250:

11. What purpose could nanotubes serve in biotechnology?
   a. as a metallic conductor or semiconductor
   b. for the creation of components of electronic equipment
   c. for attachment of biomolecules, including enzymes, hormone receptors, and antibodies
   d. for the detection of a specific molecule in a sample such as blood
   e. all of the above

12. Why do nanocarpets have antibacterial activity?
   a. The long-chain amino compounds in the nanocarpet act like a detergent and disrupt the cell membrane.
   b. The nanocarpet tubes act as spears and shear the bacterial cells.
   c. The nanocarpet binds to bacterial cells and blocks the uptake of nutrients.
   d. The nanocarpet immobilizes the bacterial cell so that the cells can be targeted by treatment with UV light.
   e. Nanocarpets can act as biosensors so that people can treat the area with antibacterial agents.

13. Which of the following is a structure that can be created by nanoengineering of DNA?
   a. cubical structures
   b. nanoscale scaffolds for circuits and nanowires
   c. frameworks for mechanical nanodevices
   d. cross-shaped DNA to create 2D matrices
   e. all of the above

14. How might the behavior of individual DNA molecules be controlled from outside the body?
   a. exposure to UV light
   b. attachment of a metallic antenna, allowing DNA to be melted with radio waves
   c. addition of fluorescent tags
   d. using an electrical current to align the DNA molecules
   e. none of the above

15. Which cellular component is considered to be a nanoassembler?
   a. chromatin
   b. lipids
   c. ribosomes
   d. DNA
   e. mRNA

16. Biosynthesis of fluorescent nanocrystals by Escherichia coli occurs when the bacteria are exposed to __________ and __________.
   a. green fluorescent protein; cadmium chloride
   b. cadmium chloride; sodium sulfide
   c. sodium sulfide; potassium tellurite
   d. cadmium sulfide; potassium tellurite
   e. cadmium chloride; potassium tellurite

17. DNA origami __________.
   a. involves one long DNA strand and several “staple strands”
   b. produces antibacterial nanocarpets
   c. produces gold antenna to aid in the denaturation of DNA
   d. uses a bacteriophage to produce nanotubes
   e. produces a nanoscale for the mass measurements of single atoms

slide 251:

18. Addition of __________ improves the yield of biosynthesized nanocrystals.
   a. cadmium sulfide
   b. fluorophores
   c. potassium tellurite
   d. glutathione
   e. cadmium sulfide

Further Reading

Bartczak, D., Muskens, O. L., Sanchez-Elsner, T., Kanaras, A. G., & Millar, T. M. (2013). Manipulation of in vitro angiogenesis using peptide-coated gold nanoparticles. ACS Nano 7(6), 5628–5636.
Billingsley, D. J., Bonass, W. A., Crampton, N., Kirkham, J., & Thomson, N. H. (2012). Single-molecule studies of DNA transcription using atomic force microscopy. Physical Biology 9(2), 021001.
Bronstein, L. M. (2011). Virus-based nanoparticles with inorganic cargo: what does the future hold? Small 7(12), 1609–1618.
Choi, B., Zocchi, G., Wu, Y., Chan, S., & Jeanne Perry, L. (2005). Allosteric control through mechanical tension. Physical Review Letters 95, 078102.
Garcia-Fuentes, M., & Alonso, M. J. (2012). Chitosan-based drug nanocarriers: where do we stand? Journal of Controlled Release: Official Journal of the Controlled Release Society 161(2), 496–504.
Gu, L. Q., & Shim, J. W. (2010). Single molecule sensing by nanopores and nanopore devices. Analyst 135(3), 441–451.
Hess, H., Bachand, G. D., & Vogel, V. (2004). Powering nanodevices with biomolecular motors. Chemistry - A European Journal 10, 2110–2116.
Ilic, B., Yang, Y., & Craighead, H. G. (2004). Virus detection using nanoelectromechanical devices. Applied Physics Letters 85, 27.
Kalle, W., & Strappe, P. (2012). Atomic force microscopy on chromosomes, chromatin and DNA: a review. Micron 43(12), 1224–1231.
Li, B. S., Wei, B., & Goh, M. C. (2012). Direct visualization of the formation of RecA/dsDNA complexes at the single-molecule level. Micron 43(10), 1073–1075.
Monrás, J. P., Díaz, V., Bravo, D., Montes, R. A., Chasteen, T. G., Osorio-Román, I. O., Vásquez, C. C., & Pérez-Donoso, J. M. (2012). Enhanced glutathione content allows the in vivo synthesis of fluorescent CdTe nanoparticles by Escherichia coli. PLoS One 7(11), e48657.
Papazoglou, E. S., & Parthasarathy, A. (2007). BioNanotechnology (Synthesis Lectures on Biomedical Engineering). San Rafael, CA: Morgan and Claypool Publishers.
Saccà, B., & Niemeyer, C. M. (2012). DNA origami: the art of folding DNA. Angewandte Chemie (International ed. in English) 51(1), 58–66.
Simmel, F. C. (2012). DNA-based assembly lines and nanofactories. Current Opinion in Biotechnology 23(4), 516–521.
Wahajuddin, A. S. (2012). Superparamagnetic iron oxide nanoparticles: magnetic nanoplatforms as drug carriers. International Journal of Nanomedicine 7, 3445–3471.

slide 252:

CHAPTER 8
Genomics and Gene Expression

Introduction
Genetic Mapping Techniques
Gaps Remain in the Human Genome
Survey of the Human Genome
Noncoding Components of the Human Genome
Bioinformatics and Computer Analysis
Medicine and Genomics
DNA Accumulates Mutations over Time
Genetic Evolution
From Pharmacology to Pharmacogenetics
Gene Expression and Microarrays
Making DNA Microarrays
cDNA Microarrays
Oligonucleotide Microarrays
Hybridization on DNA Microarrays
Monitoring Gene Expression Using Whole-Genome Tiling Arrays
Monitoring Gene Expression by RNA-Seq
Monitoring Gene Expression of Single Genes
Epigenetics and Epigenomics
Epigenomics in Higher Organisms

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00008-9

slide 253:

Genomics and Gene Expression 250 INTRODUCTION The frst working draft of the human genome sequence was announced in June 2000. The sequence was refned and a fnal high-quality sequence was fnished in April 2003 and pub - lished in 2004. In the decade since the completion of the human genome many advances in technology have increased the speed and accuracy with which an organism’s entire DNA sequence is determined. Rather than a multiyear multibillion dollar investment today’s technology allows an entire human genome to be determined for a few thousand dollars and fnished in a few days’ time. Scientists have now turned to the interpretation of genomic sequences. As of writing just shy of 13000 different genomes from all three domains of life have been sequenced com- pletely and another 27000 are partially sequenced. Although much sequence information is known there is still a disconnection between the sequence data and its functional interpreta- tion. For the human genome the exact number of genes is still enigmatic with the current count at 20805 protein coding genes but there are also short noncoding genes long non- coding genes pseudogenes and various alternately spliced genes that are not included in this tally. Although the general public expected that knowing the entire genome sequence would clarify genome function initial analyses have only added to the complexity. For example some areas of the genome previously thought to serve no function and dubbed “junk DNA” are now known to provide regulatory functions for gene expression a complexity that was not anticipated at the onset of the Human Genome Project. Some of the offshoots from the Human Genome Project are listed in Table 8.1. These proj- ects fall into two categories: understanding variation within and among different genomes and association of key sequences with a particular function or dysfunction. 
For the frst category although a reference genome has been decoded each person has 4 to 5 million Human Genome Project and Related Initiatives Name of Project Objective Website Human Genome Project Map and sequence the entire human genome http://genome.ucsc.edu/ Genome Reference Consortium Close remaining gaps in the human genome assembly http://www.ncbi.nlm.nih.gov/ projects/genome/assembly/ grc/ 1000 Genomes Project Sequence 1000 entire human genomes for comparisons http://www.1000genomes.org/ International HapMap Project Identify haplotype maps of human chromosomes to identify sequence variation http://hapmap.ncbi.nlm.nih.gov ENCODE project Encyclopedia of DNA Elements Identify all functional elements such as transcription factor binding sites in the human genome http://www.genome.gov/ Encode/ OMIM Online Mendelian Inheritance in Man Compile genes and their phenotypic effects http://www.ncbi.nlm.nih.gov/ omim Human Epigenome Project Analyze DNA methylation and other epigenetic modifcations http://www.epigenome.org/ Human Microbiome Project Identify the microbial inhabitants of the human body http://commonfund.nih.gov/ hmp/index T able 8.1

slide 254:

Chap TER 8 251 differences within his or her genome in comparison with other individuals and/or the reference genome. The 1000 Genomes Project aims to sequence over 1000 different human genomes and compile the different variations. The HapMap Project is also investigating variation among individual human genomes. For the second category the Online Mendelian Inheritance in Man is a list of known phenotypes and the genes responsible. The Human Epigenome Project is focused on identifying DNA methylation patterns in different tissues and different people. Finally the Human Microbiome Project is working to identify all the different microbial inhabitants of the human digestive tract skin mouth nose and female urogenital tract. The idea for this project is to determine whether or not these microorgan- isms are contributing to or protecting from disease and also to see if differences in the microbiota infuence nutrition. This chapter aims to provide a historical perspective on how the human genome was frst deciphered and then discuss how this information is being applied to study genomic variation and gene expression. Before the advent of next-generation sequencing technol- ogy the assembly of the frst human genome sequence was a monumental undertaking. The amount of sequence data to be generated was much larger than any other dataset in the bio- logical sciences. The original plan for sequencing the genome was to create random segments of the genome sequence each piece and then use the sequence overlap to assemble the pieces into contig maps which order the clones into longer linear segments without any gaps Fig. 8.1. The project frst mapped large fragments of human DNA carried by YACs and BACs to their respective chromosomal locations. The mapping was time consuming but necessary to order the sequence data. At the time of startup computers were unable to order more sequence data than found in large chromosomal fragments. 
During the 1990s computing power increased so rapidly that the mapping became less nec- essary. In 1998 Celera Genomics led by Craig Venter decided to sequence the entire human genome faster and cheaper. Celera Genomics proved its point by sequencing the entire 180 Mb genome of the fruit fy Drosophila between May and December of 1999. Celera used the shotgun sequencing Fig. 8.2 approach which most researchers thought would not work for such large genomes. Venter sequenced many small fragments of DNA and entered the data into the computer which then assembled the information into a working draft. Venter was able to sequence the human genome fast largely because of the increase in computer power. It was possible to use computers for most of the genome although certain problem- atical regions needed some genetic mapping for correct ordering. With the emergence of next-generation sequencing whole genomes may be sequenced in less than a week but the data generally consist of signifcantly shorter reads than with Sanger sequencing. To make up for the short read length the number of reactions in a single sequencing run is huge numbering in the millions. Depth of coverage is the num- ber of individual reads for each section of the genome. Shorter reads are much harder to align by computer so instead of de novo genome assembly next-gen sequencing data are often compared to a reference genome that was partially determined by Sanger sequencing or entirely determined by Sanger sequencing such as the Human Genome Project. Even with next-gen sequencing most genome assembly would be impossible without Sanger sequencing and the genetic and physical mapping techniques developed by the Human Genome Project. FIGURE 8.1 Contig Mapping Small clones have regions that overlap with each other. Ordering the small clones into one sequence forms a contig map. Contig Clone 2 Clone 1 Clone 3 Shotgun sequencing was the technique used to sequence the whole human genome. 
Each library clone is randomly sequenced and computer analysis orders the clones by identifying overlapping regions.
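
The overlap-based ordering described above can be illustrated with a minimal Python sketch. It merges short reads into contigs by repeatedly joining the pair with the longest suffix-to-prefix overlap, and it computes the average depth of coverage. The function names, the toy reads, and the greedy strategy are assumptions made for illustration; this is not the software used by Celera or the Human Genome Project, and real assemblers must also cope with sequencing errors, repeats, and both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads into contigs, always joining the best-overlapping pair."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining pieces are separated by gaps
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


def mean_depth(reads, genome_length):
    """Average number of reads covering each base of the genome."""
    return sum(len(r) for r in reads) / genome_length


# Hypothetical reads taken from a 15-base toy "genome".
toy_reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
contigs = greedy_assemble(toy_reads)
depth = mean_depth(toy_reads, genome_length=15)
```

When two groups of reads share no overlap, the function returns more than one contig, which corresponds to the gaps shown in Fig. 8.2.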

slide 255:

GENETIC MAPPING TECHNIQUES

Genome maps provide a linear series of landmarks for use when putting together sequence data. There are two different methods for creating genome maps: genetic mapping and physical mapping. Genetic mapping provides information through mating, pedigree analyses, or gene transfer experiments. Genetic maps, also called linkage maps, are based on the relative order of genetic markers revealed by mating experiments of various kinds. Consequently, the actual distance in base pairs between the markers is hard to determine. In contrast, physical maps give the distance between markers in base pairs because the markers are physically associated with a location on a chromosome. Physical location can be determined by radiation hybrid experiments or fluorescence in situ hybridization (FISH). The location along the chromosome is useful to determine the correct order of contigs without having to compare sequence. This is especially useful for regions with lots of repetitive sequences. The various approaches to mapping the human genome are summarized in Table 8.2.

Traditional genetic maps are based on linkage, that is, the likelihood that two mapping features segregate together after mating. Linkage is a direct result of location; that is, if two markers are far apart or on separate chromosomes, then the likelihood that they segregate together is low. Since eukaryotic cells undergo recombination, even two markers that are on the same chromosome can segregate among progeny after mating, but the closer the two markers are, the less likely they will be separated by recombination. The percentage of progeny in which two markers are found together is therefore used to ascertain relative distances along a chromosome.

Genetic maps use landmarks called markers. Many different types of markers can be used, and some markers can be used for both physical mapping and genetic mapping.
Genes with a specific phenotype are very useful markers for mapping. Unfortunately, not enough genes with visible phenotypes exist to provide a detailed map for the human genome. Genes are usually found in two or more different forms. For example, a gene for plant height may yield a tall plant if one form of the gene is present or a short plant if a different form of the same gene is present. The different forms, called alleles, have various sequence alterations that give different phenotypes to the plant. When multiple alleles are present a population

FIGURE 8.2 Shotgun Sequencing. The first step in shotgun sequencing an entire genome is to digest the genome into many small fragments that are cloned and sequenced individually. Computers analyze the sequence data for overlapping regions and assemble the sequences into several large contigs. Because some regions of the genome are unstable when cloned, some gaps may remain even after this procedure is repeated several times. Steps shown: break up DNA (original genome into many small fragments); clone and sequence many small fragments; line up sequences and find overlap by computer (contig 1, gap, contig 2).

TABLE 8.2 Genetic versus Physical Mapping Techniques

Type of mapping: Genetic
Markers used during mapping: gene, biochemical trait, DNA markers (RFLPs, VNTRs, microsatellites, SNPs)
Methods to locate markers in genome:
• Linkage analysis using crosses or matings
• Analysis of human family pedigrees

Type of mapping: Physical
Markers used during mapping: sequence tagged sites, expressed sequence tags, VNTRs, microsatellites
Methods to locate markers in genome:
• Restriction enzyme mapping
• Radiation hybrid mapping
• FISH
• Cytogenetic mapping

slide 256:

is considered as polymorphic. More than just genes can be polymorphic, and many DNA sequence features show polymorphism (see later discussion). For the Human Genome Project, determining the gene order along the chromosomes was not accomplished by experimental mating, which is considered unethical in humans. Instead, pedigree analysis determines if different genes segregated together in a family.

Other genetic markers used in mapping a genome show DNA sequence differences like the different alleles of a gene, but these markers do not necessarily associate with a gene or affect the phenotype of an organism. One of the most widely used is the restriction fragment length polymorphism (RFLP). RFLPs are differences in the restriction enzyme recognition sequence, so some members of the population have the restriction enzyme site and other members do not. The different versions of the RFLP are identified using agarose gel electrophoresis. RFLPs are used because they are easy to identify. For small genomes such as yeast, monitoring the frequency of recombination between two RFLP markers is easy. Diploid yeast cells undergo meiosis and form four haploid cells called a tetrad. Each of these haploid cells can be isolated, grown into many identical clones, and examined. Thus each RFLP marker can be followed easily from one generation to the next. In humans, following such markers is more challenging, but studies on groups of closely related people, such as large families or small cultures like the Amish, have allowed some RFLPs to be followed in this manner (Fig. 8.3).

Another marker used in genetic mapping is the variable number tandem repeat or VNTR (Fig. 8.4). Sometimes these are called minisatellites. These sequence motifs occur naturally in the genome and consist of tandem repeats of 9 to 80 base pairs in length. The number of repeats differs from one person to the next; therefore these can be used as specific markers on a genetic map.
They are also used to identify individuals in forensic medicine or paternity testing. Some repeats are found in multiple locations throughout the genome and therefore cannot be used for making genetic maps, but other repeat sequences are found only in one unique location, making them very useful for mapping experiments.

A fourth type of marker is the microsatellite polymorphism, which is also a tandem repeat. However, unlike VNTRs, microsatellites are repeats of 2 to 5 base pairs in length. In animals they often consist largely of cytosine and adenine on one strand, hence mostly G and T on the complementary strand. Plants seem to have less microsatellite base composition bias.

Another type of marker is the single nucleotide polymorphism or SNP, pronounced “snip” (Fig. 8.5). SNPs are individual substitutions of a single nucleotide that do not affect the length of the DNA sequence. These changes can be found within genes, in regulatory regions, or in noncoding DNA. When found within the coding regions of genes, SNPs may alter the amino acid sequence of the protein. This in turn may affect protein function

FIGURE 8.3 RFLPs of Family Members. The mother (M) and father (F) of this family have a difference in the sequence of their DNA. In the mother, the difference adds a restriction enzyme site for EcoRI. The children (S1, S2, D1, and D2) have inherited one or the other fragment from their parents. Gel lanes shown: M, F, S1, S2, D1, D2; fragment sizes marked: 1000 and 500.

FIGURE 8.4 Tandem Repeat of 12 Base Pairs. This individual has only four repeats of this 12 base-pair sequence; the sequence for only one strand of the DNA is shown. Other people may have more or fewer repeats. Sequence shown: 5′ GTACTAGACTTA GTACTAGACTTA GTACTAGACTTA GTACTAGACTTA 3′

FIGURE 8.5 SNP. The same DNA segment from two different individuals has a single nucleotide difference, that is, an SNP. Such changes are common when comparing DNA sequences between individual people. Sequences shown: 5′ AAGGTAT 3′ and 5′ AAGCTAT 3′
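
The three marker types in Figures 8.3 to 8.5 can be made concrete with a short Python sketch. It counts tandem repeats, lists SNP positions between two aligned sequences, and predicts fragment lengths after digestion at a restriction site. The function names and the toy sequences are assumptions made for illustration. The EcoRI recognition site GAATTC, cut after the first G, is standard, but a real digest is read from a gel rather than computed.

```python
import re


def count_tandem_repeats(seq, unit):
    """Largest number of back-to-back copies of unit found anywhere in seq."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


def snp_positions(seq_a, seq_b):
    """0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


def digest(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting at every occurrence of site.

    cut_offset is the cut position inside the site (1 for EcoRI, G^AATTC).
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# VNTR from Fig. 8.4: four copies of a 12 base-pair unit.
vntr_copies = count_tandem_repeats("GTACTAGACTTA" * 4, "GTACTAGACTTA")

# SNP from Fig. 8.5: the two sequences differ at one position.
snps = snp_positions("AAGGTAT", "AAGCTAT")

# RFLP: a hypothetical allele with an EcoRI site and one without.
with_site = digest("A" * 10 + "GAATTC" + "T" * 10)
without_site = digest("A" * 10 + "GAGTTC" + "T" * 10)
```

An allele that carries the site gives two fragments, whereas the allele lacking it gives one longer fragment. That length difference is what gel electrophoresis detects as an RFLP.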

slide 257:

i.e., the SNP corresponds to a difference in an actual gene. If a SNP correlates with a genetic disease, identifying that SNP may diagnose the disease before symptoms appear. When a SNP falls within a restriction enzyme site, it coincides with an RFLP.

Markers such as SNPs, VNTRs, RFLPs, and microsatellites are also used in physical mapping techniques such as restriction enzyme mapping, FISH, or radiation hybrid mapping. These are useful, but for large genomes like the human genome they still do not provide enough markers. The map builders needed other types of markers, such as sequence tagged sites (STSs; Fig. 8.6). These sites are simply short sequences of 100 to 500 base pairs that are unique and can be detected by PCR. A specialized type of STS is the expressed sequence tag (EST), so called because it was identified in a cDNA library. This means that the EST is expressed as mRNA. These small pieces of sequence data are just portions of larger genes; therefore many different ESTs may be found within one single gene.

Using physical mapping for markers resembles linkage analysis for genes in the sense that the closer the markers are, the more likely they will remain together. One method of physical mapping is to use restriction enzyme digestion (Fig. 8.7). Either entire genomes or single large clones from a library are digested with a variety of different restriction enzymes. Each enzyme will digest the DNA into different-sized fragments, which are then probed for several different STS or EST markers. If two markers are close together, they will often be found on the same restriction fragment, but if they are far apart, they will be on different fragments. The fragment sizes are determined and used to deduce the approximate distance between two markers.
The difference between this information and RFLP analysis is that restriction

FIGURE 8.6 STS and EST Markers on Zebrafish Linkage Map. The relative positions of various markers are shown on the zebrafish map. The markers include STSs and ESTs that were identified and mapped relative to one another. In addition, the positions of real genes and SNPs are shown relative to the others. The linkages were established using meiotic recombination frequencies and are presented in centimorgans (cM). The GAT linkage group 15 map information for this figure was retrieved from the Zebrafish Information Network (ZFIN), the Zebrafish International Resource Center, University of Oregon, Eugene, OR 97403–5274; http://zfin.org/.

Figure content: zebrafish linkage group 15. Color key: brown, STS; blue, gene; green, EST; red, SNP. Markers shown: fa20c11, fa66g10, z3309, kpna4, gof18, z21982, z470, dharma, hsp47, zsnp1265, zsnp1266, zsnp1267, zsnp1268, zsnp1269, zsnp1270, z3760, mtnr1b, z732, chd, fa25h06, z1195, z37, z858, gap43. Map positions marked (cM): 0.0, 8.3, 16.66, 54.16, 65.03, 80.25, 96.2, 106.67, 113.05, 115.27.

slide 258:

mapping can determine the distance between two markers as a number of base pairs, whereas RFLP analysis determines how often two different markers are found together, so the distance it gives is only an estimate based on cosegregation frequency.

Three other physical mapping techniques are radiation hybrid mapping, cytogenetic mapping, and FISH. FISH analysis is described in detail in Chapter 3. The physical location of a particular DNA probe is determined in metaphase chromosomes by its location relative to the banding pattern. This method provides clues for ordering sequence data into one large contig.

DNA probes can sometimes be unreliable because large cloned segments may actually consist of two fragments of DNA from different parts of the genome inserted into the same vector. Radiation hybrid mapping overcomes these limitations by examining STSs or ESTs on original chromosomal fragments (Fig. 8.8). To generate these, scientists treat cultured human cells with X-rays or γ-rays to fragment the chromosomes. The radiation dosage controls how often the chromosome breaks and thus the average length of the fragments. The human cells possess a marker enzyme that allows them to grow on selective media. After irradiation, the human cells are fused to cultured hamster cells using polyethylene glycol or Sendai virus. The hamster cells do not have the selective marker. Consequently, only those hamster cells that fuse with human cells survive. The fragments of human chromosomes become part of the hamster nucleus, and the individual hybrid cell lines can be examined by STS or EST mapping. Because the average fragment length is known, these maps reveal the relative distance between two markers.

Cytogenetic mapping is another physical technique that uses original chromosomes. When chromosomes are placed on microscope slides and stained, they form banding patterns that are visible under a light microscope.
This cytogenetic map shows where a gene or marker lies relative to the stained bands (Fig. 8.9). Cytogenetic maps are very low resolution compared with the other mapping techniques, yet they are useful to compare gene locations on a large scale.

FIGURE 8.7 STS Mapping Using Restriction Enzyme Digests. STS mapping is shown for four STS sites on a single chromosome. Various restriction enzyme digests are performed to cut the chromosome into many different-sized fragments. The number of times two STS sequences are found on the same fragment reveals the proximity of the two markers. In this example, the two purple STSs are found on the same fragment six times and must be close to each other on the chromosome. The two green STSs are found on the same fragment only two times and are therefore farther apart. The purple and green STSs are never found on the same fragment; therefore they must be far apart. Diagram labels: chromosome map with centromere; fragment collection; pair of closely linked markers (6 shared fragments); pair of less closely linked markers (2 shared fragments).
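
The counting logic in Figure 8.7 can be sketched in a few lines of Python. Each restriction fragment is represented as the set of STS markers detected on it, and the sketch counts how often each pair of markers shares a fragment. The marker names (P1, P2, G1, G2) and the fragment collection are hypothetical and were chosen only to reproduce the counts in the figure. A higher count suggests closer markers but does not give a distance in base pairs.

```python
from collections import Counter
from itertools import combinations


def shared_fragment_counts(fragments):
    """Count, for every marker pair, the fragments that carry both markers."""
    counts = Counter()
    for markers in fragments:
        for pair in combinations(sorted(set(markers)), 2):
            counts[pair] += 1
    return counts


# Hypothetical fragment collection: P1/P2 are the purple STSs and
# G1/G2 the green STSs of Fig. 8.7.
fragments = (
    [{"P1", "P2"}] * 6      # purple pair found together six times
    + [{"G1", "G2"}] * 2    # green pair found together twice
    + [{"P1"}, {"G2"}]      # fragments carrying a single marker
)

counts = shared_fragment_counts(fragments)
purple_pair = counts[("P1", "P2")]
green_pair = counts[("G1", "G2")]
purple_green = counts[("G1", "P1")]  # never on the same fragment
```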

slide 259:

FIGURE 8.9 Banding Pattern of X Chromosome. Staining condensed mitotic chromosomes with various DNA-binding dyes forms a distinct banding pattern. The location of a gene or marker can be determined relative to the bands. For example, a gene located at Xp21.1 is on chromosome X, on the p arm, and on band number 21.1. Bands labeled on the X chromosome: Xp22.33, Xp22.32, Xp22.31, Xp22.2, Xp22.13, Xp22.12, Xp22.11, Xp21.3, Xp21.2, Xp21.1, Xp11.4, Xp11.3, Xp11.23, Xp11.22, Xp11.21, Xp11.1, Xq11.1, Xq11.2, Xq12, Xq13.1, Xq13.2, Xq13.3, Xq21.1, Xq21.2, Xq21.31, Xq21.32, Xq21.33, Xq22.1, Xq22.2, Xq22.3, Xq23, Xq24, Xq25, Xq26.1, Xq26.2, Xq26.3, Xq27.1, Xq27.2, Xq27.3, Xq28.

FIGURE 8.8 Radiation Hybrid Mapping. To determine how close STSs and ESTs are to each other, scientists must analyze many large chromosome fragments. Radiation hybrid mapping allows large human chromosome fragments to be inserted into hamster cells. First, the human chromosomes, which carry the thymidine kinase gene (TK+), are fragmented by irradiation. The human cells are then fused with hamster cells, which are TK−. Such hybrid cells should express thymidine kinase and will grow on selective medium. Random loss of human chromosome fragments occurs during this process; therefore each radiation hybrid cell line has a different set of human chromosome fragments, which can be screened for the STSs and ESTs. Steps shown: irradiate TK-positive human cells (chromosomes fragmented); cell fusion with TK-negative hamster cells (donor fragments taken up); select cells that express TK (TK-positive radiation hybrid line).

Determining the order of various markers allows generation of genetic maps. Markers used include RFLPs, SNPs, VNTRs, and microsatellite polymorphisms. Genetic mapping uses linkage analysis of two markers by mating experiments or pedigree analysis. Physical maps link a marker or DNA sequence to a physical location along a chromosome or large contig. Markers such as ESTs and STSs are used to determine these distances.
Techniques include restriction enzyme digestion mapping, FISH, radiation hybrid analysis, and cytogenetic mapping.
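
The way linkage analysis turns progeny counts into a relative map distance can be shown with a small worked example in Python. The sketch assumes the usual convention that 1 centimorgan (cM) corresponds to 1% recombinant progeny, which holds only for closely linked markers because double crossovers are ignored. The function names and the progeny counts are hypothetical.

```python
def recombination_fraction(parental, recombinant):
    """Fraction of scored progeny in which the two markers were separated."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny were scored")
    return recombinant / total


def map_distance_cm(parental, recombinant):
    """Approximate map distance, assuming 1 cM = 1% recombination."""
    return 100.0 * recombination_fraction(parental, recombinant)


# Hypothetical cross: the markers stayed together in 180 progeny
# and were separated by recombination in 20.
fraction = recombination_fraction(parental=180, recombinant=20)
distance = map_distance_cm(parental=180, recombinant=20)
```

Unlinked markers, such as those on different chromosomes, are separated in about half of the progeny, so the recombination fraction approaches 0.5 and carries no further distance information.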

slide 260:

GAPS REMAIN IN THE HUMAN GENOME

Although the sequence of the genome is considered complete, there are still gaps. One method of finding the sequence of any gaps is called chromosome walking. In this method a particular clone is sequenced to start the process. Then the new sequence data is used to find overlapping clones (Fig. 8.10). After those are identified and sequenced, more overlapping clones are identified. The process goes in order, either up or down the chromosome, compiling the sequence piece by piece. Usually the first clone is located relative to a particular marker such as an STS or RFLP.

Most of the gaps fall in highly condensed regions of repetitive DNA known as heterochromatin, which is difficult to sequence. Three features characterize heterochromatin: hypoacetylation (i.e., lack of acetyl groups on the histones), methylation of histone H3 on a specific lysine, and methylation on CpG or CpNpG sequence motifs. Heterochromatin is not transcribed and comes in two forms: facultative heterochromatin and constitutive heterochromatin (Fig. 8.11). The amount of methylation on lysine-9 in histone H3 determines whether heterochromatin is considered facultative or constitutive. The constitutive form is found around the centromeres and telomeres of the chromosome and does not change from one generation to the next. Facultative heterochromatin is found in other regions of the chromosomes, and its presence is cell-specific. Once a specific region of a chromosome becomes heterochromatin, all of the cell's descendants will maintain this pattern. The border for facultative heterochromatin is not static, and each cell in a tissue might have a little more DNA condensed than other cells. This is exemplified by the classic inactivation of the white gene in Drosophila, where the fly will have a mottled red and white eye color because the gene is silenced into facultative heterochromatin in some cells and not others.
This genetic variation is called position effect variegation (PEV).

FIGURE 8.11 Facultative versus Constitutive Heterochromatin. The amount of methylation on lysine-9 in histone H3 determines whether heterochromatin is considered facultative or constitutive. Diagram labels: facultative; constitutive; DNA; histone H3 tail; lysine-9 (K9) shown carrying one, two, or three CH3 groups.

FIGURE 8.10 Chromosome Walking. Researchers identify the downstream and upstream regions of a gene using chromosome walking. In this example, the end of library clone 1 is converted into a probe. The probe is used to screen a library, and a second clone is identified. The two clones overlap and are linked to form a complete gene. Diagram labels: library clone 1 (front of gene); make probe “A”; screen library; library clone 2 (back of gene).
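
Chromosome walking as shown in Figure 8.10 can be sketched in Python as a loop: take the end of the current sequence as a probe, screen the library for an unused clone that contains the probe, append the new sequence, and repeat. The function name, the probe length, and the toy library are assumptions made for illustration. In the laboratory the screening step is a hybridization experiment rather than a string search, and a probe must be checked for uniqueness so that repeats do not mislead the walk.

```python
def chromosome_walk(start_clone, library, probe_len=4, max_steps=10):
    """Extend start_clone rightward by repeatedly screening the library.

    The last probe_len bases of the growing sequence act as the probe.
    A clone is accepted if it contains the probe and extends past it.
    """
    contig = start_clone
    used = {start_clone}
    for _ in range(max_steps):
        probe = contig[-probe_len:]
        hit = None
        for clone in library:
            if clone in used:
                continue
            pos = clone.find(probe)
            if pos != -1 and pos + probe_len < len(clone):
                hit = (clone, pos)
                break
        if hit is None:
            break  # nothing overlaps the current end: the walk stops at a gap
        clone, pos = hit
        contig += clone[pos + probe_len:]  # add only the new sequence
        used.add(clone)
    return contig


# Hypothetical library clones; the walk starts from clone 1.
clone1 = "AAGCTTGGCA"
clone2 = "GGCATCCGAT"
clone3 = "CGATTACGGA"
walked = chromosome_walk(clone1, [clone3, clone2])
```

Walking in the opposite direction works the same way, with the probe taken from the start of the sequence instead of the end.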

slide 261:

SURVEY OF THE HUMAN GENOME

The sequence of the human genome is 3.2 × 10^9 base pairs (3.2 Gbp, or gigabase pairs) in length. If the sequence were typed onto paper at about 3000 letters per page, it would fill about 1 million pages of text. This extraordinary amount of information is encoded by the sequence of just four bases: cytosine, adenine, guanine, and thymine. Most people expected the human genome sequence to reveal the exact number of genes that humans possess. In reality, sophisticated interpretation is needed to identify many of the genes. The tally for the number of protein coding genes is 20805, but with each refinement of the human genome the number varies. In addition, many of these genes are alternatively spliced, and the actual number of proteins is much higher than the number of genes. Of these genes, we know the function of only around 50%. More than two-thirds of the predicted human proteins are similar in structure to proteins in other organisms.

The genome sizes and estimated gene numbers are given for several organisms in Table 8.3. The animal with the highest number of genes is Daphnia pulex, or the water flea, with about 31000 different genes, of which about one-third are unrelated to any other organism that has been studied thus far (Fig. 8.12). This tiny crustacean is a key organism for understanding how the environment affects gene expression. A rare Japanese flower is the current record holder for genome size, with the marbled lungfish in second place. The Japanese flower Paris japonica has almost 50 times more DNA than humans, approximately 149000 Mb. In contrast, the mammalian parasite Encephalitozoon intestinalis has only 2.25 million bases, which is the smallest genome of an organism with a nucleus (Fig. 8.13).

Although wheat has nearly 6 times as much DNA as humans and 95000 genes, it is hexaploid rather than diploid like most eukaryotes.
Thus the wheat genome may be regarded as combining three sets of around 32000 genes. Several other plants in Table 8.3 have less DNA but more genes than any multicellular animal sequenced so far. The largest bacterial genomes have more genes than the smaller eukaryotic genomes. An example is Streptomyces, famous as the source of many antibiotics. The smallest bacterial genomes, such as Mycoplasma, have fewer than 500 genes, although, because these bacteria are parasitic, they rely on the eukaryotes they infect for many metabolites.

The protozoan parasite Trichomonas vaginalis was proposed to have as many as 60000 genes when its genome was first sequenced. However, further analysis has reduced this to 46000, and the total number of genuine genes may be significantly lower. Major problems in annotation were due to many duplicated sequences and the presence of partial gene sequences in them.

In eukaryotic DNA, some genes encompass thousands or even millions of base pairs, most of which comprise introns that are spliced out of the mRNA transcript. For example, the gene for dystrophin, defective in Duchenne's muscular dystrophy, is 2.4 million base pairs long, and some of its introns are 100000 base pairs or more in length. In contrast, the coding sequence, consisting of multiple exons, is only about 3000 base pairs. In such situations it is not easy to find coding sequences among the noncoding DNA. On the one hand, this may result in genes or individual exons being completely missed. On the other hand, widely separated exons that are in reality parts of a single coding sequence may be interpreted as separate genes.

Gaps in genomes can often be sequenced by chromosome walking, where one end of a library clone is used to find other overlapping clones. Most gaps result from heterochromatin, highly condensed repetitive DNA found in specific sites throughout the genome. The physical nature of heterochromatin makes it difficult to sequence.
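
Several of the figures quoted in this survey follow from simple arithmetic, which the short Python check below makes explicit. The constants come from the text and from Table 8.3 (including the 149000 Mb genome size listed there for Paris japonica). The variable and function names are chosen here for illustration only.

```python
HUMAN_GENOME_BP = 3.2e9      # 3.2 Gbp
LETTERS_PER_PAGE = 3000      # typing density assumed in the text
HUMAN_GENOME_MB = 3200       # Table 8.3
PARIS_JAPONICA_MB = 149000   # Table 8.3
WHEAT_GENES = 95000          # Table 8.3
WHEAT_GENOME_SETS = 3        # hexaploid wheat combines three diploid sets


def pages_needed(genome_bp, letters_per_page):
    """Pages required to print the genome at one letter per base."""
    return genome_bp / letters_per_page


pages = pages_needed(HUMAN_GENOME_BP, LETTERS_PER_PAGE)  # about 1.07 million
size_ratio = PARIS_JAPONICA_MB / HUMAN_GENOME_MB         # about 47-fold
genes_per_wheat_set = WHEAT_GENES / WHEAT_GENOME_SETS    # about 32000
```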

slide 262:

Another confounding factor in determining the number of genes is the presence of pseudogenes. These duplicated copies of real genes are defective and no longer expressed as proteins. They may be found next to the original, or they may be far away on different chromosomes. Determining whether a “gene” is a pseudogene or a genuine gene may be difficult using sequence data alone. Often the expression of a particular region of DNA must be confirmed by finding corresponding mRNA transcripts. DNA microarrays are a popular approach to confirming whether or not a gene is expressed (see later discussion).

The number of genes also hinges on how we define a gene. In addition to the approximately 22000 protein-encoding genes, there are a thousand or more genes that encode nontranslated RNA. The ribosomal RNA and transfer RNA genes are the most familiar of these. However, a variety of other small RNA molecules are involved in splicing of mRNA and in the regulation of gene expression. Other sequences of DNA may not even be transcribed yet are nonetheless important. Should these also be regarded as genes?

TABLE 8.3 Estimated Number of Genes for Various Genomes

Organism                                    Genome Size (Megabase Pairs)    Estimated Genes (Protein Encoding)

Plants
Wheat (Triticum aestivum)                   17000                           95000
Black poplar (Populus trichocarpa)          520                             45000
Rice (Oryza sativa)                         390                             38000
Mustard weed (Arabidopsis thaliana)         125                             26000
Japanese flower (Paris japonica)            149000                          unknown

Protists
Paramecium tetraurelia                      72                              40000
Trichomonas vaginalis                       160                             46000
Encephalitozoon intestinalis                2.25                            1833

Animals
Marbled lungfish                            130000                          unknown
Human (Homo sapiens)                        3200                            21805
Mouse (Mus musculus)                        2800                            25000
Roundworm (Caenorhabditis elegans)          97                              20493
Fruit fly (Drosophila melanogaster)         180                             13600
Daphnia pulex                               200                             31000

Fungi
Aspergillus nidulans                        30                              9500
Yeast (Saccharomyces cerevisiae)            13                              5800

Bacteria
Streptomyces coelicolor                     8.7                             7800
Escherichia coli                            4.6                             4300
Mycoplasma genitalium                       0.58                            470
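
One way to read Table 8.3 is to convert each row into a gene density, that is, estimated protein-encoding genes per megabase. The Python sketch below does this for a few rows. The dictionary name and the selection of organisms are choices made for this example, and the densities are only as reliable as the gene estimates in the table.

```python
# (genome size in Mb, estimated protein-encoding genes), from Table 8.3
GENOMES = {
    "Wheat": (17000, 95000),
    "Human": (3200, 21805),
    "Daphnia pulex": (200, 31000),
    "Yeast": (13, 5800),
    "Escherichia coli": (4.6, 4300),
    "Mycoplasma genitalium": (0.58, 470),
}


def gene_density(size_mb, genes):
    """Estimated genes per megabase of genome."""
    return genes / size_mb


density = {name: gene_density(size, genes)
           for name, (size, genes) in GENOMES.items()}

# Organisms ordered from the most to the least gene-dense genome.
ranked = sorted(density, key=density.get, reverse=True)
```

The bacterial genomes in this selection carry several hundred genes per megabase, whereas the human genome carries fewer than ten, which reflects the large share of noncoding DNA in the human genome.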

slide 263:

NONCODING COMPONENTS OF THE HUMAN GENOME

The number of protein-encoding genes is only a small fraction of what is important in the human genome. When one is looking at homology with the mouse genome, many regions that do not encode protein are highly conserved. These areas are also conserved among the rat and dog genomes and are called conserved noncoding elements (CNEs). There are 500 regions of 200 or more nucleotides that are perfectly conserved and over 10000 elements that are highly conserved. One estimate suggests that there are millions of regions that are conserved to some degree. Some of these regions are conserved even in fish species, suggesting that these elements are functional, and some theorize that they may act as enhancer elements for gene expression. Another proposed function includes insulator sequences that prevent the wrong enhancer elements from activating the wrong gene, but the actual functions for most of the CNEs are unknown.

Table 8.4 lists the major components of the human genome. One common type of noncoding DNA is the introns between the coding segments of genes, the exons. About 25% of the human genome consists of genes for proteins. However, of this amount only 1% is actual coding sequence, and the other 24% comprises the introns. Most introns have no function, but occasional examples of whole genes have been found within introns of a different gene. Introns may also contain binding sites for transcription factors and in that sense play a role in gene regulation.

FIGURE 8.12 Daphnia pulex (water flea). The crustacean Daphnia pulex has the highest number of genes (31000), and it is shown in the photo containing a brood of offspring. Daphnia are able to create clones or mate depending on the environmental conditions. These organisms are used to assess environmental quality. Photo by Paul D.N. Herbert, University of Guelph. Courtesy of Indiana University.
FIGURE 8.13 Genome Size Varies Among Organisms. (A) Photo of Paris japonica, which has the largest genome size in base pairs. (B) The marbled lungfish has the second largest genome. (C) Encephalitozoon intestinalis spores are the black ovoid structures found in the cell. These spores are at different stages of development, and when all mature, they cause the infested cell to erupt and spread the infection.

The number of genes in an organism depends on the definition of gene and the distinction between real genes and pseudogenes. The absolute number of genes in any sequence is approximate.

slide 264:

Another feature of the genome is moderately repetitive sequences. Ribosomal RNA genes are found in great numbers because many ribosomes are needed. These genes are considered moderately repetitive elements, but of the coding variety. Noncoding repetitive elements include the long interspersed element or LINE, which is found in 200000 to 500000 copies and accounts for up to 20% of the human genome (Fig. 8.14). These retrovirus-like elements contain genes inside long terminal repeats (LTRs), similar to retroviruses. They are autonomous; that is, the LINE is able to copy itself and insert new copies into other sites in the genome. However, most copies of LINEs are defective; only a few are still mobile and functional. There are many different types of LINEs, the most common by far in mammals being L1.

TABLE 8.4 Components of Mammalian Genomes

Unique Sequences
  Protein-encoding genes: comprising upstream regulatory region, exons, and introns
  Genes encoding noncoding RNA: snRNA, snoRNA, 7SL RNA, telomerase RNA, Xist RNA, a variety of small regulatory RNAs
  Nonrepetitive intragenic noncoding DNA

Interspersed Repetitive DNA
  Pseudogenes
  Short Interspersed Elements (SINEs)
    Alu element: 300 bp; ∼1000000 copies
    MIR families (mammalian-wide interspersed repeat): average ∼130 bp; ∼400000 copies
  Long Interspersed Elements (LINEs)
    LINE-1 family: average ∼800 bp; ∼200000–500000 copies
    LINE-2 family: average ∼250 bp; ∼270000 copies
  Retrovirus-like elements: 500–1300 bp; ∼250000 copies
  DNA transposons: variable, average ∼250 bp; ∼200000 copies

Tandem Repetitive DNA
  Ribosomal RNA genes: 5 clusters of about 50 tandem repeats on 5 different chromosomes
  Transfer RNA genes: multiple copies plus several pseudogenes
  Telomere sequences: several kb of a 6-bp tandem repeat
  Minisatellites (VNTRs): blocks of 0.1 to 20 kbp of short tandem repeats (5–50 bp), most close to telomeres
  Centromere sequence (alpha-satellite DNA): 171-bp repeat; binds centromere proteins
  Satellite DNA: blocks of 100 kbp or longer of tandem repeats of 20 to 200 bp, most close to centromeres
  Mega-satellite DNA: blocks of 100 kbp or longer of tandem repeats of 1 to 5 kbp, various locations

Numbers of copies given are for the human genome.

slide 265:

When a LINE retro-element is active, it moves to a new location via an mRNA intermediate. First, the active LINE is transcribed into mRNA using an internal promoter. The mRNA goes to the cytoplasm, where ribosomes translate it, creating two proteins. One of these, the combined reverse transcriptase/endonuclease protein, binds to the mRNA, forming a ribonucleoprotein (RNP). This is transported back into the nucleus, where the endonuclease domain nicks the DNA at the new location in the genome, generating a free 3′-OH end. Next, the reverse transcriptase domain makes a DNA copy of the LINE mRNA using the 3′-OH end as primer. This results in a copy of LINE1 inserted into a new location in the genome. Cellular repair enzymes fill in the gaps to create duplicated sequences flanking both sides of the new LINE.

When a LINE moves to a new location, it may disrupt an essential gene, which would prove fatal to the cell. Control of LINE movement is critical. Too much movement is disruptive and might destroy both the host cell and the LINEs it contains. Conversely, with too little movement the LINE will fail to reproduce effectively. In humans, many LINEs are found in gene-poor, A/T-rich regions of the genome, suggesting that some mechanism exists to keep these elements away from vital genetic information. Most functional LINE elements are silenced by methylation of their promoters and by methylation of the histones bound to them. In addition, RNA surveillance pathways (piRNA; see Chapter 5) monitor and degrade the LINE RNA transcripts.

In addition to moderately repetitive sequences, the human genome is filled with highly repetitive DNA. The short interspersed elements, or SINEs (see Fig. 8.14), are retro-elements like the LINEs and account for around 13% of the human genome. The most common type of SINE was named the Alu element because an Alu restriction enzyme site falls within it. The human genome contains about 300000 to 500000 Alu elements.
SINE elements cannot move to new locations in the genome without help from the LINE reverse transcriptase/endonuclease protein. Unlike LINEs, SINEs are found in gene-rich regions of the human genome, but they are shorter and often inert, so their presence does not usually interfere with gene function.

Another type of highly repetitive element found in the human genome is the minisatellite or VNTR. These were used in mapping the human genome and are scattered around the entire genome (see earlier discussion).

BIOINFORMATICS AND COMPUTER ANALYSIS

As noted previously, the use of computers has revolutionized the way in which genetic information is gathered and analyzed. The term bioinformatics has been coined to describe the scientific discipline of using computers to handle biological information. It encompasses a large number of subfields (Table 8.5). Bioinformatics includes the storage, retrieval, and analysis of data about biomolecules. By far the greatest achievement of the bioinformatics

FIGURE 8.14 General Structure of LINE and SINE. LINE elements contain an internal promoter for RNA polymerase II and two open reading frames that encode proteins. The first protein has an unknown function. The second is a bifunctional protein with reverse transcriptase and DNA endonuclease domains. SINE elements usually contain only an internal promoter for RNA polymerase III and some sort of tRNA stem-loop structure followed by a polyA tail. Diagram labels: LINE (6 kb); SINE (0.3 kb); ORF1; ORF2; multiple stop codons; A/T-rich regions; target site direct repeats.

Noncoding genomic DNA includes many different types of elements, for example LINE, SINE, and satellite DNA. LINE sequences are transcribed into RNA and translated into proteins. One of these proteins reassociates with the mRNA. The RNP complex re-enters the nucleus, where it integrates a LINE copy into a new genome location.

slide 266:

Chap TER 8 263 revolution has been the sequencing of the human genome. The term bioinformatics is now used to include analyses of data from DNA microarrays see later discussion assessment of the function of genomes and the comparison of different genomes. Because bioinformatics is so widely used it is important to make genomic data available to researchers. The data from the Human Genome Project is available through the National Center for Biotechnology Information website http://www.ncbi.nlm.nih.gov/. Some other websites that present sequence data are listed below. On the NCBI home page you can explore the human genome in many different ways. Using “Gene” you can identify a specifc gene by name. The record for each gene contains the gene name and description its loca- tion a graphical representation of the introns and exons for all the protein isoforms that are known and a summary of all the information known about the gene. Additionally the vari- ous domains within the protein such as actin-binding sites are listed with links to explain the domain and its function. Finally genes and/or regions of DNA from other organisms that are homologous to the gene are shown. The page also contains links to research papers on the gene’s function. Some Bioinformatics Websites: n GenBank and linked databases m http://www.ncbi.nlm.nih.gov/Gene/ m http://www.ncbi.nlm.nih.gov/mapview/ m http://www.ncbi.nlm.nih.gov/genome/guide/human/ n Genome Database GDB human genome m http://genome.ucsc.edu/index.html n European Bioinformatics Institute including EMBL and Swissprot m http://www.ebi.ac.uk/ n Flybase Drosophila genome/Wormbase C. 
elegans genome/Yeast genome Saccharo- myces genome m http://fybase.org/ m http://wormbase.org m http://www.yeastgenome.org/ n RCSB Protein Data Bank m http://www.rcsb.org/pdb/ n PIR Protein Information Resource PIR m http://pir.georgetown.edu/ Fields of Study Related to Bioinformatics Field Description Computational biology Evolutionary population and theoretical biology statistical models for biological phenomena Medical informatics The use of computers to improve communication understanding and management of medical information Cheminformatics The combination of chemical synthesis biological screening and data mining to guide drug discovery and development Genomics The analysis and comparison of the entire genetic complement of one or more species Proteomics The global study of proteins Pharmacogenetics Using genomic and bioinformatic methods to identify individual differences in response to drugs Pharmacogenomics Applying genomics to the identifcation of drug targets T able 8.5

slide 267:

The program Map Viewer (http://www.ncbi.nlm.nih.gov/mapview/) is used to browse the human genome without any particular gene in mind. For example, individual chromosomes can be explored via a graphical interface that allows you to zoom in and out of various regions. Another genome browser can be found at http://www.ensembl.org.

The amount of information generated by the Human Genome Project is tremendous, and understanding this information without the use of computers would be too difficult. Data mining refers to the use of computer programs to search and interpret the data. Many bioinformatics researchers develop programs that search the genomic data banks and sift, sort, and filter the raw sequence data. Data mining programs often process information using the following steps:
1. Selection of the data of interest.
2. Preprocessing, or “data cleansing.” Unnecessary information is removed to avoid slowing or clogging the analysis.
3. Transformation of the data into a format convenient for analysis.
4. Extraction of patterns and relationships from the data.
5. Interpretation and evaluation.

These programs can be designed to search for related sequences, determine areas of coding and noncoding DNA by looking at codon bias, or search for known consensus sequences, just to name a few applications.

Searching for related sequences, or similarity searching, allows researchers to identify a potential function for a gene. If a gene of unknown function from humans is very similar to a characterized gene from flies, the two encoded proteins may have similar functions. This type of research is called comparative genomics. More than one gene can be compared. For example, entire pathways are often similar in different species. Thus, human insulin attaches to a receptor on the cell surface and controls gene transcription via several intracellular proteins. Remarkably, very similar insulin signaling proteins are found in the roundworm Caenorhabditis elegans.
Difficulties may arise if scientists use only comparative genomics to study a new protein. Sometimes similar sequences play radically different roles, as similar proteins may evolve new functions. Thus, sequence similarity does not always imply functional similarity. Finally, the databases themselves are not perfect and may contain mistakes that are misleading. Comparative genomics must be complemented with other studies to reliably assign a role to a novel protein.

Other programs determine coding and noncoding areas of the genome by looking at codon bias. Identifying coding regions is critical for finding genes and can be accomplished by looking at the wobble position (third base) of the codon. Although a particular amino acid is often encoded by multiple codons, some codons are used preferentially. This codon bias varies from one organism to another. Most of the tRNAs for a particular amino acid will recognize the favored codons. For example, Escherichia coli genes preferentially use CGA, CGU, CGC, and CGG for the amino acid arginine but rarely use AGA or AGG. Consequently, very few tRNAs for arginine are produced that recognize the AGA and AGG codons. Codon bias is seen in regions that encode proteins, but in noncoding regions the wobble position will not maintain this bias. Thus, a potential gene in E. coli would contain relatively few AGA and AGG codons.

Finally, programs that identify consensus sequences allow researchers to find various signatures or motifs associated with particular functions. For example, a site that binds to ATP has specific amino acids in specific locations. These sequences in an unknown protein may help identify one of its functions. Other motifs include actin-binding domains, which indicate that the protein binds to the cytoskeleton, or protease cleavage sites, which suggest that the protein is subject to intracellular modification by proteases. Any potential motif in the sequence must be confirmed experimentally. For example, a protein with an ATP-binding site signature must be shown to bind ATP experimentally. Thus, sequence analysis provides a basis for further experiments.
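Two of the scanning ideas described above, codon bias and motif searching, can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the function names are invented for this example, the arginine codons are the ones quoted in the text for E. coli, and the ATP-binding signature used is the widely cited Walker A (P-loop) consensus, [AG]-x(4)-G-K-[ST], which the text itself does not name. A hit from either function is a hint for further experiments, not proof of function.

```python
import re
from collections import Counter

# Arginine codons in the RNA alphabet, as listed in the text.
ARG_CODONS = {"CGA", "CGU", "CGC", "CGG", "AGA", "AGG"}
RARE_ARG_CODONS = {"AGA", "AGG"}  # rarely used in E. coli, per the text

# Assumption: Walker A (P-loop) consensus as an example ATP-binding signature.
WALKER_A = re.compile(r"[AG].{4}GK[ST]")


def arginine_codon_usage(seq, frame=0):
    """Count arginine codons when seq is read in the given frame (0, 1, or 2)."""
    rna = seq.upper().replace("T", "U")
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return Counter(c for c in codons if c in ARG_CODONS)


def rare_arginine_fraction(seq, frame=0):
    """Fraction of arginine codons that are AGA/AGG (None if no arginine)."""
    usage = arginine_codon_usage(seq, frame)
    total = sum(usage.values())
    if total == 0:
        return None
    return sum(usage[c] for c in RARE_ARG_CODONS) / total


def find_walker_a(protein):
    """Return (start index, matched text) for each candidate Walker A motif."""
    return [(m.start(), m.group()) for m in WALKER_A.finditer(protein.upper())]
```

A reading frame with a low rare-arginine fraction is more consistent with an E. coli coding region than one in which AGA and AGG are common; in practice, gene finders combine the usage of all codons rather than a single amino acid.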

slide 268:

MEDICINE AND GENOMICS

One of the greatest applications for human genomics data is in disease and its diagnosis. Medical applications of genomics are abundant, and the later chapters of this book cover some of them. Gene testing is the most common present application. Once genes have been associated with particular diseases, people can be screened for genetic mutations within the gene. Such tests can diagnose diseases such as muscular dystrophy, cystic fibrosis, sickle cell anemia, and Huntington’s disease because these are strictly inherited disorders. In diseases with an environmental component, genetic testing offers information that may change how a person lives his or her life. Perhaps those with a genetic predisposition to colon cancer will have more screenings, earlier than usual, and perhaps alter their diet to minimize the chance of cancer developing. Other applications include gene therapy (see Chapter 17).

The use of genomics in medicine has expanded greatly in the last decade and will change rapidly over the coming years. The cost of sequencing an entire human genome is now low enough for this to be used as a diagnostic test. As of this writing, next-generation sequencing techniques have been streamlined to identify the causative mutation in diseases. Instead of sequencing the entire genome, many scientists prefer whole exome sequencing, where only the exons from the human genome are amplified and sequenced. This smaller dataset allows the identification of mutations in the sequences that specifically affect proteins and can suggest whether or not that variation is the causative agent of the disease. Since exons comprise less than 2% of the genome, this analysis is much cheaper.

As of this writing, around 3,000 different diseases have been identified using genomics and pedigree analysis. These are termed Mendelian diseases because they are inherited as would be expected if the disease resulted from a single mutation in a single gene.
Once a causative gene is identified, a strategy for treatment can be developed. For example, Marfan syndrome is a disease whose features include long arms, legs, and fingers; a tall, thin body type; curved spine; sunken chest; flat feet; and flexible joints. The most serious of the problems is the threat of aortic aneurysms due to weak collagen in the aorta. The disease is caused by defects in connective tissue and in some cases may result in hypermobile joints. The disease is typically inherited from parents, but some cases occur spontaneously. Next-generation sequencing of families affected by Marfan syndrome has shown that a mutation in the fibrillin-1 gene increases the amount of transforming growth factor beta (TGFβ) in those affected by the disease. The increase in TGFβ in turn prevents the tissues of the body from developing properly. The breakthrough discovery that too much TGFβ was the cause allowed doctors to try treating patients with TGFβ inhibitors, which allow the tissues to develop properly and prevent the aorta from enlarging from increased pressure.

Besides Mendelian diseases, common diseases can also be studied with genomics. Many common diseases are polygenic; that is, they are caused by a variety of genes, not one specific mutation. Since sequencing of the human genome was completed, scientists have associated different genetic variations with common disorders such as Crohn’s disease, type 2 diabetes, autoimmune disease, kidney disease, and psychiatric disorders. The association of a genetic variation with a disease is made by a genome-wide association study (GWAS). In these studies, whole genome analyses of two distinct populations are compared. For example, all the variations from a population of people without diabetes are compared to those from a population of people with type 2 diabetes. Any variant that has a frequency greater than 1% in the affected population is a potential variation that can cause this phenotype.
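The comparison of affected and unaffected populations can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how a real GWAS is analyzed: actual studies use very large cohorts, allele-level statistics, and corrections for multiple testing. The function names and variant labels are hypothetical, each person is reduced to the set of variants he or she carries, and the 1% cutoff is the threshold mentioned in the text.

```python
from collections import Counter


def variant_frequencies(people):
    """Fraction of people carrying each variant.

    people: list of sets, one set of variant labels per person.
    """
    n = len(people)
    counts = Counter(v for person in people for v in person)
    return {v: c / n for v, c in counts.items()}


def candidate_variants(cases, controls, min_case_freq=0.01):
    """Variants above the cutoff in cases and more frequent than in controls.

    Returns (variant, case frequency, control frequency) tuples, ordered by
    the size of the frequency difference.
    """
    case_freq = variant_frequencies(cases)
    ctrl_freq = variant_frequencies(controls)
    hits = [
        (v, f, ctrl_freq.get(v, 0.0))
        for v, f in case_freq.items()
        if f > min_case_freq and f > ctrl_freq.get(v, 0.0)
    ]
    return sorted(hits, key=lambda hit: hit[1] - hit[2], reverse=True)
```

With four hypothetical cases and four controls, a variant carried by three cases and no controls is reported, while variants that are equally common in both groups are filtered out.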
This type of study is analogous to a mutagenesis screen in a model organism. The scientist looks for a subgroup with a similar phenotype from a population of individuals.

Bioinformatics is the study of biological information using computers. Data mining uses computer algorithms to study, sort, and compile information from genomic databases. Information about genes can be obtained by comparing sequences from different organisms.

slide 269:

Each of the individuals has mutations that cause the phenotype, but these may or may not be in the same gene. When the scientist looks at enough individuals, the key genes can be identified. In this example, each individual of a population with type 2 diabetes has some genetic change that makes him or her ill. So far, around 40 different loci or potential genes are associated with the disease. Some of these loci are associated with insulin secretion, suggesting that problems with insulin release may be a causative factor for the disease. As with all complicated diseases, the actual changes identified so far can account for only 20% of the heritability in type 2 diabetes.

DNA ACCUMULATES MUTATIONS OVER TIME

The Human Genome Project has opened the doors to improved analyses in many areas, including evolutionary biology. The sequence features of the human genome arose over millions of years as mutations, or alterations of the genetic material, occurred and were passed on to successive generations. During the course of human history, many different events have sculpted our genetic history and resulted in our current genetic state. Each individual has undergone some sort of genetic recombination and/or mutation to become unique in physical and emotional constitution. Genetic mutations constantly occur throughout all the cells of our bodies. Most of the defective cells die by apoptosis (see Chapter 20). When a mutation occurs in the somatic cells, the offspring do not inherit the mutation; only when a mutation occurs in the germline, or sex cells, is it passed on to the next generation.

Many different types of mutations cause genetic diversity (Table 8.6). The most common are base substitutions, in which one nucleotide is exchanged for another.
When a purine base is replaced by another purine, or a pyrimidine is replaced by another pyrimidine, this is called a transition. If a pyrimidine is exchanged for a purine, or vice versa, this is a transversion.

Genomics has a wide-reaching effect on biotechnology and medicine.

TABLE 8.6 Types of Mutations

Base change                 Normal               Mutant
Transition                  GAACGT               GAGCGT
Transversion                GAACGT               GATCGT
Missense mutation           GAA CGT (Glu Arg)    GAT CGT (Asp Arg)
Conservative substitution   ACT CGT (Thr Arg)    TCT CGT (Ser Arg)
Radical replacement         GAT CGT (Asp Arg)    GCT CGT (Ala Arg)
Nonsense mutation           GAA CGT (Glu Arg)    TAA CGT (Stop)
Insertion                   GAACGT               GAAACGT
Deletion                    GAACGT               GACGT

slide 270:

These mutations create the SNPs used in genomic maps. Because different human individuals vary by about 1 in 1,000 to 2,000 bases, there are on average 2.5 million SNPs over the whole genome.

SNPs, or single base substitutions, can fall anywhere in the genome, in either coding or noncoding DNA. When the SNP falls within a gene, it may alter protein sequence and function. When the base substitution alters one amino acid in a protein, the mutation is called a missense mutation. Some missense substitutions have little effect on protein structure or function because one amino acid replaces another with similar properties. This is known as a conservative substitution. An example would be replacing threonine by serine, as these vary only slightly in size but not in chemistry (both have an –OH group). Radical replacements, on the other hand, can alter the protein function or structure because they involve replacing amino acids with others that have a different chemistry. For example, aspartic acid or serine are often involved in hydrogen bonding, and when either is replaced by a neutral amino acid like valine, the protein structure may become unstable. Sometimes missense substitutions create conditional mutations, in which the protein will work under certain conditions but not others. A common conditional mutation is a temperature-sensitive mutation, in which the mutation does not alter the protein function at the permissive temperature but the protein is defective at the restrictive temperature. When base substitutions change a codon for an amino acid into a stop codon, this results in a truncated version of the original protein. These are nonsense mutations.

Theoretically, mutations may result in the insertion or deletion of one or more bases. As for single base substitutions, location is the key to what effect the mutation will have.
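Returning briefly to single base substitutions, the categories described above can be written down as a short Python sketch. It assumes the standard genetic code (built here from the conventional TCAG ordering) and uses invented function names; it distinguishes only silent, missense, and nonsense changes and does not attempt to judge whether a missense change is conservative or radical.

```python
# Standard genetic code, indexed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}


def substitution_type(ref, alt):
    """Classify a single base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("no substitution: bases are identical")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def codon_effect(ref_codon, alt_codon):
    """Classify the effect of changing a sense codon (DNA alphabet)."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":  # "*" marks a stop codon
        return "nonsense"
    return "missense"
```

Applied to the examples in Table 8.6, GAA to GAT is a transversion and a missense change (Glu to Asp), and GAA to TAA is a nonsense change. The transition example GAA to GAG happens to be silent, because both codons encode glutamic acid.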
If the deletion or insertion of a few bases falls within a gene, it may alter the reading frame of the protein, which will create a random polypeptide sequence after the mutation. Often the altered reading frame creates a stop codon, which truncates the protein. Large deletions may, of course, completely remove all or part of a gene.

Larger segments of DNA can undergo alterations due to inversions, translocations, and duplications. Inversions occur when DNA segments become inverted relative to the original sequence. Translocations occur when DNA segments are moved to new locations. Duplications occur when a DNA segment is copied and then moved, resulting in two identical regions.

Theoretically, mutations occur randomly throughout the genome. However, mutation hot spots are regions where mutations occur at much higher frequencies. Mutations often occur at methylated sites because methylated cytosine often loses an amino group, turning into thymine. DNA polymerase can also induce mutations during DNA replication. Occasionally the proofreading ability of the polymerase fails, and single wrong bases are incorporated. More often, DNA polymerase undergoes strand slippage when a segment of DNA is highly repetitive. The result is either a duplication or a deletion, depending on the orientation of the slippage.

Genetic variation in the human genome reflects recombination hot spots. In fact, most regions of the human genome are passed from one generation to the next in segments called haplotype blocks, or hapblocks. Because recombination usually occurs only in certain defined spots, the region between two hot spots will be inherited together as a group or block. Each block carries a variety of different variations that always segregate as one; thus the region is called a hapblock.

The rates at which mutations occur help in understanding how mutations have affected the course of evolution.
The rate of mutation is low and depends on the organism and even the particular gene being considered. Nonetheless, over long periods of time, many mutations will occur. As Table 8.7 suggests, the rate of mutation per kilobase is much lower in genomes that are larger. In E. coli, mutations occur at 5.4 × 10⁻⁷ per 1,000 base pairs per generation, but in humans, mutations occur over 10 times more slowly, at only 5.0 × 10⁻⁸ per 1,000 base pairs. However, when the mutation rate is corrected for effective genome size (i.e., coding capacity rather than total DNA), it is approximately the same for most organisms. This suggests that some mechanism must actively control the mutation rate.
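The relationship between the per-kilobase rate and the uncorrected per-genome rate is simple multiplication, which the following Python sketch makes explicit. The genome sizes and rates are the values quoted in the text and in Table 8.7; the function and dictionary names are chosen only for this illustration.

```python
# (genome size in kb, mutation rate per kb per generation), from Table 8.7.
GENOMES = {
    "Bacteriophage M13": (6.4, 7.2e-4),
    "Escherichia coli": (4600, 5.4e-7),
    "Human": (3_200_000, 5.0e-8),
}


def per_genome_rate(size_kb, rate_per_kb):
    """Uncorrected mutations per genome per generation."""
    return size_kb * rate_per_kb


def fold_difference(rate_a, rate_b):
    """How many times larger rate_a is than rate_b."""
    return rate_a / rate_b


if __name__ == "__main__":
    for organism, (size_kb, rate_per_kb) in GENOMES.items():
        total = per_genome_rate(size_kb, rate_per_kb)
        print(f"{organism}: {total:.4f} mutations per genome per generation")
```

For E. coli this gives roughly 0.0025 (listed as 0.003 in the table), and for humans 0.16. The per-kilobase rate in E. coli is about 10.8 times the human rate, consistent with the statement that human mutations occur over 10 times more slowly.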

slide 271:

GENETIC EVOLUTION

Molecular phylogenetics is the study of evolutionary relatedness using DNA and protein sequences. Comparing sequences from different organisms shows the number of changes that have occurred over millions of years. All cellular organisms, including bacteria, plants, and animals, have ribosomal RNA. These sequences can be compared, and the differences can be used to determine relatedness. This approach is less subjective than using physical characters for taxonomy.

The cladistic approach assumes that any two organisms ultimately derive from the same common ancestor, if we go far enough back, and that at some point bifurcation, or separation into two clades, occurred in their line of descent. The difference between the two organisms indicates how long ago the split occurred. Taxonomy may be based on visible characteristics, that is, the phenotype. This works well to a first approximation in organisms with plenty of obvious features, such as mammals and plants. But in organisms such as bacteria, the method falls apart. However, molecular phylogenetics allows family trees to be made for every organism.

When molecular data are used to study relatedness, it is essential that the sequences are correct and have truly come from the organisms under study. This can be complicated in higher organisms because some sequences have been derived from other organisms, such as viruses or bacteria. This problem applies to all organisms to some extent. For example, many bacterial genomes contain inserted bacteriophage genomes. Another important point is to ensure that the sequences being compared are truly homologous; that is, they have all descended from one shared ancestral sequence. When gene sequences are compared, they are aligned so that the regions of highest similarity correspond (Fig. 8.15). This type of alignment can determine the relatedness of two or more proteins or genes.
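Once two sequences have been aligned, a simple measure of their relatedness is the percentage of aligned positions that are identical. The Python sketch below assumes the alignment has already been produced by a program such as ClustalW and that gaps are written as "-"; the function names and the short test strings are illustrative only and are not taken from Fig. 8.15.

```python
def aligned_pairs(seq_a, seq_b):
    """Pairs of residues at positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]


def percent_identity(seq_a, seq_b):
    """Percentage of ungapped aligned positions with identical residues."""
    pairs = aligned_pairs(seq_a, seq_b)
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def difference_count(seq_a, seq_b):
    """Number of ungapped aligned positions that differ."""
    return sum(a != b for a, b in aligned_pairs(seq_a, seq_b))
```

Counts of differences like these are the raw material for the branch lengths of a phylogenetic tree, although tree-building programs typically apply further corrections that this sketch omits.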
The relatedness can be represented graphically by drawing phylogenetic trees. The tree has various features: a root, nodes, and branches (Fig. 8.16). The root represents the common ancestor, and the branching indicates the separations that occurred during evolution. Individual nodes represent common ancestors between two subgroups of organisms. Branches represent clades, that is, groups of organisms with a common ancestor. The length of the branches indicates the number of sequence changes, so if the branches are short, the two groups of organisms bifurcated relatively recently, and if the branches are long, the split occurred long ago.

TABLE 8.7 Mutation Rates in DNA Genomes (mutation rate per generation)

Organism                   Genome size (kb)   Per kb        Per genome (uncorrected)   Per effective genome
Bacteriophage M13          6.4                7.2 × 10⁻⁴    0.005                      0.005
Bacteriophage lambda       49                 7.7 × 10⁻⁵    0.004                      0.004
Escherichia coli           4,600              5.4 × 10⁻⁷    0.003                      0.003
Saccharomyces cerevisiae   12,000             2.2 × 10⁻⁷    0.003                      0.003
Caenorhabditis elegans     80,000             2.3 × 10⁻⁷    0.018                      0.004
Drosophila                 170,000            3.4 × 10⁻⁷    0.058                      0.005
Human                      3,200,000          5.0 × 10⁻⁸    0.16                       0.004

Mutations occur in all organisms at random places in the genome with an approximately similar rate. Mutations can be simple base substitutions, inversions, deletions, or insertions. The length of insertions and deletions is variable.

slide 272:

Based on alignments, genes have been grouped into families: groups of closely related genes that arose by successive duplication and divergence. Gene superfamilies occur when the functions of the various genes have steadily diverged until some are hard to recognize. For example, the transporter superfamily encompasses many proteins that transport molecules across biological membranes. This superfamily has members that transport sugars into bacteria, transport water into human cells, and even export antibiotics out of bacteria. They are found in almost all organisms.

[Figure 8.15: aligned amino acid sequences of the Human alpha 1, Human alpha 2, Rat alpha 1, Mouse alpha 1, and Chicken alpha-A hemoglobin chains.]

FIGURE 8.15 Alignment of Related Hemoglobin Sequences. The hemoglobin sequences were aligned using ClustalW (http://www.ebi.ac.uk/clustalw/). Amino acids marked with * are identical in all sequences; those marked with : and . are not identical but are conserved in the type of amino acid.

[Figure 8.16: tree with branches for Human α2, Human α1, Mouse, Rat, and Chicken; scale bar 0.1.]

FIGURE 8.16 Phylogenetic Tree of Hemoglobin. The amino acid sequences of chicken globin-A, rat hemoglobin alpha 1 chain, mouse hemoglobin alpha 1 chain, and the alpha 1 and alpha 2 chains from humans were compared. The length of the lines represents the number of sequence differences: the longer the line, the more changes in sequence. The differences were analyzed with ClustalW, and the tree was drawn using Phylodendron (http://iubio.bio.indiana.edu/treeapp/).

slide 273:

Another gene superfamily is the globin family (Fig. 8.17). The family includes myoglobin and hemoglobin from different organisms. These proteins all carry oxygen bound to iron, but myoglobin is specific to muscle cells, whereas hemoglobin is specific to blood. The theory is that early in evolution one gene for an ancestral globin existed. At some point this gene was duplicated and the copies diverged, so that one was specialized for blood and the other for muscle. Hemoglobin itself also diverged later into different forms, each used at various stages of development.

New genes may be generated one at a time, but in addition whole chromosomes or genomes may be duplicated. In some organisms, particularly plants, genome duplications are relatively stable and have occurred quite often. An example is the modern wheat plant. Its ancestors were typical diploids, but modern bread wheat is hexaploid, being derived from three different ancestral plants. The hexaploid varieties arose through hybridization and natural mutation and were exploited because of their higher protein content and better yield. The wheat used to make pasta, durum wheat, is tetraploid and represents an intermediate step.

Although genomes have characteristic average rates of mutation, individual genes may mutate at very different rates. Essential proteins evolve, or mutate, more slowly than average. Conversely, the less critical a gene is for survival, the more mutations can be tolerated, and the protein evolves more rapidly. Thus, the gene for cytochrome c, an essential component of the electron transport chain, has incorporated only 6.7 changes per 100 amino acids in 100 million years. In contrast, fibrinopeptides, which are involved in blood clotting, have had 91 mutations per 100 amino acids in 100 million years.
As noted earlier, ribosomal RNA is useful for establishing family trees for distantly related organisms. It is found in every organism and is essential to survival; therefore it is slow to evolve. What happens if a scientist wants to classify organisms that are closely related? Essential gene sequences do not provide enough genetic variation to differentiate such organisms. Nonessential genes may help, but sometimes even they are too similar. In such cases, the wobble position of coding regions, or even noncoding regions, may be used. As noted in Chapter 2, the wobble position is the third nucleotide of a codon. The same amino acid is often encoded by several codons that vary only in this third base. Alterations at this position usually have no net effect on protein function or structure and may occur between very closely related species or between individuals of the same species.

Mitochondrial or chloroplast genomes are also compared in order to determine the relatedness of organisms. These genomes accumulate mutations at a higher rate than the nuclear genomes in the same organisms. The organelle genomes vary particularly in the noncoding regions. One drawback to using organelle genomes is that mitochondria and chloroplasts are inherited maternally and thus trace the evolutionary lineage only on the maternal side.

FIGURE 8.17 Globin Family of Genes. Over the course of evolution, a variety of gene duplication and divergence events gave rise to a family of closely related genes. The first ancestral globin gene was duplicated, giving hemoglobin and myoglobin. After another duplication, the hemoglobin gene diverged into the ancestral alpha-globin and ancestral beta-globin genes. Continued duplication and divergence created the entire family of globin genes.

[Figure 8.17, globin family tree: ancestral globin; myoglobin; ancestral hemoglobin; ancestral α-globin; ancestral β-globin; present-day globins α, ζ, γ, ε, δ, β.]

Molecular phylogenetics uses genomic sequences of different organisms to determine their evolutionary relatedness. Essential proteins have fewer mutations over time. Less essential proteins have more mutations over time.

slide 274:

FROM PHARMACOLOGY TO PHARMACOGENETICS

Another field that has undergone many changes due to the Human Genome Project is pharmacology, the study of drugs. Drug development has traditionally been a hit-or-miss matter, with drug discovery often a by-product of other research. Penicillin is one of the 20th century’s greatest discoveries but was found by accident. Alexander Fleming was growing Staphylococcus bacteria and left his plates while on vacation. When he came back, mold had contaminated the plates. Miraculously, the staphylococci were not growing close to the mold, which was evidently secreting something that stopped bacterial growth. Even Viagra was discovered by accident. Scientists were trying to develop heart medications when they noticed the “side effect.” One factor that makes drug development costs so high is that many of the paths chosen lead to dead ends.

Another problem of drug development is adverse drug reaction (ADR). Adverse reactions may happen in some patients while others respond well to the same drug. Most drugs are developed with the average patient in mind, yet there is often a subset of people who react badly to the drug. In the United States, the frequency of serious ADRs is approximately 7%. Such ADRs are a significant cause of hospitalizations and death.

Differences in drug response often depend on a person’s genetic makeup. Pharmacogenetics is the study of inherited differences in drug metabolism and response, and pharmacogenomics is thus the study of all the genes that determine drug response. One major goal of these fields is to reduce the number of ADRs by determining the genetic makeup of the patient before offering a specific drug. The key to “genetic” diagnosis is the use of SNPs (see earlier discussion). Single changes in coding regions can often be correlated with adverse drug reactions.
For example if a certain subpopulation of people does not respond to a drug then their DNA can be examined for a specifc SNP that is absent in patients who do respond. Before the drug is given to new patients DNA from a blood sample can be tested for the diagnostic SNP. Testing for SNPs can be done by microarray analysis see later discussion in the doctor’s offce thus reducing the number of offce or hospital visits. SNP analysis is also used to screen for hereditary defects. Specifcally SNPs can be identi - fed using a technique called Zipcode analysis Fig. 8.18. Here many different SNPs can be examined simultaneously. First PCR is used to amplify the regions containing each SNP being investigated. The PCR fragments could be sequenced fully but because SNPs differ by only one base single base extension analysis is done instead. For this a primer is designed to anneal just one base pair away from the SNP location. This primer also carries a “zipcode” region that is used to identify this specifc SNP and each SNP has a different zipcode. After the Zipcode primer anneals to the PCR fragments DNA polymerase plus fuorescently labeled dideoxynucleotides are added. This results in a single base being added to the primer. Note that dideoxynucleotides block chain elongation and so only one base can be added. Each base is labeled with a different fuorescent dye allowing it to be identifed. Next beads linked to complementary zipcode cZipcode sequences are added to grab the zipcoded primers. The trapped Zipcode primer with the labeled nucleotide has a different color depending on which base was incorporated and elucidate the identity of the nucleotide in the patient. The different color beads are sorted and counted by fuorescent-activated cell sorting FACS see Chapter 6. One of the spin-offs from the Human Genome Project is the Pharmacogenetics and Pharma- cogenomics Knowledge Base PharmGKB http://www.pharmgkb.org/. It records genes and mutations that affect drug response. 
Consider asthma, a condition in which people overreact to inhaled irritants by cutting airflow in and out of the lungs. The muscle cells around the bronchial tubes constrict, decreasing airflow. Albuterol is a drug used to open the bronchial tubes by relaxing the muscle cells. Albuterol affects the beta2-adrenergic receptor, and mutations in this receptor alter the efficacy of albuterol. A single nucleotide change that replaces glycine at position 16 with arginine gives a receptor protein with a better response to albuterol.

slide 275:

Another key finding concerns the cytochrome P450 family of enzymes. These enzymes play a role in the oxidative degradation of many foreign molecules, including many pharmaceuticals. The CYP2D6 isoenzyme oxidizes drugs of the tricyclic antidepressant class, and different alleles of this enzyme affect how well a person metabolizes these drugs. Much as for albuterol, identifying which allele a patient has can help prevent overdoses or adverse reactions. As time goes on, more medical treatments will be designed for the individual rather than the average person.

Pharmacogenetics is the study of inherited differences in drug metabolism and response. Some SNPs affect how a person metabolizes a certain drug. When scientists determine which SNP correlates with which drug sensitivity, new patients can be screened and possibly avoid adverse drug reactions (ADRs).

FIGURE 8.18 Zipcode Analysis and Single Base Extension of SNPs
A segment of DNA that includes an SNP site is generated by PCR (only a single strand of DNA is shown here for simplicity). Single base extension is performed with a primer that binds one base in front of the SNP. Person I has an A at the SNP site, and therefore ddT is incorporated. In person II, a G at the same position results in incorporation of ddC. The bases are labeled with different fluorescent dyes. Use of dideoxynucleotides prevents addition of further bases. The elongated primer is then trapped by binding its Zipcode sequence to the complementary cZipcode, which is attached to a bead or other solid support for easy isolation. Steps shown: PCR; add primer plus nucleotides; single base elongation; dissociate DNA strands; capture primer.

GENE EXPRESSION AND MICROARRAYS

As noted earlier, a major issue in determining the correct number of genes is deciding whether or not a sequence is really a gene. Measuring whether a presumed gene is transcribed into mRNA is the first step in deciding if it is genuine. Gene expression was once studied one gene at a time, but functional genomics now studies gene expression over the entire genome. Functional genomics encompasses the global study of all the RNA transcribed from the genome (the transcriptome), all the proteins encoded by the

slide 276:

genome (the proteome), and all the metabolic pathways in the organism (the metabolome). Because the entire human genome contains only around 21,000 different genes, using microarrays to study gene expression is feasible.

DNA microarrays, or DNA chips, contain thousands of different unique DNA sequences bound to a solid support such as a glass slide. Microarrays are based on hybridization between a “probe” and target molecules in the experimental sample. However, in a microarray the probes are attached to the solid support and the experimental sample is in solution. The microarray often represents the genome of the organism being tested and includes sequences corresponding to each gene in the organism. To monitor gene expression, scientists test RNA extracted from a cell sample against a microarray. The experimental RNA sample is usually fluorescently tagged. Hybridization of the mRNA to the DNA probes on the solid support indicates whether or not a gene is expressed and to what degree. The level of fluorescence at each point on the array correlates with the level of the corresponding mRNA in the sample.

Microarrays can be used to analyze RNA isolated from cells grown under a variety of conditions, for example heat shock, acid exposure, cancer, or other disease states. The same array can be hybridized to two or more samples of RNA (control versus experimental) to compare gene expression. Each RNA sample is labeled with a different fluorescent dye, for example red and green. If a particular RNA is present in only one sample, the corresponding spot on the microarray will be red or green (Fig. 8.19), whereas if the RNA is present in both experimental and control samples, the spot will be yellow (i.e., red plus green). Modern arrays can accommodate thousands or millions of different probes, allowing the entire genome for most organisms to be examined at once. Some arrays are clustered so that all the genes involved in, say, protein synthesis or heat shock are together.
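The red, green, or yellow reading of a two-color spot can be sketched as a comparison of the two dye intensities. The background value, the twofold cutoff, and the function name below are illustrative assumptions, not values from the text; real scanners and analysis packages apply their own normalization.

```python
import math


def classify_spot(red, green, background=50.0, fold=2.0):
    """Classify one spot from its red and green intensities (arbitrary units).

    background and fold are assumed, illustrative settings.
    Returns (color, log2 ratio of red to green).
    """
    r = max(red - background, 1.0)    # floor at 1 to keep the log defined
    g = max(green - background, 1.0)
    if r <= 1.0 and g <= 1.0:
        return "not detected", 0.0    # neither sample gave a signal
    log_ratio = math.log2(r / g)
    if log_ratio >= math.log2(fold):
        return "red", log_ratio       # RNA mainly in the red-labeled sample
    if log_ratio <= -math.log2(fold):
        return "green", log_ratio     # RNA mainly in the green-labeled sample
    return "yellow", log_ratio        # RNA present in both samples


for red, green in [(850, 50), (50, 850), (450, 450), (40, 45)]:
    print(red, green, classify_spot(red, green)[0])
```

Working with the log ratio treats a twofold increase and a twofold decrease symmetrically, which is why it is a common choice for this kind of comparison.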
The computer reads the color and fluorescence intensity for each of the spots and carries out an analysis. The results can provide a global view of gene expression in different conditions. For example, slides can be made with every named gene from the yeast genome. These genes may be analyzed for expression at different stages of the cell cycle. For this, a culture of yeast cells is synchronized and arrested at different stages of the cell cycle by adding α factor or by using mutant yeast that freeze at particular stages of mitosis. The gene expression patterns for each stage are compared and compiled (Fig. 8.20).

Functional genomics includes the global study of all the RNA transcribed from the genome (the transcriptome). DNA microarrays, or DNA chips, contain thousands of different unique sequences bound to a solid support such as a glass slide. When fluorescently labeled RNA is incubated with the microarray, complementary sequences hybridize. The level of fluorescence corresponds to the amount of RNA that is bound to the DNA microarray.

FIGURE 8.19 DNA Chip Showing Detection of mRNA by Fluorescent Dyes
DNA chips can monitor many different mRNAs at one time. Each spot on the grid has a different DNA sequence attached. To determine which genes are expressed under which conditions, scientists isolate mRNA and label each sample with a different fluorescent dye. If two different dyes are used, as shown here, the same chip can be used for both. It is then visualized in three different ways: one shows only the red dye, another only the green, and the third merges the two images so overlapping spots look yellow. Panels: array treated with RNA from cells grown under condition 1 and labeled with red fluorescent dye; array treated with RNA from cells grown under condition 2 and labeled with green dye; array treated with both samples of RNA (yellow spots reveal genes expressed under both conditions).

MAKING DNA MICROARRAYS

There are two major types of DNA microarrays: one contains cDNA fragments 600 to 2400 nucleotides in length, and the other uses oligonucleotides 20 to 50 nucleotides in length. Each type of microarray is manufactured differently. When a cDNA microarray is made, each of the different probes must be chosen independently and made by PCR or traditional

slide 277:

cloning. Then all the DNA probes are spotted onto the slide. When an oligonucleotide array is made, the oligonucleotides are synthesized directly on the slide.

cDNA Microarrays

The first step in making a cDNA microarray is to determine the numbers and types of probes to attach to the solid support. Since entire genomes have been sequenced for a variety of organisms, identifying potential genes is relatively easy. During the sequencing of these genomes, many cloned segments of DNA containing all or part of various genes were generated. Researchers can either obtain these clones or amplify genes from a sample of DNA using PCR. Each PCR product must be purified before attachment to the glass slide so that all the extra nucleotides, Taq polymerase, and salts are removed and only pure DNA attaches to the slide. Pure cDNA samples can be used directly.

The next step is to create the chip using a microarray robot. Purified samples of each DNA are put into small wells arranged in a grid in microtiter plates. The size of the grid depends on the number of probes. If every predicted human gene is present once, approximately 25,000 different wells are needed. In practice, probes for each gene are attached more than once in different areas of the chip to provide several readings for each gene. A grid of pens or quills is dipped into the wells, one pen for each well, using a robotic arm. The pen tips are then touched to a glass slide, where a tiny drop of DNA solution is left behind. The robotic arm continues to manufacture spotted slides until the DNA in the wells is used up. Using a robot makes chips cheap and easy to produce. Finally, the DNA is cross-linked to the glass slide with ultraviolet light, which causes thymine in the DNA to cross-link to the glass. Figure 8.21 shows DNA on a microarray grid visualized by an atomic force microscope.

In newer cDNA microarrays, the samples are spotted onto a glass slide using inkjet printer technology.
FIGURE 8.20 Gene Expression during Yeast Cell Cycle
Color coding indicates the time of the cell cycle for maximum gene expression. More than 800 different genes that respond to changes in the cell cycle were monitored on the 16 yeast chromosomes. Axes: chromosome (I to XVI) versus chromosome position (in units of 10^5 bp). Color key: early G1 phase, late G1 phase, S phase, G2 phase, M phase, multiple phases, no cell cycle regulation, no data. From Cho RJ et al. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2, 65–73. Reprinted with permission.

FIGURE 8.21 AFM of DNA on Microarray
A region of a yeast microarray after hybridization. The DNA is clearly deposited in sufficient density to permit many strand-to-strand interactions. The width of the figure represents a scanned distance of 2 micrometers. Reprinted by permission from Macmillan Publishers Ltd.: Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM (1999). Expression profiling using cDNA microarrays. Nature Genetics 12, 82–89, copyright 1999.

The cDNA samples are sucked into separate chambers of the inkjet printer head

slide 278:

and then spotted onto the glass slide, much as ink is spotted onto paper in a printer. Inkjet technology prevents variations in size and quantity of cDNA in the sample spots. Special adaptors prevent the inkjet sample channels from mixing, thus preventing cross-contamination.

Oligonucleotide Microarrays

Oligonucleotides are traditionally synthesized chemically on beads of controlled pore glass (CPG; see Chapter 4). Therefore, it is a small logical leap to synthesize many different oligonucleotides side by side on a glass slide. The main difference between synthesizing single oligonucleotides on beads and making arrays on glass slides is that the array has thousands of different oligonucleotides, and each must be synthesized in its proper location with a unique sequence. To accomplish this, photolithography and solid-phase DNA synthesis are combined (Fig. 8.22). Photolithography is a process used in making integrated circuits, where a mask allows a specific pattern of light to reach a solid surface. The light activates the surface it reaches, while the rest of the surface remains inactive.

A glass slide is first covered with a spacer that ends in a reactive group. It is then covered with a photosensitive blocking group that can be removed by light. In each synthetic cycle, those sites where a particular nucleotide will be attached are illuminated to remove the blocking group. Each of the four nucleotides is added in turn. At each addition, a mask is aligned with the glass slide. Light passes through holes in the mask and activates the ends of those growing oligonucleotide chains that it illuminates. Much as in traditional chemical synthesis, each nucleotide has its 5′-OH protected. Thus, after each addition, the end of the growing chain is blocked again. These protective groups are removed by light, so at each step a new mask is aligned with the slide and light deprotects the appropriate nucleotides.
The entire process continues for each nucleotide at each position on the glass slide. Making the masks is the key to this technology (Fig. 8.23).

DNA microarrays are made with cDNA or oligonucleotides. cDNA arrays are created by spotting pure samples of cDNA clones onto the glass in a small spot. The DNA is cross-linked to the glass with UV light. For oligonucleotide arrays, DNA is synthesized directly onto the glass slide.

FIGURE 8.22 Photolithography
Light passing through the mask makes a particular pattern on the glass slide. If the slide is coated with a light-activated substance, only the regions that are illuminated will be activated for the addition of another nucleotide. Labels: light, mask, glass slide.

HYBRIDIZATION ON DNA MICROARRAYS

Hybridization on a microarray is similar to what occurs during other hybridization procedures such as Southern blots or Northern blots. All these techniques rely on the complementary nature of nucleic acid bases. When two complementary strands of DNA or RNA

slide 279:

FIGURE 8.23 On-Chip Synthesis of Oligonucleotides
Arrays may be created by chemically synthesizing oligonucleotides directly on the chip. First, spacers with reactive groups are linked to the glass chip and blocked. Then each of the four nucleotides is added in turn (in this example, G is added first, then T). A mask covers the areas that should not be activated during any particular reaction. Light activates all the groups not covered by the mask, and a nucleotide is added to these. The cycle is repeated with the next nucleotide. Steps shown: glass chip with blocking groups on the coupling groups; mask and light treatment; coupling groups revealed; couple with activated and blocked G; second light treatment; coupling groups revealed; couple with activated and blocked T; continue.
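The masked synthesis cycle can be simulated in a few lines. This is a minimal sketch under simplifying assumptions: spots are identified by hypothetical grid coordinates, the four nucleotides are offered in a fixed order (G, A, T, C, an arbitrary choice), and every coupling is treated as perfectly efficient. The mask for each step is simply the set of spots whose next required base is the one being offered.

```python
def synthesize_array(targets, bases="GATC"):
    """Grow every oligonucleotide one masked coupling step at a time.

    targets: dict of position -> desired sequence (written in synthesis order).
    Returns (chains, steps), where steps counts the masked coupling steps used.
    """
    chains = {pos: "" for pos in targets}
    steps = 0
    while any(chains[p] != targets[p] for p in targets):
        for base in bases:  # each nucleotide is offered in turn
            # The mask has holes only over spots whose next base is `base`.
            mask = {
                p for p in targets
                if len(chains[p]) < len(targets[p])
                and targets[p][len(chains[p])] == base
            }
            if not mask:
                continue  # no spot needs this base at the moment
            for p in mask:
                # Light deprotects the spot, the base couples, and the new
                # chain end is blocked again until a later mask exposes it.
                chains[p] += base
            steps += 1
    return chains, steps


targets = {(0, 0): "GT", (0, 1): "TG", (1, 0): "GG"}
chains, steps = synthesize_array(targets)
print(chains, steps)
```

Because one mask serves every spot that needs the same base, the number of steps depends on oligonucleotide length rather than on the number of spots, which is what makes very dense arrays practical.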

slide 280:

are alongside each other, the bases match up with their complements, that is, thymine or uracil with adenine, and guanine with cytosine. On a DNA microarray, hybridization is affected by the same parameters as in these other techniques.

How the DNA is attached to the slide can affect how well the probe DNA and target DNA hybridize, especially for oligonucleotide microarrays (Fig. 8.24). The short length of oligonucleotides requires that the entire piece be accessible to hybridize. The length of the spacer between the oligonucleotides and the glass slide therefore affects hybridization. An oligonucleotide attached with a short spacer has many of its initial nucleotides too close to the glass and inaccessible to incoming RNA or DNA. Oligonucleotides with longer spacers may fold back and tangle up. Oligonucleotides with medium-sized spacers are far enough from the glass but not so far as to get tangled. Thus, medium-sized spacers give the best access for hybridization.

Hybridization of two lengths of DNA, or of RNA with DNA, depends on certain sequence features. One important property is the ratio of A:T base pairs to G:C base pairs. G:C base pairs have three hydrogen bonds holding them together, whereas A:T base pairs have only two. Thus, more G:C base pairs give stronger hybridization. If the sequence has too many A:T base pairs, the duplex may form slowly and be less stable. Another important consideration is secondary structure. If the probe sequence can form a hairpin structure, it will hybridize poorly with the target. If the probe has several mismatches relative to the target, the duplex may not form efficiently. All these issues must be addressed when making an oligonucleotide microarray. Computer programs are available to identify suitable regions of genes with sequences that will produce effective probes.

cDNA arrays are less prone to the problems seen in oligonucleotide arrays.
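The two sequence checks described above for oligonucleotide probes, G:C content and hairpin formation, can be sketched as a simple screen. The thresholds (40% to 60% G:C, a 4-base stem, a 3-base minimum loop) and the function names are assumptions chosen for illustration; actual probe-design programs use thermodynamic models rather than exact string matching.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin(seq, stem=4, min_loop=3):
    """True if any stem-length segment can pair with a later segment of seq."""
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        arm = reverse_complement(seq[i:i + stem])
        # The partner arm must start at least min_loop bases downstream.
        if seq.find(arm, i + stem + min_loop) != -1:
            return True
    return False


def probe_ok(seq, gc_min=0.40, gc_max=0.60):
    """Accept a probe with moderate G:C content and no simple hairpin."""
    return gc_min <= gc_fraction(seq) <= gc_max and not has_hairpin(seq)


for probe in ["ACACACACACAC", "GGGGAAAACCCC", "AAAAAAAAAAAA"]:
    print(probe, round(gc_fraction(probe), 2), has_hairpin(probe), probe_ok(probe))
```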
cDNAs are double-stranded, so secondary structures such as hairpins are less likely to be a problem. During a hybridization reaction, cDNA arrays must be denatured with either heat or chemicals, making the probes single-stranded. Then the single-stranded RNA samples are allowed to hybridize on the slide under conditions that promote RNA:cDNA duplexes without any mismatches.

Oligonucleotide microarrays must have a sufficient spacer and little secondary structure in order to hybridize with the samples.

FIGURE 8.24 Length of Spacers and Target Molecules Affect Hybridization on Microarrays
(A) When the spacer between the glass slide and oligonucleotide is too short, the oligonucleotides are condensed and not accessible to hybridize. If the spacer region is too long, the oligonucleotides and spacers tangle and fold, preventing optimal hybridization. (B) When the target for hybridization is too long, the target sequences may form hairpins with themselves rather than bind to the array oligonucleotides. Panels: (A) short vs. long spacer (oligo densely packed, oligo spread out); (B) long vs. short targets.

MONITORING GENE EXPRESSION USING WHOLE-GENOME TILING ARRAYS

Whole-genome tiling arrays (WGAs) are oligonucleotide microarrays that cover the entire genome. The first entire genome to be represented by a whole-genome array was that of Arabidopsis. A gene chip was designed with 25-mer oligonucleotides that overlapped each other and covered the entire sequence of the genome. Complementary oligonucleotide sequences were tiled back to back along each entire chromosome and ordered so that the array could be conveniently analyzed for gene expression (Fig. 8.25).
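Tiling a sequence with 25-mer probes amounts to sliding a fixed-length window along it. The sketch below assumes a hypothetical `step` parameter for the distance between probe start positions; the text does not state the offset used on the Arabidopsis chip. A step smaller than the probe length gives overlapping probes, whereas a step larger than the probe length leaves gaps between them.

```python
def tile_probes(sequence, probe_len=25, step=1):
    """Return (start, probe) pairs for probes of probe_len placed every `step` bases."""
    last_start = len(sequence) - probe_len
    return [
        (start, sequence[start:start + probe_len])
        for start in range(0, last_start + 1, step)
    ]


genome = "ACGT" * 25  # a 100-base stand-in for a chromosome sequence
overlapping = tile_probes(genome, probe_len=25, step=1)
spaced = tile_probes(genome, probe_len=25, step=35)
print(len(overlapping), [start for start, _ in spaced])
```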

slide 281:

For the human genome, tiling arrays have been made to cover the entire sequences of chromosomes 21 and 22. They also use 25-mer oligonucleotides, but rather than overlapping, the oligonucleotides were spaced 35 base pairs apart along the sequence. These are therefore strictly only “quasi-whole-genome arrays.” Compared to arrays that include only known genes, tiling arrays have the potential to identify novel regions that are transcribed, whether these are unknown protein-encoding genes or nontranslated RNA. The RNA extracted from many different cell lines and tissues has been used to monitor gene expression, assess differences in splicing patterns, find new genes, and find RNA-binding protein target sequences.

The most interesting finding from studying human chromosomes 21 and 22 is that much larger portions of these chromosomes are transcribed into RNA than previously predicted from computer analyses of exon regions. About 90% of the transcribed regions occurred outside the known exons. The majority of the transcribed regions generated noncoding RNA, mostly of less than 75 base pairs in length. This suggests that noncoding RNA may have a much greater role than previously thought. These arrays also identified new exons that were previously unknown. In addition, these arrays can identify novel alternatively spliced mRNAs and hence protein variants.

FIGURE 8.25 Whole-Genome Array Designs
Whole-genome arrays (WGAs) contain oligonucleotide probes that cover the entire genome in an overlapping set. In quasi-whole-genome arrays, the probes are spaced equal distances apart through the genome. The probes thus cover the entire genome except for the gaps between the probes. Splice junction arrays have only probes that span the upstream and downstream regions of known splice junctions in mRNA. Exon-scanning arrays have probes derived from exon sequences only. Panels: whole-genome tiling array; quasi-whole-genome array; splice-junction array; exon-scanning array.

The WGAs for chromosomes 21 and 22 have also been used to compare the

slide 282:

level of expression of exons within the same gene. About 80% of the genes had exons with varied levels of expression, implying that most genes have some sort of alternative splicing.

Another use for whole-genome arrays is to analyze the results of chromatin immunoprecipitation (ChIP). ChIP begins by cross-linking all the various transcription factors to chromatin, essentially freezing them in place. Next, the chromatin is sheared into smaller fragments and the DNA/transcription factor complexes are isolated. Affinity purification isolates each particular transcription factor from the others (e.g., using antibodies to the transcription factor Jun isolates all the Jun/DNA complexes from this mixture). Finally, the DNA sequences that are bound to the chosen transcription factor are identified using a WGA. The entire procedure, including the analysis on a gene chip, is called ChIP-chip. This analysis can precisely locate transcription factor binding sites. Curiously, binding locations for NF-κB, for example, have been found within both coding and noncoding regions such as introns or the 3′ ends of genes. These surprising findings suggest that transcription factors may also function outside the traditional upstream promoter region.

Another use for WGAs is to identify regions of the genome that are methylated. Methylation prevents the inappropriate expression of various genes, especially those used only during development of young organisms or those genes from transposons or viruses that could be detrimental. Cancerous cells have methylation patterns much different from those of normal cells, suggesting that this type of regulation is critical to proper growth control. To identify the methylated regions, scientists first treat genomic DNA with sodium bisulfite, which deaminates nonmethylated cytosine to uracil yet does not affect methylated cytosine. The treated DNA is then hybridized to a WGA.
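The effect of bisulfite treatment on a sequence can be sketched as a string conversion. The example assumes that methylation is supplied as a set of zero-based positions, a representation invented here for illustration, and it simply counts the positions where the treated strand no longer matches the untreated one.

```python
def bisulfite_convert(seq, methylated_positions):
    """Convert unmethylated C to U; methylated C (by index) is left unchanged."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def count_changes(treated, original):
    """Number of positions altered by the treatment."""
    return sum(a != b for a, b in zip(treated, original))


original = "ACGTCCGA"
treated = bisulfite_convert(original, methylated_positions={1})
print(treated, count_changes(treated, original))
```

Each converted position removes a C that a probe designed against the untreated sequence would have paired with G, so a region with many changes hybridizes poorly.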
Those regions with nonmethylated cytosine no longer hybridize to the array because the cytosines have been converted to uracil, which pairs with A, not G. Those regions of the genome that are methylated still hybridize well because methylated cytosine and guanine form a stable base pair.

Finding genetic variations and polymorphisms is critical to genome analysis, and whole-genome arrays offer an unbiased method to analyze samples. A WGA that has the reference sequence for the human genome can be used to identify and catalog all different types of polymorphisms, including SNPs, VNTRs, and repetitive elements. Indeed, an overlapping WGA made to the entire reference sequence of the human genome and spaced by a single base pair could be used to effectively resequence the entire genome with ease and speed.

Whole-genome arrays are oligonucleotide arrays that have sequences that cover the entire genome. They can be used to identify transcription factor binding sites, regions of methylation, SNPs, VNTRs, repetitive elements, and so forth.

MONITORING GENE EXPRESSION BY RNA-Seq

A completely different approach to analyzing the transcriptome is to isolate total cellular RNA and then sequence all of it. RNA molecules present in multiple copies will be sequenced multiple times, and thus the number of copies will be revealed. In practice, sequencing RNA directly is technically unfeasible. Therefore, the RNA is converted to complementary DNA (cDNA) by reverse transcriptase, and the DNA is sequenced. Until recently, this approach was impractical due to the colossal amount of sequencing needed. However, major advances in DNA sequencing technology have transformed the field of transcriptomics, since an entire cDNA library can now be sequenced quickly and cheaply.

RNA-Seq, or whole transcriptome shotgun sequencing, creates a cDNA library from fragmented total mRNA, and then every cDNA is sequenced (Fig. 8.26). These sequences are then aligned with the genome of the organism.
The relative copy number of each cDNA sequence is an indication of gene expression levels.
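Counting aligned reads per gene is the core of this measurement. The sketch below assumes the alignment step has already assigned each cDNA read to a gene; the gene names are placeholders, and no correction for transcript length or sequencing depth is applied.

```python
from collections import Counter


def relative_expression(read_assignments):
    """Return each gene's share of all assigned reads.

    read_assignments: iterable of gene names, one per aligned cDNA read.
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Placeholder data: ten reads assigned to three hypothetical genes.
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
print(relative_expression(reads))
```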

slide 283:

Relative to typical microarrays, the major advantages of RNA-Seq are as follows (Fig. 8.27):
(a) RNA-Seq is probe independent. Consequently, it gives more accurate measurements of the relative expression level of different RNA molecules by counting them directly.
(b) RNA-Seq has a greater dynamic range. The fluorescent signals used for microarrays have a minimum sensitivity and also become saturated at high levels. Direct counting is not subject to saturation.
(c) RNA-Seq monitors both coding and noncoding RNA.
(d) RNA-Seq detects alternatively spliced transcripts and yields their relative numbers.
(e) When a diploid or polyploid organism contains multiple different alleles of the same gene, RNA-Seq will monitor allele-specific expression.
(f) Even when the genome sequence for an organism is not available for reference, RNA-Seq can still be performed.
(g) The small amount of RNA required for RNA-Seq means that it is possible to carry out analyses of single cells.
(h) “Dual RNA-Seq” allows simultaneous analysis of RNA from host cells and from the pathogen infecting them. Computer analysis allocates the different sequences to the two organisms.

As noted, whole-genome tiling arrays can also perform some of these measurements, such as detecting noncoding RNA and alternatively spliced transcripts. Nonetheless, RNA-Seq is more sensitive and gives more quantitative data in these cases. Conversely, RNA-Seq does have the disadvantages of being relatively expensive and requiring sophisticated data analysis.

RNA-Seq can be used to compare gene expression of the same cells under different conditions. Alternatively, gene expression of cells from different tissues of higher organisms may be compared. The example in Figure 8.28 compares placenta with several other body tissues.
Nearly 300 genes showed significantly greater expression in placental tissue, and several transcriptional regulators were identified that were probably involved in this elevated expression.

FIGURE 8.26 RNA-Seq Protocol
The entire transcriptome can be identified by sequencing a cDNA library in its entirety. Next-generation sequencing makes this process possible, resulting in the identification of each copy of every RNA that was expressed. Steps shown: isolate mRNA by binding to poly(T) beads; convert to cDNA; sequence with next-generation technique.

slide 284:

FIGURE 8.27 Microarray versus RNA-Seq
The microarray approach relies on using probes to bind labeled RNA. The level of expression is deduced from the fluorescent signal due to the hybridization of RNA with probe. In contrast, RNA-Seq is independent of probes and relies on directly counting the number of copies of sequenced cDNA derived from the RNA. Workflows shown: microarray (cells; extract RNA; label RNA; bind to probes on array; measure fluorescence) and RNA-Seq (cells; extract RNA; reverse transcriptase; cDNA; sequence DNA).

FIGURE 8.28 Placental Transcriptome by RNA-Seq
The transcriptome of the human placenta was compared to that of several other tissues. This Circos diagram shows RNA-Seq data for the 23 chromosome pairs. Track 1: Chromosomes with bands. Track 2: Location of the top 100 highly expressed genes in placenta. Track 3: Average RPKM values (reads per kb per million) summarized over 6-Mb regions. Track 4: Locations of placenta-related genes in the OMIM database. Track 5: Genes specifically expressed in placenta (3-fold over 7 other tissues). Track 6: Functional clustering of placental genes. Used with permission from Saben J, Zhong Y, McKelvey S, Dajani NK, Andres A, Badger TM, et al. (2013). A comprehensive analysis of the human placenta transcriptome. Placenta 35(2), 125–131.
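The RPKM unit used in Track 3 of Figure 8.28 normalizes a raw read count for both transcript length and sequencing depth. The following sketch uses the standard definition (reads per kilobase of transcript per million mapped reads); the numbers in the example are invented for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Invented example: 500 reads on a 2000-bp transcript, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```

Dividing by length matters because a longer transcript yields more fragments, and hence more reads, than a shorter one expressed at the same level.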

slide 285:

RNA-Seq has many clinical applications. In cancer, gene fusions occur due to chromosomal rearrangements. RNA-Seq can reveal whether the fused genes are expressed as mRNA and estimate the relative abundance of the fusion product. The technique can also identify expressed single nucleotide polymorphisms (SNPs). This type of information can identify the genes responsible for a particular disease by comparing the expression of SNPs from affected individuals and their healthy family members. Restricting comparison to expressed sequences eliminates any irrelevant SNPs found in the DNA, since many SNPs are not in expressed areas of the genome. RNA-Seq can also identify post-transcriptional editing of mRNA that is not evident from looking at just the DNA sequence. This can suggest a new function for a gene.

Next-generation sequencing technologies have allowed the analysis of RNA amounts by the RNA-Seq approach. This technique can be used to measure global gene expression. In addition, RNA-Seq can be used to monitor noncoding RNA and novel splice variants.

MONITORING GENE EXPRESSION OF SINGLE GENES

Although microarrays and RNA-Seq provide a global view of gene expression, they do not provide extensive information on individual genes. Further details for specific genes of interest are often obtained using reporter genes. These are genes whose products are unusually easy to assay, thus allowing the investigator to carry out a large number of analyses on organisms grown under many conditions. In this approach, the gene of interest is physically linked to a reporter gene, creating a gene fusion. The regulatory region of the gene of interest is isolated first. This segment is normally found upstream of the gene of interest and includes sites for transcription factors to bind plus various enhancer elements.
The coding sequence of the gene of interest is replaced with the reporter gene so that the regulatory elements now control the reporter gene rather than the original gene of interest.

Reporter genes often encode enzymes whose activity is easy to assay. One of the most widely used reporters is the lacZ gene from E. coli, which encodes the enzyme β-galactosidase (Fig. 8.29). This enzyme splits disaccharide sugars such as lactose into their monomers but also cleaves various artificial substrates. When the substrate ONPG is cleaved, one of the cleavage products forms a visible yellow dye. When X-Gal is cleaved by β-galactosidase, one of the products reacts with oxygen to form a blue dye.

The phoA gene is another reporter gene; it encodes alkaline phosphatase, which removes phosphate groups from many different substrates (Fig. 8.30). Artificial substrates are designed so that when the phosphate is removed, they either change color or fluoresce.

Another popular reporter is luciferase, which emits a pulse of visible light when the correct substrate, luciferin, is supplied (Fig. 8.31). Luciferase is an enzyme encoded by the lux gene in bacteria or the luc gene in fireflies. The two luciferases are not related and have different enzyme mechanisms. Both genes work well as reporter genes and have been cloned onto vectors that work in a variety of different organisms. Detecting the light emitted by luciferase is difficult because of its low levels and requires special equipment such as a luminometer or scintillation counter.

Another extremely popular reporter protein is green fluorescent protein, or GFP, which is not an enzyme (Fig. 8.32). This protein has natural fluorescence that does not require any cofactors or substrates. Better still, the fluorescence is active in living tissues, so that when the protein is expressed, the organism gains a green fluorescence. This is

slide 286:

FIGURE 8.29 β-Galactosidase Has Multiple Substrates
The enzyme β-galactosidase normally cleaves lactose into two monosaccharides: glucose and galactose. β-Galactosidase also cleaves artificial substrates such as ONPG and X-Gal, releasing groups that form visible dyes. ONPG releases the bright yellow substance o-nitrophenol, whereas X-Gal releases an unstable group that reacts with oxygen to form a blue indigo dye. Reactions shown: (I) lactose (galactose β1,4 glucose) → D-galactose + D-glucose; (II) o-nitrophenyl galactoside (ONPG) → D-galactose + o-nitrophenol (bright yellow); (III) 5-bromo-4-chloro-3-indolyl galactoside (X-Gal) → D-galactose + 5-bromo-4-chloro-3-indoxyl (unstable), which spontaneously reacts with oxygen in air to give an indigo-type dye (dark blue and insoluble).

slide 287:

FIGURE 8.30 Substrates Used by Alkaline Phosphatase
Substrates shown: o-nitrophenyl phosphate; 5-bromo-4-chloro-3-indolyl phosphate (X-Phos); and 4-methylumbelliferyl phosphate, which alkaline phosphatase converts to 4-methylumbelliferone (fluorescent) plus phosphate.
Alkaline phosphatase removes phosphate groups from various substrates. When the phosphate group is removed from o-nitrophenyl phosphate, a yellow dye is released. When the phosphate is removed from X-Phos, further reaction with oxygen produces an insoluble blue dye. When the phosphate is removed from 4-methylumbelliferyl phosphate, this releases a fluorescent molecule.

FIGURE 8.31 The Luciferase Reaction Emits Light from Luciferin
Luciferase from bacteria uses a long-chain aldehyde, oxygen, and the reduced form of the cofactor FMN (flavin mononucleotide) as its luciferin. Firefly luciferase uses ATP, oxygen, and firefly luciferin to produce light.
Bacterial luciferase: FMNH2 + O2 + R-CHO → R-COOH + FMN + H2O + light
Firefly luciferase: Luciferin + O2 + ATP → oxidized luciferin + CO2 + H2O + AMP + PPi + light

FIGURE 8.32 Transgenic Organisms with Green Fluorescent Protein
The gene for GFP has been integrated into the genome of animals, plants, and fungi. After exposure to long-wavelength UV, the organisms emit green light. (A) Transgenic mice with GFP among normal mice from the same litter. The gfp gene was injected into fertilized egg cells to create these mice. GFP is produced in all cells and tissues except the hair. Credit: Eye of Science, Photo Researchers, Inc. (B) Phase contrast and (C) fluorescent emission of germlings of the fungus Aspergillus nidulans. Original GFP was used to label the mitochondria and a red GFP variant, DsRed, for the nucleus. From Toews MW et al. (2004). Establishment of mRFP1 as a fluorescent marker in Aspergillus nidulans and construction of expression vectors for high-throughput protein tagging using recombination in vitro (GATEWAY). Curr Genet 45, 383–389.
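GFP absorbs long-wavelength UV (395 nm according to the chapter text) and emits green light (510 nm). A fluorescent molecule always emits a photon of lower energy than the one it absorbed, and the short calculation below makes that concrete using E = hc/λ. This is a minimal sketch: the function name is hypothetical, and only the two wavelengths come from the text.

```python
# Physical constants (exact SI values).
PLANCK_J_S = 6.62607015e-34        # Planck constant, joule seconds
LIGHT_SPEED_M_S = 2.99792458e8     # speed of light, metres per second
JOULES_PER_EV = 1.602176634e-19    # one electronvolt in joules


def photon_energy_ev(wavelength_nm):
    """Energy of one photon of the given wavelength, in electronvolts."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULES_PER_EV


excitation_nm = 395   # long-wavelength UV absorbed by GFP (from the text)
emission_nm = 510     # green light emitted by GFP (from the text)

absorbed = photon_energy_ev(excitation_nm)
emitted = photon_energy_ev(emission_nm)
print(f"absorbed {absorbed:.2f} eV, emitted {emitted:.2f} eV, "
      f"shift {emission_nm - excitation_nm} nm")
```

The difference between the two energies is lost within the protein before emission, which is why the emitted wavelength is longer than the absorbed one.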

slide 288:

especially noticeable when the organism is transparent, like zebrafish or the worm Caenorhabditis elegans. GFP is excited by long-wavelength UV light of 395 nm and then emits light at the green wavelength of 510 nm. The original protein is from the jellyfish Aequorea victoria and is encoded by the gene gfp. Many new variants of GFP have been developed that emit light at different wavelengths, including red, blue, and yellow. The main advantage of using GFP as a reporter is the ability to see expression in living tissues.

Other techniques are useful to confirm or extend gene expression data. Differential display PCR (see Chapter 4) is useful to compare mRNA expression patterns from different tissue samples or experimental conditions. Finally, Northern blot analysis (see Chapter 3) can monitor expression levels of mRNA that vary in different experimental conditions. Gene expression data from various sources can be compiled into regulatory networks, where the different gene products (RNA and protein) work together to create an intact organism during development (see Box 8.1).

Fusing regulatory sequences from an individual gene of interest to a reporter gene allows detailed monitoring of the expression pattern of the gene. Reporter genes encode enzymes such as β-galactosidase, alkaline phosphatase, and luciferase that act on their substrates to produce a visible dye or light. Green fluorescent protein is fluorescent: it absorbs one wavelength of light and emits a longer wavelength.

EPIGENETICS AND EPIGENOMICS
Some phenotypic changes can be inherited despite no accompanying alterations in the DNA base sequence. In the early days of genetics, such events were regarded as exceptions to the laws of Mendelian genetics and often ignored as awkward. Today we know that inherited changes in gene expression are responsible for such effects, and the phenomenon is referred to as epigenetic inheritance.
Epigenetics is perhaps best viewed as an “extra” level of inheritance superimposed on top of the DNA sequence. The epigenome is the total number of possible epigenetic changes that can be imposed on any particular genome.

Most epigenetic events are indeed due to changes to the DNA, but to alterations that do not change the base sequence. The simplest and most common examples result from methylation of the DNA. Clearly, this is a chemical alteration to DNA that does not change the base sequence. In higher organisms, another common epigenetic mechanism is the chemical modification of the histone proteins around which eukaryotic DNA is wound. Both of these alterations can greatly affect gene expression.

Epigenetics is not always due to DNA modifications. Both RNA- and protein-based mechanisms may occur. RNA interference (RNAi) is discussed in Chapter 5. In some organisms, such as nematode worms and plants, the RNAi response can be inherited as a result of the amplification and persistence of the short-interfering RNA (siRNA) molecules that trigger the response. This therefore counts as RNA-based epigenetics. Protein-based epigenetics is seen in the prions of yeast, discussed in Chapter 21. In this case, regulatory proteins with a changed conformation are inherited and are ultimately responsible for environmental adaptation via altered gene expression.

Such alterations are not true epigenetic effects unless the altered gene expression state is inherited by another generation of cells. For single-celled organisms this is unambiguous, but in multicellular organisms epigenetic inheritance may occur at two levels: between cells within the same organism, or across generations via the gametes and sexual reproduction.
Unfortunately, now that epigenetics and epigenomics have become fashionable, these terms are often wrongly used to include cases in which DNA methylation or modification of the histone proteins causes changes in gene expression even when they are not passed on between generations.

Examples of epigenetics in bacteria due to methylation of DNA that persist between generations have been known for some time but have rarely been discussed from the viewpoint of epigenetics. They include the methylation of the genome DNA by the modifying enzyme

slide 289:

Box 8.1 Endomesoderm Specification in Sea Urchin Embryos
Using a variety of techniques together can elucidate the network of gene regulation controlling the formation of an entire embryo. In the sea urchin Strongylocentrotus purpuratus, the embryo undergoes a specific set of spatial and temporal events that control development of the endomesoderm. The precursor cells of the blastula start the process and continue to develop and divide into the adult sea urchin. Control of development is due both to altering gene expression and to varying protein–protein interactions. Perturbing the function of these genes with various techniques such as morpholino antisense mRNA, mRNA overexpression, and two-hybrid analysis in the sea urchin, together with methods to confirm location such as whole-mount in situ hybridization, has allowed a network of gene functions to be proposed (Fig. A). Arrows show each gene that exerts its influence on other genes or proteins. Such networks can be constructed and further tested to refine the model. The latest version of the endomesodermal network is at http://sugp.caltech.edu/endomes/.

FIGURE A Genome View of Endomesodermal Gene Regulatory Network
[Diagram panels: Maternal and Early Interactions; PMC; Endomesoderm; Mes; Veg1 Endo; Endo.]
The gene regulatory network is divided into spatial domains. Each gene is depicted as a short horizontal line from which extends a bent arrow indicating transcription. Genes are indicated by the names of the proteins they encode. From Oliveri P, Davidson EH (2004). Gene regulatory network controlling embryonic specification in the sea urchin. Curr Opin Genet Dev 14, 351–360. Reprinted with permission.

slide 290:

of restriction/modification systems (see Chapter 3) and regulation of phase variation in those cases in which methylation of DNA is responsible. Phase variation is the random and reversible switching of phenotypes between alternative states: “on” and “off.” It is typically seen with surface components of bacteria, such as pili, flagella, and outer membrane proteins, and can have major effects on bacterial lifestyle and infectivity. Clearly, changing proteins that are accessible on the cell surface can protect against detection by the immune system. Moreover, attachment to or invasion of host cells depends on which bacterial appendages are present. For example, the papBA operon of uropathogenic Escherichia coli encodes the pyelonephritis-associated pilus that allows binding to membranes of host cells in the kidneys. Synthesis of the pilus is thus required for urinary tract infections to occur. Synthesis of the pilus flip-flops between on and off depending on the methylation state of two sequences in the promoter of the papBA operon (Fig. 8.33). This in turn determines whether the two DNA-binding regulatory proteins, Lrp and PapI, bind.

EPIGENOMICS IN HIGHER ORGANISMS
Regulation of gene expression by methylation of DNA is of much greater significance in eukaryotes and is especially important in the development of multicellular organisms. In humans, methylation of the genome typically occurs on the cytosine of CpG motifs. Many such motifs are present in the upstream regulatory regions in front of genes, and in these cases methylation usually turns off the genes. Methylation differences accumulate between identical twins during their lives and may account for some of their divergence in metabolism and/or behavior. Overall, human DNA methylation decreases with age, and this has been speculatively linked to a variety of diseases, including cancer.
The term methylome denotes the total number of methylated sites on the DNA, whether the methyl groups are inherited (and hence truly epigenetic) or not. Most methyl groups on eukaryotic DNA are on cytosines, especially in CpG sequences. The methylome is therefore analyzed by bisulfite sequencing. Treatment with sodium bisulfite converts nonmethylated cytosine into uracil, but methylated cytosine is not affected. Consequently, sequencing with and without bisulfite treatment, followed by comparison of the two sequences, allows methylated sites to be identified (Fig. 8.34).

Some major roles of epigenetic regulation in mammals are as follows:
(a) Genome integrity: Nearly half the human genome consists of assorted mobile elements. In higher plants the proportion is even higher. Although most of these elements are defective, uncontrolled replication or movement of those still active could cause major damage to the genome. In practice, these elements are mostly covered with DNA methylation and histone modifications that prevent expression. These often vary depending on the nature of the element. Thus, DNA transposons typically have high levels of methylation of Lys9 on histone 3, whereas retrotransposons tend to be methylated on Lys20 of histone 4.
(b) X chromosome inactivation: In female mammals, one of the pair of X chromosomes in each cell is inactivated in order to keep gene dosage the same as in males, where there is only

FIGURE 8.33 Phase Variation of the Pap Pilus
The promoter region of the papBA operon found in some pathogenic E. coli has two clusters of binding sites for the Lrp DNA-binding protein. Each of these clusters may be methylated or nonmethylated. The Lrp protein will only bind to nonmethylated DNA. When Lrp binds close to the promoter (sites 4, 5, and 6), transcription is blocked. Conversion from one form to the other depends on the PapI protein, which helps Lrp bind to the more distal sites (numbers 1, 2, and 3).
[Figure 8.33 diagram. Phase OFF: Lrp occupies binding sites 4, 5, and 6 next to the papBA promoter; sites 1, 2, and 3 carry a methyl group (CH3). Phase ON: Lrp together with PapI occupies sites 1, 2, and 3; the promoter-proximal sites carry the methyl group (CH3), RNA polymerase binds, and transcription proceeds.]

Epigenetic effects are due to inherited changes in gene expression but are not due to changes in the DNA base sequence. Alterations in DNA methylation or histone modification are often responsible for epigenetic effects.

slide 291:

a single X chromosome. This is done by DNA methylation and histone modifications and requires the Xist noncoding RNA.
(c) Parental imprinting: As discussed in Chapter 17, in a few genes only one of the pair of alleles is expressed. Which is chosen depends on whether the allele comes from the mother or the father. The allele from the other parent is silenced by methylation of the DNA and histones.
(d) Development and differentiation: The cells of multicellular organisms all share the same genome but perform very different biological roles. A variety of regulatory mechanisms, including epigenetic changes, play a part in differentiation. In particular, once cells have reached a final specialized form, they often rely on epigenetic modifications to ensure that all their descendants are of the same type.

Although instances are rare and difficult to analyze, environmental effects can trigger epigenetic changes that affect how genes are expressed in future generations of organisms, not just of cells. In humans, it is hardly surprising that epigenetic changes that occur in the mother may affect the offspring. However, a few surveys have suggested that the grandparents’ diet may have effects on gene expression, and hence the health, of children two generations later. Proving this conclusively with humans is not feasible. However, experiments with mice have shown that the fathers’ diet can influence the expression of genes involved in metabolism in the offspring, even when all the mothers had the same diet and the new generation never met their fathers. This makes good sense from an evolutionary perspective.
Epigenetics can pre-adapt the new generation to the environment they will probably encounter without the need for permanent

FIGURE 8.34 Bisulfite Sequencing of the Methylome
To discover which cytosines in the genome are methylated, scientists carry out sequencing on both untreated DNA and DNA that has been treated with bisulfite. The bisulfite converts nonmethylated cytosines (green) to uracils (A). During PCR, the uracil is replaced by thymine (B). However, methylated cytosine (red) is protected and remains as cytosine. The two sequences are then compared.
(A) Bisulfite treatment: cytosine → cytosine sulfonate (addition of HSO3−) → uracil sulfonate (hydrolysis with H2O, releasing NH4+) → uracil (removal of sulfonate by OH−). 5-Methylcytosine: no reaction.
(B) Example: 5′-GAGTCACCGTTCGTTAA-3′ carrying two methyl groups (CH3) becomes 5′-GAGTUACUGTTCGTTAA-3′ after bisulfite treatment and reads as 5′-GAGTTACTGTTCGTTAA-3′ after amplification by PCR and sequencing.
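The comparison step in Figure 8.34 can be sketched in a few lines of code. The example below is a simplified illustration, not a real analysis pipeline: it assumes the treated and untreated reads are already aligned, are the same length, and come from the same strand. The function names are hypothetical; the two sequences are the ones shown in panel B of the figure.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Nonmethylated C is read as T (C -> U -> T); methylated C stays C.
    Positions are 0-based indexes into seq.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylated(untreated, treated):
    """Return 0-based positions where a C survives bisulfite treatment."""
    if len(untreated) != len(treated):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (u, t) in enumerate(zip(untreated, treated))
            if u == "C" and t == "C"]


untreated = "GAGTCACCGTTCGTTAA"   # sequence without bisulfite (Fig. 8.34B)
treated = "GAGTTACTGTTCGTTAA"     # sequence after bisulfite and PCR

sites = call_methylated(untreated, treated)
print("methylated cytosines at 0-based positions:", sites)
```

Every C that reads as T after treatment was unmethylated, and every C that is still C was protected by a methyl group. Real data add complications that this sketch ignores, such as incomplete conversion and reads from the complementary strand.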

slide 292:

alterations in the genome. Untangling the effects of genetics, epigenetics, and environment, especially for complex conditions such as obesity or diabetes, is very difficult and still in its infancy.

In higher animals, epigenetics may involve cell generations within a single organism or distinct generations of individual organisms. A variety of environmental factors can cause epigenetic changes that persist between generations.

Summary
Genomics is the study of the total nucleotide sequence for an organism of interest, including genes, pseudogenes, noncoding regions, and regulatory regions. In human genomics, identifying the sequence of the entire set of chromosomes was a major achievement. The sequence of the human genome was determined by making DNA libraries, sequencing each of the clones, and then compiling the sequences. Physical maps, genetic maps, and computer algorithms were used to arrange the sequences in the correct order. Without the great advances in computing, the Human Genome Project would have taken much longer and cost more money. Data mining of the information has identified many potential protein coding regions, regulatory elements, and different types of repetitive elements. The human genome is predicted to contain about 21,000 genes plus SINEs, LINEs, and tandem repeats such as telomeres, centromeres, and satellite DNA. Computer analysis of such sequence information is called bioinformatics.

Genomics has changed many fields of study, including medicine, pharmacology, and evolutionary biology. Understanding and identifying new genes related to diseases is changing the way diseases are treated and diagnosed. Much of this textbook is devoted to these advances. The study of genomics focuses on mutations in the genome by identifying single nucleotide polymorphisms, methylation patterns, or differences in tandem repeats. Mutations include single nucleotide changes, inversions, deletions, and insertions. Pharmacologists hope to correlate these differences with drug sensitivity, thus preventing adverse drug reactions. In evolutionary biology, physical features have previously been used to determine the relatedness of two organisms. Since the genomes of many organisms have now been sequenced, their DNA sequences can be used to determine relatedness. Over time, mutations accumulate within every genome. The more essential genes change slowly over time, whereas less essential genes incorporate more changes.

Genomics also encompasses gene expression, which was first done on a global scale using genomic DNA microarrays that have either cDNA or synthetic oligonucleotides linked to a glass slide. Fluorescently labeled mRNA is then added. Where an mRNA hybridizes to the immobilized genomic DNA, that region will fluoresce. The amount of fluorescence correlates to the amount of mRNA and hence gene expression. More recently, arrays have been partly superseded for transcriptome analysis by the use of RNA-Seq. This technique depends on sequencing all the RNA extracted from a cell after conversion to cDNA and counting how many copies of each RNA molecule are present. In single gene analysis, specific regulatory regions defined by genomics are linked to a variety of different reporter genes, including β-galactosidase, alkaline phosphatase, luciferase, and green fluorescent protein. These studies replace the gene of interest with the reporter gene, which is much easier to assay, and thus allow analysis of the regulatory region under study.
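The counting step of RNA-Seq mentioned in the summary can be illustrated with a toy example. The sketch below assumes that each sequenced read has already been assigned to a gene; the gene names, the read list, and the counts-per-million scaling are illustrative assumptions rather than anything specified in the text.

```python
from collections import Counter


def counts_per_million(read_assignments):
    """Tally reads per gene and scale the tallies to counts per million.

    read_assignments: iterable of gene names, one entry per sequenced read.
    Scaling by the total number of reads lets samples that were sequenced
    to different depths be compared.
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical read assignments for a tiny sample of ten reads.
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
for gene, cpm in sorted(counts_per_million(reads).items()):
    print(gene, round(cpm))
```

Genes with more reads are inferred to be more highly expressed. Real analyses also account for transcript length and other biases, which this sketch leaves out.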

slide 293:

End-of-Chapter Questions

1. Which of the following is utilized in genomic research?
a. microsatellite polymorphism
b. restriction fragment length polymorphism
c. single nucleotide polymorphism
d. variable number tandem repeat
e. all of the above

2. What is contig mapping?
a. Determination of regions that overlap from one clone to the next in a library
b. The distance in base pairs between two markers
c. The use of landmarks in the genes to put together sequencing data
d. The relative order of specific markers in a genome
e. The mapping that determines if a library sequence is from one continuous gene or two gene segments cloned into one vector

3. Which method was used to sequence the human genome?
a. cytogenetic mapping
b. shotgun sequencing
c. chromosome walking
d. radiation hybrid mapping
e. All of the above were used in combination to complete the project.

4. Which organism has the most genes?
a. H. sapiens
b. D. melanogaster
c. O. sativa
d. P. trichocarpa
e. A. thaliana

5. What is a gene?
a. a segment of DNA that encodes a protein
b. a segment of DNA that encodes non-translated RNA
c. sequences of DNA that are not transcribed
d. a segment of DNA that is transcribed
e. all of the above

6. Which of the following is considered a field of study related to bioinformatics?
a. proteomics
b. computational biology
c. genomics
d. cheminformatics
e. all of the above

7. How is data mining useful to biotechnology research?
a. It allows researchers to determine sequence similarity, which usually translates into functional similarity.
b. Data mining allows researchers to use computers to study, sort, and compile the vast amounts of raw data generated through bioinformatics.

slide 294:

c. Data mining is the act of gathering the raw data from research projects, such as sequencing, into one central location.
d. Data mining usually provides too much information, which only slows down the research project and is therefore not very useful.
e. None of the above

8. Which type of mutation is the most common?
a. insertion of one or more bases
b. deletion of one or more bases
c. base substitutions
d. inversion of DNA segments
e. duplications of DNA segments

9. Which of the following statements about mutations is not true?
a. Mutations occur in all organisms at the same rate.
b. DNA polymerase never produces mutations during replication because of the proofreading ability of this enzyme.
c. Mutations often occur at methylated cytosine residues.
d. Mutations such as duplications or deletions occur due to repetitive sequences causing strand slippage.
e. When comparing mutation rates to coding capacity, mutation rates are usually the same for most organisms, which suggests a mechanism to control the rate.

10. Which one of the following is often used to establish family trees for organisms because it is present in all organisms and does not accumulate mutations quickly?
a. rRNA
b. fibrinopeptides
c. hemoglobin
d. chloroplasts
e. mitochondrial DNA

11. Which of the following statements about DNA microarrays is not correct?
a. Fluorescently labeled mRNA from the organism hybridizes to the DNA on the glass slide.
b. DNA microarrays contain thousands of DNA segments on a support such as a glass slide.
c. Hybridization to a DNA microarray can only occur once.
d. The amount of fluorescence correlates with the amount of mRNA in the sample.
e. The data obtained from DNA microarrays represents a global view of gene expression, even under particular growth conditions.

12. What is the term used to describe the process of synthesizing oligonucleotides directly on the glass slide?
a. photosynthesis
b. photolithography
c. light-activated oligosynthesis
d. on-chip oligosynthesis
e. protected oligosynthesis

slide 295:

13. Which of the following statements highlights the issues surrounding oligonucleotide microarrays?
a. A duplex may not properly form if the mRNA probe has several mismatches compared to the oligonucleotide sequence.
b. The ability to hybridize to the oligonucleotides will be decreased if the probe is able to form a stem-loop structure.
c. The A:T content of the oligonucleotide may affect the stability of the duplex.
d. Depending on the size of the spacer, incoming probes may not be able to hybridize if the spacer is too small, or the oligonucleotide may fold back on itself if the spacer is too long.
e. All of the above are issues surrounding oligonucleotide microarrays.

14. What can whole-genome arrays identify?
a. regions on the DNA that are methylated
b. transcription factor binding sites
c. various polymorphisms
d. repetitive elements
e. all of the above

15. Which one of the following fusion proteins does not require some kind of chemical substrate to observe activity?
a. luciferase
b. alkaline phosphatase
c. green fluorescent protein
d. β-galactosidase
e. all of the above

16. Why is RNA-Seq more advantageous than typical microarrays?
a. RNA-Seq does not need a probe.
b. Both coding and non-coding RNAs can be monitored with RNA-Seq.
c. Alternative splicing is detected by RNA-Seq.
d. Allele-specific expression can be monitored with RNA-Seq.
e. All of the above are advantages over typical microarrays.

17. What technique could you use to simultaneously monitor host and pathogen RNA?
a. RNA-seq
b. microarrays
c. RT-PCR
d. gene fusions
e. ChIP

18. Epigenetics ___________.
a. is the total number of possible changes within a particular genome
b. refers to inherited changes in gene expression
c. is due to DNA modifications
d. is strictly limited to inherited proteins
e. results from DNA base changes

slide 296:

19. Which of the following is not an observed epigenetic change?
a. RNAi inheritance in nematodes and plants.
b. Inheritance of yeast prions as observed through conformation changes in regulatory proteins.
c. The effects of DNA methylation and histone modification on the gene expression within a single cell.
d. Regulation of phase variation in bacteria due to DNA methylation patterns.
e. Persistence of siRNAs during cell division.

20. When a silenced allele is inherited from one parent, this is called __________.
a. chromosome inactivation
b. imprinting
c. development and differentiation
d. environmental influence
e. maintenance of genome integrity

Further Reading
Alföldi, J., & Lindblad-Toh, K. (2013). Comparative genomics as a tool to understand evolution and disease. Genome Research 23, 1063–1068.
Clark, D. P., & Pazdernik, N. J. (2012). Molecular Biology (2nd ed.). Waltham, MA: Elsevier Academic Press/Cell Press.
Feil, R., & Fraga, M. F. (2012). Epigenetics and the environment: emerging patterns and implications. Nature Reviews Genetics 13, 97–109.
Flicek, P., et al. (2014). Ensembl 2014. Nucleic Acids Research 42 (Database issue), D749–D755.
Haggarty, P. (2012). Nutrition and the epigenome. Progress in Molecular Biology and Translational Science 108, 427–446.
International Human Genome Sequencing Consortium. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.
International Human Genome Sequencing Consortium. (2004). Finishing the euchromatic sequence of the human genome. Nature 431, 931–945.
Liolios, K., Chen, I. M., Mavromatis, K., Tavernarakis, N., Hugenholtz, P., Markowitz, V. M., & Kyrpides, N. C. (2010). The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Research 38 (Database issue), D346–D354.
Lodish, H., Berk, A., Kaiser, C. A., Krieger, M., Bretscher, A., Ploegh, H., Amon, A., & Scott, M. P. (2012). Molecular Cell Biology (7th ed.). New York: WH Freeman.
McGettigan, P. A. (2013). Transcriptomics in the RNA-Seq era. Current Opinion in Chemical Biology 17, 4–11.
Mutz, K. O., Heilkenbrinker, A., Lönne, M., Walter, J. G., & Stahl, F. (2013). Transcriptome analysis using next-generation sequencing. Current Opinion in Biotechnology 24, 22–30.
Pagani, I., Liolios, K., Jansson, J., Chen, I. M., Smirnova, T., Nosrat, B., Markowitz, V. M., & Kyrpides, N. C. (2012). The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Research 40, D571–D579.
Qu, H., & Fang, X. (2013). A brief review on the Human Encyclopedia of DNA Elements (ENCODE) project. Genomics, Proteomics & Bioinformatics 11, 135–141.
Saben, J., Zhong, Y., McKelvey, S., Dajani, N. K., Andres, A., Badger, T. M., Gomez-Acevedo, H., & Shankar, K. (2014). A comprehensive analysis of the human placenta transcriptome. Placenta 35, 125–131.

slide 297:

Scheinfeldt, L. B., & Tishkoff, S. A. (2013). Recent human adaptation: genomic approaches, interpretation and insights. Nature Reviews Genetics 14, 692–702.
Venter, J. C., et al. (2001). The sequence of the human genome. Science 291, 1304–1351.
Westermann, A. J., Gorski, S. A., & Vogel, J. (2012). Dual RNA-seq of pathogen and host. Nature Reviews Microbiology 10, 618–630.
Wetterstrand, K. A. (2014). DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at www.genome.gov/sequencingcosts.

slide 298:

CHAPTER 9
Proteomics

Introduction
Gel Electrophoresis of Proteins
Western Blotting of Proteins
High-Pressure Liquid Chromatography Separates Protein Mixtures
Digestion of Proteins by Proteases
Mass Spectrometry for Protein Identification
Preparing Proteins for Mass Spectroscopy
Protein Quantification Using Mass Spectrometry
Protein Tagging Systems
Phage Display Library Screening
Protein Interactions: The Yeast Two-Hybrid System
Protein Interactions by Co-immunoprecipitation
Protein Arrays
Metabolomics

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00009-0

slide 299:

INTRODUCTION
Today we have the genome sequences for humans as well as many other animals, plants, fungi, and bacteria. All these data have given scientists a global view of the various genes in humans and others. However, genes are only the first step to understanding how an organism works. Genes are transcribed into mRNA and then translated into protein. So if we are to truly understand gene function, the gene product, or protein, must be characterized also; hence the advent of proteomics. Proteomics refers to the global analysis of proteins, and the proteome refers to the entire protein complement of an organism. The translatome, or the complement of proteins expressed under specific circumstances, also falls into this field of study. Note that the translatome is dynamic and changes when environmental conditions change.

The relationships among the genome, proteome, and translatome are not linear. The genome of a species is the most stable, but differences do exist between one person and the next and between one generation and the next. The proteome correlates highly with the genome because proteins are the products of the majority of genes. However, some genes encode nontranslated RNA and so do not contribute to the proteome. In addition, some genes, especially in higher eukaryotes, may give rise to multiple proteins because of alternative splicing. In contrast, the translatome is highly dynamic, changing from minute to minute depending on many different stimuli.

The genome ultimately dictates the changes in the translatome and proteome, but genomic changes do not always affect the translatome or proteome. Sometimes, for example, mRNA transcripts are made but never translated into protein. MicroRNAs and siRNAs control the expression of many different proteins at the translational level. The rate of mRNA degradation and translation will have a huge impact on how much protein is actually made.
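The dependence of protein level on mRNA stability and translation rate can be made concrete with a very simple model. The sketch below is an assumption introduced for illustration, not a model given in the text: mRNA is made at a constant rate and decays in proportion to its amount, and protein is made in proportion to mRNA and likewise decays. All rate constants are hypothetical numbers in arbitrary units.

```python
def steady_state_levels(k_tx, d_m, k_tl, d_p):
    """Return (mRNA, protein) steady-state levels for a two-step model.

    dm/dt = k_tx - d_m * m        transcription minus mRNA decay
    dp/dt = k_tl * m - d_p * p    translation minus protein decay
    Setting both derivatives to zero gives the values returned here.
    """
    m = k_tx / d_m
    p = k_tl * m / d_p
    return m, p


# Same transcription and translation rates; only mRNA stability differs.
stable = steady_state_levels(k_tx=10.0, d_m=0.1, k_tl=2.0, d_p=0.05)
unstable = steady_state_levels(k_tx=10.0, d_m=2.0, k_tl=2.0, d_p=0.05)

print("stable transcript   -> mRNA %.0f, protein %.0f" % stable)
print("unstable transcript -> mRNA %.0f, protein %.0f" % unstable)
```

In this toy model a twentyfold faster mRNA decay gives twentyfold less protein even though the gene is transcribed at the same rate.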
Thus, although some genes give rise to a lot of mRNA, very little protein is made because the transcripts are very unstable.

The translatome and proteome are also affected by modifications that occur after translation. For example, the function of many proteins is altered by addition or removal of various functional groups, such as phosphate, acetyl, AMP, or ADP-ribose. Also, many proteins, especially in eukaryotes, are altered by chemical modification of amino acid residues. Proteins also undergo proteolytic cleavage when no longer needed. Hence, the composition of the translatome is also affected by the rate of protein degradation. Finally, some proteins themselves may affect the expression of other proteins via assorted regulatory effects. All of these factors affect the protein makeup of the cell.

Proteomics is the study of the protein complement of an organism.

GEL ELECTROPHORESIS OF PROTEINS
Studying proteins requires the ability to isolate and identify the proteins in a particular sample. The first step is to separate them by size. Much as electrophoresis on agarose gels is used to separate DNA fragments by size, so polyacrylamide gel electrophoresis (PAGE) is used to separate proteins by size (Fig. 9.1). Polyacrylamide has smaller pores than agarose and is thus suitable for proteins because they are generally smaller than DNA molecules. When an electric field is applied to a sample of proteins, the smaller ones pass through the pores of the acrylamide more easily and migrate away from the negative pole faster than larger proteins.

Unlike DNA, most proteins do not have a net negative charge; therefore, protein samples are treated by boiling with sodium dodecyl sulfate (SDS). SDS unfolds the polypeptide chains and coats the entire strand of amino acids with negative charges. The amount of SDS bound, and therefore the amount of charge, correlates with the length (i.e., molecular weight) of the protein.
As in DNA electrophoresis, the protein sample is loaded into a well, and an electric current is applied so that the proteins migrate through the acrylamide gel. Finally, the separated proteins are visualized using either Coomassie blue, a dark blue dye, or the more sensitive silver stain,

slide 300:

both of which bind tightly to all proteins. Coomassie blue can detect only proteins present above the picomole range (10⁻¹² moles), whereas silver stain can detect proteins in the femtomole range (10⁻¹⁵ moles). Thousands of proteins are found in cells, and many of these proteins are similar in size. However, resolving individual proteins into single bands on an acrylamide gel requires that the sample contain only a few proteins and that they be of different sizes. If the sample is from an entire cell, it has so many proteins that individual bands smear together. Two-dimensional PAGE (2D-PAGE) helps alleviate this problem by first separating the proteins by their native charge in one dimension and then separating them by size in a second dimension (Fig. 9.2). Isoelectric focusing is the term for separating proteins by their native charge. All proteins have an inherent natural charge due to the side chains of their amino acid residues. The total number of positively and negatively charged amino acids determines the natural charge on a protein. For separation by charge, the sample is loaded onto the top of a gel with a pH gradient. When an electric field is applied, the proteins move along the pH gradient until their charge is neutralized, that is, until each protein reaches the pH at which it carries no net charge. This step is usually done in a tube-shaped gel. After this step is run, the gel is removed from its tube. The second dimension of 2D-PAGE is separation by size. The tube gel containing the separated proteins is treated with SDS to denature the proteins and coat them with negative charges, as in regular PAGE. The tube gel is laid along the top of a polyacrylamide slab gel, and the proteins are separated by size. The gel is stained as described earlier. Two-dimensional PAGE can resolve many proteins. Early studies on E. coli used 2D-PAGE to characterize all the proteins present under different conditions, about 1000 different proteins. 
Larger 2D-PAGE gels have been developed to study larger proteomes and can separate more than 10,000 different proteins into single spots. Identifying each spot on a 2D gel is a major task. Scientists interested in studying the proteome or translatome can stain the proteins and cut out each spot of interest.

FIGURE 9.1 SDS Polyacrylamide Gel Electrophoresis. Proteins are denatured by boiling in buffer with SDS, which coats the unfolded polypeptide chains to give the protein a net negative charge. The total charge correlates with the size of the protein. After being loaded into a sample well of a polyacrylamide gel, the proteins migrate away from the negative pole and toward the positive pole. The sieving action of the gel allows the smaller proteins to move faster than the larger ones. The distance traveled in a given time decreases linearly with the log of the molecular weight. [Figure labels: apply power; direction of movement from − to +; larger proteins, slow moving; smaller proteins, fast moving; gel slab of polyacrylamide.]

FIGURE 9.2 Two-Dimensional Polyacrylamide Gel Electrophoresis. First, a sample containing large numbers of proteins is separated by charge along a pH gradient using isoelectric focusing. The proteins migrate until their charge is neutralized. The tube gel is then removed, treated with SDS, and placed on top of a slab gel. The proteins are then separated according to size in the second dimension as described before. [Figure steps: isoelectric focusing of the protein mixture in a glass tube across a pH 3 to pH 10 gradient (separate by natural charge); (a) remove gel from tube; (b) treat with SDS; (c) place onto slab gel; run SDS-PAGE on the polyacrylamide gel in the second dimension to give protein bands.]

The

slide 301:

protein trapped in each piece of gel can be digested into peptide fragments and identified by mass spectrometry (see later discussion). Another key feature of 2D-PAGE is the ability to quantify the relative amounts of different proteins. The size of the spot indicates the relative abundance of that protein. This can be quantified by scanning with a laser. Computer analysis then determines the density of the spot and hence the relative abundance. Up to three different samples can be run together on the same gel to compare the proteins present under different conditions, in a process called differential gel electrophoresis (DIGE) (Fig. 9.3). Each sample is labeled with a different fluorescent dye. Fluorescent labels such as methyl-Cy5, Cy2, and propyl-Cy3 fluoresce in different colors but do not affect protein separation in 2D-PAGE. For example, two different samples can be mixed and separated by 2D gel electrophoresis. If one set of proteins is labeled with Cy5, those proteins will fluoresce red. If there is a unique protein in that sample, the resulting protein spot will fluoresce only red. If the other set of proteins is labeled with Cy3, any protein unique to that sample will fluoresce green. If a protein is found in both samples, the two copies will migrate to the same location in the 2D gel, and when the green and red dyes fluoresce at the same time, the spot will look yellow. The unique proteins can be isolated, digested, and identified using mass spectroscopy (see later discussion). Although 2D-PAGE has been widely used, whether for a single sample or two different samples, it does have some disadvantages. Certain classes of proteins are underrepresented on a gel because they do not migrate well through acrylamide. Extremely large proteins often cannot enter the gel matrix, whereas very small proteins may run off the end of the gel. Hydrophobic proteins may travel through the gel, but their migration is altered because of their hydrophobic surfaces. 
These proteins often run at positions different from those expected based on molecular weight. Proteins that are scarce within the cell, such as transcription factors, are barely visible, if at all, with even the most sensitive dyes. Abundant proteins, on the other hand, can distort the location of nearby proteins within the gel because the gel becomes saturated at that location. Another issue with 2D-PAGE is that proteins must be isolated from the acrylamide for analysis by mass spectrometry (see later discussion).

FIGURE 9.3 Two-Color 2D-Gel. Proteins from two different conditions (e.g., normal and cancerous) can be compared directly on the same gel by labeling each with a different fluorescent dye. When the gel is visualized to see the dyes, the proteins found only in normal tissue form red spots, the proteins found only in cancerous tissue form green spots, and proteins found in both normal and cancerous tissue look yellow because green and red fluorescent dyes appear yellow when mixed. [Figure steps: collect protein samples (normal, cancerous); label with two different fluorescent dyes (Cy5 for normal, Cy3 for cancerous); mix and separate with 2D-PAGE.]

Polyacrylamide gel electrophoresis separates a mixture of proteins based on their size. The proteins must first be coated with SDS, which gives the proteins a net negative charge proportional to their size. An applied electric current moves the proteins away from the negative pole and toward the positive pole. Two-dimensional PAGE first separates proteins based on their inherent charge using isoelectric focusing. The proteins are then separated by size as in SDS-PAGE. More than one protein sample can be run on the same gel using differential gel electrophoresis.
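Because SDS-PAGE separates by size, and migration distance is commonly treated as roughly linear in the logarithm of molecular weight, the size of an unknown band can be estimated from a lane of marker proteins. The following is a minimal sketch of that calculation, not a method given in the text: the marker weights and distances in `STANDARDS` are hypothetical values invented for illustration, and the function names are assumptions.

```python
import math

# Hypothetical marker lane (illustrative values only, not from the text):
# (molecular weight in kDa, migration distance in mm)
STANDARDS = [(100.0, 10.0), (50.0, 25.0), (25.0, 40.0), (12.5, 55.0)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw_kda(distance_mm, standards=STANDARDS):
    """Estimate molecular weight from migration distance.

    Fits log10(MW) against distance for the marker proteins, then
    reads the unknown band off the fitted line.
    """
    distances = [d for _, d in standards]
    log_mws = [math.log10(mw) for mw, _ in standards]
    slope, intercept = fit_line(distances, log_mws)
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # A band that ran 32.5 mm lies between the 50 kDa and 25 kDa markers.
    print(round(estimate_mw_kda(32.5), 1))
```

The fitted slope is negative, which reflects the point made above: the farther a band travels, the smaller the protein. Real gels deviate from this straight line at the extremes of the size range, so the estimate is only meaningful for bands that fall between the markers.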

slide 302:

WESTERN BLOTTING OF PROTEINS

Often researchers will use Western blotting to identify proteins. Western blots rely on having an antibody to the protein (see Chapter 6). Antibodies are extremely specific and will bind only to one target protein. The first step is to separate the proteins by size using either standard SDS-PAGE or 2D-PAGE. The proteins are then transferred from the gel to a membrane made of nitrocellulose. Other types of membranes are made from nylon and are stronger. Either way, the membrane must have a positive charge so that the negatively charged proteins will stick to its surface. The proteins are moved from the gel to the membrane with an electric current, as shown in Figure 9.4. After the proteins are attached to the nitrocellulose membrane, many areas of the membrane will not have any protein bound because the corresponding area of the protein gel was empty. These blank areas are positively charged and can bind the antibody nonspecifically. Therefore, these sites must be blocked. Often the membranes are soaked in reconstituted nonfat dry milk. The milk proteins mask the unused sites on the membrane and will not bind to the antibody. Next, the antibody is added to a buffer solution and swirled around the membrane for a few hours. The antibody will bind only to its target protein and nowhere else on the membrane because it recognizes only one specific epitope of its target protein (see Chapter 6). The next step is to visualize the location of the primary antibody, thus revealing the location of the target protein (Fig. 9.5). To achieve this, researchers add a secondary antibody conjugated to a tag or label. This antibody recognizes the heavy chain of the primary antibody without affecting its binding to the protein. Often the tag is horseradish peroxidase, which oxidizes luminol with hydrogen peroxide to form an excited state of 3-aminophthalate that decays to emit a pulse of light at 425 nm. 
To identify the location of the light emission, researchers visualize the membrane with a CCD camera that displays the location of light as bands in a digital image. Alternatively, a piece of photographic film is placed over the membrane. The light pulses turn the film black where the secondary antibody/primary antibody/target protein complex is located on the membrane. Western blots are used extensively in protein analyses since the results show whether or not a protein is expressed in the sample and how much protein is present.

FIGURE 9.4 Electrophoretic Transfer of Proteins from Gel to Nitrocellulose. A “sandwich” is assembled to keep the gel in close contact with a nitrocellulose membrane while in a large tank of buffer. The sandwich consists of the gel (gray) and nitrocellulose (green) between layers of thick paper and a sponge (yellow). The entire stack is squeezed between two solid supports so that none of the layers can move. The sandwich is then transferred to a large tank filled with buffer to conduct the current. As in SDS-PAGE, the proteins are repelled by the negatively charged cathode and attracted to the positively charged anode. As the proteins move out of the gel, they travel into the nitrocellulose, where they adhere. [Figure labels: solid support; sponge; filter paper; gel; nitrocellulose; filter paper; cathode; anode.]

Western blots identify the location of a specific protein after it has been separated by SDS-PAGE. First, a primary antibody recognizes the protein of interest. The location of the primary antibody is visualized by adding a secondary antibody conjugated to a detection system.

HIGH-PRESSURE LIQUID CHROMATOGRAPHY SEPARATES PROTEIN MIXTURES

Chromatography is a general term for many separation techniques in which a sample of molecules (the analyte) is dissolved in a mobile phase and then forced through a stationary phase. For 2D-PAGE, the mobile phase is the buffer and the stationary phase is the

slide 303:

FIGURE 9.5 Western Blot. After a mixture of proteins has adhered to the nitrocellulose membrane, one specific protein can be detected using an antibody. The antibody is added with a solution of milk proteins and incubated with the membrane. The milk proteins block those regions of the nitrocellulose that do not have any proteins attached. The primary antibody attaches only to the target protein. A secondary antibody with attached horseradish peroxidase (HRP) binds specifically to the primary antibody. This allows the position of the target protein to be visualized. In this example, the horseradish peroxidase oxidizes luminol in the presence of hydrogen peroxide to form an excited molecule called 3-aminophthalate. When this returns to its unexcited state, a pulse of light (425 nm) is emitted, which can be detected and visualized directly by a digital CCD camera. [Figure steps: nitrocellulose membrane with attached proteins; add nonspecific milk proteins and primary antibody (primary antibody binds to specific protein); add secondary antibody conjugated to horseradish peroxidase (secondary antibody binds to primary antibody); add luminol and hydrogen peroxide to detect location of secondary antibody (light emitted by the HRP reaction indicates location of the protein).]

slide 304:

gels. In high-pressure liquid chromatography (HPLC), the sample is dissolved in a mobile phase and separated based on a specific characteristic by passing over a stationary phase (Fig. 9.6). In HPLC, the mobile phase is forced under high pressure through a chromatography column, a narrow tube packed with the stationary phase. As the mobile phase travels through the column, the mixture separates and different fractions are collected at the column exit. The mixture is forced through the column by constantly adding more mobile phase, in a process called elution. As the mobile phase exits the column, a detector responds to molecules in the eluting sample, and the response is drawn as a peak on the chromatogram. HPLC has many applications, including separation, identification, purification, and quantification of proteins or other analytes. Preparative HPLC is used to isolate and purify one specific protein from a mixture. Using HPLC to identify a compound requires a specific detection method. For example, if the target protein carries a fluorescent label, then a fluorescence detector would be used. One application, quantitative HPLC, determines the amount of a target protein by comparing it to a set of standard proteins of known amounts. This allows measurement of changes in the amount of a specific protein under different conditions. A major benefit that HPLC offers to proteomics researchers is that the separated proteins are already in a liquid state, making further analysis easier. Although HPLC seems simple in theory, the actual process of separating the mixture into its components is complex. Each mixture has different chemistries, and so many different stationary phases are used to separate them. Even before it is loaded into the column, the mixture can be manipulated to remove certain components or change their chemistry. For example, treating with a phosphatase will remove phosphate groups from proteins. 
Such manipulations can increase the efficiency with which the protein of interest is isolated from the mixture.

FIGURE 9.6 High-Pressure Liquid Chromatography. The mobile phase (far left) is pumped into the HPLC column with the protein sample, which is added through the injection port. The column separates the protein sample into different fractions that are measured by the detector. The fractions containing the protein of interest are saved, and the remainder are sent to waste. The computer records the data. [Figure labels: solvent; pump; injector; HPLC column; detector; waste; data.]
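Quantitative HPLC, as described above, compares the detector response for the target protein with that of standards of known amounts. One way to picture this is to integrate the area under the target's peak and divide by a response factor obtained from the standards. The sketch below assumes, for illustration only, that peak area is directly proportional to the amount injected (a straight line through the origin); the function names and all numbers are hypothetical.

```python
def peak_area(times, signal):
    """Area under a chromatogram peak by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (signal[i] + signal[i - 1]) * (times[i] - times[i - 1])
    return area


def response_factor(standards):
    """Least-squares slope k for area = k * amount (line through the origin).

    `standards` is a list of (known amount, measured peak area) pairs.
    """
    numerator = sum(amount * area for amount, area in standards)
    denominator = sum(amount * amount for amount, _ in standards)
    return numerator / denominator


def quantify(area, standards):
    """Convert a measured peak area into an amount using the standards."""
    return area / response_factor(standards)


if __name__ == "__main__":
    # Hypothetical detector trace for the target peak (time in min, signal in mAU).
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    signal = [0.0, 2.0, 4.0, 2.0, 0.0]
    # Hypothetical standards: (amount in micrograms, peak area).
    standards = [(1.0, 4.0), (2.0, 8.0), (4.0, 16.0)]
    print(quantify(peak_area(times, signal), standards))
```

In practice the linear range of the detector has to be checked with the standards themselves; outside that range a simple proportional factor like this one would give misleading amounts.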

slide 305:

HPLC is very adaptable because of the availability of different types of stationary phase materials. Size exclusion chromatography columns contain porous beads that separate mixtures of proteins by size. Large molecules do not enter the pores of the beads and travel through the column quickly, while smaller compounds are delayed. Many different pore sizes are available for mixtures of different size ranges. Reverse-phase HPLC uses columns packed with hydrophobic alkyl chains attached to silica-based material. The column binds and delays hydrophobic molecules, while hydrophilic molecules elute faster. Ion-exchange HPLC uses a stationary phase with charged functional groups that bind oppositely charged molecules in the sample. Such molecules remain in the column after the sample has passed through. To elute them, the mobile phase is changed. For example, if the pH of the mobile phase is adjusted, the net charge on many proteins will be altered and they will be released from the column. Other stationary phases form hydrogen bonds with the analyte and separate based on overall polarity. For affinity HPLC, the stationary phase contains a molecule that specifically binds the target protein, for example, an antibody. When a mixture passes over the stationary phase, only the target protein is bound and other proteins pass through. Changing the mobile phase so as to disrupt the interaction releases the protein of interest. As molecules exit the column, they must be detected. Many different detectors exist. These usually respond by plotting peaks as molecules pass by. Refractive index detectors monitor whether the exiting mobile phase refracts any light by shining a light beam through it. Compounds present in exiting fractions scatter light, and a photodetector records this as a positive signal. The amount of scatter affects the height of the peak, and the length of time of the scatter determines the width of the peak. 
Ultraviolet detectors have a UV light source and a detector to determine when the passing mobile phase absorbs the UV light. Such detectors may monitor one or more wavelengths depending on the substance being examined. Fluorescence detectors detect compounds that fluoresce (that is, absorb light and re-emit it at a different wavelength); radiochemical detectors detect radioactively labeled compounds; and electrochemical detectors measure compounds that undergo oxidation or reduction reactions. An approach that is increasingly used for proteomics is detection by mass spectrometry. This combination allows proteins separated by HPLC to be fed directly into a mass spectrometer for identification (see later discussion for mass spectrometry). A critical aspect of HPLC is getting a good separation between the different proteins in the sample, that is, good resolution. Each peak that comes off the column should be symmetrical and as narrow as possible. For high resolution, many experimental conditions can be adjusted. The most obvious is changing the stationary phase. Sometimes just changing the particle size of the stationary phase improves separation. An alternative is to adjust the composition of the mobile phase. Temperature also affects many separations and may need to be controlled.

An analyte is a sample of molecules dissolved in a liquid, the mobile phase, which moves through a chromatography column. The stationary phase is the material packed into the column. The properties of this material determine which proteins are retained in the column and which are expelled from it. Different detectors are placed at the end of the column to determine when a protein exits. These detectors can identify proteins by refractive index, ultraviolet absorption, fluorescence, radioactivity, or electrochemistry.

DIGESTION OF PROTEINS BY PROTEASES

Proteases, also known as proteinases or peptidases, hydrolyze the peptide bond between amino acid residues in a polypeptide chain. 
Proteases may be specific and limited to one or more sites within a protein, or they may be nonspecific, digesting proteins into individual

slide 306:

amino acids. The ability to digest a protein at specific points is critical to mass spectrometry (see later discussion) and many other protein experiments. For example, proteases are used to cleave fusion proteins or remove single amino acids for protein sequencing. Proteases are found in all organisms and are involved in all areas of metabolism. During programmed cell death, proteases digest cellular components for recycling (see Chapter 20). Plants deploy proteases to protect themselves from fungal or bacterial invaders. In the biotechnology industry, proteases have many uses. For example, they are included as additives in detergents to digest proteins in ketchup, blood, or grass stains. Proteases are classified by three criteria: the reaction catalyzed, the chemical nature of the catalytic site, and their evolutionary relationships. Endopeptidases cleave the target protein internally. Exopeptidases remove single amino acids from either the amino- or carboxy-terminal end of a protein. Exopeptidases are divided into carboxypeptidases and aminopeptidases, depending on whether they digest proteins from the carboxy- or amino-terminus, respectively. Proteases are also divided based on their catalytic site architecture. Serine proteases have a serine in their active site that covalently attaches to one of the protein fragments as an enzymatic intermediate (Fig. 9.7). This class includes the chymotrypsin family (chymotrypsin, trypsin, and elastase) and the subtilisin family. Cysteine proteases have a similar mechanism but use cysteine rather than serine. They include the plant proteases papain (from papaya) and bromelain (from pineapple) as well as mammalian proteases such as calpains. Aspartic proteases have two essential aspartic acid residues that are close together in the active site although far apart in the protein sequence. This family includes the digestive enzymes pepsin and chymosin. 
Metalloproteases use metal ion cofactors to facilitate protein digestion and include thermolysin. Finally, the fifth class of proteases, threonine proteases, has an active-site threonine. Researchers who study proteases have not escaped the “-omics” culture, and sometimes the term degradome is used for the complete set of proteases expressed at one specific time by a cell, tissue, or organism. Understanding the degradome of an organism relies on much the same techniques as used for proteomics, although modifications are made to look only at proteases rather than at the entire proteome. For example, protease chips contain only antibodies to known proteases rather than to all proteins (see later discussion).

Many different proteases exist in nature, and they are useful for a variety of applications in biotechnology. Specific proteases recognize specific amino acids and cut the peptide bond in a specific location.

MASS SPECTROMETRY FOR PROTEIN IDENTIFICATION

Mass spectrometry is a technique to determine the mass of molecules. In mass spectrometry, a molecule is fragmented into different ions whose masses are accurately measured. The ions generate a spectrum of characteristic peaks, which identifies the original molecule. The molecule 2-pentanol is used as an example in Figure 9.8. Here an electron beam aimed at the sample fragments the 2-pentanol into different ions: the molecular ion (loses an electron), m-1 (loses hydrogen from the alcohol group), m-15 (loses a methyl group), m-17 (loses the alcohol group), and m/e 45 (loses the alkyl chain). These ions are accelerated into a vacuum tube by an ion-accelerating array. The ions travel through the tube at different speeds due to a magnetic field that causes the ions to follow a curved path within the tube (Fig. 9.9). The curves eliminate ions that are too small or too big. Ions that are too small gain so much momentum from the magnetic field that they collide with the wall. 
Those that are too big are not deflected by the magnetic field and also collide with the wall. Ions in the right size range are deflected by the magnetic field around

slide 307:

both curves to hit the collector, where they are recorded as peaks in the mass spectrum. The base peak is the most intense, and other peaks are measured relative to the base peak. The time ions take from the accelerator to the collector correlates directly to the size of the ion. Each peak is plotted based on the mass to charge ratio (m/z).

FIGURE 9.7 Mechanism of the Four Classes of Endopeptidase. (A) Serine proteases cleave the peptide bond of a protein by the formation of an acyl enzyme intermediate. The active-site serine forms a temporary bond with the amino-terminal half of the digested protein. (B) Cysteine proteases are similar to serine proteases but use cysteine at the active site. (C) Aspartic proteases have two active-site aspartic acids that coordinate hydrolysis of the peptide bond in the target protein. (D) Metalloproteases hydrolyze a target protein using a metal ion such as Zn²⁺. [Figure panels: A, serine protease (Ser, His); B, cysteine protease (Cys, His); C, aspartic protease (Asp, Asp); D, metalloprotease (Zn²⁺). Panels A and B show the acyl enzyme intermediate; each panel shows C-term peptide release and N-term peptide release.]

The losses of mass such as m-17 or m-15

slide 308:

refer to the loss of specific groups from the parent molecule and are most informative about the structure of the sample molecule because each such group has a characteristic mass. Until recently, very large molecules such as proteins were beyond the range of mass spectrometry. Two different ionization techniques have been developed that have made proteins manageable. The first technique, which embeds peptides in a solid matrix before ionization, is called MALDI, or matrix-assisted laser desorption-ionization (Fig. 9.10). Here the peptides are embedded in a material, such as 4-methoxycinnamic acid, that absorbs laser light. The matrix absorbs the laser energy and transfers it to the peptides, causing them to release different ions. The ions are accelerated through a vacuum tube by a charged grid. At the far end, the time-of-flight (TOF) detector records the intensity and calculates the mass. In between is a flight tube that is free of electric fields.

FIGURE 9.8 Basic Mass Spectrometry for 2-Pentanol. Every substance can be fragmented into multiple ions. This example shows the molecular structure of all the ions from 2-pentanol. A mass spectrometer separates these ions by size and graphs the results. The spectrum is always the same for each substance, and so an unknown substance can be identified by comparing its spectrum with a database of known substances. [Figure labels: 2-pentanol and its various ions: m⁺, molecular ion; m−1, loss of H at the alcohol; m−15, loss of a methyl group; m−17, loss of the −OH group; m/e 45, loss of alkyl chain. Spectrum axes: intensity versus mass, 30 to 150.]

The ions are accelerated with the

slide 309:

same kinetic energy, and when they reach the flight tube, the lighter ions move faster than the heavier ions. The time of flight is proportional to the square root of the mass to charge ratio (m/z). MALDI is able to handle ions up to 100,000 daltons. The other method for preparing ions from peptides is electrospray ionization (ESI) (Fig. 9.11). Here the peptides are dissolved in liquid, and very small droplets are released from a narrow capillary tube. The droplets enter the electrostatic field, where a heated gas such as nitrogen causes the solvent to evaporate and the droplets to break up. This causes the peptide to release ions into the vacuum tube, where they are accelerated by the electric field. The detector at the far end varies based on the sample being studied. A TOF detector may be used as described earlier. Other detectors use quadrupole ion traps or Fourier transform ion cyclotron resonance to determine the mass of the ions. Quadrupole ion traps capture the ions in an electric field. The ions are then ejected into the detector by a second electric field. The electric field controls what size ions can pass to the detector, and varying the field allows different-sized ions to be detected. Combination detectors exist that use both TOF and quadrupole ion traps. The advantage that ESI has over MALDI is that proteins isolated from HPLC (see earlier discussion) require no special preparation and can be used directly. The disadvantage of ESI is that masses of about 5000 are the maximum. The use of mass spectroscopy will become more prevalent as the methodology improves. For example, surface-enhanced laser desorption-ionization (SELDI) mass spectroscopy takes liquid samples and ionizes the peptides that adhere to a treated metal surface. The technique shows great promise for bodily fluids such as blood and, it is hoped, will help in identifying a particular protein profile for different disease states.

FIGURE 9.9 Schematic Diagram of a Mass Spectrometry Tube. The sample travels through a narrow slit and then passes through a beam of electrons that disrupts it into a mixture of ion fragments. The accelerating array moves the fragments into the C-shaped tube. This is surrounded by a strong magnetic field that prevents ions that are too small or too large from exiting the tube. A collector detects the exiting fragments and measures the time it took for them to travel the tube. The computer then converts the time of travel into size and charge information and plots this as a mass spectrum (not shown). [Figure labels: sample; electron beam; accelerating array; magnetic field H₀; collector.]

FIGURE 9.10 MALDI/TOF Mass Spectrometer. Mass spectrometry can be used to determine the molecular weight of peptides. The peptides are crystallized in a solid matrix and exposed to a laser, which releases ions from the peptides. These travel along a vacuum tube, passing through a charged grid, which helps separate the ions by size and charge. The time it takes for ions to reach the detector is proportional to the square root of their mass to charge ratio (m/z). The molecular weight of the peptide can be determined from these data. [Figure labels: laser; matrix; sample molecule; ionized sample molecule; charged grid; drift zone; TOF detector.]

Perhaps one day patients will be diagnosed

slide 310:

with cancer long before any symptoms are detected. A change in the peptide profile in their blood could denote that cancer cells are forming. MALDI and ESI are sensitive enough to detect changes in proteins due to phosphorylation, glycosylation, and so forth. The technique can identify which amino acid is modified because only one specific ion is altered. Another improvement for MALDI to characterize protein samples is a technique called multidimensional protein identification technique (MuDPIT). In this method, a more complex mixture of proteins can be fragmented into peptides and then identified by mass spectroscopy. Traditional mass spectroscopy can identify only a limited number of different proteins. Usually samples with more than 100 proteins are too complex and have to be further fractionated by SDS-PAGE or chromatography. In MuDPIT, a sample with up to 3000 different proteins can be reliably evaluated. The key to the technology is using a 2D LC microcapillary column to separate the peptide fragments prior to standard mass spectroscopy (Fig. 9.12). This column improves peptide separation over traditional HPLC since there are two different stationary phases packed into a tube with an inner diameter of only 100 microns. The 2D aspect refers to the two stationary phases: the first is a strong cation exchange resin, which is followed by a reverse-phase resin. The peptides are first separated by charge on the cation resin and then by size and hydrophobicity on the reverse-phase column. Next, the peptides are eluted with a combination of two solutions: an ammonium salt gradient that releases the peptides by charge and an organic solvent gradient, such as acetonitrile, that releases the peptides based on their hydrophobicity. As each peptide is released from the column and enters the needle-like exit portal, it is ionized by electrical charge, and the ions are separated using TOF analysis. 
This technique is powerful because many different proteins can be identified and quantified using one column and one mass spectroscopy experiment.

FIGURE 9.11 Electrospray Ionization (ESI) Mass Spectrometer
ESI mass spectrometry uses a liquid sample of the peptide held in a capillary tube. After exposure to a strong electrostatic field, small droplets are released from the end of the capillary tube. A flow of heated gas within the drift zone evaporates the solvent and releases small charged ions. The charged ions vary in size and charge, and the pattern of ions produced is unique to each peptide. The ions are separated by size using a charged grid to either impede or promote the flow toward the detector.

FIGURE 9.12 MuDPIT Uses 2D LC Microcapillary Columns to Separate Peptides Prior to Mass Spectroscopy
(A) Microcapillary column containing two different stationary phases separates proteins for mass spectroscopy. The first part of the column has a strong cation exchange resin, which separates the peptide fragments by charge. The second part of the column has reverse-phase resin, which separates the peptides by size and hydrophobicity. (B) After the peptides adhere to the column, they are eluted by alternating an increasing ammonium salt concentration with a reverse-phase gradient. The peptides with the lowest charge are eluted first to the reverse-phase portion of the column. Next, an organic solvent gradient releases the lowest charged peptides in order of size and hydrophobicity. The eluted peptides pass through the needle and are ionized for mass spectroscopy.

slide 311:

PREPARING PROTEINS FOR MASS SPECTROSCOPY

Determining the sequence of a short peptide is readily achieved using mass spectroscopy techniques (Fig. 9.13). To determine the sequence of a peptide, researchers must obtain a pure sample of the protein, either by cutting a spot from a two-dimensional gel or by HPLC purification. First, the proteins are treated with reducing agents to break apart any disulfide bridges. To keep the –SH groups from reforming a disulfide bridge, the proteins are also alkylated. The protein is then digested into fragments using a protease such as trypsin, which cuts proteins on the carboxy-terminal side of arginine and lysine. Cutting a protein into peptides helps reduce undesirable characteristics of the entire protein. For example, membrane proteins are hydrophobic and stick together, and digesting them into peptide fragments destroys this characteristic. Solubility issues can also often be resolved by digesting a protein into peptides. Determining the sequence of these peptides will yield the sequence of the original protein. Usually, the peptide sequence from only one or two fragments is sufficient for identifying the original protein from which they derived.

During mass spectroscopy, the peptide mixture is ionized into multiple fragments (Fig. 9.14). For peptides, common ions include a doubly protonated form, (M + 2H)²⁺, where M is the mass of the peptide and H⁺ is the mass of a proton. The ion peaks are plotted versus the mass-to-charge ratio (m/z). For the doubly protonated peptide ion, this would be the mass of the ion divided by 2. For example, if the original peptide was 1232.55 daltons, the doubly protonated ion would have a mass of 1232.55 daltons + 2 × 1.0073 daltons (one proton mass for each added hydrogen), or 1234.5646 daltons. The peak would appear at 617.2823. Note: The peak is plotted at the mass-to-charge ratio. That is, the mass for this ion is 1234.5646 and the charge is +2, so the peak appears at m/z = 1234.5646/2.
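The m/z arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation described in the text; the function name `mz_of_protonated_ion` is hypothetical, and the proton mass is the 1.0073 dalton value quoted above.

```python
PROTON_MASS = 1.0073  # daltons, the value quoted in the text


def mz_of_protonated_ion(peptide_mass, charge):
    """Return m/z for a peptide carrying `charge` added protons: (M + z*H) / z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    ion_mass = peptide_mass + charge * PROTON_MASS
    return ion_mass / charge


# The 1232.55 dalton peptide from the text, in several charge states.
for z in (1, 2, 3):
    print(f"charge +{z}: m/z = {mz_of_protonated_ion(1232.55, z):.4f}")
```

Running the loop shows how the same peptide appears at a different position on the m/z axis for each charge state, which is why the charge state has to be known before a peak can be converted back into a mass.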
When the mass spectrometer separates peptide ions, the first step is to determine the charge state of the ion. Usually a cluster of peaks occurs for each peptide ion. If the peaks are 1 dalton apart, the charge state of the peptide is 1. If the peaks are 0.5 daltons apart, the charge state is 2. To determine the peptide sequence, researchers use two rounds of mass spectroscopy. This is called tandem mass spectroscopy because one ion is produced in the first round of mass spectroscopy, then that ion is fragmented by collision with a gas such as hydrogen, argon, or helium. As before, the ion fragments are separated based on their mass-to-charge ratio. Each peak usually varies by one amino acid, and the size difference between the peaks determines the amino acid sequence. Sometimes the spectrum obtained for a peptide ion is ambiguous, so databases of peptide ion spectra are used for comparison.

Mass spectroscopy ionizes a sample into smaller parts and measures the time it takes for these ions to reach the detector. The amount of time correlates with the size of the ion.

Proteins can be ionized for mass spectroscopy after they are embedded either in a matrix, for MALDI, or in liquid, as for ESI. Both techniques can identify the protein by its pattern of fragmentation, and the techniques can identify any modifications such as phosphorylation and glycosylation.

MuDPIT separates more complex protein mixtures using 2D LC microcapillary column separation prior to ionization. The 2D LC column separates by charge, hydrophobicity, and size using two different stationary phases in tandem.

Each amino acid degrades into predicted ions in mass spectroscopy. The amino acid sequence for an unknown peptide can be deduced based on the pattern of ions produced in comparison with the known patterns.
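The two rules described above (peak spacing gives the charge state, and mass differences between neighboring fragment peaks give the residues) can be sketched as follows. This is a simplified illustration, not a real sequencing algorithm: it assumes singly charged fragment ions, a clean ladder with no missing peaks, and a small table of standard monoisotopic residue masses. The function names and the example peak list are hypothetical.

```python
# Standard monoisotopic residue masses in daltons (small subset for illustration).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,  # isoleucine has the same mass and cannot be told apart
}


def charge_from_spacing(spacing):
    """Peaks in a cluster are 1/z apart on the m/z axis, so z = 1 / spacing."""
    return round(1.0 / spacing)


def read_ladder(peaks, tolerance=0.01):
    """Translate differences between consecutive fragment peaks into residues."""
    ordered = sorted(peaks)
    sequence = []
    for lower, upper in zip(ordered, ordered[1:]):
        diff = upper - lower
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - diff) <= tolerance]
        sequence.append(matches[0] if matches else "?")  # "?" marks an unexplained gap
    return "".join(sequence)


print(charge_from_spacing(0.5))
# Made-up ladder: each peak is one residue heavier than the previous one.
print(read_ladder([200.0, 257.02146, 328.05857, 427.12698]))
```

In practice spectra are noisier than this, which is one reason the text notes that ambiguous spectra are compared against databases of peptide ion spectra.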

slide 312:

FIGURE 9.13 Preparation of Proteins for Mass Spectrometry
Because mass spectrometry is so sensitive, the use of large whole proteins is limited. Instead, peptide fragments are generated by protease digestion. The peptides are easily separated with HPLC, and then specific peptides are subjected to mass spectrometry.
Steps shown: isolate proteins; protein digestion; separate peptides with HPLC; electrospray ionization or MALDI; separate peptide ions; data analysis.
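The protease digestion step can be mimicked in silico. The sketch below applies only the rule stated in this chapter, that trypsin cuts on the carboxy-terminal side of arginine (R) and lysine (K). It is a simplification: it ignores missed cleavages and any sequence-context exceptions, and the function name and example sequence are hypothetical.

```python
from typing import List


def trypsin_digest(protein: str) -> List[str]:
    """Split a one-letter protein sequence after every K or R."""
    peptides = []
    current = []
    for residue in protein:
        current.append(residue)
        if residue in ("K", "R"):
            # Cut on the carboxy-terminal side of lysine or arginine.
            peptides.append("".join(current))
            current = []
    if current:
        # Whatever remains is the C-terminal peptide.
        peptides.append("".join(current))
    return peptides


print(trypsin_digest("MKTAYRGGLKSEF"))
```

Each resulting peptide ends in K or R, except possibly the last one, which is the kind of peptide list that would then be separated by HPLC and analyzed by mass spectrometry.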

slide 313:

PROTEIN QUANTIFICATION USING MASS SPECTROMETRY

Mass spectrometry can also be used to quantify a particular peptide from a protein, which directly correlates to the amount of protein (Fig. 9.15). To purify the protein, researchers prepare the sample and add a small amount of standards. The amount of peptide, and therefore protein, is determined by comparison to the standards. To compare the relative amounts of one protein in two different experimental conditions, researchers grow samples of cells with and without amino acids tagged with a stable isotope, in a technique called Stable Isotope Labeling by Amino acids in Cell culture (SILAC). The heavy isotope is ¹³C or ¹⁵N. These isotopes increase the mass of all proteins in that particular sample. The cells from each condition are lysed and the proteins isolated. The two samples are mixed and analyzed using one of the forms of chromatography followed by mass spectroscopy. Each individual peptide will now have two peaks: one from the normal sample and one from the heavy sample. The ratio of the two peaks will determine the relative change in level of the protein of interest between the samples.

FIGURE 9.14 Mass Spectroscopy Trace of Peptide Fragments
Post-source decay spectrum of a tryptic peptide (m/z 1187.6) from the 50-kDa subunit of DNA polymerase from Schizosaccharomyces pombe. The spectrum was acquired on a Voyager mass spectrometer (Applied Biosystems). From Medzihradszky KF (2005). Peptide sequence analysis. Methods Enzymol 402, 209–244. Reprinted with permission.

Changing experimental conditions affects the amount of a particular protein. This change can be determined using mass spectroscopy by adding heavy isotopes to one of the samples, in a process called SILAC.

PROTEIN TAGGING SYSTEMS

Protein tagging systems are tools for the isolation and purification of single target proteins from a mixture. The gene for the target protein is fused to a segment of DNA that codes for a “tag,” creating a hybrid gene.
This hybrid coding system is inserted into a vector with the

slide 314:

appropriate promoters and terminators to express the tagged protein of interest. The gene construct is transformed into a suitable host organism for expression. When the cells are grown and disrupted to release the proteins, the target protein can be easily isolated because of its tag.

Many different types of tags are used to isolate proteins because the chemistry and size of the tag may affect the protein of interest in a negative way. The first widely used tag, called the polyhistidine or His6 tag, consists of six histidine residues in a row (Fig. 9.16). Histidine binds very tightly to nickel ions; therefore, His-tagged proteins are purified on a column to which Ni²⁺ ions are attached. Once attached to the column, the His6-tagged protein is removed by disrupting the Ni²⁺-His interaction with free histidine or imidazole. The polyhistidine tag may be attached to the carboxy- or amino-terminal end of the protein of interest. Because the His6 tag is very short, the target protein is rarely affected by adding it.

Other short tags for proteins include FLAG, which is recognized by a specific antibody. FLAG has the peptide sequence Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys. As before, the gene for the target

FIGURE 9.15 Quantification of Peptides Using Mass Spectrometry
Mass spectrometry can compare the amount of a particular protein in samples from two different conditions. Cells in one condition are grown with a “heavy” amino acid such as arginine, which is incorporated during protein synthesis. The proteins are digested with proteases into small peptide fragments. These are ionized by ESI mass spectrometry. The analysis gives pairs of light and heavy peaks for each peptide. Peak sizes correlate to the amount of peptide and hence protein.
The “heavy” peak is more abundant in this example; thus, this protein is more abundant under the experimental conditions.
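The peak-ratio comparison used in SILAC can be sketched as a short calculation. This is a minimal illustration under stated assumptions: the intensities are made-up numbers, the function names are hypothetical, and the 6.0201 dalton shift assumes a peptide containing one arginine in which all six carbons are ¹³C.

```python
def silac_ratio(light_intensity, heavy_intensity):
    """Heavy/light peak ratio: above 1 means more protein in the heavy-labeled sample."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


def heavy_light_spacing(n_labeled_residues, mass_shift_per_residue, charge):
    """Distance on the m/z axis between the light and heavy peaks of one peptide."""
    return n_labeled_residues * mass_shift_per_residue / charge


# Made-up intensities for one peptide pair.
print(silac_ratio(light_intensity=1000.0, heavy_intensity=2500.0))
# One fully 13C-labeled arginine (assumed shift of 6.0201 Da) in a doubly charged peptide.
print(heavy_light_spacing(1, 6.0201, charge=2))
```

The second function shows why the light and heavy peaks of the same peptide sit close together but remain resolvable: the mass shift is divided by the charge when plotted as m/z.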

slide 315:

FIGURE 9.16 Nickel Purification of His6-Tagged Protein
To isolate a pure sample of one specific protein, the gene for the protein is genetically linked to a coding region for six histidine residues. The expressed fusion protein can be isolated from a mixture of proteins because of the chemistry of histidines. The histidines bind to nickel-coated beads, and the remaining untagged proteins pass through the column. The histidine-tagged protein is then eluted by passing free histidine or imidazole over the column.

slide 316:

protein plus a short DNA segment encoding FLAG is cloned into a vector, and the hybrid protein is produced in either bacteria or a cell line. FLAG-tagged proteins can be isolated from a cell lysate using the anti-FLAG antibody, either bound to beads or attached to a column. Only the tagged protein attaches to the beads/column. Finally, the FLAG-tagged protein is separated from the antibody by adding free FLAG peptide. The short peptide is present in surplus and competes for antibody with the tagged protein, which is therefore eluted from the beads or column.

Another short tag is the “Strep” tag, provided by a short DNA segment that encodes a 10-amino-acid peptide with a similar 3D structure to biotin (see Chapter 3). The biotin-like peptide binds tightly to the proteins avidin or streptavidin, so Strep-tagged proteins are isolated by binding to streptavidin-coated beads or a streptavidin column.

Besides short tags, longer tags that consist of entire proteins are used for some applications. Four popular tags include green fluorescent protein (GFP) from jellyfish, protein A from Staphylococcus, glutathione-S-transferase (GST) from Schistosoma japonicum, and maltose-binding protein (MBP) from Escherichia coli. Just like the short tags, the genes for these longer tags are genetically fused to the target gene. The hybrid gene constructs are expressed by using appropriate transcriptional promoters and terminators. Once the host cells express the hybrid gene, the fusion protein is isolated by purifying the protein tag. GFP-tagged proteins offer a specific advantage since the tag also autofluoresces under UV light. The exact location in which a protein is expressed can be determined by microscopy first, and then antibodies that bind to GFP can be used to purify the protein. In a similar manner, specific antibodies to protein A can be used to purify protein A-tagged proteins. Purified GFP- or protein A-tagged proteins can be released from the antibody by lowering the pH.
MBP binds to maltose attached to beads or a column, and the fusion protein is released by adding free maltose. GST binds to its substrate, glutathione, on beads or a column, and free glutathione is used to release the hybrid protein.

Once the fusion protein has been isolated, it must be cleaved to separate the target protein from the tag protein. A useful feature of the longer tags is the presence of a protease cleavage site between the target protein and the tagging protein. The pMAL vector (New England Biolabs, Inc., Ipswich, MA) has the gene for MBP followed by a spacer region, a recognition site for factor Xa, then the polylinker region for cloning the target gene (Fig. 9.17). Factor Xa is a specific protease used in the blood clotting system, and inserting its recognition sequence allows the MBP portion of the hybrid protein to be cleaved from the target protein. After the hybrid protein is eluted from the purification column with maltose, the original protein is isolated through protease treatment. This is extremely useful when pure native protein is needed for analysis.

An even easier way to obtain a pure sample of native protein is the self-cleavable intein tag. The approach is based on inteins, self-splicing intervening segments found in some proteins. Inteins are the protein equivalent to introns in RNA. The intein removes itself from its host protein via a branched intermediate (Fig. 9.18). The Intein Mediated Purification with Affinity Chitin-binding Tag (IMPACT) system from New England Biolabs uses a modified intein from the VMA1 gene of Saccharomyces cerevisiae. Intein cleavage is used to release the target protein after purification of the fusion protein (Fig. 9.19). This yeast intein originally cleaved both its N-terminus and its C-terminus, but it has been modified so that it cleaves only its N-terminus. The chitin-binding tag of this system is the small chitin-binding domain (CBD) from the chitinase A1 gene of Bacillus circulans.
Chitin is the substance that forms the exoskeleton of insects. The vector has a

FIGURE 9.17 Maltose-Binding Protein Fusion Vector
Vectors such as pMAL have polylinker regions for cloning a target gene in frame with the gene for a tag protein such as MBP (the malE gene). The fusion protein is easily isolated because MBP binds to maltose columns. The fusion protein also has a binding site for the protease factor Xa. When the fusion protein is bound to the column, factor Xa will release the target protein and leave behind the MBP domain.

slide 317:

FIGURE 9.18 Mechanism of Intein Removal
The intervening intein segment splices itself out in two stages. The intein has a Cys or Ser at the boundary with extein 1 and a basic amino acid at its boundary with extein 2. The downstream extein 2 has a Cys residue at the splice junction. Extein 1 is cut loose and attached to the sulfur side chain of the cysteine at the splice junction. This forms a temporary branched intermediate. Next, the intein is cut off and discarded, and the two exteins are joined to form the final protein.

FIGURE 9.19 Intein-Mediated Purification System
Inteins that can self-cleave at their amino-terminus allow specific proteins to be purified and cleaved from a fusion protein in one step. First, the fusion protein is purified by passing over a column made of chitin. Note: The CBD, or chitin-binding domain, recognizes and binds the chitin molecule. The column is incubated with DTT in a refrigerator (4°C). The intein cleaves itself at its amino terminus and releases the target protein.

slide 318:

polylinker or cloning site for the target gene, followed by the DNA segment encoding the intein, followed by the CBD. The fusion protein is expressed, and cell lysates containing the hybrid protein are isolated. When the lysate passes through a chitin column, the hybrid protein binds to the column via the CBD, and the remaining cellular proteins elute. The column is then incubated at 4°C with dithiothreitol (DTT), a thiol reagent that activates the intein to cleave its N-terminus. Thus the target protein is released from the column, leaving behind the intein and CBD regions.

Protein tags are either short peptides or entire foreign proteins fused genetically to a protein of interest.

Protein tags provide a means to isolate the protein of interest from the rest of the cellular proteins. In the case of GFP-tagged proteins, the cellular location can also be identified using microscopy.

PHAGE DISPLAY LIBRARY SCREENING

A phage display library is a collection of bacteriophage particles that have segments of foreign proteins protruding from their surface (Fig. 9.20). Normal bacteriophages have outer coats made of proteins. The outer coat of M13 bacteriophage has about 2500 copies of the major coat protein (gene VIII protein) and about five copies of the minor coat protein (gene III protein). Gene III protein is located at the end of the cylindrical bacteriophage particle with its N-terminus facing outward. One popular phage display system fuses the foreign sequence to the N-terminus of the gene III protein. The result is that M13 now has about five copies of the foreign protein segment on its surface at

FIGURE 9.20 Principle of Phage Display
To display a peptide on the surface of a bacteriophage, researchers must fuse the DNA sequence encoding the peptide to the gene for a bacteriophage coat protein.
In this example, the chosen coat protein is encoded by gene III of phage M13. The N-terminal portion of gene III protein is on the outside of the phage particle, whereas the C-terminus is inside. Therefore, the peptide must be fused in frame at the N-terminus to be displayed on the outside of the phage.

slide 319:

one end of the bacteriophage particle, as shown in Figure 9.20. M13 is especially convenient because it does not lyse the bacteria it infects; the viral particles are simply secreted through the bacterial cell envelope.

For the phage to display a foreign protein, the gene for that protein must be fused to gene III to produce a hybrid protein. The gene of interest must be in frame with gene III for proper expression. The M13 genome can accommodate extra DNA because the filamentous bacteriophage particles are simply made longer if a larger genome needs to be packaged. The M13 genome containing the gene of interest is transformed into E. coli, where the bacteriophage DNA directs the synthesis of new particles containing the protein of interest in the coat.

Bacteriophage can display artificial peptides as well as segments of natural proteins. Random oligonucleotides generated by PCR can be cloned and fused to gene III of M13. Each random oligonucleotide will encode a different peptide. These constructs are transformed into E. coli, and each transformant produces bacteriophage with different foreign peptides fused to gene III protein.

The collection of displayed protein segments or peptides can be screened by biopanning to find those with a particular property, perhaps a specific protein-binding domain or a specific peptide structure that binds an antibody (Fig. 9.21). In biopanning, the library of phages displaying the foreign peptides is incubated with a target protein, such as an antibody, bound to a bead or membrane. All the recombinant phages that bind to that antibody adhere to the solid support, and the others are washed away. All the bound phages are released and incubated with E. coli to replicate the phage. The procedure is usually repeated in order to enrich for peptides that bind specifically, because some nonspecific binding could occur. Once a phage with a useful peptide is identified, the clone is sequenced to determine the structure of the peptide.
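Why repeated rounds of biopanning help can be illustrated with a toy calculation. The sketch below is an assumption-laden model, not a description of any real protocol: it supposes that a fixed fraction of true binders and a much smaller fixed fraction of nonspecific phage survive each wash, and that amplification in E. coli does not change their proportions. All numbers and the function name are hypothetical.

```python
def enrich(binder_fraction, binder_retention, background_retention, rounds):
    """Track the fraction of specific binders in the phage pool after each round."""
    history = [binder_fraction]
    for _ in range(rounds):
        kept_binders = binder_fraction * binder_retention
        kept_others = (1.0 - binder_fraction) * background_retention
        # Amplification is assumed to preserve proportions, so only the wash matters.
        binder_fraction = kept_binders / (kept_binders + kept_others)
        history.append(binder_fraction)
    return history


# Start with one specific binder per million phage (made-up values throughout).
for round_number, fraction in enumerate(enrich(1e-6, 0.5, 0.001, rounds=3)):
    print(round_number, f"{fraction:.6f}")
```

Under these assumptions a single round leaves the pool dominated by nonspecific phage, while a few rounds make the specific binders the majority, which matches the reason the text gives for repeating the procedure.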
FIGURE 9.21 Biopanning of Phage Display
Biopanning is used to isolate peptides that bind to a specific target protein, which is usually attached to a solid support such as a membrane or bead. The phage display library (A) is attached to the binding protein (B). Those phages that bind to the target protein will be retained (C), but the others are washed away. The phage that does recognize the binding protein can be released, isolated, and purified (D).

Full-length protein libraries can also be studied using phage display, but they pose some extra problems. When M13 is used, coding sequences for full-length proteins must be cloned

slide 320:

in frame with both a signal sequence at their N-terminal end and gene III at their C-terminal end. The signal sequence is required to direct the hybrid protein to the viral coat. Ensuring the correct reading frame is reasonable for one or two genes, but for an entire library there is too much room for error. Besides, the possible creation of a stop codon at either fusion junction would prevent the hybrid protein from being expressed. The solution is to use T7 bacteriophage for libraries of full-length proteins. T7 has a coat protein whose C-terminal tail is exposed to the outside. To be expressed on the bacteriophage surface, the protein library must therefore be fused to the C-terminus of the coat protein. This requires only one fusion junction. Furthermore, even if library sequences are cloned out of frame or contain stop codons, the coat protein itself is unaffected and still assembled, although the attached library proteins will be defective.

Being able to express full-length proteins for biopanning is very useful to proteomics researchers. To identify a protein that binds to a particular cell surface receptor, a researcher can biopan a phage display library for receptor binding. Another example is finding RNA-binding proteins. Here, RNA is anchored to a solid support, and the phage display library is incubated with this RNA “bait.” The phages that stick to the RNA bait are isolated and enriched by repeating the procedure. Each isolated clone can then be sequenced to identify which proteins bind RNA.

Phage display is a technique in which foreign proteins or peptides are fused to a coat protein on the surface of the phage. The phage then displays them for analysis.

Biopanning identifies binding partners for a protein of interest. The protein of interest is incubated with a phage display library. When a phage binds to the protein of interest, it is isolated, and the sequence for the displayed peptide is determined.
PROTEIN INTERACTIONS: THE YEAST TWO-HYBRID SYSTEM

In addition to protein function and expression, proteomics attempts to find relevant protein interactions. For those who like “-omics” terminology, the total of all protein-protein interactions is called the protein interactome. For example, hormones usually bind to receptors that pass on the signal. Often this involves a protein relay in which one protein activates another, which in turn activates yet another. To understand hormone function, researchers must identify all the proteins in the signal cascade. Phage display is one way to identify interactions, but the displayed proteins may not fold correctly, or specific cofactors may be missing when mammalian proteins are expressed in bacteria.

An approach to overcoming these difficulties is to use the yeast two-hybrid system, in which the binding of two proteins activates a reporter gene. The binding of a transcriptional activator protein, GAL4, to the promoter region of the reporter gene activates transcription, which results in synthesis of the reporter protein. GAL4 contains two domains needed to turn on the reporter gene. The DNA-binding domain (DBD) recognizes the promoter element and positions the second domain, the activation domain (AD), next to RNA polymerase, where it activates transcription. These two domains can be expressed as separate proteins but cannot activate the reporter gene unless they are brought together (Fig. 9.22).

In the two-hybrid system, the two domains are each fused to different proteins by creating hybrid genes. The bait is the DBD genetically fused to the protein of interest, and the prey is the AD fused to proteins that are being screened for interaction with the bait. When the bait and prey bind, the DBD and AD activate transcription of the reporter gene.

slide 321:

FIGURE 9.22 Principle of Two-Hybrid Analysis
(A) Yeast transcription factors have two domains: the DBD (purple) recognizes regulatory sites on DNA, and the AD (red) activates RNA polymerase to start transcription of the reporter gene. For two-hybrid analysis, two proteins, Bait and Prey, are fused separately to the DBD and AD domains of the transcription factor. The Bait protein is joined to the DBD and the Prey protein to the AD. (B) The Bait protein and Prey protein do not interact, and the reporter gene is not turned on. (C) The Bait binds the Prey, bringing the transcription factor halves together. The complex activates RNA polymerase, and the reporter gene is expressed.

Two vectors are needed to perform two-hybrid analysis (Fig. 9.23). The first vector has a multiple cloning site for the Bait protein at the 3′-end of the GAL4-DBD; therefore, the fusion protein has the Bait protein as its C-terminal domain. The second vector has a multiple cloning site for the Prey protein at the 5′-end of the GAL4-AD, and the fusion protein has the Prey protein as its N-terminal domain. Both plasmids must be expressed in the same yeast cell. If the bait and prey proteins interact, the reporter gene is turned on.

The reporter genes must be engineered to be under control of the GAL4 recognition sequence. Common reporter genes include HIS3, which encodes an enzyme in the histidine pathway and whose expression allows yeast cells to grow on media lacking histidine, or URA3, which allows growth without uracil. These reporter systems require yeast host cells

slide 322:

that are defective in the corresponding genes. However, they do allow direct selection of positive isolates. Another reporter used is lacZ from E. coli, which encodes β-galactosidase. Both bacteria and yeast that express lacZ turn blue when grown with X-Gal. β-galactosidase cleaves X-Gal, releasing a blue product. The reporter genes are usually integrated into the yeast genome rather than being carried on a separate vector.

FIGURE 9.23 Vectors for Two-Hybrid Analysis
Two different vectors are necessary for two-hybrid analysis. The Bait vector has coding regions for the DBD and for the Bait protein. The Prey vector has coding regions for the AD and for the Prey protein. These two different constructs are expressed in the same yeast cell. If the Bait and Prey interact, the reporter gene is expressed. Two reporter systems are shown here. The HIS3 gene allows yeast to grow on histidine-free media. The lacZ gene encodes β-galactosidase, which cleaves X-Gal, forming a blue color.

slide 323:

The yeast two-hybrid system has been used to identify all the protein interactions in the yeast proteome by mass screening with mating (Fig. 9.24). Yeast has about 6000 different proteins, and each of them has been cloned into both vectors via PCR. This way, each protein can be used as both bait and prey. All the bait vectors were transformed into haploid yeast of one mating type and the prey vectors into the other mating type. Haploid cells carrying bait are fused to haploid cells with prey, and the resulting diploid cells are screened for reporter gene activity. This analysis thus examined 6000 × 6000 combinations for protein interaction.

FIGURE 9.24 Two-Hybrid Analysis: Mass Screening by Mating
To identify all possible protein interactions using the two-hybrid system, haploid α yeast are transformed with the Bait library, and haploid a yeast are transformed with the Prey library. When the two yeast types are mated with each other, the diploid cells will each contain a single bait fusion protein and a single prey fusion protein. If the two proteins interact, they activate the reporter gene, which allows the yeast to grow on media lacking histidine (yeast HIS3 gene) or turns the cells blue when growing on X-Gal medium (lacZ from E. coli). This process can be done for all 6000 predicted yeast proteins using automated techniques.
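The scale of the mating screen, and the idea of testing every bait against every prey, can be sketched as follows. This is only an illustration: the protein names, the set of interacting pairs, and the function names are hypothetical, and a real screen reads out reporter activity rather than looking pairs up in a table.

```python
from itertools import product


def count_bait_prey_tests(n_proteins):
    """Every protein is used once as bait and once as prey, giving an n x n grid."""
    return n_proteins * n_proteins


def screen(baits, preys, interacts):
    """Return the (bait, prey) diploids in which the reporter would be switched on."""
    return [(bait, prey) for bait, prey in product(baits, preys)
            if interacts(bait, prey)]


print(count_bait_prey_tests(6000))

# Toy example with made-up proteins and made-up interacting pairs.
KNOWN_PAIRS = {("P1", "P3"), ("P2", "P2")}
hits = screen(["P1", "P2"], ["P1", "P2", "P3"],
              lambda bait, prey: (bait, prey) in KNOWN_PAIRS)
print(hits)
```

Because each protein appears in both libraries, an interaction can in principle be detected in either orientation, which gives the screen some built-in redundancy.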

slide 324:

The yeast two-hybrid system has significant limitations. Because transcription factors must be in the nucleus to work, the target proteins must also function in the nucleus. For some proteins, entering the nucleus may cause the protein to misfold. For other proteins, the nucleus does not contain the proper cofactors, and the protein may be unstable. Large proteins may not be expressed well or may be toxic to the yeast, leading to false negative results. When protein interactions are checked by other methods, it is clear that the two-hybrid system misses many interactions and generates a significant number of false positive interactions. Performance can be improved by using multiple sets of two-hybrid vectors that use transcription factors other than GAL4, or systems that fuse the Bait protein to the N-terminus of the DNA-binding domain and the Prey protein to the C-terminus of the activation domain.

Yeast two-hybrid analysis finds proteins that bind together. Cellular proteins are linked to the AD of GAL4 or the DBD of GAL4. When two cellular proteins bring the AD and DBD together, the completed GAL4 binds to the reporter gene promoter.

Reporter gene products allow the yeast to grow on histidine-free media or turn blue on media that contain X-Gal.

PROTEIN INTERACTIONS BY CO-IMMUNOPRECIPITATION

Co-immunoprecipitation is a technique to examine protein interactions in the cytoplasm rather than the nucleus (Fig. 9.25). Here, the target protein is expressed in cultured mammalian cells, which are lysed to release the cytoplasmic contents. The target protein is precipitated from the lysate with an antibody. Other proteins that are associated with the target protein remain associated with the antibody-protein complex. If no antibody exists for the target protein, a small tag such as FLAG or His6 (see earlier discussion) can be engineered onto the protein. Protein A from Staphylococcus in turn binds the antibodies.
The protein A is attached to beads before it is added to the cell lysate. This generates very large target protein/antibody/protein A/bead complexes, which are gently isolated from the rest of the cellular proteins by centrifugation. The complexes are separated by size with SDS-PAGE. The gel should show the target protein, the antibody, protein A, and other bands that represent interacting proteins. These can be identified with protein sequencing and/or mass spectrometry.

Co-immunoprecipitation is often used to confirm the results from yeast two-hybrid analysis, especially for mammalian proteins. Many two-hybrid experiments reveal novel, uncharacterized proteins. To confirm the interaction, researchers tag both proteins for easy isolation.

FIGURE 9.25 Co-immunoprecipitation
To determine whether proteins P and Q interact within the cytoplasm, researchers fuse each protein to a different tag for easy isolation. Each fusion protein is expressed in mammalian cells, which are then lysed to release the cell proteins. The cells must be lysed gently to avoid disrupting the protein interactions. The fusion proteins are isolated using the tag sequence. Each tagged protein and all its associated proteins are isolated independently. For example, on the left, Flag-tagged protein P is isolated with an antibody to the Flag sequence, and on the right, His6-tagged protein Q is isolated with an antibody to the His6 sequence. The protein complexes are separated by SDS-PAGE. This example shows the two tagged proteins P and Q interacting.
Figure labels: proteins P and Q interact by binding; protein P has a Flag tag; protein Q has a His tag; immunoprecipitate with antibody to Flag; immunoprecipitate with antibody to His; separate proteins with SDS-PAGE; presence of both proteins confirms interaction.

Adding a tag is

slide 325:

much easier than generating a specific antibody to each new protein. For example, protein P is tagged with FLAG, while protein Q is tagged with His6. Each vector construct is transformed into a mammalian cell line, and each protein is expressed. The cell lysate is harvested and divided into two samples. The protein P complexes are isolated from the first sample, whereas the protein Q complexes are isolated from the second sample. Each of the complexes is isolated with protein A-coated beads. The different proteins from each sample are separated by SDS-PAGE. If the two proteins interact, both proteins will be found in both samples.

Co-immunoprecipitation determines whether two proteins bind together in the cytoplasm.

PROTEIN ARRAYS

Protein-detecting arrays may be divided into those that use antibodies and those based on using tags. In the ELISA (see Chapter 6), antibodies to specific proteins are attached to a solid support such as a microtiter plate or glass slide. The protein sample is then added, and if the target protein is present, it binds its complementary antibody. Bound proteins are detected by adding a labeled second antibody.

Another antibody-based protein-detecting array is the antigen capture immunoassay (Fig. 9.26). Much like the ELISA, this method uses antibodies to various proteins bound to a solid surface. The experimental protein sample is isolated and labeled with a fluorescent dye. If two conditions are being compared, proteins from sample 1 can be labeled with Cy3, which fluoresces green, and proteins from sample 2 can be labeled with Cy5, which fluoresces red. The samples are added to the antibody array, and complementary proteins bind to their cognate antibodies. If both sample 1 and 2 have identical proteins that bind the same antibody, the spot will fluoresce yellow. If sample 1 has a protein that is missing in sample 2, then the spot will be green. Conversely, if sample 2 has a protein missing from sample 1, the spot will be red.
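The color rule for the antigen capture immunoassay can be written as a small decision function. The sketch below assumes each spot has already been reduced to one Cy3 and one Cy5 intensity, and that a single cutoff separates signal from background; both the function name and the `threshold` value are illustrative assumptions, since real array analysis uses background correction and normalization.

```python
def spot_color(cy3_signal: float, cy5_signal: float, threshold: float = 1.0) -> str:
    """Classify one antibody spot from its two dye intensities.

    Cy3 (green) labels sample 1 and Cy5 (red) labels sample 2.
    `threshold` is a hypothetical background cutoff.
    """
    in_sample_1 = cy3_signal >= threshold
    in_sample_2 = cy5_signal >= threshold
    if in_sample_1 and in_sample_2:
        return "yellow"  # protein present in both samples
    if in_sample_1:
        return "green"   # protein only in sample 1
    if in_sample_2:
        return "red"     # protein only in sample 2
    return "none"        # protein not detected in either sample


print(spot_color(5.0, 4.0))  # yellow
print(spot_color(5.0, 0.1))  # green
print(spot_color(0.1, 4.0))  # red
```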
This method is good for comparing protein expression profiles for two different conditions.

In the third method, the direct immunoassay or reverse-phase array, the proteins of the experimental sample are bound to the solid support (Fig. 9.27). The proteins are then probed with a specific labeled antibody. Both presence and amount of protein can be monitored. For example, proteins from different patients with prostate cancer can be isolated and spotted onto glass slides. Each sample can be examined for specific protein markers or the presence of different cancer proteins. The levels of certain proteins may be related to the stages of prostate cancer. This immunoassay helps researchers to decipher these correlations.

The main problem with immunology-based arrays is the antibody. Many antibodies cross-react with other cellular proteins, which generates false positives. In addition, binding proteins to solid supports may not be truly representative of intracellular conditions. The proteins are not purified or separated; therefore, samples contain very diverse proteins.

FIGURE 9.26 Ideal Results for Antigen Capture Immunoassay
Various different antibodies are fused to different regions of a solid surface. Each spot has a different antibody. If the antibody recognizes only proteins labeled with Cy5, the region will fluoresce red (left). If the antibody recognizes only proteins labeled with Cy3, the region will fluoresce green (middle). If the antibody recognizes proteins in both conditions, the spot will fluoresce yellow (right).
Figure labels: side view; top view; red, green, yellow.

Some proteins will bind faster and

slide 326:

better than others. Also, proteins of low abundance may not compete for binding sites. Another problem is that many proteins are found in complexes, so other proteins in the complex may mask the antibody-binding site.

Rather than using antibodies, protein interaction arrays use a fusion tag to bind the protein to a solid support (Fig. 9.28). The use of protein arrays to determine protein interactions and protein function is a natural extension of yeast two-hybrid assays and co-immunoprecipitation. Protein arrays can assess thousands of proteins at one time, making this a powerful technique for studying the proteome.

Protein arrays were first used systematically in yeast because its proteome contains only about 6000 proteins. Libraries have been constructed in which each protein is fused to a His6 or GST tag. The proteins are then attached by the tags to a solid support such as a glass slide coated with nickel or glutathione. To build the array, researchers isolate each protein individually and spot it onto the glass slide. The tagged proteins bind to the slide, and other cellular components are washed away. Each spot has only one unique tagged protein.

Once the array is assembled, the proteins can be assessed for a particular function. In the laboratory of Michael Snyder at Yale University, the yeast proteome has been screened for proteins that bind calmodulin (a small Ca²⁺-binding protein) or phospholipids (Fig. 9.29). Both calmodulin and phospholipid were tagged with biotin and incubated with a slide carrying each of the yeast proteins, bound to the slide via His6-nickel interactions. The biotin-labeled calmodulin or phospholipid was then visualized by incubating the slide with Cy3-labeled streptavidin. Streptavidin binds very strongly to biotin. The results identified 39 different calmodulin-binding proteins (only six had been identified previously) and 150 different phospholipid-binding proteins.
Protein microarrays ready for screening are now commercially available for yeast and humans. The ProtoArray® Human Protein Microarray, available from Invitrogen, includes about 9000 human proteins, about 40% of the human proteome.

FIGURE 9.27 Direct Immunoassay
The direct immunoassay binds the protein samples to different regions on a solid support. Each spot has a different protein sample. Next, an antibody labeled with a detection system is added. The antibody binds only to its target protein. In this example, the antibody recognizes only a protein in patient samples 1 and 2.
Figure labels: glass slide; protein sample; sample 1, sample 2, sample 3.

FIGURE 9.28 Protein Interaction Microarray: Principle
To assemble a protein microarray, researchers incubate a library of His6-tagged proteins with a nickel-coated glass slide. The proteins adhere to the slide wherever nickel ions are present.
Figure labels: library of tagged proteins; His6 tag; glass slide with attached nickel ions; proteins bind to Ni via His6 tags.

slide 327:

FIGURE 9.29 Screening Protein Arrays Using Biotin/Streptavidin
Protein microarrays can be screened to find proteins that bind to phospholipids. The protein microarray is incubated with phospholipid bound to biotin. Then the bound phospholipid is visualized by adding streptavidin conjugated to a fluorescent dye. Spots that fluoresce represent specific proteins that bind phospholipids.
Figure labels: proteins to be screened, attached to the slide via His6 tags and nickel; phospholipid tagged with biotin; avidin with Cy3 fluorescent label.
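Reading out such a screen amounts to picking the spots whose fluorescence rises above background and then asking which of those binders were already known. The following is a minimal sketch under stated assumptions: the protein names, intensities, background level, and three-fold cutoff are all invented for illustration and are not taken from the published screen.

```python
def call_hits(intensities: dict, background: float, fold: float = 3.0) -> list:
    """Return proteins whose spot signal is at least `fold` times background.

    The fold cutoff is a hypothetical choice, not a published standard.
    """
    cutoff = background * fold
    return sorted(name for name, signal in intensities.items() if signal >= cutoff)


def split_known_and_novel(hits: list, previously_known: set) -> tuple:
    """Separate hits into already-described binders and new candidates."""
    known = [h for h in hits if h in previously_known]
    novel = [h for h in hits if h not in previously_known]
    return known, novel


# Made-up Cy3 intensities for four spots on an array.
toy_array = {"ProtA": 950.0, "ProtB": 40.0, "ProtC": 310.0, "ProtD": 55.0}
hits = call_hits(toy_array, background=50.0)
known, novel = split_known_and_novel(hits, previously_known={"ProtA"})

print(hits)   # ['ProtA', 'ProtC']
print(novel)  # ['ProtC']
```

Applied to the calmodulin numbers quoted in the text, the same bookkeeping gives 39 - 6 = 33 binding proteins that had not been identified before the array screen.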

slide 328:

Proteins in an array can be screened for binding other proteins or small molecules such as enzyme substrates or signal molecules. It is thus possible to screen a proteome for enzymes that use a particular substrate, provided that the substrate is available and can be fluorescently labeled without preventing activity. Recently, protein arrays have been screened for binding of small hairpin RNA molecules (Fig. 9.30).

FIGURE 9.30 RNA Binding to Human Protein Microarray
A human microarray was screened for binding of RNA molecules. Both sense and antisense strands were used for each RNA molecule examined. The two strands were labeled with fluorescent dyes of different colors (red for sense and green for antisense). If a protein bound both strands, then a yellow spot was seen. Modified after Siprashvili Z, et al. (2012). Identification of proteins binding coding and non-coding human RNAs using protein microarrays. BMC Genomics 13, 633.
Figure labels: sequence to transcribe flanked by promoters; transcription into sense RNA and antisense RNA; label with fluorescent dyes (Cy3, Cy5); bind to human protein microarray; overlay.

Various arrays are used to screen proteins. They may be divided into immunology-based approaches and tag-based approaches. The immunoassays depend on binding of antibodies to their target proteins. Either the antibody or target protein is labeled with a fluorescent dye. In tag-based arrays, proteins are attached to a support via tags such as His6 or GST. They are then screened with a variety of target molecules that carry fluorescent dyes.

METABOLOMICS

As methods for identifying small molecules become more accurate and sensitive, metabolic research has become more global. The metabolome consists of all the small molecules and metabolic intermediates within a system, such as a cell or whole organism, at one particular time. Understanding the metabolome is complex because small metabolites affect many other components of a cell. Metabolites flow in a complex network and form many different

slide 329:

transient complexes. The network of metabolites may be compared to city streets. At each corner, a decision on which route to take must be made, and such decisions continue until the final destination is reached. Each metabolite molecule follows a specific pathway, often with several potential branches, and at each junction a decision is made before moving on to the next step. Characterizing the metabolome under particular conditions is known as metabolic fingerprinting.

Several techniques that involve separating and/or identifying many small metabolites simultaneously have made metabolomics possible. Nuclear magnetic resonance (NMR) of extracts from cells grown with ¹³C-glucose has allowed simultaneous measurement of multiple metabolic intermediates. Metabolites have also been identified by thin layer chromatography after growth in ¹⁴C-glucose or by HPLC with UV or fluorescence detection. These methods are not very sensitive, and some metabolites may not be separated or identified. Nonetheless, NMR is widely used for the following reasons: (a) it is highly reproducible, (b) it is nondestructive, (c) it is rapid (little sample processing and no chemical derivatization are necessary), and (d) it yields detailed structural data.

Overall, mass spectroscopy offers the best way to analyze whole metabolomes. The technique can identify many different metabolites, even novel ones, and is extremely sensitive. Mass spectroscopy can determine the exact molecular formula for a compound, so every metabolite can be identified. Even if isomers exist, their fragmentation patterns will be different although the molecular formula is the same. The use of mass spectroscopy is often combined with other separation methods to simplify analysis. Different types of chromatography are used to separate the complex cellular extract into different fractions, which can then be analyzed by mass spectroscopy. These methods include liquid chromatography, gas chromatography, and capillary electrophoresis.
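The point about isomers can be made concrete with a short calculation. The sketch below computes a monoisotopic mass from a simple molecular formula; the helper name, the four-element mass table, and the choice of glucose and fructose (both C6H12O6) as the example are assumptions added for illustration. The parser handles only plain formulas such as C6H12O6, with no brackets or charges.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes of four elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula: str) -> float:
    """Sum the isotope masses for a plain formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


glucose = monoisotopic_mass("C6H12O6")
fructose = monoisotopic_mass("C6H12O6")  # an isomer: same formula, different structure

print(round(glucose, 4))    # 180.0634
print(glucose == fructose)  # True
```

Because the two sugars share one formula, an intact-mass measurement alone cannot tell them apart, which is why the text relies on fragmentation patterns or a prior chromatographic separation to distinguish isomers.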
The disadvantages of these mass spectroscopy-based methods are that they are slow and that separation by the associated chromatography techniques is often difficult to reproduce accurately. In practice, a good approach is an initial survey by NMR followed by more detailed analysis via mass spectroscopy.

Metabolomics is especially valuable in studying plants because metabolites affect the pigments, scents, flavors, and nutrient content. These traits are all commercially important, and using mass spectroscopy to analyze these metabolites will aid in developing better-tasting and fresher produce. For example, in strawberries, 7000 metabolites can be identified by mass spectroscopy (Fig. 9.31). Comparing white and red strawberries has identified which of the metabolite peaks in the mass spectrum correspond to the intermediates in pigment synthesis.

As with other “-omics,” recent technical advances now allow metabolomics to be performed on single cells or even organelles. Both plant cells (Fig. 9.32) and animal cells have been analyzed by using this approach. Different types of cells within higher organisms carry out very diverse roles and also respond very differently to stimuli. Consequently, they show highly varied metabolomes. Single-cell analysis is valuable in revealing cellular diversity and allowing comparison of different cell types. For example, when pollen lands on the stigma of a female plant, it germinates to produce a tube. Use of mass spectroscopy coupled with gas chromatography can detect over

FIGURE 9.31 Metabolome Analysis of Strawberry
Nontargeted metabolic analysis in strawberry. (A) Four consecutive stages of strawberry fruit development (G, green; W, white; T, turning; R, red) were subjected to metabolic analysis using Fourier transform mass spectrometry (FTMS). Similar fruit samples were used earlier to perform gene expression analysis using cDNA microarrays.
(B) An example of high-resolution (100,000) separation of very close mass peaks in data obtained from the analysis of green and red stages of fruit development. Peaks marked with an X have the same mass, whereas peak Y is different by a mere 3 ppm. Courtesy of Phenomenome Discoveries, Inc., Saskatoon, Canada.
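To give a feel for how small a 3 ppm difference is, the relative mass difference can be converted into an absolute one. This is a minimal sketch; the function names and the example m/z of 300 are illustrative assumptions and are not values read from the figure.

```python
def ppm_difference(observed_mz: float, reference_mz: float) -> float:
    """Relative mass difference in parts per million (ppm)."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def mass_shift_for_ppm(mz: float, ppm: float) -> float:
    """Absolute mass difference (same units as mz) for a given ppm value."""
    return mz * ppm / 1e6


# Hypothetical peak at m/z 300: a 3 ppm difference is under one thousandth of a mass unit.
shift = mass_shift_for_ppm(300.0, 3.0)
print(f"{shift:.4f}")                              # 0.0009
print(f"{ppm_difference(300.0 + shift, 300.0):.1f}")  # 3.0
```

Because ppm is a relative measure, the same 3 ppm corresponds to a larger absolute difference for heavier ions and a smaller one for lighter ions.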

slide 330:

FIGURE 9.32 Plant Single-Cell Metabolomics
Single cells can be isolated by laser microdissection, micromanipulation, mechanical isolation, protoplasting, and cell sorting. It is also possible to isolate some subcellular components such as chloroplasts or vacuoles. Cells from different tissues, such as stems, roots, leaves, and flowers, will give very different metabolomes. Analysis is carried out by multiple techniques, especially NMR and mass spectroscopy. Modified after Misra BB, Assmann SM, Chen S (2014). Plant single-cell and single-cell-type metabolomics. Trends Plant Sci 19, 637–646.
Figure labels: cell types sampled from leaf, flower and fruit, root, and plant stem (guard cells, trichomes (glandular/nonglandular), epidermal cells (pavement, basal), mesophyll cells, secretory cavities, petal cells, epidermal parenchyma, pollen, columella, root hairs, stele, pericycle cells, cortex, endodermis, phloem parenchyma, xylem fibers, sclereids, sieve tubes (phloem exudates)); isolation by laser microdissection, micromanipulation, mechanical isolation, protoplasting, or cell sorting; single cell or subcellular sample, then MS spectra, then metabolite identification.

slide 331:

250 metabolites in single pollen cells. As the pollen tube grows, energy-generating pathways are activated, as shown by increases in intermediates belonging to glycolysis and the citric acid cycle.

The metabolome consists of the entire complement of small molecules and metabolites within a system such as a cell or whole organism. Because the metabolome is dynamic, a metabolic fingerprint is the entire complement of small metabolites at one particular point in time.

Summary

Proteomics is the study of the protein complement for an organism. Because proteins change in response to many conditions, the proteome is dynamic, adapting to new challenges and environments. The term translatome refers to the proteome at one particular point in time.

SDS-PAGE separates proteins by size. These proteins are trapped inside a gel but can be transferred to nitrocellulose for Western blot analysis. In the Western blot, a protein of interest can be visualized by adding labeled antibodies. The Western blot can be used to determine the relative abundance of the protein of interest.

Another useful tool to separate proteins is HPLC. This method keeps the proteins in a liquid, and the column materials separate the proteins of interest from the mixture. These columns vary greatly, making HPLC a key method in separating protein mixtures.

Many proteins are difficult to isolate because of their chemistry or size. Proteases are enzymes that break the peptide bond. There is a huge variety of proteases, some with specific binding/cutting sites and others with nonspecific effects. Using proteases to digest a protein makes it more manageable for research.

Mass spectroscopy breaks molecules into their ions and records their mass-to-charge ratio, which is calculated from the time it takes for the ions to travel through the flight tube. Until the advent of MALDI and ESI, proteins were too large and complex for mass spectroscopy.
MALDI and ESI are two new methods for preparing the protein for ionization. ESI is particularly useful because the proteins are ionized from a liquid solvent. This method can be linked with HPLC, as in MuDPIT. Here a two-dimensional liquid chromatography column separates complex peptide mixtures by size, charge, and hydrophobicity before analyzing them by mass spectroscopy.

Once the protein of interest has been identified by HPLC, mass spectroscopy, or SDS-PAGE, the gene sequence can be found. Once cloned, the gene can then be expressed into protein under the control of a regulated promoter. To identify the protein of interest from the remaining cellular proteins, researchers use genetic fusions to either short tags or full proteins. A function for the protein of interest is often hard to determine, and being able to express the protein and isolate the protein via these tags is key to studying its function.

Sometimes finding new protein-binding partners can further the understanding of the protein’s function. In yeast two-hybrid analysis, the protein of interest is genetically fused to one half of a transcription factor, GAL4. Potential binding partners are genetically fused to the other half of GAL4. When the protein of interest binds to a different protein, then the transcription factor turns on the reporter gene. This changes the yeast physiology, marking the cell. The gene for the binding partner can be isolated from the yeast, sequenced, and possibly identified. In a related experiment, co-immunoprecipitation finds binding partners for a protein of interest.

Just as microarrays are used for mass screening of nucleic acids, protein arrays can be used for global protein analysis. They consist of arrays with many different proteins applied as spots

slide 332:

on a solid support. One type of protein array contains antibodies for screening. The other kind represents the proteome and consists of many different cellular proteins attached by tags. Such arrays may be screened for binding of a variety of labeled molecules to the proteins. These include enzyme substrates, other proteins, and RNA.

Metabolomics is a newer field of research that looks at changes in metabolites. This method is useful in understanding how cells change biochemical pathways in response to the environment or during development. Metabolites are analyzed using mass spectroscopy techniques.

End-of-Chapter Questions

1. Why is SDS used in the electrophoresis of proteins?
a. SDS coats the protein with a negative charge so that the sample can run through the gel.
b. SDS is a specific protease that digests large proteins in the sample.
c. SDS allows the Coomassie blue stain to bind to the proteins in the gel so that they may be visualized.
d. SDS adds more molecular weight to each sample so that the proteins do not run off the end of the gel.
e. none of the above

2. What is an issue with using 2D-PAGE?
a. Hydrophobic proteins may not run as expected due to the hydrophobic surfaces.
b. Highly expressed proteins may cover up proteins that are not as abundant but running in the gel nearby.
c. Some proteins may not migrate through polyacrylamide and therefore not be represented on the gel.
d. Rare cellular proteins are hard to visualize with Coomassie blue protein stain.
e. All of the above are issues with 2D-PAGE.

3. Which one of the following is not used during Western blotting?
a. secondary antibody with a conjugated detection system
b. agarose gel electrophoresis
c. non-fat dry milk
d. primary antibody that recognizes the protein
e. nitrocellulose membrane

4. Which of the following statements about HPLC is not correct?
a. There are two phases to HPLC: mobile and stationary.
b. Separation, identification, and purification of proteins are just a few of the applications for HPLC.
c. The downside to HPLC is that it is not very adaptable due to the availability of stationary phase material.
d. Adjusting the experimental conditions, changing the particle size of the stationary phase, and controlling temperature are factors that affect resolution.
e. All of the above statements are true.

5. Which of the following is not an example of protease activity?
a. Some proteases cleave the phosphodiester bond between nucleic acid residues.

slide 333:

b. Some proteases cleave within a protein sequence, and other proteases snip off residues from either end.
c. Some proteases contain serine, cysteine, threonine, or aspartic acid residues within their active sites.
d. Proteases hydrolyze the peptide bond between amino acid residues.
e. Metalloproteases contain metal ion cofactors within their active site.

6. Which of the following statements about mass spectroscopy is incorrect?
a. MS ionizes the sample and then measures the time it takes for the ions to reach the detector.
b. SELDI-MS has great potential for analyzing protein profiles of body fluids and may in the future be used to identify diseases before symptoms appear.
c. Glycosylation and phosphorylation of proteins can be identified using ESI or MALDI.
d. ESI is able to handle much larger ions than MALDI.
e. The time-of-flight for ions is directly correlated with the mass of the ion in mass spectroscopy.

7. Which of the following statements is not true about sequencing peptides with mass spectroscopy?
a. The entire protein can be sequenced all at once using mass spectroscopy.
b. Some purified proteins must be digested with proteases to eliminate undesirable characteristics such as hydrophobicity and solubility.
c. Two rounds of mass spectroscopy are used to determine the sequence.
d. In order to determine the sequence, a pure sample of protein is obtained through 2D-PAGE or HPLC.
e. A database of protein ion spectra is used to compare the peaks of the unknown peptide to determine the sequence.

8. Which of the following is used to quantify proteins with mass spectroscopy?
a. ²H
b. ³³P
c. ³⁵S
d. ³²P
e. ¹²⁵I

9. Why are protein tags useful?
a. Protein tags are exactly the same thing as reporter fusions and perform similar functions.
b. Tags allow the protein to be isolated and purified from other cellular proteins.
c. Tags allow the protein to be quantitated.
d. Protein tags enable the protein to which they are fused to perform its function more readily.
e. None of the above.

10. How is biopanning useful to proteomics research?
a. To express large amounts of protein on the cell surface of yeast.
b. To screen expression libraries in E. coli.
c. To alter the cell membrane structures of cells by expressing foreign proteins on the cell surface.

slide 334:

d. To isolate specific peptides that bind to a specific target protein.
e. all of the above

11. Which of the following is needed to perform yeast two-hybrid assays?
a. Two vectors are needed to express the bait and prey proteins.
b. A reporter gene under the control of the GAL4 recognition sequence.
c. The DBD of a transcription factor genetically fused to the protein of interest, also called the bait.
d. The AD domain of a transcription factor genetically fused to proteins that are being screened for interactions with bait.
e. All of the above are needed.

12. For what is co-immunoprecipitation used?
a. to determine if a protein-of-interest binds to a specific DNA sequence
b. to examine protein–protein interactions in the nucleus instead of in the cytoplasm
c. to examine protein–protein interactions in the cytoplasm instead of the nucleus
d. to allow a protein to be expressed in mammalian cell culture
e. none of the above

13. What is a problem associated with immuno-based arrays?
a. Proteins that are bound to solid supports may not be representative of intracellular conditions.
b. The antibody may cross-react with other cellular proteins, producing a false positive.
c. Low concentrations of some proteins may not be able to compete for active sites compared to those that are in abundance.
d. Proteins that are often found in complexes may have the antibody binding site masked by the other proteins in the complex.
e. All of the above are problems associated with immuno-based arrays.

14. Which of the following has been extensively studied using protein interaction arrays?
a. proteins in yeast that bind calmodulin or phospholipids
b. proteins in yeast that bind to glutathione-S-transferase
c. proteins that are able to bind to biotin and streptavidin
d. proteins that are able to bind to various cofactors present in the sample
e. none of the above

15. Which of the following methods is the best way to analyze a metabolome?
a. high-pressure liquid chromatography
b. mass spectroscopy
c. nuclear magnetic resonance
d. thin layer chromatography
e. ELISA

16. Which of the following would be the best method to identify thousands of different proteins simultaneously?
a. MALDI-TOF
b. traditional MS
c. SILAC
d. MuDPIT
e. SELDI

slide 335:

17. Proteins in a protein array can be screened for binding to all of the following except _______________.
a. small hairpin RNAs
b. enzyme substrate
c. proteins
d. signaling molecules
e. DNAs

18. All of the following statements concerning metabolomics are true except _______________.
a. metabolomics can help produce fresher produce
b. metabolomics is the identification of small molecules and metabolic intermediates within a system
c. NMR and MS methods in metabolomics are not very sensitive methods for studying the metabolome
d. prior to MS, cell extract is separated by chromatography
e. in higher organisms, different cell types have different metabolomes

Further Reading

Bell, M. R., Engleka, M. J., Malik, A., Strickler, J. E. (2013). To fuse or not to fuse: what is your purpose? Protein Science 22(11), 1466–1477.
Gubbens, J., Zhu, H., Girard, G., Song, L., Florea, B. I., Aston, P., et al. (2014). Natural product proteomining, a quantitative proteomics platform, allows rapid discovery of biosynthetic gene clusters for different classes of natural products. Chemical Biology 21, 707–718.
Hamdi, A., Colas, P. (2012). Yeast two-hybrid methods and their applications in drug discovery. Trends in Pharmacological Sciences 33(2), 109–118.
Hong, Y. S. (2011). NMR-based metabolomics in wine science. Magnetic Resonance in Chemistry: MRC 49(Suppl 1), S13–21.
Jorge, I., Burillo, E., Mesa, R., Baila-Rueda, L., Moreno, M., Trevisan-Herraz, M., et al. (2014). The human HDL proteome displays high inter-individual variability and is altered dynamically in response to angioplasty-induced atheroma plaque rupture. Journal of Proteomics 106C, 61–73.
Medzihradszky, K. F. (2005). Peptide sequence analysis. Methods in Enzymology 402, 209–244.
Oliveira, B. M., Coorssen, J. R., Martins-de-Souza, D. (2014). 2DE: the phoenix of proteomics. Journal of Proteomics 104, 140–150.
Misra, B. B., Assmann, S. M., Chen, S. (2014). Plant single-cell and single-cell-type metabolomics. Trends in Plant Science 19(10), 637–646.
Paget, T., Haroune, N., Bagchi, S., Jarroll, E. (2013). Metabolomics and protozoan parasites. Acta Parasitologica 58(2), 127–131.
Rakonjac, J., Bennett, N. J., Spagnuolo, J., Gagic, D., Russel, M. (2011). Filamentous bacteriophage: biology, phage display and nanotechnology applications. Current Issues in Molecular Biology 13(2), 51–76.
Rotilio, D., Della Corte, A., D’Imperio, M., Coletta, W., Marcone, S., Silvestri, C., et al. (2012). Proteomics: bases for protein complexity understanding. Thrombosis Research 129, 257–262.
Salzano, A. M., Novi, G., Arioli, S., Corona, S., Mora, D., Scaloni, A. (2013). Mono-dimensional blue native-PAGE and bi-dimensional blue native/urea-PAGE or/SDS-PAGE combined with nLC-ESI-LIT-MS/MS unveil membrane protein heteromeric and homomeric complexes in Streptococcus thermophilus. Journal of Proteomics 94, 240–261.
Siprashvili, Z., Webster, D. E., Kretz, M., Johnston, D., Rinn, J. L., Chang, H. Y., Khavari, P. A. (2012). Identification of proteins binding coding and non-coding human RNAs using protein microarrays. BMC Genomics 13, 633.
Stynen, B., Tournu, H., Tavernier, J., Van Dijck, P. (2012). Diversity in genetic in vivo methods for protein–protein interaction studies: from the yeast two-hybrid system to the mammalian split-luciferase system. Microbiology and Molecular Biology Reviews: MMBR 76(2), 331–382.

slide 336:

Sun, H., Chen, G. Y., Yao, S. Q. (2013). Recent advances in microarray technologies for proteomics. Chemical Biology 20(5), 685–699.
Toniolo, L., D’Amato, A., Saccenti, R., Gulotta, D., Righetti, P. G. (2012). The Silk Road, Marco Polo, a bible and its proteome: a detective story. Journal of Proteomics 75(11), 3365–3373.
Valiente, M., Obenauf, A. C., Jin, X., Chen, Q., Zhang, X. H.-F., Lee, D. J., et al. (2014). Serpins promote cancer cell survival and vascular co-option in brain metastasis. Cell 156, 1002–1016.
Washburn, M. P., Wolters, D., Yates, J. R. (2001). Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnology 19, 242–247.
Zenobi, R. (2013). Single-cell metabolomics: analytical and biological perspectives. Science 342(6163), 1243259.

slide 337:

CHAPTER 10
Recombinant Proteins

Proteins and Recombinant DNA Technology
Expression of Eukaryotic Proteins in Bacteria
Insulin and Diabetes
Cloning and Genetic Engineering of Insulin
Translation Expression Vectors
Codon Usage Effects
Avoiding Toxic Effects of Protein Overproduction
Inclusion Bodies and Protein Refolding
Increasing Protein Stability
Improving Protein Secretion
Protein Fusion Expression Vectors
Protein Glycosylation
Expression of Proteins by Eukaryotic Cells
Expression of Proteins by Yeast
Expression of Proteins by Insect Cells
Expression of Proteins by Mammalian Cells
Expression of Multiple Subunits in Mammalian Cells
Comparing Expression Systems

Biotechnology. Copyright © 2016 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/B978-0-12-385015-7.00010-7

slide 338:

PROTEINS AND RECOMBINANT DNA TECHNOLOGY
Proteomics has opened the door to identifying more and more clinically relevant proteins. Once identified, these proteins need to be studied in detail, including expression of the protein in model organisms by using recombinant DNA techniques (see Chapter 3). Some proteins will become therapeutic agents, and large amounts of purified protein will be required. Once a gene has been cloned, the protein it encodes can be produced in large amounts with relative ease. Smaller nonprotein molecules, which seem simpler to an organic chemist, would need half a dozen proteins (enzymes) working in series to synthesize them. Thus, paradoxically, proteins, despite being macromolecules, have been more susceptible to genetic engineering than simpler products such as antibiotics. Pathway engineering to produce small organic molecules will be discussed in Chapter 13.
Today over 100 recombinant proteins are in use as therapeutic agents. Of these, nearly half are monoclonal antibodies (discussed in Chapter 6). In terms of sales (2010 data), antibodies accounted for about $50 billion, insulin and analogs for about $16 billion, blood clotting factors and erythropoietin $16 billion, and the rest $25 billion. Here we consider those proteins that are not antibodies, which may be subdivided by function as follows:
(a) Replacements for proteins that are missing or defective
(b) Increasing the amount of proteins already present
(c) Inhibition of infectious agents such as bacteria or viruses
(d) Carriers for other molecules (mostly still in development)
These categories are not mutually exclusive; for example, the use of interferons to combat virus infection falls into both classes (b) and (c). Most of the proteins used therapeutically are human proteins, although those from animals are also found. Some examples of recombinant proteins in clinical use are given in Table 10.1.
TABLE 10.1 Proteins Produced by Recombinant Technology
Protein: Function
Erythropoietin: Promoting red blood cell formation in the treatment of anemia
Factor VIII: Helping blood clots form in hemophiliacs
Filgrastim and sargramostim (blood cell–stimulating bone marrow factors): Boosting white blood cell counts after radiation therapy or transplantation
Insulin: Treating diabetes
Insulin-like growth factor 1 (IGF1): Treating certain growth problems
Interferon alpha: Treating hepatitis B and C, genital warts, certain leukemias, and other cancers
Interferon beta: Treating multiple sclerosis
Interferon gamma: Treating chronic granulomatous disease
Interleukin-2: Killing tumor cells
Somatotropin: Treating growth hormone deficiency
Tissue plasminogen activator (t-PA): Dissolving blood clots to prevent heart attacks and lessen their severity

slide 339:

Expressing a gene for large-scale production brings extra problems compared to a laboratory setting. The more copies of a gene that a cell contains, the higher the level of the gene product. Thus, cloning a gene onto a high-copy-number plasmid will usually give higher yields of the gene product. However, high-copy plasmids are often unstable, especially in the dense cultures used in industrial situations. Although the presence of antibiotic resistance genes on most plasmids provides a method to maintain the plasmid in culture, antibiotics are expensive, especially on an industrial scale. One solution to prevent plasmid loss is to integrate the foreign gene into the chromosome of the host cell. This, however, decreases the copy number of the cloned gene to one. Attempts have been made to insert multiple copies of cloned genes in tandem arrays. However, the presence of multiple copies results in instability due to recombination between homologous sequences of DNA.

EXPRESSION OF EUKARYOTIC PROTEINS IN BACTERIA
In Chapter 3 we discussed the basics of cloning genes onto a variety of vectors. Obviously, bacterial genes will usually be expressed when carried on cloning vectors in bacterial host cells, provided that they are next to a suitable bacterial promoter. Special plasmids known as expression vectors are often used to enhance gene expression. As noted in Chapter 3, these vectors provide a strong promoter to drive expression of the cloned gene. Expression vectors also contain genes for antibiotic resistance to allow selection for cells that carry the vector and therefore the gene for the recombinant protein. In addition, they must have an origin of replication appropriate to the host.
The expression of eukaryotic proteins is more problematic. Although eukaryotic cells can be used to express eukaryotic proteins, bacteria are simpler to grow and manipulate genetically. Therefore it is often desirable to express eukaryotic proteins in bacteria (Fig. 10.1).
Because eukaryotic promoters do not work in bacterial cells, it is necessary to provide a bacterial promoter. In addition, bacteria cannot process introns; therefore it is standard procedure to clone the cDNA version of eukaryotic genes, which lacks the introns and consists solely of uninterrupted coding sequence. In fact, the cDNA version of eukaryotic genes is generally used even for expression in eukaryotic cells, not only to avoid possible processing problems but also because the amount of cloned DNA is much smaller and consequently easier to handle.

Recombinant proteins are clinically relevant proteins produced on a large scale. The gene for the protein of interest is cloned into a vector and expressed as protein in a model organism.

FIGURE 10.1 Expression of Eukaryotic Gene in Bacteria: Overview
Eukaryotic genes must be adapted for expression in bacteria. First, the mRNA from the gene of interest is converted to cDNA to provide uninterrupted coding DNA. The cDNA is cloned between a bacterial promoter and a bacterial terminator so the bacterial transcription and translation machinery express the coding sequence.
[Figure labels: eukaryotic DNA (promoter, exons, introns); transcription; primary transcript RNA; splicing; messenger RNA; reverse transcriptase; cDNA coding sequence; insert into plasmid (bacterial promoter, multiple cloning site (MCS), strong terminator); transform into bacterial cell and express]

Even if a cloned eukaryotic gene is transcribed at a high level, production of the encoded protein in bacteria may

slide 340:

be limited at the stage of protein synthesis. Different mRNA molecules are translated with differing efficiencies. Several factors are involved:
(a) The ribosome-binding site may interact poorly or not at all with the ribosome.
(b) The mRNA may be unstable or have strong secondary structure.
(c) Codons common in the cloned gene may be rare in the bacterial host, which then has only a limited supply of the corresponding tRNAs.
In addition to standard expression vectors, more sophisticated vectors exist to optimize these other aspects of protein production. In this chapter we discuss the use of translation vectors and fusion vectors to increase the synthesis of a recombinant protein from a cloned gene.

INSULIN AND DIABETES
Insulin was the first genetically engineered hormone to be made commercially available for human use. Before cloned human insulin was available, people with diabetes gave themselves injections of insulin extracted from the pancreas of animals such as cows or pigs. Although this approach worked well on the whole, occasional allergic reactions occurred, usually to low-level contaminants in the extracts. Today, genuine human insulin (Humulin, marketed by Eli Lilly Inc.), made by recombinant bacteria, is available.
Diabetes mellitus is actually a group of related diseases in which the level of glucose in the blood and/or urine is abnormally high. There is considerable variation in the detailed symptoms, and multiple genes are involved. Many cases of diabetes are due to the absence of insulin, a small protein hormone made by the pancreas, which controls the level of sugar in the blood. Lack of insulin results in high blood sugar and causes a variety of complications. In patients with insulin-dependent diabetes mellitus (IDDM), injections of insulin keep blood sugar levels down to near normal. Other defects affect the insulin receptor, and patients with these defects do not respond to insulin treatment.
Insulin is a protein made of two separate polypeptide chains: the A- and B-chains (Fig. 10.2). Disulfide bonds hold the two chains together. Although the final protein has two polypeptide chains, insulin is actually encoded by a single gene. The original gene product, preproinsulin, is a single polypeptide chain that contains both the A- and B-chains together with the C-peptide (connecting peptide) and a signal sequence. Preproinsulin itself is not a hormone but must first be processed to give insulin. The signal sequence at the N-terminal end is required for secretion and is then removed by signal peptidase. This leaves proinsulin. Removal of the C-peptide requires endopeptidases that cut within the polypeptide chain. They recognize pairs of basic amino acids at the junctions of the C-peptide with the A- and B-chains. Finally, the terminal Arg and Lys residues are trimmed off by carboxypeptidase H.

Expression vectors are used to make eukaryotic proteins in bacteria. The vector has the ribosome-binding site, terminator sequences, and a strong regulated promoter. The eukaryotic gene is a cDNA copy of the mRNA.

Insulin is a hormone produced as preproinsulin. The signal sequence is removed by signal peptidase, the C-peptide is removed by endopeptidase, and the final arginine and lysine are trimmed by carboxypeptidase H. After processing, insulin has two chains, A and B, linked by disulfide bonds. Some diabetics do not produce any insulin and must take it by injection. Other diabetics lack a functional insulin receptor, so insulin cannot act on the target cells.
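The processing steps described above can be traced on a sequence. The following is a minimal Python sketch, not a model of the real enzymes. It assumes that the length of the signal sequence is known in advance, it treats any pair of adjacent Lys/Arg residues as an endopeptidase site, and it ignores disulfide bond formation. The segment sequences are the human ones as commonly reported and should be checked against a sequence database before reuse; all function names are invented for illustration.

```python
# Toy walk-through of preproinsulin processing (illustrative only).
# Segment sequences are the commonly reported human ones; verify before reuse.
SIGNAL = "MALWMRLLPLLALLALWGPDPAAA"            # signal sequence
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"     # B-chain (30 residues)
C_PEPTIDE = "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ"  # connecting peptide without basic pairs
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"              # A-chain (21 residues)

BASIC = "KR"  # one-letter codes for lysine and arginine

# Pairs of basic residues sit at both junctions of the C-peptide.
PREPROINSULIN = SIGNAL + B_CHAIN + "RR" + C_PEPTIDE + "KR" + A_CHAIN


def remove_signal(protein, signal_length):
    """Signal peptidase step: drop the N-terminal signal sequence."""
    return protein[signal_length:]


def cut_after_basic_pairs(protein):
    """Endopeptidase step: cut after every pair of adjacent basic residues."""
    pieces, start, i = [], 0, 0
    while i < len(protein) - 1:
        if protein[i] in BASIC and protein[i + 1] in BASIC:
            pieces.append(protein[start:i + 2])
            start = i + 2
            i += 2
        else:
            i += 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces


def trim_basic_tail(fragment):
    """Carboxypeptidase step: remove Arg/Lys residues left at the C-terminus."""
    return fragment.rstrip(BASIC)


def process(preproinsulin, signal_length):
    """Return the trimmed fragments in N- to C-terminal order."""
    proinsulin = remove_signal(preproinsulin, signal_length)
    return [trim_basic_tail(p) for p in cut_after_basic_pairs(proinsulin)]


b_chain, c_peptide, a_chain = process(PREPROINSULIN, len(SIGNAL))
print(len(b_chain), len(c_peptide), len(a_chain))  # prints: 30 31 21
```

In this sketch the connecting segment is 31 residues after trimming; together with the two flanking basic pairs it spans 35 residues between the B- and A-chains.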

slide 341:

CLONING AND GENETIC ENGINEERING OF INSULIN
As noted above, insulin was the first hormone to be cloned and made by recombinant bacteria for clinical use. Its production is rather unusual compared to many recombinant proteins because of complications due to processing. If the insulin gene is cloned and directly expressed in bacteria, preproinsulin is made. Because bacteria lack the mammalian processing enzymes, the preproinsulin cannot be converted into insulin (see Fig. 10.2).
Another problem is that disulfide bonds do not readily form in the cytoplasm of E. coli. However, if proteins are secreted into the periplasm, some disulfide bond formation occurs. Indeed, proteins such as human growth hormone, with relatively simple arrangements of disulfides, may be correctly folded. This is due to the Dsb proteins of E. coli, which form and reshuffle disulfides. Unfortunately, proteins with more complex multiple disulfides, especially those with more than one polypeptide chain, such as insulin, often form incorrect disulfide linkages. Overexpression of DsbC protein, which reshuffles disulfides, often improves the yield of such proteins. This approach is still experimental.
One approach to expressing cloned insulin would be to purify the preproinsulin and treat it with enzymes that convert it into insulin. This means the processing enzymes would have to be manufactured as well. Clearly, this process is overly complex. The solution chosen was to make two artificial mini-genes, one for the insulin A-chain and the other for the insulin B-chain (Fig. 10.3). Two pieces of DNA encoding the two insulin chains were synthesized chemically. The two DNA molecules were inserted into plasmids that were put into two separate bacterial hosts. Thus the two chains of insulin were produced separately by two bacterial cultures. They were then mixed and treated chemically to generate the disulfide bonds linking the chains together. This approach gives insulin that works well.
Nonetheless, natural insulin, even natural human insulin, is not perfect and tends to form hexamers. This clumping covers up the surfaces by which the insulin molecule binds to the insulin receptor, thus preventing most of the insulin from activating its target cells (Fig. 10.4). In vivo, insulin is secreted from the pancreas as a monomer and is distributed rapidly by the bloodstream before it gets a chance to clump.

FIGURE 10.2 Processing of Insulin
The gene for insulin produces one transcript that is translated into a single protein called preproinsulin. The signal sequence of preproinsulin is removed after secretion. Next, an endopeptidase removes the C-peptide. This leaves the A- and B-chains held together by disulfide bonds. Last, carboxypeptidase H trims terminal Arg and Lys residues, leaving active insulin.
[Figure labels: preproinsulin with signal sequence, B-chain (30 amino acids), connecting peptide C (35 amino acids), and A-chain (21 amino acids); pairs of basic amino acids; disulfide bonds; steps: secretion and removal of signal sequence, endopeptidase, carboxypeptidase H; product: insulin]

However, when insulin is injected, a high concentration of insulin

slide 342:

FIGURE 10.3 Cloning of Insulin as Two Mini-Genes
The genes for the A- and B-chains of insulin were cloned on two separate plasmids. Both mini-genes were fused to β-galactosidase because this protein is easy to purify. The plasmids were transformed into bacteria and expressed in separate cultures. The bacteria from each culture were harvested and the fusion proteins purified. The A- and B-chains were cleaved from β-galactosidase by cyanogen bromide and then mixed under oxidizing conditions to form the disulfide bonds, thus making human insulin.
[Figure labels: two plasmids, each with a promoter and the β-galactosidase gene fused to the gene for the A-chain or the B-chain; transform plasmids into E. coli; grow bacteria, break open cells, and purify fused protein; split with cyanogen bromide; purify A- and B-chains; mix A and B and oxidize to form disulfide bonds; active insulin]
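The cyanogen bromide step in Fig. 10.3 can be illustrated with a short Python sketch. It relies on a point of chemistry not spelled out in the caption: cyanogen bromide cuts the peptide bond on the C-terminal side of methionine, so the sketch assumes a methionine placed at the junction between the carrier and the insulin chain. The carrier sequence below is a made-up stand-in, not β-galactosidase, and the function names are hypothetical. The insulin chain sequences are the human ones as commonly reported.

```python
# Toy model of cyanogen bromide (CNBr) cleavage of a fusion protein.
# Assumption: CNBr cuts after every methionine (M) and nowhere else.
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"            # human insulin A-chain
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"   # human insulin B-chain

# Hypothetical carrier standing in for beta-galactosidase (NOT the real sequence).
TOY_CARRIER = "MKTAYIAKQRMSLEDG"


def cnbr_cleave(protein):
    """Split a protein after each methionine; the Met stays on the left fragment."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def release_chain(carrier, chain):
    """Build carrier-Met-chain, cleave it, and return the released chain."""
    if "M" in chain:
        raise ValueError("chain contains Met and would itself be cut by CNBr")
    fusion = carrier + "M" + chain
    return cnbr_cleave(fusion)[-1]


for name, chain in (("A", A_CHAIN), ("B", B_CHAIN)):
    released = release_chain(TOY_CARRIER, chain)
    print(name, released == chain)
```

Neither insulin chain contains methionine, which is why the chains come out of this step intact while the carrier is cut into pieces.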

slide 343:

is present in the syringe and clumping occurs. After injection, it takes a while for the hexamers to dissociate, and it may take several hours for the patient’s blood glucose to drop to normal levels.
Insulin was genetically engineered to prevent clumping. The DNA sequence of the insulin gene was altered to change the amino acid sequence of the resulting protein. A proline in the B-chain, located at the surface where the insulin molecules touch each other when forming the hexamer, was replaced with aspartic acid, whose side chain carries a negative charge. So when two modified insulin molecules approach each other, they are mutually repelled by their negative charges and no longer clump (Fig. 10.5). The altered insulin causes a faster drop in blood sugar than native insulin. The ProB28Asp insulin (NovoLog) was among the first fast-acting insulins and is marketed by the Danish pharmaceutical company Novo. Other modifications to generate fast-acting variants of insulin have been introduced by other pharmaceutical companies.
Fast-acting insulin is better for Type 1 diabetes, which results from destruction of the insulin-producing cells of the pancreas. However, the converse modification, slow-acting insulin, is preferred for Type 2 diabetes, which is strongly associated with obesity. The obesity epidemic has resulted in a relative increase in Type 2 diabetes. Nowadays about 10% of cases of diabetes are Type 1 and 90% are Type 2. The major slow-acting variant in use today (Lantus, sold by Sanofi-Aventis) has two extra Arg residues added to the end of the B-chain plus an AsnA21Gly replacement in the A-chain. The increased positive charges result in precipitation upon injection. The Arg–Arg extension is nibbled off by exopeptidases, and the insulin slowly redissolves over 16–24 hours.
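The sequence changes behind these insulin variants are small enough to write out. The Python sketch below applies the ProB28Asp substitution and builds the slow-acting variant from the human A- and B-chain sequences as commonly reported, in which the native residue at A21 is asparagine. The charge count is a deliberately crude assumption: +1 for each Lys or Arg and -1 for each Asp or Glu, ignoring histidine, the chain termini, and pH. Function names are invented for illustration.

```python
# Sketch of the insulin variants described in the text (illustrative only).
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"            # human insulin A-chain
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"   # human insulin B-chain


def substitute(chain, position, expected, replacement):
    """Replace the residue at a 1-based position, checking the native residue first."""
    found = chain[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at position {position}, found {found}")
    return chain[:position - 1] + replacement + chain[position:]


def crude_charge(chain):
    """Very rough net charge: Lys/Arg count +1, Asp/Glu count -1 (assumption)."""
    return sum(chain.count(r) for r in "KR") - sum(chain.count(r) for r in "DE")


# Fast-acting variant: proline B28 replaced by aspartic acid.
fast_b = substitute(B_CHAIN, 28, "P", "D")

# Slow-acting variant: two Arg added to the end of the B-chain,
# plus a glycine replacement at A21.
slow_b = B_CHAIN + "RR"
slow_a = substitute(A_CHAIN, 21, "N", "G")

print("native B-chain charge:", crude_charge(B_CHAIN))
print("fast-acting B-chain charge:", crude_charge(fast_b))
print("slow-acting B-chain charge:", crude_charge(slow_b))
```

Checking the expected native residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are quoted per chain (B28, A21) rather than along the whole precursor.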
FIGURE 10.4 Insulin Forms Hexamers
High concentrations of insulin cause the monomers to clump into hexamers. The proteins stick to one another by their receptor binding sites.
[Figure labels: insulin monomer (A + B); dimer; three dimers form a hexamer; the surface that binds to the insulin receptor is hidden in the hexamer]

FIGURE 10.5 Engineered Fast-Acting Insulin
Natural insulin has a sticky patch around a proline residue, which causes two insulin molecules to dimerize and eventually form a hexamer. Using genetic engineering, the proline was replaced with a negatively charged aspartic acid residue. The negative charges repel each other and prevent hexamer formation.
[Figure labels: normal insulin (uncharged Pro, hexamers formed); replace Pro with Asp; new improved insulin (Asp, negative charges repel each other, hexamers not formed)]

Insulin used for treating diabetes is now produced as a recombinant protein in bacteria. Recombinant insulin is expressed as two mini-genes rather than expressing preproinsulin. Changing the proline to aspartic acid prevents recombinant insulin from clumping.

TRANSLATION EXPRESSION VECTORS
As discussed in Chapter 2, bacterial ribosomes bind mRNA by recognizing the ribosome-binding site (RBS), also known as the Shine–Dalgarno sequence. The RBS base pairs with the sequence AUUCCUCC (written 3′ to 5′) near the 3′ end of the 16S rRNA of the small subunit of the ribosome. The closer the RBS is to the consensus sequence (i.e., UAAGGAGG), the stronger the association.
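How close a candidate RBS is to the consensus can be scored by counting base pairs with the 16S rRNA sequence quoted above. The Python sketch below is a simplification under stated assumptions: it counts only Watson-Crick pairs, ignores G-U wobble pairing and any thermodynamic weighting, and compares sequences of equal length position by position. The function and constant names are hypothetical.

```python
# Count Watson-Crick pairs between a candidate RBS and the 16S rRNA sequence.
ANTI_SD_3_TO_5 = "AUUCCUCC"   # 16S rRNA sequence from the text, written 3' to 5'
CONSENSUS_RBS = "UAAGGAGG"    # consensus RBS from the text, written 5' to 3'

PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def paired_positions(rbs):
    """Number of positions at which the RBS (5'->3') pairs with the rRNA (3'->5')."""
    if len(rbs) != len(ANTI_SD_3_TO_5):
        raise ValueError("candidate RBS must have the same length as the rRNA site")
    return sum(PAIRS.get(base) == partner
               for base, partner in zip(rbs.upper(), ANTI_SD_3_TO_5))


for candidate in (CONSENSUS_RBS, "AAAGGAGG", "UAACCAGG"):
    print(candidate, paired_positions(candidate))
```

The consensus pairs at all eight positions, and each mismatch lowers the count by one. A real design tool would also weigh the spacing to the start codon and mRNA secondary structure, which this count ignores.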

slide 344:

Generally, a stronger association between the RBS and the rRNA leads to more efficient initiation of translation. In addition, for optimal translation the RBS must be located at the correct distance from the start codon, AUG.
Expression vectors are designed to optimize gene expression at the level of transcription (see Chapter 3). However, it is also possible to design translational expression vectors to maximize the initiation of translation. These vectors possess a consensus RBS plus an ATG start codon located an optimum distance (8 bp) downstream of the RBS. The cloned gene is inserted into a cloning site that overlaps the start codon. The restriction enzyme NcoI is very convenient because its recognition site C/CATG