
Sunday, January 16, 2011

SEQUENCING DNA IS EASY






DNA sequencing reactions are much like the PCR reactions used to replicate DNA (refer to the previous page, DNA Denaturation, Annealing and Replication). The reaction mix includes the template DNA, free nucleotides, an enzyme (usually a variant of Taq polymerase) and a 'primer': a small piece of single-stranded DNA, about 20-30 nt long, that can hybridize to one strand of the template DNA.

The reaction is initiated by heating until the two strands of DNA separate, then the primer sticks to its intended location and DNA polymerase starts elongating the primer. If allowed to go to completion, a new strand of DNA would be the result. If we start with a billion identical pieces of template DNA, we'll get a billion new copies of one of its strands.
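The replication step can be sketched in a few lines of Python. This is a toy model, not lab software. It assumes perfect base pairing, writes both strands 5'->3', and assumes the primer has exactly one binding site. `reverse_complement` and `extend_primer` are names made up for this illustration.

```python
# Watson-Crick pairing for the four DNA bases
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3' like the input."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template: str, primer: str) -> str:
    """Anneal `primer` to `template` (both written 5'->3') and extend it.

    The primer pairs antiparallel with the template, so it binds where the
    template contains the primer's reverse complement. The polymerase then
    adds nucleotides to the primer's 3' end until it runs off the
    template's 5' end.
    """
    site = template.find(reverse_complement(primer))
    if site == -1:
        raise ValueError("primer has no binding site on this template")
    # Everything from the template's 5' end through the binding site is copied.
    return reverse_complement(template[: site + len(primer)])


print(extend_primer("GATTACAGGC", "GCCTG"))  # GCCTGTAATC
```

The primer always ends up at the 5' end of the new strand. This is why every product in a sequencing reaction starts at the same position.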



Dideoxynucleotides: We run the reactions, however, in the presence of a dideoxyribonucleotide. This is just like a regular deoxyribonucleotide, except that it has no 3' hydroxyl group. Once it's added to the end of a DNA strand, there's no way to continue elongating it.

Now the key to this is that MOST of the nucleotides are regular ones, and just a fraction of them are dideoxy nucleotides.

Replicating a DNA strand in the presence of dideoxy-T

MOST of the time when a 'T' is required to make the new strand, the enzyme will get a good one and there's no problem. MOST of the time after adding a T, the enzyme will go ahead and add more nucleotides. However, some small fraction of the time (say 5%), the enzyme will get a dideoxy-T, and that strand can never again be elongated. It eventually falls off the enzyme, a dead-end product.
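Here's a toy simulation of a single polymerase run in the presence of ddT. To keep it short, it takes the sequence of the strand being made directly, rather than working from a template. It also treats the 5% figure above as a fixed per-T probability. `synthesize_with_ddT` and `dd_fraction` are hypothetical names.

```python
import random


def synthesize_with_ddT(new_strand: str, dd_fraction: float = 0.05, rng=random) -> str:
    """Simulate one polymerase run in the presence of ddT.

    `new_strand` is the strand that would be made if synthesis ran to
    completion. Each time a T is needed, the enzyme picks up a ddT with
    probability `dd_fraction`, and the product ends there for good.
    """
    for i, base in enumerate(new_strand):
        if base == "T" and rng.random() < dd_fraction:
            return new_strand[: i + 1]  # ends in ddT: no 3' OH, no further extension
    return new_strand  # reached the end of the template without hitting a ddT


rng = random.Random(42)
print([synthesize_with_ddT("GATTACATTGCAT", rng=rng) for _ in range(5)])
```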

Sooner or later ALL of the copies will get terminated by a T, but each time the enzyme makes a new strand, the place it gets stopped will be random. In millions of starts, there will be strands stopping at every possible T along the way.

ALL of the strands we make started at one exact position. ALL of them end with a T. There are billions of them ... many millions at each possible T position. To find out where all the T's are in our newly synthesized strand, all we have to do is find out the sizes of all the terminated products!
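Repeating that run many times shows why the fragment sizes alone give away the T positions. The sketch below assumes the same 5% per-T termination chance and uses a made-up 13-nt new strand. Products that run off the end without picking up a ddT are simply not counted.

```python
import random
from collections import Counter


def ddT_products(new_strand: str, n_strands: int = 20_000,
                 dd_fraction: float = 0.05, seed: int = 0) -> Counter:
    """Count ddT-terminated products by length (nt after the primer)."""
    rng = random.Random(seed)
    lengths = Counter()
    for _ in range(n_strands):
        for i, base in enumerate(new_strand):
            if base == "T" and rng.random() < dd_fraction:
                lengths[i + 1] += 1  # this strand stopped at position i + 1
                break
    return lengths


strand = "GATTACATTGCAT"
t_positions = [i + 1 for i, base in enumerate(strand) if base == "T"]
products = ddT_products(strand)
print(t_positions)       # [3, 4, 8, 9, 13]
print(sorted(products))  # the same positions, recovered from sizes alone
```

Earlier T's give slightly more product, because every strand has to get past them before it can reach the later ones.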




Here's how we find out those fragment sizes.

Gel electrophoresis can be used to separate the fragments by size and measure them. In the cartoon at left, we depict the results of a sequencing reaction run in the presence of dideoxycytidine (ddC).

First, let's add one fact: the dideoxy nucleotides in my lab have been chemically modified to fluoresce when illuminated by a laser. The dideoxy-C, for example, glows blue. Now put the reaction products onto an 'electrophoresis gel' (you may need to refer to 'Gel Electrophoresis' in the Molecular Biology Glossary), and you'll see something like what's depicted at left. The smallest fragments are at the bottom and the largest at the top, and their positions and spacing show their relative sizes. At the bottom is the smallest fragment that's been terminated by ddC. That's probably the C closest to the end of the primer (the primer itself is omitted from the sequence shown). Simply by scanning up the gel, we can see that we skip two positions, and then there are two more C's in a row. Skip another, and there's yet another C. And so on, all the way up: we can see where all the C's are.
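The single-lane reading can be written out the same way. This sketch assumes an idealized gel that resolves every one-nucleotide size difference, and it counts sizes from the first base after the primer. The strand `CAGCCTCA` is hypothetical. It was chosen to match the pattern described above: a C, skip two, two C's, skip one, a C.

```python
def ddC_bands(new_strand: str) -> list[int]:
    """Sizes (nt after the primer) of the products terminated by ddC."""
    return [i + 1 for i, base in enumerate(new_strand) if base == "C"]


def read_single_lane(band_sizes, base: str = "C") -> str:
    """Scan a lane from the smallest band to the largest; '.' marks a size with no band."""
    present = set(band_sizes)
    return "".join(base if size in present else "."
                   for size in range(1, max(present) + 1))


bands = ddC_bands("CAGCCTCA")   # hypothetical new strand
print(bands)                    # [1, 4, 5, 7]
print(read_single_lane(bands))  # C..CC.C
```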


Putting all four dideoxynucleotides into the picture:

Well, OK, it's not so easy reading just C's, as you perhaps saw in the last figure. The spacing between the bands isn't all that easy to figure out. Imagine, though, that we ran the reaction with *all four* of the dideoxy nucleotides (A, G, C and T) present, and with *different* fluorescent colors on each. NOW look at the gel we'd get (at left). The sequence of the DNA is rather obvious if you know the color codes ... just read the colors from bottom to top: TGCGTCCA-(etc).

(Forgive me for using black - it shows up better than yellow).
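Reading the four-color gel amounts to sorting the bands by size and translating their colors. In this sketch, only "ddC glows blue" comes from the post, and black stands in for yellow as in the figure. The rest of the color code is hypothetical, as is the function name.

```python
# Only "ddC is blue" comes from the post (black replaces yellow in the figure);
# the other color-to-base assignments are hypothetical.
DYE_TO_BASE = {"blue": "C", "red": "T", "green": "A", "black": "G"}


def read_four_color_lane(bands) -> str:
    """`bands` holds (fragment_size, dye_color) pairs in any order.

    Sorting by size reproduces reading the gel from bottom (smallest) to
    top (largest). Each band's color names the base at that position.
    """
    return "".join(DYE_TO_BASE[color] for _, color in sorted(bands))


lane = [(3, "blue"), (1, "red"), (4, "black"), (2, "black"),
        (6, "blue"), (5, "red"), (8, "green"), (7, "blue")]
print(read_four_color_lane(lane))  # TGCGTCCA
```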



An Automated sequencing gel:

That's exactly what we do to sequence DNA, then - we run DNA replication reactions in a test tube, but in the presence of trace amounts of all four of the dideoxy terminator nucleotides. Electrophoresis is used to separate the resulting fragments by size and we can 'read' the sequence from it, as the colors march past in order.

In a large-scale sequencing lab, we use a machine to run the electrophoresis step and to monitor the different colors as they come out. Since about 2001, these machines (not surprisingly called automated DNA sequencers) have used 'capillary electrophoresis'. The fragments are driven through a thin, hair-like glass capillary during the electrophoresis step, and they come out the far end in size order. A laser built into the machine shines through the liquid emerging from the end of the capillaries, and the machine watches for pulses of fluorescent color. In the most common type of sequencer, as many as 96 samples may be moving through as many capillaries ('lanes') at once.

At left is a screen shot of a real fragment of sequencing gel (this one from an older model of sequencer, but the concepts are identical). The four colors red, green, blue and yellow each represent one of the four nucleotides.

The actual gel image, if you could get a monitor large enough to see it all at this magnification, would be perhaps 3 or 4 meters long and 30 or 40 cm wide.



A 'Scan' of one gel lane:

We don't even have to 'read' the sequence from the gel: the computer does that for us! Below is an example of what the sequencer's computer shows us for one sample. This is a plot of the colors detected in one 'lane' of a gel (one sample), scanned from smallest fragments to largest. The computer even interprets the colors by printing the nucleotide sequence across the top of the plot. This is just a fragment of the entire file, which would span around 900 nucleotides of accurate sequence.

The sequencer also gives the operator a text file containing just the nucleotide sequence, without the color traces.
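For a feel of what "the computer does that for us" involves, here is a deliberately naive base caller. It assumes the trace has already been split into four clean, evenly spaced dye channels, and it simply calls the brightest channel at each local peak. Real base-calling software (Phred is a well-known example) is considerably more careful about uneven peak spacing and overlapping peaks, and it attaches a quality value to each call. Treat this as an illustration only. `call_bases` and its `threshold` parameter are hypothetical.

```python
def call_bases(trace: dict[str, list[float]], threshold: float = 0.5) -> str:
    """Naive base caller for a four-channel trace.

    `trace` maps each base to its dye intensity at each time point (all lists
    the same length). At every time point we note the brightest channel, and
    each peak in that brightest signal above `threshold` becomes one call.
    """
    bases = "ACGT"
    n = len(trace["A"])
    brightest = [max(bases, key=lambda b: trace[b][t]) for t in range(n)]
    signal = [trace[brightest[t]][t] for t in range(n)]
    calls = []
    for t in range(1, n - 1):
        is_peak = signal[t] > signal[t - 1] and signal[t] >= signal[t + 1]
        if is_peak and signal[t] >= threshold:
            calls.append(brightest[t])
    return "".join(calls)


# Synthetic trace: one sharp peak every 3 time points for the sequence TGCA.
n = 14
trace = {b: [0.0] * n for b in "ACGT"}
for k, base in enumerate("TGCA"):
    center = 3 * k + 2
    trace[base][center - 1] = 0.3
    trace[base][center] = 1.0
    trace[base][center + 1] = 0.3
print(call_bases(trace))  # TGCA
```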


As you have seen, we can get the sequence of a fragment of DNA as long as 900 or so nucleotides. Great! But what about longer pieces? The human genome is 3 *billion* bases long, arranged on 23 pairs of chromosomes. Our sequencing machine reads just a drop in the bucket compared to what we really need!
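The arithmetic behind "a drop in the bucket" is quick to check using the round numbers above. This gives only a lower bound. Reads placed at random overlap and leave gaps, so real projects sequence every base several times over. The 8x figure below is an illustrative choice, not a number from this post.

```python
import math

GENOME_BASES = 3_000_000_000  # ~3 billion bases, as stated above
READ_LENGTH = 900             # ~900 nt of accurate sequence per read


def reads_needed(coverage: float = 1.0) -> int:
    """Minimum number of reads whose combined length is `coverage` x the genome."""
    return math.ceil(coverage * GENOME_BASES / READ_LENGTH)


print(reads_needed())   # 3333334
print(reads_needed(8))  # 26666667 (8x is an illustrative redundancy level)
```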

To sequence it, we break the entire genome up into manageable pieces and sequence them. There are two approaches currently in use:

* The Publicly Funded Human Genome Project: The National Institutes of Health and the National Science Foundation have funded the creation of 'libraries' of BAC clones. Each BAC carries a large piece of human genomic DNA, on the order of 100-300 kb. The BACs overlap randomly, so any one gene is probably on several different overlapping BACs. We can replicate those BACs as many times as necessary, so there's a virtually endless supply of each large human DNA fragment.

In the publicly funded project, the BACs are subjected to shotgun sequencing (see below) to figure out their sequence. By sequencing all the BACs, we know enough of the sequence in overlapping segments to reconstruct how the original chromosome sequence looks.

* A Privately Funded Sequencing Project: Celera Genomics. An innovative approach to sequencing the human genome has been pioneered by Celera Genomics. The founders of this company realized that it might be possible to skip the entire step of making libraries of BAC clones. Instead, they blast apart the entire human genome into fragments of 2-10 kb and sequence those. Now the challenge is to assemble those fragments of sequence into the whole genome sequence.

Imagine, for example, that you have hundreds of 500-piece puzzles, each being assembled by a team of puzzle experts using puzzle-solving computers. Those puzzles are like BACs: smaller puzzles that make a big genome manageable. Now imagine that Celera throws all those puzzles together into one room and scrambles the pieces. They, however, have scanners that scan all the puzzle pieces and huge computers that figure out where they all go.

It is still controversial whether the Celera approach will succeed on a puzzle as large as the human genome. Whether it does or not, they have certainly stirred up the intellectual pot a bit.
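To see why any one gene is probably on several different overlapping BACs, here's a small simulation. It drops BACs of 100-300 kb (the size range given above) at random positions along a hypothetical 10 Mb stretch of chromosome. It then counts how many BACs cover each 1-kb bin. The region size, the number of BACs and the uniform placement are all assumptions made for illustration.

```python
import random


def bac_depth(region_kb: int = 10_000, n_bacs: int = 300,
              min_kb: int = 100, max_kb: int = 300, seed: int = 1) -> list[int]:
    """How many randomly placed BACs cover each 1-kb bin of a region."""
    rng = random.Random(seed)
    depth = [0] * region_kb
    for _ in range(n_bacs):
        size = rng.randint(min_kb, max_kb)        # 100-300 kb insert, as in the text
        start = rng.randint(0, region_kb - size)  # random, overlapping placement
        for kb in range(start, start + size):
            depth[kb] += 1
    return depth


depth = bac_depth()
print(sum(depth) / len(depth))  # average depth, about 6 with these settings
print(depth[5_000])             # BACs covering one hypothetical gene at 5 Mb
```

With 300 BACs averaging about 200 kb spread over 10,000 kb, each bin is covered by roughly six BACs on average. Bins near the ends of the region get fewer, because a BAC can't hang off the edge in this model.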



Shotgun sequencing: assembly of random sequence fragments
To sequence a BAC, we take millions of copies of it and chop them all up randomly. We then insert the fragments into plasmids, grow each resulting plasmid clone in bacteria, and sequence its insert. If we do this for enough fragments, we'll eventually be able to reconstruct the sequence of the original BAC from the overlaps among the fragments we've sequenced!
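The reconstruction step can be illustrated with a greedy overlap assembler. This is one simple strategy, not necessarily what any particular genome center used: it repeatedly merges the two reads that share the longest suffix/prefix overlap. The 31-nt "BAC" and the read layout are made up, and the sequence was chosen so that no 7-mer occurs twice. Greedy merging is easily fooled by repeats longer than the overlaps, and real genomes contain plenty of those.

```python
import random


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` long.
    """
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the resulting contigs. More than one contig means some gaps
    could not be bridged.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # A read lying entirely inside another adds nothing to the assembly.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads)
                 if k not in (best_i, best_j) and r not in merged] + [merged]
    return reads


genome = "ATGGCGTACGCTTAGCCATTGACCGTAAGCT"  # made-up 31-nt "BAC"
reads = [genome[i:i + 10] for i in range(0, 22, 3)]  # 10-nt reads every 3 nt
random.Random(7).shuffle(reads)  # shotgun reads arrive in no particular order
print(greedy_assemble(reads))    # ['ATGGCGTACGCTTAGCCATTGACCGTAAGCT']
```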

1 comment:

454 sequencing technology said...

When a customer sends a large batch of samples to a commercial sequencing company, they are often told that some samples could not give a result, gave poor-quality results, or that no map could be provided. How does this happen?
