RiboKit : HiTRACE

High-Throughput Robust Analysis for Capillary Electrophoresis

Step 4: Sequence Assignment


Now we’re ready for the sequence assignment. The positions of the various bands are stored in a vector xsel. Let’s initiate it with empty []:

xsel = []; clf;

Only run this line at first time! It creates an empty xsel. If a variable named xsel already exists in the current workspace, its content will be erased. Do not run this line when you are revisiting this step!

clf clears a figure. It makes sure the interactive interface will inherit a full-size figure window.

The goal is to specify the y-coordinates for each sequence position on the CE trace matrix, like the one showed in the overview.

Let’s start the interactive annotation. The core command is annotate_sequence. It brings up an interactive window, with your mouse as a crosshair.

[xsel, seqpos, area_pred] = annotate_sequence(d_align, xsel, sequence, offset, data_types, first_RT_nucleotide, structure);

annotate_sequence Figure empty

Actions via keystrokes or mouse clicks are listed here:

Keys Actions
left click Add a sequence position to xsel at the y-coordinate clicked.
middle click, e Erase the seqeuence position closest to where mouse clicked.
up/down, i/k Move the view up or down.
-/+, j/l Zoom in or out (vertically) of the view.
1/2 Adjust the display contrast darker or lighter.
x, y Run auto-assign algorithm.
r Reset all sequence positions, i.e. clear out xsel and start over.
q Save xsel and quit interactive session.

More guides on how to operate:

annotate_sequence() supports a mode that disables the “close window” button thus preventing from accidental close. To use it, call the command with the argument JUST_PLOT = 2. See help annotate_sequence for more details.

In JUST_PLOT = 2 mode, if error occurs and you want to close the disabled window, use command close_window().


Get started with some help!

Admittedly, this step is the most time-consuming (annoying) part of the entire HiTRACE pipeline. For starters, you may want to lend some help from the auto-assign algorithm that is aimed to do this for you. Although it does not get the job done perfectly, it is a good starting point that gonna save you a lot of time and trouble.

You only need to click 2 bands for it!

These 2 coordinates will be interpreted as the first and last band corresponding to your sequence. This helps the auto-assign algorithm in narrowing down the search space. Now hit x.

annotate_sequence Figure 2 bands

Once it finishes the calculation, you will see sequence positions filled out for you:

annotate_sequence Figure Auto-assign

The horizontal lines are colored based on sequence identity (A, U, C, G). The red circles are where reactive bands are predicted to appear based on the combination of sequence, structure, and data_types. For example, an A residue that is unpaired is predicted to be reactive in a DMS experiment; while any residue that is base-paired is predicted to be unreactive in a SHAPE experiment. For ddNTPs ladder lanes, the circles correspond to sequence identity as in an old-fasion sequencing gel.

Note that there is a single-register shift between modifier profiles and ddNTPs profiles for the same nucleotide position. This is due to the nature of reverse transcription (RT): 1) RT stops BEFORE incorporation at the covalently modified nucleotide; 2) RT stops AFTER incorportaion of a ddNTP nucleotide.

Please remember that HiTRACE is aware of such single-register shift! It has shifted the numbers by 1 for you, so you only need to seek for a visual match between colors/circles and gel bands. If you read the labels on the figure carefully, you will find they are shifted by 1 from the circles.

Not done, yet …

As mentioned before, the auto-assign result need to be double checked before acceptance. Here are some guidelines on what to look out for:

Depending on where the modification is, RT can produce a cDNA whose length corresponds to the modified sequence position. Besides these S (length of sequence) products, there is one that corresponds to when no modification occurred, which is the full-length extended product. Under single-hit kinetic conditions, the majority of RNA transcripts are not modified. Thus, this bottom band is very strong.

We need to place a xsel band at the full-length extended band for attenuation correction in Step #6. It requires the intensity of this full-length extended band.

Remember, the ddNTPs ladders are only used for assisting you in this step by tracking where the sequence goes. The band intensities of ddNTP ladders are not of interest for later quantitation.

It is common that a region of consecutive G residues causes local compression. These bands may appear closer to each other than average.

Play with the contrast to reveal less obvious “hints” from the gel to help you locate. Since CE is gel-based, the spacing of the top (shorter fragments) portion of the traces is slightly larger.

If you have the GAGUA reference hairpin at both ends (or either end) of your RNA transcript, they serve as good checkpoints for your sequence assignment.

annotate_sequence Figure GAGUA 5' annotate_sequence Figure GAGUA 3'

All 5 residues of GAGUA are reactive to SHAPE; (only) both As are reactive to DMS; and only the U residue is reactive to CMCT.

Example of local shifts

Here are examples showing symptoms of incorrect sequence assignment.

annotate_sequence Missing 1 Band annotate_sequence Having 1 Extra Band

Once finished, the annotate_sequence() command returns the following variables:

Variable Type Description
xsel 1x(S+1) double Band y-coordinates from the sequence assignment. Note that the command takes xsel as an argument, and updates it with your changes and returns it. S is the length of sequence.
seqpos 1x(S+1) int Numbers of sequence positions from 5′ to 3′ end.
area_pred SxN int The predicted band positions, used for red circles drawing.

As an example of 2D data analysis, the commands are the same as 1D for this step. However, there is one difference:

Yes, for 2D datasets, you should aim for 120/120, excluding the bottom full-length extra one. This is because:

In Mutate-and-Map data, we process it into Z_score as pseudo-free energy bonus instead of absolute reactivity. The calculation of Z_score is row by row (nucleotide by nucleotide), with the attenuation cancelled out. Thus, there is no need to include the full-length extended band for quantitation.

annotate_sequence Figure 2D

Because we are using number strings in data_types, the red circles mark where the predicted pertubation would show up based on structure and mutpos. It is obvious that these red circles form a main diagonal marking mutations (bottom left -> top right), as well as cross-diagonals marking helices based on given structure.

Built with Jekyll using a RiboKit Theme . Hosted on GitHub Pages.