High-Throughput Robust Analysis for Capillary Electrophoresis
Now that the profiles are reasonably well aligned and stored in the matrix d_align, we need to assign which bands correspond to which sequence positions. We need to provide more information about the experiment before proceeding to Step #4.
First, we declare the sequence, whose letters will show up during band annotation:
sequence = 'GGAGACCTCGAGTAGAGGTCAAAAGGGTCGTGACTGGCGAACAGGTGGGAAACCACCGGGGAGCGACCCCGGCATCGATAGCCGCCCGCCTGGGCAAACAACTCGAGTAGAGTTGACAACAAAGAAACAACAACAACAAC';
sequence = strrep(sequence, 'T', 'U');
Note that we replace ‘T’ with ‘U’ because this is an RNA sequence.
Next, we declare the structure, represented in dot-bracket notation. We recommend using a reference/crystallographic secondary structure, or whichever hypothesis you are testing. Use an empty string '' if you have no structure to test. We do not recommend using untested predictions.
structure = '...((((((.....))))))....(((((((....[[[[....(((((....))))).....)))))))...........(((..]]]]...)))...((((((.....)))))).........................';
Make sure structure is the same length as sequence. Use [] for pseudoknot base pairs.
Next, we should register any information that we have about features or data that might show up in individual lanes. This requires creating a variable data_types, which is a cell array of strings, one label for each profile/lane.
In our example 1D data, the first 8 lanes are nomod (regardless of ligand or dilution), followed by 16 lanes each of DMS, CMCT, and SHAPE, in that order. Finally, there are 2 lanes for each ddNTP (A, T, C, G). The following code declares data_types with the help of the built-in function repmat().
data_types = [ ...
repmat({'nomod'}, 1, 8), repmat({'DMS'}, 1, 16), repmat({'CMCT'}, 1, 16), repmat({'SHAPE'}, 1, 16), ...
repmat({'ddATP'}, 1, 2), repmat({'ddTTP'}, 1, 2), repmat({'ddCTP'}, 1, 2), repmat({'ddGTP'}, 1, 2), ...
];
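Because there must be exactly one label per lane, it can help to confirm the count before moving on. In this minimal sketch, the expected total of 64 is simply 8 + 3 × 16 + 4 × 2 from the lane layout described above. The final comparison against size(d_align, 2) assumes that lanes are stored as columns of d_align, so it is left commented out.

```matlab
% Rebuild the labels for the 1D example so this block runs on its own.
data_types = [ ...
    repmat({'nomod'}, 1, 8), repmat({'DMS'}, 1, 16), ...
    repmat({'CMCT'}, 1, 16), repmat({'SHAPE'}, 1, 16), ...
    repmat({'ddATP'}, 1, 2), repmat({'ddTTP'}, 1, 2), ...
    repmat({'ddCTP'}, 1, 2), repmat({'ddGTP'}, 1, 2) ...
];

% 8 nomod + 16 each of DMS/CMCT/SHAPE + 2 each of the four ddNTPs.
n_expected = 8 + 3 * 16 + 4 * 2;

% Every label must be a string, and there must be one per lane.
assert(iscellstr(data_types));
assert(length(data_types) == n_expected);

% Assumption: lanes are the columns of d_align. Uncomment if that holds.
% assert(length(data_types) == size(d_align, 2));
```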
Next, we need to set a few numbers. The first one is offset, which adjusts the sequence numbering for two reasons:
To account for the 5′ flanking sequence.
To follow conventional numbering.
In our example 1D data, we added 24 nucleotides on the 5′ end (GGAGACCTCGAGTAGAGGTCAAAA) before our region of interest (ROI), GGGTCG…. We would like the first G in the ROI to be numbered 1. Thus, the offset is:
% offset = - length_of_5_flanking_region + N_first_nt_ROI_should_be - 1
offset = -24;
first_RT_nucleotide = length(sequence) - 20 + offset;
Lastly, we need to describe where the reverse transcription primer binds. HiTRACE accepts a number describing the first nucleotide that was reverse transcribed. In this case, it's the nucleotide 20 residues from the 3′ end (the primer is 20 nucleotides long and binds to the Tail 2 sequence AAAGAAACAACAACAACAAC).
Note that first_RT_nucleotide is expressed in the numbering defined by offset.
The offset variable only matters for display. It converts the ‘natural’ numbering of nucleotides in the physical RNA sequence to the conventional numbering you are comfortable with (e.g., setting the first nucleotide of your ROI to 1, or to another number if your RNA is a small part of a larger sequence).
The 20 in first_RT_nucleotide corresponds to the length of the pairing region between Tail 2 and the reverse transcription primer. If you followed the guidelines for designing your RNA construct, you do not need to change this number.
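The numbering arithmetic can be worked through explicitly. The sketch below follows the offset formula given in the code comment above and reads it as "conventional number = natural index + offset". The variable names and the toy sequence length of 100 are illustrative assumptions, not HiTRACE inputs.

```matlab
% Values from the 1D example: 24 flanking nucleotides, ROI starts at 1.
length_of_5_flanking_region = 24;
N_first_nt_ROI_should_be = 1;
offset = -length_of_5_flanking_region + N_first_nt_ROI_should_be - 1;

% The first ROI nucleotide sits right after the flank in natural numbering.
natural_index_first_ROI = length_of_5_flanking_region + 1;

% Adding the offset gives the conventional number we asked for.
conventional_first_ROI = natural_index_first_ROI + offset;

% Toy construct length (hypothetical) with a 20-nt primer binding site.
seq_length_toy = 100;
primer_length = 20;
first_RT_toy = seq_length_toy - primer_length + offset;

fprintf('offset = %d, first ROI nt = %d, first RT nt (toy) = %d\n', ...
    offset, conventional_first_ROI, first_RT_toy);
```

If the ROI should instead start at some other number N, only N_first_nt_ROI_should_be changes; the flank length and primer length stay the same.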
The 2D data analysis example is similar to the 1D case for this step, except for data_types.
sequence = 'GGAGACCTCGAGTAGAGGTCAAAAGGGTCGTGACTGGCGAACAGGTGGGAAACCACCGGGGAGCGACCCCGGCATCGATAGCCGCCCGCCTGGGCAAACAACTCGAGTAGAGTTGACAACAAAGAAACAACAACAACAAC';
sequence = strrep(sequence, 'T', 'U');
structure = '...((((((.....))))))....(((((((....[[[[....(((((....))))).....)))))))...........(((..]]]]...)))...((((((.....)))))).........................';
offset = -24;
first_RT_nucleotide = length(sequence) - 20 + offset;
For a Mutate-and-Map experiment, in which we have comprehensively created all single mutants of an RNA, it's often helpful to have marks show up at the mutated positions, since there are typically obvious perturbations there. Thus, the modifier label is less important than the mutation positions (historically stored in mutpos, which is no longer in use). Instead of repeating 'SHAPE' 72 times for the 2D dataset, we should specify data_types as:
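data_types{1} = 'NaN';   % lane 1 is the wild type (no mutation)
for i = 2:72
    % lanes 2..72 carry mutation positions 1..71, stored as strings
    data_types{i} = num2str(i - 1);
end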
Note that data_types takes strings, not numbers, so use num2str to convert values.
Use 'NaN' for the first lane in Mutate-and-Map data (which is the wild type).
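One convenient property of string labels like these is that they convert back to numbers with the built-in str2double, which maps 'NaN' to NaN. The sketch below is plain MATLAB for illustration; mut_positions and is_wild_type are hypothetical names, not HiTRACE variables, and how HiTRACE itself parses the labels is not shown here.

```matlab
% Build the Mutate-and-Map labels: wild type first, then positions 1..71.
data_types = cell(1, 72);
data_types{1} = 'NaN';
for i = 2:72
    data_types{i} = num2str(i - 1);
end

% Every entry must be a string, never a number.
assert(iscellstr(data_types));

% Convert the labels back to numbers; 'NaN' becomes NaN.
mut_positions = str2double(data_types);

% Lanes without a mutation position (here only the wild type) show as NaN.
is_wild_type = isnan(mut_positions);
```

A label that was accidentally stored as a number would fail the iscellstr check, which is an easy way to catch a forgotten num2str.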