TCCC '23 Sept P4 - DNA Lengths
View as PDFThornhill Computer Club 2023 - September Contest - Problem 4
As part of his final culminating assignment, has been tasked to pick and translate a single amino acid sequence out of a long template DNA strand.
For every DNA template strand, a mRNA strand is created using the complementary base pairs of the template DNA strand. Every base pair of a DNA strand is translated into RNA as follows:
A = U
T = A
C = G
G = C
After converting the DNA’s base pairs to mRNA, will have to break the mRNA strand into groups of triplet base pairs called codons which each code for a single amino acid inside of the amino acid sequence beginning with the start codon AUG, and ending with a stop codon with either UAA, UAG, or UGG. The start codon must be followed by an end codon to form a complete amino acid sequence. A new sequence cannot begin until the previous sequence has ended.
However, with his newfound biology knowledge, has instead decided to scout out the longest amino acid sequence to apply his impeccable codon translating skill.
will need his program to find and return the longest amino acid sequence represented by its mRNA codons separated by dashes (-) from the given template DNA strand starting at the start codon and ending before the end codon.
Input Specification
The first and only line of input will contain a string representing a DNA sequence of no greater than 1000 characters.
Output Specification
The first and only line of output should be the longest amino acid sequence translated into mRNA codons separated by dashes (-).
Sample Input 1
GGGAGACTTACTAAGACGACTTAACACTGCGAATTGTGAAGGCAGTGTAGAGCACTTGTCCTTGTATCACGCAGTGTTTATCCCAGGTCACACAACACCG
Sample Output 1
AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU
Explanation of Sample Output 1
First, we translate the DNA into RNA by converting each base pair into the opposing base pair using the key above.
CCCUCUGAAUGAUUCUGCUGAAUUGUGACGCUUAACACUUCCGUCACAUCUCGUGAACAGGAACAUAGUGCGUCACAAAUAGGGUCCAGUGUGUUGUGGC
Then we look for the first occurrence of a start codon AUG and divide the remaining string into codons until we reach one of the possible end codons, in this case UAA.
AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU-UAA-CACUUCCGUCACAUCUCGUGAACAGGAACAUAGUGCGUCACAAAUAGGGUCCAGUGUGUUGUGGC
We then remove the remaining sequence including the end codon to obtain a possible amino acid sequence.
AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU
This process is repeated on the sequence following the above amino acid until no more possible amino acid sequences remaining.
In this case, there is only one possible sequence and is therefore the longest one.
Sample Input 2
TCCATCATTTCCTACTCCATTATTCTTACTTACCTCCTCCACATCCTCCTCCTCCTCCTCCTACTCCGATTTCCATCTACATTATCATTTACTGTACTCCGATTATCATACATTTACTCCTATCCAGGATCCTACCTACTTCTACTACTACCTAATTTCCGATCGATCGTACTCCATCATCCTCCTATTTAAATCTACTACTATTATCCATCATTACATT
Sample Output 2
AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC
Explanation for Sample Output 2
Translating the sequence gives us
AGGGUAGUAAAGGAUGAGGUAAUAAGAAUGAAUGGAGGAGGUGUAGGAGGAGGAGGAGGAGGAUGAGGCUAAAGGUAGAUGUAAUAGUAAAUGACAUGAGGCUAAUAGUAUGUAAAUGAGGAUAGGUCCUAGGAUGGAUGAAGAUGAUGAUGGAUUAAAGGCUAGCUAGCAUGAGGUAGUAGGAGGAUAAAUUUAGAUGAUGAUAAUAGGUAGUAAUGUAA
Extracting all possible amino acid sequences gives us these possible sequences
AUG-AGG | AUG-AAU-GGA-GGA-GGU-GUA-GGA-GGA-GGA-GGA-GGA-GGA-UGA-GGC | AUG | AUG-ACA-UGA-GGC | AUG | AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC | AUG-AGG
Out of these possible sequences AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC is the longest and is, therefore, the solution.
Comments