Thornhill Computer Club 2023 - September Contest - Problem 4
As part of his final culminating assignment,
has been tasked to pick and translate a single amino acid sequence out of a long template DNA strand.For every DNA template strand, a mRNA strand is created using the complementary base pairs of the template DNA strand. Every base pair of a DNA strand is translated into RNA as follows:
A
= U
T
= A
C
= G
G
= C
After converting the DNA’s base pairs to mRNA, AUG
, and ending with a stop codon with either UAA
, UAG
, or UGG
. The start codon must be followed by an end codon to form a complete amino acid sequence. A new sequence cannot begin until the previous sequence has ended.
However, with his newfound biology knowledge,
will need his program to find and return the longest amino acid sequence represented by its mRNA codons separated by dashes (-
) from the given template DNA strand starting at the start codon and ending before the end codon.
Input Specification
The first and only line of input will contain a string representing a DNA sequence of no greater than 1000 characters.
Output Specification
The first and only line of output should be the longest amino acid sequence translated into mRNA codons separated by dashes (-).
Sample Input 1
GGGAGACTTACTAAGACGACTTAACACTGCGAATTGTGAAGGCAGTGTAGAGCACTTGTCCTTGTATCACGCAGTGTTTATCCCAGGTCACACAACACCG
Sample Output 1
AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU
Explanation of Sample Output 1
First, we translate the DNA into RNA by converting each base pair into the opposing base pair using the key above.
CCCUCUGAAUGAUUCUGCUGAAUUGUGACGCUUAACACUUCCGUCACAUCUCGUGAACAGGAACAUAGUGCGUCACAAAUAGGGUCCAGUGUGUUGUGGC
Then we look for the first occurrence of a start codon AUG
and divide the remaining string into codons until we reach one of the possible end codons, in this case UAA
.
AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU-UAA-CACUUCCGUCACAUCUCGUGAACAGGAACAUAGUGCGUCACAAAUAGGGUCCAGUGUGUUGUGGC
We then remove the remaining sequence including the end codon to obtain a possible amino acid sequence.
AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU
This process is repeated on the sequence following the above amino acid until no more possible amino acid sequences remaining.
In this case, there is only one possible sequence and is therefore the longest one.
Sample Input 2
TCCATCATTTCCTACTCCATTATTCTTACTTACCTCCTCCACATCCTCCTCCTCCTCCTCCTACTCCGATTTCCATCTACATTATCATTTACTGTACTCCGATTATCATACATTTACTCCTATCCAGGATCCTACCTACTTCTACTACTACCTAATTTCCGATCGATCGTACTCCATCATCCTCCTATTTAAATCTACTACTATTATCCATCATTACATT
Sample Output 2
AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC
Explanation for Sample Output 2
Translating the sequence gives us
AGGGUAGUAAAGGAUGAGGUAAUAAGAAUGAAUGGAGGAGGUGUAGGAGGAGGAGGAGGAGGAUGAGGCUAAAGGUAGAUGUAAUAGUAAAUGACAUGAGGCUAAUAGUAUGUAAAUGAGGAUAGGUCCUAGGAUGGAUGAAGAUGAUGAUGGAUUAAAGGCUAGCUAGCAUGAGGUAGUAGGAGGAUAAAUUUAGAUGAUGAUAAUAGGUAGUAAUGUAA
Extracting all possible amino acid sequences gives us these possible sequences
AUG-AGG | AUG-AAU-GGA-GGA-GGU-GUA-GGA-GGA-GGA-GGA-GGA-GGA-UGA-GGC | AUG | AUG-ACA-UGA-GGC | AUG | AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC | AUG-AGG
Out of these possible sequences AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC
is the longest and is, therefore, the solution.
Comments