TCCC '23 Sept P4 - DNA Lengths

View as PDF

Submit solution

Points: 5
Time limit: 1.0s
Memory limit: 128M

Author:
Problem types

Thornhill Computer Club 2023 - September Contest - Problem 4

As part of his final culminating assignment, JoePeewee has been tasked to pick and translate a single amino acid sequence out of a long template DNA strand.

For every DNA template strand, a mRNA strand is created using the complementary base pairs of the template DNA strand. Every base pair of a DNA strand is translated into RNA as follows:

A = U
T = A
C = G
G = C

After converting the DNA’s base pairs to mRNA, JoePeewee will have to break the mRNA strand into groups of triplet base pairs called codons which each code for a single amino acid inside of the amino acid sequence beginning with the start codon AUG, and ending with a stop codon with either UAA, UAG, or UGG. The start codon must be followed by an end codon to form a complete amino acid sequence. A new sequence cannot begin until the previous sequence has ended.

However, with his newfound biology knowledge, JoePeewee has instead decided to scout out the longest amino acid sequence to apply his impeccable codon translating skill.
JoePeewee will need his program to find and return the longest amino acid sequence represented by its mRNA codons separated by dashes (-) from the given template DNA strand starting at the start codon and ending before the end codon.

Input Specification

The first and only line of input will contain a string representing a DNA sequence of no greater than 1000 characters.

Output Specification

The first and only line of output should be the longest amino acid sequence translated into mRNA codons separated by dashes (-).

Sample Input 1

GGGAGACTTACTAAGACGACTTAACACTGCGAATTGTGAAGGCAGTGTAGAGCACTTGTCCTTGTATCACGCAGTGTTTATCCCAGGTCACACAACACCG

Sample Output 1

AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU

Explanation of Sample Output 1

First, we translate the DNA into RNA by converting each base pair into the opposing base pair using the key above.

CCCUCUGAAUGAUUCUGCUGAAUUGUGACGCUUAACACUUCCGUCACAUCUCGUGAACAGGAACAUAGUGCGUCACAAAUAGGGUCCAGUGUGUUGUGGC

Then we look for the first occurrence of a start codon AUG and divide the remaining string into codons until we reach one of the possible end codons, in this case UAA.

AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU-UAA-CACUUCCGUCACAUCUCGUGAACAGGAACAUAGUGCGUCACAAAUAGGGUCCAGUGUGUUGUGGC

We then remove the remaining sequence including the end codon to obtain a possible amino acid sequence.

AUG-AUU-CUG-CUG-AAU-UGU-GAC-GCU

This process is repeated on the sequence following the above amino acid until no more possible amino acid sequences remaining.
In this case, there is only one possible sequence and is therefore the longest one.

Sample Input 2

TCCATCATTTCCTACTCCATTATTCTTACTTACCTCCTCCACATCCTCCTCCTCCTCCTCCTACTCCGATTTCCATCTACATTATCATTTACTGTACTCCGATTATCATACATTTACTCCTATCCAGGATCCTACCTACTTCTACTACTACCTAATTTCCGATCGATCGTACTCCATCATCCTCCTATTTAAATCTACTACTATTATCCATCATTACATT

Sample Output 2

AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC

Explanation for Sample Output 2

Translating the sequence gives us

AGGGUAGUAAAGGAUGAGGUAAUAAGAAUGAAUGGAGGAGGUGUAGGAGGAGGAGGAGGAGGAUGAGGCUAAAGGUAGAUGUAAUAGUAAAUGACAUGAGGCUAAUAGUAUGUAAAUGAGGAUAGGUCCUAGGAUGGAUGAAGAUGAUGAUGGAUUAAAGGCUAGCUAGCAUGAGGUAGUAGGAGGAUAAAUUUAGAUGAUGAUAAUAGGUAGUAAUGUAA

Extracting all possible amino acid sequences gives us these possible sequences

AUG-AGG | AUG-AAU-GGA-GGA-GGU-GUA-GGA-GGA-GGA-GGA-GGA-GGA-UGA-GGC | AUG | AUG-ACA-UGA-GGC | AUG | AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC | AUG-AGG

Out of these possible sequences AUG-AGG-AUA-GGU-CCU-AGG-AUG-GAU-GAA-GAU-GAU-GAU-GGA-UUA-AAG-GCU-AGC is the longest and is, therefore, the solution.


Comments

There are no comments at the moment.