EternaBot is an RNA secondary structure design algorithm created by players of the Eterna project.

Eterna players have designed and experimentally tested over 700 sequences. Based on the experimental results, they have proposed a set of rules for robust RNA design.

EternaBot is built to design a sequence based on those rules. It compiles each design rule as a scoring function. The bot then tries to create a sequence that maximizes the combination of the scoring functions.
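
As a rough illustration of how per-rule scores might be combined, the sketch below assumes the combination is a weighted sum, using the five "Weight" values listed for the rules on this page. The rule names, `combined_score`, and `pick_best` are hypothetical names chosen for this example, not EternaBot's actual code.

```python
# Hypothetical rule names; the numbers are the per-rule "Weight" values on this page.
RULE_WEIGHTS = {
    "a_basic_test": 0.092,
    "modified_berex_test": 0.36,
    "gc_pair_direction_in_multiloops": 0.21,
    "clean_plot_stack_caps_safe_gc": 0.12,
    "yellow_nucleotides_per_string": 0.22,
}


def combined_score(rule_scores, weights=RULE_WEIGHTS):
    """Weighted sum of per-rule scores (the weighted-sum form is an assumption)."""
    return sum(weights[name] * rule_scores[name] for name in weights)


def pick_best(candidates, score_rules):
    """Return the candidate sequence with the highest combined score.

    score_rules is a hypothetical callable: sequence -> {rule name: score}.
    """
    return max(candidates, key=lambda seq: combined_score(score_rules(seq)))
```

A real design loop would generate candidates by mutating a sequence rather than choosing from a fixed list; that search strategy is outside the scope of this sketch.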

## Player-designed rules used in EternaBot 2.0

Rules were selected with Least Angle Regression, and their weights were determined by linear regression. For each rule, the description gives the original design rule statement as written by the participant, and the pseudocode gives the scoring function coded from that rule. In the scoring functions, the notation "Number [Number]" gives the originally proposed parameter followed by the optimized parameter in brackets. Parameters were optimized with the downhill simplex algorithm provided by scipy.optimize.fmin, minimizing the average squared error between the predicted and actual structure mapping for the training set.
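
The parameter optimization step can be sketched with `scipy.optimize.fmin` on a toy problem. The training data below is synthetic, generated only for illustration, and the single-term rule, `rule_score`, and `mean_squared_error` are hypothetical stand-ins for a real scoring function and the real training set.

```python
import numpy as np
from scipy.optimize import fmin

# Synthetic training set (made up for illustration): UA-pair fraction -> score.
ua_fraction = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
measured = 100.0 - np.abs(ua_fraction - 0.42) * 93.04


def rule_score(params, fraction):
    """One-term scoring function with a target fraction and a penalty scale."""
    target, scale = params
    return 100.0 - np.abs(fraction - target) * scale


def mean_squared_error(params):
    """Average squared error between predicted and measured scores."""
    return float(np.mean((rule_score(params, ua_fraction) - measured) ** 2))


proposed = np.array([0.50, 100.00])  # starting point: the originally proposed values
optimized = fmin(mean_squared_error, proposed, disp=False)  # downhill simplex

initial_mse = mean_squared_error(proposed)
final_mse = mean_squared_error(optimized)
```

Downhill simplex needs no gradients, which suits scoring functions built from `abs` and `max` terms that are not differentiable everywhere.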

##### Title: A Basic Test
###### Player Description

Let's try out the Strategy Market feature with some simple criteria...

50% of pairs are UA
Free energy = –1.5 * number of pairs [e.g. 48 kcal if there are 32 pairs]
Melting point between 77 and 97°C

###### Pseudocode

$$\begin{array}{l} score\, =\, 100\\ score\, =\, score\, -\, abs(\frac{(\text{number of UA pairs})}{(\text{total number of pairs})}\, -\, 0.50 [0.42]) * 100.00 [93.04]\\ score\, =\, score\, -\, abs(-1.50 [-1.87] * (\text{total number of pairs})\, -\, (\text{free energy})) * 1.00 [1.15]\\ score\, =\, score\, -\, max (\\ \qquad 77.00 [63.60]\, -\, (\text{melting point}),\\ \qquad (\text{melting point})\, -\, 97.00 [102.00],\\ \qquad 0\\ ) * 1.00 [0.94]\\ \end{array}$$
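
The pseudocode above translates fairly directly into Python. The sketch below is one possible transcription; `BasicTestParams`, `PROPOSED`, `OPTIMIZED`, and `basic_test_score` are names invented for this example, and the free energy and melting point are assumed to come from an external folding tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicTestParams:
    ua_target: float        # desired fraction of UA pairs
    ua_scale: float         # penalty per unit deviation in UA fraction
    energy_per_pair: float  # expected free energy per base pair
    energy_scale: float     # penalty per unit of free energy deviation
    melt_low: float         # lower edge of the melting point band
    melt_high: float        # upper edge of the melting point band
    melt_scale: float       # penalty per degree outside the band


# "Number [Number]" pairs from the pseudocode: proposed, then optimized.
PROPOSED = BasicTestParams(0.50, 100.00, -1.50, 1.00, 77.00, 97.00, 1.00)
OPTIMIZED = BasicTestParams(0.42, 93.04, -1.87, 1.15, 63.60, 102.00, 0.94)


def basic_test_score(ua_pairs, total_pairs, free_energy, melting_point, p=PROPOSED):
    """Score a design from precomputed pair counts, free energy, and melting point."""
    score = 100.0
    score -= abs(ua_pairs / total_pairs - p.ua_target) * p.ua_scale
    score -= abs(p.energy_per_pair * total_pairs - free_energy) * p.energy_scale
    score -= max(p.melt_low - melting_point, melting_point - p.melt_high, 0.0) * p.melt_scale
    return score
```

For example, with the proposed parameters a design with 16 UA pairs out of 32, a free energy of -48, and a melting point of 85 incurs no penalty in any of the three terms.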

Weight: 0.092

Original Proposal
##### Title: Modified Berex Test
###### Player Description

Melting Point between 97 and 107
Free Energy between –30 and –60
G bases of 22%
U bases of 13%
C bases of 20%

###### Pseudocode

$$\begin{array}{l} score\, =\, 100\\ score\, =\, score\, -\, abs((\text{number of Gs})/(\text{sequence length})\, -\, 0.22 [0.15])*100.00 [116.23]\\ score\, =\, score\, -\, abs((\text{number of Us})/(\text{sequence length})\, -\, 0.13 [0.07])*100.00 [129.23]\\ score\, =\, score\, -\, abs((\text{number of Cs})/(\text{sequence length})\, -\, 0.20 [0.19])*100.00 [117.62]\\ length\_weight\, =\, \frac{2.5}{\sqrt{2\pi}}e^{\frac{-((\text{sequence length})\, -\, 100)^2}{5000}}\\ \text{if} (\text{free energy}) < -60.00 [-68.34]\\ \qquad score\, =\, score\, -\, abs((\text{free energy})\, -\, (-60.00 [-68.34]))*1.00 [1.12]*length\_weight\\ \text{if} (\text{free energy}) > -30.00 [-30.20]\\ \qquad score\, =\, score\, -\, abs((\text{free energy})\, -\, (-30.00 [-30.20]))*1.00 [1.12]*length\_weight\\ \text{if} (\text{melting point}) < 97.00 [35.47]\\ \qquad score\, =\, score\, -\, abs((\text{melting point})\, -\, 97.00 [35.47])*1.00 [1.36]*length\_weight\\ \text{if} (\text{melting point}) > 107.00 [133.66]\\ \qquad score\, =\, score\, -\, abs((\text{melting point})\, -\, 107.00 [133.66])*1.00 [1.36]*length\_weight\\ \end{array}$$
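
A minimal Python sketch of this scoring function follows, using the proposed parameter values as defaults. It assumes each band edge serves as both the comparison threshold and the reference point for the penalty, and that free energy and melting point are computed elsewhere. The names `length_weight` and `berex_score` and the argument layout are choices made for this example.

```python
import math
from collections import Counter


def length_weight(n):
    """Gaussian-shaped weight centred on a sequence length of 100."""
    return 2.5 / math.sqrt(2 * math.pi) * math.exp(-((n - 100) ** 2) / 5000)


def berex_score(sequence, free_energy, melting_point,
                base_targets=(("G", 0.22, 100.0), ("U", 0.13, 100.0), ("C", 0.20, 100.0)),
                energy_band=(-60.0, -30.0), energy_scale=1.0,
                melt_band=(97.0, 107.0), melt_scale=1.0):
    """Penalize base composition, free energy, and melting point deviations."""
    n = len(sequence)
    counts = Counter(sequence)
    score = 100.0
    # Composition terms: (base, target fraction, penalty scale).
    for base, target, scale in base_targets:
        score -= abs(counts[base] / n - target) * scale
    # Band terms: penalize only the distance outside [low, high].
    w = length_weight(n)
    for value, (low, high), scale in ((free_energy, energy_band, energy_scale),
                                      (melting_point, melt_band, melt_scale)):
        if value < low:
            score -= (low - value) * scale * w
        elif value > high:
            score -= (value - high) * scale * w
    return score
```

The optimized values can be supplied through the keyword arguments, for example `energy_band=(-68.34, -30.20)` with `energy_scale=1.12`.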

Weight: 0.36

Original Proposal
##### Title: Direction of GC-pairs in multiloops + neckarea
###### Player Description

I make a wish for a strategy that says:

All GC-pairs in the in multiloopjunctions, have to turn in same direction. (Red nucleotide to the right and green nucleotide to the left.) Exception: the GC-pair connecting multiloop and neck,are allowed to turn in both directions, without being penalized.

I would like to give –2 point for each wrong turning GC-pair.

###### Pseudocode

$$\begin{array}{l} score\, =\, 100\\ score\, =\, score\, -\, (\text{number of GC pairs in wrong directions adjacent to multiloops except those in the first stack from 5' end}) * 1.00 [5.71]\\ score\, =\, score\, -\, (\text{number of non-GC pairs adjacent to multiloops}) * 2.00 [6.63] \end{array}$$

Weight: 0.21

Original Proposal
##### Title: Clean plot, stack caps, and safe GC
###### Player Description

plot_score = (number of white cells in the upper triangle of the pairwise probabilities plot) / (total number of cells in the upper triangle of the pairwise probabilities plot)

cap_score = ((number of GC pairs that are at the end of a stack) + 0.5 * (number of GC pairs that are 1 away from the end of a stack)) / (3 * total number of stacks)

gc_penalty = 2 if 80% or more of the design's pairs are GC pairs, 0 otherwise.

A design's total score is: (2 + plot_score + cap_score - gc_penalty) * 25

The +2 and *25 are just to make it come out to between 0 and 100.

###### Pseudocode

$$\begin{array}{l} score\, =\, plot\_score * 1.00 [0.88]\, +\, cap\_score * 1.00 [1.05]\\ score\, =\, score\, -\, (\text{gc_penalty with GC pair threshold on 0.80 [0.83]}) * 2.00 [2.10]\\ score\, =\, (score\, +\, 2) * 25 \end{array}$$
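
The final combination step can be written as a short Python function. This sketch takes `plot_score`, `cap_score`, and the fraction of GC pairs as precomputed inputs, since deriving them requires the pairwise probability plot and the stack layout. The function name and keyword arguments are hypothetical, and the defaults are the proposed values.

```python
def clean_plot_total(plot_score, cap_score, gc_pair_fraction,
                     plot_scale=1.00, cap_scale=1.00,
                     gc_threshold=0.80, gc_scale=2.00):
    """Combine plot, cap, and GC-penalty terms into a 0-100 style score."""
    score = plot_score * plot_scale + cap_score * cap_scale
    # The penalty applies when the GC pair fraction reaches the threshold ("80% or more").
    if gc_pair_fraction >= gc_threshold:
        score -= gc_scale
    return (score + 2) * 25
```

With the proposed parameters, a design with `plot_score = 1`, `cap_score = 1`, and half of its pairs GC scores (1 + 1 + 2) * 25 = 100.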

Weight: 0.12

Original Proposal
##### Title: Numbers of yellow nucleotides pr length of string
###### Player Description

I would like to ad a strategy for numbers of yellow nucleotides allowed pr. lengt of string (neckarea excluded):

If a string/arm is this number of nucleotides long, then allow this number of yellow adenine. For each yellow nucleotide below the minimum or above the maximum, penalize with –2.

String length (yellow nucleotides)
3 (1-2) String eg. the bulged cross and the asymmetry
4 (1-2)
5 (1-3)
6 (2-3)
7 (3-4)
8 (2-5)
9 (1-4)

This could be used to rule out some of the cub scouts and a few christmas threes.

###### Pseudocode

$$\begin{array}{l} score\, =\, 100\\ \text{for each stack}\\ \qquad score\, =\, score\, -\, max(\\ \qquad\qquad (\text{number of AU pairs})\, -\, (\text{upper bound on number of AU pairs}),\\ \qquad\qquad (\text{lower bound on number of AU pairs})\, -\, (\text{number of AU pairs}),\\ \qquad\qquad 0\\ \qquad ) * 2.00 [10.50] \end{array}$$
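
The per-stack penalty can be sketched in Python as below. The sketch assumes the bounds come from the player's table of string lengths, that stacks whose length is not in the table are not penalized, and that stacks are already extracted as `(length, number of AU pairs)` tuples. It does not handle the neck-area exclusion. `AU_BOUNDS` and `au_pairs_score` are names invented for this example.

```python
# Stack length -> (lower bound, upper bound) on AU pairs, from the player's table.
AU_BOUNDS = {3: (1, 2), 4: (1, 2), 5: (1, 3), 6: (2, 3), 7: (3, 4), 8: (2, 5), 9: (1, 4)}


def au_pairs_score(stacks, penalty=2.00, bounds=AU_BOUNDS):
    """Score a design from an iterable of (stack_length, number_of_au_pairs)."""
    score = 100.0
    for length, au_pairs in stacks:
        if length not in bounds:
            continue  # assumption: lengths outside the table carry no penalty
        low, high = bounds[length]
        # Penalize each AU pair above the upper bound or below the lower bound.
        score -= max(au_pairs - high, low - au_pairs, 0) * penalty
    return score
```

Passing `penalty=10.50` applies the optimized value instead of the proposed one.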

Weight: 0.22

Original Proposal

## Data deposition

All experimental data used to train EternaBot can be found here in the Eterna Project.
A spreadsheet of design rule scores for the entire experimental data set can be found here in .csv format.