On a personal level, this is just more of my long-standing obsession with trying to make rigorous versions of Sewall Wright's fitness landscape cartoons 2/n pic.twitter.com/IOzLS8n6La
And as part of the paper we do that, revealing the simple qualitative structure underlying a 20^4 = 160,000 genotype fitness landscape using data from @wchnicholas 3/n pic.twitter.com/tSvK1lkvIE
But the specific problem that we're dealing with in this paper is how to extend fitness or phenotypic measurements for specific sequences to a model for all of sequence space without making assumptions about the precise form of the model (e.g. pair-wise, global epistasis, etc.) 4/n
Put simply, our proposed solution is that given known fitnesses for some subset of genotypes, we assign fitnesses to all unobserved genotypes in a manner that minimizes the total amount of epistasis in the landscape as a whole. 5/n
More precisely, we minimize the mean squared epistatic coefficient between pairs of random mutations in random genetic backgrounds, so that mutational effects change as little as possible between adjacent backgrounds 6/n pic.twitter.com/lxS4OmWAL3
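To make the quantity being minimized concrete, here is a minimal sketch that computes a mean squared epistatic coefficient on a toy landscape. It assumes binary alleles (the real data has 20 amino acids per site), and the function names and fitness values are made up for illustration; this is not the paper's code.

```python
from itertools import combinations, product

def epistatic_coefficient(f, g, i, j):
    """Deviation of the double mutant at sites i, j from additivity in background g."""
    def flip(geno, *sites):
        # Toggle the (binary) allele at each listed site.
        return tuple(1 - a if k in sites else a for k, a in enumerate(geno))
    return f[flip(g, i, j)] - f[flip(g, i)] - f[flip(g, j)] + f[g]

def mean_squared_epistasis(f, n_sites):
    """Average squared coefficient over all backgrounds and all pairs of sites."""
    vals = [epistatic_coefficient(f, g, i, j) ** 2
            for g in product((0, 1), repeat=n_sites)
            for i, j in combinations(range(n_sites), 2)]
    return sum(vals) / len(vals)

n = 3
# Hypothetical additive landscape: each site contributes independently.
additive = {g: 1.0 * g[0] + 0.5 * g[1] - 0.3 * g[2]
            for g in product((0, 1), repeat=n)}
# Same landscape plus one pairwise interaction between sites 0 and 1.
pairwise = {g: additive[g] + 2.0 * g[0] * g[1] for g in additive}

print(mean_squared_epistasis(additive, n))  # zero up to rounding
print(mean_squared_epistasis(pairwise, n))  # positive: sites 0 and 1 interact
```

An additive landscape scores zero, and any interaction between sites raises the score, which is what makes this a natural penalty to minimize.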
Why does this work? One way of thinking about this is to suppose that the fitness landscape really is some kind of 2D surface like in Wright's cartoons. 7/n
If we sample some genotypes and want to reconstruct the landscape, one standard way of doing this is using thin plate splines, which estimate the landscape with the least overall curvature compatible with our observations 8/n pic.twitter.com/VItppMAs9X
In particular, it chooses the surface that goes through the sampled points (gray dots above) but otherwise minimizes the integral of the sum of the squared second-order partial derivatives 9/n
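For reference, the standard 2D thin plate spline bending energy can be written as follows. The symbols are my notation here: f(x, y) is the surface, and (x_k, y_k, z_k) are the sampled points it has to pass through.

```latex
E[f] = \iint \left[
  \left(\frac{\partial^2 f}{\partial x^2}\right)^2
  + 2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2
  + \left(\frac{\partial^2 f}{\partial y^2}\right)^2
\right] dx\,dy,
\qquad \text{subject to } f(x_k, y_k) = z_k \text{ for every sampled point } k.
```

The factor of 2 appears because the mixed derivative is counted once for each ordering of x and y.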
It turns out that our minimum epistasis interpolation method is precisely the thin plate spline solution, adapted to the discreteness of sequence space. 10/n
In particular, the epistatic coefficient measures how much one mutation influences the effect of another, so it's just the discrete analog of a mixed partial derivative. 11/n pic.twitter.com/Irw8rqnUwP
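Side by side, the analogy looks like this. The notation is assumed for illustration: g is a genetic background, g^{i} and g^{j} are the single mutants at sites i and j, g^{ij} is the double mutant, and h is a step size.

```latex
\frac{\partial^2 f}{\partial x\,\partial y}
  \approx \frac{f(x+h,\,y+h) - f(x+h,\,y) - f(x,\,y+h) + f(x,\,y)}{h^2}
\qquad\longleftrightarrow\qquad
\varepsilon_{ij}(g) = f(g^{ij}) - f(g^{i}) - f(g^{j}) + f(g)
```

Both expressions take the same signed sum over the four corners of a square. In sequence space a step is one mutation, so there is no h to divide by.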
And so minimizing the average squared value of these epistatic coefficients essentially results in a discrete thin plate spline problem (i.e. a boundary value problem for a second-order Laplace operator, whose solution comes down to solving a single matrix equation). 12/n
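Here is a toy version of that "single matrix equation", as a sketch only. It assumes binary alleles and a 3-site landscape with made-up fitness values, and it uses a small hand-rolled Gaussian elimination so that it runs without extra libraries. The function names are hypothetical and this is not the paper's implementation.

```python
from itertools import combinations, product

def faces(n_sites):
    """Yield each double-mutant square once, as {genotype: +1 or -1}."""
    for i, j in combinations(range(n_sites), 2):
        for g in product((0, 1), repeat=n_sites):
            if g[i] == 0 and g[j] == 0:
                def mut(*sites):
                    return tuple(1 if k in sites else a for k, a in enumerate(g))
                yield {g: 1.0, mut(i): -1.0, mut(j): -1.0, mut(i, j): 1.0}

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= t * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def min_epistasis_interpolation(observed, n_sites):
    """Fill in unobserved genotypes so the summed squared epistasis is minimal."""
    unknown = [g for g in product((0, 1), repeat=n_sites) if g not in observed]
    idx = {g: k for k, g in enumerate(unknown)}
    A = [[0.0] * len(unknown) for _ in unknown]
    b = [0.0] * len(unknown)
    # Setting the gradient of the quadratic cost to zero gives A x = b.
    for face in faces(n_sites):
        known = sum(c * observed[g] for g, c in face.items() if g in observed)
        for g, c in face.items():
            if g in idx:
                b[idx[g]] -= c * known
                for h, d in face.items():
                    if h in idx:
                        A[idx[g]][idx[h]] += c * d
    full = dict(observed)
    full.update(zip(unknown, solve(A, b)))
    return full

# Hypothetical data: wild type plus the three single mutants.
observed = {(0, 0, 0): 0.0, (1, 0, 0): 1.0, (0, 1, 0): 0.5, (0, 0, 1): -0.3}
fit = min_epistasis_interpolation(observed, 3)
print(fit[(1, 1, 1)])  # additive prediction 1.0 + 0.5 - 0.3, up to rounding
```

With only the wild type and single mutants observed, a zero-epistasis landscape fits the data, so the sketch returns the additive prediction. Adding an observed double mutant that deviates from additivity forces some epistasis in, and the solver spreads it as evenly as the data allow. The system is solvable only when the observed genotypes pin down the additive part of the landscape, as they do here.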
Moreover, just as a thin-plate spline can take a complicated shape close to the data but becomes less curved the farther you get from the data, our method can be highly epistatic near the data, but becomes nearly additive where data is sparse or absent. 13/n
So that's why it works. But what is the broader significance of this advance? 14/n
The key insight provided by this method has to do with the possibilities that are opened up by incorporating higher-order epistasis, i.e. genetic interactions between 3 or more sites. 15/n
Historically, higher-order epistasis has been viewed as something complicated, something to be worried about @DanWeinreich 16/n
https://doi.org/10.1016/j.gde.2013.10.007 …
But more recently we've seen that even "smooth" single-peaked fitness landscapes can harbor a substantial amount of higher-order epistasis due to global or non-specific epistasis 17/n
and so the presence of higher-order epistasis doesn't necessarily mean that the landscape is complicated, as shown by e.g. @EvolBiochemist 18/n https://doi.org/10.1534/genetics.116.195214 …
Our method turns this around -- minimum epistasis interpolation is the smoothest possible solution compatible with the data and achieves this smoothness by allowing epistatic interactions of all orders. 19/n
The way to see this is by viewing higher-order interactions as pair-wise interactions whose strength can vary across sequence space 20/n
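For three sites this can be written out directly. The notation is assumed for illustration: \varepsilon_{ij}(g) is the pairwise epistatic coefficient for sites i and j in background g, and g^{k} is g with an additional mutation at site k.

```latex
\varepsilon_{ijk}(g) = \varepsilon_{ij}(g^{k}) - \varepsilon_{ij}(g)
```

So a three-way interaction is exactly the change in a pairwise interaction when you move to a neighboring background, and it vanishes when the pairwise interaction has the same strength in both.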
And one possibility this opens up is interactions whose strength decays as you get farther from the data 21/n
Which allows our method to accommodate strong pair-wise interactions where the data require it, but have these interactions decay away in regions of sequence space where the evidence does not support them. 22/n
This more broadly raises the possibility that fitness landscapes may qualitatively be compatible with relatively simple geometric models that nonetheless contain epistasis of all orders. 23/n
For instance, in the paper we show that the GB1 landscape has three different high fitness regions, where the fitness landscape is approximately additive within each region (see inset scatter plots). 24/n pic.twitter.com/QEUDWf9aUo
The method we describe also suggests a way forward in terms of dealing with how to make local models within the vastness of sequence space (tagging @granex who helped convince me that this is a key problem) 25/n
Standard regularized regression models with pair-wise interactions have the problem that they work well near the data but give unreasonable predictions far from it. 26/n
Our work suggests that higher-order epistasis can be harnessed to solve this problem, by allowing predictions that behave in a complex way near the data but "fail elegantly" far from it. 27/n
So one of the big lessons for me in doing this work is that the presence of higher-order epistasis can actually make fitness landscapes less complicated, and that we can exploit this to make better models 28/n
or, in other words, "How I learned to stop worrying and love higher-order epistasis" 29/29 FIN