Brad Mc Fall
6 min read · Jun 27, 2021


Notes towards a Hyperbolic Statistical Analysis of viral genomes

The recent addition of the lab-leak hypothesis to the public discussion of the natural origin of SARS-CoV-2 reveals a need for a means of differentiating artificial from natural selections of genomic information sequentially encoded in DNA. This need exposes the failure of evolutionary theory to give professional virologists the concepts necessary to create such statistical discriminators. Because discussions of natural origins are theoretically founded on replacing older transmission concepts of phenotypic differences with genetic sequence analysis of stable genotypes in populations, society gives little consideration to an issue that Muller first raised scientifically, in that older context, in 1911.

https://academic.oup.com/ije/article/45/6/1733/3056539

Muller argued that it was impossible to separate the absence or presence of genetic factors from multiple forms of variability of the same factorized gene. But because viruses supposedly form quasi-species rather than Mendelian populations, the obvious inapplicability of this genetic conceptualization to coding regions of DNA can be directly systematized into the present concern over discovering the origin of SARS-CoV-2, despite the paucity of settled judgements on origins.

What is at issue analytically, in increasing the efficiency of recovering an origin for Covid, is the division of sequence mutational variation into categories: those that arose by natural process and those caused by human manipulation. On this basis, a lab leak by simple infection of employees would obviously count as natural, since the judgement is synthesized from sequence information alone. Thus one wants to discriminate when genes or sequence stretches might have been inserted or changed by us versus having changed naturally by mutation and selection. This is the same thing Muller contemplated when trying to ascertain whether causality was due to the presence or absence of a gene or to the allelomorphic variation of any given factor. Bernstein’s use of random mating amongst pure classes (of biotypes), which requires modifying the presence and absence hypothesis, could also be used in this context.

https://journals.sagepub.com/doi/pdf/10.1177/053901847601500417

Can a statistical test be developed that is capable of separating natural from artificial selections of viral sequences, by categorizing natural transmissions as allelomorphic variations and artificial changes as the presence and absence of those variations? Given that there is no reliable universal theory of multi-factor gene combinations across long, functionally operative stretches, whether phenotypically or genotypically derived (a point Muller also questioned for genes in general), it would appear that, given the current state of statistical technology, the answer is no. This appears to have been the response of many virologists when confronting the question of discovering the origin of SARS-CoV-2 without critical access to information suspected of existing beyond what has become public knowledge. There is, however, a little-developed statistical innovation introduced in 1993 that could help: hyperbolic statistical analysis.

https://link.springer.com/chapter/10.1007/978-1-4613-8324-6_2

Because viral genes are not Mendelian genes, it is possible to reason about their statistics in ways that are independent of those used by evolutionary theory so far, regardless of whether they actually form quasi-species or not. Now, diving in: in any origins investigation, one wants to know whether the genes (genic sequences) have been transmitted from natural “parents” or by human manipulation. A now antiquated biometric position on genic transmission (first proposed by Galton) held that so-called offspring variance regressed correlationally onto parental deviations. But the rise of statistics under the Fisherian position, that independent causes can be multiplied or added together, makes it prima facie, if not actually, impossible to be certain that a simple human insertion of genic sequence can be correlationally, and thus causally, separated from a natural insertion or recombination, for instance. Again, I am assuming there is no global rule linking these correlations and causes, but in any investigation of origins it is clear that a human manipulation IS independent of a natural spillover, even if one doubts one’s ability to recover and discover the origin. Hyperbolic statistical analysis, which permits data points to be divided into qualities of intrinsic and extrinsic similarity (not possible in the Euclidean heritage), offers a new range from which it is potentially possible to discriminate domains of compatibility conditions for human vs natural change. Donato’s type of analysis can be applied to viral genic sequences when the dependent and independent variables are parents and offspring, with the parents alternatively being natural or human, and the resulting minimal surface on which the correlational regression is diagrammed is fitted together hyperbolically rather than Euclideanly, under different arrays of gene interactivity formats of separable phenotypic expressivities.

Thus configured, only intrinsic data contribute to natural parentage, while extrinsic data may interpolate under artificial manipulation and alteration. Next I show how to construct such an analysis on viral gene data under known states of natural and human origin and manipulation, but first a little about HSA (hyperbolic statistical analysis) itself. One might presuppose that virologists themselves have worked with a view of variation (natural variation within a type vs transformism of type with contamination) that completely prevents them from finding human changes made by their fellow colleagues, but I shall leave that point of view to the work of historians.

In relating Galton’s method to current genomics, the extension with HSA is of value even without knowledge of the actual changes:

“For highly heritable traits such as height, we conclude that in applications in which parental phenotypic information is available (eg, medicine), the Victorian Galton’s method will long stay unsurpassed, in terms of both discriminative accuracy and costs. For less heritable traits, and in situations in which parental information is not available (eg, forensics), genomic methods may provide an alternative, given that the variants determining an essential proportion of the trait’s variation can be identified.”

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2986552/

Here we extend this accuracy and low cost to Covid origins.

In virology this is available as “donor-recipient” (DR) regressions.
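As a point of reference, a plain (Euclidean) donor-recipient regression can be sketched in a few lines. This is only a minimal illustration of the Galton-style starting point, not the hyperbolic analysis itself. The function names, the simulated trait values, the true slope of 0.6 and the noise level are all assumptions made up for the example and do not come from any real viral data set.

```python
import random


def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_pairs(n, true_slope, noise_sd, seed=0):
    """Hypothetical donor/recipient trait values (synthetic, for illustration only)."""
    rng = random.Random(seed)
    donors = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Each recipient value is the donor value scaled by the slope, plus noise.
    recipients = [true_slope * d + rng.gauss(0.0, noise_sd) for d in donors]
    return donors, recipients


donors, recipients = simulate_pairs(n=2000, true_slope=0.6, noise_sd=0.5)
slope, intercept = ols_slope_intercept(donors, recipients)
print(f"estimated slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The estimated slope plays the role of Galton's regression of offspring on parent. The proposal in these notes is to replace the flat plane on which this line is drawn with a curved surface.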

Using HSA for viral DR regressions with a non-flat manifold surface, the connection of multiple such surfaces can express the intrinsic and extrinsic data points at the corners of cubes bounding the surface limits of deviations during transmission. This is possible because the lines of curvature connecting intrinsic points will be smooth, but those connecting extrinsic points will generally be discontinuous with respect to two different parent-offspring array interactivities. A proof of the impossibility of continuous motion in the sets of discontinuous spaces can be used to separate natural from artificial changes in DNA sequences. One consequence of using HSA is that, because each triangle has an angle sum of less than 180 degrees, more space is afforded for the same correlation-regression. Thus, if the extrinsic data points are sparse, there may be more diversity of intrinsic data points available than would otherwise be analyzable. If the application is also shown to bear a real relation to genic transmission for the phenotypes, then it may be possible to derive the difference between natural and man-made viruses from measures of the diversity of mutations, this diversity being more limited under human manipulation than what can appear in nature, because long-range correlations are disrupted by human intervention (making independent what was dependent).
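The remark about triangles having less than 180 degrees can be checked numerically. The sketch below assumes a surface of constant curvature -1 and uses the hyperbolic law of cosines to compute the interior angles of a triangle from its side lengths. The function names are my own, and nothing here is specific to viral data or to Donato's method. It only illustrates the geometric fact being relied on.

```python
import math


def hyperbolic_angles(a, b, c):
    """Interior angles (radians) of a hyperbolic triangle with side lengths a, b, c.

    Assumes constant curvature -1 and sides that satisfy the triangle inequality.
    Angles are returned in the order: opposite a, opposite b, opposite c.
    """
    def angle_opposite(opp, s1, s2):
        # Hyperbolic law of cosines:
        # cosh(opp) = cosh(s1) cosh(s2) - sinh(s1) sinh(s2) cos(angle)
        cos_angle = (math.cosh(s1) * math.cosh(s2) - math.cosh(opp)) / (
            math.sinh(s1) * math.sinh(s2)
        )
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.acos(max(-1.0, min(1.0, cos_angle)))

    return (
        angle_opposite(a, b, c),
        angle_opposite(b, a, c),
        angle_opposite(c, a, b),
    )


def angle_defect(a, b, c):
    """pi minus the angle sum; for curvature -1 this equals the triangle's area."""
    return math.pi - sum(hyperbolic_angles(a, b, c))


for side in (0.1, 1.0, 3.0):
    total = math.degrees(sum(hyperbolic_angles(side, side, side)))
    print(f"equilateral side {side}: angle sum = {total:.2f} degrees, "
          f"defect = {angle_defect(side, side, side):.4f}")
```

Small triangles are nearly Euclidean, while larger ones fall further below 180 degrees. That growing defect is the extra "room" the paragraph above refers to.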

Thus this application of HSA uses genotypic parents and offspring, not phenotypes. Any subsequent changes to development do not affect the combinations and permutations of gene relations.

The compatibility conditions arise because the total surface must have a sum of zero curvature. This will permit a set of gene relations to exist in a certain geometric format rather than in the other, randomized ones that could exist for human manipulations. Thus, by specifying the format, one will be able to develop tests for man-made vs natural viral gene combinations, and this can be started from a known zero-curvature format for three genes independent of each other.

When multiple genes are combined, if curvature still exists for the total combined surface, then this is an irreversible transmission path and yields phenotypes. If, however, the combination of genes results in zero curvature where curvature otherwise exists individually, then the path is reversible and the phenotype may not exist for those genotypes. The reversibility of transmission genetics has been confounded with the non-existence of transfer of traits across generations.

By developing HSA for genetics we avoid Muller’s confounding issue over which chemicals are associated with which genes, because the curvatures of transmissible genetics supervene over the attractions and repulsions involved in whatever forces, per vital unit affinities, arise per substance in soma development. We use the curvature of the substance and thus need not worry about the force in the structure. Dissecting the reversible cases by isolated irreversible ancestral transmissions would permit access to the forces.

The two-parent-offspring surface for a given gene has no outside direction, but once two or more gene regressions are compounded, an outside does exist with respect to the total sum of curvatures equalling zero. The unit normal is capable of describing symmetric skew and possibly derives the Pearson (contra Fisher) approach to the Edgeworthian exponential distributions for modal ordinates. The normal mapped to the regression sphere is the direction/angle between the parent and offspring as a curvature.

Brad Mc Fall

AS in Computer Science and BS in Biology from Cornell University. Interested in evolution and blockchains