Hông Vân Lê is using diffeological spaces for information geometry in singular statistical models (including those motivated by machine learning), but I do not know whether this relates directly to singular optimization problems as understood by Watanabe. The following are some references to her work.

For singular statistical models (including those arising in machine learning) one needs a version of the Fisher metric that goes beyond smooth manifolds; one possibility is the framework of diffeologies,
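For orientation, recall the classical Fisher metric on a smooth parametric family $\{p(x|\theta)\}_{\theta\in\Theta}$, which is the structure being generalized here (standard formula, recalled only for context):

```latex
% Classical Fisher metric on a parametric family p(x|theta),
% defined whenever the score functions are square-integrable:
g_{ij}(\theta)
  \;=\;
  \mathbb{E}_{p(\cdot|\theta)}
  \left[
    \frac{\partial \log p(x|\theta)}{\partial \theta^{i}}
    \,
    \frac{\partial \log p(x|\theta)}{\partial \theta^{j}}
  \right]
  \;=\;
  \int
    \frac{\partial \log p(x|\theta)}{\partial \theta^{i}}
    \,
    \frac{\partial \log p(x|\theta)}{\partial \theta^{j}}
    \; p(x|\theta)\, dx .
```

At singular points, where the parametrization $\theta \mapsto p(\cdot|\theta)$ fails to be injective or the matrix $g_{ij}(\theta)$ degenerates, the model is not a Riemannian manifold, which is the situation the diffeological framework is meant to handle.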

Hông Vân Lê,

*Diffeological statistical models and diffeological Hausdorff measures*, video yt, slides pdf

Hông Vân Lê,

*Natural differentiable structures on statistical models and the Fisher metric*, Information Geometry (2022) arXiv:2208.06539 doi

Hông Vân Lê, Alexey A. Tuzhilin,

*Nonparametric estimations and the diffeological Fisher metric*, In: Barbaresco F., Nielsen F. (eds), Geometric Structures of Statistical Physics, Information Geometry, and Learning, p. 120–138, SPIGL 2020, Springer Proceedings in Mathematics & Statistics **361**, doi arXiv:2011.13418

> In this paper, first, we survey the concept of diffeological Fisher metric and its naturality, using functorial language of probability morphisms, and slightly extending Lê’s theory in (Le2020) to include weakly $C^k$-diffeological statistical models. Then we introduce the resulting notions of the diffeological Fisher distance, the diffeological Hausdorff–Jeffrey measure and explain their role in classical and Bayesian nonparametric estimation problems in statistics.

- Hông Vân Lê,
*Diffeological statistical models, the Fisher metric and probabilistic mappings*, Mathematics 2020, 8(2) 167 arXiv:1912.02090

found links for these books:

Sumio Watanabe,

*Algebraic geometry and statistical learning theory*, Cambridge University Press (2009) [doi:10.1017/CBO9780511800474]

Sumio Watanabe,

*Mathematical theory of Bayesian statistics*, Cambridge University Press (2018) [ISBN:9780367734817, pdf]

have hyperlinked *deep neural network* and *statistical distribution* and have given this and related entries the context menu “Probability theory”

Added reference to

- Singular Learning Theory seminar, (webpage)

I remember during my machine learning phase thinking that deep neural networks should have more to tell us than support vector machines:

> Physicists have known for decades that the macroscopic behavior of the systems we care about is the consequence of critical points in the energy landscape: global behavior is dominated by the local behavior of a small set of singularities. This is true everywhere from statistical physics and condensed matter theory to string theory. Singular learning theory tells us that learning machines are no different: the geometry of singularities is fundamental to the dynamics of learning and generalization. (Hoogland)

A stub to note some references.
