This is a brief description of a construction that has started appearing in category-theoretic accounts of deep learning and game theory. It first appeared in Backprop As Functor (https://arxiv.org/abs/1711.10455) in a specialised form, but has since been generalised and has become a cornerstone of approaches unifying deep learning and game theory (Towards Foundations of Categorical Cybernetics, https://arxiv.org/abs/2105.06332), (Categorical Foundations of Gradient-based Learning, https://arxiv.org/abs/2103.01931).
Our group here in Glasgow is using this quite heavily, so since I couldn’t find any related constructions on the nLab I decided to add it. This is also my first submission. I’ve read the “HowTo” page, followed the instructions, and I hope everything looks okay.
There are quite a few interesting properties of Para, and eventually I hope to add them (most notably, Para is an oplax colimit of a functor BM -> Cat, where BM is the delooping of a monoidal category M).
A notable thing to mention is that I’ve added some animated GIFs of this construction. Animating categorical concepts is something I’ve been using as a pedagogical tool quite a bit (more here https://www.brunogavranovic.com/posts/2021-03-03-Towards-Categorical-Foundations-Of-Neural-Networks.html) and it seems to be a useful way of getting the idea across with less friction. If it renders well (it seems to) and is okay with you, I might add more to the Optics section, and to the neural networks section (I’m hoping to get some time to add our results there).
Bruno Gavranović
Or maybe lenses aren’t specifically a related concept, but anyway we tend to have a section for these.
Like the graphics, by the way.
Is it clear that you want the “para construction” instead of the standard Kleisli category of a reader comonad?
The latter has
a fixed parameter object $P$;
morphisms from $A$ to $B$ are of the form $P \times A \xrightarrow{\;\; f \;\;} B$;
composition from $A$ to $B$ to $C$ is much as in “para”, but precomposed with the diagonal morphism:
$P \times A \xrightarrow{ \Delta \times id } P \times P \times A \xrightarrow{ id \times f_1 } P \times B \xrightarrow{\;\; f_2 \;\;} C$
Have you considered this? If so, could you say in which examples/applications this is insufficient?
My gut feeling would be that this Reader-comonad Kleisli category is the relevant one in applications, as you will want to be parameterizing over a fixed set of parameters, not regard every imaginable set as a potential set of parameters in a given application. No?
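For concreteness, here is a minimal Python sketch of the composition described above, taking the underlying category to be sets and functions. The names `cokleisli_compose`, `scale` and `shift` are hypothetical and chosen only for illustration; the point is that the single parameter `p` is duplicated by the diagonal and reused by both morphisms.

```python
def cokleisli_compose(f1, f2):
    """Compose f1 : P x A -> B with f2 : P x B -> C in the Kleisli
    category of the reader comonad P x (-).

    The composite P x A -> C first copies p (the diagonal), then
    applies f1, then applies f2 with the same p.
    """
    def composite(p, a):
        b = f1(p, a)     # id x f1, after duplicating p
        return f2(p, b)  # f2 sees the very same parameter p
    return composite


# Hypothetical example with P = A = B = C = float.
scale = lambda p, a: p * a   # multiply the input by the parameter
shift = lambda p, b: b + p   # add the parameter to the input

scale_then_shift = cokleisli_compose(scale, shift)
print(scale_then_shift(3.0, 2.0))  # 3.0 * 2.0 + 3.0 = 9.0
```

Note that the composite still has parameter object $P$: both steps are forced to share one parameter value.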
Re. #2:
Hi, Bruno! Finally, we have animated GIFs on the nLab, and I’d like to second David Corfield’s sentiment in #5.
Re. #7:
you will want to be parameterizing over a fixed set of parameters, not regard every imaginable set as a potential set of parameters in a given application.
In the original construction of Fong et al., the underlying category is the category of Euclidean spaces. The dimension of the parameter object can vary, since it is the number of edges connecting two layers of a neural network, and this can vary in most neural network architectures. Hence, the parameter object need not be fixed.
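As a rough illustration of the varying parameter dimension, here is a small Python sketch using plain lists for vectors. It assumes fully connected linear layers without biases, so the parameter count of a layer is the number of edges, `n_in * n_out`. The helpers `dense` and `compose` are hypothetical names, not taken from the paper.

```python
def dense(n_in, n_out):
    """Return (parameter dimension, map) for a linear layer R^n_in -> R^n_out.

    The parameter is a flat list of n_in * n_out weights in row-major
    order, one weight per edge between the two layers.
    """
    dim = n_in * n_out

    def apply(weights, x):
        assert len(weights) == dim and len(x) == n_in
        return [
            sum(weights[i * n_in + j] * x[j] for j in range(n_in))
            for i in range(n_out)
        ]

    return dim, apply


def compose(layer1, layer2):
    """Compose two layers; the parameter dimensions add up."""
    d1, f1 = layer1
    d2, f2 = layer2

    def apply(weights, x):
        assert len(weights) == d1 + d2
        # The first d1 weights belong to layer1, the remaining d2 to layer2.
        return f2(weights[d1:], f1(weights[:d1], x))

    return d1 + d2, apply


first = dense(3, 2)    # parameter space R^6
second = dense(2, 1)   # parameter space R^2
dim, network = compose(first, second)
print(dim)                                                   # 8
print(network([1, 0, 0, 0, 1, 0, 1, 1], [5, 7, 9]))          # [12]
```

The two layers have parameter spaces of different dimensions (6 and 2), and the composite has dimension 8, so no single fixed $P$ serves for all three morphisms.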
Is it clear that you want the “para construction” instead of the standard Kleisli category of a reader comonad?
It could be, though the comonad would be formed from different objects in the same underlying category. Perhaps one could link the page to Kleisli category, though it would make the page easier to find and reference if it was renamed to, say, “Para”. The bicategory could perhaps be called the bicategory of “parametric morphisms”, in a nod to parametric right adjoints (and to avoid the s/z choice in spelling).
Since differential categories appear in this context, I’d also like to highlight a paper by Wallbridge^{1} that gives a model of differential linear logic using, in part, the Kleisli category of the jet comonad. Wallbridge was motivated by potential computational applications, so this should be interesting for Bruno and others.
1. James Wallbridge, Jets and differential linear logic, Mathematical Structures in Computer Science, Vol 30, Issue 8, September 2020, pp. 865-891. (arXiv:1811.06235, journal)
Renaming the entry to something descriptive like “parametric morphisms” would be good.
Re #7
As rongmin pointed out, indeed the Kleisli category of the reader comonad is not what we want (aside: I’ve seen that called also a CoKleisli category, is there a consensus on how to call these things?) in machine learning/game theory. The parameter space of a neural network/economic agent is the data of the morphism itself, and is not fixed beforehand. Even more, when composing two agents together, we don’t want the composite agent to copy the strategy and use it twice: we want each individual agent to be able to choose their own strategy.
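To make the contrast concrete, here is a minimal Python sketch of composition in Para, again with sets and functions as the underlying category and the cartesian product as the monoidal product. The names `para_compose`, `scale` and `shift` are hypothetical. Each morphism carries its own parameter, and the composite is parameterised by the pair.

```python
def para_compose(f, g):
    """Compose f : P x A -> B with g : Q x B -> C in Para.

    The composite has parameter object Q x P: nothing is copied, and
    each morphism receives its own independently chosen parameter.
    """
    def composite(qp, a):
        q, p = qp
        return g(q, f(p, a))
    return composite


# Hypothetical example: two "agents" with separate parameters.
scale = lambda p, a: p * a   # first agent, parameter p
shift = lambda q, b: b + q   # second agent, parameter q

agent = para_compose(scale, shift)
print(agent((1.0, 3.0), 2.0))  # 3.0 * 2.0 + 1.0 = 7.0
print(agent((3.0, 3.0), 2.0))  # 3.0 * 2.0 + 3.0 = 9.0
```

The second call feeds the same value to both agents, which is what the diagonal in the reader-comonad composite would force in every case; in Para that is just one choice among many.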
But nonetheless, these constructions are related. If we think of the reader comonad as an oplax functor $1 \to \mathbf{Cat}$, then its Kleisli category is the oplax colimit of that functor. Similarly, we recover $\mathbf{Para}(\mathcal{C})$ as an oplax colimit of the composite oplax functor
$\mathbf{B}\mathcal{C} \xrightarrow{\mathbf{B}\otimes'} \mathbf{B}[\mathcal{C}, \mathcal{C}] \hookrightarrow \mathbf{Cat}$
where $\mathbf{B}\mathcal{C}$ is the delooping of the monoidal category $\mathcal{C}$ and I’m using $\otimes' : \mathcal{C} \to [\mathcal{C}, \mathcal{C}]$ to mean the transpose of $\otimes : \mathcal{C} \times \mathcal{C} \to \mathcal{C}$ under the tensor-hom adjunction.
I think this is the idea behind graded comonads, but I’m not entirely sure. I also think special care needs to be taken to get 2-cells in $\mathbf{Para}(\mathcal{C})$ to come up nicely that way (or we can just think of $\mathbf{Para}(\mathcal{C})$ as a category, appropriately quotiented).
Thanks! That would be good to discuss in the entry, to show that there is category theory behind these definitions.
I think this is the idea behind graded comonads…
So we know how to generalize a monad to a graded monad: say, as a lax 2-functor to $Cat$ whose domain is the delooping $B M$ of $M$ rather than $\mathbf{1}$.
At comonad we don’t give alternative definitions. What is it, oplax 2-functor?
At comonad we don’t give alternative definitions. What is it, oplax 2-functor?
Indeed, it should be an oplax 2-functor.
Re. #10:
I’ve seen that called also a CoKleisli category, is there a consensus on how to call these things?
I think some people would like to call the Kleisli category of a comonad a “co-Kleisli category” to allow them to omit specifying a comonad, whereas others want to fix the name “Kleisli category” and specify whether it is of a monad or a comonad.
If we think of the reader comonad as an oplax functor $1 \to \mathbf{Cat}$, then its Kleisli category is the oplax colimit of that functor. Similarly, we recover $\mathbf{Para}(\mathcal{C})$ as an oplax colimit of the composite oplax functor $\mathbf{B}\mathcal{C} \xrightarrow{\mathbf{B}\otimes'} \mathbf{B}[\mathcal{C}, \mathcal{C}] \hookrightarrow \mathbf{Cat}$
Very interesting.
I think this is the idea behind graded comonads,
and graded monads^{1} are apparently also known as “parametric monads”, so could we say that $\mathbf{Para}(\mathcal{C})$ is the Kleisli category of some $\mathcal{C}$-parametric comonad?
we can just think of $\mathbf{Para}(\mathcal{C})$ as a category, appropriately quotiented
I think experience has shown that we get more interesting things if we don’t take quotients, but it is more expedient to do so.
1. The graded monad entry mentions coeffects, but there’s nothing there at the moment, so I googled and found this website about context-aware programming languages by Tomas Petricek.