Want to take part in these discussions? Sign in if you have an account, or apply for one below
Vanilla 1.1.10 is a product of Lussumo. More Information: Documentation, Community Support.
I’ve started writing duploid which generalizes thunk-force category to include negative types as well as positive.
The missing concept for category with runnable monad as a version of thunk-force category for negative types was introduced in the same thesis as duploid but without much explicit definitions since it is immediately subsumed by duploids.
More to come, especially the relationship to adjunctions.
Interesting. I think you’re a bit inconsistent with the order of composition; did you mean for it to be diagrammatic or functional? Also should the third associativity law be $(f\circ g) \bullet h$ on the right?
This looks to me almost like an instance of a category enriched in a bicategory, but the failure of general associativity is weird. I don’t quite understand what it means for a composition $f \odot g$ to “evaluate $f$ first” or “evaluate $g$ first”; surely in computing $f(g(x))$ we have to compute $g(x)$ before we can plug it into $f$?
Thanks for finding those errors Mike, I’ve fixed them. Also I noticed that I flipped the convention from the thesis for $\bullet, \circ$ so I swapped them back, and changed the names of the morphisms to be consistent too.
surely in computing $f(g(x))$ we have to compute $g(x)$ before we can plug it into $f$?
This is exactly the difference between call-by-value and call-by-name. In call-by-value $g(x)$ is evaluated to a value $v$ then we evaluate $f(v)$, but in call-by-name, the entire expression $g(x)$ is substituted into $f$.
Most programming languages use call-by-value, in which case you would be right, but for example Haskell uses call-by-need (essentially an optimization of call-by-name for an idempotent effect).
In haskell f is evaluated first as you can see if you run the following program (ignore the :: Int
parts):
let f x = error "f evalutated first"; g x = error "g evaluated first" in (f (g 3 :: Int) :: Int)
error
essentially crashes the program with an error message and you can see if you run this in https://www.tryhaskell.org/ it will print “f evaluated first”, whereas a similar program in most languages (Python, Java, Ocaml, scheme/racket) would print “g evaluated first”.
The way this reduces is that the expression g 3
is substituted into the body of f
, which doesn’t use it so the error g
tried to raise is lost.
Here’s a similar program that uses let
instead of functions since that’s a more direct analogue to duploid composition:
let x = error "Strict evaluation!" in (error "lazy evaluation" :: Int)
you can run that to see what kind of language you are working in.
I think the failure of associativity will make more sense once you see how duploids arise from adjunctions, which I’lll put up soon.
Okay, but then why do we have $f\bullet (g\circ h) = (f\bullet g) \circ h$? Shouldn’t the LHS evaluate $g$ then $h$ then $f$ while the RHS evaluates $g$ then $f$ then $h$?
I’m looking at the construction from adjunctions, and it looks weird too. I’m familiar with the graph of a profunctor but this seems to be something different; why are morphisms in $\mathcal{D}$ between positive types defined by applying $F$ to them rather than just using the morphisms in $C_+$?
Great questions.
Shouldn't the LHS evaluate $g$ then $h$ then $f$ while the RHS evaluates $g$ then $f$ then $h$?
That’s not quite how it works. It might be a better intuition to say that in call by name composition $g \circ f$, $g$ has control whereas in call by value $g \bullet f$ $f$ has control. You can think of it as $g \circ f$ means that $f(x)$ is passed as an unevaluated thunk to $g$ whereas in $g \bullet f$, $g$ is passed as an “unevaluated continuation” to $f$ (continuation-passing style makes this literally true). So in $f \bullet (g \circ h)$ and $(f \bullet g) \circ h$, $g$ is in control, it has a thunk $h(x)$ that it can force (or not) but it also has a continuation $f$ it may run (or not).
Again as concrete programs, if $f,g,h$ are all one-argument functions in a call-by-value language we can write $g \circ f$ as $\lambda x. g (\lambda y. f(x))$ where $y$ is does not appear in $f$ and $g \bullet f$ as just $\lambda x. g(f(x))$. Then $f \bullet (g \circ h)$ looks like:
$\lambda x. f (g (\lambda y. h(x))$and $(f \bullet g) \circ h$ looks like:
$\lambda x. (\lambda y. f(g(y)) (\lambda z. h(x))$which is the same after doing a $\beta$ reduction.
I'm looking at the construction from adjunctions, and it looks weird too. I'm familiar with the [[graph of a profunctor]] but this seems to be something different; why are morphisms in $\mathcal{D}$ between positive types defined by applying $F$ to them rather than just using the morphisms in $C_+$?
Right, it’s similar to the cograph of a profunctor but different. The thing to remember is that, like in a thunk-force category, we never get direct access to the morphisms of the original categories, we only get them in the view of the “Kleisli category”/”co-leisli category”. Then after the fact we can recover them (to some extent) as the linear and thunkable morphisms. The intuition (for me) comes from effectful programming languages, where you don’t get a direct syntax for “pure” or “linear” arrows, only the effectful arrows. So a morphism between positive types isn’t a “pure” arrow from the original category $C_+$, but instead an “effectful arrow” from the Kleisli category $O(P,F P) = C_-(F P, F P) = C_+(P, GF P)$
if $f,g,h$ are all one-argument functions in a call-by-value language we can write $g \circ f$ as $\lambda x. g (\lambda y. f(x))$ where $y$ does not appear in $f$
What are the types of these things? If $f:A\to B$ and $g:B\to C$, then it seems that $\lambda y.f(x)$ should have a type like $Y\to B$ (where $y:Y$), which is different from $B$ and hence is not the sort of thing that $g$ can be applied to.
In general I’m having trouble figuring out exactly how to think about a morphism $f:A\to B$. It feels like you’re saying that if $B$ is positive, then a morphism $f:A\to B$ is a function that eats an $A$-value and spits out a $B$-value, whereas if $B$ is negative then a morphism $f:A\to B$ is a function that eats both an $A$-value and a $B$-eating continuation as arguments. Is that right? Does the intuition depend also on whether $A$ is positive or negative?
Is there a real-world programming language you can point to that has this polarization of types? Or just a completely specified $\lambda$-calculus-like syntax?
I’m used to thinking of polarity as a property of how a type is constructed, e.g. sum types are positive while function types are negative. But it seems that duploids are using polarity as a sort of marker placed on a type, like a “strict” modifier in a type signature, rather than a property of types that already exist. How are these notions of polarity related?
I would say the intuition is that in the most general situation (i.e., the most effects) a positive input is given as a value, but a negative input is given as a thunk. A negative continuation is a linear evaluation context but a positive continuation is given as an explicit continuation.
Is there a real-world programming language you can point to that has this polarization of types? Or just a completely specified λ-calculus-like syntax?
Yes, I probably should have brought this up earlier. Duploids are based on explicit term calculi that people have made for the sequent calculus, analogous to how $\lambda$-calculus models natural deduction. These usually go by the name “System L”, after Gentzen naming sequent calculi LK and LJ, though older work calls them System $\lambda\mu\mu\~$. They were introduced in this paper, and types did not have an explicit polarity. System L is also presented in Guillaume’s thesis where he defined duploids (linked in the article).
In particular in polarized L, what I said above about positive inputs being passed as values is literally true
How are these notions of polarity related?
This is something that I don’t quite understand well enough to say very definitively, I’m looking at duploids in part to understand all of this better. Guillaume’s thesis and this expository article about L-like calculi talks about what polarity means in the context of System L.
My rough understanding is that since values of negative types are uninspectable thunks, it doesn’t really make sense for them to have a syntactic mapping-out universal property, instead you have to do it with their continuations. And dually for positive types syntactically their continuations are uninspectable continuations so you can only express a mapping-out universal property by the shape of their values.
Note: you can quote previous comments without making them verbatim by putting a >
at the beginning of the line instead of four spaces.
If I am writing a function that takes an input of type $A$, what difference is there in the kind of code I’m allowed to write based on whether my input is given as value or as a thunk? In lazy Haskell I write the same code as I would in a call-by-value language, it’s just that the argument doesn’t get evaluated until necessary. Is it that a thunk is something I can also choose to force even if I don’t need its value (whereas if I do need it then I have to force it)? Or that I can choose to force it more than once?
I’m excited to hear there is a connection with sequent calculus! I don’t think I’ve heard before anything about a systematic study of term calculi for sequent calculus. Maybe if I have some time I’ll try to read some of that.
Fortran would be an example of a language that can typically be thought of call by reference in a more direct sense (i.e. I’m not sure that laziness should be taken as a prototypical example of call by reference). One certainly would typically code a bit differently, e.g. it is much more natural to make use of mutability. There is also of course a big difference in how things are working when one goes down to nuts and bots (i.e. in the compiled code, the machine instructions); it is easier to write optimised code in a more natural way in call-by-reference, I would say (e.g. one does not typically need pointers, as one would in C).
I think call by reference is different from call by name.
Ah, sorry, was not reading closely enough!
If I am writing a function that takes an input of type $A$, what difference is there in the kind of code I’m allowed to write based on whether my input is given as value or as a thunk? In lazy Haskell I write the same code as I would in a call-by-value language, it’s just that the argument doesn’t get evaluated until necessary. Is it that a thunk is something I can also choose to force even if I don’t need its value (whereas if I do need it then I have to force it)? Or that I can choose to force it more than once?
Typically the code you write looks the same but has different behavior. In “true” call-by-name, a thunk can be evaluated more than once, right so a program like:
let x = print "0"; 3 in x + x
would print “00” and return 6, whereas in call-by-value it would print just “0” and return 6. Haskell on the other hand is call-by-need so the first time the thunk is evaluated, the result is stored and returned if the thunk is invoked later. Traditional call-by-name $\lambda$-calculi would not have an explicit way to force something without using the value, but Haskell does support this because it is very useful for managing memory usage.
Warning: minor rant incoming.
It’s probably worth pointing out that there hasn’t been a “real” programming language to use true call-by-name since maybe Algol60 (as in 1960), since it’s easier to simulate call-by-name using thunks than the other way around and Haskell uses call-by-need and avoids any real effects besides divergence. However, call-by-name languages survived a lot longer in the denotational semantics world because there was a perception that they had “better” semantics because they have the $\beta$ and $\eta$ laws of the pure $\lambda$-calculus, whereas call-by-value only has $\eta$ for values (this also helped the perception of denotational semantics as not applicable to “real” programming languages). In category terms this is because the models of call-by-name are cartesian closed because cartesian-closure only requires mapping-in universal properties and so are inherited by the Kleisli category of a comonad. On the other hand, these categories don’t typically have sums, whereas call-by-value calculi do because mapping-out universal properties are inherited by the Kleisli category of a monad. Calculi like call-by-push-value and System L are an attempt to get the best of both worlds.
a program like:
let x = print "0"; 3 in x + x
would print “00” and return 6
Can I ask a more naive question: why would one ever want that behavior? It seems highly counterintuitive to me, for instance, for the two expressions x + x
and 2 * x
to have such highly different behavior. In a language with side-effects, I would want to control precisely what code gets evaluated and what doesn’t, and how many times, rather than leaving it up to the details of how the values returned by that code gets used. Certainly there are situations in which one wants to conditionally evaluate an argument, but one can always deal with that by explicitly passing a function (thunk?) rather than a value.
Perhaps a related question is why, when combining call by name and call by value, you would ever think of dividing it up by considering some types to be called by name and others by value? It seems to me that calling by name and calling by value is an orthogonal consideration to the type of the value in question, so I would expect rather to have a marker on each argument of a function, separate from its type, indicating whether it is passed by value or by name.
Sorry if my comments are running a little long here.
Can I ask a more naive question: why would one ever want that behavior? … In a language with side-effects, I would want to control precisely what code gets evaluated and what doesn’t, and how many times, rather than leaving it up to the details of how the values returned by that code gets used.
I agree with you for a language that has IO effects, I’ve never programmed in such a language but it seems kind of insane.
However consider a language like Haskell where we can idealize slightly and say the only effects are program-ending: a computation diverging and error "blah"
that crashes the program.
There the difference between an effect performed once or twice is immaterial because the program essentially crashes the first time.
The advantage you get is that it makes more programs equal. Specifically, it means you can freely substitute an expression for a variable and know that the program does the same thing. In a call-by-value langauge you have to know a lot of context (that the program is pure, i.e., in semantic terminology thunkable) to know that this transformation is allowed. You can also freely re-order a sequence of let-bindings, which again takes a good deal of contextual knowledge to validate in call-by-value.
These kinds of transformations are probably even more useful to compilers, which try very hard to move code around, and are much freer to do so in a lazy language like Haskell. In cbv compilers, they need to do static analysis to see if something is thunkable, which is quite expensive to do, and so most compilers will only do it for very obvious special cases (like constants). There’s a whole area of research on “fusion” about discovering these types of program transformations where multiple passes over a data structure are fused into a single pass, that are almost never valid transformations for cbv languages.
Perhaps a related question is why, when combining call by name and call by value, you would ever think of dividing it up by considering some types to be called by name and others by value? It seems to me that calling by name and calling by value is an orthogonal consideration to the type of the value in question, so I would expect rather to have a marker on each argument of a function, separate from its type, indicating whether it is passed by value or by name.
First, I would say that the shifts are exactly a marker you can put on a type to switch it from data to codata or vice-versa. Also I don’t think it’s orthogonal, you can only get the right $\eta$ rule for positive types/data if they are treated in a call-by-value manner and you only get the right $\eta$ rule for negative types/codata if they are treated in a call-by-name manner. Think of it as a modal distinction, some types naturally come from the “negative” category, some naturally come from the “positive” category and there are modalities (the shifts) that let you change categories.
Also it should be said that these calculi are not necessarily being promoted as surface languages, but rather as better for automated tools like proof search/program synthesis and as an internal representation for compilers. In my experience they are quite hard to read, just like sequent calculus proofs are harder to read than natural deduction proofs.
Sure, I believe in Haskell. But in #15 you were specifically distinguishing between Haskell and “true” call-by-name, so I was wondering why one would want the latter.
Did you ever answer #7? If so I missed it.
I’m still resistant to the idea that associativity can fail. Intuitively, associativity in a category is a representation of the most basic property of time: if you do X and then Y and then Z, it doesn’t matter how you parenthesize it. So if I see some category-like structure that fails to be associative, it seems to me that it must be encoded as a category in the wrong way.
it seems to me that it must be encoded as a category in the wrong way.
Or less combatively (-: that there should at least be some other way of encoding the same data as a category-like structure that is associative.
In this case, it feels to me like if in $g\bullet f$ it is $f$ that “has control”, then perhaps we should think of $g$ as being passed to $f$ rather than vice versa. Then $f\bullet (g\circ h)$ and $(f\bullet g)\circ h$ would both be some kind of multicategorical composition $g \circ (f,h)$, whereas the unequal $f\circ (g\bullet h)$ and $(f\circ g)\bullet h$ would be something like $f \circ h\circ g$ and $h \circ f\circ g$ respectively (though with some multicategorical annotation that I don’t understand yet to indicate how things are plugged together).
Did you ever answer #7? If so I missed it.
Oh I think I missed that one, your comment is right. In order to express call-by-name evaluation in a call-by-value langauge we have to use thunks, so if we want to have a call-by-name composite $g \circ f$, we need to change the type in the middle to be thunked: $f : A \to (1 \to B)$ and $g : (1 \to B) \to C$ and change all the places where $g$ uses its input from $x$ to $x(*)$.
In this case, it feels to me like if in $g\bullet f$ it is $f$ that “has control”, then perhaps we should think of $g$ as being passed to $f$ rather than vice versa
Continuation semantics basically works like this, but I don’t know of anything (besides duploids) that is a more “direct” style model, like some sort of multicategory.
Max referred me to this discussion some time ago; sorry it took some time to get back to it. I find Max’s replies insightful, and I would like to add the following:
(Mike) I’m used to thinking of polarity as a property of how a type is constructed, e.g. sum types are positive while function types are negative. But it seems that duploids are using polarity as a sort of marker placed on a type, like a “strict” modifier in a type signature, rather than a property of types that already exist. How are these notions of polarity related?
Something along the lines of the former is often given as a definition of polarities, but this fails for modalities such as ! and ? from linear logic. One can even derive contradictions in some papers from this. Duploids describe a notion of polarities that matches more closely the original notion from Girard (1991) and Danos-Joinet-Schellinx (1997). The discrepancy between the two notions (in particular the one between shifts ↑⊣↓ found in some works and the expected adjunction (-⊗1)⊣(1⊸-) where the right adjoint is negative) was the starting point of the thesis cited by Max, given that at the time the dominant discourse was to conclude “it’s just all the same”. The theorem relating duploids to adjunctions was meant as (the start of) an explanation of their relationships, even though it stands alone as a mathematical result.
An explanation for the complex polarities of !,? from the point of view of focusing was given in Melliès and Tabareau (“Resource modalities in tensor logic”, 2010). However, this is better understood from the angle of polarities in the sense of duploids, which in fact has revealed small corrections to make (this is explained in the thesis and properly written in Curien, Fiore and gadmm, “A theory of effects and resources: adjunction models and polarised calculi”, 2016).
(Mike) I’m still resistant to the idea that associativity can fail.
It is known that categories fail for a few settings of interest:
Duploids arise as a description of the direct and indirect “fixes” that have been developed to work around these failures, e.g.: Girard’s classical sequent calculus (the duploid arising from the self-adjunction of negation), thunk-force categories (the way thunks are used to enforce call-by-name in call-by-value is in fact the construction of a duploid), call-by-push-value models (for which duploids explain the relationship with intuitionistic logic). This suggests that an axiomatic approach to logic and computation (e.g. let’s start from CCC and add the desired structure/property) does not need to be based on the assumption that composition is associative, and that many people (who tried to take evaluation order into account) have implicitly been rejecting this hypothesis.
(Max) It’s probably worth pointing out that there hasn’t been a “real” programming language to use true call-by-name since maybe Algol60 (as in 1960)
If you are interested in the semantics of evaluation order in Algol 60, you may want to have a look at Hatcliff and Danvy’s decomposition of Algol 60’s CPS semantics (“A Generic Account of Continuation-Passing Styles”, POPL 1996, §4.2) which is quite illuminating. This translation corresponds to an iterated Kleisli construction, which lets every type support both call by value and call by name. Because of this ambivalence, it probably does not validate any unrestricted eta conversion, so this is quite opposite in spirit from polarisation.
Regarding the difference between call by name and call by need also mentioned in the thread: historically call by need has been presented as an efficient implementation of call by name, and call-by-name domain semantics are usually presented as a way to think about Haskell programs (or at least some of them I suppose). From this point of view, I do not see any obstacle in interpreting duploids arising from the call-by-push-value version of such models as mixing call by value and call by need.
Hope this helps.
1 to 23 of 23