Okay, let’s get this show on the road! We have close to 40 participants, which is awesome but may require some effort to organize. After thinking a bit about how to get started, I’ve decided to write up a concrete proposal for how to deal with most of the choices that we’ll have to make at the start. This is very subject to change, and everyone please feel free to point out problems or suggest alternatives to any of it (with justifications). I just thought it would be helpful to have a starting point to refer to as a “default”. Let’s use this thread to discuss this proposal and the changes we may want to make to it; once a consensus seems to be reached, we can start listing the individual jobs that need to be done and volunteering for them.
Just to be sure, I guess you are aware of the work by Castellan, and by Castellan, Clairambault, and Dybjer. Google also gave me Brilaki’s master’s thesis in Agda, which I haven’t had the time to read properly yet.
FYI, to make links work here you need to include the http:// in the URL.
Castellan’s work has been mentioned. I have not read it carefully myself, but I hear from people I trust that it sweeps a lot of details under the rug and uses techniques that are very specific to the particular type theory used, both of which we are trying to avoid in this project. I just had a look at Brilaki’s thesis but had a hard time understanding exactly what it achieves; he talks mainly about CwFs, is he actually connecting them to type theory?
:-( I was trying to avoid including the google trackers. I need to find a better way.
@spitters this works:
[Castellan](https://iso.mor.phis.me/archives/2011-2012/stage-2012-goteburg/report.pdf)
but this (the current content) doesn’t
[Castellan](iso.mor.phis.me/archives/2011-2012/stage-2012-goteburg/report.pdf)
Please edit #2, else someone innocent might make a new page on the nLab with the url as the title (as happened recently with a similar malformed link). (EDIT: or not, as Mike points out below)
I haven’t signed on yet, mainly because of doubts that I’d add much to the discussion. But now I’m curious about how much I might learn, even as a fly on the wall. I hope you don’t mind naive questions.
What do $Tm$ and $Ty$ stand for? Or do $T, m, y$ stand for different things?
@Todd Terms and Types it seems.
Thanks Alizter – that’s already helpful.
Re #5, I don’t think that will happen with this sort of broken link; clicking on it just sends you back to this discussion page.
Yes, $Tm$ stands for terms, and $Ty$ for types. The idea is that the elements of $C$ are semantic representations of contexts, and for such a $\Gamma\in C$ the elements of $Ty(\Gamma)$ and $Tm(\Gamma)$ are the semantic representations of types and terms in context $\Gamma$, respectively, with the map $Tm \to Ty$ assigning to each term its type. Representability of this natural transformation means that for any type $A\in Ty(\Gamma)$ there is an “extended context” $\Gamma.A$ such that for any other context $\Delta$ and substitution $\sigma : \Delta\to\Gamma$, terms in context $\Delta$ with type $A[\sigma]$ correspond naturally to context morphisms $\Delta \to \Gamma.A$ extending $\sigma$. Informally, you can add a new variable to the context belonging to any type, and context morphisms consist of tuples of terms.
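The representability condition in the last sentence can be written as a bijection; this is only a transcription of the prose above, with $\mathsf{p}_A : \Gamma.A \to \Gamma$ a name (not used above) for the projection that forgets the new variable:

```latex
% For each A in Ty(Gamma), the context extension Gamma.A is characterized by:
% substitutions into Gamma.A lying over sigma  <-->  terms of type A[sigma].
\[
  \{\, \tau \colon \Delta \to \Gamma.A \;\mid\; \mathsf{p}_A \circ \tau = \sigma \,\}
  \;\cong\;
  \{\, t \in Tm(\Delta) \;\mid\; t \text{ has type } A[\sigma] \,\}
  \qquad \text{naturally in } \Delta .
\]
```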
I was hoping to eventually see, first of all, a Problem Description.
Does that exist? Does everyone involved know exactly what you are trying to prove? Each time I have seen people discuss this topic, there has been disagreement on what exactly needs to be proven and whether or not it has already been proven.
@Mike what happens when we add something like cumulative universes, how do we assign terms to types then?
@Urs The page Initiality Project has the best “description” so far. I agree, though, that we could add more detail on what we are trying to achieve and why.
I think the following would be a summary of what the goal is (Mike or others can correct).
1) Formalise a certain kind of type theory mathematically, which Mike suggests should be taken to mean a category with families with certain structure (precise formulation as yet unclear, as far as I understand).
2) Construct an initial object of the category of such categories with families, i.e. construct a ‘syntactic category’, and show there is a unique functor from it to any other category with families with the designated structure.
Some of Mike’s description is about some aspects of 1), and some is about how one should likely go about the part of 2) which concerns constructing the functor from the syntactic category to a model.
@Alizter, not sure if you pointed me to the intended page?
I imagine the first thing you all should do is write something in the style
Theorem. With the above definitions … we have that …. is …. .
It seems clear that an informal line to the effect that “syntax provides the initial model” doesn’t cut it, as apparently the devil is in the details of how exactly that is to be formalized.
For extra credit, add discussion of how this theorem relates to theorems proven or not proven already in the literature.
Regarding comparison to the literature, the following points seem to be important to me for such a project:
Initiality should be a triviality as soon as the type theory under consideration is an algebraic theory. I would like to see discussion of when this is the case and when it is not. Or, if algebraicity of the given type theory is what remains to be established in the first place, then a clear statement to this effect.
Elsewhere Mike claimed that even weak initiality is open, hence the mere existence of any models at all. This statement flies in the face of a large chunk of literature which talks about categorical semantics of type theory. At the same time, weak initiality is what actually matters for translating type theoretic proofs to proofs in various fields of mathematics. Hence while not as ambitious on absolute grounds, weak initiality is the version paramount for the role of type theory in mathematics. Would the proof of initiality best be broken up into two pieces, first weak initiality, then uniqueness?
Regarding 2): I think the thing is that one has to fix some precise definitions (i.e. the 1) of #13) before one can even get started. So there is lots of literature which takes type theory as a syntactical thing and interprets (parts of/the rules of) that syntax in various categories. But that is a little different to weak initiality, because for weak initiality we need to replace the role of the type theory as syntax with some precise mathematical object, and then we have to check that our earlier less formal interpretations still are models in the precise sense. The extent to which weak initiality matters for the kind of thing you mention is probably where most of the debate arises; I think that those who feel it matters would say that there in principle might be subtle but significant things about the notion of a type theory and what it means to interpret one that we are missing until initiality is proven.
Of course, Mike and others know far more about it and can correct this if I am missing something, or can elaborate if there are important points which are missing.
Regarding 1): interesting question! I don’t think that a category with families is going to be algebraic in the usual sense. But if one worked with some kind of 2-theory which has something like a Yoneda embedding as part of the structure (as many such categorical notions of 2-theories do), one could ask the same question: if we are able to express a structured category with families as some appropriate kind of 2-theory, could we then try to prove a general initiality statement for this kind of 2-theory? [Edit: but, as pointed out in #17, this would be beside the point anyhow with the way things have been set up here.]
Construct an initial object of the category of such categories with families, i.e. construct a ’syntactic category’, and show there is a unique functor from it to any other category with families with the designated structure.
Just to clarify, this does not mean that it is enough to construct any initial model of the theory. As others observed, this is pretty much trivial from the fact that the theory is essentially algebraic.
Rather, the problem is about taking some given model (the details of which we also need to agree on) and proving that it is indeed initial, or equivalently proving that it is isomorphic to the abstractly constructed initial model.
Good point! [Edited to remove some comments that I think in retrospect will be a distraction. Basically they were saying that if one shifted more of the weight of interpretation into the definition of the structure on top of a category with families, it should be enough to just prove that there is any initial object; but just ignore this, things have been set up differently, and I don’t think it will be helpful to debate alternatives! ]
It’s not clear to me that a category with families with additional structure (if one has axioms) will be an essentially algebraic theory: as in #16, it would seem to me that one would need a 2-theory. But maybe I’m missing something.
@Urs #14: it is the correct place; that is the best “description” of the problem so far. As this is the planning stage I wouldn’t be too worried, details should follow soon. I don’t think anyone will start anything yet for a few weeks.
@Richard #16: I don’t think the theory of CwFs is (equationally) algebraic, but it is essentially algebraic. The easiest way to see this that I know of is using partial Horn logic to axiomatize CwFs. The same applies to most other proposed categories of models.
I don’t think it is needed to think of the theory of CwF as a 2-theory because I’d expect the syntactical CwF to be initial in the category of strict CwF morphisms. In any case, biinitiality is easy to prove once strict initiality is known; see Two-dimensional monad theory (Blackwell, Kelly, Power).
Re #20: Thanks very much! I should maybe not have used the term 2-theory: what I had in mind was a 2-categorical gadget that includes something like a Yoneda embedding as part of the structure, like most proposed definitions of 2-topos do, or things like equipments. But maybe as you say this is not needed after all, though we do need extra structure on top of a category with families here [edited: if one has axioms, at least].
This formulation of Categories with Families is very nice! To understand it better I reformulated it in terms of representable profunctors here. Basically instead of thinking of $Tm$ as a presheaf over $C$ with a map to $Ty$, I think it’s more intuitive (though formally equivalent) to think of it as a presheaf over $\int Ty$ and then the universal property of the context extension can be naturally formulated as being a functor that represents a profunctor from $\int Ty$ to $C$.
Probably this kind of thing is very obvious and trivial to category theorists but to me the presentation using representable profunctors is much closer to how I would describe the syntax, so maybe it will help. I’ll look at reformulating the $\Pi$ types next.
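For what it’s worth, here is my guess at the profunctor spelled out (a sketch; the name $P$ and the formula are mine, not from the linked page). The profunctor from $\int Ty$ to $C$ sends a context $\Delta$ and a pair $(\Gamma, A)$ with $A \in Ty(\Gamma)$ to the set of terms of (a substituted instance of) $A$ over $\Delta$:

```latex
\[
  P(\Delta, (\Gamma, A)) \;=\;
  \{\, (\sigma, t) \;\mid\; \sigma \colon \Delta \to \Gamma,\
       t \in Tm(\Delta),\ t \text{ has type } A[\sigma] \,\}
\]
% Representability by context extension then says, naturally in Delta:
\[
  P(\Delta, (\Gamma, A)) \;\cong\; C(\Delta,\ \Gamma.A).
\]
```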
Urs, I think that probably your questions will be best answered by waiting until we are finished and then reading the result. The proposal that I wrote was not intended as an exposition or explanation, but assumes that the reader is familiar with the general problem of initiality. Right now we can’t write a theorem down yet because part of what’s under discussion is what definitions should be used in stating the theorem. But the general point about what needs to be proven is well-understood: as others have said, the category of CwFs has an initial object for formal reasons, what has to be proven is that we can construct that initial object out of the syntax of type theory. The disagreements in the community have not been about what it means to prove an initiality theorem in the traditional sense (which is what we’re doing here), but about what (if anything) is hard about doing this, and about whether various “nontraditional” forms of initiality can be regarded as solving the same problem.
what happens when we add something like cumulative universes, how do we assign terms to types then?
My inclination would be to continue the principle that the type theory for which we prove initiality should be “fully annotated” so that every term is synthesizing. In particular, type formers should be annotated enough to deduce the universe to which they belong. Which means in particular that if we allow the product of $A:U_i$ and $B:U_j$ to live in other universes than $U_{max(i,j)}$, then the product operation must be annotated with the universe, e.g. $A \times^k B : U_{k}$ for any $k\ge max(i,j)$. I would be inclined to regard “cumulative” universes, where the same syntax $A\times B$ inhabits all such universes $U_k$ at once, as analogous to the sort of omission of annotations that can be done elsewhere with bidirectional typechecking, e.g. the same un-annotated syntax $\lambda x. x$ can check against $A\to A$ for all types $A$, with the bidirectional typechecking algorithm being modifiable to produce as output a fully annotated (hence synthesizing) term such as $\lambda (x:A). x$.
But I’m open to suggestions of better ways to do this sort of thing!
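As a toy illustration of the annotation discipline just described (illustration only: the datatype and function names are invented, and this is not the project’s actual syntax), the product former carries its universe level, so every type expression synthesizes a unique universe:

```python
from dataclasses import dataclass

# Toy illustration only: a fully annotated product former A x^k B that
# synthesizes exactly the universe U_k it is annotated with.

@dataclass
class Univ:            # the universe U_i, which inhabits U_{i+1}
    level: int

@dataclass
class Prod:            # A x^k B, annotated with the universe k it lives in
    left: object
    right: object
    level: int

def synth_universe(ty):
    """Synthesize the (unique) universe level of a type expression."""
    if isinstance(ty, Univ):
        return ty.level + 1
    if isinstance(ty, Prod):
        i = synth_universe(ty.left)
        j = synth_universe(ty.right)
        if ty.level < max(i, j):
            raise TypeError("universe annotation below max(i, j)")
        return ty.level            # the annotation, not max(i, j), is the answer
    raise TypeError("unknown type former")
```

Without the annotation, $A \times B$ would check against every $U_k$ with $k\ge max(i,j)$; with it, each expression synthesizes exactly one universe.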
instead of thinking of $Tm$ as a presheaf over $C$ with a map to $Ty$, I think it’s more intuitive (though formally equivalent) to think of it as a presheaf over $\int Ty$
Thanks for adding this; it’s always great to have different ways of looking at things. Personally, I find the first way more intuitive, perhaps partly because I’m already familiar with the notion of “representable natural transformation” from other contexts. I also think it’s valuable to stick to categorical operations that are internal to the category $[C^{op},Set]$, rather than moving back and forth between Grothendieck constructions; among other things, this makes it more clear how they can be described in the internal language of $[C^{op},Set]$ as a “logical framework”.
I thought this leads to over-applicability of the beta rule in the presence of equality reflection.
Ah, thanks. I was a bit suspicious of that, but I was following Streicher, who doesn’t annotate the codomain on abstractions, and says that one reason he’s making his annotation choices is to be consistent with various extensions, and I thought equality reflection was one of those (I don’t remember whether he mentions it explicitly; I don’t have his book with me at home right now). But I think you’re right, we do need both to be fully annotated. I’ll fix it.
Can you explain why you think equality reflection would make admissibility of substitution a problem?
Some bidirectional systems have two equality judgments, $\Gamma \vdash s\equiv t \Leftarrow A$ with everything an input, and $\Gamma \vdash s\equiv t \Rightarrow A$ with $A$ an output. I believe the idea is that to check equality you first reduce both terms to WHNF, then apply type-directed rules in checking mode (e.g. to check whether $s\equiv t \Leftarrow \Pi(x:A) B$, you check whether $x:A \vdash s(x) \equiv t(x) \Leftarrow B$, thereby obtaining $\eta$-conversion) until you get to neutral terms, then apply term-directed rules in synthesis mode.
However, I don’t think this works for type theories with things like equality reflection, and I don’t think it’s likely to correspond to anything semantic either as long as we are looking at semantics in strict categorical structures where equality means equality. So my suggestion for the initiality project is to take the equality judgment to be un-moded.
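For concreteness, the two-mode algorithm sketched above can be caricatured for simple (non-dependent) function types; everything here (the term representation, the helper names, the capture-naive substitution) is invented for illustration and is not part of the project:

```python
# Toy sketch of a two-mode equality check for simply typed lambda terms.
# Terms: ("var", x), ("lam", x, body), ("app", f, a).
# Types: ("base",), ("arrow", dom, cod).  Capture-naive; illustration only.

def subst(body, x, v):
    """Substitute v for the variable x (ignoring capture, for the toy)."""
    tag = body[0]
    if tag == "var":
        return v if body[1] == x else body
    if tag == "lam":
        return body if body[1] == x else ("lam", body[1], subst(body[2], x, v))
    if tag == "app":
        return ("app", subst(body[1], x, v), subst(body[2], x, v))
    return body

def whnf(t):
    """Reduce to weak head normal form (beta steps at the head only)."""
    if t[0] == "app":
        f = whnf(t[1])
        if f[0] == "lam":
            return whnf(subst(f[2], f[1], t[2]))
        return ("app", f, t[2])
    return t

def equal_check(ctx, s, t, ty):
    """Checking mode: everything an input; type-directed, giving eta at arrow."""
    if ty[0] == "arrow":
        x = "_x%d" % len(ctx)                      # fresh variable
        ctx2 = ctx + [(x, ty[1])]
        return equal_check(ctx2, ("app", s, ("var", x)),
                           ("app", t, ("var", x)), ty[2])
    s, t = whnf(s), whnf(t)
    return equal_synth(ctx, s, t) is not None

def equal_synth(ctx, s, t):
    """Synthesis mode on neutral terms: returns the common type, or None."""
    if s[0] == "var" and t == s:
        return dict(ctx).get(s[1])
    if s[0] == "app" and t[0] == "app":
        fty = equal_synth(ctx, s[1], t[1])
        if fty and fty[0] == "arrow" and equal_check(ctx, s[2], t[2], fty[1]):
            return fty[2]
    return None
```

The point of the sketch is only the control flow: checking mode is driven by the type (yielding $\eta$-conversion), and synthesis mode is driven by the neutral terms, after reduction to WHNF.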
What is the purpose of defining the semantics as a partial function on term syntax rather than as a total function on typing derivations? My expectation would be that we would define directly from any derivation of $\Gamma \vdash A type$ an element of $Ty([\Gamma])$, etc.
Peter’s comment on the other thread alludes to some issues with tricky induction measures in Streicher’s proof. Is that related?
Actually, I would say the problem is different, and more serious. I don’t know exactly what circularity Neel is referring to, but I agree that in principle there should be no problem in defining something by direct induction over typing derivations. The real problem is that what you get isn’t an interpretation of judgments but of derivations of judgments: two different derivations of the same typing judgment $\Gamma \vdash t:A$ might (as far as you know) get interpreted by different things in the semantics. Defining the interpretation as a partial function on syntax (not just a relation) and using the derivations only to prove totality ensures that the output of the function doesn’t depend on the derivation.
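A toy analogue of the distinction (a set-theoretic interpreter for untyped raw lambda terms, nothing to do with the project’s actual semantics): the function below is defined on raw syntax and is simply undefined (None) wherever interpretation fails; no typing derivation appears as an input, so the output cannot depend on one.

```python
# Toy partial interpretation on raw syntax.  Terms: ("var", x),
# ("lam", x, body), ("app", f, a); environments map variables to values.
# Returns None where the interpretation is undefined.

def interpret(term, env):
    tag = term[0]
    if tag == "var":
        return env.get(term[1])                  # undefined if x is unbound
    if tag == "lam":
        x, body = term[1], term[2]
        return lambda v: interpret(body, {**env, x: v})
    if tag == "app":
        f = interpret(term[1], env)
        a = interpret(term[2], env)
        if callable(f) and a is not None:
            return f(a)
        return None                              # e.g. applying a non-function
    return None
```

Totality on well-typed terms is then a separate theorem, proved by induction on derivations; but since the derivation is not an argument to `interpret`, two derivations of the same judgment trivially yield the same interpretation.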
two different derivations of the same typing judgment $\Gamma \vdash t:A$ might (as far as you know) get interpreted by different things in the semantics
Is this a problem for our system though? Aren’t we annotating terms so heavily that they are isomorphic to the derivation structure? Usually the issue would be uses of the judgmental equality rule to change something’s type but since we are talking about this bidirectional syntax it looks like that can’t happen.
I am also curious about this bit:
two different derivations of the same typing judgment $\Gamma \vdash t:A$ might (as far as you know) get interpreted by different things in the semantics
My understanding is probably below the level of discourse here, but I would appreciate an explanation of what this means exactly. Is there actually an example of two derivations of the same judgement with different interpretations (in a reasonable system)? If so, is it feasible to describe such example? Or perhaps the parenthetical “as far as you know” is meant to suggest that it doesn’t really happen, but proving that fact would be problematic compared to the alternative of defining the interpretation as a “partial function on syntax”? If so, could you clarify what the problems might be?
Annotated terms are not quite isomorphic to full derivation trees. They don’t (and shouldn’t!) indicate the particular derivation tree of each judgmental equality they use, and if the judgmental equality includes reflexivity, symmetry, or transitivity rules, derivations of judgmental equality won’t be unique. Mike indicates that he wants that when he writes:
Thus the equality judgments should have primitive rules asserting reflexivity, symmetry, transitivity, and congruence properties.
Neel’s response to #35 is also what I would say. In a fully bidirectional system, where the equality judgment is (like the typing judgments) basically a logic-programming encoding of an equality-checking algorithm, derivations are unique, and one might hope to make use of this fact in proving an initiality theorem. We discussed this briefly at the blog here. But even if that works (to my knowledge no one has ever tried it, but I am hopeful), it would only work for type theories with particularly well-behaved (in particular, algorithmically decidable) judgmental equality. One could argue pro or con about the importance of known-to-be-ill-behaved ingredients in type theories like equality reflection, but even for type theories that are expected to be well-behaved (or have well-behaved relatives), we should be able to rely on an initiality theorem before we’ve done the work to figure out a bidirectional equality-checking algorithm. Note that this applies in particular to Book HoTT, which augments MLTT not only with axioms but with new judgmental equalities (the computation rules for point-constructors of HITs). So my proposal is to treat the typing judgments as bidirectional, but with the resulting “type-checking algorithm” treating judgmental equality as an oracle: the equality judgment is not moded or bidirectional, and a single equality can be derived in many ways.
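The shape of this proposal can be caricatured in a few lines (all names invented; the real judgments are on the project pages): the typing judgments are syntax-directed, while every equality test is delegated to an `equal` argument about which nothing algorithmic is assumed.

```python
# Caricature of bidirectional typing with judgmental equality as an oracle.
# Terms: ("var", x) synthesizes; ("ann", t, A) synthesizes its annotation A.
# equal(ctx, A, B) is the oracle: any correct equality judgment will do.

def synth(ctx, term, equal):
    tag = term[0]
    if tag == "var":
        return ctx[term[1]]
    if tag == "ann":
        check(ctx, term[1], term[2], equal)
        return term[2]
    raise TypeError("term does not synthesize")

def check(ctx, term, ty, equal):
    got = synth(ctx, term, equal)
    if not equal(ctx, got, ty):        # the only place equality is consulted
        raise TypeError("type mismatch")
```

Instantiating `equal` with literal equality of type expressions, with a conversion check, or with an undecidable relation (as with equality reflection) changes nothing about the structure of `synth`/`check`.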
Re #36, such a thing shouldn’t happen in any reasonable model, at least of the sort we’re considering here; but that’s part of the content of the theorem we’re trying to prove! Streicher’s approach of first defining a partial function on raw syntax and then proving it to be total on valid syntax is, as far as I know, the best method known for proving that such a thing can’t happen. In fact, one might argue that it’s even just a repackaging of the “obvious” such method. Suppose we define an interpretation by induction on derivations, and then we want to prove that any two derivations of the same judgment have the same interpretation. How can we even get started? We don’t know anything about how the two derivation trees are related, except that they have the same root. So one “obvious” way to proceed would be to define some “canonical” interpretation associated to the root and then prove that any derivation tree with that root has that interpretation. (In HoTT terminology, rather than proving $isContr(A)$ as $A \times (\Pi(x,y:A). (x=y))$, we prove it as $\Sigma(x:A) \Pi(y:A). (x=y)$.) But this is just a repackaging of Streicher’s method: the canonical interpretation associated to the root is the partial interpretation function defined on raw judgments, and the “definition” of the interpretation by induction on derivations, together with the proof that it equals the canonical one associated to the root, is the proof that the partial interpretation function is total on derivable judgments.
Nitpick:
But even if that works (to my knowledge no one has ever tried it, but I am hopeful), it would only work for type theories with particularly well-behaved (in particular, algorithmically decidable) judgmental equality.
The rules don’t need to be decidable, only algorithmic. These are not the same. Derivability of a judgment guarantees that the algorithm will succeed, even if it generally doesn’t terminate.
Ok, I think I get it now: the bidirectional structure ensures that everything but the definitional-equality parts of the derivation is uniquely determined by the term’s structure. That is, the only rule that uses definitional equality is
$\frac{\Gamma \vdash t \Rightarrow A \qquad \Gamma \vdash A\equiv B}{\Gamma \vdash t \Leftarrow B}$
in which $A$ and $B$ will be uniquely determined; so if there is only one derivation of $\Gamma \vdash A \equiv B$, then the whole derivation would be uniquely determined, and if definitional equality is semi-decidable we can construct the derivation from the term, but we don’t want to make those assumptions.
Nitpick:
and if definitional equality is semi-decidable we can construct the derivation from the term, but we don’t want to make those assumptions.
Actually, you probably do want to assume semi-decidability, since that’s the same as recursive enumerability. You just don’t want to assume that the derivation rules spell out a semi-decision algorithm. (Since that would generally be a terrible idea.)
Sorry if I’m getting annoying, but this is basic stuff about formal deductive systems, and I want the correct facts pointed out, even though they’re tangential.
Thanks for the corrections. But actually, from the purely mathematical point of view of proving an initiality theorem (as opposed to trying to implement anything), I don’t think even semidecidability is relevant. What we would need, I think (and all that we need), is for each derivable judgment to have a unique derivation. (Actually I think it would be enough if there were a function selecting a specific “canonical” derivation of each derivable judgment in a “compositional” way, i.e. such that a derivation ending with some rule is canonical whenever the sub-derivations of each premise are canonical. But I’m not sure such a thing would ever be achievable in practice unless derivations are unique.)
Of course, all of this is tangential to the current project, since that’s not the route we’re taking here — or, at least, it’s not the route I’m proposing we take.
Based on some discussion in another thread, I went ahead and started a page on raw syntax. Anyone who feels like adding the definition of $\alpha$-equivalence and capture-avoiding substitution should feel free to do so. Anyone who wants to add the raw syntax for other type forming operations is welcome to do that too.
I’m hoping to soon lay out the overall plan in a sequence of other pages so we can see clearly what the individual tasks are that need to be done and start volunteering for them.
Okay, pausing for a moment to make two new suggestions and ask a question.
Suggestion #1: Let’s use small sub-pages for each collection of rules or inductive clauses that pertain specifically to a particular type former. I’ve started this with $\Pi$-types, as you can see. By including such a sub-page into more than one other page, we can have one page Initiality Project - Type Theory that includes all the rules of the theory, and another page Initiality Project - Pi-types that includes everything relevant to $\Pi$-types, without copy-and-pasting. This way a reader can slice the proof in either of the two dimensions of “stage of proof” and “type former”, and (once we have multiple type formers) we can easily create compilation pages that select any particular desired package of type formers. I’ve started putting in placeholders for where these small sub-pages should appear; unfortunately, Instiki doesn’t let you save a page that tries to include a nonexistent page, so I made the nonexistent ones simply links; once they exist, these links should be made into [[!include]]s.
Suggestion #2: I don’t know whether this proof will ever be peer-reviewed or published in the usual way. However, it would be nice to have at least some guarantee that all of it has been read by someone other than the author. Since probably no one really wants to read a complete initiality proof any more than they want to write one, how about if we parcel out the reading as well as the writing? Everyone who writes a particular section can mark themselves as the author (or one of the authors), and also add “Read section X” to the task list; then when someone else reads it and checks it, they can mark themselves on it as a reader and mark the reading task as done. Thoughts?
Question: How should we deal with context validity? My first thought, which I expressed in the other thread, was that validity of contexts should be an a posteriori theorem: whenever $\Gamma \vdash \mathcal{J}$ is derivable for some judgment $\mathcal{J}$, then $\Gamma$ is a valid context. However, in order to make this true, we need to ensure context validity at the leaves of the derivation tree, which I think means we need a judgment $\Gamma \, ok$ for valid contexts anyway (or maybe something like modify the variable rule to extract the last variable only and then include an explicit weakening rule?).
But now that I think about it, maybe this is backwards: this approach would make context validity a “post-condition” of the judgment (i.e. the “algorithm” making the judgment promises to its caller that the context is valid), but (as Matt pointed out to me recently) one can argue that it’s more consistent for validity statements to be passed around in the same direction as the objects they are about. For instance, the algorithm making a synthesizing judgment $\Gamma \vdash T \Rightarrow A$ should promise to its caller that the type $A$ is valid (i.e. $\Gamma \vdash A\,type$), whereas the algorithm making a checking judgment $\Gamma \vdash T \Leftarrow A$ should be promised by its caller that the type $A$ is valid. So since the context $\Gamma$ is passed by the caller, maybe the caller should promise that it is valid? In other words, a derivation of a judgment $\Gamma \vdash \mathcal{J}$ should mean intuitively that if $\Gamma$ is a valid context then $\mathcal{J}$? But that seems different from the usual way we treat contexts: usually I expect that only judgments with valid contexts should be derivable. Maybe the question is which of these corresponds better to the semantics?
I suppose in theory one might also make the context an “output” of the judgment. I think Andromeda does something like this, with the context of the conclusion being the “join” of the contexts of all the premises.
How should we deal with context validity?
This may be related to a vague question that’s been gnawing at me: when dealing with CwF’s, which in my understanding are a way of dealing with the “coherence” issues in the semantics of type theory, it feels as if we’re setting apart a base category of contexts and substitutions which presumably is regarded as “unproblematic”. And yet the whole structure of type theory seems so intricately interwoven that there is “cross-breeding” between types and terms of consequents (living in presheaves over the context category) and the data of contexts living in the context category – how can we cleanly delineate between the two? I know I’m putting this poorly or imprecisely (I already said this would be vague), and it may in fact ultimately be a non-issue, but at the moment there’s something a little unsettling about the edifice.
I’m not caught up yet with the sophisticated level of discussion in Mike’s last comment – I barely understand the words – but is the highlighted question roughly along the lines of my vaguely stated concern?
It’s certainly related. I’ll write more later, but it might be useful to compare CwFs to contextual categories. I think there’s a certain sense in which the difference between a CwF and a CC corresponds semantically to the difference between having a judgment for context validity versus not — i.e. whether when we consider type theory as a multi-sorted essentially algebraic theory there is a separate sort for “contexts” or whether the contexts are only implicitly present as finite powers of the sort of types. Which argues, if we’re using CwFs for our semantics, that we probably should have such a judgment in our syntax.
Thinking about #44 some more, I am inclined to go with the idea that context validity is a precondition rather than a postcondition. While I don’t recall ever seeing such a perspective written down explicitly, it does seem to correspond to the way a type theory is implemented, with type validity being checked at the time a variable is added to the context. One could argue it’s even implicit in the usual way rules for type formers are written, such as
$\frac{\Gamma \vdash A\,type \qquad \Gamma,x:A \vdash B\, type}{\Gamma \vdash \Pi(x:A).B \,type}$
This makes sense with context validity as a precondition: we are given that $\Gamma$ is valid, so we can promise context validity to the first recursive call $\Gamma\vdash A\, type$; then success of this call ensures that the extended context $\Gamma,x:A$ is also valid, so we can promise that to the second recursive call $\Gamma,x:A \vdash B\, type$. By contrast, if context validity is promised by derivability of the judgment, then the above rule could just be written
$\frac{ \Gamma,x:A \vdash B\, type}{\Gamma \vdash \Pi(x:A).B \,type}$
since the premise would ensure validity of the context $\Gamma,x:A$, which can only happen if $\Gamma \vdash A\,type$. But no one ever does this.
Finally, and most importantly for our purposes, I think treating context validity as an “output” corresponds to what happens semantically. If we were defining the interpretation of an entire judgment $\Gamma \vdash A\,type$ to be an object $⟦\Gamma ⟧\in C$ together with a type $⟦A ⟧\in Ty(⟦\Gamma ⟧)$, as Streicher did, then it would make sense for the syntax to have a context validity judgment $\Gamma\,ok$ to prove totality of the interpretation of contexts and for derivability of $\Gamma\vdash A\, type$ to ensure derivability of $\Gamma\,ok$. But with Peter’s suggested approach, where we instead define $⟦A ⟧$ simultaneously as an element of $Ty(X)$ for all $X\in C$ equipped with an appropriate “environment”, it seems more natural for the corresponding syntactic view of $\Gamma \vdash A\, type$ to mean that $A$ is a valid type whenever (not “and”) $\Gamma$ is a valid context.
About #44 and #47: Sorry, I guess I didn’t explain all around the context validity issue well enough. Making context validity a precondition (input), corresponding to the context itself being an input, is what I called “local validity” in #27. I thought you already understood this, from what you wrote on the overview:
It is unclear at present whether any of these advantages [of bidirectionality] are relevant to categorical semantics, but by using a bidirectional framework we are better placed to take advantage of them if they do become relevant. More importantly, it seems to me that the natural structure of the semantic interpretation, as suggested by Peter Lumsdaine (see below), matches the syntactic bidirectional picture quite closely, and it may be clarifying to make that analogy more precise.
I assumed this already took into account that your bidirectional rules are in local validity style.
My interpreter was all local validity style. I’m pretty confident that an initiality theorem would also be non-awkward in local validity style. I expect that all inductions on derivations would assume syntactic context validity and/or context interpretability, with the semantic context also an input (universally quantified).
My first thought, which I expressed in the other thread, was that validity of contexts should be an a posteriori theorem: whenever $\Gamma \vdash \mathcal{J}$ is derivable for some judgment $\mathcal{J}$, then $\Gamma$ is a valid context. However, in order to make this true, we need to ensure context validity at the leaves of the derivation tree, which I think means we need a judgment $\Gamma \, ok$ for valid contexts anyway (or maybe something like modify the variable rule to extract the last variable only and then include an explicit weakening rule?).
Right, with context validity as a postcondition (“global validity”?), it needs to be mutually defined with the other judgments, or else encoded as another judgment form. With local validity, context validity is defined afterwards.
In Barendregt’s “Lambda Calculi with Types” (and probably other things of his on the subject), he uses the trick you mention, of getting only the rightmost variable, and having a weakening rule to put things to the right of it. He avoids a context validity judgment by using a dummy RHS of the turnstile, which was set up to be true in any valid context. For example, $(\ast : \square)$, a sort membership axiom.
Voevodsky’s note on HTS instead encoded type validity as context validity; type formation rules formed it at the right end of the context.
These are both global validity. (No matter how cleverly it’s done though, I prefer local validity.)
In other words, a derivation of a judgment $\Gamma \vdash \mathcal{J}$ should mean intuitively that if $\Gamma$ is a valid context then $\mathcal{J}$?
Yes. And in semantic type theory (logical relations, meaning explanations, non-categorical realizability, etc.), you actually define a local-validity-style judgment like that. (Or something similar.)
But that seems different from the usual way we treat contexts: usually I expect that only judgments with valid contexts should be derivable. Maybe the question is which of these corresponds better to the semantics?
For something based on extrinsic judgments, like Streicher’s method, I think you can have validity going any which way, if you are able to prove the corresponding sanity checks.
For intrinsic syntax, like Thorsten likes, local validity must seem very odd. You mention invalid contexts. But the type of contexts is supposed to have just valid contexts. I’m pretty sure you couldn’t define local-validity-style type theory as inductive-inductive families in Agda. However! Local validity corresponds closely to HOAS signatures. Never mention the context. So if your metalanguage is cool enough to have analytic- (intrinsic-) style HOAS signatures as inductive definitions, I believe that constitutes intrinsic local validity.
I suppose in theory one might also make the context an “output” of the judgment. I think Andromeda does something like this, with the context of the conclusion being the “join” of the contexts of all the premises.
Yeah. And an output context seems to be typical for defining substructural systems, but with a different join. I think Andromeda’s approach actually requires variable names. (I don’t know if that bothers anyone here other than me.)
By contrast, if context validity is promised by derivability of the judgment, then the above rule could just be written […] since the premise would ensure validity of the context $\Gamma,x:A$, which can only happen if $\Gamma \vdash A\,type$. But no one ever does this.
I think I’ve seen that. (I don’t feel like hunting around.)
Finally, and most importantly for our purposes, I think treating context validity as an “output” corresponds to what happens semantically.
That paragraph all makes perfect sense to me, if “‘output’” was supposed to be “‘input’”. Was it?
You’re right! You did say that local validity meant the context and its validity were inputs, but it slipped past me at the time because the idea hadn’t occurred to me yet that that was possible. And I agree now that it’s the natural thing to do given the rest of our setup, I just hadn’t followed that reasoning all the way through until now. Maybe it’s just that I’ve thought so much about the intrinsic inductive-inductive perspective. Where does the terminology “local validity” come from?
And yes, I meant “input” in my last paragraph.
Where does the terminology “local validity” come from?
I thought I saw it in a paper once. But I can’t find it.
What words would you use to distinguish a derivable judgment $\Gamma \vdash \mathcal{J}$ from one in which the context $\Gamma$ is additionally known to be valid?
The subject of type theory often looks like such a mess, from a mathematician’s perspective… (-:O
I think I may be changing my mind again back towards separating out the set of raw types from the set of raw terms, because that would make the connection between all the pieces of the theorem even closer: there are separate judgments for types and terms, and separate presheaves in a CwF, so there should be separate sets of raw syntax as well. Also it seems to be convenient to be able to talk about a “raw type” or a “raw term” separately in explaining other things.
Anything that makes the initiality theorem itself more simple or direct is a good thing at this point IMO; extra bells and whistles can then be hung on either its inputs or its outputs separately. In particular, we could perhaps treat Russell universes as a sort of un-annotated syntax, like $\lambda x.x$, that gets (bidirectionally) typechecked into our “fully annotated” version with separate sets of raw syntax before being interpreted into category theory.
Thoughts?
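As a toy illustration of the “typecheck un-annotated syntax into fully annotated syntax before interpreting” idea, here is a sketch of a bidirectional elaborator in Python (the term representation, with de Bruijn indices and a made-up `"alam"` constructor for annotated lambdas, is purely illustrative):

```python
# Hypothetical sketch of bidirectional elaboration: un-annotated surface
# syntax like \x.x is checked *against* a type and decorated with
# annotations, producing the fully annotated syntax that the initiality
# theorem would consume.
# Terms: ("var", i) with de Bruijn index i, ("lam", body);
# annotated output: ("alam", domain_type, body).
# Types: ("base",), ("arrow", A, B).

def elaborate(ctx, term, ty):
    """Check `term` against `ty` in `ctx`; return annotated term, or None."""
    if term[0] == "lam":
        if ty[0] != "arrow":
            return None                      # a lambda only checks at an arrow
        dom, cod = ty[1], ty[2]
        body = elaborate(ctx + (dom,), term[1], cod)
        # the checked domain type becomes the annotation on the lambda
        return None if body is None else ("alam", dom, body)
    if term[0] == "var":
        i = term[1]
        # variable i refers to the i-th context entry from the right
        if i < len(ctx) and ctx[-1 - i] == ty:
            return ("var", i)
        return None
    return None
```

For example, elaborating $\lambda x.x$ against $base \to base$ succeeds and inserts the domain annotation, while elaborating it against $base$ fails.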
@Mike Last week I was talking to a type theorist who said that mathematics was a mess compared to type theory! (-:<
When you do an “intrinsic” (quotient inductive-inductive) presentation of syntax, the subjects of the judgements are well-typed terms (because only well-typed terms exist).
I understand the rationale in an initiality theorem for having “raw syntax” as representatives of “where in the derivation did I use definitional equality, and what was the derivation of the definitional equality?”-classes of typing derivations.
But is there any reason to consider judgements whose subjects (besides the one being judged right now) are not well-formed? The way I think about things, a judgement $\Gamma \vdash a : A$ is indexed by a context $\Gamma$ that is (merely) known to be well-formed, and a type $A$ that is (merely) well-formed in $\Gamma$. I.e. if we were using type theory (with proof-irrelevance, which I’ll write as a subset type) as the metalanguage,
$wellformedtype : \{\Gamma : RawCtx | \Gamma\, ok \} \to RawType[Vars(\Gamma)] \to Prop$

and

$wellformedterm : \{\Gamma : RawCtx | \Gamma\, ok \} \to RawTerm \to \{ A : RawType[Vars(\Gamma)] | wellformedtype\ \Gamma\ A \} \to Prop$

That is, it’s a “presupposition” of a judgement that everything (except the thing that is a raw term proxy for the judgement you’re currently defining) is well-formed.
This enforces the invariant that you never consider $\Gamma \vdash a : A$ for ill-formed $\Gamma$ and $A$.
If the judgements have modes, then the well-formedness of the subjects of the judgements should follow the same mode as the subject itself (which matches what you said in #47). For example, a rule like
$\frac{ \Gamma \vdash A\, type \qquad \Gamma , x : A \vdash B\, type } { \Gamma \vdash \Pi x : A.\, B\, type}$

is really: Given $\Gamma$ such that $\Gamma\, ok$, and a derivation of $\Gamma \vdash A\, type$ (which together imply $(\Gamma, x : A)\, ok$), and a derivation of $(\Gamma,x:A) \vdash B\, type$ (which type checks because we know the context is well-formed), we produce a derivation of the conclusion.
My general inclination would be to avoid using not-necessarily-well-formed-raw-terms as much as possible, unless there’s some technical reason to do so. Sometimes people use them to “stage” the theorems that are proved so that you don’t need to know something “too early” for the induction. But my guess would be that for mapping the syntax into something very well-typed like we’re doing here, this shouldn’t be necessary.
Ali, I’d be interested to hear what the context was that prompted that remark. (-:
Re #57, I wonder if there is anything to be said comparing this to the dichotomy between nice objects and nice categories? If type theorists are overly concerned with making individual type theories nice, could that contribute to their neglect of the need for a nice “general theory of type theories”?
But that’s all rather hijacky of the current thread.
Dan, the issue I see with
$wellformedtype : \{\Gamma : RawCtx | \Gamma\, ok \} \to RawType[Vars(\Gamma)] \to Prop$

is that the notion of valid context depends on the notion of well-formed type, i.e. if there is a judgment $\Gamma \, ok$ then it has to be defined by mutual induction with $\Gamma \vdash A\, type$ and everything else. So if it appears in the “domain” of the latter judgments, then it seems you’re back in inductive-inductive land. Unless I’m missing something?
I suppose having inductive-inductive typing judgments expressing properties of underlying raw syntax may not be a problem, but it’s an extra level of complication/unfamiliarity for the mathematical reader. Moreover, as far as I can see it wouldn’t really buy us anything. I wrote out a definition of how to interpret contexts at Initiality Project - Partial Interpretation and Initiality Project - Totality, and I can’t see how it would be simplified by incorporating context validity into the inductive definition of the typing judgments; it just seems like it would force the local and global totality proofs to be collapsed into one mutual induction, but all the same inductive clauses would still be there.
I am seeing a lack of references so far; should we start a bibliography to collect articles and books relevant to our proposal?
Hi Dan,
Like Mike pointed out, you need a notion of well-formed-type already to define $\Gamma ok$. So I’m wondering if what you mean is to define the usual extrinsic judgments, but then prove another set of induction principles where the motives have their domains restricted in the way suggested by your domain-restricted types for the judgments.
Re #63: Sure, a bibliography would be a great addition, especially with remarks about the relevance of each work cited. Are you volunteering? (-:O
Re #65: Ok, I have started collecting .bib entries, later I will decide how to present these and probably add comments on each one. Will we be able to link these (sections) from the other articles?
You can link to a section in one page from another one with [[PageName#Anchor]].
I was thinking of an inductive-inductive definition of the typing rules (but where the thing currently “being judged” is a raw term). I don’t see any technical problem with that, and since I’m not in the intended audience for the initiality proof, I shouldn’t vote on what is pedagogically the best. I can see the argument that induction-induction would be a barrier to understanding. On the other hand, I think the alternative also has a downside pedagogically, which is that it obscures the fact that there is no need to talk about potentially-ill-formed things (for this kind of type theory).
The ways to handle well-formedness of the stuff in a judgement that I can think of are:
(1) the traditional approach where well-formedness of everything is an output, i.e. you can prove “If G |- a : A then G ok and G |- A type”. This style ensures that you never talk about an ill-formed thing by “checking” everything.
(2) the “subset” style that I mentioned above. This ensures that you never talk about an ill-formed thing by making the subjects of the judgements (besides the thing being judged right now) be only the well-formed things. We could go further, with a quotient inductive-inductive, and make them be the definitional-equality classes of such things. Probably not the best choice for exposition here, but I think it’s the right way to think about it – except for the raw tree you’re judging right now, everything else is already “semantic” (definitional-equality classes of well-typed things) instead of “syntactic”.
(3) The poor-person’s version of 2, if you want to avoid induction-induction, is that formally a judgement is just on raw things and doesn’t check anything, so you can formally derive G |- a : A when G or A are nonsense. But then the “real” judgement G |-’ a : A is a macro for “G ok and G |- A type and G |- a : A” (I’m assuming G |- a : A is checking mode, i.e. A is presupposed; this would be different for synthesis).
If you do (2), then the meta-types of the judgements force you to maintain the invariant that you only mention G |- a : A when you already know G ok and G |- A type.
In style (3), you can still maintain the invariant, but the types don’t help you. I’m not quite sure how to state a theorem that you’ve done this correctly, but it’s basically that a theorem of the form
If G ok and G |- A type and G |- a : A then something
goes through by induction, in the sense that for every premise D |- b : B of a rule, you have sufficient info from G ok and G |- A type and the premises of the rule to know that D ok and D |- B type.
When working in style (3), my prior is that it should be stylistically illegal to mention G |- a : A except as part of G |-’ a : A (its only role is to break the inductive-inductive loop).
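A tiny Python sketch of style (3), with made-up representations: the formal judgement checks nothing about its subjects, and the “real” judgement is just a macro bundling the presuppositions with it:

```python
# Sketch of style (3): the formal judgement is on raw things and doesn't
# check its subjects; the "real" judgement G |-' a : A is a macro.
# Toy theory: a context is a tuple of types; the only valid type is "b".

def ctx_ok(ctx):
    return all(t == "b" for t in ctx)

def type_ok(ctx, ty):
    # in this toy theory type validity happens not to depend on ctx
    return ty == "b"

def raw_derives(ctx, tm, ty):
    # Formal judgement G |- a : A on raw things: a variable typechecks if
    # its index points at `ty` -- derivable even when ctx or ty is nonsense.
    return tm[0] == "var" and tm[1] < len(ctx) and ctx[-1 - tm[1]] == ty

def derives(ctx, tm, ty):
    # The "real" judgement G |-' a : A: "G ok and G |- A type and G |- a : A".
    # Stylistically, raw_derives should only ever appear inside this macro.
    return ctx_ok(ctx) and type_ok(ctx, ty) and raw_derives(ctx, tm, ty)
```

Here the raw judgement is formally derivable in a nonsense context, but the macro judgement screens that out, which is exactly its role in breaking the inductive-inductive loop.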
So I’d expect the form of a theorem to be
If G ok and G |- A type and G |- a : A then $[[a]]$ is defined

or maybe

If G ok and $[[G]]$ defined and G |- A type and $[[A]]$ defined and G |- a : A then $[[a]]$ defined

(because you don’t have inductive hypotheses for G and A).
So what seems interesting and confusing to me is that the “local totality” theorems somehow get away with proving something about G |- a : A rather than G |-’ a : A. I haven’t had time to understand what’s going on there well enough to have an opinion on it yet; might be a few days before I have time.
Dan, if you like you can type comments in LaTeX if you press the Markdown+Itex button below the comment box.
(And double-square-brackets are interpreted as links to the nLab unless you put them in math-mode: [[a]] makes a link to the nLab page “a”, while $[[a]]$ makes $[[a]]$.)
My intuition is that in case (3), the meaning of $\Gamma\vdash a:A$ is “if $\Gamma\,ok$ and $\Gamma \vdash A\,type$, then $\Gamma \vdash' a:A$”. I would say that the local totality theorems give it exactly this interpretation, where “if” means that the domain is some subset of the interpretations of $\Gamma$ and $A$.
Alternatively, I think you can regard the type theory in the other style as an assignment of types to pre-existing untyped terms, and the partial interpretation as a real honest-to-goodness interpretation of the untyped terms as partial functions, while the totality theorem is then an interpretation of the typing judgments on top of that showing that semantically a well-typed term must in fact “terminate”, i.e. be totally defined.
The inductive-inductive style that Dan proposed has a very straightforward interpretation as an impredicative encoding, which I think is quite natural, and I just realised has been the way I’ve always implicitly understood these sorts of things as written informally on paper. Namely, once the set(s) of raw terms, types etc have been defined, one defines the set of “well-formed” terms as the intersection of all those subsets that are closed under the rules. Then there is no problem in having say context validity conditions as inputs to type validity rules.
The sophisticated reader who is bothered by the fact that we’re using impredicativity of $Prop$ can probably translate it to the inductive-inductive formulation in their head quite easily, so I would guess this phrasing should satisfy most readers.
I think the alternative also has a downside pedagogically, which is that it obscures the fact that there is no need to talk about potentially-ill-formed things (for this kind of type theory).
My general attitude has been that in this project we should minimize the work to be done as part of the initiality theorem, which means choosing a presentation of syntax and a presentation of semantics that are “as parallel as possible” and make the theorem connecting them as trivial as possible. Then we can separately prove theorems solely in the world of syntax, or solely in the world of semantics, relating these choices to other presentations that people might want to use in practice (e.g. type theories with fewer annotations). So do we really need the type theory that is used as input to the initiality theorem to be the one that makes the point that there is no need to talk about potentially ill-formed things? Using the numbers from Dan’s #68, couldn’t we use (3) for the initiality theorem and then prove separately a theorem relating (3) directly to (1), where everything we talk about is well-typed, without ever needing to worry about (2)?
In case it’s helpful, let me recap the case against choice (1) for the initiality theorem. The corresponding partial interpretation function in choice (1), which is what Streicher does, has the form “given $\Gamma$, $T$, and $A$, we construct all of their interpretations: an object $[[\Gamma]]$, a type $[[A]]\in Ty([[\Gamma]])$, and a section $[[T]]$ of $[[A]]$.” Now the clauses of the recursive definition of the partial interpretation essentially recap the typing rules semantically (which is what makes the totality theorem provable), so for instance to interpret $\Pi(x:A).B$ over $\Gamma$ we first interpret $A$ over $\Gamma$ and then interpret $B$ over $\Gamma,x:A$. Since we are producing not just $[[A]]$ but also $[[\Gamma]]$, we also have to recurse over the structure of $\Gamma$; but this recursion is not structural, because in the just-mentioned clause for $\Pi$ one of the “recursive calls” has the context increasing in length. So Streicher has to define an auxiliary “height” function and recurse on that instead of structurally.
Peter’s suggestion is to avoid this by making choice (2) or (3). (I don’t actually know which he had in mind.) Now the partial interpretation function has the form “given supposed interpretations of $\Gamma$ and $A$, we construct an interpretation of $T$”, and the recursion doesn’t have to “inspect” $\Gamma$ and thus can be purely structurally recursive over the term.
It’s true that these choices are about the form of the partial interpretation rather than directly about the design of the type theory. It’s possible that we could still choose (1) in the type theory but use Peter’s suggestion for the interpretation; indeed it’s even possible that that’s what he had in mind. But it seems to me like it’ll be more straightforward and natural to design the type theory to match the interpretation.
FWIW, I’m pretty sure you could inductively (not by recursion/induction) define the partial interpretation function for (1). Partial functions are relations, after all. Moreover, it doesn’t make much sense to fuss over what you’re doing recursion on when you don’t get totality anyway. It’s not like it’s going to be even more partial if the recursion isn’t well-founded.
Using an interpretation which follows the raw syntax closely does seem convenient, but I’m skeptical that Mike is right about the real reason for this.
That’s an intriguing idea. One benefit of defining the partial interpretation recursively is that you get automatically that it is in fact a partial function rather than a mere relation, i.e. that it is single-valued. Which is, arguably, the whole point of using a partial interpretation function at all rather than simply inducting over derivations to begin with: we want the result of the latter induction to be determined uniquely for a particular judgment independently of its derivation. With an inductively defined interpretation relation, you would have to prove separately that it is single-valued, and it’s not obvious to me whether that would be easy or hard.
one defines the set of “well-formed” terms as the intersection of all those subsets that are closed under the rules. Then there is no problem in having say context validity conditions as inputs to type validity rules
I don’t completely understand this. It’s not just one “set of well-formed terms” we are defining, we want to define the set of valid contexts, the set of valid types-in-context, and the set of valid typed-terms-in-context. Can you say more explicitly how you are thinking of expressing this?
I would guess that the single-valued-ness (“uniqueness”) proof isn’t that hard. You do induction on the left interpretation proof, then inversion on the right interpretation proof. This strategy fails badly if the interpretation is not syntax-directed on the raw expression you’re interpreting. But what kind of kooky interpretation function would that be? (Answer: one for Computational Type Theory, where you only interpret canonical forms. But it can be made to work anyway.)
Re #76: having done lots of things in Twelf a while ago (where this is the only way to define functions :), not only partial ones), I’d guess it wouldn’t be that bad. Say you define a relation interp(M,S) between syntax and semantics. Then the uniqueness is “for all M, if interp(M,S) and interp(M,S’) then S = S’”. Assuming there is only one constructor for interp for each term-generator (lam, app, etc.) (which will be true if you translate the function clauses as interp constructors), in each case you’ll get by the IHs that the interpretations of the sub-expressions are equal, and then some congruence will imply that S = S’. Maybe I’m missing something, but that’s my best guess.
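To see the syntax-directedness point concretely, here is a throwaway Python sketch (the one-constructor language is made up): the interpretation is generated inductively as a relation, one clause per term-generator, and single-valuedness can then be checked over the generated pairs:

```python
# Sketch: the partial interpretation defined *inductively* as a relation
# rather than by recursion.  Terms: "k", or ("comp", t1, t2); denotations
# are strings, with composition written with ";".  Because there is exactly
# one clause per term constructor (syntax-directed), the relation is a
# partial function, which we verify by enumeration here instead of by the
# induction/inversion argument.

def interp_pairs(depth):
    """Enumerate (term, denotation) pairs derivable within `depth` rounds."""
    rel = {("k", "h")}                       # base clause: interp(k, h)
    for _ in range(depth):
        new = set(rel)
        for (t1, s1) in rel:
            for (t2, s2) in rel:
                # the single clause for "comp": compose the denotations
                new.add((("comp", t1, t2), s1 + ";" + s2))
        rel = new
    return rel

def single_valued(rel):
    """Check that no term is related to two distinct denotations."""
    seen = {}
    for (t, s) in rel:
        if t in seen and seen[t] != s:
            return False
        seen[t] = s
    return True
```

If the relation were not syntax-directed (two clauses applicable to the same raw term), `single_valued` could fail, which is the failure mode the inversion argument has to rule out.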
Okay, I think I see. That makes some sense.
Maybe we can put that idea on the back burner as something to try after this project is complete, and then compare the two? It does seem like that would involve double the work in one place, first defining the interpretation relation and then proving it single-valued, while with a recursively defined partial function both of those happen at once. But maybe an equivalent amount of work would pop up somewhere else, e.g. in the comparison between the two versions of syntax.
Re #77:
one defines the set of “well-formed” terms as the intersection of all those subsets that are closed under the rules. Then there is no problem in having say context validity conditions as inputs to type validity rules
I don’t completely understand this. It’s not just one “set of well-formed terms” we are defining, we want to define the set of valid contexts, the set of valid types-in-context, and the set of valid typed-terms-in-context. Can you say more explicitly how you are thinking of expressing this?
I was thinking you consider say triples $(C, Ty, Tm)$ where $C$ is a set of raw contexts, $Ty$ is a set of pairs of an element of $C$ and a raw type, $Tm$ is a set of pairs of an element of $Ty$ and a raw term. One can define what it means for such a triple to be closed under the rules of the syntax. Then you take the intersection, over all such closed triples, of the $C$ sets (resp. $Ty$, $Tm$). So literally the syntax is the smallest set of raw “things” that is closed under the rules.
Thanks Paolo, that helps. I can see how that avoids an explicit mention of induction-induction. However, it’s more distant from the intuitive meaning of rules and derivations, namely that the derivable judgments are the ones you can obtain by building derivation trees out of the primitive rules. So I feel like if using that style as the definition, one ought to make a comparison to actual derivation trees, in which case one is back choosing between (2) and (3).
In fact I think that (2) already has this problem: how do you explain to someone, in terms of derivation trees that they can write down on a sheet of paper (or even create as data structures or execution traces in a computer), what it means for a judgment to be valid in style (2)? Of course there has to be a derivation tree of that judgment. But because of the presupposition on context validity, in order for such a derivation tree to imply (2)-validity, we need to know that the contexts appearing in that tree are valid, which means (inductively) that there must exist certain other derivation trees, and so on. In other words, the only way to explain (2) is to reduce it to (3): the “real notion of validity” can only mean (3)-validity paired with context-validity.
This is related to the reason I am unsatisfied with intrinsic inductive-inductive syntax for type theory: at the end of the day, what we write on a page (or type into a computer) is untyped, and that gap has to be bridged at some point.
Thanks Paolo, that helps. I can see how that avoids an explicit mention of induction-induction. However, it’s more distant from the intuitive meaning of rules and derivations, namely that the derivable judgments are the ones you can obtain by building derivation trees out of the primitive rules.
I see. I guess I implicitly assumed that by writing things like:
$wellformedtype: \ldots \to Prop$

(as in #61) the intention was to force the definition to yield predicates / relations, in the sense of $Prop$-valued families. A priori, derivations don’t give you relations, and it’s a theorem that you need to prove on a case by case basis. The impredicative definition I sketched essentially gives you the $(-1)$-truncation of that (i.e. the mere existence of a derivation).
If you really want to define derivations explicitly, then the inductive-inductive definition should be $Set$-valued, which means you can’t really use the impredicative encoding.
It didn’t even occur to me that derivation trees were to be taken as the intuitive meaning of inference rules. I’ve always thought of them as a structure on top of the actual validity predicate (a way to make it proof relevant). I assume you can’t avoid introducing them for certain proof-theoretic arguments, but for the sake of the initiality construction, why is it important to keep this extra proof relevance around?
I’m not talking about the difference between truncated and non. In the current situation it’s fine to have it truncated, since our only induction on derivations is to prove a proposition (totality). But I’ve always understood “$\mathcal{J}$ is derivable” to mean “there exists (in the truncated sense) a derivation tree of $\mathcal{J}$”, just as I’ve always understood “$P$ is provable” to mean “there exists a proof of $P$”. Certainly if I wanted to convince someone that $\mathcal{J}$ is derivable, I would do it by writing down a derivation tree, right?
Putting this here since it’s sort of a reply to #71, but there has been related discussion on the Type Theory discussion page.
OK, I think the thing I’m confused about regarding the semantics is the difference between (i) a partial interpretation function taking syntax to total morphisms in the semantics, and (ii) a total interpretation of raw terms as partial morphisms. In keeping with my general proclivities that everything should be as typed as possible as soon as possible :-) my instinct would be to use (i) instead of (ii). This also seems slightly more precise, because (thinking of partiality as an effect) $A \to Partial(B \to C)$ implies $A \to B \to Partial(C)$ but not vice versa. Also, it seems like (ii) relies on being in a setting where there are enough monos in the semantics to talk about the domains of partial morphisms, whereas (i) moves the partiality into the meta-language. What I understand so far is that here that’s OK: there are enough monos because Tm is valued in sets, so we can do subsets there.
Anyway, I’m definitely willing to be convinced that (ii) is necessary and/or more convenient and/or you all prefer it, but I thought just for the sake of discussion I’d write out how I envision (i) going for a very simple language just to show the pieces.
Suppose we have three raw types
$A ::= w \mid w' \mid i$

of which $w$ and $w'$ (well-formed) are well-formed:
$\frac{} {w type} \quad \frac{} {w' type}$and $i$ (ill-formed) isn’t.
Term typing is $A \vdash T : B$, with the presuppositions that $A type$ and $B type$ (I’m thinking of this as a checking judgement with “context” $A$)
$\frac{A = B} {A \vdash id : B} \quad \frac{B\, type \quad A \vdash T : B \quad B \vdash T' : C} {A \vdash T;_{B}T' : C}$ $\frac{} {w \vdash k' : w'} \quad \frac{} {w' \vdash k : w}$

(then term equality has identity and unit laws).
So there are ill-typed raw composites, like $k;_{w} k$ or $k;_{w} k'$.
We want to show this is the initial category generated by 2 objects $o$ and $o'$ (interpreting $w$ and $w'$) and two morphisms $h$ and $h'$ (interpreting $k$ and $k'$).
The partial interpretation of types is
$\begin{array}{l} [[ w ]] = o \\ [[ w' ]] = o' \\ [[ i ]]\ \text{undefined} \\ \end{array}$

The partial interpretation of terms is a partial function whose inputs are two objects $X$ (the interpretation of the context) and $Y$ (the interpretation of the output, since I’m thinking of this as a checking judgement) and a term $T$. This is defined by
$interp(X,Y,k) = h$ if $X = o$ and $Y = o'$
$interp(X,Y,k') = h'$ if $X = o'$ and $Y = o$
$interp(X,Y,id) = id$ if $X = Y$
$interp(X,Z,(T ;_B T')) = f;g$ if $[[B]] = Y$ and $interp(X,Y,T) = f$ and $interp(Y,Z,T') = g$
(Personally, I think of this as a relation $interp_is(X,Y,T,f)$ with a funny syntax, but hopefully it’s acceptable as a partial function definition?)
Totality is:
(a) If $A type$ and $[[A]] = X$ and $B type$ and $[[B]] = Y$ and $A \vdash T : B$ then there exists $f : X \to Y$ such that $interp(X,Y,T) = f$.
Mutually (for when we add dependent types) with (b) If $A type$ then there exists an $X$ such that $interp(A) = X$.
For example, in the case for the typing rule for $T ;_B T'$, we are given $A type$ and $C type$ (the presuppositions of $A \vdash T;_B T' : C$), and $[[A]]= X$ and $[[C]] = Z$ (the “semantic validity” / definedness of the interpretation of the inputs), and $B type$ and $A \vdash T : B$ and $B \vdash T' : C$ as premises. By the IH part (b) on the subderivation of $B type$, there is some $Y$ such that $[[B]] = Y$. So we have the data needed to use the IH part (a) to get $interp(X,Y,T) = f$ and $interp(Y,Z,T') = g$, which is all enough to know that the interpretation is defined.
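For what it’s worth, the toy example transcribes almost line-for-line into executable form. Here is a Python sketch (partiality modelled by returning `None`; the clauses for `k` and `k'` follow the interp clauses above, and the free category’s morphisms are represented by made-up tags):

```python
# Transcription of the toy language: objects o, o'; morphisms h : o -> o'
# (interpreting k) and h' : o' -> o (interpreting k'), plus identities and
# formal composites.  interp(X, Y, T) is the partial interpretation,
# with None standing for "undefined".

def interp(X, Y, T):
    if T == "k":
        return "h" if (X, Y) == ("o", "o'") else None
    if T == "k'":
        return "h'" if (X, Y) == ("o'", "o") else None
    if T == "id":
        return ("id", X) if X == Y else None
    if T[0] == "comp":                         # ("comp", B, T1, T2) is T1 ;_B T2
        _, B, T1, T2 = T
        # the partial interpretation [[B]] of the raw "cut" type B
        Ymid = {"w": "o", "w'": "o'"}.get(B)   # undefined on i (and junk)
        if Ymid is None:
            return None
        f = interp(X, Ymid, T1)
        g = interp(Ymid, Y, T2)
        if f is None or g is None:
            return None
        return ("seq", f, g)                   # formal composite morphism
    return None
```

Running it, the well-typed composite $k ;_{w'} k'$ interprets to the composite of $h$ and $h'$, while the ill-typed raw composite $k ;_{w} k$ is undefined, exactly as the totality theorem predicts.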
I would say that my formulation as (ii) is just a reorganization of (i). Your partial interpretation as a partial function whose inputs are a semantic context (input), a semantic type (the output), and a syntactic term is
$SemCtx \times SemTy \times SynTm \to SemTm_\bot$where $(-)_\bot$ is the “partial elements” monad. I’m just currying this as
$SynTm \to (SemCtx \to SemTy \to SemTm_\bot).$

I think Streicher does (i), insofar as I can see a difference. I don’t think there was ever any intent to “use Streicher’s method” – we already made a bunch of changes inspired mainly by Peter’s suggestions. I was assuming that Peter’s original suggestion also counted as (i); is that not the case?
with (i), a piece of syntax is (classically) either interpretable in all semantic environments of the correct variable set, or none.
Not according to what Dan wrote in #86. “A partial function whose inputs are two objects X and Y and a term T” is different from “a partial function whose input is a term T and whose output is a (total) function whose input is two objects X and Y”.
I don’t understand why you are bringing in semantic contexts. In my sketch of the partial interpretation functions, the interpretation of a term is a partial morphism $Tm^V\rightharpoonup Tm$, where $Tm^V$ is a “semantic environment”, and I assumed this is what Dan was talking about when he said “a total interpretation of raw terms as partial morphisms”. The semantic contexts are only introduced later on, and end up being the domains of definition of these partial morphisms.
Re #90: wait, I do think I agree with Matt here. The type of $interp$ that I had in mind for interpreting in a category $C$ is
$interp : \Pi X,Y : Ob(C).\; Tm \to (Hom_C(X,Y))_\bot$

So the inputs $X$ and $Y$ before the partiality are in the sense of polymorphism, not in the sense of assuming elements of $X$ and $Y$.
That’s what I would call (i)-style.
What I had in mind for (ii)-style was something like
$interp : \Pi X,Y : Ob(C).\; Tm \to X \to Y_\bot$

Re #87, what is $SemTm$ here? If in this situation it’s like $Hom_C(X,Y)$, then it does seem to me like the partiality is in a different place in $Tm^V \to Tm_\bot$ (which is how I’m understanding your current text on the partial interpretation page, which seems like (ii) style) than in $(Tm^V \to Tm)_\bot$ (which would be (i)-style).
I.e. (i) is $A \to (B \to C)_\bot$ and (ii) is $A \to (B \to C_\bot)$, and (i) implies (ii) but not vice versa.
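The one-way implication can be made concrete in code (a hedged sketch; `None` models $\bot$, and all the functions are made up):

```python
from typing import Callable, Optional

# (i):  A -> (B -> C)⊥  — for each a, either a total function or nothing.
# (ii): A -> (B -> C⊥)  — for each a, a function that may fail pointwise.

def i_to_ii(f: Callable[[str], Optional[Callable[[str], str]]]
            ) -> Callable[[str], Callable[[str], Optional[str]]]:
    """(i) implies (ii): push the partiality inward."""
    def g(a):
        h = f(a)
        return (lambda b: None) if h is None else (lambda b: h(b))
    return g

# No converse in general: a (ii)-style function whose definedness varies
# with b for a fixed a cannot be expressed (i)-style, since there
# definedness may only depend on a.
def ii_example(a: str) -> Callable[[str], Optional[str]]:
    return lambda b: a + b if b == "ok" else None
```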
I’m still trying to wrap my head around how exactly the recursive partial interpretation works, so I’m wondering if the following would be a good way to formulate (or at least conceptualise) an interpretation strategy, inspired by a discussion I had with András Kovács.
Let $C$ be the target model where we want to land. Define a “raw context” in $C$ to be a set of objects, a “raw type” to be a set of elements of $Ty$ (possibly over different objects), and so on. The various (essentially algebraic) operations of $C$ can be extended to (actually algebraic) operations on the “raw” level. Define the various validity relations on raw things of $C$. For example, given a raw context $\mathbf \Gamma$, and a raw type $\mathbf A$, we say that $\mathbf \Gamma \vdash \mathbf A$ if $\mathbf \Gamma$ is a singleton $\{\Gamma\}$, $\mathbf A$ is a singleton $\{A\}$, and $A$ is an element of $Ty(\Gamma)$.
Now from all these structures you get a (total!) interpretation function from the syntactic raw things to the corresponding raw things in $C$. This should be easy to define. And then it should also be easy to check that valid things get mapped to valid things. Both steps should be direct applications of an initiality principle for inductive definitions (i.e. they should be straightforward inductions). After that, one can trivially extract the desired interpretation function by inspecting the validity predicate of the outputs.
Using subsets basically corresponds to using multivalued functions instead of partial functions. But probably one could replace subsets with subsingletons and it seems everything works the same way.
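A toy Python version of this “raw” construction, just to pin down what I mean (illustrative only; the operations and predicates are placeholders):

```python
# Raw things are *sets* of candidate interpretations; model operations
# extend elementwise (so they are total, multivalued-function style), and
# validity means the set is a singleton containing a well-typed element.

def raw_op(op, raw_xs, raw_ys):
    """Extend a binary operation of the model to sets of candidates."""
    return {op(x, y) for x in raw_xs for y in raw_ys}

def is_valid(raw, well_typed):
    """A raw thing is valid iff it is a singleton {x} with x well typed."""
    return len(raw) == 1 and well_typed(next(iter(raw)))

def extract(raw):
    """Once validity is established, extract the unique interpretation."""
    (x,) = raw
    return x
```

The point is that `raw_op` is everywhere defined, so the interpretation into raw things is a straightforward induction, and partiality only reappears at the end via `is_valid`/`extract`.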
Is this a valid description of any of the strategies that have been proposed? Am I overlooking some important points?
Re #92: Oh, I see the confusion. This is what happens in the passage from a CwF $C$ to its presheaf category $[C^{op},Set]$. We start with style (i) relative to the category $C$ itself, so each term has a chance to get interpreted as a (total) morphism in $C$ (indeed, we don’t assume any structure on $C$ that would make the notion of “partial morphism” make sense therein). But “maybe having a morphism in $C$” is enough to define a partial morphism in the presheaf topos $[C^{op},Set]$. In the simplest case, a maybe-morphism $X\to Y$ induces a partial morphism $y X \to y Y$ between their Yoneda embeddings, whose domain is “all maps $U\to X$ such that the maybe-morphism exists” — in classical logic this is either all of $y X$ or empty.
Our situation is a bit more complicated because of the quantification over all possible interpretations of the domain and codomain. In terms of $C$ itself (refer to Peter’s original post), we say that for every term $T$, every object $X\in C$ (“semantic context”), every “environment over $X$” assigning to each variable of $T$ a semantic type over $X$ and a semantic term in that type (i.e. a display map $A\to X$ and a section of it), and every semantic type $Y$ over $X$ (since we are talking about checking judgments, we take the codomain as given), there might be given a section of $Y$ that is the interpretation of $T$ relative to all of these data.
I just reformulated this in terms of the presheaf category $[C^{op},Set]$ as follows. In natural-model style a CwF structure on $C$ is determined by a representable morphism of presheaves $Tm\to Ty$, where an element of the set $Ty(X)$ is a semantic type over $X$, and an element of $Tm(X)$ in the fiber over such a semantic type is a semantic term belonging to that type. Thus, if $V$ is the set of variables of $T$, then an “environment over $X$” as above is just an element of $Tm^V(X)$, and the semantic type $Y$ is just an element of $Ty(X)$. Thus, the description in the previous paragraph can be re-stated as “for every term $T$, every object $X\in C$, every element of $Tm^V(X)$, and every element $Y$ of $Ty(X)$, there might be given an element of $Tm(X)$ in the fiber over $Y$”. Now since $Tm(X)$ and $Ty(X)$ are just sets, this can be rewritten as “for every term $T$ and every object $X\in C$, a partial function $Tm^V(X) \times Ty(X) \rightharpoonup Tm(X)$ that (insofar as it is defined) lifts the projection $Tm^V(X) \times Ty(X) \to Ty(X)$”. Finally, since these partial functions are (insofar as they are defined) natural in $X$ (this is Peter’s point (8)), we can repackage them again into a “partial natural transformation” $Tm^V \times Ty \rightharpoonup Tm$, i.e. a partial morphism in $[C^{op},Set]$.
Does that help?
I think this is similar to the idea in #93: a presheaf on $C$ is sort of a “more categorically well-structured” version of a “set of objects of $C$”. I also agree with the sentiment in #93 that when packaged this way, the partial interpretation function should be a “fairly straightforward” inductive definition (although when you come down to it, the whole initiality theorem is about “packaging” things in just the right way so that we can apply induction).
One way that I like to think of it is that if we express the raw syntax in HOAS, so for instance we have $\Pi : ty \to (tm\to ty) \to ty$ and $\lambda : ty \to (tm\to ty) \to (tm \to tm) \to tm$ and so on, then it is just a collection of base types and axioms for simply typed $\lambda$-calculus, and the partial interpretation function is just an application of the initiality theorem for simply typed $\lambda$-calculus mapping from the syntactic CCC into the category whose objects are presheaves on $C$ and whose morphisms are partial morphisms of presheaves. There are some technicalities in making this precise, notably that the latter category is not actually a CCC but rather some kind of “closed relevance monoidal category”, and it also requires some translation to get to the usual CwF/natural model formulation of semantics using the fact that dependent products in a topos can be expressed as subobjects of partial function objects (given $C\to B\to A$, the dependent product $\Pi_B(C)$ embeds into $A\times Par(B,C)$, since each “element” of $\Pi_B(C)$ consists of an “element” $a\in A$ together with a partial section of $C\to B$ on the “fiber” of $B$ over $a$, and such a partial section is in particular a partial morphism $B\rightharpoonup C$). But I find it at least conceptually helpful.
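To illustrate what such a HOAS signature looks like when rendered in an ordinary programming language (a very loose sketch, not the categorical construction just described; all names are made up): object-language binders become metalanguage functions, which the constructors instantiate with fresh variables to build plain first-order trees.

```python
# Loose HOAS illustration: Pi : ty -> (tm -> ty) -> ty and
# lam : ty -> (tm -> ty) -> (tm -> tm) -> tm, with binders represented
# by Python functions.

_counter = [0]

def fresh():
    """Generate a fresh object-language variable."""
    _counter[0] += 1
    return ("var", _counter[0])

def Pi(A, B):
    """Pi : ty -> (tm -> ty) -> ty"""
    return ("Pi", A, B(fresh()))

def Lam(A, B, body):
    """lam : ty -> (tm -> ty) -> (tm -> tm) -> tm"""
    x = fresh()
    return ("lam", A, B(x), body(x))
```

E.g. `Pi(("base", "A"), lambda x: ("arrow", x, ("base", "B")))` builds a dependent product whose codomain genuinely depends on the bound variable.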
In light of #94, I see how confusing it was to talk about semantic contexts without clarification. I was not referring to the objects of $C$ that everything in the semantic logical framework (the internal language of the presheaf category) is implicitly parameterized by. Everything I was talking about was supposed to be taking place in the semantic logical framework, and not using any funny tricks to inspect the implicit object of $C$. There are no semantic contexts explicitly used by the partial interpretations of types and terms. And I didn’t think the partial interpretation was going to use tricks to inspect the implicit semantic context either. So far, it seems like everything before global totality can be done internally, without unusual axioms.
I’m going to back up and give some of my assumptions before making specific replies.
Again, those are all things I assumed. Correct them as necessary. If all those assumptions are correct (you don’t actually need all of them), then Peter’s approach was (ii) style, and so is what we’re currently doing.
Re #90:
Trying to translate Peter’s suggestion to our style, for the interpretation of raw types, you have:
$[[T]]\,:\,(V \to Tm_\bot) \to Ty_\bot$
Ours is:
$[[T]]\,:\,(V \to Tm) \to Ty_\bot$
But in both cases the “return value” is not a total function. So they’re (ii) style.
with (i), a piece of syntax is (classically) either interpretable in all semantic environments of the correct variable set, or none.
Not according to what Dan wrote in #86. “[…]”
I don’t see how that thing Dan wrote contradicts what I had written. But I realized that what I had written was misleading, because we know the semantic logical framework isn’t classical. But I still thought (i) and (ii) are generally different in the framework.
Re #91:
I didn’t bring up semantic contexts, you did, I think, in #87, with the $SemCtx$ metatype. Is the confusion that #87 was taking place in $Set$, not $[C^{op},Set]$? But that doesn’t seem to explain it, since then you were missing the argument for the semantic environment. I figured you were confusing contexts and environments.
That the partial interpretations of types and terms don’t involve semantic contexts was something I had said myself, in #89.
Re my own #96, I guess I should add what (i) would be, for the partial interpretation of types:
$[[T]]\,:\,((V \to Tm) \to Ty)_\bot$
But it doesn’t seem possible. $V$ itself is actually an argument before the partiality, but it’s not informative enough to determine whether the interpretation should be defined. The environment—the argument of type $(V \to Tm)$—is informative enough. It seems to be more informative than we need, since we only want to interpret well-typed types. If there were a semantic context argument, which the environment would be required to be an element of, then that would be pretty much just the right amount of information.
Re #95: The raw syntax is not defined internally.
What I’m saying is that whatever intuition you have that argues for (i) over (ii) (which I don’t think I share, but whatever) should be applied not to the presheaf category, but to $C$ itself, and that when this (i)-version in $C$ is translated into the presheaf category it looks like (ii) instead. So there’s no need to object to (ii) in the presheaf category, because it’s really just a way of talking about (i) in $C$, which is what we really care about.