Also, if we followed your approach of assuming the presupposition along with the judgment, we’d simply assume Γ,Γ′ ctx.
I don’t think you always need to assume the presupposition explicitly: you need to assume something that implies the presupposition. The natural thing, to me, is that if you assume a raw context Γ, you also assume it’s well-formed, and if you assume a telescope Γ′, you also assume it’s well-formed relative to Γ; together those imply Γ,Γ′ ctx.
I likely won’t have much time to write parts of this proof anyway, so you should do what you like, but if I were doing things myself, I wouldn’t try to optimize the assumptions of something like weakening, even when a stronger theorem (which omits a presupposition) is in fact true. In my preferred style, once you get to talking about typing, you should only ever assume something in a meta-theorem along with sufficient evidence to know that it’s well-formed. I.e., if you’re doing (3) rather than (2) because you don’t want to use induction-induction and subset types, I’d stick to the (2) fragment of (3) anyway. This is because:
I’d guess that you can do the whole initiality proof in this style (and therefore likely do it for (2) as well), but would like to find out if that’s wrong.
As a reader, I find the “stronger theorem about potentially ill-formed things” theorems to be harder to believe, because of the “something from the presupposition was important” class of bugs.
As a reader, I find the “stronger theorem about potentially ill-formed things” theorems to be harder to believe, because of the “something from the presupposition was important” class of bugs.
That may be a good point. I’m used to proving all this stuff in a proof assistant, so I don’t have to worry about that. When I read informal proofs, I tend to believe things easily, but I guess other people would be really suspicious about this stuff.
Re #100, because the variable rule uses , it seems easier to prove substitution using . But I think versions with work too, when the conclusion uses or . If the conclusion uses , the thing you’re substituting into could be just a variable, and I don’t think you can strengthen to .
I’d think we need at least weakening and single-variable substitution, because those show up in the rules of the type theory. We definitely need the operation of substitution on raw terms. If the mode of application/snd projection is that it synthesizes its type, you need the substitution theorem to know that the conclusion type is well-formed. If the proof of totality needs the sanity check, then the metatheorem about substitution is important; if not, then maybe it’s not on the critical path to initiality?
Now that I think about it, substituting for comes up in the application rule, so substitution should be stated that way.
Re #103: in a bidirectional system, it’s easy to substitute for , because really is a hypothesis of . (When you do these things in LF, that one is the HOAS substitution.) But then if you’re doing hereditary substitution, the hard part is showing that you can substitute into (usually producing , because as you said when it’s a variable a checking term comes out), because that creates beta-redexes. I think with the current version, it shouldn’t be that hard to substitute for an assumption (always producing ), because at the spot the assumption is used, the next step in the derivation will be a type cast by a definitional equality, and you can transitivity the two definitional equalities together.
Re #105: We need the operations on syntax, for sure. I still don’t think we need any admissible rules (operations on derivations). The totality proof shouldn’t need the sanity check. Rather, the synth cases of totality are analogous to the sanity check, since they must show that the raw type of the raw term being interpreted can be interpreted as well. This does need substitution lemmas, but that’s what the semantic substitution lemmas should be providing. They are analogous to the admissible substitution rules.
Now that I think about it, substituting for comes up in the application rule, so substitution should be stated that way.
Good point. I’m convinced.
But then if you’re doing hereditary substitution, the hard part is showing that you can substitute into (usually producing , because as you said when it’s a variable a checking term comes out)…
Are you saying that when you substitute a checking derivation into a synth derivation, the substitution operation will dynamically either return a synth derivation, if it can manage, or a checking derivation?
Re #99 and empty semantic contexts, aren’t all invalid contexts interpreted as empty? Not just ill-scoped ones? So what’s the problem? If you allow duplicate variables in raw contexts, and then say the interpretation is empty, this is already what should happen since contexts with duplicate variables are invalid.
Re #108: When you do hereditary substitution (with only negative types), you can phrase it as: if x is the head variable of a neutral term R, then R[N/x] checks, or if x is not the head variable then R[N/x] is still synth (because x can only occur in checking positions along the spine).
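For anyone following along without a picture of hereditary substitution in their head, here is a small sketch of that case split (in Haskell, with a toy negative fragment and names/details of my own; capture-avoidance and termination are only gestured at in the comments, not handled properly):

```haskell
-- Canonical-forms syntax for a purely negative fragment:
-- Nf are normal ("checking") terms, Ne are neutral ("synthesizing") terms.
type Name = String

data Nf = Lam Name Nf | Ne Ne   deriving (Eq, Show)
data Ne = Var Name | App Ne Nf  deriving (Eq, Show)

-- Hereditary substitution of the normal term n for the variable x.
-- Termination relies on the inputs being well-typed; on ill-typed inputs
-- this sketch may loop.
substNf :: Name -> Nf -> Nf -> Nf
substNf x n (Lam y b)
  | y == x    = Lam y b                 -- x is shadowed under this binder
  | otherwise = Lam y (substNf x n b)   -- assumes y was renamed apart from FV(n)
substNf x n (Ne r) =
  case substNe x n r of
    Left r' -> Ne r'
    Right m -> m

-- Left: x was not the head, so the result is still neutral ("synthesizing").
-- Right: x was the head; the redexes this creates are reduced on the fly,
-- so a normal ("checking") term comes out.
substNe :: Name -> Nf -> Ne -> Either Ne Nf
substNe x n (Var y)
  | y == x    = Right n
  | otherwise = Left (Var y)
substNe x n (App r m) =
  let m' = substNf x n m in
  case substNe x n r of
    Left r'         -> Left (App r' m')        -- head untouched: still neutral
    Right (Lam y b) -> Right (substNf y m' b)  -- reduce the new redex, hereditarily
    Right (Ne r')   -> Left (App r' m')        -- substituted head was itself neutral
```

The `Either` in `substNe` is exactly the dichotomy described above: either x was not the head and the result still synthesizes, or x was the head and a checking term comes out.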
BTW, regardless of whether or not we need the admissible rules for defining the interpretation, we certainly need them for constructing the term model, and hence also for showing that the term model is initial.
No, I don’t think arbitrary invalid contexts are necessarily interpreted as empty. Consider a context like . This is well-scoped, but syntactically invalid because has type , but gets applied in the type of as if it had type . But in some model (like the one Andrej used to argue that applications must be annotated) it might happen that the interpretations of and are equal. In such a case I think (though I could be wrong) the interpretation of this context would be nonempty, because the inductive clause interpreting an does a semantic equality check between the interpretations of a putative function and its putative argument to see whether it can perform the application.
In any case, there’s no problem with interpreting ill-scoped contexts as empty, my point was just that because we can exclude them quite easily without involving ourselves in inductive-inductive issues, we might as well. As Dan says, ultimately we’re really only interested in interpreting well-typed things. The only reason for working with ill-typed ones at all is to make the inductions easier/possible; so if a particular class of ill-typed things can be excluded without making the induction any more complicated, why not?
Re #111: Hmm. I forgot all about that.
Re #112: Ah, right.
In any case, there’s no problem with interpreting ill-scoped contexts as empty…
Oh sorry, I didn’t realize you thought that already. It seems so weird to me to deal with a constraint you don’t need.
so if a particular class of ill-typed things can be excluded without making the induction any more complicated, why not?
I figure everything on the Raw Syntax page would get more complicated, if you track well-scopedness and then need to prove that it’s preserved by substitutions. Would anything get significantly simpler by working with well-scoped syntax?
Nothing on the Raw Syntax page needs to change. Well-scopedness would be a condition on a raw context, which isn’t defined until the Type Theory page. And it should be obvious that all the primitive rules preserve well-scopedness.
What is the condition?
Related to 115/116, one thing to think through is what it takes to prove that substitutions compose ( for single-variables, or for total). This is a standard property of well-scoped raw terms (i.e. it’s a logical framework level property about binding trees, not anything specific to the object language). When you’re proving that substitution is admissible/well-typed, you need this in the application case. I’m not sure whether or not the induction goes through if you try to prove this mutually with showing that substitution is well-typed. When you do the metatheory of canonical forms LF, I don’t think it does (or at least I didn’t see how it does when I last thought about this), but things are more complicated there because it’s hereditary substitution/normalization. So I’d guess that either (a) you need to be able to prove this mutually with showing substitution is well-typed, or (b) you need a notion of raw term that is well-scoped enough to prove this.
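For reference, here is a sketch of the composition laws in question, stated in generic named syntax (my own rendering); note the side conditions, which are where the care about scoping and freshness enters:

$$(t[u/x])[v/y] \;=\; (t[v/y])\bigl[\,u[v/y]\,/\,x\,\bigr] \qquad \text{provided } x \neq y \text{ and } x \notin FV(v),$$

and for total substitutions, writing $t[\sigma]$ for simultaneous application,

$$(t[\sigma])[\tau] \;=\; t[\sigma;\tau] \qquad \text{where } (\sigma;\tau)(x) = (\sigma(x))[\tau].$$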
The condition on a context is that .
Dan, can you give an example of how substitutions can fail to compose for ill-scoped terms? I’m not even sure what that means; I thought substitutions were a syntactic operation on single terms, whereas scoping only has to do with terms-in-context.
The condition on a context is that .
Isn’t the context used in a typing judgment a raw context? If so, how does the typing rule for formation type check?
How do you know is a raw context? I figured from the “pairwise-distinct” requirement that there’s an implicit premise that . Is there also an implicit premise that ?
Let’s see, I would say that if contexts must be well-scoped, then is a presupposition ((2)-style) of the judgment . So I suppose that it should, indeed, technically be a premise of the -formation rule.
Requiring seems bizarre since in the conclusion of the rule is bound and would shadow any appearance of in . Couldn’t we avoid that by allowing the final premise to rename the variable, say where is anything fresh?
From a syntactic point of view, one might argue that duplicate variables in a context aren’t a problem: the later ones just shadow the earlier ones. But I’m not sure how to square that with the definition of the partial interpretation of a context.
Gah, these details are so mind-numbing and uninteresting to me…
So I suppose that it should, indeed, technically be a premise of the -formation rule.
OK, I’m “officially” recommending—one time only—that you don’t add a well-scoping constraint. Although the consequences seem to be natural and believable, they do technically make things more complicated, and we seem to agree you don’t need it. It’s your call, coordinator, and I think Dan would take well-scoping even further than this, but that’s my recommendation.
Requiring seems bizarre since in the conclusion of the rule is bound and would shadow any appearance of in .
With raw contexts not allowing duplicates, mustn’t be in , because then isn’t a context. We’re working up to alpha equivalence, and I figured rules with binders should be partial—if you ignore implicit renaming—like the definition of substitution. In other words, if you want to type check a binder, alpha rename it so that you’re allowed to put the bound variable in the context.
Couldn’t we avoid that by allowing the final premise to rename the variable, say where is anything fresh?
Yes. That’s the root-to-leaves fresh renaming that came up.
From a syntactic point of view, one might argue that duplicate variables in a context aren’t a problem: the later ones just shadow the earlier ones. But I’m not sure how to square that with the definition of the partial interpretation of a context.
I was actually thinking about that, due to #99, before I realized empty sets for contexts with duplicates wasn’t a problem. I came up with something, but it seems like a minor hassle. You’re currently getting away with something that I think is simpler (but it’s an apples-to-oranges comparison), since no-duplicates gives you that isomorphism.
To interpret , you take the set of -environments such that there exists
and equals the updated version of , by mapping to .
The existential quantifier might get annoying.
Gah, these details are so mind-numbing and uninteresting to me…
Couldn’t you just pick some smart-looking paper on type theory syntax and do exactly what they say?
Couldn’t you just pick some smart-looking paper on type theory syntax and do exactly what they say?
In my experience very few papers on type theory are fully explicit about how they treat variable binding and renaming, and those that are rarely explain what they are doing in any abstract level of generality that could be easily transported to a different type theory.
The most promising reference I’m aware of is Harper’s abstract binding trees. Maybe tomorrow I’ll look back at those and see whether they would solve all our problems.
IMO, the problem with variable binding is that we only have conventions/leaky abstractions rather than a clean abstraction that hides whether you are using de Bruijn or variations on named form. So being fully precise means committing to a particular implementation.
ABTs and the system of hypothetical-general judgements in Bob’s book are a logical framework of sorts, and the idea is that you implement the logical framework once (with variables = strings and name swapping, say), and then do the rest of your work “inside” the framework: you only write definitions/rules that make sense in the framework, and only do the inductions that “always” work in the framework.
Another thing you can do is work in nominal logic, so that the idea of names and equivariance (an operation of name swapping, and judgements being invariant under it) is built in to the metalanguage.
But if the goal is explaining things to mathematicians from scratch, I think we have to suffer with doing things by hand… which means doing all of the work you would do once for a framework inlined into defining one specific type theory.
One spot where people have been very precise about these things is the literature on formalizing programming languages in Coq (because, as opposed to Twelf, there’s no built-in support for variable binding). My sense is that the “locally nameless” implementation dominates there (bound variables in de Bruijn form, free variables as names). See here for example.
Beyond constructing terms as alpha-equivalence classes (which is tedious but straightforward: define raw terms, define name swapping, quotient by that, then define substitution on α-equivalence classes), the main thing is implementing the “general judgement”, specifically implementing the move where a variable that is bound in the subject of a judgement in the conclusion of an inference rule is moved to the context in a premise:
There is a question of what this really means, since is really a representative of an α-equivalence class (assuming the subjects of judgements are quotiented by α-equivalence). Some possible interpretations are
there exists a variable (name/string, nothing magical here) such that the premise holds. (But what about shadowing?)
there exists a fresh ( not in the “domain” of ) variable such that the premise holds
for all fresh variables the premise holds
the “cofinite quantification” that they do in the above paper (there is some set of not-allowed variables, and for all variables not in that set the premise holds)
If I recall correctly, the disadvantage of the “exists” formulations is that an induction can sometimes break down, if you get a premise of a rule with a chosen fresh variable, but then you want an inductive call on the premise with a different fresh variable. (You can probably prove a lemma that swapping one variable for another preserves the height of the derivation, but people like to avoid this in mechanizations.) The disadvantage of the forall formulation is that you sometimes need to choose a fresh variable (maybe not so much of a problem mathematically, but constructively it means having an ordering on variables, say, so you can pick the “next” one not in a set). The cofinite quantification has seemed like overkill to me for situations where you always know what you’re trying to be “fresh” with respect to (which isn’t always the case for all ways that people phrase judgements in PL, which was some of the motivation there).
So tl;dr I think an appropriate reading of
is
where is notation for choosing the bound variable to be called . I.e. if the term is really α-equivalence classes of pairs of a name and a term , then pick any representative and name-swap .
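To make the candidate readings above concrete, here is how they look for a generic binder rule (my own rendering, with λ/Π only as stand-ins, and $b^{(x)}$ for the representative of the α-class whose designated bound variable is renamed to $x$):

$$\frac{\Gamma,\, x{:}A \vdash b^{(x)} \,:\, B^{(x)}}{\Gamma \vdash \lambda.b \,:\, \Pi A.B}$$

with the premise read as one of

$$\exists x.\ \dots \qquad \exists x \notin \mathrm{dom}(\Gamma).\ \dots \qquad \forall x \notin \mathrm{dom}(\Gamma).\ \dots \qquad \exists L\ \text{finite}.\ \forall x \notin L.\ \dots$$

where each “$\dots$” stands for the premise $\Gamma, x{:}A \vdash b^{(x)} : B^{(x)}$.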
Re #118: what I was worried about (what would need to be worked out) is how this works if “raw terms” are not quotiented by alpha equivalence, so that substitution is a partial operation. But maybe I misunderstood and raw terms are up to α?
Re 120: the problem with shadowing (with dependent types) is that it breaks weakening. , but (now the type is ill-formed).
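For concreteness, one standard illustration of that failure (my own example, not necessarily the one originally given): weakening $x{:}A,\ y{:}B(x) \vdash y : B(x)$ by appending a second binding $x{:}C$ yields $x{:}A,\ y{:}B(x),\ x{:}C \vdash y : B(x)$, where the $x$ in the conclusion’s type $B(x)$ is now captured by the shadowing $x{:}C$, so the type is ill-formed.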
Re 123: interesting. Does Bob’s logical framework avoid this “ambiguity” in the interpretation of the abstraction-intro rule?
When you write , shouldn’t the notation somehow reflect what the bound variable was before you rename it to ? The term alone wouldn’t show that, so maybe would be an unambiguous notation.
Re #123:
IMO, the problem with variable binding is that we only have conventions/leaky abstractions rather than a clean abstraction that hides whether you are using de Bruijn or variations on named form. So being fully precise means committing to a particular implementation.
Interesting that you say this! I happened to be looking back at Peter’s original post again and noticed that he suggests something that seems to me similar:
raw syntax is defined as an inductive family parametrised by (a) syntactic classes, i.e. the two-element type , and (b) “contexts-of-variables”. Here “contexts-of-variables” could be taken to mean “all subsets of ”, if we want to use named variables, or just “natural numbers” if using de Bruijn indices (where would represent the context of length , with variables ).
and later
one can be agnostic between [named variables and de Bruijn indices], by taking as a parameter something like “a monoidal category with an ’extend’ operation, and with a monoidal-plus-extension functor to ”. The idea is that is the category of what I called “contexts-of-variables”… it axiomatises everything one uses about how contexts work, and you recover de Bruijn indices by taking , or named variables by taking .
Peter also said he wouldn’t necessarily recommend this approach because he hadn’t found quite the right axiomatization yet, so that the overhead didn’t seem quite justified. But I thought I might as well give it a go myself, and I’ve come up with something that seems to me to work quite well. I wrote up something about it on my web at categories of variables (michaelshulman) — keeping it separate from the main initiality project pages for now just because it would be a somewhat more radical change, so we should evaluate it carefully first. What do you think?
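As a concrete data point for the “contexts-of-variables are just natural numbers” instance Peter mentions, here is a minimal sketch (mine, in Haskell, for an untyped-lambda-calculus-shaped syntax rather than the project’s two-sorted one) of syntax as an inductive family indexed by its scope, together with the renaming and substitution you get by following the types:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- Fin n: the n admissible variable references in a scope of size n.
data Fin (n :: Nat) where
  FZ :: Fin ('S n)
  FS :: Fin n -> Fin ('S n)

-- Raw terms indexed by their scope.
data Tm (n :: Nat) where
  Var :: Fin n -> Tm n
  App :: Tm n -> Tm n -> Tm n
  Lam :: Tm ('S n) -> Tm n          -- the body lives in an extended scope

-- Renaming along a map of scopes; ext pushes the map under a binder.
ext :: (Fin n -> Fin m) -> Fin ('S n) -> Fin ('S m)
ext _ FZ     = FZ
ext f (FS i) = FS (f i)

rename :: (Fin n -> Fin m) -> Tm n -> Tm m
rename f (Var i)   = Var (f i)
rename f (App t u) = App (rename f t) (rename f u)
rename f (Lam b)   = Lam (rename (ext f) b)

-- Simultaneous substitution; exts weakens the substitution under a binder.
exts :: (Fin n -> Tm m) -> Fin ('S n) -> Tm ('S m)
exts _ FZ     = Var FZ
exts s (FS i) = rename FS (s i)

subst :: (Fin n -> Tm m) -> Tm n -> Tm m
subst s (Var i)   = s i
subst s (App t u) = App (subst s t) (subst s u)
subst s (Lam b)   = Lam (subst (exts s) b)
-- An ill-scoped clause in rename or subst is a type error in the
-- metalanguage, which is the "strong scoping" guarantee being discussed.
```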
FWIW, that page on categories of variables looks really nice. I’ve been lurking and following the discussions, and that makes more sense to me than a lot of other things (purely because of my own lack of expertise in type theory).
Something else nice about it that I just noticed: if we do the categories-of-variables construction inside HoTT, with a univalent category, then α-equivalence should coincide with the ordinary (dependent) identity type.
I’m also lurking. Is that observation in #129 anything like where in your HoTTEST talk you write
Use a mini-DTT to describe the modes and mode contexts ?
I mean, is this something we might expect that aspects of syntax get treated more conveniently within the type theory?
No, I don’t think it’s anything like the mini-DTT. But at first glance, it does seem to be a case where working univalently has an advantage even for the metatheory of type theory. I am now wondering if we can get some of the same benefit working set-theoretically by explicitly defining the terms to be an inductively generated indexed family of groupoids, or even an inductively generated functor, rather than an inductively generated set and then afterwards constructing an equality relation and functorial action on them.
From the end of “categories of variables”:
We have to instead write something like . This is still somewhat easier for a human to read than de Bruijn syntax, since the binders are referred to by name rather than by number…
How would you write the term normally written ?
?
Yes.
I haven’t read the proposal in detail, but this notation for explicit weakening reminds me of a similar convention in Paul Taylor’s “Practical Foundations of Mathematics”, where he uses to mean weakening by the variable , so it would look something like . More detail can be found (with some bad html rendering) under “Notation 1.1.11” here: http://www.cs.man.ac.uk/~pt/Practical_Foundations/html/s11.html
It looks like it’s now using stronger typing, so that an expression’s type gives you an upper bound on its free variables, like a context. You call it a sort environment. Why?
Hmm, I guess I should really call it a “sort context”. Not sure why I reached for the word “environment”. I don’t want to call it just a “context” because it’s weaker/prior to the contexts that arise later on in the typing judgments, but it is certainly a kind of context since it assigns “types” (here, sorts) to variables rather than “values”. Right?
Thanks Max, that does indeed look similar!
Re #134, to me it’s much more than that: it’s a way to define things like α-equivalence and capture-avoiding substitution and be sure of getting them right. The “stronger typing” as you express it in #136 makes incorrect definitions of substitution into “compile-time type errors”.
One might say the perspective is that the problems with α-equivalence and substitution and so on arise from overly material thinking about variables. The structural perspective taken here is that the variables in any context are a structural set, and two distinct structural sets can never have elements in common. Thus when we (for instance) substitute a term under a binder, we cannot leave the variables in that term unchanged, since it is being placed in a new context and the variables in its old context are not also variables in this new context. And α-equivalence is then simply the manifestation of the structural point of view that the correct notion of “sameness” for structural sets is a specified bijection rather than element-by-element equality.
Re #140, I don’t understand that, because your development includes a definition of substitution. Are you saying that the strong types are how you figured out the right definition?
Re #141, it’s an interesting perspective. I personally don’t find it very intuitive. As for whether some other perspectives are “overly” material: if you find that you have problems when working from a material perspective, then you shouldn’t work from that perspective. If a different perspective is helpful for category theorists for understanding type theory, then I guess that’s how it goes.
(Edit: The original version was worded too strongly.)
How could “I wrote down a correct definition” be a barrier to understanding “you can’t write down an incorrect definition”?
Thank you all for answers to my question about presuppositions back in #49. (Sorry to rewind so far back, only now I had an opportunity to take a closer look.) Unfortunately, I am still somewhat confused.
Perhaps the simplest question I can ask is this. In here, Dan described a simple example of a language presenting a category freely generated by two objects and two morphisms. He said “Term typing is , with the presuppositions that and ”. Why is it that the identity rule does not include or as premises while the composition rule has a premise but not or ?
Because I didn’t take you literally. Of course I can write down an incorrect definition of substitution. By disregarding your development. So I assumed I’m supposed to use your development. (“I” is any hypothetical person hoping to benefit from your development.) But your development includes a (hopefully, probably) correct definition of substitution already, which is generic, so I should not write down any definition of substitution at all. Using your development avoids incorrect definitions of substitution trivially by avoiding definitions of substitution. The benefit for the user of your development is reuse, not strong typing. At least when it comes to substitution. So I asked if you meant that the strong typing was a benefit for you.
This might have something to do with the informal distinction, in software engineering, between libraries and frameworks. A library provides good things to reuse. A framework primarily tries to guide the user in building new good things. Technically, a framework is usually a library. Strong typing may be a good choice for a framework, but if I use your development to get substitutions, I’m using it as a library.
Re #144: You probably noticed that since #49, there was a discussion about what is/should be meant by “presupposition”. I consider that discussion to have been mostly, but not entirely conclusive. As I see it, there’s still a grey area in the difference between what got called (2) and (3). But for what Dan called extrinsic presuppositions, the first step is to do (3), and that’s what you see in #86 of the other thread.
(3) also got called “preconditions”. The idea is that when you’re interpreting (doing induction on) a derivation, you should already know something about the objects being judged by the conclusion. So for the judgment form to have and as preconditions, it means you should already know (to be true) the interpretation of and when interpreting .
So the identity rule doesn’t need or , because when you interpret an instance of that rule, it would tell you something you already know. When interpreting an instance of the composition rule, you similarly also know what and would tell you, but you don’t know what tells you, since is appearing for the first time. And you probably need to know what tells you, since you’re expected to know it to interpret the subderivations and . So should be added as a premise, to tell you.
Re #145-146: Are you serious? You think the benefit of strong typing only accrues to the very first person who implements a particular algorithm? Not all the other people who read, evaluate, maintain, debug, extend, and port their code? I daresay in mathematics this is even more pronounced than in programming.
Are you serious?
Yes. Though I didn’t explain myself carefully enough to base an argument on it. For example, if the definitions in a library are strongly typed, it can guide the library user in using them correctly. But this is usually not open-ended enough to count it as a framework. Does that avoid an argument about #146?
You think the benefit of strong typing only accrues to the very first person who implements a particular algorithm?
No. That’s not what I said. Let’s look at #140 again:
The “stronger typing” as you express it in #136 makes incorrect definitions of substitution into “compile-time type errors”.
Maybe you didn’t really mean to focus on the definition of substitution. Or on substitution at all. A strongly-typed syntax framework might help users avoid careless mistakes when defining a type system. (Which is not primarily about substitution.) Good enough?
(For this project, where we define a single type system, I still don’t think a strongly-typed framework pays off.)
Please do me the favor of believing what I say. I believe that a single program already benefits from strong typing and compile-time error-checking. And similarly for a single mathematical definition. But I’m not interested in continuing this discussion further.
The point of the initiality project is not to sell type theory and structural mathematics to unbelievers, but to prove the initiality theorem. Over the past few weeks we have been trying to set up the raw syntax of the type theory we are going to use. We have been having a hard time and making a lot of mistakes. Maybe we’ve arrived by now at something that will work; maybe not. In any case it will not be obvious to readers whether we have or not, especially those who are not experts in raw syntax. I, personally, am hugely confused about what’s going on with substitution, variable renaming, variable swapping, and alpha-equivalence. And if I, who have spent a good deal of the past 10 years immersing myself in type theory, am confused, you can bet the average mathematician is going to be in way over their head. I’m proposing an alternative which has the disadvantage of being novel, but the advantage of being much easier for a mathematician (at least one with some category-theoretic skill) to understand and get right. Furthermore, remember that part of the goal here is for the theory to be modular and extensible: we are not just defining a single type system, but we want to be able to easily add new type formers and rules to it as we have the manpower and time, and we have a long list of contributors whom we want to enable to do pieces of that work easily.
Please do me the favor of believing what I say. I believe that a single program already benefits from strong typing and compile-time error-checking. And similarly for a single mathematical definition. But I’m not interested in continuing this discussion further.
I can take your word for it that you believe those things. I don’t believe them, as blanket statements. (And I can’t change my beliefs as a favor.) But I didn’t want to argue about it, I was just asking for clarification about #140. Yes, this discussion has gotten weird, let’s drop it.
Over the past few weeks we have been trying to set up the raw syntax of the type theory we are going to use. We have been having a hard time and making a lot of mistakes. Maybe we’ve arrived by now at something that will work; maybe not.
I never actually understood why we need go into such detail on the syntax. I find the situation we’re in to be rather incredible: type theorists have done binding and substitution so much, they’re sick and tired of it, and meanwhile mathematicians are apparently still skeptical of it. If we really need to do syntax again, why should it be nominal? The consensus seems to be that that’s the hard way. And a general category of variables isn’t nominal, right? So did that requirement go away?
I’m proposing an alternative which has the disadvantage of being novel, but the advantage of being much easier for a mathematician (at least one with some category-theoretic skill) to understand and get right.
I believe that for a sensible type system, all the working approaches to binding and substitutions are basically interchangeable. So I have nothing against your new approach in principle. But I don’t understand it fully. To me, you have made something simple into something very elaborate, clever, and advanced. I selfishly hope you don’t use it.
What is the status of mathematical understanding of the simply-typed lambda calculus? Did you know that you can encode any ABT as a term of the simply-typed lambda calculus with constants? Would that be sufficiently convincing to mathematicians to avoid bothering with substitutions?
What about the formalization of nominal terms used in the Nuprl verification? I haven’t studied it carefully, but I read about the project using it. Out of the box, it’s single-sorted, unlike how we separate type and term expressions, but otherwise, I think it’s basically ABTs. They have stuff for well-scoping, which is done extrinsically.
I’m pretty handy with de Bruijn indexes, if you’d be OK with that.
Furthermore, remember that part of the goal here is for the theory to be modular and extensible…
If we come up with a generic notion of substitution-compatible rule, which I think we should, the interaction with raw substitutions would be handled once and for all, I imagine.
The reason mathematicians are skeptical of it is that type theorists apparently haven’t done it once and for all. (Or, if they have, it’s hidden away in a few specialized places and not publicized in a readable way.) In mathematics, when you’re sick and tired of doing something over and over again, you give a general definition or a general theorem so that you can just appeal to the theorem in order to stop doing it. It’s not enough to just say “I’m sick and tired of doing this over and over again, so I’m going to stop”; you have to actually justify that you’re allowed to stop. It’s not enough to “believe” that for a “sensible” type system, all the “working” approaches are “basically interchangeable”; you have to define the words “sensible”, “working”, and “basically interchangeable” and justify your belief with a proof.
I already explained the reasons for using, or at least including, named syntax. Categories-of-variables include named syntax as a particular case, so there is no need to pass through de Bruijn indices when interpreting syntax that’s written with names, and working in that level of generality is easier for a mathematician to read than in de Bruijn syntax. Apparently you prefer to , but I don’t think that is true of the majority of the intended audience, including me, as well as the intended participants. You may be the loudest one in the room right now (except for me (-:O ), but there are 40 other people signed up for this project, and I suspect many of them are just waiting for us to settle the syntax in a way that makes sense to them before they start contributing.
I am quite aware that we can encode ABTs in STLC; indeed, I mentioned it in the last paragraph here. However, with that encoding the inductive nature of the ABTs manifests differently. If you use ordinary STLC, you have to deal with the STLC-level β-redexes. You can avoid this using a canonical-forms-only version with hereditary substitution, but that’s not something that mathematicians are familiar with, nor is it something for which there are standard and readable references as far as I know – I’ve picked it up as “folklore” by talking to type theorists like Dan Licata. Moreover, in either case each node of an ABT is built up out of several STLC “application” and “abstraction” rules; so either you have to prove an “adequacy theorem” relating the STLC-encoding to a non-HOAS version (in which case we didn’t actually get any benefit out of the STLC encoding, for our purposes) or write your inductive proofs differently to break down in the same way. The latter option is somewhat tempting, but it would require structuring the semantics significantly differently as some kind of closed relevance monoidal category, as I mentioned in the above-linked comment, and proving a (weak?) initiality theorem for STLC relative to such structures, so I don’t think it is the right choice for this project.
Thanks for the link to the Nuprl verification. It may be possible to pull out what we need from that, although the proofs aren’t human-readable. For that matter, Bob’s ABTs in PFPL might already be enough. However, I don’t understand how Bob deals with the problem of general judgments that Dan raised in #123: he must be aware of it, but as far as I can see he simply asserts that “ABTs will be considered up to -equivalence” without justifying how the rules respect this.
I’m kind of swamped with other responsibilities for the next few weeks (in fact, I’ve already put way too much time into this project recently), but I will think about all the options and eventually reach some decision. I think categories-of-variables have a lot of advantages, but their novelty is definitely a disadvantage, and one might argue that their abstraction is also (although to me, they are actually simpler than trying to get either named variables or de Bruijn indices right). I would very much like to hear from more “lurkers” about their preferences, to inform that decision; I don’t really want to be a dictator, but if there are only 3 voices in the room then I’ll have to be.
I suspect many of them are just waiting for us to settle the syntax in a way that makes sense to them before they start contributing.
I’ve unfortunately been too busy to do anything yet, but in addition, it is true that I am not following these discussions properly and am waiting for somebody to say ‘this precise tedious thing needs doing’ :-). The reason I am not following them is that they are about what naively seem to me to be somewhat trivial matters (can α-equivalence really be that difficult?!), and the discussion seems in that light a little over-sophisticated/overwhelming. As I say, that impression might be naivety on my part, but still I would be happy to just settle on some definite approach. Using category theory sounds nice to me, but I understand if it does not get the majority vote.
However, with that encoding the inductive nature of the ABTs manifests differently. If you use ordinary STLC, you have to deal with the STLC-level β-redexes.
That’s not what I’m talking about. β-reduction is not involved at all. The encoding of ABTs in STLC is a section, and it commutes with ABT substitution and STLC substitution. There’s another such diagram from STLC to untyped LC, where the encoding just erases types; the types are just to enforce arity.
The point is that the lambda calculus is already the general case, as far as substitutions for pure functional type theory is concerned.
I don’t know how technically useful that is for us, but it’s the reason why I think binding and substitution should be considered a solved problem.
Recently, I’ve been developing substitutions for object languages by setting up a commuting diagram with untyped LC, and using it to transport properties about substitution from LC to the object language. The language-specific definition of substitution seems easier to deal with in later proofs, so I prefer this to using generic ABT substitution.
Oh, I see your point. I’m used to thinking of using logical frameworks where object-level substitution is represented by framework-level application, followed by hereditary substitution if necessary.
However, if we want to induct over ABTs (instead of over the whole STLC), we still need to define them separately (rather than just finding them inside STLC) to get their induction principle, and when we then prove things about how our interpretation acts on substitution we need to know what substitution looks like on ABTs. I guess we could find out what that substitution looks like by embedding into STLC, substituting there, projecting back down, and “β-reducing this at meta-level” to get an explicit description, but that would require us to make a choice about how to write down substitution explicitly in STLC, which isn’t any easier.
I didn’t mean to imply that how to define substitution wasn’t a solved problem, or that mathematicians could reasonably be skeptical of the fact that such definitions exist, even in generality. What I meant is that the understanding of these definitions is not, apparently, at a point where we (for the purposes of proving an initiality theorem) can treat it as a black box: we need to look inside it and see how it is defined. This was in response to your saying “I never actually understood why we need go into such detail on the syntax,” so I was trying to explain why I think we do. How would you propose to prove the substitution lemma without having a definition of substitution to refer to?
I now have a middle-ground proposal, which I am personally happy enough with to have tried replacing Initiality Project - Raw Syntax with (but we can always revert). Namely, to β-reduce the categories-of-variables notions in the case of named variables and write out the result explicitly. I think this removes most of the novelty and abstraction problems with categories-of-variables, while maintaining the “strong scoping” advantage that makes it impossible to get substitution wrong, and being easier for a category-theorist to understand and trust. It also aligns with Dan’s argument that we should only consider well-scoped syntax, incorporating the scoping as a parameter of the inductive definition of syntax, an idea which was also part of Peter’s original sketch.
Thoughts?
However, if we want to induct over ABTs (instead of over the whole STLC), we still need to define them separately (rather than just finding them inside STLC) to get their induction principle, and when we then prove things about how our interpretation acts on substitution we need to know what substitution looks like on ABTs.
Actually, I think you can define ABTs of some signature as inductive subsets (one per sort) of STLC, and prove it’s closed under substitution. So you do get an induction principle. But I haven’t tried that; it sure doesn’t sound like the easy way in Coq. (And if you’re going to do it that way, why not just subset all the way down to well-typed syntax in one fell swoop?)
I guess we could find out what that substitution looks like by … but that would require us to make a choice about how to write down substitution explicitly in STLC, which isn’t any easier.
Right. I wasn’t actually proposing a switch to this approach.
I didn’t mean to imply that how to define substitution wasn’t a solved problem, or that mathematicians could reasonably be skeptical of the fact that such definitions exist, even in generality.
Oh, I thought that’s what you were saying. So then I didn’t even need to point it out.
What I meant is that the understanding of these definitions is not, apparently, at a point where we (for the purposes of proving an initiality theorem) can treat it as a black box: we need to look inside it and see how it is defined. This was in response to your saying “I never actually understood why we need go into such detail on the syntax,” so I was trying to explain why I think we do.
Certainly substitution can’t be a black box. But that doesn’t necessarily mean we need to deal with all the gory details either.
I agree that there should be a better, rigorous abstraction for using substitution conveniently when developing a type system. But I’m not convinced that categories of variables is that abstraction, because I’m not convinced that it’s easier to use than a particular concrete definition. Indeed, it actually looks to me like it combines the disadvantages of the various concrete approaches. As a rule of thumb, a general/weak abstraction is easier to model and harder to use, right?
HOAS is kind of a clever hack, in a way. It exploits the metalanguage. Not just its semantics, but also its implementation. So it’s probably not much help at all for informal math. Unless you agree as a community (as type theorists apparently have) that everyone knows what substitution is supposed to do, without needing to review a formal definition. Arguably, informal type theory really already uses HOAS, in that it exploits the understanding of substitution that a type theorist already has.
The main obstacle to formalizing how this interacts with the rest of metatheory is probably just that type theorists don’t actually agree what a type system is, at any higher level of abstraction than raw syntax.
Re #158, I haven’t looked at it yet, but I bet I’ll like it better than an arbitrary category of variables.
As a rule of thumb, a general/weak abstraction is easier to model and harder to use, right?
Not necessarily. By abstracting away from inessential features, a general framework can actually make it easier to discover proofs and to make correct definitions. In this case, the particular features of concrete approaches are what make it possible to give incorrect definitions of substitution, whereas the general abstraction allows a correct definition to be written down simply by “following your nose”, and moreover makes it obvious that the resulting definition is correct. (This is related to the analogy I made to material set theory: several times I have read papers containing subtly incorrect definitions, made by people who were thinking too concretely in terms of what the elements of a set “are” rather than treating them more structurally/abstractly.) This is often the case with category theory: once a problem is expressed in the proper abstract language, the solution becomes obvious because it’s the only possible thing you can do.
You don’t seem to have much appreciation for this sort of categorical thinking, but maybe you can trust those of us who have used category theory a lot that this really is true. In any case, there’s probably not much point continuing this conversation.
Yeah, I guess I walked into that.
But specifically for categories of variables (COVs), names have the disadvantage that you need to worry about equivalence, while indexes have the disadvantage that the way you refer to a binder is context-dependent. It seems like with a general COV, you need to pessimistically deal with both problems.
To me it seems most convenient to use names when defining large expressions in a language, and indexes when developing the metatheory. Using the most appropriate COV minimizes the trouble caused by the disadvantages. Using a general COV seems to maximize the trouble.
Just catching up – I don’t have much time for this stuff during the week.
Re categories of variables: conceptually, this is nice for providing a comparison between the different approaches! I don’t think I’ve seen things presented this way, particularly the pushouts for freshening and the pushout complements. It reminds me a little of https://nicolaspouillard.fr/publis/jfp-unified-binders.pdf which has a relation between worlds that is like the arrows of F.
I think it would be interesting to compare with a more “algebraic” axiomatization that doesn’t bake in the finite sets from the beginning, something like (haven’t had time to think through all of the details):
Then, rather than a bound variable occurrence being an element of an object , it would be a map from some singleton.
I think this would be a nice interface in a (directed) type theory, because it avoids “implementing” the abstract types as finite sets.
The disadvantage relative to yours is that, in yours, if you have a term in context , then you can write all of , , and as terms. Whereas here would be coded as some associativity of , and technically would be , etc. However, this doesn’t seem like that big of a problem if we’re in contexts that arise by fresh extension of the empty context, since then would arise as “the with a fresh extension , where has a fresh extension , and has a fresh extension . So a reference to is , which encodes the same information about what you’re skipping, though I agree it’s more readable than doing it by pushout injections.
For usability, I think I agree with Matt in #161 that working abstractly wrt any category of variables means you pay the costs of all representations.
My mental model for the “convention” of working with variable binding is that:
Working inside of something like Twelf gives you this interface (because inside the type theory, weakening and exchange appear silent, because the type theory implements variables at the meta-level). Also a nominal setting, if you define Term[X] to mean a term whose free variables are contained in X, where X is a list/set of atoms (then the same “unscoped” term can be used in different contexts).
de Bruijn gives you (1-3) but not (4-5). Building named form from scratch in any type theoretic proof assistant other than NuPRL/OTT/HoTT seems unpleasant, because you don’t have good support for quotients. I think that’s the main reason I have preferred de Bruijn for formalizations.
But assuming quotients, it seems like the way to get all of the above is to define unscoped raw terms, then an extrinsic scoping relation (“FV(t) are contained in X”). The extrinsic scoping is necessary to get (4), because then the weakening/exchange changes the scopedness proof without changing the raw unscoped term: if . then at least you have that . And also quotient terms by α-equivalence.
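A tiny sketch of that extrinsic-scoping option (in Haskell, with details of my own): the raw term is untouched by weakening or exchange; only the fact that it is scoped in a given X changes.

```haskell
type Name = String

-- Unscoped raw terms.
data Tm = Var Name | App Tm Tm | Lam Name Tm
  deriving (Eq, Show)

-- The extrinsic scoping relation "FV(t) is contained in X".
scopedIn :: [Name] -> Tm -> Bool
scopedIn xs (Var x)   = x `elem` xs
scopedIn xs (App t u) = scopedIn xs t && scopedIn xs u
scopedIn xs (Lam x b) = scopedIn (x : xs) b

-- Weakening/exchange act only on the scope X, never on t itself:
-- if scopedIn xs t and xs is contained in ys, then scopedIn ys t.
```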
For a general category of variables, we get (1) and (5) but not (2-4), as Matt said.
For the β-reduced names implementation on the Raw Syntax page, we get (1) and (5) and a bit of (4), because the coercions that are formally there will often be the identity.
Personally I would trade (1) for (4) in a formalization, though other people may differ. Your definition of alpha-equiv is very nice, but I haven’t completely understood what making respect for alpha an admissible rule as opposed to a presupposition of the judgement does.
On the other hand, for initiality, I don’t entirely understand why de Bruijn is inappropriate: if the idea is to make the two sides of the theorem as close as possible, then it seems like using de Bruijn for this part, and relying on a prior translation from a named representation to de Bruijn, would be acceptable. I don’t think, for the proof, using de Bruijn would be harder than using the named form with explicit weakening and exchange (action of ): there will still need to be some lemmas about the application of to a term.
But I do like the intrinsically scoped named form, if it’s expositionally useful to have the syntax that is fed into initiality be a named one (as opposed to parsing the named form into de Bruijn indices and then feeding that in). Again, I’m outside the target audience on that question.
I still don’t understand this:
So the identity rule doesn’t need or , because when you interpret an instance of that rule, it would tell you something you already know. When interpreting an instance of the composition rule, you similarly also know what and would tell you, but you don’t know what tells you, since is appearing for the first time. And you probably need to know what tells you, since you’re expected to know it to interpret the subderivations and . So should be added as a premise, to tell you.
I don’t see in what way is “appearing for the first time” while and aren’t. Since both and are premises, I would say that all , and have appeared before. If I already know how to interpret and , isn’t it presupposed that I already know how to interpret , and ?
Re #164: Ah! You are reading the rules in the wrong direction. :) Have you ever had the chance to think about recursive function definitions from a programming perspective? Well our judgment forms aren’t actually functions, but there’s something similar: You start at the root, do some stuff on the way to the leaves, then do some other stuff on the way back. There is some explanation about how typing rules are related to algorithms. (Maybe you can improve it, so that it would’ve better helped you?) Unless everything is an output, things start out going from conclusion to premises, algorithmically speaking. (Well, in a sense you always start out at the root, but if everything is an output, nothing interesting happens till you reach the leaves.)
I don’t see in what way is “appearing for the first time” while and aren’t.
We are coming from below (on the page) the conclusion. and are inputs into the rule, and then we encounter , which is not in the conclusion, except as an annotation on the main subject. The main subject (if there is one) is what you’re taking apart as you go from root to leaves. So its immediate subexpressions have just been uncovered for the first time.
If I already know how to interpret and , isn’t it presupposed that I already know how to interpret , and ?
The details of these rules are working at a lower level of abstraction than presuppositions (2). Yes, we should know how to interpret , , and , if we expect to know how to interpret and . But how do we get any of those interpretations? It turns out, for this rule, we should already have the interpretations of and . The premise will enable us to interpret . (From a proof perspective, the induction hypothesis for the premise will give us the interpretation.) Now we have what we need to interpret and . (The induction hypotheses give us those, provided we have interpretations of the types.) With those, the semantics lets us construct the interpretation of .
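Here is a toy algorithmic rendering of that reading (my own reconstruction in Haskell; the concrete names, and the choice to annotate composites with the middle object, are assumptions on my part): the two objects in the conclusion are inputs the caller already vouches for, and the only extra premise is the judgment on the middle object, which is the one thing appearing for the first time.

```haskell
type ObName  = String
type GenName = String

newtype Ob = Ob ObName deriving (Eq, Show)

data Tm = Gen GenName
        | Id Ob
        | Comp Ob Tm Tm   -- Comp b g f  ~  "g after f, annotated with the middle object b"
  deriving (Eq, Show)

data Sig = Sig { obs :: [ObName], gens :: [(GenName, (Ob, Ob))] }

-- The object judgment "b ob".
obOk :: Sig -> Ob -> Bool
obOk sig (Ob o) = o `elem` obs sig

-- Term typing "t : a -> b", with a and b as inputs presupposed to satisfy "ob".
check :: Sig -> Tm -> Ob -> Ob -> Bool
check sig (Gen f)      a b = lookup f (gens sig) == Just (a, b)
check sig (Id c)       a b = c == a && c == b   -- no "a ob" premise: nothing new to learn
check sig (Comp m g f) a b =
     obOk sig m                                 -- the premise for the middle object
  && check sig g m b                            -- g : m -> b
  && check sig f a m                            -- f : a -> m
```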
More later, but this is the part I always get confused by:
all constructions are supposed to respect this quotient by construction, rather than having to explicitly check each definition
What does “supposed to” mean? Does it mean “assumed to”, i.e. there is technically a proof obligation but we just ignore it? Or is there some way of specifying what kind of constructions are “allowed” and proving once and for all that all of these do respect the quotient automatically? I suppose with an LF/HOAS encoding the latter happens, but otherwise it seems to me that even with real “quotients” one still has to prove that the quotient is respected (that being one of the inputs to the quotient-induction principle).
if the idea is to make the two sides of the theorem as close as possible, then it seems like using de Bruijn for this part, and relying on a prior translation from a named representation to de Bruijn, would be acceptable.
That’s a good point. However, I still don’t want to have to actually use de Bruijn syntax when writing out all the cases of the initiality proof!
if the idea is to make the two sides of the theorem as close as possible, then it seems like using de Bruijn for this part, and relying on a prior translation from a named representation to de Bruijn, would be acceptable.
That’s a good point. However, I still don’t want to have to actually use de Bruijn syntax when writing out all the cases of the initiality proof!
When I deal with an informal rule like:
I just formalize it with de Bruijn indexes:
Of course, some things get messier:
But it’s the fault of explicit weakening. () And my impression was that semantics has to deal with that anyway.
Using de Bruijn indexes for metatheory doesn’t involve all that many particular indexes.
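For instance (a generic example of my own, not necessarily the rule in the comment above), the named Π-formation rule

$$\frac{\Gamma \vdash A\ \mathsf{type} \qquad \Gamma,\, x{:}A \vdash B\ \mathsf{type}}{\Gamma \vdash \Pi x{:}A.\,B\ \mathsf{type}}$$

becomes, with de Bruijn indices (the bound variable of $B$ is just index $0$),

$$\frac{\Gamma \vdash A\ \mathsf{type} \qquad \Gamma.A \vdash B\ \mathsf{type}}{\Gamma \vdash \Pi\,A.\,B\ \mathsf{type}}$$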
Re #166: I think there are different approaches to the “supposed to”:
Using LF or nominal logic, you work inside of a metalanguage/domain-specific language where everything respects alpha, so you don’t see any proof obligations, or the proof obligations are automated (this is how the nominal package for Isabelle works, if I remember correctly)
I think one of the reasons people like “locally nameless” in proof assistants is that (at least in certain kinds of developments) it’s more common to need alpha-equivalence for bound variables than to need “judgement-wide alpha conversion of free variables” (e.g. identifying with ), so if you can avoid that, then you have de Bruijn form exactly where you need α-equivalence; a small sketch of this representation follows below.
When people write what they usually write on paper (and think of it as being implemented with names, from scratch, rather than working inside of a logical framework/nominal setting), I would say that officially yes, there are proof obligations to show that everything respects the quotient that no one ever discharges. Obviously not ideal mathematically, but essentially what people do is stick to a collection of operations on variables that are OK in various sorts of logical frameworks, so I think of the notation on the page as a short way of communicating (to humans) a de Bruijn representation, or a named form with proofs that everything respects the quotient, or an LF encoding, or a nominal encoding, or something… For example, if all you ever do with variables is refer to them as placeholders for terms, rename/substitute, compare them for equality (but not compare a variable metavariable, which is what really is in the rules, to a specific variable (the string “foo”)), etc. then it’s “standard” how to make everything precise in various ways. When reading papers, unless someone does something fishy (this sometimes comes up when people don’t distinguish between record projection kinds of things, which don’t have a fixed scope, and variables), I can mentally translate what’s on the page to my preferred way of being precise about things. The only way I know to rigorously do this “once and for all” is to fix a specific logical framework/metalanguage, because to say that “everything respects α” requires circumscribing the allowable somethings.
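As promised above, a small sketch of the locally nameless representation (mine, in Haskell): bound variables are de Bruijn indices, free variables are names, and “opening” a binder is the operation that moves a bound variable into the context.

```haskell
type Name = String

data Tm = BVar Int | FVar Name | App Tm Tm | Lam Tm
  deriving (Eq, Show)

-- "Opening" the body of a binder replaces bound index 0 by a (fresh) free
-- variable; this is the move used when a bound variable goes into the context.
open :: Name -> Tm -> Tm
open x = go 0
  where
    go k (BVar i) | i == k    = FVar x
                  | otherwise = BVar i
    go _ (FVar y)  = FVar y
    go k (App t u) = App (go k t) (go k u)
    go k (Lam b)   = Lam (go (k + 1) b)

-- Because bound variables are indices, alpha-equivalence of binders is
-- literally (==); names only matter for the "judgement-wide" renaming of
-- free variables.
```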
Re #167: I think I agree with you: the same thing would happen to the variable rule in the scoped name form, right? In terms of a general category of variables, where is the composite fresh extension (weakening) extracted from , which goes from what was in scope in to what’s in scope in the whole context. If you β-reduce the names category of variables, this doesn’t go away, does it?
It’s all very well to say “when I deal with an informal rule I formalize it using de Bruijn indices”, but there is no (planned) “formalization” step to the Initiality Project, not in the sense of coding it up in a proof assistant. So I believe we need to write what we actually mean in order to have an actual mathematical proof. Unless you can write the “translation” from named variables to de Bruijn indices in a sufficiently general way that it will apply automatically to “all rules of a certain sort”, with a precise mathematical definition of “certain sort” and a precise mathematical proof that it works? Then we could write our rules with named variables and appeal to that translation/theorem to give them precise meaning… although we would still have to manually β-reduce the translation and work with the de Bruijn indices when defining the interpretation. So overall I still favor using named variables throughout, especially now that I’ve written out substitution and α-equivalence in a way that I, at least, am happy with.
It’s true that the β-reduced named category-of-variables (what is currently at Initiality Project - Raw Syntax) does require explicit weakening. But, as Matt said, weakening is also an explicit operation in the semantics, so I don’t think this is a big problem. And I think in concrete cases, the weakening operation will usually act as essentially the identity (the only action happening in the scope parameter of the inductive family of terms), so the syntax will still look like what people are familiar with.
By the way, this:
officially yes, there are proof obligations … that no one ever discharges. Obviously not ideal mathematically, but … if all you ever do with variables is… etc. then it’s “standard” how to make everything precise in various ways. When reading papers, unless someone does something fishy… I can mentally translate what’s on the page to my preferred way of being precise about things.
sounds to me a lot like the situation with initiality theorems in miniature! Which argues, to me, that however we deal with variables in this project, we should be completely precise about it.
I admit to a bias against de Bruijn indices. Often, “just use de Bruijn indices” is used as a sort of “retreat phrase” when someone (e.g. Streicher) who appears to be using named variables is pressed about how to make issues of substitution and α-equivalence precise. I would like to make the point that named variables are just as valid an approach, if you are equally careful about them, and if you are comfortable (as most category theorists are) with identifying objects along isomorphisms (in this case, terms along α-equivalences) instead of demanding that all notions of sameness be collapsed down to the same set-theoretic equality.
Re #170/#171: I agree that we should be completely precise. I think the two options that give the same overall result would be: actually use de Bruijn indices for the proof, and then separately give a translation from named form to it, or really use the named form throughout. Personally, I have a pre-existing bias towards de Bruijn indices (not as a retreat, but as “what’s really going on”), because I’ve done a lot of metatheory with well-scoped de Bruijn indices in Agda, and because it avoids quotienting.
But I also think that using “well-scoped names” (i.e. what’s on Raw Syntax now) is an interesting thing to try (since it’s a less “skeletal” version of de Bruijn indices), and I’m curious how much extra work it is relative to de Bruijn indices—maybe it’s a better “what’s really going on” if it’s not much extra work. The fact that weakening is “heterogeneously the identity” is nice.
I was trying to figure out where things would get more difficult with names. I’m still not entirely sure what a premise of a rule should be:
Consider
where if the subjects of judgements are not quotiented by α, then we should allow the variables to be different in the type and the term.
The “existential” version would be to take the chosen pushout of the two fresh extensions to get a common further extension (actually, aren’t there two chosen pushouts I could be talking about here, depending on which of those fresh extensions is being freshened with respect to the other?) and make the premise be
where is in , is in , the fresh name associated with is in , was in so is in , and was in so is in .
(Then β-reduce the named implementation, but I like thinking at this level of abstraction.)
(But I think I could also have picked the other one instead? If you have a chosen pushout square with two fresh extensions, is it guaranteed that both of the opposite maps are fresh extensions, or just the one opposite the “main” fresh extension, not the one that is the “arbitrary map”? If I swap the roles of the maps, do I know that it chooses the same two maps?)
So the questions are:
is respect for α admissible if you arbitrarily pick one of the above possibilities? If so, then using it you can swap in a different fresh name instead of the chosen one. Otherwise this rule would need to quantify over all fresh names (I guess that would be “for all fresh extensions and maps making a pushout square with the two given ones”).
the other thing that quotienting the subjects of judgements, or making the premises be universal (for all fresh names, rather than for some fresh name), gives you is that the inductive hypotheses of a structural recursion apply to all fresh names, rather than only some chosen one. If this ends up being a problem in some proof (i.e. you end up needing an IH for a different fresh name than the one you have an IH for), you can probably work around this by proving a lemma that doesn’t change the height of the derivation, and inducting on the height instead of doing structural induction. I’m not sure if this will come up or not, but it’s something we’ll have to be careful about.
It’s all very well to say “when I deal with an informal rule I formalize it using de Bruijn indices”, but there is no (planned) “formalization” step to the Initiality Project, not in the sense of coding it up in a proof assistant. So I believe we need to write what we actually mean in order to have an actual mathematical proof.
You took the “formalize” part too seriously. It doesn’t have to be formal for that to be a good way to make it precise. You seem to be assuming that type theorists really mean to use names, and that other approaches are a technical shortcut. But I’m not sure that’s a good way to think about it. The convention is to write informal type theory with names. Remember what Conor said:
If you bother me about variable freshness conditions, I shall respond by switching to a nameless notation: I’m perfectly comfortable living there, but I know I should use names when communicating with people who aren’t used to it.
I bet Conor really means indexes. Or at least, he doesn’t mean any rigorous approach using names. The names are only informal notation. As usual with informal notation, the translation to formal notation usually isn’t developed rigorously. A confusing thing is that the translation is often made precise, for an implementation. But that’s only for object language expressions, not metalanguage expressions dealing with the object language. (I don’t mean to imply that it’s too hard to extend the translation to the metatheory. I just don’t think that the translation is part of the intended meaning of a metatheoretical development.)
Often, “just use de Bruijn indices” is used as a sort of “retreat phrase” when someone (e.g. Streicher) who appears to be using named variables is pressed about how to make issues of substitution and α-equivalence precise.
Maybe you should give them the benefit of the doubt and assume they mean de Bruijn indexes, if that’s how they tell you to make it precise.
With all this, I’m not saying that type theorists really mean de Bruijn indexes as though that’s what type theory is about. But it’s not about names either. (Except for nominal logic and friends.)
Re #164: Ah! You are reading the rules in the wrong direction. :) Have you ever had the chance to think about recursive function definitions from a programming perspective? Well our judgment forms aren’t actually functions, but there’s something similar: You start at the root, do some stuff on the way to the leaves, then do some other stuff on the way back. There is some explanation about how typing rules are related to algorithms. (Maybe you can improve it, so that it would’ve better helped you?) Unless everything is an output, things start out going from conclusion to premises, algorithmically speaking. (Well, in a sense you always start out at the root, but if everything is an output, nothing interesting happens till you reach the leaves.)
This sort of makes sense, but I will have to think about it some more. Explicit examples would help. Maybe there are some textbooks that discuss specific examples of these sorts of derivations? Otherwise, I guess I will have to wait until you settle the definitions and start writing the proofs.
I have to say that my experience trying to understand what’s going on here just confirms what Mike said:
And if I’m confused, who have spent a good deal of the past 10 years immersing myself in type theory, you can bet the average mathematician is going to be in way over their head.
At the same time, what Mike wrote at categories of variables (michaelshulman) is very appealing to me. Indeed, while trying to make sense of the discussion I came up with some (very unsophisticated) version of that myself. There is something rather “operadic” about this approach and I think it could be made quite palatable to (at least some type of) mathematicians.
There is something rather “operadic” about this approach
Indeed. In fact, the definition of categories of variables is pretty close to exactly what you need of a subcategory of finite sets in order to be able to define a “fat closed cartesian multicategory” with its objects indexing the domains of morphisms. (I mean “fat” in the sense of appendix A of Leinster’s book.)
And, perhaps I should also say, the terms/ABTs form the free fat closed cartesian multicategory having the sorts and operators as generating objects and morphisms, respectively.
Matt, if that’s not what you’re saying, then I’m not sure what you are saying. But this whole business with “X is an informal notation for Y” is, again, like the initiality theorem in miniature: some people like to say that type theory is “just an informal notation for” its categorical semantics. And I can probably go along with that when you’re just using very short terms and derivations, but at some point a quantitative difference becomes a qualitative difference: is the Coq-formalized constructive proof of the odd-order theorem “just a notation” for an argument that makes sense in any topos? When the translation from informal to formal becomes too long or complicated for a mathematician reader to do in their head as needed, then I think you need to make it rigorous, one way or another.
If you have a chosen pushout square with two fresh extensions, is it guaranteed that both of the opposite maps are fresh extensions, or just the one opposite the “main” fresh extension, not the one that is the “arbitrary map”?
The latter. Consider the de Bruijn case, where the only fresh extensions are the inclusions $[n] \hookrightarrow [n{+}1]$ that omit 1. The pushout of this map against itself has vertex $[n{+}2]$, but only one of the inclusions into the pushout has this form; the other one is the order-preserving map that omits 2 instead of 1.
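To spell that out (writing the fresh extension explicitly): let
$$ \iota_n : [n] \to [n{+}1], \qquad \iota_n(i) = i+1 . $$
The pushout of $\iota_n$ against itself can be taken to be $[n{+}2]$ with legs $\delta_1, \delta_2 : [n{+}1] \to [n{+}2]$ given by
$$ \delta_1(j) = j+1, \qquad \delta_2(1) = 1,\ \ \delta_2(j) = j+1 \ (j \geq 2). $$
Both composites $\delta_1 \circ \iota_n = \delta_2 \circ \iota_n$ send $i$ to $i+2$, but only $\delta_1$ again has the “omit 1” shape of a fresh extension, while $\delta_2$ omits 2.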
If I swap the roles of the maps, do I know that it chooses the same two maps?
No, because of the same example.
For your rule example
I agree that some renaming needs to be done, but I don’t think it’s so complicated. First of all, the context doesn’t need to change, because it’s the same for both parts of the conclusion: the two bound variables are both fresh extensions of the same set of variables that occur in the context. So we just need to choose one of them by which to extend the context in the premise. There’s no need to weaken the type, because its variables are already in scope, and if we choose (say) the first bound variable as the new one, there’s no need to change its body either. So we just need to rename the other body so that its last variable becomes the chosen one, and this is the renaming operation along the unique isomorphism between the two fresh extensions under the context. So the rule becomes
We could instead make the hypothesis explicitly universal, though, quantifying over all fresh extensions of the context, with induced isomorphisms to the two given ones:
You’d need to weaken if you want all the variables in a context to scope over all the types in the context. But I think it’s more natural for each type to be defined using only the set of variables that come before it. This does require weakening the type in the variable rule of course, just as with de Bruijn indices:
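Schematically, one way to write such a rule in de Bruijn style (with 0 for the most recently bound index and ↑ for the weakening renaming) is
$$ \frac{\Gamma \vdash A\ \mathsf{type}}{\Gamma,\, A \;\vdash\; 0 \;\Rightarrow\; A[\uparrow]} $$
where $A[\uparrow]$ is $A$ weakened from $\Gamma$ to $\Gamma, A$; the named version similarly weakens the type from the prefix of the context over which it was formed.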
Oh I see: you can get that isomorphism just at the finite set level. I agree that the scoping should be as you describe. I was weakening because the only way I could think of to get a comparison between the two fresh extensions was to take their pushout, in which case the newly added fresh name in the context goes into the overall pushout and comes from one side or the other, so the other pieces need to be weakened to there. What you wrote is simpler and is also the standard thing (α-rename one bound variable to match the other), so that’s good.
Right, the category of variables is a full subcategory of the category of finite sets.
Re #172, #178-#180, I can’t think very well at all about the details of working with an arbitrary CoV. But I can tell you’re talking about how binding and α-equivalence interact with the typing rules, and I have thoughts about that, for the specific case of the “local Barendregt” approach now on Raw Syntax.
One of the ideas of local Barendregt, I think, was that you shouldn’t have to rename variables when passing under binders. At least when you deal with them one at a time. Things might get weird with multiple binders that are supposed to correspond. Like for α-equivalence itself, Mike accumulated a bijection between the name sets. I’m guessing we don’t want to pump that up somehow to deal with the typing rules. (But see below.)
So instead, I propose the convention that the main subject picks the name. Other binders that are supposed to match up need to follow along, even if it means renaming. For equality rules, I propose to treat the LHS as the main subject. For type synthesizing rules, there seems to be an interesting possibility: Since the type is an output, the rule gets to pick the name of a binder. It doesn’t need to respect α-equivalence in the usual sense. (But see below.) The idea is that we convert a function that respects α into a relation, rather than simulating converting a function on α-equivalence classes into a relation on α-equivalence classes by some general machinery like nominal logic or quotients. I think this will be easier, if it works.
So here are two example rules:
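For concreteness, taking λ/Π as the example binder (the metavariable names are only illustrative), the first rule could be something like
$$ \frac{\Gamma \vdash A\ \mathsf{type} \qquad \Gamma,\, x{:}A \vdash M \Rightarrow B}{\Gamma \vdash \lambda x{:}A.\,M \;\Rightarrow\; \Pi x{:}A.\,B} $$
so that the output type simply reuses the bound name $x$ that the subject term already carries.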
Because the judgment is on well-scoped terms, which follow the local Barendregt convention, we can add the bound variable to the context: it’s distinct from what’s already there. That variable gets propagated everywhere, including the output type. Note that the same conclusion with a different bound variable in the output type is not derivable. This is an example of the judgment not respecting α-equivalence in the usual sense. Here’s the congruence rule:
Once again, we get away with using an old variable for the LHS binder. (I think.) But this time, it’s because of the conversion-for-equality rule. The RHS is not so lucky, so we need to accept any name, not necessarily equal to the LHS’s. Since we like the LHS better, we put its variable in the context, which means we have to use the renaming operation on the RHS terms under the corresponding binder. This is kind of like the old definition of α-equivalence. If we want to get fancy, we could try equality judgments with some funky bijection context:
This would be trying to pump up Mike’s definition of α-equivalence. Of course the LHS and the type are in the scope of the left variables, and the RHS is in the scope of the right variables. The types in the context itself are in the scope of the left variables, which hopefully works. The presuppositions (all preconditions) would be well-formedness of the context and of the two sides, I suppose, using an operation that throws away the right-hand variables and one that extracts the renaming function from the right variables to the left. I don’t know what to do about the symmetry rule, with bijection contexts. This version of bijection contexts doesn’t seem very good.
As for that output of the synthesis judgment form, that doesn’t respect α, I figure we should say the following about the judgment form instead:
If $\Gamma \vdash M \Rightarrow A$, $\Gamma \approx \Gamma'$, and $M \approx M'$, then there exists an $A'$ such that $\Gamma' \vdash M' \Rightarrow A'$ and $A \approx A'$.
Here $\Gamma \approx \Gamma'$ means that the underlying renaming is the (ordered) bijection between corresponding variables of $\Gamma$ and $\Gamma'$, and the corresponding types are α-equivalent wrt the appropriate prefix of that bijection.
FWIW, I’m reconsidering the option of actually writing the proof in the generality of an arbitrary category of variables. I backed away from it initially due partly to Matt’s very negative reaction, but we now have four people in the intended audience (me, David Roberts, Karol, and Peter Lumsdaine (the latter by private email)) who’ve said they like the idea and/or find it easier to understand, and Dan seems at least willing to think about it. Any other lurkers want to weigh in?
I certainly agree that at a certain formal level, at least, the general version has “all the costs of all approaches”. For instance, in de Bruijn syntax weakening a term requires modifying all its free variable names, whereas in Barendregt named syntax weakening a term requires (potentially) modifying all its bound variable names; thus a general category of variables has to include both such potential modifications at once. But this isn’t necessarily a bad thing: doing things uniformly is often easier, conceptually and formally, than dividing up into special cases, even if one of the special cases happens to be “simpler” than the other.
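A toy version of those two costs (a minimal syntax of my own, not the ABTs on the Raw Syntax page):

```haskell
type Name = String

-- de Bruijn terms: weakening must shift every free index.
data DB = DVar Int | DApp DB DB | DLam DB
  deriving Show

-- Weaken by one new context variable: indices at or above the number of
-- enclosing binders are free and get shifted up by one.
shift :: Int -> DB -> DB
shift k (DVar i)   = DVar (if i >= k then i + 1 else i)
shift k (DApp t u) = DApp (shift k t) (shift k u)
shift k (DLam t)   = DLam (shift (k + 1) t)

-- Named terms: weakening leaves free variables alone, but any bound
-- variable clashing with the new context variable must be renamed away.
data Nm = NVar Name | NApp Nm Nm | NLam Name Nm
  deriving Show

weakenBy :: Name -> Nm -> Nm
weakenBy x = go
  where
    go (NVar y)   = NVar y
    go (NApp t u) = NApp (go t) (go u)
    go (NLam y t)
      | y == x    = let y' = y ++ "'"          -- naive choice of fresh name
                    in NLam y' (go (renameFree y y' t))
      | otherwise = NLam y (go t)

    -- rename free occurrences of y to y', stopping at shadowing binders
    renameFree y y' (NVar z)
      | z == y    = NVar y'
      | otherwise = NVar z
    renameFree y y' (NApp t u) = NApp (renameFree y y' t) (renameFree y y' u)
    renameFree y y' (NLam z t)
      | z == y    = NLam z t
      | otherwise = NLam z (renameFree y y' t)
```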
The situation feels very similar to me to that of strictifying categorical structures. It’s known, for instance, that every monoidal category is equivalent to a strict monoidal category, where associativity and the unit laws hold on the nose and so on; and also that every monoidal category is equivalent to a skeletal monoidal category, where any two isomorphic objects are equal. But it’s not true that every monoidal category is equivalent to one that is both strict and skeletal! (In particular, in a skeletal monoidal category, the associator isomorphism is an automorphism, but it need not be the identity.) This means that while you can assume strictness or skeletality, you don’t get to assume the other one, which often means you don’t get as much benefit as you might have expected from the assumption you did make; and so it’s often easier to just work with a general monoidal category that’s neither assumed to be strict nor skeletal.
I do want to point out again that if we formulate a category of variables in HoTT as a univalent category (such as the univalent category of finite sets), then α-equivalence is just the identity type. So with reference to the numbers of #163, we do automatically get a “quotient” by α-equivalence (2) that is respected by all constructions automatically (3). We probably don’t actually want to do that here, but I think it’s a nice way to make the point that named syntax is really no worse than de Bruijn syntax from the proper perspective. I suspect that there is a way to formulate things in a set-theoretic metatheory to get this sort of behavior semi-automatically as well, e.g. by talking about W-types in an appropriate category directly rather than an inductive definition in sets that we then have to define an α-equivalence relation on top of.
There are also undoubtedly various improvements that could be made. One that I want to make (but haven’t had time for) is to replace the pushouts and pushout complements by the single axiom
which implies both of them. The pushout complements are “biased” towards substituting for one variable at a time, which corresponds to presenting a multicategory/operad using the $\circ_i$ operations; whereas this “unbiased” version can also be used for substituting along an entire context morphism, corresponding to multi-composition in a multicategory/operad. Also I want to allow the same variable (fresh extension) to be bound in multiple subterms, similarly to what I wrote in the β-reduced named version, although a bit less general (something like a rooted tree of fresh extensions for dependencies between variables, with each subterm living at a particular node indicating the sequence of dependent variables that are bound in it).
For type synthesizing rules, there seems to be an interesting possibility: Since the type is an output, the rule gets to pick the name of a binder. It doesn’t need to respect α-equivalence in the usual sense.
Very nice! I like that.
If we want to get fancy, we could try equality judgments with some funky bijection context… This would be trying to pump up Mike’s definition of α-equivalence…. I don’t know what to do about the symmetry rule, with bijection contexts. This version of bijection contexts doesn’t seem very good.
It seems to me like the natural “pumping up” would be a “totally heterogeneous” judgment of type equality relative to a context equality, with the presuppositions that each type is well-formed over its own context (and perhaps an “extrinsic presupposition” equating the two contexts), and similarly for term equality, with the additional presuppositions that each term inhabits its respective type. The congruence rule would then be
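For instance, the Π congruence could look like (metavariables only illustrative)
$$ \frac{\Gamma \equiv \Gamma' \;\vdash\; A \equiv A' \qquad \Gamma,\, x{:}A \equiv \Gamma',\, x'{:}A' \;\vdash\; B \equiv B'}{\Gamma \equiv \Gamma' \;\vdash\; \Pi x{:}A.\,B \;\equiv\; \Pi x'{:}A'.\,B'} $$
with no renaming of either bound variable, since each side lives over its own context; swapping the two sides of every judgment then gives symmetry directly.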
Symmetry is also easy.
By the way, it feels to me as though we shouldn’t need to ensure separately that the rules respect α-equivalence: it should happen automatically, by the way the congruence rules are phrased, that α-equivalent terms are also judgmentally equivalent. I haven’t managed to phrase this precisely enough yet though.
And, perhaps I should also say, the terms/ABTs form the free fat closed cartesian multicategory having the sorts and operators as generating objects and morphisms, respectively.
That makes sense. Is there a reference about closed cartesian multicategories (at least one that writes out the definition in detail)?
Is there a reference about closed cartesian multicategories (at least one that writes out the definition in detail)?
The most explicit definition I’m aware of is section 2.6 of my categorical logic notes.
Re #183, I only skimmed the page on Categories of Variables, but it seems to correspond rather well to what I have in mind when I need to work with syntax, binders, and substitution at a not-too-formal level (and seeing de Bruijn and other techniques as implementation details that can be picked as needed). My only concern is about how to write a proof using these, but I should probably just try by myself.
Also, from a CS PoV, it would make sense to see it as an abstract interface providing finite sets of variables, and specifying fresh extensions with the necessary laws to prove what we need. I guess that such an appealing perspective is sadly not developed enough in proof assistants (because of some lack of support, or difficulties due to abstraction?).
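Something like the following is the kind of interface I have in mind (field names and packaging mine, and only a fragment of what the Categories of Variables page actually specifies; the chosen pushouts and their laws are omitted):

```haskell
-- An abstract interface for scopes, renamings, and chosen fresh extensions.
data VarInterface scope var ren = VarInterface
  { rename  :: ren -> var -> var           -- act on a variable in scope
  , extend  :: scope -> (scope, var, ren)  -- chosen fresh extension: the
                                           -- bigger scope, the new variable,
                                           -- and the inclusion renaming
  , idRen   :: scope -> ren                -- category structure on renamings
  , compRen :: ren -> ren -> ren
  }

-- One implementation: de Bruijn, where a scope is its size, a variable is
-- an index, a renaming is a function on indices, and the fresh variable of
-- any scope is index 0.
deBruijn :: VarInterface Int Int (Int -> Int)
deBruijn = VarInterface
  { rename  = \f i -> f i
  , extend  = \n -> (n + 1, 0, (+ 1))
  , idRen   = \_ -> id
  , compRen = \f g -> f . g
  }
```

A named implementation would use finite sets of strings for scopes and carry its own choice of fresh names; the laws one needs would be stated separately, as on the page.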
I’m happy with doing a general category of variables, or the named form one, or de Bruijn.
It does seem a bit odd to me that the free variables of a raw term are a set, but then the hypotheses of a typing judgement are a list, i.e. in de Bruijn form. This is what happens if the context is not identified with the set of free variables. I usually think of generality (free variables) and hypotheses of a hypothetical judgement (typing assumptions) as two manifestations of the same concept (in LF, both are LF Pi types). I do see the pragmatic justification for this, though: people write down the terms, and computers find the typing derivations, so who cares if the proof that there is a typing hypothesis in the context is formally some positional information, which changes under weakening/exchange (i.e. if you regard the derivations themselves as terms, then the premise of the variable rule is a number indicating its position).
Re #182: synthesis outputting the same variable also makes sense to me (though I think that ultimately this rule should be checking, not synthesis, but going with synth for now). I’ve seen theorems about the synthesis judgements of the form you suggest before: when something is an output, that position of the judgement doesn’t necessarily “respect” the things it should respect, so you have to keep some “massaging” external.
Re #183: I like the “double everything up” style of congruence rules. Supposing for a second that we were doing quotient-inductive-inductive syntax, with a constructor
then this would be exactly the “dependent ap on a path, producing a pathover” congruence for that constructor, right?
One usability problem is that, for transitivity, we’d need context well-formedness if the subjects are presupposed to be well-formed.
Though transitivity is always pretty non-algorithmic anyway.
For type synthesizing rules, there seems to be an interesting possibility: Since the type is an output, the rule gets to pick the name of a binder. It doesn’t need to respect α-equivalence in the usual sense.
Very nice! I like that.
Too bad, if I understand correctly. All expressible operations with a general CoV respect α-equivalence, right? For one thing, they need to work with de Bruijn indexes, where it’s computationally impossible to not respect α-equivalence. For another, you say that in HoTT, you’d get that path types are α-equivalence, which means operations that don’t respect it would be inconsistent.
Since picking the name in the output doesn’t respect α-equivalence when the judgment is read as a relation, where the output is technically an input, something must go wrong.
But at least whatever we have to do instead, we can’t possibly screw it up.
It seems to me like the natural “pumping up” would be a “totally heterogeneous” judgment of type equality relative to a context equality…
You read my mind. Except for the typo. My mind had a different typo. ;)
This style seems a lot like logical relations. I was also thinking that if it really works at all (logical relations doesn’t generally have symmetry or transitivity) it would repair my old bidirectional equality idea, and I was going to recommend PER-style again.
But with a general CoV, you need to rename the bound variable anyway, right? So the congruence rules are already nice and symmetric, so I say don’t bother, unless you have another trick up your sleeve. I was actually leaning towards “don’t bother” anyway, because it’s too darn fancy.
For one thing, they need to work with de Bruijn indexes, where it’s computationally impossible to not respect α-equivalence.
Not a problem, I think: what we have here is a construction that might not respect α-equivalence, we aren’t saying that it definitely doesn’t. It only actually fails if there is a different variable that we could have used instead in the conclusion type, but in the de Bruijn model there is only one fresh extension of any scope, so that can’t happen.
For another, you say that in HoTT, you’d get that path types are α-equivalence, which means operations that don’t respect it would be inconsistent.
Right, but here we are converting from a function to a relation. Given a function, the way to make it into a relation is by taking its graph, so that a path/equality type in the codomain is built in. That is, a function doesn’t “respect” path/equality in its codomain (that’s not even a sensical question to ask) until you make it into a graph by declaring that it will.
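That is, in ordinary notation (nothing specific to this project): given $f : A \to B$, its graph is the relation
$$ \mathrm{graph}(f)(a,b) \;:\equiv\; \big(f(a) = b\big), $$
and whether that relation “respects” equality in $B$ is determined by the equality built into it, not by anything about $f$ itself.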
But with a general CoV, you need to rename the bound variable anyway, right?
Not sure what you mean. Are you referring to the renaming in #178? Yes, that’s not going away, but this avoids a renaming in the congruence rule.
This did also make me think back to the PER suggestion, but I don’t think it does anything to improve the situation as I discussed it in #30.
It does seem a bit odd to me that the free variables of a raw term are a set, but then the hypotheses of a typing judgement are a list
Arguably the non-ordered nature of the variable sets is an illusion. As I noted on the page, the only objects that actually arise are those obtained from the empty scope by iterated fresh extension, which means that they all automatically come with a linear ordering. We might have defined the category of variables to be a full subcategory of finite linear orders, and I think practically nothing would change. I mean, you would get fewer exchange actions, but exchange is not really a fundamental property in dependent type theory anyway, since it only even makes sense when by accident some type only depends trivially on those that come before it. It seems to me that the only alternative to linear contexts is some kind of directed graph like Andromeda apparently uses – which CoV I think would also support, building up contexts using a tree of fresh extensions rather than a list, but that seems more exotic than would be desirable for the present project.
this would be exactly the “dependent ap on a path, producing a pathover” congruence for that constructor, right?
Yep!
One usability problem is that, for transitivity, we’d need context well-formedness if the subjects are presupposed to be well-formed.
Yes, that’s true… and that would require a context validity judgment defined mutually with everything else, which otherwise I think we had succeeded in avoiding. )-:
(oops, this post got mangled because I copied in some unicode, and I’m not sure how to delete it)
Re #189: I think this is a good point: to define the judgments in HoTT on α-quotiented terms would implicitly mean that you can transport along α at any step (a derivable explicit α-conversion rule given by transport), so an on-paper rule that was not closed under α wouldn’t match that. So we could either make sure the rules are closed under α (by universally quantifying, e.g.), or be careful to be sure that the checking judgement is admissibly closed under α at the end of the day.
Arguably the non-ordered nature of the variable sets is an illusion.
Hmm, the thing to be careful about here is: what is the evidence that a variable belongs to an object of the category of variables? If well-scoped terms include a clause like “if $x \in V$ then $\mathsf{var}(x)$ is a term over $V$”,
then it needs to be the case that the proof of $x \in V$ isn’t itself counted as part of the data of the term (which it would be, in the naive inductive-family reading of that clause), or else the variable is really both the name and the de Bruijn index (or other proof that $x \in V$).
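A toy rendering of the worry (invented types, not the actual definition on the page): if the membership evidence is stored in the term, then a variable is secretly a name plus a position; the alternative is to keep only the name and treat well-scopedness as a predicate that is checked, not stored.

```haskell
type Name  = String
type Scope = [Name]   -- variables in scope, innermost first

-- Version 1: the membership evidence is part of the term.  The Int is
-- "the proof that the name is in scope", i.e. effectively a de Bruijn index.
data Tm1 = Var1 Name Int | App1 Tm1 Tm1 | Lam1 Name Tm1
  deriving (Eq, Show)

-- Version 2: only the name is data; well-scopedness is a separate predicate.
data Tm2 = Var2 Name | App2 Tm2 Tm2 | Lam2 Name Tm2
  deriving (Eq, Show)

wellScoped :: Scope -> Tm2 -> Bool
wellScoped sc (Var2 x)   = x `elem` sc
wellScoped sc (App2 t u) = wellScoped sc t && wellScoped sc u
wellScoped sc (Lam2 x t) = wellScoped (x : sc) t
```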
I do want to point out again that if we formulate a category of variables in HoTT as a univalent category (such as the univalent category of finite sets), then α-equivalence is just the identity type. … I suspect that there is a way to formulate things in a set-theoretic metatheory to get this sort of behavior semi-automatically as well, e.g. by talking about W-types in an appropriate category directly rather than an inductive definition in sets that we then have to define an α-equivalence relation on top of.
Oh wait. Maybe I’m confused. Can somebody try to explain why we do/don’t automatically respect α-equivalence by using a general CoV here (as opposed to univalent categories in HoTT)?
Not a problem, I think: what we have here is a construction that might not respect α-equivalence, we aren’t saying that it definitely doesn’t.
Let me see. With intensional type theory (ITT), it’s impossible to express constructions that don’t respect isomorphism, because there are HoTT models where such constructions are impossible. With a general CoV, there are models, like de Bruijn indexes, where operations that don’t respect α-equivalence are impossible, but with the abstraction they come back? Oh, because the operations on terms are not part of the abstraction?
Given a function, the way to make it into a relation is by taking its graph, so that a path/equality type in the codomain is built in. That is, a function doesn’t “respect” path/equality in its codomain (that’s not even a sensical question to ask) until you make it into a graph by declaring that it will.
I don’t understand your point. It seems to be agreeing with me that if everything respects α, then converting from a function to a relation gets you something different than what I proposed, which was a relation that doesn’t (necessarily) respect α.
Well anyway, the idea’s out there to adapt to a general CoV, if possible. I’m probably the least qualified here to figure out whether it is.
make sure the rules are closed under α (by universally quantifying, e.g.),
wait, “universally quantifying” doesn’t make sense for an output position – more like “nondeterministically choosing”, i.e. the rule holds with any fresh extension in the output, for all fresh extensions (of which the one written is one possible choice)
But with a general CoV, you need to rename the bound variable anyway, right?
Not sure what you mean. Are you referring to the renaming in #178?
Nope. I’m not paying attention, because I don’t completely understand it. You already preemptively corrected me in #178:
First of all, the context doesn’t need to change, because it’s the same for both parts of the conclusion: the two bound variables are both fresh extensions of the same set of variables that occur in the context. So we just need to choose one of them by which to extend the context in the premise.
To correct what I meant by:
So the congruence rules are already nice and symmetric, so I say don’t bother, unless you have another trick up your sleeve.
Doing something like the universally quantified premise in #178 would already make the congruence rules nice and symmetric. And I think I prefer that to both asymmetric congruence rules and “totally heterogeneous”.
By the way, it feels to me as though we shouldn’t need to ensure separately that the rules respect α-equivalence: it should happen automatically, by the way the congruence rules are phrased, that α-equivalent terms are also judgmentally equivalent. I haven’t managed to phrase this precisely enough yet though.
You mean like:
If the context is well-formed, both terms are well-typed in it at the same type, and the two terms are α-equivalent, then they are judgmentally equal?
And then use this along with symmetry and transitivity to prove that the LHS and RHS respect α?
One usability problem is that, for transitivity, we’d need context well-formedness if the subjects are presupposed to be well-formed.
Yes, that’s true… and that would require a context validity judgment defined mutually with everything else, which otherwise I think we had succeeded in avoiding. )-:
Oh no. I don’t like that at all. What happens with equality reflection?
That also seems fishy. It’s inputting the context and the type twice each.
I have a bad feeling about these judgments, you guys.
So we could either make sure the rules are closed under α (by universally quantifying, e.g.), or be careful to be sure that the checking judgement is admissibly closed under α at the end of the day.
wait, “universally quantifying” doesn’t make sense for an output position – more like “nondeterministically choosing”…
If the setting we work in doesn’t force us to respect α-equivalence, Mike seems to like having the output not respect α. It seems the checking judgment will automatically respect α, since its one rule makes it respect judgmental equality, which should include α-equivalence.
Re #198: This doesn’t seem bad to me. The doubled up judgements are like squares in a double category (see here for something similar), so passing in the same context/type twice is like representing a globular 2-cell as a homogeneous square.